Posts

Showing posts from March, 2007

links for 2007-03-30

Implementing Data Cubes Efficiently
How to choose which views to materialize in an OLAP cube, when it is too expensive to materialize all views. This is the next optimization for our aggregation strategies in ActiveWarehouse.
(tags: olap database)

ActiveWarehouse Gets Some Love

ActiveWarehouse, the Ruby on Rails plugin for data warehouse development, was written up by InfoQ in their article ActiveWarehouse, a New Step for Enterprise Ruby.

I've been writing different aggregation strategies for ActiveWarehouse, trying to find something that's not too slow or cumbersome. ActiveWarehouse supports pluggable aggregation, or rollup, strategies, so you can use what works best for you. We have some very large data sets and very large dimensions (one of our dimensions has 215 million rows). So if ActiveWarehouse can eventually handle that, I think we're in good shape.

I can say that ActiveWarehouse will work great if you have a smallish data set; I would say it can comfortably handle up to about a million rows in your dimensions. Of course, no matter how much work we put into optimizing ActiveWarehouse's aggregation schemes, smart database tuning will always help tremendously.

links for 2007-03-28

Pentaho Analysis Services: Aggregate Tables
How Mondrian builds and utilizes aggregate tables to help query performance of large cubes.
(tags: olap database)


On the Computation of Multidimensional Aggregates
This paper presents fast algorithms for computing a collection of groupbys.
(tags: olap database)

Creating Combinations of Sets/Arrays/Things in Ruby

I was looking for a way to create combinations of things in Ruby and I found an article by Uncle Bob detailing his attempt at writing a combination generator in Ruby. I modified it slightly to use an array of items, instead of simple indexes.



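    require 'pp'

    # Returns every k-element combination of the items in array n.
    def choose(n, k)
      return [[]] if k == 0              # exactly one way to choose nothing
      return [] if n.nil? || n.empty?    # nothing left to choose from
      rest = n[0...-1]                   # every item except the last
      # Combinations that skip the last item, plus those that include it.
      choose(rest, k) + append_all(choose(rest, k - 1), n.last)
    end

    # Returns copies of the given lists with element appended to each.
    def append_all(lists, element)
      lists.map { |l| l + [element] }
    end

    all = [:a, :b, :c, :d]

    pp choose(all, 3)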



The above code prints out:

[[:a, :b, :c], [:a, :b, :d], [:a, :c, :d], [:b, :c, :d]]

If you don't want these types of combinations, there is a Ruby library for calculating Permutations which will give you all the different permutations, or orderings, of a set of things.
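Ruby 1.8.7 and later (newer than this post) ship `Array#combination` and `Array#permutation`, so a sketch that avoids both the hand-rolled generator and an extra library might look like this. The variable names are only illustrative.

```ruby
items = [:a, :b, :c, :d]

# Order does not matter: 4 choose 3 gives 4 combinations.
combos = items.combination(3).to_a

# Order matters: 4 * 3 * 2 gives 24 permutations of length 3.
perms = items.permutation(3).to_a

p combos
puts perms.size
```

Both methods return an enumerator when called without a block, which is why `to_a` is used to collect the results.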

Goodbye Productivity, Hello Desktop Tower Defense

I'm not usually one for online games or Flash games. Heck, with a newborn in the house, I'm happy to sit and eat for five minutes. But having discovered Desktop Tower Defense, I can say that I've found a great, fun little Flash game. Inspired by Warcraft, this Flash game has you deploying defensive towers to counter an onslaught of little gray circle guys. The more guys you kill, the more money you get and the more towers you can deploy or upgrade. Simple, fun, and you can shoot missiles. Good times.

links for 2007-03-23

Dell Inspiron 6000 Service Manual
my laptop has a burning smell coming from the back. not good.
(tags: dell laptop)

Oracle 11g Gains Native OWL Support

Oracle 11g will gain native OWL support.

From the article:

> (2) Native OWL inferencing (for an OWL subset that includes property characteristics, class comparisons, proprety comparisons, individual comparions and class expressions) [New API]

Way to go, Oracle! I've always had a soft spot for Oracle's RDF support. The way that you can blend RDF data sets and traditional relational data sets in the same query helps to deploy RDF slowly but surely. On top of that, Oracle has already solved all the main problems that an RDBMS should solve (like ACID compliance, backup and recovery, strong security, and a wide developer toolset), which makes Oracle's RDF support (and soon OWL) a strong contender among RDF data stores.

Code Comment o’ the Day

Found this little gem in some code I'm working with:

    if admin?
      logger.info("i'm admin lol")

lol indeed.

Why the Semantic Web Marketing Message Has Failed

So some guy writes why the semantic web will fail and ends up on Slashdot. How Slashdot picks their articles, I'll never know. The article is pure opinion and guesswork (as all predictions seem to be), and it's perfectly OK for this guy to blog his opinions.

I'm not going to argue that the semantic web (that's *small s* semantic) will succeed, although I think it will prove useful in a large sense in some form, even if that form isn't RDF. I think what's really telling about the doom and gloom post is that the marketing message of the semantic web has failed.

For example, a quote from the blog post:

> The Semantic Web will never work because it depends on businesses working together, on them cooperating

Where, in all of the W3C's semantic web literature, does it say that companies must work together for the semantic web to succeed? I think this is one of the biggest misinterpretations about the semantic web. For some reason, people think that the semant…

16 Years of Discovery Magazine Now Online

16 years of Discovery Magazine are now online for your pop science needs. It's a fantastic resource for science reading that's lighter and fluffier than something like Nature or Science.

Installing OpenSSL Support for Ruby on Ubuntu

The more I work with Ubuntu, the more I think it's a very good desktop, but not a good development machine. For instance, you can install Ruby 1.8.4 from the package management system, but not 1.8.5 (or 1.8.6, which is now the latest). So you're stuck compiling Ruby on your own.

Usually that's not too big of a deal. However, for some reason, the default way of compiling Ruby from source on Ubuntu leaves out the installation of OpenSSL support. I had the development openssl libraries package installed, so that wasn't it. I didn't see any errors in the configure process or during compilation.

Turns out, to get OpenSSL to compile and install with Ruby on Ubuntu, you need to follow these steps *after you've installed Ruby*:

    cd ruby_src_dir/ext/openssl
    ruby extconf.rb
    make
    make install

Success!
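
One way to confirm that the extension really got installed is to try loading it from Ruby. This is only a quick sanity-check sketch; the helper name `openssl_available?` is made up for illustration.

```ruby
# Returns true when the openssl extension can be loaded by this Ruby.
def openssl_available?
  require 'openssl'
  true
rescue LoadError
  false
end

if openssl_available?
  puts "OpenSSL support found: #{OpenSSL::OPENSSL_VERSION}"
else
  puts 'OpenSSL support is missing from this Ruby build'
end
```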

That seems a bit harder than it should be, huh?

Here’s a Funny Riddle For Ya

My friend Caty Cakes told me this on the way home today. Had a good laugh.

> Three women are sharing a hotel room, which is $30 a night. Each woman pays $10, and heads up to the room. Later that night, the manager of the hotel realizes he overcharged them by five dollars. He pulls a $5 bill out from the drawer and hands it to the bellhop, instructing him to run it up to the ladies. The bellhop gives each woman $1 and pockets $2 for himself. Each woman has now paid $9. Nine times three is 27. 27 + 2 is 29. Where did the extra dollar go?

Enjoy!
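
Spoiler, for anyone who would rather not puzzle it out: the $2 is already part of the $27, so adding the two double counts it. A quick check of the books, with variable names made up for illustration:

```ruby
paid_by_guests  = 3 * 9     # $27 actually spent by the three women
kept_by_hotel   = 30 - 5    # $25 stays in the hotel's drawer
kept_by_bellhop = 2         # $2 pocketed by the bellhop
refunded        = 3 * 1     # $3 handed back to the women

# The $27 splits into hotel + bellhop, so nothing is missing.
puts paid_by_guests == kept_by_hotel + kept_by_bellhop
# The original $30 is the $27 spent plus the $3 refunded.
puts 30 == paid_by_guests + refunded
```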

links for 2007-03-14

Twitter is Dumb

> Of all the masturbatory ego-fluffers on the Web, nothing chafes me worse than Twitter.

Brilliant. In an age where style easily trumps content, Twitter has neither.

I’m Squinting… But No Agents So Far

Jim Hendler asks, "So where are the agents?" More specifically, I'd like to ask: what do we need before agents can be deployed?

Let me define what I believe an agent is by looking at what it would do for me. I think a software agent is a program that can be given a set of rules and is able to seek out data that satisfies those rules. Agents differ from other software that can answer queries in that Agents would be able to reason about the world and would be capable of acting towards their goal(s) over a long period of time. These agents would act without direct human control, which is especially important if the task would take some time to complete.
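
To make that definition a little more concrete, here is a toy sketch in Ruby. Everything in it (the `Agent` class, the rule lambdas, the sample appointment slots) is invented for illustration, and it leaves out the hard parts: reasoning about the world and acting over a long period without supervision.

```ruby
# A toy agent: holds a set of rules and seeks out data that satisfies all of them.
class Agent
  def initialize(rules)
    @rules = rules   # each rule is a callable that returns true or false
  end

  # Returns the candidates that satisfy every rule.
  def seek(candidates)
    candidates.select { |c| @rules.all? { |rule| rule.call(c) } }
  end
end

# Hypothetical data: open appointment slots.
slots = [
  { :day => 'Mon', :hour => 9 },
  { :day => 'Sat', :hour => 11 },
  { :day => 'Wed', :hour => 16 }
]

rules = [
  lambda { |s| s[:day] != 'Sat' },   # weekdays only
  lambda { |s| s[:hour] >= 12 }      # afternoons only
]

agent = Agent.new(rules)
matches = agent.seek(slots)
p matches
```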

Given that definition of an Agent, I revisit the question: What do we need before agents can be deployed?

Because Agents are task focused, we need a way to define the task in such a way that the Agent understands it. I can imagine simple use cases like "Schedule my dentist appointment every six months. Of course, make sure…

links for 2007-03-09

Video Interviews from Early Google Team
Back in 2002, these videos captured Google's early thought processes. Not just technology, but business and culture. Some key quotes: "Do the hard problems first."
(tags: video google business)

links for 2007-03-08

Hierarchical Dwarfs
Dwarf OLAP cubes which support hierarchical dimensions
(tags: olap database)


Construction and compression of Dwarf
An alternative implementation of Dwarf which hopes to simplify the algorithm and reduce storage requirements.
(tags: olap database)

baetle - Ontology for Software Bugs

baetle is an ontology for software bugs and bug tracking systems. Henry Story has opened the baetle project on Google Code.

baetle is an effort to standardize a view into the software bug tracking world. There are a kazillion bug tracking systems out there. Heck, see for yourself.

So what's a use case for being able to have a consistent view into bugs and issues across all those systems? For one, you could query one system just like another system. Another use case might be if your enterprise runs and maintains multiple different bug tracking systems, and you need to query across all of them.

Hmm, sounds like a Data Warehouse, doesn't it? Multiple systems combined and filtered into one cohesive view for reporting and querying. Ontologies allowing for a way to combine and filter all those data sources. SPARQL for all that querying.

So are ontologies and SPARQL the new ETL?

links for 2007-03-07

Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
Granddaddy of papers on Cubes within SQL. Many papers cite this one, so starting at the beginning.
(tags: olap database)

links for 2007-03-06

SPARQL Via HTTP Methods

Querying the web might get a bit easier, with the union of SPARQL directly with HTTP. TripleSoup, a promising proposal at Apache, aims to expose Triple Stores (RDF databases) directly via HTTP.

This reminds me of URIQA, which is an effort to provide native HTTP methods for accessing metadata about a certain resource. URIQA is interesting because it lets you say

    MGET /foo HTTP/1.1

which means "Retrieve the metadata for resource `/foo`".
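
As a rough sketch of what issuing such a request from Ruby could look like, the standard library's `Net::HTTPGenericRequest` accepts an arbitrary method name. The host in the commented-out line is a placeholder, and a server would have to implement URIQA for `MGET` to mean anything.

```ruby
require 'net/http'

# Build a request that uses the non-standard MGET method for the resource /foo.
# Arguments: method name, has a request body?, expects a response body?, path.
request = Net::HTTPGenericRequest.new('MGET', false, true, '/foo')

# Show the request line that would go over the wire.
puts "#{request.method} #{request.path} HTTP/1.1"

# Sending it would look roughly like this (example.org is a placeholder):
# Net::HTTP.start('example.org', 80) { |http| puts http.request(request).body }
```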

It looks like TripleSoup is a bit different, in that the URI in the request identifies some type of application. TripleSoup seems to be a gateway directly into the triple store, whereas URIQA masks the concept of talking to the triple store. In URIQA, it looks like the triple store *is* the server you are connecting to. With TripleSoup, the triple store is located at the URI you are sending requests to.

URIQA's advantage is that you don't need to know the URI to the application or triple store, you can just send an MGET to the…

Timothy Berners-Lee Speaks on Future of the Web

Whenever Tim Berners-Lee speaks, I listen. He spoke to the US House of Representatives on The Future of the World Wide Web on March 1st, 2007.

There were references to RDF and OWL and how Data Is The New Document. Ironically, I couldn't find an RDF document that described the event or the transcript.