Over the weekend, I've been hard at work at writing an implementation of the Dwarf algorithm for compressing and calculating (aggregating) OLAP cubes. The initial results look good, and I've learned a great deal. To begin, a little background. OLAP cubes typically perform numerous calculations and aggregations on a dimensional model in order to speed up query performance. Data warehouses are all about storing extremely large sets of data, and presenting it to the user for analysis. Users being users, they don't want to wait while you perform a SQL GROUP BY and SUM over 10 million rows. So an effective OLAP cube calculates all of the GROUP BY combinations ahead of time, dramatically speeding up users' queries against the cube. The trouble is that the number of GROUP BY combinations increases exponentially as the numbers of dimensions (and dimension attributes) increases. Plus, data warehouses are meant to store data over multiple years, and at a very fine gra...