ISR News Story
The Dwarf Structure for Creating, Storing, and Querying Highly Compressed Data Cubes (ISR IP)
ISR intellectual property available to license
Inventors: Nicholas Roussopoulos, John Sismanis, Antonios Deligiannakis, Ioannis Kotidis
U.S. Patent: 7,133,876
Dwarf automatically identifies prefix and suffix structural redundancies and factors them out by coalescing their store. Prefix redundancy is high on dense areas of cubes but suffix redundancy is significantly higher for sparse areas. Putting the two together fuses the exponential sizes of high dimensional full cubes into a dramatically condensed data structure. The elimination of suffix redundancy has an equally dramatic reduction in the computation of the cube because recomputation of the redundant suffixes is avoided. This effect is multiplied in the presence of correlation amongst attributes in the cube. A Petabyte 25-dimensional cube was shrunk this way to a 2.3GB Dwarf Cube, in less than 20 minutes, a 1:400000 storage reduction ratio. Still, Dwarf provides 100% precision on cube queries and is a self-sufficient structure which requires no access to the fact table. What makes Dwarf practical is the automatic discovery, in a single pass over the fact table, of the prefix and suffix redundancies without user involvement or knowledge of the value distributions.
The dwarf structure plays a double role as a storage and indexing mechanism for high dimension data. Roll-up and drill-down queries seem to benefit from the dwarf structure due to common paths that are exploited while caching. In terms of update speed, dwarf by far outperforms the closest competitor for storing the full data cube, while their performance is comparable when the competitor is reduced to storing only a partial cube of the same size as Dwarf. For the APB-1 Benchmark density 5, the fastest performance results are published by Oracle-Express and take 4.5 hours to generate a partial cube of about 30GB while the corresponding Dwarf cube is generated in 21 minutes and has a size of 4.3GB.
A number of Dwarf patents are pending.
For more information
ISR-IP-Roussopoulos ISR-IP-multidimensional ISR-IP databases
June 19, 2007