One of the crazy thoughts I had last week was about grid computing with Essbase. Somewhere among the scores of RSS feeds I consume, somebody said something to the effect that Google has the entire internet in memory. Huh? What? Hmm.
How would I go about meeting the Google challenge for Essbase? Well, it's quite obvious that way down at the internals level there are very cool memory models doing smart paging from I/O that almost nobody ever uses, because nobody in the implementation world, anyway, has scoped out the efficiency of Essbase's caching system. Which is to say precisely this: I was told that Essbase guesses the proximity of one query to the next, pages some guesstimated blocks from disk into virtual memory, and then smartly flushes them back after some time. This process can be tweaked by setting commit points and the like. There's got to be somebody on this planet with a development tool that visualizes the efficiency of virtual memory models, and they have to have pointed such a tool at Essbase at some point in its development. Boy would I like to get my hands on that.
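Just to picture that idea (and this is purely my own toy model, nothing like Essbase's actual internals), here's roughly what a block cache looks like when it prefetches "nearby" blocks on a miss and flushes the least recently used ones once it's over capacity:

```python
from collections import OrderedDict

class BlockCache:
    """Toy LRU block cache with a naive 'nearby block' prefetch guess."""

    def __init__(self, capacity_blocks, prefetch_window=2):
        self.capacity = capacity_blocks
        self.prefetch = prefetch_window
        self.blocks = OrderedDict()   # block_id -> data, oldest entries first

    def _read_from_page_file(self, block_id):
        # Stand-in for a real disk read of one data block.
        return f"block-{block_id}"

    def get(self, block_id):
        if block_id not in self.blocks:
            # Miss: page in the requested block plus a few neighbours, on the
            # guess that the next query will land close to this one.
            for bid in range(block_id, block_id + self.prefetch + 1):
                if bid not in self.blocks:
                    self.blocks[bid] = self._read_from_page_file(bid)
        self.blocks.move_to_end(block_id)   # mark as most recently used
        # Flush the least recently used blocks once we're over capacity.
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)
        return self.blocks[block_id]
```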
So what if I had enough memory to handle an entire cube? OK, that's a no-brainer: just adjust the cache to an obscenely large size and make sure nothing gets flushed too often. But what about huge cubes, or a set of them? In that case, we already have transparent partitions and distributed OLAP. In other words, you can spread a distributed cube across multiple Essbase servers and keep pieces of it 'in the air' so to speak - transparent partitions fully in memory. The only question is whether it's really efficient to query across n partitions on n servers, each with sufficient memory to keep its entire partition in fast RAM.
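The shape of that question looks something like the sketch below - pure speculation on my part, not how Essbase transparent partitions are actually wired, just the fan-out-and-merge pattern you'd be paying for:

```python
# Conceptual sketch only: fan a query out to n partition "servers" in parallel
# and merge the answers. Each fake server here is just a dictionary holding its
# own slice of the numbers.
from concurrent.futures import ThreadPoolExecutor

def query_partition(server, member):
    # Stand-in for a remote retrieve against one partition held in RAM.
    return server.get(member, 0)

def query_cube(servers, member):
    # Fan the request out to every partition and sum the pieces.
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        results = pool.map(lambda s: query_partition(s, member), servers)
    return sum(results)

if __name__ == "__main__":
    # Four made-up servers, each holding one slice of a Sales figure.
    servers = [{"Sales": 100}, {"Sales": 250}, {"Sales": 75}, {"Sales": 125}]
    print(query_cube(servers, "Sales"))   # 550
```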
It turns out that the Essbase team has been looking very specifically at enhancements to transparent partitions in the 11 release. This, ladies and gentlemen, actually gets my blood up. I don't know if this kind of gridding is what they had in mind, but I sure as hell want to try it out.
Imagine, if you will, a nice 9-dimensional cube about 40GB in size staged across four 64-bit machines, each with 12GB of RAM. You could easily get a queryable section of that distributed nicely. I mean, even with that distributed services software you could have done that - query balancing with disk redundancy - but what if you wanted to go super fast on calcs or queries and just keep the whole thing in RAM? Push a few levers and boom, the whole thing is 100% in RAM.
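Sanity-checking the napkin math on those made-up numbers (hypothetical figures, not a benchmark):

```python
# A 40GB cube spread evenly over four 12GB boxes. Simple division, but it
# shows there's a little headroom left on each machine for everything else.
cube_gb, servers, ram_per_server_gb = 40, 4, 12

per_partition_gb = cube_gb / servers
headroom_gb = ram_per_server_gb - per_partition_gb
print(f"{per_partition_gb:.0f}GB per partition, ~{headroom_gb:.0f}GB left on each box")
# -> 10GB per partition, ~2GB left on each box
```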
Now, to go to the next level, you'd want to look at some automatic partitioning strategy, maybe even dynamically sized partitions. I've got to believe that the ASO engineers have looked very precisely at this problem, because they know more ways to slice up multidimensional data than we mere mortals could ever dream of. And once again, in 11 it appears they have looked very closely at how to get BSO partitions and ASO partitions on better speaking terms. Sounds delicious to me.
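If I were speculating (and I am), the dumbest possible version of "automatic partitioning" is greedy bin packing along one dimension - balance the estimated size of each member's blocks across the servers. All the member names and sizes below are made up, and the real engineers surely have something far cleverer:

```python
# Naive automatic partitioning sketch: assign the biggest slices first, always
# to the server that currently holds the least data.
def assign_partitions(member_sizes, num_servers):
    """member_sizes: dict of member -> estimated GB of blocks under that member."""
    servers = [{"members": [], "gb": 0.0} for _ in range(num_servers)]
    for member, gb in sorted(member_sizes.items(), key=lambda kv: kv[1], reverse=True):
        target = min(servers, key=lambda s: s["gb"])
        target["members"].append(member)
        target["gb"] += gb
    return servers

if __name__ == "__main__":
    # Made-up slices of the hypothetical 40GB cube along a Regions dimension.
    regions = {"East": 14.0, "West": 11.0, "South": 8.0, "Central": 7.0}
    for i, s in enumerate(assign_partitions(regions, 4), start=1):
        print(f"server {i}: {s['members']} (~{s['gb']}GB)")
```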
So one of these days, I'm going to have some lab time. Bwaahhahahah!