I've started looking at getting closer to the technology and re-energizing my traditional track, which is data architecture. One of the things I have found, much to my dismay, is that some of the web scale guys have really taken off over the past four years and have really beaten us enterprise DW & BI guys in terms of scalability.
So there are a lot of places I've been reading in my spare time to get my head into that game. There are names and blogs that I'm going to socialize into Cubegeek. A lot of this started with Cloud talk, and it's clear to me that that's just a bit too broad and, according to some folks I talk to, premature. That doesn't stop it from being very interesting. I've paid attention to a lot of places. Curt Monash was my first stop.
Now it turns out that about two years ago, I made a call with my buddy out to West LA to speak with a guy named Jody Mulkey. Over at Shopzilla, they had some scalability problems with Essbase. So I talked to the guys there and they turned out to be pretty sharp. They seemed to have done all the reasonable stuff in tuning what Essbase they had, but were not up on the latest versions. So since the company I work with is an Oracle partner, we basically had to wait for the Oracle rep to get the proper paperwork signed so that they could get all of their enterprise licensing together. With any luck they'd call me back and then I could maybe hang out and try several things with some clustered ASO. After all, I did know a little bit about running multiple Essbase servers. Well, it turned out that the Oracle rep(s) involved weren't so particularly interested in getting the paperwork done and there were quarter end considerations and all that kind of malarkey. Bottom line, we never got called back in.
It was a cool day because I did get to meet Mulkey, who seemed like a cool guy, and I liked the way they talked about systems there as well. It was an interesting day, the day just after Obama got elected. I had on my monkey suit and people were looking at me differently - but what I remember most was the big screen they had there, where randomly sampled sales from Shopzilla referrals popped up, GIS style. The contrasts were splitting my head because here was a serious IT shop in a very casual environment, stymied by dumb bureaucracy, and I was the guy wearing the suit - but they had awoken my inner geek.
Since then, I started thinking about clouds and whatnot, but my company was not interested. I've also been following Greenplum, Vertica, AsterData and news about them. Way before that, if you go back in this blog, you'll see I was looking at Bigtable and such. I understand the strategy, but now it's time to get to nuts and bolts. Nuts and bolts means Java APIs and more primitive ROLAPery than I've been used to from playing with Essbase.
Over the past couple of weeks, I've been playing around with multiple installations and have been surprised at how many Linux apps in the open source world also run on the Mac. My general aim will be to get more deeply engaged in open source tech & business, which I am starting to get a better feel for. What I hope is that by engaging at the Java API level, I'll pick up a technology which is not entirely disposable. As well, of course, I'll be making sense of Hadoop clusters and interfaces to all the data. In other words, I'm going back to being a hardcore data architect, harder than before. I expect to use a larger toolkit with more open pieces. Even though MSSQL Server is free, there's more stuff out there.
The main aim is big data. It's the new frontier.
Hi,
When you speak about Cloud, OLAP and big data (interesting subjects), with this week's Amazon Cloud announcement, I think of a Palo OLAP cube on an Amazon GPU instance.
Do you know this?
http://news.softpedia.com/news/NVIDIA-GPU-Supercomputing-Available-via-Amazon-Cloud-Offering-166521.shtml
http://www.jedox.com/en/products/palo-gpu-accelerator.html
Victor
Posted by: victor | November 16, 2010 at 02:57 PM
I heard about the GPU announcement this week. It immediately made me think about SeaMicro because they too are using a different chipset to enable higher performance computing.
It looks like Palo is hooked up on many levels, and I like the idea of a memory resident OLAP. Now that I'm thinking a little bit about ROLAP and various necessities for creating aggregate tables, I'm wondering how and why some of that remains necessary.
Let me go off on a little tangent. For about six or seven years, I kept track of the amount of data I could get an Essbase cube to aggregate per hour. I stopped counting maybe five years ago when it got to around 10GB. And I've also kept track of the number of dimensions that are generally human cognizable. So when I think about the amount of data presented to a small set of users, it makes no sense to me that there is anything that should require a great deal of time to compute. It would take me a very long time to visualize what amounts to 10GB of data - certainly more than an hour - so if I'm just trying to see a particularly small set, why should my response time be anything less than instantaneous?
Now theoretically, the ability to recognize all of the ancestor and dependent measures for any known visualization set, i.e. what a reasonable user would query in a known standard report, should be deterministic. It should be a simple matter to specify every bit of info I need. So the first thing I would think of, especially in a planning application with writeback that a user wants immediately updated, is a machine generated profile of sorts. It seems that this is something that Essbase's ASO should do well. You could certainly reverse-engineer something like that. The lower in the hierarchy the set is, the easier it is to define the ancestor set. But any ROLAP should have the same properties. At some level of the API, MDX could be generated to cache that set of aggregations. I should be able, basically, to materialize any set of sub-cube data I want and then, having done so, put them on a fast track for update - given that I have a clean/dirty facility - by knowing the standardized reports.
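To make the clean/dirty idea concrete, here is a rough Python sketch. It is not how Essbase ASO or any ROLAP engine actually works. The two-dimension outline, the member names, the `ReportCache` class and the choice of plain summation as the rollup are all made up for illustration. It just shows that the ancestor set of a written-back leaf cell is cheap to enumerate, and that only the report cells in that set need recomputing.

```python
from itertools import product

# Hypothetical two-dimension outline: child -> parent (None marks the root).
PARENTS = {
    "time": {"Jan": "Q1", "Feb": "Q1", "Q1": "Year", "Year": None},
    "product": {"Cola": "Drinks", "Tea": "Drinks", "Drinks": None},
}
DIMS = ("time", "product")


def chain(dim, member):
    """Return the member followed by all of its ancestors up to the root."""
    out = []
    while member is not None:
        out.append(member)
        member = PARENTS[dim][member]
    return out


def affected_cells(leaf_cell):
    """Every cell whose value depends on leaf_cell (including itself)."""
    return set(product(*(chain(d, m) for d, m in zip(DIMS, leaf_cell))))


class ReportCache:
    """Materializes only the cells a known standard report asks for."""

    def __init__(self, report_cells, facts):
        self.facts = dict(facts)        # leaf cell -> value
        self.cells = set(report_cells)  # the "machine generated profile"
        self.values = {}
        self.dirty = set(self.cells)    # everything starts out stale

    def write_back(self, leaf_cell, value):
        self.facts[leaf_cell] = value
        # Only report cells that are ancestors of the changed leaf go dirty.
        self.dirty |= affected_cells(leaf_cell) & self.cells

    def refresh(self):
        """Recompute dirty cells only; return how many were recomputed."""
        recomputed = len(self.dirty)
        for cell in self.dirty:
            # Assumption: the rollup is a plain sum over contributing leaves.
            self.values[cell] = sum(
                v for leaf, v in self.facts.items()
                if cell in affected_cells(leaf)
            )
        self.dirty.clear()
        return recomputed
```

With a report of three cells, a writeback to ("Jan", "Tea") dirties only ("Q1", "Drinks"), so the next refresh touches one cell instead of three. A real engine would index leaves by ancestor rather than scanning all facts, but the dirty-set bookkeeping is the part I care about here.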
Posted by: Cobb | November 16, 2010 at 04:21 PM