Yes, Virginia, there are interesting and smart people in Southern California. I just came back from my third Meetup, and I'm telling you, that's a great tool. I met folks in the web business and they are doing some pretty interesting things. The flavor is definitely entrepreneurial and I've heard some exceptional stories. Maybe it's because they're new to me, but it's really refreshing to know that folks are making money themselves on the strength of new ideas and new technology.
William has got momentum now, and he had to change the venue from a smaller joint to Wokcano, which is rather chic even though it costs 4 bucks for a Red Bull. He hipped me to Riak by Basho, and although I don't quite have my head around how I might get BI knowledge out of massive document stores, it is interesting to hear that people frustrated with Hadoop are happy with Riak. I seem to recall an Xtranormal movie joke about Hadoop cluster failures, and that's the thing that Riak doesn't do. Surely Cloudera will come up with a smarter way to do system monitoring, but the (now) legendary Foursquare failure is a lesson that the big data community is trying to learn from. Oh yeah, and Joyent. I remember those guys. I guess they're the hotshots - they should know a whole lot by now.
So I'm expecting that what I might like to learn is something about the fuzzy area between unstructured and structured data. Given my understanding of analytic consumption and EPM feedback loops, I'll add value. So this is where Derrick was telling me about Solr. It sounds like Solr is part of the missing link between structured and unstructured, or at the very least can assist in indexing massive data sets. While I've been warned away from Cassandra, it sounds like there is something Solr can do to make up for its shortcomings - all of which is very interesting. He also tells me that I should check out Splunk. Hey wait a minute. I know that CEO. Well, whaddya know? He was keying in on the phrase 'machine generated data'. Yep. Nice. The other term he used was 'faceting'. That's a nice way to structure up stuff on a big data set if you don't know exactly what it might contain beforehand.
Let me think about that for a minute, adding to the minute I thought about what was interesting and weird about Qlikview. In Qlikview and in SAS, I seem to recall the ability to do what seemed to be a random kind of drilldown - not in the way that made sense from a multidimensional design standpoint. I'll call it something like faceted search, not knowing exactly what the proper definition is. But imagine that I index all of several fields in a data set, but don't organize them into hierarchical dimensions. I can still drill down a path to narrow the data set without predetermining dimensions, using the cardinality of the remaining index counts as my key - the cardinality being an interesting part of the dataset itself.
We commonly do this at BestBuy's website. You look at cameras, and then down the left banner, you get counts of all the brands of cameras available. You also have certain attribute ranges on the prices. 0-100, 101-250, etc, and counts on those. At the bottom of your drill down are particular documents. Nice. This kind of navigation is looking for a single item or maybe a single class of items - it doesn't make sense to aggregate all of the data up to the highest level, so this could be a preliminary search into interesting data - something sitting on top of cubes to go - a nice way to winnow down a huge data set. Considering that free Splunk will parse and index 500MB/day, that's a good way to get started. Thanks Derrick.
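Here's a rough sketch of that kind of drill down in Python, just to make the idea concrete. This is not how Solr, Splunk or any retail site actually does it; the camera records, the field names and the helper functions are all made up for illustration, and the only thing borrowed from above is the price ranges (0-100, 101-250). Prices are assumed to be whole dollars. Each step counts the values that remain for every field, and you narrow by picking one of them.

```python
from collections import Counter

# Hypothetical catalog: records and field names are invented for illustration.
CAMERAS = [
    {"name": "A100", "brand": "Canon", "price": 89},
    {"name": "A200", "brand": "Canon", "price": 199},
    {"name": "N10", "brand": "Nikon", "price": 240},
    {"name": "N20", "brand": "Nikon", "price": 499},
    {"name": "S5", "brand": "Sony", "price": 95},
]

# Price ranges in whole dollars; None means "no upper bound".
PRICE_BUCKETS = [(0, 100), (101, 250), (251, None)]


def price_bucket(price):
    """Return the label of the price range that contains an integer price."""
    for low, high in PRICE_BUCKETS:
        if price >= low and (high is None or price <= high):
            return f"{low}-{high}" if high is not None else f"{low}+"
    return "unknown"


def facet_values(record):
    """Map a record to the facet values it is indexed under."""
    return {"brand": record["brand"], "price": price_bucket(record["price"])}


def facet_counts(records):
    """Count how many records fall under each value of each facet."""
    counts = {}
    for record in records:
        for field, value in facet_values(record).items():
            counts.setdefault(field, Counter())[value] += 1
    return counts


def drill_down(records, field, value):
    """Keep only the records whose facet `field` equals `value`."""
    return [r for r in records if facet_values(r)[field] == value]


if __name__ == "__main__":
    print(facet_counts(CAMERAS))           # counts for the left banner
    canons = drill_down(CAMERAS, "brand", "Canon")
    print(facet_counts(canons))            # counts after narrowing to one brand
    print(drill_down(canons, "price", "101-250"))  # documents at the bottom
```

Notice there is no hierarchy anywhere: brand and price are flat facets, and the order you drill in is up to you. The counts that survive each step are the navigation.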
Last year when I went to the CalTech seminar on clouds, I met a couple guys in the lobby who were ready to roar with their hosting. It looks like cloud hosting is hot and heavy here in LA. Dennis is already considering another property. It has been several years since I signed on at Dreamhost, so there's evidently a new generation going.
I really enjoyed talking with Jad. He's got the right ideas for aggressively attacking important markets with targeted solutions. Smart pricing model too. He reminds me of Levi whom I met last week over at McCabe's. One smart guy designing with a small team of engineers can take a big bite out of the marketshare of the larger software firms. Agility matters, especially when open source interoperability is so real. How real? That I don't know, but I do know that one size cannot fit all and the software vendors who come to recognize that their products can be rightsized to their customers are going to win big. Think about that for a minute. Why pay for a legacy of features you don't use?
Shout outs to John from Boston and Ian. And to the guy from the VA, it's Charles Wyble.
Since you mentioned Riak and Solr, I figured I should probably turn you on to Riak Search (apologies if you've already investigated it). We just released it with the 0.13 release of Riak. It's an indexing and search engine that is built on top of and tightly integrated with Riak.
Details here: http://blog.basho.com/2010/10/11/riak-0.13-released/ and here: https://wiki.basho.com/display/RIAK/Riak+Search
Nice post, btw.
Mark
Posted by: Pharkmillups | November 12, 2010 at 11:23 AM