Is Hadoop, of course.
I'm setting up my first cluster at home. I just got the O'Reilly book, Tom White's Hadoop: The Definitive Guide, second edition, and I'm going to plow through it. I know, I keep saying all these things I'm going to do and never seem to have enough time to follow through, but now I don't feel so much like a fool alone in the wilderness as I did when I first looked at Nutch.
I'm thinking back to when I first had some notion of this, which was when I joined Hackett, or maybe a couple of years before. See, there were some guys at Tellme.com that I wrote about here. They used MapReduce to set up data to feed into Essbase. So I want to follow a similar path.
It turns out that our old friends at Pentaho have announced that PDI 4.0 (aka Kettle) now has a version that talks to Hadoop. It's a big deal. And while I'm keeping an eye on Karmasphere, I think this is good news. So the other day I got Spoon up and running and played around a bit. It reminds me of exactly what a friend of mine told me about Talend: the community editions have UIs in name only, and it's the enterprise editions and add-on tools that make the damned thing productive. Plus, the vendors know the performance shortcuts. Fair enough, I suppose; so long as it's working glue, I won't complain. Considering that MSSQL has a free version too, there have got to be at least a few things PDI can kick ass at. That's to be determined as I sneak another server into the house.
I'm going to build a little Hadoop cluster in my garage and attach a couple of terabytes. It shouldn't be hard, and I don't really care about performance right now. I just want to get a working environment in my grubbies and have, at the very least, a working knowledge of how to put the thing together. I wish I knew somebody in my neighborhood who was interested in this kinda stuff.