What tumultuous times!
I've got to say it has been really weird looking at the empty space that is Cubegeek for the past 60 days or so. But one of my goals for the year is actually going to be convergence. I've finally put so many things in philosophical order in my life, at long last, that I don't feel the necessity to separate all of my avocations from my vocations. Plus I worked out a good deal with my own skills, my intentions and the good folks at Dreamhost. But there's a lot of ground to cover.
The huge news, of course, is Amazon Redshift, which throws a big 500-pound gorilla wrench in everybody's business model. A number of pundits have pooh-poohed the whole thing, but I have to tell you, this is a major part of the future, whether you like it or not. Redshift is Moore's Law for databases. It's impossible to ignore. Quite frankly, I'm not even sure how I can deal with the fact of its existence, because basically somebody with a reasonable amount of skill can put together a data warehouse, quickly. The upshot is that a lot of consulting can be done at home, the way I do it, and a lot of cheap, even throwaway, DWs can be built. This has scary implications for the quality of said DWs, and nobody knows exactly what sections of the market Redshift will come to dominate, but I can tell you this: our friends in the database world are defecating building materials.
I have been working with Redshift for several weeks now and its strengths are many. Primarily, I'm focused on its elasticity and its price. Additionally, I like that I can script everything at the API level. I haven't done all that yet, but I know that I can. It is lacking some nice developer tools at the Toad level, and if I were one of the guys at Panic Software, I'd make sure that is my next project. As much as I love the Bootstrap web interface that Amazon has got running, nothing beats a finely honed fat client. Anyway, the biggest strength of Redshift right now is its ability to load data from S3, and we're thinking up some techniques and product designs that are going to take advantage of that. So check back with me in six months and ask about Project Kleiglight. In the meantime, we are learning by doing in Redshift.
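For anyone who hasn't seen the S3 load path, here is a minimal Python sketch of what it looks like. It only builds the text of a Redshift COPY statement; it doesn't connect to anything. The function name, the bucket, the table and the role ARN are all made up for illustration, and authorizing with IAM_ROLE is my assumption (key-based credentials are another option). Treat it as a sketch, not as the way we actually do it.

```python
import re

# Plain or schema-qualified identifiers only, e.g. "sales" or "staging.sales".
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def build_copy_statement(table, s3_uri, iam_role_arn, delimiter="|", gzip=False):
    """Return the text of a Redshift COPY statement that loads delimited files from S3.

    Hypothetical helper: it assembles SQL and nothing more. Running the
    statement against a cluster is left to whatever SQL client you use.
    """
    if not _IDENTIFIER.match(table):
        raise ValueError("table must be a plain or schema-qualified identifier")
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    if len(delimiter) != 1 or delimiter == "'":
        raise ValueError("delimiter must be a single character other than a quote")
    if "'" in s3_uri or "'" in iam_role_arn:
        raise ValueError("quotes are not allowed in the S3 URI or the role ARN")

    parts = [
        f"COPY {table}",
        f"FROM '{s3_uri}'",              # an S3 prefix; every object under it is loaded
        f"IAM_ROLE '{iam_role_arn}'",    # assumption: role-based authorization
        f"DELIMITER '{delimiter}'",
    ]
    if gzip:
        parts.append("GZIP")             # source files are gzip-compressed
    return "\n".join(parts) + ";"


if __name__ == "__main__":
    # All values below are placeholders.
    print(build_copy_statement(
        "sales",
        "s3://example-bucket/sales/2013/",
        "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
        gzip=True,
    ))
```

The point is that a load is just one statement aimed at an S3 prefix, which is why it is so easy to script.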
Here's my first opinion. Everybody who is using MySQL or MSSQL should migrate to Redshift as soon as they think they're ready for more performance. Period. Whatever market that is, I'll take it.
Here's my second opinion. Teradata is toast.
My Ruby-fu is up marginally. I picked up the Nokogiri gem and am now working a bit smarter with File. I've done some nice integration with standard Unix commands and also with loggers. So I would call myself competent with XML, YAML and JSON. I still haven't swung back to improve my Cucumber but am plenty comfortable with RSpec. I'm working on a utility gem of my own for some text manipulation stuff that I do all the time. Next I'm going to play with the parallel gem to see how I can scale up certain ops.
I'm lagging on my seal book - the O'Reilly on Exploring with R, and I'm finally getting rid of the paranoia that sent me wheeling two months ago. Nevertheless, I still read Darkside and attend a couple security hacker meetups.
I've seriously upped my Chef game in the past couple of months. Working on our elasticPM code has gotten me fairly deep into the implementation end of orchestration. It is now clear to me that much of what we have been doing is quite advanced. We've been on the edge, in many ways, of what Chef can do with Windows, and our unorthodox approach is what has made Chef's learning curve steeper than I expected. In addition, working more with Vagrant has improved my capabilities with virtualization. The next version of Vagrant is going to be awesome, I hear. So I've got about a dozen VMs here on my Mac. As they become migratable into AWS AMIs it's going to be awesome.
Speaking of which, I did get a chance to migrate an AMI across regions the old (two-month-old) way, by moving core snapshots. Before I could write code to automate that (I've been busy), Amazon introduced a way to do it directly. So I haven't done it the newest way, but there's one more barrier to internationalization knocked over.
You have to realize that these days I consider myself to be something of an IT guy in the biggest IT shop on the planet, which is AWS. The new architecture is improving every month. More on this separately.
The reintegration project starts with me getting into a couple, three web architectures. I've gotten the static blog thing worked out with Jekyll and Octopress. So I'll probably migrate all this Cubegeek stuff under the single new site. But I really have to get this Node.js and Rails thing knocked out so I can speak that language to customers as well. You see, a lot of our business comes from people with low resistance to moving their assets to the cloud, since a lot of them have used colocation before. I'm going to try to head a lot of them off at the pass since Amazon has DynamoDB, RDS, Redshift and Hadoop, three of which I've had my hands on. So a lot of the confusion I used to have over MongoDB, Cassandra, Riak, CouchDB and SOLR, I no longer have. I just ape the party line and say go Dynamo.
Vertica has been very, very good to me. So my take on this, in splitting the difference, goes something like this. Redshift is for when you have invested *some* time into your DW and you want something low maintenance. Vertica is for when you need to tune the crap out of your system and you want near-realtime stuff. Basically, Vertica has all the bells and whistles for extreme computing. Redshift is more like MSSQL to Vertica's Oracle. Sorry, I hate analogies too, but that's about as close as I want to get to a hardball assessment in this post. I've played with a lot of databases in my time and I love Essbase and Vertica for the same reasons: their internals are beautiful and they enable an entirely new class of computing. However, I like Redshift for the same reason I like MSSQL, simplicity and elegance, except I know Redshift has a lot more upside than MSSQL.
I have worked with just about every major database technology going back to something called BCC out of Utah. Right now is the golden age, because today we have all the major technologies available in stacks that can be built on Amazon. It's a very exciting time to be a data architect.