So it looks like I'm going to go back to heads down. That's a good thing, I suppose. The reason is that I get to learn something new. Yay. The thing I will be learning is Kafka streaming and all that I need to know about how to do Fast Data. And the good thing is that this will be very new - and I have to tell you that I never liked writing logic for commits and rollbacks in SQL. It just wasn't languagey enough. So I expect that within a few months I will be dangerous, and within a year I'll be first-rate if not world class. My secret is my nose. I have a good nose for the pure flavor as well as the strategic import. What I don't have are years to sit down in a full-time gig and wait for my boss to ask me to do something interesting. Anyway. I need to write this preface to jumpstart the emotional part of this memory as I go forward, so that one day, when I look back on my learning process, I can say "Oh yeah, I remember how shit I felt not getting that F job because they wanted somebody who was better than me at saying 'I live transactions'".
Anyway, to add to this frustration, here is me mumbling about Kinesis and Kafka in May of 2016 (wow, that was a long time ago). At the time, I was very enthused about transaction processing. Here is what I wrote in Dreaming about Streaming.
A couple of weeks ago I presented a webinar in which I discussed one of our realtime apps, which is attached to online gaming. On the back end of this architecture we are processing about 30,000 transactions per minute with a small two-shard cluster of VoltDB. This setup hardly makes a peep over 25% CPU utilization, running 24/7 with no downtime in two years. Cool enough, but then you realize that it is handling more mobile transactions per year than PayPal. I did the math. And oh, what fun it is to do this kind of math and realize what's computable for the kind of analytic data frameworks we build.
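The math itself is just a back-of-envelope annualization of that 30,000-per-minute figure, assuming the rate holds around the clock (I'll leave the PayPal side of the comparison at the claim above):

```python
# Back-of-envelope: annualize a sustained 30,000 transactions per minute.
TXN_PER_MINUTE = 30_000
per_year = TXN_PER_MINUTE * 60 * 24 * 365
print(f"{per_year:,} transactions/year")  # 15,768,000,000 -- roughly 15.8 billion
```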
And Transactions, The Final Frontier
I am newly aware of the idea of immutable data, but I have yet to work with a data set designed that way. What I have seen are transaction systems that generate records which are overwritten to 'the latest state'. Consider the following idea. I have a plan. The plan meets my boss. I change my plan. The plan meets the enemy. I change my plan again. The plan meets resource constraints and a series of unknown unknowns, and it changes yet again. It is finally set. Now comes the implementation, and something else entirely changes. This is a chain of events that is almost never conceived of in the design of the system that generates the transactions. And yet we are called to audit and analyze such strings of transactions all the time in our analytical applications. The most likely thing that happens? Add an ad hoc field to the end of the transaction and have humans interpret it at the end of the day. There have been seven alterations of the planned activity, but you only have three records: the initial plan, the final plan, and the actual transaction.
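To make the contrast concrete, here is a minimal sketch of the two shapes of data I'm describing. The names and fields are hypothetical; the point is that the overwrite-in-place record silently discards the seven alterations, while an append-only event log keeps every revision available for audit, and "current state" becomes a derived view rather than the only record.

```python
from dataclasses import dataclass
from typing import List

# Mutable "latest state" style: each change overwrites the previous values,
# so the history of revisions is simply gone.
@dataclass
class PlanRecord:
    plan_id: str
    status: str
    details: str

    def update(self, status: str, details: str) -> None:
        self.status = status      # old value is overwritten and lost
        self.details = details

# Immutable event-log style: every revision is appended as a new event,
# and the current state is just a lookup over the history.
@dataclass(frozen=True)
class PlanEvent:
    plan_id: str
    revision: int
    status: str
    details: str

def append_revision(log: List[PlanEvent], plan_id: str,
                    status: str, details: str) -> List[PlanEvent]:
    rev = sum(1 for e in log if e.plan_id == plan_id) + 1
    return log + [PlanEvent(plan_id, rev, status, details)]

def current_state(log: List[PlanEvent], plan_id: str) -> PlanEvent:
    return max((e for e in log if e.plan_id == plan_id), key=lambda e: e.revision)
```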
It is on this last point that I was recently trying to express the idea that I had to go back and build window functions that created a new semantic layer on top of post-processed transaction data and then output that to a user-queryable exhibit. It would have been nicer, of course, to put that layer ahead of my DW consumer, neh? So now I should get the chance, as we will be, as I said, 'cloudifying a mainframe' - in other words, re-engineering an IBM MQ based message system with Kafka. This is actually something I've been wanting to do since.. NARDW. (for further reference, put date here.) NARDW was built in 2005? That's when I scratch-built the workflow for 17 data marts in an ERP integration with the IBM guys.
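Coming back to that window-function layer: if I were sketching it again today it would look something like the following, with pandas standing in for the SQL window functions I actually wrote and the column names purely hypothetical. Partition by plan, order by timestamp, and derive a revision number, the prior value, and a latest-state flag, so the alterations are visible downstream instead of being flattened away.

```python
import pandas as pd

# Hypothetical post-processed transaction feed: one row per change to a plan.
txns = pd.DataFrame({
    "plan_id":  ["A", "A", "A", "B", "B"],
    "event_ts": pd.to_datetime(["2020-01-01", "2020-01-05", "2020-02-01",
                                "2020-01-03", "2020-01-10"]),
    "amount":   [100, 120, 95, 50, 55],
})

# The "window function" layer: partition by plan_id, order by event_ts.
txns = txns.sort_values(["plan_id", "event_ts"])
txns["revision"] = txns.groupby("plan_id").cumcount() + 1             # ROW_NUMBER()
txns["prev_amount"] = txns.groupby("plan_id")["amount"].shift(1)      # LAG(amount)
txns["is_latest"] = txns.groupby("plan_id")["event_ts"].transform("max").eq(txns["event_ts"])

print(txns)  # the user-queryable exhibit, with revision history made explicit
```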
So I've run some basics. I put together and then tore down AWS MSK and a client machine. I also put together a console producer and consumer for a test topic on a new machine in my home setup. Now I've got a couple books to read. See ya.
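For the record, the console tools were all I actually ran, but the programmatic equivalent I'll be working toward looks roughly like this sketch using the kafka-python client; the broker address and topic name are placeholders (for MSK you'd use the cluster's bootstrap broker string instead of localhost).

```python
from kafka import KafkaProducer, KafkaConsumer

BOOTSTRAP = "localhost:9092"   # placeholder; swap in the MSK bootstrap brokers
TOPIC = "test-topic"           # placeholder topic name

# Producer: the programmatic analogue of kafka-console-producer.
producer = KafkaProducer(bootstrap_servers=BOOTSTRAP)
producer.send(TOPIC, b"hello from the home setup")
producer.flush()
producer.close()

# Consumer: the analogue of kafka-console-consumer --from-beginning.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BOOTSTRAP,
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,   # stop iterating after 5s of silence
)
for msg in consumer:
    print(msg.topic, msg.offset, msg.value.decode("utf-8"))
consumer.close()
```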