There is a marvelous kind of awareness that emerges in the years beyond ambition and mastery. I am in the midst of that new prime, and it is more than cool and more than lovely to be in my mid-50s as whatever kind of techno-something I am. For me this is happening as the aegis of the systems I am responsible for reaches back through the tiers, closer to the origins of data.
The extraordinary thing I have witnessed and participated in, having migrated my mentality and lots of systems to the cloud, is that I am much more in control. When things break, which is always a learning opportunity, I am more responsible for debugging through more layers of the application than ever before. The very first gig I had in this new job had me finding out where timeouts were taking place - in the networks, in the database, in the middleware? We had to chase all that down and found it after several weeks, in the network defaults. I breathed a sigh of relief because that was not my monkey, but it was my circus. These days, I have lots more monkeys and more than three rings in my circus.
Transaction processing is my new monkey. The reasons for that are various, but they can be generally summarized like this: A lot of the analytics that management and analysts require need magic. And they need that magic because people need to count things that are abstracted in ways that transaction systems do not necessarily create explicitly. Let me give you an example. An Uber driver picks up a customer at (point A, time 0). He drops off that customer at (point B, time 1). He then picks up another customer at (point C, time 2) and drops off that customer at (point D, time 3). We can say that the driver made two trips and count the elapsed time, revenue, distance, satisfaction rating, etc. from those trips. Let's say those metrics suffice. But later somebody decides that they want to analyze something we would call a transit: the time and distance between point B and point C. Now if you were streaming all of these events at a low enough atomicity, you could generate transit transactions from your data lake. But wait, who does that?
Almost nobody with legacy systems does that. Instead, what I have had to do is build, according to new business rules, another layer of interpretation on top of the transaction data that exists. Since I use Vertica's marvelous window functions, I do have that capability. But it would be awfully nice if I could generate transit transactions from the raw stream of trips data that comes from my current transaction system.
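To make the transit idea concrete, here is a minimal sketch in Python rather than Vertica SQL. It pairs each trip with the same driver's next trip, which is roughly what a lag or lead window function partitioned by driver and ordered by pickup time would give you. The `Trip` and `Transit` records, their field names, and the `derive_transits` function are all hypothetical, invented for illustration, and the times are the abstract "time 0, time 1" values from the example above, not real timestamps.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass(frozen=True)
class Trip:
    # Hypothetical trip record: one row per pickup/dropoff pair.
    driver_id: str
    pickup_point: str
    pickup_time: int
    dropoff_point: str
    dropoff_time: int


@dataclass(frozen=True)
class Transit:
    # Hypothetical derived record: the gap between two consecutive trips.
    driver_id: str
    from_point: str
    to_point: str
    start_time: int
    end_time: int

    @property
    def elapsed(self) -> int:
        return self.end_time - self.start_time


def derive_transits(trips):
    """Pair each trip with the same driver's next trip and emit the gap."""
    # Sort so that groupby sees each driver's trips together, in time order.
    ordered = sorted(trips, key=attrgetter("driver_id", "pickup_time"))
    transits = []
    for _, group in groupby(ordered, key=attrgetter("driver_id")):
        driver_trips = list(group)
        # zip(trips, trips[1:]) yields (previous trip, next trip) pairs.
        for prev, nxt in zip(driver_trips, driver_trips[1:]):
            transits.append(
                Transit(
                    driver_id=prev.driver_id,
                    from_point=prev.dropoff_point,
                    to_point=nxt.pickup_point,
                    start_time=prev.dropoff_time,
                    end_time=nxt.pickup_time,
                )
            )
    return transits


# The two trips from the example: A -> B, then C -> D.
trips = [
    Trip("driver-1", "A", 0, "B", 1),
    Trip("driver-1", "C", 2, "D", 3),
]
transits = derive_transits(trips)
for t in transits:
    print(t.driver_id, t.from_point, "->", t.to_point, "elapsed", t.elapsed)
```

Two trips yield one transit, from point B to point C. The sketch assumes the trips for a driver never overlap and that all of a driver's trips are available at once; a real stream would need some rule for late or out-of-order events.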
I haven't had that kind of mandate - to upgrade transaction systems and make them generate more analytically precise transactions. Sometimes we spend a lot of processing on big data just aggregating and averaging these new metrics. Aggregating and averaging is easy in an analytic database. What is not so easy is simulating transactions by generating materialized views of nearly atomic data through a set of business rules so that these new counts can be aggregated and averaged. This is particularly something I have learned in the last application I built.
I am newly aware of the idea of immutable data, but I have yet to work with a data set designed that way. What I have seen are transaction systems that generate records that are overwritten to 'the latest state'. Consider the following idea. I have a plan. The plan meets my boss. I change my plan. The plan meets the enemy. I change my plan again. The plan meets resource constraints and a series of unknown unknowns and it changes yet again. It is finally set. Now comes the implementation, and something else entirely changes. This is a chain of events that is almost never conceived of in the design of the system that generates the transactions. And yet we are called to audit and analyze such strings of transactions all of the time in our analytical applications. The most likely thing that happens? Somebody adds an ad hoc field to the end of the transaction and has humans interpret it at the end of the day. There have been seven alterations of the planned activity, but you only have three records: the initial plan, the final plan, and the actual transaction.
I want immutable data that gives me a key to that transaction in all of its plan adjustments and permutations, because I want an audit trail that tells me who did what and when. In other words, I want more than just OLAP, online analytical processing. I want online audit tracking. That means I need to pull in more transactions and the ability to abstract the mutations and count those mutations.
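As a rough illustration of that wish, here is a minimal sketch in Python of an append-only log in which every change to a plan becomes a new record instead of an overwrite. It is not modeled on any particular product; the `PlanEvent` and `PlanLog` names, the fields, the people, and the timestamps are all hypothetical. The point is only that a shared key plus a version number lets you ask who changed what, when, and how many times.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanEvent:
    # Hypothetical immutable record: one per version of a plan.
    plan_id: str      # the key shared by every version of the same plan
    version: int      # 1 for the initial plan, then 2, 3, ...
    changed_by: str   # who made this version
    changed_at: str   # when (illustrative ISO-style strings)
    reason: str       # why the plan changed
    payload: str      # the plan itself, simplified to a string


class PlanLog:
    """Append-only log: records are added, never updated or deleted."""

    def __init__(self):
        self._events = []

    def append(self, plan_id, changed_by, changed_at, reason, payload):
        # The next version number is derived from what is already logged.
        version = len(self.history(plan_id)) + 1
        event = PlanEvent(plan_id, version, changed_by, changed_at, reason, payload)
        self._events.append(event)
        return event

    def history(self, plan_id):
        # The full audit trail for one plan, oldest first.
        return [e for e in self._events if e.plan_id == plan_id]

    def latest(self, plan_id):
        # The 'latest state' is just the last entry; earlier ones survive.
        return self.history(plan_id)[-1]

    def mutation_count(self, plan_id):
        # Every version after the first counts as one mutation.
        return max(len(self.history(plan_id)) - 1, 0)


log = PlanLog()
log.append("plan-42", "me", "2020-01-06T09:00", "initial plan", "ship in March")
log.append("plan-42", "me", "2020-01-07T14:00", "met the boss", "ship in April")
log.append("plan-42", "me", "2020-01-20T10:00", "met the enemy", "ship in May")
log.append("plan-42", "ops", "2020-02-03T16:00", "resource constraints", "ship in June")

print(log.mutation_count("plan-42"))                       # number of changes
print(log.latest("plan-42").payload)                       # current state
print(Counter(e.changed_by for e in log.history("plan-42")))  # who did what
```

With all four versions kept, counting the mutations is a one-line query, and nobody has to interpret an ad hoc field at the end of the day. The sketch keeps everything in memory; a real system would need durable storage and a proper timestamp type.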
Now I have a great deal of confidence that there are folks who understand this domain very well. To that end, I'm going to school myself on transaction processing systems with a view towards 'slowly changing dimensions'. And that will help me get my terminology right on matters of mutability and immutability. In the meantime, I think it's fascinating just to think about the opportunities. I'm swimming upstream and getting closer to the source.