Here's a link to the webinar I did last week with VoltDB. Having Volt as part of our architecture has enabled us to think about a whole new class of applications. Right now, I would say we're at the point where we're really ready to deal with massive IoT streams. It's just a matter of getting the right people together. This is the kind of stuff I'm brainstorming these days, excited as I am by real-time events and by figuring out what I need to drive those data dogies.
In this presentation I talk about three different apps that are part of our Fast Data portfolio. All of them are real customers using this technology in production as part of Full 360's managed service offerings. We designed and built these apps ourselves. All of them run securely and reliably in AWS VPCs, and customers love them. They are exemplary of the multi-tier DW framework we call elasticBI. Why Panigale? Because this is all about making very fast decisions in real time. A second late is too late.
This is an old story, but a particularly interesting one.
In 1995, I was working for a paper products company in Atlanta. I can't even recall the project's details, except that most of the company's employees ran PCs without TCP/IP stacks and all of their networking was done through Citrix. Still, the project was going slowly.
I got a chance to meet with the president, whose family was involved in the business, and his top financial analyst, who looked like Lucy Liu and was deadly with a spreadsheet. It was clear that these two were the brains of the operation. We were going over boring bills of lading, literal shipping documents, when suddenly the president got an insight. He picked up one piece of paper as we sat in the conference room and called a warehouse manager.
"Ralph, this is Sam at HQ. You know our customer X?"
"Yes it's our biggest customer"
"When did you last send them a shipment?"
"And how much did you send them?"
"1500 pounds. We send them twice a week, Tuesday & Thursday"
The president went on to discover that the warehouse manager used a flatbed stake truck with a capacity of ten pallets, and that 1,500 pounds of product took up three. But the thing that caught his eye in the paperwork was something he already knew: it was customary to discount the freight cost when selling to your biggest buyers. This was a discount given at the discretion of the 85 warehouse managers nationwide. The freight cost was calculated per truck trip and tied to the cost of gasoline. Why make two trips with half a truckload and give the discount twice, if you could make one trip and give the discount once?
He had the financial analyst crunch the numbers and figure out how much the company could save by making fewer trips and/or not discounting freight. It was massive. I pulled the historical shipping records from the database. Within two hours we figured out how to save the company up to 1.2 million dollars per year.
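The arithmetic behind that insight is simple enough to sketch in code. Here's a minimal Python model of the consolidation savings; the per-trip cost, freight charge, and discount rate are illustrative assumptions, not the company's actual figures.

```python
# A minimal model of the freight-consolidation math. All figures are
# illustrative assumptions, not the company's actual numbers.

TRIP_COST = 400.0        # assumed operating cost per truck trip (fuel-indexed)
FREIGHT_CHARGE = 500.0   # assumed freight charged to the customer per trip
DISCOUNT = 0.10          # assumed discretionary big-customer freight discount

def weekly_cost(trips: int) -> float:
    """Weekly cost of serving one customer lane: operating cost per trip,
    plus the discount dollars given away on each trip's freight."""
    operating = TRIP_COST * trips
    discount_given = FREIGHT_CHARGE * DISCOUNT * trips
    return operating + discount_given

two_half_loads = weekly_cost(2)   # Tuesday and Thursday, three pallets each
one_full_load = weekly_cost(1)    # one consolidated trip, same total product
savings = (two_half_loads - one_full_load) * 52
print(f"annual savings on this one lane: ${savings:,.0f}")   # $23,400
```

Multiply a number like that across 85 warehouses and their biggest customers, and a seven-figure total stops being surprising.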
The president asked me how long it would take to build a system that would use the spreadsheet formula against the shipping records at each warehouse. Two months, I said. At the time, however, none of the warehouse managers had networked computers.
Lessons learned. 1. You need an executive with keen insight into how the business actually operates. This president knew how to read a shipping document, and he understood how warehouse managers work: why they do things the way they do, how they perceive orders from HQ, and what their financial incentives are.
2. You need to be able to make accurate models of the actual costs involved and prove them out before you build any systems. You can't just build a system that captures 'everything' and expect that it's going to tell you something valuable.
3. You need a business culture where you can push down responsibility for costs and revenues to the people who actually do the work, and show them, in dollar terms, how a change in behavior affects the bottom line. If your system only has a few end users at HQ, so what? There's a difference between knowing the right answer and doing something about it. Knowing only took us two hours.
4. You have to have computing infrastructure in place at a low enough cost that the rare cost-saving insights are worth implementing a system for in the first place.
In the case of that company, number four killed the whole deal. If you counted up the warehouse managers, the cost of upgrading and networking each warehouse, the time to build the system, and the licensing cost per new user, the total would have completely offset the cost savings. The company ended up issuing a memo and a policy change, like an order from God, instead of building a system that let the warehouse managers see their efficiency rewarded.
These days enterprise software takes a back seat to the cloud in terms of total cost of ownership. But as in 1995, most companies are not forward enough in their thinking to rise to the leading edge of technology. Even so, technology is only one part of the equation in improving the business.
BigDoor decided to move away from its custom ETL solution and onto a more mature data warehousing platform. Malek said he chose Full 360 and Vertica because the solution was both advanced and easy to implement. "There couldn't have been a better fit for what we needed," he said. "Vertica had a reputation for being flat-out fast; Full 360 made the information-gathering, licensing and installation process simple. It was perfect."
I've been going through my entire library this weekend and finding many treasures; among them, the old Tereplex white paper. It contains the classic OLAP definition: the five strengths of Essbase over relational databases. Now is a good time to refresh and reconsider it in light of new scalability, pricing, and infrastructure. I think, based on these five, that OLAP stands up well. I tend to wonder whether the market understands what it can do, given the vast array of products out there.
Online Analytic Processing (OLAP)
Because OLAP technology provides user and data scalability, performance, read/write capabilities and calculation functionality, it meets all the requirements of a data mart. Two other options, personal productivity tools and data query and reporting tools, cannot provide the same level of support. Personal productivity tools such as spreadsheets and statistical packages reside on individual PCs, and therefore support only small amounts of data for a single user. Data query and reporting tools are SQL-driven, and frequently used for list-oriented, basic drill-down analysis and report generation. These tools do not offer the predictable performance or robust calculations of OLAP. The OLAP technology option supports collaboration throughout the business management cycle of reporting, analysis, what-if modeling and planning.
Most important in OLAP technology are its sophisticated analytic capabilities, including:
Aggregations, which simply add numbers based upon levels defined by the application. For example, the application may call for adding up sales by week, month, quarter and year.
Matrix calculations, which are similar to calculations executed within a standard spreadsheet. For example, variances and ratios are matrix calculations.
Cross-dimensional calculations, which are similar to the calculations executed when spreadsheets are linked and formulas combine cells from different sheets. A percent product share calculation is a good example of this, as it requires the summation of a total and the calculation of percentage contribution to total sales of a given product.
Procedural calculations, in which specific calculation rules are defined and executed in a specific order. For example, allocating advertising expense as a percent of revenue contribution per product is a procedural calculation, requiring procedural logic to properly model and execute sophisticated business rules that accurately reflect the business.
OLAP-aware calculations, which provide the analytical intelligence necessary for multi-dimensional analysis, such as the understanding of hierarchy relationships within dimensions. These calculations include time intelligence and financial intelligence. For example, an OLAP-aware calculation would calculate inventory balances in which Q1 ending inventory is understood not to be the sum of January, February and March inventories.
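The inventory example is worth making concrete. Below is a minimal Python sketch, with made-up balances, of the difference between naive aggregation and a "time balance last" rule, which is how an OLAP-aware engine such as Essbase treats stock measures like inventory.

```python
# 'Time balance last' vs. naive aggregation for a stock measure.
# Balances are made up for illustration.
monthly_ending_inventory = {"Jan": 120, "Feb": 95, "Mar": 110}

naive_q1 = sum(monthly_ending_inventory.values())   # 325 -- wrong for inventory
tb_last_q1 = monthly_ending_inventory["Mar"]        # 110 -- true Q1 ending balance

# A flow measure like sales *should* roll up by addition.
monthly_sales = {"Jan": 40, "Feb": 55, "Mar": 60}
q1_sales = sum(monthly_sales.values())              # 155 -- correct
```

The point of the OLAP-aware engine is that it knows which rule applies to which measure, so nobody has to encode the distinction query by query.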
OLAP technology may be either relational or multidimensional in nature. Relational OLAP technologies, while suitable for large, detail-level sets of data, have inherent weaknesses in a decision-support environment. Response time for decision-support queries in a relational framework can vary from minutes to hours. Calculations are limited to aggregations and simple matrix processing. Changes to metadata structures, for example the organization of sales territories, usually require manual administrator intervention and re-creation of all summary tables. Typically, these relational solutions are read-only due to security and performance concerns, and therefore cannot support forward-looking modeling, planning or forecasting applications.
In addition, resolving simple OLAP queries, such as: “Show me the top ten and bottom ten products based on sales growth by region, and show the sales of each as a percentage of the total for its brand,” can require hundreds of SQL statements and huge amounts of system resources. For these reasons, many sites that initially deploy these technologies to support ad hoc reporting and analysis are forced to disable access and limit the number of concurrent queries.
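To see why, it helps to write the question out as steps. With a modern dataframe library it takes only a few chained operations; in the relational OLAP tools of that era, each step became its own round of generated SQL and temporary tables. A pandas sketch, where the table, column names and figures are all hypothetical:

```python
import pandas as pd

# Tiny hypothetical fact table; names and numbers are illustrative.
facts = pd.DataFrame({
    "region":  ["East", "East", "East", "West", "West", "West"],
    "brand":   ["A", "A", "B", "A", "B", "B"],
    "product": ["A1", "A2", "B1", "A1", "B1", "B2"],
    "sales":        [100, 80, 60, 90, 70, 50],
    "sales_growth": [0.12, -0.03, 0.08, 0.20, -0.10, 0.05],
})
N = 10  # "top ten and bottom ten" from the query quoted above

# Step 1: each product's sales as a percentage of its brand total.
facts["pct_of_brand"] = (
    100 * facts["sales"] / facts.groupby("brand")["sales"].transform("sum")
)

# Step 2: keep the top-N and bottom-N products by growth within each region.
# (On this toy table every row qualifies; on a real table this filters.)
ranks = facts.groupby("region")["sales_growth"].rank(method="first")
counts = facts.groupby("region")["sales_growth"].transform("count")
answer = facts[(ranks <= N) | (ranks > counts - N)]
print(answer)
```

Each of those steps, translated into the generated SQL of a mid-90s ROLAP tool, multiplied into the hundreds of statements the white paper describes.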
For analytic and decision-support applications, implementation and maintenance are often more cumbersome in a relational environment. There are very few tools to define, build or manage relational schemas, forcing developers and consultants to manually design and continually optimize databases, leading to long implementation times. Furthermore, a large IT support staff is required to implement, maintain and update the environment, increasing the overall cost and limiting the IT organization's capacity to address other strategic information systems projects. Yet another concern is security, as a Relational Database Management System (RDBMS) provides table/column security only and cannot easily control access to individual facts in a star schema. The result is that it is often difficult or impossible to provide robust user data access security in an analytic relational database other than at the report level.
Multidimensional technology is free from the limitations that relational databases face in decision-support environments, as multidimensional OLAP delivers sub-second response times while supporting hundreds or even thousands of concurrent users. In addition, it supports the full range of calculations, from aggregations to procedural calculations. Companies using Hyperion Essbase are able to rapidly deploy data marts and adapt to changing business environments. Since Hyperion Essbase is a server-centric technology, companies can share information readily and securely, with protection down to the most granular levels. Multiple users can update the database and see the impact of those updates, which is essential in planning and forecasting applications.
A couple of notes about these claims.
The OLAP-aware query is the most substantial and time-saving aspect of working with Essbase. It is just as significant now as it ever was, if not more so. While I've seen very few applications with full requirements for historically contextual slowly changing dimensions (most people restate), keeping metadata-aware queries stable as the dimensions change is almost always a requirement. Dimensions change; your queries shouldn't have to.
Security is still key. The ability to lock down to the cell level and determine sections of the database that are read/write vs read-only is a key differentiator.
Aggregations and matrix calculations can be done quite well in relational tech. In columnar tech, cross-dimensional data can be handled as well, although it takes a bit of doing. But Essbase still shines in procedural calculations, and in cross-dimensional and OLAP-aware calculations too.
Whichever way the technology goes, my colleagues and I at Full 360 will offer a broad selection in the best environment. Which brings us to one key paragraph: the one about ROLAP. We've got that handled, and the way we put together our two-tiered database environments (when necessary) has taken all of the pain out of staffing for DW development and maintenance. We've come a long way in the past decade. While much of the theory is still in force, technologies and practices have moved forward.
The situation is as follows. The customer has created a reporting requirement that defies the many-to-many rules of a multidimensional database. They want to call something by a different name depending on a condition in another dimension.
The answer can be implemented in one of four places.
A. In the source system, reclass the transactions by creating a new member, the doppelganger member.
B. In the source interface to Essbase, make a SQL fix that does a union query on the exceptional member and hardcodes the new doppelganger member in the SELECT of the second query.
C. In Essbase itself, write a calc that re-allocates based upon the rule and sends zeros to the original member based on the @ISDESC of the driving dimension.
D. In the report logic.
The further upstream this is done, the less maintenance it will take in Essbase. However, the C method makes it easiest to handle history.
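For flavor, here's what the option B transform looks like when the source interface is scripted rather than pure SQL. This is a hypothetical Python sketch; the member names and the driving condition are made up, and a real feed would do the same thing with a UNION in the extract query.

```python
# Hypothetical sketch of option B: in the feed to Essbase, rows for the
# exceptional member that match the driving condition are reclassed to a
# hardcoded doppelganger member. All names here are made-up examples.

EXCEPTIONAL = "Product_X"            # hypothetical exceptional member
DOPPELGANGER = "Product_X_Retail"    # hypothetical doppelganger member
DRIVING_VALUE = "Retail"             # hypothetical condition in another dimension

def reclass(rows):
    """rows: iterable of dicts like {'product': ..., 'channel': ..., 'amount': ...}"""
    for row in rows:
        if row["product"] == EXCEPTIONAL and row["channel"] == DRIVING_VALUE:
            # The 'second query' of the union: same facts, renamed member.
            yield {**row, "product": DOPPELGANGER}
        else:
            yield row

feed = [
    {"product": "Product_X", "channel": "Retail", "amount": 100.0},
    {"product": "Product_X", "channel": "Wholesale", "amount": 250.0},
]
for row in reclass(feed):
    print(row)
```

Essbase then sees the doppelganger as an ordinary member of the outline. The tradeoff, as noted above, is history: data loaded before the fix stays under the original member unless it is reloaded, which is why option C is gentler on historical periods.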
I just joined a Twitter feed called bigdata. There is discussion afoot about IBM's new initiative to field a 4,000-consultant army in analytics. I think it's going to be a force to be reckoned with. In fact, it's something I've always hoped would happen with Hyperion technology. But aside from that, a commenter named Hugo made the following point.
How many of the businesses who import into America use all these 'smart' systems? Very few. They have good management. Any American business that looks like they may need ERP (Enterprise Resource Planning) software could ask themselves some questions: Have you tried splitting business units into units no bigger than 300 people? Have you allowed those business units to make their own software choices, i.e. a simple industry-specific system priced $100-$150k, instead of $ millions? Have you checked that the CEO is someone who understands the product, and is not a financial person? When the training budget is spent, is it mainly on the 5-10% in management, or are all employees included in some kind of training, including problem solving and idea generating? When these questions are answered, you probably won't need to waste money on expensive systems, and can invest the money in R&D, maintenance, training, and also renewable energy systems, so lowering break-even.
I've seen what happens in big businesses that had decentralized systems in the pre-ERP era, and in those that have avoided a centrally planned and disciplined systems approach. They are invariably hamstrung during economic times like these, when the company needs to trim costs across the board.
In other words, it doesn't help at all to have highly capable execs at the top of the company if they do not exert fiscal control down to the departments, and that is impossible without centralized, coordinated financial planning and reporting.
I'll give you a concrete example.
One of the largest RFPs I've ever seen in this business called for the replacement of a system to manage 17,000 independent project budgets. Imagine a company so large that it ran that many independently funded projects every year. The opportunities for economies of scale may not exist at all on the systems delivery side. As far as license agreements are concerned, vendors like IBM are going to be interested in premium pricing for their biggest offerings. Clearly there's a lot of consulting to pay for as well. But it has always been the advantages of economies of scale on the business side that have made these matters attractive to the buyers. Only a company can know how much savings it can realize through centralized planning and control. In the case of this RFP, we identified the potential for quickly making connections through that population of financial managers. Why? Because the corporation could always deploy an analytical group that could find opportunities throughout the system. In this case, buyers of titanium in 400 decentralized projects would never think to make a single coordinated buy, nor be enabled to, nor do something as simple as multi-supplier comparison shopping, with stovepiped specialty systems.
Such efficiencies became immediately obvious as we put together the design for the system; it was a home run built into the prospect of a single-vendor, enterprise-class solution. But such things are not always so obvious.
Here's a second example, one that I've told many times. Back in '95, I worked for a manufacturer that had a nationwide product and its own distribution network. I worked very closely with the CFO and his top analyst. He grew up in the business; she was an ice-sharp Ivy League MBA. In an afternoon they started to look at the books and try to squeeze costs out of the system. The analyst found a contradiction in the way discounts were booked in the ERP system. I confirmed it by looking at transaction types that were my feed to the dashboard. The CFO immediately got on the phone to several warehouse managers.
It turned out that there was a discount business rule tied to a customer attribute. If you were a good customer, good being defined by total tons shipped per year, you got an automatic 10% discount on freight. But we discovered by looking at the aggregate numbers that something was amiss. In one afternoon of phone calls by the CFO, it was discovered that warehouse managers marked their discounts in different ways. The most common was to take 10% off the fuel cost per delivery.
So here's where a smart manager who knows the business makes the right call. He named a customer and asked the warehouse manager if a shipment was going out that day to one of the big customers. Yes. How many pallets on that truck? One and a half, was the answer. The truck could handle ten pallets, but was taking these small shipments on demand and discounting the gas for every trip.
The CFO turned to me and said, I want you to build a system that shows the difference in our profitability when we send half-empty trucks out to customers and change the incentive to warehouse managers so they can see it come out of their bonus. Because today they all think that if they send a truck out every day or every other day it improves customer satisfaction. Yes it does, but it's killing us on fuel costs.
I had to shrug, because it was 1995 and nobody put EIS-capable PCs in the hands of warehouse managers. Warehouses were not networked, and networking them would have required Citrix, which was very expensive in those days.
These days it is not very often that you find the lethal combination of line managers who grew up in the business working in coordination with hotshot analysts and soup-to-nuts developers who can deploy systems across nationwide networks. I think those times are going to come back when cloud-resident systems become a reality. At the end of this evolution, companies like IBM who field large consulting forces are going to be at a distinct disadvantage. In the short term, they will have the advantage.
In putting together an Essbase management book, I'm digging deep into my memory for engagements I had with customers and prospects, recalling the specific issues to which we applied our know-how and technology.
There's a big national insurance company that has some large earthquake-proof buildings in Los Angeles. Let's call them City State. I recall a trip I made out to them in my buddy's new Nissan Maxima. We engaged the customer in a use case design session, reviewing their entire FOCUS-based repository. From all of their source systems, they had 36 dimensions, with many more set for aggregation in FOCUS. They called us to help them understand how Essbase could be employed. The managers at City State were tasked with serving a large number of customers from different areas across the business. Data from all lines of business were represented in the current FOCUS database, and more were scheduled to be added. FOCUS could no longer handle ad hoc queries. They wanted to be able to query anything at any time using the full set of dimensions, which included a great deal of demographic data. In short, they had a data mining problem but did not wish to invest in data mining technology.
My use case analysis revealed that there were different clusters of interest around the data, which could be identified by analyst group, and it began a process that could be tied to incentives for the analyst groups. My solution considered both data supply and data demand.
On the demand side, the query process was undisciplined. Any user from any group could build ad-hoc queries against the whole of the FOCUS database. Nobody knew exactly what queries were being run or for what purposes. The users had become accustomed to asking for everything. They had no organized way of qualifying their requirements from IT and were not satisfied by the results.
So I held a workshop, and through that process we discovered that the Fire LOB analysts knew from their own professional experience that demographic information was not pertinent to the number and severity of claims related to fire. Consequently, the value of demographic dimensional data for that subset of insurance vehicles was de-emphasized, and other dimensional attributes, like roofing type and proximity to hydrants, were emphasized. We were able to establish a framework through which City State could confidently gather all of the data attributes on all of their lines of business and provide strict guidelines for marting that data to the various user groups.
Once this process of generating requirements was demonstrated, the path to solving supply and demand issues for City State was made clear. This made blocking-and-tackling projects a great deal more responsive to the needs of the analyst communities, and gave IT a way to make sense of the backlog and frustrations. Once analysts 'got it' with regard to looking at a set of measures and then associating the key dimensions with them, both sides could see what work was required to get them through development smartly.
You can follow this kind of guide for your own purposes. What are the problem areas that analysts recognize? You can ask this realizing that nobody may have been assigned to solve the problem, that management is busy elsewhere, but the folks in the cubes know the problem is real. It might start with a very simple metric. How long are people on hold for customer service? Or how about the number of exempt employees with Masters or Doctorate degrees, expressed as a percentage of the total number of exempt employees? It doesn't take too much creativity to realize that any number of business issues can be quantified in this manner.
The next step, obviously, is to consider which systems, within the vast resources of your company's computer-generated data, might have information that could credibly generate a dataset worthy of analysis. Certainly any number of operational systems might contain that information. And it is easy to find out if analysts are already struggling to get reports out of the current systems that might provide some insight. HR analysts might know that the total number of exempt employees is in one system, but that the number of Masters or Doctorate degrees by employee is in another. The merging of these two would be the job of the new project manager, who requires the assistance of the current data stewards. The analysts can contribute by drawing a reasonable box around the universe of data to be explored, and senior managers can consider which priorities to emphasize based upon expectations of what will be found in the data. This ensures that the area of inquiry is credible and actionable.
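To make the HR example concrete: once the two systems' extracts are merged, the metric itself is a trivial join-and-divide. A minimal Python sketch, where the record layouts and data are entirely hypothetical:

```python
# Hypothetical sketch: merging extracts from two systems to compute
# "exempt employees with a Masters or Doctorate, as a percentage of all
# exempt employees". Field names and data are made up.

hr_system = [  # system of record for employment status
    {"emp_id": 1, "exempt": True},
    {"emp_id": 2, "exempt": True},
    {"emp_id": 3, "exempt": False},
]
education_system = [  # separate system holding degree records
    {"emp_id": 1, "degree": "Masters"},
    {"emp_id": 3, "degree": "Doctorate"},
]

advanced = {r["emp_id"] for r in education_system
            if r["degree"] in ("Masters", "Doctorate")}
exempt_ids = [r["emp_id"] for r in hr_system if r["exempt"]]

pct = 100 * sum(e in advanced for e in exempt_ids) / len(exempt_ids)
print(f"{pct:.1f}% of exempt employees hold an advanced degree")  # 50.0%
```

The hard part, as the paragraph above says, is not the arithmetic; it's the stewardship of getting the two systems' extracts merged on a reliable employee key.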
I pretty much decided this morning that the Cubegeek blog will be the working version for the book that I'm going to write. I've been thinking about it for half a dozen years and now I'm just going to do it. Since I've been blogging for at least 3 years, I'm very comfortable with the form and I know I'll aggregate plenty of material.
One of the idiosyncratic points of my book will be that it's meant to be something of a guide for apprentices and masters alike, but angled towards practitioners, not theorists. So when I start talking about 'Case Studies' I'm going to use narrative, not academic jargon. I'll call them 'Case Stories'.
The difference between a Case Story and a Case Study is that the story is portable enough to become something of an urban legend. In computing this is a good thing because we are about building knowledge in the field and working with knowledge in the field. Probably the most famous story in our field is the legend about 'beer and diapers'. I'll not repeat it here. The point is not in the details of the story, but in the fact that it communicates well the concept of correlations in retailing.
What is important to know is that these kinds of conceptual breakthroughs are possible, that people in different industries are trying to use computers to help them solve problems that are far from obvious. It is not important that one understands correlation algorithms. I'm trying to teach things that cannot be googled, you see. You can find out on your own time the extent to which the originating beer and diapers story is or is not true or applicable, but you cannot deny the force and impact of the story. This is the difference between a study and a story.
As you build applications in the field, you will be forced to study. You will find details and results that will only make sense in the context of your study, and you will write code to manage and nail down that problem. But above all that work will be a tale, a story, that builds your repertoire of understanding. This is what helps you associate the particulars of your geekdom with the practical forces of the market and the industry. That's where we're going.
Every solution has to be sold. People will live with problems their entire lives until some charismatic character convinces them to change their life. This is certainly true of computer systems. You can be that charismatic character if you can pull a narrative out of your bag of tricks. Case Stories are a means to that end.