Good business and data analysis should be done well in advance of a detailed SOW. A good design can be fully compromised by an inability to deliver good data. We all know the GIGO theory, but everything between garbage and gold is not necessarily good.
My approach to to sourcing multidimenal data is by the Measure by Measure process. This means that I go down to the individual measure and find out which system(s) has the data available that will fit nicely into my multidimensional model. Let's try an example.
At my company, I know that my end-users which are many will have to deal with Wholesale Volume. Right now, in the Vehicle Operations department, I have at least two upstream systems which has WV. Which should we use? In this particular situation, I have recently discovered that while most of the end users get these numbers for a report named Carflow, this is not actually a system. There are at least two systems upstream of the Carflow report. They are NDSN and VPipe. In addition, there is at least one other system that various systems analysts use to create the Carflow report, depending upon which set of numbers you are talking about. These individuals will mix data from those various sources and use their understanding of the business to guide their judgement about which numbers to use. Those processes should be understood if we are to source 'Carflow' from their sources. What will help us is the Measure by Measure approach.
In the Carflow report itself the dimensionality is implicity expressed. It needs to be explicitly expressed as a valid data source. For example, the WV figures are expressed in 12 periods for an Model over the past three Model Years. We need to examine each of those three dimensions for the WV measure. On the same report, Fleet Volume is expressed for the same Model but Model Year is not broken out. This indicates that some aggregation of FV is done elsewhere. Some systems analysis proves this to be true and it was confirmed that the Fleet numbers were generated by another individual who contributes to the report.
Our requirements for the larger application into which 'Carflow' numbers must be input therefore require some of the detail behind the actual Carflow report. These systems are VPipe, NDSN and others. Our requirements are relatively simple but they must be consistent across all measures so that all of them will fit in the same multidimensional model.
Measures: (Wholesale Volume, Fleet Volume)
Models
The Model dimension presents the first challenge. In the Carflow report, each spreadsheet tab represents a Model. 'Sentra' is a model. However in our requirements, we need Model at the 'P4' level. Additionally, we know that P4 code is a function of 'Model Line' and 'Model Year'. This makes things a bit more complicated with regard to explicit expression. For example, the P4 code for XTerra is WD22, but only in Model Years before 2005. In 2005 and beyond, XTerra becomes N50. However in the Carflow report, this distinction is not explicitly expressed. We therefore assume that the analysts who assemble the Carflow report understand this for Wholesale Volume and use a combination of P4 codes appropriate to the Model Year.
Model Year
Model Year is not so problematic vis a vis what is explicitly needed in the targe and what is implicit in the source. There are, in the Carflow report, several lines which indicate 'Current Model Year' which need to be derived from common sense. There are also some lines which have no listing of mdel year.
Month
There are obviously 12 months in a calendar fiscal year, but there may be some measures, like BOM inventories, that need more than 12 months of source data to be properly calculated. There may also be accrual type calculations which are necessary for the creation of a certain number. Or there may be rolling averages involved.
So for each Measure we seek to include in the multidimensional model, we must evaluate the various sources and determine if the dimensions work. So we can ask questions and make requirements like - "I need NA CV by Model, Model Year" as raw data and not get confused by what shows up in reports vs source systems.