Kimball or Inmon in an enterprise environment
Data warehousing by the book?
In the last few years I have seen a dozen data warehouse projects. More than 70% were built on the (Kimball) Bus architecture with a 2-tier layering, most of the time built “by the book”. In other words, the architecture was not adapted to what the business needs, but applied the way it is commonly done in the field of data warehousing. Although I have been a supporter of the Kimball way, I am currently stepping away from this architectural approach in several circumstances. In practice it turns out to be a rigid architecture. Many of today’s (enterprise) data warehouses built on this approach have great difficulty keeping up with the level of change the business requires. Maintenance costs also increase rapidly over time as the DWH project evolves.
The Bus Architecture: 2-tier model
The main objections I have against such an architecture in larger projects or an enterprise setup are:
1. Some projects that were set up by organizations with little knowledge of data warehousing took the architecture off the shelf. They heard about Kimball, read his book(s) and were convinced to adopt the architecture. That is the wrong way to apply an architecture. Architecture is about bridging the gap between IT and the business. Adopting an architecture without listening to your business/organization sets the project up to fail from the start, because there is no guarantee that the chosen architecture will fit the needs.
2. In the bus architecture a 2-tier model is applied most of the time. It is the well-known ETL approach. However, a 2-tier model needs mappings that Transform and Load in one step, so (very) complex and non-standard mappings arise. The main side effect I have seen is that an unplanned 3-tier model emerges, with a new tier in between the stage and DWH layers. This new layer has no fixed model but consists of preparatory or temporary tables that bridge the gap between the stage and DWH models. It has no standards or guidelines at all, because it exists only for technical reasons. Spaghetti code can arise, which leads to increasing maintenance costs.
3. With the bus architecture there is no clear distinction between data (from the source) and information (“enriched” data with, among other things, business rules applied). Because of this, data marts arise that contain “enriched” data that is completely historized. This makes the architecture rigid when (large) changes are required (for example source integration or lowering the granularity). In other words, a grey area comes into existence. This can be avoided by designing a clear 3-tier model (based, for example, on the Corporate Information Factory or a hub-and-spoke architecture) with the data warehouse (without transformed data) in the middle. The top layer then contains the data marts, which can always be rebuilt from the middle layer when (large) changes are required (think, for example, of lowering the granularity).
The Corporate Information Factory overview: 3-tier model
4. As a result of point 3, it is clearer where to place business rules in a 3-tier model. The more generic the business rule, the closer it can be placed to the DWH. Business rules are, however, always placed after the DWH in the data flow. The DWH itself always consists of (raw) data that was extracted from the source. In other words, the more specific a business rule is, the closer it can be applied to the one who asked for it.
5. A data mart in a 3-tier model does not have to be a set of tables. It can be a cube, a (set of) view(s) or a BI meta layer (universe) as well. The only restriction is that the end user can never query the DWH tables directly, nor through meta layers or views.
6. As a result of point 5, a 3-tier model can actually support Operational BI as well, because the load up to the DM consists of standard development without any complexity. Performance (leaving aside hardware etc.) should therefore be fast enough to support near-real-time reporting.
7. Also worth mentioning separately: mappings become more standardized between source and stage and between stage and DWH. Even the selection out of the DWH for star-schema-based data marts can be standardized. That way traceability can also be taken care of in a better way.
8. As a result of point 7, I can say that when there are no political or technical issues, the staging area can even be skipped. Loading a predefined, modelled data warehouse will result in standard, non-complex data integration code/mappings.
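To make points 3 and 4 a bit more tangible, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the table, the column names and the "exclude returns" business rule are all hypothetical, and a real DWH would of course live in a database rather than in a list. The idea it tries to show is that the middle tier keeps raw data only, the business rule is applied after the DWH, and a data mart can be rebuilt at a different granularity without touching the DWH.

```python
from collections import defaultdict
from datetime import date

# Middle tier (DWH): raw rows as extracted from the source, no business
# rules applied. Table and column names are hypothetical.
DWH_SALES = [
    {"sale_date": date(2024, 1, 5), "product": "A", "amount": 100.0},
    {"sale_date": date(2024, 1, 20), "product": "A", "amount": 50.0},
    {"sale_date": date(2024, 2, 3), "product": "B", "amount": 200.0},
    {"sale_date": date(2024, 2, 3), "product": "A", "amount": -30.0},  # a return
]


def exclude_returns(row):
    """Hypothetical business rule: only positive amounts count as revenue."""
    return row["amount"] > 0


def grain_month(row):
    """Coarse granularity: (year, month)."""
    return (row["sale_date"].year, row["sale_date"].month)


def grain_day(row):
    """Fine granularity: the calendar day."""
    return row["sale_date"]


def build_mart(dwh_rows, rule, grain):
    """Rebuild a data mart from scratch: apply the rule, aggregate to the grain."""
    mart = defaultdict(float)
    for row in dwh_rows:
        if rule(row):
            mart[(grain(row), row["product"])] += row["amount"]
    return dict(mart)


# Top tier: the same raw DWH feeds two marts with different granularity.
monthly_mart = build_mart(DWH_SALES, exclude_returns, grain_month)
daily_mart = build_mart(DWH_SALES, exclude_returns, grain_day)
```

Lowering the granularity from month to day is just another call to `build_mart` here. In a 2-tier setup where the rule has already been baked into historized mart data, that same change would mean reworking the load itself.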
Conclusion: To repeat, I’m absolutely not against the Kimball approach. An architecture has to be set up against the business needs and goals valid for the organisation involved. But for corporate-wide and/or larger projects I am convinced that the CIF way (including data vault) is the better choice.
That’s my 2 cents so far.
Please be aware that I’ve set this up from a technical point of view. In the end, a good architectural approach starts with the business. Its requirements and boundaries must lead to a good architecture that bridges the gap to IT.
I am convinced that there are more points to be mentioned or argued.

