The Great Data Warehousing Debate


 

Pioneers Bill Inmon, known as the ‘father of data warehousing’, and Ralph Kimball, a thought leader in dimensional data warehousing, have an ongoing debate. According to Kimball, “The data warehouse is nothing more than the union of all data marts”, to which Inmon responds, “You can catch all the minnows in the ocean and stack them together – they still do not make a whale”.

Here’s what they’re arguing about. In a typical data warehouse, we begin with a set of OLTP data sources. These could be Excel spreadsheets, ERP systems, flat files or basically any other source of data. After the data arrives in the staging environment, ETL (extract, transform, load) tools process and transform it and then feed it into the data warehouse. According to Inmon, data should be fed directly into the data warehouse straight after the ETL process. Kimball, however, maintains that after the ETL process, data should be loaded into data marts, with the union of all these data marts creating a conceptual (not actual) data warehouse.
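To make the flow concrete, here is a minimal Python sketch of the staging-to-load steps described above. Everything in it is an assumption made for illustration: the sample CSV data, the column names, and the `extract`, `transform` and `load` helpers are hypothetical, and plain lists stand in for database tables. It is not how any particular ETL tool works.

```python
import csv
import io

# Hypothetical staged data, e.g. a spreadsheet exported to CSV.
STAGED_CSV = """order_id,department,amount
1, Finance ,100.50
2,hr,20
3,FINANCE,not_a_number
"""

def extract(raw_text):
    """Read staged CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def transform(rows):
    """Normalize department names and parse amounts; skip rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would log or quarantine the bad row
        clean.append({
            "order_id": int(row["order_id"]),
            "department": row["department"].strip().lower(),
            "amount": amount,
        })
    return clean

def load(rows, target):
    """Append transformed rows to a target (a list standing in for a table)."""
    target.extend(rows)
    return target

transformed = transform(extract(STAGED_CSV))

# Inmon: transformed data goes straight into the central warehouse.
warehouse = load(transformed, [])

# Kimball: transformed data goes into subject-specific data marts instead.
marts = {}
for row in transformed:
    load([row], marts.setdefault(row["department"], []))
```

The extract and transform steps are identical in both cases; the two camps differ only in where the load step sends the result.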

 

 

Inmon and Kimball approaches to data warehousing

The Inmon approach is referred to as the top-down or data-driven approach: we start from the data warehouse and break it down into data marts, each specialized to meet the needs of a different department within the organization, such as finance, accounting, HR, etc.

The Kimball approach is referred to as bottom-up or user-driven, because we start from the user-specific data marts, and these form the very building blocks of our conceptual data warehouse. It’s important to know from the outset which model best suits your needs, so that it can be built into the data warehouse schema.
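The difference in direction can be sketched with Python's built-in `sqlite3` module. This is a simplified illustration under stated assumptions, not a reference design: the table names, the three sample rows and the helper functions are hypothetical, and real Kimball data marts are dimensional models rather than the flat tables used here.

```python
import sqlite3

# Hypothetical sample rows: (department, item, amount).
ROWS = [
    ("finance", "invoice", 1200.0),
    ("finance", "refund", -200.0),
    ("hr", "payroll", 5000.0),
]

def build_top_down(rows):
    """Inmon-style: load one central table first, then derive the marts from it."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE warehouse (department TEXT, item TEXT, amount REAL)")
    con.executemany("INSERT INTO warehouse VALUES (?, ?, ?)", rows)
    # Each data mart is a department-specific view over the warehouse.
    con.execute(
        "CREATE VIEW mart_finance AS "
        "SELECT * FROM warehouse WHERE department = 'finance'"
    )
    con.execute(
        "CREATE VIEW mart_hr AS "
        "SELECT * FROM warehouse WHERE department = 'hr'"
    )
    return con

def build_bottom_up(rows):
    """Kimball-style: load the marts first; the warehouse is just their union."""
    con = sqlite3.connect(":memory:")
    for dept in ("finance", "hr"):
        con.execute(
            f"CREATE TABLE mart_{dept} (department TEXT, item TEXT, amount REAL)"
        )
        con.executemany(
            f"INSERT INTO mart_{dept} VALUES (?, ?, ?)",
            [r for r in rows if r[0] == dept],
        )
    # The conceptual warehouse exists only as a view over the marts.
    con.execute(
        "CREATE VIEW warehouse AS "
        "SELECT * FROM mart_finance UNION ALL SELECT * FROM mart_hr"
    )
    return con

def total(con, relation):
    """Sum the amount column of a table or view."""
    return con.execute(f"SELECT SUM(amount) FROM {relation}").fetchone()[0]

top_down = build_top_down(ROWS)
bottom_up = build_bottom_up(ROWS)
```

Both databases answer the same queries against `warehouse` and the two marts. What differs is which objects physically hold the data and which are derived from them.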

 

The Inmon approach (diagram by the author)
The Kimball approach (diagram by the author)

To illustrate, we can consider a data warehouse to be like a filing cabinet, and the data marts its drawers. For Inmon, we transfer all the data into our filing cabinet (aka data warehouse) and we then decide which subject-specific drawer of the cabinet (aka data mart) to put the different files into. Conversely, for Kimball, we begin with a number of subject-specific drawers (data marts) that reflect who needs to use what data, and we can stack them into a cabinet formation (the data warehouse) if we want to, but at the end of the day, they’re just a load of drawers, whether we bring them together into a cabinet or not.

 

Which approach is correct for a given organization depends on its business needs.

Here are some examples to illustrate:

  • Insurance: In order to manage risk based on future predictions, we need to form a broad picture across all policyholders, made up of a range of data such as profitability, history, demography, etc. All these aspects are interrelated, so the Inmon approach of starting with all the data in the warehouse and filtering it according to need is the most suitable of the two.
  • Manufacturing: In the manufacturing process, a wide range of interrelated functions need to be taken into account for the smooth running of the business, such as inventory, store capacity, production capacity, man-hours, etc. Again, therefore, the Inmon approach is ideal, making all available information accessible for use as needed.
  • Marketing: This is a specialized field, and we don’t need to look at every aspect of marketing for the purposes of analysis. So, we do not need an enterprise warehouse; a few data marts are enough, which is the Kimball approach.

Conclusion

In 2017, Gartner estimated that 60% of data warehouse implementations would have only limited acceptance or fail entirely. To enhance acceptance and success, it is important to set up your data warehouse correctly for your needs, from the very beginning. The implications of taking the wrong approach are costly and time-consuming. When deciding between the Inmon and Kimball approaches, consider factors such as your budget, data volume, data velocity, data variety and data veracity. Then watch out for pitfalls, such as inappropriate software, poor communication between the business and the team, poor cost estimation, etc.
This is where experience really counts. What you might think is the right approach may not be the one that I think is right, and without a proper understanding of all the implications, mistakes can be made. With the right experience, you can find a cost-effective method to build a time-efficient solution.