Perhaps the topic I’ve witnessed the biggest arguments over is which data modelling approach to take.
I don’t mean the choice between Inmon (3NF) and Kimball (dimensional).
No, this concerns whether to develop the data model in-house, or buy an off-the-shelf data model. It’s the old ‘build vs. buy’ choice.
Throughout the 1990s and into the noughties I’d only ever come across data warehouse systems where the data model was developed in-house.
In-house developed models shared common traits. The table full of customer data was called something useful like ‘customer.’ Ditto addresses in the ‘address’ table. Rinse and repeat.
All of a sudden, or so it felt, we started to see data warehouse platform vendors, and others, pushing the notion that models should be bought and not built.
The sales pitch included benefits such as ‘get started quicker,’ ‘don’t re-invent the wheel,’ ‘flexibility,’ ‘distilled industry insights,’ ‘directly supports your reporting needs.’
Said models often came with price tags of over £250k, required an NDA before you could read the docs, and had to be tailored to your needs.
It always felt to me like the platform vendors were trying to sell a product where it wasn’t needed. Sales staff have got to eat, after all.
One of the most interesting projects I’ve worked on was to act as referee in a data modelling standoff. This was for one of Europe’s best-known brands.
The architects were in the ‘build’ camp whereas the modellers were in the ‘buy’ camp. The latter were winning the argument. The industry standard model had been duly bought and implemented.
After spending several weeks interviewing all concerned I was genuinely shocked at what I found.
Inbound files ended up being mapped to dozens of tables in some cases. The core data model was so complex it was basically off limits to end users. Even if they could navigate the model logically, the impact on the platform from so many joins was catastrophic.
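To make the join problem concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the model from that project: the generic tables (party, party_role, party_name, location, party_location) are hypothetical names I've invented to mimic the abstract style of many bought-in models, and the single row of data is made up. The point is only that the same question needs no joins against a plain ‘customer’ table and four joins against the generic layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- In-house style: one obvious table per business concept.
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
INSERT INTO customer VALUES (1, 'Ada', 'London');

-- Generic style (hypothetical table names): the same facts spread out.
CREATE TABLE party (party_id INTEGER PRIMARY KEY);
CREATE TABLE party_role (party_id INTEGER, role_type TEXT);
CREATE TABLE party_name (party_id INTEGER, name TEXT);
CREATE TABLE location (location_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE party_location (party_id INTEGER, location_id INTEGER);
INSERT INTO party VALUES (1);
INSERT INTO party_role VALUES (1, 'CUSTOMER');
INSERT INTO party_name VALUES (1, 'Ada');
INSERT INTO location VALUES (10, 'London');
INSERT INTO party_location VALUES (1, 10);
""")

# "Which customers live where?" against the simple model.
simple_sql = "SELECT name, city FROM customer"

# The same question against the generic model.
generic_sql = """
SELECT n.name, l.city
FROM party p
JOIN party_role r ON r.party_id = p.party_id AND r.role_type = 'CUSTOMER'
JOIN party_name n ON n.party_id = p.party_id
JOIN party_location pl ON pl.party_id = p.party_id
JOIN location l ON l.location_id = pl.location_id
"""

simple_rows = conn.execute(simple_sql).fetchall()
generic_rows = conn.execute(generic_sql).fetchall()

# Crude complexity measure: count the JOIN keywords in each query.
simple_joins = simple_sql.upper().count("JOIN")
generic_joins = generic_sql.upper().count("JOIN")

print(simple_rows, generic_rows)
print(simple_joins, generic_joins)
```

Both queries return the same answer. Only one of them is something you could reasonably hand to an end user, and that is with just two attributes; a real report pulls in far more.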
Make no mistake, in-house models are usually significantly easier to understand than off-the-shelf models. Remember that point I made in post #1 about simplicity? Well, data model slingers don’t seem to agree.
The use of the complex off-the-shelf model required a full-fat semantic/reporting/BI layer (pick your label) to support end user access to the data. This took hours to build every day, thus delaying end users’ access to the newly updated data warehouse.
I’ve seen this play out several times since. The semantic layer takes database space, time and valuable compute resource to build and all because of the off-the-shelf model complexity.
Don’t be suckered, dear reader. You know your data best. Data modelling isn’t as hard as some like to suggest.
Above all, take your users with you and KEEP IT SIMPLE.