Q&A: Deficiencies in 3NF that DV addresses?

hi folks, i’m back – and i’ve just received an email from one of my friends who wants to know the answers to a number of frequently asked questions. i will blog the questions and the answers here in this posting. in this post we cover some of the deficiencies of 3rd normal form when it is used as a data warehouse. if you have comments, thoughts, or other experiences, i encourage you to add your comment to the end of this posting. as you know, i’ve launched one-on-one coaching, and this is the kind of knowledge you get within the walls of one-on-one coaching. but for today – i’m giving you the answers free to show you the kind of value you get when you sign up for my coaching sessions. contact me today: coach@danlinstedt.com for more information.

what are the deficiencies that exist in 3nf that dv will address?

this is answered in my business book; if you haven’t gotten the business book, i encourage you to do so right away. to purchase a copy of the book (download or full-color print edition), click the dv business book link on the side of this blog. you can also watch the video of the business presentation slides on youtube (although there is no audio with the slides). by the way, the audio for the slides is available inside the coaching section of my blog.

here are a few extras to consider when answering this question:

  1. 3nf is relationship bound – meaning when you build a 3nf model as a data warehouse you introduce 1 to many, many to 1, and 1 to 1 relationships in the model. this ties the model to today’s perception of how the business runs. in the long run, it makes the model inflexible for change, and causes all kinds of “downstream” impacts when a change does come in.
  2. 3nf is intended to meet oltp needs. it is not, and never was, intended to be a data warehouse!!! when you introduce “time-variance” to the primary key structures, you introduce cascading child impacts across the data model when changes come in. you can also see my blog entries about this topic on: http://empoweredholdings.com – this is my business blog for edw, data vault, and a variety of other business related topics.
  3. 3nf (when it has parent-child dependencies) can’t process real-time inflow streams of data!! there’s nothing worse than trying to “process the parent, then child 1, then child 2, then child 3, etc…” especially when you have 10,000 or more transactions per second pouring in to your data warehousing queue. so, you want real-time? then forget 3nf as an edw!!!
  4. 3nf, when it has “time-series” introduced to its model along with parent-child complexities, won’t scale properly. oh, the model has no problem scaling, the data set has no problem scaling – it’s the performance of the etl that drops like a rock in water. it’s the queries that fail when too much data is embedded. query performance suffers even more when the rows of a table get too wide for efficient query parallelism. it’s these kinds of things that kill the data warehouse down the road, and force the business to “start fresh” with complete re-designs. you don’t want that kind of legacy, do you? if not, then use the data vault and do it right the first time!
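
to make point 2 a little more concrete, here is a minimal python sketch using the standard sqlite3 module. it is not taken from the book: the table names, the columns, and the `change_customer_3nf` / `change_customer_dv` helpers are all hypothetical and exist only for this illustration. it assumes a 3nf-style warehouse where `load_date` is part of each primary key, next to a very simplified data vault layout (hub, link, satellite), and it counts how many rows have to be written when one customer attribute changes.

```python
import sqlite3

SCHEMA = """
-- 3nf-style: load_date is part of the parent key, so children carry it too.
CREATE TABLE customer_3nf (
    customer_id INTEGER, load_date TEXT, name TEXT,
    PRIMARY KEY (customer_id, load_date));
CREATE TABLE order_3nf (
    order_id INTEGER, load_date TEXT,
    customer_id INTEGER, customer_load_date TEXT,
    PRIMARY KEY (order_id, load_date));

-- simplified data vault: hubs hold business keys, the satellite holds history.
CREATE TABLE hub_customer (customer_id INTEGER PRIMARY KEY);
CREATE TABLE sat_customer (
    customer_id INTEGER, load_date TEXT, name TEXT,
    PRIMARY KEY (customer_id, load_date));
CREATE TABLE hub_order (order_id INTEGER PRIMARY KEY);
CREATE TABLE link_customer_order (
    customer_id INTEGER, order_id INTEGER,
    PRIMARY KEY (customer_id, order_id));
"""


def change_customer_3nf(con, customer_id, name, load_date):
    """Add a customer version, then re-version every order that pointed
    at the previous version. Returns the number of rows written."""
    (old_date,) = con.execute(
        "SELECT MAX(load_date) FROM customer_3nf WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    con.execute("INSERT INTO customer_3nf VALUES (?, ?, ?)",
                (customer_id, load_date, name))
    orders = con.execute(
        "SELECT order_id FROM order_3nf "
        "WHERE customer_id = ? AND customer_load_date = ?",
        (customer_id, old_date),
    ).fetchall()
    for (order_id,) in orders:
        # the child key changes only because the parent key changed
        con.execute("INSERT INTO order_3nf VALUES (?, ?, ?, ?)",
                    (order_id, load_date, customer_id, load_date))
    return 1 + len(orders)


def change_customer_dv(con, customer_id, name, load_date):
    """Add one satellite row; hubs and links are untouched."""
    con.execute("INSERT INTO sat_customer VALUES (?, ?, ?)",
                (customer_id, load_date, name))
    return 1


def demo(order_count=3):
    """Load one customer with a few orders into both layouts, then apply
    the same name change to each and report the rows written."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    first_load = "2024-01-01"
    con.execute("INSERT INTO customer_3nf VALUES (1, ?, 'acme')", (first_load,))
    con.execute("INSERT INTO hub_customer VALUES (1)")
    con.execute("INSERT INTO sat_customer VALUES (1, ?, 'acme')", (first_load,))
    for order_id in range(1, order_count + 1):
        con.execute("INSERT INTO order_3nf VALUES (?, ?, 1, ?)",
                    (order_id, first_load, first_load))
        con.execute("INSERT INTO hub_order VALUES (?)", (order_id,))
        con.execute("INSERT INTO link_customer_order VALUES (1, ?)", (order_id,))
    rows_3nf = change_customer_3nf(con, 1, "acme corp", "2024-02-01")
    rows_dv = change_customer_dv(con, 1, "acme corp", "2024-02-01")
    con.close()
    return rows_3nf, rows_dv


if __name__ == "__main__":
    print(demo())
```

in this toy setup the 3nf change writes one row for the customer plus one for every order hanging off it, while the data vault side writes a single satellite row. a real model would have grandchildren (order lines and so on) underneath, and the cascade would keep going down each level.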

there are 100 more items i can list as reasons why 3nf should not be used for data warehousing. if you like these answers and want more, then sign up for my one-on-one coaching session! i not only answer these questions with explanations, i show you what the impacts are on cost, time, and investment. you get easy-to-use content delivered in private blogs, videos (not found anywhere else on the web), articles, documentation, and audio.

compare reporting queries?

what do you mean, compare reporting queries? this makes absolutely no sense at all. how do you compare queries against two different models, let alone two different sets of data?! the data in the data vault is raw – the data in traditional data marts is massaged. you can’t compare reporting queries. this is an impossible question to answer. even if i had a 3nf model side-by-side with a dv model and i put raw data in both, how do you compare the queries?
