i’ve had the pleasure recently of discussing this topic both publicly and privately with all kinds of people in the data vault community. thank you for all your wonderful insights. i now see a bit of where the confusion lies: what is and is not a data vault? what are the acceptable modifications? can i make modifications and still call it a data vault? how do i use and leverage the dv techniques without “being so rigid”, and yet make them work for my customers?
i will try to address these questions and more in this post. let me try to sum it all up as efficiently as possible:
- the one and only data vault 1.0 modeling standards are published in my book: super charge your data warehouse (available on amazon.com, and on http://datavaultalliance.com as pdf / download content)
- the data vault 2.0 standards are available in my new book: building a scalable data warehouse with data vault 2.0 (available on amazon.com)
- data vault 2.0 includes: architecture, methodology, modeling, and implementation. within this, the goals are: agility, scrum, nosql platform support, big data support, performance enhancements (to the model and loading processes), performance enhancements to querying, business data vaults, virtualization of information marts, and much, much more.
- there is only one authorizing body to get “trained” in standards: data vault 2.0 boot camp and certification is offered by me and two of my authorized partners: doerffler & partner in europe, and analytics8 in australia. i offer the courses in the us, canada, and the rest of the world. the certification is for: certified data vault 2.0 practitioner (cdvp2), which is more well rounded and includes the modeling bits along with all the other components.
core standards that should never be changed, re-defined, or broken (or you and your customer run the risks associated with re-engineering and re-design because of high volume (big data), high velocity (real time), or re-architecture (unstructured or multi-structured data)):
- hubs are always business keys
- links are always time independent relationship tables
- satellites are always descriptive
- never foreign key a hub
- never have more than one parent / one foreign key per satellite
- always use the date/time (down to the millisecond or microsecond) of data arrival as your load date
- never apply “soft” business rules on the way in to the raw data vault
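as a rough illustration of how the standards above might translate into tables, here is a minimal sketch using python's built-in sqlite3 module. it is not taken from the published standards: every table name, column name, and the choice of integer surrogate keys is a hypothetical assumption made for the example. the hubs hold only business keys and carry no foreign keys, the link holds only the relationship, the satellite has exactly one parent, and the load date is the arrival timestamp down to the microsecond.

```python
import sqlite3
from datetime import datetime, timezone

# All table and column names are hypothetical, chosen only for illustration.
DDL = """
-- Hub: one row per business key, and no foreign keys to anything.
CREATE TABLE hub_customer (
    customer_key INTEGER PRIMARY KEY,   -- surrogate key (an assumption)
    customer_bk  TEXT NOT NULL UNIQUE,  -- the business key
    load_date    TEXT NOT NULL
);
CREATE TABLE hub_product (
    product_key INTEGER PRIMARY KEY,
    product_bk  TEXT NOT NULL UNIQUE,
    load_date   TEXT NOT NULL
);
-- Link: the relationship itself, with no begin/end dates on it.
CREATE TABLE link_customer_product (
    link_key     INTEGER PRIMARY KEY,
    customer_key INTEGER NOT NULL REFERENCES hub_customer(customer_key),
    product_key  INTEGER NOT NULL REFERENCES hub_product(product_key),
    load_date    TEXT NOT NULL,
    UNIQUE (customer_key, product_key)
);
-- Satellite: descriptive attributes, exactly one parent foreign key.
CREATE TABLE sat_customer_details (
    customer_key INTEGER NOT NULL REFERENCES hub_customer(customer_key),
    load_date    TEXT NOT NULL,
    name         TEXT,
    city         TEXT,
    PRIMARY KEY (customer_key, load_date)
);
"""


def arrival_timestamp() -> str:
    """Return the current UTC time with microsecond precision."""
    return datetime.now(timezone.utc).isoformat(timespec="microseconds")


def build_vault() -> sqlite3.Connection:
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    return conn


def load_customer(conn: sqlite3.Connection, customer_bk: str,
                  name: str, city: str) -> int:
    """Load one customer record exactly as it arrived (no soft rules)."""
    now = arrival_timestamp()
    # Insert the business key only if the hub has not seen it before.
    conn.execute(
        "INSERT OR IGNORE INTO hub_customer (customer_bk, load_date) "
        "VALUES (?, ?)",
        (customer_bk, now),
    )
    (key,) = conn.execute(
        "SELECT customer_key FROM hub_customer WHERE customer_bk = ?",
        (customer_bk,),
    ).fetchone()
    # Descriptive values are stored untouched: no trimming, no recasing.
    conn.execute(
        "INSERT INTO sat_customer_details "
        "(customer_key, load_date, name, city) VALUES (?, ?, ?, ?)",
        (key, now, name, city),
    )
    conn.commit()
    return key
```

note that the loader deliberately does no cleansing: in this sketch, any standardization of names or cities would happen downstream of the raw data vault.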
principles to live by:
- always try to “integrate” business keys in to single hubs (where they represent the same grain, and same semantic definition).
- never build disparate un-integrated source system data vaults.
- virtualize your information marts through views until performance dictates you must physicalize the tables and replicate the data.
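to make the last principle concrete, here is a small sketch of a virtualized information mart: a plain view over a hub and its satellite that exposes the most recent descriptive row per business key. it again uses python's sqlite3 module, and the table names, the view name dim_customer, and the sample rows are all made up for illustration. nothing is copied or replicated; the view is computed from the vault tables at query time.

```python
import sqlite3

# Hypothetical schema and view; the names are assumptions for this sketch.
SCHEMA = """
CREATE TABLE hub_customer (
    customer_key INTEGER PRIMARY KEY,
    customer_bk  TEXT NOT NULL UNIQUE,
    load_date    TEXT NOT NULL
);
CREATE TABLE sat_customer_details (
    customer_key INTEGER NOT NULL REFERENCES hub_customer(customer_key),
    load_date    TEXT NOT NULL,
    name         TEXT,
    city         TEXT,
    PRIMARY KEY (customer_key, load_date)
);
-- The "mart" is only a view: latest satellite row per business key.
CREATE VIEW dim_customer AS
SELECT h.customer_bk, s.name, s.city, s.load_date AS valid_from
FROM hub_customer AS h
JOIN sat_customer_details AS s ON s.customer_key = h.customer_key
WHERE s.load_date = (
    SELECT MAX(s2.load_date)
    FROM sat_customer_details AS s2
    WHERE s2.customer_key = s.customer_key
);
"""


def build_demo() -> sqlite3.Connection:
    """Create the sketch schema and load a few made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO hub_customer VALUES (1, 'C-100', '2024-01-01T00:00:00.000000')"
    )
    conn.executemany(
        "INSERT INTO sat_customer_details VALUES (?, ?, ?, ?)",
        [
            (1, "2024-01-01T00:00:00.000000", "Ada", "London"),
            (1, "2024-02-01T00:00:00.000000", "Ada", "Paris"),
        ],
    )
    conn.commit()
    return conn


def current_customers(conn: sqlite3.Connection) -> list:
    """Query the virtual mart; no data is physically duplicated."""
    return conn.execute(
        "SELECT customer_bk, name, city FROM dim_customer ORDER BY customer_bk"
    ).fetchall()
```

if a view like this ever became too slow, the same select could be used to populate a physical table instead, which is the point at which the data would actually be replicated.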
if you follow these basic standards and principles, you are “sticking” with data vault modeling standards for the most part, and should not run into any of the major roadblocks mentioned above.
beyond these foundation rules, we should be able to customize the attributes to our heart’s content.
if what you’ve built, or what your customer has built, follows these basic standards and these principles, then what they have “should” be a data vault model. however, it may still need a little review now and again.
hope this helps clear the air.
for early bird discounts and news of new courses: http://datavaultalliance.com