
Want to change or add a #DataVault Standard?

For many years I have built, authored, and maintained the #DataVault standards, including Data Vault 1.0 and Data Vault 2.0. Others in the community believe that “these standards should evolve and be changed by consensus of the general public”.

I have a number of issues with this approach. In this article I will describe what it takes to author a standard for the Data Vault 2.0 System of Business Intelligence. You are certainly more than welcome to contribute to the standards body of knowledge around Data Vault; I simply want contributions to be held to the highest level of integrity.

Introduction

Why people insist on “breaking the rules and standards” I set forth is beyond me. Would you trust a heart surgeon who never received proper training (standard methods and procedures) to operate on your heart? How about a brain surgeon? Of course, it goes without saying that when your life depends on something (anything from a car functioning properly in a crash to an airplane flying according to the laws of physics), all of a sudden good standards matter.

With any set of standards there are purist standards (those that I document) and pragmatic standards (those that contain minor alterations or deviations). The bigger the gap between the purist and the pragmatic standards, the more likely the project, process, or design is to fail under stress.

The issue isn’t necessarily the alteration itself; it’s the lack of rigor applied to testing the proposed pragmatic alteration that eventually results in failure.

In some cases, on specific projects, I have vetted and approved minor alterations for a pragmatic approach to implementing Data Vault. One such case is Teradata. Because of the way its relational engine works, a Hash Key is not necessary, and, by the way, neither is a surrogate sequence identifier! Teradata can and does hash its primary key / business key under the covers. This is an optimization NOT made by most other platforms (except SAP HANA).
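For readers new to the distinction: in Data Vault 2.0 a hub's Hash Key is derived deterministically from the business key instead of being drawn from a sequence, and that derivation is the step Teradata's internal hashing makes redundant. The Python sketch below is illustrative only. The function name `hub_hash_key`, the choice of MD5, the trim/upper-case normalization, and the `||` delimiter are all assumptions made for this example; the published Data Vault modeling standard, not this sketch, defines the actual rules (null handling, for instance, is omitted here).

```python
import hashlib


def hub_hash_key(*business_key_parts: str, delimiter: str = "||") -> str:
    """Return a hypothetical Data Vault 2.0-style hash key for a business key.

    Assumptions (illustrative, not the published standard): each part is
    trimmed and upper-cased, composite parts are joined with `delimiter`,
    and the result is hashed with MD5 and rendered as upper-case hex.
    """
    if not business_key_parts:
        raise ValueError("at least one business key part is required")
    normalized = delimiter.join(part.strip().upper() for part in business_key_parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest().upper()


# Two independent loads (different batches, servers, or platforms) derive the
# same key for the same business key, with no shared sequence generator and
# no lookup against the target hub.
key_from_batch_a = hub_hash_key("CUST-001")
key_from_batch_b = hub_hash_key("  cust-001 ")
print(key_from_batch_a == key_from_batch_b)  # True

# A composite key (e.g., the business keys a link connects) hashes the
# combined, delimited parts.
link_key = hub_hash_key("CUST-001", "ORDER-42")
print(len(link_key))  # 32 hex characters for MD5
```

The design point the sketch tries to show is the one behind replacing sequences: any loader can compute the key on its own, so loads can run in parallel and across platforms without a central sequence generator.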

Most of the time, however, the standards as I have defined them must stay in place, or some part of the architecture, methodology, model, or implementation will suffer (in some cases, multiple parts will suffer). Then come my competitors, whom I originally taught Data Vault 1.0. They make claims as they see fit. I’ve listed some of their false claims below:

Poor Judgement Claims Made in the Market:

  • A Link can be a Hub
  • A Hub can have a foreign key to another Hub
  • A Satellite can have its own Sequence ID / Primary Key identifier
  • Sequences are FINE to continue utilizing, you don’t need Hash Keys
  • Standards for Data Vault should be managed by consensus, and by the community at large
  • Satellites can have more than one foreign key to more than one parent structure
  • You don’t need Change Data Capture
  • Data Vault 2.0 is nothing more than a change from Sequences to Hash Keys at the modeling level

And more! Some are far too outlandish to list here; they would simply provide a good laugh.

Want to Suggest a Change to the Standards?

I am not saying you cannot suggest changes. I have always kept my door open, and I continue to do so. I welcome suggestions and thoughts on how the standards can evolve to better suit the needs of the marketplace, automation, big data, and so on. In fact, I developed Data Vault 2.0 in the first place in collaboration with a team of individuals, including Kent Graziano, Michael Olschimke, Sanjay Pande, Bill Inmon, Gabor Gollnhofer, and a few others.

I didn’t make sweeping changes by myself, or just because I thought they would be a good idea. No: I tested (and tested, and tested) and vetted the ideas with my colleagues before announcing the Data Vault 2.0 System of Business Intelligence about 2.5 years later.

I am more than happy to have you suggest changes or to hear your ideas. Standards do need to evolve, change, and adapt (hopefully without causing re-engineering efforts). That said, I expect you to apply proper rigor before making suggestions. Below is a list of conditions I expect you to run your changes through, with documented results, before I can consider the change for the greater standard.

  1. Test against large volumes of data (these days, more than 500 TB). This number will continue to increase as systems become capable of handling larger data sets.
  2. Test against real-time feeds (burst rates of up to 400k transactions per second). This number, too, will continue to increase as systems become capable of handling higher transaction rates.
  3. Test against Change Data Capture and restartability.
  4. Test against multiple platforms, including (but not limited to) Oracle, SQL Server, DB2, Teradata, MySQL, Hadoop (HDFS, Hive, and Spark), Cloudera, MapR, Hortonworks, and SnowflakeDB.
  5. Test in multiple coding languages: Python, Ruby (and Rails), Java, C, C#, C++, Perl, SQL, and PHP, to name a few.
  6. Test recovery situations: backup and restore.

Below is a sample list of questions I typically ask of a proposed change (I track and record metrics around these items):

  1. Does it negatively impact the agility or productivity of the team?
  2. Can it be automated for 98% or more of all cases put forward?
  3. Is it repeatable?
  4. Is it consistent?
  5. Is it restartable without massive impact? (when it comes to workflow processes)
  6. Is it cross-platform?  Does it work regardless of platform implementation?
  7. Can it be defined ONCE and used many times? (goes back to repeatability)
  8. Is it easy to understand and document?  (if not, it will never be maintainable, repeatable, or even automatable)
  9. Does it scale without re-engineering? (for example: can the same pattern work for 10 records, as well as 100 billion records without change?)
  10. Does it handle alterations / iterations with little to no re-engineering?
  11. Can this “model” be found in nature? (Here “model” might be a process, data, a design, a method, or otherwise; “nature” means reality, beyond the digital realm.)
  12. Is it partitionable?  Shardable?
  13. Does it adhere to MPP mathematics and data distribution?
  14. Does it adhere to Set Logic Mathematics?
  15. Can it be measured by KPIs?
  16. Is the process / data / method auditable?  If not, what’s required to make it auditable?
  17. Does it promote / provide a basis for parallel independent teams?
  18. Can it be deployed globally?
  19. Can it work on hybrid platforms seamlessly?

And quite a few more. There are those out there who say volume and velocity don’t matter. Well, I beg to differ. Volume and velocity (data moving from point to point within a fixed time window) cause architectures, models, and processes to fail and, at the end of the day, to be re-engineered. Unless you have had this level of exposure (today, at the 400 TB+ level), you would never have this experience.

If volume and velocity did not matter, we never would have seen the creation of Hadoop and NoSQL in the first place.
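Items 12 and 13 in the question list above (partitionable, shardable, adhering to MPP data distribution) can be made a little more concrete. Because a hash key is deterministic and close to uniformly distributed, it can also serve as a distribution key. The sketch below assumes simple modulo placement over a fixed number of shards; `shard_for` is a hypothetical helper, and real MPP engines apply their own hashing and bucket maps rather than this exact rule.

```python
import hashlib
from collections import Counter


def shard_for(hash_key_hex: str, shard_count: int) -> int:
    """Map a hex-encoded hash key to a shard number in [0, shard_count).

    Hypothetical helper: plain modulo placement over a fixed shard count.
    """
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    return int(hash_key_hex, 16) % shard_count


# The same placement rule applies unchanged to 10 keys or to billions:
# each key is placed independently, so loaders on different nodes agree
# on where a row belongs without coordinating.
sample_keys = [
    hashlib.md5(f"CUST-{i:06d}".encode("utf-8")).hexdigest().upper()
    for i in range(10_000)
]
rows_per_shard = Counter(shard_for(k, 8) for k in sample_keys)
print(sorted(rows_per_shard.items()))
```

One caveat ties back to item 9 (scaling without re-engineering): with plain modulo placement, changing the shard count moves most keys to a different shard, which is why many engines map hashes onto a fixed set of buckets, or use consistent hashing, so that adding nodes does not force a full redistribution.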

In conclusion…

I welcome suggestions for changing the standards; all I ask is that you put the proper rigor and testing behind the changes first. One-off cases or one-time changes do not work and will never be accepted as changes to the core standards. As a refresher: I put 30,000 test cases in place between 1990 and 2001, and another 10,000 since then, in order to build common standards that everyone can use to create successes in their organizations.

With the advent of Data Vault 2.0 I have (finally) included the necessary documentation for the Methodology, Architecture and Implementation.  I’ve enhanced the Modeling components to meet the needs of Big Data, Hybrid Solutions, Geographically split solutions, privacy and country regulations.  The changes to the data modeling paradigm (while subtle) are important.

I did not build these standards by myself in a closet somewhere. I had a team of 5 people at Lockheed Martin every step of the way, and no, my current competitors were NOT part of that team. In fact, they didn’t even know Data Vault existed at that time, because it was still under development between 1990 and 2001. That team consisted of myself, Jack, Arlen, Jackie, and John, all of whom worked for Lockheed Martin. I have withheld their last names to protect their privacy.

Please note: I have just released the new Data Vault Data Modeling Standard v2.0.1 for FREE. You can get a copy by registering at http://DataVaultAlliance.com

Coming soon: Data Vault Implementation Standard v2.0.0, and a few more!!

Have something negative or positive to say?

Post a comment below; I am happy to hear from you directly.
(C) Copyright Dan Linstedt, 2018 all rights reserved.


4 Responses to “Want to change or add a #DataVault Standard?”

  1. James Snape 2018/06/04 at 4:50 am

    “How about a brain surgeon?”

    Interesting analogy. The key point for medical standards is that they are researched, as you mention, but also published so others can try to replicate their outcomes. It’s all part of the scientific process. I don’t see a lot of Data Vault research being published.

    James.

    p.s. Medical policies are also decided by the medical community.

  2. Michal 2018/06/04 at 6:05 am

    Hi Dan,

    The figures and requirements seem big, but probably valid.
    If one can prove things work at that scale it is probably worth considering.

    But what would be valuable to the community, and a way to back our claim that it is as great as we all think, is to disclose some of the test cases you have made for DV.
    Are these just textual cases or something that could be scripted and shown to technical and business people?

    Regards
    Michal

  3. Dan Linstedt 2018/06/05 at 4:07 pm

    True; however, it’s a viable medical community with training in standard operating procedures. Yes, you are right, research is published. Part of my new initiative for DataVaultAlliance is to provide a platform for publishing research, not just from my position but from everyone who builds.

    Every year at my conference, WWDVC.com, we have independent customers, consultants, and more “publishing” the results of their research and implementations. I ensure that the presenters bring their “warts and all”: in other words, their successes, their current struggles, and their failures if they have them. We share, network, and work together to propose new thoughts, ideas, and so on. This has been going on for the past 6 years, and it will only grow. Next year we are bringing WWDVC to Germany, as well as to the US (again). I record all the presentations, and every year I offer them online for folks to watch and learn from.

    Medical “policies” may be decided by the medical community; however, “Standard Operating Procedures” are fairly stable and are decided only by a small group of officials at the top. Why? Because people’s lives are at stake.

    It’s also very difficult to get the 10 years of research and design done at Lockheed Martin for the NSA and the Department of Defense published, mostly because Lockheed owns that intellectual property, and after 10 years I had binders and binders and binders full of paperwork. That said, I am doing my best to get permission to release new research, as well as old research, in the form of standards and so on. It’s coming; it is part of the value proposition of DataVaultAlliance.com.

    Thank you for your feedback, it’s very good to hear.

  4. Dan Linstedt 2018/06/05 at 4:11 pm

    Yes, there are a lot of interesting developments in automation around test cases, test processes, and test procedures (standardized, repeatable, methodological) for Data Vault. They are based on the patterns that exist (and always have, and always will). I am working on getting that intellectual property ready for release, but that is going to take quite a bit of effort. As for publishing the test cases I documented at Lockheed Martin, sadly I am not allowed to, as they are the intellectual property of that company. What I can do is publish the testing methodology, the processes, and the standards for developing test cases.

    It’s coming. By the way, I’ve been working with Intact Financial in Canada for the past 5 years and have helped them build their testing harnesses, standards, and procedures over that time. They presented at WWDVC this year (the video will be available shortly) and discussed their current metrics on time savings, procedures, and best practices.

    Thank you for your valuable feedback, Dan
