this entry takes a step far back in time, to when i created the data vault model. i will share with you the thoughts around its creation, its intention and its original purpose. this entry is here because i’ve begun hearing that “the data vault model is not business focused”, and i want to put that notion to bed immediately, because it simply isn’t true. if this is what you believe about the data vault model, then a) you’re not building it right, b) you don’t fully understand its purpose, or c) you might never have taken the time to take an authorized training course from me or one of my authorized trainers.
data as an asset on the books…
one of the original intents of the data vault model was, and still is, to provide value to the enterprise in terms of data as an asset on the books: something that can be assigned a monetary value and then depreciated over time. how that is done (actually assigning a dollar value) is a very difficult task, and i will leave those thoughts for another blog post. returning to the original premise…
hypothesis: in order for raw data to have value, it must a) be tied directly to business processes, and b) be complete enough that when a human looks at the data, they can decide how to turn that data into information (make it available for business decision making). after all, this is where perceived value comes from: when the impact of the business decision is felt and measured against the profitability and overhead curves of the business.
supporting the hypothesis…
this means that in order to tie data directly to business processes, the data must be linked (in some fashion) to business keys. why?
imagine a ferrari that has a value (sticker price) of $245,000 at the dealer. you purchase the car, and the dealer says: “oh, wait, we lost the key.” what is the value of the car to you, as its official and registered owner, while the key is lost?
once the key is found and you drive the car home, the value of the car with the key (even while it’s still on the dealer’s lot) is its purchase price.
business keys are important, very important…
business keys are the unique identifiers that business people use to find descriptive data, and to which they assign, create, modify and enrich it. business keys are the unique identifiers that help business people and machines / applications tie the data together and track it horizontally through the business (from sales to finance to contracts to manufacturing, for instance). without business keys, the value of the data itself is… drum roll please… zero.
by the way, all of this and more is taught in any of my authorized cdvp2 (certified data vault 2.0 practitioner) courses; you will not get this anywhere else! find out more at: http://datavaultalliance.com
business keys are everywhere…
ever had a bank loan? what was the loan number? how about a wireless phone? what was the phone number? how about a water or electricity bill? what was your customer number? what about a contract with someone? what was the contract number?
you don’t have to look very far in the real world to find your business keys. the very companies you do business with use them to track associations of data directly to you. now, what’s your value as a customer to one of these companies? the answer: it varies over time, depending on how much money you invest in them, or how much money you pay them monthly or over the life of the contract. but that’s just the measurable value. then there’s the intrinsic value: if you’ve spent money with them before, chances are pretty good (unless something bad happens) that you will spend money with them again.
however, if you sign up with them again, you will most likely be assigned a new business key. this key will be used to track your information back through the business processes.
ok, fine. what’s that got to do with data vault?
in its original form, the data vault model is built to track business keys and their surrounding context through the lifecycle of business processes. the business processes are executed by “source system applications”; they constitute the reality (not the perception of how the business should work), the reality of how the business actually is working. that said, we arrive at our first construct: the hub.
the original hub in the data vault model (when built properly, and not containing degenerate or weak keys, as some out there on the internet would have you believe) was actually just the business key. you heard me right: no load date, no record source, no primary key (surrogate or hash). it was purely the natural, original business key value. the hub was just that… a hub (a unique list of business keys).
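to make the “original hub” concrete, here is a minimal python sketch, assuming the source feed arrives as a list of dictionaries and the business key sits in a field whose name we pass in. the function name `build_hub`, the field `loan_number` and the sample values are hypothetical. the sketch keeps nothing but the unique business keys: no load date, no record source, no surrogate or hash key.

```python
# hypothetical sketch: the "original" hub is nothing more than a unique
# list of natural business keys (no load date, record source or hash key)

def build_hub(source_rows, key_field):
    """return the unique business keys in the order they first arrived."""
    seen = set()
    hub = []
    for row in source_rows:
        key = row.get(key_field)
        if key is None or key in seen:
            continue  # skip rows with no business key, and keys already in the hub
        seen.add(key)
        hub.append(key)
    return hub


# hypothetical source feed: the same loan arrives twice
rows = [
    {"loan_number": "LN-1001", "amount": 25000},
    {"loan_number": "LN-1002", "amount": 12000},
    {"loan_number": "LN-1001", "amount": 26000},
]
print(build_hub(rows, "loan_number"))  # ['LN-1001', 'LN-1002']
```

the descriptive field `amount` is deliberately ignored here; in the original model that context belongs to the satellite, not the hub.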
now you might think: well, there’s all this debate around surrogate vs. hash keys, and around how to use the load date and the record source. let me stop you right there: none of these fields hold business value, except when it comes time to troubleshoot how the data arrived.
that said, the original satellite had a “replication” of the hub’s natural business key, combined with a load date. this had to be done in order to provide history, or data over time. again, the original satellite definition had no record source, no load end date (which is dead now anyway), no current record indicator, or anything else… just the key structure plus the descriptive data that arrived on the source feeds.
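a minimal python sketch of that original satellite shape, assuming an in-memory dictionary stands in for the table: the key structure is the replicated business key plus the load date, and the value is only the descriptive data that arrived. the names `load_satellite_row`, `history` and the sample loan data are hypothetical, chosen for illustration.

```python
from datetime import date

# hypothetical sketch: the "original" satellite keyed by
# (business key, load date), holding only the descriptive data that arrived

def load_satellite_row(satellite, business_key, load_date, descriptive):
    """record the descriptive data that arrived for a business key on a load date."""
    satellite[(business_key, load_date)] = dict(descriptive)


def history(satellite, business_key):
    """return (load_date, descriptive data) pairs for one key, oldest first."""
    rows = [(ld, data) for (key, ld), data in satellite.items() if key == business_key]
    return sorted(rows, key=lambda row: row[0])


sat = {}
load_satellite_row(sat, "LN-1001", date(2016, 2, 5), {"balance": 24500, "status": "open"})
load_satellite_row(sat, "LN-1001", date(2016, 1, 5), {"balance": 25000, "status": "open"})
for load_date, data in history(sat, "LN-1001"):
    print(load_date, data)
```

because the load date is part of the key, each arrival becomes a new row instead of overwriting the previous one, which is what gives the satellite its history.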
which brings me to the link. the original link had (again) a replication of just the business keys from two or more parent hubs. no load date, no record source, no hash, no surrogate.
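in the same spirit, a minimal python sketch of the original link: just the unique combinations of business keys from two or more parent hubs, with no load date, record source, hash or surrogate. the field names `customer_number` and `contract_number`, the function `build_link`, and the sample rows are hypothetical.

```python
# hypothetical sketch: the "original" link is the unique set of business key
# combinations drawn from two or more parent hubs, and nothing else

def build_link(source_rows, key_fields):
    """return unique tuples of business keys, one per relationship seen."""
    if len(key_fields) < 2:
        raise ValueError("a link relates the business keys of two or more hubs")
    seen = set()
    link = []
    for row in source_rows:
        keys = tuple(row.get(field) for field in key_fields)
        if None in keys or keys in seen:
            continue  # incomplete relationship, or one already recorded
        seen.add(keys)
        link.append(keys)
    return link


rows = [
    {"customer_number": "C-77", "contract_number": "K-501"},
    {"customer_number": "C-78", "contract_number": "K-502"},
    {"customer_number": "C-77", "contract_number": "K-501"},
]
print(build_link(rows, ["customer_number", "contract_number"]))
# [('C-77', 'K-501'), ('C-78', 'K-502')]
```

the check for at least two key fields mirrors the definition above: a relationship between business keys needs at least two parents.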
well, why is that important?
because folks are now arguing, with me and with others, over “hash key versus surrogate key”, when there is no true business value in that choice, and it is not related in any way to the original purpose of data vault modeling.
why are hashes or surrogates used as primary keys then?
one-word answer: performance. in reality, surrogates (if you still use them or insist on using them) provide join performance, especially if they are clustered on non-mpp solutions. on mpp, they really don’t matter, as the internal optimizer changes the values over to a different representation to find the data on a node/amp/module, etc. hashes, on the other hand (as explained in previous posts), are there to solve heterogeneous cross-platform load performance bottlenecks by eliminating all the caching lookups that take place. i refuse to get into the technical details any further in this blog entry. search my blog for the many posts about hashes, sequences, joins, bottlenecks, etc.
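the load-performance argument for hashes comes down to determinism: any platform can compute the same key from the same business key on its own, without a lookup against a central sequence. a minimal python sketch of that idea follows. the normalization rules (trim, upper-case, a `;` delimiter between multiple keys) and the choice of md5 are assumptions for illustration, not a prescription, and `hash_key` is a hypothetical name.

```python
import hashlib

# hypothetical sketch: a deterministic hash key computed from business key(s).
# the normalization rules and md5 are assumptions chosen for illustration.

def hash_key(*business_keys, delimiter=";"):
    """hash one or more business keys into a fixed-length hex string."""
    normalized = delimiter.join(str(key).strip().upper() for key in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# two loaders on different platforms receive the same key, formatted differently;
# both compute the same hash without consulting a shared sequence table
print(hash_key(" ln-1001 ") == hash_key("LN-1001"))  # True

# the delimiter keeps ("AB", "C") and ("A", "BC") from producing the same input string
print(hash_key("AB", "C") == hash_key("A", "BC"))  # False
```

a sequence-based surrogate, by contrast, has to be looked up (or cached) from wherever the sequence was generated, which is the bottleneck described above.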
the point is: the original model never had these “technical performance components” anyway, so why are we still arguing over which to use? we are losing precious time when we should instead be focused on solving business problems.
so now tell me, what’s this got to do with data as an asset?
right, back to the business. the original data vault model is positioned to help categorize business keys and their surrounding context, and to place them into a basic flattened hierarchy. the original data vault model provides pure business value by demonstrating the gaps between the business perception (the way they think their business is running) and the reality (the way the data is actually captured and processed through business processes and systems).
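a minimal python sketch of that kind of gap analysis, assuming the business keys have already been integrated (the same customer carries the same key in every source) and that we know which keys each business process actually captured. the source names, key values and the function `key_gaps` are hypothetical.

```python
# hypothetical sketch: once business keys are integrated across sources,
# a simple set comparison shows where each business process is missing keys

def key_gaps(keys_by_source):
    """for each source, list the integrated business keys it never captured."""
    all_keys = set().union(*keys_by_source.values())
    return {source: sorted(all_keys - set(keys)) for source, keys in keys_by_source.items()}


gaps = key_gaps({
    "sales": ["C-1", "C-2", "C-3"],
    "finance": ["C-1", "C-3"],
    "contracts": ["C-1", "C-2", "C-3", "C-4"],
})
print(gaps)
# {'sales': ['C-4'], 'finance': ['C-2', 'C-4'], 'contracts': []}
```

a customer under contract who never shows up in sales or finance is exactly the kind of gap between perception and reality that only becomes visible once the keys are integrated.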
wait a minute… you’re telling me there’s gold in the raw data?
yep yep yep… especially when you build the correct data vault model. if you build a source system data vault model, the value of the solution drops to one tenth of one percent overall. integrate your business keys in your hubs properly to achieve maximum asset valuation.
what’s the best way to build a data vault model?
don’t start with the data vault model; start with an ontology of business terms. a properly built ontology of business terms (when taken to the right level of grain) can and should identify (you guessed it!) business keys and their relationships. a properly built ontology can even be assigned an estimated intrinsic business value (i.e., how much money would we lose if this key were blank? how much will it cost us to have duplicates? how much will it cost us if the data is incomplete or wrong?). these valuations can be assigned (and yes, they need to be maintained through good data stewardship and data governance). but… you can take a properly built ontology and generate the right data vault model. yes, automation and generation.
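as a rough illustration of “ontology in, data vault model out”, here is a minimal python sketch, assuming a tiny ontology where each business term names its business key and each relationship lists two or more terms. it is not the generator used in practice; the `Term` and `Relationship` classes, the `hub_` / `link_` naming and the `VARCHAR(100)` column type are all hypothetical choices. it emits original-form hubs and links (business keys only).

```python
from dataclasses import dataclass

# hypothetical sketch: generate "original form" hubs and links from a tiny
# ontology of business terms. naming (hub_/link_) and column types are assumptions.

@dataclass
class Term:
    name: str       # business term, e.g. "customer"
    key_name: str   # business key identifying it, e.g. "customer_number"


@dataclass
class Relationship:
    name: str
    term_names: list  # two or more related business terms


def generate_model(terms, relationships):
    by_name = {term.name: term for term in terms}
    ddl = []
    for term in terms:
        # one hub per business term: the business key only
        ddl.append(
            f"CREATE TABLE hub_{term.name} "
            f"({term.key_name} VARCHAR(100) NOT NULL PRIMARY KEY);"
        )
    for rel in relationships:
        # one link per relationship: the parent terms' business keys only
        key_names = [by_name[name].key_name for name in rel.term_names]
        columns = ", ".join(f"{key} VARCHAR(100) NOT NULL" for key in key_names)
        primary_key = ", ".join(key_names)
        ddl.append(f"CREATE TABLE link_{rel.name} ({columns}, PRIMARY KEY ({primary_key}));")
    return ddl


terms = [Term("customer", "customer_number"), Term("contract", "contract_number")]
relationships = [Relationship("customer_contract", ["customer", "contract"])]
for statement in generate_model(terms, relationships):
    print(statement)
```

the design choice worth noting is that the ontology, not the source systems, decides which hubs and links exist; source feeds are then mapped onto that structure.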
this ultimately leads to tying direct dollar figures to the data sets through a proper ontology and data governance strategy. this is something that one of my earliest customers, qsuper in australia, has been doing for years, all on data vault 2.0, and yes, with hash keys. this is something that can make sense to the business.
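to show what “tying dollar figures to the data” can look like at its simplest, here is a minimal python sketch that prices two of the valuation questions posed earlier: blank keys and duplicate keys. the per-defect costs are hypothetical inputs that data stewardship and governance would have to assign and maintain; the function name and sample values are also hypothetical, and this is not a description of any customer’s method.

```python
from collections import Counter

# hypothetical sketch: put a rough dollar figure on key-quality problems, using
# per-defect costs that data stewardship / governance would assign and maintain

def estimated_exposure(keys, cost_per_blank, cost_per_duplicate):
    blank = sum(1 for key in keys if key is None or str(key).strip() == "")
    counts = Counter(key for key in keys if key is not None and str(key).strip() != "")
    duplicate = sum(count - 1 for count in counts.values() if count > 1)  # extra copies only
    return {
        "blank": blank,
        "duplicate": duplicate,
        "exposure": blank * cost_per_blank + duplicate * cost_per_duplicate,
    }


keys = ["C-1", "C-2", "", "C-2", None, "C-2"]
print(estimated_exposure(keys, cost_per_blank=50.0, cost_per_duplicate=20.0))
# {'blank': 2, 'duplicate': 2, 'exposure': 140.0}
```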
at the end of the day, you should still be constructing some form of business-based output layer. should it be a business vault? maybe; it depends on the case. in some cases today (more and more frequently), we are constructing virtual business views (virtual marts) directly on top of the raw data vault. and if you’re unsure about this part (performance-wise), then read up on point-in-time and bridge tables, the two most misunderstood, misused and misapplied modeling techniques in the data vault landscape today.
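a virtual mart that joins a hub to several satellites has to work out, for every key and reporting date, which satellite row was current. a point-in-time table precomputes that answer so the views can use simple equality joins. a minimal python sketch, assuming each satellite’s load dates per business key are available as a sorted list; the satellite names, dates and `build_pit` are hypothetical. `None` marks “nothing loaded yet” (some implementations point at a placeholder row instead).

```python
from bisect import bisect_right
from datetime import date

# hypothetical sketch of a point-in-time (pit) structure: for each business key
# and snapshot date, record which load date in each satellite was current

def build_pit(hub_keys, satellites, snapshot_dates):
    """satellites maps satellite name -> {business key: sorted list of load dates}."""
    pit = []
    for key in hub_keys:
        for snapshot in snapshot_dates:
            row = {"key": key, "snapshot_date": snapshot}
            for sat_name, load_dates_by_key in satellites.items():
                load_dates = load_dates_by_key.get(key, [])
                i = bisect_right(load_dates, snapshot)  # count of load dates <= snapshot
                row[sat_name] = load_dates[i - 1] if i else None  # None: nothing loaded yet
            pit.append(row)
    return pit


satellites = {
    "sat_loan_balance": {"LN-1001": [date(2016, 1, 5), date(2016, 2, 5)]},
    "sat_loan_status": {"LN-1001": [date(2016, 1, 5)]},
}
pit = build_pit(["LN-1001"], satellites, [date(2016, 1, 31), date(2016, 2, 28)])
for row in pit:
    print(row)
```

for the february snapshot, the balance points at the february load while the status still points at january, which is exactly the “latest row as of this date” lookup a view would otherwise recompute on every query.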
but i digress… the summary of it all, please.
here are my points:
- data vault modeling was never about the “source system”, and never about the sequence vs. hash key battles.
- data vault modeling was, is and always will be about the business. and if the data vault you have in place today is not currently about the business, then unfortunately you’ve hired the wrong people, and those people need to go back to school and re-learn what data vault really means. or you’ve built the wrong solution, and you need to fix it – immediately.
- there is value in raw data – when integrated by business keys. think gap analysis!
- there is a need to truly understand what a hub is and is not, what a link is and is not, and the same for the satellite. a link can never be a hub; sorry, that’s the way it is.
- there is a need to tie data as an asset back to the business, this is done through the business keys.
- ontologies are a very important asset to the corporation if built at the enterprise level. you must focus on ontologies while you are building the data vault solution, or the full value of the raw data vault cannot be realized.
there will be more on these subjects as we go forward.
thank you for your consideration. as always, i’m open to your thoughts.
(c) copyright dan linstedt 2016 all rights reserved.
ps: if you want to offer a negative comment or a differing viewpoint, that’s fine, but please don’t hide behind anonymous emails. don’t be afraid to tell the world who you really are, and take a stand, by golly.