in my previous entry i spoke about the nature of data lakes and data swamps. i introduced two central themes for making unstructured and multi-structured data useful (i.e. measurable as an asset to the business). in this post i will dive a little deeper into the notion of business keys, why they are important, and the kinds of business questions you can and should be asking of any business intelligence solution.
if you go to a ferrari dealership and you purchase a car for $250,000 usd, what is the value of that car the minute you purchase it? well – that’s a no-brainer… $250k (or the price you paid for it).
now let’s say on the way home, you stop at a convenience store, and you lose the key to the car. the car is locked. while the key is “missing / lost” – what is the value of the car to you?
right: zero, the value of the asset is zero dollars until the key is found or a new key is made.
what if... the car you bought had no vin (vehicle identification number), and what if the ferrari dealership made deals on a handshake with no paperwork? how could they make a new key for your car?
wait, a vehicle without a vin? impossible… right, you say that now – but before the standard (there’s that other ugly word again), motorized vehicles were not marked with vins. in fact, if your ferrari had no vin, and the dealership made deals on handshakes, they could not make a new key for your specific car. there would be no unique identifying information that would allow them to do so.
which brings me to this point:
with no working natural business keys, the valuation of your data as an asset is near zero.
wait a minute… did you just say all that data that we’ve collected in our data sewage is useless or valueless?
no – not what i said: there is useful value in the aggregation of the details, or looking for the outliers across the aggregate sets. but that’s a different statement, and yes – will be covered in another blog.
what i said was: in order to attach a true monetary or intrinsic value to a single set of data (old-school row of data, new school record or file of data) you must uniquely identify it. by uniquely, i mean across all the data that you currently have stored in the data sewage. once you’ve done this, you can begin to separate true data sewage from other data living in a data lake.
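to make the "uniquely identify it" test concrete, here is a minimal sketch in python. the records, the field names (`source`, `contract_no`, `customer`) and the `key_report` helper are all made up for illustration; the only point is that a candidate business key can be checked for missing and ambiguous values before you trust it.

```python
from collections import Counter

# hypothetical records; field names and values are assumptions for illustration
records = [
    {"source": "sales", "contract_no": "C-1001", "customer": "acme space"},
    {"source": "finance", "contract_no": "C-1002", "customer": "orbital inc"},
    {"source": "finance", "contract_no": "C-1002", "customer": "orbital incorporated"},
    {"source": "sales", "contract_no": None, "customer": "unknown"},
]

def key_report(records, key_field):
    """Report records with no key, and keys that are ambiguous within one source."""
    missing = sum(1 for r in records if r[key_field] is None)
    # the same key showing up in two different sources is fine (that is how we trace);
    # the same key showing up twice inside one source means it does not identify one thing
    counts = Counter(
        (r["source"], r[key_field]) for r in records if r[key_field] is not None
    )
    ambiguous = sorted(pair for pair, n in counts.items() if n > 1)
    return {"missing": missing, "ambiguous": ambiguous}

report = key_report(records, "contract_no")
print(report)
```

records that land in `missing` are the ferrari without a key: you own them, but you cannot get at them by identity.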
speaking of data lakes, did you know even real-world lakes are stratified? yep. they have their own ecosystems, with sludge and toxic gases at the bottom of the lake (where stuff goes to die and rot). as you get closer to the top of the lake, the life is more vibrant and the water is cleaner. but i digress – i will cover data stratification and valuation in yet another post.
have you ever heard a business user say to you:
well i don’t have that problem, i have working surrogate keys…
ok, if that is true, then either you have a single source system with all your enterprise data in it (remember when sap tried to be one-size-fits-all?), or you have properly identified these “surrogates” as master keys: they never change, they never get re-used, and they are consistent across all your enterprise source systems.
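here is a small sketch of why system-generated surrogates do not travel well, assuming two hypothetical extracts (`sales`, `finance`) where each application hands out its own auto-increment `id`. the `customer_no` field stands in for a natural business key; all names and values are invented.

```python
# hypothetical extracts; each system assigns its own auto-increment id
sales = [
    {"id": 1, "customer_no": "CUST-ACME", "name": "acme space"},
    {"id": 2, "customer_no": "CUST-ORB", "name": "orbital inc"},
]
finance = [
    {"id": 1, "customer_no": "CUST-ORB", "name": "orbital inc"},
    {"id": 2, "customer_no": "CUST-ACME", "name": "acme space"},
]

def join_on(left, right, field):
    """Pair up customer names from two systems by matching on one field."""
    index = {r[field]: r for r in right}
    return [(l["name"], index[l[field]]["name"]) for l in left if l[field] in index]

by_surrogate = join_on(sales, finance, "id")
by_business_key = join_on(sales, finance, "customer_no")

print(by_surrogate)      # every id "matches", but each pair is two different customers
print(by_business_key)   # each customer is paired with itself
```

the join on `id` succeeds without complaint, which is exactly the problem: nothing tells you the answer is wrong.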
without true natural business keys…
it is near impossible to trace the data from one business unit to the next. it is near impossible to classify and organize the information (although without keys, tags do pretty well here) – but the tags must be relevant to the search or the question you are trying to ask. so what kinds of value-based business questions can you ask if you identify true, real-world natural business keys?
what does business care about?
scenario: 7 sectors of business: sales, contracts, finance, procurement, planning, manufacturing, delivery. each sector uses its own source system application to manage its business. the customer buys rockets. these sales are worth 150 million and up; sometimes a single contract can be 1 billion dollars. the business is interested in cycle time reduction and lean initiatives (i highly recommend you go learn these terms; you will be more valuable to your business users if you understand these principles).
the business wants to answer the following questions:
- i need to see a bill of materials for: as sold, vs as contracted, vs as financed, vs as procured, vs as planned, vs as manufactured, vs as launched.
- i need to know how long a customer’s contract sits in finance before it is passed to procurement.
- i need to know how many times a customer’s rocket gets passed from manufacturing back to planning, back to manufacturing. i need to understand why this is happening for some rockets and not for others.
- i need to know how many customers i have across the entire company.
- if i change the business process (change the way we procure contracts), what is the impact to the planning and manufacturing and delivery cycle? do we make rockets faster or slower? do we have better quality rockets or more failures? did i make the business process more or less complex?
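two of the questions above (time spent in finance, and bounces between manufacturing and planning) can be sketched as simple computations once every hand-off carries the same contract key. the event log below is hypothetical: the tuple layout, the dates, and the helper names `days_in_sector` and `bounces` are assumptions, not something taken from a real system.

```python
from datetime import date

# hypothetical hand-off log: (contract business key, sector, date entered)
events = [
    ("C-1001", "sales", date(2015, 1, 5)),
    ("C-1001", "contracts", date(2015, 1, 12)),
    ("C-1001", "finance", date(2015, 1, 20)),
    ("C-1001", "procurement", date(2015, 2, 19)),
    ("C-1001", "planning", date(2015, 3, 1)),
    ("C-1001", "manufacturing", date(2015, 3, 15)),
    ("C-1001", "planning", date(2015, 4, 1)),
    ("C-1001", "manufacturing", date(2015, 4, 10)),
]

def trail_for(events, key):
    """All (date, sector) hand-offs for one contract, in date order."""
    return sorted((d, s) for k, s, d in events if k == key)

def days_in_sector(events, key, sector):
    """Total days the contract sat in a sector before moving on."""
    trail = trail_for(events, key)
    return sum(
        (left - entered).days
        for (entered, s), (left, _) in zip(trail, trail[1:])
        if s == sector
    )

def bounces(events, key, src, dst):
    """How many times the contract went directly from src to dst."""
    sectors = [s for _, s in trail_for(events, key)]
    return sum(1 for a, b in zip(sectors, sectors[1:]) if a == src and b == dst)

print(days_in_sector(events, "C-1001", "finance"))
print(bounces(events, "C-1001", "manufacturing", "planning"))
```

every line of both helpers filters on the contract key. take the key away and there is no trail to sort, only seven unrelated piles of rows.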
of course, the list goes on, and on and on. these are the kinds of business questions that business users want – they don’t care if they get it from a data lake, a data swamp, or a relational data warehouse.
for this purpose, the customer / our stakeholder must be able to uniquely identify each contract and each customer (this is the true nature of master data). they must be able to trace these contracts, customers, and bills of materials through and across each line of business, irrespective of source application.
one could argue: “data mining can coalesce (conform) these records together for dimensions” – ok, yea. but data mining has the propensity to change the answer each time it is run, leading to inconsistent results. one could then argue: “we can do it statistically” (i.e. statistical recognition of similar / like records). but then you run the risk of putting together two different records that don’t belong together.
you see? without the business key, all traceability across the source systems is completely lost. note: this is still just raw data we are talking about here – attempting to turn data into an asset through unique identification.
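the risk of statistical matching is easy to show with a toy example. this sketch uses plain string similarity from python's standard `difflib` module as a stand-in for "statistical recognition of similar records"; the two customer records, the `customer_no` values, and the 0.9 threshold are all assumptions made up for illustration.

```python
from difflib import SequenceMatcher

# hypothetical customer records: two distinct legal entities with near-identical names
a = {"customer_no": "CUST-0417", "name": "orbital launch systems llc"}
b = {"customer_no": "CUST-0932", "name": "orbital launch systems ltd"}

THRESHOLD = 0.9  # assumed cut-off; any real matcher would have to tune this

def similar(x, y, threshold=THRESHOLD):
    """Name-based guess: treat two records as the same customer if names are close."""
    return SequenceMatcher(None, x["name"], y["name"]).ratio() >= threshold

def same_key(x, y):
    """Key-based answer: same customer only if the business keys are equal."""
    return x["customer_no"] == y["customer_no"]

print(similar(a, b))    # the name similarity says "merge these"
print(same_key(a, b))   # the business key says "different customers"
```

move the threshold and the similarity answer moves with it. the key comparison has no threshold to argue about.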
let’s wrap this one up shall we?
please pay attention now, this is the foundation for turning your data into an asset that is quantifiable by business…
- if you want to build master data, you must have consistent business keys
- if you want to find, trace, and track your data records / data files as a single unit of work, you must have consistent business keys
- if the value of the contract is $1 billion, and your customer calls you – but you can’t pull up the record because you don’t have the key – will you lose the customer? if you lose the customer, what is the value of that contract to your company while you had it? right: $1 billion. hence, what’s the value of the data identified with that business key? right again: $1 billion.
- unique business keys are the only link between different lines of business, multiple systems operating on that data, the business processes, and the business user. sure, tags exist (first & last name, name of rocket, name of project, etc…) and if tags exist you have other ways to “help” you along the way, but to definitively identify a set of data is to attach a full working, unique business key.
this is the first working step to a data vault model (whether it be data vault in nosql, data vault on hybrid, or data vault in relational only) – the value of a data vault is in helping you identify your data with business keys, and it is the first step in turning your data into quantifiable and measurable assets.
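as a rough sketch of what "identify your data with business keys" can look like in practice: in a data vault, a hub holds one row per distinct business key, along with where and when it was first seen. the code below is an illustration under assumptions, not a reference implementation. the column names, the upper-case normalization, and the choice of md5 for the hash key are all assumptions made for the example.

```python
import hashlib
from datetime import datetime, timezone

def hub_row(business_key, record_source, load_date=None):
    """Build one hub row for a business key (column names are assumptions)."""
    normalized = business_key.strip().upper()
    return {
        "hub_hash_key": hashlib.md5(normalized.encode("utf-8")).hexdigest(),
        "business_key": normalized,
        "load_date": load_date or datetime.now(timezone.utc),
        "record_source": record_source,
    }

def load_hub(hub, rows):
    """Insert only business keys the hub has not seen yet; first arrival wins."""
    for row in rows:
        hub.setdefault(row["hub_hash_key"], row)
    return hub

# the same contract arrives from two hypothetical source systems
hub_contract = load_hub({}, [
    hub_row("c-1001", "sales"),
    hub_row(" C-1001 ", "finance"),
    hub_row("C-1002", "finance"),
])

for row in hub_contract.values():
    print(row["business_key"], row["record_source"])
```

two systems, one contract, one hub row: that single row is what every line of business hangs its own details on.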