Operational Data Vault

Q&A: ODS as a Data Vault – working with BI EDW?

this is a very short entry that discusses the operational data vault (or an ods modeled with the data vault method). there are a lot of moving pieces to this concept, and this is one thing that i encourage you to seek my assistance with – it is not easy. if you have comments, thoughts, or other experiences, i encourage you to add your comment to the end of this posting.

how does the ods in a data vault model implementation work with the bi edw?

ods in a data vault model implementation is similar to operational data warehousing. the data vault model can be effectively utilized in the context of building an ods (operational data store).  that said, there are some caveats, changes, and things to be aware of if you are going to go this route.

#1: the data vault model (both v1.0 and v2.0) is targeted at capturing history.

the ods typically captures only transactional history; all other information that it captures (or is supposed to capture) is generally up to date and contains no history.

because of this, the dv model (if utilized as an ods) needs to be slightly modified as follows:

  • the load date in satellites is moved out of the pk and into the attribute list
  • the load end date in satellites disappears
  • more often than not, the links “act” as, and are modeled as, transactional links.

this satisfies the “no history” clause, as well as the transactional history clause of the pure definition of the ods.
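the satellite changes listed above can be sketched roughly as follows. this is only a minimal illustration using python's built-in sqlite3 module; the table name sat_customer_ods, its columns, and the helper functions are hypothetical names made up for this example, not part of any data vault standard. the hub key alone forms the pk, the load date is an ordinary attribute, and there is no load end date, so each load simply overwrites the current row.

```python
import sqlite3

# Hypothetical table and column names, chosen only for illustration.
ODS_SATELLITE_DDL = """
CREATE TABLE sat_customer_ods (
    customer_hk   TEXT PRIMARY KEY,  -- hub key alone is the pk
    load_date     TEXT NOT NULL,     -- now an ordinary attribute, not part of the pk
    record_source TEXT NOT NULL,
    city          TEXT
    -- no load_end_date column: only the current row is kept
)
"""


def create_ods_satellite(conn: sqlite3.Connection) -> None:
    """Create the ODS-style satellite (assumed layout, see DDL above)."""
    conn.execute(ODS_SATELLITE_DDL)


def load_current_row(
    conn: sqlite3.Connection,
    customer_hk: str,
    load_date: str,
    record_source: str,
    city: str,
) -> None:
    """Overwrite the single current row for this hub key (no history kept)."""
    conn.execute(
        "INSERT OR REPLACE INTO sat_customer_ods "
        "(customer_hk, load_date, record_source, city) VALUES (?, ?, ?, ?)",
        (customer_hk, load_date, record_source, city),
    )
```

loading the same hub key twice leaves exactly one row in the table, carrying the most recent load date and attribute values.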

why “no history” in the ods?

generally, the minute you put history (descriptive in nature) in the ods, it immediately reverts to the data warehouse world. why? because it inherits all the problems of meshing, managing, and producing a historical view for queries – and over time, as it collects history, it will cause problems with query timing.

please do not confuse the ods with the data warehouse and the nature of their tasks. even “limited history” in the ods (like the past three days etc…) can cause issues with synchronization, and make time-based queries problematic. (i will explain this shortly).

#2: the ods isn’t an ods unless it is capturing & producing (in real-time) enriched operational transactions.

let me explain… a data warehouse (particularly a data vault) is generally not an operational data warehouse (although as of 2010, these situations are becoming more and more common). so in this case, the ods is meant to be the “mother of all source systems”, i.e., a landing and enrichment zone where all source systems reconcile their data sets. this, oddly enough, blurs the lines between an ods and what a master data system is supposed to do. (the differences here are not the focus of this post; however, i will say this: master data isn’t master data without accurate and up-to-date metadata lineage and forced source system alignment).

#3: objectives for data access might be different between the ods and the data warehouse, and variable query performance *may* be an issue

generally, the requirements for an ods are different from those of a data warehouse, and occasionally (although less so these days), the hardware and software underneath can struggle with performance in responding to ad-hoc queries for both “analytical deep data over history” alongside the typical ods ad-hoc query of “get me the most recent data across the board for this business subject.”

these are just some of the pieces that make an ods “different” from a data warehouse (any data warehouse, data vault or not).  however, all of that said, let’s now chat about how & why you can integrate the two by leveraging today’s technology, and a single data vault model.

enter: the operational data vault system

the odv (or operational data vault) is, first and foremost, an operational data warehouse. it inherits all the requirements and all the necessary restrictions that a true operational system has – including but not limited to: up-time, resiliency, reliability, hot backup & restore, redundancy (at the system level), resource loading, and guaranteed query performance response times.

that said, as long as you accept these principles, you can (and many systems have done this since 2012) build a single data vault model that houses both the “ods” view of the business and the data warehouse view(s) of the business.

this is called the operational data vault.
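one possible way to picture a single model serving both views is sketched below. this is an assumption on my part for illustration only, not a prescribed odv design: a standard historized satellite keeps every version (load date in the pk), and a view exposes only the latest row per hub key as the “ods” view. the names sat_customer, ods_customer_current, and the helper functions are hypothetical, and sqlite3 stands in for whatever platform actually hosts the vault.

```python
import sqlite3

# Hypothetical schema: one historized satellite plus a "current" view on top.
SCHEMA = """
CREATE TABLE sat_customer (
    customer_hk TEXT NOT NULL,
    load_date   TEXT NOT NULL,   -- ISO-8601 text, so string order = time order
    city        TEXT,
    PRIMARY KEY (customer_hk, load_date)
);

CREATE VIEW ods_customer_current AS
SELECT s.customer_hk, s.load_date, s.city
FROM sat_customer AS s
WHERE s.load_date = (
    SELECT MAX(h.load_date)
    FROM sat_customer AS h
    WHERE h.customer_hk = s.customer_hk
);
"""


def build_vault(conn: sqlite3.Connection) -> None:
    """Create the satellite and the current-state view."""
    conn.executescript(SCHEMA)


def insert_version(conn: sqlite3.Connection, hk: str, load_date: str, city: str) -> None:
    """Append a new version; earlier versions are never updated or deleted."""
    conn.execute("INSERT INTO sat_customer VALUES (?, ?, ?)", (hk, load_date, city))


def current_rows(conn: sqlite3.Connection) -> list:
    """The "ods" view: most recent row per hub key."""
    return conn.execute(
        "SELECT customer_hk, city FROM ods_customer_current ORDER BY customer_hk"
    ).fetchall()


def history_rows(conn: sqlite3.Connection, hk: str) -> list:
    """The warehouse view: full timeline for one hub key."""
    return conn.execute(
        "SELECT load_date, city FROM sat_customer "
        "WHERE customer_hk = ? ORDER BY load_date",
        (hk,),
    ).fetchall()
```

the same table answers both kinds of question: current_rows gives “the most recent data for this business subject”, while history_rows gives the full timeline. whether a view like this meets guaranteed response times depends entirely on the platform underneath.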

because the data vault standards dictate that we store raw (unmunged, unchanged) data at the lowest level of grain in the vault, we can hook it in as a point of capture and historical storage for all operational data. what that means business process-wise is that the odv *must* be enabled and hooked into the message queues, so that it can exchange (receive and send) real-time data with the operational systems directly.

once this has been properly accomplished, the transactional data across the enterprise can be subscribed to and enriched when needed, by the operational data vault / operational data warehouse.
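the receive, store, and enrich flow described above might look something like the sketch below. it is deliberately simplified and makes several assumptions: python's in-process queue.Queue stands in for a real message queue, a plain list stands in for the vault tables, and the function name drain_into_vault and the enrich callback are hypothetical. the point it tries to show is the ordering: the raw message is captured unchanged first, and only then is an enriched copy sent back out.

```python
import queue
from datetime import datetime, timezone
from typing import Callable


def drain_into_vault(
    inbound: queue.Queue,
    vault_rows: list,
    outbound: queue.Queue,
    enrich: Callable[[dict], dict],
) -> int:
    """Consume all waiting messages; return how many were processed.

    Each message is stored raw (with a load date) before any enrichment,
    then the enriched version is published for the operational systems.
    """
    processed = 0
    while True:
        try:
            message = inbound.get_nowait()
        except queue.Empty:
            break  # nothing left to consume right now
        # Capture the raw payload unchanged; copy it so later code cannot alter it.
        vault_rows.append(
            {
                "load_date": datetime.now(timezone.utc).isoformat(),
                "payload": dict(message),
            }
        )
        # Publish the enriched transaction back to subscribers.
        outbound.put(enrich(message))
        processed += 1
    return processed
```

a real deployment would replace the two queue.Queue objects with the organization's actual message queue client and would need to handle delivery guarantees, ordering, and failures, none of which this sketch attempts.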

now, what would stop you from doing this? or succeeding properly?

for starters, the infrastructure of the organization – if it is incapable of delivering and/or utilizing real-time message queues, then an ods (data vault enabled or not) is really a moot point.

second: the capabilities of the systems that would house the operational data vault. if these systems are taxed, overloaded, or simply can’t perform with guaranteed response times in an operational sense (because remember, they are collecting operational data with a full historical timeline), then an odv may seem like a good idea, but will in fact be detrimental to the organization – because the organization is unwilling or unable to “turn the entire bi/edw into a fully operational system” with all the requirements.

please note, that is all the time i have for now to discuss this subject.  if you are interested in additional details, please contact me.  i’d be happy to help you set this up in your organization if you are interested.

hope this helps,

dan linstedt
