I recently received the following question about an Operational Data Vault (ODV), and I will attempt an answer here: This customer is using CDC to transfer source system changes to a staging area; a scheduled process (every 5 mins/24 hours) then runs to populate the data vault. It was developed to process each table independently of all other tables, so there are no dependency issues.
A solution architect has a design that calls for “real-time” loading of an ODS, which will be part of a Data as a Service implementation that uses the Tibco bus to publish data for applications to consume.
Could this ODS be the data vault layer of the data warehouse if we sped up the loading of certain data – maybe use triggers on the staging tables?
Triggers on staging tables are usually a bad idea, especially in a batch situation, where they would slow down the batch loading process by a factor of 10 or more. However, triggers in an ODV staging area should be just fine: triggers are meant to be operational and transactional in nature, and it sounds as if the data will be flowing into the ODV staging area as transactions. If the transactions are small and flow quickly in and out of the staging area, then yes, you can use the ODV staging area with triggers.
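To make the trigger idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (stg_customer, hub_customer, sat_customer), the columns, and the helper functions are all hypothetical, and a real ODV would use its own platform's trigger syntax plus hash keys and record sources. It only shows the shape: each small transaction that lands in staging is moved into a hub and a satellite at the time of arrival.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create hypothetical staging, hub and satellite tables plus the trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE stg_customer (
            customer_id TEXT NOT NULL,
            name        TEXT,
            load_ts     TEXT NOT NULL
        );
        CREATE TABLE hub_customer (
            customer_id TEXT PRIMARY KEY,
            load_ts     TEXT NOT NULL
        );
        CREATE TABLE sat_customer (
            customer_id TEXT NOT NULL,
            load_ts     TEXT NOT NULL,
            name        TEXT,
            PRIMARY KEY (customer_id, load_ts)
        );

        -- Fires once per staged row, so it suits small transactions,
        -- not large batch loads.
        CREATE TRIGGER trg_stg_customer_insert
        AFTER INSERT ON stg_customer
        BEGIN
            -- Hub: insert the business key only if it is not there yet.
            INSERT INTO hub_customer (customer_id, load_ts)
            SELECT NEW.customer_id, NEW.load_ts
            WHERE NOT EXISTS (
                SELECT 1 FROM hub_customer
                WHERE customer_id = NEW.customer_id
            );
            -- Satellite: every arriving change becomes a new row.
            INSERT INTO sat_customer (customer_id, load_ts, name)
            VALUES (NEW.customer_id, NEW.load_ts, NEW.name);
        END;
        """
    )
    return conn


def land_change(conn: sqlite3.Connection, customer_id: str,
                name: str, load_ts: str) -> None:
    """Land one change in staging; the trigger moves it into the vault."""
    with conn:  # one transaction per arriving change
        conn.execute(
            "INSERT INTO stg_customer (customer_id, name, load_ts) "
            "VALUES (?, ?, ?)",
            (customer_id, name, load_ts),
        )


if __name__ == "__main__":
    db = build_demo_db()
    land_change(db, "C1", "Alice", "2024-01-01T00:00:00")
    land_change(db, "C1", "Alice B.", "2024-01-01T00:05:00")
    print(db.execute("SELECT * FROM sat_customer").fetchall())
```

Because the trigger runs inside the same transaction as the staging insert, a failure in the vault insert also rolls back the staged row. That is fine for small transactions, and it is also the reason this pattern hurts so much in a batch load.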
But the question goes deeper than this. The question that really should be asked is: why isn’t the real-time loading process sending data directly to the Data Vault in the first place? What are the drivers or needs for placing the data into the staging area before feeding it to the Data Vault?
If it’s for reasons of backup and restore, then yes, the staging area can serve as a rolling (hot) backup, and that makes sense. If it’s to join the data with other information, then yes, landing it in a staging area is a good idea.
I would suggest the following solution:
Run the data to the staging area for backup, but in PARALLEL, run the data directly into the Data Vault without stopping at the staging area; this way no triggers are necessary. If, however, you have requirements that dictate a stop in the staging area anyway, then it would be OK to use triggers on the staging tables to move the transactions into the Data Vault at the time of arrival.
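The parallel route could look roughly like the sketch below. Everything in it is an assumption made for illustration: ChangeEvent, StagingSink, VaultSink and fan_out are hypothetical names, and the in-memory lists stand in for real database writers. The only point is that each arriving change is handed to both targets at once, so the Data Vault load never waits on the staging area.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Sequence


@dataclass(frozen=True)
class ChangeEvent:
    """One change captured from the source system (hypothetical shape)."""
    customer_id: str
    name: str
    load_ts: str


@dataclass
class StagingSink:
    """Stand-in for the staging area: keeps the raw events for backup."""
    rows: list = field(default_factory=list)

    def write(self, event: ChangeEvent) -> None:
        self.rows.append(event)


@dataclass
class VaultSink:
    """Stand-in for the Data Vault: one hub entry per key, one satellite row per change."""
    hub: dict = field(default_factory=dict)
    satellite: list = field(default_factory=list)

    def write(self, event: ChangeEvent) -> None:
        self.hub.setdefault(event.customer_id, event.load_ts)
        self.satellite.append((event.customer_id, event.load_ts, event.name))


def fan_out(event: ChangeEvent,
            writers: Sequence[Callable[[ChangeEvent], None]],
            pool: ThreadPoolExecutor) -> None:
    """Send one event to every writer in parallel and wait for all of them."""
    futures = [pool.submit(writer, event) for writer in writers]
    for future in futures:
        future.result()  # re-raises any exception raised by a writer


if __name__ == "__main__":
    staging, vault = StagingSink(), VaultSink()
    with ThreadPoolExecutor(max_workers=2) as pool:
        for ev in (
            ChangeEvent("C1", "Alice", "2024-01-01T00:00:00"),
            ChangeEvent("C1", "Alice B.", "2024-01-01T00:05:00"),
        ):
            fan_out(ev, [staging.write, vault.write], pool)
    print(len(staging.rows), len(vault.hub), len(vault.satellite))
```

Note that the two writes are independent here: if one target fails, the other may already have accepted the event. Whether that is acceptable, or whether the writes need to be retried or reconciled, depends on how the staging backup is meant to be used.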
Anyone else have thoughts on this process?