#datavault and #automation – reduce and reuse

There is a lot of talk about Data Vault automation.  In this post I discuss the easiest method possible to automate the loading of Data Vault.

The problem…

When you build ETL / ELT load routines for Loading a Data Vault, generally, you end up with hundreds of processes.  At least one per source field/set of source fields that are tracked to each target.   So if you have the following source table:

NDC_FIRMS
firm_name
firm_address
firm_city
firm_state
firm_country

You end up with a minimum of the following load objects:

  • Load Hub Firm
  • Load Sat Firm
  • Update Sat Firm End Date

So, a minimum of 3 ETL objects (data flow designs) for one single source table.  So if you have 25 staging tables, you end up with a minimum of 75 ETL objects.

So how can we reduce the number of ETL objects and still maintain the loading cycle?

The solution…

It’s really quite easy if you think about it.  The solution is: parameter driven ETL routines. Yes, very easy.

So, you end up with the following patterns: (requiring only one load map per pattern)

  • Load Hub
  • Load Link
  • Load Satellite
  • Update Satellite End Date
  • Update Last Seen Date Hub
  • Update Last Seen Date Link

Now, by parameters – I mean very simple parameters.  The following parameters are necessary:

  1. the SQL statement to get the columns out of the source staging table
  2. the cross join parent lookup SQL statement (to find the parent Sequence Number)
  3. the Insert statement or a dynamic target name – so that the target table can be changed on the fly
  4. in the case of updates – the dynamic update statement.

There are a few more pieces to get this going properly, but it can be done in Informatica, Kettle, SSIS, SQL Procedures, and I’ve been told it should work in Data Stage too.

At the end of this, I end up with maybe 20 or 25 ETL “template” objects, and as many parameter files as I need to move the data from source to target.  Each parameter file may have up to 4 SQL based statements.

In case you are interested in downloading a working example on how to make this work, you can reply here (let me know you are interested).  My newest on-line class over at: http://LearnDataVault.com/training/ will be covering this in detail.

PS: Data Vault 2.0 specifications help with performance and consistency.  The DV2.0 specifications really help make these topics work much better, and I hope to release them in January 2013.

6 Responses to “#datavault and #automation – reduce and reuse”

  1. ramces 2012/11/26 at 7:59 am #

    Hello,

    I’m interested in this part.

    Thank you for all

  2. Pierre 2012/11/26 at 10:21 am #

    Hi Dan, I’m interested in downloading a SSIS working example. How can I get the code downloaded?

    Thank You,
    Pierre.

  3. dlinstedt 2012/11/26 at 10:34 am #

    Hi Pierre,

    The code / working examples will be included as part of the on-line class. Once I have the class / lessons posted, you can purchase access to the on-line lesson, which will give you the working example (along with the video that shows you how to actually make it work).

    I hope to post the first class (using Informatica PowerCenter 9.x) + examples, by the 15th of December. Shortly there after (in January) I should be able to post the SSIS edition of the class.

    Thanks,
    Dan L

  4. dlinstedt 2012/11/26 at 10:35 am #

    Hi Ram,

    Thanks for your feedback. You can watch the release (or register for the mailing list) by going to http://LearnDataVault.com

    The class will be for sale (for a reasonably priced fee) very very shortly. I hope to have one class for PowerCenter, one for SSIS, and one for Pentaho Kettle on the site.

    Thanks,
    Dan Linstedt

  5. Jonathan 2012/11/26 at 10:36 am #

    Hey Dan,

    I am very interested in reviewing a working example of this concept. Penu and I have been floating around ideas on how to do this, and would like to see your approach.

    Thanks,
    Jonathan

  6. dlinstedt 2012/11/26 at 10:38 am #

    Hi Jonathan,

    Basically (for all intensive purposes), I will be posting this as an on-line class with working examples.

    In reality this makes “nearly all code generators for loading Data Vault” – obsolete.

    Instead of “generating the code” for each of the ETL tools, now all you need to generate are the parameter files.

    Makes it REALLY simple, and easy to maintain. EXCEPT in one particular area: loading Satellites. This can become difficult, but there are several solutions to this that you might like. I will be covering all of this in detail (pros/cons/risks, etc..) in my upcoming on-line class.

    Thanks,
    Dan Linstedt

Leave a Reply

*