this entry discusses the age-old-question of etl performance – a data vault approach and a traditional approach. if you have comments, thoughts, or other experiences i encourage you to add your comment to the end of this posting. as you know, i’ve launched one-on-one coaching, this is the kind of knowledge you get within the walls of one-on-one coaching. but for today – i’m giving you the answers free to show you the kind of value you get when you sign up for my coaching sessions. contact me today: email@example.com for more information.
how does etl performance compare to a traditional approach? qualify
first off, there’s no “real” way to compare these things, why? can you define for me what a “traditional approach” is? every data warehouse i’ve ever worked on (except data vaults) all claims to have their own special sauce, their “we did it here because….” – no one has a traditional approach, and i wouldn’t even know where to go to get a definition for this. anyhow, from the heart here are my thoughts:
the etl / elt performance for loading a data vault screams. there’s no better way to put it. on the right hardware (as close to a comparison as i can get) – a full data vault loading cycle might take 40 or 50 minutes total time, (50 staging tables, 1 terabyte incomming data) – where a so-called “traditional” (type 2 dimension + facts) loading cycle for the same data sets would take about 3 to 4 hours…. the data vault model is built on patterns. put simply this makes it powerful. the hubs, links, and satellites isolate the work (divide and conquer) so that it can all happen at once. and it can all happen in parallel… i’ve blogged, and blogged, and blogged on this – and yes, inside my coaching area i’ve built answers to all the secrets, i explain why this works.