Q&A: Are there Hardware Dependencies for DV?

there are no hardware dependencies for using the data vault model, but you still might want to read this entry: i cover why, and expose a small bit of the databases and hardware underneath. if you have comments, thoughts, or other experiences, i encourage you to add them at the end of this posting.

as you know, i’ve launched one-on-one coaching, and this is the kind of knowledge you get within the walls of those sessions. for today, though, i’m giving you the answers free, to show you the kind of value you get when you sign up. contact me today for more information: coach@danlinstedt.com

what are the hardware dependencies of this approach?

none. absolutely none. the data vault model is a data modeling technique. like any “data modeling technique”, the data vault favors mpp hardware for scalability – but only if or when you actually have “big data needs”. when i talk about big data, i mean data sets that move 1 tb per hour and that already have in excess of 80 tb in their data vault.

at the same time, i have a 30 gb data vault on a sql server instance running on windows vista home with a 64-bit amd turion – and it’s just fine. i’ve built data vaults on 2-cpu oracle boxes, linux hardware, and standard servers, holding 5 to 8 tb of data, and they run fine. so there are no hardware dependencies.

by the way, the world’s smallest data vault (one that still gives you business value) is one hub + one satellite. talk about scope control.
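
a minimal sketch of what one hub + one satellite could look like, using python's built-in sqlite3 module so it runs on just about any machine. the table names, column names, and sample rows are all hypothetical, made up for illustration; they are not taken from a real model.

```python
import sqlite3

# hypothetical names throughout: hub_customer, sat_customer, and their columns
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (
    customer_sk   INTEGER PRIMARY KEY,    -- surrogate key
    customer_bk   TEXT NOT NULL UNIQUE,   -- business key
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE sat_customer (
    customer_sk   INTEGER NOT NULL REFERENCES hub_customer (customer_sk),
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL,
    name          TEXT,
    city          TEXT,
    PRIMARY KEY (customer_sk, load_dts)   -- one row per change over time
);
""")

# one business key in the hub, two descriptive versions in the satellite
conn.execute(
    "INSERT INTO hub_customer VALUES (1, 'CUST-001', '2011-03-01', 'crm')"
)
conn.executemany(
    "INSERT INTO sat_customer VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2011-03-01", "crm", "Acme", "Denver"),
        (1, "2011-03-15", "crm", "Acme", "Boulder"),
    ],
)


def current_customers(conn):
    """Return each business key with its most recent satellite row."""
    return conn.execute("""
        SELECT h.customer_bk, s.name, s.city
        FROM hub_customer h
        JOIN sat_customer s ON s.customer_sk = h.customer_sk
        WHERE s.load_dts = (
            SELECT MAX(load_dts) FROM sat_customer
            WHERE customer_sk = h.customer_sk
        )
    """).fetchall()


rows = current_customers(conn)
```

the hub holds nothing but the business key and its load metadata; everything descriptive, and all of the history, lives in the satellite.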


2 Responses to “Q&A:Are there Hardware Dependencies for DV?”

  1. Marcel Franke 2011/03/26 at 6:27 pm #

    Hm… I guess it depends heavily on the performance requirements. If you need to get data out quickly, which is very often the case, and the data model has over 100 tables, you will have lots of joins. To run quick queries you need a good I/O system, and doing this on an SMP system with 1 TB of data in the warehouse can be very frustrating.

  2. dlinstedt 2011/03/27 at 8:15 am #

    Hi Marcel,

    Please read through the new technical book; in it I discuss two query-assist tables called Point-in-Time and Bridge tables. These tables “span keys” of queries and greatly reduce the number of joins needed to get data out. They are not necessary in systems like Teradata, where the natural parallelism and performance characteristics of the engine mean no “helpers” are required to pull large volumes of data across hundreds of terabytes of information.
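
    To make the Point-in-Time idea concrete, here is a minimal sketch, assuming one hub with two satellites and Python's built-in sqlite3 module. All table names, column names, and rows are hypothetical. It shows one way such a table can work: the "which satellite row was current on this date?" lookup is done once when the table is loaded, so a query afterwards only needs plain equality joins.

```python
import sqlite3

# Hypothetical schema: one hub, two satellites, one point-in-time (PIT) table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (
    customer_sk INTEGER PRIMARY KEY,
    customer_bk TEXT NOT NULL UNIQUE
);
CREATE TABLE sat_customer_name (
    customer_sk INTEGER, load_dts TEXT, name TEXT,
    PRIMARY KEY (customer_sk, load_dts)
);
CREATE TABLE sat_customer_addr (
    customer_sk INTEGER, load_dts TEXT, city TEXT,
    PRIMARY KEY (customer_sk, load_dts)
);
CREATE TABLE pit_customer (
    customer_sk   INTEGER,
    snapshot_dts  TEXT,
    name_load_dts TEXT,   -- satellite row in effect at snapshot_dts
    addr_load_dts TEXT,
    PRIMARY KEY (customer_sk, snapshot_dts)
);
""")
conn.execute("INSERT INTO hub_customer VALUES (1, 'CUST-001')")
conn.executemany("INSERT INTO sat_customer_name VALUES (?, ?, ?)", [
    (1, "2011-01-01", "Acme"),
    (1, "2011-02-10", "Acme Corp"),
])
conn.executemany("INSERT INTO sat_customer_addr VALUES (?, ?, ?)", [
    (1, "2011-01-05", "Denver"),
    (1, "2011-03-01", "Boulder"),
])


def build_pit(conn, snapshot_dts):
    """Record, per hub key, which row of each satellite was current."""
    conn.execute("""
        INSERT INTO pit_customer
        SELECT h.customer_sk, ?,
               (SELECT MAX(load_dts) FROM sat_customer_name n
                 WHERE n.customer_sk = h.customer_sk AND n.load_dts <= ?),
               (SELECT MAX(load_dts) FROM sat_customer_addr a
                 WHERE a.customer_sk = h.customer_sk AND a.load_dts <= ?)
        FROM hub_customer h
    """, (snapshot_dts, snapshot_dts, snapshot_dts))


def customer_as_of(conn, snapshot_dts):
    """Read a snapshot with equality joins only; no date-range logic here."""
    return conn.execute("""
        SELECT h.customer_bk, n.name, a.city
        FROM pit_customer p
        JOIN hub_customer h ON h.customer_sk = p.customer_sk
        LEFT JOIN sat_customer_name n
               ON n.customer_sk = p.customer_sk AND n.load_dts = p.name_load_dts
        LEFT JOIN sat_customer_addr a
               ON a.customer_sk = p.customer_sk AND a.load_dts = p.addr_load_dts
        WHERE p.snapshot_dts = ?
    """, (snapshot_dts,)).fetchall()


build_pit(conn, "2011-02-15")
build_pit(conn, "2011-03-15")
feb = customer_as_of(conn, "2011-02-15")
mar = customer_as_of(conn, "2011-03-15")
```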

    One more thing about general data warehousing on large data sets: 75% or more of the queries that “go after large data” are deep queries that involve relatively few tables. They usually are NOT wide queries, which tend to be more shallow. If you’re doing wide queries for data mining, then you should “take the data from the Data Vault” and prepare it by releasing it to a data mart or exploration mart for just such purposes.

    So the argument about the number of joins in the DV architecture being “too many to support large data sets” is really a moot point, unless you are on an underpowered system, or you are trying to “deliver” virtual data marts direct from the Data Vault to the business on underpowered hardware.

    Hope this helps,
    Dan Linstedt
    My new tech book can be found at: http://learnDataVault.com
