Well, well, we’ve come full circle, haven’t we? There’s an interesting (yet long, dry, and somewhat technical) explanation of in-database analytic technology here (Sybase IQ, Forrester, Fuzzy Logix). But I have my own opinions. I’ve blogged and written about these topics for years on B-Eye-Network.com and in Teradata Magazine. In this entry I will explore the meaning, the consolidation, and the relationships to the next generation EDW – which I have also dubbed Operational Data Warehousing.
I’m always on the lookout for ways to improve the understandability and readability of the Data Vault, especially when it comes to solving business user problems or issues. As you know, the Data Vault model relies on the best possible business keys being chosen for Hubs. Knowing your business data, understanding the business processes, and linking the two to the data warehouse all happen through business keys. The more consistent the business keys are (across lines of business and through the business processes), the better the analytics will be, and the better the Data Vault model will be. In this entry I will dive a little into the technical aspects of unique identification. I will discuss MD5, hashing in general, uniqueness, and sequence numbers.
I’ve been researching Solid State Disk and its impact on Data Warehousing / Business Intelligence for quite some time. Quite frankly, I should have seen this sooner! Anyhow, in this entry, I’ll dive into SSD technology and discuss what I think it could do for the future of Data Warehousing appliances, data modeling, and databases. I’m going to touch on things like performance, scalability, and making the most of your existing hardware and database system (sunk cost), so read on!
Just released a new draft of my Data Vault Modeling book to my students in the coaching area. It’s getting close – 2 chapters left to write! Sign up for coaching today, and get access to the book NOW.
I’m often asked about the Data Vault and the Staging Area – when to use it, why to use it, how to use it – and what the best practices are around using it. Those of you who’ve been through my training understand that there is a LOT of ground to cover, and I cover all of this (and more) with examples inside my one-on-one coaching area. That said, I will answer some of the above questions here in this brief post. This post is focused on batch processing and micro-batch processing. It does not answer the real-time feed questions.
Many have been talking for years about the ability to capture RAW data and add to it the categorization of different taxonomies and ontologies (metadata defining the raw data), in order to spin the information differently according to different needs. Not just for accountability and reliability, but for the ability to process the information how you want it, when you want it. Well, it’s interesting to note that maybe I’m really not as crazy as I thought I was… It seems there are a few others out there wanting this to happen too, and guess what? They’re BUSINESS users!!
In a recent post, I described loading patterns for Data Vault and how this might work going forward. A question was asked about comparing the loading patterns for loading source->staging->DV per system versus for all systems simultaneously. Within the coaching area I go through these topics in great detail, discuss the pros and cons and the different options, and also teach how to make this a success. In this blog entry, I will highlight some of the issues, and provide a few hints and techniques on making each one work.