i’ve been pondering this for years, and i’ve written about it (as have many others)… now i feel the urge to post about it. the term “data warehouse” and data warehousing may be old, but where we’re headed is new… or is it?
when data warehousing first evolved, it split off from transactional systems in order to reduce load, store history, and provide better reporting capabilities. along the way, it picked up massive amounts of data and shifted toward large-scale integration and storage of raw data instead of “business altered data”. in other words, it evolved into a system-of-integrated-record for historical storage of facts.
for many years, people were satisfied… and said: “let’s grow bi and analytics…” well now, through the years we’ve had shifts, changes, bumps and bruises along the way. people demanded (and still demand) an inflow of real-time, transactional data to mix with existing history. now, because of regulations and everything else (unstructured data, multiple sources, transactional feeds) and even the rise of self-service bi, i believe we’ve come full circle.
what i’m referring to is what i’ve written about before: the convergence of data warehousing with operational systems (a return to the oltp side of the house). why? what’s happened that brings us back?
well, to be quite frank: the hardware, the infrastructure, the speed of transactions, and the capabilities of analytical applications. of course, don’t forget that “data warehousing” as an industry has finally matured, gotten standardized (or is in the process of standardization), and can be (or will be) mostly automated within the next several years. the business rules will shift back to the hands of the business user, who will drive self-service-bi to become mainstream. raw data will be used in ways we didn’t even think were possible.
but enough of the rambling… the big news? convergence. yes, convergence. the data warehouse will no longer exist as its own platform on its own hardware. because of cloud technologies and virtualization, companies will no longer care where the data warehouse lives, but will begin caring more about “how fast can this historical store deliver?” and, furthermore, “how fast can it adapt to changes to the source systems?” to do that, it must re-attach to source systems and be “loosely coupled” (or at least seamlessly coupled) to receive the transactions at the same time as the source systems.
now, will the convergence be in the database systems themselves? probably not (not yet anyhow, though that may come soon), mostly because the types of queries are different: oltp demands wide queries (lots of transactions, joined together), while data warehousing usually demands deep queries (lots of historical data). until the database manufacturers and hardware manufacturers can solve this problem, full convergence at the hardware level may not happen.
but wait… back to cloud. the point is, with cloud and virtualization topped with an out-of-the-box analytical application, the business users won’t care where the data lives… they just want it all seamless, and they want control over how to extract and manipulate it. we still have the massive problem of integration to solve, especially with hordes of historical data and mis-formatted external data sets.
but convergence is happening, and will continue to happen. just look around… what do you say to a system that uses hadoop? or a no-sql database that absorbs files and makes them easily accessible/indexed? or how about the pre-packaged analytics apps that come from sap or oracle, or others?
am i saying that data warehousing is “dead”? far from it… i blogged on that one just hours ago. no, it’s not dead, it’s just getting “pushed” to the back-office for automation, where it belongs! it’s eventually going to live on the enterprise service bus as just another publisher and subscriber to the messages streaming by. it will be plugged in to the cloud, and it will be virtualized in some cases, but who cares, as long as it’s accessible and the business user can get to it via a packaged application?
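to make the "just another subscriber on the bus" idea a bit more concrete, here's a minimal python sketch. everything in it is hypothetical and for illustration only: `MessageBus`, `OperationalStore`, `HistoricalStore`, and the `orders` topic are names made up for this example, and the in-memory bus merely stands in for a real enterprise service bus. it only covers the subscriber side, and it assumes each message carries an `order_id`. the point is simply that the same transaction reaches the operational system and the historical store at the same time, with one keeping current state and the other appending raw history.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Message = Dict[str, object]
Handler = Callable[[Message], None]


class MessageBus:
    """Toy in-memory stand-in for an enterprise service bus (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> None:
        # Deliver a copy of the same message to every subscriber of the topic,
        # so one subscriber cannot mutate what another one sees.
        for handler in self._subscribers[topic]:
            handler(dict(message))


class OperationalStore:
    """Keeps only the latest state per key, like an OLTP table."""

    def __init__(self) -> None:
        self.current: Dict[object, Message] = {}

    def on_message(self, message: Message) -> None:
        # Assumption: every message has an "order_id" key.
        self.current[message["order_id"]] = message


class HistoricalStore:
    """Appends every raw message, like an insert-only warehouse table."""

    def __init__(self) -> None:
        self.history: List[Message] = []

    def on_message(self, message: Message) -> None:
        self.history.append(message)


bus = MessageBus()
oltp = OperationalStore()
warehouse = HistoricalStore()

# Both systems subscribe to the same stream of transactions.
bus.subscribe("orders", oltp.on_message)
bus.subscribe("orders", warehouse.on_message)

bus.publish("orders", {"order_id": 1, "status": "placed"})
bus.publish("orders", {"order_id": 1, "status": "shipped"})

print(oltp.current[1]["status"])  # latest state only
print(len(warehouse.history))     # every raw message kept
```

the operational side overwrites, the historical side only appends; neither one needs to know the other exists, which is about as "loosely coupled" as it gets.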
this is where i see it going: convergence… yes, though not quite 100%. i wonder if it ever will be, and i don’t think so, because of the difficult nature of integrating external data… but that’s a story for another day.
what do you think is going to happen in 2012? do you have some ideas/thoughts or comments? love to hear from you.
ps: you can learn more about automating your back-office data warehouse at: http://datavaultalliance.com