What’s next after the Data Vault?

I often get this question. People seem to want to know what I’m working on, what I’m thinking about, and what I’m creating. I’m honored that people want to know, and for that, I thank you. So, on with the topic. In this entry I will describe what I see as the future of data modeling or, in other words: what’s next after the Data Vault?

Some of you who know me understand that I’m working on many different things right now. One of them is the ability of a machine to “learn” about structures – how to modify them on the fly, how to “think” about context, and how to correct processes and access points dynamically. By making the machines “smarter” or more self-adaptable, we can move more processes to the back-end hardware. Things like NoSQL databases are making this possible (see Wikipedia for more info).

Anyhow, the world is changing – as always, faster than ever before. For the data modelers out there, or even the architects in the crowd, here are some thought-provoking (game-changing) ideas that I’m working on. I predict that these will come to pass in the next 5 years (or less!). The ways of “old data modeling” are dying, and here’s what’s happening:

  1. Physical data modeling is disappearing, and will be fully “gone” when relational database systems morph into NoSQL-style systems/interfaces. Physical data modeling, and the art of representing storage for tuning, performance, and other controls, are being taken over (as they should be) by mathematics engineers at the hardware level. Hence the rise of NoSQL-style database engines.
  2. Logical modeling will morph away from “structural understanding” into “categorical understanding”, the foundations of which will be “ontology-based work”. I believe it will become 100% ontology- and business-based “categorization and hierarchy”, with rules and processes governing the relationships and data sets as they move through, around, and across ontologies. This lends itself to semantic definition, which is a huge asset to machine learning and data understanding.

My view is that modeling will continue to be very important, but the game is changing. Understanding logical concepts and appropriately assembling blocks of categories for a business operation will become the foundation of where we go and how we govern our systems. Furthermore, this will be done at the business user level, no longer as an IT function, with hardware that adapts automatically by “moving/shifting” the data sets as the paradigm changes.

What??? This coming from a guy who created the Data Vault?
Yep. You heard it here first, folks: eventually the Data Vault concepts will have to morph in order to survive too. Everything, and I mean everything, that has a data touchpoint will morph to handle business user ontologies (metadata) with “drawing-based” 2D/3D interfaces. The business users will interact with their data sets directly. By the way, if you don’t believe me, just look at Microsoft, Oracle, IBM, Teradata, and the NoSQL database engines. They are already changing their BI interaction levels to focus more on metadata.

Microsoft has a really interesting modeling concept on the way that will shift the playing field even more. They currently have demo videos on TechNet that show “image browsing and collection manipulation” through icons and colors. There’s another company, “VisualCue” (http://www.visualcue.com), that is doing the same thing for business users and levels of understanding. And of course, there’s this company I blogged on yesterday: http://www.futurepointsystems.com

And if you still don’t believe things are changing, just look at this article: http://www.wired.com/science/discoveries/magazine/16-07/pb_theory

A word to the wise: learn, adapt, change – and put the power back in the business users’ hands. In other words, if you’re in IT today, study business. Add mathematics or chemistry or physics to your profile, get some background in neural networking, VLDB/extremely large data sets, and artificial intelligence, and you’ll be set for the future.

I think the reasons for the Data Vault model direction are clearly stated here:

“At the petabyte scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. It calls for an entirely different approach, one that requires us to lose the tether of data as something that can be visualized in its totality. It forces us to view data mathematically first and establish a context for it later. For instance, Google conquered the advertising world with nothing more than applied mathematics. It didn’t pretend to know anything about the culture and conventions of advertising — it just assumed that better data, with better analytical tools, would win the day. And Google was right.” http://www.wired.com/science/discoveries/magazine/16-07/pb_theory

But wait! I thought you said “data modeling” is dying… Why then use the Data Vault?

Physical data modeling is dying. Logical data modeling will be around with us for a very long time. If you haven’t noticed, or haven’t bothered to look at the core concepts of the Data Vault model, you will have missed one major theme: the Data Vault model is based on ontological categorization!! It separates the relationships from the term representation and definition. This is exactly what OWL and RDF discuss: the ability to “represent” an ontology of any order of magnitude (a scale-free network) in a model that provides ultimate access by the business users directly to the data set.
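To make the “separation of relationships from terms” idea concrete, here is a minimal Python sketch. It assumes the usual Data Vault building blocks (hubs for business keys, links for relationships, satellites for descriptive attributes) and flattens them into subject/predicate/object tuples in the spirit of RDF. The class names, the `is_a` predicate, and the sample data are all hypothetical and invented for illustration; this is not an actual RDF or OWL implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hub:
    """A business term (concept) identified by its business key."""
    concept: str
    business_key: str

    @property
    def node(self) -> str:
        # Identifier used as subject/object in the triples below.
        return f"{self.concept}/{self.business_key}"


@dataclass(frozen=True)
class Link:
    """A relationship, stored separately from the terms it connects."""
    name: str
    source: Hub
    target: Hub


@dataclass(frozen=True)
class Satellite:
    """A descriptive attribute attached to a hub."""
    parent: Hub
    attribute: str
    value: str


def to_triples(hubs, links, satellites):
    """Flatten the three structures into (subject, predicate, object) tuples."""
    triples = [(h.node, "is_a", h.concept) for h in hubs]
    triples += [(l.source.node, l.name, l.target.node) for l in links]
    triples += [(s.parent.node, s.attribute, s.value) for s in satellites]
    return triples


# Hypothetical sample data, invented for illustration only.
customer = Hub("Customer", "C-100")
product = Hub("Product", "P-7")
triples = to_triples(
    hubs=[customer, product],
    links=[Link("purchased", customer, product)],
    satellites=[Satellite(customer, "name", "Acme Ltd")],
)
```

Because relationships live in their own structure, a new kind of relationship can be added without touching the definitions of the terms it connects, which is the property the paragraph above is pointing at.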

Whoa – too much to handle. You have said for a long time: never allow business users into your Data Vault. What’s changed?

Correct. Today’s Data Vault is still technically modeled, and therefore hard to understand. But put a logical ontology of business terms on top of the DV physical structure, and visualize it… voila! You have an n-dimensional space that can be utilized effectively, directly in the BI layer. Add to that data mining, AI, cluster analysis, etc., and you end up with: http://www.futurepointsystems.com
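As a rough sketch of what “a logical ontology of business terms on top of the DV physical structure” could look like, the Python below keeps a small hierarchy of business terms and a separate mapping from terms to physical tables, then resolves a term to every table beneath it. The term names, table names, and the `resolve` helper are hypothetical, chosen only to illustrate the layering.

```python
# Hypothetical business-term hierarchy: broader term -> narrower terms.
ONTOLOGY = {
    "Party": ["Customer", "Supplier"],
    "Customer": [],
    "Supplier": [],
}

# Hypothetical mapping from business terms to physical Data Vault tables.
PHYSICAL = {
    "Customer": ["hub_customer", "sat_customer_details"],
    "Supplier": ["hub_supplier", "sat_supplier_details"],
}


def resolve(term):
    """Return the physical tables behind a term and all of its narrower terms."""
    tables = list(PHYSICAL.get(term, []))
    for narrower in ONTOLOGY.get(term, []):
        tables.extend(resolve(narrower))
    return tables


party_tables = resolve("Party")
```

The business user only ever names a term such as “Party”; the physical table names stay behind the mapping, so the physical layer can change without the business vocabulary changing.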

The game is afoot, folks. The paradigm shift is happening whether you like it or not, and if you don’t acquire the new skills I’ve discussed, you’ll end up out of a job (just my 2 cents anyhow). Sure, it will take years to get there (like anything else), and no, it won’t suddenly happen overnight. It’s a gradual shift that’s coming, and I like to be on the forefront riding the wave.

If you end up being left behind in this, you’ll start asking: Dude? Where’d you park my data warehouse?

Dan L

PS: If you’ve got ideas or comments, I’d love to hear from you. It’s free to register and comment.
