many of us have been talking for years about capturing raw data and layering on taxonomies and ontologies (metadata defining the raw data) so the information can be spun differently according to different needs. not just for accountability and reliability, but for the ability to process the information how you want it, and when you want it. well, it’s interesting to note that maybe i’m not as crazy as i thought i was… it seems there are a few others out there wanting this to happen too, and guess what? they’re business users!!
i just got done reading this really interesting (yet short) article called big data and boundary spinning. it’s a really interesting article that alludes to a business user/scientist who works with imputed values, fuzzy logic (data mining), and loads of immediately available raw data. there are several things there that struck me as curious:
1) big data, and i mean big… i’ve also been reading other articles lately about google, yahoo, youtube, and so on, holding 2.5 petabytes and more… all of it having to run through analytics at the drop of a hat. and if you think that’s big, just wait – the data is only going to get larger, especially with the move to cloud computing.
2) their mention of role-based bi. now that’s a twist! imagine a world where raw data knows how to integrate into the right place on the back-end; the platform/appliance (whatever you want to call it) understands how to find the right place in the “model”, perform the correct correlation analysis (hot/cold/whatever), and assign new data feeds to new places. now, the bi end-user has a role in the organization. they don’t care, and don’t want to know, where or how the data is stored – just that they have access to it, and to all of it in raw form, so they can manipulate it using their data-driven rules, and map it using their metadata or ontologies…
but the big company used to do it the wrong way because it collected data once and tried to distribute the same set of information across 50 business functions. you really wanted 50 different runs of this so each business function had its own set of information. http://www.information-management.com/blogs/big_data_analytics_nc_state_ibm-10018505-1.html?et=informationmgmt:e1665:1053049a:&st=email&utm_source=editorial&utm_medium=email&utm_campaign=im_daily_081210
but wait, this sounds familiar… yep… i’m not the only one working on these notions or concepts, apparently ibm has a huge interest in this, and the industry currently calls this advanced analytics… what the heck (exactly) does that mean?
don’t believe me? just look at the trends that gartner, forrester and other research firms are reporting: reduce it dependence, increase independence at the business-user level, put more power (not less) in the business users’ hands… this is what’s happening, and i for one think it’s a good thing. many years ago on the b-eye-network blog, i posted an entry that said something like this: the future it specialist will not just be a technologist, but will have to have knowledge in the sciences – life sciences, biology, mathematics, chemistry, etc… they will need to become domain experts to be the leaders.
today, analytics and business intelligence (bi) systems are mostly restricted to the realm of the it department and still ruled by the high priests of this technology, the trained data analysts. although it has a huge array of bi tools at their disposal, bi systems primarily serve the needs of senior management because of the huge cost and lengthy time for development. department managers and other business professionals are eyeing bi as a way of solving some of their departments’ problems, but most software available is still too hard to use, too costly and takes too long to implement. http://www.information-management.com/specialreports/2008_95/10001884-1.html
to me this is a really cool example of mixing analyst/domain-expert skills with what we’re (maybe prematurely) assuming is “big data” processing too fast to pause for reflection or human intervention. http://www.information-management.com/blogs/big_data_analytics_nc_state_ibm-10018505-1.html?et=informationmgmt:e1665:1053049a:&st=email&utm_source=editorial&utm_medium=email&utm_campaign=im_daily_081210
ontologies and taxonomies (metadata) are the key!!!
it’s just the opposite to kouri, who talks about big, viable data stores with rich metadata and role/function based views of data much like we’ve talked about the nirvana of role-based bi for years. http://www.information-management.com/blogs/big_data_analytics_nc_state_ibm-10018505-1.html?et=informationmgmt:e1665:1053049a:&st=email&utm_source=editorial&utm_medium=email&utm_campaign=im_daily_081210
so what does that mean to us data modelers? or even the bi / edw world?
that means we better start understanding business terminology and business processes – how they work, how they flow – and we better start mapping these terms to the data sets we store. we better start building business taxonomies of terms that the business ultimately owns and maintains. we should be looking at the petabyte ranges and at new metadata-driven solutions for bi visualization. we are fast outgrowing (as is the business) the single, two-dimensional spreadsheet.
i recently posted about an interesting piece of software used for unique visualization of unstructured data sets. what do they focus on as their key element for “understanding” information? taxonomies and business terms – along with correlation analysis.
there is real power here. not to take anything away from this discussion,
but how do you start? where do you start?
the where is simple; the how requires a bit of sweat equity. where you should start is with the business processes. you should identify both the critical and non-critical path business processes (manual and automated) that flow the data / information through the business. you should also start with the identification of this data (i.e., the business key). in other words: what does the business use to manage, control, search, and identify its data set? the business key!!! these keys have meaning, and should be clearly defined in a hierarchical taxonomy of terms. the definitions should be business definitions, with functions applied, along with security and classification requirements.
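to make that concrete, here’s a minimal sketch in python of a hierarchical taxonomy of business terms. all the names here (“party”, “customer number”) and fields are hypothetical – the point is just that a business key carries a business definition and a classification requirement, and sits at a path inside the hierarchy:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class BusinessTerm:
    """a node in the business taxonomy: a term the business owns and maintains."""
    name: str                      # e.g. "customer number"
    definition: str                # a business definition, not a technical one
    classification: str            # security/classification requirement
    parent: Optional["BusinessTerm"] = None
    children: list = field(default_factory=list)

    def add_child(self, term: "BusinessTerm") -> "BusinessTerm":
        term.parent = self
        self.children.append(term)
        return term

    def path(self) -> str:
        """the full hierarchical path of this term within the taxonomy."""
        return self.name if self.parent is None else f"{self.parent.path()} > {self.name}"

# hypothetical example: a business key defined within a taxonomy of terms
root = BusinessTerm("party", "any person or organization we do business with", "internal")
customer = root.add_child(
    BusinessTerm("customer", "a party that has purchased from us", "internal"))
cust_no = customer.add_child(BusinessTerm(
    "customer number",
    "the key the business uses to manage, control, search, and identify a customer",
    "confidential"))

print(cust_no.path())  # party > customer > customer number
```

the business owns the names and definitions; the model just gives them a place to live.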
it is from these keys that we begin establishing our bi architecture underneath. next come the associations. it’s natural for a set of business data to change along the business process, and it’s natural along that critical path for that data to be associated with other data (also in the hierarchy). it may be a child, a peer, or a parent within the hierarchy, and there may be several different types of relationships. but from a business sense of understanding, it’s a taxonomy.
the associations – what about the associations? well, these are very, very important for many reasons: 1) they tie the data together with context along the business process path; 2) they change the context at specific points within the business path; 3) they represent a larger view of context (aggregates, loose affiliations, strong affiliations, etc…). you get the idea – these associations are the glue that holds your information together throughout processing in the application world.
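a minimal sketch of those associations in python – the keys, process steps, and relation types below are hypothetical examples, not a prescribed schema. each association ties two business keys together and records the point in the business path where that relationship holds:

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Association:
    """ties two business keys together, with the context of where in the
    business process the relationship was observed."""
    from_key: str          # e.g. a customer number
    to_key: str            # e.g. an order number
    relation: str          # "parent", "child", "peer", aggregate, affiliation, ...
    process_step: str      # the point in the business path where this context applies
    observed_on: date

associations = [
    Association("CUST-1001", "ORD-5001", "parent", "order entry", date(2010, 8, 1)),
    Association("ORD-5001",  "SHP-9001", "parent", "fulfillment", date(2010, 8, 3)),
]

# context changes at specific points in the business path: filter by step
order_entry = [a for a in associations if a.process_step == "order entry"]
print(len(order_entry))  # 1
```

note that the same pair of keys could appear again at a different process step with a different relation – the association, not the key, carries the context.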
finally, comes the context itself. the descriptions of what these associations are, and what these keys mean – but here’s the twist: the context changes over time. so we have to rely on data warehousing notions of storing raw snapshots of this information in order to retrieve it.
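a quick sketch of that data warehousing notion: raw, append-only snapshots of context keyed by date, with a point-in-time lookup. the dates and attributes below are made up purely for illustration:

```python
from bisect import bisect_right
from datetime import date

# raw snapshots of context for one business key, stored as-of a date —
# never overwritten, only appended, so history is always retrievable
snapshots = [
    (date(2010, 1, 1), {"status": "prospect"}),
    (date(2010, 5, 1), {"status": "customer", "tier": "silver"}),
    (date(2010, 8, 1), {"status": "customer", "tier": "gold"}),
]

def context_as_of(when: date) -> dict:
    """retrieve the context that was true at a given point in time."""
    dates = [d for d, _ in snapshots]
    i = bisect_right(dates, when)       # latest snapshot at or before `when`
    return snapshots[i - 1][1] if i else {}

print(context_as_of(date(2010, 6, 15)))  # {'status': 'customer', 'tier': 'silver'}
```

because the context changes over time, asking “what was true in june?” has to hit the snapshot that was current then – not the latest row.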
this point became clear to me recently when i visited my new wireless carrier’s local retail outlet. i was about to take a trip abroad and needed to upgrade to a new handset that would work overseas as well as at home. as the salesperson handling the changeover entered the data for my new account, i noticed two small rows of dots at the bottom of the screen labeled “churn” and “revenue,” and i immediately realized that this was a perfect example of the value of analytics embedded into the middle of a business workflow. http://www.information-management.com/specialreports/2008_95/10001884-1.html
but wait a minute… how does this all tie in to taxonomies and big data and role-driven bi?
quite simply, my dear watson… (sherlock holmes)…. role-driven bi is defined by an acl (access control list), where the individual who logs in is assigned a role. that role comes with a set of accessibility restrictions (what the user can/can’t do, can/can’t see). aggregate context (or better yet, the notion of an idea rather than a single thought) can only be identified by tying many contexts together at a point in time. which means that each thought is uniquely identifiable, and sits within a hierarchical relationship (a taxonomy of terms).
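a tiny sketch of that acl idea in python – the roles, taxonomy branches, and rows here are all hypothetical; the point is just that the role, not the individual user, decides which slice of the raw data is visible:

```python
# hypothetical acl: each role maps to the taxonomy branches it may see
acl = {
    "sales":   {"customer", "order"},
    "finance": {"customer", "order", "revenue"},
}

# raw data rows, each tagged with the taxonomy branch it belongs to
rows = [
    {"branch": "customer", "key": "CUST-1001"},
    {"branch": "order",    "key": "ORD-5001"},
    {"branch": "revenue",  "key": "REV-2010-08"},
]

def visible_to(role: str):
    """apply the role's accessibility restrictions to the raw data set."""
    allowed = acl.get(role, set())
    return [r for r in rows if r["branch"] in allowed]

print(len(visible_to("sales")))    # 2
print(len(visible_to("finance")))  # 3
```

an unknown role sees nothing, which is the safe default for an acl.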
instead of analytic tools, role-based bi harnesses the power of analytics to deliver applications anyone can use without any specialized knowledge or training. by focusing on specific roles, these bi applications can anticipate exactly what data the user needs and how to package it to make it most valuable and easy to consume. when combined with the web-enabled technology of on-demand delivery (software as a service or saas), role-based bi applications can begin to deliver value in a day instead of months or years. http://www.information-management.com/specialreports/2008_95/10001884-1.html
furthermore, finding and locating that data requires an entry point into this taxonomy, to make searching and locating efficient. finally, we add to the mix the notion of “doing a little work”: turning the aggregate context into information, and spinning the interpretation (quite possibly through a different taxonomy than the one that holds the data). and big data? well, mining information, performing advanced analytics, and garnering insights for operational, strategic, and tactical decisions all require loads of data. big data is on the rise for a number of reasons (the one just mentioned included).
so how do we put it all together?
well, for starters, get yourself a good working knowledge of a data model architecture that manages its data through ontologies and taxonomies (potentially the data vault, maybe something else?). second, remember to focus on the business processes, their relationships, and the information passing through them. third, take heart – study the business, become a domain expert, and understand the problems and issues faced by business users. fourth, remember the goal: put the analytical power of the data in the users’ hands.
…role-based bi anticipates the information needs of the user by leveraging domain-awareness. clearly, business users don’t want data exploration tools; what they need are information solutions.
the critical point underlying all of this is the way role-based bi essentially turns traditional bi on its head. with role-based bi, the right information is automatically pushed out to the right people at the right time, based on the work they are doing. it’s more than simple notification, it’s an awareness of the domain packaged with important context when they need it. http://www.information-management.com/specialreports/2008_95/10001884-1.html?pg=2
remember that dealing with big data can be a big hassle, and dealing with big data very fast can cause even more headaches. but one of the best solutions you can have is to: a) store the data in a raw format, b) build a taxonomy/ontology that represents the business, c) identify the data and relationships through business keys, d) establish context for all given points in time, e) mine the information for correlation analysis, and f) use a different ontology to get the data out than you used to put the data in.
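that last point, f), is worth a sketch: because everything hangs off business keys, data can be loaded under one taxonomy and read back through another. a toy python illustration – all the taxonomy names and keys here are hypothetical:

```python
# the data went in under one taxonomy (how the source systems classified the keys)...
ingest_taxonomy = {
    "CUST-1001": "crm > account",
    "ORD-5001":  "erp > sales order",
}

# ...and a consumer can pull it out through another, mapped onto the same keys
finance_taxonomy = {
    "crm > account":     "receivables > counterparty",
    "erp > sales order": "receivables > invoice source",
}

def spin(key: str) -> str:
    """reinterpret a business key through the consuming ontology."""
    return finance_taxonomy[ingest_taxonomy[key]]

print(spin("ORD-5001"))  # receivables > invoice source
```

the raw data never moves or changes – only the lens it is viewed through.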
so maybe when i said: check out the data vault, because i’ve been through this in the 1990s (for the us government, no less), i wasn’t so crazy after all…
do you have a different opinion? i want to hear from you. do you have a similar experience? tell me your story, add a comment!