Integrating Systems
We all understand that disparate systems bring numerous issues, and I think it’s common knowledge that consolidating those systems onto a single platform (hardware/software/database) can save money, save time, and make things a little less complicated in the end. At least, that’s the dream. We have also seen the rise of Cloud Computing, which offers a partial solution to some of these integration issues but doesn’t solve them all.
You see, getting from disparate systems to that single integrated system is the really difficult part. Both business and IT face all kinds of issues in the struggle to consolidate. I’ve spent years building business intelligence systems in these environments, and I don’t think I need to tell you that it’s a bumpy road filled with deep potholes, sharp turns, and pitfalls.
The Issues:
Some of the issues I’ve run into over the years center on the definition, or understanding, of the data. Other issues include:
- Functions applied to translate the data
- Interpretation of the data
- “MASTER” determination of the data
- Best storage and architecture of the data
- Visualization of the data
- Accountability and auditability of the data
- Traceability of the data
- Overloading (multi-use of single columns and record types) of data
- Historical data with lost definitions
And on and on and on. This doesn’t even begin to account for the process problems faced when it comes time to implement the chosen design or data architecture for integration. The processing problems can include the following (but certainly are not limited to these):
- Data too big
- No change data capture/no audit trail
- Bad indexes
- No control over source feeds, source timing
- Multi-system valuation dependencies
- Missing data
- Changed passwords
- Misaligned access rights
- Overflowing data
- Out-of-range data
- Bad domain data (e.g., a date field that contains a string)
And more… All of these problems contribute to the long hours and hard work IT puts in trying to get the “consolidated BI system” online for you, the business user. I’ve been through these issues and lived to talk about successful implementations. It all starts with the data architecture used in the consolidated system: IF THE ARCHITECTURE/DATA MODEL DESIGN IS CORRECT, then most of these problems are alleviated. We can introduce parallelism and partitioning into the implementation, we can introduce high-speed data integration, and, to a certain degree, we no longer have to worry about the “availability” of the data.
So how do we get there? I mean, to the “RIGHT” data architecture?
That’s where the Data Vault comes in. It is a free and open-source design technique that has been honed to solve these problems at the root. It starts by focusing on the business keys, right where the data meets the business process. By focusing on the business keys, we can generally reach a definitional or semantic agreement among high-level business users; and semantics are important because…
Everything implemented on a computer has semantics that is meaningful to the implementers and the users. Problems arise when different systems and the data they process have different semantics. More problems arise when the users and implementers make different assumptions about the semantics. Similar problems arise as systems evolve over time with updates, revisions, extensions, and connections to independently developed systems. Computer systems have been successfully interacting across long distance networks for over forty years. But a tight integration, even of local systems, is hard to achieve because of the difficulty of ensuring that all components will interpret the same data in the same way. Further complications are created by different notations and conventions in databases, knowledge bases, the Semantic Web, folksonomies, and legacy systems that have no explicit semantic notations. Finally, even the most precisely defined and integrated semantic systems must interact with people who have little or no knowledge of the precise definitions and little time, desire, or ability to learn them.
Source: http://semtech2010.semanticuniverse.com/sessionPop.cfm?confid=42&proposalid=3053
Now it goes deeper than that… One of the basic tenets of the Data Vault Model is to store a passively integrated view of the business, without changing the data on the way in. In layman’s terms, this basically means: integrate by business key where the semantic definitions match; don’t integrate where they don’t match. It also means that we do not process the data on the way in (except perhaps for field data type/domain alignment). This makes the Data Vault Model a great foundation for systems integration.
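To make passive integration by business key a little more concrete, here is a minimal Python sketch. It assumes a simplified hub (one entry per unique business key) and a simplified satellite (raw descriptive attributes with history, tagged by record source). The names `Hub`, `Satellite`, `align_key`, and `load_feed`, and the CRM/ERP sample feeds, are hypothetical; in a real warehouse these would be database tables loaded by ETL/ELT code. The only change made on the way in is data type alignment of the key; no business rules touch the attributes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def align_key(raw_key) -> str:
    """Data type/domain alignment only: cast the key to text and trim whitespace."""
    return str(raw_key).strip()


@dataclass
class Hub:
    """Hypothetical hub: one entry per unique business key."""
    name: str
    keys: dict = field(default_factory=dict)  # business_key -> (load_dts, record_source)

    def load(self, business_key: str, record_source: str, load_dts: datetime) -> None:
        # The first feed to deliver a key registers it; later feeds carrying
        # the same key integrate onto that entry instead of duplicating it.
        self.keys.setdefault(business_key, (load_dts, record_source))


@dataclass
class Satellite:
    """Hypothetical satellite: raw descriptive attributes with full history."""
    name: str
    rows: list = field(default_factory=list)  # (business_key, load_dts, record_source, attrs)

    def load(self, business_key: str, record_source: str,
             load_dts: datetime, attrs: dict) -> None:
        # Find the most recent row for this key from this source.
        latest = next((r for r in reversed(self.rows)
                       if r[0] == business_key and r[2] == record_source), None)
        # Insert-only: append when the attributes changed; never update or delete.
        if latest is None or latest[3] != attrs:
            self.rows.append((business_key, load_dts, record_source, dict(attrs)))


def load_feed(hub: Hub, sat: Satellite, record_source: str,
              load_dts: datetime, records: list) -> None:
    """Load one source feed as delivered; no business rules are applied."""
    for raw_key, attrs in records:
        key = align_key(raw_key)
        hub.load(key, record_source, load_dts)
        sat.load(key, record_source, load_dts, attrs)


hub_customer = Hub("hub_customer")
sat_customer = Satellite("sat_customer")
t1 = datetime(2010, 6, 1, tzinfo=timezone.utc)
t2 = datetime(2010, 6, 2, tzinfo=timezone.utc)

# Two feeds whose customer numbers were judged to share one semantic definition.
load_feed(hub_customer, sat_customer, "CRM", t1, [(1001, {"name": "ACME Corp"})])
load_feed(hub_customer, sat_customer, "ERP", t1, [(" 1001 ", {"name": "Acme Corporation"})])
# Re-delivery of unchanged CRM data adds no new satellite row.
load_feed(hub_customer, sat_customer, "CRM", t2, [(1001, {"name": "ACME Corp"})])

print(len(hub_customer.keys), len(sat_customer.rows))  # 1 2
```

Both feeds land on the same hub entry because their customer numbers were judged to mean the same thing; the differing names are kept side by side, as delivered, for the business to interpret later.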
But wait, what about the PROCESSING of the data?
There are, however, more problems that we encounter at the implementation level, especially when we have to deal with history, legacy systems, timing, arrival, availability, networking, etc. Many of the “technical” issues, such as connectivity and access, can be solved with a good ETL or ELT platform/tool. However, there are still other issues that a tool alone can’t solve! For instance: parallelism, partitioning, and the scheduling of parallel feeds. The data model drives the ability to implement parallel and partitioned loading processes!
The Data Vault Model is engineered to take this into account, so the BI world/overall systems architecture can leverage parallelism and partitioning with near-linear scalability. That basically means you can finally use all the hardware resources you’ve been given, and even max them out if you really want to. With Data Vault Modeling, you have the freedom to schedule specific parts of the load as soon as the data is available, with maximum parallelism and maximum partitioning control.
This means that whether you’re on a massively parallel processing (MPP) platform, a Big-Iron SMP, or in the cloud, you can take advantage of the resources that are there to perform the data integration tasks as fast as possible.
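The scheduling freedom described above can be sketched in Python as well. The sketch assumes hash partitioning by business key (so no two partitions contend for the same key) and a simple two-stage schedule: hubs do not depend on one another, so every hub partition loads at once, and satellites load next because each depends only on its own hub. `load_partition` is a hypothetical stand-in for an INSERT into one table partition, and threads are used on the assumption that a real loader spends most of its time waiting on the database.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition_by_key(records, n_partitions):
    """Hash-partition (business_key, attrs) records so each partition owns disjoint keys."""
    parts = [[] for _ in range(n_partitions)]
    for key, attrs in records:
        # crc32 gives a hash that is stable across runs, unlike the built-in str hash().
        parts[zlib.crc32(key.encode("utf-8")) % n_partitions].append((key, attrs))
    return parts


def load_partition(target, part):
    """Hypothetical loader: stands in for inserting one partition into one table."""
    return target, len(part)


def run_stage(jobs, max_workers=8):
    """Run every (function, args) job in a stage concurrently; return when all finish."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fn, *args) for fn, args in jobs]
        return [f.result() for f in futures]


customers = [(f"C{i:04d}", {"name": f"Customer {i}"}) for i in range(1000)]
products = [(f"P{i:03d}", {"sku": i}) for i in range(200)]
N = 4  # partitions per table

# Stage 1: hubs are independent of each other, so all hub partitions load in parallel.
stage1 = [(load_partition, ("hub_customer", p)) for p in partition_by_key(customers, N)]
stage1 += [(load_partition, ("hub_product", p)) for p in partition_by_key(products, N)]

# Stage 2: each satellite depends only on its own hub, so satellites load in parallel next.
stage2 = [(load_partition, ("sat_customer", p)) for p in partition_by_key(customers, N)]
stage2 += [(load_partition, ("sat_product", p)) for p in partition_by_key(products, N)]

results = run_stage(stage1)   # stage 1 finishes completely...
results += run_stage(stage2)  # ...before stage 2 starts

totals = Counter()
for target, n_rows in results:
    totals[target] += n_rows
print(dict(totals))
# {'hub_customer': 1000, 'hub_product': 200, 'sat_customer': 1000, 'sat_product': 200}
```

A finer-grained scheduler could start each satellite as soon as its own hub finishes, or as soon as its source feed arrives, instead of waiting for the whole first stage.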
But what about the semantic rules and understanding? You said to load raw data into the data warehouse…
Yes, indeed I did. What’s happening there is that we are shifting the power back into the hands of the business users. By putting raw but integrated data into the Data Vault, and by providing an advanced analytic engine such as an in-database data mining tool, we empower business users to interpret the data with the rules they see fit. Of course, this process should be governed: a general consensus should be reached on how the business operates, or at the very least on how the local business unit operates. But what we are saying here is: give the business rules back to the business users, and let them determine how to interpret the data and what’s best for their business.
This means moving the business rules downstream of the consolidated business intelligence system (EDW), which shifts part (not all) of the burden downstream. When business users wish to investigate the data (which should be an iterative process anyhow, not something tied to a rigidly formatted report), they have a question in mind. Then let them parse, interpret, and reduce the data as needed.
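Here is a minimal Python sketch of business rules living downstream, assuming the raw order rows below (hypothetical data) sit unchanged in the warehouse. Two business units apply different, hypothetical revenue rules at read time to the same raw rows, each parsing, interpreting, and reducing the data to answer its own question.

```python
from decimal import Decimal

# Hypothetical raw order rows, stored exactly as the source delivered them.
RAW_ORDERS = [
    {"order_id": "O1", "status": "BOOKED", "amount": "120.00"},
    {"order_id": "O2", "status": "SHIPPED", "amount": "80.50"},
    {"order_id": "O3", "status": "CANCELLED", "amount": "45.00"},
    {"order_id": "O4", "status": "SHIPPED", "amount": ""},  # missing amount kept as delivered
]


def finance_rule(row):
    """Hypothetical rule: finance recognizes revenue only once an order ships."""
    return row["status"] == "SHIPPED"


def sales_rule(row):
    """Hypothetical rule: sales counts both booked and shipped orders."""
    return row["status"] in {"BOOKED", "SHIPPED"}


def revenue(rows, counts_as_revenue):
    """Parse, interpret, and reduce raw rows under one business unit's rule."""
    total = Decimal("0")
    for row in rows:
        # Missing amounts are handled here, at read time, not on the way in.
        if counts_as_revenue(row) and row["amount"]:
            total += Decimal(row["amount"])
    return total


print(revenue(RAW_ORDERS, finance_rule))  # 80.50
print(revenue(RAW_ORDERS, sales_rule))    # 200.50
```

Neither rule changes the stored data, so a third interpretation can be added later without reloading anything.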
So what are you saying?
In reality, systems integration CAN be done on a shoestring budget, and CAN be done with limited resources. It CAN be done with big data, real-time, and unstructured data sets all integrating to provide useful answers to business questions; you just need to know and understand the right architecture for the job.
I’ve been in technology for 25+ years (since at least 1982). I’ve worked as a consultant for most of my life, building big systems for companies like Nike, US West, Xcel Energy, Expedia, and the Department of Defense. I’ve seen a lot of issues crop up and been able to solve many of them. Let me help you navigate around the potential issues you’re facing through my unique one-on-one coaching area. Let me give you the tools and resources you need to succeed in your systems and data consolidation effort.
But you say: how can one person do this? I thought I needed 20 or 30 consultants or an outsourced team… Nope. I recently competed on a fixed bid against a large contracting firm. They bid 90 days, $250k, and I don’t know how many people… maybe 5? I bid 2 weeks, 2 people, and $30k. We got the job, put the Data Vault in from start to finish, and generated the ETL loading code as Oracle stored procedures; in two weeks we had a financial data warehouse up and running successfully. This is a real-life case. The customer told me: even if you fail, I still have the time and money to go back to consulting firm X and hire them.
Needless to say, they didn’t need to do that. In my coaching area, I’m offering you a chance to learn and use the same tools and techniques I use to accomplish tasks like this in a very short period of time, with fewer resources and at lower cost. At least go check it out…
