
#bigdata #nosql #dv2 and Managed Self Service BI

Maybe you’ve heard the term “Self-Service BI”?  Maybe not.  It’s a blind term in my opinion, and its simplistic title leaves out quite a bit.  It truly needs to become “Managed Self Service BI”.  Why?  Because I.T. (like it or not) is still in the picture, but more than that, the business needs (and demands) a governed or managed process (just like I.T.) in order to affect data and processes in a production environment.

Well, Well – look how far we’ve come!  NOT.

Does anyone remember EII (Enterprise Information Integration)?  It’s been around a LONG, LONG time.  Don’t believe me?  Just check my blog entries on

And my personal favorite:

What’s even funnier is to read the comments from 2005, when I wrote those entries.  Many of the comments predicted that EII would be a huge success and would supplant the Data Warehouse, replace the need for Integration, Data Architecture and Data Modeling, and so on.

What’s the point?

The point is: EII WAS on-demand federated query.  The ONLY engine still around today is Composite Software, and they are doing a great job of improving the premise and the platform.  The point is: we went through all of this “self-service BI” without management, without governance, without data modeling, and supposedly without IT…  We’ve been here before!

The POINT is: Self-Service BI as a notion / concept is NOT NEW.  Business users have been trying desperately to return to this state ever since DSS (Decision Support Systems) and Reporting Systems were split from the operational applications.

The POINT is: EII, Federated Query, On-Demand Query, and Self-Service BI are a Pandora’s box without good governance, good models, good management and IT participation.


The proper term should be: MANAGED Self Service BI.

Ok, read the Wikipedia definition of EII.

Enough of the Rant, Back to the topic…

There are a number of tools in the #nosql space that do some REALLY COOL THINGS, things that EII had hoped to achieve, but either never got there, or faded out of existence before it could truly mature (again, my hat is off to Composite Software for sticking around and making a go of it).

Today, the terms I find include:

  • Data Munging (or Data Mugging as my good friend Doug Needham would call it)
  • Data Preparation
  • Data Refinery (As per Claudia Imhoff) – although this one takes on new meaning
  • Data Exploration
  • Data Blending
  • Data Wrangling (Yee-Haw, get your Data Cowboys/Cowgirls ready)

There are a number of GOOD things, and a host of serious innovations, that have come from the tools in these spaces – some really cool stuff… but does really cool stuff make for good governance?  How about good data warehousing?  No?  Well, hmmm – maybe some more thought is needed in these inventions.

I’m reviewing and evaluating a list of tools in this space; my personal favorite thus far is Datameer.  You should check it out.  It has governance, change management, lineage, check-in/out, SQL generation, Map Reduce code generation, and a whole lot more, including neural net analysis visualization.  It’s simple and easy to use (thus far).  Other tools in this space that I’m looking at include:

Now before you get on your high horse (Data Wrangler Horse) and begin asking me why I didn’t include QlikView, Tableau, Cognos, SAS, etc., my reasoning is that I have some basic criteria that these tools must meet: a) must generate Map & Reduce code, b) must generate SQL code, c) must work directly (not through a 3rd party) with Hadoop based systems, d) must have visualizations built in, e) must allow the users to construct process workflows / business logic, f) must be able to WRITE results back to Hadoop, and a few others that I won’t bore you with.  Most of the other tools are considered “Pure-play BI tools”, or work through partners to get these elements working.  Hence they are not included in my list.
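If you want to run your own candidates through the same kind of checklist, here is a minimal sketch in Python.  To be clear about what is assumed: the capability names are my shorthand for criteria a) through f) above, the two tools are hypothetical placeholders (not real products or real evaluation results), and the "few others" I didn’t list are not covered.

```python
from dataclasses import dataclass

# Shorthand labels for criteria a) through f); the names are illustrative only.
REQUIRED = frozenset({
    "generates_mapreduce",      # a) generates Map & Reduce code
    "generates_sql",            # b) generates SQL code
    "direct_hadoop_access",     # c) works directly with Hadoop, no 3rd party
    "built_in_visualizations",  # d) visualizations built in
    "workflow_builder",         # e) users can build workflows / business logic
    "writes_back_to_hadoop",    # f) can WRITE results back to Hadoop
})


@dataclass(frozen=True)
class Tool:
    name: str
    capabilities: frozenset


def missing_criteria(tool):
    """Return the required criteria this tool does not meet, sorted by name."""
    return sorted(REQUIRED - tool.capabilities)


def shortlist(tools):
    """Keep only the names of tools that meet every required criterion."""
    return [t.name for t in tools if not missing_criteria(t)]


# Hypothetical candidates, made up purely for illustration.
candidates = [
    Tool("HypotheticalToolA", REQUIRED),
    Tool("HypotheticalToolB",
         frozenset({"generates_sql", "built_in_visualizations"})),
]

print(shortlist(candidates))
print(missing_criteria(candidates[1]))
```

Treating the criteria as a set makes the "must meet ALL of them" rule explicit: a tool only makes the list when nothing is left over after subtracting its capabilities from the required set.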

There is even Google Refine, which has a very cool set of features for Refining Data, but is lacking some of the more sophisticated interfaces, graphs, and visualizations that Datameer has.  If you have experience in any of the above tools, I’d love to hear from you in the comments below.  Please tell me the BEST / TOP 3 features you LOVE about the tool, and the bottom 3 features you dislike about your tool.  If you’re up to it, let me know the top 3 features you WISH your tool had.

I thought this entry was about Managed Self Service BI?

It is, so let’s get back to it.  In Managed SSBI, good governance, solid data modeling, and information architecture are key components of success.  Applicability increases when good business use cases have been defined.  Keep in mind that just because these tools can access / run / generate analytics IN a Hadoop platform doesn’t mean you should!   Some of these analytical inquiries should remain as AD-HOC questions sent in SQL form to a scalable Database engine capable of returning sub-second response times.

It is time to bring this entry to an end.  I will write more as I progress in my investigations of these tool sets.  However, I will leave you with these thoughts:

  1. Don’t forget to comment, I want to hear from you
  2. Managed Self-Service BI is the proper term
  3. I.T. must still be involved
  4. Good Governance is a must
  5. Good Architecture (system, data, information) is a must
  6. Relational Databases still hold REAL VALUE
  7. Good Data Models help structure data in an understandable manner
  8. Business Use Cases *must* be defined (otherwise, how will the business know if they’ve found what they’re looking for?  Especially if they haven’t defined what they are looking for)
  9. Lots of tooling doing different things
  10. We’ve been down this road before!  With EII – perhaps these NEW tool vendors could learn a thing or two about researching the problems and issues we had with the EII phase of the market.

Of course, “what’s old is new again…” uggh, I seriously dislike this statement.

What do you think?

Dan Linstedt
