(Editor's note: Mani Keeran will speak on the topic of intelligent graph analytics at the upcoming MDM Summit in San Francisco. Joining him for the session “Mastering ‘Parties/Counterparties’ Using Intelligent Graph Technologies” will be Franklin Templeton Investments information architects Kim Gi and Sharma Preeti. In this column, Keeran gives an advance look at their presentation.)

With the explosion of new data management technologies such as NoSQL databases and the higher demand for self-service analytics, there are multiple innovative paths opening up to solve data analytics challenges.

Source systems - especially legacy ones - serve highly optimized business processes and are closest to business users and customers. They produce a mix of master and transaction data. Mastering the data from multiple systems has been in practice for many years with varying degrees of success.

The typical pattern for analytics is to master a specific subject area and then send the data as dimensions to a data warehouse system, where it is combined with transactions for analysis. Mastering data and sending it to meet the transactions in a data warehouse may be the traditionally accepted way, but this multi-step data movement is expensive to develop and maintain.

There are a number of expensive tasks involved in this method, from analyzing the business requirements to designing the data model to developing and testing ETL code for the MDM and DW systems. Is there another way to do this faster and cheaper, without designing a data model and writing ETL code?

When my team thought about experimenting with some new ideas, we realized that we needed to look beyond traditional relational database and ETL technologies.

Here is how our experiment was set up. The objective was to build, faster and more cheaply, an interactive ad hoc analysis system sourcing master and transaction data from multiple systems and multiple domains. No constraints were placed on the specific technology for processing or storing the data.

The underlying issue in building these systems can be abstracted as the ‘(dis)connected data problem’. We realized we needed a technology and a data store that would ease this connection problem, and decided to go with RDF Graph database technology.

Just like our mind sees the world as connected entities (rather than as rows and columns in a relational database), the RDF Graph DB model sees data as a connected graph. In addition to keeping the relationships for every data attribute, this technology also provides a powerful ‘inferencing’ feature to discover relationships that are not defined explicitly. Once discovered, these new relationships can be stored back in the database for more powerful interactive analysis. RDF Graph database technology is already used for Linked Open Data datasets such as DBpedia.

In an RDF data store, we can pre-define the schema models - called ontologies - as well as load new datasets as they come in. So, instead of spending an enormous amount of time creating a data model, we started with a standard - the Financial Industry Business Ontology (FIBO) model - and decided to extend it as we encountered new sets of data.

The expense of writing custom mastering code was avoided through the use of RDF Graph DB features. We could load multiple datasets into the RDF Graph DB as they are maintained in the source systems, without creating special extract files. The connections happen in the database at the attribute level, between multiple domains as well as with transaction data.

The major mindset change required is not to process master and transaction data separately and then build a dimensional model, but to build an integrated RDF Graph DB where they co-exist and are fully connected through a single set of processes. The result was a powerful graph analytics system that supports complex interactive analysis with graphic visualization.
