What if there wasn’t a single source of truth in corporate data after all? Many enterprises need to adapt to a ‘federated’ business reality in which many different sources of data exist and there is a growing need to collaborate externally around data.
Dealing with this situation, and ultimately thriving on it, requires the smart use of the next generation of Meta-Data Management, Master Data Management (MDM), virtualization and Business Process Management tools. They can deliver federated access to information, which may soon be spread across multiple 'Data Lakes'. Governance in this world is markedly different from traditional data-warehouse-centric approaches: it needs to be much more agile and more business focused, but it also needs to balance regulatory considerations against the need for free business collaboration.
As Capgemini’s research with MIT Sloan, the Digital Advantage, found, companies that become digital outperform their peers. Most tellingly, it was the companies that took the conservative route to digitization that followed the most managed path towards becoming a Digital Enterprise. The challenge for any organization looking to become digital is to leverage all of its data and to enable the business to combine it. The ‘Fashionista’ approach is to look opportunistically towards technology silos for point solutions. The ‘Conservative’ approach is to look towards governance and a consistent way for the whole business to combine information according to individual needs.
The modern reality of technical evolution is that the conservative approach is the one that embraces changes in delivery and operations more aggressively and focuses less on individual technologies.
This view on information underpins the Business Data Lake, which Capgemini co-innovated with Pivotal in 2013 and which has since been adopted by both Informatica and EMC. Data Apart Together is not simply about how you combine data; it’s about recognizing that information is often apart for a reason. It may be partner data that the partner does not want to share in raw form, it may be personal information that has to be kept in a specific geography, or it may be separate because of an acquisition, where the sheer volume of information makes it unreasonable to coalesce into a single environment. Data Apart Together is therefore about how you enable different business units to derive insight across all the available information and not simply that which is directly available. This is where governance, in particular Business Meta-Data, MDM and Reference Data Management (RDM), delivers huge benefits. The role of governance here is not to constrain the business by forcing a single view, but instead to concentrate on how a business can collaborate around information. This view on governance is essential when thinking about how business users actually leverage information.
For many years the prevailing approach has been to focus on the data schema in order to create a single and consistent view for every part of a company. The problem is that this doesn’t reflect how people actually use information in their jobs. Nor does centralization in a single schema represent the reality of modern information challenges: centralization may be desirable, but it’s simply not always possible. Stakeholders look to create personal views that reflect the individual challenges that they, and their teams, face. Thus the marketing lead for an airline looks towards customers as the center of their view, while the maintenance department looks for aircraft information. Enabling analytics against these disparate data sources is about letting stakeholders create the right insight for their problems or, to put it another way, insight at the point of action.
Governance needs to reduce its focus on schemas and data quality and concentrate instead on how data sets can be combined and, therefore, on the identifiers that can be used to link those data sets consistently. Data quality becomes a side effect of governance rather than the goal. This approach is essential when looking at Big Data solutions.
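The idea of governing the linking identifiers rather than a shared schema can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how the Business Data Lake is implemented: the two data sets, the `xref` cross-reference table, the identifier formats and the `local_view` function are all hypothetical names invented for the example.

```python
# Two data sets that stay apart, each with its own schema and local identifiers.
# All records and identifiers below are made up for illustration.
crm_customers = {
    "CRM-001": {"name": "A. Jansen", "segment": "frequent flyer"},
    "CRM-002": {"name": "B. Smith", "segment": "occasional"},
}
loyalty_members = {
    "LOY-9001": {"points": 12000},
    "LOY-9002": {"points": 300},
    "LOY-9003": {"points": 50},  # not yet cross-referenced
}

# The governed asset: (source, local identifier) -> shared master identifier.
xref = {
    ("crm", "CRM-001"): "CUST-1",
    ("loyalty", "LOY-9001"): "CUST-1",
    ("crm", "CRM-002"): "CUST-2",
    ("loyalty", "LOY-9002"): "CUST-2",
}


def local_view(sources, xref):
    """Combine records per master identifier without merging the schemas.

    Returns the combined view plus the records that could not be linked,
    which indicate where more governance may be required.
    """
    view = {}
    unlinked = []
    for source_name, records in sources.items():
        for local_id, record in records.items():
            master_id = xref.get((source_name, local_id))
            if master_id is None:
                unlinked.append((source_name, local_id))
                continue
            # Each source keeps its own fields under its own name.
            view.setdefault(master_id, {})[source_name] = record
    return view, unlinked


view, unlinked = local_view(
    {"crm": crm_customers, "loyalty": loyalty_members}, xref
)
print(view["CUST-1"])
print(unlinked)
```

In this sketch neither source is reshaped to fit the other; only the cross-reference is maintained centrally, and the list of unlinked records shows where the linking effort still falls short.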
It’s ridiculous to think you can possibly create a single schema that includes all of the internal and external data that a company uses. Information from Facebook and other social media feeds is ever changing, information available from open-government sources is continually being added to, and unstructured information like email and attachments defies any sort of traditional approach. Twitter and other social media sites are already forcing firms to submit analytics to them rather than sharing their information openly.
Simply put, a traditional strategy of "one big database, one big schema" is irrelevant in the modern world.
In the Business Data Lake we’ve concentrated on governance from a business perspective, not from a technical IT schema perspective. This approach focuses on enabling collaboration and allowing the business to combine the various data sets within their various Lakes to create their own local views, and from there to see where more governance and data quality are required, rather than creating a central plan that turns out to be wrong.
This focus on identification and cross-referencing means that both transactional systems and post-transactional analytics can leverage the full range of organizational information in a managed approach that aligns with the business model and value. It thus delivers on the promise of digitization and delivers benefits earlier than purely technology-centric approaches.
'Data Apart Together' is a key trend for businesses, and a key signal for IT of how the market has changed. It's about creating the platform that helps the business bring fragmented data together for its local purposes, not about IT trying to impose a single view on information that constrains the agility of the business.
(About the author: Ron Tolido is an analyst with Capgemini)