(Continued from part 1.)

Where the boundaries of insurance enterprise data are clearly demarcated, data consumers can get a good sense of what is pertinent and worthy of analysis. In the world of big data, this comfort zone no longer exists. For starters, the sheer number of data elements cannot be managed by traditional means, and assigning enterprise value to every big data element is a nearly impossible task.

Consider telematics devices that interface directly with onboard vehicle control systems and can record a tremendous number of variables from temperature, pressure, velocity and volumetric sensors. It is therefore not uncommon to find hundreds of variables in these "raw" data sources, captured at a very fine level of granularity. Determining what is pertinent requires careful aggregation of the data without obscuring any genuinely valuable relationships hidden in it.
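To make "careful aggregation" concrete, here is a minimal sketch that collapses second-by-second readings into one feature row per trip. The record layout (`trip_id`, `t`, `speed_kmh`, `engine_temp_c`), the function `summarize_trips` and the hard-braking threshold are all hypothetical, chosen only for illustration. The design point is that the summary keeps extremes and event counts alongside averages, because a trip's mean speed alone would hide exactly the kind of signal (a sudden hard stop) that may matter most for risk.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Reading:
    # Hypothetical raw telematics record; real devices expose many more channels.
    trip_id: str
    t: float              # seconds since trip start
    speed_kmh: float
    engine_temp_c: float


def summarize_trips(readings, hard_brake_kmh_per_s=10.0):
    """Collapse raw readings into one summary row per trip.

    Keeps maxima and event counts, not just means, so that rare but
    important behavior survives the aggregation.
    """
    by_trip = defaultdict(list)
    for r in readings:
        by_trip[r.trip_id].append(r)

    summaries = {}
    for trip_id, rows in by_trip.items():
        rows.sort(key=lambda r: r.t)
        # Count decelerations steeper than the (assumed) hard-braking threshold.
        hard_brakes = 0
        for prev, cur in zip(rows, rows[1:]):
            dt = cur.t - prev.t
            if dt > 0 and (prev.speed_kmh - cur.speed_kmh) / dt >= hard_brake_kmh_per_s:
                hard_brakes += 1
        summaries[trip_id] = {
            "n_readings": len(rows),
            "mean_speed_kmh": mean(r.speed_kmh for r in rows),
            "max_speed_kmh": max(r.speed_kmh for r in rows),
            "max_engine_temp_c": max(r.engine_temp_c for r in rows),
            "hard_brakes": hard_brakes,
        }
    return summaries


# Usage with made-up readings: trip "A" contains one sharp 52 -> 30 km/h drop.
readings = [
    Reading("A", 0, 50, 90), Reading("A", 1, 52, 91),
    Reading("A", 2, 30, 91), Reading("A", 3, 31, 92),
    Reading("B", 0, 80, 88), Reading("B", 1, 82, 89),
]
summary = summarize_trips(readings)
```

Trip "A" averages a modest 40.75 km/h, yet its summary still records one hard-braking event; an average-only aggregation would have discarded it.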

Another question is longevity: how long does the enterprise need to retain the data? With traditional insurance data, retaining five years of premium data and 15 years of claims data is the rule of thumb for many applications. But who knows how many tweets, posts, audio and video streams should be kept, and for how long?

Perhaps big data should support answers to questions about what is happening right now. Some platform vendors already offer automated sentiment analysis, scraping blogs, websites and other sources in an attempt to generate scalar values that reflect how an organization is perceived. Given that perceptions can change dramatically in a very short time, we really need to examine the value of big data over the long term.

Predictive analytics can suffer from model degradation if data changes too rapidly. What "lift" do models provide when they are built on data that is so old (or changing so rapidly) that strong predictions become questionable? Big data sources have a greater dynamic range and volatility than legacy insurance data sources, presenting exactly these challenges for modeling.
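One common way to quantify lift is to compare the event rate among the highest-scored policies with the overall event rate. The sketch below assumes that definition (top-fraction lift); the function name and all numbers are hypothetical. It scores the same ten policies twice: once against outcomes consistent with the data the model learned from, and once against outcomes observed after the underlying behavior has shifted.

```python
def top_fraction_lift(scores, outcomes, fraction=0.1):
    """Lift = claim rate among the top-scored `fraction` of policies
    divided by the overall claim rate. A value of 1.0 means no better than random."""
    if not scores or len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must be non-empty and the same length")
    overall_rate = sum(outcomes) / len(outcomes)
    if overall_rate == 0:
        raise ValueError("no positive outcomes; lift is undefined")
    k = max(1, int(len(scores) * fraction))
    # Rank policies from highest to lowest predicted risk.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(outcome for _, outcome in ranked[:k]) / k
    return top_rate / overall_rate


# Hypothetical model scores for ten policies (1 = claim filed, 0 = no claim).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes_current = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]   # model's ranking still holds
outcomes_shifted = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]   # relationships have drifted

lift_current = top_fraction_lift(scores, outcomes_current, fraction=0.2)
lift_shifted = top_fraction_lift(scores, outcomes_shifted, fraction=0.2)
```

With these made-up figures the overall claim rate is 20% in both cases, but the top 20% of policies drops from a 100% claim rate (lift 5.0) to 50% (lift 2.5). The model itself did not change; only the world it describes did, which is the degradation risk that volatile big data sources amplify.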

Big data longevity also raises questions of data storage, structure and architecture. Can traditional data warehousing support unstructured, free-formatted, volatile and modally unstable big data elements? What about the databases themselves? Because insurance analytics requires strong consistency models from its underlying databases, and many NoSQL databases relax consistency guarantees in exchange for scale and availability, NoSQL databases are only part of the answer.

Despite the questionable "store it now, structure it later" approach touted by some vendors, Apache Hadoop provides an interesting architectural model for long-term storage and processing of big data sources. It has the potential to tackle issues of volume and lack of structure head-on, and may offer a solution to the problem of potentially "missing a conversation."
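The "store it now, structure it later" idea (often called schema-on-read) can be illustrated in a few lines. This is a minimal sketch in plain Python, not Hadoop code: the raw records, the field names and the helper `read_with_schema` are hypothetical. In a Hadoop deployment the raw lines would typically sit in distributed storage and the projection would run as a distributed job, but the principle is the same: nothing is discarded at write time, and a schema is imposed only when someone asks a question of the data.

```python
import json

# "Store it now": raw events are kept exactly as they arrived, with no upfront schema.
# Different sources use different keys, and some records are malformed.
RAW_STORE = [
    '{"source": "twitter", "text": "Quick claim payout, impressed", "ts": "10:00"}',
    '{"source": "blog", "body": "Premium went up again", "ts": "11:30"}',
    'not valid json',
]


def read_with_schema(raw_lines, field_map):
    """'Structure it later': project raw records onto a schema chosen at read time.

    field_map maps each output column to the raw keys that may hold its value,
    tried in order. Unparseable records are counted rather than silently lost.
    """
    rows, rejected = [], 0
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            rejected += 1
            continue
        row = {}
        for column, candidates in field_map.items():
            # First matching key wins; missing fields become None.
            row[column] = next((record[k] for k in candidates if k in record), None)
        rows.append(row)
    return rows, rejected


schema = {"channel": ["source"], "content": ["text", "body"], "timestamp": ["ts"]}
rows, rejected = read_with_schema(RAW_STORE, schema)
```

The flexibility is real (a new question only needs a new `field_map`), but so is the cost the author alludes to: data quality problems, like the rejected record here, surface only at analysis time rather than when the data is captured.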

In the legacy world, where data was created internally as a byproduct of a business process such as underwriting or claims servicing, ensuring privacy and confidentiality was under the direct control of the carrier. In that world, insurance companies have done an exemplary job of keeping personal information private.

But in a world where an increasing amount of decision-making data originates from big data sources, the carrier may not be able to guarantee the privacy and confidentiality of data that originates outside its walls. Nevertheless, carriers will continue to be reminded that privacy and confidentiality should remain the norm, irrespective of where the decision-making data comes from.

It is clear that the insurance industry cannot ignore the data deluge arriving in the form of new, nimble, but massive big data sources. At the same time, carriers often have bigger fish to fry, especially when they are in the midst of implementing processes and applications with a more sharply defined return on investment (ROI). User analytics, policy administration system replacement and underwriting platforms are good examples of areas where the ROI is clear.

In the final analysis, investing in, harvesting and processing big data sources must yield a significant ROI, given the size of the investments in business processes, change management and infrastructure required in a post-data-deluge world. Clearly, big data means big changes. It is noteworthy that as big data enters its growth stage in the insurance industry, it has yet to cause any major disruptions. Consequently, even as the industry still searches for its first big data "killer app," the opportunities for positive ROI in big data abound.

This blog post was written exclusively for Insurance Networking News. Published with permission.