With more than 90 percent of today’s data having been produced within the last two years, we are experiencing a data explosion that makes traditional ad-hoc analysis obsolete for gaining business insights. This creates an ever-growing gap between the entities that generate so-called big data (for instance, a sensor that sends braking information from a vehicle) and the systems that successfully analyze that data and yield business insights.

Statisticians often cluster large amounts of data to understand it. Similar data are categorized in the same cluster, while different clusters are kept as far apart from each other as possible.
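The article does not say which clustering method it has in mind. As a minimal illustration of the idea (similar values grouped together, groups kept apart), here is a sketch of k-means on one-dimensional data, one common clustering technique. The helper name `kmeans_1d`, its deterministic initialization and the sample violation counts are hypothetical.

```python
def kmeans_1d(values, k, iterations=20):
    """Minimal 1-D k-means sketch (hypothetical helper, not from the article).

    Repeatedly assigns each value to its nearest centroid and moves each
    centroid to the mean of the values assigned to it.
    """
    data = sorted(values)
    if k < 1 or k > len(data):
        raise ValueError("k must be between 1 and the number of values")
    # Deterministic start: spread the initial centroids evenly over the sorted data.
    if k == 1:
        centroids = [data[0]]
    else:
        centroids = [data[i * (len(data) - 1) // (k - 1)] for i in range(k)]

    def assign(cents):
        # Put every value in the cluster of its nearest centroid.
        groups = [[] for _ in cents]
        for v in data:
            nearest = min(range(len(cents)), key=lambda c: abs(v - cents[c]))
            groups[nearest].append(v)
        return groups

    for _ in range(iterations):
        clusters = assign(centroids)
        # Keep a centroid in place if its cluster happens to be empty.
        updated = [sum(g) / len(g) if g else c for g, c in zip(clusters, centroids)]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return centroids, assign(centroids)
```

For example, `kmeans_1d([1, 2, 3, 10, 11, 12, 30, 31], k=3)` settles on centroids `[2.0, 11.0, 30.5]`, splitting the counts into three well-separated groups.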

A heat map is another powerful way to visualize big data. It plots each value as a color whose shade reflects the value’s “intensity,” so the result resembles a thermal image.

In a cluster heat map, the rows and columns are hierarchically clustered. The data are displayed as a grid of rectangular tiles, each shaded on a color scale according to the value of the corresponding matrix element. The rows and columns are ordered so that similar rows and similar columns sit next to each other. Cluster heat maps are already used extensively in bioinformatics, where data volumes are huge, and they are beginning to be applied to insurance and healthcare data.
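The reordering step can be sketched as follows, assuming average-linkage agglomerative clustering with Euclidean distance (the article names neither a linkage nor a distance). `leaf_order` and `cluster_heat_map_order` are hypothetical helpers: rows and columns are clustered independently, and the matrix is then permuted so that similar rows and similar columns end up adjacent.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leaf_order(vectors):
    """Average-linkage agglomerative clustering; returns the indices of
    `vectors` in dendrogram leaf order, so similar vectors end up adjacent."""
    # Each cluster is an ordered list of original indices (its leaves).
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                total = sum(euclidean(vectors[i], vectors[j])
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
        clusters[a] = merged
    return clusters[0] if clusters else []


def cluster_heat_map_order(matrix):
    """Reorder a matrix's rows and columns by their clustering leaf order."""
    row_order = leaf_order(matrix)
    columns = [list(col) for col in zip(*matrix)]
    col_order = leaf_order(columns)
    ordered = [[matrix[r][c] for c in col_order] for r in row_order]
    return ordered, row_order, col_order
```

With `m = [[9, 1, 8], [1, 9, 2], [8, 2, 9], [2, 8, 1]]`, the rows come out in the order `[0, 2, 1, 3]` and the columns in `[0, 2, 1]`, so the two high-low-high rows and the two low-high-low rows form visible blocks. A dendrogram’s leaf order is not unique (each merge could be flipped); this sketch simply keeps the earlier cluster first. For real data, SciPy’s `scipy.cluster.hierarchy.linkage` and `leaves_list` perform the same steps far more efficiently.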

For instance, consider a “pay how you drive” system in which a black box installed in your vehicle sends data to a system that keeps track of violations. Your premium can then be set by comparing your number of violations with the mean number of violations in the area where you drive. A cluster heat map offers one way to analyze the mean violations in an area.
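The comparison described above can be sketched under a hypothetical pricing rule: the driver’s violation count is divided by the area mean, and that ratio, clamped to an assumed range, scales a base premium. The article gives no formula, so `relative_risk`, `premium`, the `floor`/`cap` values and the sample counts are all illustrative.

```python
from statistics import mean


def relative_risk(driver_violations, area_violations):
    """Driver's violation count divided by the mean count for the area."""
    area_mean = mean(area_violations)
    if area_mean == 0:
        # No baseline to compare against; treat the driver as average (assumption).
        return 1.0
    return driver_violations / area_mean


def premium(base_premium, driver_violations, area_violations, floor=0.5, cap=3.0):
    """Hypothetical pricing rule: scale a base premium by relative risk.

    The factor is clamped to [floor, cap]; both bounds are illustrative.
    """
    factor = min(max(relative_risk(driver_violations, area_violations), floor), cap)
    return round(base_premium * factor, 2)
```

If drivers in an area have `[2, 4, 6, 0, 3]` violations (mean 3), a driver with 6 violations gets a factor of 2.0, so `premium(1000, 6, [2, 4, 6, 0, 3])` returns `2000.0`, while a driver with none is held at the assumed floor and pays `500.0`.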

Because each data point is a count of violations, the counts can be tracked by part of the city (on the x-axis) and by the age of the driver (on the y-axis). Both axes can be hierarchically clustered. On the x-axis, a major metropolitan area like Detroit can be divided into cities (Detroit, Farmington Hills, Royal Oak, Birmingham, etc.), and each city can be subdivided into zip codes. The y-axis can have age categories such as “30 and under” and “over 30.” The 30-and-under category can be subdivided into 17 to 21 and 22 to 30, and the over-30 category can be subdivided in the same way.
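The two axis hierarchies can be modeled as nested lookups, with violation counts rolled up to whatever level of each axis the heat map shows. This is a minimal sketch: the city names come from the example above, but the zip-code labels are placeholders rather than real postal codes, and the split of the over-30 band, the record format and the helper names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical x-axis hierarchy: metro area -> city -> zip code.
# City names come from the example in the text; zip labels are placeholders.
LOCATION_TREE = {
    "Detroit metro": {
        "Detroit": ["zip D1", "zip D2"],
        "Farmington Hills": ["zip F1"],
        "Royal Oak": ["zip R1"],
        "Birmingham": ["zip B1"],
    }
}

# y-axis hierarchy: broad age band -> finer, inclusive age ranges.
# The split of the "over 30" band is an assumption for illustration.
AGE_TREE = {
    "30 and under": [(17, 21), (22, 30)],
    "over 30": [(31, 50), (51, 120)],
}

# Reverse index: zip code -> (metro, city).
ZIP_INDEX = {
    zip_code: (metro, city)
    for metro, cities in LOCATION_TREE.items()
    for city, zips in cities.items()
    for zip_code in zips
}


def age_bucket(age):
    """Return (band, range) for an age, e.g. 19 -> ("30 and under", (17, 21))."""
    for band, ranges in AGE_TREE.items():
        for low, high in ranges:
            if low <= age <= high:
                return band, (low, high)
    raise ValueError(f"age {age} is outside the defined ranges")


def roll_up(records, x_level="zip", y_level="range"):
    """Sum violation counts into heat-map cells at the chosen level of each axis.

    records: iterable of (zip_code, driver_age, violations) tuples.
    x_level: "zip", "city" or "metro"; y_level: "range" or "band".
    """
    cells = defaultdict(int)
    for zip_code, age, violations in records:
        metro, city = ZIP_INDEX[zip_code]
        band, age_range = age_bucket(age)
        x = {"zip": zip_code, "city": city, "metro": metro}[x_level]
        y = {"range": age_range, "band": band}[y_level]
        cells[(x, y)] += violations
    return dict(cells)
```

Calling `roll_up(records, "city", "band")` collapses zip codes into cities and fine age ranges into the two broad bands, which is how the same data can be viewed at a coarser or finer level of each hierarchy.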

The result is a cluster heat map. Once it has been built from data collected from many drivers, it can serve as a quick guide for estimating the risk of insuring an individual from a particular area. The map can also give insight into how risk relates across the city’s regions and how it changes from one age group to the next. Color-coding represents the value in each cell (for instance, red, green, blue and black, in that order, for successive ranges of violation counts).
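The color-coding step amounts to binning each cell’s count into one of a few ordered color bands. In this sketch the colors follow the order the text lists them, but reading that order as “fewest to most violations” is an assumption, and the band edges in `THRESHOLDS` are hypothetical.

```python
from bisect import bisect_right

# Colors in the order the text lists them; treating that order as
# "fewest to most violations" is an assumption.
COLORS = ["red", "green", "blue", "black"]
# Hypothetical band edges: 0-4 -> red, 5-9 -> green, 10-19 -> blue, 20+ -> black.
THRESHOLDS = [5, 10, 20]


def cell_color(violations):
    """Map a cell's violation count to its color band."""
    return COLORS[bisect_right(THRESHOLDS, violations)]


def color_grid(grid):
    """Color every cell of a (rows x columns) grid of violation counts."""
    return [[cell_color(count) for count in row] for row in grid]
```

Using `bisect_right` keeps the binning correct for any number of bands: adding a color means adding one entry to `COLORS` and one edge to `THRESHOLDS`.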

Cluster heat maps serve as a quick two-dimensional scorecard that can represent billions of data points. As the amount of data grows exponentially, cluster heat maps will become increasingly valuable to insurers that must make sense out of mountains of data.

This blog was exclusively written for and originally appeared at Insurance Networking News, a SourceMedia brand. It may not be reposted or reused without permission from Insurance Networking News. The opinions of bloggers on www.insurancenetworking.com do not necessarily reflect those of Insurance Networking News.
