The insurance industry depends on reliable and timely data. This data originates from a variety of systems, such as policy issuance, customer service, billing and claims. Aggregating and integrating large volumes of data from such a plethora of sources is no trivial problem for data architects.

Yet what's even more challenging than the development of such projects is testing them.

While the application-testing landscape is fertile with a variety of fresh ideas and offerings, the terrain of data testing remains surprisingly barren. A quick search on Amazon demonstrates a curious truth: the number of texts dedicated to testing data applications is woefully low compared with the number covering software testing. Why is that, one wonders, especially when one is tasked with architecting a data project that has few or no application/user-interface (UI) layers? A few quick explanations come to mind:

First, software engineering as a discipline takes a holistic view of application development and doesn’t treat database development as a separate concern deserving any kind of special treatment. Consequently, we see a healthy proliferation of testing tools, patterns and best practices at the UI and application layers, with few or no provisions for data testing. Notions such as continuous integration, test-driven development and code coverage are firmly established in application development communities. Yet the testers and developers of non-UI, data-heavy applications, such as data warehouses and business intelligence (BI), struggle to get even rudimentary automation right.

Second, database vendors have by and large placed little emphasis on enforcing development discipline on their platforms. With almost no built-in support for quality, it becomes a challenge to enact even basic quality controls for a project. Consider, for instance, concepts like ‘project’, ‘build’ or ‘unit test’. Such notions were introduced in software development to improve the manageability of complex projects and to establish a quality baseline. (For instance, a broken build isn’t considered working software.) No such concepts exist on database platforms, making the task of organizing large code bases and tracking quality an extremely daunting one.
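To make the gap concrete, here is a minimal sketch of what a “unit test” for a database could look like, written against Python’s built-in unittest and sqlite3 modules. The policy table and the invariants it checks are hypothetical illustrations, not features of any particular vendor’s platform.

```python
# A minimal sketch of a "data unit test": the kind of check application
# developers take for granted but few database platforms offer out of the box.
# Table and column names are hypothetical.
import sqlite3
import unittest


class PolicyTableTests(unittest.TestCase):
    def setUp(self):
        # An in-memory database stands in for the real platform under test.
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript("""
            CREATE TABLE policy (
                policy_id   INTEGER PRIMARY KEY,
                customer_id INTEGER NOT NULL,
                premium     REAL    NOT NULL,
                status      TEXT    NOT NULL
            );
            INSERT INTO policy VALUES (1, 100, 1200.00, 'ACTIVE');
            INSERT INTO policy VALUES (2, 101,  850.50, 'LAPSED');
        """)

    def test_no_negative_premiums(self):
        # A basic data-quality invariant: premiums are never negative.
        bad = self.conn.execute(
            "SELECT COUNT(*) FROM policy WHERE premium < 0").fetchone()[0]
        self.assertEqual(bad, 0)

    def test_status_values_are_known(self):
        # Status codes must come from an agreed reference set.
        bad = self.conn.execute(
            "SELECT COUNT(*) FROM policy "
            "WHERE status NOT IN ('ACTIVE', 'LAPSED', 'CANCELLED')"
        ).fetchone()[0]
        self.assertEqual(bad, 0)

    def tearDown(self):
        self.conn.close()


if __name__ == "__main__":
    unittest.main()
```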

Third, tools and technologies like extract, transform, load (ETL) and BI that are integral to most contemporary data architectures are relatively new and undergoing frequent revision. Built-in support for testing is slowly emerging in this space, though it’s nowhere near as comprehensive as the well-entrenched techniques of the application development arena.
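As one illustration of the kind of rudimentary automation still uncommon in this space, the sketch below reconciles row counts and a control total between a staging table and its warehouse target after a load. The table names, columns and tolerance are assumptions made purely for the example, not part of any specific ETL product.

```python
# A sketch of a rudimentary post-load ETL check: reconcile row counts and a
# control total between a source staging table and its warehouse target.
# Table names, connection and tolerance are illustrative assumptions.
import sqlite3


def reconcile(conn, source_table, target_table, amount_column, tolerance=0.01):
    """Return a list of discrepancies between the source and target tables."""
    problems = []

    src_rows, src_total = conn.execute(
        f"SELECT COUNT(*), COALESCE(SUM({amount_column}), 0) FROM {source_table}"
    ).fetchone()
    tgt_rows, tgt_total = conn.execute(
        f"SELECT COUNT(*), COALESCE(SUM({amount_column}), 0) FROM {target_table}"
    ).fetchone()

    if src_rows != tgt_rows:
        problems.append(f"row count mismatch: source={src_rows}, target={tgt_rows}")
    if abs(src_total - tgt_total) > tolerance:
        problems.append(f"control total mismatch: source={src_total}, target={tgt_total}")
    return problems


if __name__ == "__main__":
    # Stand-in database; a real pipeline would point at the actual staging
    # and warehouse schemas instead.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stg_claims (claim_id INTEGER, paid_amount REAL);
        CREATE TABLE dw_claims  (claim_id INTEGER, paid_amount REAL);
        INSERT INTO stg_claims VALUES (1, 500.0), (2, 300.0);
        INSERT INTO dw_claims  VALUES (1, 500.0), (2, 300.0);
    """)
    issues = reconcile(conn, "stg_claims", "dw_claims", "paid_amount")
    print("load verified" if not issues else issues)
```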

As a result of the observations above, we see a tendency among IT managers to treat data testing the old-fashioned way: hiring second-class resources and spending excessive man-hours on repetitive, manual processes, hours that could otherwise have gone into more impactful quality assurance.

Yet these teams soon feel overwhelmed as data volumes grow, integrations multiply and customer demands become more complex. Consequently, data quality suffers, and it’s all downhill from there.

Sound familiar? We’ve been there too, and over the years have learned some common-sense remedies that we’ll be sharing in the second part of this blog.

This commentary originally appeared at Insurance Networking News.
