Data visualization involves a graphical image generated by software, where the content of the image is determined by reading digital data. The data is usually numeric, but some software can visualize concepts drawn from text documents. The software arranges geometric shapes (such as points, lines, circles and rectangles) to create an interpretation of the data it read. Attributes such as proximity, size and color express relationships between the geometric shapes. Data visualization has gained adoption among business users because it supports a variety of business tasks, including decision making, knowledge management and business performance management.
Three strong trends have shaped the direction of data visualization software for business users over the last few years:
Chart Types. Most data visualizations depend on a standard chart type, whether a rudimentary pie chart or an advanced scatter plot. The list of chart types supported by software has lengthened considerably in recent years.
Level of User Interaction with Visualization. A few years ago, most visualizations were static charts for viewing only. At the cutting edge of data visualization today, dynamic chart types are themselves the user interface, where the user manipulates a visualization directly and interactively to discover new views of information online.
Size and Complexity of Data Structures Represented by Visualization. A rudimentary pie or bar chart visualizes a simple series of numeric data points. Newer advanced chart types, however, visualize thousands of data points or complex data structures, such as neural nets.
Figure 1 charts these trends and places in their context some of the common functionality of data visualization software. These trends also reveal a progression from forms of rudimentary data visualization (RDV) to advanced ones (ADV), as seen moving from lower-left to upper-right corners in Figure 1. Rudimentary forms of data visualization (pie and bar charts and presentation graphics) have been available in software for many years, whereas advanced forms (with interactive user interfaces, drill-down and live data connectivity) are relatively new. The dotted lines in Figure 1 create a life-cycle context for rudimentary and advanced forms of data visualization by identifying three life-cycle stages: maturing, evolving and emerging.
Figure 1: Trends in Data Visualization
Charting the Course of Charting
Basic charting, typically involving pie or bar charts, is the oldest and most common form of data visualization, as well as the most rudimentary. Presentation graphics, intended to impress readers of presentations and reports, give basic charts visual appeal by applying a three-dimensional appearance, animation and color effects such as gradient fills.
The wide array of visualization software supports many different chart types, because the needs of users vary greatly. For example, business users demand pie and bar charts, whereas scientific users need scatter plots and constellation graphs. Users looking at geospatial data need maps and other three-dimensional data representations. Digital dashboards are popular with executive business intelligence users who monitor organizational performance metrics, visualizing them as speedometers, thermometers or traffic lights.
Tools for charting and presentation graphics are devoted exclusively to visualizing data. However, data visualization capabilities are commonly embedded in a wide range of software types, including tools for reporting, online analytical processing (OLAP), text mining and data mining as well as applications for customer relationship management and business performance management. To address the need for embedded visualizations, many software vendors have architected data visualization functions as components that can be embedded in a variety of tools, applications, personal productivity software and Web pages (including dashboards and personalized portal pages).
Business software for charting has evolved from static and superficial charts to interactive visualizations with data connectivity and drill-down. These advanced capabilities are found in a new category of software: the enterprise charting system (ECS). With an ECS, users can develop and deploy chart-centric analytic applications that provide data visualization specifically for business intelligence on an enterprise scale.
Business use aside, data visualization is part and parcel of software focused on a variety of scientific studies. Data visualization plays an important role in software for the study of mathematical, statistical, geographic and spatial data. Some data visualization software for business users has borrowed chart types originally designed for scientific users, such as scatter plots and constellation graphs.
Whereas scientific users tolerate heavily technical functionality that may require knowledge of programming languages or statistics, business users need this functionality to be hidden under a friendly user interface. For data visualization to appeal to business users, it must provide out-of-the-box value in the form of functionality for solving business problems, such as analyzing or mining customer behavior, product categories and business performance.
Note that presenting multiple, simultaneous visualizations is appropriate with complex data sets (especially multidimensional ones), which would be hard to represent in a single image. In such a case, however, the charts should be linked together, so that user changes to one are reflected in the others.
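One common way to link charts is to have them share a single selection state and repaint whenever it changes. The sketch below illustrates that idea only; the `Selection` and `Chart` classes are hypothetical stand-ins, not the API of any charting product, and a real chart would redraw itself rather than just record what to highlight.

```python
class Selection:
    """Shared selection state that notifies every linked chart on change."""

    def __init__(self):
        self._listeners = []
        self.selected = set()

    def subscribe(self, listener):
        self._listeners.append(listener)

    def set(self, keys):
        self.selected = set(keys)
        for listener in self._listeners:
            listener(self.selected)


class Chart:
    """Hypothetical stand-in for a chart; it records which keys to highlight."""

    def __init__(self, name, selection):
        self.name = name
        self.highlighted = set()
        self._selection = selection
        selection.subscribe(self._on_change)

    def _on_change(self, selected):
        self.highlighted = set(selected)

    def click(self, keys):
        # A user action updates the shared state, which then notifies all charts.
        self._selection.set(keys)


selection = Selection()
bar = Chart("revenue by region", selection)
scatter = Chart("margin vs. revenue", selection)

bar.click({"West"})          # the user clicks the "West" bar
print(scatter.highlighted)   # the scatter chart now highlights "West" too
```

Because neither chart refers to the other directly, a third or fourth view can be linked in simply by subscribing it to the same selection.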
Get Your Hands on the Data, Literally
The most dramatic transformation in data visualization has been the move from static charts for viewing only to dynamic, online visualizations with which the user can interact. The user can refresh an interactive visualization with recent data (analogous to running a report) or interact with its content (analogous to using an application).
For instance, with basic interaction, a user can rotate a chart or change its chart type to discover the most revealing view. A user can also change visual properties, such as fonts, colors and borders. With complex types of visualizations, such as constellation and scatter charts, the user can select data points with a mouse and move them to clarify the view.
Advanced data visualization techniques often present a chart or other visualization as a summary level. The user can drill down into the visualization to explore the detailed data that it summarizes or drill down into OLAP, data mining or other advanced functionality.
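In data terms, drill-down amounts to keeping the detail rows behind each summary value so that a click on the summary can retrieve them. The following minimal sketch assumes a small in-memory list of sales rows; the row layout and the `summarize` and `drill_down` helpers are hypothetical and serve only to illustrate the relationship between the two levels.

```python
from collections import defaultdict

# Hypothetical detail rows: (region, product, sales)
rows = [
    ("East", "Widgets", 120),
    ("East", "Gadgets", 80),
    ("West", "Widgets", 200),
    ("West", "Gadgets", 50),
]


def summarize(rows):
    """Summary level: total sales per region (what the chart would plot)."""
    totals = defaultdict(int)
    for region, _product, sales in rows:
        totals[region] += sales
    return dict(totals)


def drill_down(rows, region):
    """Detail level: the rows behind one region's total."""
    return [(product, sales) for r, product, sales in rows if r == region]


summary = summarize(rows)         # one value per bar
detail = drill_down(rows, "West") # what a click on the "West" bar reveals
print(summary)
print(detail)
```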
Advanced interaction enables the user to change the visualization for the purpose of discovering alternate interpretations of the data. Interacting with the visualization should involve a minimally invasive user interface, where the user simply double-clicks part of the visualization, drags and drops representations of data entities or right-clicks to select from a menu. With OLAP or data mining tools, direct interaction with the visualization is part of the iterative nature of data analysis. With text mining or document management tools, direct interaction is a navigational mechanism that helps the user explore libraries of documents.
Visual query is a cutting-edge manifestation of advanced user interaction. For example, a user might see outlying data points in a scatter graph, select them with a mouse and receive a new visualization representing just those points. The data visualization application generates the appropriate query language, manages its submission to a database and visually represents the result set. The user can focus on analysis without being distracted by query definition.
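The visual query flow can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any vendor's implementation: the `sales` table, its `revenue` and `margin` columns and the `visual_query` helper are all hypothetical, and an in-memory SQLite database stands in for the production database.

```python
import sqlite3


def visual_query(conn, x_range, y_range):
    """Translate a rectangular mouse selection on a scatter plot into SQL.

    x_range and y_range are (low, high) bounds of the selection in data units.
    The table and column names are hypothetical.
    """
    sql = (
        "SELECT id, revenue, margin FROM sales "
        "WHERE revenue BETWEEN ? AND ? AND margin BETWEEN ? AND ? "
        "ORDER BY id"
    )
    params = (*x_range, *y_range)
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, revenue REAL, margin REAL)")
conn.executemany(
    "INSERT INTO sales (id, revenue, margin) VALUES (?, ?, ?)",
    [(1, 10.0, 0.20), (2, 12.0, 0.22), (3, 11.0, 0.19), (4, 95.0, 0.80)],
)

# The user drags a box around the outlier in the upper-right corner of the plot.
outliers = visual_query(conn, x_range=(90.0, 100.0), y_range=(0.7, 0.9))
print(outliers)
```

The user never sees the SQL. The selection box supplies the bounds, the application fills them in as query parameters, and the result set becomes the data for the next visualization.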
Considering the mouse as an extension of a user's hands, advanced interaction with a visualization literally puts the user's hands on data and keeps them there. The mouse pointer stays on the visualization, not on pull-down menus. The user's eye stays on the visualization, instead of ping-ponging between it and dialog boxes. In short, the user can focus on information discovery and analysis instead of the user interface.
A Thousand Points of Data
A strong trend with advanced data visualization is a move toward representing large and/or complex data sets, unlike the average bar or pie chart that represents a simple series of numeric data points. For example, OLAP (and query and reporting) tools have long supported charting for their online reports. A few OLAP tools now include data visualization capabilities for representing complex multidimensional data.
Most charting typically involves a one-time read of data, unlike newer visualization software that refreshes chart content by reading data periodically. In fact, data visualization users who monitor linear processes (stock market tickers, computer system performance data, seismographs, utility grid loads, etc.) need real-time or near real-time data feeds.
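For a monitoring display, periodic refresh often means appending each new reading and discarding the oldest, so that the chart always shows a fixed-size window of recent data. The sketch below assumes a hypothetical `RollingSeries` class and made-up utility load readings; a real tool would call `refresh` from a timer or a data feed.

```python
from collections import deque


class RollingSeries:
    """Keeps only the most recent readings for a continuously refreshed chart."""

    def __init__(self, max_points):
        # A deque with maxlen silently drops the oldest item when full.
        self._points = deque(maxlen=max_points)

    def refresh(self, reading):
        self._points.append(reading)

    def snapshot(self):
        """Return the readings the chart should currently draw, oldest first."""
        return list(self._points)


series = RollingSeries(max_points=3)
for load in [410, 425, 398, 440, 452]:  # hypothetical grid load readings
    series.refresh(load)

print(series.snapshot())  # only the three most recent readings remain
```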
Users of data mining tools typically analyze very large sets of numeric data. Since traditional business chart types (pie and bar charts) are ill-equipped to represent thousands of points of data, mining tools almost always support some form of data visualization to portray patterns and trends in massive data sets. Generic data mining tools usually consist of several algorithms, each with an analytic approach, data connectivity and visualization that is appropriate to it. Likewise, many software applications for customer relationship management (CRM) enable users to study large customer and prospect databases with embedded technologies for mining and visualization.
While data visualization usually involves structured numeric data, it also is key to representing patterns in so-called unstructured data, namely text documents. In particular, the new generation of text mining tools can parse large collections of documents and build an index (sometimes called a taxonomy) of the concepts and topics covered in those documents. When the index is built with neural net technology, its complexity is difficult to convey to the user without some form of data visualization. The visualization usually serves two purposes. It is a visual representation of the content of the document library, and it is a navigational mechanism on which the user can click to explore documents and their topics.
Visualizing the Future
Data visualization has captured mind share among business users, because its benefits to data analysis and knowledge management systems are obvious. However, advanced data visualization has achieved relatively limited adoption among business users. One barrier is its newness; there are not yet enough reference sites to help promote it. Also, advanced data visualization is typically embedded in product types that are themselves emerging: data mining tools, the new generation of text mining tools and personalized corporate portal pages.
Even so, the need and desire for business-oriented data visualization will accelerate over the next few years, driven by the following factors:
Customer Centricity. Companies recognize that analyzing customer data yields a return on their IT investment. Therefore, many companies are gearing up to collect customer data at fine levels of granularity (especially a customer's "clickstream"), and these massive data sets benefit from analytic approaches that incorporate advanced data visualization.
Massive Data Sets in E-Business. As more companies become true electronic businesses, data collection accelerates in areas ranging from internal operations to supply chain to customer interaction and beyond. These massive data sets merit mining and analysis to improve efficiencies and effectiveness which, in turn, require advanced data visualization to represent trends and patterns.
Visual User Interfaces (VUIs). Browser-based applications and Web sites have upset standard notions of how a user interface should look, and data visualization will be an integral part of the next generation of user interfaces. The seeds of VUIs can already be seen in metrics-oriented analytic applications such as dashboards and balanced scorecards, and the rise of corporate portals will increase the need for VUIs in the form of personalized portal pages.
Philip Russom is the senior manager of research and services at The Data Warehousing Institute (TDWI), where he oversees many of TDWI's research-oriented publications, services and events. Prior to joining TDWI in 2005, Russom was an industry analyst covering BI at Forrester Research, Giga Information Group and Hurwitz Group. He also ran his own business as an independent industry analyst and BI consultant, and was contributing editor with Intelligent Enterprise and DM Review magazines. Before that, Russom worked in technical and marketing positions for various database vendors. You can reach him at email@example.com.