Ask Mar Cabra, an editor at the International Consortium of Investigative Journalists (ICIJ), the group behind the Panama Papers, the world’s biggest data scoop, what her team does, and she’ll tell you: “We use technology to tell great stories.”
To capture those stories, journalists have always relied on tools like the phone and the fax, as well as on their many contacts. But as digital technologies have permeated modern life, journalists like Cabra have come to depend increasingly on data.
The Panama Papers is a prime example. These articles cast a harsh light on the shady tax dealings of the global elite in what has become the world’s foremost financial scandal. Drawing on an epic 2.6 terabytes of data spread across some 12 million documents, they also make up the world’s largest data-based exposé, surpassing in scale earlier high-profile data disclosures such as those from Edward Snowden and WikiLeaks. Notably, the ICIJ’s investigative work makes use of new analytical methods that simply were not available even five years ago.
Beyond the larger social significance of the ICIJ’s work and of this powerful new data journalism, is there any special relevance for today’s corporate CIOs? Indeed there is, and it has to do with graph databases: a new way of extracting insights from vast amounts of connected data.
When it came to evaluating the immense dataset behind the Panama Papers revelations, the ICIJ realized it would need a specialized tool, one that could process a large volume of highly connected data quickly, easily and efficiently. The dataset included a vast pool of unstructured information, mainly scanned bank statements in PDF form, which could not readily be examined by conventional means. Another key challenge was making the data understandable to journalists around the globe, including those who were not especially tech-savvy.
Cabra and her team had tackled complex data challenges before and knew that graph databases excel at spotting relationships inside large datasets, revealing patterns and trends that would not otherwise be apparent. “It wasn’t until we picked up graph database technology that we started to really grasp the potential of the data,” Cabra recounts. “The reaction from colleagues was ‘Oh my God, this is magic!’”
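How does graph technology outperform more traditional ways of working with data? Instead of forcing the data into tables, as a relational database does, a graph database stores it as nodes (such as people, companies and addresses) joined by explicit, named relationships, a structure that echoes the way humans intuitively think about information. Because each relationship is stored directly rather than reconstructed through table joins at query time, following a chain of connections remains fast even when the overall dataset is very large. Once that data model is built into a scalable architecture, a graph database is unmatched at analyzing the connections in huge, complex datasets.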
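To make the contrast concrete, here is a minimal sketch in plain Python; it is not Neo4j’s or any other product’s API. It assumes a hypothetical toy dataset of officers, companies and an intermediary (all names invented, not drawn from the Panama Papers) and stores each relationship directly alongside the nodes it connects, so a multi-hop question such as “how is Officer A linked to Officer C?” becomes a simple breadth-first walk. In a relational schema, each hop in that chain would typically require another join.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical toy data: every name below is invented for illustration.
# Each node keeps its own list of (relationship_type, neighbour) pairs, so
# following a connection is a direct lookup rather than a table join.
graph: Dict[str, List[Tuple[str, str]]] = {}


def relate(src: str, rel: str, dst: str) -> None:
    """Store a named relationship in both directions so it can be walked either way."""
    graph.setdefault(src, []).append((rel, dst))
    graph.setdefault(dst, []).append((rel, src))


relate("Officer A", "DIRECTOR_OF", "Company X")
relate("Officer B", "SHAREHOLDER_OF", "Company X")
relate("Company X", "REGISTERED_BY", "Intermediary Z")
relate("Officer C", "DIRECTOR_OF", "Company Y")
relate("Company Y", "REGISTERED_BY", "Intermediary Z")


def shortest_path(start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search: return the shortest chain of nodes linking start to goal."""
    if start not in graph or goal not in graph:
        return None
    previous: Dict[str, Optional[str]] = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the back-pointers to rebuild the path from start to goal.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        # Relationship types are kept in the data but not needed for this query.
        for _rel, neighbour in graph[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


print(shortest_path("Officer A", "Officer C"))
# → ['Officer A', 'Company X', 'Intermediary Z', 'Company Y', 'Officer C']
```

The four hops in that answer would correspond, in a typical relational design, to a series of self-joins over officer, company and intermediary tables; here each hop is just a read of the neighbouring node’s list.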
It isn’t only journalists who have been benefiting from these capabilities. Web giants Google, Facebook and LinkedIn have been using graph technology to derive value from connected data for some time. Google’s PageRank algorithm, for example, ranks web pages by analyzing the graph of links between them, while Facebook and LinkedIn model their social networks as graphs of people and their connections.
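PageRank works by treating the web as a graph: a page is important if important pages link to it. The sketch below is the textbook power-iteration formulation with the commonly cited damping factor of 0.85, run on a hypothetical four-page site; it illustrates the idea only and is not Google’s production system.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 100, tol: float = 1e-10) -> Dict[str, float]:
    """Textbook PageRank by power iteration.

    `links` maps each page to the pages it links to. Pages with no outgoing
    links ("dangling" pages) spread their score evenly over all pages.
    """
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Score held by dangling pages is redistributed uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes a damped share of its score along every outgoing link.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        converged = sum(abs(new_rank[p] - rank[p]) for p in pages) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical site: "orphan" links out but nothing links to it.
links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],
}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # → home
```

The scores always sum to 1, and a page that nothing links to (here, “orphan”) keeps only the baseline share (1 − 0.85) / 4, however many links it sends out.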
As graph database technology has matured, this kind of highly scalable connected-data analysis has become available to organizations of any size. Forrester maintains that 25% of enterprises will be using graph databases shortly, while Gartner reports that graphs are the fastest-growing database market and predicts that 70% of leading companies will pilot a significant graph database project by 2018.
Enterprises across a wide spectrum of industries are already proving the technology’s value. In retail, for example, graph databases power sophisticated personalization, while in financial services they are being deployed successfully for fraud detection. Healthcare organizations are using them to investigate disease, and governments have begun using them for security and other public applications.
The arrival of the Internet of Things, and the petabytes of data it generates, will accelerate the adoption of graph databases, which are designed to handle connected data at this scale. As the line between analytical and operational repositories blurs, graphs can help enterprises interpret this data in ways that traditional data warehouses and relational databases cannot.
A caveat is that graph databases aren’t the answer to every data problem. For some transactional and analytical processing requirements, relational technology is better suited, especially in transaction-intensive systems such as your financial, HR or ERP systems. There are also other NoSQL (Not Only SQL) databases suited to vast datasets in which relationships matter less.
Still, for any organization seeking to make the most of its connected data, a graph database makes sense. That is welcome news for IT leaders looking for new tools to help their organizations succeed in our super-connected, data-driven era.
Graph databases have permanently altered the nature of journalism – and have the potential to disrupt many more industries.
About the Author
Emil Eifrem is co-founder and CEO of Neo Technology, the company behind the world’s leading graph database, Neo4j (http://neo4j.com/).
All Information Management content is archived after seven days.