There is a nuance to big data analysis: it’s really about small data. While this may seem confusing and counter to the whole big data “movement,” small data is the product of big data analysis. This is not a new concept, nor is it unfamiliar to people who have been doing data analysis for any length of time. The overall working space is larger, but the answers still lie somewhere in the small.
In the old days of traditional data analysis, we began with databases filled with customer information, product information, transactions, telemetry data, etc. Even then, there was too much data available to efficiently analyze. Systems, networks and software didn’t have the performance or capacity to address the scale. As an industry, we addressed the shortcomings by creating smaller data sets.
These smaller data sets were still fairly substantial, and we quickly discovered other shortcomings, the most glaring of which was the mismatch between the data and the working context. If I worked in accounts payable, I had to wade through a large amount of unrelated data in order to do my job. Again, the industry responded by creating smaller, contextually relevant data sets. Big to small to smaller still.
You may recognize this as the migration from production databases to data warehouses to data marts. More often than not, the data for the warehouses and the marts was chosen using arbitrary or experimental parameters, resulting in a great deal of trial and error. All too often, the data was chosen to support an output or a conclusion we wanted to see, as opposed to discovering something new, interesting or anomalous. We weren’t getting the perspectives we needed, or the ones that were possible, because the reductions in data volume weren’t based on computational fact.
Enter big data, with all its volumes, velocities and varieties, and the problem remains, or perhaps worsens. We have addressed the shortcomings of the infrastructure and can store and process huge amounts of additional data, but we have also had to introduce new technologies specifically to help us manage and manipulate big data. If we think this is challenging now, just wait a year or two: Ubiquitous machine data is inevitable, and it is just around the corner. Don’t be scared; be prepared!
Despite outward appearances, this is a very good thing. Today and in the future, we will have more data than we can imagine, and we’ll have the means to capture and manage it. What is more necessary than ever is the ability to analyze the right data quickly enough to make decisions and take action. We will still shrink the data sets into “fighting trim,” but we can do so computationally and dynamically. We process the big data and turn it into small data so it’s easier to comprehend. Because it was derived from a much broader starting point, it’s more precise and more contextually relevant. Small data is where we’ll get the answers, today and in the future.
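To make the idea of shrinking data computationally a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the `Transaction` record, its fields, the `shrink` function and the sample values are assumptions, not a description of any particular product or pipeline. It scans a full set of records, keeps only those that match a working context (accounts payable, to borrow the earlier example) and reduces them to a handful of ranked totals.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Tuple


class Transaction(NamedTuple):
    """Hypothetical record; the field names are illustrative assumptions."""
    department: str
    vendor: str
    amount: float


def shrink(records: Iterable[Transaction], department: str,
           top_n: int = 3) -> List[Tuple[str, float]]:
    """Reduce a large stream of records to a small, context-specific summary."""
    totals = defaultdict(float)
    for rec in records:
        # Keep only the records relevant to the caller's working context.
        if rec.department == department:
            totals[rec.vendor] += rec.amount
    # Rank vendors by total spend and keep only the top few.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


# A tiny stand-in for the "big" data set; real inputs would be far larger.
sample = [
    Transaction("accounts_payable", "Acme", 100.0),
    Transaction("accounts_payable", "Globex", 50.0),
    Transaction("sales", "Initech", 900.0),
    Transaction("accounts_payable", "Acme", 200.0),
    Transaction("accounts_payable", "Umbrella", 25.0),
]

small = shrink(sample, "accounts_payable", top_n=2)
```

The point of the sketch is that the reduction is derived from the data itself at the moment it is needed. Nobody has to decide ahead of time which rows belong in a warehouse or a mart.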