Last September, two computer science students from the University of St. Andrews in the U.K. attempted to pin down a definition of Big Data, publishing “Undefined by Data: A Survey of Big Data Definitions” on the open-access preprint server arXiv.org. Their round-up included:
- Gartner Group: The “Four V’s” definition of volume, velocity, variety, and veracity.
- Oracle: The derivation of value from traditional relational database-driven business decision-making, augmented with new sources of unstructured data such as blogs, social media, sensor networks, and image data.
- Intel: Organizations generating a median of 300 terabytes of data weekly. This includes business transactions stored in relational databases, documents, e-mail, sensor data, blogs, and social media.
- Microsoft: The process of applying serious computing power, the latest in machine learning and artificial intelligence, to seriously massive and often highly complex sets of information.
- The application definition (arrived at by analyzing the Google Trends results for “big data”): Large volumes of unstructured and/or highly variable data that require the use of several different analysis tools and methods, including text mining, natural language processing, statistical programming, machine learning, and information visualization.
- The Method for an Integrated Knowledge Environment (MIKE2.0) definition: A high degree of permutation and interaction within a dataset, rather than the size of the dataset. “Big Data can be very small, and not all large datasets are Big.”
- NIST: Data that exceeds the capacity or capability of current or conventional [analytic] methods and systems.
Doug Fridsma, M.D., chief science officer for the ONC, has a definition that will resonate with almost everyone: “More data than you're used to--some people deal with petabytes and it's easy, but if you're a small practice, just your own data is more data than you're used to,” he says.
This piece was originally published by Health Data Management. Published with permission.