Back in the day I was a UNIX systems admin. One of the “joys” of being a sysadmin was chugging through log files. I looked for errors in applications. I looked for commands that people used to delete their entire account (rm -r ./*.* is a good one). I looked for the reason why particular databases brought down the operating system. One of the essentials in any sysadmin’s toolbox is tail. These log files were nicely formatted but rarely structured enough to be placed in a database table (log4j output is a good example), and reading them was a daily occurrence in an active environment.
Although application and operating system log files still exist, the Internet of Things is starting to emerge as a new source of this kind of data. Everything from heart monitors for runners to brake sensors in cars to refrigerators will provide information on what a device is doing, whether on its own or for you. Much of this sensor information is similar to those UNIX log files from my past life. The information from sensors in machinery, GPS mapping tools and even Fitbits comes in multi-structured, constantly changing formats that often don’t fit well in a relational data store. Figure 1 is an example from a weather station:
Sensors Sensors Everywhere
One early adopter of increased sensor capacity, and thus sensor data, is the industrial machinery sector. Capital investment equipment that manufactures components, produces energy or is used in transportation is an excellent use case. Manufacturers use the sensor information provided by the equipment to increase production quality and tighten tolerances in the manufacturing process. Energy production has both input costs and regulatory requirements, which can both be optimized if they are properly measured and tracked. Sensor data from transportation equipment promises to lower fuel costs and increase productivity for both cargo and passenger travel. General Electric is a leader in this space and has made it a component of its business moving forward.
One of the highest-profile use cases for sensor data in transportation is in the aviation industry. The new Boeing 787 promises to generate upwards of 500GB of information per flight for some carriers. The methods required to access this information are very different from my cracking open a log file with vi and/or tail-ing it from the command line. Half a terabyte of data requires better tools, and I have talked about many of those tools in recent blog posts. NoSQL data stores provide the ability to make the most of this abundance of information in terms of both storage and access.
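To make the “multi-structured” point a little more concrete, here is a minimal Python sketch of reading sensor records where each device reports its own set of fields. Everything in it is an assumption for illustration: the device names, the field names and the sample values are hypothetical and do not come from any real aircraft or vendor format. It only shows the general idea of keeping each record as a schemaless document and asking questions of whatever fields happen to be present, which is roughly the access pattern a document-style NoSQL store is built around.

```python
import json
from collections import defaultdict

# Hypothetical readings (made-up devices, fields and values). Each device
# reports different fields, and engine-1 gains a new field partway through,
# so no single fixed table schema fits all three lines.
RAW_LINES = [
    '{"device": "engine-1", "ts": "2014-03-01T10:00:00Z", "egt_c": 612.5, "n1_pct": 88.1}',
    '{"device": "gps-7", "ts": "2014-03-01T10:00:01Z", "lat": 37.62, "lon": -122.38, "alt_ft": 1200}',
    '{"device": "engine-1", "ts": "2014-03-01T10:00:05Z", "egt_c": 615.0, "n1_pct": 88.4, "vib": 0.02}',
]


def load_readings(lines):
    """Parse JSON lines into dicts, skipping blank, malformed or non-object lines."""
    readings = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            readings.append(record)
    return readings


def fields_by_device(readings):
    """Collect the set of field names each device has ever reported."""
    seen = defaultdict(set)
    for r in readings:
        seen[r.get("device", "unknown")].update(k for k in r if k != "device")
    return dict(seen)


def values_for(readings, field):
    """Return (timestamp, value) pairs for every reading that carries `field`."""
    return [(r.get("ts"), r[field]) for r in readings if field in r]


if __name__ == "__main__":
    readings = load_readings(RAW_LINES)
    print(fields_by_device(readings))
    print(values_for(readings, "egt_c"))
```

Nothing here needs to know the full list of fields ahead of time, so a new field simply shows up as one more key. The trade-off is that every query has to tolerate records that lack the field it is asking about.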
I have detailed operational uses for manufacturers and owners of equipment above. Nevertheless, as we look at the information from sensor data, new use cases are emerging. Some use cases may even expand beyond what was originally intended. Continuing with airline industry examples, the worldwide integrated air traffic control system helps to make the planet a smaller place and makes travel over oceans and around the globe as routine as heading down the highway to visit the in-laws.
Airlines use sensor data to track flights and optimize fuel consumption. Airports balance takeoffs and landings. Boeing and Airbus check the status of “units” that cost upwards of $200m each. Much of this data is in the public domain. Organizations such as FlightAware grab this information, apply a little analytical logic and provide a valuable service for travelers and their families. FlightAware turns operational data into an information service that helps its users plan when to leave for the airport or decide whether to change a flight.
Information into Investigation
In July of 2013, many insightful minds decided to use FlightAware information for another purpose. If that flight information could be used to determine when and where a flight was for travel purposes, why not do so to determine when planes encountered trouble? Anyone could look at the “log file” for the Asiana flight that crashed at the San Francisco airport, and many people did. They also looked at other Asiana flight information and for other flights with similar aircraft landing at SFO. Here is a good visualization comparing Asiana flights.
This visualization is not from the NTSB or another government agency; it is a visual comparison of flight log information, based on multi-structured data sources, that has been presented in Google Earth. It did not take months to put this information together. Some of these comparisons were made within a day.
It should be noted that NTSB investigators have found nothing to corroborate the idea that online flight tracking records, such as the ones above, point to the direct cause of the Asiana crash. But it was interesting that, much like weather models that were once the domain of a government agency, flight tracking data is now freely available for private citizens to use and analyze.
March 2014 provides a better example of how sensor information can help in unexpected ways: the still-unfolding story of a Malaysia Airlines flight that disappeared en route to China. Official and public information shows that the plane “disappeared” around 12:02pm Eastern Standard Time, pushing officials to search along the initial flight path. However, new pieces of information related to the plane’s sensor reporting system moved the search area and widened the investigation.
I doubt that anyone involved in the development of the air traffic control system envisioned regular citizens using their data to compare flight tracks after a crash. I also doubt that the engineers from Boeing, Rolls-Royce or any of the other manufacturers of the 777 thought that their system log files, which monitor plane status and engine performance, could or would be used to identify the path of a missing (potentially hijacked) plane. This is the power of multi-structured sensor data and discovery/exploratory mindsets. NoSQL platforms can enable this exploration and exploitation of multi-structured data in new areas where only our imagination can see the possibilities.
What say the readers?
Have you used multi-structured log data for unintended value?
Do you think that application and sensor log information is valuable?
Do you think that “crowd sourcing” aviation information is a wise idea?
Is the “story”/situation of MH370 eerily similar to the plot of the TV series “Lost”?
Provide your comments below and/or ping me via Twitter at @JohnLMyers44 with the hashtag #noodlingNoSQL.