Clearing a Data Pileup
Highway traffic monitoring and reporting have come a long way since the days of roadside police reports, TV news helicopters and traffic cams. Those tools, along with more modern sensors, are still in use, but what many drivers don't realize is that the data sets behind snapshots and time-summary views of traffic patterns have grown explosively.
What has changed, or will soon change, your drive-time experience is that traffic data is increasingly collected through the GPS systems sold by auto manufacturers and truck fleets, and even more through the cellular phones in the pockets of commuters. With sophisticated algorithms that track the "handoffs" of cell phones as they pass from one transmission tower to the next, service providers (who strike agreements with telcos and fleet operators) can now infer origin and destination information, speeds, slowdowns and even accidents from the data.
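The handoff-based speed estimation described above can be sketched in miniature: given the tower a phone was attached to and the time of each handoff, distance over time yields an average speed. The `Handoff` schema, tower coordinates and haversine averaging below are illustrative assumptions, not details from the article or from any provider's actual system:

```python
import math
from dataclasses import dataclass

@dataclass
class Handoff:
    """One cell-tower handoff event (hypothetical schema for illustration)."""
    lat: float   # tower latitude, degrees
    lon: float   # tower longitude, degrees
    t: float     # event timestamp, seconds

def haversine_km(a: Handoff, b: Handoff) -> float:
    """Great-circle distance between two tower locations, in kilometres."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp = p2 - p1
    dl = math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))

def estimate_speed_kmh(trace: list[Handoff]) -> float:
    """Average speed implied by one phone's sequence of handoffs."""
    dist = sum(haversine_km(a, b) for a, b in zip(trace, trace[1:]))
    elapsed_h = (trace[-1].t - trace[0].t) / 3600.0
    return dist / elapsed_h

# Two towers roughly 10 km apart, passed in 10 minutes -> about 60 km/h
trace = [Handoff(51.50, -0.10, 0.0), Handoff(51.59, -0.10, 600.0)]
```

Real systems must also cope with tower coverage overlap, phones that stop mid-route, and map-matching handoffs to specific roads, none of which this sketch attempts.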
One of these service providers is U.K.-based Integration Transport Information System (ITIS), which collects data for up-to-the-minute traffic information services and separately sells historic traffic information to third-party users in more than 25 countries around the globe.
At ITIS, more than 52 million rows of "floating vehicle data" on U.K. traffic patterns alone are stored every day in a data center near Manchester, England. Through a service called Trafficlink, ITIS provides information on speed, congestion, delays and accidents to news outlets for followers of the morning drive. "In the U.K., about 90 percent of the traffic reports on the radio are either our staff or somebody at a radio station reading from one of our systems," says Thom Shelton, director of operations at ITIS.
Globally, the company stores about 1.5 billion rows of data per month in the same facility and calls on a fast-growing repository of tens of billions of rows of traffic information to serve another line of business, which analyzes and resells historic data.
Customers of this service include local and national governments investigating new and existing roadways, and private groups studying traffic flows and origin-and-destination data for reasons ranging from exurban development to greenhouse gas studies.
Over time, John O'Reilly, the database manager at ITIS who manages the extraction processes for historic data customers, watched the scale of data being captured quickly outgrow the performance of the company's server base. "Some clients want monthly extractions, which are very large records, and others want data over a 12-month period, either of which is hugely time-consuming on our systems."
"The algorithms we run are intensive and really hammer our servers," says Shelton, to the point that ITIS had no choice but to build and operate the Manchester data center to ensure that power and cooling needs would be able to scale with the business.
An architecture based on Microsoft SQL Server, with data delivered over fiber, provides operational reporting, but front-end SQL queries grow complex and slow dramatically when run against billions of records. As the historic database grew, extraction performance degraded in a downward spiral, and concurrent queries serving multiple accounts only made matters worse.
Clients looking to explore deep volumes of historic data aren't well served by the online analytical processing (OLAP) of traditional data warehouse reporting, says Philip Russom, an analyst at The Data Warehousing Institute. "A report is a template where you know exactly what questions you want to answer, where the data is coming from and how you transform it, and you're basically pouring new data to refresh it. That's what the enterprise data warehouse is very good at." By contrast, he says, "analytics is a discovery mission where you're trying to figure out what you don't know," and not a strength of traditional data warehousing.
The constraints of the EDW were already showing at ITIS. "It became a struggle to get the data out of the system, which was when we began looking at different providers to help us out," O'Reilly says.
Analytic inquiries also tend to rely on distinct methodologies such as data mining, predictive analytics, statistics or artificial intelligence, for which specialized software/hardware configurations known as data warehouse appliances have emerged as cost-efficient add-ons to traditional systems.
A data warehouse appliance, what Russom would call an SOS, or system-on-the-side, solution, seemed the obvious course. Shelton and O'Reilly agreed that one requirement would be to keep the SQL Server front end that ITIS had standardized on. "We didn't want to buy new technology or have to retrain people as we would if we'd gone with Netezza or Teradata, which were quite frankly too expensive and too powerful for what we needed," O'Reilly says.
In racetrack vernacular, data appliances tend to be "horses for courses," and, for its own purposes, the solution ITIS picked was the Dataupia Satori Server, which, pre-integrated with SQL Server, has delivered an average fourfold improvement in response time for analytic queries. "The average is four times faster, but certain queries easily run 95 percent faster," says Shelton. "Some customers will review a query and later come back with a slight change. The kind of thing that used to take days we're now hoping we can deliver over the course of a coffee break."
This also lets his business maintain its current skill set and lessen resource demand in a relatively small company where servers greatly outnumber employees. "In the past, we probably had three or four DBAs, not necessarily all full time, but they were spending a huge amount of time generating historic data," O'Reilly says. "Now we're probably at about one full-time equivalent managing this."
The Dataupia stack at ITIS consists of four 2.2TB blades, or 8.8TB total, with future upgrades easily added as additional nodes.
Shelton hopes the efficiency savings will free DBAs for more productive tasks. "Enabling the DBA team and the people who support them to focus on the value-added stuff makes a big difference," he says. "Right now we're planning a fancy Web front end in front of Dataupia so people who aren't SQL experts can just plug into a month's data from London or whatever they want."
One example run during testing looked at what happens when a motorway has one lane blocked as opposed to being completely shut, and the resulting effect on local urban areas. "We could see that towns became highly congested during times when one lane was shut," Shelton says. "When all lanes were shut, you'd think things would get worse, but actually, because everything had ground to a halt, we found through analysis that local towns had better flow because of overall decreased traffic."
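At its core, an analysis like the one Shelton describes groups speed observations by motorway closure state and compares the averages. The sketch below is a minimal illustration; the function name, record schema and the sample readings are all invented for demonstration and are not ITIS data:

```python
from collections import defaultdict
from statistics import mean

def mean_speed_by_closure(observations):
    """Average town speed readings grouped by motorway closure state.

    `observations` is a list of (closure_state, speed_kmh) pairs --
    a hypothetical schema chosen purely for illustration.
    """
    groups = defaultdict(list)
    for closure_state, speed_kmh in observations:
        groups[closure_state].append(speed_kmh)
    return {state: mean(vals) for state, vals in groups.items()}

# Invented sample readings, shaped to mirror the qualitative finding
# reported in the article (better local flow under a full closure):
obs = [
    ("one_lane_shut", 12.0), ("one_lane_shut", 15.0), ("one_lane_shut", 10.0),
    ("all_lanes_shut", 25.0), ("all_lanes_shut", 28.0), ("all_lanes_shut", 24.0),
]
```

In production this grouping would be a SQL aggregation over billions of rows rather than an in-memory dictionary, which is precisely the workload the appliance was brought in to accelerate.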
Such counterintuitive results are the fruits of analytic inquiry. Shelton presented findings along these lines at last year's Intelligent Transport Systems (ITS) World Congress in New York, in a talk that looked at incorporating manufacturing techniques such as statistical process control to make signs and signals "smarter." Future marching orders for global manufacturers of traffic equipment will require huge quantities of time-series data collected over many years in intervals as short as 15 minutes. That kind of mass and granularity reflects the infrastructure ITIS is building to support its growing base of client needs into the future.
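The statistical process control idea Shelton raises can be illustrated with Shewhart-style control limits on 15-minute traffic counts: a signal is "in control" while counts stay within a band around the historical mean. The functions, the three-sigma threshold and the baseline values below are a hypothetical sketch, not a description of any deployed system:

```python
from statistics import mean, stdev

def control_limits(counts, k=3.0):
    """Shewhart-style lower and upper control limits (mean +/- k * sigma)
    for a history of interval traffic counts. Illustrative only."""
    m, s = mean(counts), stdev(counts)
    return m - k * s, m + k * s

def out_of_control(counts, new_value, k=3.0):
    """Flag a new 15-minute count that falls outside the historical band,
    which a 'smart' sign or signal could treat as an anomaly."""
    lo, hi = control_limits(counts, k)
    return not (lo <= new_value <= hi)

# Invented baseline: ten typical 15-minute vehicle counts for one sensor
baseline = [120, 118, 125, 122, 119, 121, 124, 120, 123, 118]
```

With years of counts at 15-minute granularity, limits like these could be computed per sensor, per time-of-day and per day-of-week, which is exactly the kind of mass and granularity the article says the infrastructure must support.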