Infrastructure Analysis -- A New Culture of Analytics
Useful data analysis requires solid infrastructure; otherwise, data is neither reliable nor accessible. But solid infrastructure benefits from data analysis just as much as business processes do, or perhaps even more. Organizations that rely on data analysis at the business level can use the same “culture of analytics” to better understand infrastructure integrity and bring deeper meaning to all data.
Too often, “low-level” infrastructure issues are treated as an afterthought, even though the analytical techniques and data discovery used at higher layers of the business are invaluable for understanding and validating infrastructure operation. In fact, those techniques regularly produce the same kind of insights and surprises that make data analysis so valuable for higher-level operations.
Companies such as Corvil provide traders with an interactive high-level view of operational performance. This analysis might show the latency and volume of trading across the network to demonstrate instances in which the infrastructure would have a significant impact on a specific server or application. A deeper understanding of operational peaks and valleys then allows traders to improve processes based on this intelligence.
Image of Corvil Data Visualization of data volume (in bytes) leaving a trading system and heading towards external markets.
Expanding on that example, organizations can learn a significant amount through deeper analysis of the underlying infrastructure. A time map, which charts how time is distributed across the network architecture, is useful for large corporate networks that need to improve on a legacy of unreliable, imprecise, and inflexible time sources across the network and applications.
A time map can identify, for example: an application server that distributes unreliable time across the network and to all applications that rely on it; time distribution networks that fall out of sync when companies glue them together, which is common in financial trading; the distribution and consumption of Network Time Protocol (NTP) and Precision Time Protocol (PTP) time sources across machines; false redundancy, where supposedly independent sources all sync back to the same source; and how far downstream the time source is and how reliable it is. These factors affect how well timestamps support an audit trail around latency-sensitive issues, and seeing them lets IT staff improve the quality of the time distribution network.
If there is a natural failure or an attack on a central point in the system, any data that relies on that point would be affected, and the quality of that data should be called into question. The image below of time distributed in a small network helps demonstrate this point.
Image of Time Distributed in a Small Network
(Source: FSMLabs TimeKeeper)
The blue lines are base clocks, the red lines show movement of time via the Network Time Protocol (NTP) and the green line shows time going over the alternative IEEE 1588 Precision Time Protocol (PTP). The server at the bottom right has been configured to have two backup time sources over NTP and to get its main time feed over PTP from the device in the center. That device gets time from GPS (blue line on the right) and also from 7 backup NTP sources.
The configuration sounds impressive, but when data visualization is used to understand the configuration of time across the network, the picture becomes clearer. The visualization immediately shows that the redundancy is mostly an illusion. All the NTP backups on the left depend on ACTS, a modem service with very limited reliability and quality. All it would take is for the primary GPS link on the left to fail in order to seriously degrade time at the server at the bottom right.
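The same check can be done on a time map stored as data rather than as a picture. The following is a minimal sketch, not the method used by any particular product: it assumes the time map is available as a dictionary from each node to the upstream sources it syncs from. The node names, the `UPSTREAM` map, and the functions `root_sources` and `redundancy_report` are all hypothetical, and the topology is a simplified stand-in for the small network described above (two NTP backups instead of seven).

```python
# Hypothetical time map: each node lists the upstream sources it syncs from.
# Names and topology are illustrative assumptions, not taken from a real network.
UPSTREAM = {
    "server": ["center_device", "ntp_backup_1", "ntp_backup_2"],
    "center_device": ["gps", "ntp_backup_1", "ntp_backup_2"],
    "ntp_backup_1": ["acts"],
    "ntp_backup_2": ["acts"],
    "gps": [],   # root reference: no upstream
    "acts": [],  # root reference: no upstream
}


def root_sources(node, upstream, failed=frozenset()):
    """Return the root references still reachable from node.

    A root is a node with no upstream sources. Nodes listed in
    `failed` are treated as unreachable.
    """
    roots, stack, seen = set(), [node], set()
    while stack:
        current = stack.pop()
        if current in seen or current in failed:
            continue
        seen.add(current)
        parents = upstream.get(current, [])
        if parents:
            stack.extend(parents)
        else:
            roots.add(current)
    return roots


def redundancy_report(node, upstream):
    """Map each direct feed of node to the roots it ultimately depends on."""
    return {feed: root_sources(feed, upstream) for feed in upstream.get(node, [])}


if __name__ == "__main__":
    feeds = redundancy_report("server", UPSTREAM)
    for feed, roots in feeds.items():
        print(f"{feed} ultimately depends on {sorted(roots)}")
    print("independent roots:", sorted(root_sources("server", UPSTREAM)))
    print("after GPS failure:", sorted(root_sources("server", UPSTREAM, failed={"gps"})))
```

In this hypothetical map the server has three configured feeds but only two independent roots, and a GPS failure leaves ACTS as the only remaining reference. Comparing the number of configured feeds with the number of distinct roots is one simple way to flag false redundancy.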
The time on which these applications rely is not as reliable as one might think. Recent distributed denial-of-service (DDoS) attacks on NTP have shown that time can be an entry point and a vulnerability that an attacker can use to compromise a system and jeopardize the integrity of any and all data. But even without an attacker or a failure scenario, time across the network can still be inaccurate.
To look at just one of many examples of how time across the network (or across multiple networks that connect with one another) might vary, we can examine the concept of “leap seconds.” Clocks that are wrong or unsynchronized by as much as 35 seconds are not all that unusual when network clocks fail to account for leap seconds. In some latency-sensitive industries, such as financial trading, this can pose a huge challenge when sequencing market data in milliseconds or microseconds. Without technology to cross-check clocks against each other and to fail over when necessary, time remains a variable that can impact data quality.
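To make the idea of cross-checking clocks concrete, here is a minimal sketch that compares each source against the median of all sources and fails over to the first source that agrees with the group. It is one simple approach under stated assumptions, not a description of how any specific product works. The function names `cross_check` and `pick_source`, the sample readings, and the one-millisecond tolerance are all hypothetical, and the approach only works when most sources are correct.

```python
from statistics import median

# Hypothetical offsets (in seconds) of each time source relative to the local clock.
# The primary here is off by about 35 seconds, as if it had ignored leap seconds.
READINGS = {
    "ptp_primary": 35.000002,
    "ntp_backup_1": 0.000150,
    "ntp_backup_2": -0.000210,
    "ntp_backup_3": 0.000090,
}

PREFERENCE = ["ptp_primary", "ntp_backup_1", "ntp_backup_2", "ntp_backup_3"]


def cross_check(offsets, tolerance):
    """Flag each source whose offset is further than tolerance from the median."""
    mid = median(offsets.values())
    return {name: abs(value - mid) > tolerance for name, value in offsets.items()}


def pick_source(offsets, preferred_order, tolerance):
    """Return the first preferred source that agrees with the group, or None."""
    suspect = cross_check(offsets, tolerance)
    for name in preferred_order:
        if name in offsets and not suspect[name]:
            return name
    return None


if __name__ == "__main__":
    flags = cross_check(READINGS, tolerance=0.001)
    for name, is_suspect in flags.items():
        print(f"{name}: {'SUSPECT' if is_suspect else 'ok'}")
    print("selected source:", pick_source(READINGS, PREFERENCE, tolerance=0.001))
```

With these sample readings the primary is flagged because it disagrees with the other three sources by roughly 35 seconds, and the selection falls through to the first backup.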
A Time Quality Map
(Source: FSMLabs TimeKeeper)
A time quality map can be used to gain a better understanding of just how much or little time sources are impacting the network and the business data that relies on that network.
Time to Take Action
Companies should be questioning the quality of their business data and investigating what is going on at the network level. They can use the same techniques they use at the business level to better analyze network data and gain greater insight into infrastructure and data integrity. That additional intelligence will help them to run more efficient and effective businesses.