Earlier this month, U.S. air carrier Delta Air Lines dealt with a power outage that grounded thousands of flights worldwide during peak travel season. The outage resulted in thousands of canceled flights and tens of thousands of unhappy, stranded customers who couldn’t get updates from either the Delta app or the company’s flight information displays. The outage is estimated to cost the airline millions of dollars in lost revenue, along with a severely damaged reputation.
While it’s easy to feel bad for Delta, the airline knew its legacy IT systems made a day like this likely, yet it did nothing to prevent it. This outage is just the latest example of the airline industry facing a meltdown due to outdated IT systems that it can’t be bothered to update. Just a month ago, Southwest Airlines dealt with a similar scenario in which a broken network router caused another 2,300 canceled flights and cost Southwest tens of millions of dollars. Legacy software and hardware in business-critical systems should be a huge concern.
Technical failures affecting major airlines are yet another wake-up call to businesses. As we march further into the 21st century, companies need to start replacing their aging IT infrastructure, and fast. This isn’t just about good IT hygiene – it’s about nation-state sponsored hacking into national infrastructure.
If legacy systems weren’t bad enough, recent reports claim that Chinese hackers are selling vulnerabilities in the infrastructure of major airlines on the dark web. There is no proof that the information being sold on the dark web and the outages that occurred earlier this month are linked, but it wouldn’t be surprising if they were. It’s not just that old equipment is more likely to break down – these systems haven’t been patched or updated in years.
Vulnerabilities in these systems that are known to cyber criminals will not be fixed, which leaves the systems insecure. All patch work needs to be done in-house, which means hiring extremely specialized (and expensive) engineers or even hiring former developers of the legacy systems. But since many businesses have been doing this for years, they generally won’t make the transition to new infrastructure until their hand is forced.
The Delta outage is clearly just the beginning of the many problems the airline industry will face over the coming years until it replaces its legacy IT systems. But overhauling an entire existing system can be a huge drain on time and money for many in the industry. If replacing a legacy system is not an immediate or feasible option, airlines can turn to network monitoring that quickly pinpoints the source of a potential issue and promptly alerts the airline’s IT team, so the problem can be fixed before it shuts down service across the globe.
The following are best practices airlines can follow to help them monitor their systems, in lieu of a network overhaul, with ease and peace of mind:
Ensure complete visibility into the entire network
While every IT team should have complete visibility into their entire network, this is an absolute requirement for IT teams dealing with legacy infrastructure. Legacy hardware, or in Delta’s case a power switch, should be under constant scrutiny by their IT team since legacy systems themselves are a serious risk to business operations.
For example, an IT team could have seen that too many critical applications were running through the same access point on the network, which could cause a domino effect if one part of the IT stack failed. Use this visibility to ensure all business-critical components have multiple access points so that if something like a power switch goes down, there’s less of a chance of downtime.
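As a rough illustration of that kind of visibility check, the Python sketch below scans an inventory of applications and flags any access point that is the only path for one or more of them. The inventory, the application names, and the function name are all hypothetical; a real monitoring product would discover this topology itself rather than read it from a hand-written dictionary.

```python
from collections import defaultdict


def sole_access_points(app_paths):
    """Map each access point to the applications that depend on it alone."""
    sole = defaultdict(list)
    for app, points in app_paths.items():
        unique_points = set(points)
        if len(unique_points) == 1:
            # This application has no alternate path if the point fails.
            sole[unique_points.pop()].append(app)
    return dict(sole)


# Hypothetical inventory: application name -> access points it can use.
inventory = {
    "check-in": ["router-a"],
    "crew-scheduling": ["router-a"],
    "flight-status": ["router-a", "router-b"],
}

for point, apps in sole_access_points(inventory).items():
    print(f"{point} is the only path for: {', '.join(apps)}")
```

Any access point that shows up in this report is a candidate for a second, independent path.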
Have proper and reliable backup systems in place for business critical components
Looking back at Southwest’s IT problems last month, the company claimed that the outage was due to a single network router being the access point for hundreds of critical applications. A backup process was in place in case of a breakdown, but when this router “partially” failed, the backup was not triggered because the failure was not complete, and the problem rippled through the systems that depended on it. This is a clear example of having a backup system just to check off the box. Instead, set up and test your backup system for every single scenario you can think of to make sure that when the time comes, it will actually function and prevent a meltdown.
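To make the partial-failure case concrete, here is a minimal Python sketch of a failover rule that treats heavy degradation the same way as a dead device, followed by the kind of scenario list a failover test might walk through. The RouterHealth fields, the 5 percent packet-loss limit, and the scenario values are assumptions chosen for illustration, not details of Southwest’s setup.

```python
from dataclasses import dataclass


@dataclass
class RouterHealth:
    reachable: bool          # does the device answer at all?
    packet_loss_pct: float   # share of probe packets lost, 0-100


def should_fail_over(health, max_loss_pct=5.0):
    """Fail over on a complete failure or on degradation past the limit."""
    if not health.reachable:
        return True
    return health.packet_loss_pct > max_loss_pct


# Scenarios a failover test might cover (values are illustrative).
scenarios = {
    "healthy": RouterHealth(reachable=True, packet_loss_pct=0.2),
    "partial failure": RouterHealth(reachable=True, packet_loss_pct=40.0),
    "complete failure": RouterHealth(reachable=False, packet_loss_pct=100.0),
}

for name, health in scenarios.items():
    print(f"{name}: fail over = {should_fail_over(health)}")
```

A rule that checked only whether the device was reachable would leave the “partial failure” scenario on the degraded router, which is why each scenario deserves its own test.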
Set up proper thresholds
Southwest Airlines did not go into further detail about its outage, but a simple threshold in a network monitoring system could have triggered an alert on the partial failure of the device. The IT team would have seen this alert and been able to understand quickly where the problem was – possibly even before the outage occurred.
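A threshold of this kind can be very simple. The Python sketch below, offered as an assumption-laden illustration rather than any vendor’s implementation, raises a warning or critical status only when the last few samples of a metric all breach the limit, which avoids alerting on a single noisy reading. The metric (packet loss in percent), the limits, and the function name are hypothetical.

```python
def evaluate_threshold(samples, warn, crit, consecutive=3):
    """Return 'critical', 'warning', or 'ok' from the most recent samples."""
    recent = samples[-consecutive:]
    if len(recent) < consecutive:
        # Not enough data yet to judge a sustained breach.
        return "ok"
    if all(value >= crit for value in recent):
        return "critical"
    if all(value >= warn for value in recent):
        return "warning"
    return "ok"


# Hypothetical packet-loss readings (percent) from a legacy router.
readings = [0.1, 0.3, 6.0, 7.5, 9.0]
status = evaluate_threshold(readings, warn=5.0, crit=20.0)
print(f"router status: {status}")
```

Setting the warning limit well below the point of total failure is what gives the IT team a chance to act on a partial failure.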
As far as Delta is concerned, it would be important to make sure that strict thresholds for alerting IT were set for any legacy systems in the IT infrastructure. We’ll never know exactly what happens during a particular airline outage, but we do know that airlines have two options moving forward to make sure it doesn’t happen again: either replace the legacy systems, or monitor them like a hawk.
(About the author: Joe Krivickas is chief executive officer of Ipswitch)