Tech Glitches Are Annoyingly Natural. Let’s Prepare for Them
For about an hour last week, it seemed like we were facing the second coming of the Y2K bug—when three large computer platforms broke down within hours of one another, for reasons having nothing to do with security breaches. The situation highlights a basic problem at the enterprise level: Systems don't age well and are poorly maintained.
If you’re a fan of old video games, you may be familiar with Missingno.
Missingno, a glitch in the Pokémon franchise, has become famous among gamers as an example of a game breaking in an interesting way. When players catch the errant Pokémon, they don’t get a new monster for their collection. Instead, they (usually) get a blocky, backwards-L-shaped blob. It looks weird, but that’s what many fans love about it, and it has created something of a cult following among those who grew up in a simpler time.
It’s a fun glitch. The problem with glitches, however, is that they usually aren’t fun. Instead, they’re frustrating and, in the wrong context, can cost lots of money and manpower to fix.
Which brings us to last week’s trifecta of tech disasters. The first grounded United Airlines’ entire fleet for a few hours. Then, a couple of hours later, the New York Stock Exchange halted trading for a while. Finally, the Wall Street Journal’s website went down. (If you’re a financial analyst who has to fly across the country for work, it may feel like the machine is targeting you, specifically.)
Much of value may have been lost over that 12-hour period, but one thing gained was some insightful commentary about what happened.
Felix Salmon, the star journalist for Fusion, suggested that the incident was akin to a spate of shark attacks or similar headline-chewers. Coincidental timing, he notes, does not a trend make.
“Take any four news events from any daily newspaper. What were the chances that all four things would happen on the same day? Remote, right? And yet they all happened,” Salmon writes.
It’s a calm, logical take on something that wasn’t calm or logical. But what if what we need is not calmness, but real talk?
Fortunately, Zeynep Tufekci has that in spades. The University of North Carolina academic, writer, and sociologist says it’s worth getting nervous about the glitch-a-thon, because, when it comes down to it, the incident exposes some worrying issues around increasingly essential software systems.
“The big problem we face isn’t coordinated cyber-terrorism, it’s that software sucks,” Tufekci writes in a post on Medium’s Message blog. “Software sucks for many reasons, all of which go deep, are entangled, and expensive to fix.”
Tufekci’s argument goes something like this: As systems have been built upon generations and generations of software, that software becomes increasingly fragile and prone to breakage. But at the same time, these software systems have become Too Big To Fail in the AIG sense: If they break, so does a key part of our economic engine.
But when this software breaks, the solution often doesn’t involve top-down fixes, but band-aid solutions that become harder to maintain as time goes on.
The Creaky Effect
United, for example, relies on hardware and software platforms that date back generations. In 2012, the company retired its 41-year-old (!) computer system, Apollo, and replaced it with a system it inherited from Continental, one that’s nearly as old and arguably creakier.
“For its reservation system, the combined companies dropped the United web-based interface for a command line interface that runs on Microsoft DOS,” Silicon Angle’s Alex Williams wrote back in 2012.
You could always switch airlines, but then again, they’re having similar issues of creakiness.
Now, obviously, these systems see their share of upgrades, but it could be argued that these upgrades haven’t kept pace with consumer technology. In an age when coffee shops let you pay with iPads, why isn’t larger infrastructure better?
The answer, Tufekci says, is that there often isn’t a cost incentive to do the hard work of getting legacy systems into fighting shape. The result is that cybersecurity threats get outsized attention, yet there’s “not much interest in spending real money in fixing the boring but important problems with the software infrastructure.”
This stuff can get expensive, admittedly—she compares the situation to the need to upgrade Amtrak’s rusted rails. But perhaps it’s something that more organizations need to embrace as the cost of doing business.
“Our dominant operating systems, our way of working, and our common approach to developing, auditing and debugging software, and spending (or not) money on its maintenance, has not yet reached the requirements of the 21st century,” she adds.
Upgrades, Not Quick Fixes
It’s a common refrain that goes beyond airlines and stock markets: it could be argued that the recent troubles faced by the Office of Personnel Management, including its decision to shut down its online security clearance system, have roots in this very same kind of problem.
On the other hand, perhaps there is potential for this situation to change—and associations could help lead that change in their industries as a whole or within their own organizations as a start.
A few weeks ago, I highlighted the efforts by the ATM Industry Association (ATMIA) to encourage its members to upgrade to Windows 10 on its release—which would be a huge shift in mindset for an industry that relied on IBM’s OS/2, along with Windows XP, long past their expiration dates.
“Bear in mind, it’s not advisable for deployers to wait on the XP system for Windows 10, as there are security risks to being on an unsupported OS platform,” ATMIA CEO Mike Lee said.
Because ATMIA spoke up and made the case, and because of the association’s standing in the ATM field, there’s a good chance the upgrade will get more takers.
These conversations need to happen both in industries being represented by trade groups and within associations themselves. (How’s your AMS looking these days?)
All this stuff may be a tough sell; the cost-benefit ratio may seem out of whack in the short term. But most associations are in this for the long haul, and no matter how long it stops the leak, duct tape eventually won’t be enough to hold back a flood.