Prepare for more digital failures like the chaos at Heathrow: they're going to come

Delays were expected, but nothing like this.

For months, airport bosses have been warning that the need to check passengers' coronavirus documents would cause long queues at UK borders.

In response, the Home Office said it was doing everything it could to minimise disruption, including rolling out an upgrade to e-gates so that documents could be checked automatically.

Then, this morning, the e-gates went down. At airports across the country Border Force staff were reduced to checking documents by hand. Queues built up, stretching back so far that passengers couldn't even get off their planes.

It was, everyone agreed, chaos. It was also, in digital terms, quite normal.

The problem, it emerged, was a technical fault in the e-gate computer system. As yet, neither the Home Office nor the airports have explained exactly what the fault was, describing it only as a "systems failure".

This suggests that something went wrong in one of the central systems used to manage the e-gate service.

There will be a central management system which monitors the entire service. An issue here could cause the whole system to malfunction quite easily.

Inevitably, fears have been raised of a cyberattack, but with outages like this the cause can often be incredibly mundane. Last year, dozens of the world's biggest websites went down after an error at the networking service Cloudflare. The problem turned out to be a single bad router somewhere in Atlanta.

The magic of digital systems is that they don't obey the normal rules of scale. Half a dozen people can write a few hundred lines of code and make an app that's used by tens of millions of people.

But this magic also works in reverse. A single lapse of attention or fat-finger error can bring down an entire site or system. Get something wrong and it can affect everything.

That is why no-one is immune to this kind of outage, not even the richest companies in the world, which employ whole squadrons of engineers to prevent even a second of downtime. Earlier this year Amazon and Reddit were offline for 45 minutes. Instagram and Twitter have been known to disappear for hours at a time.

None of these outages lasted very long (some might say not long enough). Likewise, the e-gates at UK airports were only out for around 90 minutes.

But such is our reliance on these systems that even brief absences cause all sorts of difficulty. That's why it's so important to have alternatives to digital systems, especially when they control something as crucial as the entry to a country.

The worry is that the efficiency of digital systems is hampering our ability to respond to their failures. In the last decade, Border Force staffing levels have been dramatically cut, in part because it's believed that the work of border staff can be done by digital systems. You hear similar stories across the public sector, from policing to social services.

The problem comes when those digital systems go down. As we saw today, that is very hard to avoid. You might even say it's inevitable.