Network Outages Explained (And What Happened at CrowdStrike?)

Abstract image depicting connected and broken network connections

Image by the Author

My experimentation with Flux.1 continues. The above image is the result of a prompt for an abstract image on networks and connections. Workflow via glif.

In this week’s newsletter, a concise network outages explainer.

We start with a definition: what exactly is a network outage? We then move on to affected stakeholders, cost implications and mitigation efforts. The newsletter concludes with the CrowdStrike outage from earlier this summer - including memes.

By the end of the newsletter, you'll be informed and meme-aware, with an improved understanding of network outages and their impact.

Newsletter Topics

What is a network outage?

Computer networking "is the process of connecting two or more computing devices, such as desktop computers, mobile devices, routers or applications, to enable the transmission and exchange of information and resources." [1]

A network outage occurs when the network is unavailable or isn't functioning to its full capacity.

Outage Types & Causes

There are three types of network outage [2]:

  1. Total Outage: Global, everything is down (CrowdStrike)

  2. Partial Outage: Regional, some regions or services are down (Apple)

  3. Latency-Related: Slow loading content, global or regional (Meta)

There are generally six causes of network outages, represented below.

Network outage causes. Created using Napkin.ai

Image by the Author.

That's a quick overview and definition; check out the footnotes for more on networks and outages [3].

Costs & Stakeholder Impact.

Depending on the industry, organisations are looking at network outage costs of between $5,600 [4] and $9,000 [5] per minute. That works out at $336,000 to $540,000 per hour!
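The per-hour figures are simply the per-minute rates multiplied by 60. A minimal sketch of that arithmetic, assuming the cost accrues at a constant rate for the whole outage (a simplification; the function and variable names are mine, not from the cited sources):

```python
MINUTES_PER_HOUR = 60


def outage_cost(cost_per_minute: float, minutes: float) -> float:
    """Direct outage cost, assuming a constant per-minute rate (a simplification)."""
    return cost_per_minute * minutes


# Per-minute estimates quoted in the newsletter (US dollars).
LOW_RATE = 5_600
HIGH_RATE = 9_000

low_hourly = outage_cost(LOW_RATE, MINUTES_PER_HOUR)
high_hourly = outage_cost(HIGH_RATE, MINUTES_PER_HOUR)

print(f"Per hour: ${low_hourly:,.0f} - ${high_hourly:,.0f}")
# prints "Per hour: $336,000 - $540,000"
```

Swap in your own per-minute rate and outage duration to get a rough figure for your organisation. Indirect costs are not included.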

That’s a substantial impact for any organisation. So much so that some shareholders [6] are suing CrowdStrike, accusing it of making “false and misleading” statements.

As for investors, The Guardian [7] reported that the software update would cost US Fortune 500 companies $5.4 billion, with banking, airline and healthcare organisations the worst affected.

Delta Air Lines cancelled more than 6,000 flights, and Air France-KLM expects a $10 million loss from the incident. Insurers should also be concerned. The estimated cost of $5.4 billion covers US Fortune 500 companies only; as this was a total (global) outage, the overall figure will rise.

Staying with insurance, will policies cover all of an organisation's losses?

There are also indirect costs to contend with, such as productivity losses and the inability to perform tasks. Although, judging by the memes (see In conclusion), most employees were either thankful for an early 4-day week or angry about working the normal 5 days!

Governments are monitoring the incident. The US Government opened an investigation [8] into Delta Air Lines after the cancelled flights. Southwest Airlines previously received a $140m civil penalty after cancelling 16,900 flights.

Lawsuits and civil penalties appear imminent for both CrowdStrike and Delta.

How do you mitigate against network outages?

Network World [9] provides weekly network disruption updates.

For the week of August 19-25, they reported 204 global network events. With 193 UN member states and millions of computer networks, that feels low!

Network outages are going to happen. How do you mitigate?

Cue Eye on Tech’s short 3 min video - IT Outages Explained.

The bulk of the video explains IT outages, reinforcing and renaming the categories identified in the “What is a network outage?” section.

At 2 minutes 32 seconds into the video, they suggest the following strategies:

  • Redundancy and resilience

  • Regular maintenance and updates

  • Data backups

  • Configuration management

  • Extended software testing

  • Software rollbacks

  • Disaster recovery & training

All reasonable options, though cloud services are missing from the list!

Then again, the CrowdStrike update affected Azure services, so even that mitigation has its limits!
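To make two of the listed strategies more concrete (extended software testing and software rollbacks), here is a minimal sketch of a staged rollout that reverts automatically when a health check fails. Everything in it is hypothetical: the `staged_rollout` function, its callbacks and the batch sizes are illustrative assumptions, and it does not describe how CrowdStrike or any other vendor actually deploys updates.

```python
from typing import Callable, List


def staged_rollout(
    hosts: List[str],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    roll_back: Callable[[str], None],
    first_batch: int = 1,
    growth: int = 10,
) -> bool:
    """Update hosts in growing batches; revert every updated host on failure.

    Returns True if all hosts were updated, False if the rollout was rolled back.
    All callbacks are hypothetical hooks supplied by the caller.
    """
    updated: List[str] = []
    start, size = 0, first_batch
    while start < len(hosts):
        batch = hosts[start:start + size]
        for host in batch:
            apply_update(host)
            updated.append(host)
        # Check the batch before touching any further hosts.
        if not all(is_healthy(host) for host in batch):
            for host in reversed(updated):
                roll_back(host)
            return False
        start += size
        size *= growth  # widen the rollout only after a healthy batch
    return True
```

The design choice is to limit the blast radius: a bad update reaches only the hosts in the batches deployed so far, and those are reverted before the rollout stops.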


What happened at CrowdStrike?

CrowdStrike update timeline, created in Napkin.ai

Image by the Author

Drawing on Wikipedia [10], the above image shows the timeline from the update's release to the deployment of the fix. Within 6 hours, CrowdStrike [11] had released the update, reverted it and deployed a fix.

However, an estimated 8.5 million systems were affected. As this was an update to a commercial product, it affected Windows PCs and servers running CrowdStrike’s Falcon software. Personal PCs were largely spared, and computers running Linux or macOS were unaffected.

The deployed fix required affected machines to reboot. Microsoft advised restoring backups from before the 18th of July.

In conclusion.

CrowdStrike is an example of a total outage.

In this concise explainer, we've covered costs and stakeholder impacts, explored high-level mitigation strategies, and raised questions about civil penalties and insurance cover.

Despite a fix being released within 6 hours of the initial update, this incident is likely to be remembered for the public spats with Delta and the mismanagement of the Uber Eats vouchers.

The current cost estimate of $5.4 billion will rise, and more lawsuits and civil penalties will follow for CrowdStrike.

I promised memes. This is my favourite.
