Fresh Thoughts #28: Resilience: What Are My Options?


Following on from last week's newsletter about the need for resilience... What are your options? What's available, and what does it all mean?

"Never say - redundant. It implies unnecessary. And there's no budget for unnecessary."

There should never be a budget for unnecessary. But when spending gets tight - resilience can start to look increasingly unnecessary.

So why do you have an extra firewall? A just-in-case laptop? A storage or database cluster?

Resilience.

Resilience is measured in downtime.

How much time is it ok to be non-productive?

How much will it cost to have operations shut down?

But there's also the question of recovery: in the event of a failure, how are operations brought back online?

If it's automatic, we call it "failover". If it's not… you need to ensure the knowledge and experience of bringing operations back are also resilient.
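
To put rough numbers on the downtime questions above, here's a minimal sketch. The function names and every figure in it are made up for illustration - swap in your own hourly cost and outage estimates.

```python
def downtime_cost(hours_down: float, cost_per_hour: float) -> float:
    """Rough cost of an outage: hours offline times an hourly cost."""
    return hours_down * cost_per_hour


def worth_paying_for(resilience_cost_per_year: float,
                     expected_outages_per_year: float,
                     hours_saved_per_outage: float,
                     cost_per_hour: float) -> bool:
    """True when the downtime avoided is worth more than the resilience costs."""
    avoided = expected_outages_per_year * downtime_cost(
        hours_saved_per_outage, cost_per_hour
    )
    return avoided > resilience_cost_per_year


# Hypothetical figures: 2,000 per hour of lost operations.
print(downtime_cost(8, 2_000))                  # 16000 - one lost working day
print(worth_paying_for(10_000, 1, 8, 2_000))    # True: 16,000 avoided vs 10,000 spent
print(worth_paying_for(10_000, 0.5, 4, 2_000))  # False: 4,000 avoided vs 10,000 spent
```

It's deliberately crude. It ignores reputation, contractual penalties, and recovery effort, but it's enough to start the conversation about what "unnecessary" really means.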

There are only five tiers of resilience:

  • No Resilience
  • Cold Standby
  • Hot Standby
  • Asynchronous Failover
  • Synchronous Failover (or no downtime)

In cybersecurity, we often overcomplicate the idea of resilience. But businesses already deal with it every day.

Which tier you choose is entirely dependent on your business need.

No Resilience

  • No preparation.
  • Go to the market for a replacement and hope you can find one.
  • It's never fun, but we've all found ourselves here… the unanticipated problem. When the company is on the line, is this where you want to be?
  • How long will you be offline? 🤷‍♂️ It's based on hope.

The Business Experience:

Whilst this situation is clearly undesirable, it's the one new businesses operate under daily. Working in a world of unknowns. But more things become known when you've been in business for a few years. This tier is one to leave behind and hope never to return to.

The Cybersecurity Experience:

This is the situation all cybersecurity professionals fear. In the event of a failure, businesses often decide the show must go on. This means ignoring security processes and significantly increasing the risk of further failures. In the worst case, one failure leads to another, and so on...

Cold Standby

  • There's a plan.
  • It may have been created 5 minutes ago… but at least there's a direction.
  • The plan hasn't been tested, but we have some parts.
  • Best thought of as a "duct tape and string" approach to resilience.
  • Downtime is measured in hours or days.

The Business Experience:

This is the scrappy part of business life. Someone decides to leave your business with immediate effect, but luckily you know someone. It will take time to get them up to speed, but a few days of downtime and underperformance are ok.

The Cybersecurity Experience:

This is often the situation when recovering from ransomware attacks. Standard configurations are used in the business for laptops, networking, and server security. There's a spare laptop, access point, or firewall on a shelf somewhere, with vague instructions on how to set it up.

Hot Standby

  • There's a plan
  • It's primed and ready to go
  • It was tested in an exercise last month
  • We just need to do a little bit of work & we're ready. "Gimme 15 minutes."

The Business Experience:

Sometimes, the person with the critical information isn't in the office. They're on holiday or taking the family to the zoo. This is the byproduct of undocumented processes - which causes disruption and inconvenience, but it isn't the end of the world. A call and an interruption can get you back working within minutes.

The Cybersecurity Experience:

There's a spare system ready to work. System configurations are tested and ready to go. It just needs to be connected to the right place. This is where those "infrastructure as code" people assume every business is operating… In reality, it's rarely used. This approach comes with the extra burden of maintaining software and configs, but it can appear to be magic when it works.

You're now entering the automatic zone...

Asynchronous Failover

  • Resilience is planned, primed, and ready to go at a moment's notice.
  • A spare IT system will automatically restart operations in a few minutes. All but the most observant won't even notice.
  • To save money, the spare system may be too small to handle the full workload. But in a crisis, degraded performance is ok.

The Business Experience:

This is like someone calling in sick for their shift at a restaurant. Their workload will be automatically picked up by other members of the team. No one will notice if it's a slow night and the team isn't busy. But if it's a busy Friday night - service will be slower than usual. The performance will be degraded.

The Cybersecurity Experience:

This is where security and IT folk feel comfortable. It's the most common way to deal with large central file servers and networks. If something fails, the entire business could be offline. But it's ok because another system will automatically take up the workload, within minutes at most. Performance may be degraded, but at month end, that is much more tolerable than no data.
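
Here's a toy sketch of that behaviour: work goes to the primary until it fails, then an undersized standby picks it up automatically. The class names, the capacities, and the "requests" unit are all assumptions for illustration, not any real product's API.

```python
from typing import Tuple


class System:
    """A toy IT system with a fixed capacity (hypothetical requests per minute)."""

    def __init__(self, name: str, capacity: int):
        self.name = name
        self.capacity = capacity
        self.healthy = True


class FailoverPair:
    """Sends work to the primary, or to the standby when the primary is down."""

    def __init__(self, primary: System, standby: System):
        self.primary = primary
        self.standby = standby

    def active(self) -> System:
        # No human in the loop: the choice is made on every request.
        # (A failed standby is not handled in this sketch.)
        return self.primary if self.primary.healthy else self.standby

    def handle(self, requests: int) -> Tuple[str, int]:
        system = self.active()
        served = min(requests, system.capacity)  # anything above capacity waits
        return system.name, served


pair = FailoverPair(System("primary", 100), System("standby", 60))
print(pair.handle(80))        # ('primary', 80)
pair.primary.healthy = False  # simulate the failure
print(pair.handle(80))        # ('standby', 60) - still working, but degraded
```

The standby only serves 60 of the 80 requests. That's the busy Friday night: slower service, but the doors stay open.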

Synchronous Failover

  • It doesn't get better than this...
  • Failure is not an option and every second counts.
  • Every instruction and piece of data is duplicated.
  • Only the most critical systems need this level of resilience.

The Business Experience:

Oddly, there are few parallels to this approach in business experience. The closest is seen on production lines, where extra quality inspectors check each other's work in real time. The salaries of the additional workers are much less than the cost of stopping the line or shipping defective parts.

The Cybersecurity Experience:

This is used in financial, medical, and oil production databases. Downtime is measured in fractions of a second, and failures will go unnoticed, which is just as well when outages can cost millions per minute. You probably don't need this level of resilience.
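
To show what "every piece of data is duplicated" can look like, here's a minimal sketch. It assumes two in-memory copies and invented class names, and it isn't how any particular database does it. A write is only acknowledged once every online copy holds the data, so a read still works when one copy disappears.

```python
class Replica:
    """One copy of the data (an in-memory dict standing in for a real store)."""

    def __init__(self, name: str):
        self.name = name
        self.data = {}
        self.online = True


class SynchronousPair:
    """Writes to every online replica before acknowledging the write."""

    def __init__(self, first: Replica, second: Replica):
        self.replicas = [first, second]

    def write(self, key: str, value: str) -> int:
        online = [r for r in self.replicas if r.online]
        if not online:
            raise RuntimeError("no replica available")
        for replica in online:
            replica.data[key] = value
        return len(online)  # number of copies written before acknowledging

    def read(self, key: str) -> str:
        # Any online replica will do, because each one holds every write.
        # Not shown: a real system must resynchronise a replica that was
        # offline before it is allowed to serve reads again.
        for replica in self.replicas:
            if replica.online:
                return replica.data[key]
        raise RuntimeError("no replica available")


pair = SynchronousPair(Replica("site-a"), Replica("site-b"))
print(pair.write("order-1", "paid"))  # 2 - both copies hold the data
pair.replicas[0].online = False       # simulate losing one site
print(pair.read("order-1"))           # paid - nobody notices the failure
```

The price is in the write path: every write waits for both copies, which is part of why this tier is reserved for the most critical systems.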

Final Thoughts

Cybersecurity vendors and professionals often start by assuming zero downtime or total resilience is the answer. But other areas of business rarely operate in this way. A five-minute or even a five-hour delay may be acceptable - it's a business decision.

In challenging economic times, decisions to trim down resilience are inevitable, whether that's in people or technical processes. Knowing what level of downtime is acceptable is essential.
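
One way to keep that decision honest is to write the five tiers down as a lookup. This is a sketch under stated assumptions: the downtime figures are rough readings of the descriptions above ("hours or days", "15 minutes", "a few minutes", "sub-second"), and the exact numbers are assumed, not measured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    automatic: bool
    typical_downtime_minutes: Optional[float]  # None means "based on hope"


# Ordered from least to most elaborate. All numbers are assumptions.
TIERS = [
    Tier("No Resilience", False, None),
    Tier("Cold Standby", False, 3 * 24 * 60),    # "hours or days": 3 days assumed
    Tier("Hot Standby", False, 15),              # "Gimme 15 minutes"
    Tier("Asynchronous Failover", True, 5),      # "a few minutes": 5 assumed
    Tier("Synchronous Failover", True, 1 / 60),  # sub-second: 1 second as a ceiling
]


def lowest_tier_for(acceptable_downtime_minutes: float) -> Tier:
    """Return the least elaborate tier whose typical downtime fits the tolerance."""
    for tier in TIERS:
        downtime = tier.typical_downtime_minutes
        if downtime is not None and downtime <= acceptable_downtime_minutes:
            return tier
    return TIERS[-1]  # nothing fits: only the top tier comes close


print(lowest_tier_for(5 * 60).name)  # Hot Standby - a five-hour delay is fine
print(lowest_tier_for(5).name)       # Asynchronous Failover - five minutes at most
```

The useful input isn't the code. It's the single number you feed in, and that number is a business decision.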

August 16, 2022
5 Minutes Read
