Outages strike without warning, hitting businesses and users alike. In this episode of the SaaS Show, hosts Andreas and Sjimi dig into the recent outages at major cloud providers like Amazon and Cloudflare, sharing lessons learned to help startups make the right infrastructure choices.
Watch the SaaS Show with Sjimi and Andreas on YouTube
Cloud outages arrive with the same elegance as a falling wardrobe. One moment everything is humming along. The next, half your platform is unreachable, support inboxes are filling up, and someone is muttering about DNS again.
Recent incidents at Amazon, Cloudflare and friends reminded many teams that reliability is not an abstract concept. It is a budget line, a design choice and sometimes a gamble. Below are some distilled lessons.
Outages hurt more when you have paid for the illusion of safety
Many teams still optimise for price instead of risk. That might look clever during procurement season, but it becomes painfully short-sighted the moment your cheaper plan is the reason you are offline.
If your business loses real money when the internet sneezes, you probably want more than the bargain tier of global infrastructure. Not because expensive plans are glamorous but because resilience is not free. It certainly is not retrofittable during a crisis.
Redundancy is not magic; it is a choice
Every cloud provider has a glossy page explaining why their platform never goes down. Anyone who has been in the industry longer than a weekend knows that this is a polite fiction.
Failover DNS, secondary providers and backup paths rarely feel urgent until the day you need them. That is why risk management should be intentional. Teams must ask the dull but essential question: given our business criticality, how much risk are we actually comfortable with?
It is astonishing how many companies answer that question only after they have spent four hours in firefighting mode.
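To make "backup paths" concrete, here is a minimal sketch of client-side failover, assuming a hypothetical primary endpoint and a secondary hosted with a different provider. Real setups usually push this decision into DNS or a load balancer, but the choice it encodes is the same: decide in advance where traffic goes when the first option dies.

```python
# A minimal sketch of a client-side backup path, assuming two
# hypothetical endpoints: a primary and a secondary on a different
# provider. Names and timeouts are illustrative only.
import urllib.request

ENDPOINTS = [
    "https://api.example.com/health",         # primary (hypothetical)
    "https://api-backup.example.net/health",  # secondary, other provider
]

def fetch_with_failover(urls, timeout=3.0):
    """Return the first successful response, trying endpoints in order."""
    last_error = None
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except OSError as exc:  # covers DNS failures, timeouts, HTTP errors
            last_error = exc    # remember the failure, try the next path
    raise RuntimeError(f"all endpoints failed; last error: {last_error}")

if __name__ == "__main__":
    print(fetch_with_failover(ENDPOINTS))
```

The interesting part is not the dozen lines of code. It is that someone had to provision the second endpoint, keep it in sync, and pay for it long before anything broke.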
The internet is built on dependencies you do not control
Even the most sophisticated setups crumble when the broader ecosystem shakes. Spain blocking Cloudflare during major football matches is a helpful reminder that your carefully curated architecture can be undone by a rights dispute you have never heard of.
Dependency chains are long. They include regulators, ISPs, caching layers, CDNs, and a surprising amount of duct tape. Pretending otherwise is wishful thinking.
Operational continuity is a habit, not a rescue plan
If outages embarrass your organisation every time, the real problem is not the cloud. It is the absence of a strategy. Resilience takes ongoing investment. It means monitoring, observability, clear ownership, and thoughtful choices about where to spend money.
It also requires teams to be brutally honest about the domino effects of downtime. For many SaaS companies, outages are not an inconvenience. They are an existential threat.
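As a sketch of what that ongoing investment looks like at its smallest, the loop below polls a hypothetical health endpoint and flags sustained failures. A real setup would use a dedicated monitoring service and page an owner rather than print a line, but the habit is the same: check continuously, alert on a threshold, and know who responds.

```python
# A minimal monitoring sketch, assuming a hypothetical health endpoint.
# The URL, interval and threshold are placeholders, not recommendations.
import time
import urllib.request

CHECK_URL = "https://status.example.com/health"  # hypothetical
INTERVAL_SECONDS = 30
FAILURE_THRESHOLD = 3  # consecutive failures before alerting

def is_healthy(url, timeout=5.0):
    """One probe: True if the endpoint answers within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except OSError:
        return False

def monitor():
    consecutive_failures = 0
    while True:
        if is_healthy(CHECK_URL):
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= FAILURE_THRESHOLD:
                # Stand-in for real alerting (pager, chat, on-call rota).
                print(f"ALERT: {CHECK_URL} failing for "
                      f"{consecutive_failures} consecutive checks")
        time.sleep(INTERVAL_SECONDS)

if __name__ == "__main__":
    monitor()
```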
Listen to the SaaS Show on Spotify
Key takeaways
Cloud failures will continue to happen. The real variable is how gracefully your organisation survives them.
- Reliability scales with budget, not hope.
- Redundancy works only if you actually design for it.
- The digital ecosystem is fragile and occasionally ridiculous.
- Resilience is a practice, not an upgrade you buy during a crisis.
If your business is built on cloud foundations, understanding these trade-offs is not optional. It is the groundwork for stability in a world where outages are inevitable.