Cross-Region Redundancy Revisited

by Garth Harris | Sep 25 | Blog, CAKE News

Most developers understand the importance of redundancy at different levels in a web application’s architecture.

However, events like the Sept. 20 Amazon Web Services (AWS) outage prove that relying on a single region, even with the redundancy its availability zones provide, is still not enough. The only way to absorb outages of this kind is to spread the load across two or more regions. Cross-region redundancy requires major architectural work and added cost, but any service that claims to maintain even a 99.9% SLA must build its applications with full regional failures in mind.
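To put the 99.9% figure in perspective, the arithmetic below converts an availability percentage into a downtime budget. It is a minimal sketch that assumes a 365-day year and a 30-day month as the measurement windows; real SLAs define their own windows and exclusions.

```python
# Hours in a non-leap year (365 days x 24 hours); the window is an assumption.
HOURS_PER_YEAR = 365 * 24


def allowed_downtime_hours(sla_percent: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the maximum downtime, in hours, that an availability SLA permits over a period."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    return period_hours * (1 - sla_percent / 100)


for sla in (99.0, 99.9, 99.99):
    yearly = allowed_downtime_hours(sla)
    # A 30-day month (720 hours) is assumed for the monthly figure.
    monthly_minutes = allowed_downtime_hours(sla, period_hours=30 * 24) * 60
    print(f"{sla}% -> {yearly:.2f} h/year, {monthly_minutes:.1f} min per 30-day month")
```

Under these assumptions, 99.9% allows about 8.76 hours of downtime per year, or about 43.2 minutes per 30-day month, so a single regional outage lasting longer than that would on its own break a monthly 99.9% commitment.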

Understanding redundancy

Redundancy at the data layer has traditionally received the most emphasis, because data loss is one of the worst-case scenarios. For this reason, many database platforms support a combination of clustering, replication and mirroring to ensure that at least two copies of the data exist at all times.
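As a toy illustration of the "at least two copies" rule, the sketch below accepts a write only when a minimum number of replicas acknowledge it. The `Replica` class and `replicated_write` function are hypothetical in-memory stand-ins, not any database platform's API, and unlike a real system this version does not roll back partial writes.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory stand-in for one database node."""
    name: str
    healthy: bool = True
    data: dict = field(default_factory=dict)

    def write(self, key: str, value: str) -> bool:
        # An unhealthy node does not store or acknowledge the write.
        if not self.healthy:
            return False
        self.data[key] = value
        return True


def replicated_write(replicas: list, key: str, value: str, min_copies: int = 2) -> bool:
    """Write to every replica; report success only if at least `min_copies` acknowledged.

    Toy version: a failed write may still leave data on the healthy nodes.
    """
    acks = sum(1 for replica in replicas if replica.write(key, value))
    return acks >= min_copies


nodes = [Replica("primary"), Replica("mirror"), Replica("replica-2", healthy=False)]
print(replicated_write(nodes, "order:1", "paid"))  # two acknowledgements
```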

Redundancy for hosting and serving static content has been nearly trivial for many years, thanks to the proliferation of CDN providers and storage services like Amazon’s S3.

Finally, cloud providers have made redundancy at the services/processing layer easier with tools like load balancers, auto-scaling, distributed caching and NoSQL databases, as well as technologies like Docker and Amazon’s Lambda. It’s great to see these tools becoming standard, as it means a more resilient and reliable Internet for all.

Architecture

In a cross-region redundant architecture, the main decision for each tier and each application is whether it should operate in an active/passive or an active/active manner.

In an active/passive setup, processing occurs in only one location at a time; the secondary location takes over only if the primary fails.
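The routing decision behind active/passive can be sketched in a few lines. This is a simplified model, assuming some health-check process supplies the set of healthy regions; the function name is hypothetical and the region names are only examples.

```python
from typing import Optional


def active_passive_target(primary: str, secondary: str, healthy: set) -> Optional[str]:
    """Send all traffic to the primary; use the secondary only when the primary is unhealthy."""
    if primary in healthy:
        return primary
    if secondary in healthy:
        return secondary
    return None  # both regions are down: there is nothing left to fail over to


print(active_passive_target("us-east-1", "us-west-2", {"us-east-1", "us-west-2"}))
print(active_passive_target("us-east-1", "us-west-2", {"us-west-2"}))
```

Note that the secondary receives no traffic at all until the primary fails, which is why its readiness has to be proven through deliberate failover tests.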

In an active/active setup, processing occurs in two or more locations simultaneously. Certain application requirements and technologies can rule out an active/active architecture, but wherever possible, we highly recommend this structure. It makes more effective use of resources (idle resources carry a massive opportunity cost), and it removes the need for regularly scheduled failover tests for regional disaster recovery, because every region is already serving live traffic. It also opens up the opportunity to reduce latency worldwide by using latency-based DNS routing to send users to their closest datacenter.
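Latency-based routing in an active/active setup boils down to choosing the lowest-latency region among those that are currently healthy. The sketch below is a simplified model of that decision, not how any particular DNS provider implements it; the latency figures are hypothetical measurements for a single user.

```python
def route_request(latency_ms: dict, healthy: set) -> str:
    """Send a user to the healthy region with the lowest measured latency."""
    candidates = {region: ms for region, ms in latency_ms.items() if region in healthy}
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.get)


# Hypothetical latencies for a user on the U.S. West Coast.
latencies = {"us-east-1": 70.0, "us-west-2": 20.0}
print(route_request(latencies, {"us-east-1", "us-west-2"}))  # closest region
print(route_request(latencies, {"us-east-1"}))  # us-west-2 is down, fall back
```

Because both regions are always eligible, losing one simply narrows the candidate set; there is no separate failover path that could go untested.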

Furthermore, regardless of whether you choose active/active or active/passive, a DNS service that supports health checks with automatic failover is imperative. The main complication of active/active during a full regional outage is that the remaining region(s) must be able to handle the failed region’s load on top of their own. Fine-tuned auto-scaling makes this much easier, but we recommend load tests that double traffic volume in a short period to ensure applications can scale quickly in those worst-case scenarios.
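The capacity concern can be quantified with simple arithmetic. The sketch below assumes traffic is split evenly across regions and redistributed evenly when a region fails; the function name and utilization figures are illustrative.

```python
def utilization_after_region_loss(per_region_utilization: float, regions: int, lost: int = 1) -> float:
    """Per-region utilization once `lost` regions fail and the survivors absorb their traffic evenly."""
    if lost >= regions:
        raise ValueError("at least one region must survive")
    total_load = per_region_utilization * regions
    return total_load / (regions - lost)


# Two regions at 45% each: the survivor jumps to 90%.
print(utilization_after_region_loss(0.45, 2))
# Three regions at 45% each: the two survivors run at about 67.5%.
print(utilization_after_region_loss(0.45, 3))
```

With two regions, the survivor's load doubles, which is why a load test that doubles traffic in a short period matches the worst case. Any result above 1.0 means the surviving region(s) must scale out before they can absorb the traffic.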

Cost

The complexity of architecting for regional redundancy deters some organizations, but we recognize that cost is the main deterrent for most companies. Many people believe cost grows linearly as datacenters are added, but that holds only if every service runs active/passive with no ability to auto-scale.

If applications are architected accordingly, adding datacenters should increase cost only slightly. When you also consider the performance and reliability gains that come with latency-based DNS routing, the cost is more than justified. And if an outage does occur, the opportunity cost during the outage period can quickly dwarf the added hosting cost.
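The linear-versus-small-increase argument can be illustrated with a deliberately simplified cost model. All inputs below are hypothetical: `compute` is the cost of serving the full normal load, and `per_region_overhead` is the fixed cost of running any region at all. The model assumes active/passive means full-size standbys without auto-scaling, and it ignores cross-region data transfer and any spare headroom kept for failover.

```python
def monthly_cost(compute: float, per_region_overhead: float, regions: int, active_active: bool) -> float:
    """Rough monthly cost under a simplified, hypothetical model."""
    if active_active:
        # Auto-scaling spreads the same total load across regions, so compute is paid once.
        return compute + per_region_overhead * regions
    # Active/passive with full-size standbys: every region pays for full compute.
    return (compute + per_region_overhead) * regions


single = monthly_cost(10_000, 1_000, 1, active_active=True)
passive = monthly_cost(10_000, 1_000, 2, active_active=False)
active = monthly_cost(10_000, 1_000, 2, active_active=True)
print(single, passive, active)
```

With these made-up numbers, a single region costs 11,000, a second full-size passive region brings the total to 22,000 (linear growth), and an active/active pair costs 12,000, roughly a 9% increase.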

Whether you’re a SaaS company that must credit customers for an SLA breach or a site driven by ad revenue that misses out on hours of demand, the insurance that multiple regions provide cannot be ignored.

The CAKE Approach

At CAKE, we use DynamoDB for a few of our applications and were able to respond immediately on Sunday morning when the outage occurred. Because our applications are built to be active/active, we quickly shifted all traffic from Virginia (us-east-1) to Oregon (us-west-2), avoiding any major downtime for our clients.

Because our digital marketing tracking platform runs 24/7, CAKE understands the damage done when our clients lose service, and we take reliability and uptime very seriously. We know our clients have little patience for outages, so if we want to retain our customers, we must stay up. Bain & Company has stated that “…a 5% increase in customer retention can mean a 30% increase in profitability for the company.”

Maintaining our SLA is the CAKE engineering team’s highest priority, and that was proven on Sunday morning. We encourage SaaS customers not to accept the excuse of a provider blaming its hosting company for a regional outage and treating the loss as unavoidable. While we’ve heard of other tracking platforms taking this approach, CAKE never will. Events like Sunday’s make it clear that not all platforms can back up their claims of true reliability, and we at CAKE are proud to show our continuous effort to keep our service running at all costs.

Author

Garth Harris

As COO of the Affiliate Marketing Group, Garth's focus is to drive growth and adoption through the marketing and product teams while providing excellent customer service within the onboarding and support departments. During his 15 years in the affiliate marketing industry, Garth has held a wide range of roles, from client services manager to senior director of product engineering and, most recently, general manager at CAKE. These experiences have helped him build a deep understanding of his customers and their businesses. Outside of work, Garth is happiest behind a grill or at the beach with his family.
