System design 101

Making some notes as I learn about system design from Reddit and YouTube.

Journey of a request

API Gateways for Product Managers

A simple example is Stripe. As a PM for developer tools, you might build several services: payments, fraud detection, user services, disputes, and so on. Your customers (other developers) need to integrate with all of these. Do you make them call a separate endpoint for each service, or do they call one endpoint, api.stripe.com, and you handle the routing behind it?

So what is a gateway? Think of it as a single front door for all your services. Instead of clients calling search.yourapp.com, user.yourapp.com, booking.yourapp.com separately, they call api.yourapp.com/search, api.yourapp.com/user, etc. The gateway handles routing, security, and rate limiting.
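To make "the gateway handles routing" concrete, here is a minimal sketch of path-based routing in Python. The service hostnames and ports are invented for illustration; in practice this lives in the configuration of a gateway product (nginx, Kong, AWS API Gateway, etc.), not hand-written code.

```python
# Illustrative only: how a gateway maps a public path to an internal service.
ROUTES = {
    "/search":  "http://search-service.internal:8080",   # hypothetical backends
    "/user":    "http://user-service.internal:8080",
    "/booking": "http://booking-service.internal:8080",
}

def route(path: str) -> str:
    """Return the internal URL this public request should be proxied to."""
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix):
            return backend + path
    raise ValueError(f"no route configured for {path}")

print(route("/search/hotels?q=goa"))
# -> http://search-service.internal:8080/search/hotels?q=goa
```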

When it makes sense:

  • External-facing APIs (mobile apps, web, third-party integrations need one consistent endpoint)

  • You need centralized security and rate limiting across many services: everyone authenticates in one place, and you as the PM get usage analytics across all of them.

  • Different routes need different rules (search can handle 100 req/min, checkout only 10 req/min)
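
The last point, sketched as code: a toy fixed-window rate limiter with different per-route limits. The numbers come from the bullet above; everything else (route names, the in-memory counter) is illustrative and not how a production gateway stores counters.

```python
import time
from collections import defaultdict

# Requests allowed per minute, per route (numbers from the bullet above).
LIMITS = {"/search": 100, "/checkout": 10}
_counts = defaultdict(int)            # (route, minute-window) -> request count

def allow(route: str, now: float | None = None) -> bool:
    """Fixed-window check: is this request within the route's per-minute limit?"""
    if now is None:
        now = time.time()
    window = (route, int(now // 60))              # which one-minute bucket we're in
    if _counts[window] >= LIMITS.get(route, 60):  # assume 60/min if unconfigured
        return False                              # over the limit: reject (HTTP 429)
    _counts[window] += 1
    return True

# The 11th checkout request inside one minute gets rejected:
print([allow("/checkout", now=0.0) for _ in range(11)][-1])   # False
```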

When it's actually hurting you (and why engineers might push back):

  1. Internal service communication - If Service A calls Service B through the gateway, you're adding extra hops and latency for no reason. Say your personalization service needs user data: routing that internal call through the gateway adds a hop on every request, and during peak load (think Black Friday) this compounds. Engineers will want direct service-to-service communication here. You will lose some centralized tracking, though.

  2. Small internal tools - An analytics dashboard for 50 employees? The gateway setup takes days and adds operational complexity for minimal benefit. A simpler load balancer works fine.

  3. Real-time features - Gaming, live chat, real-time bidding. When users expect <50ms response times, every extra hop through a gateway is noticeable degradation. As a rule of thumb, most applications sit at ~100-200ms of latency.

Caching patterns for Product Managers - Cache Aside

Caching sounds deceptively simple: store frequently used data close to the user to reduce database load. But at scale, the wrong choice of caching pattern can take entire systems down (not an exaggeration).

There are three basic patterns (I'll tackle the other two in upcoming posts):

  1. Cache Aside: Your application checks cache first. If data exists, return it. If not, fetch from the database, store in cache, then return to the user.

  2. Write through (Upcoming)

  3. Read Through (Upcoming)

Cache Aside Caching (also called Lazy Loading)

Cache Aside is the most straightforward caching pattern. Here’s how it works: your application first checks the cache for data. If found (cache hit), return it immediately to the user. If not (cache miss), fetch from the database, store it in cache, then return the data. Cache Aside is perfect for read-heavy workloads where the same data gets requested repeatedly.
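
A minimal sketch of that read path, using a plain dict as the "cache" and a stubbed database call; the key format and the data are made up for illustration.

```python
cache = {}   # stand-in for Redis/Memcached

def db_get_user(user_id: str) -> dict:
    # Stubbed database read (illustrative).
    return {"id": user_id, "name": "Asha", "available": True}

def get_user(user_id: str) -> dict:
    key = f"user:{user_id}"
    if key in cache:                 # cache hit: return immediately
        return cache[key]
    user = db_get_user(user_id)      # cache miss: fetch from the database,
    cache[key] = user                # store it in the cache,
    return user                      # then return it to the caller
```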

But there is a catch (and it's a big one): data inconsistency is inherent unless you invest in sophisticated invalidation techniques. Let's say an Uber driver updates their availability. This gets updated in the database, but remember, the cache is not in the write path. So the cache now has the old data.
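
To see why, here is the same toy extended with a write: the update goes straight to the database, and the cache never hears about it. Self-contained and illustrative, not Uber's schema.

```python
db = {"42": {"id": "42", "available": True}}   # stand-in for the database
cache = {}

def get_driver(driver_id: str) -> dict:
    if driver_id in cache:                     # cache-aside read path
        return cache[driver_id]
    cache[driver_id] = dict(db[driver_id])     # copy the row into the cache
    return cache[driver_id]

get_driver("42")                      # warms the cache: available=True
db["42"]["available"] = False         # driver goes offline: write hits the DB only
print(get_driver("42")["available"])  # True -> the cache is now serving stale data
```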

Uber’s first approach was simple: 5-minute TTL (time-to-live). After 5 minutes, cached data expires and the next request fetches fresh data. If data changes during those 5 minutes, users see stale information.
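
In code, the TTL mitigation is just an expiry on the cached write; sketched here with the redis-py client (the key name and the 300-second window are illustrative). It bounds staleness to five minutes rather than eliminating it.

```python
import json
import redis   # assuming a Redis cache; any store with per-key expiry works

r = redis.Redis()

def cache_driver(driver: dict) -> None:
    # ex=300 means the entry silently disappears after 5 minutes, so the
    # next read falls through to the database and refreshes the cache.
    r.set(f"driver:{driver['id']}", json.dumps(driver), ex=300)
```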

Change Data Capture (CDC): This was Uber's solution. Instead of waiting for the TTL to expire, changes captured from the database's change log are used to refresh or invalidate the affected cache entries. While it is an improvement, CDC adds infrastructure complexity.
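
Very roughly, the CDC idea looks like this: something tails the database's change log and evicts (or refreshes) the affected cache keys as soon as a row changes. This is a hand-wavy sketch under assumed event shapes, not Uber's actual pipeline (the linked post describes their real design).

```python
cache: dict = {}   # same in-memory stand-in for the cache as above

def handle_change_event(event: dict) -> None:
    """Called once per row-change event from the database's change log
    (often delivered through something like Debezium + Kafka; assumed here)."""
    if event["table"] == "drivers":
        # Evict so the next cache-aside read repopulates with fresh data.
        cache.pop(f"driver:{event['row_id']}", None)

# Illustrative event as it might arrive off the CDC stream:
handle_change_event({"table": "drivers", "row_id": "42", "op": "update"})
```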

When Cache Aside fails, it fails quite catastrophically: Twitter

Uber: https://www.uber.com/en-BR/blog/how-uber-serves-over-40-million-reads-per-second-using-an-integrated-cache/#:~:text=Cached Reads

Twitter: https://danluu.com/cache-incidents/

At Twitter (this was before they changed their name to X), a user changed their username, but about a week later it mysteriously reverted to the old one. The root cause was a cache inconsistency in Twitter's cache-aside system.

To fully understand this, we must venture into the rabbit hole of distributed systems. Cache data is big and cannot be stored on a single server; it is in fact sharded across multiple instances. There are technical details around Twitter's Rails app that I am abstracting here, but the core of the issue was that Twitter used modulo-based cache routing. So their mapping looked something like this:

server_index = hash("user:1234") % 3  // where 3 is the number of servers

-> Let’s say this pointed to server A

Now let's say server A goes down, as nodes sometimes do. The pool is now down to 2 servers.

server_index = hash("user:1234") % 2

-> This now points to server B

Let’s go back to how cache aside works. Server B does not have the user:1234 data cached, so the system fetches the data from the database, and updates the cache. The database had the new username, so the cache on server B gets the new username. All good till now.

But when server A comes back up, it still has the stale user:1234 username cached. Now the caches and the database are out of sync, and the customer sees unpredictable, inconsistent data.
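
To put a number on how widespread this is: with modulo routing, shrinking the pool from 3 servers to 2 reroutes roughly two-thirds of all keys, so the scenario above plays out across a huge slice of the keyspace at once, not for one unlucky user. A quick check (Python's built-in hash stands in for whatever hash function the cache client actually uses):

```python
# What fraction of keys land on a different server when we go from 3 servers to 2?
keys = [f"user:{i}" for i in range(100_000)]
moved = sum(1 for k in keys if hash(k) % 3 != hash(k) % 2)
print(f"{moved / len(keys):.0%} of keys now route to a different server")
# Prints roughly 67% (string hashes are seeded per run, but the ratio holds).
```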

All the caches that carried that data needed to be invalidated. Doing so at Twitter's scale would cause thousands (or millions) of cache misses at once, hammering the database — the very thing caching was supposed to prevent.

Almost no modern system uses modulo hashing today. They use consistent hashing. 
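
For contrast, a bare-bones consistent-hash ring (with virtual nodes, purely illustrative): servers are placed at many points on a ring, each key belongs to the next point clockwise, and removing a server only remaps the keys that lived on it, roughly 1/N of the total instead of two-thirds.

```python
import bisect
import hashlib

def h(s: str) -> int:
    # Stable hash (unlike Python's built-in hash, which changes per process).
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, servers, vnodes=100):
        # Each server owns many points on the ring to even out its share.
        self.points = sorted((h(f"{s}#{i}"), s) for s in servers for i in range(vnodes))

    def server_for(self, key: str) -> str:
        idx = bisect.bisect(self.points, (h(key),)) % len(self.points)
        return self.points[idx][1]   # next point clockwise owns the key

before = Ring(["A", "B", "C"])
after = Ring(["B", "C"])             # server A goes down
keys = [f"user:{i}" for i in range(100_000)]
moved = sum(1 for k in keys if before.server_for(k) != after.server_for(k))
print(f"{moved / len(keys):.0%} of keys moved")   # ~33%, vs ~67% with modulo
```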

But the key takeaway is that once Twitter discovered this design flaw, it took them about two years to fully rebuild their cache architecture to handle these failures safely.

