# System Design Interview Guide 2025: Scalability, Load Balancing, and Database Architecture Patterns
Three years ago, I confidently deployed my first "real" application to production: a simple Flask app on a single AWS EC2 instance. It could handle maybe 50 concurrent users before falling over. Fast forward to today, and I've helped design systems that serve millions of requests per hour across multiple data centers.
The journey from "it works on my laptop" to "it works at scale" taught me that system design isn't about memorizing architectural patterns - it's about understanding trade-offs and making conscious decisions about complexity, consistency, and cost.
## Table of Contents
- The Great Crash of 2022
- System Design Fundamentals: The Building Blocks
- Caching: The Speed Multiplier
- Microservices Architecture: Breaking the Monolith
- Message Queues: Asynchronous Communication
- Real-World System Design: Coffee Shop Platform
- System Design Interview Process
- Monitoring and Observability
- Final Thoughts: Design for Reality, Not Perfection
## The Great Crash of 2022
Let me tell you about the day that taught me everything about system design. We'd built a coffee shop loyalty app at RainCity FinTech. Simple architecture: one server, one database, deployed and done. Until Seattle Coffee Week hit.
```text
Normal day:        100 users
Coffee Week Day 1: 5,000 users
Our server:        *dies dramatically*
```
Error logs:
- Database connection timeout
- Memory exhausted
- CPU at 100%
- Response times: 30+ seconds
- Users: Very angry
That incident forced me to learn system design fundamentals the hard way. Here's what I learned about building systems that don't fall over when people actually use them.
## System Design Fundamentals: The Building Blocks

### Scalability: Growing Without Breaking
Think of scalability like a coffee shop:

**Vertical Scaling (Scale Up):** get a bigger espresso machine.
- Add more CPU, RAM, and disk to the existing server
- Simple, but has hard limits
- Single point of failure

**Horizontal Scaling (Scale Out):** open more locations.
- Add more servers
- More complex, but near-unlimited growth potential
- Built-in redundancy
```python
# Example: scaling a simple web app

# Version 1: Single server (vertical scaling)
server_capacity = {
    'cpu_cores': 4,
    'ram_gb': 16,
    'max_users': 1000
}

# Version 2: Multiple servers (horizontal scaling)
server_farm = [
    {'cpu_cores': 2, 'ram_gb': 8, 'max_users': 500},
    {'cpu_cores': 2, 'ram_gb': 8, 'max_users': 500},
    {'cpu_cores': 2, 'ram_gb': 8, 'max_users': 500},
    {'cpu_cores': 2, 'ram_gb': 8, 'max_users': 500}
]
# Total capacity: 2,000 users, with redundancy
```
### Load Balancing: The Traffic Director
A load balancer is like a host at a busy restaurant, directing customers to available tables:
```nginx
# nginx.conf - simple load balancer configuration
upstream coffee_app {
    # Round-robin by default
    server app1.coffee-shop.com:5000;
    server app2.coffee-shop.com:5000;
    server app3.coffee-shop.com:5000;

    # Keep idle connections to the upstreams open (connection reuse)
    keepalive 32;
}

server {
    listen 80;
    server_name coffee-shop.com;

    location / {
        proxy_pass http://coffee_app;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Required for upstream keepalive to take effect
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
```
**Load Balancing Strategies** (sketched in code below):

- **Round Robin:** requests go to servers in order
- **Least Connections:** route to the server with the fewest active connections
- **Weighted:** some servers handle more traffic than others
- **IP Hash:** the same user always goes to the same server (sticky sessions)
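Each strategy boils down to a different server-selection function. Here's a minimal Python sketch of all four; the `BACKENDS` list and its fields are invented for illustration, not part of any real load balancer's API:

```python
import hashlib
import itertools
import random

# Illustrative backend pool (made-up hosts and fields)
BACKENDS = [
    {'host': 'app1.coffee-shop.com', 'weight': 3, 'active_connections': 0},
    {'host': 'app2.coffee-shop.com', 'weight': 1, 'active_connections': 0},
    {'host': 'app3.coffee-shop.com', 'weight': 1, 'active_connections': 0},
]

_round_robin = itertools.cycle(BACKENDS)

def round_robin():
    # Requests go to servers in order
    return next(_round_robin)

def least_connections():
    # Route to the server with the fewest active connections
    return min(BACKENDS, key=lambda b: b['active_connections'])

def weighted():
    # Servers with a higher weight receive proportionally more traffic
    return random.choices(BACKENDS, weights=[b['weight'] for b in BACKENDS])[0]

def ip_hash(client_ip):
    # The same client IP always maps to the same server (sticky sessions)
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return BACKENDS[int(digest, 16) % len(BACKENDS)]
```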
### Database Architecture: Where Data Lives
This is where things get interesting. A single database works until it doesn't:
```sql
-- The evolution of database architecture

-- Stage 1: Single database (simple)
CREATE DATABASE coffee_shop;

-- Stage 2: Master-slave replication (read scaling)
--   Master: handles writes
--   Slaves: handle reads (can have multiple)

-- Stage 3: Sharding (write scaling)
--   Shard by user ID: users 1-10000 → db1, users 10001-20000 → db2

-- Stage 4: Microservices databases
--   user_service → user_db
--   order_service → order_db
--   inventory_service → inventory_db
```
**Database Scaling Patterns:**

```python
import random

# connect_to_db is a stand-in for your database driver's connect call

# Read Replicas Pattern
class DatabaseRouter:
    def __init__(self):
        self.master = connect_to_db('master.db.coffee-shop.com')
        self.slaves = [
            connect_to_db('slave1.db.coffee-shop.com'),
            connect_to_db('slave2.db.coffee-shop.com'),
            connect_to_db('slave3.db.coffee-shop.com')
        ]

    def read(self, query):
        # Route reads to a random slave
        slave = random.choice(self.slaves)
        return slave.execute(query)

    def write(self, query):
        # All writes go to the master
        return self.master.execute(query)


# Sharding Pattern
class ShardedDatabase:
    def __init__(self):
        self.shards = {
            'shard_1': connect_to_db('shard1.db.coffee-shop.com'),  # users 1-100000
            'shard_2': connect_to_db('shard2.db.coffee-shop.com'),  # users 100001-200000
            'shard_3': connect_to_db('shard3.db.coffee-shop.com')   # users 200001-300000
        }

    def get_shard(self, user_id):
        if user_id <= 100000:
            return self.shards['shard_1']
        elif user_id <= 200000:
            return self.shards['shard_2']
        else:
            return self.shards['shard_3']

    def get_user(self, user_id):
        shard = self.get_shard(user_id)
        # Parameterized query - never interpolate user input into SQL
        return shard.execute("SELECT * FROM users WHERE id = %s", (user_id,))
```
## Caching: The Speed Multiplier
Caching is like keeping popular items at the counter instead of going to the storage room every time:
```python
# Multi-level caching strategy
import json
import redis
from functools import wraps

# Level 1: In-process cache (fastest, smallest; no TTL in this sketch)
memory_cache = {}

# Level 2: Redis (fast, medium size)
redis_client = redis.Redis(host='cache.coffee-shop.com')

# Level 3: Database (slowest, largest; connect_to_db is a placeholder)
database = connect_to_db('master.db.coffee-shop.com')

def cached_query(cache_key, ttl=300):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Try the in-process cache first
            if cache_key in memory_cache:
                return memory_cache[cache_key]

            # Then try Redis
            redis_result = redis_client.get(cache_key)
            if redis_result:
                result = json.loads(redis_result)
                memory_cache[cache_key] = result  # populate the in-process cache
                return result

            # Finally, hit the database
            result = func(*args, **kwargs)

            # Store in all cache levels
            memory_cache[cache_key] = result
            redis_client.setex(cache_key, ttl, json.dumps(result))
            return result
        return wrapper
    return decorator

@cached_query('popular_products', ttl=600)
def get_popular_products():
    return database.execute("""
        SELECT p.*, COUNT(oi.id) AS order_count
        FROM products p
        JOIN order_items oi ON p.id = oi.product_id
        WHERE oi.created_at > NOW() - INTERVAL '7 days'
        GROUP BY p.id
        ORDER BY order_count DESC
        LIMIT 10
    """)
```
**Cache Invalidation Strategies:**

```python
# Cache invalidation patterns
import json
import redis

class CacheManager:
    def __init__(self):
        self.redis = redis.Redis()

    def invalidate_user_cache(self, user_id):
        # Explicit invalidation: drop every key tied to this user
        self.redis.delete(f"user:{user_id}")
        self.redis.delete(f"user_orders:{user_id}")

    def invalidate_product_cache(self, product_id):
        # Tag-based invalidation: drop every key that includes this product
        tags = [f"product:{product_id}", "popular_products", "product_list"]
        for tag in tags:
            self.redis.delete(tag)

    def write_through_cache(self, key, value, ttl=300):
        # Write-through: update cache and database together
        self.redis.setex(key, ttl, json.dumps(value))
        database.update(value)

    def cache_aside_pattern(self, key, fetch_function, ttl=300):
        # Cache-aside: the application manages the cache
        cached = self.redis.get(key)
        if cached:
            return json.loads(cached)
        result = fetch_function()
        self.redis.setex(key, ttl, json.dumps(result))
        return result
```
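As a quick usage sketch, the cache-aside helper wraps any fetch function; the query and the `database` handle here are placeholders carried over from the sketch above:

```python
cache = CacheManager()

# First call misses the cache, runs the query, and stores the JSON result;
# later calls within the TTL are served straight from Redis.
products = cache.cache_aside_pattern(
    'popular_products',
    fetch_function=lambda: database.execute("SELECT * FROM products LIMIT 10"),
    ttl=600
)
```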
## Microservices Architecture: Breaking the Monolith
Microservices are like having specialized teams instead of one person doing everything:
```python
# Monolithic architecture (everything in one app)
class CoffeeShopApp:
    def create_user(self, user_data):
        # User management logic
        pass

    def create_order(self, order_data):
        # Order processing logic
        pass

    def process_payment(self, payment_data):
        # Payment processing logic
        pass

    def manage_inventory(self, inventory_data):
        # Inventory management logic
        pass

    def send_notifications(self, notification_data):
        # Notification logic
        pass


# Microservices architecture (separate services)

# user-service/app.py
@app.route('/users', methods=['POST'])
def create_user():
    # Only handles user-related operations
    user = User.create(request.json)
    # Publish an event for other services
    publish_event('user.created', {'user_id': user.id})
    return jsonify(user.to_dict())

# order-service/app.py
@app.route('/orders', methods=['POST'])
def create_order():
    order_data = request.json
    # Call other services over HTTP
    user = http_client.get(f'http://user-service/users/{order_data["user_id"]}')
    inventory = http_client.get(f'http://inventory-service/check/{order_data["product_id"]}')

    if inventory['available']:
        order = Order.create(order_data)
        # Hand the rest off asynchronously
        publish_event('order.created', order.to_dict())
        return jsonify(order.to_dict())
    else:
        return jsonify({'error': 'Product not available'}), 400

# payment-service/app.py
@app.route('/payments', methods=['POST'])
def process_payment():
    payment_data = request.json
    # Charge the card via an external provider
    result = stripe.Charge.create(
        amount=payment_data['amount'],
        currency='usd',
        source=payment_data['token']
    )
    if result['status'] == 'succeeded':
        publish_event('payment.completed', {
            'order_id': payment_data['order_id'],
            'amount': payment_data['amount']
        })
    return jsonify(result)
```
**Service Communication Patterns:**

```python
# Synchronous communication (HTTP/REST)
class OrderService:
    def __init__(self):
        self.user_service = HTTPClient('http://user-service')
        self.inventory_service = HTTPClient('http://inventory-service')
        self.payment_service = HTTPClient('http://payment-service')

    def create_order(self, order_data):
        # Synchronous calls - simple, but couples this service to three others
        user = self.user_service.get(f'/users/{order_data["user_id"]}')
        inventory = self.inventory_service.post('/check', {'product_id': order_data['product_id']})

        if inventory['available']:
            payment = self.payment_service.post('/charge', {
                'amount': order_data['amount'],
                'user_id': user['id']
            })
            if payment['success']:
                return Order.create(order_data)


# Asynchronous communication (message queues)
class AsyncOrderService:
    def __init__(self):
        self.message_queue = RabbitMQ('localhost')

    def create_order(self, order_data):
        # Create the order immediately
        order = Order.create(order_data)

        # Publish events for async processing
        self.message_queue.publish('order.created', {
            'order_id': order.id,
            'user_id': order.user_id,
            'product_id': order.product_id,
            'amount': order.amount
        })
        return order

    def handle_payment_completed(self, event_data):
        # Event handler for payment completion
        order = Order.get(event_data['order_id'])
        order.status = 'paid'
        order.save()
        # Trigger the next step in the workflow
        self.message_queue.publish('order.paid', order.to_dict())
```
## Message Queues: Asynchronous Communication
Message queues are like having a reliable postal service between your services:
```python
# Using Redis as a simple message queue
import json
import redis
from threading import Thread

class MessageQueue:
    def __init__(self):
        self.redis = redis.Redis()

    def publish(self, topic, message):
        self.redis.lpush(topic, json.dumps(message))

    def subscribe(self, topic, callback):
        while True:
            # Blocking pop - waits up to 5 seconds for a message
            message = self.redis.brpop(topic, timeout=5)
            if message:
                data = json.loads(message[1])
                callback(data)

# Producer (Order Service)
class OrderService:
    def __init__(self):
        self.queue = MessageQueue()

    def create_order(self, order_data):
        order = Order.create(order_data)
        # Publish an event for async processing
        self.queue.publish('order.created', {
            'order_id': order.id,
            'user_id': order.user_id,
            'product_id': order.product_id
        })
        return order

# Consumer (Email Service)
class EmailService:
    def __init__(self):
        self.queue = MessageQueue()
        self.setup_consumers()

    def setup_consumers(self):
        # Daemon thread so the consumer doesn't block process shutdown
        Thread(target=self.consume_order_events, daemon=True).start()

    def consume_order_events(self):
        self.queue.subscribe('order.created', self.send_order_confirmation)

    def send_order_confirmation(self, order_data):
        # Send an email confirmation
        user = User.get(order_data['user_id'])
        send_email(
            to=user.email,
            subject='Order Confirmation',
            template='order_confirmation',
            data=order_data
        )
```
## Real-World System Design: Coffee Shop Platform
Let me show you how I'd design a complete coffee shop platform:
```yaml
# Architecture overview
system_components:
  load_balancer:
    type: "nginx"
    function: "Route traffic to application servers"

  application_tier:
    servers: 3
    type: "Docker containers"
    auto_scaling: true
    microservices:
      - user_service
      - order_service
      - payment_service
      - inventory_service
      - notification_service

  databases:
    primary: "PostgreSQL cluster (master + 2 slaves)"
    cache: "Redis cluster"
    search: "Elasticsearch"

  message_queue: "RabbitMQ cluster"

  monitoring:
    - "Prometheus + Grafana"
    - "ELK stack for logging"

  deployment: "Kubernetes"
```
**Database Schema Design:**

```sql
-- User Service Database
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    username VARCHAR(100) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    -- Sharding key
    shard_key INT GENERATED ALWAYS AS (id % 1000) STORED
);

-- Order Service Database
-- Partitioned by date for performance. In PostgreSQL, PARTITION BY comes
-- after the column list, and the partition key must be part of the primary key.
CREATE TABLE orders (
    id SERIAL,
    user_id INT NOT NULL, -- reference to the user service (no cross-service FK)
    status VARCHAR(20) DEFAULT 'pending',
    total_amount DECIMAL(10,2) NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

-- Create monthly partitions
CREATE TABLE orders_2024_01 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
```
**API Gateway Pattern:**

```python
# api-gateway/app.py
from functools import wraps

import jwt
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)

# Service registry
SERVICES = {
    'user': 'http://user-service:5000',
    'order': 'http://order-service:5000',
    'payment': 'http://payment-service:5000',
    'inventory': 'http://inventory-service:5000'
}

def authenticate(f):
    @wraps(f)
    def decorated(*args, **kwargs):
        token = request.headers.get('Authorization')
        if not token:
            return jsonify({'error': 'No token provided'}), 401
        try:
            # Verify the JWT (load a real secret from config in production)
            payload = jwt.decode(token.split(' ')[1], 'secret', algorithms=['HS256'])
            request.user_id = payload['user_id']
        except jwt.InvalidTokenError:
            return jsonify({'error': 'Invalid token'}), 401
        return f(*args, **kwargs)
    return decorated

def rate_limit(requests_per_minute=60):
    # Placeholder: a real implementation would count requests per client
    # and return 429 when the limit is exceeded (see the sketch below)
    def decorator(f):
        @wraps(f)
        def decorated(*args, **kwargs):
            return f(*args, **kwargs)
        return decorated
    return decorator

@app.route('/api/users/<path:path>', methods=['GET', 'POST', 'PUT', 'DELETE'])
@authenticate
@rate_limit(120)  # Higher limit for user operations
def proxy_user_service(path):
    url = f"{SERVICES['user']}/{path}"
    resp = requests.request(
        method=request.method,
        url=url,
        headers=dict(request.headers),
        data=request.get_data(),
        params=request.args
    )
    return jsonify(resp.json()), resp.status_code

@app.route('/api/orders/<path:path>', methods=['GET', 'POST', 'PUT', 'DELETE'])
@authenticate
@rate_limit(30)  # Lower limit for order operations
def proxy_order_service(path):
    url = f"{SERVICES['order']}/{path}"
    # Add user context to the forwarded request
    headers = dict(request.headers)
    headers['X-User-ID'] = str(request.user_id)
    resp = requests.request(
        method=request.method,
        url=url,
        headers=headers,
        data=request.get_data(),
        params=request.args
    )
    return jsonify(resp.json()), resp.status_code

# Health check aggregation
@app.route('/health')
def health_check():
    service_health = {}
    for service_name, service_url in SERVICES.items():
        try:
            resp = requests.get(f"{service_url}/health", timeout=5)
            service_health[service_name] = resp.status_code == 200
        except requests.RequestException:
            service_health[service_name] = False

    overall_health = all(service_health.values())
    return jsonify({
        'status': 'healthy' if overall_health else 'unhealthy',
        'services': service_health
    }), 200 if overall_health else 503
```
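The `rate_limit` placeholder above is where a real limiter plugs in. Here's a minimal fixed-window sketch backed by Redis; the key scheme and per-minute window are my assumptions, not a standard API:

```python
import time
from functools import wraps

import redis
from flask import jsonify, request

rate_limit_store = redis.Redis()

def rate_limit(requests_per_minute=60):
    def decorator(f):
        @wraps(f)
        def decorated(*args, **kwargs):
            # Fixed window: one counter per client, per endpoint, per minute
            window = int(time.time() // 60)
            key = f"ratelimit:{request.remote_addr}:{request.endpoint}:{window}"
            count = rate_limit_store.incr(key)
            if count == 1:
                # First request in this window: expire the counter with it
                rate_limit_store.expire(key, 60)
            if count > requests_per_minute:
                return jsonify({'error': 'Rate limit exceeded'}), 429
            return f(*args, **kwargs)
        return decorated
    return decorator
```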
## System Design Interview Process
Here's how I approach system design problems:
### 1. Clarify Requirements (5 minutes)

**Interviewer:** "Design a coffee shop ordering system."

**Me:** "Let me clarify the requirements:

- How many customers do we expect?
- What's the expected order volume?
- Do we need real-time updates?
- Mobile app, web, or both?
- Multiple locations or a single store?
- Do we handle payments ourselves or integrate with an external provider?"
### 2. Estimate Scale (5 minutes)

```text
Assumptions:
- 1,000 customers per day per location
- 100 locations = 100,000 orders/day
- Peak hours: 10x normal traffic
- Average order: 2 items

Calculations:
- Orders/second: 100,000 / 86,400 ≈ 1.2 QPS average
- Peak QPS: ~12 QPS
- Data per order: ~1 KB
- Daily storage: ~100 MB
- 5 years of storage: ~180 GB
```
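These estimates are worth sanity-checking with a few lines of Python; this snippet just reproduces the arithmetic above:

```python
# Back-of-envelope check of the estimates above
orders_per_day = 1_000 * 100          # 1,000 orders/day/location x 100 locations
avg_qps = orders_per_day / 86_400     # seconds per day
peak_qps = avg_qps * 10               # assume peak is 10x average

bytes_per_order = 1_024               # ~1 KB per order
daily_storage_mb = orders_per_day * bytes_per_order / 1_024 ** 2
five_year_storage_gb = daily_storage_mb * 365 * 5 / 1_024

print(f"Average QPS:    {avg_qps:.1f}")                  # ~1.2
print(f"Peak QPS:       {peak_qps:.1f}")                 # ~11.6
print(f"Daily storage:  {daily_storage_mb:.0f} MB")      # ~98
print(f"5-year storage: {five_year_storage_gb:.0f} GB")  # ~174
```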
### 3. High-Level Design (10 minutes)

```text
Client Apps → Load Balancer → API Gateway → Microservices
                                                 ↓
                                           Message Queue
                                                 ↓
                                          Database Cluster
```
### 4. Detailed Component Design (15 minutes)

Focus on 2-3 critical components:

```python
# Order Processing Service
class OrderProcessor:
    def __init__(self):
        self.db = DatabaseCluster()
        self.cache = RedisCluster()
        self.queue = MessageQueue()
        self.inventory = InventoryService()
        self.payment = PaymentService()

    async def create_order(self, order_data):
        # 1. Validate inventory
        inventory_check = await self.inventory.check_availability(
            order_data['items']
        )
        if not inventory_check['available']:
            raise OutOfStockException()

        # 2. Create the order record
        order = await self.db.orders.create({
            'user_id': order_data['user_id'],
            'items': order_data['items'],
            'status': 'pending',
            'estimated_time': self.calculate_prep_time(order_data['items'])
        })

        # 3. Process payment asynchronously
        await self.queue.publish('payment.process', {
            'order_id': order.id,
            'amount': order.total_amount,
            'payment_method': order_data['payment_method']
        })

        # 4. Reserve inventory
        await self.inventory.reserve(order_data['items'], order.id)

        # 5. Cache the order for quick lookup
        await self.cache.set(f"order:{order.id}", order.to_dict(), ttl=3600)

        return order
```
### 5. Scale and Optimize (10 minutes)

**Bottlenecks and Solutions:**

1. Database writes → shard by user_id or location_id
2. Cache misses → multi-level caching strategy
3. Service failures → circuit breakers and fallbacks (see the sketch below)
4. Peak traffic → auto-scaling and queue buffering
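Circuit breakers come up in almost every design discussion, so here's a minimal sketch of the pattern; the thresholds and class names are illustrative, not from any specific library:

```python
import time

class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls are being rejected."""

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failure_count = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                # Open: fail fast instead of hammering a struggling service
                raise CircuitOpenError("circuit open, try again later")
            # Half-open: let one trial request through
            self.opened_at = None
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            if self.failure_count >= self.failure_threshold:
                self.opened_at = time.time()  # trip the breaker
            raise
        else:
            self.failure_count = 0  # a success resets the count
            return result

# Usage sketch: wrap calls to a flaky downstream service
# payment_breaker = CircuitBreaker()
# payment_breaker.call(payment_service.charge, order_id=42, amount=499)
```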
## Monitoring and Observability
A system you can't observe is a system you can't trust:
```python
# Monitoring setup
import time
from functools import wraps

from flask import request
from prometheus_client import Counter, Histogram, Gauge

# Metrics
REQUEST_COUNT = Counter('http_requests_total', 'Total HTTP requests', ['method', 'endpoint', 'status'])
REQUEST_DURATION = Histogram('http_request_duration_seconds', 'HTTP request duration')
ACTIVE_ORDERS = Gauge('active_orders_count', 'Number of active orders')
DATABASE_CONNECTIONS = Gauge('database_connections_active', 'Active database connections')

def monitor_endpoint(f):
    @wraps(f)
    def decorated(*args, **kwargs):
        start_time = time.time()
        try:
            result = f(*args, **kwargs)
            REQUEST_COUNT.labels(
                method=request.method,
                endpoint=request.endpoint,
                status='success'
            ).inc()
            return result
        except Exception:
            REQUEST_COUNT.labels(
                method=request.method,
                endpoint=request.endpoint,
                status='error'
            ).inc()
            raise
        finally:
            REQUEST_DURATION.observe(time.time() - start_time)
    return decorated
```
```python
# Structured logging
import structlog

logger = structlog.get_logger()

@app.route('/orders', methods=['POST'])
@monitor_endpoint
def create_order():
    order_data = request.json
    logger.info("Order creation started",
                user_id=order_data['user_id'],
                items_count=len(order_data['items']))
    try:
        order = OrderService.create(order_data)
        logger.info("Order created successfully",
                    order_id=order.id,
                    user_id=order.user_id,
                    total_amount=order.total_amount)
        return jsonify(order.to_dict())
    except OutOfStockException as e:
        logger.warning("Order failed - out of stock",
                       user_id=order_data['user_id'],
                       unavailable_items=e.items)
        return jsonify({'error': 'Items out of stock'}), 400
```
## Final Thoughts: Design for Reality, Not Perfection
That coffee shop loyalty app that crashed on Seattle Coffee Week taught me the most important lesson about system design: start simple, measure everything, and scale the bottlenecks.
You don't need microservices from day one. You don't need Kubernetes for 100 users. You don't need event sourcing for a blog. But you do need to understand these patterns so you can apply them when they solve real problems.
The best system design is the simplest one that meets your requirements. As you grow, you'll face new challenges - high read load, complex business logic, global distribution, regulatory compliance. Each challenge has known patterns and solutions.
Whether you're building a coffee shop app or designing systems for millions of users, the fundamentals remain the same: understand your requirements, estimate your scale, design for failure, and evolve incrementally.
Remember: every large-scale system started as a simple application that worked. The art is knowing when and how to add complexity without breaking what already works.
Currently writing this from the 10th floor of the Seattle Public Library, where I can see the city's infrastructure spreading out below - a beautiful reminder that all complex systems are built from simple, reliable components. Share your system design journey @maya_codes_pnw - from single server to global scale! 🏗️☕