The year was 2019. I was three months into my first "real" software job, cocky with my computer science degree and ready to revolutionize the industry. My manager handed me what seemed like a simple task: "We need to track user preferences for push notifications."
"Easy," I thought. "I'll just add a column to the users table."
Two hours later, I had added 47 columns to the users table. Each notification type got its own boolean column: email_marketing, sms_marketing, push_marketing, email_security, sms_security, push_security, email_product_updates, sms_product_updates... you get the idea.
My manager took one look at my pull request and said, "We need to talk."
That conversation fundamentally changed how I think about data. Today, I'm the one having that conversation with junior developers, and I've seen every creative way to destroy a database schema. Let me share what I've learned about database design that they definitely didn't teach in my algorithms class.
The Database Schema That Haunts My Dreams
Before diving into principles, let me paint you a picture of real database horror. This is a simplified version of a schema I inherited at a previous company:
CREATE TABLE users (
id BIGINT PRIMARY KEY,
name VARCHAR(255),
email VARCHAR(255),
-- Original columns end here
-- Then the chaos began...
notification_email_marketing BOOLEAN,
notification_email_security BOOLEAN,
notification_sms_marketing BOOLEAN,
notification_sms_security BOOLEAN,
notification_push_marketing BOOLEAN,
notification_push_security BOOLEAN,
-- 200 more preference columns...
billing_address_line1 VARCHAR(255),
billing_address_line2 VARCHAR(255),
billing_city VARCHAR(255),
billing_state VARCHAR(255),
billing_zip VARCHAR(255),
billing_country VARCHAR(255),
shipping_address_line1 VARCHAR(255),
shipping_address_line2 VARCHAR(255),
shipping_city VARCHAR(255),
shipping_state VARCHAR(255),
shipping_zip VARCHAR(255),
shipping_country VARCHAR(255),
last_login_ip VARCHAR(45),
last_login_timestamp TIMESTAMP,
previous_login_ip VARCHAR(45),
previous_login_timestamp TIMESTAMP,
password_reset_token VARCHAR(255),
password_reset_expires_at TIMESTAMP,
email_verification_token VARCHAR(255),
email_verified_at TIMESTAMP,
-- And it just kept going...
subscription_plan VARCHAR(50),
subscription_expires_at TIMESTAMP,
trial_started_at TIMESTAMP,
trial_ends_at TIMESTAMP,
favorite_color VARCHAR(50), -- Yes, really
favorite_food VARCHAR(100),
preferred_language VARCHAR(10),
timezone VARCHAR(50),
created_at TIMESTAMP,
updated_at TIMESTAMP
);
This table had 387 columns. I know because I counted them while crying into my coffee. Adding a new notification type required a database migration that locked the table for 3 hours. A simple user lookup was pulling 40KB of data per row.
The lesson? Database design isn't just about "making it work." It's about making it work tomorrow, next year, and when you have a million users instead of a thousand.
The First Law of Database Design: Single Responsibility
Remember that notification preferences disaster? Here's how it should have been designed from the start:
CREATE TABLE users (
id BIGINT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
CREATE TABLE notification_preferences (
id BIGINT PRIMARY KEY,
user_id BIGINT NOT NULL,
notification_type VARCHAR(50) NOT NULL,
channel VARCHAR(20) NOT NULL, -- email, sms, push
enabled BOOLEAN DEFAULT TRUE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE,
UNIQUE KEY unique_user_type_channel (user_id, notification_type, channel)
);
Now adding a new notification type is a data operation, not a schema change. Want to add "friend_request" notifications? Insert a few rows. Want to add Slack as a channel? Insert rows with channel = 'slack'. No migrations, no downtime, no tears.
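To make that concrete, here's what rolling out a hypothetical "friend_request" notification type looks like against the schema above (this assumes id is AUTO_INCREMENT; the "enable it for everyone by default" policy is also an assumption):

```sql
-- Enable the new "friend_request" type on the push channel for every
-- existing user. New notification types are rows, not columns.
INSERT INTO notification_preferences (user_id, notification_type, channel, enabled)
SELECT id, 'friend_request', 'push', TRUE
FROM users;
```

And because of the unique key on (user_id, notification_type, channel), re-running the backfill with INSERT IGNORE is safe if it gets interrupted halfway.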
Normalization: The Art of Splitting Smartly
Let me tell you about the "Great Address Refactor of 2021." Our users table had separate columns for billing and shipping addresses. Seemed reasonable until we needed to support:
- Multiple shipping addresses per user
- Saving addresses for quick reuse
- Validating addresses with postal services
- Tracking address change history
Suddenly, 12 columns became this beautiful, flexible structure:
CREATE TABLE addresses (
id BIGINT PRIMARY KEY,
user_id BIGINT NOT NULL,
type ENUM('billing', 'shipping') NOT NULL,
label VARCHAR(100), -- "Home", "Work", "Mom's House"
line1 VARCHAR(255) NOT NULL,
line2 VARCHAR(255),
city VARCHAR(100) NOT NULL,
state VARCHAR(100) NOT NULL,
postal_code VARCHAR(20) NOT NULL,
country VARCHAR(2) NOT NULL,
is_default BOOLEAN DEFAULT FALSE,
is_verified BOOLEAN DEFAULT FALSE,
verified_at TIMESTAMP NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE,
INDEX idx_user_type (user_id, type),
INDEX idx_user_default (user_id, is_default)
);
The beauty? Users can now have multiple addresses of each type, we can validate them independently, and adding address history tracking later was trivial.
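Common lookups fall out of that structure naturally. A sketch of fetching a user's default shipping address (user ID 123 is a placeholder):

```sql
SELECT label, line1, line2, city, state, postal_code, country
FROM addresses
WHERE user_id = 123
  AND type = 'shipping'
  AND is_default = TRUE;
-- Served by the idx_user_default (user_id, is_default) index.
```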
The Relationship Goldilocks Zone
I've seen two extremes in database design:
- Everything in one table (my 387-column nightmare)
- Everything in separate tables (the "let's normalize to 6th normal form" crowd)
Both are wrong. Here's a real example of finding the sweet spot.
Bad - Over-normalized:
CREATE TABLE users (id, name, email);
CREATE TABLE user_first_names (user_id, first_name);
CREATE TABLE user_last_names (user_id, last_name);
CREATE TABLE user_emails (user_id, email);
CREATE TABLE user_creation_dates (user_id, created_at);
Also Bad - Under-normalized:
CREATE TABLE users (
id, name, email,
order1_id, order1_total, order1_date,
order2_id, order2_total, order2_date,
order3_id, order3_total, order3_date
-- What happens when they place a 4th order?
);
Just Right - Properly normalized:
CREATE TABLE users (
id BIGINT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE orders (
id BIGINT PRIMARY KEY,
user_id BIGINT NOT NULL,
total DECIMAL(10,2) NOT NULL,
status ENUM('pending', 'paid', 'shipped', 'delivered') NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (user_id) REFERENCES users(id)
);
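The payoff of the properly normalized version is that a user's order history is a plain join, with no artificial cap at three orders (a sketch against the tables above; the user ID is a placeholder):

```sql
SELECT u.name, o.id AS order_id, o.total, o.status, o.created_at
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.id = 123
ORDER BY o.created_at DESC;
```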
The Index Strategy That Saved Our Bacon
Picture this: Black Friday, 2022. Traffic was 50x normal. Every page was taking 30+ seconds to load. Our database was drowning.
The problem? This innocent-looking query:
SELECT * FROM orders
WHERE user_id = 123
AND status = 'shipped'
AND created_at > '2022-11-01';
Millions of rows, no indexes. The database was doing full table scans for every request.
The fix:
CREATE INDEX idx_orders_user_status_date
ON orders (user_id, status, created_at);
Query time dropped from 30 seconds to 3 milliseconds. Site went from "completely broken" to "fastest it's ever been."
The lesson? Index your queries, but index them smart:
-- Good - Composite index matching query pattern
CREATE INDEX idx_orders_lookup ON orders (user_id, status, created_at);
-- Bad - Separate indexes that can't be combined efficiently
CREATE INDEX idx_orders_user ON orders (user_id);
CREATE INDEX idx_orders_status ON orders (status);
CREATE INDEX idx_orders_date ON orders (created_at);
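The quickest way to tell which camp your indexes fall into is EXPLAIN. A sketch (the exact output columns vary by MySQL version):

```sql
EXPLAIN SELECT * FROM orders
WHERE user_id = 123
  AND status = 'shipped'
  AND created_at > '2022-11-01';
-- With the composite index: type = "range" and key = idx_orders_lookup.
-- Without it: type = "ALL", i.e. a full table scan.
```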
Real-World Design Patterns That Actually Work
The Polymorphic Relationship Pattern
Need to add comments to both posts and videos? Don't create a separate comments table for each:
CREATE TABLE comments (
id BIGINT PRIMARY KEY,
user_id BIGINT NOT NULL,
commentable_type VARCHAR(50) NOT NULL, -- 'post' or 'video'
commentable_id BIGINT NOT NULL,
content TEXT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_commentable (commentable_type, commentable_id),
FOREIGN KEY (user_id) REFERENCES users(id)
);
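Fetching the comment thread for a given post is then one indexed lookup (post ID 42 is a placeholder):

```sql
SELECT c.content, c.created_at, u.name AS author
FROM comments c
JOIN users u ON u.id = c.user_id
WHERE c.commentable_type = 'post'
  AND c.commentable_id = 42
ORDER BY c.created_at;
```

The trade-off: commentable_id can't carry a real foreign key, since it points at different tables depending on commentable_type, so that bit of referential integrity has to be enforced in application code.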
The Soft Delete Pattern
Sometimes you need to delete data without actually deleting it:
CREATE TABLE posts (
id BIGINT PRIMARY KEY,
title VARCHAR(255) NOT NULL,
content TEXT NOT NULL,
deleted_at TIMESTAMP NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_not_deleted (deleted_at) -- For WHERE deleted_at IS NULL
);
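Deleting, reading, and restoring then look like this (a sketch against the table above; post ID 42 is a placeholder):

```sql
-- "Delete" a post by stamping it, not removing it
UPDATE posts SET deleted_at = CURRENT_TIMESTAMP WHERE id = 42;

-- Every normal read filters the tombstones out
SELECT id, title FROM posts WHERE deleted_at IS NULL;

-- Restoring is just clearing the stamp
UPDATE posts SET deleted_at = NULL WHERE id = 42;
```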
The Audit Trail Pattern
Want to track who changed what and when?
CREATE TABLE user_audit_log (
id BIGINT PRIMARY KEY,
user_id BIGINT NOT NULL,
field_name VARCHAR(100) NOT NULL,
old_value TEXT,
new_value TEXT,
changed_by BIGINT NOT NULL,
changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_user_changes (user_id, changed_at),
FOREIGN KEY (user_id) REFERENCES users(id),
FOREIGN KEY (changed_by) REFERENCES users(id)
);
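Each change becomes one row. A sketch of logging an email change and reading it back (the IDs and values are placeholders, and this assumes id is AUTO_INCREMENT; in practice the application layer or a trigger writes these rows):

```sql
INSERT INTO user_audit_log (user_id, field_name, old_value, new_value, changed_by)
VALUES (123, 'email', 'old@example.com', 'new@example.com', 456);

-- "Show me everything that changed on this account last month"
SELECT field_name, old_value, new_value, changed_at
FROM user_audit_log
WHERE user_id = 123
  AND changed_at >= '2022-10-01'
ORDER BY changed_at DESC;
```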
The Performance Landmines I've Stepped On
The N+1 Query Nightmare
This innocent Laravel code nearly brought down our production server:
$orders = Order::all(); // 10,000 orders
foreach ($orders as $order) {
echo $order->user->name; // 10,000 additional queries!
}
The schema itself was fine — the foreign keys were all there — but the application code fetched related rows one at a time. In Laravel, eager loading fixes it: `Order::with('user')->get()` runs two queries instead of 10,001. Always design with eager loading in mind:
-- Make sure your foreign keys are indexed
CREATE INDEX idx_orders_user ON orders (user_id);
The UUID vs Integer Debate
I once insisted on using UUIDs for everything because "they're more secure." Performance dropped by 40%. Turns out, UUIDs as primary keys are terrible for MySQL performance due to page splits and fragmentation.
Use UUIDs when:
- You need truly globally unique identifiers
- You're worried about ID enumeration attacks
- You're merging data from multiple sources
Use integers when:
- Performance matters more than perfect uniqueness
- You're building a typical web application
- You want simple, fast joins
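If you do need UUIDs in MySQL, you can soften the performance hit by storing them as BINARY(16) instead of CHAR(36). MySQL 8's UUID_TO_BIN with the swap flag set reorders the timestamp bits of a version-1 UUID so inserts stay roughly sequential in the index (a sketch; the api_tokens table is hypothetical):

```sql
CREATE TABLE api_tokens (
  id BINARY(16) PRIMARY KEY,
  user_id BIGINT NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Swap flag = 1 moves the time-high bits to the front,
-- improving B-tree locality versus random UUID order
INSERT INTO api_tokens (id, user_id)
VALUES (UUID_TO_BIN(UUID(), 1), 123);

-- Convert back to the readable form for display
SELECT BIN_TO_UUID(id, 1) AS token_id FROM api_tokens;
```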
The Database Migration Strategy That Saved My Career
November 2021. We needed to add a column to our users table. Simple, right? Wrong. The table had 50 million rows. A naive ALTER TABLE would lock the table for 6 hours during peak business hours.
The solution - Online schema changes:
-- Step 1: Add the column as nullable
ALTER TABLE users ADD COLUMN subscription_tier VARCHAR(20) DEFAULT 'free';
-- Step 2: Backfill in batches (via application code)
-- UPDATE users SET subscription_tier = 'premium' WHERE id BETWEEN 1 AND 10000;
-- (repeat in chunks)
-- Step 3: Make it NOT NULL once backfilled
ALTER TABLE users MODIFY subscription_tier VARCHAR(20) NOT NULL DEFAULT 'free';
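Step 2's batching can also be done in plain SQL by walking the primary key in windows, which keeps each transaction short and lets replication keep up (a sketch; the 10,000 batch size and the legacy_plan column driving the backfill are assumptions):

```sql
-- Repeat, advancing the ID window each pass, until no rows remain.
-- Small batches mean short locks and easy rollback if something breaks.
UPDATE users
SET subscription_tier = 'premium'
WHERE id BETWEEN 1 AND 10000
  AND legacy_plan = 'premium';
```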
Zero downtime, zero angry customers, zero lost revenue.
The Design Review Checklist That Prevents Disasters
Before any schema change goes live, I run through this checklist:
- Is every table properly normalized? (But not over-normalized)
- Does every foreign key have an index?
- Are all queries I expect to run indexed?
- Can this schema handle 10x the current data?
- Are column types appropriate for the data? (VARCHAR(255) for everything is lazy)
- Are there any potential race conditions?
- How will I migrate existing data?
- What happens if this migration fails halfway through?
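The foreign-key item on that checklist can even be automated. A sketch of a MySQL query that lists foreign key columns with no index starting on them (swap in your own schema name):

```sql
SELECT kcu.table_name, kcu.column_name
FROM information_schema.key_column_usage kcu
WHERE kcu.referenced_table_name IS NOT NULL
  AND kcu.table_schema = 'myapp'            -- your schema name here
  AND NOT EXISTS (
    SELECT 1
    FROM information_schema.statistics s
    WHERE s.table_schema = kcu.table_schema
      AND s.table_name = kcu.table_name
      AND s.column_name = kcu.column_name
      AND s.seq_in_index = 1                -- index must start with the FK column
  );
```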
The Bottom Line: Design for Tomorrow
Good database design isn't about following rules blindly. It's about understanding your data, anticipating growth, and building systems that can evolve.
My 387-column users table? We eventually refactored it into 12 properly normalized tables. The refactor took 6 months, but the system became 10x faster and infinitely more maintainable.
Your database is the foundation of your application. Build it like you're planning to run a billion-dollar business on it. Because one day, you might be.
The notification preferences feature that started this whole journey? It's still running in production, unchanged, four years later. That's the difference between good design and "making it work."
Now if you'll excuse me, I need to review a schema design for a new feature. The developer wants to add 23 columns to the users table. Some conversations never end.
P.S. - That favorite color column I mentioned? It's still there. We're afraid to remove it in case some ancient part of the system depends on it. That's the real lesson: every bad design decision lives forever. Make good ones from the start.