Why Database Design Matters
Bad database design once cost me two weeks of refactoring and a wave of angry users. Good design scales without drama. Here's what I learned building systems that handle millions of records.
Core Design Principles
1. Normalization vs Denormalization
Normalization (OLTP) - Remove redundancy
-- Normalized: Users and Orders separate
CREATE TABLE users (
id UUID PRIMARY KEY,
name VARCHAR(255) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE orders (
id UUID PRIMARY KEY,
user_id UUID REFERENCES users(id),
total DECIMAL(10, 2) NOT NULL,
status VARCHAR(50) NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
-- Query requires JOIN
SELECT u.name, o.total
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.status = 'completed';
Denormalization (OLAP) - Optimize for reads
-- Denormalized: Duplicate user data for fast queries
CREATE TABLE orders_denormalized (
id UUID PRIMARY KEY,
user_id UUID NOT NULL,
user_name VARCHAR(255) NOT NULL, -- Duplicated
user_email VARCHAR(255) NOT NULL, -- Duplicated
total DECIMAL(10, 2) NOT NULL,
status VARCHAR(50) NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
-- No JOIN needed
SELECT user_name, total
FROM orders_denormalized
WHERE status = 'completed';
When to denormalize:
- Read-heavy workloads
- JOINs causing performance issues
- Data rarely changes (keeping the copies in sync is the hard part; see the sketch below)
- Analytics/reporting queries
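The catch with denormalization is exactly that sync problem: every copied field is a cache you have to invalidate. A minimal sketch of one approach, assuming PostgreSQL and the two tables above (the trigger and function names are illustrative):
-- Propagate user profile changes into the denormalized orders
CREATE OR REPLACE FUNCTION sync_denormalized_user()
RETURNS TRIGGER AS $$
BEGIN
  UPDATE orders_denormalized
  SET user_name = NEW.name,
      user_email = NEW.email
  WHERE user_id = NEW.id;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_sync_denormalized
AFTER UPDATE OF name, email ON users
FOR EACH ROW
EXECUTE FUNCTION sync_denormalized_user();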
2. Indexing Strategies
B-Tree Index (default, most common)
-- Simple index
CREATE INDEX idx_orders_status ON orders(status);
-- Composite index (order matters!)
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Works only for queries that filter on a leading prefix of the index columns:
-- WHERE user_id = '...' AND status = '...' ✅
-- WHERE user_id = '...' ✅
-- WHERE status = '...' ❌ (skips the leading column, so the index can't be used efficiently)
Partial Index (smaller, faster)
-- Only index active orders
CREATE INDEX idx_active_orders ON orders(created_at)
WHERE status IN ('pending', 'processing');
-- The planner uses this index only when the query's filter implies the index's WHERE clause
SELECT * FROM orders
WHERE status = 'pending'
AND created_at > NOW() - INTERVAL '7 days'; ✅
Expression Index
-- Index on computed values
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
-- Now case-insensitive search is fast
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
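The same trick enforces case-insensitive uniqueness, which the plain UNIQUE constraint on email does not ('User@example.com' and 'user@example.com' count as different values there). A one-line sketch (the index name is illustrative):
-- One account per address, regardless of casing
CREATE UNIQUE INDEX idx_users_unique_lower_email ON users(LOWER(email));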
JSONB Index (PostgreSQL)
CREATE TABLE products (
id UUID PRIMARY KEY,
name VARCHAR(255),
metadata JSONB
);
-- GIN index for JSONB
CREATE INDEX idx_products_metadata ON products USING GIN(metadata);
-- Fast JSONB queries
SELECT * FROM products WHERE metadata @> '{"category": "electronics"}'; -- containment
SELECT * FROM products WHERE metadata ? 'brand'; -- top-level key exists
3. Partitioning for Scale
Range Partitioning (time-series data)
-- Parent table
CREATE TABLE events (
id UUID NOT NULL,
user_id UUID NOT NULL,
event_type VARCHAR(50),
created_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_at);
-- Partitions by month
CREATE TABLE events_2025_01 PARTITION OF events
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE events_2025_02 PARTITION OF events
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
CREATE TABLE events_2025_03 PARTITION OF events
FOR VALUES FROM ('2025-03-01') TO ('2025-04-01');
-- Partition pruning automatically limits the scan to matching partitions
SELECT * FROM events
WHERE created_at >= '2025-02-15'
AND created_at < '2025-02-20'; -- Only scans events_2025_02
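One gotcha worth planning for: an INSERT whose created_at falls outside every defined range fails with an error. Since PostgreSQL 11 you can add a catch-all partition; a minimal sketch (the partition name is illustrative):
-- Catches rows that match no monthly partition
CREATE TABLE events_default PARTITION OF events DEFAULT;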
List Partitioning (categorical data)
CREATE TABLE orders (
id UUID NOT NULL,
region VARCHAR(50) NOT NULL,
total DECIMAL(10, 2)
) PARTITION BY LIST (region);
CREATE TABLE orders_us PARTITION OF orders
FOR VALUES IN ('US', 'USA', 'United States');
CREATE TABLE orders_eu PARTITION OF orders
FOR VALUES IN ('UK', 'FR', 'DE', 'IT', 'ES');
CREATE TABLE orders_asia PARTITION OF orders
FOR VALUES IN ('JP', 'CN', 'IN', 'SG');
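Since PostgreSQL 11, an index created on the partitioned parent cascades to every partition, so there's no need to index each one by hand. A sketch, assuming the partitioned orders table above:
-- Created once on the parent, materialized on each partition
CREATE INDEX idx_orders_region_total ON orders(region, total);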
4. Efficient Queries
Use EXPLAIN ANALYZE
-- See actual execution plan
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name
HAVING COUNT(o.id) > 5;
-- Output shows:
-- - Index usage
-- - Join methods
-- - Actual execution time
-- - Row estimates vs actual
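When I/O is the suspect, add the BUFFERS option to see how many shared-buffer hits and disk reads each plan node caused:
-- BUFFERS reveals cache hits vs. actual disk reads
EXPLAIN (ANALYZE, BUFFERS)
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;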
Avoid N+1 Queries
// ❌ Bad: N+1 queries
async function getOrdersWithUsers(orderIds: string[]) {
  const { rows: orders } = await db.query('SELECT * FROM orders WHERE id = ANY($1)', [orderIds]);
  for (const order of orders) {
    // One extra round trip per order: N queries!
    const { rows } = await db.query('SELECT * FROM users WHERE id = $1', [order.user_id]);
    order.user = rows[0];
  }
  return orders;
}
// ✅ Good: Single query with JOIN
async function getOrdersWithUsers(orderIds: string[]) {
  const { rows } = await db.query(`
    SELECT
      o.*,
      json_build_object('id', u.id, 'name', u.name, 'email', u.email) AS "user"
    FROM orders o
    JOIN users u ON o.user_id = u.id
    WHERE o.id = ANY($1)
  `, [orderIds]);
  return rows;
}
Pagination Best Practices
-- ❌ Bad: OFFSET gets slower with large offsets
SELECT * FROM products
ORDER BY created_at DESC
LIMIT 20 OFFSET 10000; -- Scans 10,020 rows
-- ✅ Good: Keyset pagination
SELECT * FROM products
WHERE created_at < $1 -- Last seen timestamp
ORDER BY created_at DESC
LIMIT 20;
-- Or use cursor-based
SELECT * FROM products
WHERE id > $1 -- Last seen ID
ORDER BY id
LIMIT 20;
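If created_at isn't unique, plain keyset pagination can skip or repeat rows that share a timestamp. Comparing on a (created_at, id) pair breaks ties deterministically; a sketch assuming the same products table:
-- Row-value comparison: strictly "before" the last row seen
SELECT * FROM products
WHERE (created_at, id) < ($1, $2) -- Last seen (timestamp, id)
ORDER BY created_at DESC, id DESC
LIMIT 20;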
Advanced Patterns
1. Soft Deletes
-- Add deleted_at column
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP;
-- Create index excluding deleted
CREATE INDEX idx_active_users ON users(created_at)
WHERE deleted_at IS NULL;
-- "Delete" user
UPDATE users SET deleted_at = NOW() WHERE id = $1;
-- Query only active users
SELECT * FROM users WHERE deleted_at IS NULL;
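One interaction to watch: the plain UNIQUE constraint on users.email from earlier blocks a deleted user's address from ever being registered again. If that matters, replace it with a partial unique index scoped to live rows:
-- Email must be unique among non-deleted users only
CREATE UNIQUE INDEX idx_users_email_active ON users(email)
WHERE deleted_at IS NULL;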
2. Optimistic Locking
-- Add version column
CREATE TABLE products (
id UUID PRIMARY KEY,
name VARCHAR(255),
price DECIMAL(10, 2),
version INTEGER NOT NULL DEFAULT 0
);
-- Update with version check
UPDATE products
SET
price = $1,
version = version + 1
WHERE id = $2 AND version = $3
RETURNING *;
-- If affected rows = 0, someone else modified it
TypeScript implementation:
async function updateProduct(id: string, updates: Partial<Product>, currentVersion: number) {
const result = await db.query(`
UPDATE products
SET
name = COALESCE($1, name),
price = COALESCE($2, price),
version = version + 1
WHERE id = $3 AND version = $4
RETURNING *
`, [updates.name, updates.price, id, currentVersion]);
if (result.rows.length === 0) {
throw new Error('Product was modified by another user');
}
return result.rows[0];
}
3. Audit Trail
-- Main table
CREATE TABLE accounts (
id UUID PRIMARY KEY,
balance DECIMAL(10, 2) NOT NULL,
updated_at TIMESTAMP DEFAULT NOW()
);
-- Audit table
CREATE TABLE accounts_audit (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
account_id UUID NOT NULL,
old_balance DECIMAL(10, 2),
new_balance DECIMAL(10, 2),
changed_by UUID,
changed_at TIMESTAMP DEFAULT NOW()
);
-- Trigger to record changes
CREATE OR REPLACE FUNCTION audit_account_changes()
RETURNS TRIGGER AS $$
BEGIN
INSERT INTO accounts_audit (account_id, old_balance, new_balance, changed_by)
VALUES (NEW.id, OLD.balance, NEW.balance, current_setting('app.user_id', true)::UUID);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER account_audit_trigger
AFTER UPDATE ON accounts
FOR EACH ROW
EXECUTE FUNCTION audit_account_changes();
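The changed_by value comes from a session setting, so the application has to set it before writing; SET LOCAL scopes it to the current transaction. A sketch with illustrative UUIDs:
BEGIN;
-- Visible to the trigger via current_setting('app.user_id', true)
SET LOCAL app.user_id = '11111111-1111-1111-1111-111111111111';
UPDATE accounts SET balance = balance - 100 WHERE id = '22222222-2222-2222-2222-222222222222';
COMMIT;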
4. Full-Text Search
-- Add tsvector column
ALTER TABLE articles ADD COLUMN search_vector tsvector;
-- Create GIN index
CREATE INDEX idx_articles_search ON articles USING GIN(search_vector);
-- Update search vector
UPDATE articles SET search_vector =
to_tsvector('english',
coalesce(title, '') || ' ' ||
coalesce(content, '') || ' ' ||
coalesce(author, '')
);
-- Trigger to keep it updated
CREATE FUNCTION articles_search_trigger() RETURNS trigger AS $$
BEGIN
NEW.search_vector := to_tsvector('english',
coalesce(NEW.title, '') || ' ' ||
coalesce(NEW.content, '') || ' ' ||
coalesce(NEW.author, '')
);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER articles_search_update
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION articles_search_trigger();
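On PostgreSQL 12 and later, a stored generated column can replace the ALTER TABLE plus trigger above entirely; the server recomputes it on every write. A sketch using the same expression (the two-argument to_tsvector is immutable, as generated columns require):
ALTER TABLE articles ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
  to_tsvector('english',
    coalesce(title, '') || ' ' ||
    coalesce(content, '') || ' ' ||
    coalesce(author, '')
  )
) STORED;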
-- Search query with ranking
SELECT
id,
title,
ts_rank(search_vector, query) AS rank
FROM articles, plainto_tsquery('english', 'typescript react') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 10;
5. Connection Pooling
import { Pool } from 'pg';
// Create pool
const pool = new Pool({
host: process.env.DB_HOST,
database: process.env.DB_NAME,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
max: 20, // Max connections
idleTimeoutMillis: 30000, // Close idle connections
connectionTimeoutMillis: 2000,
});
// Use transactions properly
async function transferMoney(fromId: string, toId: string, amount: number) {
const client = await pool.connect();
try {
await client.query('BEGIN');
// Deduct from sender
await client.query(
'UPDATE accounts SET balance = balance - $1 WHERE id = $2',
[amount, fromId]
);
// Add to receiver
await client.query(
'UPDATE accounts SET balance = balance + $1 WHERE id = $2',
[amount, toId]
);
await client.query('COMMIT');
} catch (error) {
await client.query('ROLLBACK');
throw error;
} finally {
client.release(); // Always release!
}
}
Monitoring & Optimization
1. Identify Slow Queries
-- Enable pg_stat_statements (the extension also needs
-- shared_preload_libraries = 'pg_stat_statements' in postgresql.conf,
-- which requires a server restart)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Find slow queries
SELECT
query,
calls,
total_exec_time,
mean_exec_time,
max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Find most frequent queries
SELECT
query,
calls,
total_exec_time
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;
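After shipping a fix for a hot query, reset the accumulated counters so the before/after numbers are comparable:
-- Clears pg_stat_statements counters
SELECT pg_stat_statements_reset();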
2. Check Index Usage
-- Find unused indexes (largest first)
SELECT
schemaname,
relname AS table_name,
indexrelname AS index_name,
idx_scan AS scans
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
-- Keep "unused" indexes that back PRIMARY KEY or UNIQUE constraints:
-- they exist for correctness, not speed
-- Find duplicate indexes
SELECT
pg_size_pretty(sum(pg_relation_size(idx))::bigint) as size,
(array_agg(idx))[1] as idx1,
(array_agg(idx))[2] as idx2,
(array_agg(idx))[3] as idx3
FROM (
SELECT
indexrelid::regclass as idx,
indrelid,
indkey
FROM pg_index
) sub
GROUP BY indrelid, indkey
HAVING count(*) > 1;
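When you do remove one, drop it without blocking writers (the index name here is illustrative):
-- Avoids the exclusive lock a plain DROP INDEX takes
DROP INDEX CONCURRENTLY idx_orders_total_duplicate;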
3. Monitor Performance
// Prometheus metrics for queries
import { Histogram } from 'prom-client';
const queryDuration = new Histogram({
name: 'db_query_duration_seconds',
help: 'Database query duration',
labelNames: ['query_name']
});
async function measureQuery<T>(name: string, query: () => Promise<T>): Promise<T> {
const end = queryDuration.startTimer({ query_name: name });
try {
return await query();
} finally {
end();
}
}
// Usage
const users = await measureQuery('get_users', () =>
db.query('SELECT * FROM users WHERE active = true')
);
Real-World Results
Applied these patterns to our e-commerce platform:
Before
- Query time: 2.5s average
- Database CPU: 85% constant
- Connection pool exhausted daily
- Manual query optimization needed
After
- Query time: 120ms average (95% reduction)
- Database CPU: 30% average
- Zero connection issues
- Queries stayed fast without manual tuning, thanks to the right indexes
Database Design Checklist
- Normalized schema (3NF minimum)
- Indexes on foreign keys
- Indexes on WHERE/ORDER BY columns
- Partitioning for large tables (>10M rows)
- Connection pooling configured
- Slow query logging enabled
- Regular VACUUM and ANALYZE
- Backup strategy in place
- Monitoring and alerting setup
Conclusion
Good database design is invisible. It just works. Key principles:
- Normalize first, denormalize only when needed
- Index strategically - not too few, not too many
- Partition large tables before they become a problem
- Monitor continuously - measure everything
- Use transactions correctly
- Pool connections - never connect per query
Remember: Schema changes are expensive. Get it right early.
Happy querying! 🗄️