Related: DB Indexing - Kai Sassnowski Video
What is an Index?
An index is like a book’s index at the back - it helps you find information quickly without scanning every page.
Without index:
SELECT * FROM orders WHERE symbol = 'AAPL';
-- Scans ALL rows in the table to find AAPL orders
-- Time: O(n) - scales with table sizeWith index:
CREATE INDEX idx_orders_symbol ON orders(symbol);
SELECT * FROM orders WHERE symbol = 'AAPL';
-- Jumps directly to AAPL entries using the index
-- Time: O(log n) - much fasterHow Indexes Work
B-Tree Index (Most Common)
[M]
/ \
[E] [T]
/ \ / \
[A] [G] [K] [Z]
| | | |
... ... ... ...
- Balanced tree - all paths from root to leaf have similar length
- Sorted - data stored in sorted order
- Range queries work -
<,>,BETWEEN,LIKE 'A%' - Default in PostgreSQL, MySQL, SQL Server
Index Types
1. Single Column Index
CREATE INDEX idx_orders_user_id ON orders(user_id);2. Composite/Multi-Column Index
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);Important: Left-to-right rule
-- Index: (user_id, created_at)
-- ✅ Uses index (user_id is first column)
SELECT * FROM orders WHERE user_id = 123;
-- ✅ Uses index (both columns in order)
SELECT * FROM orders WHERE user_id = 123 AND created_at > '2024-01-01';
-- ❌ Does NOT use index (created_at alone - not leftmost)
SELECT * FROM orders WHERE created_at > '2024-01-01';
-- ❌ Does NOT use index (wrong order)
SELECT * FROM orders WHERE created_at > '2024-01-01' AND user_id = 123;3. Unique Index
CREATE UNIQUE INDEX idx_users_email ON users(email);
-- Ensures no duplicate emails4. Partial Index (PostgreSQL)
-- Index only active orders (smaller, faster)
CREATE INDEX idx_orders_active ON orders(user_id)
WHERE status = 'active';5. Covering Index (Index-Only Scan)
-- Include extra columns in index to avoid table lookup
CREATE INDEX idx_orders_covering ON orders(user_id, created_at)
INCLUDE (symbol, quantity, price);
-- Entire query satisfied from index, no table access needed
SELECT symbol, quantity, price FROM orders WHERE user_id = 123;6. Hash Index
-- Good for exact matches only
CREATE INDEX idx_orders_symbol_hash ON orders USING HASH (symbol);
SELECT * FROM orders WHERE symbol = 'AAPL'; -- ✅ Fast
SELECT * FROM orders WHERE symbol > 'AAPL'; -- ❌ Can't useIndex Strategies for OLTP
1. Index Primary Keys (Automatic)
-- Already indexed by default
-- orders.id is automatically indexed
SELECT * FROM orders WHERE id = 123; -- Fast2. Index Foreign Keys
-- If you frequently query by user_id, index it
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_positions_user_id ON positions(user_id);3. Index WHERE Clauses
-- If you frequently filter by status
CREATE INDEX idx_orders_status ON orders(status);
-- Query: SELECT * FROM orders WHERE status = 'pending';4. Index ORDER BY
-- If you frequently sort by created_at
CREATE INDEX idx_orders_created ON orders(created_at);
-- Query: SELECT * FROM orders ORDER BY created_at DESC LIMIT 10;5. Composite Index for Common Queries
-- If you often run:
SELECT * FROM orders WHERE user_id = 123 AND status = 'active';
-- Create composite index
CREATE INDEX idx_orders_user_status ON orders(user_id, status);Real Trading System Examples
Scenario 1: User’s Open Orders
-- Common query
SELECT * FROM orders
WHERE user_id = 123 AND status IN ('pending', 'filled');
-- Index
CREATE INDEX idx_orders_user_status ON orders(user_id, status);Scenario 2: Symbol Lookup
-- Common query
SELECT * FROM symbols WHERE ticker = 'AAPL';
-- Index
CREATE INDEX idx_symbols_ticker ON symbols(ticker);Scenario 3: Recent Orders
-- Common query
SELECT * FROM orders
WHERE user_id = 123
ORDER BY created_at DESC
LIMIT 10;
-- Index (ORDER BY benefits from index)
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);Scenario 4: Time-Based Analysis
-- Common query
SELECT * FROM orders
WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31';
-- Index
CREATE INDEX idx_orders_created ON orders(created_at);When NOT to Use Indexes
1. Small Tables
-- Table has 100 rows - full scan is faster than index lookup
-- Skip indexing2. Low Selectivity Columns
-- If 90% of rows have status = 'active'
SELECT * FROM orders WHERE status = 'active'; -- Returns 90% of table
-- Index on status is useless, optimizer will do full scan3. Frequently Updated Columns
-- Every update to last_login also updates the index
-- Costs write performance
CREATE INDEX idx_users_logged_in ON users(last_login);
-- Only worth it if reads far outweigh writes4. Don’t Over-Index
-- Bad: 10 indexes on orders table
CREATE INDEX idx_orders_1 ON orders(col1);
CREATE INDEX idx_orders_2 ON orders(col2);
-- ...
-- Every INSERT/UPDATE must maintain all indexes
-- Slows down writes significantlyIndex Trade-offs
| Benefit | Cost |
|---|---|
| Faster reads | Slower writes (index maintenance) |
| Faster WHERE | More storage space |
| Faster ORDER BY | More memory for index cache |
| Faster JOINs | Slower ALTER TABLE |
Rule of thumb: If you have 5+ indexes on a high-write table, reconsider.
How to Analyze Index Usage
EXPLAIN (PostgreSQL)
EXPLAIN SELECT * FROM orders WHERE user_id = 123;
-- Look for:
-- "Index Scan using idx_orders_user_id" ✅ Good - using index
-- "Seq Scan on orders" ❌ Bad - table scanEXPLAIN ANALYZE (Actual Execution)
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 123;
-- Shows actual time taken, rows returned
-- More accurate but actually runs the queryCheck Index Usage
-- PostgreSQL: What indexes exist
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'orders';
-- What queries use which indexes
SELECT query, calls, index_scans
FROM pg_stat_statements;Common Mistakes
1. Indexing After Data Load
-- Bad: Load 10M rows, then create index (slow)
INSERT INTO orders (...) SELECT ... FROM huge_table;
CREATE INDEX idx_orders_user ON orders(user_id);
-- Good: Create index first, then load data
CREATE INDEX idx_orders_user ON orders(user_id);
INSERT INTO orders (...) SELECT ... FROM huge_table;2. Wrong Column Order in Composite Index
-- Query: WHERE status = 'active' AND user_id = 123
-- Wrong order:
CREATE INDEX idx_orders_status_user ON orders(status, user_id);
-- Right order (match query pattern):
CREATE INDEX idx_orders_user_status ON orders(user_id, status);— Best practice: Put equality columns first, range last
### 3. Not Considering Query Patterns
```sql
-- If queries vary widely, multiple single-column indexes may help:
-- Query 1: WHERE user_id = 123
-- Query 2: WHERE status = 'active'
-- Query 3: WHERE created_at > '2024-01-01'
-- Could use:
CREATE INDEX idx_orders_user ON orders(user_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_created ON orders(created_at);
Summary
- Indexes = fast lookups at the cost of write performance and storage
- B-tree is the default, works for most cases
- Composite indexes follow left-to-right rule
- Index what you query - WHERE, ORDER BY, JOIN columns
- Don’t over-index - every index slows down writes
- Use EXPLAIN to verify index usage
- Partial indexes for large tables with filtered queries