Related: DB Indexing - Kai Sassnowski Video

What is an Index?

An index is like a book’s index at the back - it helps you find information quickly without scanning every page.

Without index:

SELECT * FROM orders WHERE symbol = 'AAPL';
-- Scans ALL rows in the table to find AAPL orders
-- Time: O(n) - scales with table size

With index:

CREATE INDEX idx_orders_symbol ON orders(symbol);
 
SELECT * FROM orders WHERE symbol = 'AAPL';
-- Jumps directly to AAPL entries using the index
-- Time: O(log n) - much faster

How Indexes Work

B-Tree Index (Most Common)

            [M]
          /     \
       [E]       [T]
      /  \      /   \
   [A]   [G]  [K]   [Z]
    |     |    |     |
   ...   ...  ...   ...
  • Balanced tree - all paths from root to leaf have similar length
  • Sorted - data stored in sorted order
  • Range queries work - <, >, BETWEEN, LIKE 'A%'
  • Default in PostgreSQL, MySQL, SQL Server

Index Types

1. Single Column Index

CREATE INDEX idx_orders_user_id ON orders(user_id);

2. Composite/Multi-Column Index

CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);

Important: Left-to-right rule

-- Index: (user_id, created_at)
 
-- ✅ Uses index (user_id is first column)
SELECT * FROM orders WHERE user_id = 123;
 
-- ✅ Uses index (both columns in order)
SELECT * FROM orders WHERE user_id = 123 AND created_at > '2024-01-01';
 
-- ❌ Does NOT use index (created_at alone - not leftmost)
SELECT * FROM orders WHERE created_at > '2024-01-01';
 
-- ❌ Does NOT use index (wrong order)
SELECT * FROM orders WHERE created_at > '2024-01-01' AND user_id = 123;

3. Unique Index

CREATE UNIQUE INDEX idx_users_email ON users(email);
-- Ensures no duplicate emails

4. Partial Index (PostgreSQL)

-- Index only active orders (smaller, faster)
CREATE INDEX idx_orders_active ON orders(user_id) 
WHERE status = 'active';

5. Covering Index (Index-Only Scan)

-- Include extra columns in index to avoid table lookup
CREATE INDEX idx_orders_covering ON orders(user_id, created_at) 
INCLUDE (symbol, quantity, price);
 
-- Entire query satisfied from index, no table access needed
SELECT symbol, quantity, price FROM orders WHERE user_id = 123;

6. Hash Index

-- Good for exact matches only
CREATE INDEX idx_orders_symbol_hash ON orders USING HASH (symbol);
 
SELECT * FROM orders WHERE symbol = 'AAPL';  -- ✅ Fast
SELECT * FROM orders WHERE symbol > 'AAPL';  -- ❌ Can't use

Index Strategies for OLTP

1. Index Primary Keys (Automatic)

-- Already indexed by default
-- orders.id is automatically indexed
SELECT * FROM orders WHERE id = 123;  -- Fast

2. Index Foreign Keys

-- If you frequently query by user_id, index it
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_positions_user_id ON positions(user_id);

3. Index WHERE Clauses

-- If you frequently filter by status
CREATE INDEX idx_orders_status ON orders(status);
 
-- Query: SELECT * FROM orders WHERE status = 'pending';

4. Index ORDER BY

-- If you frequently sort by created_at
CREATE INDEX idx_orders_created ON orders(created_at);
 
-- Query: SELECT * FROM orders ORDER BY created_at DESC LIMIT 10;

5. Composite Index for Common Queries

-- If you often run:
SELECT * FROM orders WHERE user_id = 123 AND status = 'active';
 
-- Create composite index
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

Real Trading System Examples

Scenario 1: User’s Open Orders

-- Common query
SELECT * FROM orders 
WHERE user_id = 123 AND status IN ('pending', 'filled');
 
-- Index
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

Scenario 2: Symbol Lookup

-- Common query
SELECT * FROM symbols WHERE ticker = 'AAPL';
 
-- Index
CREATE INDEX idx_symbols_ticker ON symbols(ticker);

Scenario 3: Recent Orders

-- Common query
SELECT * FROM orders 
WHERE user_id = 123 
ORDER BY created_at DESC 
LIMIT 10;
 
-- Index (ORDER BY benefits from index)
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);

Scenario 4: Time-Based Analysis

-- Common query
SELECT * FROM orders 
WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31';
 
-- Index
CREATE INDEX idx_orders_created ON orders(created_at);

When NOT to Use Indexes

1. Small Tables

-- Table has 100 rows - full scan is faster than index lookup
-- Skip indexing

2. Low Selectivity Columns

-- If 90% of rows have status = 'active'
SELECT * FROM orders WHERE status = 'active';  -- Returns 90% of table
-- Index on status is useless, optimizer will do full scan

3. Frequently Updated Columns

-- Every update to last_login also updates the index
-- Costs write performance
CREATE INDEX idx_users_logged_in ON users(last_login);
-- Only worth it if reads far outweigh writes

4. Don’t Over-Index

-- Bad: 10 indexes on orders table
CREATE INDEX idx_orders_1 ON orders(col1);
CREATE INDEX idx_orders_2 ON orders(col2);
-- ...
-- Every INSERT/UPDATE must maintain all indexes
-- Slows down writes significantly

Index Trade-offs

BenefitCost
Faster readsSlower writes (index maintenance)
Faster WHEREMore storage space
Faster ORDER BYMore memory for index cache
Faster JOINsSlower ALTER TABLE

Rule of thumb: If you have 5+ indexes on a high-write table, reconsider.


How to Analyze Index Usage

EXPLAIN (PostgreSQL)

EXPLAIN SELECT * FROM orders WHERE user_id = 123;
 
-- Look for:
-- "Index Scan using idx_orders_user_id"  ✅ Good - using index
-- "Seq Scan on orders"                   ❌ Bad - table scan

EXPLAIN ANALYZE (Actual Execution)

EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 123;
 
-- Shows actual time taken, rows returned
-- More accurate but actually runs the query

Check Index Usage

-- PostgreSQL: What indexes exist
SELECT indexname, indexdef 
FROM pg_indexes 
WHERE tablename = 'orders';
 
-- What queries use which indexes
SELECT query, calls, index_scans 
FROM pg_stat_statements;

Common Mistakes

1. Indexing After Data Load

-- Bad: Load 10M rows, then create index (slow)
INSERT INTO orders (...) SELECT ... FROM huge_table;
CREATE INDEX idx_orders_user ON orders(user_id);
 
-- Good: Create index first, then load data
CREATE INDEX idx_orders_user ON orders(user_id);
INSERT INTO orders (...) SELECT ... FROM huge_table;

2. Wrong Column Order in Composite Index

-- Query: WHERE status = 'active' AND user_id = 123
-- Wrong order:
CREATE INDEX idx_orders_status_user ON orders(status, user_id);
 
-- Right order (match query pattern):
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

— Best practice: Put equality columns first, range last


### 3. Not Considering Query Patterns
```sql
-- If queries vary widely, multiple single-column indexes may help:
-- Query 1: WHERE user_id = 123
-- Query 2: WHERE status = 'active'
-- Query 3: WHERE created_at > '2024-01-01'

-- Could use:
CREATE INDEX idx_orders_user ON orders(user_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_created ON orders(created_at);

Summary

  • Indexes = fast lookups at the cost of write performance and storage
  • B-tree is the default, works for most cases
  • Composite indexes follow left-to-right rule
  • Index what you query - WHERE, ORDER BY, JOIN columns
  • Don’t over-index - every index slows down writes
  • Use EXPLAIN to verify index usage
  • Partial indexes for large tables with filtered queries