DB Indexing - Kai Sassnowski Video

Link: https://www.youtube.com/watch?v=HubezKbFL7E

1. What is a Database Index?

Primary Purpose: An index is a performance tool designed to make reading data faster. Since slow queries are the leading cause of poor application performance, indexing is the most effective way to optimize.
The “Phone Book” Analogy: An index is like a phone book ordered by last name. Because the data is sorted, you can find a specific entry quickly without reading every page.
The Power of Order: Searching ordered data is exponentially more efficient than searching unordered data. It allows the database to use algorithms similar to a binary search to locate rows.

2. The B-Tree Data Structure

Most database indexes use a B-Tree (Balanced Tree) structure to maintain order and ensure predictable performance.

Structural Components

Root Node: The entry point for every search.
Sub-trees: Nodes are organized logically; values less than the parent go to the left, and values greater or equal go to the right.
Leaf Nodes: The bottom layer where the actual indexed values are stored.
Doubly Linked List: Leaf nodes are connected to each other. This allows the database to perform “range scans” (e.g., finding all IDs between 10 and 20) by finding the first value and then simply following the links to the next ones.

What is actually inside an index?

An index does not copy the whole table. It contains:

The Indexed Column(s): Only the specific data you chose to index.
The Row ID (Pointer): A reference that tells the database exactly where the rest of the row’s data is located on the disk.

Key Benefits

Logarithmic Scalability: As your data grows from 1,000 to 1,000,000 rows, the depth of the tree only increases slightly, keeping performance stable.

3. Trade-offs: The Cost of Indexing

Read vs. Write: While indexes speed up reads, they slow down writes (INSERT, UPDATE, DELETE).
Maintenance: Every time data changes, the database must also update the B-Tree and potentially “rebalance” it to keep the leaf nodes at the same level.
Rule of Thumb: Use as many indexes as necessary to support your queries, but as few as possible to avoid bloating write times and storage.

4. Analyzing the Execution Plan (`EXPLAIN`)

To see if your index is working, prepend EXPLAIN to your SQL query. The type (Access Type) column tells you how the database is accessing the data:

Access Type	Description	Performance
const / eq_ref	Finds a single value using a unique index/primary key.	Excellent (Stop optimizing here)
ref / range	Finds a starting point and scans a portion of the leaf nodes.	Good (Limits rows inspected)
index	Full Index Scan. Scans every leaf node in the index.	Poor (Better than table scan, but slow)
ALL	Full Table Scan. Reads every row and column from disk.	Worst (Avoid at all costs)

5. Common Pitfalls and Solutions

Pitfall 1: Functions in `WHERE` Clauses

Problem: WHERE YEAR(created_at) = 2013 prevents the index from being used because the database doesn’t know the result of the function until it runs it on every row.
Solution: Use raw ranges: WHERE created_at BETWEEN '2013-01-01' AND '2013-12-31'.

Pitfall 2: Missing Data (The “Hidden” Table Hit)

Problem: If you SELECT total but your index only covers created_at, the database must jump back to the main table for every single row to find the “total” value.
Solution: Index-Only Scan. Add the requested column to the index. If the index contains all the data the query needs, the database never has to touch the main table.

Pitfall 3: Incorrect Column Order (Left-to-Right Rule)

Problem: In a multi-column index (A, B, C), you can search for A or A and B, but you cannot efficiently search for B and C alone. The index is sorted by the first column primarily.
Solution: Place the most selective columns (the ones that filter out the most data) first.

Pitfall 4: The Inequality “Wall”

Problem: Once you use a range operator (>, <, BETWEEN) on a column in a multi-column index, the database cannot use the subsequent columns in that index for filtering.
Solution: Always place columns used with equality (=) first in your index, and the range column last.

6. The Developer’s Mandate

Indexes are Query-Specific: You don’t “index a table”; you “index a query.” An index’s design depends entirely on the WHERE, SELECT, and ORDER BY clauses of a specific query.
Ownership: Developers—not just DBAs—must own indexing because only the developer understands the application’s data access patterns and which performance trade-offs are acceptable.

Amadeus

Explorer

DB Indexing - Kai Sassnowski Video

Link: https://www.youtube.com/watch?v=HubezKbFL7E

1. What is a Database Index?

2. The B-Tree Data Structure

Structural Components

What is actually inside an index?

Key Benefits

3. Trade-offs: The Cost of Indexing

4. Analyzing the Execution Plan (`EXPLAIN`)

5. Common Pitfalls and Solutions

Pitfall 1: Functions in `WHERE` Clauses

Pitfall 2: Missing Data (The “Hidden” Table Hit)

Pitfall 3: Incorrect Column Order (Left-to-Right Rule)

Pitfall 4: The Inequality “Wall”

6. The Developer’s Mandate

Graph View

Table of Contents

Backlinks

All Tags

Amadeus

Explorer

DB Indexing - Kai Sassnowski Video

Link: https://www.youtube.com/watch?v=HubezKbFL7E

1. What is a Database Index?

2. The B-Tree Data Structure

Structural Components

What is actually inside an index?

Key Benefits

3. Trade-offs: The Cost of Indexing

4. Analyzing the Execution Plan (EXPLAIN)

5. Common Pitfalls and Solutions

Pitfall 1: Functions in WHERE Clauses

Pitfall 2: Missing Data (The “Hidden” Table Hit)

Pitfall 3: Incorrect Column Order (Left-to-Right Rule)

Pitfall 4: The Inequality “Wall”

6. The Developer’s Mandate

Graph View

Table of Contents

Backlinks

All Tags

4. Analyzing the Execution Plan (`EXPLAIN`)

Pitfall 1: Functions in `WHERE` Clauses