B+Tree Index Internals

Why B+Tree?

B+Tree is the de facto standard for database indexes. Compared to B-Tree:

Leaf nodes hold data; internal nodes hold only keys — shorter tree, fewer I/Os
Leaf nodes form a linked list — efficient range scans
All data at the same depth — predictable lookup performance

Node Structure

Internal Node:
┌──────────────────────────────────────────┐
│  [P0] K1 [P1] K2 [P2] ... Kn [Pn]       │
│   ↓       ↓       ↓           ↓          │
│  <K1    K1~K2   K2~K3       >Kn         │
└──────────────────────────────────────────┘

Leaf Node:
┌─────────────────────────────────────────────┐
│ [K1, Data1] [K2, Data2] ... [Kn, DataN] →  │
│                                          next │→
└─────────────────────────────────────────────┘

Core Operations

Insert

Locate target leaf node
If node has space → insert directly
If node is full → Split:
- Allocate new node
- Move half of keys to new node
- Insert separator key into parent
- Recursively propagate splits upward

Before split:  [10, 20, 30, 40, 50] (order=5, full)
               │
After split:   [10, 20, 30] → parent [40]
               [40, 50]          [10, 20, 30] [40, 50]

Delete

Locate target leaf node
Delete key
If node underflows → Merge or Redistribute

Redistribute (from right sibling):
[10, 20] → [10] (underflow)    → pull down 30 from parent, pull up 35 from right
[30, 35, 40]                    → [30, 35] [40]

Concurrency Control

Multi-threaded B+Tree access requires consistency guarantees:

Latch Crabbing Protocol

Search phase: parent unsafe → lock child, release parent
Split phase:  lock all nodes on path, propagate upward
Safe node:    won't split → release ancestor locks

Optimistic Approach

Assume tree structure doesn’t change (most ops don’t trigger splits/merges)
Use read locks during search
If modification needed → reacquire write locks
Used by both MySQL InnoDB and PostgreSQL btree indexes

Clustered vs Secondary Indexes

Type	What leaf holds	Characteristics
Clustered	Full data row	Data is index, physically ordered
Secondary	Primary key reference	Requires table lookup

MySQL InnoDB’s primary key is a clustered index — data is physically ordered by the PK. PostgreSQL does not use clustered indexes; all indexes are secondary (pointing to CTID).

B+Tree Variants

B-link Tree: leaf siblings linked via pointers (all B+Trees)
B*-Tree: split redistributes across two siblings, reducing split frequency
Fractal Tree: buffers writes in internal nodes, flushes batches to leaves

Summary

B+Tree achieves O(log n) lookups via controlled tree height and fanout
Split/merge operations are the primary write performance factor
Concurrency control must balance correctness and throughput
Clustered vs non-clustered index choice impacts query performance