Why B+Tree?
B+Tree is the de facto standard for database indexes. Compared to B-Tree:
- Leaf nodes hold data; internal nodes hold only keys — shorter tree, fewer I/Os
- Leaf nodes form a linked list — efficient range scans
- All data at the same depth — predictable lookup performance
Node Structure
Internal Node:
┌──────────────────────────────────────────┐
│ [P0] K1 [P1] K2 [P2] ... Kn [Pn] │
│ ↓ ↓ ↓ ↓ │
│ <K1 K1~K2 K2~K3 >Kn │
└──────────────────────────────────────────┘
Leaf Node:
┌─────────────────────────────────────────────┐
│ [K1, Data1] [K2, Data2] ... [Kn, DataN] → │
│ next │→
└─────────────────────────────────────────────┘
Core Operations
Insert
- Locate target leaf node
- If node has space → insert directly
- If node is full → Split:
- Allocate new node
- Move half of keys to new node
- Insert separator key into parent
- Recursively propagate splits upward
Before split: [10, 20, 30, 40, 50] (order=5, full)
│
After split: [10, 20, 30] → parent [40]
[40, 50] [10, 20, 30] [40, 50]
Delete
- Locate target leaf node
- Delete key
- If node underflows → Merge or Redistribute
Redistribute (from right sibling):
[10, 20] → [10] (underflow) → pull down 30 from parent, pull up 35 from right
[30, 35, 40] → [30, 35] [40]
Concurrency Control
Multi-threaded B+Tree access requires consistency guarantees:
Latch Crabbing Protocol
Search phase: parent unsafe → lock child, release parent
Split phase: lock all nodes on path, propagate upward
Safe node: won't split → release ancestor locks
Optimistic Approach
- Assume tree structure doesn’t change (most ops don’t trigger splits/merges)
- Use read locks during search
- If modification needed → reacquire write locks
- Used by both MySQL InnoDB and PostgreSQL btree indexes
Clustered vs Secondary Indexes
| Type | What leaf holds | Characteristics |
|---|---|---|
| Clustered | Full data row | Data is index, physically ordered |
| Secondary | Primary key reference | Requires table lookup |
MySQL InnoDB’s primary key is a clustered index — data is physically ordered by the PK. PostgreSQL does not use clustered indexes; all indexes are secondary (pointing to CTID).
B+Tree Variants
- B-link Tree: leaf siblings linked via pointers (all B+Trees)
- B*-Tree: split redistributes across two siblings, reducing split frequency
- Fractal Tree: buffers writes in internal nodes, flushes batches to leaves
Summary
- B+Tree achieves O(log n) lookups via controlled tree height and fanout
- Split/merge operations are the primary write performance factor
- Concurrency control must balance correctness and throughput
- Clustered vs non-clustered index choice impacts query performance