Why B+Tree?

B+Tree is the de facto standard for database indexes. Compared to B-Tree:

  • Leaf nodes hold data; internal nodes hold only keys — shorter tree, fewer I/Os
  • Leaf nodes form a linked list — efficient range scans
  • All data at the same depth — predictable lookup performance

Node Structure

Internal Node:
┌──────────────────────────────────────────┐
│  [P0] K1 [P1] K2 [P2] ... Kn [Pn]       │
│   ↓       ↓       ↓           ↓          │
│  <K1    K1~K2   K2~K3       >Kn         │
└──────────────────────────────────────────┘

Leaf Node:
┌─────────────────────────────────────────────┐
│ [K1, Data1] [K2, Data2] ... [Kn, DataN] →  │
│                                          next │→
└─────────────────────────────────────────────┘

Core Operations

Insert

  1. Locate target leaf node
  2. If node has space → insert directly
  3. If node is full → Split:
    • Allocate new node
    • Move half of keys to new node
    • Insert separator key into parent
    • Recursively propagate splits upward
Before split:  [10, 20, 30, 40, 50] (order=5, full)

After split:   [10, 20, 30] → parent [40]
               [40, 50]          [10, 20, 30] [40, 50]

Delete

  1. Locate target leaf node
  2. Delete key
  3. If node underflows → Merge or Redistribute
Redistribute (from right sibling):
[10, 20] → [10] (underflow)    → pull down 30 from parent, pull up 35 from right
[30, 35, 40]                    → [30, 35] [40]

Concurrency Control

Multi-threaded B+Tree access requires consistency guarantees:

Latch Crabbing Protocol

Search phase: parent unsafe → lock child, release parent
Split phase:  lock all nodes on path, propagate upward
Safe node:    won't split → release ancestor locks

Optimistic Approach

  • Assume tree structure doesn’t change (most ops don’t trigger splits/merges)
  • Use read locks during search
  • If modification needed → reacquire write locks
  • Used by both MySQL InnoDB and PostgreSQL btree indexes

Clustered vs Secondary Indexes

TypeWhat leaf holdsCharacteristics
ClusteredFull data rowData is index, physically ordered
SecondaryPrimary key referenceRequires table lookup

MySQL InnoDB’s primary key is a clustered index — data is physically ordered by the PK. PostgreSQL does not use clustered indexes; all indexes are secondary (pointing to CTID).

B+Tree Variants

  • B-link Tree: leaf siblings linked via pointers (all B+Trees)
  • B*-Tree: split redistributes across two siblings, reducing split frequency
  • Fractal Tree: buffers writes in internal nodes, flushes batches to leaves

Summary

  • B+Tree achieves O(log n) lookups via controlled tree height and fanout
  • Split/merge operations are the primary write performance factor
  • Concurrency control must balance correctness and throughput
  • Clustered vs non-clustered index choice impacts query performance

References