The Page Layout Design Challenge
Data page design faces two core tensions:
- Variable-length fields: strings, JSON, arrays — lengths are unpredictable
- In-place update vs append-only: balancing write and read performance
Slotted Pages
Slotted Pages is the most widely used page layout in relational databases — both PostgreSQL and MySQL InnoDB adopt this design.
┌──────────────────────────────────────┐
│ Page Header │
│ ┌──────────┬──────────────┐ │
│ │ LSN (8B) │ Free Space Ptr│ ... │
│ └──────────┴──────────────┘ │
├──────────────────────────────────────┤
│ │
│ Free Space (Gap) │
│ ↓ ↓ ↓ ↓ ↓ ↓ ↓ │
│ │
├──────────────────────────────────────┤
│ Tuple 2 (id=2, name="Bob") │
├──────────────────────────────────────┤
│ Tuple 1 (id=1, name="Alice") │
├──────────────────────────────────────┤
│ Item Pointers │
│ ┌─────────────────────────────────┐ │
│ │ Slot 1: offset=XXX, length=YYY │ │
│ │ Slot 2: offset=XXX, length=YYY │ │
│ └─────────────────────────────────┘ │
└──────────────────────────────────────┘
Key Features
- Tuples grow backward from page end; slot array grows forward from header
- Slot array records each tuple’s offset and length — supports variable-length fields
- Deletion just marks the slot — no data movement (vacuum reclaims space later)
- Heap-Only Tuple (HOT) updates: create new versions in-page, avoiding index updates
PostgreSQL Implementation
// src/include/storage/bufpage.h
typedef struct PageHeaderData {
PageXLogRecPtr pd_lsn; // WAL LSN
uint16 pd_checksum; // checksum
uint16 pd_flags; // flags
LocationIndex pd_lower; // start of free space
LocationIndex pd_upper; // end of free space
LocationIndex pd_special; // special space offset
uint16 pd_pagesize_version;
ShortTransactionId pd_prune_xid;
ItemIdData pd_linp[FLEXIBLE_ARRAY_MEMBER]; // row pointer array
} PageHeaderData;
Alternative Layouts
Log-Structured Pages
Never overwrite; append new versions:
Page v1: [Tuple A, Tuple B]
Page v2: [Tuple A, Tuple B', Tuple C] ← append B', C
Page v3: [Tuple A, Tuple B', Tuple C']
- Pros: extremely fast writes (sequential append), simple crash recovery
- Cons: needs compaction to merge versions, read amplification
- Used by: LSM-Tree (RocksDB), WiredTiger
PAX (Partition Attributes Across)
Columnar grouping within a page:
┌─────┬──────────┬──────────┬──────────┐
│ │ Column A │ Column B │ Column C │
│Row 1│ val_a1 │ val_b1 │ val_c1 │
│Row 2│ val_a2 │ val_b2 │ val_c2 │
│ ... │ ... │ ... │ ... │
└─────┴──────────┴──────────┴──────────┘
- Pros: efficient column-wise computation, high compression
- Cons: row-level operations require multiple memory jumps
- Used by: DuckDB, MonetDB/X100
Summary
| Scheme | Best For | Example Systems |
|---|---|---|
| Slotted Pages | OLTP (balanced R/W) | PostgreSQL, MySQL |
| Log-structured | Write-heavy | RocksDB, Cassandra |
| PAX / Columnar | Analytical queries | DuckDB, Vertica |