Why Buffer Pools?

Disk I/O is the primary performance bottleneck in databases. A random disk read costs ~0.1–1ms, while memory access is ~100ns. The buffer pool caches hot pages in memory, minimizing disk access.

Buffer Pool Architecture

┌────────────────────────────────────────────┐
│              Buffer Pool                    │
│                                            │
│  ┌──────┐ ┌──────┐ ┌──────┐     ┌──────┐  │
│  │Frame 0│ │Frame 1│ │Frame 2│ ... │Frame N│ │
│  │ Page  │ │ Page  │ │ Free  │     │ Page  │ │
│  │  #102 │ │  #55  │ │       │     │  #204 │ │
│  │usage=3│ │usage=1│ │       │     │usage=2│ │
│  └──────┘ └──────┘ └──────┘     └──────┘  │
│                                            │
│  ┌──────────────────────────────┐          │
│  │      Clock Algorithm          │          │
│  │   Hand → Frame 2             │          │
│  └──────────────────────────────┘          │
└────────────────────────────────────────────┘

Page Lifecycle

[Disk] ──read──→ [Buffer Pool Free Frame]

                     ├── Pin (refcount +1)


               [Active Page]

                     ├── Unpin (refcount -1)


               [Evictable Page]

              ┌──────┴──────┐
              │              │
         (Evicted)     (Re-pinned)
              │              │
              ▼              ▼
          [Disk]      [Active Page]

Replacement Algorithms

Clock

See PostgreSQL Clock-sweep Algorithm Walkthrough for a deep dive.

Core idea: treat the buffer pool as a circular buffer with a clock hand. On each eviction attempt:

  1. Advance hand
  2. If current page usage_count > 0: decrement and continue
  3. If usage_count == 0: select as victim, flush dirty page, replace

LRU-K

An enhanced LRU variant that tracks timestamps of the last K accesses, predicting the next access time. Better than pure LRU at preventing sequential scan pollution.

Concurrency Control

Buffer pool concurrency must balance performance and correctness:

Mechanism Granularity Purpose
Latch Per-frame Protects page content read/write
Partitioned lock Frame groups Reduces global contention
Lock-free (CAS) Per-frame High-performance scenarios

PostgreSQL uses a hybrid of per-frame LWLock + atomic state bits.

Prefetching

For predictable access patterns like sequential scans and index scans, proactively load likely-needed pages:

Sequential scan prefetch:
Scan Page 0 → background load Pages 1–3
Scan Page 1 → cache hit ✓
Scan Page 2 → cache hit ✓

Summary

  • Buffer pools are the backbone of database performance
  • Clock algorithm is a practical concurrent replacement for LRU
  • Multi-level locking (partitioned + per-frame) is a critical optimization
  • Prefetching significantly boosts sequential scan performance

References