What is a Storage Engine?
A storage engine is the core component of a DBMS responsible for organizing, accessing, and persisting data between disk and memory. It directly determines read/write performance, space utilization, and data durability.
Core Responsibilities
┌──────────────────────────────────────────┐
│ Storage Engine │
├──────────────────────────────────────────┤
│ Page Layout │ Buffer Pool│ File Layout │
├──────────────┼───────────┼──────────────┤
│ WAL │ Vacuum │ Compression │
└──────────────────────────────────────────┘
1. Page Management
Databases divide data into fixed-size pages (typically 4KB–16KB), the smallest unit of I/O. Page layout determines:
- How a row is stored
- How variable-length fields are handled
- How fast lookups work
Common layouts:
- Slotted Pages: an array of slot pointers at page end → row offsets (PostgreSQL, MySQL)
- Log-structured: append-only, never overwrite (LSM-Tree, RocksDB)
- PAX: columnar grouping within a page (DuckDB)
2. Buffer Pool
The buffer pool is an in-memory page cache that reduces disk I/O. Core challenges:
- Replacement policy: which page to evict (LRU, Clock, LRU-K)
- Concurrency: fine-grained locking for multi-threaded access
- Prefetching: loading likely-needed pages proactively
3. File Organization
How data is laid out on disk:
- Heap File: pages linked into a list, supports random read/write
- B+Tree: the primary key index is the data store (MySQL InnoDB clustered index)
- LSM-Tree: sequential writes + background compaction, write-optimized
Why Storage Engine Choices Matter
Different engines optimize different trade-offs:
| Workload | Recommended Engine | Reason |
|---|---|---|
| OLTP (high-concurrency) | B+Tree (InnoDB) | Fast point queries, rich transactional support |
| OLAP (analytical) | Columnar (DuckDB) | Scan-heavy, column-wise efficient |
| Log/Time-series writes | LSM (RocksDB) | Extremely fast sequential writes |
| In-memory | Memory B+Tree / Skip List | Zero disk I/O overhead |
What’s in This Section
This chapter series covers storage engine components in depth:
- Buffer pool management and page replacement algorithms
- Page layouts: Slotted Pages vs log-structured
- Data file organization methods
- Compression and encoding techniques
References
- PostgreSQL Docs: Database Physical Storage
- MySQL Docs: InnoDB Storage Engine
- Alex Petrov, Database Internals, Chapter 1–3