TL;DR
CMU 15-445/645, taught by Prof. Andy Pavlo, is CMU's introductory database systems course, cross-listed for upper-level undergraduates (15-445) and graduate students (15-645). It covers buffer pools, B+Tree indexes, query optimization, MVCC, and distributed transactions. This guide maps out a learning path through the lectures and projects.
Course Overview
| Item | Details |
|---|---|
| Course Site | 15445.courses.cs.cmu.edu |
| Videos | YouTube “CMU Database Group” |
| Instructor | Andy Pavlo |
| Difficulty | Upper-level undergraduate / graduate; C++ and OS basics recommended |
Core Modules
1. Storage Engine (Lectures 3-6)
- Buffer Pool Management: Clock algorithm vs LRU-K
- Page Layout: Slotted pages, log-structured storage
- Project #1: Build a Buffer Pool Manager
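To make the LRU-K idea concrete, here is a minimal sketch of a replacer that evicts the frame with the largest backward K-distance (the gap between now and its K-th most recent access). BusTub itself is written in C++; Python is used here only to keep the sketch short. The class and method names are illustrative assumptions rather than the project's required interface, and the tie-breaking rule (frames with fewer than K accesses go first, oldest access first) is one common choice.

```python
from collections import deque


class LRUKReplacer:
    """Toy LRU-K replacer (hypothetical interface, not BusTub's API)."""

    def __init__(self, k: int):
        self.k = k
        self.clock = 0          # logical timestamp, bumped on every access
        self.history = {}       # frame_id -> last k access timestamps
        self.evictable = set()  # frames that are currently unpinned

    def record_access(self, frame_id: int) -> None:
        self.clock += 1
        hist = self.history.setdefault(frame_id, deque(maxlen=self.k))
        hist.append(self.clock)

    def set_evictable(self, frame_id: int, evictable: bool) -> None:
        if evictable:
            self.evictable.add(frame_id)
        else:
            self.evictable.discard(frame_id)

    def evict(self):
        """Return the victim frame id, or None if nothing can be evicted."""
        candidates = [f for f in self.evictable if f in self.history]
        if not candidates:
            return None

        def priority(frame_id):
            hist = self.history[frame_id]
            # Fewer than k accesses means an infinite backward K-distance,
            # so those frames sort first (False < True).
            has_k_accesses = len(hist) == self.k
            # hist[0] is the k-th most recent access (or the oldest kept one);
            # the smallest timestamp has the largest backward distance.
            return (has_k_accesses, hist[0])

        victim = min(candidates, key=priority)
        self.evictable.discard(victim)
        del self.history[victim]
        return victim
```

Compared with the Clock algorithm, which keeps only a single reference bit per frame, this policy tracks K timestamps per frame. The extra bookkeeping is what lets it resist a one-off sequential scan flushing the hot pages.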
2. Indexing (Lectures 7-10)
- B+Tree: Insert, delete, merge, split
- Hash Indexes: Extendible Hashing, Linear Hashing
- Project #2: Implement a B+Tree index
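As a rough illustration of extendible hashing, the sketch below keeps a directory of bucket pointers indexed by the low bits of the hash, splits only the bucket that overflows, and doubles the directory when that bucket's local depth already equals the global depth. It is a simplified in-memory Python model under assumed names (`Bucket`, `ExtendibleHashTable`), not the page-based C++ structure the course asks for, and it does not guard against many keys sharing one full hash value.

```python
class Bucket:
    def __init__(self, local_depth: int, capacity: int):
        self.local_depth = local_depth
        self.capacity = capacity
        self.items = {}


class ExtendibleHashTable:
    """Toy extendible hash table (illustrative names, in-memory only)."""

    def __init__(self, bucket_capacity: int = 2):
        self.global_depth = 0
        self.directory = [Bucket(0, bucket_capacity)]

    def _index(self, key) -> int:
        # The low global_depth bits of the hash select the directory slot.
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key, default=None):
        return self.directory[self._index(key)].items.get(key, default)

    def put(self, key, value) -> None:
        while True:
            bucket = self.directory[self._index(key)]
            if key in bucket.items or len(bucket.items) < bucket.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)  # may need several rounds for skewed keys

    def _split(self, bucket: Bucket) -> None:
        if bucket.local_depth == self.global_depth:
            # Double the directory; the new half aliases the existing buckets.
            self.directory += self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth, bucket.capacity)
        high_bit = 1 << (bucket.local_depth - 1)
        # Slots whose newly significant bit is set now point at the sibling.
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = sibling
        # Redistribute the old entries between the two buckets.
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            self.directory[self._index(k)].items[k] = v
```

The point to notice is that a split touches one bucket and some directory pointers; unlike a classic static hash table, the other buckets are never rehashed.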
3. Query Engine (Lectures 11-14)
- Query Optimization: Rule-based (RBO) → cost-based (CBO), Selinger optimizer
- Execution Models: Volcano model, vectorized, JIT
- Project #3: Implement query executors (Join, Aggregation)
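The Volcano (iterator) model can be sketched in a few lines: every operator exposes a `next()` call that returns one tuple at a time, and a parent pulls from its child until it sees an end marker. The operators below (`SeqScan`, `Filter`, `HashAggregate`) and the `execute` helper are hypothetical Python stand-ins for the C++ executors written in the project; they use `None` as the end-of-stream marker and only support a SUM aggregate.

```python
class SeqScan:
    """Leaf operator: emits the rows of an in-memory table one at a time."""

    def __init__(self, rows):
        self.rows = rows
        self.pos = 0

    def next(self):
        if self.pos >= len(self.rows):
            return None  # end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row


class Filter:
    """Pulls from its child until a row satisfies the predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def next(self):
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None


class HashAggregate:
    """Pipeline breaker: drains its child before emitting any group (SUM only)."""

    def __init__(self, child, key, value):
        self.child = child
        self.key = key
        self.value = value
        self.output = None

    def next(self):
        if self.output is None:
            groups = {}
            while (row := self.child.next()) is not None:
                k = self.key(row)
                groups[k] = groups.get(k, 0) + self.value(row)
            self.output = iter(groups.items())
        return next(self.output, None)


def execute(plan):
    """Drive the plan from the root, collecting every emitted tuple."""
    results = []
    while (row := plan.next()) is not None:
        results.append(row)
    return results
```

A plan is built by nesting operators, for example `HashAggregate(Filter(SeqScan(rows), pred), key, value)`. The one-tuple-per-call overhead visible here is what vectorized execution reduces by passing batches of tuples instead.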
4. Transactions & Concurrency (Lectures 15-18)
- Timestamp Ordering & MVCC: Basic timestamp ordering, OCC, multi-versioning
- Two-Phase Locking: Deadlock detection, hierarchical locks
- Project #4: Implement concurrency control (a 2PL lock manager or MVCC, depending on the semester)
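Deadlock detection under two-phase locking usually comes down to finding a cycle in the waits-for graph, where an edge T1 → T2 means T1 is blocked on a lock held by T2. The sketch below is a plain depth-first search in Python; the function names are illustrative, and the victim policy (abort the transaction with the largest id, assuming larger ids are younger) is an assumption that a real system may replace with another heuristic.

```python
def find_cycle(waits_for):
    """Return the transaction ids on one cycle of the graph, or None.

    waits_for maps a transaction id to the set of ids it is waiting on.
    Nodes are visited in sorted order so the result is deterministic.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {}
    stack = []

    def dfs(txn):
        color[txn] = GRAY
        stack.append(txn)
        for nxt in sorted(waits_for.get(txn, ())):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                # Back edge: everything from nxt to the top of the stack
                # forms a cycle.
                return stack[stack.index(nxt):]
            if state == WHITE:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        stack.pop()
        color[txn] = BLACK
        return None

    for txn in sorted(waits_for):
        if color.get(txn, WHITE) == WHITE:
            cycle = dfs(txn)
            if cycle:
                return cycle
    return None


def choose_victim(waits_for):
    """Pick the transaction to abort: the largest id on the cycle (assumed youngest)."""
    cycle = find_cycle(waits_for)
    return max(cycle) if cycle else None
```

A background thread would typically rebuild the graph from the lock manager's request queues, call something like `choose_victim`, abort that transaction, and repeat until no cycle remains.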
Study Tips
- Start with Lectures 1-2 for the course overview, relational model, and SQL; brush up on C++ with Project #0 (the C++ primer) before Project #1
- Per module: watch video → read assigned paper → code Project
- Don’t skip the papers: ARIES, Raft, and Spanner are essential reading
- Use BusTub tests as the ground truth when debugging Projects
Companion Resources
| Resource | Link |
|---|---|
| BusTub Source | github.com/cmu-db/bustub |
| Lecture Slides | Download from course site |
| Community Notes | GitHub topic: cmu-15445 |
| Discord | CMU Database Group Discord |
Advanced Courses
After 15-445, consider:
- CMU 15-721: Advanced Database Systems (columnar, JIT, HTAP)
- CMU 15-826: Multimedia Databases & Data Mining
Summary
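CMU 15-445 is a rigorous, freely available route into database internals. Complete all 4 Projects and you’ll have implemented the core components of a disk-based relational database engine (buffer pool, index, query executors, and concurrency control) on top of the BusTub skeleton.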