TL;DR

CMU 15-445/645, taught by Prof. Andy Pavlo, is a graduate-level database systems course covering buffer pools, B+Tree indexes, query optimization, MVCC, and distributed transactions. This guide organizes the course's modules, projects, and companion resources into a study path.

Course Overview

Item         Details
Course Site  15445.courses.cs.cmu.edu
Videos       YouTube “CMU Database Group”
Instructor   Andy Pavlo
Difficulty   Graduate level; OS basics recommended

Core Modules

1. Storage Engine (Lectures 3-6)

  • Buffer Pool Management: Clock algorithm vs LRU-K
  • Page Layout: Slotted pages, log-structured storage
  • Project #1: Build a Buffer Pool Manager
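
The Clock policy named above can be sketched in a few lines. This is a minimal illustration in Python rather than the C++ used by the course projects, and the class and method names (`ClockReplacer`, `pin`, `unpin`, `victim`) are assumptions chosen for readability, not the BusTub interface. Each evictable frame carries a reference bit; the hand clears set bits as it sweeps and evicts the first evictable frame whose bit is already clear.

```python
class ClockReplacer:
    """Clock (second-chance) replacement over a fixed number of frames."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.evictable = [False] * num_frames  # True once a frame is unpinned
        self.ref_bit = [False] * num_frames    # second-chance bit per frame
        self.hand = 0                          # current clock hand position

    def unpin(self, frame_id):
        """Mark a frame as evictable and give it a second chance."""
        self.evictable[frame_id] = True
        self.ref_bit[frame_id] = True

    def pin(self, frame_id):
        """Mark a frame as in use so it cannot be chosen as a victim."""
        self.evictable[frame_id] = False

    def victim(self):
        """Return a frame id to evict, or None if every frame is pinned."""
        if not any(self.evictable):
            return None
        while True:
            frame = self.hand
            self.hand = (self.hand + 1) % self.num_frames
            if not self.evictable[frame]:
                continue                       # pinned frames are skipped
            if self.ref_bit[frame]:
                self.ref_bit[frame] = False    # use up the second chance
            else:
                self.evictable[frame] = False  # evict and stop tracking
                return frame
```

LRU-K differs in that it tracks the timestamps of the last K accesses per frame and evicts the frame with the largest backward K-distance, which costs more bookkeeping than a single bit per frame.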

2. Indexing (Lectures 7-10)

  • B+Tree: Insert, delete, merge, split
  • Hash Indexes: Extendible Hashing, Linear Hashing
  • Project #2: Implement a B+Tree index
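
As a rough illustration of the split step listed above, the sketch below inserts a key into a sorted leaf and splits the leaf when it overflows. The function name `insert_into_leaf` and the `max_keys` parameter are hypothetical; a real B+Tree leaf also stores values and sibling pointers and must propagate the separator into the parent, all of which are omitted here.

```python
import bisect


def insert_into_leaf(keys, key, max_keys):
    """Insert key into a sorted leaf, splitting it if it overflows.

    Returns (left_keys, right_keys, separator). When no split is needed,
    right_keys and separator are both None.
    """
    keys = list(keys)            # work on a copy; the caller's list is untouched
    bisect.insort(keys, key)     # keep the leaf sorted
    if len(keys) <= max_keys:
        return keys, None, None

    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    # On a leaf split the separator is copied up: it stays in the right leaf
    # and a copy goes to the parent. Inner-node splits move the key up instead.
    return left, right, right[0]
```

For example, inserting 25 into a full leaf `[10, 20, 30]` with `max_keys=3` yields the leaves `[10, 20]` and `[25, 30]` with 25 as the separator for the parent.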

3. Query Engine (Lectures 11-14)

  • Query Optimization: RBO → CBO, Selinger optimizer
  • Execution Models: Volcano model, vectorized, JIT
  • Project #3: Implement query executors (Join, Aggregation)
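
The Volcano (iterator) model mentioned above can be sketched as operators that each expose a `next()` call and pull one tuple at a time from their child. The operator classes below (`SeqScan`, `Filter`, `Projection`) and the `run` helper are illustrative names, not the executor API of the course projects, and rows are plain Python tuples.

```python
class SeqScan:
    """Leaf operator: emits the rows of an in-memory table one at a time."""

    def __init__(self, rows):
        self.rows = rows
        self.pos = 0

    def next(self):
        if self.pos >= len(self.rows):
            return None          # None signals end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row


class Filter:
    """Pulls from its child until a row satisfies the predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def next(self):
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None


class Projection:
    """Keeps only the requested column positions of each row."""

    def __init__(self, child, columns):
        self.child = child
        self.columns = columns

    def next(self):
        row = self.child.next()
        if row is None:
            return None
        return tuple(row[c] for c in self.columns)


def run(plan):
    """Drain the root operator, collecting every row it produces."""
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    return out
```

A plan is built by nesting operators, for instance `Projection(Filter(SeqScan(rows), pred), [0, 1])`. A vectorized engine keeps the same pull structure but has each `next()` return a batch of rows, which amortizes the per-call overhead.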

4. Transactions & Concurrency (Lectures 15-18)

  • MVCC: Timestamp ordering, OCC
  • Two-Phase Locking: Deadlock detection, hierarchical locks
  • Project #4: Implement 2PL and MVCC
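
Deadlock detection under two-phase locking is commonly framed as finding a cycle in a waits-for graph, where an edge from T1 to T2 means T1 is waiting for a lock that T2 holds. The sketch below assumes the graph is given as a dictionary of transaction ids to sets of ids; `find_cycle` and `choose_victim` are hypothetical helper names, and aborting the transaction with the largest id is only one possible victim policy (it assumes ids grow with start time).

```python
def find_cycle(waits_for):
    """Return a list of transaction ids forming a cycle, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on the DFS path, finished
    color = {}
    path = []

    def dfs(txn):
        color[txn] = GRAY
        path.append(txn)
        # Sorted order makes the result deterministic.
        for other in sorted(waits_for.get(txn, ())):
            state = color.get(other, WHITE)
            if state == GRAY:
                # Back edge: the cycle is the path from `other` to `txn`.
                return path[path.index(other):]
            if state == WHITE:
                cycle = dfs(other)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in sorted(waits_for):
        if color.get(txn, WHITE) == WHITE:
            cycle = dfs(txn)
            if cycle:
                return cycle
    return None


def choose_victim(cycle):
    """Pick the transaction to abort: here, the one with the largest id."""
    return max(cycle)
```

A background detector would typically rebuild the graph from the lock table at intervals, abort the chosen victim, and repeat until `find_cycle` returns None.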

Study Tips

  1. Start with Lectures 1-2 for course overview and C++ prerequisites
  2. Per module: watch video → read assigned paper → code Project
  3. Don’t skip the papers: ARIES, Raft, and Spanner are essential reading
  4. Use BusTub tests as the ground truth when debugging Projects

Companion Resources

Resource         Link
BusTub Source    github.com/cmu-db/bustub
Lecture Slides   Download from course site
Community Notes  GitHub topic: cmu-15445
Discord          CMU Database Group Discord

Advanced Courses

After 15-445, consider:

  • CMU 15-721: Advanced Database Systems (columnar, JIT, HTAP)
  • CMU 15-826: Multimedia Databases & Data Mining

Summary

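CMU 15-445 is the gold standard for learning database internals. Complete all four Projects and you’ll have implemented the core components of a disk-based relational database engine (buffer pool, B+Tree index, query executors, and concurrency control) inside the BusTub codebase.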
