Designing Machine Learning Systems — Chip Huyen
The definitive guide to production ML. Covers data engineering, feature stores, training pipelines, model serving, monitoring, and the organizational patterns that make ML teams effective. Required reading for anyone moving models from notebooks to production.
KV-Cache Optimization Papers
KV-cache memory is a key bottleneck for long-context LLM inference, since the cache grows linearly with sequence length and batch size. Studying PagedAttention (vLLM), multi-query attention, grouped-query attention, and sliding-window attention. Understanding these tradeoffs directly informs inference infrastructure decisions.
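To make the tradeoff concrete, here is a rough back-of-the-envelope sketch of KV-cache sizing. It assumes the standard accounting of one key and one value vector per token, per layer, per KV head. The function name `kv_cache_bytes` and the model shape (32 layers, 32 query heads, head dimension 128, fp16) are hypothetical values chosen for illustration, not taken from any specific model or paper.

```python
from typing import Optional


def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int = 1,
    bytes_per_elem: int = 2,       # assumption: fp16/bf16 cache entries
    window: Optional[int] = None,  # assumption: sliding window keeps only the last `window` tokens
) -> int:
    """Estimate KV-cache size in bytes for one batch of sequences."""
    # With a sliding window, at most `window` tokens are cached per sequence.
    cached_tokens = seq_len if window is None else min(seq_len, window)
    # Factor of 2 accounts for storing both keys and values.
    return (
        2 * num_layers * num_kv_heads * head_dim
        * cached_tokens * batch_size * bytes_per_elem
    )


if __name__ == "__main__":
    # Hypothetical model shape, 4096-token context, batch size 1.
    layers, q_heads, head_dim, seq_len = 32, 32, 128, 4096
    variants = {
        "multi-head (32 KV heads)": dict(num_kv_heads=q_heads),
        "grouped-query (8 KV heads)": dict(num_kv_heads=8),
        "multi-query (1 KV head)": dict(num_kv_heads=1),
        "multi-head + 1024-token window": dict(num_kv_heads=q_heads, window=1024),
    }
    for name, kwargs in variants.items():
        size = kv_cache_bytes(layers, head_dim=head_dim, seq_len=seq_len, **kwargs)
        print(f"{name}: {size / 2**20:.0f} MiB")
```

Under these assumptions, full multi-head attention needs 2 GiB of cache per 4096-token sequence, grouped-query attention with 8 KV heads needs 512 MiB, and multi-query attention needs 64 MiB. PagedAttention does not change these totals; it targets the memory wasted by fragmentation and over-reservation when allocating them.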
The Staff Engineer’s Path — Tanya Reilly
Technical leadership beyond writing code. Architecture decisions that compound, mentoring that scales, and organizational influence through technical judgment. Rethinking what “impact” means at senior levels.