Thoughts

Before It's Too Late to Ask

March 25, 2026 · #ai #ai-safety

Learning about AI progress and safety. Come along.

Routing Inference Requests

March 23, 2026 · #ai #distributed-systems

Building a cache-aware router, pointing it at real GPUs, and measuring the tradeoff between latency and throughput.

Cloud Inference · Part 3

Inside the GPU Server

March 21, 2026 · #ai #distributed-systems

Weight streaming, KV caches, and why inference routing is a different problem.

Cloud Inference · Part 2

Designing an Inference Gateway

February 18, 2026 · #ai #distributed-systems

Tracing a request through the system that sits between you and the LLM.

Cloud Inference · Part 1

What I'll Be Writing About

January 26, 2026 · #announcement

An introduction to this blog: reflections from a decade in distributed systems at AWS, plus explorations into AI's technical and societal dimensions.