Infrastructure Engineering Resources
Of the folks I chatted with, the most common way of learning about infrastructure engineering was working professionally with experienced peers. That is, indeed, among the most effective ways to learn about infrastructure, but it’s not always an accessible option, and it’s certainly not the only way.
This is a collection of resources that I, or folks I’ve chatted with, found valuable. The majority of these resources are organized into alphabetically-ordered categories, but I wanted to start by recognizing a handful of foundational resources that I’d recommend reading first:
- Thinking in Systems: A Primer: Donella Meadows
- Accelerate: Forsgren, Humble, and Kim
- Either The Phoenix Project (Kim, Behr, and Spafford) or The Unicorn Project (Kim)
Once you’ve read those, move to a section of particular interest and dive in.
Architecture
- A Philosophy of Software Design: John Ousterhout
- Software Design X-Rays: Fix Technical Debt with Behavioral Code Analysis: Adam Tornhill
Career
- The Manager’s Path: Camille Fournier – a great career resource for engineers, even if you’re not considering management
- The Effective Engineer: Edmond Lau, Bret Taylor
- Staff Engineer: Leadership beyond the management track: Will Larson, Tanya Reilly
- The Engineer/Manager Pendulum: Charity Majors
Design Docs, Tech Specs, RFCs, and so on
- A practical guide to writing technical specs
- Design Docs at Google
- Design Docs, Markdown, and Git
- Documenting Architecture Decisions
- How to write a better technical design document
- Technical Decision-Making and Alignment in a Remote Culture
- Writing Technical Design Docs
Developer Productivity
- Accelerate’s definition of developer productivity
- The SPACE of Developer Productivity
- DORA Research Program – DevOps Research & Assessment reports, particularly the annual state of DevOps reports
- Migrations: the sole scalable fix to tech debt
- You can’t reason about big balls of mud
- Managing technical quality in a codebase
Metrics & Measurement
Papers
- Dynamo: Amazon’s Highly Available Key-value Store
- On Designing and Deploying Internet-Scale Services
- No Silver Bullet - Essence and Accident in Software Engineering
- Out of the Tar Pit
- The Chubby lock service for loosely-coupled distributed systems
- Bigtable: A Distributed Storage System for Structured Data
- Raft: In Search of an Understandable Consensus Algorithm
- Paxos Made Simple
- SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol
- Hints for Computer System Design
- Big Ball of Mud
- The Google File System
- CAP Twelve Years Later: How the Rules Have Changed
- Harvest, Yield, and Scalable Tolerant Systems
- MapReduce: Simplified Data Processing on Large Clusters
- Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
- Kafka: a Distributed Messaging System for Log Processing
- Large-scale cluster management at Google with Borg
- Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
Papers We Love is a great community to find more!
Philosophy & Approach
- Technical Decision Making by Cindy Sridharan
- Effective Mental Models for Code and Systems by Cindy Sridharan
- “I Wouldn’t Start from Here”: How to make a big technical change by Tanya Reilly
- Computers can be understood by Nelson Elhage
- Maintaining platform-product fit
- Magnitudes of exploration
Planning
- Infrastructure between cost center and ego trip
- Infrastructure planning: users, baselines and timeframes
- How to invest in technical infrastructure
Reliability
(Note: consider separating out on-call resources, such as the PagerDuty manual and the jelli manual.)
- Incident response, programs and you(r startup)
- Writing a reliability strategy: reason about complex things with system models
- Healthchecks at scale
- Describing fault domains
- Don’t follow the sun
Roles
Strategy
- Write five, then synthesize: good engineering strategy is boring
- A Framework For Responsible Innovation
- How Big Technical Changes Happen at Slack - Several People Are Coding
- On Drafting an Engineering Strategy
- Defining a Tech Strategy
- Delivering on an architecture strategy
- Stepping Stones not Milestones
- Achieving Alignment and Efficiency Through a Technical Strategy
- The difficult teenage years: Setting tech strategy after a launch by Anna Shipman
- Learning to have an engineering vision
Technical Writing
- Docs for Developers: Bhatti, Corleissen, Lambourne, Nunez, Waterhouse
Tools
Uncategorized
These are valuable resources that don’t quite fit into any of the above categories.