Backup and Recovery · PostgreSQL
Recovery Leadership for On-Call Rotations
A four-week program on the human side of recovery — leading an incident, writing a post-mortem worth reading, and building rotation practices that do not exhaust the team.
About this cohort
Technical recovery is only half the work. This cohort focuses on the operator-leader role during and after an incident: how to set tempo, how to delegate the right tasks to the right shoulders, and how to write a post-mortem that engineers will actually open. We use recorded incident audio and transcripts (anonymised, with permission) to dissect decisions in slow motion, and we practice the small rituals — pre-incident readiness checks, intra-rotation handovers, decompression after long nights — that decide whether a rotation is sustainable.
Inclusions
- Recorded incident dissections with explicit decision points
- Templates for incident command, scribe, and communicator roles
- Post-mortem writing workshop with peer feedback
- Rotation health audit framework you can run on your own team
- Two live mock-incident sessions in week three and week four
By the end you can
- 01 Lead an incident bridge with clear roles and a calm tempo
- 02 Write a post-mortem that engineers outside the room can act on
- 03 Audit your on-call rotation against sustainable practice
Program lead
Choi Areum
Learner Success Manager who pairs cohort members with the right incident scenarios. She also coordinates our paired drills and post-program follow-ups.
From the cohort
- The week-four mock incident was exhausting in the right way. I rewrote my team’s post-mortem template that weekend.
Areum L.
Senior operator · 4.8/5
- Useful, careful pacing; the audit framework gave me language for a conversation I had been avoiding with my manager.
Joon-ho
4.6/5 · survey