How Complex Systems Fail (Being a Short Treatise on the Nature of Failure; How Failure is Evaluated; How Failure is Attributed to Proximate Cause; and the Resulting New Understanding of Patient Safety) Richard Cook. pdf
1) Complex systems are intrinsically hazardous systems.
2) Complex systems are heavily and successfully defended against failure.
3) Catastrophe requires multiple failures – single point failures are not enough.
4) Complex systems contain changing mixtures of failures latent within them.
5) Complex systems run in degraded mode.
6) Catastrophe is always just around the corner.
7) Post-accident attribution to a ‘root cause’ is fundamentally wrong.
8) Hindsight biases post-accident assessments of human performance.
9) Human operators have dual roles: as producers & as defenders against failure.
10) All practitioner actions are gambles.
11) Actions at the sharp end resolve all ambiguity.
12) Human practitioners are the adaptable element of complex systems.
13) Human expertise in complex systems is constantly changing.
14) Change introduces new forms of failure.
15) Views of ‘cause’ limit the effectiveness of defenses against future events.
16) Safety is a characteristic of systems and not of their components.
17) People continuously create safety.
18) Failure-free operations require experience with failure.
The short paper advances rapidly from numbered paragraph to numbered paragraph as it develops a fresh look at fallibility in systems and their operators.
The author's suspicion of typical root cause analysis is perhaps the biggest surprise.
The ultimate conclusion agrees with the advice we have for programmers: practice what you fear the most.
Atul Gawande's Checklist Manifesto was in some way inspired by Cook's work. post
A collection of postmortems. Pull requests welcome. Thirty-six contributors offer short summaries. github