Field note
Field Notes from a Weekend Incident Caused by Stale Context
Stale-context incidents are expensive because the system looks coherent until operators compare the assumptions each agent was acting on and find that they disagree.
Weekend incidents become long incidents when the team discovers that agents acted on different assumptions and the audit trail does not make that mismatch obvious.
The expensive part
- Every retry reuses the wrong premise.
- Operators compare outputs instead of assumptions.
- Recovery starts late because the failure looks random.
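One way to compare assumptions rather than outputs is to group audit records by a fingerprint of the context each agent believed to be true. The sketch below is illustrative only: the `AuditRecord` shape, its `assumptions` field, and the example keys (`config_version`, `region`) are hypothetical and are not taken from the audit schema mentioned in this note.

```python
import hashlib
import json
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    # Hypothetical record shape; a real audit schema may differ.
    agent: str
    action: str
    assumptions: dict


def fingerprint(assumptions: dict) -> str:
    """Short, order-independent hash of an assumptions dict."""
    canonical = json.dumps(assumptions, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def group_by_assumptions(records):
    """Map each assumptions fingerprint to the agents that acted on it."""
    groups = defaultdict(list)
    for record in records:
        groups[fingerprint(record.assumptions)].append(record.agent)
    return dict(groups)


def diverging_keys(records):
    """Return assumption keys whose values differ, or are missing, across records."""
    all_keys = {key for record in records for key in record.assumptions}
    diverging = []
    for key in sorted(all_keys):
        seen = {
            json.dumps(record.assumptions[key], sort_keys=True)
            if key in record.assumptions
            else None  # None marks "this agent had no value for the key"
            for record in records
        }
        if len(seen) > 1:
            diverging.append(key)
    return diverging


# Made-up example data: the planner holds an older config version.
records = [
    AuditRecord("planner", "retry_deploy", {"config_version": 41, "region": "eu-west"}),
    AuditRecord("executor", "retry_deploy", {"config_version": 42, "region": "eu-west"}),
    AuditRecord("verifier", "check_deploy", {"region": "eu-west", "config_version": 42}),
]

groups = group_by_assumptions(records)
print(len(groups), "distinct assumption sets")
print("diverging keys:", diverging_keys(records))
```

More than one fingerprint group means the agents were not working from the same premise, and the diverging keys show where to look first. That surfaces the mismatch before anyone spends time comparing outputs that only look random.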
Use the security docs, audit schema, and architecture guide to expose stale-context failures faster.