FDE-012 | field-patterns/go-live-troubleshooting-checklist.md | UPDATED: 06/18/2026
Go-Live Troubleshooting Checklist
Pattern
Name: Go-live troubleshooting checklist
When to use it: When a prototype or implementation starts handling real users, real customer data, or production integrations.
Why it matters for FDE roles: FDEs are often the ones present when the system first meets operational reality, so they are the first asked to diagnose and resolve what breaks.
Plain-English Description
Go-live troubleshooting is the practice of preparing for likely failures, watching the right signals, and resolving issues with clear ownership.
Situation Signals
- Job listing signal: deployment, production support, troubleshooting, reliability.
- Customer signal: real users or production data are about to enter the workflow.
- Project signal: the system needs monitoring, rollback, support, and ownership.
What To Ask
- What are the most likely failure points?
- Who is on point during go-live?
- What logs or dashboards will we watch?
- How do we pause, roll back, or manually recover?
What To Do
- Confirm credentials, environment, data, and owners.
- Test happy path and known failure paths.
- Watch logs, errors, latency, and user decisions.
- Keep a live issue list with owner and status.
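The testing and issue-tracking steps above can be tied together in a few lines. The following is a minimal Python sketch, not a prescribed tool: the names `Issue`, `SmokeCheck`, and `run_smoke_checks`, the status values, and the example checks are all hypothetical. It assumes each smoke check is a callable that returns `True` on success, and it files any failure on a shared issue list with an owner and a status.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class Issue:
    """One row on the live issue list (hypothetical shape)."""
    title: str
    owner: str
    status: str = "open"  # assumed values: open, investigating, resolved
    opened_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class SmokeCheck:
    name: str
    owner: str                # who picks up the issue if this check fails
    run: Callable[[], bool]   # returns True when the check passes


def run_smoke_checks(checks: list[SmokeCheck], issues: list[Issue]) -> bool:
    """Run every check and file an issue per failure. True means all passed."""
    all_passed = True
    for check in checks:
        try:
            passed = check.run()
            detail = "returned False"
        except Exception as exc:  # a crashing check counts as a failure, not a skip
            passed = False
            detail = f"raised {type(exc).__name__}: {exc}"
        if not passed:
            all_passed = False
            issues.append(
                Issue(title=f"{check.name} failed ({detail})", owner=check.owner)
            )
    return all_passed


if __name__ == "__main__":
    # Placeholder checks; real ones would call the deployed system.
    live_issues: list[Issue] = []
    checks = [
        SmokeCheck("happy path: submit request", "alice", lambda: True),
        SmokeCheck("failure path: expired credential rejected", "bob", lambda: False),
    ]
    ready = run_smoke_checks(checks, live_issues)
    print("ready for go-live:", ready)
    for issue in live_issues:
        print(f"[{issue.status}] {issue.title} -> {issue.owner}")
```

Giving each check an owner up front means a failure never lands on the list unassigned, which is the point of keeping owner and status together.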
Artifacts To Produce
- Diagram: production workflow and integration boundaries.
- Checklist: go-live readiness and rollback.
- Demo/prototype: final smoke test.
- Customer-facing note: support path and known limitations.
Failure Modes
- No rollback or disable path.
- No one knows where errors are logged.
- Support ownership is unclear.
- The system fails silently.
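Three of the failure modes above (no disable path, unknown log location, silent failure) can be guarded against with a thin wrapper around each workflow step. This is a sketch under stated assumptions: the environment variable `WORKFLOW_DISABLED`, the logger name `golive`, and the helper `guarded` are hypothetical, and a real deployment might use a feature-flag service or a config store instead of an environment variable.

```python
import logging
import os

# One named logger, so everyone knows where go-live errors are written.
logger = logging.getLogger("golive")

DISABLE_FLAG = "WORKFLOW_DISABLED"  # hypothetical environment variable name


class WorkflowDisabled(RuntimeError):
    """Raised when an operator has switched the workflow off."""


def is_disabled() -> bool:
    """True when the disable flag is set to a truthy value."""
    return os.environ.get(DISABLE_FLAG, "").lower() in {"1", "true", "yes"}


def guarded(step_name, func, *args, **kwargs):
    """Run one workflow step with a disable check and loud error logging."""
    if is_disabled():
        logger.warning("step %s skipped: %s is set", step_name, DISABLE_FLAG)
        raise WorkflowDisabled(step_name)
    try:
        return func(*args, **kwargs)
    except Exception:
        # logger.exception records the traceback; re-raising keeps the
        # failure visible to the caller instead of swallowing it.
        logger.exception("step %s failed", step_name)
        raise


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(guarded("add", lambda a, b: a + b, 2, 3))
```

Setting the flag gives operators a pause path that needs no deploy, and the re-raise ensures an error is both logged and surfaced rather than hidden.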
Interview Language
One sentence I could say in an interview:
For go-live, I want clear owners, smoke tests, logs, rollback paths, and a shared issue list so the customer is not left guessing when reality hits.
Relevant work experience for this pattern: