Disaster Recovery Runbook
Aligned with NIST SP 800-34r1, FCA regulatory expectations, and best practices for operational resilience.
Document Owner: IT Continuity Lead
Version: 1.0
Review Cycle: Semi-annually or post-major change
Last Reviewed: [Insert Date]
Overview
Purpose: Guide the recovery of critical systems and services following a disruption.
Scope: Systems, applications, networks, and data critical to business operations.
Assumptions: Staff are available, backups are available, and offsite access is possible.
References: Business Impact Analysis (BIA) document, Crisis Comms Plan, Incident Register, Configuration Management DB.
Roles & Responsibilities
Role | Responsibility
--- | ---
DR Manager | Coordinates execution of runbook
IT Lead | Leads system recovery
Comms Lead | Handles stakeholder updates
Security Officer | Ensures security during recovery
Vendor Liaison | Engages external partners
DR Scenarios & Triggers
Scenario | Examples | Trigger
--- | --- | ---
Cybersecurity Incident | Ransomware, DDoS | Data encrypted, systems offline
Infrastructure Outage | Power, hosting | Cloud provider outage
Data Loss | Accidental or malicious | Critical database corruption
Site Unavailability | Flood, fire, etc. | Data centre destroyed
Critical Assets and Recovery Details
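Asset | Type | RTO | RPO | Recovery Method | Hosting
--- | --- | --- | --- | --- | ---
Payment API | Application | 2 hours | 15 mins | See Section 6 (Step-by-Step Recovery Procedures) | AWS (EU-WEST)
Customer DB | Database | 4 hours | 1 hour | See Backup Procedure | Azure
IAM System | Security | 1 hour | 0 mins | Activate failover | On-prem
Jira/Confluence | Collaboration | 8 hours | 24 hours | Contact Atlassian support | SaaS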
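The recovery targets above can be encoded so that recovery order and objective breaches are checked mechanically. The following is a minimal sketch, not part of any existing tooling: it assumes the first duration listed for each asset is its recovery time objective (RTO) and the second its recovery point objective (RPO), and the names `Asset`, `ASSETS`, `recovery_order`, and `breaches` are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Asset:
    name: str
    rto: timedelta  # maximum tolerable downtime
    rpo: timedelta  # maximum tolerable data-loss window


# Values copied from the asset list above. Reading the first figure as the
# RTO and the second as the RPO is an assumption.
ASSETS = {
    "Payment API": Asset("Payment API", timedelta(hours=2), timedelta(minutes=15)),
    "Customer DB": Asset("Customer DB", timedelta(hours=4), timedelta(hours=1)),
    "IAM System": Asset("IAM System", timedelta(hours=1), timedelta(0)),
    "Jira/Confluence": Asset("Jira/Confluence", timedelta(hours=8), timedelta(hours=24)),
}


def recovery_order(assets):
    """Return asset names sorted by RTO, tightest first."""
    return [a.name for a in sorted(assets.values(), key=lambda a: a.rto)]


def breaches(asset, downtime, data_loss):
    """List which objectives an outage has exceeded for one asset."""
    exceeded = []
    if downtime > asset.rto:
        exceeded.append("RTO")
    if data_loss > asset.rpo:
        exceeded.append("RPO")
    return exceeded
```

Sorting by RTO gives one possible default for sequencing recovery work; the DR Manager may still reorder it, for example when other systems depend on the IAM System.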
Communication Plan
Audience | Channel | Frequency | Owner
--- | --- | --- | ---
Internal Teams | Slack / Email | Hourly | DR Manager
Executives | SMS / Call | 2-Hourly | Comms Lead
Customers | Status Page | As needed | Comms Lead
Regulators (e.g. FCA) | Email / Call | Within 24h | Compliance Officer
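The update cadence above can be turned into concrete times once an incident is declared. This is a rough sketch under stated assumptions: the clock is taken to start at the moment of declaration, which the plan does not specify, and the names `UPDATE_INTERVALS`, `update_times`, and `regulator_deadline` are hypothetical. Customer updates are "as needed" and so are not scheduled here.

```python
from datetime import datetime, timedelta

# Fixed-interval audiences from the communication plan above.
UPDATE_INTERVALS = {
    "Internal Teams": timedelta(hours=1),
    "Executives": timedelta(hours=2),
}

# "Within 24h" for regulators; measuring from declaration is an assumption.
REGULATOR_WINDOW = timedelta(hours=24)


def update_times(audience, declared_at, until):
    """Return scheduled update times, the first one interval after declaration."""
    interval = UPDATE_INTERVALS[audience]
    times = []
    t = declared_at + interval
    while t <= until:
        times.append(t)
        t += interval
    return times


def regulator_deadline(declared_at):
    """Return the latest time by which regulators should be notified."""
    return declared_at + REGULATOR_WINDOW
```

For an incident declared at 09:00, `update_times("Executives", ...)` over the following five hours yields 11:00 and 13:00, and the regulator deadline falls at 09:00 the next day.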
Step-by-Step Recovery Procedures
Example: Payment API Recovery
1. Trigger: Monitoring system flags downtime or data loss.
2. Notify: DR team via PagerDuty; escalate to DR Manager.
3. Initial Response:
   - Isolate affected VMs
   - Snapshot logs and preserve evidence
4. Activate DR site:
   - Deploy from pre-approved CloudFormation template (AWS)
5. Data Restoration:
   - Recover from S3 backup (timestamp T-15min)
6. Validation:
   - QA team runs regression test suite
7. Go/No-Go Decision: DR Manager signs off
8. Customer Notification: Update status page and email notices
9. Post-Incident Review:
   - Root cause analysis
   - Lessons learned
   - Update this runbook if needed
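The data restoration step depends on choosing the right backup. The sketch below shows one way to select a restore point and confirm it falls inside the 15-minute window. It assumes "T" means the time of the incident and that the backup timestamps have already been listed from storage; `pick_restore_point` is a hypothetical helper, not an AWS API.

```python
from datetime import datetime, timedelta


def pick_restore_point(backup_times, incident_at, rpo):
    """Return the newest backup taken at or before the incident,
    and whether the gap to the incident is within the RPO."""
    candidates = [t for t in backup_times if t <= incident_at]
    if not candidates:
        raise ValueError("no backup predates the incident")
    chosen = max(candidates)
    return chosen, (incident_at - chosen) <= rpo


# Illustrative timestamps only; real values would come from the backup store.
backups = [
    datetime(2024, 5, 1, 11, 30),
    datetime(2024, 5, 1, 11, 45),
    datetime(2024, 5, 1, 12, 0),
]
incident = datetime(2024, 5, 1, 11, 55)
restore_point, within_rpo = pick_restore_point(backups, incident, timedelta(minutes=15))
```

Backups taken after the incident are excluded because they may already contain corrupted or encrypted data. If `within_rpo` is false, the shortfall is worth raising at the Go/No-Go decision.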
Testing & Maintenance Schedule
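Activity | Frequency | Last Completed | Next Due | Owner
--- | --- | --- | --- | ---
Tabletop Exercise | Quarterly | [Insert Date] | [Insert Date] | IT Continuity Lead
Full DR Test | Annually | [Insert Date] | [Insert Date] | Security Officer
Backup Restore Test | Monthly | [Insert Date] | [Insert Date] | IT Ops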
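Due dates for the activities above can be derived from the date each was last run. The sketch below assumes quarterly, annual, and monthly mean 3, 12, and 1 calendar months respectively; `CADENCE_MONTHS`, `add_months`, `next_due`, and `overdue` are hypothetical names, and the dates in the usage note are illustrative because the schedule's own dates are still placeholders.

```python
import calendar
from datetime import date

# Cadences from the schedule above, expressed in calendar months.
CADENCE_MONTHS = {
    "Tabletop Exercise": 3,    # quarterly
    "Full DR Test": 12,        # annually
    "Backup Restore Test": 1,  # monthly
}


def add_months(d, months):
    """Shift a date by whole months, clamping to the last day of the target month."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_due(activity, last_run):
    """Return the date by which the activity should next be performed."""
    return add_months(last_run, CADENCE_MONTHS[activity])


def overdue(activity, last_run, today):
    """Return True if the activity's due date has already passed."""
    return today > next_due(activity, last_run)
```

For example, a backup restore test last run on 31 January 2024 would next be due on 29 February 2024, since the day is clamped to the end of the shorter month.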
Appendices
Appendix A: Contact Sheet
Appendix B: Asset Configuration Docs
Appendix C: Backup Schedules & Locations
Appendix D: DR Site Map & Network Diagrams
Appendix E: Incident Response Plan Link