What is our notification workflow for platform outages/performance issues?
Platform outage workflow + task & notification template
Outage Notification Workflow
- Outage Detection
  - Monitor product status continuously.
  - Set up automated alerts for downtime and for performance metrics that fall outside their normal range.
- Initial Response
  - When an outage is detected, the system automatically creates a task titled "Outage: [Product Name]" and assigns it to the designated response team.
  - Notify the response team via email and in-app notifications.
- Investigation
  - The response team assesses the situation to determine the cause and the potential impact.
  - Update the task with the findings and an estimated resolution time.
- Communication to Stakeholders
  - Draft an initial communication that tells users about the outage, including what happened and what is being done.
  - Send the communication by email, social media, and other relevant channels.
- Regular Updates
  - Post an update in the task every 30 minutes, and sooner when there is a significant development.
  - Share these updates with stakeholders until the issue is resolved.
- Resolution
  - Once the outage is resolved, update the task status to "Resolved" and notify all team members.
  - Send a final communication to stakeholders detailing the resolution and any planned preventive measures.
- Review Meeting
  - Schedule a post-mortem meeting to discuss the outage, how it was handled, and steps to prevent a recurrence.
  - Document insights and action items in the task for reference.
- Follow-up Actions
  - Implement the agreed preventive measures and monitor their effectiveness.
  - Update documentation and systems as required by the outcomes of the review meeting.
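The alerting rule in the Outage Detection step could be sketched roughly as follows. This is only an illustration: the `HealthSample` shape, the `classify` function, and both threshold values are assumptions, since the workflow does not say which metrics or limits to use.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; the workflow does not specify any values.
MAX_ERROR_RATE = 0.05        # fraction of failed requests
MAX_P95_LATENCY_MS = 2000.0  # 95th-percentile latency in milliseconds


@dataclass
class HealthSample:
    """One monitoring reading for a product (assumed shape)."""
    product: str
    reachable: bool
    error_rate: float
    p95_latency_ms: float


def classify(sample: HealthSample) -> Optional[str]:
    """Return "downtime", "degraded", or None when the sample looks healthy."""
    if not sample.reachable:
        return "downtime"
    if (sample.error_rate > MAX_ERROR_RATE
            or sample.p95_latency_ms > MAX_P95_LATENCY_MS):
        return "degraded"
    return None
```

Any result other than `None` would count as the "Outage detected" trigger. Separating "downtime" from "degraded" is a design choice made here so that performance issues can be reported differently from full outages.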
Automations:
- Trigger: Outage detected
- Action: Create the task, assign it to the response team, and send notifications
- Recurrence: Prompt for a status update every 30 minutes until the task is resolved
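One way to picture these three rules is the sketch below. It is not tied to any particular task or automation tool: `on_outage_detected`, `next_update_due`, and the "Open" status are hypothetical names. Only the task title, the two notification channels, the 30-minute interval, and the "Resolved" status come from the workflow above.

```python
from datetime import datetime, timedelta
from typing import Optional

UPDATE_INTERVAL = timedelta(minutes=30)  # from the "Recurrence" rule


def on_outage_detected(product: str, detected_at: datetime) -> dict:
    """Trigger handler: describe the task to create and the notifications to send."""
    return {
        "task": {
            "title": f"Outage: {product}",
            "assigned_to": "Response Team",
            "status": "Open",  # hypothetical initial status name
            "detected_at": detected_at,
        },
        # Channels named in the Initial Response step.
        "notifications": ["email", "in-app"],
    }


def next_update_due(last_update: datetime, status: str) -> Optional[datetime]:
    """Return when the next status update is due, or None once resolved."""
    if status == "Resolved":
        return None
    return last_update + UPDATE_INTERVAL
```

A scheduler would call `next_update_due` after each posted update. The recurrence stops on its own once the task status becomes "Resolved".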
Communication Template:
Subject: [Urgent] Service Disruption - [Product Name]
Dear [User],
We are currently experiencing a service disruption with [Product Name].
Our team is actively working to resolve this issue and we will keep you updated on our progress.
We apologize for any inconvenience this may cause and appreciate your patience.
Sincerely,
[Your Company Support Team]
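If the message is generated automatically, the placeholders can be filled with Python's standard `string.Template`, as in this minimal sketch. The `render_notice` function and its parameter names are assumptions made for illustration; the wording is copied from the template above.

```python
from string import Template

SUBJECT = Template("[Urgent] Service Disruption - $product")
BODY = Template(
    "Dear $user,\n\n"
    "We are currently experiencing a service disruption with $product.\n"
    "Our team is actively working to resolve this issue and we will keep "
    "you updated on our progress.\n"
    "We apologize for any inconvenience this may cause and appreciate "
    "your patience.\n\n"
    "Sincerely,\n"
    "$team"
)


def render_notice(user: str, product: str,
                  team: str = "Your Company Support Team") -> tuple:
    """Return (subject, body) with the template placeholders filled in."""
    fields = {"user": user, "product": product, "team": team}
    # substitute() raises KeyError if a placeholder has no value,
    # so an incomplete notice is never sent by accident.
    return SUBJECT.substitute(fields), BODY.substitute(fields)
```

For example, `render_notice("Ana", "Acme API")` returns the subject line and the full body addressed to Ana.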
Task Template:
Title: Outage: [Product Name]
Description:
- Time of detection:
- Suspected cause (if known):
- Impact assessment:
- Communication log:
- Resolution updates:
Assigned to: [Response Team]
Tags: Outage, High Priority
Checklist:
- [ ] Initial stakeholder communication
- [ ] Regular updates posted
- [ ] Final resolution communication
- [ ] Post-mortem scheduled
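For teams that create the task programmatically, the template could be modelled as a small data structure. The sketch below is an assumption about how such a record might look: `OutageTask`, `complete`, and `checklist_done` are hypothetical names, while the field labels, tags, and checklist items are taken from the template above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

CHECKLIST_ITEMS = [
    "Initial stakeholder communication",
    "Regular updates posted",
    "Final resolution communication",
    "Post-mortem scheduled",
]

DESCRIPTION_FIELDS = [
    "Time of detection",
    "Suspected cause (if known)",
    "Impact assessment",
    "Communication log",
    "Resolution updates",
]


@dataclass
class OutageTask:
    """In-memory version of the task template (hypothetical structure)."""
    product: str
    assigned_to: str = "Response Team"
    tags: List[str] = field(default_factory=lambda: ["Outage", "High Priority"])
    # Description fields start empty and are filled in during the investigation.
    description: Dict[str, str] = field(
        default_factory=lambda: {name: "" for name in DESCRIPTION_FIELDS})
    checklist: Dict[str, bool] = field(
        default_factory=lambda: {item: False for item in CHECKLIST_ITEMS})

    @property
    def title(self) -> str:
        return f"Outage: {self.product}"

    def complete(self, item: str) -> None:
        """Tick one checklist item; unknown items raise KeyError."""
        if item not in self.checklist:
            raise KeyError(item)
        self.checklist[item] = True

    def checklist_done(self) -> bool:
        """True once every checklist item has been ticked."""
        return all(self.checklist.values())
```

A check such as `checklist_done()` could be used to confirm that all four communications and the post-mortem are covered before the task is archived. The template itself does not require this.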