Degraded Performance on Admin Portal
Incident Report for Phonism
Postmortem

Summary

On Nov 10, 2020, between 17:09 and 18:07 UTC, our US platform experienced a degradation of backend database services that reduced the responsiveness of the Admin Portal and led to a 20-minute outage of device provisioning.

Root Cause

Database services reached high-load conditions, which triggered an issue in our query routing engine. As a result, some additional queries were routed suboptimally.

Remediation

Completed:

  • Refinement of query routing rules to ensure primary services stability
  • Additional alarms, alerts, and monitoring of key performance metrics within our core services and infrastructure
  • Training on troubleshooting and responding to similar situations to prevent any loss of service
Posted Dec 01, 2020 - 16:38 EST

Resolved
Full functionality has been restored and we will continue to monitor.
Posted Nov 10, 2020 - 13:07 EST
Monitoring
We have deployed a hotfix to alleviate the high load on our database. We are monitoring all services at this time and are seeing a vast improvement.
Posted Nov 10, 2020 - 12:55 EST
Identified
We have identified an issue causing abnormally high utilization on our primary database.

A hotfix has been created and is being tested in development.
Posted Nov 10, 2020 - 12:34 EST
Investigating
We are investigating reports of slow responsiveness in the Admin Portal and degraded performance in Phone Provisioning and the API.

Some API/Phone Provisioning requests may receive a 504 or take longer than usual.

We will update shortly with more information.
Posted Nov 10, 2020 - 12:09 EST
This incident affected: Phonism United States (Admin Portal, Phone Provisioning (Generic), Phone Provisioning (Cisco)) and API.
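
For API clients that saw the 504 responses mentioned in the Investigating update, one common mitigation is to retry with exponential backoff and jitter. The sketch below is illustrative only and is not part of any Phonism client library: the function name `call_with_backoff`, its parameters, and the `send` callable are all hypothetical, and it assumes the request being retried is safe to repeat (idempotent).

```python
import random
import time
from typing import Callable, Tuple

# Assumption: only 504 Gateway Timeout is treated as retryable here,
# since that is the status named in the incident update.
RETRYABLE_STATUSES = {504}


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical callable that performs one request and returns
    (status_code, body). The last response is returned either way.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff capped at max_delay, with full jitter so
            # that many clients do not retry at the same instant and add
            # load to an already struggling backend.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
    return status, body
```

A caller would wrap a single request in a function and pass it as `send`, for example `call_with_backoff(lambda: do_request())`, where `do_request` is whatever the client already uses to issue one call. Capping the delay and the number of attempts keeps a client from waiting indefinitely during a longer outage.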