Summary

On Nov 10, 2020, between 17:09 and 18:07 UTC, our US platform experienced degraded backend database services. This reduced the responsiveness of the Admin Portal and caused a 20-minute outage of device provisioning.

Root Cause

Database services reached high-load conditions. These triggered an issue in our query routing engine, which then routed additional queries suboptimally.

Remediation

Completed:

Refinement of query routing rules to ensure the stability of primary services
Additional alarms, alerts, and monitoring of key performance metrics within our core services and infrastructure
Training on troubleshooting and responding to similar situations to prevent loss of service

Posted Dec 01, 2020 - 16:38 EST

Resolved

Full functionality has been restored and we will continue to monitor.

Posted Nov 10, 2020 - 13:07 EST

Monitoring

We have deployed a hotfix to alleviate the high load on our database. We are monitoring all services and are seeing a significant improvement.

Posted Nov 10, 2020 - 12:55 EST

Identified

We have identified an issue causing abnormally high utilization on our primary database.

A hotfix has been created and is being tested in development.

Posted Nov 10, 2020 - 12:34 EST

Investigating

We are investigating reports of slow responsiveness in the Admin Portal and degraded performance in Phone Provisioning and the API.

Some API and Phone Provisioning requests may receive a 504 (Gateway Timeout) response or take longer than usual to complete.

We will update shortly with more information.

Posted Nov 10, 2020 - 12:09 EST

This incident affected: Phonism United States (Admin Portal, Phone Provisioning (Generic), Phone Provisioning (Cisco)) and API.