Increased check scheduling latency in us-east-1
Incident Report for Checkly
Postmortem

Incident Summary: Elevated Browser Check Scheduling Latency in us-east-1

Date: March 28-29, 2024

Overview

From 23:40 UTC on March 28 to 01:43 UTC on March 29, Checkly experienced elevated scheduling delays for browser checks in the us-east-1 region. The delays were caused by an unexpected surge in browser check runs, which we attributed to suspicious activity. The incident affected approximately 37,600 browser and multistep checks, with a maximum delay of 44.5 minutes. API checks and heartbeat checks were not affected. All delayed checks were eventually processed, and we found no evidence of false positives caused by this incident.

Root Cause

Suspicious activity within one customer’s accounts generated more than 21,000 additional browser check runs in a short period. This saturated the capacity of the us-east-1 region and caused the delays.

Conclusion

We apologize for any inconvenience. All delayed checks were executed, and none produced false positives. Checkly is committed to providing a reliable service, and we are taking steps to improve our resilience and security.

Support

For questions or concerns, please contact support@checklyhq.com. We value your trust and are here to help.

Posted Mar 29, 2024 - 17:39 UTC

Resolved
This incident has been resolved.
Posted Mar 29, 2024 - 01:50 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Mar 29, 2024 - 01:37 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted Mar 29, 2024 - 01:16 UTC
Investigating
We are currently investigating this issue.
Posted Mar 29, 2024 - 00:22 UTC
This incident affected: Browser check runtime.