Subset of browser checks failing
Incident Report for Checkly
Postmortem

We are currently modernizing and improving our runner infrastructure. During the rollout of the new sandbox runner, we shipped a broken deployment.

Here is what happened:

  1. We shipped a broken deployment, which resulted in false positives in a subset of checks.
  2. Additionally, the rollback of the deployment failed, prolonging the incident.
  3. A lack of clear internal communication about the system's state and the rollback caused confusion and prolonged the response even further.

The incident affected browser checks in two specific cases:

  1. Checks using Twilio in any runtime, because the TWILIO_ACCOUNT_SID environment variable was improperly escaped. Affected check runs failed with the error message: accountSid must start with AC
  2. Checks using Mocha in runtimes 2022.02 and 2022.10 with Playwright. These checks failed with the error message: Could not load reporter /checkly/functions/src/browser-check-common/src/mocha/reporter.js:
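
As an illustration of the first case, the sketch below shows one way an escaping bug can turn a valid account SID into a value that fails a prefix check. The report does not describe the actual bug, so everything here is an assumption: validateAccountSid is a hypothetical stand-in for the SDK's check (only the error message is taken from the report), badlyEscape is a hypothetical faulty escaping step, and the SID is a fake placeholder.

```typescript
// Hypothetical stand-in for the SDK's credential check. Only the error
// message text is taken from the incident report.
function validateAccountSid(accountSid: string): string {
  if (!accountSid.startsWith("AC")) {
    throw new Error("accountSid must start with AC");
  }
  return accountSid;
}

// Hypothetical escaping bug (an assumption, not the actual root cause):
// the value is serialized with surrounding quotes that are never stripped.
function badlyEscape(value: string): string {
  return JSON.stringify(value); // wraps the value in literal double quotes
}

// Runs the validation and returns "ok" or the error message.
function tryValidate(value: string): string {
  try {
    validateAccountSid(value);
    return "ok";
  } catch (err) {
    return (err as Error).message;
  }
}

// Fake placeholder value, not a real credential.
const original = "AC00000000000000000000000000000000";
const asSeenByCheck = badlyEscape(original);

console.log(tryValidate(original));      // prints "ok"
console.log(tryValidate(asSeenByCheck)); // prints "accountSid must start with AC"
```

In this sketch the stored value is unchanged. Only the value delivered to the check gains a leading quote character, which is enough to make an otherwise correct configuration fail.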

Impact:

  • A total of 10 customers were affected.
  • A total of 120 browser check runs (0.5%) failed due to this incident.
  • A total of 66 different checks were affected.
  • The broken deployment was shipped at 15:30 UTC and rolled back at 16:10 UTC.

We apologize for any problems this may have caused. If you have any questions or concerns, feel free to contact us at support@checklyhq.com.

Posted Mar 22, 2024 - 17:31 UTC

Resolved
This incident has been resolved.
Posted Mar 21, 2024 - 17:50 UTC
Monitoring
We have rolled back the breaking change and are monitoring our systems.
Posted Mar 21, 2024 - 17:20 UTC
Update
The rollback is in progress, but taking longer than usual.
Posted Mar 21, 2024 - 17:14 UTC
Identified
We have identified the issue and are rolling back the breaking change.
Posted Mar 21, 2024 - 16:56 UTC
Investigating
A small subset of browser checks is failing when using specific environment variables. We are investigating.
Posted Mar 21, 2024 - 16:21 UTC
This incident affected: Browser check runtime.