Detect, track, and manage flaky tests automatically. Euriqa scores every test for flakiness and can quarantine unreliable tests to keep your CI green.
Flaky tests — tests that sometimes pass and sometimes fail without code changes — are one of the biggest drains on engineering productivity. They erode trust in CI, slow down deployments, and waste developer time investigating false failures.
Euriqa detects flakiness by analyzing retry behavior during test runs. When a test fails on some attempts but passes on a retry, it is marked as "flaky" rather than "failed".
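The retry-based rule described here can be sketched as a small classifier. This is a minimal illustration, not Euriqa's implementation: the type names and the `classifyTest` function are hypothetical, and the sketch assumes each test's attempts within a run are available as an ordered list.

```typescript
// Hypothetical types for illustration; not Euriqa's actual data model.
type AttemptStatus = 'passed' | 'failed';
type TestOutcome = 'passed' | 'flaky' | 'failed';

// Classify one test in one run from its attempts (first try plus any retries).
function classifyTest(attempts: AttemptStatus[]): TestOutcome {
  if (attempts.length === 0) {
    throw new Error('a test needs at least one attempt');
  }
  const anyFailed = attempts.some((a) => a === 'failed');
  const anyPassed = attempts.some((a) => a === 'passed');
  // Mixed results across attempts are what distinguish "flaky" from "failed".
  if (anyFailed && anyPassed) return 'flaky';
  return anyFailed ? 'failed' : 'passed';
}
```

A test that never gets a retry can only come out as `passed` or `failed` under this rule, which is why retries are a prerequisite for detection.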
For flakiness detection to work, you must enable retries in your Playwright configuration:
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  retries: 2, // Enable retries for flakiness detection
  reporter: [
    ['@euriqa/euriqa-playwright', {
      apiKey: process.env.EURIQA_API_KEY,
      projectId: process.env.EURIQA_PROJECT_ID,
    }],
  ],
});

Every test receives a numerical flakiness score from 0 to 1. The score is calculated from historical pass/fail patterns across multiple runs, not just from whether a test failed once.
| Score Range | Severity | Meaning |
|---|---|---|
| 0 | Stable | Perfectly stable, never flaky |
| 0.1 - 0.3 | Low | Occasionally flaky, low severity |
| 0.3 - 0.7 | Moderate | Moderately flaky, should be investigated |
| 0.7 - 1.0 | High | Highly flaky, severely impacts CI reliability |
The flakiness score is the proportion of runs in the scoring window that showed flaky behavior:

flakiness_score = flaky_runs / total_runs

Here, flaky_runs is the number of runs where the test exhibited flaky behavior (failed on some attempts, passed on retries), and total_runs is the total number of runs within the scoring window.
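As a worked illustration of the formula and the severity table, the sketch below computes a score and maps it to a band. Both function names are hypothetical. Two points are assumptions rather than documented behavior: a test with no runs scores 0, and boundary values (0.3, 0.7) and scores between 0 and 0.1 fall into the bands shown in the comments, since the table does not say how they are handled.

```typescript
// Ratio from the formula: flaky_runs / total_runs.
// Assumption: a test with no runs in the window scores 0.
function flakinessScore(flakyRuns: number, totalRuns: number): number {
  if (flakyRuns < 0 || totalRuns < 0 || flakyRuns > totalRuns) {
    throw new RangeError('expected 0 <= flakyRuns <= totalRuns');
  }
  return totalRuns === 0 ? 0 : flakyRuns / totalRuns;
}

type Severity = 'Stable' | 'Low' | 'Moderate' | 'High';

// Assumed boundaries: (0, 0.3) is Low, [0.3, 0.7) is Moderate, [0.7, 1] is High.
function severity(score: number): Severity {
  if (score === 0) return 'Stable';
  if (score < 0.3) return 'Low';
  if (score < 0.7) return 'Moderate';
  return 'High';
}
```

For example, a test that was flaky in 5 of its last 10 runs scores 0.5, which this sketch labels Moderate.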
curl https://app.euriqa.dev/api/flakiness \
  -H "X-API-Key: your-api-key" \
  -G -d "projectId=your-project-id"

Returns an array of tests with their flakiness scores, total runs, flaky runs, last flaky timestamp, and quarantine status.
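The same query can be prepared from TypeScript. The sketch below only builds the request, mirroring the URL, query parameter, and header of the curl command; `buildFlakinessRequest` is a hypothetical helper, not part of any Euriqa SDK, and the response field names are not specified in this section, so the usage comment leaves the result untyped.

```typescript
// Builds the same request as the curl command, without sending it.
function buildFlakinessRequest(
  projectId: string,
  apiKey: string,
): { url: string; headers: Record<string, string> } {
  const url = new URL('https://app.euriqa.dev/api/flakiness');
  url.searchParams.set('projectId', projectId); // equivalent of curl's -G -d
  return { url: url.toString(), headers: { 'X-API-Key': apiKey } };
}

// Usage (needs a runtime with a global fetch, such as Node 18+):
//   const { url, headers } = buildFlakinessRequest('your-project-id', 'your-api-key');
//   const response = await fetch(url, { headers });
//   const tests: unknown = await response.json();
```

Keeping request construction separate from the network call makes the helper easy to check without a live API key.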
The flakiness threshold determines which tests are flagged as high-severity and, when auto-quarantine is enabled, which tests are automatically quarantined. The default threshold is 0.3.
Update the flakiness threshold in your project settings:
curl -X PATCH https://app.euriqa.dev/api/projects/your-project-id \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "flakinessThreshold": 0.5
  }'

Quarantined tests are flagged across the Euriqa dashboard so your team knows they are being managed. Quarantine does not skip tests; it marks them so their results can be interpreted in context.
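One way to interpret results in context is to separate failures in quarantined tests from failures that should block a build. The sketch below is an assumption about how a team might consume the quarantine flag in its own tooling; the `TestResult` shape and `partitionFailures` function are hypothetical and not an Euriqa feature.

```typescript
// Hypothetical result shape for illustration.
interface TestResult {
  testId: string;
  status: 'passed' | 'flaky' | 'failed';
}

interface FailureReport {
  blocking: string[];    // failed tests that are not quarantined
  quarantined: string[]; // failed tests already flagged as quarantined
}

// Tests still run; quarantine only changes how a failure is reported.
function partitionFailures(
  results: TestResult[],
  quarantinedIds: Set<string>,
): FailureReport {
  const report: FailureReport = { blocking: [], quarantined: [] };
  for (const result of results) {
    if (result.status !== 'failed') continue;
    const bucket = quarantinedIds.has(result.testId)
      ? report.quarantined
      : report.blocking;
    bucket.push(result.testId);
  }
  return report;
}
```

A pipeline could then fail only when `blocking` is non-empty while still surfacing the quarantined failures for follow-up.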
You can manually quarantine or unquarantine any test from the flakiness dashboard with a single click, or via the API:
# Quarantine a test
curl -X POST https://app.euriqa.dev/api/flakiness \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "testId": "test-id",
    "quarantined": true
  }'

# Unquarantine a test
curl -X POST https://app.euriqa.dev/api/flakiness \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "testId": "test-id",
    "quarantined": false
  }'

When auto-quarantine is enabled, tests that exceed the configured flakiness threshold are automatically quarantined. This keeps your CI green without manual intervention.
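The selection rule can be sketched as a filter over scored tests. This is an illustration under stated assumptions, not Euriqa's implementation: the `ScoredTest` shape and `selectForAutoQuarantine` function are hypothetical, and "exceed" is read as strictly greater than the threshold.

```typescript
// Hypothetical shape; the API's actual field names are not given in this section.
interface ScoredTest {
  testId: string;
  flakinessScore: number;
  quarantined: boolean;
}

// Returns the IDs of tests that would be newly quarantined.
// Assumption: a score equal to the threshold does not exceed it.
function selectForAutoQuarantine(tests: ScoredTest[], threshold = 0.3): string[] {
  return tests
    .filter((t) => !t.quarantined && t.flakinessScore > threshold)
    .map((t) => t.testId);
}
```

With the default threshold of 0.3, a test scoring 0.31 is selected and one scoring exactly 0.3 is not; raising the threshold to 0.5 narrows the selection to more severely flaky tests.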
Enable auto-quarantine in your project settings:
curl -X PATCH https://app.euriqa.dev/api/projects/your-project-id \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "autoQuarantine": true,
    "flakinessThreshold": 0.3
  }'

The flakiness section of the Euriqa dashboard provides several views to help you understand and manage test reliability.
Key metrics displayed at the top of the flakiness page:
A line chart showing how flakiness scores change over time. Use it to identify whether flakiness is getting better or worse, correlate spikes with code changes or infrastructure issues, and track improvement across configurable time ranges.
A detailed, sortable table of every flaky test with the following columns:
| Column | Description |
|---|---|
| Test Name | Full title with file path |
| Flakiness Score | Numerical score, sorted highest first by default |
| Total Runs | How many times the test has been executed |
| Flaky Runs | How many runs showed flaky behavior |
| Last Flaky | Timestamp of when the test was last flaky |
| Quarantine Toggle | One-click quarantine/unquarantine per test |
Set retries: 2 in your Playwright config. This is the foundation of flakiness detection.