Flakiness Detection & Quarantine

Detect, track, and manage flaky tests automatically. Euriqa scores every test for flakiness and can quarantine unreliable tests to keep your CI green.

How Flakiness Detection Works

Flaky tests — tests that sometimes pass and sometimes fail without code changes — are one of the biggest drains on engineering productivity. They erode trust in CI, slow down deployments, and waste developer time investigating false failures.

Euriqa detects flakiness by analyzing retry behavior during test runs. When a test fails on some attempts but passes on retries, it is marked as "flaky" rather than "failed". The detection follows this process:

1. The Euriqa reporter tracks retry attempts for each test.
2. Tests that fail on some attempts but pass on retries are marked as flaky.
3. Flakiness scores are calculated across historical runs.
4. Scores update with each new run.
5. Auto-quarantine triggers when scores exceed the configured threshold.
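
The classification step in this process can be sketched as follows. This is a minimal illustration, not Euriqa's reporter code: the `AttemptStatus` and `RunOutcome` types and the `classifyRun` function are hypothetical names, and the rule that a run is flaky when any attempt failed and the final attempt passed is an assumption drawn from the description above.

```typescript
// Hypothetical names for illustration; not part of the Euriqa reporter API.
type AttemptStatus = "passed" | "failed";
type RunOutcome = "passed" | "flaky" | "failed";

// Classify one test's run from its ordered attempts (first try, then retries).
function classifyRun(attempts: AttemptStatus[]): RunOutcome {
  if (attempts.length === 0) {
    throw new Error("a run needs at least one attempt");
  }
  const finalAttempt = attempts[attempts.length - 1];
  const hadFailure = attempts.some((status) => status === "failed");
  if (finalAttempt === "passed") {
    // The test passed in the end: flaky only if an earlier attempt failed.
    return hadFailure ? "flaky" : "passed";
  }
  // The test never recovered within the allowed retries.
  return "failed";
}

console.log(classifyRun(["passed"]));                     // passed
console.log(classifyRun(["failed", "passed"]));           // flaky
console.log(classifyRun(["failed", "failed", "failed"])); // failed
```

With `retries: 0` every run has exactly one attempt, so the "flaky" branch can never be reached, which is why retries are a prerequisite.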

Prerequisites

For flakiness detection to work, you must enable retries in your Playwright configuration:

typescript

// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  retries: 2, // Enable retries for flakiness detection
  reporter: [
    ['@euriqa/euriqa-playwright', {
      apiKey: process.env.EURIQA_API_KEY,
      projectId: process.env.EURIQA_PROJECT_ID,
    }],
  ],
});

Without retries enabled, Euriqa cannot distinguish between a flaky test and a genuinely failing test. We recommend setting retries to at least 2.

Flakiness Score

Every test receives a numerical flakiness score from 0 to 1. The score is calculated based on historical pass/fail patterns across multiple runs — not just whether a test failed once.

Score Range	Severity	Meaning
0	Stable	Perfectly stable, never flaky
0.1 - 0.3	Low	Occasionally flaky, low severity
0.3 - 0.7	Moderate	Moderately flaky, should be investigated
0.7 - 1.0	High	Highly flaky, severely impacts CI reliability

Calculation Formula

The flakiness score is the fraction of runs in the scoring window in which the test was flaky:

text

flakiness_score = flaky_runs / total_runs

Where flaky_runs is the number of runs where the test exhibited flaky behavior (failed on some attempts, passed on retries) and total_runs is the total number of runs within the scoring window.
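
As a worked example of the formula, the sketch below computes the score from a list of per-run outcomes. The `RunOutcome` type and `flakinessScore` function are hypothetical names, the sample window is made up, and returning 0 for an empty window is an assumption, since the text does not say how a test with no runs is scored.

```typescript
// Hypothetical names for illustration; not part of any Euriqa API.
type RunOutcome = "passed" | "flaky" | "failed";

// flakiness_score = flaky_runs / total_runs over the scoring window.
function flakinessScore(runs: RunOutcome[]): number {
  if (runs.length === 0) {
    return 0; // Assumption: a test with no runs in the window scores 0.
  }
  const flakyRuns = runs.filter((outcome) => outcome === "flaky").length;
  return flakyRuns / runs.length;
}

// Made-up window of 10 runs: 2 flaky, 7 passed, 1 failed outright.
const windowRuns: RunOutcome[] = [
  "passed", "flaky", "passed", "passed", "failed",
  "passed", "flaky", "passed", "passed", "passed",
];

console.log(flakinessScore(windowRuns)); // 0.2
```

Note that the run that failed outright counts toward `total_runs` but not toward `flaky_runs`, so consistent failures lower the score rather than raise it.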

Fetching Flakiness Data via API

bash

curl https://app.euriqa.dev/api/flakiness \
  -H "X-API-Key: your-api-key" \
  -G -d "projectId=your-project-id"

The response is an array of tests, each with its flakiness score, total runs, flaky runs, last flaky timestamp, and quarantine status.
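
The same request could be issued from TypeScript along the lines below. This is a sketch under stated assumptions: the endpoint, header, and query parameter are taken from the curl command above, but the `FlakinessRecord` field names are guesses based on the prose description of the response, not a documented schema, and `fetchFlakiness` and `Fetcher` are hypothetical names. The HTTP client is passed in so the function can be exercised without network access.

```typescript
// Assumed response shape: these field names are illustrative guesses,
// not a documented Euriqa schema.
interface FlakinessRecord {
  testId: string;
  flakinessScore: number;
  totalRuns: number;
  flakyRuns: number;
  lastFlakyAt: string | null;
  quarantined: boolean;
}

// The minimal subset of the fetch API that this sketch relies on.
type Fetcher = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function fetchFlakiness(
  fetcher: Fetcher,
  apiKey: string,
  projectId: string,
): Promise<FlakinessRecord[]> {
  // Equivalent to `curl -G -d "projectId=..."`: the parameter goes in the query string.
  const url =
    "https://app.euriqa.dev/api/flakiness?projectId=" +
    encodeURIComponent(projectId);
  const res = await fetcher(url, {
    method: "GET",
    headers: { "X-API-Key": apiKey },
  });
  if (!res.ok) {
    throw new Error(`flakiness request failed with status ${res.status}`);
  }
  // The cast reflects the assumed shape above; validate before relying on it.
  return (await res.json()) as FlakinessRecord[];
}
```

In Node 18+ or a browser, the global `fetch` can be passed as the `fetcher` argument.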

Flakiness Threshold

The flakiness threshold determines which tests are flagged as high-severity and, when auto-quarantine is enabled, which tests are automatically quarantined. The default threshold is 0.3.

Configuring via API

Update the flakiness threshold in your project settings:

bash

curl -X PATCH https://app.euriqa.dev/api/projects/your-project-id \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "flakinessThreshold": 0.5
  }'

Lowering the threshold will flag more tests as high-severity. Raising it will reduce noise but may miss moderately flaky tests. We recommend starting with the default of 0.3 and adjusting based on your team's tolerance.
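
To make the trade-off concrete, the sketch below counts how many tests are flagged at a few thresholds. The test names and scores are made-up sample data, `ScoredTest` and `flaggedAt` are hypothetical names, and treating a test as flagged when its score is at or above the threshold is an assumption.

```typescript
// Made-up sample data for illustration only.
interface ScoredTest {
  name: string;
  score: number;
}

const sampleTests: ScoredTest[] = [
  { name: "login.spec.ts > signs in", score: 0.05 },
  { name: "cart.spec.ts > adds item", score: 0.25 },
  { name: "checkout.spec.ts > pays", score: 0.4 },
  { name: "search.spec.ts > filters", score: 0.8 },
];

// Assumption: a test is flagged when its score is at or above the threshold.
function flaggedAt(tests: ScoredTest[], threshold: number): ScoredTest[] {
  return tests.filter((test) => test.score >= threshold);
}

[0.2, 0.3, 0.5].forEach((threshold) => {
  console.log(threshold, flaggedAt(sampleTests, threshold).length);
});
// 0.2 3
// 0.3 2
// 0.5 1
```

Moving from 0.3 to 0.5 drops the test scoring 0.4 from the flagged set, which is the "moderately flaky" case the paragraph above warns about.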

Quarantine

Quarantined tests are flagged across the Euriqa dashboard so your team knows they are being managed. Quarantine does not skip tests — it marks them so their results can be interpreted in context.

Manual Quarantine

You can manually quarantine or unquarantine any test from the flakiness dashboard with a single click, or via the API:

bash

# Quarantine a test
curl -X POST https://app.euriqa.dev/api/flakiness \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "testId": "test-id",
    "quarantined": true
  }'

bash

# Unquarantine a test
curl -X POST https://app.euriqa.dev/api/flakiness \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "testId": "test-id",
    "quarantined": false
  }'
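
Because the two calls differ only in the `quarantined` flag, a single helper can build either request. The sketch below mirrors the endpoint, headers, and body fields from the curl commands above; `QuarantineRequest` and `buildQuarantineRequest` are hypothetical names, and the helper only constructs the request rather than sending it.

```typescript
// Hypothetical helper for illustration; endpoint and fields follow the curl examples.
interface QuarantineRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildQuarantineRequest(
  apiKey: string,
  testId: string,
  quarantined: boolean,
): QuarantineRequest {
  return {
    url: "https://app.euriqa.dev/api/flakiness",
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-API-Key": apiKey,
    },
    // true quarantines the test, false unquarantines it.
    body: JSON.stringify({ testId, quarantined }),
  };
}

const quarantineReq = buildQuarantineRequest("your-api-key", "test-id", true);
console.log(quarantineReq.body); // {"testId":"test-id","quarantined":true}
```

The resulting object can be sent with any HTTP client, for example by passing `quarantineReq.url` and the remaining fields to `fetch`.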

Auto-Quarantine

When auto-quarantine is enabled, tests that exceed the configured flakiness threshold are automatically quarantined. This keeps your CI green without manual intervention.

Enable auto-quarantine in your project settings:

bash

curl -X PATCH https://app.euriqa.dev/api/projects/your-project-id \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "autoQuarantine": true,
    "flakinessThreshold": 0.3
  }'

Auto-quarantine is a per-project setting. You can enable it for high-traffic projects while keeping manual control on others.
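
The auto-quarantine rule can be pictured as a small pure function. This is an illustration of the decision as described, not Euriqa's server-side logic: `ProjectSettings` mirrors the `autoQuarantine` and `flakinessThreshold` fields from the request above, `shouldAutoQuarantine` is a hypothetical name, and "exceed" is read as strictly greater than. The dashboard's high-severity metric is worded as "at or above", so the exact boundary behavior is an assumption.

```typescript
// Mirrors the two project settings shown in the PATCH request.
interface ProjectSettings {
  autoQuarantine: boolean;
  flakinessThreshold: number;
}

// Assumption: "exceeds the threshold" means strictly greater than.
function shouldAutoQuarantine(score: number, settings: ProjectSettings): boolean {
  return settings.autoQuarantine && score > settings.flakinessThreshold;
}

const projectSettings: ProjectSettings = {
  autoQuarantine: true,
  flakinessThreshold: 0.3,
};

console.log(shouldAutoQuarantine(0.45, projectSettings)); // true
console.log(shouldAutoQuarantine(0.3, projectSettings));  // false
console.log(
  shouldAutoQuarantine(0.45, { autoQuarantine: false, flakinessThreshold: 0.3 }),
); // false
```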

Dashboard Views

The flakiness section of the Euriqa dashboard provides several views to help you understand and manage test reliability.

Flakiness Overview

Key metrics displayed at the top of the flakiness page:

Total Flaky Tests — Number of tests identified as flaky.
Quarantined Tests — Number of tests currently quarantined.
High-Severity Flaky Tests — Tests with a flakiness score at or above the threshold.
Average Flakiness Score — Overall flakiness health of the project.
Flaky Tests Percentage — Proportion of your test suite that is flaky.
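
How these metrics relate to per-test data can be sketched as below. The `TestStats` and `Overview` shapes and the `summarize` function are hypothetical, the sample data is made up, and two choices are assumptions rather than documented behavior: a test counts as flaky when it has at least one flaky run, and the average score is taken over all tests in the project.

```typescript
// Hypothetical shapes for illustration; not a documented Euriqa schema.
interface TestStats {
  flakinessScore: number;
  flakyRuns: number;
  quarantined: boolean;
}

interface Overview {
  totalFlakyTests: number;
  quarantinedTests: number;
  highSeverityFlakyTests: number;
  averageFlakinessScore: number;
  flakyTestsPercentage: number;
}

function summarize(tests: TestStats[], threshold: number): Overview {
  const total = tests.length;
  // Assumption: any test with at least one flaky run counts as flaky.
  const flaky = tests.filter((t) => t.flakyRuns > 0).length;
  const scoreSum = tests.reduce((sum, t) => sum + t.flakinessScore, 0);
  return {
    totalFlakyTests: flaky,
    quarantinedTests: tests.filter((t) => t.quarantined).length,
    // "At or above the threshold", as stated in the metric description.
    highSeverityFlakyTests: tests.filter((t) => t.flakinessScore >= threshold).length,
    averageFlakinessScore: total === 0 ? 0 : scoreSum / total,
    flakyTestsPercentage: total === 0 ? 0 : (flaky / total) * 100,
  };
}

// Made-up project with four tests.
const overview = summarize(
  [
    { flakinessScore: 0, flakyRuns: 0, quarantined: false },
    { flakinessScore: 0, flakyRuns: 0, quarantined: false },
    { flakinessScore: 0.25, flakyRuns: 5, quarantined: false },
    { flakinessScore: 0.75, flakyRuns: 15, quarantined: true },
  ],
  0.3,
);

console.log(overview.flakyTestsPercentage); // 50
```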

Trend Chart

A line chart showing how flakiness scores change over time. Use it to identify whether flakiness is getting better or worse, correlate spikes with code changes or infrastructure issues, and track improvement across configurable time ranges.

Flaky Tests Table

A detailed, sortable table of every flaky test with the following columns:

Column	Description
Test Name	Full title with file path
Flakiness Score	Numerical score, sorted highest first by default
Total Runs	How many times the test has been executed
Flaky Runs	How many runs showed flaky behavior
Last Flaky	Timestamp of when the test was last flaky
Quarantine Toggle	One-click quarantine/unquarantine per test

Best Practices

Enable retries — Set retries: 2 in your Playwright config. This is the foundation of flakiness detection.
Start with the default threshold — The 0.3 threshold is a good balance between catching flaky tests and avoiding too much noise. Adjust once you understand your baseline.
Enable auto-quarantine on high-traffic projects — If flaky tests are regularly blocking your CI, auto-quarantine keeps your pipeline moving while you investigate.
Review the trend chart weekly — Spot flakiness spikes early before they become a bigger problem. Correlate spikes with recent code changes or infrastructure updates.
Fix root causes, not symptoms — Quarantine is a stopgap. Use Euriqa's trace viewer and test history to identify the root cause of flakiness (timing issues, shared state, environment dependencies).
Unquarantine after fixing — Once you fix a flaky test, unquarantine it and monitor its score over the next few runs to confirm stability.
Track flaky tests percentage — Aim to keep your flaky tests percentage below 5%. A rising percentage indicates systemic test reliability issues.
