Email testing in CI without burning real inboxes

How to wire programmable email into GitHub Actions, CircleCI, and GitLab — with parallelization, secret scoping, and concurrency that does not blow up at 50 workers.

May 8, 2026 by catchotp team

Most teams do not have a CI email-testing problem so much as an accidental CI email-testing architecture. The first test that needs an OTP gets pointed at a personal Gmail. The second one points at qa@yourcompany.com. By the time the third arrives, someone has stood up Mailpit in a sidecar container, and by the fourth, the team gives up and mocks. None of these are good.

This post walks through the actual shape of email testing in CI in 2026, with concrete configs for GitHub Actions, CircleCI, and GitLab CI, plus the parallelism and secret-management patterns that hold up when your test suite grows from 5 email tests to 500.

The four CI failure modes for email tests

Before the configs, the failure modes — because the configs only matter to the extent that they avoid these.

Failure 1: shared inbox contention

Two tests run in parallel, both wait for “the latest email at qa@yourcompany.com,” and the second one reads the OTP intended for the first. Test A passes. Test B fails on a verification-code-mismatch error that has nothing to do with the change under test.

Failure 2: stale fixtures from yesterday’s run

A regression test reads “the most recent email at the QA address” and gets one from yesterday’s CI run that nobody cleaned up. The test passes against stale data. Real bug ships.

Failure 3: secret leaks via fixture data

Someone hard-codes an API key into a test fixture, the fixture gets logged at debug level, the log gets shipped to Sentry, and the API key now lives in three places it should not. Nobody remembers to scrub those copies, so each one stays a usable credential until the key is rotated.

Failure 4: provider rate limits caused by parallelism

Your test suite hits a single email provider with 50 parallel workers. The provider’s per-second rate limit is 30. Tests fail intermittently for “no email arrived” reasons, which look like flakes, which get retried, which pushes even more traffic at the same limit.

Every CI pattern below is shaped to avoid all four.

The right shape

Three properties make a CI email-testing setup reliable.

Per-test inbox. Every test gets a fresh address. No sharing, no contention, automatic cleanup via TTL.
Long-poll waiter. The test blocks on a single HTTP request that resolves the moment the email arrives. No sleep(), no polling loop, no race with stale fixtures.
Per-pipeline scoped credentials. Each CI pipeline has its own scoped API key. A leak in staging E2E does not blast-radius into production.

What this looks like in practice:

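import { CatchOTP } from '@catchotp/sdk';
const otp = new CatchOTP({ apiKey: process.env.CATCHOTP_KEY! });

// Fresh address for this test only; the TTL cleans it up afterwards.
const inbox = await otp.inboxes.create({ mode: 'ephemeral', ttlMinutes: 10 });
// ...drive the sign-up flow in the app under test using inbox.address...
// Blocks until the OTP email arrives or 30 seconds pass.
const code = await otp.inboxes.waitForOtp(inbox.id, { timeoutSeconds: 30 });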

Two calls: create an inbox, then wait for the code. The CI integration is mostly about wiring the secret in and respecting concurrency caps.
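To make the per-test inbox property concrete, here is a minimal sketch of a test helper. It assumes only the two calls shown above (`inboxes.create` and `inboxes.waitForOtp`). The `InboxApi` interface, `withFreshInbox`, `makeFakeApi`, and `runTwoTestsInParallel` are hypothetical names introduced for illustration, and the in-memory fake stands in for the real service so the isolation is visible without a network.

```typescript
// Minimal shape of the two calls used above; the real SDK may differ.
interface Inbox { id: string; address: string }
interface InboxApi {
  create(opts: { mode: 'ephemeral'; ttlMinutes: number }): Promise<Inbox>;
  waitForOtp(id: string, opts: { timeoutSeconds: number }): Promise<string>;
}

// Hypothetical helper: every test body receives its own fresh inbox.
async function withFreshInbox<T>(
  api: InboxApi,
  body: (inbox: Inbox, waitForOtp: () => Promise<string>) => Promise<T>,
): Promise<T> {
  const inbox = await api.create({ mode: 'ephemeral', ttlMinutes: 10 });
  return body(inbox, () => api.waitForOtp(inbox.id, { timeoutSeconds: 30 }));
}

// In-memory stand-in for the service, for illustration only.
function makeFakeApi(): InboxApi & { deliver(id: string, code: string): void } {
  let created = 0;
  const delivered = new Map<string, string>();
  const waiting = new Map<string, (code: string) => void>();
  return {
    async create() {
      created += 1;
      return { id: `inbox-${created}`, address: `inbox-${created}@example.test` };
    },
    waitForOtp(id) {
      const code = delivered.get(id);
      if (code !== undefined) return Promise.resolve(code);
      // Resolve later, when a code is delivered to this inbox id.
      return new Promise<string>((resolve) => waiting.set(id, resolve));
    },
    deliver(id, code) {
      const resolve = waiting.get(id);
      if (resolve) resolve(code);
      else delivered.set(id, code);
    },
  };
}

// Two "tests" running at once: each one only ever sees its own code.
function runTwoTestsInParallel(): Promise<string[]> {
  const api = makeFakeApi();
  const signUp = (code: string) =>
    withFreshInbox(api, async (inbox, waitForOtp) => {
      const pending = waitForOtp();   // the wait starts first
      api.deliver(inbox.id, code);    // stands in for the app mailing inbox.address
      return pending;
    });
  return Promise.all([signUp('111111'), signUp('222222')]);
}
```

Because the wait is keyed by inbox id, there is no "latest email" to race over, which is the property that removes failures 1 and 2.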

GitHub Actions

The canonical config for a Playwright test suite that uses email OTP.

name: e2e
on: [pull_request, push]

jobs:
  e2e:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      # pnpm must be on PATH before setup-node can resolve its cache directory
      - run: corepack enable
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: pnpm
      - run: pnpm install --frozen-lockfile
      - run: pnpm exec playwright install --with-deps chromium
      - name: Run tests
        env:
          CATCHOTP_KEY: ${{ secrets.CATCHOTP_KEY_CI }}
        run: pnpm test:e2e --shard=${{ matrix.shard }}/4

Three things to notice.

Shard across runners and keep workers per runner low. GitHub’s matrix gives you four parallel runners. Each runner runs Playwright with workers=1 or workers=2 (Playwright defaults to half the logical CPU cores, which on small hosted runners works out to 1 or 2). Total concurrent inboxes ≈ shards × workers, usually 4 × 2 = 8. That is comfortably under the Pro plan’s 50-inbox cap.

Use a scoped key. CATCHOTP_KEY_CI is a separate API key from CATCHOTP_KEY_LOCAL and CATCHOTP_KEY_PROD. Rotate them independently. A leak in CI logs does not require rotating the production key.

Don’t fail-fast. When one shard fails because of a real bug, you want the other shards to keep running so you can see the full picture in one CI cycle. fail-fast: false is the right default here.

The wrong way (don’t do this)

# DO NOT COPY THIS
strategy:
  matrix:
    shard: [1, 2, ..., 50]  # 50 parallel runners
env:
  CATCHOTP_KEY: ${{ secrets.CATCHOTP_KEY }}  # one key everywhere
  TEST_EMAIL: qa@yourcompany.com  # shared inbox

This is the four-failure-modes config in three lines.

CircleCI

CircleCI’s parallelism is per-job, not per-matrix. The shape is similar but the syntax is different.

version: 2.1
jobs:
  e2e:
    docker:
      - image: cimg/node:20.18-browsers
    parallelism: 4
    steps:
      - checkout
      - run: corepack enable
      - run: pnpm install --frozen-lockfile
      - run: pnpm exec playwright install --with-deps chromium
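      - run:
          name: e2e
          # CircleCI does not interpolate variables inside `environment:`,
          # so map the context secret in the shell instead.
          command: |
            export CATCHOTP_KEY="$CATCHOTP_KEY_CI"
            pnpm test:e2e \
              --shard=$((CIRCLE_NODE_INDEX + 1))/$CIRCLE_NODE_TOTAL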

workflows:
  test:
    jobs:
      - e2e:
          context: catchotp-ci

Two CircleCI-specific notes:

Use a context, not a per-job env var. CircleCI Contexts give you per-pipeline scoped secret management with audit. The catchotp-ci context can be locked down to specific branches (e.g., only main and PRs from contributors with write access).

Mind the parallelism semantics. CircleCI’s parallelism: 4 runs four copies of the same job. Inside each copy, CIRCLE_NODE_INDEX is 0..3 and CIRCLE_NODE_TOTAL is 4. Pass them through to your test runner’s sharding flag.

GitLab CI

GitLab’s parallelism is parallel: N. The variable management is different again.

e2e:
  image: node:20-bullseye
  parallel: 4
  before_script:
    - corepack enable
    - pnpm install --frozen-lockfile
    - pnpm exec playwright install --with-deps chromium
  script:
    - pnpm test:e2e --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL
  variables:
    CATCHOTP_KEY: $CATCHOTP_KEY_CI
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"

Inside the runner, $CI_NODE_INDEX is 1..N (note: 1-indexed, unlike CircleCI). Pass it through accordingly.
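The indexing difference is easy to get wrong when the same suite runs on more than one provider. As a sketch, a small helper can normalize each provider's node index into Playwright's 1-based `--shard=<current>/<total>` flag. The `CiProvider` type and `shardFlag` function are hypothetical names, not part of any CI tool.

```typescript
type CiProvider = 'github' | 'circleci' | 'gitlab';

// Build Playwright's 1-based --shard flag from a provider's node index.
// Of the three, only CircleCI's CIRCLE_NODE_INDEX is 0-based; the GitHub
// matrix values used in this post and GitLab's CI_NODE_INDEX start at 1.
function shardFlag(provider: CiProvider, index: number, total: number): string {
  const current = provider === 'circleci' ? index + 1 : index;
  if (
    !Number.isInteger(current) ||
    !Number.isInteger(total) ||
    current < 1 ||
    current > total
  ) {
    throw new RangeError(`shard ${current}/${total} is out of range`);
  }
  return `--shard=${current}/${total}`;
}
```

The range check matters more than it looks: an off-by-one here means one slice of the suite runs twice and another never runs, with no failing test to flag it.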

GitLab’s masked variables protect against the secret accidentally appearing in logs. Mark CATCHOTP_KEY_CI as masked at the project or group level. This is essentially free defense-in-depth.

Concurrency: the math you need to do once

Inboxes are the resource that gets capped. Plans typically expose:

Plan	Concurrent inboxes
Free	5
Pro	50
Team	500

The math: total parallel CI workers × inboxes per worker ≤ your cap.

The most common shape:

Solo developer / small project on Free: 4 shards × 1 inbox per shard = 4. Fits under 5, with one inbox to spare.
Mid-size project on Pro: 8 shards × 4 workers per shard × 1 inbox = 32. Fits under 50 with headroom.
Large org on Team: dozens of pipelines, each with its own scoped key, contributing to a 500-inbox shared pool. Per-pipeline alerts on approaching the cap.

If your math says “we’d need 60 concurrent inboxes on Pro,” the answer is either upgrading to Team or throttling the email-using tests: tag them with a Playwright @otp tag and run that group with a lower worker count (for example --workers=4) so only those tests are capped.
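The arithmetic above fits in two small functions. This is a sketch with hypothetical names (`Layout`, `peakInboxes`, `maxWorkersPerShard`), and it assumes each worker holds its inboxes for the whole test, which is the worst case.

```typescript
interface Layout {
  shards: number;
  workersPerShard: number;
  inboxesPerTest: number;
}

// Peak concurrent inboxes for one pipeline run: every worker on every
// shard may be holding its inboxes at the same moment.
function peakInboxes(layout: Layout): number {
  return layout.shards * layout.workersPerShard * layout.inboxesPerTest;
}

// Largest per-shard worker count that keeps one pipeline run within the cap.
// Returns 0 when even a single worker per shard would exceed it.
function maxWorkersPerShard(cap: number, shards: number, inboxesPerTest = 1): number {
  return Math.floor(cap / (shards * inboxesPerTest));
}
```

For the mid-size example, `peakInboxes({ shards: 8, workersPerShard: 4, inboxesPerTest: 1 })` gives 32, and `maxWorkersPerShard(50, 8)` gives 6. For a 10-shard layout that wanted 6 workers each (60 inboxes), `maxWorkersPerShard(50, 10)` gives 5.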

Secrets management

Three rules cover most teams.

Rule 1: per-pipeline scoped keys

Create a separate API key for each pipeline that needs one. Naming convention: <service>-<env>-<purpose>. Examples: signup-staging-e2e, billing-prod-smoke. The key is scoped to that pipeline’s CI variables and nowhere else.
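A naming convention only helps if it is enforced. As a sketch, a check like the one below could run wherever keys are provisioned. The `KeyName` type and `parseKeyName` function are hypothetical, and the rule that each segment is lowercase alphanumeric is an assumption made for this example, not something the convention itself requires.

```typescript
interface KeyName {
  service: string;
  env: string;
  purpose: string;
}

// Parse "<service>-<env>-<purpose>". Returns null when the name does not
// have exactly three lowercase alphanumeric segments (an assumed rule).
function parseKeyName(raw: string): KeyName | null {
  const match = /^([a-z0-9]+)-([a-z0-9]+)-([a-z0-9]+)$/.exec(raw);
  if (match === null) return null;
  return { service: match[1], env: match[2], purpose: match[3] };
}
```

With this, `parseKeyName('signup-staging-e2e')` yields service `signup`, env `staging`, purpose `e2e`, while a catch-all name such as `CATCHOTP_KEY` is rejected.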

Rule 2: rotate on the same cadence as everything else

Add the catchotp keys to the same rotation list as your AWS access keys, your Stripe keys, and your provider tokens. Quarterly is the most common cadence.

Rule 3: never log the key

This sounds obvious; in practice, every team has at least one place where it gets logged. The two common patterns:

// BAD — logs the entire env including secrets
console.error('test failed', { env: process.env });

// BAD — logs the SDK config which includes the key
console.error('client config', otp.config);

// GOOD — log specific safe values
console.error('test failed', { inboxId: inbox.id, address: inbox.address });

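The single most useful prevention: a CI step that scans test output for likely secret patterns and fails the build if it finds any. gitleaks can do this when pointed at the saved log files. GitHub’s native secret scanning is a useful complement, but it covers what is committed to the repository rather than job output.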
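Pattern-based scanners guess at what a secret looks like. A cheaper complementary check is exact: the pipeline already knows its own key, so it can look for that literal value in the captured output. The sketch below uses a hypothetical `findLeakedLines` function; reading the log file and failing the build on a non-empty result is left to the calling script.

```typescript
// Returns the 1-based line numbers of `output` that contain any of the
// given secret values verbatim. Empty strings (unset secrets) are ignored.
function findLeakedLines(output: string, secrets: string[]): number[] {
  const live = secrets.filter((secret) => secret.length > 0);
  const hits: number[] = [];
  output.split('\n').forEach((line, index) => {
    if (live.some((secret) => line.indexOf(secret) !== -1)) {
      hits.push(index + 1);
    }
  });
  return hits;
}
```

Report line numbers only, never the matching line itself, or the scanner becomes one more place the key gets logged.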

Parallelization patterns

The framework-level parallelism layered with the CI-level parallelism is where teams get confused. The mental model:

CI (sharding)         →   1 of 4 shards
  Test runner (workers) →  2 of 2 workers in this shard
    Test                →   1 inbox per test

Total inboxes in flight = shards × workers. Tests inside a worker run sequentially.

Two anti-patterns to avoid:

All sharding, no workers. 50 shards, 1 worker each. You pay the per-runner setup cost (checkout, install, browser download) 50 times for work that 8 shards × 6 workers could handle.
All workers, no sharding. 1 shard, 50 workers. You hit one CI runner’s CPU cap and the runner thrashes.

The sweet spot is usually 4-8 shards with 2-6 workers each.
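One way to land in that range is to start from the concurrency you want and the most workers a single runner can sustain, then derive the split. This is a sketch under those assumptions; `planShards` is a hypothetical helper, and the per-runner ceiling is something you measure on your own runners.

```typescript
// Pick the smallest shard count that reaches the target concurrency without
// exceeding the per-runner worker ceiling. The resulting shards × workers is
// never above the target, so a cap used as the target is never exceeded.
function planShards(
  targetConcurrency: number,
  maxWorkersPerRunner: number,
): { shards: number; workers: number } {
  if (targetConcurrency < 1 || maxWorkersPerRunner < 1) {
    throw new RangeError('both arguments must be at least 1');
  }
  const shards = Math.ceil(targetConcurrency / maxWorkersPerRunner);
  const workers = Math.floor(targetConcurrency / shards);
  return { shards, workers };
}
```

`planShards(48, 6)` returns 8 shards with 6 workers each, and `planShards(8, 2)` returns 4 shards with 2 workers each. When the target does not divide evenly, the result rounds down: `planShards(50, 6)` returns 9 shards with 5 workers, or 45 in flight.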

What about Mailpit, MailHog, Mailcatcher?

These tools are great for testing your application code. They are not great for testing your email integration.

The thing they cannot do: exercise the real DNS path. Your DKIM record, your SPF record, your transactional provider’s per-recipient reputation, your sending IP’s reputation — none of that is exercised when you point at localhost:1025. The bug “email lands in spam in production” stays uncaught.

The right pattern is to use both:

Unit and component tests: Mailpit. Fast, free, no network.
Integration and E2E tests: real DNS path via programmable email. Catches the things Mailpit cannot.

Different tools for different jobs.

How catchotp helps

We are the receive side that the configs above point at. Every plan, including Free, gives you per-test inbox isolation, sub-second long-poll waiters, and per-pipeline scoped API keys. The Pro tier covers most teams under the 50-inbox concurrency cap. The Team tier covers most engineering orgs.

Free tier: 5 inboxes, 1,000 messages a month, no credit card. Start free or view pricing for the full tier breakdown.

Related reading

How to Test OTP Flows in 2026 — the full guide to OTP testing patterns and anti-patterns.
How to E2E Test Sign-Up Flows With Real Emails — Playwright and Cypress walkthroughs.
The E2E testing use case covers fixtures, retries, and parallelism in more depth.

The shorter version: per-test inbox, scoped key, four to eight shards, and a long-poll waiter. Do that and email tests stop being the flaky part of your CI.
