How to test SMS OTP flows in 2026
End-to-end guide to testing SMS one-time-password flows with Twilio, Telnyx, and Vonage receivers — and why an email-OTP primitive is often a better proxy.
SMS OTP testing is harder than email OTP testing in almost every way. The provider charges per send. The carriers occasionally drop messages. The number-to-message correlation has to live somewhere. And the receiver — the thing your test reads from — is either a Twilio webhook, a Telnyx webhook, or some hand-rolled SMPP client. None of those are zero-config.
This is a complete guide to testing SMS OTP flows in 2026: the three receiver patterns, code samples in Node and Python for Twilio / Telnyx / Vonage, CI considerations that bite teams in week three, and the pragmatic case for using email OTP as a proxy for many of the assertions you actually care about.
Why SMS testing is its own problem
The reason SMS testing diverges from email testing is that the receive side does not fit on your laptop. With email, you can spin up Mailpit and have a real-ish inbox in 10 seconds. With SMS, the receive side requires a phone number that a carrier owns, an inbound webhook that a public URL can reach, and a billing relationship that costs money before you have written any tests.
Concretely: every SMS test run spends 0.7 to 4 cents in carrier fees. A CI suite that runs 200 SMS tests an hour, eight hours a day, is burning between $11 and $64 a day. That cost is invisible until the credit-card alert fires.
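The arithmetic behind that range is worth writing down, because it scales linearly with suite size. Here is a minimal sketch in Python; the function name and parameters are illustrative, and it assumes exactly one SMS per test with no retries.

```python
def daily_sms_cost_usd(tests_per_hour: int, hours_per_day: int, cents_per_send: float) -> float:
    """Carrier spend per day, assuming one SMS per test and no retries."""
    sends_per_day = tests_per_hour * hours_per_day
    return round(sends_per_day * cents_per_send / 100, 2)


# The 0.7 and 4 cent bounds are the per-send prices quoted above.
low = daily_sms_cost_usd(200, 8, 0.7)
high = daily_sms_cost_usd(200, 8, 4.0)
print(f"${low:.2f} to ${high:.2f} per day")
```

Retries and multi-segment messages push the real figure above this estimate, so treat it as a floor.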
The four hard parts:
- You need a real inbound phone number. Carriers do not let you script a fake one. You buy one from Twilio / Telnyx / Vonage / etc.
- You need a public webhook. The provider POSTs the inbound SMS to a URL. Local-only test runs need ngrok or equivalent.
- You need to correlate messages to tests. If two parallel tests both wait for “the latest SMS to +1-555-…”, one of them gets the wrong code.
- You pay per send. Even when the test fails halfway through, the carrier already charged you.
The three patterns below all solve the four hard parts in different ways.
Pattern 1: provider webhooks (Twilio inbound)
This is the canonical setup. You buy a Twilio number, you configure the inbound webhook to hit your test infrastructure, you read incoming SMSes from a queue.
Twilio inbound, Node
import express from 'express';

const app = express();
// Twilio posts inbound SMS as application/x-www-form-urlencoded
app.use(express.urlencoded({ extended: false }));

// In-memory store keyed by destination number; in prod, this is Redis or DynamoDB
const messages = new Map<string, string[]>();

app.post('/sms-inbound', (req, res) => {
  // Key by `To` (the test number that received the SMS), not `From`,
  // so that waitForSms(to) below can find the message.
  const to = req.body.To as string;
  const body = req.body.Body as string;
  const list = messages.get(to) ?? [];
  list.push(body);
  messages.set(to, list);
  res.status(204).send();
});

app.listen(3000); // port is an assumption; use whatever your tunnel or host exposes

export async function waitForSms(
  to: string,
  timeoutMs = 30_000,
): Promise<string> {
  const start = Date.now();
  // Poll the store every 250ms until a message for `to` shows up
  while (Date.now() - start < timeoutMs) {
    const list = messages.get(to);
    if (list && list.length > 0) {
      return list.shift()!;
    }
    await new Promise((r) => setTimeout(r, 250));
  }
  throw new Error(`No SMS to ${to} within ${timeoutMs}ms`);
}
The polling-loop receive side is the part that always grows. You start with a simple polling loop, you discover that two parallel tests step on each other, you add a per-test number reservation, and three weeks later you have a small distributed system.
Twilio inbound, Python
import time
from collections import defaultdict

from flask import Flask, request

app = Flask(__name__)

# In-memory store keyed by destination number (the test number that received the SMS)
messages: dict[str, list[str]] = defaultdict(list)


@app.post("/sms-inbound")
def inbound():
    # Key by `To`, not `From`, so that wait_for_sms(to) finds the message
    to = request.form["To"]
    body = request.form["Body"]
    messages[to].append(body)
    return "", 204


def wait_for_sms(to: str, timeout_s: int = 30) -> str:
    deadline = time.monotonic() + timeout_s
    # Poll the store every 250ms until a message for `to` shows up
    while time.monotonic() < deadline:
        if messages.get(to):
            return messages[to].pop(0)
        time.sleep(0.25)
    raise TimeoutError(f"No SMS to {to} within {timeout_s}s")
Same shape. Same problem at scale. The receive side eats more time than the actual test logic.
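One way to keep the receive side from growing into a tangle of sleeps is to block on a condition variable instead of polling. The sketch below uses only the standard library. `SmsInbox`, `deliver`, and the `+1555…` placeholder numbers are illustrative names, not part of any provider SDK, and the store is in-process only, so it does not help once the webhook and the tests run in different processes.

```python
import threading
import time
from collections import defaultdict, deque


class SmsInbox:
    """Thread-safe store of inbound message bodies, keyed by destination number."""

    def __init__(self) -> None:
        self._messages: dict[str, deque[str]] = defaultdict(deque)
        self._cond = threading.Condition()

    def deliver(self, to: str, body: str) -> None:
        """Called by the webhook handler for every inbound SMS."""
        with self._cond:
            self._messages[to].append(body)
            self._cond.notify_all()

    def wait_for_sms(self, to: str, timeout_s: float = 30.0) -> str:
        """Block until a message for `to` arrives, or raise TimeoutError."""
        deadline = time.monotonic() + timeout_s
        with self._cond:
            while not self._messages[to]:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    raise TimeoutError(f"No SMS to {to} within {timeout_s}s")
                self._cond.wait(remaining)
            return self._messages[to].popleft()


# Usage: simulate a webhook delivery 100ms after the test starts waiting.
inbox = SmsInbox()
threading.Timer(0.1, inbox.deliver, args=("+15550100", "Your code is 123456")).start()
print(inbox.wait_for_sms("+15550100", timeout_s=2))
```

The waiter wakes as soon as a message for its own number lands, and messages for other numbers stay queued for their own waiters.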
Pattern 2: Telnyx programmable messaging
Telnyx exposes the inbound side as a Messaging Profile with a webhook. The shape is nearly identical to Twilio; the pricing is meaningfully different (cheaper per message, MMS included).
import express from 'express';

const app = express();
app.use(express.json());

app.post('/telnyx-inbound', (req, res) => {
  const { event_type, payload } = req.body.data ?? {};
  if (event_type === 'message.received') {
    const to = payload.to[0]?.phone_number;
    const body = payload.text;
    // ...store keyed by `to`
  }
  res.status(200).send();
});
Telnyx wraps every webhook in an event envelope: the message sits under data.payload and data.event_type tells you what kind of event it is. The receive-side code ends up as a switch statement over event types rather than a flat handler.
Pattern 3: Vonage virtual numbers
Vonage (formerly Nexmo) is the third option. Their receive shape is different again — JSON inbound webhook with messageId, to, msisdn (sender), and text keys.
# Reuses `app` and `messages` from the Twilio Python example above
@app.post("/vonage-inbound")
def vonage_inbound():
    data = request.get_json()
    to = data["to"]
    body = data["text"]
    messages[to].append(body)
    return "", 200
The pattern repeats. The annoying part is that there is no shared interface — each provider invented its own envelope, and your receive-side code accumulates a switch on which provider this number belongs to.
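A thin normalization layer can keep that switch in one place. The sketch below maps each provider's payload onto a single record. The `InboundSms` type and the `normalize` function are hypothetical names. Twilio's `To` field and Telnyx's `from.phone_number` field do not appear in the handlers above; they are assumed from those providers' webhook payloads, so check them against the payloads you actually receive.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class InboundSms:
    provider: str
    sender: str
    to: str
    body: str


def normalize(provider: str, payload: Mapping[str, Any]) -> InboundSms:
    """Map one provider's inbound webhook payload onto InboundSms."""
    if provider == "twilio":
        # Form-encoded fields: From, To, Body
        return InboundSms("twilio", payload["From"], payload["To"], payload["Body"])
    if provider == "telnyx":
        data = payload["data"]
        if data["event_type"] != "message.received":
            raise ValueError(f"Not an inbound message: {data['event_type']}")
        msg = data["payload"]
        return InboundSms(
            "telnyx", msg["from"]["phone_number"], msg["to"][0]["phone_number"], msg["text"]
        )
    if provider == "vonage":
        # msisdn is the sender; to is the virtual number that received the SMS
        return InboundSms("vonage", payload["msisdn"], payload["to"], payload["text"])
    raise ValueError(f"Unknown provider: {provider}")


sms = normalize("vonage", {"msisdn": "15550199", "to": "15550100", "text": "Your code is 123456"})
print(sms.to, sms.body)
```

Providers may format numbers differently (for example with or without a leading +), so it is worth canonicalizing `to` before using it as a lookup key.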
CI considerations
Three things go wrong in CI specifically.
1. The webhook needs a public URL
Twilio cannot POST to localhost:3000. The three options in increasing order of robustness:
- ngrok in CI. Works for prototypes, breaks under parallelism, has paid tiers for stability.
- A long-lived test environment with a fixed webhook URL. Now you have one shared receive side and the per-test correlation problem we mentioned above.
- A managed receive-side service. This is what catchotp does for email — and what we explicitly do not do for SMS yet.
2. Concurrency on a single phone number
A Twilio number is a single shared inbox. If three parallel tests send a code to the same number, every test sees all three codes. You either lease a pool of numbers (one per worker, $1-2/month each) or serialize the SMS-using tests.
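Leasing from a pool can be as small as a queue plus a context manager. This is a sketch under the assumption that all tests run in one process; `NumberPool` and the placeholder numbers are hypothetical. With several CI workers the free list would have to live in a shared store such as Redis.

```python
import queue
from contextlib import contextmanager
from typing import Iterator


class NumberPool:
    """Hands out each test number to at most one test at a time."""

    def __init__(self, numbers: list[str]) -> None:
        self._free: queue.Queue[str] = queue.Queue()
        for number in numbers:
            self._free.put(number)

    @contextmanager
    def lease(self, timeout_s: float = 60.0) -> Iterator[str]:
        try:
            number = self._free.get(timeout=timeout_s)
        except queue.Empty:
            raise TimeoutError("No free test number in the pool") from None
        try:
            yield number
        finally:
            # Always return the number, even if the test body raised
            self._free.put(number)


pool = NumberPool(["+15550100", "+15550101"])
with pool.lease() as number:
    print(f"this test owns {number}")
```

A test that cannot get a number within the timeout fails loudly instead of reading another test's code.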
3. Provider rate limits
Most providers cap inbound SMS to a single number at around 1 message per second. Bursty tests hit this limit and the second message can be silently dropped. The test fails with “no OTP arrived”; the actual cause is rate limiting.
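If the tests control when sends are triggered, a small pacer can space them out per number. The sketch below takes the one-per-second cap mentioned above as its default; the real limit depends on your provider and number type. `PerNumberPacer` is a hypothetical helper, it is not thread-safe, and the clock and sleep functions are injectable so it can be tested without waiting.

```python
import time
from typing import Callable


class PerNumberPacer:
    """Spaces out sends to the same number by at least `min_interval_s`."""

    def __init__(
        self,
        min_interval_s: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last_send: dict[str, float] = {}

    def wait_turn(self, number: str) -> float:
        """Sleep if needed before a send; return the seconds waited."""
        now = self._clock()
        last = self._last_send.get(number)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self._min_interval_s - (now - last))
        if delay > 0:
            self._sleep(delay)
        self._last_send[number] = now + delay
        return delay


pacer = PerNumberPacer()
pacer.wait_turn("+15550100")  # first send to this number goes out immediately
```

Call `wait_turn(number)` right before the step that triggers the SMS. Sends to different numbers are never delayed by each other.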
Why email OTP is often a better proxy
Here is the thesis that always lands hard with teams: most SMS OTP flows in 2026 also support email OTP, and most of the assertions you care about are platform-agnostic.
The assertions that are SMS-specific:
- The carrier delivered the message.
- The provider’s webhook fired.
- The message body was under 160 characters and not split across two SMSes.
The assertions that are platform-agnostic and account for 80% of OTP test value:
- The user can complete the verification step end-to-end.
- The OTP code is the right format and length.
- The code expires when it should.
- Replay protection works (one-time use).
- Rate limits trigger after N failed attempts.
- The session that results from verification has the right scope.
If your SMS tests only cover the second list, you can run them through email OTP at a fraction of the cost and complexity, and rely on a smaller, slower, dedicated SMS smoke test for the first list.
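The "under 160 characters" assertion in the SMS-specific list needs no carrier at all, because segment count follows from the message text. The sketch below counts segments using the standard limits: 160 characters for a single GSM-7 message (153 per part when concatenated), and 70 or 67 UTF-16 units once any character falls outside GSM-7. It is an approximation: it ignores the rule that a two-unit character cannot straddle a segment boundary, and `sms_segments` is an illustrative name.

```python
import math

# GSM 03.38 default alphabet (basic table)
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: each of these costs two septets (escape + character)
GSM7_EXTENDED = set("^{}\\[~]|€\f")


def sms_segments(body: str) -> int:
    """Approximate number of SMS segments needed to carry `body`."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENDED for ch in body):
        units = sum(2 if ch in GSM7_EXTENDED else 1 for ch in body)
        single, multi = 160, 153
    else:
        # Any non-GSM character switches the whole message to UCS-2
        units = len(body.encode("utf-16-be")) // 2
        single, multi = 70, 67
    return 1 if units <= single else math.ceil(units / multi)


assert sms_segments("Your verification code is 123456") == 1
```

Run this against the rendered OTP template in a unit test and the carrier smoke test no longer has to catch an accidental second segment.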
The pattern we recommend:
- Most signup tests use email OTP — fast, cheap, deterministic, free under most plans.
- A small SMS smoke suite runs on a schedule — once an hour, two or three tests, against a real Twilio number. Catches the SMS-specific failures.
- Production has synthetic monitoring — a separate concern from CI, runs against the real production stack.
Concrete example: hybrid test suite
Here is a Playwright test that uses email OTP to validate the verification logic, and a separate scheduled test that uses SMS to validate the carrier path.
Logic test (every commit, runs in 600ms)
import { test, expect } from '@playwright/test';
import { CatchOTP } from '@catchotp/sdk';

const otp = new CatchOTP({ apiKey: process.env.CATCHOTP_KEY! });

test('verification: rate limit triggers after 5 bad attempts', async ({ page }) => {
  const inbox = await otp.inboxes.create({ mode: 'ephemeral', ttlMinutes: 10 });
  await page.goto('/signup');
  await page.getByLabel('Email').fill(inbox.address);
  await page.getByRole('button', { name: 'Continue' }).click();

  // Wait for the real OTP so we know the channel works
  await otp.inboxes.waitForOtp(inbox.id, { timeoutSeconds: 30 });

  // But submit five wrong codes on purpose
  for (let i = 0; i < 5; i++) {
    await page.getByLabel('Verification code').fill('000000');
    await page.getByRole('button', { name: 'Verify' }).click();
  }
  await expect(page.getByText(/too many attempts/i)).toBeVisible();
});
Carrier smoke test (hourly, real Twilio)
import { test, expect } from '@playwright/test';
// reserveTestNumber and waitForSms are your own helpers (number pool and
// Twilio receive side); import them from wherever they live in your repo.

test('SMS carrier path delivers in under 30 seconds', async ({ page }) => {
  const number = await reserveTestNumber(); // from your number pool
  await page.goto('/signup');
  await page.getByLabel('Phone').fill(number);
  await page.getByRole('button', { name: 'Send code' }).click();

  const code = await waitForSms(number, 30_000); // your Twilio receive side
  await page.getByLabel('Code').fill(code);
  await page.getByRole('button', { name: 'Verify' }).click();
  await expect(page.getByRole('heading', { name: 'Welcome' })).toBeVisible();
});
The test is the same shape; the receive side is what changes. Email OTP gives you a managed receive side with sub-second latency. SMS forces you to build one.
When to bite the bullet and run real SMS tests
Three cases where the email-as-proxy pattern is wrong:
- You ship to markets where SMS is the only channel. India and parts of LATAM still skew SMS-first. Your verification UX needs to match.
- You have a regulatory requirement. PSD2 SCA, US KBA, certain HIPAA flows — auditors want evidence the SMS path works.
- You sell to enterprises that demand SMS by default. B2B mid-market often expects SMS as the secure-by-default channel.
In all three cases, the smallest possible suite of SMS tests, scheduled, against a real provider, is the right shape. Do not run them per-commit.
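Keeping the SMS suite off the per-commit path usually comes down to one gate that only the scheduled job opens. Here is a sketch for a Python suite, using the standard library's `unittest`. The `RUN_SMS_SMOKE` variable name is an assumption, and the same condition translates directly to other runners.

```python
import os
import unittest
from typing import Mapping


def sms_smoke_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """True only when the scheduled job has opted in explicitly."""
    return env.get("RUN_SMS_SMOKE") == "1"


@unittest.skipUnless(sms_smoke_enabled(), "SMS smoke tests run only on the scheduled job")
class SmsCarrierSmokeTest(unittest.TestCase):
    def test_carrier_delivers_code(self) -> None:
        # Placeholder body: reserve a number, trigger the send,
        # then wait on your provider receive side.
        self.skipTest("wire up your provider receive side here")
```

The scheduled pipeline sets `RUN_SMS_SMOKE=1`. Every other run skips the class and spends nothing on carrier fees.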
How catchotp helps
We do not yet operate an inbound SMS receive side. We do operate a real-DNS, real-DKIM email receive side that handles 80% of the OTP-test problem at sub-second latency, free under 1,000 messages a month. For most teams, the right path is email-OTP-as-proxy in CI and a small dedicated SMS smoke suite on a schedule.
If your verification UX is email-first or supports both channels, start free and replace the SMS-mock-or-burn-money decision with an actual API. If you are SMS-only, the patterns above are the right ones — the cost is real and the receive-side is what eats your time.
Free tier: 5 inboxes, 1,000 messages a month, no credit card. Start free or read the OTP testing guide for framework-specific patterns.
Related reading
- How to Test OTP Flows in 2026 — the email-OTP version of this post, in more depth.
- Email testing in CI without burning real inboxes — provider concurrency, secrets, and parallel test patterns.
- The OTP testing use case walks through Cypress, Playwright, Jest, and pytest fixtures.
The shorter version: SMS testing is real work, and you should only do as much of it as the carriers actually require. Everything else is faster and cheaper as email OTP.