Stress Testing

Run the repeatable 1K-user readiness drill against CE Pro before launches and scale changes.

CE Pro now ships with a repeatable stress-test harness for the office and platform paths most likely to matter as traffic grows.

The goal is not to win a benchmark. The goal is to answer a much more useful question:

Can this deployment handle a realistic jump toward 1,000 users without falling apart on the hot admin, health, and queue paths?

Telemetry Gate Before Stress Runs

Production telemetry comes first. Stress tests are now a validation tool to run after the team knows which production routes matter, not the first source of truth for Phase 1 work.

Before ranking scale fixes, collect 3-7 days of production traffic and export the Phase 0 baseline:

npm run telemetry:production -- --days=7 --out=reports/production-telemetry.md

Use that report to identify the top slow routes, run EXPLAIN ANALYZE on the top slow SQL statements, and classify each fix as a missing index, N+1 pattern, exact count, bad join, or larger architecture issue.
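
The EXPLAIN step is plain Postgres, so a minimal sketch of it looks like this, assuming direct psql access to the Supabase database; SUPABASE_DB_URL, the table, and the filter are placeholders for whatever the telemetry report actually surfaces:

# Paste one of the report's slow statements in place of the placeholder query.
# A sequential scan over a large filtered table usually lands in the missing-index bucket.
psql "$SUPABASE_DB_URL" -c \
  "EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM jobs WHERE org_id = 'replace-me' ORDER BY created_at DESC LIMIT 25 OFFSET 0;"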

If the report shows SLOs are already met at the current measured load, halt speculative Phases 1-6 and re-evaluate quarterly. If it shows real hot paths, run focused stress tests against those routes after the cheap fixes or queueing changes land.

What The Harness Exercises

The default route mix focuses on:

  • GET /admin
  • GET /admin/jobs
  • GET /admin/invoices
  • GET /api/admin/jobs?limit=25&offset=*
  • GET /api/admin/invoices?limit=25&offset=*
  • GET /api/admin/services
  • GET /api/admin/lead-sources
  • GET /api/admin/tags
  • HEAD /api/health
  • optional GET /api/health with the cron bearer secret for queue visibility

That mix intentionally leans on the exact paths touched in the scale-hardening work (a quick manual spot-check of the paginated reads follows this list):

  • paginated Jobs and Invoices reads
  • short-lived org-scoped cache reads
  • server-first admin page renders
  • health and queue visibility
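
Outside the harness, that spot-check looks roughly like this; the cookie value is a placeholder, and the rotating offsets just mimic the offset=* pattern in the route mix:

# ADMIN_COOKIE holds one line from the admin cookie file described later in this article.
for offset in 0 25 50; do
  curl -s -o /dev/null -w "offset=$offset -> %{http_code} in %{time_total}s\n" \
    -H "Cookie: $ADMIN_COOKIE" \
    "https://app.cleanestimate.pro/api/admin/jobs?limit=25&offset=$offset"
done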

The default test is session-paced, not a single-client flood.

That matters because /api/admin/* now has a baseline per-user rate limit. If you hammer the admin APIs from one cookie, you mostly learn that your own guardrail works.

A realistic office-load test should spread requests across a pool of authenticated admin sessions so the results reflect multi-user behavior.

Default 1K-Ready Profile

The default profile is:

  • warmup: 8 sessions for 60s
  • steady: 20 sessions for 180s
  • peak: 40 sessions for 300s
  • spike: 60 sessions for 120s

Think time between requests defaults to 900-2500ms.

That is a much better model for a 1,000-user business app than firing 1,000 requests at once from one process.
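
As a rough sanity check on what those phases imply, divide concurrent sessions by the average think time; this ignores response time, so real request rates will run a little lower:

# Back-of-the-envelope request rate per phase: sessions / average think time in seconds.
awk 'BEGIN {
  think = (0.9 + 2.5) / 2
  printf "steady ~%.0f req/s\n", 20 / think
  printf "peak   ~%.0f req/s\n", 40 / think
  printf "spike  ~%.0f req/s\n", 60 / think
}'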

Running The Test

Create a local file with one full admin Cookie header value per line, then run:

STRESS_BASE_URL=https://app.cleanestimate.pro \
STRESS_ADMIN_COOKIE_FILE=.secrets/stress-admin-cookies.txt \
STRESS_CRON_SECRET=replace-me \
npm run test:stress
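
Before a long run it is worth confirming that every line in the cookie file still authenticates; the route choice below is arbitrary, and any cheap admin GET from the mix would do:

# Expect a 2xx per cookie; anything else usually means that session expired or lacks admin access.
while IFS= read -r cookie; do
  curl -s -o /dev/null -w "%{http_code}\n" -H "Cookie: $cookie" \
    "$STRESS_BASE_URL/api/admin/services"
done < .secrets/stress-admin-cookies.txt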

To inspect the resolved plan without sending traffic:

npm run test:stress -- --dry-run

Each run writes a JSON report into stress-reports/ unless you pass a custom --output.
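
For example, to route a pre-launch drill's report to a specific file (the path is arbitrary, and the exact flag syntax should match whatever the harness parses):

npm run test:stress -- --output=stress-reports/pre-launch-drill.json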

Pass Criteria

Treat the default drill as healthy when all of these remain true (a scripted check of the report follows the list):

  • error rate stays at or below 2%
  • 429 rate stays at or below 5%
  • overall p95 latency stays at or below 1500ms
  • health-check p95 stays at or below 500ms
  • queue backlog does not grow continuously during the run
  • dead-letter work does not spike during the test window
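
That scripted check can look something like this for the first four thresholds; the field names and file name are hypothetical, so adjust them to whatever the report actually emits:

# jq -e exits non-zero when the combined (hypothetical) threshold expression is false.
jq -e '
  .errorRate        <= 0.02 and
  .rateLimitedRate  <= 0.05 and
  .p95Ms            <= 1500 and
  .healthCheckP95Ms <= 500
' stress-reports/latest-run.json && echo PASS || echo FAIL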

When A Run Fails

If throttling is the main failure:

  • add more admin cookies before assuming the app itself is the bottleneck
  • confirm you are not over-driving the same small set of users into the baseline admin limiter

If Jobs or Invoices are the dominant latency source:

  • review the active list queries
  • compare the org size against the search terms and offsets used in the run
  • confirm the latest indexes and migrations are live in the target environment (see the index check after this list)
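
The index check is a plain pg_indexes query; SUPABASE_DB_URL is a placeholder and the table names are guesses, so substitute whatever the migrations touch:

psql "$SUPABASE_DB_URL" -c \
  "SELECT tablename, indexname FROM pg_indexes
   WHERE schemaname = 'public' AND tablename IN ('jobs', 'invoices')
   ORDER BY tablename, indexname;"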

If health checks degrade:

  • inspect Supabase latency
  • inspect queue backlog and oldest ready age (see the curl sketch after this list)
  • confirm the queue snapshot path is still timing out quickly instead of hanging
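
Both health views are one curl away; the second command assumes the cron secret travels as a standard bearer token, matching the STRESS_CRON_SECRET variable used by the harness:

# Liveness: HEAD /api/health, printing status code and total time.
curl -sI -o /dev/null -w "%{http_code} %{time_total}s\n" "$STRESS_BASE_URL/api/health"

# Queue visibility: authenticated GET /api/health, pretty-printed with jq.
curl -s -H "Authorization: Bearer $STRESS_CRON_SECRET" "$STRESS_BASE_URL/api/health" | jq .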
