Load Testing Web Apps: Practical Guide (2026)
Load testing answers "how much traffic can we handle before things break." Done right, it tells you capacity, surfaces bottlenecks, and informs infrastructure decisions. Done wrong, it burns money on cloud spend without teaching anything useful. This guide covers what to measure, what tools to use, and how to interpret results.
What load testing actually measures
Four main signals:
- Throughput — requests per second the system sustains
- Latency — p50 / p95 / p99 response times under load
- Error rate — percentage of failed requests
- Resource usage — CPU, memory, DB connections, etc.
The system passes the test if all four stay within acceptable bounds at the target load.
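The four signals above can be computed from raw request samples. A minimal sketch, assuming each sample is an illustrative `{ durationMs, ok }` record (these names are not from any particular tool):

```javascript
// Sketch: summarizing the four signals from raw request samples.
// Each sample: { durationMs, ok }. Field and function names are illustrative.

function percentile(sortedMs, p) {
  // Nearest-rank percentile on a sorted array of durations.
  const idx = Math.min(sortedMs.length - 1, Math.ceil((p / 100) * sortedMs.length) - 1);
  return sortedMs[Math.max(0, idx)];
}

function summarize(samples, windowSec) {
  const sorted = samples.map((s) => s.durationMs).sort((a, b) => a - b);
  const failed = samples.filter((s) => !s.ok).length;
  return {
    throughputRps: samples.length / windowSec, // requests per second sustained
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
    errorRate: failed / samples.length,
  };
}
```

Resource usage (CPU, memory, DB connections) comes from your monitoring stack, not from the load generator; correlate both timelines when reading results.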
Types of load tests
- Smoke — minimal load, verify the pipeline works
- Load — expected peak traffic sustained for a period
- Stress — increase until something breaks, find the breaking point
- Spike — sudden surge (flash sale, marketing push)
- Soak — sustained load over hours or days to find leaks
Each answers different questions. You need all at different points in the release cycle.
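The five test types differ mainly in their ramp profiles. A sketch of what those profiles might look like, expressed k6-style as `{ duration, target }` stages (all numbers are illustrative placeholders, not recommendations):

```javascript
// Sketch: illustrative ramp profiles for each test type, k6-style stages.

const PROFILES = {
  smoke: [{ duration: '1m', target: 5 }],        // minimal load, verify pipeline
  load: [
    { duration: '5m', target: 200 },             // ramp to expected peak
    { duration: '30m', target: 200 },            // hold at peak
    { duration: '5m', target: 0 },               // ramp down
  ],
  stress: [
    { duration: '10m', target: 200 },
    { duration: '10m', target: 400 },            // keep stepping up until it breaks
    { duration: '10m', target: 800 },
  ],
  spike: [
    { duration: '10s', target: 500 },            // near-instant surge
    { duration: '3m', target: 500 },
    { duration: '10s', target: 0 },
  ],
  soak: [
    { duration: '10m', target: 100 },
    { duration: '8h', target: 100 },             // long hold to expose leaks
  ],
};

// Total wall-clock minutes of a profile (supports s/m/h duration suffixes).
function totalMinutes(stages) {
  const factor = { s: 1 / 60, m: 1, h: 60 };
  return stages.reduce(
    (sum, st) => sum + parseFloat(st.duration) * factor[st.duration.slice(-1)],
    0
  );
}
```

The shape, not the absolute numbers, is what distinguishes the types: spike ramps in seconds, soak holds for hours.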
Tools
k6 (recommended default)
Scriptable in JavaScript, cloud option for distributed load, clean reports. Free and open source.
```javascript
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 },
    { duration: '5m', target: 100 },
    { duration: '2m', target: 0 },
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  const res = http.get('https://myapp.com/api/products');
  check(res, { 'status 200': (r) => r.status === 200 });
}
```
JMeter (legacy, still widely used)
GUI-first, XML configs, mature ecosystem, verbose. If you inherit JMeter scripts, keep them. For greenfield projects, prefer k6.
Locust (Python)
Scripting in Python, good for teams already Python-fluent.
Artillery (JS)
Similar to k6, different ergonomics.
Gatling (Scala/JVM)
Enterprise-grade, good for teams in the JVM ecosystem.
Test targets
Define targets up front:
- Peak expected traffic (req/sec, concurrent users)
- Target latency budgets (p95 < 500ms, p99 < 2s)
- Acceptable error rate (< 0.1%)
- Resource ceilings (CPU < 80%, DB connections < 70%)
Any test that does not validate against predefined targets is not a test — it is a data-gathering exercise.
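Validating against predefined targets can be automated. A minimal sketch, where the target keys and metric names are illustrative (your monitoring stack will have its own):

```javascript
// Sketch: checking measured results against predefined targets.
// Keys mirror the example targets above; names are illustrative.

const TARGETS = {
  p95Ms: 500,        // p95 < 500ms
  p99Ms: 2000,       // p99 < 2s
  errorRate: 0.001,  // < 0.1%
  cpuPct: 80,        // CPU < 80%
};

function evaluate(measured, targets) {
  // Returns the list of violated targets; an empty array means the test passed.
  const violations = [];
  for (const [key, limit] of Object.entries(targets)) {
    if (measured[key] > limit) violations.push(key);
  }
  return violations;
}
```

Wiring a check like this into CI turns a load test from a report into a gate: the build fails when any target is violated.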
What to test
Critical endpoints
Login, search, checkout, payment. Anything that, if slow, breaks the user experience.
High-fanout reads
Home page, feed, dashboard. Reads that hit many downstream services.
Write paths
Post, upload, order. Writes that serialize or contend.
Third-party integrations
Payment gateway, email, SMS, push. Load tests should include realistic third-party latency.
Scenario design
Scripts should reflect real user behavior:
- Think time between actions (1-5 seconds)
- Mix of flows (80% browse, 15% cart, 5% checkout)
- Realistic payloads
- Realistic geographic distribution
- Authenticated vs anonymous mix
A test that hammers one endpoint with no think time measures throughput of that endpoint — useful for capacity planning, not representative of user load.
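The flow mix and think time from the list above can be sketched as plain functions. Helper names here are illustrative; in a k6 script you would call the equivalent of `pickFlow` per virtual-user iteration and `sleep` for the think time:

```javascript
// Sketch: picking a user flow from the 80/15/5 mix and sampling think time.

const FLOWS = [
  { name: 'browse', weight: 0.80 },
  { name: 'cart', weight: 0.15 },
  { name: 'checkout', weight: 0.05 },
];

// Map a uniform draw u in [0, 1) onto the weighted mix.
function pickFlow(u, flows = FLOWS) {
  let acc = 0;
  for (const f of flows) {
    acc += f.weight;
    if (u < acc) return f.name;
  }
  return flows[flows.length - 1].name;
}

// Think time: uniform 1-5 seconds, matching the list above.
function thinkTimeSec(u) {
  return 1 + u * 4;
}
```

Passing the random draw `u` in as an argument keeps the functions deterministic and easy to test; in a real script you would supply `Math.random()`.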
Where to run
- Against staging — safest, no production impact, but scale of staging matters
- Against production with a dedicated test tenant — realistic but risky; needs careful isolation
- Against a dedicated perf environment — ideal, expensive
Never load-test production without coordination with SRE and a kill switch.
Common bottlenecks
- Database connection pool exhaustion — symptoms: latency climbs sharply at some threshold, errors jump
- CPU-bound endpoints — symptoms: latency climbs with load, CPU stays high
- Slow downstream calls — symptoms: latency flat until a timeout cliff
- Memory leak — symptoms: latency climbs over soak test, eventual OOM
- Cache stampede — symptoms: upstream service overwhelmed when cache expires
- Single-threaded component (message broker consumer, etc.) — cannot scale by adding replicas
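The first bottleneck, pool exhaustion, explains why latency cliffs are sharp rather than gradual. A toy model, assuming a burst of concurrent requests contending for a fixed pool (all numbers and names are illustrative):

```javascript
// Toy model: P connections, each query takes serviceMs. When R concurrent
// requests arrive at once, requests beyond the pool size queue in batches,
// so latency jumps in steps of serviceMs rather than climbing smoothly.

function modeledLatencyMs(requestIndex, poolSize, serviceMs) {
  // Requests 0..P-1 run immediately; P..2P-1 wait one service time; etc.
  const batch = Math.floor(requestIndex / poolSize);
  return (batch + 1) * serviceMs;
}

function worstCaseLatencyMs(concurrent, poolSize, serviceMs) {
  return modeledLatencyMs(concurrent - 1, poolSize, serviceMs);
}
```

With a pool of 10 and 50ms queries, 10 concurrent requests all finish in 50ms, but 30 concurrent requests push the slowest to 150ms: a step change, which is exactly the cliff shape you see in the latency graph.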
Interpreting results
- Smooth latency up to target load, error rate flat → pass
- Latency cliff at N requests/sec → N is your capacity
- Error rate rising before latency → check for crashes, OOMs, broken downstream
- Latency p99 much higher than p95 → long tail from GC, network, or specific slow queries
- CPU climbing linearly with load → expected; keep increasing load to find which resource (CPU or memory) saturates first, since that sets the ceiling
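Finding the latency cliff can be mechanized. A sketch, assuming stress-test output is a series of illustrative `{ rps, p95Ms }` points from successive load steps:

```javascript
// Sketch: locating the latency cliff in stress-test results. Reports the
// last stable load level before p95 jumps by more than `factor`.

function findCapacity(points, factor = 2) {
  for (let i = 1; i < points.length; i++) {
    if (points[i].p95Ms > points[i - 1].p95Ms * factor) {
      return points[i - 1].rps; // capacity: last step before the cliff
    }
  }
  return points[points.length - 1].rps; // no cliff observed in this run
}
```

A jump factor of 2 is a reasonable starting heuristic; tune it to how noisy your p95 measurements are.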
How SUSA covers this
SUSA tests the client side (mobile and web). Load testing the backend is a separate discipline — use k6 or equivalent. SUSA's network_tester simulates degraded conditions on the client to verify UX, not server capacity.
For end-to-end coverage, pair SUSA (client functional + UX) with k6 (server capacity). Both run in CI on release candidates.
Frequency
- Smoke: every commit
- Load at expected peak: weekly
- Stress: monthly or before major releases
- Spike: before known events (marketing push, seasonal)
- Soak: quarterly
Load testing is a signal that compounds. One test tells you the current state. Ten tests over six months tell you capacity trends — which is what capacity planning actually needs.
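Extracting that trend is a one-function job. A sketch, assuming each stored result is an illustrative `{ week, capacityRps }` record from a periodic stress test:

```javascript
// Sketch: least-squares slope over repeated stress-test results, showing
// how capacity is changing per week. Field names are illustrative.

function capacityTrend(results) {
  const n = results.length;
  const xs = results.map((r) => r.week);
  const ys = results.map((r) => r.capacityRps);
  const mx = xs.reduce((a, b) => a + b, 0) / n;
  const my = ys.reduce((a, b) => a + b, 0) / n;
  let num = 0, den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - mx) * (ys[i] - my);
    den += (xs[i] - mx) ** 2;
  }
  return num / den; // requests/sec gained (or lost) per week
}
```

A negative slope means each release is quietly eating capacity, which is the kind of regression a single test can never show you.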
Test Your App Autonomously
Upload your APK or URL. SUSA explores like 10 real users — finds bugs, accessibility violations, and security issues. No scripts.
Try SUSA Free