Load Testing Web Apps: Practical Guide (2026)
Load testing answers "how much traffic can we handle before things break." Done right, it tells you capacity, surfaces bottlenecks, and informs infrastructure decisions. Done wrong, it burns money on cloud spend without teaching anything useful. This guide covers what to measure, what tools to use, and how to interpret results.
What load testing actually measures
Four main signals:
- Throughput — requests per second the system sustains
- Latency — p50 / p95 / p99 response times under load
- Error rate — percentage of failed requests
- Resource usage — CPU, memory, DB connections, etc.
The system passes the test if all four stay within acceptable bounds at the target load.
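The four signals above can be computed from raw request samples. A minimal sketch, assuming each sample is an illustrative `{ durationMs, ok }` record (these names are not from any particular tool):

```javascript
// Sketch: summarizing the four signals from raw request samples.
// Each sample: { durationMs, ok }. Field and function names are illustrative.

function percentile(sortedMs, p) {
  // Nearest-rank percentile on a sorted array of durations.
  const idx = Math.min(sortedMs.length - 1, Math.ceil((p / 100) * sortedMs.length) - 1);
  return sortedMs[Math.max(0, idx)];
}

function summarize(samples, windowSec) {
  const sorted = samples.map((s) => s.durationMs).sort((a, b) => a - b);
  const failed = samples.filter((s) => !s.ok).length;
  return {
    throughputRps: samples.length / windowSec, // requests per second sustained
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
    errorRate: failed / samples.length,
  };
}
```

Resource usage (CPU, memory, DB connections) comes from your monitoring stack, not from the load generator; correlate both timelines when reading results.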
Types of load tests
- Smoke — minimal load, verify the pipeline works
- Load — expected peak traffic sustained for a period
- Stress — increase until something breaks, find the breaking point
- Spike — sudden surge (flash sale, marketing push)
- Soak — sustained load over hours or days to find leaks
Each answers different questions. You need all at different points in the release cycle.
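The five test types differ mainly in their ramp profiles. A sketch of what those profiles might look like, expressed k6-style as `{ duration, target }` stages (all numbers are illustrative placeholders, not recommendations):

```javascript
// Sketch: illustrative ramp profiles for each test type, k6-style stages.

const PROFILES = {
  smoke: [{ duration: '1m', target: 5 }],        // minimal load, verify pipeline
  load: [
    { duration: '5m', target: 200 },             // ramp to expected peak
    { duration: '30m', target: 200 },            // hold at peak
    { duration: '5m', target: 0 },               // ramp down
  ],
  stress: [
    { duration: '10m', target: 200 },
    { duration: '10m', target: 400 },            // keep stepping up until it breaks
    { duration: '10m', target: 800 },
  ],
  spike: [
    { duration: '10s', target: 500 },            // near-instant surge
    { duration: '3m', target: 500 },
    { duration: '10s', target: 0 },
  ],
  soak: [
    { duration: '10m', target: 100 },
    { duration: '8h', target: 100 },             // long hold to expose leaks
  ],
};

// Total wall-clock minutes of a profile (supports s/m/h duration suffixes).
function totalMinutes(stages) {
  const factor = { s: 1 / 60, m: 1, h: 60 };
  return stages.reduce(
    (sum, st) => sum + parseFloat(st.duration) * factor[st.duration.slice(-1)],
    0
  );
}
```

The shape, not the absolute numbers, is what distinguishes the types: spike ramps in seconds, soak holds for hours.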
Tools
k6 (recommended default)
Scriptable in JavaScript, cloud option for distributed load, clean reports. Free and open source.
```javascript
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 },
    { duration: '5m', target: 100 },
    { duration: '2m', target: 0 },
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  const res = http.get('https://myapp.com/api/products');
  check(res, { 'status 200': (r) => r.status === 200 });
}
```
JMeter (legacy, still widely used)
GUI-first, XML configs, mature ecosystem, verbose. If you inherit JMeter scripts, keep them. For greenfield projects, prefer k6.
Locust (Python)
Scripting in Python, good for teams already Python-fluent.
Artillery (JS)
Similar to k6, different ergonomics.
Gatling (Scala/JVM)
Enterprise-grade, good for teams in the JVM ecosystem.
Test targets
Define targets up front:
- Peak expected traffic (req/sec, concurrent users)
- Target latency budgets (p95 < 500ms, p99 < 2s)
- Acceptable error rate (< 0.1%)
- Resource ceilings (CPU < 80%, DB connections < 70%)
Any test that does not validate against predefined targets is not a test — it is a data-gathering exercise.
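Validating against predefined targets can be automated. A minimal sketch, where the target keys and metric names are illustrative (your monitoring stack will have its own):

```javascript
// Sketch: checking measured results against predefined targets.
// Keys mirror the example targets above; names are illustrative.

const TARGETS = {
  p95Ms: 500,        // p95 < 500ms
  p99Ms: 2000,       // p99 < 2s
  errorRate: 0.001,  // < 0.1%
  cpuPct: 80,        // CPU < 80%
};

function evaluate(measured, targets) {
  // Returns the list of violated targets; an empty array means the test passed.
  const violations = [];
  for (const [key, limit] of Object.entries(targets)) {
    if (measured[key] > limit) violations.push(key);
  }
  return violations;
}
```

Wiring a check like this into CI turns a load test from a report into a gate: the build fails when any target is violated.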
What to test
Critical endpoints
Login, search, checkout, payment. Anything that, if slow, breaks the user experience.
High-fanout reads
Home page, feed, dashboard. Reads that hit many downstream services.
Write paths
Post, upload, order. Writes that serialize or contend.
Third-party integrations
Payment gateway, email, SMS, push. Load tests should include realistic third-party latency.
Scenario design
Scripts should reflect real user behavior:
- Think time between actions (1-5 seconds)
- Mix of flows (80% browse, 15% cart, 5% checkout)
- Realistic payloads
- Realistic geographic distribution
- Authenticated vs anonymous mix
A test that hammers one endpoint with no think time measures throughput of that endpoint — useful for capacity planning, not representative of user load.
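The flow mix and think time from the list above can be sketched as plain functions. Helper names here are illustrative; in a k6 script you would call the equivalent of `pickFlow` per virtual-user iteration and `sleep` for the think time:

```javascript
// Sketch: picking a user flow from the 80/15/5 mix and sampling think time.

const FLOWS = [
  { name: 'browse', weight: 0.80 },
  { name: 'cart', weight: 0.15 },
  { name: 'checkout', weight: 0.05 },
];

// Map a uniform draw u in [0, 1) onto the weighted mix.
function pickFlow(u, flows = FLOWS) {
  let acc = 0;
  for (const f of flows) {
    acc += f.weight;
    if (u < acc) return f.name;
  }
  return flows[flows.length - 1].name;
}

// Think time: uniform 1-5 seconds, matching the list above.
function thinkTimeSec(u) {
  return 1 + u * 4;
}
```

Passing the random draw `u` in as an argument keeps the functions deterministic and easy to test; in a real script you would supply `Math.random()`.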
Where to run
- Against staging — safest, no production impact, but scale of staging matters
- Against production with a dedicated test tenant — realistic but risky; needs careful isolation
- Against a dedicated perf environment — ideal, expensive
Never load-test production without coordination with SRE and a kill switch.
Common bottlenecks
- Database connection pool exhaustion — symptoms: latency climbs sharply at some threshold, errors jump
- CPU-bound endpoints — symptoms: latency climbs with load, CPU stays high
- Slow downstream calls — symptoms: latency flat until a timeout cliff
- Memory leak — symptoms: latency climbs over soak test, eventual OOM
- Cache stampede — symptoms: upstream service overwhelmed when cache expires
- Single-threaded component (message broker consumer, etc.) — cannot scale by adding replicas
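The first bottleneck, pool exhaustion, explains why latency cliffs are sharp rather than gradual. A toy model, assuming a burst of concurrent requests contending for a fixed pool (all numbers and names are illustrative):

```javascript
// Toy model: P connections, each query takes serviceMs. When R concurrent
// requests arrive at once, requests beyond the pool size queue in batches,
// so latency jumps in steps of serviceMs rather than climbing smoothly.

function modeledLatencyMs(requestIndex, poolSize, serviceMs) {
  // Requests 0..P-1 run immediately; P..2P-1 wait one service time; etc.
  const batch = Math.floor(requestIndex / poolSize);
  return (batch + 1) * serviceMs;
}

function worstCaseLatencyMs(concurrent, poolSize, serviceMs) {
  return modeledLatencyMs(concurrent - 1, poolSize, serviceMs);
}
```

With a pool of 10 and 50ms queries, 10 concurrent requests all finish in 50ms, but 30 concurrent requests push the slowest to 150ms: a step change, which is exactly the cliff shape you see in the latency graph.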
Interpreting results
- Smooth latency up to target load, error rate flat → pass
- Latency cliff at N requests/sec → N is your capacity
- Error rate rising before latency → check for crashes, OOMs, broken downstream
- Latency p99 much higher than p95 → long tail from GC, network, or specific slow queries
- CPU climbing linearly with load → expected; keep increasing load to find which resource (CPU or memory) saturates first, since that sets the ceiling
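Finding the latency cliff can be mechanized. A sketch, assuming stress-test output is a series of illustrative `{ rps, p95Ms }` points from successive load steps:

```javascript
// Sketch: locating the latency cliff in stress-test results. Reports the
// last stable load level before p95 jumps by more than `factor`.

function findCapacity(points, factor = 2) {
  for (let i = 1; i < points.length; i++) {
    if (points[i].p95Ms > points[i - 1].p95Ms * factor) {
      return points[i - 1].rps; // capacity: last step before the cliff
    }
  }
  return points[points.length - 1].rps; // no cliff observed in this run
}
```

A jump factor of 2 is a reasonable starting heuristic; tune it to how noisy your p95 measurements are.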
How SUSA covers this
SUSA tests the client side (mobile and web). Load testing the backend is a separate discipline — use k6 or equivalent. SUSA's network_tester simulates degraded conditions on the client to verify UX, not server capacity.
For end-to-end coverage, pair SUSA (client functional + UX) with k6 (server capacity). Both run in CI on release candidates.
Frequency
- Smoke: every commit
- Load at expected peak: weekly
- Stress: monthly or before major releases
- Spike: before known events (marketing push, seasonal)
- Soak: quarterly
Load testing is a signal that compounds. One test tells you the current state. Ten tests over six months tell you capacity trends — which is what capacity planning actually needs.
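Extracting that trend is a one-function job. A sketch, assuming each stored result is an illustrative `{ week, capacityRps }` record from a periodic stress test:

```javascript
// Sketch: least-squares slope over repeated stress-test results, showing
// how capacity is changing per week. Field names are illustrative.

function capacityTrend(results) {
  const n = results.length;
  const xs = results.map((r) => r.week);
  const ys = results.map((r) => r.capacityRps);
  const mx = xs.reduce((a, b) => a + b, 0) / n;
  const my = ys.reduce((a, b) => a + b, 0) / n;
  let num = 0, den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - mx) * (ys[i] - my);
    den += (xs[i] - mx) ** 2;
  }
  return num / den; // requests/sec gained (or lost) per week
}
```

A negative slope means each release is quietly eating capacity, which is the kind of regression a single test can never show you.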
Test Your App Autonomously
Upload your APK or URL. SUSA explores like 10 real users — finds bugs, accessibility violations, and security issues. No scripts.
Try SUSA Free