Test Data Management: Patterns and Pitfalls

Tests need data. Bad test data is a quiet productivity killer — flaky tests, shared state, stale fixtures, dev / staging divergence. This guide covers the patterns that actually work in 2026.

February 15, 2026 · 3 min read · Testing Guides

The problem

Automated tests need data that is isolated, realistic, fresh, and repeatable.

Options: shared fixtures (go stale), per-test create-destroy (slow), seeded databases (expensive to maintain), synthetic generation (realistic enough?). Each has tradeoffs.

Patterns

1. Per-test create-destroy

Every test creates the data it needs and cleans up after.

Pros: isolated, repeatable, no shared state.

Cons: slow (setup overhead per test), potential for incomplete cleanup.

When: unit / integration tests.


import pytest
from uuid import uuid4

@pytest.fixture
def user(api_client):
    # Unique email per test avoids collisions in parallel runs.
    u = api_client.create_user(email=f"test-{uuid4()}@example.com")
    yield u
    api_client.delete_user(u.id)  # teardown runs even when the test fails

2. Shared test database, reset between runs

Database dumped to known state before each test suite.

Pros: fast setup per test; realistic data relations.

Cons: heavyweight restore; one developer's changes affect others.

When: integration tests with complex relational data.
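As a minimal sketch of the reset step, using an in-memory SQLite database as a stand-in for the shared database (table and seed rows here are hypothetical): drop state, reload a known seed.

```python
import sqlite3

# Hypothetical seed state the suite depends on.
SEED_USERS = [(1, "alice@example.com"), (2, "bob@example.com")]

def reset_database(conn: sqlite3.Connection) -> None:
    """Drop and recreate tables, then load the known seed state."""
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", SEED_USERS)
    conn.commit()

conn = sqlite3.connect(":memory:")
reset_database(conn)
conn.execute("DELETE FROM users")  # a suite run mutates the data...
reset_database(conn)               # ...the next run starts from the seed again
count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

A real setup would wrap this in a session-scoped fixture or a CI pre-step; the principle is the same.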

3. Database snapshot / restore

Docker image or DB snapshot as golden. Each test run restores.

Pros: fast on SSD; realistic data; dev-prod parity achievable.

Cons: snapshot maintenance overhead.
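A file-copy sketch of the same idea, using a SQLite file as a hypothetical golden snapshot (a real setup might restore a Docker volume or a database base backup instead):

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
golden = workdir / "golden.db"

# Build the golden snapshot once (normally maintained out-of-band).
with sqlite3.connect(golden) as conn:
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'seed@example.com')")

def restore_snapshot(golden: Path, target: Path) -> Path:
    """Each run gets a fresh, writable copy of the golden state."""
    shutil.copyfile(golden, target)
    return target

db = restore_snapshot(golden, workdir / "run.db")
with sqlite3.connect(db) as conn:
    conn.execute("DELETE FROM users")  # run 1 trashes its copy

db = restore_snapshot(golden, workdir / "run.db")  # run 2 restores the golden
with sqlite3.connect(db) as conn:
    rows = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```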

4. Test tenant / test account

Dedicated tenant in shared environment. Tests operate within tenant boundary.

Pros: realistic environment.

Cons: cross-test pollution within tenant; requires careful tear-down.

When: staging / preprod smoke tests.
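One low-tech way to make tear-down tractable (names here are illustrative, not a real API): namespace every entity with a tenant marker so cleanup can find strays.

```python
TENANT = "ci-tenant-01"  # hypothetical dedicated test tenant

def tenant_email(name: str) -> str:
    """Tag entities so they're unambiguously test-owned."""
    return f"{name}+{TENANT}@example.com"

def purge_tenant(records: list[dict]) -> list[dict]:
    """Tear-down: keep only records that don't belong to the test tenant."""
    return [r for r in records if TENANT not in r.get("email", "")]

records = [
    {"email": tenant_email("checkout-flow")},
    {"email": "real-customer@example.com"},
]
remaining = purge_tenant(records)
```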

5. Synthetic generation (Faker)


from faker import Faker
fake = Faker()
user = {"name": fake.name(), "email": fake.email(), "address": fake.address()}

Pros: unlimited, realistic-looking.

Cons: not semantically rich (randomness does not match real patterns).

6. Data factories (FactoryBoy, Factory Bot)


import factory

class UserFactory(factory.Factory):
    class Meta:
        model = User

    name = factory.Faker("name")
    email = factory.Sequence(lambda n: f"user{n}@example.com")

Pros: declarative, reusable, sensible defaults.

Cons: learning curve.

7. Production data replica (sanitized)

Dump of production, PII scrubbed.

Pros: most realistic; surfaces real-world data patterns.

Cons: expensive, sensitive, hard to update.

When: performance tests, complex integration.

Pitfalls

Shared mutable state

Two tests modify the same "test user". Flaky when run in parallel.

Fix: per-test creation or strict isolation (separate tenants).

Over-mocking

Mocks drift from reality. Production behavior differs from test.

Fix: contract tests and integration tests against real services. Prefer fewer mocks, closer to the real thing.

Leaked state

Test creates data, fails before tear-down. Next test sees orphan.

Fix: cleanup in finally. Scheduled cleanup jobs for stale test data.
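The try/finally shape, sketched generically — the create/destroy callables stand in for whatever API seeds the data:

```python
def with_cleanup(create, destroy, body):
    """Run body(resource); tear down even if body raises."""
    resource = create()
    try:
        return body(resource)
    finally:
        destroy(resource)  # runs on success, failure, or error

created, destroyed = [], []

def failing_test(resource):
    raise AssertionError("test failed mid-way")

try:
    with_cleanup(lambda: created.append("u1") or "u1",
                 lambda r: destroyed.append(r),
                 failing_test)
except AssertionError:
    pass  # the test failed, but teardown still ran
```

The scheduled-cleanup job is the safety net for the cases finally can't reach, such as the test runner being killed outright.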

Magic IDs

Test assumes user.id = 42 because it ran in that order.

Fix: always use references; never hard-code IDs.

Missing dev-prod parity

Dev database has 100 records; prod has 10M. Queries that work in dev time out in prod.

Fix: performance-realistic test data.
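A quick way to catch the 100-rows-vs-10M problem locally: seed real volume and inspect the query plan. SQLite here is a stand-in and the numbers are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER)")
# Seed enough rows that a full table scan would actually hurt.
conn.executemany("INSERT INTO events (user_id) VALUES (?)",
                 ((i % 1000,) for i in range(100_000)))
conn.execute("CREATE INDEX idx_events_user ON events(user_id)")

# Assert the plan uses the index, not a scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user_id = ?", (42,)
).fetchall()
plan_text = " ".join(str(row) for row in plan)
```

At 100 rows this query is fast either way; only at volume does a missing index show up.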

Data for specific test types

Unit

Minimal. Constructed in-line or via factory. In memory.

Integration

API-seeded data. Real database. Scoped to test user.

UI / E2E

Real backend with known account. Data fresh enough to be relevant.

Performance

Volume representative of production. PII-sanitized if from real data.

Security

Deliberately malicious patterns for injection, XSS testing.
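A few classic payloads worth keeping in a shared list. Real suites should pull from curated wordlists; the sanitizer below is just Python's html.escape, used for illustration only.

```python
import html

INJECTION_PAYLOADS = [
    "' OR '1'='1",                 # SQL injection
    '"; DROP TABLE users; --',     # SQL injection
    "<script>alert(1)</script>",   # stored / reflected XSS
    "../../etc/passwd",            # path traversal
]

# Feed each payload through whatever rendering path is under test.
escaped = [html.escape(p) for p in INJECTION_PAYLOADS]
xss_neutralized = all("<script>" not in e for e in escaped)
```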

Automation

Test data API

Dedicated internal API to create/destroy test entities. Used by all test suites.
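An in-process sketch of the surface such an API might expose (class and method names are hypothetical, not any real service's):

```python
import uuid

class TestDataAPI:
    """Single choke point for creating and destroying test entities."""

    def __init__(self):
        self._users = {}

    def create_user(self, email=None):
        uid = str(uuid.uuid4())
        user = {"id": uid, "email": email or f"test-{uid}@example.com"}
        self._users[uid] = user
        return user

    def delete_user(self, uid):
        self._users.pop(uid, None)

    def purge(self):
        """Scheduled cleanup: drop anything a failed run left behind."""
        self._users.clear()

api = TestDataAPI()
u = api.create_user()
api.delete_user(u["id"])
api.create_user(email="orphan@example.com")  # a failed run forgot this one
api.purge()
remaining = len(api._users)
```

Because every suite goes through one API, cleanup policy lives in one place instead of being re-implemented per team.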

Fixture management

Versioned fixture files in repo. Migrations apply to fixtures too.

Anonymization pipeline

Production → stage: scrub names, emails, phone numbers, any PII.
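A sketch of the scrub step. Hashing to stable pseudonyms (rather than random values) keeps relations between tables intact; note this is pseudonymization, which may not satisfy every regulatory bar.

```python
import hashlib

PII_FIELDS = {"name", "email", "phone"}  # extend per schema

def scrub(record: dict) -> dict:
    """Replace PII with stable, deterministic pseudonyms."""
    out = dict(record)
    for field in PII_FIELDS & record.keys():
        digest = hashlib.sha256(str(record[field]).encode()).hexdigest()[:12]
        out[field] = f"{field}-{digest}"
    return out

row = {"id": 7, "email": "jane@real.com", "plan": "pro"}
scrubbed = scrub(row)
```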

How SUSA uses test data

SUSA needs: an app build (APK or URL), plus login credentials for the accounts it explores with.

Credentials should be dedicated test accounts, not shared ones. Production accounts should never be used for automated exploration.


susatest-agent test myapp.apk --username ci-test-01@example.com --password "..."

Lifecycle policies

Give test data an expiry: purge entities older than a set age so orphans from failed runs don't accumulate.

Test data is engineering. Treat it like production infrastructure; it pays back in reliable tests.

Test Your App Autonomously

Upload your APK or URL. SUSA explores like 10 real users — finds bugs, accessibility violations, and security issues. No scripts.

Try SUSA Free