Test Data Management: Patterns and Pitfalls
Tests need data. Bad test data is a quiet productivity killer — flaky tests, shared state, stale fixtures, dev / staging divergence. This guide covers the patterns that actually work in 2026.
The problem
Automated tests need:
- User accounts to test as
- Data to operate on
- Known state to assert against
The common options each have trade-offs: shared fixtures go stale, per-test create-destroy is slow, seeded databases are expensive to maintain, and synthetic data may not be realistic.
Patterns
1. Per-test create-destroy
Every test creates the data it needs and cleans up after.
Pros: isolated, repeatable, no shared state.
Cons: slow (setup overhead per test), potential for incomplete cleanup.
When: unit / integration tests.
from uuid import uuid4

import pytest

@pytest.fixture
def user(api_client):
    u = api_client.create_user(email=f"test-{uuid4()}@example.com")
    yield u
    api_client.delete_user(u.id)
2. Shared test database, reset between runs
Database restored to a known state before each test suite run.
Pros: fast setup per test; realistic data relations.
Cons: heavyweight restore; one developer's changes affect others.
When: integration tests with complex relational data.
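A minimal sketch of the reset step, using an in-memory SQLite database and a hypothetical `users` table as the seed state (a real setup would restore from a curated dump):

```python
import sqlite3

SEED_USERS = [("alice", "alice@example.com"), ("bob", "bob@example.com")]

def reset_database(conn):
    """Drop and recreate tables, then load the known seed state."""
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", SEED_USERS)
    conn.commit()

conn = sqlite3.connect(":memory:")
reset_database(conn)
conn.execute("DELETE FROM users")   # a test suite mutates the data...
reset_database(conn)                # ...and the next run starts clean
count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 2
```

In practice the reset runs once per suite, not per test, which is what makes per-test setup cheap.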
3. Database snapshot / restore
Docker image or DB snapshot as golden. Each test run restores.
Pros: fast on SSD; realistic data; dev-prod parity achievable.
Cons: snapshot maintenance overhead.
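The restore step can be sketched with a file-based SQLite database standing in for a real snapshot; the table, data, and paths are illustrative:

```python
import os
import shutil
import sqlite3
import tempfile

def make_golden_snapshot(path):
    """Build the 'golden' database once; in practice this is a curated dump."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
    conn.execute("INSERT INTO orders VALUES (1, 9.99)")
    conn.commit()
    conn.close()

def restore_from_snapshot(golden, working):
    """Restoring is a single file copy, so it is cheap to do per run."""
    shutil.copyfile(golden, working)

workdir = tempfile.mkdtemp()
golden = os.path.join(workdir, "golden.db")
working = os.path.join(workdir, "test.db")
make_golden_snapshot(golden)

restore_from_snapshot(golden, working)
conn = sqlite3.connect(working)
conn.execute("DELETE FROM orders")      # a test run dirties the database
conn.commit()
conn.close()

restore_from_snapshot(golden, working)  # next run starts from the golden state
conn = sqlite3.connect(working)
remaining = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(remaining)  # 1
```

The same shape applies to Docker volumes or cloud DB snapshots; only the restore command changes.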
4. Test tenant / test account
Dedicated tenant in shared environment. Tests operate within tenant boundary.
Pros: realistic environment.
Cons: cross-test pollution within tenant; requires careful tear-down.
When: staging / preprod smoke tests.
5. Synthetic generation (Faker)
from faker import Faker
fake = Faker()
user = {"name": fake.name(), "email": fake.email(), "address": fake.address()}
Pros: unlimited, realistic-looking.
Cons: not semantically rich; random values do not reflect real-world distributions or relationships.
6. Data factories (FactoryBoy, Factory Bot)
import factory

class UserFactory(factory.Factory):
    class Meta:
        model = User

    name = factory.Faker("name")
    email = factory.Sequence(lambda n: f"user{n}@example.com")
Pros: declarative, reusable, sensible defaults.
Cons: learning curve.
7. Production data replica (sanitized)
Dump of production, PII scrubbed.
Pros: most realistic; surfaces real-world data patterns.
Cons: expensive, sensitive, hard to update.
When: performance tests, complex integration.
Pitfalls
Shared mutable state
Two tests modify the same "test user". Flaky when run in parallel.
Fix: per-test creation or strict isolation (separate tenants).
Over-mocking
Mocks drift from reality. Production behavior differs from test.
Fix: contract tests. Integration tests against real services. Prefer fewer mocks and test doubles that stay closer to the real thing.
Leaked state
Test creates data, fails before tear-down. Next test sees orphan.
Fix: cleanup in finally. Scheduled cleanup jobs for stale test data.
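A sketch of the finally pattern; FakeApi and its methods are invented stand-ins for a real test-data API:

```python
import uuid

class FakeApi:
    """In-memory stand-in for a real test-data API (hypothetical)."""
    def __init__(self):
        self.users = {}

    def create_user(self):
        uid = str(uuid.uuid4())
        self.users[uid] = {"id": uid}
        return uid

    def delete_user(self, uid):
        self.users.pop(uid, None)

api = FakeApi()
uid = api.create_user()
try:
    raise RuntimeError("assertion failed mid-test")  # simulated test failure
except RuntimeError:
    pass  # the test runner would record the failure here
finally:
    api.delete_user(uid)  # runs even when the test body raises

leftover = len(api.users)
print(leftover)  # 0
```

The scheduled cleanup job is still worth having: finally blocks do not run if the test process is killed outright.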
Magic IDs
Test assumes user.id = 42 because it ran in that order.
Fix: always use references; never hard-code IDs.
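A sketch of the fix, with an invented in-memory store: the test holds the reference returned by creation instead of guessing the ID:

```python
import itertools

_next_id = itertools.count(100)   # the backend assigns IDs; tests must not guess them
_db = {}

def create_user(name):
    uid = next(_next_id)
    _db[uid] = {"id": uid, "name": name}
    return _db[uid]

def fetch_user(uid):
    return _db[uid]

# Brittle: fetch_user(100) assumes this user happened to be created first.
# Robust: use the ID from the object you just created.
user = create_user("ci-test")
fetched = fetch_user(user["id"])
print(fetched["name"])  # ci-test
```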
No dev-prod parity
Dev database has 100 records; prod has 10M. Queries that work in dev time out in prod.
Fix: performance-realistic test data.
Data for specific test types
Unit
Minimal. Constructed in-line or via factory. In memory.
Integration
API-seeded data. Real database. Scoped to test user.
UI / E2E
Real backend with known account. Data fresh enough to be relevant.
Performance
Volume representative of production. PII-sanitized if from real data.
Security
Deliberately malicious patterns for injection, XSS testing.
Automation
Test data API
Dedicated internal API to create/destroy test entities. Used by all test suites.
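A sketch of what such a client might look like; TestDataClient, InMemoryBackend, and their methods are hypothetical, not a real API. The key idea is that the client tracks everything it creates so teardown is guaranteed and order-aware:

```python
class InMemoryBackend:
    """Stand-in for the real service behind the test data API (hypothetical)."""
    def __init__(self):
        self.objects = {}
        self._n = 0

    def create_user(self, email):
        self._n += 1
        self.objects[("user", self._n)] = {"id": self._n, "email": email}
        return self.objects[("user", self._n)]

    def delete(self, kind, oid):
        self.objects.pop((kind, oid), None)

class TestDataClient:
    """Thin wrapper every test suite uses for creating and destroying entities."""
    def __init__(self, backend):
        self.backend = backend
        self.created = []  # track everything created, for guaranteed teardown

    def create_user(self, email):
        user = self.backend.create_user(email)
        self.created.append(("user", user["id"]))
        return user

    def teardown(self):
        # Delete in reverse creation order so dependents go before dependencies.
        for kind, oid in reversed(self.created):
            self.backend.delete(kind, oid)
        self.created.clear()

client = TestDataClient(InMemoryBackend())
client.create_user("suite-a@example.com")
client.create_user("suite-b@example.com")
client.teardown()
remaining = len(client.backend.objects)
print(remaining)  # 0
```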
Fixture management
Versioned fixture files in repo. Migrations apply to fixtures too.
Anonymization pipeline
Production → stage: scrub names, emails, phone numbers, any PII.
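A minimal scrubbing sketch; the field list and hashing scheme are illustrative. Hashing (rather than random replacement) keeps scrubbed values deterministic, so the same email scrubs to the same token across tables and referential joins survive:

```python
import hashlib

PII_FIELDS = {"name", "email", "phone"}  # fields to scrub (illustrative list)

def scrub(record):
    """Replace PII with stable fake values; keep non-PII fields intact."""
    out = dict(record)
    for field in PII_FIELDS & out.keys():
        digest = hashlib.sha256(str(out[field]).encode()).hexdigest()[:8]
        if field == "email":
            out[field] = f"email-{digest}@example.invalid"
        else:
            out[field] = f"{field}-{digest}"
    return out

prod_row = {"id": 7, "name": "Jane Doe", "email": "jane@real.com", "plan": "pro"}
staged = scrub(prod_row)
print(staged["plan"], staged["id"])       # non-PII survives: pro 7
print("jane" in staged["email"].lower())  # False: the real address is gone
```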
How SUSA uses test data
SUSA needs:
- Test user credentials (passed via CLI)
- Test APK / URL
- Optional: test assets (images, files) to push to device
Credentials should be dedicated test accounts, not shared. Production accounts should never be used for automated exploration.
susatest-agent test myapp.apk --username ci-test-01@example.com --password "..."
Lifecycle policies
- Test accounts created on-demand, deleted after use
- Nightly job cleans up test accounts older than 7 days
- Retention of failed-test state for debugging (opt-in)
- No PII in any test environment
Test data is engineering. Treat it like production infrastructure; it pays back in reliable tests.
Test Your App Autonomously
Upload your APK or URL. SUSA explores like 10 real users — finds bugs, accessibility violations, and security issues. No scripts.
Try SUSA Free