Performance

Seedling is designed for high-throughput data generation. This page covers performance characteristics, benchmarks, and optimization strategies.

Benchmarks

Configuration	Rows	Time	Throughput
SQL writer, simple schema	10,000	0.3s	33K rows/s
SQL writer, simple schema	500,000	12s	42K rows/s
SQL writer, complex schema (15 tables)	500,000	18s	28K rows/s
COPY (Postgres)	1,000,000	8s	125K rows/s
Direct DB insert	500,000	22s	23K rows/s

Benchmarks were measured on a laptop with an Apple M1 Pro and 16 GB of RAM.
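The Throughput column is rows divided by wall-clock time, rounded to the nearest thousand. The short Python sketch below recomputes it from the table's Rows and Time columns. The helper names are illustrative only and are not part of Seedling.

```python
# Rows and wall-clock seconds copied from the benchmark table above.
BENCHMARKS = [
    ("SQL writer, simple schema", 10_000, 0.3),
    ("SQL writer, simple schema", 500_000, 12.0),
    ("SQL writer, complex schema (15 tables)", 500_000, 18.0),
    ("COPY (Postgres)", 1_000_000, 8.0),
    ("Direct DB insert", 500_000, 22.0),
]


def throughput(rows: int, seconds: float) -> float:
    """Rows generated per second of wall-clock time."""
    return rows / seconds


def format_k(rate: float) -> str:
    """Round to the nearest thousand and format like the table (e.g. '42K rows/s')."""
    return f"{round(rate / 1000)}K rows/s"


if __name__ == "__main__":
    for name, rows, seconds in BENCHMARKS:
        print(f"{name:<40} {format_k(throughput(rows, seconds))}")
```

For example, 500,000 rows in 12 s is about 41,667 rows/s, which the table reports as 42K rows/s.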

Parallel Generation

When --parallel is enabled, Seedling identifies independent table subgraphs and generates them concurrently:

Tables with no FK dependencies on each other run in parallel workers
Speedup scales with the number of available CPU cores, up to the number of independent subgraphs
Caution: Parallel generation breaks determinism (row ordering varies between runs)

For example, to generate 500,000 rows in parallel with verbose output:

seedling generate --count 500000 --parallel --verbose
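Seedling's internals are not documented on this page, so the following is only a minimal Python sketch of the idea described above, assuming an acyclic foreign-key graph. It groups tables into connected components of the FK graph, runs each component in its own worker, and fills tables parent-first within a component. The schema and every function name here are hypothetical. Results are collected with as_completed, so the order in which components finish can differ between runs, which illustrates how parallelism can break deterministic ordering.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical schema: each table maps to the tables it references via foreign keys.
FOREIGN_KEYS: dict[str, list[str]] = {
    "users": [],
    "orders": ["users"],
    "products": [],
    "order_items": ["orders", "products"],
    "audit_log": [],
    "countries": [],
    "cities": ["countries"],
}


def independent_subgraphs(fks: dict[str, list[str]]) -> list[set[str]]:
    """Connected components of the FK graph, treating each FK edge as undirected."""
    neighbors: dict[str, set[str]] = {table: set() for table in fks}
    for table, parents in fks.items():
        for parent in parents:
            neighbors[table].add(parent)
            neighbors[parent].add(table)
    seen: set[str] = set()
    components: list[set[str]] = []
    for start in fks:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            table = stack.pop()
            if table not in component:
                component.add(table)
                stack.extend(neighbors[table] - component)
        seen |= component
        components.append(component)
    return components


def parent_first_order(component: set[str], fks: dict[str, list[str]]) -> list[str]:
    """Topological order within one component; assumes the FK graph has no cycles."""
    order: list[str] = []

    def visit(table: str) -> None:
        if table in order:
            return
        for parent in fks[table]:
            visit(parent)  # referenced tables must be filled first
        order.append(table)

    for table in sorted(component):
        visit(table)
    return order


def generate_parallel(fks: dict[str, list[str]], max_workers: int = 4) -> list[list[str]]:
    """Run one worker per independent subgraph and collect results as they finish."""
    components = independent_subgraphs(fks)
    results: list[list[str]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Stand-in for row generation: each worker just returns its table order.
        futures = [pool.submit(parent_first_order, c, fks) for c in components]
        for future in as_completed(futures):  # completion order can vary between runs
            results.append(future.result())
    return results


if __name__ == "__main__":
    for order in generate_parallel(FOREIGN_KEYS):
        print(" -> ".join(order))
```

The sketch uses threads for simplicity; CPU-bound generation written in Python would normally use processes (for example ProcessPoolExecutor) to make use of multiple cores.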

Batch Size

The --batch-size flag controls how many rows are generated per batch. Larger batches reduce per-batch overhead, which matters most for large tables.
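Why batch size matters comes down to how many batches a run needs. Here is a minimal Python sketch, assuming each batch carries a roughly fixed cost such as one statement or database round trip. The batched and count_batches helpers are illustrative, not Seedling APIs.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(rows: Iterable[dict], batch_size: int) -> Iterator[list[dict]]:
    """Yield lists of at most batch_size rows; the final batch may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(rows)
    while batch := list(islice(it, batch_size)):
        yield batch  # in a real writer, one INSERT statement or round trip per batch


def count_batches(total_rows: int, batch_size: int) -> int:
    """How many batches (and so how many per-batch overheads) total_rows requires."""
    return -(-total_rows // batch_size)  # ceiling division without floats


if __name__ == "__main__":
    rows = ({"id": i} for i in range(12))
    print([len(b) for b in batched(rows, 5)])  # [5, 5, 2]
    print(count_batches(500_000, 1_000), count_batches(500_000, 5_000))  # 500 100
```

With 500,000 rows, a batch size of 5,000 needs 100 batches instead of the 500 needed at 1,000, so any fixed per-batch cost is paid five times less often. In this sketch, larger batches also hold more rows in memory at once.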

Optimization Tips

Use COPY for Postgres: --copy mode is 3-5x faster than batched INSERTs
Increase batch size for large tables: --batch-size 5000 reduces per-batch overhead
Use deterministic mode with smaller samples for development: --seed 42 --count 1000
Avoid parallelism if you need deterministic output
Truncate before insert with --truncate to avoid unique constraint violations from existing data
Run with --dry-run first to verify the plan before spending time on generation
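Deterministic mode rests on a seeded random generator: the same seed and row count should reproduce the same rows. The following is a minimal Python sketch of that general technique, not Seedling's actual generator; the users table and generate_users function are hypothetical.

```python
import random


def generate_users(seed: int, count: int) -> list[tuple[int, str]]:
    """Generate (id, username) rows; the same seed and count give the same rows."""
    rng = random.Random(seed)  # private generator, unaffected by other random() calls
    names = ["ada", "grace", "linus", "margaret", "ken"]
    return [(i, f"{rng.choice(names)}{rng.randint(0, 999):03d}") for i in range(1, count + 1)]


if __name__ == "__main__":
    first = generate_users(seed=42, count=1000)
    second = generate_users(seed=42, count=1000)
    print(first == second)  # True
    print(first[:3])
```

Reproducibility in this sketch depends on values being drawn from one generator in a fixed sequence. If several workers shared a generator and ran in a different order on each run, the values each table received would change. That is consistent with the advice to avoid parallelism when you need deterministic output.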