Performance

Seedling is designed for high-throughput data generation. This page covers performance characteristics, benchmarks, and optimization strategies.

Benchmarks

Configuration                             Rows        Time   Throughput
SQL writer, simple schema                 10,000      0.3s   33K rows/s
SQL writer, simple schema                 500,000     12s    42K rows/s
SQL writer, complex schema (15 tables)    500,000     18s    28K rows/s
COPY (Postgres)                           1,000,000   8s     125K rows/s
Direct DB insert                          500,000     22s    23K rows/s
Benchmarks were measured on a laptop (Apple M1 Pro, 16 GB RAM).
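
The throughput column is rows divided by elapsed time. The short sketch below recomputes it from the table and extrapolates a run time for a different row count. The helper names (`throughput`, `estimate_seconds`) are illustrative and not part of Seedling, and the extrapolation assumes throughput stays constant as the row count grows, which the table itself shows is only roughly true.

```python
# Figures copied from the benchmark table: (configuration, rows, seconds).
BENCHMARKS = [
    ("SQL writer, simple schema", 10_000, 0.3),
    ("SQL writer, simple schema", 500_000, 12.0),
    ("SQL writer, complex schema (15 tables)", 500_000, 18.0),
    ("COPY (Postgres)", 1_000_000, 8.0),
    ("Direct DB insert", 500_000, 22.0),
]


def throughput(rows: int, seconds: float) -> float:
    """Rows generated per second."""
    return rows / seconds


def estimate_seconds(target_rows: int, rows: int, seconds: float) -> float:
    """Linear extrapolation from one measured run (assumes constant throughput)."""
    return target_rows / throughput(rows, seconds)


for name, rows, seconds in BENCHMARKS:
    print(f"{name}: {throughput(rows, seconds) / 1000:.0f}K rows/s")

# Rough estimate for 2,000,000 rows via COPY, based on the 1,000,000-row run.
print(f"{estimate_seconds(2_000_000, 1_000_000, 8.0):.0f}s")
```

Treat the estimate as a starting point only; schema complexity and hardware will shift the real number.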

Parallel Generation

When --parallel is enabled, Seedling identifies independent table subgraphs and generates them concurrently:
  • Tables with no FK dependencies on each other run in parallel workers
  • Throughput scales with available CPU cores, bounded by the number of independent subgraphs
  • Caution: Parallel generation breaks determinism (row ordering varies between runs)
seedling generate --count 500000 --parallel --verbose
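
To illustrate what "independent table subgraphs" means, here is a minimal sketch that groups tables into connected components of the foreign-key graph and hands each group to a worker. It is not Seedling's implementation: the function names (`independent_subgraphs`, `generate_group`), the sample tables, and the use of a thread pool are all assumptions made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def independent_subgraphs(tables, foreign_keys):
    """Group tables so that no FK links tables in different groups.

    foreign_keys is a list of (child_table, parent_table) pairs.
    """
    neighbors = defaultdict(set)
    for child, parent in foreign_keys:
        neighbors[child].add(parent)
        neighbors[parent].add(child)

    seen, groups = set(), []
    for table in tables:
        if table in seen:
            continue
        # Depth-first walk collects every table reachable through FK edges.
        stack, group = [table], []
        seen.add(table)
        while stack:
            current = stack.pop()
            group.append(current)
            for other in sorted(neighbors[current]):
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(sorted(group))
    return groups


def generate_group(group):
    # Hypothetical stand-in for real row generation.
    return f"generated {', '.join(group)}"


# Hypothetical schema: three linked tables plus two standalone ones.
tables = ["users", "orders", "order_items", "audit_log", "countries"]
fks = [("orders", "users"), ("order_items", "orders")]
groups = independent_subgraphs(tables, fks)

# Each group can run in its own worker because no FK crosses group boundaries.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(generate_group, groups))

print(groups)
```

Within a group, tables would still need to be generated parents-first so that foreign keys resolve. The order in which workers finish is not fixed, which is one way row ordering can vary between parallel runs.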

Batching

The --batch-size flag controls how many rows are generated per batch:
  • Smaller batches use less memory but incur more per-batch overhead
  • Larger batches are faster but use more RAM per table
  • Default: 1000 rows
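
The trade-off can be made concrete with a small sketch. It assumes a generator that holds one batch in memory at a time; `generate_batches` and `batch_count` are hypothetical names, and the row shape is a placeholder rather than anything Seedling produces.

```python
import math
from typing import Iterator


def batch_count(count: int, batch_size: int) -> int:
    """Number of batches needed; each one carries a fixed overhead."""
    return math.ceil(count / batch_size)


def generate_batches(count: int, batch_size: int = 1000) -> Iterator[list[dict]]:
    """Yield rows in batches so only one batch is held in memory at a time."""
    for start in range(0, count, batch_size):
        size = min(batch_size, count - start)  # last batch may be shorter
        yield [{"id": start + i + 1} for i in range(size)]


# 500,000 rows: the default size needs five times as many batches as 5000.
print(batch_count(500_000, 1000))  # 500
print(batch_count(500_000, 5000))  # 100

# Peak memory follows the batch size, not the total row count.
print([len(batch) for batch in generate_batches(2500, batch_size=1000)])  # [1000, 1000, 500]
```

Going from 1000 to 5000 rows per batch cuts the number of batches by a factor of five, at the cost of holding five times as many rows in memory at once.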

Optimization Tips

  1. Use COPY for Postgres: --copy mode is 3-5x faster than batched INSERTs
  2. Increase batch size for large tables: --batch-size 5000 reduces per-batch overhead
  3. Use deterministic mode with smaller samples for development: --seed 42 --count 1000
  4. Avoid --parallel if you need deterministic output
  5. Truncate before insert with --truncate to avoid unique constraint violations from existing data
  6. Dry-run first with --dry-run to verify the plan before committing to a long generation run
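
Tips 3 and 4 both rest on seeded generation. As a rough illustration of why a fixed seed gives repeatable output, the sketch below uses Python's standard `random.Random`; the `generate_rows` function and its row shape are assumptions for the example and say nothing about how Seedling generates values internally.

```python
import random


def generate_rows(count: int, seed: int) -> list[dict]:
    """Build rows from a private, seeded generator (hypothetical row shape)."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    return [{"id": i + 1, "age": rng.randint(18, 90)} for i in range(count)]


first = generate_rows(1000, seed=42)
second = generate_rows(1000, seed=42)

# Same seed and same sequential order give identical rows.
print(first == second)  # True
```

The guarantee depends on values being drawn in the same order each time, which is why concurrent workers that interleave differently between runs can undo it.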