Getting started
This page is the shortest path from zero to a first successful run. For strategy details, row filters, dump seals, and CI patterns, continue with the configuration guide and the repository README.md.
Prerequisites
- Rust stable toolchain (rustup recommended). The repo includes rust-toolchain.toml (stable + rustfmt + clippy) so CI and local cargo stay aligned.
- cargo on your PATH
Optional: run ./scripts/setup-dev.sh once from the repo root. It installs toolchain components, runs cargo fetch, and installs a pinned mdBook under .tools/ so the local docs build matches the one CI uses.
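Before building, it can help to confirm that the tools listed above are actually on PATH. The following is a minimal sketch; check_tools is a hypothetical helper written for this page, not something shipped in the repository.

```shell
# Hypothetical helper: report whether each named tool is on PATH.
# Returns non-zero if at least one tool is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# cargo is required; rustup is only recommended, so it is reported separately.
check_tools cargo || echo "install a Rust stable toolchain before building"
check_tools rustup || echo "rustup not found (recommended, not required)"
```

If cargo is reported as missing, install a Rust stable toolchain first; the Build step below will not work without it.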
Build
cargo build --release
./target/release/dumpling --help
Python / pip (dumpling-cli)
pip install dumpling-cli
dumpling --help
First anonymization
- Generate a draft policy (recommended). From your project root (or anywhere you keep config):

  dumpling scaffold-config -i dump.sql -o .dumplingconf

  This beta subcommand streams the dump once and writes inferred [rules] from SQL column names (CREATE TABLE, INSERT, and PostgreSQL COPY column lists). Heuristics are English-oriented, and the output is a draft only: review and edit every rule, and add a top-level salt (for hashing) and any ${…} secret placeholders before production use.

  Useful flags:

  - --infer-json-paths: Keep up to five sampled rows per table (reservoir) and suggest nested JSON rules as column.path.leaf.
  - --max-json-depth: Cap JSON walking depth when using --infer-json-paths (default 24).
  - --format: postgres (default), sqlite, or mssql.
  - --pg-restore-path / --pg-restore-arg: Optional pg_restore binary and extra arguments when --input is a PostgreSQL custom-format or directory-format archive (auto-detected with --format postgres); see PostgreSQL archives and compressed inputs.

  Run dumpling scaffold-config --help for the full flag list.

- Or start from the example policy. Copy .dumplingconf.example to .dumplingconf (or merge under [tool.dumpling] in pyproject.toml) and author [rules] by hand. Set environment variables for salt and any ${…} references.

- Align rules with your dump (manual path only). If you skipped scaffold-config, use CREATE TABLE, COPY … (…), and INSERT INTO … (…) lines to name [rules."table"] or [rules."schema.table"] keys. Trim to the tables you care about first.

- Run Dumpling. dumpling -i dump.sql -o sanitized.sql (add -c path if the config is not in the default search path). Use dumpling --check -i dump.sql when you only want to know whether anything would change.

- Tighten the policy. Run dumpling lint-policy on your config. When you are ready for stricter gates, add [sensitive_columns] and use --strict-coverage, --report, and --scan-output as described in the configuration guide and the repository README.md.
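For the manual path, the table names can be pulled out of a plain-SQL dump mechanically. The sketch below is one way to do it with standard tools. It assumes each CREATE TABLE, COPY, and INSERT INTO statement starts at the beginning of a line, and it strips double quotes from identifiers, which may or may not match how Dumpling expects quoted names to be written. list_rule_keys is a hypothetical helper, not a Dumpling subcommand, and the sample dump is made up.

```shell
# Hypothetical helper: print candidate [rules."…"] headers for a plain-SQL dump.
# $1: path to the dump file
list_rule_keys() {
  sed -nE \
    -e 's/^CREATE TABLE (IF NOT EXISTS )?([^ (]+).*/\2/p' \
    -e 's/^COPY ([^ (]+).*/\1/p' \
    -e 's/^INSERT INTO ([^ (]+).*/\1/p' \
    "$1" |
    tr -d '"' |                   # assumption: keys are written without identifier quotes
    sort -u |                     # one entry per table
    sed 's/.*/[rules."&"]/'       # wrap each name as a rules table header
}

# Demo on a tiny, made-up dump.
sample=$(mktemp)
cat > "$sample" <<'EOF'
CREATE TABLE public.users (
    id integer,
    email text
);
COPY public.users (id, email) FROM stdin;
1 a@example.com
\.
INSERT INTO orders (id, total) VALUES (1, 9.99);
EOF
list_rule_keys "$sample"
# prints:
# [rules."orders"]
# [rules."public.users"]
rm -f "$sample"
```

The output is only a list of section headers to paste into the config and trim; the rules under each header still have to be written by hand.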
PostgreSQL custom-format archives
If your input is a PostgreSQL custom-format file or directory-format folder (not plain SQL), use --format postgres (default): Dumpling auto-detects the archive and runs pg_restore -f - (needs pg_restore from PostgreSQL client tools). Gzip-wrapped plain SQL is streamed without a temp file; ZIP (or gzip wrapping PGDMP) uses a temp extract that is cleaned up afterward. See PostgreSQL archives and compressed inputs in the configuration guide.
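To see which of these cases a given input falls into before running Dumpling, the leading bytes of the file are usually enough. The sketch below is not Dumpling's detection logic; it only checks well-known signatures (custom-format archives begin with the ASCII bytes PGDMP, directory-format archives contain a toc.dat file, gzip starts with 1f 8b, and ZIP with PK\003\004). sniff_dump and its output labels are hypothetical names chosen for this example.

```shell
# Hypothetical helper: classify a dump by its leading bytes.
# $1: path to a dump file or directory
sniff_dump() {
  if [ -d "$1" ]; then
    # pg_dump directory format keeps its table of contents in toc.dat.
    if [ -f "$1/toc.dat" ]; then echo "pg-directory"; else echo "unknown-directory"; fi
    return 0
  fi
  # First five bytes as lowercase hex with no separators.
  magic=$(head -c 5 "$1" | od -An -tx1 | tr -d ' \n')
  case "$magic" in
    5047444d50) echo "pg-custom" ;;          # "PGDMP"
    1f8b*)      echo "gzip" ;;
    504b0304*)  echo "zip" ;;
    *)          echo "plain-or-unknown" ;;
  esac
}

# Demo on made-up files that contain only the signature bytes.
work=$(mktemp -d)
printf 'PGDMP\001\016' > "$work/archive.dump"
printf '\037\213\010' > "$work/dump.sql.gz"
printf 'CREATE TABLE t (id int);\n' > "$work/dump.sql"
for f in archive.dump dump.sql.gz dump.sql; do
  printf '%s: %s\n' "$f" "$(sniff_dump "$work/$f")"
done
# prints:
# archive.dump: pg-custom
# dump.sql.gz: gzip
# dump.sql: plain-or-unknown
rm -rf "$work"
```

An input reported as pg-custom or pg-directory is the case where pg_restore has to be installed. A gzip result says nothing about what is inside the wrapper, so it could still be either plain SQL or a PGDMP archive.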
Test locally (contributors)
cargo fmt --all -- --check
cargo clippy --all-targets --all-features
cargo test --all-targets --all-features