Getting started

This page is the shortest path from zero to a first successful run. For strategy details, row filters, dump seals, and CI patterns, continue with the configuration guide and the repository README.md.

Prerequisites

  • Rust stable toolchain (rustup recommended). The repo includes rust-toolchain.toml (stable + rustfmt + clippy) so CI and local cargo stay aligned.
  • cargo on your PATH

Optional: run ./scripts/setup-dev.sh once from the repo root. It installs toolchain components, runs cargo fetch, and installs a pinned mdBook under .tools/ so local docs builds match the one CI uses.

Build

cargo build --release
./target/release/dumpling --help

Python / pip (dumpling-cli)

pip install dumpling-cli
dumpling --help

First anonymization

  1. Generate a draft policy (recommended) — From your project root (or anywhere you keep config):

    dumpling scaffold-config -i dump.sql -o .dumplingconf
    

    This beta subcommand streams the dump once and writes inferred [rules] from SQL column names (CREATE TABLE, INSERT, and PostgreSQL COPY column lists). It does not require an existing Dumpling config in the current directory (optional config is only merged for pg_restore / keep-original defaults). Heuristics are English-oriented; output is draft only—review and edit every rule, add a top-level salt (for hashing) and any ${…} secret placeholders before production use.

    Useful flags:

    • --infer-json-paths — Keep up to five sampled rows per table (reservoir) and suggest nested JSON rules as column.path.leaf.
    • --max-json-depth — Cap JSON walking depth when using --infer-json-paths (default 24).
    • --format: postgres (default), sqlite, or mssql.
    • --pg-restore-path / --pg-restore-arg — Optional pg_restore binary and extra arguments when --input is a PostgreSQL custom-format or directory-format archive (auto-detected with --format postgres); see PostgreSQL archives and compressed inputs.

    Run dumpling scaffold-config --help for the full flag list.

  2. Or start from the example policy — Copy .dumplingconf.example to .dumplingconf (or merge under [tool.dumpling] in pyproject.toml) and author [rules] by hand. Set environment variables for salt and any ${…} references.

  3. Align rules with your dump (manual path only) — If you skipped scaffold-config, use CREATE TABLE, COPY … (…), and INSERT INTO … (…) lines to name [rules."table"] or [rules."schema.table"] keys. Trim to the tables you care about first.

  4. Run Dumpling: dumpling -i dump.sql -o sanitized.sql (add -c path if the config is not in the default search path). Use dumpling --check -i dump.sql when you only want to know whether anything would change.

  5. Tighten the policy — Run dumpling lint-policy on your config. When you are ready for stricter gates, add [sensitive_columns] and use --strict-coverage, --report, and --scan-output as described in the configuration guide and the repository README.md.
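
For the manual path in step 3, it can help to list the tables and columns a dump actually mentions before writing [rules] keys by hand. The following is a minimal sketch, not part of Dumpling: the function name candidate_rule_keys and the regular expressions are illustrative assumptions, and they only handle single-line CREATE TABLE, COPY … (…), and INSERT INTO … (…) headers with simple quoting.

```python
import re

# Optionally schema-qualified identifier, with or without double quotes.
_IDENT = r'((?:"[^"]+"|\w+)(?:\.(?:"[^"]+"|\w+))?)'

# Deliberately simple patterns (an assumption of this sketch): they will
# miss multi-line headers and unusual quoting.
CREATE_RE = re.compile(
    r'^\s*CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?' + _IDENT, re.IGNORECASE
)
COPY_RE = re.compile(r'^\s*COPY\s+' + _IDENT + r'\s*\(([^)]*)\)', re.IGNORECASE)
INSERT_RE = re.compile(
    r'^\s*INSERT\s+INTO\s+' + _IDENT + r'\s*\(([^)]*)\)', re.IGNORECASE
)


def _clean(name):
    """Drop double quotes so the name can be used inside [rules."..."]."""
    return name.replace('"', '')


def candidate_rule_keys(lines):
    """Return {table name: [column names]} in order of first appearance.

    Columns are collected only from COPY / INSERT column lists; a bare
    CREATE TABLE line registers the table with an empty column list.
    """
    tables = {}
    for line in lines:
        match = CREATE_RE.match(line)
        if match:
            tables.setdefault(_clean(match.group(1)), [])
            continue
        match = COPY_RE.match(line) or INSERT_RE.match(line)
        if match:
            cols = tables.setdefault(_clean(match.group(1)), [])
            for col in match.group(2).split(','):
                col = _clean(col.strip())
                if col and col not in cols:
                    cols.append(col)
    return tables


# Hypothetical sample lines standing in for a real dump file.
SAMPLE = [
    'CREATE TABLE public.users (',
    '    id integer,',
    'COPY public.users (id, email, full_name) FROM stdin;',
    'INSERT INTO "orders" (id, user_id) VALUES (1, 1);',
]

for table, cols in candidate_rule_keys(SAMPLE).items():
    print(f'[rules."{table}"]', cols)
```

To use it on a real dump, pass an open text file instead of SAMPLE (a file object iterates line by line, so the dump is not loaded into memory at once). The printed names are only candidates; which columns need a rule, and which strategy each rule uses, is still a decision for the configuration guide.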

PostgreSQL custom-format archives

If your input is a PostgreSQL custom-format file or directory-format folder (not plain SQL), use --format postgres (default): Dumpling auto-detects the archive and runs pg_restore -f - (needs pg_restore from PostgreSQL client tools). Gzip-wrapped plain SQL is streamed without a temp file; ZIP (or gzip wrapping PGDMP) uses a temp extract that is cleaned up afterward. See PostgreSQL archives and compressed inputs in the configuration guide.
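
If you want to check what kind of input you have before running Dumpling, the file's leading bytes are usually enough. The sketch below is an illustration only and is not Dumpling's detection logic: the function names and return labels are hypothetical. It relies on well-known signatures: pg_dump custom-format archives start with PGDMP, gzip streams start with the bytes 1f 8b, ZIP files normally start with PK\x03\x04, and a pg_dump directory-format dump contains a toc.dat file.

```python
import gzip
import os

PGDMP_MAGIC = b"PGDMP"      # pg_dump custom-format archive header
GZIP_MAGIC = b"\x1f\x8b"    # gzip member header
ZIP_MAGIC = b"PK\x03\x04"   # local file header of a non-empty ZIP


def classify_prefix(prefix):
    """Classify the first few bytes of an input (labels are hypothetical)."""
    if prefix.startswith(PGDMP_MAGIC):
        return "pg-custom"
    if prefix.startswith(ZIP_MAGIC):
        return "zip"
    if prefix.startswith(GZIP_MAGIC):
        return "gzip"
    return "plain-or-unknown"


def classify_file(path):
    """Classify a dump path, peeking inside gzip to spot a wrapped archive."""
    if os.path.isdir(path):
        # Directory-format dumps keep their table of contents in toc.dat.
        if os.path.exists(os.path.join(path, "toc.dat")):
            return "pg-directory"
        return "plain-or-unknown"
    with open(path, "rb") as fh:
        kind = classify_prefix(fh.read(8))
    if kind == "gzip":
        # Decompress only the first bytes to see what the gzip wraps.
        with gzip.open(path, "rb") as gz:
            inner = classify_prefix(gz.read(8))
        return "gzip+pg-custom" if inner == "pg-custom" else "gzip+plain"
    return kind
```

A result of pg-custom, pg-directory, or gzip+pg-custom suggests that pg_restore needs to be available on the machine, while gzip+plain and plain-or-unknown inputs should not need it.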

Test locally (contributors)

cargo fmt --all -- --check
cargo clippy --all-targets --all-features
cargo test --all-targets --all-features