Skip to main content

Merkle DAG

Every commit produces a cryptographic fingerprint of the entire branch state, stored in a Merkle DAG (Directed Acyclic Graph).

Merkle Tree Structure

A Merkle tree is a binary tree of hashes:
                        Root = hash(LC + RC)
                       /                    \
              LC = hash(LL + LR)         RC = hash(RL + RR)
              /            \                /            \
    LL = hash(file1)  LR = hash(file2)  RL = hash(file3)  RR = hash(file4)
         |                |                |                |
      file1            file2            file3            file4
  • Leaves are hashes of the actual data files
  • Internal nodes are hashes of their two children
  • Root is a single 32-byte value representing every file in the branch
If any single bit in any file changes, the root hash changes completely.

Leaf Hashing

Each leaf hash incorporates both the file’s relative path and its content:
leaf_hash = BLAKE3(relpath + BLAKE3(file_content))
This means:
  • Two files with identical content but different paths produce different leaf hashes
  • Renaming a file changes the tree root even without content changes
  • Paths are normalized to forward slashes for cross-platform consistency

Tree Construction

The tree is built in a deterministic order:
  1. Walk the branch directory, skipping excluded files (*.sock, *.pid, postmaster.pid)
  2. Sort leaves by relative path (alphabetical)
  3. Pair leaves and hash them together
  4. If an odd leaf remains at any level, promote it to the next level
  5. Continue pairing until one hash remains — the root
An empty branch (no files) produces a well-known root:
empty_tree_root = BLAKE3("graft-empty-tree-v1")

The DAG

Commits form a directed acyclic graph through parent references:
Commit A (root: 0xabc)  ← initial main branch state
  └── Commit B (root: 0xdef, parent: 0xabc)  ← experiment checkout
        └── Commit C (root: 0x123, parent: 0xdef)  ← further change
Each commit records:
  • hash — 7-character BLAKE3 digest of tree_root + parent_hash + timestamp
  • branch — Which branch this commit belongs to
  • parent_hash — Previous commit hash (NULL for first commit)
  • tree_root — Merkle tree root (64 hex chars)
  • verified — Whether the tree has been verified against the stored root
  • message — User-provided commit message
  • created_at — ISO 8601 timestamp

SQLite Schema

CREATE TABLE commits (
    hash        TEXT PRIMARY KEY,
    branch      TEXT NOT NULL,
    parent_hash TEXT REFERENCES commits(hash),
    tree_root   TEXT NOT NULL,
    verified    INTEGER NOT NULL DEFAULT 0,
    message     TEXT NOT NULL DEFAULT '',
    created_at  TEXT NOT NULL
);

CREATE TABLE tree_entries (
    tree_root TEXT NOT NULL,
    relpath   TEXT NOT NULL,
    blob_hash TEXT NOT NULL,
    PRIMARY KEY (tree_root, relpath)
);

CREATE TABLE refs (
    name        TEXT PRIMARY KEY,
    commit_hash TEXT NOT NULL REFERENCES commits(hash),
    updated_at  TEXT NOT NULL
);

Why BLAKE3?

BLAKE3SHA-256
Single-core throughput~1 GB/s~300 MB/s
Multi-core scaling~6 GB/s on 8 cores~300 MB/s (serial)
Output size256 bits (32 bytes)256 bits (32 bytes)
For Graft’s workload (hashing ~130 files per commit on a 2GB database), BLAKE3 completes in ~2-3 seconds. SHA-256 would take 3-4x longer.

File-Level Hashing

Graft hashes files, not blocks. A 2GB Postgres database has ~130 files (not 525K 4KB pages). This means:
  • Small Merkle tree (~6KB in memory)
  • Fast to compute (~2-3 seconds)
  • Actionable errors: base/16384/12547 is corrupted tells you exactly what’s wrong
  • Acceptable trade-off: corruption is detected at file granularity, not block granularity