How do Fireproof Sites work under the hood and what precisely is retained?

Checked on January 10, 2026

Executive summary

Fireproof Sites are built on a local-first, content-addressed database that uses prolly-trees and Merkle-CRDTs to make offline-first collaboration fast, verifiable, and cheap to sync; data is stored as content-addressed encrypted blobs and per-ledger event logs that preserve a verifiable history [1] [2] [3]. As for precisely what is retained: each ledger preserves a verifiable event log (a full history of transactions/diffs), and compaction currently keeps that ledger history while discarding unreferenced internal index blocks; the final semantics for purging historical data were stated as a roadmap decision [4].
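
For orientation, here is a minimal sketch of what that looks like from application code. It assumes Fireproof's documented JavaScript entry points (`fireproof()`, `put()`, `get()`); the package name and exact return shapes should be treated as assumptions that may vary between versions.

```ts
// Minimal usage sketch (assumed API surface; verify against the current docs).
import { fireproof } from "@fireproof/core";

async function demo() {
  // Each named ledger is a unit of sync with its own verifiable event log.
  const db = fireproof("todos");

  // Writes land locally first; under the hood they become encrypted,
  // content-addressed blocks appended to the ledger's event log.
  await db.put({ _id: "task-1", title: "Ship the release", done: false });

  // Reads are served from the local content-addressed store, online or offline.
  const doc = await db.get("task-1");
  console.log(doc);
}

demo().catch(console.error);
```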

1. The core data structures: prolly‑trees, CIDs, and Merkle roots

At the lowest level Fireproof stores documents in prolly‑trees — a B‑tree variant that yields deterministic physical layouts and the same Merkle hash root regardless of operation order — enabling cheap, fast synchronization because identical contents map to identical content IDs (CIDs) and Merkle roots [5] [2]. Every prolly‑tree root is referenced by a CID, so retrieving a document is a content‑addressed lookup to the tree root and its blocks rather than an opaque row lookup [2].
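
A toy illustration of that determinism property (this is not Fireproof's prolly-tree code): if the layout is a pure function of the contents, then the same entries produce the same root digest no matter what order they were written in, which is what lets two replicas compare a single CID instead of scanning rows.

```ts
import { createHash } from "node:crypto";

type Entry = { key: string; value: string };

// Stand-in for a prolly-tree root: a digest computed over a canonical
// (sorted) layout of the entries, so insertion order cannot change it.
function rootDigest(entries: Entry[]): string {
  const canonical = [...entries].sort((a, b) => a.key.localeCompare(b.key));
  const hash = createHash("sha256");
  for (const { key, value } of canonical) {
    hash.update(`${key}\u0000${value}\u0000`);
  }
  return hash.digest("hex");
}

const a = rootDigest([{ key: "x", value: "1" }, { key: "y", value: "2" }]);
const b = rootDigest([{ key: "y", value: "2" }, { key: "x", value: "1" }]);
console.log(a === b); // true: same contents, same "root"
```

Real prolly-trees achieve the same property with content-defined chunking, which also keeps most blocks stable when a single entry changes.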

2. Causal event logs (Pail) and ledger semantics: what “history” means

Updates are appended to Pail, Fireproof’s Merkle clock / causal event log that functions like a distributed write‑ahead log; any two actors can cheaply merge clocks, surface conflicting updates, and reconcile them deterministically via CRDT semantics [5] [2]. Each ledger is a unit of synchronization that preserves a verifiable event log — Fireproof writes ledger transactions as lightweight encrypted diffs that link to prior diffs so the complete dataset can be reconstructed from the chain of files [4].
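
A conceptual sketch of that merge step, under the assumption that it behaves like a standard Merkle clock (this is not the Pail implementation): each event names its parents by CID, and merging two actors' clocks reduces to taking the union of their heads and dropping any head that is an ancestor of another; whatever heads survive are the concurrent updates a CRDT layer must reconcile deterministically.

```ts
type Cid = string;
type Event = { cid: Cid; parents: Cid[] };

// Walk parent links to collect every ancestor of a head.
function ancestors(head: Cid, log: Map<Cid, Event>): Set<Cid> {
  const seen = new Set<Cid>();
  const stack = [...(log.get(head)?.parents ?? [])];
  while (stack.length > 0) {
    const cid = stack.pop()!;
    if (seen.has(cid)) continue;
    seen.add(cid);
    stack.push(...(log.get(cid)?.parents ?? []));
  }
  return seen;
}

// Merge two clocks: union the heads, then drop heads already covered by others.
function mergeHeads(headsA: Cid[], headsB: Cid[], log: Map<Cid, Event>): Cid[] {
  const union = [...new Set([...headsA, ...headsB])];
  return union.filter(
    (head) => !union.some((other) => other !== head && ancestors(other, log).has(head))
  );
}
```

If more than one head survives, a deterministic rule (for example, a total order over CIDs) can pick a winner so every replica converges to the same state.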

3. Replication model, forks, and what’s duplicated or deduplicated

Replication is content‑addressed and block‑based: data is stored and replicated as encrypted blobs, so commodity providers like S3 or DynamoDB can host blocks cheaply; when ledgers fork, the initial storage can be shared and only the diffs are stored per fork, minimizing duplication [3] [6] [4]. Because blocks are content‑addressed, identical blocks across forks are deduplicated by CID, keeping sync and storage efficient [5] [3].
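
The deduplication falls out of the addressing scheme, as this illustrative sketch shows (it is not Fireproof's gateway protocol): replicating a fork means shipping only the block CIDs the other side does not already hold, so shared history costs nothing extra.

```ts
type Cid = string;

// Blocks already present on the remote never need to be uploaded again.
function blocksToUpload(localBlocks: Set<Cid>, remoteHas: Set<Cid>): Cid[] {
  return [...localBlocks].filter((cid) => !remoteHas.has(cid));
}

// Example: a fork that shares its entire history with the parent ledger
// and adds a single diff block.
const parent = new Set<Cid>(["cid-a", "cid-b", "cid-c"]);
const fork = new Set<Cid>(["cid-a", "cid-b", "cid-c", "cid-d"]);
console.log(blocksToUpload(fork, parent)); // ["cid-d"]
```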

4. Encryption, verification, and offline‑first guarantees

Fireproof’s local‑first persistence layer stores encrypted, content‑addressed data so local apps can run offline and later verify and sync against remote gateways; cryptographic commit chains and Merkle history enforce provenance and ledger integrity in a git‑like model with lightweight verification [1] [3] [7].
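
The encrypt-then-address pattern can be sketched as follows (illustrative only; Fireproof's real key management and block format are not specified here): the block's identifier is the hash of its ciphertext, so a gateway only ever stores opaque blobs, and a reader can verify integrity before decrypting.

```ts
import { createCipheriv, createDecipheriv, createHash, randomBytes } from "node:crypto";

// Encrypt a payload and derive the block's address from the ciphertext hash.
function putBlock(plaintext: Buffer, key: Buffer) {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  const tag = cipher.getAuthTag();
  const block = Buffer.concat([iv, tag, ciphertext]);
  const cid = createHash("sha256").update(block).digest("hex"); // address = hash of ciphertext
  return { cid, block };
}

// Verify that the block matches its address, then decrypt.
function getBlock(cid: string, block: Buffer, key: Buffer): Buffer {
  const check = createHash("sha256").update(block).digest("hex");
  if (check !== cid) throw new Error("block does not match its CID");
  const iv = block.subarray(0, 12);
  const tag = block.subarray(12, 28);
  const ciphertext = block.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

const key = randomBytes(32);
const { cid, block } = putBlock(Buffer.from("hello fireproof"), key);
console.log(getBlock(cid, block, key).toString()); // "hello fireproof"
```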

5. Compaction and retention — what is kept, what can be dropped

Currently Fireproof's compaction strategy preserves full ledger history while discarding unreferenced internal index blocks and other unreferenced storage to save space; the project notes that semantics for explicit data purging (a system-level purge of old versions) are undecided pre‑1.0, so any limits on, or deletion of, that preserved history remain a policy/implementation choice under development [4]. In practice this means applications can expect verifiable history to be retained by default, but fine‑grained expiration or deletion semantics are not yet standardized [4].
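
A reachability-based sketch of the described behavior (illustrative, not Fireproof's compaction code): keep every block reachable from the ledger heads through their links, which preserves the full verifiable history, and drop whatever is no longer referenced, such as superseded internal index blocks.

```ts
type Cid = string;
type Block = { cid: Cid; links: Cid[] };

// Mark phase: collect every block reachable from the current heads,
// following links back through the ledger's history.
function reachable(store: Map<Cid, Block>, heads: Cid[]): Set<Cid> {
  const keep = new Set<Cid>();
  const stack = [...heads];
  while (stack.length > 0) {
    const cid = stack.pop()!;
    if (keep.has(cid)) continue;
    const block = store.get(cid);
    if (!block) continue;
    keep.add(cid);
    stack.push(...block.links);
  }
  return keep;
}

// Sweep phase: unreferenced blocks (e.g. stale index blocks) are discarded,
// while everything the history still points at is retained.
function compact(store: Map<Cid, Block>, heads: Cid[]): Map<Cid, Block> {
  const keep = reachable(store, heads);
  return new Map([...store].filter(([cid]) => keep.has(cid)));
}
```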

6. Tradeoffs, operational implications, and open questions

The architecture favors verifiability, offline UX, and cheap server infrastructure at the cost of keeping historical ledgers (which raises storage and privacy tradeoffs) and of content‑addressed encryption that shifts complexity to block management and compaction; the docs explicitly recommend scoping ledgers to logical groups to avoid a single, unbounded ledger that grows indefinitely and complicates filtering and retention [4] [1]. What is not settled in the available reporting: exact retention defaults for published Fireproof Cloud, legal/PII deletion workflows, and operational tooling for long‑term archival versus purge; those remain roadmap items or product decisions outside the technical primitives described here [4] [1].
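
As a concrete reading of the ledger-scoping advice, the sketch below opens one ledger per logical group (here, per project); it reuses the assumed `fireproof()` entry point from earlier, and the naming convention is purely illustrative.

```ts
import { fireproof } from "@fireproof/core";

// One ledger per project keeps each event log bounded and lets a project be
// archived, shared, or eventually purged independently of the others.
function projectLedger(projectId: string) {
  return fireproof(`project-${projectId}`);
}

async function seed() {
  const db = projectLedger("acme-website");
  await db.put({ _id: "page:home", title: "Home" });
}

seed().catch(console.error);
```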

Want to dive deeper?
How does Fireproof’s Pail Merkle clock handle long‑running conflicting edits and automated conflict resolution?
What are best practices for ledger segmentation and retention policies when building apps on Fireproof?
How do Fireproof’s compaction and encryption interact with GDPR/right‑to‑be‑forgotten requests?