OMGDB is a single Cargo workspace (resolver = "2", members = ["crates/*"]) split into eight focused crates. The shape of the codebase mirrors the central design idea: there is one storage spine that owns the durable, human-readable source of truth, and every other capability — querying, aggregation, introspection, vector search, agent-safe mutations — is a layer built on top of it that holds no authoritative state of its own.
This page describes the crate layout, how the crates depend on one another bottom-to-top, the “text is canonical, binary is a rebuildable cache” spine that the whole design rests on, and the engineering practices that keep the workspace honest. For the on-disk details see storage; for the transaction model see transactions.
The eight crates
The workspace is dual-licensed MIT OR Apache-2.0. Each crate has a single, narrow responsibility.
| Crate | Responsibility |
|---|---|
| omgdb-core | The storage spine: value model + canonical (bit-exact, deterministic) codec, the op-log (framed NDJSON + per-record CRC32), replay/fold, the in-memory Store, ACID transactions, secondary equality/multikey/range indexes, validation, compaction, the integrity check (verify), and the single-process advisory store lock. |
| omgdb-query | Filter compilation and matching (MongoDB-style filters, dotted paths, array-contains semantics), the single total-order compare/value ordering, projection, explain, diagnose (why-not selectivity), and did-you-mean operator suggestions. |
| omgdb-agg | The aggregation pipeline over Document values: stage dispatch plus a recursive expression engine, including $lookup, $facet, and $replaceRoot/$replaceWith. |
| omgdb-change | Agent-safe mutations: dry-run plan_update (writes only a legible pending plan, no data mutation), apply_change (executes a plan in one transaction by token), and rollback (restores recorded before-documents). |
| omgdb-introspect | Self-describing introspection: per-field schema inference (types + presence), Markdown describe (a live manual), JSON describe, and deterministic canonical dump. |
| omgdb-md | Native Markdown support: parse frontmatter into typed fields and headings into a section tree, producing a queryable document. |
| omgdb-vector | Local vector search: a pluggable Embedder trait + a deterministic offline HashingEmbedder, flat exact cosine kNN, hybrid search (structured pre-filter + semantic ranking), persisted/synced vectors with provenance + staleness, and token-budgeted cited context packs. |
| omgdb-cli | The omgdb binary (26 subcommands) plus the stdio JSON-RPC MCP server (src/mcp.rs) with capability-scope enforcement. |
Note: There is no omgdb-index, omgdb-api, or omgdb-mcp crate. Indexes live inside omgdb-core, and the MCP server lives inside omgdb-cli (src/mcp.rs). Older design sketches that list those crates as separate members are stale.
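The "flat exact cosine kNN" in the omgdb-vector row means scoring a query against every stored vector and keeping the k best, with no approximate index in between. A minimal sketch, assuming embeddings are plain Vec<f32> values keyed by string ids; cosine and knn are hypothetical names, not omgdb-vector's API, and returning 0.0 for a zero-length vector is this sketch's own choice.

```rust
/// Cosine similarity. Returns 0.0 when either vector has zero norm
/// (a choice made for this sketch, not necessarily OMGDB's).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Exact (brute-force) kNN: score every stored vector, sort by descending
/// score, and break ties by id so the result order is deterministic.
fn knn<'a>(query: &[f32], items: &'a [(&'a str, Vec<f32>)], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = items
        .iter()
        .map(|(id, v)| (*id, cosine(query, v)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(b.0)));
    scored.truncate(k);
    scored
}

fn main() {
    let items = vec![
        ("a", vec![1.0f32, 0.0]),
        ("b", vec![0.0, 1.0]),
        ("c", vec![1.0, 1.0]),
    ];
    for (id, score) in knn(&[1.0, 0.2], &items, 2) {
        println!("{id} {score:.3}");
    }
}
```

Exact search is linear in the number of vectors per query, which fits the in-memory, single-process design; the deterministic tie-break keeps results stable across runs.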
Layering, bottom to top
The crates form a clean acyclic dependency graph. omgdb-core sits at the bottom and depends on no other workspace crate — it is the storage spine and defines the Value/Document model, the codec, and the Store. Everything else builds upward from there:
```
                        omgdb-cli
                  (binary + MCP server)
                            │  depends on all seven library crates
      ┌─────────────┬───────┴───────┬───────────────┐
  omgdb-agg   omgdb-change  omgdb-introspect  omgdb-vector
      │             │               │               │
      └─────────────┴───────┬───────┴───────────────┘
                            ▼
                       omgdb-query          omgdb-md
                            │                   │
                            └─────────┬─────────┘
                                      ▼
                                 omgdb-core
```
The actual [dependencies] confirm the layers:
- omgdb-core: no internal dependencies. It is the foundation.
- omgdb-query and omgdb-md: depend on omgdb-core only.
- omgdb-agg, omgdb-change, omgdb-introspect, and omgdb-vector: each depends on omgdb-core + omgdb-query (they all need to read and match documents through the same total-order semantics).
- omgdb-cli: depends on all seven library crates, ties them together into the omgdb binary, and additionally hosts the MCP server.
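The internal edges listed above are small enough to check mechanically. A minimal sketch, assuming the edges are hard-coded from this list (a real check would read them from each crate's Cargo.toml, for example through cargo metadata); DEPS, Deps, and layer are hypothetical names. A crate's layer is one more than the deepest crate it depends on, and a cycle makes the layering fail.

```rust
use std::collections::HashMap;

type Deps = HashMap<&'static str, &'static [&'static str]>;

/// Internal workspace edges as documented: crate -> workspace crates it depends on.
const DEPS: &[(&str, &[&str])] = &[
    ("omgdb-core", &[]),
    ("omgdb-query", &["omgdb-core"]),
    ("omgdb-md", &["omgdb-core"]),
    ("omgdb-agg", &["omgdb-core", "omgdb-query"]),
    ("omgdb-change", &["omgdb-core", "omgdb-query"]),
    ("omgdb-introspect", &["omgdb-core", "omgdb-query"]),
    ("omgdb-vector", &["omgdb-core", "omgdb-query"]),
    (
        "omgdb-cli",
        &[
            "omgdb-core", "omgdb-query", "omgdb-md", "omgdb-agg",
            "omgdb-change", "omgdb-introspect", "omgdb-vector",
        ],
    ),
];

/// Layer 0 = no internal deps; otherwise 1 + the deepest dependency's layer.
/// Returns None if `name` is reached again on the current path (a cycle).
fn layer(name: &'static str, deps: &Deps, visiting: &mut Vec<&'static str>) -> Option<usize> {
    if visiting.contains(&name) {
        return None;
    }
    visiting.push(name);
    let mut depth = 0;
    for &dep in deps.get(name).copied().unwrap_or(&[]) {
        depth = depth.max(layer(dep, deps, visiting)? + 1);
    }
    visiting.pop();
    Some(depth)
}

fn main() {
    let deps: Deps = DEPS.iter().copied().collect();
    for &(name, _) in DEPS {
        match layer(name, &deps, &mut Vec::new()) {
            Some(depth) => println!("layer {depth}: {name}"),
            None => println!("{name}: dependency cycle"),
        }
    }
}
```

Run against these edges it places omgdb-core at layer 0, omgdb-query and omgdb-md at 1, the four middle crates at 2, and omgdb-cli at 3, which is the bottom-to-top picture in the diagram.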
Because querying, aggregation, change planning, introspection, Markdown parsing, and vector search all sit above core and never reach around it, they cannot smuggle authoritative state outside the op-log. That structural constraint is what makes the storage invariants enforceable rather than aspirational.
The text-canonical spine
The defining idea of OMGDB is that the on-disk source of truth is an append-only, human/LLM-readable NDJSON operation log — <store>/oplog.ndjson, with a per-record CRC32 — that you can cat. Every index, vector record, and cache is a derived, rebuildable artifact that holds zero authoritative bits. This legibility is what lets the database explain (describe), verify (verify), and repair (repair) itself.
A live store is a directory bundle (<store>/oplog.ndjson). The single-file .omgdb form is produced by pack/unpack as a transport/archive format — it is not the live storage engine. The physical log line format is:
```
<canonical-json>\t<crc32-hex>\n
```

The supported log ops are insert, replace, delete, begin, commit, abort, define, and create_index.
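Writing and checking one framed line might look like the following minimal sketch. It assumes the checksum is the common CRC-32 (reflected polynomial 0xEDB88320, as used by zlib) computed over the UTF-8 bytes of the canonical JSON and printed as 8 lowercase hex digits; the source says only "CRC32", so the variant, the checksummed bytes, and the hex casing are all assumptions. encode_line, decode_line, and the sample record shape are hypothetical.

```rust
/// Bitwise CRC-32 (reflected poly 0xEDB88320, init and final XOR 0xFFFFFFFF).
/// Dependency-free but slow; a real implementation would use a table or a crate.
fn crc32(bytes: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &b in bytes {
        crc ^= b as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg(); // all ones if the low bit is set, else zero
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Frame one record as `<canonical-json>\t<crc32-hex>\n`.
fn encode_line(canonical_json: &str) -> String {
    format!("{}\t{:08x}\n", canonical_json, crc32(canonical_json.as_bytes()))
}

/// Split a line back into its JSON payload, rejecting a bad shape or checksum.
fn decode_line(line: &str) -> Result<&str, String> {
    let line = line.strip_suffix('\n').unwrap_or(line);
    let (json, hex) = line
        .rsplit_once('\t')
        .ok_or_else(|| "missing tab separator".to_string())?;
    let stored = u32::from_str_radix(hex, 16).map_err(|e| format!("bad crc field: {e}"))?;
    let actual = crc32(json.as_bytes());
    if stored != actual {
        return Err(format!("crc mismatch: stored {stored:08x}, computed {actual:08x}"));
    }
    Ok(json)
}

fn main() {
    // Hypothetical record shape; OMGDB's actual field names may differ.
    let record = r#"{"op":"insert","coll":"notes","doc":{"_id":1}}"#;
    let line = encode_line(record);
    print!("{line}");
    assert_eq!(decode_line(&line), Ok(record));
    // Corrupt one payload byte: the stored checksum no longer matches.
    let tampered = line.replacen("notes", "nctes", 1);
    assert!(decode_line(&tampered).is_err());
}
```

Because the checksum covers each line independently, a torn or corrupted record can be detected on its own without trusting anything after it.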
The three invariants
The whole architecture is held to three invariants. They are stated briefly here; see storage and transactions for the full treatment.
- I1: Text completeness. oplog.ndjson fully determines the logical state; caches and indexes hold zero authoritative bits.
- I2: Rebuild equivalence. Deleting derived (binary cache) state and rebuilding it from the log yields identical query-visible behavior: the same results, in the same order under the engine’s defined total order. Byte-identical binary rebuild is an explicit non-goal; I2 is behavioral equivalence only.
- I3: Export stability. Canonical export is deterministic: dump -> load -> dump is byte-identical for the canonical representation. This is the headline, property-tested claim.
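To make I1 and I3 concrete: the logical state is nothing more than a fold over the log, and a deterministic dump falls out of iterating that state in a fixed order. A minimal sketch, assuming a hypothetical, simplified Op enum (documents reduced to strings keyed by integer ids, no define or create_index) and assuming writes outside a begin/commit pair apply immediately; replay, apply, dump, load, and the tab-separated dump format are all illustrative, not omgdb-core's API.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Debug)]
enum Op {
    Insert(u64, String),
    Replace(u64, String),
    Delete(u64),
    Begin,
    Commit,
    Abort,
}

/// Fold the log into state. Writes between Begin and Commit are buffered and applied
/// together; an Abort, or a Begin with no Commit (a truncated tail), discards them.
fn replay(log: &[Op]) -> BTreeMap<u64, String> {
    let mut state = BTreeMap::new();
    let mut pending: Option<Vec<Op>> = None;
    for op in log {
        match op {
            Op::Begin => pending = Some(Vec::new()),
            Op::Commit => {
                for buffered in pending.take().unwrap_or_default() {
                    apply(&mut state, &buffered);
                }
            }
            Op::Abort => pending = None,
            write => match pending.as_mut() {
                Some(buf) => buf.push(write.clone()),
                None => apply(&mut state, write),
            },
        }
    }
    state
}

fn apply(state: &mut BTreeMap<u64, String>, op: &Op) {
    match op {
        Op::Insert(id, doc) | Op::Replace(id, doc) => {
            state.insert(*id, doc.clone());
        }
        Op::Delete(id) => {
            state.remove(id);
        }
        Op::Begin | Op::Commit | Op::Abort => {}
    }
}

/// Deterministic dump: BTreeMap iterates in key order, so equal states give equal bytes.
/// Assumes documents contain no tab or newline characters.
fn dump(state: &BTreeMap<u64, String>) -> String {
    state.iter().map(|(id, doc)| format!("{id}\t{doc}\n")).collect()
}

/// Load a dump back as a log of inserts.
fn load(dump: &str) -> Vec<Op> {
    dump.lines()
        .filter_map(|l| {
            let (id, doc) = l.split_once('\t')?;
            Some(Op::Insert(id.parse().ok()?, doc.to_string()))
        })
        .collect()
}

fn main() {
    let log = vec![
        Op::Insert(1, "alpha".into()),
        Op::Begin,
        Op::Replace(1, "alpha v2".into()),
        Op::Insert(2, "beta".into()),
        Op::Commit,
        Op::Begin,
        Op::Delete(2),
        Op::Abort, // rolled back: doc 2 survives
        Op::Begin,
        Op::Insert(3, "gamma".into()), // never committed: ignored on replay
    ];
    let first = dump(&replay(&log));
    let second = dump(&replay(&load(&first)));
    assert_eq!(first, second); // dump -> load -> dump is byte-identical
    print!("{first}");
}
```

Because replay is a pure function of the log, rebuilding from the same log always produces the same state, which is the shape of I2; I3 additionally relies on the dump order being fixed rather than hash- or insertion-dependent.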
Engineering practices
The workspace enforces a consistent set of practices across all crates.
Error handling
Library crates that define their own error types use thiserror (omgdb-core, -query, -agg, -change); omgdb-introspect and omgdb-vector reuse the lower-layer error types and pull in no extra error crate. Only the omgdb-cli binary pulls in anyhow. Library code paths avoid unwrap/expect — fallible operations return typed errors rather than panicking. This keeps the libraries embeddable and lets the binary be the single place that flattens errors for human-facing reporting.
| Layer | Error crate | Panics |
|---|---|---|
| Library crates that define error types (omgdb-core, -query, -agg, -change) | thiserror = "2" | no unwrap/expect in library code paths |
| Binary (omgdb-cli) | anyhow = "1" | top-level error flattening only |
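The library/binary split looks roughly like this. thiserror's #[derive(Error)] generates Display and std::error::Error impls of the kind written out by hand below (spelled out so the sketch needs no external crates), and Box<dyn Error> stands in for anyhow::Error at the binary boundary. CoreError, its variants, and parse_crc_field are hypothetical, not omgdb-core's actual types.

```rust
use std::error::Error;
use std::fmt;

/// A typed library error (hypothetical variants). With thiserror this would be
/// `#[derive(Debug, thiserror::Error)]` plus `#[error("...")]` attributes.
#[derive(Debug, PartialEq)]
pub enum CoreError {
    /// A log line had no tab between payload and checksum.
    MissingChecksum { line: usize },
    /// The checksum field was not valid hexadecimal.
    BadChecksum { line: usize, field: String },
}

impl fmt::Display for CoreError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CoreError::MissingChecksum { line } => write!(f, "line {line}: missing checksum"),
            CoreError::BadChecksum { line, field } => {
                write!(f, "line {line}: checksum {field:?} is not hex")
            }
        }
    }
}

impl Error for CoreError {}

/// Library code returns a typed error instead of calling unwrap/expect.
pub fn parse_crc_field(line_no: usize, line: &str) -> Result<u32, CoreError> {
    let (_, field) = line
        .rsplit_once('\t')
        .ok_or(CoreError::MissingChecksum { line: line_no })?;
    u32::from_str_radix(field.trim_end(), 16).map_err(|_| CoreError::BadChecksum {
        line: line_no,
        field: field.to_string(),
    })
}

/// The binary is the one place that flattens errors for humans, via `?`.
fn main() -> Result<(), Box<dyn Error>> {
    let crc = parse_crc_field(1, "{\"a\":1}\tdeadbeef")?;
    println!("crc = {crc:08x}");
    if let Err(e) = parse_crc_field(2, "{\"a\":1}") {
        eprintln!("error: {e}");
    }
    Ok(())
}
```

Keeping the variants typed lets an embedding application match on, say, a checksum failure, while the binary only needs the Display text.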
Tests and quality gates
The three storage invariants (I1/I2/I3) are backed by property tests rather than examples alone. The full quality-gate set runs on every commit (and is what CI runs, in order):
```sh
cargo fmt --all -- --check
cargo clippy --all-targets --all-features -- -D warnings
cargo test --all
cargo doc --no-deps --all-features
cargo test --doc --all
```
Shared lints are applied to every crate via the workspace (unsafe_code, missing_docs, and rust_2018_idioms set to warn; clippy all = warn). The release profile uses lto = "thin" and codegen-units = 1.
Toolchain and CI
The toolchain is Rust stable, edition 2021, with an MSRV (rust-version) of 1.89. That floor is set by File::try_lock, stabilized in Rust 1.89, which backs the advisory store lock that enforces single-process access. CI runs the latest stable toolchain on Linux, Windows, and macOS, plus a dedicated MSRV check.
Limitation: OMGDB is an early project (v0.0). The durability and codec layers are hardened (crash recovery, the repair tool, a bit-exact canonical codec checked by a property test, a crash-truncation matrix, and the cross-platform + MSRV CI), but the database is not yet production-ready at scale. The entire dataset is held in RAM and rebuilt by replaying the whole log on open, so it does not scale beyond available memory; there is no paged/mmap binary store yet. A store is single-process (it takes an exclusive advisory lock on open), with no multi-reader/multi-writer concurrency model.
Where each capability lives
If you are looking for the code (or docs) behind a feature, the crate boundary is the map:
| You want… | Crate | Docs |
|---|---|---|
| The value model and on-disk format | omgdb-core | data model, storage |
| ACID semantics and the single-writer borrow | omgdb-core | transactions |
| Secondary indexes (equality/range/multikey) | omgdb-core | indexes |
| Filter operators and matching | omgdb-query | query operators |
| Pipelines, stages, and expressions | omgdb-agg | aggregation |
| plan-update / apply / rollback | omgdb-change | agent mutations |
| describe, schema inference, dump | omgdb-introspect | introspection |
| Markdown import | omgdb-md | markdown |
| Vector search and context packs | omgdb-vector | vector search, context packs |
| The omgdb binary and the MCP server | omgdb-cli | cli, mcp |