OMGDB DOCS
// Querying

Aggregation Pipeline

Run a MongoDB-compatible subset of aggregation stages, group accumulators, and a recursive expression engine over an OMGDB collection.


An aggregation pipeline transforms a stream of documents through an ordered list of stages. In OMGDB a pipeline is a JSON array of stage documents — each stage is an object with exactly one $-prefixed key — passed to the CLI as a single JSON-string argument:

omgdb aggregate app.omgdb <collection> '[<stage>, <stage>, ...]'

The pipeline runs against the documents scanned from <collection>, then each stage feeds its output documents into the next. Stages, group accumulators, and expression operators are each dispatched by a single flat match on the operator name in the omgdb-agg crate. The design is deliberately additive — supporting a new operator means adding one match arm — but it is not a runtime registry: there is no registration API, trait objects, or plugin mechanism, and an unrecognized operator returns an “unknown aggregation operator” error.

Expressions are evaluated by a recursive engine. A string beginning with $ (e.g. "$a.b") is a field path resolved against the current document (missing paths yield null); any other string is a literal. An object whose first key starts with $ is a single-operator expression (and must have exactly one key); any other object is a literal sub-document evaluated field by field; arrays are evaluated element-wise.

This is a compatible subset of MongoDB’s aggregation framework, not a complete implementation. See query operators for the filter syntax used inside $match.

Stages

Stages run in array order. Each stage object must have exactly one $-key; otherwise the pipeline is malformed.

StageDescriptionExample
$matchKeeps documents matching a query filter (compiled via the query engine).{"$match":{"age":{"$gte":25}}}
$projectInclusion/exclusion projection plus computed fields. 1/true includes, 0/false excludes, any other value is an expression.{"$project":{"_id":0,"full":1}}
$addFields / $setAdds or overwrites top-level fields from evaluated expressions. $set is an exact alias.{"$addFields":{"sum":{"$add":["$a","$b"]}}}
$groupGroups by an _id expression and computes accumulators per group.{"$group":{"_id":"$dept","total":{"$sum":"$sal"}}}
$sortStable-sorts by one or more keys; each direction must be 1 or -1.{"$sort":{"age":-1}}
$limitKeeps at most the first N documents.{"$limit":1}
$skipDrops the first N documents.{"$skip":1}
$countCollapses the stream to one document with the named field set to the document count.{"$count":"n"}
$unwindEmits one document per element of an array field.{"$unwind":"$tags"}
$replaceRoot / $replaceWithPromotes an embedded document (or an expression that evaluates to an object) to the root.{"$replaceRoot":{"newRoot":"$meta"}}
$lookupLeft-outer join of another collection by field equality. Requires a store.{"$lookup":{"from":"items","localField":"item","foreignField":"_id","as":"itemDocs"}}
$facetRuns several named sub-pipelines over the same input and emits one document of result arrays.{"$facet":{"count":[{"$count":"n"}]}}

Stage notes

  • $project operates on top-level fields. Inclusion and exclusion cannot be mixed (except for _id, which may be excluded with _id:0 in an otherwise-inclusion projection). An inclusion projection keeps _id first unless _id:0.
  • $group requires an _id. Every non-_id field must be a single-accumulator object. Groups are keyed by the canonical JSON of the _id value and stored in an ordered map, so output is ordered by that canonical key rather than by input order.
  • $sort keys support dotted paths. Missing field values sort before present ones. A direction other than 1 or -1 is a malformed error.
  • $unwind accepts a "$field" string or {"path":"$field"}. It operates on a top-level field (the leading $ is stripped). A missing or null array value silently drops the document; a non-array value is passed through as a single document. There are no preserveNullAndEmptyArrays or includeArrayIndex options.
  • $replaceRoot errors if newRoot does not evaluate to an object. $replaceWith: <expr> is sugar for $replaceRoot: { newRoot: <expr> }.
  • $facet collapses the stream to a single document. Sub-pipelines inherit the store, so they may use $lookup.

Note: $lookup joins another collection and therefore needs a backing store. It is available only when the pipeline runs against a collection (as the omgdb aggregate command does); running a pipeline over an in-memory document list without a store makes $lookup a malformed error. Only the from/localField/foreignField/as form is supported — there is no let + sub-pipeline join form. The join is array-aware on both sides: a scalar on either side matches an element of an array on the other.

Limitation: $bucket and $bucketAuto are not implemented. There is no stage arm for them, and they will return an unknown-operator error.

Accumulators

Accumulators appear inside $group, one per output field, applied to the values produced by evaluating the accumulator’s argument expression for each document in the group.

AccumulatorDescriptionExample
$sumSums numeric inputs; all-integer inputs yield an integer (checked, overflow is an error), otherwise a float. Non-numeric values are ignored. Also used as a counter via {"$sum":1}.{"total":{"$sum":"$sal"}}
$avgMean of numeric inputs as a float; ignores non-numeric values; null if there are none.{"avg":{"$avg":"$sal"}}
$minSmallest value by the engine’s comparison ordering; null for empty input.{"lo":{"$min":"$sal"}}
$maxLargest value by the comparison ordering; null for empty input.{"hi":{"$max":"$sal"}}
$firstFirst accumulated value in the group (input order); null if empty.{"f":{"$first":"$sal"}}
$lastLast accumulated value in the group; null if empty.{"l":{"$last":"$sal"}}
$pushCollects all accumulated values into an array.{"all":{"$push":"$sal"}}

Limitation: $addToSet is not implemented. Use $push if duplicates are acceptable.

Expression operators

Expression operators are used inside $project, $addFields/$set, $group accumulator arguments, $replaceRoot, and as $match-free computed values. They take a single argument or an argument array, each element of which is itself an expression evaluated against the current document.

Literal and conditional

OperatorDescriptionExample
$literalReturns its operand unevaluated, so $-prefixed strings and operator objects are treated as literal data.{"$literal":"$notAFieldPath"}
$condTernary: returns the then branch when the condition is truthy, else else. Accepts [if, then, else] or {if, then, else}.{"$cond":[{"$gte":["$age",18]},true,false]}
$ifNullReturns the first argument unless it is null, in which case the second. Exactly 2 arguments.{"$ifNull":["$nickname","anon"]}
$switchEvaluates branches in order, returning the then of the first truthy case; falls back to default.{"$switch":{"branches":[{"case":{"$gte":["$score",90]},"then":"A"}],"default":"F"}}
$typeBSON-style type name of the argument as a string (e.g. an integer reports "long").{"$type":"$score"}

Note: No matching $switch branch and no default is a type error.

Arithmetic

All arithmetic operators require numeric arguments; a non-numeric argument is a type error.

OperatorDescriptionExample
$addAdds arguments; all-integer args stay integer (checked overflow), otherwise float.{"$add":["$a","$b"]}
$subtractFirst minus second; integer when both integral (checked), else float. Exactly 2 arguments.{"$subtract":["$a","$b"]}
$multiplyMultiplies arguments; integer when all integral (checked overflow), else float.{"$multiply":[2,3]}
$divideFirst divided by second; always returns a float. Divide-by-zero is a type error. Exactly 2 arguments.{"$divide":["$a",2]}
$modRemainder of first by second; integer when both integral (checked), else float. Mod-by-zero is a type error.{"$mod":["$a",-1]}

Comparison

Each comparison takes exactly 2 arguments and returns a boolean, comparing under the engine’s ordering.

OperatorDescriptionExample
$eqTrue when the arguments compare equal.{"$eq":["$a",1]}
$neTrue when the arguments are not equal.{"$ne":["$a",1]}
$gtTrue when the first is greater than the second.{"$gt":["$score",90]}
$gteTrue when the first is greater than or equal to the second.{"$gte":["$score",90]}
$ltTrue when the first is less than the second.{"$lt":["$a",10]}
$lteTrue when the first is less than or equal to the second.{"$lte":["$a",10]}

Boolean logic

A value is falsy if it is null, false, integer 0, or float 0.0; everything else (including empty string or array) is truthy.

OperatorDescriptionExample
$andTrue when all arguments are truthy.{"$and":[{"$gte":["$a",1]},{"$lt":["$a",9]}]}
$orTrue when any argument is truthy.{"$or":[{"$eq":["$a",1]},{"$eq":["$a",2]}]}
$notNegates the truthiness of its (first) argument.{"$not":["$flag"]}

Note: $and and $or do not short-circuit — all arguments are evaluated before the result is combined.

Math

OperatorDescriptionExample
$absAbsolute value; integer stays integer (checked overflow), float stays float.{"$abs":-7}
$ceilSmallest integer-valued result not less than the number; an integer is returned unchanged.{"$ceil":2.1}
$floorLargest integer-valued result not greater than the number; an integer is returned unchanged.{"$floor":2.9}
$roundRounds to the nearest integer value (ties away from zero); an integer is returned unchanged.{"$round":2.5}
$truncTruncates toward zero to an integer value; an integer is returned unchanged.{"$trunc":2.9}
$sqrtSquare root as a float.{"$sqrt":16}

Limitation: $round and $trunc take no precision/decimal-places argument (unlike MongoDB). $ceil/$floor/$round/$trunc applied to a float return a float with an integral value (e.g. 3.0), not an integer-typed value.

Strings

OperatorDescriptionExample
$concatConcatenates string arguments; returns null if any argument is null; a non-string, non-null argument is a type error.{"$concat":["$first"," ","$last"]}
$toUpperUppercases a string; null/missing yields an empty string.{"$toUpper":"$name"}
$toLowerLowercases a string; null/missing yields an empty string.{"$toLower":"$name"}
$strLenCPNumber of Unicode code points in a string; a non-string argument is a type error.{"$strLenCP":"héllo"}
$splitSplits a string by a delimiter into an array; an empty delimiter returns the whole string as a single element. Requires [string, string].{"$split":["a,b,c",","]}

Arrays

OperatorDescriptionExample
$sizeLength of an array as an integer; a non-array argument is a type error.{"$size":"$tags"}
$arrayElemAtElement at an integer index; negative indexes count from the end; out-of-range returns null. Requires [array, integer].{"$arrayElemAt":["$tags",-1]}
$inTrue when the first argument equals any element of the second (array) argument. Requires [value, array].{"$in":["b",["a","b"]]}
$isArrayTrue when the argument is an array.{"$isArray":"$tags"}
$concatArraysConcatenates array arguments into one array; a non-array argument is a type error.{"$concatArrays":[[1,2],[3]]}

Limitation: System variables ($$ROOT, $$CURRENT, $$NOW, etc.) are unsupported — a $$-prefixed string raises a malformed error. There are no $map, $filter, $reduce, $mergeObjects, $arrayToObject, $reverseArray, $sortArray, date operators, $regexMatch, $substr/$substrCP, $trim, or type-conversion operators ($toString, $toInt, …).

Worked example

Group by a field with $sum and $avg, then sort the groups. Given a salaries collection of {"dept": ..., "sal": ...} documents:

omgdb aggregate app.omgdb salaries '[
  {"$group":{"_id":"$dept","total":{"$sum":"$sal"},"avg":{"$avg":"$sal"},"n":{"$sum":1}}},
  {"$sort":{"total":-1}}
]'

For input documents:

[
  {"dept":"a","sal":100},
  {"dept":"a","sal":200},
  {"dept":"b","sal":50}
]

The $group stage produces one document per department, then $sort orders them by total descending:

[
  {"_id":"a","total":300,"avg":150.0,"n":2},
  {"_id":"b","total":50,"avg":50.0,"n":1}
]

Note that total is an integer (300) because every summed value was an integer, while avg is a float (150.0). The {"$sum":1} accumulator counts documents per group.

A second example combines computed fields with a projection — adding a concatenated full field, then keeping only it:

omgdb aggregate app.omgdb people '[
  {"$addFields":{"full":{"$concat":["$first"," ","$last"]}}},
  {"$project":{"_id":0,"full":1}}
]'

For {"first":"ada","last":"lovelace"} this yields {"full":"ada lovelace"}.

Error behavior

Pipeline errors are surfaced rather than silently swallowed:

  • A pipeline that is not an array, or a stage that is not a single $-operator object, is a malformed error.
  • An unrecognized stage, accumulator, or expression operator is an unknown operator error.
  • An operand of the wrong type (e.g. $concat on a number, $divide by zero, integer overflow in $add/$subtract/$multiply/$mod/$sum) is a type error. Integer arithmetic uses checked operations, so overflow returns an error instead of panicking.
  • A $match filter that fails to compile surfaces the underlying query error.

Edit this page on GitHub →