Compute batching (#849)

### Compute Batch Submission - Consolidate duplicated action submission logic in `httpcomputeservice` into a single `HandleSubmitAction` supporting both single-action and batch (actions array) payloads - Group actions by queue in `RemoteHttpRunner` and submit as batches with configurable chunk size, falling back to individual submission on failure - Extract shared helpers: `MakeErrorResult`, `ValidateQueueForEnqueue`, `ActivateActionInQueue`, `RemoveActionFromActiveMaps` ### Retracted Action State - Add `Retracted` state to `RunnerAction` for retry-free rescheduling — an explicit request to pull an action back and reschedule it on a different runner without incrementing `RetryCount` - Implement idempotent `RetractAction()` on `RunnerAction` and `ComputeServiceSession` - Add `POST jobs/{lsn}/retract` and `queues/{queueref}/jobs/{lsn}/retract` HTTP endpoints - Add state machine documentation and per-state comments to `RunnerAction` ### Compute Race Fixes - Fix race in `HandleActionUpdates` where actions enqueued between session abandon and scheduler tick were never abandoned, causing `GetActionResult` to return 202 indefinitely - Fix queue `ActiveCount` race where `NotifyQueueActionComplete` was called after releasing `m_ResultsLock`, allowing callers to observe stale counters immediately after `GetActionResult` returned OK ### Logging Optimization and ANSI improvements - Improve `AnsiColorStdoutSink` write efficiency — single write call, dirty-flag flush, `RwLock` instead of `std::mutex` - Move ANSI color emission from sink into formatters via `Formatter::SetColorEnabled()`; remove `ColorRangeStart`/`End` from `LogMessage` - Extract color helpers (`AnsiColorForLevel`, `StripAnsiSgrSequences`) into `helpers.h` - Strip upstream ANSI SGR escapes in non-color output mode. This enables colour in log messages without polluting log files with ANSI control sequences - Move `RotatingFileSink`, `JsonFormatter`, and `FullFormatter` from header-only to pimpl with `.cpp` files ### CLI / Exec Refactoring - Extract `ExecSessionRunner` class from ~920-line `ExecUsingSession` into focused methods and a `ExecSessionConfig` struct - Replace monolithic `ExecCommand` with subcommand-based architecture (`http`, `inproc`, `beacon`, `dump`, `buildlog`) - Allow parent options to appear after subcommand name by parsing subcommand args permissively and forwarding unmatched tokens to the parent parser ### Testing Improvements - Fix `--test-suite` filter being ignored due to accumulation with default wildcard filter - Add test suite banners to test listener output - Made `function.session.abandon_pending` test more robust ### Startup / Reliability Fixes - Fix silent exit when a second zenserver instance detects a port conflict — use `ZEN_CONSOLE_*` for log calls that precede `InitializeLogging()` - Fix two potential SIGSEGV paths during early startup: guard `sentry_options_new()` returning nullptr, and throw on `ZenServerState::Register()` returning nullptr instead of dereferencing - Fail on unrecognized zenserver `--mode` instead of silently defaulting to store ### Other - Show host details (hostname, platform, CPU count, memory) when discovering new compute workers - Move frontend `html.zip` from source tree into build directory - Add format specifications for Compact Binary and Compressed Buffer wire formats - Add `WriteCompactBinaryObject` to zencore - Extended `ConsoleTui` with additional functionality - Add `--vscode` option to `xmake sln` for clangd / `compile_commands.json` support - Disable compute/horde/nomad in release builds (not yet production-ready) - Disable unintended `ASIO_HAS_IO_URING` enablement - Fix crashpad patch missing leading whitespace - Clean up code triggering gcc false positives
author: Stefan Boberg <[email protected]> 2026-03-18 11:19:10 +0100
committer: GitHub Enterprise <[email protected]> 2026-03-18 11:19:10 +0100
commit: eba410c4168e23d7908827eb34b7cf0c58a5dc48 (patch)
tree: 3cda8e8f3f81941d3bb5b84a8155350c5bb2068c /docs
parent: bugfix release - v5.7.23 (#851) (diff)
download: zen-eba410c4168e23d7908827eb34b7cf0c58a5dc48.tar.xz
zen-eba410c4168e23d7908827eb34b7cf0c58a5dc48.zip
2 files changed, 848 insertions, 0 deletions
diff --git a/docs/specs/CompactBinary.md b/docs/specs/CompactBinary.md
new file mode 100644
index 000000000..d8cccbd1e
--- /dev/null
+++ b/docs/specs/CompactBinary.md
@@ -0,0 +1,663 @@
+# Compact Binary Format Specification
+
+**Version:** 1.0
+
+## Overview
+
+Compact Binary (CB) is a binary serialization format designed for efficient storage and
+transmission of structured data. It is self-describing, supports nested objects and arrays,
+and optimizes for minimal size through variable-length encoding and uniform container
+optimizations.
+
+Key design goals:
+
+- **Compact representation** — variable-length integers, type elision in uniform containers
+- **Self-describing** — every field carries its own type; no external schema required
+- **Safe to traverse** — a reader can skip any field without understanding its type
+- **Deterministic** — a canonical encoding exists so that byte-identical payloads compare equal
+- **Hashable** — every field and object can be content-addressed with a stable hash
+
+## 1. Notation
+
+| Symbol        | Meaning |
+|---------------|---------|
+| `byte`        | An unsigned 8-bit integer (octet). |
+| `VarUInt(v)`  | A variable-length unsigned integer encoding of value `v` (see §2). |
+| `BE32(v)`     | A 32-bit value stored in big-endian (network) byte order. |
+| `BE64(v)`     | A 64-bit value stored in big-endian (network) byte order. |
+| `+`           | Concatenation of byte sequences. |
+
+All multi-byte numeric values in this specification are stored in **big-endian** byte order
+unless stated otherwise.
+
+---
+
+## 2. Variable-Length Unsigned Integer (VarUInt)
+
+VarUInt encodes a 64-bit unsigned integer in 1–9 bytes. The number of leading 1-bits in the
+first byte indicates how many *additional* bytes follow. The remaining bits of the first byte,
+concatenated with the additional bytes in big-endian order, form the integer value.
+
+### 2.1 Encoding table
+
+| Leading 1-bits | Total bytes | First byte pattern | Value range |
+|:-:|:-:|---|---|
+| 0 | 1 | `0b0_______` | `0x00` – `0x7F` |
+| 1 | 2 | `0b10______` | `0x80` – `0x3FFF` |
+| 2 | 3 | `0b110_____` | `0x4000` – `0x1F_FFFF` |
+| 3 | 4 | `0b1110____` | `0x20_0000` – `0x0FFF_FFFF` |
+| 4 | 5 | `0b11110___` | `0x1000_0000` – `0x07_FFFF_FFFF` |
+| 5 | 6 | `0b111110__` | `0x08_0000_0000` – `0x03FF_FFFF_FFFF` |
+| 6 | 7 | `0b1111110_` | `0x0400_0000_0000` – `0x01_FFFF_FFFF_FFFF` |
+| 7 | 8 | `0b11111110` | `0x02_0000_0000_0000` – `0xFF_FFFF_FFFF_FFFF` |
+| 8 | 9 | `0b11111111` | `0x0100_0000_0000_0000` – `0xFFFF_FFFF_FFFF_FFFF` |
+
+### 2.2 Measuring the byte count from encoded data
+
+Count the number of leading 1-bits in the first byte (equivalently, count leading zeros of the
+bitwise complement). The byte count is that number plus one:
+
+```
+ByteCount = CountLeadingOnes(FirstByte) + 1
+```
+
+### 2.3 Reading
+
+1. Determine `ByteCount` from the first byte.
+2. Mask the first byte: `Value = FirstByte & (0xFF >> ByteCount)`.
+3. For each subsequent byte (in order): `Value = (Value << 8) | NextByte`.
+
+### 2.4 Writing
+
+1. Determine `ByteCount` from the value magnitude (see table above).
+2. Store `ByteCount - 1` trailing bytes from the value in big-endian order.
+3. Set the first byte to the remaining most-significant bits of the value, OR'd with a prefix
+   mask of `0xFF << (9 - ByteCount)`.
+
+### 2.5 Canonical form
+
+A VarUInt is canonical when it uses the minimum number of bytes required for its value. Format
+validation (§9) rejects non-canonical VarUInt encodings.
+
+### 2.6 Byte-order preservation
+
+Encoded VarUInt values sort identically in a byte-wise (lexicographic) comparison as when their
+decoded unsigned values are compared numerically. This property does **not** hold for signed
+integers encoded via ZigZag.
+
+### 2.7 Encoding examples
+
+| Value | Encoded bytes |
+|-------|---------------|
+| `0x01` | `01` |
+| `0x7F` | `7F` |
+| `0x80` | `80 80` |
+| `0x123` | `81 23` |
+| `0x1234` | `92 34` |
+| `0x12345` | `C1 23 45` |
+| `0x123456` | `D2 34 56` |
+| `0x1234567` | `E1 23 45 67` |
+| `0x12345678` | `F0 12 34 56 78` |
+| `0x123456789ABCDEF0` | `FF 12 34 56 78 9A BC DE F0` |
+
+---
+
+## 3. Field Type
+
+Every field has a type, stored as a single byte. The low 6 bits identify the type; the upper
+2 bits are flags.
+
+### 3.1 Type byte layout
+
+```
+  Bit 7       Bit 6       Bits 5..0
+┌───────────┬───────────┬──────────────────┐
+│ HasFieldName │ HasFieldType │  Type ID (0x00–0x3F) │
+└───────────┴───────────┴──────────────────┘
+```
+
+### 3.2 Flags
+
+| Flag | Value | Meaning |
+|------|-------|---------|
+| `HasFieldType` | `0x40` | **Transient.** The type byte is stored inline before the field payload. Set on fields in non-uniform containers. This flag is **not** persisted when hashing or serializing the type for comparison purposes. |
+| `HasFieldName` | `0x80` | **Persisted.** The field has a name (used inside objects). |
+
+### 3.3 Type identifiers
+
+> **Stability notice:** Type values are fixed for backward compatibility and must never change.
+
+| ID | Name | Payload |
+|----|------|---------|
+| `0x00` | None | *(invalid — must not appear in valid data)* |
+| `0x01` | Null | Empty (0 bytes) |
+| `0x02` | Object | `VarUInt(Size)` + fields (see §5) |
+| `0x03` | UniformObject | `VarUInt(Size)` + `FieldType(1)` + fields (see §5) |
+| `0x04` | Array | `VarUInt(Size)` + `VarUInt(Count)` + fields (see §6) |
+| `0x05` | UniformArray | `VarUInt(Size)` + `VarUInt(Count)` + `FieldType(1)` + fields (see §6) |
+| `0x06` | Binary | `VarUInt(Size)` + raw bytes |
+| `0x07` | String | `VarUInt(Size)` + UTF-8 bytes (no null terminator) |
+| `0x08` | IntegerPositive | `VarUInt(Value)` — non-negative integer (0 to 2^64−1) |
+| `0x09` | IntegerNegative | `VarUInt(~Value)` — negative integer (−1 to −2^63) |
+| `0x0A` | Float32 | `BE32(IEEE 754 binary32)` — 4 bytes |
+| `0x0B` | Float64 | `BE64(IEEE 754 binary64)` — 8 bytes |
+| `0x0C` | BoolFalse | Empty (0 bytes) |
+| `0x0D` | BoolTrue | Empty (0 bytes) |
+| `0x0E` | ObjectAttachment | 20 raw bytes — hash of a compact binary attachment |
+| `0x0F` | BinaryAttachment | 20 raw bytes — hash of a binary attachment |
+| `0x10` | Hash | 20 raw bytes — hash digest |
+| `0x11` | Uuid | 16 bytes — UUID (see §4.10) |
+| `0x12` | DateTime | `BE64(int64 ticks)` — 8 bytes (see §4.11) |
+| `0x13` | TimeSpan | `BE64(int64 ticks)` — 8 bytes (see §4.12) |
+| `0x14` | ObjectId | 12 raw bytes — opaque identifier |
+| `0x1E` | CustomById | `VarUInt(Size)` + `VarUInt(TypeId)` + payload (see §4.14) |
+| `0x1F` | CustomByName | `VarUInt(Size)` + `VarUInt(NameLen)` + name + payload (see §4.14) |
+| `0x20` | *(Reserved)* | Reserved for future flags. Do not define types in this range. |
+
+### 3.4 Type family classification
+
+Several types form families that can be recognized by bitmask tests on the type ID (low 6 bits):
+
+| Family | Mask | Base | Members |
+|--------|------|------|---------|
+| Object | `0x3E` | `0x02` | Object, UniformObject |
+| Array | `0x3E` | `0x04` | Array, UniformArray |
+| Integer | `0x3E` | `0x08` | IntegerPositive, IntegerNegative |
+| Float | `0x3C` | `0x08` | Float32, Float64, IntegerPositive, IntegerNegative |
+| Bool | `0x3E` | `0x0C` | BoolFalse, BoolTrue |
+| Attachment | `0x3E` | `0x0E` | ObjectAttachment, BinaryAttachment |
+
+A type belongs to a family when `(TypeID & Mask) == Base`.
+
+Note that the Float family intentionally includes integer types because integers can be
+implicitly converted to floating-point when reading.
+
+---
+
+## 4. Field Types in Detail
+
+### 4.1 Null (`0x01`)
+
+Represents an absent or null value. Payload is empty.
+
+### 4.2 Binary (`0x06`)
+
+An arbitrary byte sequence.
+
+```
+VarUInt(ByteCount) + Bytes[ByteCount]
+```
+
+### 4.3 String (`0x07`)
+
+A UTF-8 encoded text string, **not** null-terminated. `Size` is the byte length, not the
+character count.
+
+```
+VarUInt(ByteCount) + UTF8Bytes[ByteCount]
+```
+
+Canonical form requires valid UTF-8 (validated in Format mode, §9).
+
+### 4.4 IntegerPositive (`0x08`)
+
+A non-negative integer in the range [0, 2^64−1].
+
+```
+VarUInt(Value)
+```
+
+### 4.5 IntegerNegative (`0x09`)
+
+A negative integer in the range [−2^63, −1]. The payload is the ones' complement of the
+value encoded as a VarUInt:
+
+```
+VarUInt(~Value)
+```
+
+Where `~` is bitwise NOT. For example, −1 is encoded as `VarUInt(0)`, −42 is encoded as
+`VarUInt(41)`.
+
+To decode: read the VarUInt magnitude `M`, then `Value = ~M` (equivalently, `Value = M ^ -1`,
+or `Value = -(M + 1)`).
+
+> **Important:** This is ones' complement encoding, **not** ZigZag encoding. The VarInt
+> functions in the codebase (which use ZigZag) are a separate encoding used elsewhere; Compact
+> Binary integer fields use the type-tag approach with ones' complement for negatives.
+
+### 4.6 Float32 (`0x0A`)
+
+A 32-bit IEEE 754 binary32 floating-point value in big-endian byte order.
+
+```
+BE32(float32_bits)   — 4 bytes
+```
+
+### 4.7 Float64 (`0x0B`)
+
+A 64-bit IEEE 754 binary64 floating-point value in big-endian byte order.
+
+```
+BE64(float64_bits)   — 8 bytes
+```
+
+**Canonical form:** A Float64 value that can be represented exactly as Float32 (i.e., where
+`(double)(float)value == value`) should be encoded as Float32 instead. Format validation
+(§9) flags this as `InvalidFloat`.
+
+### 4.8 Bool (`0x0C` / `0x0D`)
+
+Boolean values are encoded purely by their type — there is no payload.
+
+- `BoolFalse` (`0x0C`): payload is empty.
+- `BoolTrue` (`0x0D`): payload is empty.
+
+### 4.9 Hash (`0x10`), ObjectAttachment (`0x0E`), BinaryAttachment (`0x0F`)
+
+All three are 20 raw bytes representing a hash digest. There is no length prefix — the size is
+fixed.
+
+```
+Bytes[20]
+```
+
+- **Hash** — a general-purpose hash digest.
+- **ObjectAttachment** — a hash referencing an external Compact Binary object.
+- **BinaryAttachment** — a hash referencing external raw binary data.
+
+The hash algorithm is determined by the application context (the format itself does not
+prescribe a specific hash algorithm, though the reference implementation uses BLAKE3 truncated
+to 160 bits).
+
+### 4.10 Uuid (`0x11`)
+
+A 128-bit UUID/GUID, stored as four 32-bit unsigned integers in big-endian byte order.
+
+```
+BE32(A) + BE32(B) + BE32(C) + BE32(D)   — 16 bytes total
+```
+
+The four components (A, B, C, D) correspond to the four 32-bit segments of the UUID when
+read as a sequence of big-endian 32-bit words. For an RFC 4122 UUID string
+`"aabbccdd-eeff-0011-2233-445566778899"`:
+
+- `A = 0xAABBCCDD`
+- `B = 0xEEFF0011`
+- `C = 0x22334455`
+- `D = 0x66778899`
+
+### 4.11 DateTime (`0x12`)
+
+A date and time value encoded as a big-endian signed 64-bit integer counting 100-nanosecond
+ticks since the epoch **0001-01-01 00:00:00.0000000**.
+
+```
+BE64(int64 Ticks)   — 8 bytes
+```
+
+Valid range: 0001-01-01 00:00:00.0000000 through 9999-12-31 23:59:59.9999999.
+
+Reference tick constants:
+
+| Unit | Ticks |
+|------|-------|
+| Microsecond | 10 |
+| Millisecond | 10,000 |
+| Second | 10,000,000 |
+| Minute | 600,000,000 |
+| Hour | 36,000,000,000 |
+| Day | 864,000,000,000 |
+
+### 4.12 TimeSpan (`0x13`)
+
+A duration encoded as a big-endian signed 64-bit integer counting 100-nanosecond ticks. May be
+negative.
+
+```
+BE64(int64 Ticks)   — 8 bytes
+```
+
+Uses the same tick unit as DateTime (§4.11).
+
+### 4.13 ObjectId (`0x14`)
+
+A 12-byte opaque identifier. There is no length prefix — the size is fixed.
+
+```
+Bytes[12]
+```
+
+### 4.14 Custom types
+
+Custom types allow extending the format with application-specific types.
+
+**CustomById (`0x1E`):**
+
+```
+VarUInt(TotalSize) + VarUInt(TypeId) + Payload[TotalSize - sizeof(VarUInt(TypeId))]
+```
+
+`TotalSize` is the combined byte count of the encoded TypeId VarUInt and the Payload.
+
+**CustomByName (`0x1F`):**
+
+```
+VarUInt(TotalSize) + VarUInt(NameByteCount) + Name[NameByteCount] + Payload[remainder]
+```
+
+`TotalSize` is the combined byte count of the encoded name-length VarUInt, the name bytes, and
+the payload. The name is UTF-8 encoded, not null-terminated.
+
+---
+
+## 5. Objects
+
+An object is an unordered collection of uniquely named fields. There are two encoding forms:
+
+### 5.1 Non-uniform Object (`0x02`)
+
+Used when fields have different types (or when the object is empty).
+
+```
+VarUInt(PayloadSize) + Field₁ + Field₂ + … + Fieldₙ
+```
+
+`PayloadSize` is the total byte count of all encoded fields (not including the `PayloadSize`
+VarUInt itself or the container's own type byte).
+
+Each field is encoded as:
+
+```
+TypeByte + VarUInt(NameByteCount) + Name[NameByteCount] + FieldPayload
+```
+
+The `TypeByte` includes both `HasFieldType` (`0x40`) and `HasFieldName` (`0x80`) flags OR'd
+with the type ID — i.e., the stored type byte is `TypeID | 0xC0`.
+
+### 5.2 Uniform Object (`0x03`)
+
+Used when every field has the same type. The shared type is stored once in the header and
+omitted from individual fields.
+
+```
+VarUInt(PayloadSize) + FieldType(1 byte) + Field₁ + Field₂ + … + Fieldₙ
+```
+
+`PayloadSize` includes the 1-byte field type and all field bytes.
+
+Each field is encoded as:
+
+```
+VarUInt(NameByteCount) + Name[NameByteCount] + FieldPayload
+```
+
+The individual fields do **not** include a type byte. They do retain the `HasFieldName` flag
+behavior (names are present), but the type is provided by the container header.
+
+### 5.3 Empty Object
+
+An empty non-uniform object is 2 bytes: type byte + `VarUInt(0)`.
+
+```
+0x02 0x00
+```
+
+(A uniform object cannot be empty because there is no type to store.)
+
+### 5.4 Object field constraints
+
+- Field names must be non-empty.
+- Field names must be unique within the object (case-sensitive comparison).
+- Field names are UTF-8 encoded, not null-terminated.
+- Field ordering is not prescribed by the format but is significant for equality comparison —
+  two objects with the same fields in a different order are byte-wise different.
+
+---
+
+## 6. Arrays
+
+An array is an ordered collection of unnamed fields. There are two encoding forms:
+
+### 6.1 Non-uniform Array (`0x04`)
+
+Used when items have different types.
+
+```
+VarUInt(PayloadSize) + VarUInt(ItemCount) + Field₁ + Field₂ + … + Fieldₙ
+```
+
+`PayloadSize` is the total byte count of `VarUInt(ItemCount)` plus all encoded fields.
+
+Each field is encoded as:
+
+```
+TypeByte + FieldPayload
+```
+
+The `TypeByte` includes the `HasFieldType` flag (`0x40`) OR'd with the type ID. Fields in
+arrays do **not** have names.
+
+### 6.2 Uniform Array (`0x05`)
+
+Used when every item has the same type **and** every item has a non-zero-byte encoding.
+
+```
+VarUInt(PayloadSize) + VarUInt(ItemCount) + FieldType(1 byte) + Field₁ + … + Fieldₙ
+```
+
+`PayloadSize` includes `VarUInt(ItemCount)`, the 1-byte field type, and all field bytes.
+
+Each field is encoded as just its payload — no type byte, no name.
+
+### 6.3 Empty Array
+
+An empty non-uniform array:
+
+```
+0x04 0x01 0x00
+```
+
+That is: type `0x04` + `VarUInt(1)` (payload size = 1 byte for the count) + `VarUInt(0)`
+(item count = 0).
+
+### 6.4 Uniform array constraints
+
+A uniform array **must not** be used when items have zero-byte payloads (e.g., all Null or all
+Bool fields). Because such items encode as zero bytes each, they would be indistinguishable,
+and the container would have no way to address individual items. Use a non-uniform array in
+these cases.
+
+---
+
+## 7. Top-Level Fields
+
+A Compact Binary payload at the top level is typically a single field. This field may or may
+not include its type byte, depending on context:
+
+- **With type:** The field starts with its type byte (with `HasFieldType` flag set). This is
+  the self-describing form used when the consumer does not know the type in advance.
+- **Without type:** The type is communicated out of band (e.g., the consumer knows to expect an
+  Object). The field begins directly with its payload.
+
+A top-level object field (the most common case) is encoded as:
+
+```
+TypeByte(0x02) + ObjectPayload
+```
+
+or without the type byte, just:
+
+```
+ObjectPayload
+```
+
+---
+
+## 8. Packages
+
+A package bundles a Compact Binary object with its external attachments (referenced via
+ObjectAttachment and BinaryAttachment fields). It is serialized as a sequence of unnamed
+top-level fields:
+
+### 8.1 Package structure
+
+```
+[Attachment₁] [Attachment₂] … [Object] [ObjectHash] [Attachment₃] … [Null]
+```
+
+- **Object** — An `Object` field containing the root compact binary data.
+- **ObjectHash** — An `ObjectAttachment` field (`0x0E`) containing the 20-byte hash of the
+  serialized root object. Omitted when the object is empty.
+- **Attachments** — Each attachment is a pair of fields:
+  1. A `Binary` field containing the attachment data.
+  2. A `BinaryAttachment` or `ObjectAttachment` field containing the hash of that data.
+     The hash field is omitted when the binary data is empty.
+- **Null** — A `Null` field (`0x01`) terminates the package.
+
+### 8.2 Ordering
+
+The canonical order is:
+
+1. Root object + its hash
+2. Attachments ordered by hash
+3. Null terminator
+
+However, it is valid for components to appear in any order as long as:
+- There is at most one root object.
+- The Null terminator is last.
+
+### 8.3 Package constraints
+
+- At most one root object per package.
+- No duplicate attachments (by hash).
+- No null/empty attachments.
+- Attachment hashes must match their data.
+
+---
+
+## 9. Validation
+
+Implementations should support the following validation modes:
+
+| Mode | Checks |
+|------|--------|
+| **Default** | All fields are within bounds and have recognized types. Minimum required for safe reading. |
+| **Names** | Object fields have unique, non-empty names. Array fields have no names. |
+| **Format** | Canonical encoding: minimal VarUInt sizes, Float64→Float32 demotion, uniform containers used when possible, valid UTF-8 in names and strings. |
+| **Padding** | No trailing bytes after the top-level field. |
+| **Package** | Package/attachment structure is well-formed. |
+| **PackageHash** | Stored hashes match computed hashes. |
+
+---
+
+## 10. Hashing
+
+The canonical hash of a field is computed over:
+
+1. The **serialized type byte** (type ID | `HasFieldName` flag; the `HasFieldType` flag is
+   stripped).
+2. The **name** (if present): `VarUInt(NameByteCount) + NameBytes`.
+3. The **payload**.
+
+This allows deterministic content-addressing of any field, object, or array.
+
+---
+
+## 11. Complete Encoding Examples
+
+### 11.1 Simple object
+
+An object with fields `"name": "Alice"` (String) and `"age": 30` (IntegerPositive):
+
+```
+02                     -- Object type
+  17                   -- VarUInt PayloadSize = 23 bytes
+    C7                 -- String | HasFieldType | HasFieldName
+    04                 -- VarUInt NameLen = 4
+    6E 61 6D 65        -- "name"
+    05                 -- VarUInt StringLen = 5
+    41 6C 69 63 65     -- "Alice"
+    C8                 -- IntegerPositive | HasFieldType | HasFieldName
+    03                 -- VarUInt NameLen = 3
+    61 67 65           -- "age"
+    1E                 -- VarUInt Value = 30
+```
+
+Total: 25 bytes.
+
+### 11.2 Uniform array of integers
+
+An array of three positive integers `[1, 2, 3]`:
+
+```
+05                     -- UniformArray type
+  06                   -- VarUInt PayloadSize = 6
+  03                   -- VarUInt ItemCount = 3
+  08                   -- FieldType = IntegerPositive
+  01                   -- VarUInt 1
+  02                   -- VarUInt 2
+  03                   -- VarUInt 3
+```
+
+Total: 8 bytes.
+
+### 11.3 Negative integer
+
+The value −42 encoded as a standalone field:
+
+```
+09                     -- IntegerNegative type
+29                     -- VarUInt(~(-42)) = VarUInt(41) = 0x29
+```
+
+### 11.4 Nested object
+
+An object containing a nested object:
+
+```
+02                     -- Outer Object type
+  0E                   -- VarUInt PayloadSize = 14
+    C2                 -- Object | HasFieldType | HasFieldName
+    05                 -- VarUInt NameLen = 5
+    69 6E 6E 65 72     -- "inner"
+    05                 -- VarUInt inner PayloadSize = 5
+      C8              -- IntegerPositive | HasFieldType | HasFieldName
+      01              -- VarUInt NameLen = 1
+      78              -- "x"
+      0A              -- VarUInt Value = 10
+```
+
+---
+
+## 12. Summary of Fixed-Size Payloads
+
+| Type | Payload size |
+|------|-------------|
+| Null | 0 |
+| BoolFalse | 0 |
+| BoolTrue | 0 |
+| Float32 | 4 |
+| Float64 | 8 |
+| Hash | 20 |
+| ObjectAttachment | 20 |
+| BinaryAttachment | 20 |
+| Uuid | 16 |
+| DateTime | 8 |
+| TimeSpan | 8 |
+| ObjectId | 12 |
+
+## 13. Summary of Variable-Size Payloads
+
+| Type | Payload structure |
+|------|-------------------|
+| Binary | `VarUInt(Size) + Bytes` |
+| String | `VarUInt(Size) + UTF8Bytes` |
+| IntegerPositive | `VarUInt(Value)` |
+| IntegerNegative | `VarUInt(~Value)` |
+| Object | `VarUInt(Size) + Fields` |
+| UniformObject | `VarUInt(Size) + Type + Fields` |
+| Array | `VarUInt(Size) + VarUInt(Count) + Fields` |
+| UniformArray | `VarUInt(Size) + VarUInt(Count) + Type + Fields` |
+| CustomById | `VarUInt(Size) + VarUInt(TypeId) + Data` |
+| CustomByName | `VarUInt(Size) + VarUInt(NameLen) + Name + Data` |
diff --git a/docs/specs/CompressedBuffer.md b/docs/specs/CompressedBuffer.md
new file mode 100644
index 000000000..11787e3e9
--- /dev/null
+++ b/docs/specs/CompressedBuffer.md
@@ -0,0 +1,185 @@
+# Compressed Buffer Format Specification
+
+**Version:** 1.0
+
+## Overview
+
+Compressed Buffer is a self-describing binary container for compressed data. It encodes the
+compression method, block layout, and integrity checksums so that a reader can decompress the
+payload without any external metadata.
+
+Key design goals:
+
+- **Self-describing** -- decompression requires no out-of-band knowledge of the compression method or original size
+- **Block-based** -- data is split into independently-decompressible blocks for random access and parallel processing
+- **Integrity-checked** -- CRC-32 on the header and BLAKE3 hash on the raw data
+- **Method-agnostic** -- supports multiple compression backends (None, Oodle, LZ4)
+
+## 1. Notation
+
+| Symbol       | Meaning |
+|--------------|---------|
+| `byte`       | An unsigned 8-bit integer (octet). |
+| `BE32(v)`    | A 32-bit value stored in big-endian byte order. |
+| `BE64(v)`    | A 64-bit value stored in big-endian byte order. |
+| `+`          | Concatenation of byte sequences. |
+
+All multi-byte numeric values are stored in **big-endian** byte order.
+
+---
+
+## 2. Magic Number
+
+Every compressed buffer begins with the 4-byte magic value:
+
+```
+0xb7756362
+```
+
+Stored big-endian. This corresponds to the ASCII bytes `.ucb`.
+
+---
+
+## 3. Header Layout (64 bytes)
+
+The header is a fixed 64-byte structure at offset 0:
+
+| Offset | Field              | Type     | Size | Description |
+|--------|--------------------|----------|------|-------------|
+| 0      | Magic              | uint32   | 4    | `0xb7756362` (big-endian) |
+| 4      | Crc32              | uint32   | 4    | CRC-32 of header bytes 8..63 (polynomial `0x04c11db7`) |
+| 8      | Method             | uint8    | 1    | Compression method (see below) |
+| 9      | Compressor         | uint8    | 1    | Method-specific compressor ID |
+| 10     | CompressionLevel   | uint8    | 1    | Method-specific compression level |
+| 11     | BlockSizeExponent  | uint8    | 1    | Block size as a power of two: `BlockSize = 1 << BlockSizeExponent` |
+| 12     | BlockCount         | uint32   | 4    | Number of compressed blocks |
+| 16     | TotalRawSize       | uint64   | 8    | Total uncompressed data size in bytes |
+| 24     | TotalCompressedSize| uint64   | 8    | Total buffer size including header |
+| 32     | RawHash            | byte[32] | 32   | BLAKE3 hash of the uncompressed data |
+
+### Header CRC-32
+
+The `Crc32` field covers bytes 8 through 63 of the header (56 bytes). Readers should verify
+this checksum before trusting any other header field.
+
+---
+
+## 4. Compression Methods
+
+### Method 0: None (Uncompressed)
+
+Data is stored without compression. Used as a fallback when compression would increase size.
+
+**Compressor**: Ignored (0).
+
+**Layout**:
+
+```
+[Header (64 bytes)] [Raw Data]
+```
+
+`TotalCompressedSize = 64 + TotalRawSize`. There is no block size array; the payload is a
+single uncompressed span.
+
+### Method 3: Oodle
+
+Block-based compression using Oodle. The `Compressor` field selects the algorithm:
+
+| Value | Compressor |
+|-------|------------|
+| 1     | Selkie     |
+| 2     | Mermaid    |
+| 3     | Kraken     |
+| 4     | Leviathan  |
+
+`CompressionLevel` maps to Oodle compression levels (typically -4 through +8, from
+HyperFast4 to Optimal4). The default compressor is Mermaid.
+
+### Method 4: LZ4
+
+Block-based compression using LZ4. `Compressor` and `CompressionLevel` are method-specific.
+
+---
+
+## 5. Block-Based Layout (Methods 3, 4)
+
+For block-based methods the data following the header is structured as:
+
+```
+[Header (64 bytes)]
+[Block Size Array: BlockCount x BE32]
+[Compressed Block 0]
+[Compressed Block 1]
+...
+[Compressed Block N-1]
+```
+
+### Block Size Array
+
+Immediately after the header at offset 64. Each entry is a `BE32` giving the **compressed
+size** of the corresponding block. Total metadata size: `BlockCount * 4` bytes.
+
+Compressed block data begins at offset `64 + BlockCount * 4`.
+
+### Block Sizing
+
+- All blocks except the last decompress to `1 << BlockSizeExponent` bytes (default: 256 KB,
+  exponent 18).
+- The last block decompresses to `TotalRawSize - (BlockCount - 1) * BlockSize` bytes.
+- If a block's compressed size equals or exceeds its raw size, the block is stored
+  **uncompressed** (the raw bytes are used directly).
+
+### Total Size Invariant
+
+```
+TotalCompressedSize = 64 + BlockCount * 4 + sum(CompressedBlockSize[i] for i in 0..BlockCount-1)
+```
+
+---
+
+## 6. Decompression
+
+1. **Read header** at offset 0 and verify the magic number.
+2. **Verify CRC-32** over bytes 8..63.
+3. **Dispatch on Method**:
+   - Method 0: Copy `TotalRawSize` bytes starting at offset 64.
+   - Methods 3/4: Continue with block-based decompression.
+4. **Read block size array** (`BlockCount` x `BE32` at offset 64).
+5. **Decompress each block** sequentially:
+   - If `CompressedBlockSize[i] < RawBlockSize[i]`, decompress using the indicated method.
+   - Otherwise, copy the block data verbatim.
+6. **Optionally verify** the BLAKE3 hash of the reassembled raw data against `RawHash`.
+
+### Random-Access Decompression
+
+Because blocks are independent, a reader can decompress an arbitrary byte range by:
+
+1. Computing the first and last block indices that overlap the range.
+2. Summing compressed block sizes to seek to the correct offset.
+3. Decompressing only the required blocks.
+4. Trimming the first and last block outputs to the requested range.
+
+---
+
+## 7. Range Extraction
+
+A compressed buffer can be sliced into a sub-range without full decompression. The result is
+a new compressed buffer whose blocks are a subset of the original:
+
+1. Compute the first and last block indices covering the requested raw range.
+2. Emit a new 64-byte header with updated `BlockCount`, `TotalRawSize`, and
+   `TotalCompressedSize`. The `RawHash` is zeroed (not recalculated for sub-ranges).
+3. Copy the corresponding entries from the block size array.
+4. Reference or copy the compressed block data for the selected blocks.
+
+This enables efficient sub-range serving without decompressing and recompressing.
+
+---
+
+## 8. Constants
+
+| Name              | Value        | Description |
+|-------------------|--------------|-------------|
+| Magic             | `0xb7756362` | Header magic number |
+| HeaderSize        | 64           | Fixed header size in bytes |
+| DefaultBlockSize  | 262144       | Default raw block size (256 KB, exponent 18) |
author	Stefan Boberg <[email protected]>	2026-03-18 11:19:10 +0100
committer	GitHub Enterprise <[email protected]>	2026-03-18 11:19:10 +0100
commit	eba410c4168e23d7908827eb34b7cf0c58a5dc48 (patch)
tree	3cda8e8f3f81941d3bb5b84a8155350c5bb2068c /docs
parent	bugfix release - v5.7.23 (#851) (diff)
download	zen-eba410c4168e23d7908827eb34b7cf0c58a5dc48.tar.xz zen-eba410c4168e23d7908827eb34b7cf0c58a5dc48.zip