| author | Stefan Boberg <[email protected]> | 2026-03-18 11:19:10 +0100 |
|---|---|---|
| committer | GitHub Enterprise <[email protected]> | 2026-03-18 11:19:10 +0100 |
| commit | eba410c4168e23d7908827eb34b7cf0c58a5dc48 (patch) | |
| tree | 3cda8e8f3f81941d3bb5b84a8155350c5bb2068c /docs/specs/CompressedBuffer.md | |
| parent | bugfix release - v5.7.23 (#851) (diff) | |
Compute batching (#849)
### Compute Batch Submission
- Consolidate duplicated action submission logic in `httpcomputeservice` into a single `HandleSubmitAction` supporting both single-action and batch (actions array) payloads
- Group actions by queue in `RemoteHttpRunner` and submit as batches with configurable chunk size, falling back to individual submission on failure
- Extract shared helpers: `MakeErrorResult`, `ValidateQueueForEnqueue`, `ActivateActionInQueue`, `RemoveActionFromActiveMaps`
### Retracted Action State
- Add `Retracted` state to `RunnerAction` for retry-free rescheduling — an explicit request to pull an action back and reschedule it on a different runner without incrementing `RetryCount`
- Implement idempotent `RetractAction()` on `RunnerAction` and `ComputeServiceSession`
- Add `POST jobs/{lsn}/retract` and `queues/{queueref}/jobs/{lsn}/retract` HTTP endpoints
- Add state machine documentation and per-state comments to `RunnerAction`
### Compute Race Fixes
- Fix race in `HandleActionUpdates` where actions enqueued between session abandon and scheduler tick were never abandoned, causing `GetActionResult` to return 202 indefinitely
- Fix queue `ActiveCount` race where `NotifyQueueActionComplete` was called after releasing `m_ResultsLock`, allowing callers to observe stale counters immediately after `GetActionResult` returned OK
### Logging Optimization and ANSI Improvements
- Improve `AnsiColorStdoutSink` write efficiency — single write call, dirty-flag flush, `RwLock` instead of `std::mutex`
- Move ANSI color emission from sink into formatters via `Formatter::SetColorEnabled()`; remove `ColorRangeStart`/`End` from `LogMessage`
- Extract color helpers (`AnsiColorForLevel`, `StripAnsiSgrSequences`) into `helpers.h`
- Strip upstream ANSI SGR escapes in non-color output mode. This allows color in log messages without polluting log files with ANSI control sequences
- Move `RotatingFileSink`, `JsonFormatter`, and `FullFormatter` from header-only to pimpl with `.cpp` files
### CLI / Exec Refactoring
- Extract `ExecSessionRunner` class from ~920-line `ExecUsingSession` into focused methods and an `ExecSessionConfig` struct
- Replace monolithic `ExecCommand` with subcommand-based architecture (`http`, `inproc`, `beacon`, `dump`, `buildlog`)
- Allow parent options to appear after subcommand name by parsing subcommand args permissively and forwarding unmatched tokens to the parent parser
### Testing Improvements
- Fix `--test-suite` filter being ignored due to accumulation with default wildcard filter
- Add test suite banners to test listener output
- Make `function.session.abandon_pending` test more robust
### Startup / Reliability Fixes
- Fix silent exit when a second zenserver instance detects a port conflict — use `ZEN_CONSOLE_*` for log calls that precede `InitializeLogging()`
- Fix two potential SIGSEGV paths during early startup: guard `sentry_options_new()` returning nullptr, and throw on `ZenServerState::Register()` returning nullptr instead of dereferencing
- Fail on unrecognized zenserver `--mode` instead of silently defaulting to store
### Other
- Show host details (hostname, platform, CPU count, memory) when discovering new compute workers
- Move frontend `html.zip` from source tree into build directory
- Add format specifications for Compact Binary and Compressed Buffer wire formats
- Add `WriteCompactBinaryObject` to zencore
- Extend `ConsoleTui` with additional functionality
- Add `--vscode` option to `xmake sln` for clangd / `compile_commands.json` support
- Disable compute/horde/nomad in release builds (not yet production-ready)
- Disable unintended `ASIO_HAS_IO_URING` enablement
- Fix crashpad patch missing leading whitespace
- Clean up code triggering gcc false positives
Diffstat (limited to 'docs/specs/CompressedBuffer.md')
| -rw-r--r-- | docs/specs/CompressedBuffer.md | 185 |
1 files changed, 185 insertions, 0 deletions
diff --git a/docs/specs/CompressedBuffer.md b/docs/specs/CompressedBuffer.md
new file mode 100644
index 000000000..11787e3e9
--- /dev/null
+++ b/docs/specs/CompressedBuffer.md
@@ -0,0 +1,185 @@
+# Compressed Buffer Format Specification
+
+**Version:** 1.0
+
+## Overview
+
+Compressed Buffer is a self-describing binary container for compressed data. It encodes the
+compression method, block layout, and integrity checksums so that a reader can decompress the
+payload without any external metadata.
+
+Key design goals:
+
+- **Self-describing** -- decompression requires no out-of-band knowledge of the compression method or original size
+- **Block-based** -- data is split into independently-decompressible blocks for random access and parallel processing
+- **Integrity-checked** -- CRC-32 on the header and BLAKE3 hash on the raw data
+- **Method-agnostic** -- supports multiple compression backends (None, Oodle, LZ4)
+
+## 1. Notation
+
+| Symbol | Meaning |
+|--------------|---------|
+| `byte` | An unsigned 8-bit integer (octet). |
+| `BE32(v)` | A 32-bit value stored in big-endian byte order. |
+| `BE64(v)` | A 64-bit value stored in big-endian byte order. |
+| `+` | Concatenation of byte sequences. |
+
+All multi-byte numeric values are stored in **big-endian** byte order.
+
+---
+
+## 2. Magic Number
+
+Every compressed buffer begins with the 4-byte magic value:
+
+```
+0xb7756362
+```
+
+Stored big-endian. This corresponds to the ASCII bytes `.ucb`.
+
+---
+
+## 3. Header Layout (64 bytes)
+
+The header is a fixed 64-byte structure at offset 0:
+
+| Offset | Field | Type | Size | Description |
+|--------|--------------------|----------|------|-------------|
+| 0 | Magic | uint32 | 4 | `0xb7756362` (big-endian) |
+| 4 | Crc32 | uint32 | 4 | CRC-32 of header bytes 8..63 (polynomial `0x04c11db7`) |
+| 8 | Method | uint8 | 1 | Compression method (see below) |
+| 9 | Compressor | uint8 | 1 | Method-specific compressor ID |
+| 10 | CompressionLevel | uint8 | 1 | Method-specific compression level |
+| 11 | BlockSizeExponent | uint8 | 1 | Block size as a power of two: `BlockSize = 1 << BlockSizeExponent` |
+| 12 | BlockCount | uint32 | 4 | Number of compressed blocks |
+| 16 | TotalRawSize | uint64 | 8 | Total uncompressed data size in bytes |
+| 24 | TotalCompressedSize| uint64 | 8 | Total buffer size including header |
+| 32 | RawHash | byte[32] | 32 | BLAKE3 hash of the uncompressed data |
+
+### Header CRC-32
+
+The `Crc32` field covers bytes 8 through 63 of the header (56 bytes). Readers should verify
+this checksum before trusting any other header field.
+
+---
+
+## 4. Compression Methods
+
+### Method 0: None (Uncompressed)
+
+Data is stored without compression. Used as a fallback when compression would increase size.
+
+**Compressor**: Ignored (0).
+
+**Layout**:
+
+```
+[Header (64 bytes)] [Raw Data]
+```
+
+`TotalCompressedSize = 64 + TotalRawSize`. There is no block size array; the payload is a
+single uncompressed span.
+
+### Method 3: Oodle
+
+Block-based compression using Oodle. The `Compressor` field selects the algorithm:
+
+| Value | Compressor |
+|-------|------------|
+| 1 | Selkie |
+| 2 | Mermaid |
+| 3 | Kraken |
+| 4 | Leviathan |
+
+`CompressionLevel` maps to Oodle compression levels (typically -4 through +8, from
+HyperFast4 to Optimal4). The default compressor is Mermaid.
+
+### Method 4: LZ4
+
+Block-based compression using LZ4. `Compressor` and `CompressionLevel` are method-specific.
+
+---
+
+## 5. Block-Based Layout (Methods 3, 4)
+
+For block-based methods the data following the header is structured as:
+
+```
+[Header (64 bytes)]
+[Block Size Array: BlockCount x BE32]
+[Compressed Block 0]
+[Compressed Block 1]
+...
+[Compressed Block N-1]
+```
+
+### Block Size Array
+
+Immediately after the header at offset 64. Each entry is a `BE32` giving the **compressed
+size** of the corresponding block. Total metadata size: `BlockCount * 4` bytes.
+
+Compressed block data begins at offset `64 + BlockCount * 4`.
+
+### Block Sizing
+
+- All blocks except the last decompress to `1 << BlockSizeExponent` bytes (default: 256 KB,
+  exponent 18).
+- The last block decompresses to `TotalRawSize - (BlockCount - 1) * BlockSize` bytes.
+- If a block's compressed size equals or exceeds its raw size, the block is stored
+  **uncompressed** (the raw bytes are used directly).
+
+### Total Size Invariant
+
+```
+TotalCompressedSize = 64 + BlockCount * 4 + sum(CompressedBlockSize[i] for i in 0..BlockCount-1)
+```
+
+---
+
+## 6. Decompression
+
+1. **Read header** at offset 0 and verify the magic number.
+2. **Verify CRC-32** over bytes 8..63.
+3. **Dispatch on Method**:
+   - Method 0: Copy `TotalRawSize` bytes starting at offset 64.
+   - Methods 3/4: Continue with block-based decompression.
+4. **Read block size array** (`BlockCount` x `BE32` at offset 64).
+5. **Decompress each block** sequentially:
+   - If `CompressedBlockSize[i] < RawBlockSize[i]`, decompress using the indicated method.
+   - Otherwise, copy the block data verbatim.
+6. **Optionally verify** the BLAKE3 hash of the reassembled raw data against `RawHash`.
+
+### Random-Access Decompression
+
+Because blocks are independent, a reader can decompress an arbitrary byte range by:
+
+1. Computing the first and last block indices that overlap the range.
+2. Summing compressed block sizes to seek to the correct offset.
+3. Decompressing only the required blocks.
+4. Trimming the first and last block outputs to the requested range.
+
+---
+
+## 7. Range Extraction
+
+A compressed buffer can be sliced into a sub-range without full decompression. The result is
+a new compressed buffer whose blocks are a subset of the original:
+
+1. Compute the first and last block indices covering the requested raw range.
+2. Emit a new 64-byte header with updated `BlockCount`, `TotalRawSize`, and
+   `TotalCompressedSize`. The `RawHash` is zeroed (not recalculated for sub-ranges).
+3. Copy the corresponding entries from the block size array.
+4. Reference or copy the compressed block data for the selected blocks.
+
+This enables efficient sub-range serving without decompressing and recompressing.
+
+---
+
+## 8. Constants
+
+| Name | Value | Description |
+|-------------------|--------------|-------------|
+| Magic | `0xb7756362` | Header magic number |
+| HeaderSize | 64 | Fixed header size in bytes |
+| DefaultBlockSize | 262144 | Default raw block size (256 KB, exponent 18) |
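
To make the header table and the random-access steps in the added spec more concrete, here is a minimal Python sketch of a reader. It is not taken from the zen code base: the names `Header`, `parse_header`, `read_block_sizes`, and `locate_blocks` are hypothetical. It only unpacks the fields as the spec lays them out (big-endian, 64 bytes) and computes which blocks overlap a raw byte range. CRC-32 and BLAKE3 verification, and the decompression itself, are left out, because the spec does not pin down the CRC parameters beyond the polynomial.

```python
import struct
from typing import List, NamedTuple, Tuple

HEADER_SIZE = 64
MAGIC = 0xB7756362

# Big-endian field order per the header table: Magic, Crc32, Method,
# Compressor, CompressionLevel, BlockSizeExponent, BlockCount,
# TotalRawSize, TotalCompressedSize, RawHash (4+4+1+1+1+1+4+8+8+32 = 64).
HEADER_STRUCT = struct.Struct(">IIBBBBIQQ32s")


class Header(NamedTuple):  # hypothetical name, for illustration only
    magic: int
    crc32: int
    method: int
    compressor: int
    compression_level: int
    block_size_exponent: int
    block_count: int
    total_raw_size: int
    total_compressed_size: int
    raw_hash: bytes


def parse_header(buf: bytes) -> Header:
    """Unpack the fixed header and check the magic (CRC check omitted)."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("buffer is shorter than the 64-byte header")
    header = Header(*HEADER_STRUCT.unpack_from(buf, 0))
    if header.magic != MAGIC:
        raise ValueError("bad magic number")
    return header


def read_block_sizes(buf: bytes, header: Header) -> List[int]:
    """Read the BlockCount x BE32 compressed-size array at offset 64."""
    return list(struct.unpack_from(f">{header.block_count}I", buf, HEADER_SIZE))


def locate_blocks(
    header: Header, block_sizes: List[int], raw_offset: int, raw_length: int
) -> Tuple[int, int, int]:
    """Return (first_block, last_block, byte offset of first_block's data)."""
    if raw_offset < 0 or raw_length <= 0:
        raise ValueError("range must be non-empty and non-negative")
    if raw_offset + raw_length > header.total_raw_size:
        raise ValueError("range extends past TotalRawSize")
    block_size = 1 << header.block_size_exponent
    first = raw_offset // block_size
    last = (raw_offset + raw_length - 1) // block_size
    # Compressed data starts right after the header and the size array.
    data_start = HEADER_SIZE + 4 * header.block_count
    return first, last, data_start + sum(block_sizes[:first])
```

A caller would then decompress blocks `first..last` (copying verbatim any block whose compressed size is not smaller than its raw size, as the spec describes) and trim the output to the requested range.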