path: root/src
Commit message | Author | Age | Files | Lines
* sessions: persist to disk, prune, track client liveness, accept UE_LOGFMT (#1014) [HEAD, main] | Stefan Boberg | 2026-05-05 | 72 | -735/+7048

Branch started as a sessions-service overhaul (persistence, client liveness, UE_LOGFMT intake) and grew to pick up adjacent infrastructure work: an early-startup log backlog, a hardened `MemoryArena`, the `zen trace serve` viewer gaining a counter view + compact timeline + tabbed callsite panel, defensive fixes in the third-party `tourist` trace parser, a series of allocation reductions across the HTTP and compact-binary hot paths, and a new `zen sessions` CLI command tree.

## Sessions service

**Persistence.** Each session lives on disk under `<DataRoot>/sessions/<id>/` as `info.cb` (metadata) plus `log.bin` (length-prefixed CbObject log records). On startup the service scans that directory and loads prior sessions as ended sessions, preloading the tail of each log so historical views work after a restart. `SessionLog` is noexcept-constructed and falls back to a disabled state on disk errors, so a bad disk can't take down `RegisterSession`. `GetSession` falls back to the ended-sessions list (fixes historical log fetches over HTTP). `LoadTail` counts only successfully-parsed records.

**Pruning.** Periodic cleanup task drops ended sessions once any of three caps is exceeded: age (default 1 year), count (default 1000), or total on-disk footprint (default 50 MiB). Runs 30 s after startup, hourly thereafter. Active sessions never pruned; disk removal and directory stat happen outside the exclusive lock so a slow filesystem can't stall lookups.

**Client liveness.** Sessions carry a `ProcessHandle` for the client-reported pid, captured at registration time so Windows pid recycling can't produce false positives. A 30 s asio timer probes liveness and ends dead sessions through the normal remove path, producing a synthetic `Session ended: process exited (...)` line persisted to `log.bin`. Windows decodes common NTSTATUS exit codes to human names (Ctrl-C, access violation, stack overflow, ...); POSIX stays at plain `process exited`. Clients auto-fill `ClientPid` only for local targets (unix socket / loopback); the server defensively accepts pids only from `IsLocalMachineRequest()` peers. zenserver also reports its own pid when registering its self-session, so it shows up with a real pid in the dashboard and `zen sessions ls`.

**Synthetic end-of-session line.** `RemoveSession` takes an optional reason; before the session moves to the ended list it appends an Info-level `Session ended[: reason]` entry through the normal log path (released outside `m_Lock`). Current reasons: `client request` (HTTP DELETE), `server shutdown` (self-session), `process exited (...)` (liveness).

**UE_LOGFMT structured entries.** `POST /sessions/{id}/log` now accepts `{level, logger, format, fields}` alongside the existing `{level, logger, message}` shape. New `logtemplate.{h,cpp}` implements UE's `StructuredLog.cpp` template grammar (field paths with `.name` / `[N]`, `{{`/`}}` escapes, `$text` / `$format` / `$locformat` object conventions, bounded recursion). Renders to a displayable message at intake while persisting raw format + fields so a future UI can drill into fields without another schema bump. Hot path is zero-alloc: it renders into `ExtendableStringBuilder<256>` using stack-buffered `Oid::ToString` / `IoHash::ToHexString` overloads. UI shows a `{…}` marker with the raw template + JSON-pretty fields on hover.

**Parent sessions.** `SessionInfo` gains `parent_session_id`; hub-managed storage server child processes inherit the hub's session id via `--parent-session=<id>`. `ZEN_SESSIONS_URL` env var becomes a fallback for `--sessions-url` / config when neither is provided. The in-process session log sink is disabled when a remote sessions target is configured (logs flow through `SessionsServiceClient` instead). The sessions UI groups child sessions under their parent (collapsible/expandable, sorts as a unit, supports nesting).

**Platform reporting.** `SessionInfo` gains `Platform`, flowed end-to-end: client auto-fills via `GetRuntimePlatformName()`, server persists in `info.cb` (`plat`) and emits on GET. UI renders as a SimpleIcons-style inline SVG (windows / macOS / iOS / linux / wine / android / playstation / xbox / nintendo) with case-insensitive alias resolution (Win32/Win64, PS4/PS5, XSX/XSS, NintendoSwitch, iPhone/iPad, Darwin/OSX). Unknown values fall back to text; sorting runs on the underlying string.

**WebSocket log streaming.** Sessions UI moves from 2 s polling to a WebSocket push model. New `WsSubscriber` has a stable id + helper methods. UI caps the log-line DOM at 5 000 entries with a shared cursor-regression helper, factored out of two call sites. Per-broadcast allocations trimmed on the push path; fixed a stack overrun in the WS log broadcast hex-id buffer.

**Log memory.** `LogEntry::Level` is now `logging::LogLevel` (1 byte) instead of `std::string` (~32 B), which saves ~310 KB per full 10 k-entry deque and eliminates a per-message allocation in the in-proc sink. On-disk format writes an int32 and accepts either int or legacy string on read. `LogEntry` strings now live in a `MemoryArena`; logger names are interned across the deque. `SessionLog::Append` and `WriteSessionInfoFile` drop their `UniqueBuffer` round-trip and write `CbObject::GetView()` straight through `BasicFile` / `SafeWriteFile`. Multi-entry `POST /log` batched under one lock + one push.

**In-proc log timestamps.** `InProcSessionLogSink::TimePointToDateTime` previously preserved only whole seconds, so every in-proc entry rendered at `.000` ms in the dashboard and `zen sessions tail`. It now adds the sub-second part (nanoseconds → 100 ns ticks) to keep ms precision end-to-end.
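The sub-second fix described under "In-proc log timestamps" can be sketched as follows. This is a minimal illustration, not the actual `TimePointToDateTime`: the helper names `ToTicks100ns` and `MillisecondPart` are hypothetical, and counting ticks from the Unix epoch is an assumption (the real `DateTime` epoch may differ).

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Number of 100 ns ticks in one second.
constexpr std::int64_t kTicksPerSecond = 10'000'000;

// Hypothetical helper (not the real TimePointToDateTime): converts a time
// point to 100 ns ticks since the Unix epoch, keeping the sub-second part.
inline std::int64_t ToTicks100ns(std::chrono::system_clock::time_point Tp)
{
    using namespace std::chrono;
    const auto SinceEpoch   = Tp.time_since_epoch();
    const auto WholeSeconds = duration_cast<seconds>(SinceEpoch);
    // Remainder below one second, expressed in nanoseconds.
    const auto SubSecondNs = duration_cast<nanoseconds>(SinceEpoch - WholeSeconds);
    // Dropping the SubSecondNs term reproduces the old ".000 ms" behavior.
    return WholeSeconds.count() * kTicksPerSecond + SubSecondNs.count() / 100;
}

// Millisecond field as a formatter would print it (0..999).
inline int MillisecondPart(std::int64_t Ticks)
{
    assert(Ticks >= 0);  // this sketch only handles post-epoch times
    return static_cast<int>((Ticks / 10'000) % 1000);
}
```

A time point 5.123 s after the epoch maps to 51 230 000 ticks, so the millisecond field survives the conversion instead of collapsing to zero.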
**UI.** Side "Session Details" panel is gone; its info is inline in the table (appname, mode, platform, id, timestamps, this/log pills, active dot). Bottom panel is a tabbed `Log | Metadata` view with a right-side "Session Information" panel beside metadata; log-only controls (filter, newest-first, follow, log-level filter, expand/collapse) hide when Metadata is active, polling keeps running across tab switches. Wide-mode toggle fills the viewport edge-to-edge. Log lines show the logger category; timestamps render in 24 h with zero-padded fields regardless of locale. Sessions list defaults to All / 10 per page / created-desc, gains click-to-sort headers on the full dataset, a header filter box, and a pager aligned to the table's right edge. Duplicate auto-injected `<h1>Sessions</h1>` removed.

## `zen sessions` CLI

New command tree on the `zen` client for inspecting the sessions service from the terminal:

- **`zen sessions ls`**: lists sessions (active first, ended next; newest-first within each group) with id, status, app/mode, pid, created, duration, and log count. Supports `--status active|ended|all` (default `all`).
- **`zen sessions status`**: prints the sessions service summary: self id, active / ended counts, and the read/write/delete/list/request/bad-request counters from `/stats/sessions`.
- **`zen sessions tail [session]`**: tails a session's log. With no argument it tails zenserver's own session (resolved via `/sessions/list`'s `self_id`); an explicit 24-hex id targets any session, including ended ones (historical replay). `--lines N` (default 50, 0 = all buffered) trims the initial dump client-side. `--follow` prefers a WebSocket push subscription on `/sessions/ws` for sub-second latency; on upgrade failure (older server, blocked port, unix-socket transport) it falls back to HTTP cursor polling at `--interval-ms` (default 500), with sleeps chunked to 50 ms so Ctrl-C reacts quickly. Output matches `zen::logging::FullFormatter` (`[YY-MM-DD HH:MM:SS.mmm] [lvl] [logger] message`); on a TTY the level is colored and the logger is bold, with continuation lines indented under the message column using the *visible* prefix width. 404 surfaces as `(session ended)` and connection errors as `(server gone)`; both are clean exits, so stopping the server mid-tail no longer prints a stack trace.
- **`zen sessions ui`**: opens `<host>/dashboard/?page=sessions` in the user's default browser. Rejects unix-socket hosts.

A small `ZenServiceClient::IsUnixSocket()` helper now wraps the unix-socket check used by `ui`, `sessions tail` (WS path), and `sessions ui`.

## Logging

`BacklogSink` captures early-startup log entries in a fixed-capacity ring so late-attached sinks (session sink, file sink) can replay them. Detaches from the broadcast list when disabled; backed by destructor-only cleanup (no `unique_ptr` indirection per entry). Tuned defaults so the backlog covers typical bring-up without unbounded growth.

## `zen trace serve` viewer

- Compact timeline mode for high-density views.
- New `TRACE_INT_VALUE` / `TRACE_FLOAT_VALUE` counter trace points + a counters page in the viewer.
- Callsite tables collapsed into a single tabbed panel.
- Lossless `Oid <-> Guid` bridge for trace session ids; trace `SessionId` plumbed through.
- `tourist` parser hardening: bounds-check `BufferStream::read`, validate `Type::info_size` before `patch()`, convert `parse_important_aux` to a loop (avoids deep recursion), widen `ParserPool` index to `uint32`, bounds-check field offsets in the dispatcher, pin `Types::parse` buffer up-front.

## `MemoryArena`

Configurable chunk size, inline chunk list, oversize requests routed to truly-dedicated chunks (no slack waste, no fragmentation when one allocation is much larger than the chunk).

## Allocation cleanups across hot paths

- `zenhttp::HttpRequestRouter::HandleRequest` and `FormatPackageMessageInternal`: drop heap allocations.
- Compact-binary validation: `eastl::fixed_vector` + `eastl::sort`; eliminate `std::vector` churn.
- `zenserverprocess`: trim transient allocations in spawn paths.
- Sessions HTTP intake / broadcast: drop transient `std::string` allocs.
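The three pruning caps from the "Pruning" paragraph of this commit (age, count, total on-disk footprint) can be illustrated with a small selection routine. This is only a sketch under stated assumptions: `EndedSession`, `PruneCaps`, and `SelectSessionsToPrune` are hypothetical names, and walking newest-to-oldest so that the oldest sessions are dropped first is an assumed policy that the commit message does not spell out.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical view of an ended session, for illustration only.
struct EndedSession
{
    std::string          Id;
    std::chrono::seconds Age;        // time since the session ended
    std::uint64_t        DiskBytes;  // info.cb + log.bin footprint
};

// Defaults taken from the commit message: 1 year, 1000 sessions, 50 MiB.
struct PruneCaps
{
    std::chrono::seconds MaxAge        = std::chrono::hours(24 * 365);
    std::size_t          MaxCount      = 1000;
    std::uint64_t        MaxTotalBytes = 50ull * 1024 * 1024;
};

// Walks sessions newest-first and keeps one only while it fits every cap.
// Returns the ids to drop. Active sessions are assumed to be filtered out
// by the caller, since they are never pruned.
inline std::vector<std::string> SelectSessionsToPrune(std::vector<EndedSession> Sessions, const PruneCaps& Caps)
{
    std::sort(Sessions.begin(), Sessions.end(),
              [](const EndedSession& A, const EndedSession& B) { return A.Age < B.Age; });

    std::vector<std::string> Drop;
    std::size_t              Kept      = 0;
    std::uint64_t            KeptBytes = 0;
    for (const EndedSession& S : Sessions)
    {
        const bool TooOld    = S.Age > Caps.MaxAge;
        const bool OverCount = Kept + 1 > Caps.MaxCount;
        const bool OverBytes = KeptBytes + S.DiskBytes > Caps.MaxTotalBytes;
        if (TooOld || OverCount || OverBytes)
        {
            Drop.push_back(S.Id);
        }
        else
        {
            ++Kept;
            KeptBytes += S.DiskBytes;
        }
    }
    return Drop;
}
```

Computing the drop list as plain data first fits the note that disk removal happens outside the exclusive lock: the list can be built under the lock and the slow filesystem work done afterwards.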
* hub async s3 client (#1024) | Dan Engelbrecht | 2026-05-05 | 27 | -1383/+7739

- Feature: `AsyncHttpClient` adds cancellable request tokens, streaming GET to a file (`AsyncDownload`), zero-copy chunk-callback GET (`AsyncStream`), pull-mode body source for streaming `AsyncPut`, retry layer mirroring the synchronous client, and a submit-side in-flight cap (`HttpClientSettings::MaxConcurrentRequests`) so hub-scale fanout against a single host cannot stall queued handles into curl's connect-timeout window
- Feature: Hub hydration can route S3 transfers through a non-blocking `AsyncHttpClient` (curl_multi + asio) backed by a single io thread; hydrate and dehydrate now pipeline requests instead of blocking worker threads
  - `--hub-hydration-async-enabled` (Lua: `hub.hydration.async.enabled`, default true)
  - `--hub-hydration-async-max-concurrent-requests` (Lua: `hub.hydration.async.maxconcurrentrequests`, default `clamp(cpu*4, 128, 512)`)
- Feature: Hub provision/deprovision/obliterate now run as two phases on separate worker pools so per-module hydration cannot starve child-process spawn/despawn (and vice versa)
  - New `--hub-instance-spawn-threads` (Lua: `hub.instance.spawnthreads`, default `clamp(cpu/8, 4, 16)`) drives child-process spawn/despawn
  - `--hub-instance-provision-threads` (Lua: `hub.instance.provisionthreads`) now drives per-module hydrate/dehydrate scheduling only; default changed from `max(cpu/4, 2)` to `clamp(cpu/8, 4, 12)`
  - `--hub-hydration-threads` (Lua: `hub.hydration.threads`) now controls per-file workers inside a single hydrate/dehydrate; default changed from `max(cpu/4, 2)` to `clamp(cpu/8, 4, 12)`
- Feature: `AsyncHttpClient` owns its `asio::io_context` and one io thread by default; the `(BaseUri, io_context&)` constructor is preserved for callers that want to share an externally-driven `io_context` across clients (caller MUST keep the loop running until the client destructs)
- Feature: `Hub::Configuration` C++ struct fields renamed (`OptionalProvisionWorkerPool`/`OptionalHydrationWorkerPool` -> `OptionalProvisionPool`/`OptionalSpawnPool`/`OptionalHydrationPool`). Embedders constructing `Hub` directly must update field names; provision and spawn pools must both be set or both null (asserted at construction).
- Bugfix: `S3Client` signing-key cache no longer returns stale signatures after IMDS-rotated credentials change `AccessKeyId`; cache is now keyed on `(DateStamp, AccessKeyId)`
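The `clamp(...)` defaults quoted in this commit are easier to compare with concrete numbers. The sketch below evaluates them for a given core count; `HubThreadDefaults` and `ComputeHubDefaults` are hypothetical names, and reading `cpu` as the logical core count is an assumption.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical container for the defaults listed in the commit message.
struct HubThreadDefaults
{
    unsigned AsyncMaxConcurrentRequests;  // clamp(cpu*4, 128, 512)
    unsigned SpawnThreads;                // clamp(cpu/8, 4, 16)
    unsigned ProvisionThreads;            // clamp(cpu/8, 4, 12)
    unsigned HydrationThreads;            // clamp(cpu/8, 4, 12)
};

// Evaluates the documented formulas; integer division truncates.
inline HubThreadDefaults ComputeHubDefaults(unsigned CpuCount)
{
    assert(CpuCount > 0);
    return HubThreadDefaults{
        std::clamp(CpuCount * 4, 128u, 512u),
        std::clamp(CpuCount / 8, 4u, 16u),
        std::clamp(CpuCount / 8, 4u, 12u),
        std::clamp(CpuCount / 8, 4u, 12u),
    };
}
```

On a 16-core machine every value sits at its lower bound (128 / 4 / 4 / 4); at 64 cores the results are 256 / 8 / 8 / 8; at 256 cores the upper bounds apply (512 / 16 / 12 / 12).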
* watchdog ephemeral port exhaust (#1022) | Dan Engelbrecht | 2026-05-04 | 7 | -23/+334

- Improvement: Hub pools HTTP connections to managed instances so provision/deprovision churn no longer exhausts Windows ephemeral ports
* zenhttp improvements (robustness / correctness) (#968) | Stefan Boberg | 2026-05-04 | 44 | -312/+2785

A collection of security, correctness, and robustness fixes in `zenhttp` and `zencore` surfaced by security review. Most items are small, independent commits grouped here because they all tighten trust boundaries or fix UB along the same code paths.

## WebSocket protocol hardening (RFC 6455)

- **Enforce the client-side mask bit**. Server-side frame loops now reject unmasked frames with close code 1002 per §5.1. Prevents HTTP intermediary smuggling.
- **Validate control frames and RSV bits**. Fragmented control frames, oversized (>125 B) control payloads, and any non-zero RSV bit now fail the connection before allocation.
- **Lower per-frame payload cap** from 256 MB → 4 MB. Bounds per-connection accumulator memory.
- **Implement message fragmentation**. Continuation frames are coalesced and delivered as a single message; interleaved non-control frames close with 1002; assembled messages are capped at 4 MB (1009 on overflow). Previously partial fragments were delivered to handlers, bypassing payload validation.
- **Parse the 101 handshake response properly** in `HttpWsClient`. Status-line, `Upgrade`, `Connection`, and `Sec-WebSocket-Accept` are now matched exactly rather than via substring searches against the full body.

## Auth / OIDC hardening

- **Constant-time password compare** in `PasswordSecurity::IsAllowed` (closes a remote length/content timing oracle). Adds a shared `ConstantTimeEquals` helper.
- **Harden Basic-auth header parsing**: trim trailing LWS, reject control bytes and DEL in the credential.
- **OIDC discovery pinning**: require HTTPS (loopback exempt), verify `issuer` matches `BaseUrl`, require `token_endpoint` / `userinfo_endpoint` / `jwks_uri` to share origin with `BaseUrl`, reject empty `token_endpoint`.
- **Restrict `POST /auth/oidc/refreshtoken`** to local-machine requests. Previously unauthenticated in default deployments; remote callers could evict or replace cached tokens.
- **Stop logging OIDC provider response bodies** on refresh failure (IdPs echo `refresh_token` back in error bodies).
- **Drop the unused `IdentityToken` field** from `OidcClient` / `OpenIdToken` so nothing in the tree accidentally trusts an unverified JWT.

## Auth state encryption migration

- Add `AesGcm` AEAD primitive (BCrypt / OpenSSL backends, mbedTLS stubbed) and `CryptoRandom::Fill` CSPRNG helper in `zencore/crypto.h`.
- Migrate authstate file from AES-256-CBC with a fixed IV to AES-GCM with a fresh 12-byte random nonce per write and the 4-byte `ZEN1` magic bound as AAD. Legacy-CBC files are transparently read once and rewritten in the new format.

## Filesystem / IO robustness

- `IoBufferExtendedCore::Materialize` now checks `MAP_FAILED` on POSIX (was comparing to `nullptr`, which let the failure sentinel propagate into later reads and `munmap(MAP_FAILED, ...)`).
- `IoBufferBuilder::MakeFromFile / MakeFromTemporaryFile`: close the FD/HANDLE on exception via a dismissable `ScopeGuard`; actually check the `fstat()` return value (previously used an uninitialized `FileSize`).
- `ReadFromFileMaybe`: loop short reads, retry `EINTR`, chunk Windows `ReadFile` at `0xFFFFFFFF` bytes (fixes silent truncation of multi-GiB reads).
- `WipeDirectory`: compare `FindFirstFileW` handle against `INVALID_HANDLE_VALUE` rather than `nullptr`.
- `RemoveFileNative` (Linux/macOS): report non-`ENOENT` stat failures via the `std::error_code` out-param and stop reading `st_mode` after a failed stat.

## Buffer / compression correctness

- Avoid per-copy `IoBufferCore` heap allocations in `CompositeBuffer::CopyTo / ViewOrCopyRange` iterators; add fast path for `BufferHeader::Read` when the 64-byte header fits in the first plain-memory segment.
- `BufferHeader`: add `IsHeaderValid()` gate covering `BlockSizeExponent` range, `BlockCount * BlockSize` overflow, and `TotalRawSize` bounds before any arithmetic uses them. Defends against attacker-controlled headers that can pass the CRC and trigger OOB writes in `DecompressBlock`.
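The WebSocket frame checks listed in this commit (mask bit, RSV bits, control-frame limits, 4 MB cap) all operate on the first two header bytes defined by RFC 6455, and can be sketched as one function. This is an illustrative sketch, not the `zenhttp` code: `CheckClientFrameHeader` is a hypothetical name, and the choice of close codes other than 1002 for unmasked frames is an assumption based on RFC 6455.

```cpp
#include <cassert>
#include <cstdint>

// Per-frame payload cap from the commit message (4 MB).
constexpr std::uint64_t kMaxFramePayload = 4ull * 1024 * 1024;

// Validates a client-to-server frame header before any payload allocation.
// Byte0/Byte1 are the first two header bytes; PayloadLength is the decoded
// length (after the optional 16/64-bit extension has been read).
// Returns 0 when the frame may proceed, otherwise a close code.
inline std::uint16_t CheckClientFrameHeader(std::uint8_t Byte0, std::uint8_t Byte1, std::uint64_t PayloadLength)
{
    const bool         Fin       = (Byte0 & 0x80) != 0;
    const bool         AnyRsv    = (Byte0 & 0x70) != 0;
    const std::uint8_t Opcode    = Byte0 & 0x0F;
    const bool         Masked    = (Byte1 & 0x80) != 0;
    const bool         IsControl = (Opcode & 0x08) != 0;  // close, ping, pong

    if (!Masked)
        return 1002;  // clients must mask every frame (RFC 6455 section 5.1)
    if (AnyRsv)
        return 1002;  // no extension negotiated, so RSV1-3 must be zero
    if (IsControl && (!Fin || PayloadLength > 125))
        return 1002;  // control frames cannot be fragmented or exceed 125 bytes
    if (PayloadLength > kMaxFramePayload)
        return 1009;  // assumed "message too big" code for the per-frame cap
    return 0;
}
```

For example, a masked final text frame (`0x81 0x85`) passes, while the same frame without the mask bit (`0x81 0x05`) is rejected with 1002.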
* Tui picker fixes (#1027) | Stefan Boberg | 2026-05-04 | 3 | -113/+347

- **Viewport scrolling.** Cap rendered rows to the visible terminal height and track a scroll offset that follows the selection, so long lists no longer overflow the screen and corrupt the cursor-up redraw. Hint shows `[i/N]` when the list exceeds the viewport.
- **Single-write frame rendering.** Each frame is built into one `ExtendableStringBuilder` and emitted via `TuiWrite`. On Windows, `TuiWrite` routes through `WriteConsoleW` when stdout is a console, so a frame is one syscall instead of one per `printf`, which eliminates the visible per-character repaint.
- **All `consoletui` helpers go through `TuiWrite`.** `TuiCursorHome`, `TuiSetScrollRegion`, `TuiResetScrollRegion`, `TuiMoveCursor`, `TuiSaveCursor`, `TuiRestoreCursor`, `TuiEraseLine`, `TuiShowCursor`, and the alternate-screen enter/exit pair now bypass the CRT on Windows consoles, matching the picker. `TuiFlush` remains an unconditional `fflush(stdout)` so callers that mixed `printf` output earlier in a sequence still drain correctly.
- **Width detection fix.** `TuiConsoleColumns` now reports the visible window width rather than the screen-buffer width, so labels sized to it don't wrap on legacy cmd.exe configs where the buffer is wider than the window.
- **PgUp / PgDn.** Jump by one viewport, clamped to the list ends. `VK_PRIOR` / `VK_NEXT` on Windows; `ESC[5~` / `ESC[6~` on POSIX.
- **Terminal resize handling.** Enable `ENABLE_WINDOW_INPUT` on stdin (Windows) and install a `SIGWINCH` handler without `SA_RESTART` (POSIX) so the blocking key read returns a new `ConsoleKey::Resize`. The picker recomputes viewport/label budgets, clears the visible screen, and redraws as a fresh first frame; pre-picker output stays in scrollback.
- **Centralized label truncation.** The picker truncates item labels to fit the current terminal width (cols minus the 3-column indicator), walking back to a UTF-8 codepoint boundary so multi-byte sequences are never split. The hand-rolled width-aware truncation in `history_cmd::BuildLabel` and `ui_cmd` is removed; callers hand the picker the full label and let it clip.
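The "walk back to a UTF-8 codepoint boundary" step from the label-truncation bullet can be sketched in a few lines. `TruncateUtf8` is a hypothetical name, and the sketch budgets in bytes rather than terminal columns, which is a simplification of what the picker does.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Returns the longest prefix of Text that is at most MaxBytes long and does
// not end in the middle of a multi-byte UTF-8 sequence. Assumes Text is
// valid UTF-8.
inline std::string_view TruncateUtf8(std::string_view Text, std::size_t MaxBytes)
{
    if (Text.size() <= MaxBytes)
    {
        return Text;
    }
    std::size_t End = MaxBytes;
    // Continuation bytes have the bit pattern 10xxxxxx. If the first byte
    // being cut off is one, the cut is inside a sequence: step back until
    // the cut lands in front of a lead byte or an ASCII byte.
    while (End > 0 && (static_cast<unsigned char>(Text[End]) & 0xC0) == 0x80)
    {
        --End;
    }
    return Text.substr(0, End);
}
```

With the label `héllo` (where `é` is the two bytes `C3 A9`), a 2-byte budget yields `h` instead of `h` plus a dangling lead byte, and a 3-byte budget yields `hé`.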
* zen CLI: project-* commands → 'project <sub>' subcommands (#1026) | Stefan Boberg | 2026-05-04 | 5 | -191/+257

- Refactors the five `project-*` top-level commands into a `project <sub>` subcommand structure, mirroring the existing `cache <sub>` pattern. New surface: `project create | drop | info | op-details | stats`.
- Legacy `project-create`, `project-drop`, `project-info`, `project-op-details`, `project-stats` remain functional as hidden deprecated shims that forward through `project_legacy_shim::RunAs`, so existing scripts (e.g. `scripts/test_scripts/oplog-import-export-test.py`) keep working unchanged.
* Oplog commands -> oplog subcommands (#1025) | Stefan Boberg | 2026-05-04 | 4 | -632/+729

- Consolidates the seven `oplog-*` top-level commands into a single `zen oplog <sub>` command tree, mirroring the cache refactor and PR #1026's `project <sub>` work. New surface: `oplog create | export | import | snapshot | mirror | validate | download`.
- Legacy `oplog-create`, `oplog-export`, `oplog-import`, `oplog-snapshot`, `oplog-mirror`, `oplog-validate`, `oplog-download` remain functional as hidden deprecated aliases that forward through `oplog_legacy_shim::RunAs`, so existing scripts keep working.
* download spec part name fix (#1021) | Dan Engelbrecht | 2026-04-30 | 1 | -1/+1

* don't set default build part name if download spec is given
* Change builds ls command to default to all parts (#1019) | Zousar Shaker | 2026-04-29 | 3 | -27/+29

For backwards compatibility, `builds ls` retains its past behavior of listing all parts, while `builds download` and `builds prime-cache` are allowed to use the new standard of operating only on the "default" part.
* GetEnvVariable: return std::optional<std::string> (#1017) | Stefan Boberg | 2026-04-27 | 13 | -58/+83

- `GetEnvVariable` now returns `std::optional<std::string>` so callers can distinguish an unset variable from one set to an empty value.
- Windows path uses `SetLastError(ERROR_SUCCESS)` + `ERROR_ENVVAR_NOT_FOUND` to detect "not found"; POSIX path returns `nullopt` when `getenv` returns `nullptr`.
- All call sites migrated. Most use `.value_or("")` to preserve current empty-or-unset behavior. The diagnostic helpers in `zen-test/artifactprovider-tests.cpp` now report `<unset>` vs `<empty>` distinctly.
- Added a check in the `ExpandEnvironmentVariables` test confirming `nullopt` for an unset variable; PATH/HOME lookups in that test use `REQUIRE(has_value())` so a missing var fails cleanly instead of throwing `bad_optional_access`.
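The unset-versus-empty distinction described in this commit can be shown with a short sketch of the POSIX path. `GetEnvVariableSketch` and `DescribeEnv` are hypothetical stand-ins written for illustration; they are not the signatures used in the repository.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical stand-in for the POSIX path: nullopt means "not set",
// an empty string means "set to an empty value".
inline std::optional<std::string> GetEnvVariableSketch(const std::string& Name)
{
    const char* Value = std::getenv(Name.c_str());
    if (Value == nullptr)
    {
        return std::nullopt;
    }
    return std::string(Value);
}

// Diagnostic rendering in the spirit of the <unset> vs <empty> reporting
// mentioned in the commit message.
inline std::string DescribeEnv(const std::string& Name)
{
    const std::optional<std::string> Value = GetEnvVariableSketch(Name);
    if (!Value.has_value())
    {
        return "<unset>";
    }
    return Value->empty() ? std::string("<empty>") : *Value;
}
```

Callers that do not care about the difference can keep the old behavior with `GetEnvVariableSketch(Name).value_or("")`, which mirrors the migration pattern the commit describes.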
* Zs/user path case comparison (#1015) | Zousar Shaker | 2026-04-27 | 6 | -20/+45

- Improvement: `zen builds` `--exclude-folders` and `--exclude-extensions` values now match paths case-insensitively and tolerate surrounding whitespace between separators
* hydration with pack (#1016) | Dan Engelbrecht | 2026-04-27 | 12 | -516/+2063

- Feature: Hub hydration packs small files into raw CAS pack blobs to reduce request count for modules dominated by tiny metadata files
  - `--hub-hydration-enable-pack` (Lua: `hub.hydration.enablepack`, default true)
  - `--hub-hydration-pack-threshold-bytes` (Lua: `hub.hydration.packthresholdbytes`, default 256 KiB)
  - `--hub-hydration-max-pack-bytes` (Lua: `hub.hydration.maxpackbytes`, default 4 MiB)
- Feature: Hub hydration and dehydration can be disabled per direction
  - `--hub-enable-hydration` (Lua: `hub.enablehydration`, default true)
  - `--hub-enable-dehydration` (Lua: `hub.enabledehydration`, default true)
- Feature: Hub hydration accepts a configurable file exclude list via `HydrationOptions` `excludes` (array of wildcards). Built-in defaults skip transient runtime files (`.lock`, `.sentry-native/*`, `state_marker`, `*.bak`, `gc/reserve.gc`, `auth/*`) so they no longer participate in dehydrate scans. Override semantics: a present field replaces the default outright; explicit `[]` opts out of all defaults.
- Improvement: Hub hydration completion logs now report per-request average and max latency, peak in-flight workers, queue wait, and hash-cache hit percentage; loose and pack-blob transfers are reported separately
- Improvement: Hub hydration pre-creates unique parent directories before scheduling parallel writes
- Improvement: S3 hydration retries transient HTTP failures (timeouts, 429 throttling, 5xx server errors, connection errors) up to 3 times via the HTTP client retry layer
- Improvement: S3 hydration multipart chunk size is persisted in `state.cbo` per module so hydrate replays the partitioning used at dehydrate; default raised to 64 MiB (was 32 MiB)
- Improvement: Hub hydration `Obliterate` retries backend delete once before falling back to local cleanup
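The override semantics of the `excludes` field in this commit (absent field keeps the defaults, present field replaces them, explicit `[]` disables them) map naturally onto an optional list. The sketch below is illustrative only; `DefaultExcludes` and `ResolveExcludes` are hypothetical names, and how the real `HydrationOptions` represents an absent field is an assumption.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Built-in defaults as listed in the commit message.
inline const std::vector<std::string>& DefaultExcludes()
{
    static const std::vector<std::string> Defaults = {
        ".lock", ".sentry-native/*", "state_marker", "*.bak", "gc/reserve.gc", "auth/*"};
    return Defaults;
}

// nullopt      -> field absent, use the built-in defaults
// empty vector -> explicit [], opt out of all defaults
// otherwise    -> configured list replaces the defaults outright (no merge)
inline std::vector<std::string> ResolveExcludes(const std::optional<std::vector<std::string>>& Configured)
{
    if (Configured.has_value())
    {
        return *Configured;
    }
    return DefaultExcludes();
}
```

The practical consequence of "replaces outright" is that a configuration supplying only `*.tmp` no longer skips `auth/*` or `.lock`; anyone extending the list has to repeat the defaults they still want.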
* fix crash when scavenging sequences or copying local chunks (#1013) | Dan Engelbrecht | 2026-04-24 | 3 | -2/+459

* fix crash when scavenging sequences or copying local chunks
* trace: declare Region event name fields as AnsiString (#1012) | Stefan Boberg | 2026-04-23 | 1 | -6/+4

RegionName and Category on Misc.RegionBeginWithId were declared as uint8[], a byte array with no Field_String class flag. UE Insights' FEventData::GetString() explicitly requires Field_String and returns false otherwise, so Insights analyzers that check(GetString(...)) fire when reading zen traces.

Upstream UE declares these fields as WideString; zen's source strings are std::string_view, so AnsiString is the natural fit and the wire bytes are unchanged (same Field_8 aux stream; only the schema class bit differs). Insights' FString GetString variant accepts either ANSI or WIDE, so analyzers work without change.

Zen's own tourist-based analyzer in src/zen/trace/trace_model.cpp reads raw aux bytes via Array<uint8[]> regardless of the schema tag, and its DecodeRegionName already handles both 1-byte and 2-byte widths, so it's unaffected.
* hub execution stats (#1011) | Dan Engelbrecht | 2026-04-22 | 4 | -264/+488

- Improvement: Hub hydration and dehydration completion logs now include per-phase wall time, bytes transferred, bits/s throughput, number of unique worker threads used, and the storage source/target URI
- Improvement: Hub storage server instance lifecycle logs now report elapsed time for spawn and shutdown
- Improvement: Hub deprovisioning now logs GC completion status and elapsed time; a GC that does not complete within the 5s deadline is logged as a warning and shutdown proceeds anyway
* fix consul test timeout (#1010) | Dan Engelbrecht | 2026-04-22 | 3 | -11/+24

- Improvement: Hub Consul client HTTP timeout defaults raised to 1s connect / 2s total so transient latency to a slow Consul agent no longer fails registration calls
* BlockStore: fix correctness issues in block storage layer (#996) | Stefan Boberg | 2026-04-22 | 1 | -32/+47

1. **Assert invariant in `RemoveActiveWriteBlock`**: `erase(std::find(...))` was UB if the invariant ever broke. Now asserts the iterator before erasing.
2. **Single atomic delta in `SetMetaData`**: was `+= new; -= old` as two atomic ops, briefly inflating `TotalSize()` for concurrent readers. Collapsed into one `fetch_add`.
3. **Consistent `IncludeBlocks` / `IncludeBlock`**: `IncludeBlocks` asserted on duplicate keys while `IncludeBlock` silently skipped. Made both tolerant; also made the `reserve` call additive so a second call can't shrink the capacity request.
4. **Replace `operator[]` reads with `find` on `m_ChunkBlocks`**: `tsl::robin_map::operator[]` default-inserts; several read-intent lookups could produce ghost null entries if invariants broke (especially on compaction rollback paths).
5. **Bound `GetChunk` against actual file size**: `m_IoBuffer.GetSize()` is the mapped capacity (block size, e.g. 256 MiB), not written bytes. Requests inside the mapped region but past the real EOF returned views over zero-filled memory. Now bounds against `FileSize()`.
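Item 4 of the list above (read-intent lookups that insert "ghost" entries) is easy to reproduce with any map whose `operator[]` default-inserts. The sketch below uses `std::unordered_map` as a stand-in for `tsl::robin_map`; `BlockFile`, `BlockMap`, and `FindBlock` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>

// Hypothetical block record; the real type is not shown in the commit.
struct BlockFile
{
    std::uint64_t Size = 0;
};

using BlockMap = std::unordered_map<std::uint32_t, std::shared_ptr<BlockFile>>;

// Read-only lookup: never mutates the map, returns nullptr when missing.
// Taking the map by const reference also makes operator[] unavailable,
// so an accidental default-insert cannot compile.
inline std::shared_ptr<BlockFile> FindBlock(const BlockMap& Blocks, std::uint32_t BlockIndex)
{
    const auto It = Blocks.find(BlockIndex);
    if (It == Blocks.end())
    {
        return nullptr;
    }
    return It->second;
}
```

By contrast, evaluating `Blocks[42]` on a non-const map for a missing key leaves behind an entry holding a null pointer, which is the ghost entry the commit message describes.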
* Zen-style trace log events (#1006) | Stefan Boberg | 2026-04-22 | 8 | -135/+947

Replaces the old (not fully implemented) UE `Logging.*` sink with a typed `ZenLog.*` trace path that preserves structured fmt args end-to-end, so the zen trace analyzer (and future consumers) can re-render log messages with full formatter support.

- Hook `Logger::Log` to tap `fmt::format_args` before `vformat` renders them, and emit three new events on a dedicated `ZenLogChannel`: `Category`, `MessageSpec`, `Message`. Args are serialized as `[count][descriptors][payload]` with distinct categories for bool, int, float, and string. Custom formatters fall back to a pre-rendered string.
- Bool has its own wire category so `{}` renders as `true`/`false` and `{:d}` as `1`/`0`.
- Zen `LogLevel` is translated to UE `ELogVerbosity` on emit so severity filtering works consistently.
- Extend the zen trace analyzer to decode `ZenLog.*` via `fmt::vformat` + `dynamic_format_arg_store`, so nested widths, chrono specs, etc. all work. Strings are passed as views directly from the event payload (which outlives the format call) rather than copied through a pool.
- Retire the old `TraceSink` stub; the typed path supersedes it.
- Switch `--trace=default` alias from `cpu,log` to `cpu,zenlog`.
- Add `__int128` overloads to the arg encoder guarded by `FMT_USE_INT128` so fmt's int128 dispatch resolves unambiguously on clang/gcc. MSVC and clang-cl are unaffected.
* dashboard prominent version (#1009) | Dan Engelbrecht | 2026-04-22 | 2 | -2/+31

- Improvement: Dashboard banner displays the zenserver version next to the wordmark
* hub provision acceptable states (#1008) | Dan Engelbrecht | 2026-04-22 | 1 | -0/+2

- Bugfix: Hub provision requests now return 202 Accepted when the module is `Recovering` or `Waking` instead of rejecting
* improve zen folder consistency and creation/cleanup (#1007) | Dan Engelbrecht | 2026-04-22 | 7 | -96/+181

- Improvement: `zen builds` zen-folder handling is now consistent per subcommand
  - `list-namespaces`, `list`, `list-blocks`, `ls`: no local scratch folder is created; responses stay in memory
  - `upload`, `fetch-blob`, `prime-cache`, `validate-part`: default to `<cwd>/.zen` (no change)
  - `download`: default to `<local-path>/.zen` (no change)
- Bugfix: `zen builds ls` no longer fails against cloud build storage (`--host`/`--url`) when `--storage-path` is not supplied
* zen CLI: suggest similar commands on typos (#1000) | Stefan Boberg | 2026-04-22 | 5 | -1/+346

Surface "did you mean?" suggestions when the `zen` CLI is invoked with an unknown command or subcommand, so users don't have to dig through `zen --help` every time they mistype.

```
$ zen stauts
Unknown command specified: 'stauts'
The most similar commands are:
status
Run 'zen --help' for the full list of commands.
```

```
$ zen cache statz
Unknown subcommand: 'statz'
The most similar subcommands are:
stats
```

## Algorithm

- Damerau-Levenshtein edit distance with case-insensitive ASCII comparison, which handles insertions, deletions, substitutions, and adjacent transpositions (e.g. `versoin` → `version`).
- Small prefix-match bonus so short inputs like `ca` still surface longer commands like `cache` without having to relax the distance threshold to the point where it admits noise.
- Distance threshold scales with input length (`clamp(len/2, 1, 3)`). Very short inputs rely on the prefix bonus; longer inputs tolerate up to three edits.
- Top 5 results by distance, stable-sorted.
- Hidden commands (deprecated shims like `cache-stats`) are excluded from the candidate set so we don't advertise them.
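The distance and threshold described under "Algorithm" can be sketched as below. This is a minimal illustration that uses the optimal-string-alignment variant of Damerau-Levenshtein (each adjacent transposition counts as one edit); whether the CLI uses this restricted variant or the unrestricted one is not stated in the commit. `EditDistance` and `MaxDistanceFor` are hypothetical names, and the prefix-match bonus is left out because its size is not given.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// ASCII-only lowercase, matching the "case-insensitive ASCII" comparison.
inline char LowerAscii(char C)
{
    return (C >= 'A' && C <= 'Z') ? static_cast<char>(C - 'A' + 'a') : C;
}

// Optimal-string-alignment distance: insertions, deletions, substitutions,
// and swaps of two adjacent characters each cost 1.
inline std::size_t EditDistance(std::string_view A, std::string_view B)
{
    const std::size_t N = A.size();
    const std::size_t M = B.size();
    std::vector<std::vector<std::size_t>> D(N + 1, std::vector<std::size_t>(M + 1, 0));
    for (std::size_t I = 0; I <= N; ++I) D[I][0] = I;
    for (std::size_t J = 0; J <= M; ++J) D[0][J] = J;

    for (std::size_t I = 1; I <= N; ++I)
    {
        for (std::size_t J = 1; J <= M; ++J)
        {
            const char        Ca   = LowerAscii(A[I - 1]);
            const char        Cb   = LowerAscii(B[J - 1]);
            const std::size_t Cost = (Ca == Cb) ? 0 : 1;
            D[I][J] = std::min({D[I - 1][J] + 1, D[I][J - 1] + 1, D[I - 1][J - 1] + Cost});
            // Adjacent transposition, e.g. "oi" <-> "io".
            if (I > 1 && J > 1 && Ca == LowerAscii(B[J - 2]) && LowerAscii(A[I - 2]) == Cb)
            {
                D[I][J] = std::min(D[I][J], D[I - 2][J - 2] + 1);
            }
        }
    }
    return D[N][M];
}

// Threshold from the commit message: clamp(len/2, 1, 3).
inline std::size_t MaxDistanceFor(std::size_t InputLength)
{
    return std::clamp<std::size_t>(InputLength / 2, 1, 3);
}
```

Under this sketch `stauts` is one edit from `status`, well inside the threshold of 3 for a six-character input. `ca` is three edits from `cache` but only gets a threshold of 1, which is the gap the prefix bonus is described as covering.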
* fix NamedEvent ftok() race on Linux/macOS (#1001) | Stefan Boberg | 2026-04-22 | 1 | -4/+8

- `ftok()` internally re-`stat()`s the path and fails with `ENOENT` if another owner's destructor unlinks the backing file between our `open()` and `ftok()`; the held fd does not protect against this
- derive the IPC key via `fstat()` on the fd instead, using the same `(ino & 0xffff) | ((dev & 0xff) << 16) | (proj << 24)` formula that glibc and macOS `ftok()` compute internally
- fixes intermittent "Failed to create an SysV IPC key" failures on macOS, where the slower on-disk /tmp widens the race window
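The key derivation quoted in this commit can be sketched for POSIX systems as follows. `ComposeIpcKey` and `KeyFromOpenFd` are hypothetical names; masking the project id to its low 8 bits follows the documented behavior of `ftok()` and is slightly more explicit than the formula as written above.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/stat.h>
#include <sys/types.h>

// Pure form of the formula: low 16 bits of the inode, low 8 bits of the
// device id, low 8 bits of the project id.
inline std::uint32_t ComposeIpcKey(std::uint64_t Inode, std::uint64_t Device, int ProjectId)
{
    return static_cast<std::uint32_t>(Inode & 0xffff) |
           (static_cast<std::uint32_t>(Device & 0xff) << 16) |
           (static_cast<std::uint32_t>(ProjectId & 0xff) << 24);
}

// Derives the key from an already-open descriptor. Unlike ftok(path, ...),
// fstat() does not look the path up again, so a concurrent unlink of the
// backing file cannot make it fail with ENOENT.
inline bool KeyFromOpenFd(int Fd, int ProjectId, key_t& OutKey)
{
    struct stat St = {};
    if (fstat(Fd, &St) != 0)
    {
        return false;
    }
    OutKey = static_cast<key_t>(ComposeIpcKey(St.st_ino, St.st_dev, ProjectId));
    return true;
}
```

For inode `0x12345`, device `0x1ab`, and project id `0x42`, the composed key is `0x42ab2345`.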
* chunk-size -> chunksize for lua config (#1004) | Dan Engelbrecht | 2026-04-21 | 1 | -1/+1
* Fix Windows service shutdown signalling (#999)Stefan Boberg2026-04-217-35/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stopping the zenserver Windows service (via `sc stop`, `zen service stop`, system shutdown, or any other SCM path) was being ignored. SCM would eventually force-kill the process after its timeout, giving an ungraceful shutdown. ## Root cause PR #751 ("add simple http client tests", c37421a3b) restructured each HTTP server's `OnRun` loop from ```cpp do { m_ShutdownEvent.Wait(WaitTimeout); } while (!IsApplicationExitRequested()); ``` to ```cpp do { ShutdownRequested = m_ShutdownEvent.Wait(WaitTimeout); } while (!ShutdownRequested); ``` That was well-intentioned — tests wanted to start/stop an HTTP server without touching global process state — but the old loop was the only thing that turned `RequestApplicationExit()` into an actual server wake-up. Once it was removed, `RequestApplicationExit(0)` was silently downgraded to "just sets a flag". The `WindowsService::SvcCtrlHandler` stop path was calling exactly that, so SCM stops stopped working. The sponsor-process check path kept working only because it *also* calls `m_Http->RequestExit()` via `ZenServerBase::RequestExit()`. ## Fix - Restore `IsApplicationExitRequested()` as a secondary exit condition in each HTTP server's `OnRun` loop (`httpsys`, `httpasio`, `httpmulti`, `httpnull`, `httpplugin`) alongside the per-server `m_ShutdownEvent` that #751 introduced. Preserves #751's goal — tests can still call `server->RequestExit()` without touching global state — while making `RequestApplicationExit()` wake the server up again, which the rest of the codebase and `SvcCtrlHandler` assume. - Clean up the service control handler in the same pass: also accept `SERVICE_CONTROL_SHUTDOWN`, report `STOP_PENDING` with a 30s `dwWaitHint` (was 0), drop the redundant second `ReportSvcStatus` call, and remove `ghSvcStopEvent` which nothing ever `Wait()`-ed on. - Advertise `SERVICE_ACCEPT_STOP | SERVICE_ACCEPT_SHUTDOWN` while running; drop controls while stop-pending/stopped. 
- Make `WindowsService` destructor virtual (latent UB given `Run()` was already virtual).
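
The restored loop shape can be sketched in isolation. This is a minimal illustration, not the zenserver code: `Event`, `g_ExitRequested`, and `RunUntilExit` are hypothetical stand-ins, and only the loop condition mirrors the fix described above.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the process-wide exit flag.
static std::atomic<bool> g_ExitRequested{false};

void RequestApplicationExit(int /*ExitCode*/) { g_ExitRequested.store(true); }
bool IsApplicationExitRequested() { return g_ExitRequested.load(); }

// Hypothetical manual-reset event: Wait() returns true once Set() was called,
// false if the timeout elapsed first.
class Event
{
public:
    void Set()
    {
        {
            std::lock_guard<std::mutex> Lock(m_Mutex);
            m_IsSet = true;
        }
        m_Cv.notify_all();
    }

    bool Wait(std::chrono::milliseconds Timeout)
    {
        std::unique_lock<std::mutex> Lock(m_Mutex);
        return m_Cv.wait_for(Lock, Timeout, [this] { return m_IsSet; });
    }

private:
    std::mutex              m_Mutex;
    std::condition_variable m_Cv;
    bool                    m_IsSet = false;
};

// Runs the wait loop with both exit conditions and returns how many
// waits were performed before the loop ended.
int RunUntilExit(Event& ShutdownEvent, std::chrono::milliseconds WaitTimeout)
{
    assert(WaitTimeout.count() > 0);
    int  Iterations        = 0;
    bool ShutdownRequested = false;
    do
    {
        ShutdownRequested = ShutdownEvent.Wait(WaitTimeout);
        ++Iterations;
    } while (!ShutdownRequested && !IsApplicationExitRequested());
    return Iterations;
}
```

In this sketch the global flag is only observed when a wait returns, so a global exit request is noticed within one `WaitTimeout`, while the per-server event still wakes the loop immediately.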
* filesystem.h surface error codes (#998) [Dan Engelbrecht, 2026-04-21, 11 files, -163/+181]
- Improvement: File copy, scan, clone, and move operations now report the underlying OS error in failure messages
* improved s3 hydration (#997) [Dan Engelbrecht, 2026-04-21, 11 files, -1184/+1287]
- Improvement: Hub shares a single S3 client and IMDS credential provider across all modules, reducing IMDS load and surviving transient IMDS blips during bulk provisioning
- Improvement: Hub validates hydration config at startup; bad `--hub-hydration-target-spec` or `--hub-hydration-target-config` now fails `zen hub` at boot instead of per-module at first hydrate
- Improvement: S3 hydration multipart chunk size configurable via `settings.chunk-size` (default 32 MiB)
- Improvement: S3 client extracts `<Error><Code>` and `<Message>` from XML error bodies (previously logged as `<unhandled content format>`)
- Improvement: S3 client fails fast with a "no credentials available" error when AWS credentials are missing, instead of sending an unsigned request that S3 rejects with a generic 400
- Improvement: IMDS credential provider retries transient connection failures (up to 3 attempts with backoff)
- Improvement: HTTP clients with `RetryCount > 0` also retry on `CURLE_COULDNT_CONNECT`
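
As a rough illustration of the "up to 3 attempts with backoff" retry mentioned above, the sketch below retries only transient failures. `FetchStatus`, `FetchWithRetry`, the doubling backoff, and the 100 ms starting delay are all assumptions made for the example; the commit does not state the actual delays or types.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical result classification for a single attempt.
enum class FetchStatus
{
    Ok,
    TransientFailure,  // e.g. connection refused; worth retrying
    PermanentFailure   // retrying will not help
};

// Calls Attempt up to MaxAttempts times, sleeping between transient failures
// and doubling the delay each time. Returns the status of the last attempt.
FetchStatus FetchWithRetry(const std::function<FetchStatus()>& Attempt,
                           int                                 MaxAttempts    = 3,
                           std::chrono::milliseconds           InitialBackoff = std::chrono::milliseconds(100))
{
    assert(MaxAttempts >= 1);
    std::chrono::milliseconds Backoff = InitialBackoff;
    FetchStatus               Status  = FetchStatus::TransientFailure;
    for (int AttemptIndex = 1; AttemptIndex <= MaxAttempts; ++AttemptIndex)
    {
        Status = Attempt();
        if (Status != FetchStatus::TransientFailure)
        {
            return Status;  // success or a failure that should not be retried
        }
        if (AttemptIndex < MaxAttempts)
        {
            std::this_thread::sleep_for(Backoff);
            Backoff *= 2;
        }
    }
    return Status;
}
```

Permanent failures return immediately, so only errors classified as transient pay the backoff cost.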
* zen CLI security review fixes (#974) [Stefan Boberg, 2026-04-21, 13 files, -104/+789]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Security review follow-ups to the `zen` CLI. Each fix stands on its own commit. Grouped by category below. ## Credentials and secrets - **Per-install random auth encryption key instead of a hardcoded literal.** The default AES key and IV used to encrypt persisted OIDC refresh tokens / OAuth client secrets were ASCII literals compiled into the public source. Replaced with 32+16 random bytes persisted to `<system-root>/auth/machinekey.dat`. `SecureRandomBytes` added in zencore/crypto wrapping BCryptGenRandom / OpenSSL / mbedTLS CTR_DRBG. Partial override (only one of `--encryption-aes-key`/`--encryption-aes-iv`) is now rejected instead of silently using the hardcoded half. - **Wrap the machine key with OS-protected storage.** `machinekey.dat` is now a tagged format (4-byte magic + flags + wrapped-or-raw payload). Windows wraps via DPAPI (`CryptProtectData` at per-user scope) so a stolen disk copy cannot decrypt without the OS master key. macOS uses Keychain Services (GenericPassword under `org.unrealengine.zen.auth`, `kSecAttrAccessibleAfterFirstUnlockThisDeviceOnly`). Linux uses libsecret (opt-in via `--zenlibsecret=yes`, off by default because headless servers typically have no Secret Service daemon). All platforms fall back to raw persistence with `0600` perms on POSIX when wrapping is unavailable. Legacy files from the prior commit are detected by size and still read. > Note: argv-redaction before Sentry on crash was previously part of this PR but was superseded by `ScrubSensitiveValues()` from #989; this PR now just calls that helper instead of walking argv itself. 
## Path traversal - **Reject unsafe filenames from the remote oplog in `oplog-mirror`.** The filename from each oplog entry was joined to the mirror root without normalisation; a compromised remote could use drive letters, UNC shares, device path prefixes, absolute paths, or `..` components to write anywhere the zen user could write. An `UnsafeFileNameReason` check runs immediately after extraction, logs the offending filename, and aborts the mirror. - **Use the resolved absolute download-spec path in `builds download`.** `--download-spec-path` was computed into a sanitised absolute path, then the original unresolved path was passed to `ParseBuildManifest`, bypassing the `MakeSafeAbsolutePath` mitigations and reading from the process cwd rather than `--local-path`. ## Input validation - **Stop asserting on malformed `--build-id` / `--build-part-id`.** `Oid::FromHexString` asserts on bad input and `ZEN_ASSERT` is active in release, so a too-short or non-hex user value aborted the process instead of surfacing an `OptionParseException`. Routed all callers through `TryFromHexString`. Also fixes `ParseBuildPartId` reporting errors under the wrong option name. - **Check the JSON parse error in `oplog-export --builds-metadata-path`.** The single-arg `LoadCompactBinaryFromJson` overload discarded the parser error; malformed JSON shipped a truncated compact-binary `metadata` field to the server with no indication. Switched to the two-arg overload and throws a descriptive error naming the file and reason. - **Format the actual value in the malformed `--url` error.** The message was constructed with a literal `{}` placeholder and no `fmt::format` call, so users saw the placeholder instead of the offending URL. - **Require `--output-path` in `cache get` unless `--as-text` is set.** Previously an empty path auto-filled from the value key / attachment hash and wrote into the process cwd; the `--as-text && empty path` stdout branch was unreachable because the auto-fill ran first. 
- **Clear the cxxopts `allow_unrecognised_options` flag after permissive parse.** `ParseOptionsPermissive` set the flag on the Options it received and never cleared it, priming that Options for silent typo acceptance on any later reuse. Added `disallow_unrecognised_options()` to the vendored cxxopts (local patch — flagged at the declaration) and wrapped the toggle in RAII. ## Resource lifecycle - **Restore signal handlers via RAII.** `wipe`, `builds`, and `oplog-mirror` installed SIGINT/SIGBREAK handlers with raw `signal()` and never restored them; an option-parse throw left the handler targeting an abort flag nothing reads. Added `zen::ScopedSignalHandler` in zen.h and applied at all three sites (builds uses `std::optional` members so the guards survive past `OnParentOptionsParsed` into the subcommand's Run). - **Route SIGINT in `oplog-mirror` to the worker-pool abort flag.** The command declared a local `std::atomic<bool> AbortFlag` but no handler targeted it — Ctrl-C killed the process instead of cleanly aborting. Added a `MirrorAbortFlag` / `MirrorSignalCallbackHandler` pair in projectstore_impl and bound the local as a reference; existing `.store`/`.load`/capture sites unchanged. - **Clean up the `cache get` temp download on every exit path.** `Http.Download` parks the payload in the system temp dir; a failed `MoveToFile` (cross-volume, denied target) or an exception could leave the temp file behind. The downloaded buffer is already flagged delete-on-close by `HttpClient`, so the fix is just to clear that flag after a successful `MoveToFile` so the renamed-out file isn't reaped. ## Other - **Fix wrong URL fields in `oplog-export` / `oplog-import` builds-branch descriptions.** Two operator-facing "[builds] URL/namespace/bucket/buildsid" messages formatted `m_CloudUrl` instead of `m_BuildsUrl` / `m_BuildsHost` (copy-paste from neighbouring `[cloud]` branches), shown as empty or stale at the start of an export/import. 
- **Fix "Can't find oplog in project '{}'" formatting and a "Failed top mirror" typo in projectstore_cmd.** - **Fix a misleading `oplog-export` comment on the `--zen` scheme default** ("Assume https" vs. the `http://` the code writes). - **Fail `ScrambleDir` when `RemoveFile` doesn't delete.** The `zen builds test` scramble phase used `(void)RemoveFile(FilePath)`, discarding both the bool return and the error. A quiet delete failure let verification run against stale state; switched to the two-arg overload and throw on false return or non-empty `error_code`.
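
The RAII signal-handler restore listed under "Resource lifecycle" can be sketched as follows. This is an illustrative guess at the shape of such a guard, not the `zen::ScopedSignalHandler` from `zen.h`; the constructor signature, `OnInterrupt`, and `g_AbortRequested` are assumptions.

```cpp
#include <cassert>
#include <csignal>

// Hypothetical abort flag written by the handler below.
static volatile std::sig_atomic_t g_AbortRequested = 0;

// Example handler: only sets a flag, which is safe inside a signal handler.
extern "C" void OnInterrupt(int /*Signal*/)
{
    g_AbortRequested = 1;
}

// Installs a handler on construction and restores the previous one on
// destruction, including when the scope is left by an exception.
class ScopedSignalHandler
{
public:
    using Handler = void (*)(int);

    ScopedSignalHandler(int Signal, Handler NewHandler)
    : m_Signal(Signal)
    , m_Previous(std::signal(Signal, NewHandler))
    {
    }

    ~ScopedSignalHandler()
    {
        if (m_Previous != SIG_ERR)
        {
            std::signal(m_Signal, m_Previous);
        }
    }

    ScopedSignalHandler(const ScopedSignalHandler&) = delete;
    ScopedSignalHandler& operator=(const ScopedSignalHandler&) = delete;

private:
    int     m_Signal;
    Handler m_Previous;
};
```

A command would hold the guard for as long as its abort flag is alive, so a throw during option parsing cannot leave a handler pointing at a flag nothing reads.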
* async consul register/deregister (#992) [Dan Engelbrecht, 2026-04-21, 4 files, -67/+227]
- Improvement: Hub Consul service registration and deregistration are now dispatched on a dedicated background thread so instance state transitions no longer stall when the Consul agent is slow or unreachable
* builds download "default" part if nothing is specified (#994) [Dan Engelbrecht, 2026-04-21, 3 files, -24/+89]
* structured cache: fix some minor issues (#995) [Stefan Boberg, 2026-04-21, 3 files, -23/+29]
| | | | | | | | | | | | | | - Fix double WriteResponse in PUT record failure path; the detail-body branch now short-circuits instead of falling through to a second WriteResponse call - Return 405 Method Not Allowed for unsupported verbs in the root, namespace, bucket, record, and chunk handlers (previously fell through to no response) - Clamp exec$/replay-recording thread_count so a bogus query value cannot spawn an unbounded worker pool ## Performance / cleanup - NamespaceMap now uses TransparentStringHash + std::equal_to<>, so Get/Put/Find/Drop can probe the map with a std::string_view directly instead of constructing a temporary std::string on every request - Replace insert_or_assign with try_emplace under the exclusive lock in GetNamespace; the find() re-check already guarantees the key is absent, so try_emplace matches intent better ## Reverted - The earlier change to erase the pinned entry from m_DroppedNamespaces after DropNamespace's post-drop work was reverted: other threads may still hold pointers into a dropped namespace, so tearing it down eagerly is unsafe. Dropped namespaces remain pinned for the lifetime of the process as before.
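
The `thread_count` clamp can be illustrated with a small parsing helper. This is a sketch under assumptions: `ClampThreadCount`, its fallback to a default on unparsable input, and the bounds used are hypothetical and not taken from the commit.

```cpp
#include <algorithm>
#include <cassert>
#include <charconv>
#include <string_view>
#include <system_error>

// Parses a thread_count query value and clamps it to [1, MaxCount].
// Unparsable or out-of-range text falls back to DefaultCount (an assumption
// for this sketch), which is then clamped the same way.
int ClampThreadCount(std::string_view QueryValue, int DefaultCount, int MaxCount)
{
    assert(MaxCount >= 1);
    int         Parsed = 0;
    const char* Begin  = QueryValue.data();
    const char* End    = Begin + QueryValue.size();
    const auto  Result = std::from_chars(Begin, End, Parsed);
    if (Result.ec != std::errc{} || Result.ptr != End)
    {
        Parsed = DefaultCount;
    }
    return std::clamp(Parsed, 1, MaxCount);
}
```

With a cap taken from something like `std::thread::hardware_concurrency()`, a request such as `thread_count=100000` would then yield at most that many workers.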
* Zen CLI common server interface (#920) [Stefan Boberg, 2026-04-20, 21 files, -552/+599]
| | | | | | | | | | | | | | | | | | Introduces a common `ZenServiceClient` RAII wrapper for zen CLI commands that interact with a zenserver instance. CLI operations (admin, builds, cache, exec, hub, info, projectstore, trace, ui, version, vfs, workspaces) automatically register sessions so they become visible in the server's session list, and forward log output to the server's session log endpoint. All session HTTP I/O (announce, remove, log batches) runs on a single background worker thread, so CLI startup and shutdown never block on server availability. ### Key changes - **`ZenServiceClient`** — new RAII class that wraps host resolution, HTTP client creation, and session lifecycle (register on connect, remove on exit). Replaces ad-hoc boilerplate across all command files that talk to a server, including the new `trace` subcommands (`start`, `stop`, `status`). - **Async session I/O** — `SessionsServiceClient` now owns a single worker thread and command queue. `Announce()`, `Remove()`, and `UpdateMetadata()` enqueue commands and return immediately. The worker creates one `HttpClient` with a 5-second total timeout, bounding any individual request. Eliminates main-thread stalls when the server is unreachable. - **Session log forwarding** — `SessionLogSink` is a thin enqueuer that posts log messages to the same worker queue (no separate thread or HTTP client). Log levels are serialized as integers; the server-side ingest handles both string and integer formats for backwards compatibility, with bounds checking on integer values. - **Build & projectstore session registration** — Long-running `builds` and projectstore cache (oplog-download) connections register sessions too, making them visible alongside regular CLI command sessions. ### Cleanup - Extract `SetupCacheSession` helper on `StorageInstance` to reduce duplication. - Remove unused `HttpClient` reference in ui command.
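
The "single worker thread and command queue" pattern described under Async session I/O can be sketched generically. `CommandWorker` is a hypothetical name and the commands here are plain callables; the real `SessionsServiceClient` queue, its command types, and its HTTP handling are not shown in the commit message.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Runs enqueued commands in order on one background thread. Enqueue() only
// takes a short lock and returns, so callers never wait on the command itself.
class CommandWorker
{
public:
    CommandWorker() : m_Thread([this] { Run(); }) {}

    // Drains any remaining commands, then joins the worker thread.
    ~CommandWorker()
    {
        {
            std::lock_guard<std::mutex> Lock(m_Mutex);
            m_Stop = true;
        }
        m_Cv.notify_one();
        m_Thread.join();
    }

    void Enqueue(std::function<void()> Command)
    {
        assert(Command);
        {
            std::lock_guard<std::mutex> Lock(m_Mutex);
            m_Queue.push_back(std::move(Command));
        }
        m_Cv.notify_one();
    }

private:
    void Run()
    {
        for (;;)
        {
            std::function<void()> Command;
            {
                std::unique_lock<std::mutex> Lock(m_Mutex);
                m_Cv.wait(Lock, [this] { return m_Stop || !m_Queue.empty(); });
                if (m_Queue.empty())
                {
                    return;  // stop requested and nothing left to run
                }
                Command = std::move(m_Queue.front());
                m_Queue.pop_front();
            }
            Command();  // executed outside the lock
        }
    }

    std::mutex                        m_Mutex;
    std::condition_variable           m_Cv;
    std::deque<std::function<void()>> m_Queue;
    bool                              m_Stop = false;
    std::thread                       m_Thread;  // declared last: starts after the other members exist
};
```

Because every command runs on the same thread, a single client object with a bounded timeout could be created inside the worker and reused, which matches the one-`HttpClient` design the message describes.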
* Rename logging::ToStringView to ToString for consistency (#993) [Stefan Boberg, 2026-04-20, 6 files, -11/+11]
| | | | | | | - Renames `logging::ToStringView` → `ToString` and `ShortToStringView` → `ShortToString` for consistency with the rest of the codebase, where `ToString` is the convention for enum-to-string conversions (return type already communicates it's a view). - Updates all call sites in logbase, logging helpers, session log sink, admin service, and tcplogstreamsink. Split off from the `sb/zen-monitor` branch so the ZenServiceClient refactor PR stays focused.
* hide secrets from log and sentry (#989) [Dan Engelbrecht, 2026-04-20, 6 files, -11/+779]
- scrub sensitive command line options from log and sentry
* zen trace analysis support (#945) [Stefan Boberg, 2026-04-20, 46 files, -191/+16884]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Integrates the **tourist** trace analysis library and builds a full `zen trace` command suite for working with Unreal Engine `.utrace` files. ### Trace analysis library (`thirdparty/tourist/`) - Adds the tourist library as a third-party dependency with three modules: **foundation** (platform primitives, memory, scheduling), **trace** (UE Trace protocol decoding), and **analysis** (event dispatching and analyzer framework). - Cross-platform support for Windows, Linux, and macOS. ### `zen trace` CLI commands (`src/zen/cmds/`, `src/zen/trace/`) - **`zen trace analyze`** — Summarize a `.utrace` file: session metadata, thread inventory, command line + build configuration, CPU profiling scopes, timing, event rates, log messages, and (with symbols) memory allocation metrics including live-allocs dumps, callstack-keyed aggregation, and allocation churn. Optional HTML output for memory reports. - **`zen trace inspect`** — Dump the event schema (declared types, fields, sizes) from a trace file. - **`zen trace trim`** — Extract a time-window from a trace into a new `.utrace` file. - **`zen trace serve`** — Launch a local HTTP server hosting an interactive trace viewer; opens in the default browser. ### Symbolication (`src/zen/trace/symbol_resolver.*`, `thirdparty/raw_pdb/`) - Pluggable resolver with multiple backends: `pdb` (in-tree raw_pdb), `dbghelp` (Windows), `llvm-symbolizer` (all platforms), `atos` (macOS). An `auto` backend picks the best available tool per platform. - Microsoft Symbol Server support: downloads PDBs on demand using a redirect-aware HTTP client. - Local PDB cache keyed by image GUID preserves symbols across binary recompilation. - Callstack trimming heuristic strips UE internal noise from reports. - Binary analysis cache (`.ucache_z`) avoids re-resolving the same trace. 
### Interactive trace viewer (`src/zen/frontend/html/`, `src/zen/trace/trace_viewer_service.*`) - Timeline: scope-level detail, horizontal zoom/pan, vertical scrolling, viewport-driven loading with pre-computed LOD for responsive navigation of large traces. - Thread grouping (collapsible sidebar sections) synthesized from name suffixes, natural sort order, visual distinction between lane threads and OS threads. - Bookmark and region annotations; region categories with per-category toggles; bookmark marker toggle in the toolbar. - Filterable Logs tab showing captured `UE_LOG` output. - Stats tab with per-scope aggregate statistics. - Memory tab with interactive allocation analysis and an allocation size histogram. - CsvProfiler event parsing and chart UI. ### Other in-branch supporting changes - **Cross-platform browser launcher** (`browser_launcher.{h,cpp}`) used by `trace serve`. - **`ReciprocalU64`** fast 64-bit integer division (zencore/intmath) for trace analyzers. - **`parallelsort`** cross-platform parallel sort helper (zenutil). - Frontend zip build rule so the viewer's HTML assets are bundled into `zen.exe`. - `/Zo` flag for better optimized debug info on Windows release builds. - `trace-tests.cpp` in the `zen-test` harness (harness itself landed on main via #985).
* Add CompactString utility type (#990) [Stefan Boberg, 2026-04-20, 2 files, -4/+150]
| | | | | - Introduce `CompactString`: a move-only, heap-allocated, immutable string wrapper that stores its length in a prefix byte for cheap `Size()`/`ToView()` while keeping the object to a single pointer. - Swap the `ToString()` integer-formatting helpers in `zencore/string.cpp` to `std::to_chars`, which is ~5-10x faster and benefits every `IntNum` / `StringBuilder` / `CbJsonWriter` caller. - No in-tree users on `main` yet; the type is ready for callers that want owned-string storage with lower per-entry overhead than `std::string` (e.g. long-lived log buffers, session records).
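
For the `std::to_chars` switch, a minimal integer-formatting helper might look like the sketch below. `FormatUint64` and the caller-supplied 20-byte buffer are assumptions for illustration; the actual `ToString()` helpers in `zencore/string.cpp` are not shown in the commit message.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <system_error>

// Formats Value as decimal text into Buffer and returns a view of the digits.
// 20 characters is enough for the largest uint64_t (18446744073709551615).
// The result is not null-terminated; the view carries the length.
std::string_view FormatUint64(uint64_t Value, char (&Buffer)[20])
{
    const std::to_chars_result Result = std::to_chars(Buffer, Buffer + sizeof(Buffer), Value);
    assert(Result.ec == std::errc{});
    return std::string_view(Buffer, static_cast<std::size_t>(Result.ptr - Buffer));
}
```

`std::to_chars` performs no allocation and ignores the locale, which is the usual reason it outperforms `snprintf`-style formatting.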
* Use eastl::deque for queues with many small elements (#991) [Stefan Boberg, 2026-04-20, 6 files, -23/+27]
| | | | | | | | | | | Switch several deque-based queues from `std::deque` to `eastl::deque` to reduce per-element heap allocation overhead. MSVC's `std::deque` allocates one node per element for anything larger than ~16 bytes; `eastl::deque` groups 4, 8, or 32 elements per block depending on element size. Converted call sites: - `BlockingQueue` and `WorkerThreadPool` (generic — downstream callers benefit automatically) - Session log entry buffer (~10k-entry ring of large log records — 4 per block vs 1) - Job queue (`Ref<Job>` — 32 per block vs 2) - RPC recording request queue (large `QueuedRequest` struct — 4 per block vs 1) - StatsD client message queues (~32-byte buffers — 8 per block vs 1)
* s3 dehydration touch cas (#977) [Dan Engelbrecht, 2026-04-20, 3 files, -25/+162]
- add Touch() function to s3 client
- touch all used cas files in s3 dehydration path
* zen history command (#987) [Dan Engelbrecht, 2026-04-20, 17 files, -27/+828]
- Feature: Per-user invocation history for `zen` and `zenserver`; each startup appends a record to a JSONL file capped at the most recent 100 entries. Location: `%LOCALAPPDATA%\Epic\Zen\History\invocations.jsonl` on Windows, `~/.zen/History/invocations.jsonl` on POSIX
- `zen history` opens an interactive picker; selecting a zen row re-runs it inline and forwards the exit code, selecting a zenserver row spawns it detached
- `zen history --list` (`-l`) prints the table to stdout instead of showing the picker
- `zen history --filter zen|zenserver` restricts the listing to one executable
- `zen history --print` prints the reconstructed command line of the selected row instead of launching it
- `--enable-execution-history` global option on both binaries (default `true`) to opt out per invocation
- The history file is attached to Sentry crash reports (alongside the existing zenserver log)
* zen-test: add CLI integration harness + TestArtifactProvider + CI host stats ↵ [Stefan Boberg, 2026-04-20, 9 files, -0/+1239]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (#985) Establishes a new end-to-end integration test harness for the `zen` CLI, the shared fetcher it uses to pull test artifacts, and the CI plumbing that feeds both. Also lowers the default test-harness log level and broadens the artifact fetcher's credential resolution. ### `zen-test` executable (`src/zen-test/`) - New binary modeled on `zenserver-test`, built only in debug. - `zen-test.{h,cpp}` harness: spawns `zen.exe` via `CreateProc` and captures combined stdout/stderr into a `ZenCommandResult` for assertion. - Registered with `scripts/test.lua` under the short name `zen` (`xmake test --run=zen`) and enabled for `--kill-stale-processes`. - Prints a clear console message when invoked from a release build (tests disabled), so misconfiguration is easy to spot. - Documented in `CLAUDE.md` (test-suite naming table + test projects section) and `README.md`. - Test cases in the `zen.artifactprovider` suite: - `probe.lyra_cook_rpc_recording` — probe against a canonical Lyra cook RPC recording that skips with a diagnostic `MESSAGE` when no artifact source is configured. - `probe.s3_readme` — probes the configured S3 bucket for `README.md` using a fresh temp cache to force the request through to S3; skips on macOS without static creds (no EC2 Mac runners in our fleet). - `zen.utility-cmd` suite: new integration tests exercising `zen print`, `zen wipe`, and `zen copy`. ### `TestArtifactProvider` (`src/zenutil/testartifactprovider.{h,cpp}`) - `Ref<TestArtifactProvider>` factory returning a local-only or S3-backed provider, selected from env vars: - `ZEN_TEST_ARTIFACTS_PATH` — local directory to serve from (write-through cache for remote fetches). - `ZEN_TEST_ARTIFACTS_S3` — S3 URL to fetch from. - `AWS_DEFAULT_REGION` / `AWS_REGION`, `AWS_ENDPOINT_URL`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN` — standard AWS config. 
- `Exists(path)` / `Fetch(path)` API with a `TestArtifactFetchResult` return carrying the content buffer and a diagnostic error string. Content is cached on disk across test runs. - **IMDS credential fallback**: when no static `AWS_ACCESS_KEY_ID` is present, attaches an `ImdsCredentialProvider` so self-hosted EC2 runners with an attached IAM role can sign S3 requests without static credentials (mirrors the pattern in `zenserver/hub/hydration.cpp`). - **IMDS opt-out**: honors the standard `AWS_EC2_METADATA_DISABLED=true` env var, and skips IMDS by default on macOS where the link-local probe would just emit noise. ### Test harness log level (`src/zencore/testing.cpp`) - `TestRunner::ApplyCommandLine` now defaults the global log level to `Info` (was effectively `Trace`), cutting the noise from `xmake test --run=all` now that the suite has grown. Applies uniformly to `zencore-test`, `zenhttp-test`, `zenstore-test`, `zenutil-test`, `zenserver-test`, `zen-test`, etc. `--debug` (Debug) and `--verbose` (Trace) still opt back in when chasing failures. ### CI (`.github/workflows/validate.yml`) - **Runner info step** on all three platforms (Windows/Linux/macOS): prints host, CPU topology, memory, and disk usage before the build/test step, so flakes that correlate with a particular runner or low disk space are easy to spot. - **Artifact env wiring**: passes `ZEN_TEST_ARTIFACTS_S3` and `AWS_DEFAULT_REGION` into the debug Build & Test step on all three platforms so the probe can reach its source when the repo variable is configured. The probe skips cleanly when unset.
* add --pid and --executable to zen down command (#988) [Dan Engelbrecht, 2026-04-20, 6 files, -33/+384]
* zen: remove unused 'copy' and 'run' subcommands (#986) [Stefan Boberg, 2026-04-20, 5 files, -469/+0]
These CLI commands are no longer useful and have been dropped from the zen client.
* added support for trace regions (#984) [Stefan Boberg, 2026-04-20, 3 files, -0/+94]
| | | | | | | | | - Introduces a UE-trace Region primitive in `zencore/trace.{h,cpp}` for marking named, potentially long-running intervals of work that Unreal Insights render as banners in the timeline, separately from CPU scopes. - New API: - `uint64_t TraceBeginRegion(RegionName, Category={})` / `void TraceEndRegion(RegionId)` for manual begin/end pairs. - `ScopedTraceRegion` RAII helper plus `ZEN_TRACE_REGION(name)` / `ZEN_TRACE_REGION_CAT(name, category)` macros for scope-based use. - Emits the `Misc.RegionBeginWithId` / `Misc.RegionEndWithId` trace events (paired by a `GetHifreqTimerValue()`-derived id). - Full no-op fallback under `#if !ZEN_WITH_TRACE` so callers compile in all configurations. - Annotates `GcScheduler::CollectGarbage` with `ZEN_TRACE_REGION_CAT("GcScheduler::CollectGarbage", "gc")` as a first caller — makes GC passes visible as banners in Insights without relying on the existing `ZEN_TRACE_CPU` scope alone (which doesn't render as a region).
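
The begin/end pairing and the RAII helper can be sketched with stub tracing functions. Everything below is a stand-in: the real `TraceBeginRegion` / `TraceEndRegion` emit trace events and derive the id from `GetHifreqTimerValue()`, whereas this sketch records events in a vector and uses a simple counter, and the parameter types are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical in-memory event log standing in for the trace stream.
struct RegionEvent
{
    bool        IsBegin;
    uint64_t    Id;
    std::string Name;
};

static std::vector<RegionEvent> g_RegionEvents;
static uint64_t                 g_NextRegionId = 1;  // assumption: the real id is timer-derived

// Stub: records a begin event and returns the id that pairs it with its end.
uint64_t TraceBeginRegion(std::string_view RegionName, std::string_view Category = {})
{
    (void)Category;  // unused by this stub
    const uint64_t Id = g_NextRegionId++;
    g_RegionEvents.push_back(RegionEvent{true, Id, std::string(RegionName)});
    return Id;
}

// Stub: records the matching end event.
void TraceEndRegion(uint64_t RegionId)
{
    g_RegionEvents.push_back(RegionEvent{false, RegionId, std::string()});
}

// Begins a region on construction and ends it when the scope exits.
class ScopedTraceRegion
{
public:
    explicit ScopedTraceRegion(std::string_view RegionName, std::string_view Category = {})
    : m_Id(TraceBeginRegion(RegionName, Category))
    {
    }

    ~ScopedTraceRegion() { TraceEndRegion(m_Id); }

    ScopedTraceRegion(const ScopedTraceRegion&) = delete;
    ScopedTraceRegion& operator=(const ScopedTraceRegion&) = delete;

private:
    uint64_t m_Id;
};
```

The manual pair suits regions that start and end in different functions; the scoped form guarantees the end event on early returns and exceptions.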
* zencore: implement SearchPathForExecutable on POSIX (#981) [Stefan Boberg, 2026-04-20, 1 file, -1/+32]
- The Linux/macOS branch of `SearchPathForExecutable` was previously a no-op that returned the input unchanged. Callers passing a bare executable name (e.g. `llvm-symbolizer`) got the same bare name back even when the binary lived elsewhere on `PATH`.
- Now walks `$PATH` like `execvp` does: skip the search if the input contains a `/`, try each colon-separated entry (empty entry == cwd), and return the first candidate that is both a regular file and executable by the current user. Falls back to returning the input unchanged if nothing matches, preserving the previous behavior for the no-match case.
- Windows branch is unchanged (still uses `SearchPathW`).
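
The `$PATH` walk can be sketched as a pure function with the filesystem check injected, which keeps the example testable. `SearchPathSketch` and the `IsExecutableFile` predicate are hypothetical; a real implementation would check for a regular file that the current user may execute (for instance with `stat` and `access`), and would read `PATH` from the environment.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>

// Resolves Name against a colon-separated PathEnv, mirroring the rules listed
// above: names containing '/' are returned as-is, an empty entry means the
// current directory, and the input is returned unchanged when nothing matches.
std::string SearchPathSketch(std::string_view                               Name,
                             std::string_view                               PathEnv,
                             const std::function<bool(const std::string&)>& IsExecutableFile)
{
    assert(IsExecutableFile);
    if (Name.find('/') != std::string_view::npos)
    {
        return std::string(Name);  // explicit path: no search
    }

    std::size_t Start = 0;
    while (Start <= PathEnv.size())
    {
        std::size_t End = PathEnv.find(':', Start);
        if (End == std::string_view::npos)
        {
            End = PathEnv.size();
        }
        const std::string_view Dir = PathEnv.substr(Start, End - Start);

        std::string Candidate = Dir.empty() ? std::string(".") : std::string(Dir);
        Candidate += '/';
        Candidate += Name;

        if (IsExecutableFile(Candidate))
        {
            return Candidate;  // first match wins, as with execvp
        }
        Start = End + 1;
    }
    return std::string(Name);  // no match: preserve the input
}
```

Injecting the predicate is a design choice for this sketch only; it lets the search order be verified without touching the real filesystem.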
* zencore: CreateProc stdin pipes + BuildArgV quote stripping (#983) [Stefan Boberg, 2026-04-20, 4 files, -75/+463]
| | | | | | | | | | | | | | | | Two related improvements to `CreateProc`: ### 1. Stdin pipe support - Adds `StdinPipeHandles` + `CreateStdinPipe` alongside the existing `StdoutPipeHandles`, letting callers feed data into a child process's stdin. - Platform-agnostic RAII (Windows `HANDLE` pair / POSIX `pipe()` fd pair) with the same semantics as the stdout pipe: the inherited end goes to the child, the non-inherited end stays with the parent, destructor closes both. - `CreateProcOptions` gains a `StdinPipe*` field. - On Windows, `CreateProcNormal` is reworked so stdin/stdout redirection handles all combinations (stdin + stdout, each alone, neither) uniformly. POSIX already supported arbitrary fd redirection and just needed to honor the new option. - `zentest-appstub` gains a `-stdin_echo` mode that reads stdin to EOF and echoes it back (switching to binary mode on Windows so CRLF translation doesn't mangle bytes). - `zenserver-test` gets a `server.process` / `stdin_pipe.*` test group that exercises launching a child with a stdin pipe, writing, closing the write end, and reading back the echoed data. ### 2. Shell-style quote stripping in `BuildArgV` - Callers that build a single command-line string for `CreateProc` commonly wrap spacey paths in double quotes (e.g. `--tracefile="$path"`). The old `BuildArgV` only used quotes to suppress space-splitting and left the characters in the resulting argv element, so the spawned process saw literal `--tracefile="..."` and the value parser failed to open the quoted path. - `BuildArgV` now compacts in place, dropping quote chars as it goes, matching shell semantics for paired double quotes.
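
The quote-stripping behaviour can be illustrated with a small splitter. This is a sketch, not `BuildArgV`: the real function compacts the buffer in place, while `SplitCommandLine` (a hypothetical name) builds new strings, and it handles only spaces and paired double quotes, with no escape sequences.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Splits a command line on spaces. Double quotes suppress splitting and are
// dropped from the resulting arguments, matching shell semantics for paired
// double quotes.
std::vector<std::string> SplitCommandLine(std::string_view CommandLine)
{
    std::vector<std::string> Args;
    std::string              Current;
    bool                     InQuotes = false;
    bool                     HaveArg  = false;  // true once the current argument has started

    for (const char C : CommandLine)
    {
        if (C == '"')
        {
            InQuotes = !InQuotes;  // toggle quoting; the quote itself is not copied
            HaveArg  = true;       // so that "" still yields an empty argument
        }
        else if (C == ' ' && !InQuotes)
        {
            if (HaveArg)
            {
                Args.push_back(std::move(Current));
                Current.clear();
                HaveArg = false;
            }
        }
        else
        {
            Current.push_back(C);
            HaveArg = true;
        }
    }
    if (HaveArg)
    {
        Args.push_back(std::move(Current));
    }
    return Args;
}
```

With this rule, `--tracefile="/tmp/my trace.utrace"` reaches the child as a single argument without the literal quote characters.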
* zenhttp: add FollowRedirects option to HttpClient (#982) [Stefan Boberg, 2026-04-20, 2 files, -0/+13]
- Adds `FollowRedirects` (default `false`) and `MaxRedirects` (default `5`) fields to `HttpClientSettings`.
- When `FollowRedirects` is enabled, the curl backend sets `CURLOPT_FOLLOWLOCATION` and `CURLOPT_MAXREDIRS` so HTTP 3xx redirects are handled transparently in the transport layer — callers no longer need to parse `Location` headers and re-issue requests themselves.
- Defaults are off, so existing callers see no behavior change.
* Move ZipFs from zenserver frontend into zenhttp (#980) [Stefan Boberg, 2026-04-20, 7 files, -6/+15]
Moves `ZipFs` from `src/zenserver/frontend/` to `src/zenhttp/` so any binary linking `zenhttp` can serve a bundled web UI from a zip archive (motivator: the upcoming `zen trace serve` subcommand).
* zencore: promote ScopedEnvVar to a shared filesystem helper (#979) [Stefan Boberg, 2026-04-20, 3 files, -56/+67]
- Moves the RAII `ScopedEnvVar` helper out of `hydration.cpp`'s anonymous test namespace and into `zencore/filesystem.{h,cpp}` next to `GetEnvVariable` so it can be reused by other subsystems.
- Makes the class non-copyable/non-movable and moves its members to `private`.
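
A helper of this kind might look like the sketch below. It is a POSIX-only illustration using `setenv` / `unsetenv`; the constructor signature is an assumption, and the real `ScopedEnvVar` in `zencore/filesystem.{h,cpp}` also has to cover Windows.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <stdlib.h>  // setenv / unsetenv (POSIX)
#include <string>

// Sets an environment variable for the lifetime of the object and restores
// the previous value (or removes the variable) on destruction.
class ScopedEnvVar
{
public:
    ScopedEnvVar(const char* Name, const char* Value) : m_Name(Name)
    {
        assert(Name != nullptr && Value != nullptr);
        if (const char* Old = std::getenv(Name))
        {
            m_OldValue = Old;  // copy now: setenv may invalidate the pointer
        }
        setenv(Name, Value, /*overwrite*/ 1);
    }

    ~ScopedEnvVar()
    {
        if (m_OldValue)
        {
            setenv(m_Name.c_str(), m_OldValue->c_str(), /*overwrite*/ 1);
        }
        else
        {
            unsetenv(m_Name.c_str());
        }
    }

    // Non-copyable and non-movable, so exactly one object owns the restore.
    ScopedEnvVar(const ScopedEnvVar&) = delete;
    ScopedEnvVar& operator=(const ScopedEnvVar&) = delete;
    ScopedEnvVar(ScopedEnvVar&&) = delete;
    ScopedEnvVar& operator=(ScopedEnvVar&&) = delete;

private:
    std::string                m_Name;
    std::optional<std::string> m_OldValue;
};
```

Deleting the copy and move operations prevents two objects from both trying to restore the same variable, which is presumably why the commit makes the class non-copyable and non-movable.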
* consolidate cache commands into `cache` subcommand (#978) [Stefan Boberg, 2026-04-20, 6 files, -822/+922]
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Consolidate the scattered cache-related top-level commands into a single `zen cache <sub>` command tree, keeping the old names as hidden deprecated aliases so any existing scripts keep working. ## Motivation `zen` has accumulated a flat list of cache-adjacent commands (`cache-info`, `cache-stats`, `cache-details`, `cache-gen`, `cache-get`, `drop`, `rpc-record-start/stop`, `rpc-record-replay`). Each one re-declares `--hosturl` parsing and host resolution, and there is no natural home for new cache tooling. Grouping them under `cache` gives a consistent UX and a shared base class to hang common options off of. ## Changes ### Subcommand consolidation - Moved into `cache <sub>` form: - `cache info`, `cache stats`, `cache details`, `cache gen`, `cache get`, `cache drop` - `cache record <path>` / `cache record stop` (formerly `rpc-record-start` / `rpc-record-stop`) - `cache replay` (formerly `rpc-record-replay`) - All old top-level names remain as deprecated aliases and forward through a shared legacy-shim dispatcher that rewrites `argv` and re-enters the new dispatcher, so behavior is byte-identical for existing callers. - Deprecated aliases are now hidden from the top-level `zen --help` listing (new `ZenCmdBase::IsHidden()` + `DeprecatedCacheStoreCommand` base). They still dispatch normally; `zen cache --help` is the canonical discovery surface. ### Shared base class - New `CacheSubCmdBase` owns the `--hosturl` option and `ResolveHost()` logic, eliminating the copy/pasted block at the top of every `Run()`. ### Output format - Added `--yaml` to `cache info`, `cache stats`, and `cache details` (negotiated server-side via `Accept: text/yaml`). `cache details` now rejects `--csv --yaml` combined. ### Hardening - `cache gen`: bounds-check requested sizes before allocating. - `cache replay`: validate `--stride` / `--offset` and fix progress-math overflow edge cases.
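
The legacy-shim idea of rewriting `argv` before re-entering the new dispatcher can be sketched as a table lookup. The old and new command names come from the list above; `RewriteLegacyCacheCommand`, `LegacyAlias`, and the table layout are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// One deprecated top-level command and the `cache <sub>` words replacing it.
struct LegacyAlias
{
    std::string_view         OldName;
    std::vector<std::string> NewWords;
};

// Rewrites e.g. {"zen", "cache-info", ...} to {"zen", "cache", "info", ...}.
// Arguments after the command name are passed through untouched; unknown
// commands are returned unchanged.
std::vector<std::string> RewriteLegacyCacheCommand(const std::vector<std::string>& Args)
{
    static const std::vector<LegacyAlias> Aliases = {
        {"cache-info", {"cache", "info"}},
        {"cache-stats", {"cache", "stats"}},
        {"cache-details", {"cache", "details"}},
        {"cache-gen", {"cache", "gen"}},
        {"cache-get", {"cache", "get"}},
        {"drop", {"cache", "drop"}},
        {"rpc-record-start", {"cache", "record"}},
        {"rpc-record-stop", {"cache", "record", "stop"}},
        {"rpc-record-replay", {"cache", "replay"}},
    };

    if (Args.size() < 2)
    {
        return Args;  // no command word to rewrite
    }
    for (const LegacyAlias& Alias : Aliases)
    {
        if (Args[1] == Alias.OldName)
        {
            std::vector<std::string> Rewritten;
            Rewritten.push_back(Args[0]);
            Rewritten.insert(Rewritten.end(), Alias.NewWords.begin(), Alias.NewWords.end());
            Rewritten.insert(Rewritten.end(), Args.begin() + 2, Args.end());
            return Rewritten;
        }
    }
    return Args;
}
```

Rewriting at the `argv` level means the deprecated names share one code path with the new subcommands instead of each keeping its own option parsing.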
* builds cmd refactor (#975) [Dan Engelbrecht, 2026-04-20, 39 files, -13022/+13402]
| | | | | | | | | - Bugfix: `builds download` partial-block fetch decisions now account for build storage host latency - Bugfix: Transfer rate displays in `builds` commands now smooth correctly - Split `buildstorageoperations.cpp` (8.5k lines) into per-operation TUs: buildinspect, buildprimecache, buildstorageresolve, buildupdatefolder, builduploadfolder, buildvalidatebuildpart; stats moved to buildstoragestats.h. - FilteredRate extracted to zenutil. - BuildsCommand shared state consolidated into a BuildsConfiguration struct; subcommands inherit from BuildsSubCmdBase holding a `const BuildsConfiguration&` instead of a `BuildsCommand&`. - `ProgressBar` renamed to `ConsoleProgressBar`; mode enum (`ConsoleProgressMode`) lifted to namespace scope; `PushLogOperation`/`PopLogOperation`/`ForceLinebreak` promoted to virtuals on `ProgressBase`. - Free-function wrappers (`UploadFolder`, `DownloadFolder`, `ValidateBuildPart`) added around the existing operation classes so callers stop reimplementing setup + stats logging.