aboutsummaryrefslogtreecommitdiff
path: root/src/zencore/include
Commit message (Collapse)AuthorAgeFilesLines
* sessions: persist to disk, prune, track client liveness, accept UE_LOGFMT ↵HEADmainStefan Boberg2026-05-056-7/+470
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (#1014) Branch started as a sessions-service overhaul (persistence, client liveness, UE_LOGFMT intake) and grew to pick up adjacent infrastructure work: an early-startup log backlog, a hardened `MemoryArena`, the `zen trace serve` viewer gaining a counter view + compact timeline + tabbed callsite panel, defensive fixes in the third-party `tourist` trace parser, a series of allocation reductions across the HTTP and compact-binary hot paths, and a new `zen sessions` CLI command tree. ## Sessions service **Persistence.** Each session lives on disk under `<DataRoot>/sessions/<id>/` as `info.cb` (metadata) plus `log.bin` (length-prefixed CbObject log records). On startup the service scans that directory and loads prior sessions as ended sessions, preloading the tail of each log so historical views work after a restart. `SessionLog` is noexcept-constructed and falls back to a disabled state on disk errors, so a bad disk can't take down `RegisterSession`. `GetSession` falls back to the ended-sessions list (fixes historical log fetches over HTTP). `LoadTail` counts only successfully-parsed records. **Pruning.** Periodic cleanup task drops ended sessions once any of three caps is exceeded: age (default 1 year), count (default 1000), or total on-disk footprint (default 50 MiB). Runs 30 s after startup, hourly thereafter. Active sessions never pruned; disk removal and directory stat happen outside the exclusive lock so a slow filesystem can't stall lookups. **Client liveness.** Sessions carry a `ProcessHandle` for the client-reported pid, captured at registration time so Windows pid recycling can't produce false positives. A 30 s asio timer probes liveness and ends dead sessions through the normal remove path, producing a synthetic `Session ended: process exited (...)` line persisted to `log.bin`. Windows decodes common NTSTATUS exit codes to human names (Ctrl-C, access violation, stack overflow, ...); POSIX stays at plain `process exited`. Clients auto-fill `ClientPid` only for local targets (unix socket / loopback); the server defensively accepts pids only from `IsLocalMachineRequest()` peers. zenserver also reports its own pid when registering its self-session, so it shows up with a real pid in the dashboard and `zen sessions ls`. **Synthetic end-of-session line.** `RemoveSession` takes an optional reason; before the session moves to the ended list it appends an Info-level `Session ended[: reason]` entry through the normal log path (released outside `m_Lock`). Current reasons: `client request` (HTTP DELETE), `server shutdown` (self-session), `process exited (...)` (liveness). **UE_LOGFMT structured entries.** `POST /sessions/{id}/log` now accepts `{level, logger, format, fields}` alongside the existing `{level, logger, message}` shape. New `logtemplate.{h,cpp}` implements UE's `StructuredLog.cpp` template grammar (field paths with `.name` / `[N]`, `{{`/`}}` escapes, `$text` / `$format` / `$locformat` object conventions, bounded recursion). Renders to a displayable message at intake while persisting raw format + fields so a future UI can drill into fields without another schema bump. Hot path is zero-alloc — renders into `ExtendableStringBuilder<256>` using stack-buffered `Oid::ToString` / `IoHash::ToHexString` overloads. UI shows a `{…}` marker with the raw template + JSON-pretty fields on hover. **Parent sessions.** `SessionInfo` gains `parent_session_id`; hub-managed storage server child processes inherit the hub's session id via `--parent-session=<id>`. `ZEN_SESSIONS_URL` env var becomes a fallback for `--sessions-url` / config when neither is provided. The in-process session log sink is disabled when a remote sessions target is configured (logs flow through `SessionsServiceClient` instead). The sessions UI groups child sessions under their parent (collapsible/expandable, sorts as a unit, supports nesting). **Platform reporting.** `SessionInfo` gains `Platform`, flowed end-to-end: client auto-fills via `GetRuntimePlatformName()`, server persists in `info.cb` (`plat`) and emits on GET. UI renders as a SimpleIcons-style inline SVG (windows / macOS / iOS / linux / wine / android / playstation / xbox / nintendo) with case-insensitive alias resolution (Win32/Win64, PS4/PS5, XSX/XSS, NintendoSwitch, iPhone/iPad, Darwin/OSX). Unknown values fall back to text; sorting runs on the underlying string. **WebSocket log streaming.** Sessions UI moves from 2 s polling to a WebSocket push model. New `WsSubscriber` has a stable id + helper methods. UI caps the log-line DOM at 5 000 entries with a shared cursor-regression helper, factored out of two call sites. Per-broadcast allocations trimmed on the push path; fixed a stack overrun in the WS log broadcast hex-id buffer. **Log memory.** `LogEntry::Level` is now `logging::LogLevel` (1 byte) instead of `std::string` (~32 B) — saves ~310 KB per full 10 k-entry deque and eliminates a per-message allocation in the in-proc sink. On-disk format writes an int32 and accepts either int or legacy string on read. `LogEntry` strings now live in a `MemoryArena`; logger names are interned across the deque. `SessionLog::Append` and `WriteSessionInfoFile` drop their `UniqueBuffer` round-trip and write `CbObject::GetView()` straight through `BasicFile` / `SafeWriteFile`. Multi-entry `POST /log` batched under one lock + one push. **In-proc log timestamps.** `InProcSessionLogSink::TimePointToDateTime` previously preserved only whole seconds, so every in-proc entry rendered at `.000` ms in the dashboard and `zen sessions tail`. It now adds the sub-second part (nanoseconds → 100 ns ticks) to keep ms precision end-to-end. **UI.** Side "Session Details" panel is gone — its info is inline in the table (appname, mode, platform, id, timestamps, this/log pills, active dot). Bottom panel is a tabbed `Log | Metadata` view with a right-side "Session Information" panel beside metadata; log-only controls (filter, newest-first, follow, log-level filter, expand/collapse) hide when Metadata is active, polling keeps running across tab switches. Wide-mode toggle fills the viewport edge-to-edge. Log lines show the logger category; timestamps render in 24 h with zero-padded fields regardless of locale. Sessions list defaults to All / 10 per page / created-desc, gains click-to-sort headers on the full dataset, a header filter box, and a pager aligned to the table's right edge. Duplicate auto-injected `<h1>Sessions</h1>` removed. ## `zen sessions` CLI New command tree on the `zen` client for inspecting the sessions service from the terminal: - **`zen sessions ls`** — lists sessions (active first, ended next; newest-first within each group) with id, status, app/mode, pid, created, duration, and log count. Supports `--status active|ended|all` (default `all`). - **`zen sessions status`** — prints the sessions service summary: self id, active / ended counts, and the read/write/delete/list/request/bad-request counters from `/stats/sessions`. - **`zen sessions tail [session]`** — tails a session's log. With no argument it tails zenserver's own session (resolved via `/sessions/list`'s `self_id`); an explicit 24-hex id targets any session, including ended ones (historical replay). `--lines N` (default 50, 0 = all buffered) trims the initial dump client-side. `--follow` prefers a WebSocket push subscription on `/sessions/ws` for sub-second latency; on upgrade failure (older server, blocked port, unix-socket transport) it falls back to HTTP cursor polling at `--interval-ms` (default 500), with sleeps chunked to 50 ms so Ctrl-C reacts quickly. Output matches `zen::logging::FullFormatter` (`[YY-MM-DD HH:MM:SS.mmm] [lvl] [logger] message`); on a TTY the level is colored and the logger is bold, with continuation lines indented under the message column using the *visible* prefix width. 404 surfaces as `(session ended)` and connection errors as `(server gone)` — both clean exits, so stopping the server mid-tail no longer prints a stack trace. - **`zen sessions ui`** — opens `<host>/dashboard/?page=sessions` in the user's default browser. Rejects unix-socket hosts. A small `ZenServiceClient::IsUnixSocket()` helper now wraps the unix-socket check used by `ui`, `sessions tail` (WS path), and `sessions ui`. ## Logging `BacklogSink` captures early-startup log entries in a fixed-capacity ring so late-attached sinks (session sink, file sink) can replay them. Detaches from the broadcast list when disabled; backed by destructor-only cleanup (no `unique_ptr` indirection per entry). Tuned defaults so the backlog covers typical bring-up without unbounded growth. ## `zen trace serve` viewer - Compact timeline mode for high-density views. - New `TRACE_INT_VALUE` / `TRACE_FLOAT_VALUE` counter trace points + a counters page in the viewer. - Callsite tables collapsed into a single tabbed panel. - Lossless `Oid <-> Guid` bridge for trace session ids; trace `SessionId` plumbed through. - `tourist` parser hardening: bounds-check `BufferStream::read`, validate `Type::info_size` before `patch()`, convert `parse_important_aux` to a loop (avoids deep recursion), widen `ParserPool` index to `uint32`, bounds-check field offsets in the dispatcher, pin `Types::parse` buffer up-front. ## `MemoryArena` Configurable chunk size, inline chunk list, oversize requests routed to truly-dedicated chunks (no slack waste, no fragmentation when one allocation is much larger than the chunk). ## Allocation cleanups across hot paths - `zenhttp::HttpRequestRouter::HandleRequest` and `FormatPackageMessageInternal`: drop heap allocations. - Compact-binary validation: `eastl::fixed_vector` + `eastl::sort`; eliminate `std::vector` churn. - `zenserverprocess`: trim transient allocations in spawn paths. - Sessions HTTP intake / broadcast: drop transient `std::string` allocs.
* hub async s3 client (#1024)Dan Engelbrecht2026-05-051-0/+46
| | | | | | | | | | | | | | - Feature: `AsyncHttpClient` adds cancellable request tokens, streaming GET to a file (`AsyncDownload`), zero-copy chunk-callback GET (`AsyncStream`), pull-mode body source for streaming `AsyncPut`, retry layer mirroring the synchronous client, and a submit-side in-flight cap (`HttpClientSettings::MaxConcurrentRequests`) so hub-scale fanout against a single host cannot stall queued handles into curl's connect-timeout window - Feature: Hub hydration can route S3 transfers through a non-blocking `AsyncHttpClient` (curl_multi + asio) backed by a single io thread; hydrate and dehydrate now pipeline requests instead of blocking worker threads - `--hub-hydration-async-enabled` (Lua: `hub.hydration.async.enabled`, default true) - `--hub-hydration-async-max-concurrent-requests` (Lua: `hub.hydration.async.maxconcurrentrequests`, default `clamp(cpu*4, 128, 512)`) - Feature: Hub provision/deprovision/obliterate now run as two phases on separate worker pools so per-module hydration cannot starve child-process spawn/despawn (and vice versa) - New `--hub-instance-spawn-threads` (Lua: `hub.instance.spawnthreads`, default `clamp(cpu/8, 4, 16)`) drives child-process spawn/despawn - `--hub-instance-provision-threads` (Lua: `hub.instance.provisionthreads`) now drives per-module hydrate/dehydrate scheduling only; default changed from `max(cpu/4, 2)` to `clamp(cpu/8, 4, 12)` - `--hub-hydration-threads` (Lua: `hub.hydration.threads`) now controls per-file workers inside a single hydrate/dehydrate; default changed from `max(cpu/4, 2)` to `clamp(cpu/8, 4, 12)` - Feature: `AsyncHttpClient` owns its `asio::io_context` and one io thread by default; the `(BaseUri, io_context&)` constructor is preserved for callers that want to share an externally-driven `io_context` across clients (caller MUST keep the loop running until the client destructs) - Feature: `Hub::Configuration` C++ struct fields renamed (`OptionalProvisionWorkerPool`/`OptionalHydrationWorkerPool` -> `OptionalProvisionPool`/`OptionalSpawnPool`/`OptionalHydrationPool`). Embedders constructing `Hub` directly must update field names; provision and spawn pools must both be set or both null (asserted at construction). - Bugfix: `S3Client` signing-key cache no longer returns stale signatures after IMDS-rotated credentials change `AccessKeyId`; cache is now keyed on `(DateStamp, AccessKeyId)`
* zenhttp improvements (robustness / correctness) (#968)Stefan Boberg2026-05-044-0/+122
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A collection of security, correctness, and robustness fixes in `zenhttp` and `zencore` surfaced by security review. Most items are small, independent commits grouped here because they all tighten trust boundaries or fix UB along the same code paths. ## WebSocket protocol hardening (RFC 6455) - **Enforce the client-side mask bit**. Server-side frame loops now reject unmasked frames with close code 1002 per §5.1. Prevents HTTP intermediary smuggling. - **Validate control frames and RSV bits**. Fragmented control frames, oversized (>125 B) control payloads, and any non-zero RSV bit now fail the connection before allocation. - **Lower per-frame payload cap** from 256 MB → 4 MB. Bounds per-connection accumulator memory. - **Implement message fragmentation**. Continuation frames are coalesced and delivered as a single message; interleaved non-control frames close with 1002; assembled messages are capped at 4 MB (1009 on overflow). Previously partial fragments were delivered to handlers, bypassing payload validation. - **Parse the 101 handshake response properly** in `HttpWsClient`. Status-line, `Upgrade`, `Connection`, and `Sec-WebSocket-Accept` are now matched exactly rather than via substring searches against the full body. ## Auth / OIDC hardening - **Constant-time password compare** in `PasswordSecurity::IsAllowed` (closes a remote length/content timing oracle). Adds a shared `ConstantTimeEquals` helper. - **Harden Basic-auth header parsing**: trim trailing LWS, reject control bytes and DEL in the credential. - **OIDC discovery pinning**: require HTTPS (loopback exempt), verify `issuer` matches `BaseUrl`, require `token_endpoint` / `userinfo_endpoint` / `jwks_uri` to share origin with `BaseUrl`, reject empty `token_endpoint`. - **Restrict `POST /auth/oidc/refreshtoken`** to local-machine requests. Previously unauthenticated in default deployments — remote callers could evict or replace cached tokens. - **Stop logging OIDC provider response bodies** on refresh failure (IdPs echo `refresh_token` back in error bodies). - **Drop the unused `IdentityToken` field** from `OidcClient` / `OpenIdToken` so nothing in the tree accidentally trusts an unverified JWT. ## Auth state encryption migration - Add `AesGcm` AEAD primitive (BCrypt / OpenSSL backends, mbedTLS stubbed) and `CryptoRandom::Fill` CSPRNG helper in `zencore/crypto.h`. - Migrate authstate file from AES-256-CBC with a fixed IV to AES-GCM with a fresh 12-byte random nonce per write and the 4-byte `ZEN1` magic bound as AAD. Legacy-CBC files are transparently read once and rewritten in the new format. ## Filesystem / IO robustness - `IoBufferExtendedCore::Materialize` now checks `MAP_FAILED` on POSIX (was comparing to `nullptr`, which let the failure sentinel propagate into later reads and `munmap(MAP_FAILED, ...)`). - `IoBufferBuilder::MakeFromFile / MakeFromTemporaryFile`: close the FD/HANDLE on exception via a dismissable `ScopeGuard`; actually check the `fstat()` return value (previously used an uninitialized `FileSize`). - `ReadFromFileMaybe`: loop short reads, retry `EINTR`, chunk Windows `ReadFile` at `0xFFFFFFFF` bytes (fixes silent truncation of multi-GiB reads). - `WipeDirectory`: compare `FindFirstFileW` handle against `INVALID_HANDLE_VALUE` rather than `nullptr`. - `RemoveFileNative` (Linux/macOS): report non-`ENOENT` stat failures via the `std::error_code` out-param and stop reading `st_mode` after a failed stat. ## Buffer / compression correctness - Avoid per-copy `IoBufferCore` heap allocations in `CompositeBuffer::CopyTo / ViewOrCopyRange` iterators; add fast path for `BufferHeader::Read` when the 64-byte header fits in the first plain-memory segment. - `BufferHeader`: add `IsHeaderValid()` gate covering `BlockSizeExponent` range, `BlockCount * BlockSize` overflow, and `TotalRawSize` bounds before any arithmetic uses them. Defends against attacker-controlled headers that can pass the CRC and trigger OOB writes in `DecompressBlock`.
* GetEnvVariable: return std::optional<std::string> (#1017)Stefan Boberg2026-04-271-1/+1
| | | | | | | - `GetEnvVariable` now returns `std::optional<std::string>` so callers can distinguish an unset variable from one set to an empty value. - Windows path uses `SetLastError(ERROR_SUCCESS)` + `ERROR_ENVVAR_NOT_FOUND` to detect "not found"; POSIX path returns `nullopt` when `getenv` returns `nullptr`. - All call sites migrated. Most use `.value_or("")` to preserve current empty-or-unset behavior. The diagnostic helpers in `zen-test/artifactprovider-tests.cpp` now report `<unset>` vs `<empty>` distinctly. - Added a check in the `ExpandEnvironmentVariables` test confirming `nullopt` for an unset variable; PATH/HOME lookups in that test use `REQUIRE(has_value())` so a missing var fails cleanly instead of throwing `bad_optional_access`.
* Zs/user path case comparison (#1015)Zousar Shaker2026-04-271-0/+18
| | | - Improvement: `zen builds` `--exclude-folders` and `--exclude-extensions` values now match paths case-insensitively and tolerate surrounding whitespace between separators
* hydration with pack (#1016)Dan Engelbrecht2026-04-271-0/+6
| | | | | | | | | | | | | | | - Feature: Hub hydration packs small files into raw CAS pack blobs to reduce request count for modules dominated by tiny metadata files - `--hub-hydration-enable-pack` (Lua: `hub.hydration.enablepack`, default true) - `--hub-hydration-pack-threshold-bytes` (Lua: `hub.hydration.packthresholdbytes`, default 256 KiB) - `--hub-hydration-max-pack-bytes` (Lua: `hub.hydration.maxpackbytes`, default 4 MiB) - Feature: Hub hydration and dehydration can be disabled per direction - `--hub-enable-hydration` (Lua: `hub.enablehydration`, default true) - `--hub-enable-dehydration` (Lua: `hub.enabledehydration`, default true) - Feature: Hub hydration accepts a configurable file exclude list via `HydrationOptions` `excludes` (array of wildcards). Built-in defaults skip transient runtime files (`.lock`, `.sentry-native/*`, `state_marker`, `*.bak`, `gc/reserve.gc`, `auth/*`) so they no longer participate in dehydrate scans. Override semantics: a present field replaces the default outright; explicit `[]` opts out of all defaults. - Improvement: Hub hydration completion logs now report per-request average and max latency, peak in-flight workers, queue wait, and hash-cache hit percentage; loose and pack-blob transfers are reported separately - Improvement: Hub hydration pre-creates unique parent directories before scheduling parallel writes - Improvement: S3 hydration retries transient HTTP failures (timeouts, 429 throttling, 5xx server errors, connection errors) up to 3 times via the HTTP client retry layer - Improvement: S3 hydration multipart chunk size is persisted in `state.cbo` per module so hydrate replays the partitioning used at dehydrate; default raised to 64 MiB (was 32 MiB) - Improvement: Hub hydration `Obliterate` retries backend delete once before falling back to local cleanup
* Zen-style trace log events (#1006)Stefan Boberg2026-04-222-27/+63
| | | | | | | | | | | | Replaces the old (not fully implemented) UE `Logging.*` sink with a typed `ZenLog.*` trace path that preserves structured fmt args end-to-end, so the zen trace analyzer (and future consumers) can re-render log messages with full formatter support. - Hook `Logger::Log` to tap `fmt::format_args` before `vformat` renders them, and emit three new events on a dedicated `ZenLogChannel`: `Category`, `MessageSpec`, `Message`. Args are serialized as `[count][descriptors][payload]` with distinct categories for bool, int, float, and string. Custom formatters fall back to a pre-rendered string. - Bool has its own wire category so `{}` renders as `true`/`false` and `{:d}` as `1`/`0`. - Zen `LogLevel` is translated to UE `ELogVerbosity` on emit so severity filtering works consistently. - Extend the zen trace analyzer to decode `ZenLog.*` via `fmt::vformat` + `dynamic_format_arg_store` — nested widths, chrono specs, etc. all work. Strings are passed as views directly from the event payload (which outlives the format call) rather than copied through a pool. - Retire the old `TraceSink` stub; the typed path supersedes it. - Switch `--trace=default` alias from `cpu,log` to `cpu,zenlog`. - Add `__int128` overloads to the arg encoder guarded by `FMT_USE_INT128` so fmt's int128 dispatch resolves unambiguously on clang/gcc. MSVC and clang-cl are unaffected.
* filesystem.h surface error codes (#998)Dan Engelbrecht2026-04-211-23/+19
| | | - Improvement: File copy, scan, clone, and move operations now report the underlying OS error in failure messages
* zen CLI security review fixes (#974)Stefan Boberg2026-04-211-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Security review follow-ups to the `zen` CLI. Each fix stands on its own commit. Grouped by category below. ## Credentials and secrets - **Per-install random auth encryption key instead of a hardcoded literal.** The default AES key and IV used to encrypt persisted OIDC refresh tokens / OAuth client secrets were ASCII literals compiled into the public source. Replaced with 32+16 random bytes persisted to `<system-root>/auth/machinekey.dat`. `SecureRandomBytes` added in zencore/crypto wrapping BCryptGenRandom / OpenSSL / mbedTLS CTR_DRBG. Partial override (only one of `--encryption-aes-key`/`--encryption-aes-iv`) is now rejected instead of silently using the hardcoded half. - **Wrap the machine key with OS-protected storage.** `machinekey.dat` is now a tagged format (4-byte magic + flags + wrapped-or-raw payload). Windows wraps via DPAPI (`CryptProtectData` at per-user scope) so a stolen disk copy cannot decrypt without the OS master key. macOS uses Keychain Services (GenericPassword under `org.unrealengine.zen.auth`, `kSecAttrAccessibleAfterFirstUnlockThisDeviceOnly`). Linux uses libsecret (opt-in via `--zenlibsecret=yes`, off by default because headless servers typically have no Secret Service daemon). All platforms fall back to raw persistence with `0600` perms on POSIX when wrapping is unavailable. Legacy files from the prior commit are detected by size and still read. > Note: argv-redaction before Sentry on crash was previously part of this PR but was superseded by `ScrubSensitiveValues()` from #989; this PR now just calls that helper instead of walking argv itself. ## Path traversal - **Reject unsafe filenames from the remote oplog in `oplog-mirror`.** The filename from each oplog entry was joined to the mirror root without normalisation; a compromised remote could use drive letters, UNC shares, device path prefixes, absolute paths, or `..` components to write anywhere the zen user could write. An `UnsafeFileNameReason` check runs immediately after extraction, logs the offending filename, and aborts the mirror. - **Use the resolved absolute download-spec path in `builds download`.** `--download-spec-path` was computed into a sanitised absolute path, then the original unresolved path was passed to `ParseBuildManifest`, bypassing the `MakeSafeAbsolutePath` mitigations and reading from the process cwd rather than `--local-path`. ## Input validation - **Stop asserting on malformed `--build-id` / `--build-part-id`.** `Oid::FromHexString` asserts on bad input and `ZEN_ASSERT` is active in release, so a too-short or non-hex user value aborted the process instead of surfacing an `OptionParseException`. Routed all callers through `TryFromHexString`. Also fixes `ParseBuildPartId` reporting errors under the wrong option name. - **Check the JSON parse error in `oplog-export --builds-metadata-path`.** The single-arg `LoadCompactBinaryFromJson` overload discarded the parser error; malformed JSON shipped a truncated compact-binary `metadata` field to the server with no indication. Switched to the two-arg overload and throws a descriptive error naming the file and reason. - **Format the actual value in the malformed `--url` error.** The message was constructed with a literal `{}` placeholder and no `fmt::format` call, so users saw the placeholder instead of the offending URL. - **Require `--output-path` in `cache get` unless `--as-text` is set.** Previously an empty path auto-filled from the value key / attachment hash and wrote into the process cwd; the `--as-text && empty path` stdout branch was unreachable because the auto-fill ran first. - **Clear the cxxopts `allow_unrecognised_options` flag after permissive parse.** `ParseOptionsPermissive` set the flag on the Options it received and never cleared it, priming that Options for silent typo acceptance on any later reuse. Added `disallow_unrecognised_options()` to the vendored cxxopts (local patch — flagged at the declaration) and wrapped the toggle in RAII. ## Resource lifecycle - **Restore signal handlers via RAII.** `wipe`, `builds`, and `oplog-mirror` installed SIGINT/SIGBREAK handlers with raw `signal()` and never restored them; an option-parse throw left the handler targeting an abort flag nothing reads. Added `zen::ScopedSignalHandler` in zen.h and applied at all three sites (builds uses `std::optional` members so the guards survive past `OnParentOptionsParsed` into the subcommand's Run). - **Route SIGINT in `oplog-mirror` to the worker-pool abort flag.** The command declared a local `std::atomic<bool> AbortFlag` but no handler targeted it — Ctrl-C killed the process instead of cleanly aborting. Added a `MirrorAbortFlag` / `MirrorSignalCallbackHandler` pair in projectstore_impl and bound the local as a reference; existing `.store`/`.load`/capture sites unchanged. - **Clean up the `cache get` temp download on every exit path.** `Http.Download` parks the payload in the system temp dir; a failed `MoveToFile` (cross-volume, denied target) or an exception could leave the temp file behind. The downloaded buffer is already flagged delete-on-close by `HttpClient`, so the fix is just to clear that flag after a successful `MoveToFile` so the renamed-out file isn't reaped. ## Other - **Fix wrong URL fields in `oplog-export` / `oplog-import` builds-branch descriptions.** Two operator-facing "[builds] URL/namespace/bucket/buildsid" messages formatted `m_CloudUrl` instead of `m_BuildsUrl` / `m_BuildsHost` (copy-paste from neighbouring `[cloud]` branches), shown as empty or stale at the start of an export/import. - **Fix "Can't find oplog in project '{}'" formatting and a "Failed top mirror" typo in projectstore_cmd.** - **Fix a misleading `oplog-export` comment on the `--zen` scheme default** ("Assume https" vs. the `http://` the code writes). - **Fail `ScrambleDir` when `RemoveFile` doesn't delete.** The `zen builds test` scramble phase used `(void)RemoveFile(FilePath)`, discarding both the bool return and the error. A quiet delete failure let verification run against stale state; switched to the two-arg overload and throw on false return or non-empty `error_code`.
* Rename logging::ToStringView to ToString for consistency (#993)Stefan Boberg2026-04-202-5/+5
| | | | | | | - Renames `logging::ToStringView` → `ToString` and `ShortToStringView` → `ShortToString` for consistency with the rest of the codebase, where `ToString` is the convention for enum-to-string conversions (return type already communicates it's a view). - Updates all call sites in logbase, logging helpers, session log sink, admin service, and tcplogstreamsink. Split off from the `sb/zen-monitor` branch so the ZenServiceClient refactor PR stays focused.
* zen trace analysis support (#945)Stefan Boberg2026-04-202-1/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Integrates the **tourist** trace analysis library and builds a full `zen trace` command suite for working with Unreal Engine `.utrace` files. ### Trace analysis library (`thirdparty/tourist/`) - Adds the tourist library as a third-party dependency with three modules: **foundation** (platform primitives, memory, scheduling), **trace** (UE Trace protocol decoding), and **analysis** (event dispatching and analyzer framework). - Cross-platform support for Windows, Linux, and macOS. ### `zen trace` CLI commands (`src/zen/cmds/`, `src/zen/trace/`) - **`zen trace analyze`** — Summarize a `.utrace` file: session metadata, thread inventory, command line + build configuration, CPU profiling scopes, timing, event rates, log messages, and (with symbols) memory allocation metrics including live-allocs dumps, callstack-keyed aggregation, and allocation churn. Optional HTML output for memory reports. - **`zen trace inspect`** — Dump the event schema (declared types, fields, sizes) from a trace file. - **`zen trace trim`** — Extract a time-window from a trace into a new `.utrace` file. - **`zen trace serve`** — Launch a local HTTP server hosting an interactive trace viewer; opens in the default browser. ### Symbolication (`src/zen/trace/symbol_resolver.*`, `thirdparty/raw_pdb/`) - Pluggable resolver with multiple backends: `pdb` (in-tree raw_pdb), `dbghelp` (Windows), `llvm-symbolizer` (all platforms), `atos` (macOS). An `auto` backend picks the best available tool per platform. - Microsoft Symbol Server support: downloads PDBs on demand using a redirect-aware HTTP client. - Local PDB cache keyed by image GUID preserves symbols across binary recompilation. - Callstack trimming heuristic strips UE internal noise from reports. - Binary analysis cache (`.ucache_z`) avoids re-resolving the same trace. ### Interactive trace viewer (`src/zen/frontend/html/`, `src/zen/trace/trace_viewer_service.*`) - Timeline: scope-level detail, horizontal zoom/pan, vertical scrolling, viewport-driven loading with pre-computed LOD for responsive navigation of large traces. - Thread grouping (collapsible sidebar sections) synthesized from name suffixes, natural sort order, visual distinction between lane threads and OS threads. - Bookmark and region annotations; region categories with per-category toggles; bookmark marker toggle in the toolbar. - Filterable Logs tab showing captured `UE_LOG` output. - Stats tab with per-scope aggregate statistics. - Memory tab with interactive allocation analysis and an allocation size histogram. - CsvProfiler event parsing and chart UI. ### Other in-branch supporting changes - **Cross-platform browser launcher** (`browser_launcher.{h,cpp}`) used by `trace serve`. - **`ReciprocalU64`** fast 64-bit integer division (zencore/intmath) for trace analyzers. - **`parallelsort`** cross-platform parallel sort helper (zenutil). - Frontend zip build rule so the viewer's HTML assets are bundled into `zen.exe`. - `/Zo` flag for better optimized debug info on Windows release builds. - `trace-tests.cpp` in the `zen-test` harness (harness itself landed on main via #985).
* Add CompactString utility type (#990)Stefan Boberg2026-04-201-0/+64
| | | | | - Introduce `CompactString`: a move-only, heap-allocated, immutable string wrapper that stores its length in a prefix byte for cheap `Size()`/`ToView()` while keeping the object to a single pointer. - Swap the `ToString()` integer-formatting helpers in `zencore/string.cpp` to `std::to_chars`, which is ~5-10x faster and benefits every `IntNum` / `StringBuilder` / `CbJsonWriter` caller. - No in-tree users on `main` yet; the type is ready for callers that want owned-string storage with lower per-entry overhead than `std::string` (e.g. long-lived log buffers, session records).
* Use eastl::deque for queues with many small elements (#991)Stefan Boberg2026-04-201-2/+5
| | | | | | | | | | | Switch several deque-based queues from `std::deque` to `eastl::deque` to reduce per-element heap allocation overhead. MSVC's `std::deque` allocates one node per element for anything larger than ~16 bytes; `eastl::deque` groups 4, 8, or 32 elements per block depending on element size. Converted call sites: - `BlockingQueue` and `WorkerThreadPool` (generic — downstream callers benefit automatically) - Session log entry buffer (~10k-entry ring of large log records — 4 per block vs 1) - Job queue (`Ref<Job>` — 32 per block vs 2) - RPC recording request queue (large `QueuedRequest` struct — 4 per block vs 1) - StatsD client message queues (~32-byte buffers — 8 per block vs 1)
* zen history command (#987)Dan Engelbrecht2026-04-202-6/+30
| | | | | | | | | - Feature: Per-user invocation history for `zen` and `zenserver`; each startup appends a record to a JSONL file capped at the most recent 100 entries. Location: `%LOCALAPPDATA%\Epic\Zen\History\invocations.jsonl` on Windows, `~/.zen/History/invocations.jsonl` on POSIX - `zen history` opens an interactive picker; selecting a zen row re-runs it inline and forwards the exit code, selecting a zenserver row spawns it detached - `zen history --list` (`-l`) prints the table to stdout instead of showing the picker - `zen history --filter zen|zenserver` restricts the listing to one executable - `zen history --print` prints the reconstructed command line of the selected row instead of launching it - `--enable-execution-history` global option on both binaries (default `true`) to opt out per invocation - The history file is attached to Sentry crash reports (alongside the existing zenserver log)
* add --pid och --executable till zen down command (#988)Dan Engelbrecht2026-04-202-1/+10
|
* added support for trace regions (#984)Stefan Boberg2026-04-201-0/+49
| | | | | | | | | - Introduces a UE-trace Region primitive in `zencore/trace.{h,cpp}` for marking named, potentially long-running intervals of work that Unreal Insights render as banners in the timeline, separately from CPU scopes. - New API: - `uint64_t TraceBeginRegion(RegionName, Category={})` / `void TraceEndRegion(RegionId)` for manual begin/end pairs. - `ScopedTraceRegion` RAII helper plus `ZEN_TRACE_REGION(name)` / `ZEN_TRACE_REGION_CAT(name, category)` macros for scope-based use. - Emits the `Misc.RegionBeginWithId` / `Misc.RegionEndWithId` trace events (paired by a `GetHifreqTimerValue()`-derived id). - Full no-op fallback under `#if !ZEN_WITH_TRACE` so callers compile in all configurations. - Annotates `GcScheduler::CollectGarbage` with `ZEN_TRACE_REGION_CAT("GcScheduler::CollectGarbage", "gc")` as a first caller — makes GC passes visible as banners in Insights without relying on the existing `ZEN_TRACE_CPU` scope alone (which doesn't render as a region).
* zencore: CreateProc stdin pipes + BuildArgV quote stripping (#983)Stefan Boberg2026-04-201-0/+37
| | | | | | | | | | | | | | | | Two related improvements to `CreateProc`: ### 1. Stdin pipe support - Adds `StdinPipeHandles` + `CreateStdinPipe` alongside the existing `StdoutPipeHandles`, letting callers feed data into a child process's stdin. - Platform-agnostic RAII (Windows `HANDLE` pair / POSIX `pipe()` fd pair) with the same semantics as the stdout pipe: the inherited end goes to the child, the non-inherited end stays with the parent, destructor closes both. - `CreateProcOptions` gains a `StdinPipe*` field. - On Windows, `CreateProcNormal` is reworked so stdin/stdout redirection handles all combinations (stdin + stdout, each alone, neither) uniformly. POSIX already supported arbitrary fd redirection and just needed to honor the new option. - `zentest-appstub` gains a `-stdin_echo` mode that reads stdin to EOF and echoes it back (switching to binary mode on Windows so CRLF translation doesn't mangle bytes). - `zenserver-test` gets a `server.process` / `stdin_pipe.*` test group that exercises launching a child with a stdin pipe, writing, closing the write end, and reading back the echoed data. ### 2. Shell-style quote stripping in `BuildArgV` - Callers that build a single command-line string for `CreateProc` commonly wrap spacey paths in double quotes (e.g. `--tracefile="$path"`). The old `BuildArgV` only used quotes to suppress space-splitting and left the characters in the resulting argv element, so the spawned process saw literal `--tracefile="..."` and the value parser failed to open the quoted path. - `BuildArgV` now compacts in place, dropping quote chars as it goes, matching shell semantics for paired double quotes.
* zencore: promote ScopedEnvVar to a shared filesystem helper (#979)Stefan Boberg2026-04-201-0/+19
| | | | | - Moves the RAII `ScopedEnvVar` helper out of `hydration.cpp`'s anonymous test namespace and into `zencore/filesystem.{h,cpp}` next to `GetEnvVariable` so it can be reused by other subsystems. - Makes the class non-copyable/non-movable and moves its members to `private`.
* zenbase hardening (#971)Stefan Boberg2026-04-172-13/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A series of correctness and API hygiene fixes to the intrusive refcount primitives in `zenbase`, culminating in the removal of `RefPtr<T>` in favour of a single unified `Ref<T>` smart pointer. The changes are motivated by two pieces of latent UB sitting under every `Ref<T>` / `TRefCounted<T>` in the codebase, plus a handful of API footguns on the smart-pointer side (silent raw-pointer decay, missing converting moves, unconstrained conversions from unrelated types). ## Correctness fixes - **Strict-aliasing UB in atomic helpers** — `AtomicIncrement`/`Decrement`/`Add` took a `volatile uint32_t&` and reinterpret-cast it to `std::atomic<T>*`. The object was never constructed as a `std::atomic`, so the access was type-punning UB. Fixed by changing `m_RefCount` to `std::atomic<uint32_t>` directly in `RefCounted`, `TRefCounted<T>` and `IoBufferCore`. The helpers (and `zenbase/atomic.h`) are later removed entirely — the three callers now invoke `fetch_add`/`fetch_sub` directly. - **const_cast of non-mutable member** — `AddRef()` / `Release()` are `const` but mutated `m_RefCount` via `const_cast`. Since `m_RefCount` wasn't `mutable`, writing through the cast was UB for any `const`-qualified holder (e.g. a `static const` refcounted singleton). Fixed by marking `m_RefCount` `mutable` and dropping the `const_cast` in `AddRef`/`Release`. - **Public non-virtual `TRefCounted` destructor** — allowed `delete basePtr;` to slice past the CRTP `DeleteThis()` contract. Moved to `protected`. ## Memory-ordering cleanup - `AddRef` weakened from seq_cst to **relaxed** (a thread can only take a new reference via one it already holds; nothing needs to synchronize). - `Release` weakened from seq_cst to **acq_rel** (sufficient to order prior writes before the destructor, and make the decrement visible to observers). - Diagnostic `RefCount()` / `GetRefCount()` reads made **relaxed** and spelled out as explicit `.load()` — the returned value is stale the moment it's observed, so stronger ordering gives no guarantee. - No-op on x86 (`lock xadd` either way), but removes a full barrier on every `Ref<T>` copy on ARM64 (Apple silicon / Windows-on-ARM). ## `RefPtr` / `Ref` unification Before this branch, `RefPtr<T>` and `Ref<T>` were subtly different in ways that made the safer of the two (`Ref`) harder to use and the looser one (`RefPtr`) dangerous: - `RefPtr::operator T*()` was implicit — `delete refPtr;` compiled silently (double-delete), and the raw pointer could outlive the temporary `RefPtr` it was extracted from. Made `explicit`, then removed entirely once call sites were migrated to `.Get()`. - `RefPtr(T*)` was implicit while `RefPtr(RefPtr<Derived>&&)` was `explicit` — exactly the opposite of the safety intent. Reversed. - `RefPtr`'s converting move was unconstrained (any `RefPtr<U>` with an implicitly-convertible `U*` satisfied it, including `void*` and multiple-inheritance base offsets). Added a `DerivedFrom<U, T>` constraint matching `Ref<T>`. - `Ref<T>` was missing a converting move ctor / move-assignment from `Ref<Derived>` — upcasts of rvalues were going through `AddRef`+`Release` instead of a pointer steal. Added. - `Release()` and the non-move smart-pointer ops were not `noexcept`, despite being so in practice. Marked `noexcept` throughout. After all of the above, the two types were functionally identical. The final commit deletes `RefPtr` and rewrites the ~10 consumer files to use `Ref`.
* log cleanup (#969)Dan Engelbrecht2026-04-171-0/+5
| | | | - Improvement: New `ZEN_SCOPED_LOG(Expr)` macro routes `ZEN_INFO`/`ZEN_WARN`/`ZEN_DEBUG` in the enclosing block through the given logger expression instead of the default - Improvement: `BuildContainer`, `SaveOplog`, and `LoadOplogContext` now take a caller-provided `LoggerRef` so diagnostic messages route through the caller's logger
* Add reduce-allocs skill and string builder infrastructure (#937)Stefan Boberg2026-04-164-29/+58
| | | | | | | | Adds infrastructure for reducing short-lived heap allocations, to be applied across the codebase in follow-up PRs. - **`reduce-allocs` Claude Code skill** — reviews code for unnecessary heap allocations and suggests fixes using stack-friendly patterns (`ExtendableStringBuilder`, `eastl::fixed_vector`, `TRefCounted`, etc.) - **`TransparentStringHash`** (`zencore/hashutils.h`) — enables `std::string_view` lookups on `std::string`-keyed `unordered_map` without allocating a temporary string (C++20 heterogeneous lookup via `is_transparent`) - **`AppendPaddedInt()`** and **`AppendFill()`** on `StringBuilderBase` (`zencore/string.h`) — zero-padded integer formatting and repeated-character fills without going through `fmt::format` - **`StringBuilderAppender`** output iterator adapter — allows `fmt::format_to` to write directly into a `StringBuilderBase`
* fix utf characters in source code (#953)Dan Engelbrecht2026-04-135-18/+18
|
* Some minor polish from tourist branch (#949)Stefan Boberg2026-04-135-13/+55
| | | | | | | | | | - Replace per-type fmt::formatter specializations (StringBuilderBase, NiceBase) with a single generic formatter using a HasStringViewConversion concept - Add ThousandsNum for comma-separated integer formatting (e.g. "1,234,567") - Thread naming now accepts a sort hint for trace ordering - Fix main thread trace registration to use actual thread ID and sort first - Add ExpandEnvironmentVariables() for expanding %VAR% references in strings, with tests - Add ParseHexBytes() overload with expected byte count validation - Add Flag_BelowNormalPriority to CreateProcOptions (BELOW_NORMAL_PRIORITY_CLASS on Windows, setpriority on POSIX) - Add PrettyScroll progress bar mode that pins the status line to the bottom of the terminal using scroll regions, with signal handler cleanup for Ctrl+C/SIGTERM
* Logging and diagnostics improvements (#941)Stefan Boberg2026-04-135-57/+72
| | | | | | | | | | | | | | | | Core logging and system diagnostics improvements, extracted from the compute branch. ### Logging - **Elapsed timestamps**: Console log now shows elapsed time since launch `[HH:MM:SS.mmm]` instead of full date/time; file logging is unchanged - **Short level names**: 3-letter short level names (`trc`/`dbg`/`inf`/`wrn`/`err`/`crt`) used by both console and file formatters via `ShortToStringView()` - **Consistent field order**: Standardized to `[timestamp] [level] [logger]` across both console and file formatters - **Slim LogMessage/LogPoint**: Remove redundant fields from `LogMessage` (derive level/source from `LogPoint`), flatten `LogPoint` to inline filename/line fields, shrink `LogLevel` to `int8_t` with `static_assert(sizeof(LogPoint) <= 32)` - **Remove default member initializers** and static default `LogPoint` from `LogMessage` — all fields initialized by constructor - **LoggerRef string constructor**: Convenience constructor accepting a string directly - **Fix SendMessage macro collision**: Replace `thread.h` include in `logmsg.h` with a forward declaration of `GetCurrentThreadId()` to avoid pulling in `windows.h` transitively ### System Diagnostics - **Cache static system metrics**: Add `RefreshDynamicSystemMetrics()` that only queries values that change at runtime (available memory, uptime, swap). `SystemMetricsTracker` snapshots full `GetSystemMetrics()` once at construction and reuses cached topology/total memory on each `Query()`, avoiding repeated `GetLogicalProcessorInformationEx` traversal on Windows, `/proc/cpuinfo` parsing on Linux, and `sysctl` topology calls on macOS
* fix fork() issues on linux and MacOS (#910)Dan Engelbrecht2026-04-011-6/+8
| | | | | - Improvement: Hub child process spawning on macOS now uses `posix_spawn` in line with Apple recommendations - Bugfix: Hub child process spawning on Linux now uses `vfork` instead of `fork`, preventing ENOMEM failures on systems with strict memory overcommit (`vm.overcommit_memory=2`) - Bugfix: Fixed process group management on POSIX; child processes were not placed into the correct process group, breaking group-wide signal delivery
* Request validation and resilience improvements (#864)Stefan Boberg2026-03-302-5/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ### Security: Input validation & path safety - **Reject local file references by default** in package parsing — only allow when explicitly opted in by the service (`ParseFlags::kAllowLocalReferences`) and validated by an `ILocalRefPolicy` (fail-closed: no policy = rejected) - **`DataRootLocalRefPolicy`** restricts local ref paths to the server's data root via canonical path prefix matching - **Validate attachment hashes** in compute HTTP handlers — decompresses and re-hashes each attachment at ingestion time to reject tampered payloads - **Path traversal validation** for worker descriptions (`pathvalidation.h`) — rejects absolute paths, `..` components, Windows reserved device names, and invalid filename characters - **Harden CbPackage parsing** against corrupt inputs — overflow-safe attachment count, bounds checks on local ref offset/size, graceful failure instead of `ZEN_ASSERT` for untrusted data - **Harden legacy package parser** — reject zero-size binary fields, missing mappers, and optionally validate resolved attachment hashes - **Bounds check in `CbPackageReader::MarshalLocalChunkReference`** — detect when `MakeFromFile` silently clamps offset+size to file size ### Reliability: Lock consolidation & bug fixes - **Consolidate three action map locks into one** (`m_ActionMapLock`) — eliminates deadlock risk from multi-lock ordering, simplifies state transitions, and fixes a race where newly enqueued actions were briefly invisible to `GetActionResult`/`FindActionResult` - **Fix infinite loop in `BaseRunnerGroup::SubmitActions`** when actions exceed total runner capacity — cap round-robin at `TotalCapacity` and default unassigned results to "No capacity" - **Fix `MakeSafeAbsolutePathInPlace` for UNC paths** — `\server\share` now correctly becomes `\?\UNC\server\share` instead of `\?\server\share` - **Fix `max_retries=0`** — previously fell through to the default of 3; now correctly means "no retries" ### New: ManagedProcessRunner - Cross-platform process runner backed by `SubprocessManager` — uses async exit callbacks instead of polling, delegates CPU/memory metrics to the manager's built-in sampler - `ProcessGroup` (JobObject on Windows, process group on POSIX) for bulk cancellation on shutdown - `--managed` flag on `zen exec inproc` to select this runner - Refactored monitor thread lifecycle — `StartMonitorThread()` now called from derived constructors to avoid calling virtual functions from base constructor ### Process management - **Suppress crash dialogs** via `JOB_OBJECT_UILIMIT_ERRORMODE` + `SEM_NOGPFAULTERRORBOX` in both `WindowsProcessRunner` and `JobObject::Initialize` — prevents WER/Dr. Watson modal dialogs from blocking the monitor thread - **CREATE_SUSPENDED → AssignProcessToJobObject → ResumeThread** pattern in `WindowsProcessRunner` — ensures job object assignment before process execution - **Move stdout/stderr callbacks to `Spawn()` parameters** in `SubprocessManager` — prevents race where early output could be missed before callback installation - Consistent PID logging across all runner types ### Test infrastructure - **`zentest-appstub`**: Added `Fail` (configurable exit code) and `Crash` (abort / nullptr deref) test functions - **Compute integration tests**: exit code handling, auto-retry exhaustion, manual reschedule after failure, mixed success/failure queues, crash handling (abort + nullptr), crash auto-retry, immediate query visibility after enqueue - **Package format tests**: truncated header, bad magic, attachment count overflow, truncated data, local ref rejection/acceptance, policy enforcement (inside/outside root, traversal, no-policy fail-closed) - **Legacy package parser tests**: empty input, zero-size binary, hash resolution with/without mapper, hash mismatch detection - **UNC path tests** for `MakeSafeAbsolutePath` ### Misc - ANSI color helper macros (`ZEN_RED`, `ZEN_BRIGHT_WHITE`, etc.) and `ZEN_BOLD`/`ZEN_DIM`/etc. - Generic `fmt::formatter` for types with free `ToString` functions - Compute dashboard: truncated hash display with monospace font and hover for full value - Renamed `usonpackage_forcelink` → `cbpackage_forcelink` - Compute enabled by default in xmake config (releases still explicitly disable)
* Misc small fixes (#897)Stefan Boberg2026-03-271-0/+23
| | | | | | | | | | - **Eliminate `<regex>` usage** — Replaced `std::regex`-based URL parsing in `jupiterbuildstorage.cpp` with manual `string_view` parsing. Added `CXXOPTS_NO_REGEX` to disable regex in cxxopts. Includes comprehensive tests for the new URL parser. - **Add missing HTTP response codes** — Added `102`, `103`, `203`, `207`, `208`, `226`, `306`, `421`, `425`, `451` to the enum and reason string lookup. - **Add `ForceColor` support to zen CLI** — Plumbed the `ForceColor` logging option through to the zen client. - **Add `.clangd` config** — Strips MSVC-specific flags clangd can't handle and suppresses noisy clang-tidy checks. - **Generic `fmt::formatter` for `ToString`** — Concept-based formatter that auto-formats any type with a free `ToString()` function, removing the need for per-type specializations. - **Fix OpenSSL dependency** — Changed `zenhorde` to use `openssl3` package on Linux/macOS. - **Add `<cmath>` include** — Missing include in `hyperloglog.h`. - **GCC compile fix** — Moved `static constinit` variable inside lambda in `logging.cpp`.
* hub async provision/deprovision/hibernate/wake (#891)Dan Engelbrecht2026-03-241-0/+7
| | | | | - Improvement: Hub provision, deprovision, hibernate, and wake operations are now async. HTTP requests returns 202 Accepted while the operation completes in the background - Improvement: Hub returns 202 Accepted (instead of 409 Conflict) when the same async operation is already in progress for a module - Improvement: Hub returns 200 OK when a requested state transition is already satisfied
* Subprocess Manager (#889)Stefan Boberg2026-03-241-8/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | Adds a `SubprocessManager` for managing child processes with ASIO-integrated async exit detection, stdout/stderr pipe capture, and periodic metrics sampling. Also introduces `ProcessGroup` for OS-backed process grouping (Windows JobObjects / POSIX process groups). ### SubprocessManager - Async process exit detection using platform-native mechanisms (Windows `object_handle`, Linux `pidfd_open`, macOS `kqueue EVFILT_PROC`) — no polling - Stdout/stderr capture via async pipe readers with per-process or default callbacks - Periodic round-robin metrics sampling (CPU, memory) across managed processes - Spawn, adopt, remove, kill, and enumerate managed processes ### ProcessGroup - OS-level process grouping: Windows JobObject (kill-on-close guarantee), POSIX `setpgid` (bulk signal delivery) - Atomic group kill via `TerminateJobObject` (Windows) or `kill(-pgid, sig)` (POSIX) - Per-group aggregate metrics and enumeration ### ProcessHandle improvements - Added explicit constructors from `int` (pid) and `void*` (native handle) - Added move constructor and move assignment operator ### ProcessMetricsTracker - Cross-platform process metrics (CPU time, working set, page faults) via `QueryProcessMetrics()` - ASIO timer-driven periodic sampling with configurable interval and batch size - Aggregate metrics across tracked processes ### Other changes - Fixed `zentest-appstub` writing a spurious `Versions` file to cwd on every invocation
* Cross-platform process metrics support (#887)Stefan Boberg2026-03-232-21/+72
| | | | | | | - **Cross-platform `GetProcessMetrics`**: Implement Linux (`/proc/{pid}/stat`, `/proc/{pid}/statm`, `/proc/{pid}/status`) and macOS (`proc_pidinfo(PROC_PIDTASKINFO)`) support for CPU times and memory metrics. Fix Windows to populate the `MemoryBytes` field (was always 0). All platforms now set `MemoryBytes = WorkingSetSize`. - **`ProcessMetricsTracker`**: Experimental utility class (`zenutil`) that periodically samples resource usage for a set of tracked child processes. Supports both a dedicated background thread and an ASIO steady_timer mode. Computes delta-based CPU usage percentage across samples, with batched sampling (8 processes per tick) to limit per-cycle overhead. - **`ProcessHandle` documentation**: Add Doxygen comments to all public methods describing platform-specific behavior. - **Cleanup**: Remove unused `ZEN_RUN_TESTS` macro (inlined at its single call site in `zenserver/main.cpp`), remove dead `#if 0` thread-shutdown workaround block. - **Minor fixes**: Use `HttpClientAccessToken` constructor in hordeclient instead of setting private members directly. Log ASIO version at startup and include it in the server settings list.
* Dashboard refresh (logs, storage, network, object store, docs) (#835)Stefan Boberg2026-03-231-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ## Summary This PR adds a session management service, several new dashboard pages, and a number of infrastructure improvements. ### Sessions Service - `SessionsServiceClient` in `zenutil` announces sessions to a remote zenserver with a 15s heartbeat (POST/PUT/DELETE lifecycle) - Storage server registers itself with its own local sessions service on startup - Session mode attribute coupled to server mode (Compute, Proxy, Hub, etc.) - Ended sessions tracked with `ended_at` timestamp; status filtering (Active/Ended/All) - `--sessions-url` config option for remote session announcement - In-process log sink (`InProcSessionLogSink`) forwards server log output to the server's own session, visible in the dashboard ### Session Log Viewer - POST/GET endpoints for session logs (`/sessions/{id}/log`) supporting raw text and structured JSON/CbObject with batch `entries` array - In-memory log storage per session (capped at 10k entries) with cursor-based pagination for efficient incremental fetching - Log panel in the sessions dashboard with incremental DOM updates, auto-scroll (Follow toggle), newest-first toggle, text filter, and log-level coloring - Auto-selects the server's own session on page load ### TCP Log Streaming - `LogStreamListener` and `TcpLogStreamSink` for log delivery over TCP - Sequence numbers on each message with drop detection and synthetic "dropped" notice on gaps - Gathered buffer writes to reduce syscall overhead when flushing batches - Tests covering basic delivery, multi-line splitting, drop detection, and sequencing ### New Dashboard Pages - **Sessions**: master-detail layout with selectable rows, metadata panel, live WebSocket updates, paging, abbreviated date formatting, and "this" pill for the local session - **Object Store**: summary stats tiles and bucket table with click-to-expand inline object listing (`GET /obj/`) - **Storage**: per-volume disk usage breakdown (`GET /admin/storage`), Garbage Collection status section (next-run countdown, last-run stats), and GC History table with paginated rows and expandable detail panels - **Network**: overview tiles, per-service request table, proxy connections, and live WebSocket updates; distinct client IPs and session counts via HyperLogLog ### Documentation Page - In-dashboard Docs page with sidebar navigation, markdown rendering (via `marked`), Mermaid diagram support (theme-aware), collapsible sections, text filtering with highlighting, and cross-document linking - New user-facing docs: `overview.md` (with architecture and per-mode diagrams), `sessions.md`, `cache.md`, `projects.md`; updated `compute.md` - Dev docs moved to `docs/dev/` ### Infrastructure & Bug Fixes - **Deflate compression** for the embedded frontend zip (~3.4MB → ~950KB); zlib inflate support added to `ZipFs` with cached decompressed buffers - **Local IP addresses**: `GetLocalIpAddresses()` (Windows via `GetAdaptersAddresses`, Linux/Mac via `getifaddrs`); surfaced in `/status/status`, `/health/info`, and the dashboard banner - **Dashboard nav**: unified into `zen-nav` web component with `MutationObserver` for dynamically added links, CSS `::part()` to merge banner/nav border radii, and prefix-based active link detection - Stats broadcast refactored from manual JSON string concatenation to `CbObjectWriter`; `CbObject`-to-JS conversion improved for `TimeSpan`, `DateTime`, and large integers - Stats WebSocket boilerplate consolidated into `ZenPage.connect_stats_ws()`
* Logger simplification (#883)Stefan Boberg2026-03-233-13/+89
| | | | | | | | | | | - **`Logger` now holds a single `SinkPtr`** instead of a `std::vector<SinkPtr>`. The `SetSinks`/`AddSink` API is replaced with a single `SetSink`. This removes complexity from `Logger` itself and makes `Clone()` cheaper (no vector copy). - **New `BroadcastSink`** (`zencore/logging/broadcastsink.h`) acts as a thread-safe, shared indirection point that fans out to a dynamic list of child sinks. Adding or removing a child sink via `AddSink`/`RemoveSink` is immediately visible to every `Logger` that holds a reference to it — including cloned loggers — without requiring each logger to be updated individually. - **`GetDefaultBroadcastSink()`** (exposed from `zenutil/logging.h`) gives server-layer code access to the shared broadcast sink so it can register optional sinks (OTel, TCP log stream) after logging is initialized, without going through `Default()->AddSink()`. ### Motivation Previously, dynamically adding sinks post-initialization mutated the default logger's internal sink vector directly. This was fragile: cloned loggers (created before `AddSink` was called) would not pick up the new sinks. `BroadcastSink` fixes this by making the sink list a shared, mutable object that all loggers sharing the same broadcast instance observe uniformly.
* Upgrade mimalloc to v2.2.7 and log active memory allocator (#876)Stefan Boberg2026-03-225-25/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Upgrade mimalloc from v2.1.2 to v2.2.7. Note that mimalloc is no longer the default allocator so this only impacts users who somehow opt into mimalloc via `--malloc=mimalloc` or compile with different defaults - Add all available mimalloc versions (1.6.7–3.2.8) to the package definition for testing - Log the active memory allocator (with version where available) at server startup - Annotate vendored rpmalloc with its source commit and version ## Notable changes in mimalloc 2.1.2 → 2.2.7 - **Memory release fix** (2.2.4): fix case where OS memory was not always fully released - **Race condition fix** (2.2.6): fixed rare race condition and potential buffer overflow in debug statistics - **Windows arm64 support** (2.1.9) - **Guarded build** (2.1.9): new build mode that places OS guard pages behind objects to catch buffer overflows - **THP awareness** (2.2.6): auto-detects transparent huge pages and adjusts purge size to avoid fragmentation - **Faster TLS access on Windows** (2.2.6) - **Improved calloc and aligned allocation performance** (2.2.6) - **New diagnostic APIs** (2.2.2): `mi_options_print`, `mi_arenas_print`, `mi_stat_get` / `mi_stat_get_json` - **macOS**: use `MADV_FREE_REUSABLE` for better memory behavior (2.2.4) - **Build fixes**: Android, Xbox, musl, mingw, arm32, Debian 32-bit, non-BMI1 x64 systems ## Allocator logging Added `FMalloc::GetName()` pure virtual so the server logs which allocator is active at startup: ``` zenserver - memory allocator: mimalloc 2.2.7 ``` Allocator names include version where available: - `mimalloc 2.2.7` (runtime version via `mi_version()`) - `rpmalloc 1.5.0-dev.20250810` (ad-hoc version from vendored develop branch commit) - `ansi`, `stomp` (no version info available) ## Test plan - [x] Builds successfully on Windows (release) - [x] Verify server startup log shows allocator name - [x] Test with `--malloc=mimalloc` (default) and `--malloc=rpmalloc` - [x] Run test suites to check for regressions
* Interprocess pipe support (for stdout/stderr capture) (#866)Stefan Boberg2026-03-212-0/+39
| | | | | | | | | | | | | | | | | - **RAII pipe handles for child process stdout/stderr capture**: `StdoutPipeHandles` is now a proper RAII type with automatic cleanup, move semantics, and partial close support. This makes it safe to use pipes for capturing child process output without risking handle/fd leaks. - **Optional separate stderr pipe**: `CreateProcOptions` now accepts a `StderrPipe` field so callers can capture stdout and stderr independently. When null (default), stderr shares the stdout pipe as before. - **LogStreamListener with pluggable handler**: The TCP log stream listener accepts connections from remote processes and delivers parsed log lines through a `LogStreamHandler` interface, set dynamically via `SetHandler()`. This allows any client to receive log messages without depending on a specific console implementation. - **TcpLogStreamSink for zen::logging**: A logging sink that forwards log messages to a `LogStreamListener` over TCP, using the native `zen::logging::Sink` infrastructure with proper thread-safe synchronization. - **Reliable child process exit codes on Linux**: `waitpid` result handling is fixed so `ProcessHandle::GetExitCode()` returns the real exit code. `ProcessHandle::Reset()` reaps zombies directly, replacing the global `IgnoreChildSignals()` which prevented exit code collection entirely. Also fixes a TOCTOU race in `ProcessHandle::Wait()` on Linux/Mac. - **Pipe capture test suite**: Tests covering stdout/stderr capture via pipes (both shared and separate modes), RAII cleanup, move semantics, and exit code propagation using `zentest-appstub` as the child process. - **Service command integration tests**: Shell-based integration tests for `zen service` covering the full lifecycle (install, status, start, stop, uninstall) on all three platforms — Linux (systemd), macOS (launchd), and Windows (SCM via PowerShell). - **Test script reorganization**: Platform-specific test scripts moved from `scripts/test_scripts/` into `scripts/test_linux/`, `test_mac/`, and `test_windows/`.
* add hub instance info (#869)Dan Engelbrecht2026-03-202-1/+3
| | | | | | | - Improvement: Hub module listing now includes per-instance process metrics (memory, CPU time, working set, pagefile usage) - Improvement: Hub now monitors provisioned instance health in the background and refreshes process metrics periodically - Improvement: Hub no longer exposes raw `StorageServerInstance` pointers to callers; instance state is returned as value snapshots (`Hub::InstanceInfo`) - Improvement: Hub instance access is now guarded by RAII per-instance locks (`SharedLockedPtr`/`ExclusiveLockedPtr`), preventing concurrent modifications during provisioning and deprovisioning - Improvement: Hub instance lifecycle is now tracked as a `HubInstanceState` enum covering transitional states (Provisioning, Deprovisioning, Hibernating, Waking); exposed as a string in the HTTP API and dashboard
* Add lightweight crash handler for pre-Sentry startup backtraces (#853)Stefan Boberg2026-03-181-0/+19
| | | | | | | | | | - Install a crash handler at the very top of main() in both zenserver and zen - On Windows, uses SetUnhandledExceptionFilter with StackWalk64 for accurate crash-site backtraces with DbgHelp symbol resolution - On Linux/Mac, uses sigaction with async-signal-safe backtrace output - Automatically superseded when Sentry/crashpad installs its own handlers - Stays active for the full process lifetime if Sentry is disabled or absent - Include .sym debug symbol files in Linux release bundle
* Pre-initialization of default logger (#859)Stefan Boberg2026-03-182-1/+2
| | | Improved workaround for troubles with code potentially logging before logging is initialized. Any logging will be routed to a default console logger until logging is initialized fully
* Add natvis for Compact Binary (#860)Devin Doucette2026-03-181-0/+94
| | | | | | | Add natvis for Compact Binary Includes natvis for DateTime, TimeSpan, IoHash, Guid, Oid. Based on UE CL 51830581.
* Compute batching (#849)Stefan Boberg2026-03-186-6/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ### Compute Batch Submission - Consolidate duplicated action submission logic in `httpcomputeservice` into a single `HandleSubmitAction` supporting both single-action and batch (actions array) payloads - Group actions by queue in `RemoteHttpRunner` and submit as batches with configurable chunk size, falling back to individual submission on failure - Extract shared helpers: `MakeErrorResult`, `ValidateQueueForEnqueue`, `ActivateActionInQueue`, `RemoveActionFromActiveMaps` ### Retracted Action State - Add `Retracted` state to `RunnerAction` for retry-free rescheduling — an explicit request to pull an action back and reschedule it on a different runner without incrementing `RetryCount` - Implement idempotent `RetractAction()` on `RunnerAction` and `ComputeServiceSession` - Add `POST jobs/{lsn}/retract` and `queues/{queueref}/jobs/{lsn}/retract` HTTP endpoints - Add state machine documentation and per-state comments to `RunnerAction` ### Compute Race Fixes - Fix race in `HandleActionUpdates` where actions enqueued between session abandon and scheduler tick were never abandoned, causing `GetActionResult` to return 202 indefinitely - Fix queue `ActiveCount` race where `NotifyQueueActionComplete` was called after releasing `m_ResultsLock`, allowing callers to observe stale counters immediately after `GetActionResult` returned OK ### Logging Optimization and ANSI improvements - Improve `AnsiColorStdoutSink` write efficiency — single write call, dirty-flag flush, `RwLock` instead of `std::mutex` - Move ANSI color emission from sink into formatters via `Formatter::SetColorEnabled()`; remove `ColorRangeStart`/`End` from `LogMessage` - Extract color helpers (`AnsiColorForLevel`, `StripAnsiSgrSequences`) into `helpers.h` - Strip upstream ANSI SGR escapes in non-color output mode. This enables colour in log messages without polluting log files with ANSI control sequences - Move `RotatingFileSink`, `JsonFormatter`, and `FullFormatter` from header-only to pimpl with `.cpp` files ### CLI / Exec Refactoring - Extract `ExecSessionRunner` class from ~920-line `ExecUsingSession` into focused methods and a `ExecSessionConfig` struct - Replace monolithic `ExecCommand` with subcommand-based architecture (`http`, `inproc`, `beacon`, `dump`, `buildlog`) - Allow parent options to appear after subcommand name by parsing subcommand args permissively and forwarding unmatched tokens to the parent parser ### Testing Improvements - Fix `--test-suite` filter being ignored due to accumulation with default wildcard filter - Add test suite banners to test listener output - Made `function.session.abandon_pending` test more robust ### Startup / Reliability Fixes - Fix silent exit when a second zenserver instance detects a port conflict — use `ZEN_CONSOLE_*` for log calls that precede `InitializeLogging()` - Fix two potential SIGSEGV paths during early startup: guard `sentry_options_new()` returning nullptr, and throw on `ZenServerState::Register()` returning nullptr instead of dereferencing - Fail on unrecognized zenserver `--mode` instead of silently defaulting to store ### Other - Show host details (hostname, platform, CPU count, memory) when discovering new compute workers - Move frontend `html.zip` from source tree into build directory - Add format specifications for Compact Binary and Compressed Buffer wire formats - Add `WriteCompactBinaryObject` to zencore - Extended `ConsoleTui` with additional functionality - Add `--vscode` option to `xmake sln` for clangd / `compile_commands.json` support - Disable compute/horde/nomad in release builds (not yet production-ready) - Disable unintended `ASIO_HAS_IO_URING` enablement - Fix crashpad patch missing leading whitespace - Clean up code triggering gcc false positives
* add sanitizer options to xmake (#847)v5.7.23-pre1v5.7.23-pre0Dan Engelbrecht2026-03-171-2/+9
| | | | | | - Improvement: Add easy access options for sanitizers with `xmake config` and `xmake test` as options - `--msan=[y|n]` Enable MemorySanitizer (Linux only, requires all deps instrumented) - `--asan=[y|n]` Enable AddressSanitizer (disables mimalloc and sentry) - `--tsan=[y|n]` Enable ThreadSanitizer (Linux/Mac only)
* Enable cross compilation of Windows targets on Linux (#839)Stefan Boberg2026-03-161-1/+2
| | | | | | | This PR makes it *possible* to do a Windows build on Linux via `clang-cl`. It doesn't actually change any build process. No policy change, just mechanics and some code fixes to clear clang compilation. The code fixes are mainly related to #include file name casing, to match the on-disk casing of the SDK files (via xwin).
* URI decoding, process env, compiler info, httpasio strands, regex route ↵Stefan Boberg2026-03-162-0/+10
| | | | | | | | | | | | | | | | | removal (#841) - Percent-decode URIs in ASIO HTTP server to match http.sys CookedUrl behavior, ensuring consistent decoded paths across backends - Add Environment field to CreateProcOptions for passing extra env vars to child processes (Windows: merged into Unicode environment block; Unix: setenv in fork) - Add GetCompilerName() and include it in build options startup logging - Suppress Windows CRT error dialogs in test harness for headless/CI runs - Fix mimalloc package: pass CMAKE_BUILD_TYPE, skip cfuncs test for cross-compile - Add virtual destructor to SentryAssertImpl to fix debug-mode warning - Simplify object store path handling now that URIs arrive pre-decoded - Add URI decoding test coverage for percent-encoded paths and query params - Simplify httpasio request handling by using strands (guarantees no parallel handlers per connection) - Removed deprecated regex-based route matching support - Fix full GC never triggering after cross-toolchain builds: The `gc_state` file stores `system_clock` ticks, but the tick resolution differs between toolchains (nanoseconds on GCC/standard clang, microseconds on UE clang). A nanosecond timestamp misinterpreted as microseconds appears far in the future (~year 58,000), bypassing the staleness check and preventing time-based full GC from ever running. Fixed by also resetting when the stored timestamp is in the future. - Clamp GC countdown display to configured interval: Prevents nonsensical log output (e.g. "Full GC in 492128002h") caused by the above or any other clock anomaly. The clamp applies to both the scheduler log and the status API.
* block/file cloning support for macOS / Linux (#786)Stefan Boberg2026-03-161-0/+8
| | | | | | | | - Add block cloning (copy-on-write) support for Linux and macOS to complement the existing Windows (ReFS) implementation - **Linux**: `TryCloneFile` via `FICLONE` ioctl, `CloneQueryInterface` with range cloning via `FICLONERANGE` (Btrfs/XFS) - **macOS**: `TryCloneFile` via `clonefile()` syscall (APFS), `SupportsBlockRefCounting` via `VOL_CAP_INT_CLONE`. `CloneQueryInterface` is not implemented as macOS lacks a sub-file range clone API - Promote `ScopedFd` to file scope for broader use in filesystem code - Add test scripts for block cloning validation on Linux (Btrfs via loopback) and macOS (APFS) - Also added test script for testing on Windows (ReFS)
* Made CPR optional, html generated at build time (#840)Stefan Boberg2026-03-131-1/+1
| | | | | | | - Fix potential crash on startup caused by logging macros being invoked before the logging system is initialized (null logger dereference in `ZenServerState::Sweep()`). `LoggerRef::ShouldLog` now guards against a null logger pointer. - Make CPR an optional dependency (`--zencpr` build option, enabled by default) so builds can proceed without it - Make zenvfs Windows-only (platform-specific target) - Generate the frontend zip at build time from source HTML files instead of checking in a binary blob which would accumulate with every single update
* Add clang-cl build supportStefan Boberg2026-03-131-1/+1
| | | | | | | | | | - Add clang-cl warning suppressions in xmake.lua matching Linux/macOS set - Guard /experimental:c11atomics with {tools="cl"} for MSVC-only - Fix long long / int64_t redefinition in string.h for clang-cl - Fix unclosed namespace in callstacktrace.cpp #else branch - Fix missing override in httpplugin.cpp - Reorder WorkerPool fields to match designated initializer order - Use INVALID_SOCKET instead of SOCKET_ERROR for SOCKET comparisons
* Switch httpclient default back-end over to libcurl (#832)Stefan Boberg2026-03-131-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | Switches the default HTTP client to the libcurl-based backend and follows up with a series of correctness fixes and code quality improvements to `CurlHttpClient`. **Backend switch & build fixes:** - Switch default HTTP client to libcurl-based backend - Suppress `[[nodiscard]]` warning when building fmt - Miscellaneous bugfixes in HttpClient/libcurl - Pass `-y` to `xmake config` in `xmake test` task **Boilerplate reduction:** - Add `Session::SetHeaders()` for RAII ownership of `curl_slist`, eliminating manual `curl_slist_free_all` calls from every verb method - Add `Session::PerformWithResponseCallbacks()` to absorb the repeated 12-line write+header callback setup block - Extract `ParseHeaderLine()` shared helper, replacing 4 duplicate header-parsing implementations - Extract `BuildHeaderMap()` and `ApplyContentTypeFromHeaders()` helpers to deduplicate header-to-map conversion and Content-Type scanning - Unify the two `DoWithRetry` overloads (PayloadFile variant now delegates to the Validate variant) **Correctness fixes:** - `TransactPackage`: both phases now use `PerformWithResponseCallbacks()`, fixing missing abort support and a dead header collection loop - `TransactPackage`: error path now routes through `CommonResponse`, preserving curl error codes and messages for the caller - `ValidatePayload`: merged 3 separate header-scan loops into a single pass **Performance improvements:** - Replace `fmt::format` with `ExtendableStringBuilder` in `BuildHeaderList` and `BuildUrlWithParameters`, eliminating heap allocations in the common case - Replace `curl_easy_escape`/`curl_free` with inline URL percent-encoding using `AsciiSet` - Remove wasteful `CommonResponse(...)` construction in retry logging, formatting directly from `CurlResult` fields
* Transparent proxy mode (#823)Stefan Boberg2026-03-124-21/+66
| | | | | | | | | | | | | | | | | Adds a **transparent TCP proxy mode** to zenserver (activated via `zenserver proxy`), allowing it to sit between clients and upstream Zen servers to inspect and monitor HTTP/1.x traffic in real time. Primarily useful during development, to be able to observe multi-server/client interactions in one place. - **Dedicated proxy port** -- Proxy mode defaults to port 8118 with its own data directory to avoid collisions with a normal zenserver instance. - **TCP proxy core** (`src/zenserver/proxy/`) -- A new transparent TCP proxy that forwards connections to upstream targets, with support for both TCP/IP and Unix socket listeners. Multi-threaded I/O for connection handling. Supports Unix domain sockets for both upstream/downstream. - **HTTP traffic inspection** -- Parses HTTP/1.x request/response streams inline to extract method, path, status, content length, and WebSocket upgrades without breaking the proxied data. - **Proxy dashboard** -- A web UI showing live connection stats, per-target request counts, active connections, bytes transferred, and client IP/session ID rollups. - **Server mode display** -- Dashboard banner now shows the running server mode (Zen Proxy, Zen Compute, etc.). Supporting changes included in this branch: - **Wildcard log level matching** -- Log levels can now be set per-category using wildcard patterns (e.g. `proxy.*=debug`). - **`zen down --all`** -- New flag to shut down all running zenserver instances; also used by the new `xmake kill` task. - Minor test stability fixes (flaky hash collisions, per-thread RNG seeds). - Support ZEN_MALLOC environment variable for default allocator selection and switch default to rpmalloc - Fixed sentry-native build to allow LTO on Windows
* improved oplog import progress reporting (#825)Dan Engelbrecht2026-03-111-1/+6
|
* Dashboard overhaul, compute integration (#814)Stefan Boberg2026-03-093-0/+7
| | | | | | | | | | - **Frontend dashboard overhaul**: Unified compute/main dashboards into a single shared UI. Added new pages for cache, projects, metrics, sessions, info (build/runtime config, system stats). Added live-update via WebSockets with pause control, sortable detail tables, themed styling. Refactored compute/hub/orchestrator pages into modular JS. - **HTTP server fixes and stats**: Fixed http.sys local-only fallback when default port is in use, implemented root endpoint redirect for http.sys, fixed Linux/Mac port reuse. Added /stats endpoint exposing HTTP server metrics (bytes transferred, request rates). Added WebSocket stats tracking. - **OTEL/diagnostics hardening**: Improved OTLP HTTP exporter with better error handling and resilience. Extended diagnostics services configuration. - **Session management**: Added new sessions service with HTTP endpoints for registering, updating, querying, and removing sessions. Includes session log file support. This is still WIP. - **CLI subcommand support**: Added support for commands with subcommands in the zen CLI tool, with improved command dispatch. - **Misc**: Exposed CPU usage/hostname to frontend, fixed JS compact binary float32/float64 decoding, limited projects displayed on front page to 25 sorted by last access, added vscode:// link support. Also contains some fixes from TSAN analysis.
* added auto-detection logic for console colour output (#817)Stefan Boberg2026-03-091-1/+8
| | | | | | | | | | | | | | | | | | | Add auto-detection of colour support to `AnsicolourStdoutSink`. **New `colorMode` enum** (`On`, `Off`, `Auto`) added to the header, accepted by the `AnsicolorStdoutSink` constructor. Defaults to `Auto`, so all existing call sites are unaffected. **`Auto` mode detection logic** (in `IscolourTerminal()`): 1. **TTY check** -- if stdout is not a terminal, colour is disabled. 2. **`NO_COLOR`** -- respects the no-colour.org convention. If set, colour is disabled. 3. **`COLORTERM`** -- if set (e.g. `truecolour`, `24bit`), colour is enabled. 4. **`TERM`** -- rejects `dumb`; accepts known colour-capable terminals via substring match: `alacritty`, `ansi`, `colour`, `console`, `cygwin`, `gnome`, `konsole`, `kterm`, `linux`, `msys`, `putty`, `rxvt`, `screen`, `tmux`, `vt100`, `vt102`, `xterm`. Substring matching covers variants like `xterm-256color` and `rxvt-unicode`. 5. **Fallback** -- Windows defaults to colour enabled (modern console supports ANSI natively); other platforms default to disabled. When colour is disabled, ANSI escape sequences are omitted entirely from the output. NOTE: this doesn't currently apply to all paths which do logging in zen as they may be determining their colour output mode separately from `AnsicolorStdoutSink`.