path: root/src/zenutil
Commit message | Author | Age | Files | Lines
* s3 and consul fixes (#916) | Dan Engelbrecht | 2 days | 4 | -6/+279
  * fix endpoint for stats/hub in compute/hub.html page
  * fix api token call failure for imds (using wrong overload for Put)
  * add "localhost" to health check url in consul when no address is given
  * add consul fallback deregister if normal deregister fails
  * add consul registration unit test
* fix fork() issues on linux and MacOS (#910) | Dan Engelbrecht | 3 days | 3 | -3/+7
  - Improvement: Hub child process spawning on macOS now uses `posix_spawn` in line with Apple recommendations
  - Bugfix: Hub child process spawning on Linux now uses `vfork` instead of `fork`, preventing ENOMEM failures on systems with strict memory overcommit (`vm.overcommit_memory=2`)
  - Bugfix: Fixed process group management on POSIX; child processes were not placed into the correct process group, breaking group-wide signal delivery
* consul env token refresh (#912) | Dan Engelbrecht | 3 days | 2 | -6/+23
  - Improvement: Consul token is now re-read from the environment variable on every request, allowing token rotation without restarting the service
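The per-request re-read described in this entry could look roughly like the sketch below. `EnvTokenSource` and its injectable lookup are hypothetical names chosen for illustration; the actual zenutil code is not shown in this log and may be structured differently.

```cpp
#include <cstdlib>
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper (not the project's class): resolves a token from an
// environment variable every time it is asked, instead of caching the value.
class EnvTokenSource
{
public:
    using Lookup = std::function<const char*(const char*)>;

    // The lookup is injectable so the behaviour can be exercised without
    // touching the real process environment; it defaults to std::getenv.
    explicit EnvTokenSource(
        std::string VarName,
        Lookup      Fn = [](const char* Name) -> const char* { return std::getenv(Name); })
        : m_VarName(std::move(VarName)), m_Lookup(std::move(Fn))
    {
    }

    // Intended to be called once per outgoing request. Returns nullopt when
    // the variable is unset or empty.
    std::optional<std::string> Current() const
    {
        const char* Value = m_Lookup(m_VarName.c_str());
        if (Value == nullptr || *Value == '\0')
        {
            return std::nullopt;
        }
        return std::string(Value);
    }

private:
    std::string m_VarName;
    Lookup      m_Lookup;
};
```

A caller would construct one `EnvTokenSource("CONSUL_HTTP_TOKEN")` and call `Current()` while building each request, so a changed variable is picked up on the next call.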
* Request validation and resilience improvements (#864) | Stefan Boberg | 5 days | 3 | -35/+58
  ### Security: Input validation & path safety
  - **Reject local file references by default** in package parsing -- only allow when explicitly opted in by the service (`ParseFlags::kAllowLocalReferences`) and validated by an `ILocalRefPolicy` (fail-closed: no policy = rejected)
  - **`DataRootLocalRefPolicy`** restricts local ref paths to the server's data root via canonical path prefix matching
  - **Validate attachment hashes** in compute HTTP handlers -- decompresses and re-hashes each attachment at ingestion time to reject tampered payloads
  - **Path traversal validation** for worker descriptions (`pathvalidation.h`) -- rejects absolute paths, `..` components, Windows reserved device names, and invalid filename characters
  - **Harden CbPackage parsing** against corrupt inputs -- overflow-safe attachment count, bounds checks on local ref offset/size, graceful failure instead of `ZEN_ASSERT` for untrusted data
  - **Harden legacy package parser** -- reject zero-size binary fields, missing mappers, and optionally validate resolved attachment hashes
  - **Bounds check in `CbPackageReader::MarshalLocalChunkReference`** -- detect when `MakeFromFile` silently clamps offset+size to file size

  ### Reliability: Lock consolidation & bug fixes
  - **Consolidate three action map locks into one** (`m_ActionMapLock`) -- eliminates deadlock risk from multi-lock ordering, simplifies state transitions, and fixes a race where newly enqueued actions were briefly invisible to `GetActionResult`/`FindActionResult`
  - **Fix infinite loop in `BaseRunnerGroup::SubmitActions`** when actions exceed total runner capacity -- cap round-robin at `TotalCapacity` and default unassigned results to "No capacity"
  - **Fix `MakeSafeAbsolutePathInPlace` for UNC paths** -- `\\server\share` now correctly becomes `\\?\UNC\server\share` instead of `\\?\server\share`
  - **Fix `max_retries=0`** -- previously fell through to the default of 3; now correctly means "no retries"

  ### New: ManagedProcessRunner
  - Cross-platform process runner backed by `SubprocessManager` -- uses async exit callbacks instead of polling, delegates CPU/memory metrics to the manager's built-in sampler
  - `ProcessGroup` (JobObject on Windows, process group on POSIX) for bulk cancellation on shutdown
  - `--managed` flag on `zen exec inproc` to select this runner
  - Refactored monitor thread lifecycle -- `StartMonitorThread()` now called from derived constructors to avoid calling virtual functions from base constructor

  ### Process management
  - **Suppress crash dialogs** via `JOB_OBJECT_UILIMIT_ERRORMODE` + `SEM_NOGPFAULTERRORBOX` in both `WindowsProcessRunner` and `JobObject::Initialize` -- prevents WER/Dr. Watson modal dialogs from blocking the monitor thread
  - **CREATE_SUSPENDED → AssignProcessToJobObject → ResumeThread** pattern in `WindowsProcessRunner` -- ensures job object assignment before process execution
  - **Move stdout/stderr callbacks to `Spawn()` parameters** in `SubprocessManager` -- prevents race where early output could be missed before callback installation
  - Consistent PID logging across all runner types

  ### Test infrastructure
  - **`zentest-appstub`**: Added `Fail` (configurable exit code) and `Crash` (abort / nullptr deref) test functions
  - **Compute integration tests**: exit code handling, auto-retry exhaustion, manual reschedule after failure, mixed success/failure queues, crash handling (abort + nullptr), crash auto-retry, immediate query visibility after enqueue
  - **Package format tests**: truncated header, bad magic, attachment count overflow, truncated data, local ref rejection/acceptance, policy enforcement (inside/outside root, traversal, no-policy fail-closed)
  - **Legacy package parser tests**: empty input, zero-size binary, hash resolution with/without mapper, hash mismatch detection
  - **UNC path tests** for `MakeSafeAbsolutePath`

  ### Misc
  - ANSI color helper macros (`ZEN_RED`, `ZEN_BRIGHT_WHITE`, etc.) and `ZEN_BOLD`/`ZEN_DIM`/etc.
  - Generic `fmt::formatter` for types with free `ToString` functions
  - Compute dashboard: truncated hash display with monospace font and hover for full value
  - Renamed `usonpackage_forcelink` → `cbpackage_forcelink`
  - Compute enabled by default in xmake config (releases still explicitly disable)
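The path traversal rules listed under "Security" (no absolute paths, no `..` components, no Windows reserved device names, no invalid filename characters) can be sketched as a single predicate. This is an illustration only: `IsSafeRelativePath` and `IsReservedDeviceName` are hypothetical names, and the real checks in `pathvalidation.h` may be stricter or differ in detail.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical sketch, not the contents of pathvalidation.h.
inline bool IsReservedDeviceName(std::string_view Component)
{
    // Windows treats "NUL", "nul.txt", "COM1.log" etc. as devices, so only the
    // part before the first '.' is compared, case-insensitively.
    std::string Stem(Component.substr(0, Component.find('.')));
    std::transform(Stem.begin(), Stem.end(), Stem.begin(), [](unsigned char C) {
        return static_cast<char>(std::toupper(C));
    });
    const std::string_view StemView(Stem);

    static constexpr std::array<std::string_view, 4> kFixed = {"CON", "PRN", "AUX", "NUL"};
    if (std::find(kFixed.begin(), kFixed.end(), StemView) != kFixed.end())
    {
        return true;
    }
    // COM1..COM9 and LPT1..LPT9
    if (StemView.size() == 4 && (StemView.substr(0, 3) == "COM" || StemView.substr(0, 3) == "LPT"))
    {
        return StemView[3] >= '1' && StemView[3] <= '9';
    }
    return false;
}

inline bool IsSafeRelativePath(std::string_view Path)
{
    if (Path.empty())
    {
        return false;
    }
    // Absolute paths: POSIX root, rooted/UNC Windows path, or drive letter.
    if (Path.front() == '/' || Path.front() == '\\')
    {
        return false;
    }
    if (Path.size() >= 2 && std::isalpha(static_cast<unsigned char>(Path[0])) && Path[1] == ':')
    {
        return false;
    }

    constexpr std::string_view kInvalidChars = "<>:\"|?*";
    std::size_t                Start         = 0;
    while (Start <= Path.size())
    {
        std::size_t End = Path.find_first_of("/\\", Start);
        if (End == std::string_view::npos)
        {
            End = Path.size();
        }
        const std::string_view Component = Path.substr(Start, End - Start);

        if (Component == ".." || IsReservedDeviceName(Component))
        {
            return false;
        }
        for (unsigned char C : Component)
        {
            // Control characters and characters Windows forbids in file names.
            if (C < 0x20 || kInvalidChars.find(static_cast<char>(C)) != std::string_view::npos)
            {
                return false;
            }
        }
        Start = End + 1;
    }
    return true;
}
```

Checking each component separately, rather than searching the whole string for `..`, avoids rejecting legitimate names such as `notes..txt`.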
* hub s3 hydrate improvements (#902) | Dan Engelbrecht | 5 days | 2 | -15/+21
  - Feature: Added `--hub-hydration-target-config` option to specify the hydration target via a JSON config file (mutually exclusive with `--hub-hydration-target-spec`); supports `file` and `s3` types with structured settings

    ```json
    {
      "type": "file",
      "settings": {
        "path": "/path/to/hydration/storage"
      }
    }
    ```

    ```json
    {
      "type": "s3",
      "settings": {
        "uri": "s3://bucket[/prefix]",
        "region": "us-east-1",
        "endpoint": "http://localhost:9000",
        "path-style": true
      }
    }
    ```
  - Improvement: Hub hydration/dehydration skips the `.sentry-native` directory
  - Bugfix: Fixed `MakeSafeAbsolutePathInPlace` when a UNC prefix is present but path uses mixed delimiters
* hub resource limits (#900) | Dan Engelbrecht | 5 days | 1 | -0/+1
  - Feature: Hub dashboard now shows a Resources tile with disk and memory usage against configured limits
  - Feature: Hub module listing now shows state-change timestamps and duration for each instance
  - Improvement: Hub provisioning rejects new instances when disk or memory usage exceeds configurable thresholds; limits are disabled by default (0 = no limit)
    - `--hub-provision-disk-limit-bytes` - Reject provisioning when used disk exceeds this many bytes
    - `--hub-provision-disk-limit-percent` - Reject provisioning when used disk exceeds this percentage of total disk
    - `--hub-provision-memory-limit-bytes` - Reject provisioning when used memory exceeds this many bytes
    - `--hub-provision-memory-limit-percent` - Reject provisioning when used memory exceeds this percentage of total RAM
  - Improvement: Hub process metrics are now tracked atomically per active instance slot, eliminating per-query process handle lookups
  - Improvement: Hub, Build Store, and Workspaces service stats sections in the dashboard are now collapsible
  - Bugfix: Hub watchdog loop did not check `m_ShutdownFlag`, causing it to spin indefinitely on shutdown
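The "0 = no limit" convention for the byte and percentage thresholds can be illustrated with a small predicate. This is a sketch under assumptions: `ResourceLimit` and `ExceedsLimit` are hypothetical names, and the strict "greater than" comparison is inferred from the word "exceeds" in the option descriptions.

```cpp
#include <cstdint>

// Hypothetical sketch of a provisioning limit; not the hub's actual types.
struct ResourceLimit
{
    std::uint64_t LimitBytes   = 0;  // 0 = no byte limit
    std::uint32_t LimitPercent = 0;  // 0 = no percentage limit
};

// Returns true when provisioning should be rejected because usage is above
// either configured threshold. A zero threshold is treated as "disabled".
inline bool ExceedsLimit(const ResourceLimit& Limit, std::uint64_t UsedBytes, std::uint64_t TotalBytes)
{
    if (Limit.LimitBytes != 0 && UsedBytes > Limit.LimitBytes)
    {
        return true;
    }
    if (Limit.LimitPercent != 0 && TotalBytes != 0)
    {
        const double UsedPercent = 100.0 * static_cast<double>(UsedBytes) / static_cast<double>(TotalBytes);
        if (UsedPercent > static_cast<double>(Limit.LimitPercent))
        {
            return true;
        }
    }
    return false;
}
```

The same predicate would serve for both disk and memory, with `TotalBytes` being total disk or total RAM respectively.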
* reuse single MinIO instance across s3client integration test (#901) | Stefan Boberg | 5 days | 1 | -11/+9
  Replace doctest SUBCASEs with sequential scoped blocks so the MinIO server is spawned once and torn down via RAII at scope exit, instead of being restarted for every subcase re-entry. Fixes flaky CI on macOS caused by repeated MinIO process start/stop.
* remove CPR HTTP client backend (#894) | Stefan Boberg | 8 days | 1 | -0/+4
  CPR is no longer needed now that HttpClient has fully transitioned to raw libcurl. This removes the CPR library, its build integration, implementation files, and all conditional compilation guards, leaving curl as the sole HTTP client backend.
* hub instance state refactor (#892) | Dan Engelbrecht | 8 days | 2 | -8/+33
  - Improvement: Provisioning a hibernated instance now automatically wakes it instead of requiring an explicit wake call first
  - Improvement: Deprovisioning now accepts instances in Crashed or Hibernated states, not just Provisioned
  - Improvement: Added `--consul-health-interval-seconds` and `--consul-deregister-after-seconds` options to control Consul health check behavior (defaults: 10s and 30s)
  - Improvement: Consul registration now occurs when provisioning starts; health check intervals are applied once provisioning completes
* Subprocess Manager (#889) | Stefan Boberg | 11 days | 11 | -503/+2780
  Adds a `SubprocessManager` for managing child processes with ASIO-integrated async exit detection, stdout/stderr pipe capture, and periodic metrics sampling. Also introduces `ProcessGroup` for OS-backed process grouping (Windows JobObjects / POSIX process groups).

  ### SubprocessManager
  - Async process exit detection using platform-native mechanisms (Windows `object_handle`, Linux `pidfd_open`, macOS `kqueue EVFILT_PROC`) -- no polling
  - Stdout/stderr capture via async pipe readers with per-process or default callbacks
  - Periodic round-robin metrics sampling (CPU, memory) across managed processes
  - Spawn, adopt, remove, kill, and enumerate managed processes

  ### ProcessGroup
  - OS-level process grouping: Windows JobObject (kill-on-close guarantee), POSIX `setpgid` (bulk signal delivery)
  - Atomic group kill via `TerminateJobObject` (Windows) or `kill(-pgid, sig)` (POSIX)
  - Per-group aggregate metrics and enumeration

  ### ProcessHandle improvements
  - Added explicit constructors from `int` (pid) and `void*` (native handle)
  - Added move constructor and move assignment operator

  ### ProcessMetricsTracker
  - Cross-platform process metrics (CPU time, working set, page faults) via `QueryProcessMetrics()`
  - ASIO timer-driven periodic sampling with configurable interval and batch size
  - Aggregate metrics across tracked processes

  ### Other changes
  - Fixed `zentest-appstub` writing a spurious `Versions` file to cwd on every invocation
* Cross-platform process metrics support (#887) | Stefan Boberg | 12 days | 3 | -0/+499
  - **Cross-platform `GetProcessMetrics`**: Implement Linux (`/proc/{pid}/stat`, `/proc/{pid}/statm`, `/proc/{pid}/status`) and macOS (`proc_pidinfo(PROC_PIDTASKINFO)`) support for CPU times and memory metrics. Fix Windows to populate the `MemoryBytes` field (was always 0). All platforms now set `MemoryBytes = WorkingSetSize`.
  - **`ProcessMetricsTracker`**: Experimental utility class (`zenutil`) that periodically samples resource usage for a set of tracked child processes. Supports both a dedicated background thread and an ASIO steady_timer mode. Computes delta-based CPU usage percentage across samples, with batched sampling (8 processes per tick) to limit per-cycle overhead.
  - **`ProcessHandle` documentation**: Add Doxygen comments to all public methods describing platform-specific behavior.
  - **Cleanup**: Remove unused `ZEN_RUN_TESTS` macro (inlined at its single call site in `zenserver/main.cpp`), remove dead `#if 0` thread-shutdown workaround block.
  - **Minor fixes**: Use `HttpClientAccessToken` constructor in hordeclient instead of setting private members directly. Log ASIO version at startup and include it in the server settings list.
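A delta-based CPU usage percentage, as mentioned for `ProcessMetricsTracker`, is typically the CPU time consumed between two samples divided by the wall-clock time between them. The sketch below assumes nanosecond units and the hypothetical names `CpuSample` and `CpuUsagePercent`; the tracker's real fields and units are not given in this log.

```cpp
#include <cstdint>

// Hypothetical sample record; the real tracker's fields may differ.
struct CpuSample
{
    std::uint64_t CpuTimeNs  = 0;  // cumulative user + kernel CPU time
    std::uint64_t WallTimeNs = 0;  // monotonic timestamp when the sample was taken
};

// CPU usage between two samples as a percentage of one core.
// Values above 100 are possible for multi-threaded processes.
inline double CpuUsagePercent(const CpuSample& Prev, const CpuSample& Curr)
{
    // Guard against a zero/negative interval or a counter that went backwards
    // (for example after PID reuse); report no usage rather than garbage.
    if (Curr.WallTimeNs <= Prev.WallTimeNs || Curr.CpuTimeNs < Prev.CpuTimeNs)
    {
        return 0.0;
    }
    const double CpuDelta  = static_cast<double>(Curr.CpuTimeNs - Prev.CpuTimeNs);
    const double WallDelta = static_cast<double>(Curr.WallTimeNs - Prev.WallTimeNs);
    return 100.0 * CpuDelta / WallDelta;
}
```

Because only deltas are used, the first sample of a newly tracked process yields no percentage until a second sample arrives.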
* Dashboard refresh (logs, storage, network, object store, docs) (#835) | Stefan Boberg | 12 days | 6 | -97/+676
  ## Summary
  This PR adds a session management service, several new dashboard pages, and a number of infrastructure improvements.

  ### Sessions Service
  - `SessionsServiceClient` in `zenutil` announces sessions to a remote zenserver with a 15s heartbeat (POST/PUT/DELETE lifecycle)
  - Storage server registers itself with its own local sessions service on startup
  - Session mode attribute coupled to server mode (Compute, Proxy, Hub, etc.)
  - Ended sessions tracked with `ended_at` timestamp; status filtering (Active/Ended/All)
  - `--sessions-url` config option for remote session announcement
  - In-process log sink (`InProcSessionLogSink`) forwards server log output to the server's own session, visible in the dashboard

  ### Session Log Viewer
  - POST/GET endpoints for session logs (`/sessions/{id}/log`) supporting raw text and structured JSON/CbObject with batch `entries` array
  - In-memory log storage per session (capped at 10k entries) with cursor-based pagination for efficient incremental fetching
  - Log panel in the sessions dashboard with incremental DOM updates, auto-scroll (Follow toggle), newest-first toggle, text filter, and log-level coloring
  - Auto-selects the server's own session on page load

  ### TCP Log Streaming
  - `LogStreamListener` and `TcpLogStreamSink` for log delivery over TCP
  - Sequence numbers on each message with drop detection and synthetic "dropped" notice on gaps
  - Gathered buffer writes to reduce syscall overhead when flushing batches
  - Tests covering basic delivery, multi-line splitting, drop detection, and sequencing

  ### New Dashboard Pages
  - **Sessions**: master-detail layout with selectable rows, metadata panel, live WebSocket updates, paging, abbreviated date formatting, and "this" pill for the local session
  - **Object Store**: summary stats tiles and bucket table with click-to-expand inline object listing (`GET /obj/`)
  - **Storage**: per-volume disk usage breakdown (`GET /admin/storage`), Garbage Collection status section (next-run countdown, last-run stats), and GC History table with paginated rows and expandable detail panels
  - **Network**: overview tiles, per-service request table, proxy connections, and live WebSocket updates; distinct client IPs and session counts via HyperLogLog

  ### Documentation Page
  - In-dashboard Docs page with sidebar navigation, markdown rendering (via `marked`), Mermaid diagram support (theme-aware), collapsible sections, text filtering with highlighting, and cross-document linking
  - New user-facing docs: `overview.md` (with architecture and per-mode diagrams), `sessions.md`, `cache.md`, `projects.md`; updated `compute.md`
  - Dev docs moved to `docs/dev/`

  ### Infrastructure & Bug Fixes
  - **Deflate compression** for the embedded frontend zip (~3.4MB → ~950KB); zlib inflate support added to `ZipFs` with cached decompressed buffers
  - **Local IP addresses**: `GetLocalIpAddresses()` (Windows via `GetAdaptersAddresses`, Linux/Mac via `getifaddrs`); surfaced in `/status/status`, `/health/info`, and the dashboard banner
  - **Dashboard nav**: unified into `zen-nav` web component with `MutationObserver` for dynamically added links, CSS `::part()` to merge banner/nav border radii, and prefix-based active link detection
  - Stats broadcast refactored from manual JSON string concatenation to `CbObjectWriter`; `CbObject`-to-JS conversion improved for `TimeSpan`, `DateTime`, and large integers
  - Stats WebSocket boilerplate consolidated into `ZenPage.connect_stats_ws()`
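The sequence-number drop detection described under "TCP Log Streaming" can be sketched as follows. `SequenceTracker`, `Deliver`, and the wording of the synthetic notice are assumptions made for illustration; the actual `LogStreamListener` wire format and message text are not shown in this log.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical receiver-side tracker: remembers the next expected sequence
// number and reports how many messages were skipped.
class SequenceTracker
{
public:
    // Returns the number of messages missing before Seq (0 when in order).
    std::uint64_t Observe(std::uint64_t Seq)
    {
        std::uint64_t Dropped = 0;
        if (m_Next.has_value() && Seq > *m_Next)
        {
            Dropped = Seq - *m_Next;
        }
        m_Next = Seq + 1;
        return Dropped;
    }

private:
    std::optional<std::uint64_t> m_Next;  // empty until the first message arrives
};

// Produces the lines to hand to the consumer: a synthetic notice when a gap
// is detected, followed by the received line itself.
inline std::vector<std::string> Deliver(SequenceTracker& Tracker, std::uint64_t Seq, const std::string& Line)
{
    std::vector<std::string> Out;
    if (const std::uint64_t Dropped = Tracker.Observe(Seq); Dropped != 0)
    {
        Out.push_back("[dropped " + std::to_string(Dropped) + " message(s)]");
    }
    Out.push_back(Line);
    return Out;
}
```

The first message only establishes the baseline, so a stream that starts at an arbitrary sequence number does not produce a spurious notice.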
* add hub instance crash recovery (#885) | Dan Engelbrecht | 12 days | 2 | -0/+11
  * add hub instance crash recovery
* Logger simplification (#883) | Stefan Boberg | 12 days | 2 | -14/+31
  - **`Logger` now holds a single `SinkPtr`** instead of a `std::vector<SinkPtr>`. The `SetSinks`/`AddSink` API is replaced with a single `SetSink`. This removes complexity from `Logger` itself and makes `Clone()` cheaper (no vector copy).
  - **New `BroadcastSink`** (`zencore/logging/broadcastsink.h`) acts as a thread-safe, shared indirection point that fans out to a dynamic list of child sinks. Adding or removing a child sink via `AddSink`/`RemoveSink` is immediately visible to every `Logger` that holds a reference to it (including cloned loggers) without requiring each logger to be updated individually.
  - **`GetDefaultBroadcastSink()`** (exposed from `zenutil/logging.h`) gives server-layer code access to the shared broadcast sink so it can register optional sinks (OTel, TCP log stream) after logging is initialized, without going through `Default()->AddSink()`.

  ### Motivation
  Previously, dynamically adding sinks post-initialization mutated the default logger's internal sink vector directly. This was fragile: cloned loggers (created before `AddSink` was called) would not pick up the new sinks. `BroadcastSink` fixes this by making the sink list a shared, mutable object that all loggers sharing the same broadcast instance observe uniformly.
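The shared-indirection idea can be shown with a stripped-down model. The names `Sink`, `SinkPtr`, `BroadcastSink`, `Logger`, `SetSink`, `AddSink`, `RemoveSink` and `Clone` follow the commit text, but the bodies below are a guess at the shape, not the zencore implementation (which may use a different lock type and interface); `CollectingSink` is purely a test helper.

```cpp
#include <algorithm>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

struct Sink
{
    virtual ~Sink()                              = default;
    virtual void Log(const std::string& Message) = 0;
};
using SinkPtr = std::shared_ptr<Sink>;

// Fans each message out to a mutable list of child sinks. Loggers hold a
// pointer to this object, so list changes are seen by all of them at once.
class BroadcastSink : public Sink
{
public:
    void AddSink(SinkPtr Child)
    {
        std::lock_guard<std::mutex> Lock(m_Mutex);
        m_Children.push_back(std::move(Child));
    }
    void RemoveSink(const SinkPtr& Child)
    {
        std::lock_guard<std::mutex> Lock(m_Mutex);
        m_Children.erase(std::remove(m_Children.begin(), m_Children.end(), Child), m_Children.end());
    }
    void Log(const std::string& Message) override
    {
        std::lock_guard<std::mutex> Lock(m_Mutex);
        for (const SinkPtr& Child : m_Children)
        {
            Child->Log(Message);
        }
    }

private:
    std::mutex           m_Mutex;
    std::vector<SinkPtr> m_Children;
};

// A logger owns exactly one sink pointer, so cloning copies one shared_ptr.
class Logger
{
public:
    void   SetSink(SinkPtr S) { m_Sink = std::move(S); }
    Logger Clone() const { return *this; }
    void   Info(const std::string& Message)
    {
        if (m_Sink)
        {
            m_Sink->Log(Message);
        }
    }

private:
    SinkPtr m_Sink;
};

// Test helper that records every message it receives.
struct CollectingSink : Sink
{
    std::vector<std::string> Lines;
    void                     Log(const std::string& Message) override { Lines.push_back(Message); }
};
```

In this model a sink added after `Clone()` still receives output from the clone, which is the failure case the Motivation paragraph describes for the old vector-per-logger design.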
* Process management improvements (#881) | Stefan Boberg | 12 days | 1 | -11/+24
  This PR improves process lifecycle handling and resilience across several areas:
  - **Reclaim stale shared-memory entries instead of exiting** (`zenserver.cpp`): When a zenserver instance fails to attach as a sponsor to an existing process (e.g. because the PID was reused by an unrelated process), the server now clears the stale shared-memory entry and proceeds with normal startup instead of calling `std::exit(1)`.
  - **Wait for child process exit in `Kill()` and `Terminate()` on Unix** (`process.cpp`): After sending `SIGTERM` in `Kill()`, the code now waits up to 5s for graceful shutdown (escalating to `SIGKILL` on timeout), matching the Windows behavior. `Terminate()` also waits after `SIGKILL` so the child is properly reaped and doesn't linger as a zombie clogging up the process table.
  - **Fix sysctl buffer race in macOS `FindProcess`** (`process.cpp`): The macOS process enumeration now retries the `sysctl` call (up to 3 attempts with 25% buffer padding) to handle the race where the process list changes between the sizing call and the data-fetching call. Also flattens the nesting and fixes the guard/free scoping.
  - **Terminate stale processes before integration tests** (`zenserver-test.cpp`, `test.lua`): The integration test runner now accepts a `--kill-stale-processes` flag (passed automatically by `test.lua`) that scans for and terminates any leftover `zenserver`, `zenserver-test`, and `zentest-appstub` processes from previous test runs, logging the executable name and PID of each. This addresses flaky test failures caused by stale processes from prior runs holding ports or other resources.
* S3 hydration backend for hub mode (#873) | Dan Engelbrecht | 13 days | 2 | -34/+107
  - Feature: Added S3 hydration backend for hub mode (`--hub-hydration-target-spec s3://<bucket>[/<prefix>]`)
    - Credentials resolved from `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` env vars, falling back to EC2 instance profile via IMDS
    - Each dehydration uploads to a new timestamped folder and commits a `current-state.json` pointer on success, so a failed upload never invalidates the previous state
    - Hydration downloads to a temp directory first and only replaces the server state on full success; failures leave the existing state intact
* zen hub command (#877) | Dan Engelbrecht | 14 days | 2 | -0/+161
  - Feature: Added `zen hub` command for managing a hub server and its provisioned module instances:
    - `zen hub up` - Start a hub server (equivalent to `zen up` in hub mode)
    - `zen hub down` - Shut down a hub server
    - `zen hub provision <moduleid>` - Provision a storage server instance for a module
    - `zen hub deprovision <moduleid>` - Deprovision a storage server instance
    - `zen hub hibernate <moduleid>` - Hibernate a provisioned instance (shut down, data preserved)
    - `zen hub wake <moduleid>` - Wake a hibernated instance
    - `zen hub status [moduleid]` - Show state of all instances or a specific module
  - Feature: Added new hub HTTP endpoints for instance lifecycle management:
    - `POST /hub/modules/{moduleid}/hibernate` - Hibernate the instance for the given module
    - `POST /hub/modules/{moduleid}/wake` - Wake a hibernated instance for the given module
  - Improvement: `zen up` refactored to use shared `StartupZenServer`/`ShutdownZenServer` helpers (also used by `zen hub up`/`zen hub down`)
  - Bugfix: Fixed shutdown event not being cleared after the server process exits in `ZenServerInstance::Shutdown()`, which could cause stale state on reuse
* Interprocess pipe support (for stdout/stderr capture) (#866) | Stefan Boberg | 14 days | 6 | -3/+553
  - **RAII pipe handles for child process stdout/stderr capture**: `StdoutPipeHandles` is now a proper RAII type with automatic cleanup, move semantics, and partial close support. This makes it safe to use pipes for capturing child process output without risking handle/fd leaks.
  - **Optional separate stderr pipe**: `CreateProcOptions` now accepts a `StderrPipe` field so callers can capture stdout and stderr independently. When null (default), stderr shares the stdout pipe as before.
  - **LogStreamListener with pluggable handler**: The TCP log stream listener accepts connections from remote processes and delivers parsed log lines through a `LogStreamHandler` interface, set dynamically via `SetHandler()`. This allows any client to receive log messages without depending on a specific console implementation.
  - **TcpLogStreamSink for zen::logging**: A logging sink that forwards log messages to a `LogStreamListener` over TCP, using the native `zen::logging::Sink` infrastructure with proper thread-safe synchronization.
  - **Reliable child process exit codes on Linux**: `waitpid` result handling is fixed so `ProcessHandle::GetExitCode()` returns the real exit code. `ProcessHandle::Reset()` reaps zombies directly, replacing the global `IgnoreChildSignals()` which prevented exit code collection entirely. Also fixes a TOCTOU race in `ProcessHandle::Wait()` on Linux/Mac.
  - **Pipe capture test suite**: Tests covering stdout/stderr capture via pipes (both shared and separate modes), RAII cleanup, move semantics, and exit code propagation using `zentest-appstub` as the child process.
  - **Service command integration tests**: Shell-based integration tests for `zen service` covering the full lifecycle (install, status, start, stop, uninstall) on all three platforms: Linux (systemd), macOS (launchd), and Windows (SCM via PowerShell).
  - **Test script reorganization**: Platform-specific test scripts moved from `scripts/test_scripts/` into `scripts/test_linux/`, `test_mac/`, and `test_windows/`.
* add hub instance info (#869) | Dan Engelbrecht | 2026-03-20 | 2 | -14/+15
  - Improvement: Hub module listing now includes per-instance process metrics (memory, CPU time, working set, pagefile usage)
  - Improvement: Hub now monitors provisioned instance health in the background and refreshes process metrics periodically
  - Improvement: Hub no longer exposes raw `StorageServerInstance` pointers to callers; instance state is returned as value snapshots (`Hub::InstanceInfo`)
  - Improvement: Hub instance access is now guarded by RAII per-instance locks (`SharedLockedPtr`/`ExclusiveLockedPtr`), preventing concurrent modifications during provisioning and deprovisioning
  - Improvement: Hub instance lifecycle is now tracked as a `HubInstanceState` enum covering transitional states (Provisioning, Deprovisioning, Hibernating, Waking); exposed as a string in the HTTP API and dashboard
* Zs/consul token (#870) | Zousar Shaker | 2026-03-20 | 2 | -14/+58
  - Feature: Added support for passing the Consul token via an environment variable; in hub mode the default variable name is `CONSUL_HTTP_TOKEN`
* add --hub-hydration-target-spec to zen hub (#867) | Dan Engelbrecht | 2026-03-19 | 2 | -3/+3
* workaround for change in xmake behaviour around download file naming (#858) | Stefan Boberg | 2026-03-18 | 1 | -57/+43
  xmake 3.0.7 has a different naming convention than 2.9.9, leading to issues in minio on_install.

  Also includes a fix for the rpc.record test on Linux by replacing std::atomic wait/notify with condition_variable.

  GCC's std::atomic<int64_t>::wait/notify on Linux uses a proxy hash table mechanism (futex only supports 32-bit words) with known issues (GCC Bug 98033, Bug 115955). Replace with std::mutex + std::condition_variable, which is well-tested and consistent with the rest of the codebase.
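The replacement pattern named in this entry (a mutex plus condition variable standing in for `std::atomic<int64_t>::wait`/`notify`) can be sketched like this. `WaitableCounter` is a hypothetical name; the actual test code that was changed is not shown in this log.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for a 64-bit atomic that callers wait on.
class WaitableCounter
{
public:
    // Roughly equivalent to store() followed by notify_all().
    void Set(std::int64_t Value)
    {
        {
            std::lock_guard<std::mutex> Lock(m_Mutex);
            m_Value = Value;
        }
        m_Cond.notify_all();
    }

    // Roughly equivalent to atomic::wait(Old): blocks while the stored value
    // equals Old, then returns the value that ended the wait. The predicate
    // form of wait() handles spurious wakeups.
    std::int64_t WaitWhileEqual(std::int64_t Old)
    {
        std::unique_lock<std::mutex> Lock(m_Mutex);
        m_Cond.wait(Lock, [&] { return m_Value != Old; });
        return m_Value;
    }

private:
    std::mutex              m_Mutex;
    std::condition_variable m_Cond;
    std::int64_t            m_Value = 0;
};
```

Updating the value under the same mutex the waiter holds is what prevents a lost wakeup between the waiter's predicate check and its sleep.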
* Simple S3 client (#836) | Stefan Boberg | 2026-03-18 | 14 | -0/+2911
  This functionality is intended to be used to manage datasets for test cases, but may be useful elsewhere in the future.
  - **Add S3 client with AWS Signature V4 (SigV4) signing** -- new `S3Client` in `zenutil/cloud/` supporting `GetObject`, `PutObject`, `DeleteObject`, `HeadObject`, and `ListObjects` operations
  - **Add EC2 IMDS credential provider** -- automatically fetches and refreshes temporary AWS credentials from the EC2 Instance Metadata Service (IMDSv2) for use by the S3 client
  - **Add SigV4 signing library** -- standalone implementation of AWS Signature Version 4 request signing (headers and query-string presigning)
  - **Add path-style addressing support** -- enables compatibility with S3-compatible stores like MinIO (in addition to virtual-hosted style)
  - **Add S3 integration tests** -- includes a `MinioProcess` test helper that spins up a local MinIO server, plus integration tests exercising the S3 client end-to-end
  - **Add S3-backed `HttpObjectStoreService` tests** -- integration tests verifying the zenserver object store works against an S3 backend
  - **Refactor mock IMDS into `zenutil/cloud/`** -- moved and generalized the mock IMDS server from `zencompute` so it can be reused by both compute and S3 credential tests
* Compute batching (#849) | Stefan Boberg | 2026-03-18 | 9 | -553/+803
  ### Compute Batch Submission
  - Consolidate duplicated action submission logic in `httpcomputeservice` into a single `HandleSubmitAction` supporting both single-action and batch (actions array) payloads
  - Group actions by queue in `RemoteHttpRunner` and submit as batches with configurable chunk size, falling back to individual submission on failure
  - Extract shared helpers: `MakeErrorResult`, `ValidateQueueForEnqueue`, `ActivateActionInQueue`, `RemoveActionFromActiveMaps`

  ### Retracted Action State
  - Add `Retracted` state to `RunnerAction` for retry-free rescheduling -- an explicit request to pull an action back and reschedule it on a different runner without incrementing `RetryCount`
  - Implement idempotent `RetractAction()` on `RunnerAction` and `ComputeServiceSession`
  - Add `POST jobs/{lsn}/retract` and `queues/{queueref}/jobs/{lsn}/retract` HTTP endpoints
  - Add state machine documentation and per-state comments to `RunnerAction`

  ### Compute Race Fixes
  - Fix race in `HandleActionUpdates` where actions enqueued between session abandon and scheduler tick were never abandoned, causing `GetActionResult` to return 202 indefinitely
  - Fix queue `ActiveCount` race where `NotifyQueueActionComplete` was called after releasing `m_ResultsLock`, allowing callers to observe stale counters immediately after `GetActionResult` returned OK

  ### Logging Optimization and ANSI improvements
  - Improve `AnsiColorStdoutSink` write efficiency -- single write call, dirty-flag flush, `RwLock` instead of `std::mutex`
  - Move ANSI color emission from sink into formatters via `Formatter::SetColorEnabled()`; remove `ColorRangeStart`/`End` from `LogMessage`
  - Extract color helpers (`AnsiColorForLevel`, `StripAnsiSgrSequences`) into `helpers.h`
  - Strip upstream ANSI SGR escapes in non-color output mode. This enables colour in log messages without polluting log files with ANSI control sequences
  - Move `RotatingFileSink`, `JsonFormatter`, and `FullFormatter` from header-only to pimpl with `.cpp` files

  ### CLI / Exec Refactoring
  - Extract `ExecSessionRunner` class from ~920-line `ExecUsingSession` into focused methods and an `ExecSessionConfig` struct
  - Replace monolithic `ExecCommand` with subcommand-based architecture (`http`, `inproc`, `beacon`, `dump`, `buildlog`)
  - Allow parent options to appear after subcommand name by parsing subcommand args permissively and forwarding unmatched tokens to the parent parser

  ### Testing Improvements
  - Fix `--test-suite` filter being ignored due to accumulation with default wildcard filter
  - Add test suite banners to test listener output
  - Made `function.session.abandon_pending` test more robust

  ### Startup / Reliability Fixes
  - Fix silent exit when a second zenserver instance detects a port conflict -- use `ZEN_CONSOLE_*` for log calls that precede `InitializeLogging()`
  - Fix two potential SIGSEGV paths during early startup: guard `sentry_options_new()` returning nullptr, and throw on `ZenServerState::Register()` returning nullptr instead of dereferencing
  - Fail on unrecognized zenserver `--mode` instead of silently defaulting to store

  ### Other
  - Show host details (hostname, platform, CPU count, memory) when discovering new compute workers
  - Move frontend `html.zip` from source tree into build directory
  - Add format specifications for Compact Binary and Compressed Buffer wire formats
  - Add `WriteCompactBinaryObject` to zencore
  - Extended `ConsoleTui` with additional functionality
  - Add `--vscode` option to `xmake sln` for clangd / `compile_commands.json` support
  - Disable compute/horde/nomad in release builds (not yet production-ready)
  - Disable unintended `ASIO_HAS_IO_URING` enablement
  - Fix crashpad patch missing leading whitespace
  - Clean up code triggering gcc false positives
* bugfix release - v5.7.23 (#851) | Stefan Boberg | 2026-03-18 | 1 | -9/+11
  Works around an issue where the server could crash during startup if something used `ZEN_INFO` et al. before the logging system was fully initialized
* zen hub port reuse (#850) | Dan Engelbrecht | 2026-03-17 | 1 | -0/+5
  - Feature: Added `--allow-port-probing` option to control whether zenserver searches for a free port on startup (default: true, automatically false when `--dedicated` is set)
  - Feature: Added new hub options for controlling provisioned storage server instances:
    - `--hub-instance-http` - HTTP server implementation for instances (asio/httpsys)
    - `--hub-instance-http-threads` - Number of HTTP connection threads per instance
    - `--hub-instance-corelimit` - Limit CPU concurrency per instance
  - Improvement: Hub now manages a deterministic port pool for provisioned instances allowing reuse of unused ports
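One way to get a deterministic port pool with reuse is to always hand out the lowest free port in a fixed range. The sketch below makes that assumption explicitly; `PortPool` and its policy are hypothetical, and the hub's real allocation strategy is not described in this log.

```cpp
#include <cstdint>
#include <optional>
#include <set>

// Hypothetical pool covering [BasePort, BasePort + Count).
class PortPool
{
public:
    PortPool(std::uint16_t BasePort, std::uint16_t Count)
    {
        for (std::uint16_t I = 0; I < Count; ++I)
        {
            m_Free.insert(static_cast<std::uint16_t>(BasePort + I));
        }
    }

    // Hands out the lowest free port, so the same sequence of provision and
    // deprovision calls always yields the same assignments.
    std::optional<std::uint16_t> Acquire()
    {
        if (m_Free.empty())
        {
            return std::nullopt;
        }
        const std::uint16_t Port = *m_Free.begin();
        m_Free.erase(m_Free.begin());
        return Port;
    }

    // Returns a port to the pool once its instance is gone. This sketch does
    // not verify that the port belongs to the pool's range.
    void Release(std::uint16_t Port) { m_Free.insert(Port); }

private:
    std::set<std::uint16_t> m_Free;  // ordered, so begin() is the lowest port
};
```

With a base port of 9000 and two slots, releasing 9000 makes it the next port handed out, even if 9001 is still in use.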
* Add clang-cl build support | Stefan Boberg | 2026-03-13 | 1 | -1/+1
  - Add clang-cl warning suppressions in xmake.lua matching Linux/macOS set
  - Guard /experimental:c11atomics with {tools="cl"} for MSVC-only
  - Fix long long / int64_t redefinition in string.h for clang-cl
  - Fix unclosed namespace in callstacktrace.cpp #else branch
  - Fix missing override in httpplugin.cpp
  - Reorder WorkerPool fields to match designated initializer order
  - Use INVALID_SOCKET instead of SOCKET_ERROR for SOCKET comparisons
* Unix Domain Socket auto discovery (#833) | Stefan Boberg | 2026-03-13 | 2 | -3/+300
  This PR adds end-to-end Unix domain socket (UDS) support, allowing zen CLI to discover and connect to UDS-only servers automatically.
  - **`unix://` URI scheme in zen CLI**: The `-u` / `--hosturl` option now accepts `unix:///path/to/socket` to connect to a zenserver via a Unix domain socket instead of TCP.
  - **Per-instance shared memory for extended server info**: Each zenserver instance now publishes a small shared memory section (keyed by SessionId) containing per-instance data that doesn't fit in the fixed-size ZenServerEntry -- starting with the UDS socket path. This is a 4KB pagefile-backed section on Windows (`Global\ZenInstance_{sessionid}`) and a POSIX shared memory object on Linux/Mac (`/UnrealEngineZen_{sessionid}`).
  - **Client-side auto-discovery of UDS servers**: `zen info`, `zen status`, etc. now automatically discover and prefer UDS connections when a server publishes a socket path. Servers running with `--no-network` (UDS-only) are no longer invisible to the CLI.
  - **`kNoNetwork` flag in ZenServerEntry**: Servers started with `--no-network` advertise this in their shared state entry. Clients skip TCP fallback for these servers, and display commands (`ps`, `status`, `top`) show `-` instead of a port number to indicate TCP is not available.
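Recognising the `unix:///path/to/socket` form amounts to stripping the scheme and keeping the absolute path. A minimal sketch follows; `ParseUnixSocketUrl` is a hypothetical name, and the requirement that the path be absolute is an assumption rather than documented CLI behaviour.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Returns the socket path for "unix:///abs/path", or nullopt for anything
// else (including other schemes, which the caller would treat as TCP URLs).
inline std::optional<std::string> ParseUnixSocketUrl(std::string_view Url)
{
    constexpr std::string_view kScheme = "unix://";
    if (Url.substr(0, kScheme.size()) != kScheme)
    {
        return std::nullopt;
    }
    const std::string_view Path = Url.substr(kScheme.size());
    // Assumption: only absolute socket paths are accepted.
    if (Path.empty() || Path.front() != '/')
    {
        return std::nullopt;
    }
    return std::string(Path);
}
```

A caller could try this first and fall back to ordinary host/port parsing when it returns `nullopt`.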
* Transparent proxy mode (#823)Stefan Boberg2026-03-121-1/+5
| | | | | | | | | | | | | | | | | Adds a **transparent TCP proxy mode** to zenserver (activated via `zenserver proxy`), allowing it to sit between clients and upstream Zen servers to inspect and monitor HTTP/1.x traffic in real time. Primarily useful during development, to be able to observe multi-server/client interactions in one place. - **Dedicated proxy port** -- Proxy mode defaults to port 8118 with its own data directory to avoid collisions with a normal zenserver instance. - **TCP proxy core** (`src/zenserver/proxy/`) -- A new transparent TCP proxy that forwards connections to upstream targets, with support for both TCP/IP and Unix socket listeners. Multi-threaded I/O for connection handling. Supports Unix domain sockets for both upstream/downstream. - **HTTP traffic inspection** -- Parses HTTP/1.x request/response streams inline to extract method, path, status, content length, and WebSocket upgrades without breaking the proxied data. - **Proxy dashboard** -- A web UI showing live connection stats, per-target request counts, active connections, bytes transferred, and client IP/session ID rollups. - **Server mode display** -- Dashboard banner now shows the running server mode (Zen Proxy, Zen Compute, etc.). Supporting changes included in this branch: - **Wildcard log level matching** -- Log levels can now be set per-category using wildcard patterns (e.g. `proxy.*=debug`). - **`zen down --all`** -- New flag to shut down all running zenserver instances; also used by the new `xmake kill` task. - Minor test stability fixes (flaky hash collisions, per-thread RNG seeds). - Support ZEN_MALLOC environment variable for default allocator selection and switch default to rpmalloc - Fixed sentry-native build to allow LTO on Windows
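Per-category wildcard matching such as `proxy.*=debug` can be sketched with a small glob matcher. The names `MatchWildcard`, `LevelRule` and `ResolveLevel` are hypothetical, and the "last matching rule wins" policy is an assumption rather than the documented zenserver behaviour.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Returns true when Text matches Pattern, where '*' in Pattern matches any
// run of characters (including an empty run)
bool MatchWildcard(std::string_view Pattern, std::string_view Text)
{
    size_t P = 0, T = 0;
    size_t StarP = std::string_view::npos;  // position of the last '*' seen in Pattern
    size_t StarT = 0;                       // text position that '*' is currently matched up to

    while (T < Text.size())
    {
        if (P < Pattern.size() && Pattern[P] == '*')
        {
            StarP = P++;
            StarT = T;
        }
        else if (P < Pattern.size() && Pattern[P] == Text[T])
        {
            ++P;
            ++T;
        }
        else if (StarP != std::string_view::npos)
        {
            // Backtrack: let the last '*' absorb one more character
            P = StarP + 1;
            T = ++StarT;
        }
        else
        {
            return false;
        }
    }
    while (P < Pattern.size() && Pattern[P] == '*')
    {
        ++P;
    }
    return P == Pattern.size();
}

// One "pattern=level" rule, e.g. {"proxy.*", "debug"}
struct LevelRule
{
    std::string Pattern;
    std::string Level;
};

// Assumption: later rules override earlier ones; Default applies when nothing matches
std::string ResolveLevel(const std::vector<LevelRule>& Rules, std::string_view Category, std::string_view Default)
{
    std::string Result(Default);
    for (const LevelRule& Rule : Rules)
    {
        if (MatchWildcard(Rule.Pattern, Category))
        {
            Result = Rule.Level;
        }
    }
    return Result;
}
```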
* hub consul integration (#820)Dan Engelbrecht2026-03-114-4/+367
| | | | | | | | - Feature: Basic consul integration for zenserver hub mode, restricted to host local consul agent and register/deregister of services - Feature: Added new options to zenserver hub mode - `consul-endpoint` - Consul endpoint URL for service registration (empty = disabled) - `hub-base-port-number` - Base port number for provisioned instances - `hub-instance-limit` - Maximum number of provisioned instances for this hub - `hub-use-job-object` - Enable the use of a Windows Job Object for child process management (Windows only)
* Merge branch 'main' into lm/oidctoken-exe-pathLiam Mitchell2026-03-0921-446/+955
|\
| * Eliminate spdlog dependency (#773)Stefan Boberg2026-03-0910-441/+247
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Removes the vendored spdlog library (~12,000 lines) and replaces it with a purpose-built logging system in zencore (~1,800 lines). The new implementation provides the same functionality with fewer abstractions, no shared_ptr overhead, and full control over the logging pipeline. ### What changed **New logging core in zencore/logging/:** - LogMessage, Formatter, Sink, Logger, Registry - core abstractions matching spdlog's model but simplified - AnsiColorStdoutSink - ANSI color console output (replaces spdlog stdout_color_sink) - MsvcSink - OutputDebugString on Windows (replaces spdlog msvc_sink) - AsyncSink - async logging via BlockingQueue worker thread (replaces spdlog async_logger) - NullSink, MessageOnlyFormatter - utility types - Thread-safe timestamp caching in formatters using RwLock **Moved to zenutil/logging/:** - FullFormatter - full log formatting with timestamp, logger name, level, source location, multiline alignment - JsonFormatter - structured JSON log output - RotatingFileSink - rotating file sink with atomic size tracking **API changes:** - Log levels are now an enum (LogLevel) instead of int, eliminating the zen::logging::level namespace - LoggerRef no longer wraps shared_ptr - it holds a raw pointer with the registry owning lifetime - Logger error handler is wired through Registry and propagated to all loggers on registration - Logger::Log() now populates ThreadId on every message **Cleanup:** - Deleted thirdparty/spdlog/ entirely (110+ files) - Deleted full_test_formatter (was ~80% duplicate of FullFormatter) - Renamed snake_case classes to PascalCase (full_formatter -> FullFormatter, json_formatter -> JsonFormatter, sentry_sink -> SentrySink) - Removed spdlog from xmake dependency graph ### Build / test impact - zencore no longer depends on spdlog - zenutil and zenvfs xmake.lua updated to drop spdlog dep - zentelemetry xmake.lua updated to drop spdlog dep - All existing tests pass, no test changes required beyond formatter class renames
| * compute orchestration (#763)Stefan Boberg2026-03-043-0/+8
| | | | | | | | | | | | | | | | | | | | - Added local process runners for Linux/Wine, Mac with some sandboxing support - Horde & Nomad provisioning for development and testing - Client session queues with lifecycle management (active/draining/cancelled), automatic retry with configurable limits, and manual reschedule API - Improved web UI for orchestrator, compute, and hub dashboards with WebSocket push updates - Some security hardening - Improved scalability and `zen exec` command Still experimental - compute support is disabled by default
| * Add test suites (#799)Stefan Boberg2026-03-023-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | Makes all test cases part of a test suite. Test suites are named after the module and the name of the file containing the implementation of the test. * This allows for better and more predictable filtering of which test cases to run which should also be able to reduce the time CI spends in tests since it can filter on the tests for that particular module. Also improves `xmake test` behaviour: * instead of an explicit list of projects just enumerate the test projects which are available based on build system state * also introduces logic to avoid running `xmake config` unnecessarily which would invalidate the existing build and do lots of unnecessary work since dependencies were invalidated by the updated config * also invokes build only for the chosen test targets As a bonus, also adds `xmake sln --open` which allows opening IDE after generation of solution/xmake project is done.
| * added `--verbose` option to zenserver-test and `xmake test` (#798)Stefan Boberg2026-03-012-9/+14
| | | | | | | | | | | | * when `--verbose` is specified to zenserver-test, all child process output (typically, zenserver instances) is piped through to stdout. you can also pass `--verbose` to `xmake test` to accomplish the same thing. * this PR also consolidates all test runner `main` function logic (such as from zencore-test, zenhttp-test etc) into central implementation in zencore for consistency and ease of maintenance * also added extended utf8-tests including a fix to `Utf8ToWide()`
| * subprocess tracking using Jobs on Windows/hub (#796)Stefan Boberg2026-02-282-5/+24
| | | | | | | | | | This change introduces job object support on Windows to be able to more accurately track and limit resource usage on storage instances created by the hub service. It also ensures that all child instances can be torn down reliably on exit. Also made it so hub tests no longer pop up console windows while running.
| * Add test summary table and failure reporting to xmake test (#794)Stefan Boberg2026-02-272-1/+3
| | | | | | | | | | | | | | | | | | | | - Add a summary table printed after all test suites complete, showing per-suite test case counts, assertion counts, timings and pass/fail status. - Add failure reporting: individual failing test cases are listed at the end with their file path and line number for easy navigation. - Made zenserver instances spawned by a hub not create new console windows for a better background testing experience - The TestListener in testing.cpp now writes a machine-readable summary file (via `ZEN_TEST_SUMMARY_FILE` env var) containing aggregate counts and per-test-case failure details. This runs as a doctest listener alongside any active reporter, so it works with both console and JUnit modes. - Tests now run in a deterministic order defined by a single ordered list that also serves as the test name/target mapping, replacing the previous unordered table + separate order list. - The `--run` option now accepts comma-separated values (e.g. `--run=core,http,util`) and validates each name, reporting unknown test names early. - Fix platform detection in `xmake test`: the config command now passes `-p` explicitly, fixing "mingw" misdetection when running from Git Bash on Windows. - Add missing "util" entry to the help text for `--run`.
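The comma-separated `--run` handling described above lives in the xmake scripts; the C++ sketch below only illustrates the split-and-validate idea. `RunSelection` and `ParseRunOption` are hypothetical names, and skipping empty entries is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Result of parsing a --run value: names that are known tests, and names that are not
struct RunSelection
{
    std::vector<std::string> Selected;
    std::vector<std::string> Unknown;
};

// Splits Value on ',' and sorts each name into Selected or Unknown so that
// unknown names can be reported before any test is started.
// Assumption: empty entries (e.g. from "core,,util") are silently skipped.
RunSelection ParseRunOption(std::string_view Value, const std::vector<std::string>& KnownTests)
{
    RunSelection Result;
    size_t       Start = 0;
    while (Start <= Value.size())
    {
        size_t End = Value.find(',', Start);
        if (End == std::string_view::npos)
        {
            End = Value.size();
        }

        std::string Name(Value.substr(Start, End - Start));
        if (!Name.empty())
        {
            const bool Known = std::find(KnownTests.begin(), KnownTests.end(), Name) != KnownTests.end();
            (Known ? Result.Selected : Result.Unknown).push_back(std::move(Name));
        }
        Start = End + 1;
    }
    return Result;
}
```

A caller would typically abort with an error listing `Unknown` when it is non-empty, and otherwise run the suites in `Selected`.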
| * Add `zen ui` command (#779)Stefan Boberg2026-02-242-0/+542
| | | | | | | | | | Allows users to automate launching the zenserver dashboard, including when multiple instances are running. If multiple instances are running you can open all dashboards with `--all`, or use the in-terminal chooser, which also lets you open a specific instance. Also includes a fix to `zen exec` when using offset/stride/limit
| * added ResetConsoleLog (#758)Stefan Boberg2026-02-161-0/+5
| | | | | | also made sure log initialization calls it to ensure the console output format is retained even if the console logger was set up before logging is initialized
| * logging config move to zenutil (#754)Stefan Boberg2026-02-137-4/+118
| | | | | | made logging config options from zenserver available in zen CLI
* | Merge branch 'main' into lm/oidctoken-exe-pathLiam Mitchell2026-03-098-54/+385
|\|
| * reduce batch size for reads (#740)Dan Engelbrecht2026-01-291-1/+1
| | | | | | | | | | * reduce maximum size per chunk to read to reduce disk contention * increase timeout before warning on slow shut down of zenserver * reduce default window size for blockstore chunk iteration
| * hotfix 5.7.18 (#730)Dan Engelbrecht2026-01-222-0/+40
| | | | | | | | * make sure we properly convert command line args for zenserver as well * make sure we *add* wildcards/excludes in addition to defaults
| * ZenServerProcess API changes (#719)Stefan Boberg2026-01-192-20/+133
| | | | | | | | | | | | | | This refactor aims to improve the `ZenServerProcess` classes by making them useful for managing child zenserver instances in more scenarios than just automated tests. This involves changing some functions to not talk about "test directory" and instead use "data directory" etc. As a consequence of the API changes, some tests have changed accordingly. The code includes some references to the "hub" mode but there is not yet any other code using this mode; it's just included in this PR to simplify future merges.
| * consul package and basic client added (#716)Stefan Boberg2026-01-192-0/+187
| | | | | | | | | | | | | | | | * this adds a consul package which can be used to fetch a consul binary * it also adds a `ConsulProcess` helper which can be used to spawn and manage a consul service instance * zencore dependencies brought across: - `except_fmt.h` for easier generation of formatted exception messages - `process.h/cpp` changes (adds `Kill` operation and process group support on Windows) - `string.h` changes to allow generic use of `WideToUtf8()`
| * Merge pull request #701 from ue-foundation/lm/mac-daemon-modeLiam Mitchell2026-01-151-32/+22
| |\ | | | | | | Implement final changes required for daemon mode on Mac
| | * Run clang-format on service.cppLiam Mitchell2026-01-151-3/+2
| | |
| | * Implement final changes required for daemon mode on MacLiam Mitchell2026-01-071-32/+23
| | |
| * | minor fixes (#708)Stefan Boberg2026-01-131-1/+2
| |/ | | | | | | | | * remove unreferenced local in projectstore_cmd * fix minor atomic memory issue in RotatingFileSink
* / Use well-known OidcToken paths or command line arguments to determine ↵Liam Mitchell2026-01-142-0/+67
|/ | | | OidcToken executable path