aboutsummaryrefslogtreecommitdiff
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* add clink auto completionsb/completeStefan Boberg13 hours1-1/+39
|
* implemented initial support for shell command completionStefan Boberg13 hours6-1072/+1569
|
* hub async provision/deprovision/hibernate/wake (#891)HEADmainDan Engelbrecht39 hours6-396/+1126
| | | | | - Improvement: Hub provision, deprovision, hibernate, and wake operations are now async. HTTP requests returns 202 Accepted while the operation completes in the background - Improvement: Hub returns 202 Accepted (instead of 409 Conflict) when the same async operation is already in progress for a module - Improvement: Hub returns 200 OK when a requested state transition is already satisfied
* Subprocess Manager (#889)Stefan Boberg42 hours19-537/+2893
| | | | | | | | | | | | | | | | | | | | | | | | | | | Adds a `SubprocessManager` for managing child processes with ASIO-integrated async exit detection, stdout/stderr pipe capture, and periodic metrics sampling. Also introduces `ProcessGroup` for OS-backed process grouping (Windows JobObjects / POSIX process groups). ### SubprocessManager - Async process exit detection using platform-native mechanisms (Windows `object_handle`, Linux `pidfd_open`, macOS `kqueue EVFILT_PROC`) — no polling - Stdout/stderr capture via async pipe readers with per-process or default callbacks - Periodic round-robin metrics sampling (CPU, memory) across managed processes - Spawn, adopt, remove, kill, and enumerate managed processes ### ProcessGroup - OS-level process grouping: Windows JobObject (kill-on-close guarantee), POSIX `setpgid` (bulk signal delivery) - Atomic group kill via `TerminateJobObject` (Windows) or `kill(-pgid, sig)` (POSIX) - Per-group aggregate metrics and enumeration ### ProcessHandle improvements - Added explicit constructors from `int` (pid) and `void*` (native handle) - Added move constructor and move assignment operator ### ProcessMetricsTracker - Cross-platform process metrics (CPU time, working set, page faults) via `QueryProcessMetrics()` - ASIO timer-driven periodic sampling with configurable interval and batch size - Aggregate metrics across tracked processes ### Other changes - Fixed `zentest-appstub` writing a spurious `Versions` file to cwd on every invocation
* refactor hub notifications (#888)Dan Engelbrecht2 days7-223/+371
| | | | * refactor hub callbacks * improve http responses
* Cross-platform process metrics support (#887)Stefan Boberg3 days10-53/+728
| | | | | | | - **Cross-platform `GetProcessMetrics`**: Implement Linux (`/proc/{pid}/stat`, `/proc/{pid}/statm`, `/proc/{pid}/status`) and macOS (`proc_pidinfo(PROC_PIDTASKINFO)`) support for CPU times and memory metrics. Fix Windows to populate the `MemoryBytes` field (was always 0). All platforms now set `MemoryBytes = WorkingSetSize`. - **`ProcessMetricsTracker`**: Experimental utility class (`zenutil`) that periodically samples resource usage for a set of tracked child processes. Supports both a dedicated background thread and an ASIO steady_timer mode. Computes delta-based CPU usage percentage across samples, with batched sampling (8 processes per tick) to limit per-cycle overhead. - **`ProcessHandle` documentation**: Add Doxygen comments to all public methods describing platform-specific behavior. - **Cleanup**: Remove unused `ZEN_RUN_TESTS` macro (inlined at its single call site in `zenserver/main.cpp`), remove dead `#if 0` thread-shutdown workaround block. - **Minor fixes**: Use `HttpClientAccessToken` constructor in hordeclient instead of setting private members directly. Log ASIO version at startup and include it in the server settings list.
* Merge branch 'de/v5.7.25-hotpatch' (#880)Dan Engelbrecht3 days2-14/+31
|
* add tests for s3 and file hydrators (#886)Dan Engelbrecht3 days2-4/+699
|
* Dashboard refresh (logs, storage, network, object store, docs) (#835)Stefan Boberg3 days52-451/+10005
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ## Summary This PR adds a session management service, several new dashboard pages, and a number of infrastructure improvements. ### Sessions Service - `SessionsServiceClient` in `zenutil` announces sessions to a remote zenserver with a 15s heartbeat (POST/PUT/DELETE lifecycle) - Storage server registers itself with its own local sessions service on startup - Session mode attribute coupled to server mode (Compute, Proxy, Hub, etc.) - Ended sessions tracked with `ended_at` timestamp; status filtering (Active/Ended/All) - `--sessions-url` config option for remote session announcement - In-process log sink (`InProcSessionLogSink`) forwards server log output to the server's own session, visible in the dashboard ### Session Log Viewer - POST/GET endpoints for session logs (`/sessions/{id}/log`) supporting raw text and structured JSON/CbObject with batch `entries` array - In-memory log storage per session (capped at 10k entries) with cursor-based pagination for efficient incremental fetching - Log panel in the sessions dashboard with incremental DOM updates, auto-scroll (Follow toggle), newest-first toggle, text filter, and log-level coloring - Auto-selects the server's own session on page load ### TCP Log Streaming - `LogStreamListener` and `TcpLogStreamSink` for log delivery over TCP - Sequence numbers on each message with drop detection and synthetic "dropped" notice on gaps - Gathered buffer writes to reduce syscall overhead when flushing batches - Tests covering basic delivery, multi-line splitting, drop detection, and sequencing ### New Dashboard Pages - **Sessions**: master-detail layout with selectable rows, metadata panel, live WebSocket updates, paging, abbreviated date formatting, and "this" pill for the local session - **Object Store**: summary stats tiles and bucket table with click-to-expand inline object listing (`GET /obj/`) - **Storage**: per-volume disk usage breakdown (`GET /admin/storage`), Garbage Collection status section (next-run countdown, last-run stats), and GC History table with paginated rows and expandable detail panels - **Network**: overview tiles, per-service request table, proxy connections, and live WebSocket updates; distinct client IPs and session counts via HyperLogLog ### Documentation Page - In-dashboard Docs page with sidebar navigation, markdown rendering (via `marked`), Mermaid diagram support (theme-aware), collapsible sections, text filtering with highlighting, and cross-document linking - New user-facing docs: `overview.md` (with architecture and per-mode diagrams), `sessions.md`, `cache.md`, `projects.md`; updated `compute.md` - Dev docs moved to `docs/dev/` ### Infrastructure & Bug Fixes - **Deflate compression** for the embedded frontend zip (~3.4MB → ~950KB); zlib inflate support added to `ZipFs` with cached decompressed buffers - **Local IP addresses**: `GetLocalIpAddresses()` (Windows via `GetAdaptersAddresses`, Linux/Mac via `getifaddrs`); surfaced in `/status/status`, `/health/info`, and the dashboard banner - **Dashboard nav**: unified into `zen-nav` web component with `MutationObserver` for dynamically added links, CSS `::part()` to merge banner/nav border radii, and prefix-based active link detection - Stats broadcast refactored from manual JSON string concatenation to `CbObjectWriter`; `CbObject`-to-JS conversion improved for `TimeSpan`, `DateTime`, and large integers - Stats WebSocket boilerplate consolidated into `ZenPage.connect_stats_ws()`
* add hub instance crash recovery (#885)Dan Engelbrecht3 days10-35/+328
| | | * add hub instance crash recovery
* Unique session/client tracking using HyperLogLog (#884)Stefan Boberg3 days7-2/+356
| | | | | | | | | | | | | | ## Summary Adds probabilistic cardinality estimation for tracking unique HTTP clients and sessions using a HyperLogLog implementation. - Add a `HyperLogLog<Precision>` template in `zentelemetry` with thread-safe lock-free register updates, merge support, and XXH3 hashing - Feed client IP addresses (via raw bytes) and session IDs (via `Oid` bytes) into their respective HyperLogLog estimators from both the ASIO and http.sys server backends - Emit `distinct_clients` and `distinct_sessions` cardinality estimates in HTTP `CollectStats()` - Add tests covering empty, single, duplicates, accuracy, merge, and clear scenarios ## Why HyperLogLog Tracking exact unique counts would require storing every observed IP or session ID. HyperLogLog provides a memory-bounded probabilistic estimate (~1–2% error) using only a few KB of memory regardless of traffic volume.
* Logger simplification (#883)Stefan Boberg3 days10-73/+156
| | | | | | | | | | | - **`Logger` now holds a single `SinkPtr`** instead of a `std::vector<SinkPtr>`. The `SetSinks`/`AddSink` API is replaced with a single `SetSink`. This removes complexity from `Logger` itself and makes `Clone()` cheaper (no vector copy). - **New `BroadcastSink`** (`zencore/logging/broadcastsink.h`) acts as a thread-safe, shared indirection point that fans out to a dynamic list of child sinks. Adding or removing a child sink via `AddSink`/`RemoveSink` is immediately visible to every `Logger` that holds a reference to it — including cloned loggers — without requiring each logger to be updated individually. - **`GetDefaultBroadcastSink()`** (exposed from `zenutil/logging.h`) gives server-layer code access to the shared broadcast sink so it can register optional sinks (OTel, TCP log stream) after logging is initialized, without going through `Default()->AddSink()`. ### Motivation Previously, dynamically adding sinks post-initialization mutated the default logger's internal sink vector directly. This was fragile: cloned loggers (created before `AddSink` was called) would not pick up the new sinks. `BroadcastSink` fixes this by making the sink list a shared, mutable object that all loggers sharing the same broadcast instance observe uniformly.
* Process management improvements (#881)Stefan Boberg3 days5-45/+160
| | | | | | | | | | | This PR improves process lifecycle handling and resilience across several areas: - **Reclaim stale shared-memory entries instead of exiting** (`zenserver.cpp`): When a zenserver instance fails to attach as a sponsor to an existing process (e.g. because the PID was reused by an unrelated process), the server now clears the stale shared-memory entry and proceeds with normal startup instead of calling `std::exit(1)`. - **Wait for child process exit in `Kill()` and `Terminate()` on Unix** (`process.cpp`): After sending `SIGTERM` in `Kill()`, the code now waits up to 5s for graceful shutdown (escalating to `SIGKILL` on timeout), matching the Windows behavior. `Terminate()` also waits after `SIGKILL` so the child is properly reaped and doesn't linger as a zombie clogging up the process table. - **Fix sysctl buffer race in macOS `FindProcess`** (`process.cpp`): The macOS process enumeration now retries the `sysctl` call (up to 3 attempts with 25% buffer padding) to handle the race where the process list changes between the sizing call and the data-fetching call. Also flattens the nesting and fixes the guard/free scoping. - **Terminate stale processes before integration tests** (`zenserver-test.cpp`, `test.lua`): The integration test runner now accepts a `--kill-stale-processes` flag (passed automatically by `test.lua`) that scans for and terminates any leftover `zenserver`, `zenserver-test`, and `zentest-appstub` processes from previous test runs, logging the executable name and PID of each. This addresses flaky test failures caused by stale processes from prior runs holding ports or other resources.
* improve zenserver startup time (#879)Dan Engelbrecht4 days2-12/+22
| | | | - Improvement: Lazy initialize CpuSampler - reduces startup time by ~600 ms - Bugfix: Don't try to wipe .sentry-native folder at missing manifest - sentry is already running. Reduces startup time by ~450 ms when data folder is empty
* S3 hydration backend for hub mode (#873)Dan Engelbrecht4 days3-34/+486
| | | | | | - Feature: Added S3 hydration backend for hub mode (`--hub-hydration-target-spec s3://<bucket>[/<prefix>]`) - Credentials resolved from `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` env vars, falling back to EC2 instance profile via IMDS - Each dehydration uploads to a new timestamped folder and commits a `current-state.json` pointer on success, so a failed upload never invalidates the previous state - Hydration downloads to a temp directory first and only replaces the server state on full success; failures leave the existing state intact
* hub web UI improvements (#878)Dan Engelbrecht4 days5-20/+746
| | | | | | | | | | - Improvement: Hub dashboard module list improved - Animated state dots, with flashing transitions for hibernating, waking, provisioning, and deprovisioning - Per-row port used by the provisioned instance - Per-row hibernate, wake, and deprovision actions with confirmation for destructive operations - Per-row button to open the instance dashboard in a new window - Multi-select with bulk hibernate/wake/deprovision and select-all - Pagination at 50 lines per page - Inline fold-out panel per row with human-readable process metrics
* Upgrade mimalloc to v2.2.7 and log active memory allocator (#876)Stefan Boberg4 days13-29/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Upgrade mimalloc from v2.1.2 to v2.2.7. Note that mimalloc is no longer the default allocator so this only impacts users who somehow opt into mimalloc via `--malloc=mimalloc` or compile with different defaults - Add all available mimalloc versions (1.6.7–3.2.8) to the package definition for testing - Log the active memory allocator (with version where available) at server startup - Annotate vendored rpmalloc with its source commit and version ## Notable changes in mimalloc 2.1.2 → 2.2.7 - **Memory release fix** (2.2.4): fix case where OS memory was not always fully released - **Race condition fix** (2.2.6): fixed rare race condition and potential buffer overflow in debug statistics - **Windows arm64 support** (2.1.9) - **Guarded build** (2.1.9): new build mode that places OS guard pages behind objects to catch buffer overflows - **THP awareness** (2.2.6): auto-detects transparent huge pages and adjusts purge size to avoid fragmentation - **Faster TLS access on Windows** (2.2.6) - **Improved calloc and aligned allocation performance** (2.2.6) - **New diagnostic APIs** (2.2.2): `mi_options_print`, `mi_arenas_print`, `mi_stat_get` / `mi_stat_get_json` - **macOS**: use `MADV_FREE_REUSABLE` for better memory behavior (2.2.4) - **Build fixes**: Android, Xbox, musl, mingw, arm32, Debian 32-bit, non-BMI1 x64 systems ## Allocator logging Added `FMalloc::GetName()` pure virtual so the server logs which allocator is active at startup: ``` zenserver - memory allocator: mimalloc 2.2.7 ``` Allocator names include version where available: - `mimalloc 2.2.7` (runtime version via `mi_version()`) - `rpmalloc 1.5.0-dev.20250810` (ad-hoc version from vendored develop branch commit) - `ansi`, `stomp` (no version info available) ## Test plan - [x] Builds successfully on Windows (release) - [x] Verify server startup log shows allocator name - [x] Test with `--malloc=mimalloc` (default) and `--malloc=rpmalloc` - [x] Run test suites to check for regressions
* zen hub command (#877)Dan Engelbrecht4 days13-448/+1394
| | | | | | | | | | | | | | | - Feature: Added `zen hub` command for managing a hub server and its provisioned module instances: - `zen hub up` - Start a hub server (equivalent to `zen up` in hub mode) - `zen hub down` - Shut down a hub server - `zen hub provision <moduleid>` - Provision a storage server instance for a module - `zen hub deprovision <moduleid>` - Deprovision a storage server instance - `zen hub hibernate <moduleid>` - Hibernate a provisioned instance (shut down, data preserved) - `zen hub wake <moduleid>` - Wake a hibernated instance - `zen hub status [moduleid]` - Show state of all instances or a specific module - Feature: Added new hub HTTP endpoints for instance lifecycle management: - `POST /hub/modules/{moduleid}/hibernate` - Hibernate the instance for the given module - `POST /hub/modules/{moduleid}/wake` - Wake a hibernated instance for the given module - Improvement: `zen up` refactored to use shared `StartupZenServer`/`ShutdownZenServer` helpers (also used by `zen hub up`/`zen hub down`) - Bugfix: Fixed shutdown event not being cleared after the server process exits in `ZenServerInstance::Shutdown()`, which could cause stale state on reuse
* Interprocess pipe support (for stdout/stderr capture) (#866)Stefan Boberg4 days15-25/+1154
| | | | | | | | | | | | | | | | | - **RAII pipe handles for child process stdout/stderr capture**: `StdoutPipeHandles` is now a proper RAII type with automatic cleanup, move semantics, and partial close support. This makes it safe to use pipes for capturing child process output without risking handle/fd leaks. - **Optional separate stderr pipe**: `CreateProcOptions` now accepts a `StderrPipe` field so callers can capture stdout and stderr independently. When null (default), stderr shares the stdout pipe as before. - **LogStreamListener with pluggable handler**: The TCP log stream listener accepts connections from remote processes and delivers parsed log lines through a `LogStreamHandler` interface, set dynamically via `SetHandler()`. This allows any client to receive log messages without depending on a specific console implementation. - **TcpLogStreamSink for zen::logging**: A logging sink that forwards log messages to a `LogStreamListener` over TCP, using the native `zen::logging::Sink` infrastructure with proper thread-safe synchronization. - **Reliable child process exit codes on Linux**: `waitpid` result handling is fixed so `ProcessHandle::GetExitCode()` returns the real exit code. `ProcessHandle::Reset()` reaps zombies directly, replacing the global `IgnoreChildSignals()` which prevented exit code collection entirely. Also fixes a TOCTOU race in `ProcessHandle::Wait()` on Linux/Mac. - **Pipe capture test suite**: Tests covering stdout/stderr capture via pipes (both shared and separate modes), RAII cleanup, move semantics, and exit code propagation using `zentest-appstub` as the child process. - **Service command integration tests**: Shell-based integration tests for `zen service` covering the full lifecycle (install, status, start, stop, uninstall) on all three platforms — Linux (systemd), macOS (launchd), and Windows (SCM via PowerShell). - **Test script reorganization**: Platform-specific test scripts moved from `scripts/test_scripts/` into `scripts/test_linux/`, `test_mac/`, and `test_windows/`.
* fix null stats provider crash when build store is not configured (#875)v5.7.25-pre0Stefan Boberg5 days2-1/+8
| | | | | | - When build store is not configured, `m_BuildCidStore` is null but was unconditionally added as a stats provider, causing a null pointer dereference crash in `StatsReporter::ReportStats` - Added a null guard in `AddProvider` to reject null pointers defensively - Added a conditional check at the call site in `InitializeStructuredCache`
* auth fail no cache (#871)v5.7.24-pre0v5.7.24Dan Engelbrecht6 days2-1/+7
| | | | - Bugfix: Retry OIDC token refresh once on failure before propagating the error - Bugfix: Handle HTTP 501 (Not Implemented) from Jupiter as a signal to fall back from multi-range to single-range requests
* add hub instance info (#869)Dan Engelbrecht6 days18-205/+845
| | | | | | | - Improvement: Hub module listing now includes per-instance process metrics (memory, CPU time, working set, pagefile usage) - Improvement: Hub now monitors provisioned instance health in the background and refreshes process metrics periodically - Improvement: Hub no longer exposes raw `StorageServerInstance` pointers to callers; instance state is returned as value snapshots (`Hub::InstanceInfo`) - Improvement: Hub instance access is now guarded by RAII per-instance locks (`SharedLockedPtr`/`ExclusiveLockedPtr`), preventing concurrent modifications during provisioning and deprovisioning - Improvement: Hub instance lifecycle is now tracked as a `HubInstanceState` enum covering transitional states (Provisioning, Deprovisioning, Hibernating, Waking); exposed as a string in the HTTP API and dashboard
* Zs/consul token (#870)Zousar Shaker6 days5-15/+146
| | | - Feature: Added support for consul token passed via environment variable, and specified a default env var name of CONSUL_HTTP_TOKEN for it in hub mode
* Zen disk benchmark utility (#868)Stefan Boberg7 days2-0/+1424
| | | | | This PR adds a `zen bench disk` subcommand to help with gathering user performance metrics It also contains a fix for `xmake precommit`, the task now probes to find the most appropriate way to launch pre-commit
* add --hub-hydration-target-spec to zen hub (#867)Dan Engelbrecht7 days10-30/+57
|
* improve auth token refresh (#863)Dan Engelbrecht7 days11-65/+126
| | | Authentication callbacks are not thread safe, ensured call sites does single threaded calls
* Add lightweight crash handler for pre-Sentry startup backtraces (#853)Stefan Boberg8 days4-0/+247
| | | | | | | | | | - Install a crash handler at the very top of main() in both zenserver and zen - On Windows, uses SetUnhandledExceptionFilter with StackWalk64 for accurate crash-site backtraces with DbgHelp symbol resolution - On Linux/Mac, uses sigaction with async-signal-safe backtrace output - Automatically superseded when Sentry/crashpad installs its own handlers - Stays active for the full process lifetime if Sentry is disabled or absent - Include .sym debug symbol files in Linux release bundle
* Pre-initialization of default logger (#859)Stefan Boberg8 days4-6/+30
| | | Improved workaround for troubles with code potentially logging before logging is initialized. Any logging will be routed to a default console logger until logging is initialized fully
* Merge pull request #855 from ue-foundation/zs/long-filename-improvementZousar Shaker8 days4-5/+104
|\ | | | | Zs/long filename improvement
| * Addressing review feedbackzousar8 days2-7/+6
| |
| * Merge branch 'main' into zs/long-filename-improvementStefan Boberg8 days24-55/+189
| |\
| * | Fix long path handling for project storezousar8 days3-5/+13
| | |
| * | Add tests for long path handling in project storezousar8 days1-0/+92
| | |
* | | Add natvis for Compact Binary (#860)Devin Doucette8 days5-0/+1000
| | | | | | | | | | | | | | | | | | | | | Add natvis for Compact Binary Includes natvis for DateTime, TimeSpan, IoHash, Guid, Oid. Based on UE CL 51830581.
* | | workaround for change in xmake behaviour around download file naming (#858)Stefan Boberg8 days2-58/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xmake 3.0.7 has a different naming convention than 2.9.9 leading to issues in minio on_install also includes a fix for rpc.record test on Linux by replacing std::atomic wait/notify with condition_variable GCC's std::atomic<int64_t>::wait/notify on Linux uses a proxy hash table mechanism (futex only supports 32-bit words) with known issues (GCC Bug 98033, Bug 115955). Replace with std::mutex + std::condition_variable which is well-tested and consistent with the rest of the codebase.
* | | add --hub-instance-config option to set lua config path for hub instances (#854)Dan Engelbrecht8 days6-13/+69
| | | | | | | | | * add --hub-instance-config option to set lua config path for hub instances
* | | Simple S3 client (#836)Stefan Boberg8 days23-330/+3555
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This functionality is intended to be used to manage datasets for test cases, but may be useful elsewhere in the future. - **Add S3 client with AWS Signature V4 (SigV4) signing** — new `S3Client` in `zenutil/cloud/` supporting `GetObject`, `PutObject`, `DeleteObject`, `HeadObject`, and `ListObjects` operations - **Add EC2 IMDS credential provider** — automatically fetches and refreshes temporary AWS credentials from the EC2 Instance Metadata Service (IMDSv2) for use by the S3 client - **Add SigV4 signing library** — standalone implementation of AWS Signature Version 4 request signing (headers and query-string presigning) - **Add path-style addressing support** — enables compatibility with S3-compatible stores like MinIO (in addition to virtual-hosted style) - **Add S3 integration tests** — includes a `MinioProcess` test helper that spins up a local MinIO server, plus integration tests exercising the S3 client end-to-end - **Add S3-backed `HttpObjectStoreService` tests** — integration tests verifying the zenserver object store works against an S3 backend - **Refactor mock IMDS into `zenutil/cloud/`** — moved and generalized the mock IMDS server from `zencompute` so it can be reused by both compute and S3 credential tests
* | | Compute batching (#849)Stefan Boberg8 days47-2261/+4064
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ### Compute Batch Submission - Consolidate duplicated action submission logic in `httpcomputeservice` into a single `HandleSubmitAction` supporting both single-action and batch (actions array) payloads - Group actions by queue in `RemoteHttpRunner` and submit as batches with configurable chunk size, falling back to individual submission on failure - Extract shared helpers: `MakeErrorResult`, `ValidateQueueForEnqueue`, `ActivateActionInQueue`, `RemoveActionFromActiveMaps` ### Retracted Action State - Add `Retracted` state to `RunnerAction` for retry-free rescheduling — an explicit request to pull an action back and reschedule it on a different runner without incrementing `RetryCount` - Implement idempotent `RetractAction()` on `RunnerAction` and `ComputeServiceSession` - Add `POST jobs/{lsn}/retract` and `queues/{queueref}/jobs/{lsn}/retract` HTTP endpoints - Add state machine documentation and per-state comments to `RunnerAction` ### Compute Race Fixes - Fix race in `HandleActionUpdates` where actions enqueued between session abandon and scheduler tick were never abandoned, causing `GetActionResult` to return 202 indefinitely - Fix queue `ActiveCount` race where `NotifyQueueActionComplete` was called after releasing `m_ResultsLock`, allowing callers to observe stale counters immediately after `GetActionResult` returned OK ### Logging Optimization and ANSI improvements - Improve `AnsiColorStdoutSink` write efficiency — single write call, dirty-flag flush, `RwLock` instead of `std::mutex` - Move ANSI color emission from sink into formatters via `Formatter::SetColorEnabled()`; remove `ColorRangeStart`/`End` from `LogMessage` - Extract color helpers (`AnsiColorForLevel`, `StripAnsiSgrSequences`) into `helpers.h` - Strip upstream ANSI SGR escapes in non-color output mode. This enables colour in log messages without polluting log files with ANSI control sequences - Move `RotatingFileSink`, `JsonFormatter`, and `FullFormatter` from header-only to pimpl with `.cpp` files ### CLI / Exec Refactoring - Extract `ExecSessionRunner` class from ~920-line `ExecUsingSession` into focused methods and a `ExecSessionConfig` struct - Replace monolithic `ExecCommand` with subcommand-based architecture (`http`, `inproc`, `beacon`, `dump`, `buildlog`) - Allow parent options to appear after subcommand name by parsing subcommand args permissively and forwarding unmatched tokens to the parent parser ### Testing Improvements - Fix `--test-suite` filter being ignored due to accumulation with default wildcard filter - Add test suite banners to test listener output - Made `function.session.abandon_pending` test more robust ### Startup / Reliability Fixes - Fix silent exit when a second zenserver instance detects a port conflict — use `ZEN_CONSOLE_*` for log calls that precede `InitializeLogging()` - Fix two potential SIGSEGV paths during early startup: guard `sentry_options_new()` returning nullptr, and throw on `ZenServerState::Register()` returning nullptr instead of dereferencing - Fail on unrecognized zenserver `--mode` instead of silently defaulting to store ### Other - Show host details (hostname, platform, CPU count, memory) when discovering new compute workers - Move frontend `html.zip` from source tree into build directory - Add format specifications for Compact Binary and Compressed Buffer wire formats - Add `WriteCompactBinaryObject` to zencore - Extended `ConsoleTui` with additional functionality - Add `--vscode` option to `xmake sln` for clangd / `compile_commands.json` support - Disable compute/horde/nomad in release builds (not yet production-ready) - Disable unintended `ASIO_HAS_IO_URING` enablement - Fix crashpad patch missing leading whitespace - Clean up code triggering gcc false positives
* | | bugfix release - v5.7.23 (#851)Stefan Boberg8 days3-42/+64
| |/ |/| | | Works around issue where we could crash during startup when the logging system wasn't fully initialized and something used `ZEN_INFO` et al
* | zen hub port reuse (#850)Dan Engelbrecht8 days21-46/+165
| | | | | | | | | | | | | | | | - Feature: Added `--allow-port-probing` option to control whether zenserver searches for a free port on startup (default: true, automatically false when --dedicated is set) - Feature: Added new hub options for controlling provisioned storage server instances: - `--hub-instance-http` - HTTP server implementation for instances (asio/httpsys) - `--hub-instance-http-threads` - Number of HTTP connection threads per instance - `--hub-instance-corelimit` - Limit CPU concurrency per instance - Improvement: Hub now manages a deterministic port pool for provisioned instances allowing reuse of unused ports
* | add sanitizer options to xmake (#847)v5.7.23-pre1v5.7.23-pre0Dan Engelbrecht9 days3-9/+24
|/ | | | | | - Improvement: Add easy access options for sanitizers with `xmake config` and `xmake test` as options - `--msan=[y|n]` Enable MemorySanitizer (Linux only, requires all deps instrumented) - `--asan=[y|n]` Enable AddressSanitizer (disables mimalloc and sentry) - `--tsan=[y|n]` Enable ThreadSanitizer (Linux/Mac only)
* revise oplog block arrangement (#842)Dan Engelbrecht10 days7-2473/+4918
| | | | | - Improvement: Fixed issue where oplog upload could create blocks larger than the max limit (64Mb) Refactored remoteprojectstore.cpp to use ParallelWork and exceptions for error handling.
* Restructure zen builds using subcommands (#834)Stefan Boberg10 days2-2284/+2647
| | | Refactored builds_cmd to split subcommands into dedicated classes, in an effort to reduce surface area and complexity to improve maintainability.
* Enable cross compilation of Windows targets on Linux (#839)Stefan Boberg10 days18-27/+57
| | | | | | | This PR makes it *possible* to do a Windows build on Linux via `clang-cl`. It doesn't actually change any build process. No policy change, just mechanics and some code fixes to clear clang compilation. The code fixes are mainly related to #include file name casing, to match the on-disk casing of the SDK files (via xwin).
* URI decoding, process env, compiler info, httpasio strands, regex route ↵Stefan Boberg10 days17-381/+393
| | | | | | | | | | | | | | | | | removal (#841) - Percent-decode URIs in ASIO HTTP server to match http.sys CookedUrl behavior, ensuring consistent decoded paths across backends - Add Environment field to CreateProcOptions for passing extra env vars to child processes (Windows: merged into Unicode environment block; Unix: setenv in fork) - Add GetCompilerName() and include it in build options startup logging - Suppress Windows CRT error dialogs in test harness for headless/CI runs - Fix mimalloc package: pass CMAKE_BUILD_TYPE, skip cfuncs test for cross-compile - Add virtual destructor to SentryAssertImpl to fix debug-mode warning - Simplify object store path handling now that URIs arrive pre-decoded - Add URI decoding test coverage for percent-encoded paths and query params - Simplify httpasio request handling by using strands (guarantees no parallel handlers per connection) - Removed deprecated regex-based route matching support - Fix full GC never triggering after cross-toolchain builds: The `gc_state` file stores `system_clock` ticks, but the tick resolution differs between toolchains (nanoseconds on GCC/standard clang, microseconds on UE clang). A nanosecond timestamp misinterpreted as microseconds appears far in the future (~year 58,000), bypassing the staleness check and preventing time-based full GC from ever running. Fixed by also resetting when the stored timestamp is in the future. - Clamp GC countdown display to configured interval: Prevents nonsensical log output (e.g. "Full GC in 492128002h") caused by the above or any other clock anomaly. The clamp applies to both the scheduler log and the status API.
* block/file cloning support for macOS / Linux (#786)Stefan Boberg10 days2-72/+523
| | | | | | | | - Add block cloning (copy-on-write) support for Linux and macOS to complement the existing Windows (ReFS) implementation - **Linux**: `TryCloneFile` via `FICLONE` ioctl, `CloneQueryInterface` with range cloning via `FICLONERANGE` (Btrfs/XFS) - **macOS**: `TryCloneFile` via `clonefile()` syscall (APFS), `SupportsBlockRefCounting` via `VOL_CAP_INT_CLONE`. `CloneQueryInterface` is not implemented as macOS lacks a sub-file range clone API - Promote `ScopedFd` to file scope for broader use in filesystem code - Add test scripts for block cloning validation on Linux (Btrfs via loopback) and macOS (APFS) - Also added test script for testing on Windows (ReFS)
* Made CPR optional, html generated at build time (#840)Stefan Boberg12 days8-22/+85
| | | | | | | - Fix potential crash on startup caused by logging macros being invoked before the logging system is initialized (null logger dereference in `ZenServerState::Sweep()`). `LoggerRef::ShouldLog` now guards against a null logger pointer. - Make CPR an optional dependency (`--zencpr` build option, enabled by default) so builds can proceed without it - Make zenvfs Windows-only (platform-specific target) - Generate the frontend zip at build time from source HTML files instead of checking in a binary blob which would accumulate with every single update
* Add clang-cl build supportStefan Boberg13 days5-74/+82
| | | | | | | | | | - Add clang-cl warning suppressions in xmake.lua matching Linux/macOS set - Guard /experimental:c11atomics with {tools="cl"} for MSVC-only - Fix long long / int64_t redefinition in string.h for clang-cl - Fix unclosed namespace in callstacktrace.cpp #else branch - Fix missing override in httpplugin.cpp - Reorder WorkerPool fields to match designated initializer order - Use INVALID_SOCKET instead of SOCKET_ERROR for SOCKET comparisons
* Unix Domain Socket auto discovery (#833)Stefan Boberg13 days28-105/+538
| | | | | | | | This PR adds end-to-end Unix domain socket (UDS) support, allowing zen CLI to discover and connect to UDS-only servers automatically. - **`unix://` URI scheme in zen CLI**: The `-u` / `--hosturl` option now accepts `unix:///path/to/socket` to connect to a zenserver via a Unix domain socket instead of TCP. - **Per-instance shared memory for extended server info**: Each zenserver instance now publishes a small shared memory section (keyed by SessionId) containing per-instance data that doesn't fit in the fixed-size ZenServerEntry -- starting with the UDS socket path. This is a 4KB pagefile-backed section on Windows (`Global\ZenInstance_{sessionid}`) and a POSIX shared memory object on Linux/Mac (`/UnrealEngineZen_{sessionid}`). - **Client-side auto-discovery of UDS servers**: `zen info`, `zen status`, etc. now automatically discover and prefer UDS connections when a server publishes a socket path. Servers running with `--no-network` (UDS-only) are no longer invisible to the CLI. - **`kNoNetwork` flag in ZenServerEntry**: Servers started with `--no-network` advertise this in their shared state entry. Clients skip TCP fallback for these servers, and display commands (`ps`, `status`, `top`) show `-` instead of a port number to indicate TCP is not available.
* Switch httpclient default back-end over to libcurl (#832)Stefan Boberg13 days6-702/+454
| | | | | | | | | | | | | | | | | | | | | | | | | | | Switches the default HTTP client to the libcurl-based backend and follows up with a series of correctness fixes and code quality improvements to `CurlHttpClient`. **Backend switch & build fixes:** - Switch default HTTP client to libcurl-based backend - Suppress `[[nodiscard]]` warning when building fmt - Miscellaneous bugfixes in HttpClient/libcurl - Pass `-y` to `xmake config` in `xmake test` task **Boilerplate reduction:** - Add `Session::SetHeaders()` for RAII ownership of `curl_slist`, eliminating manual `curl_slist_free_all` calls from every verb method - Add `Session::PerformWithResponseCallbacks()` to absorb the repeated 12-line write+header callback setup block - Extract `ParseHeaderLine()` shared helper, replacing 4 duplicate header-parsing implementations - Extract `BuildHeaderMap()` and `ApplyContentTypeFromHeaders()` helpers to deduplicate header-to-map conversion and Content-Type scanning - Unify the two `DoWithRetry` overloads (PayloadFile variant now delegates to the Validate variant) **Correctness fixes:** - `TransactPackage`: both phases now use `PerformWithResponseCallbacks()`, fixing missing abort support and a dead header collection loop - `TransactPackage`: error path now routes through `CommonResponse`, preserving curl error codes and messages for the caller - `ValidatePayload`: merged 3 separate header-scan loops into a single pass **Performance improvements:** - Replace `fmt::format` with `ExtendableStringBuilder` in `BuildHeaderList` and `BuildUrlWithParameters`, eliminating heap allocations in the common case - Replace `curl_easy_escape`/`curl_free` with inline URL percent-encoding using `AsciiSet` - Remove wasteful `CommonResponse(...)` construction in retry logging, formatting directly from `CurlResult` fields