aboutsummaryrefslogtreecommitdiff
path: root/src/zenserver/storage/zenstorageserver.cpp
Commit message (Collapse)AuthorAgeFilesLines
* Request validation and resilience improvements (#864)Stefan Boberg6 days1-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ### Security: Input validation & path safety - **Reject local file references by default** in package parsing — only allow when explicitly opted in by the service (`ParseFlags::kAllowLocalReferences`) and validated by an `ILocalRefPolicy` (fail-closed: no policy = rejected) - **`DataRootLocalRefPolicy`** restricts local ref paths to the server's data root via canonical path prefix matching - **Validate attachment hashes** in compute HTTP handlers — decompresses and re-hashes each attachment at ingestion time to reject tampered payloads - **Path traversal validation** for worker descriptions (`pathvalidation.h`) — rejects absolute paths, `..` components, Windows reserved device names, and invalid filename characters - **Harden CbPackage parsing** against corrupt inputs — overflow-safe attachment count, bounds checks on local ref offset/size, graceful failure instead of `ZEN_ASSERT` for untrusted data - **Harden legacy package parser** — reject zero-size binary fields, missing mappers, and optionally validate resolved attachment hashes - **Bounds check in `CbPackageReader::MarshalLocalChunkReference`** — detect when `MakeFromFile` silently clamps offset+size to file size ### Reliability: Lock consolidation & bug fixes - **Consolidate three action map locks into one** (`m_ActionMapLock`) — eliminates deadlock risk from multi-lock ordering, simplifies state transitions, and fixes a race where newly enqueued actions were briefly invisible to `GetActionResult`/`FindActionResult` - **Fix infinite loop in `BaseRunnerGroup::SubmitActions`** when actions exceed total runner capacity — cap round-robin at `TotalCapacity` and default unassigned results to "No capacity" - **Fix `MakeSafeAbsolutePathInPlace` for UNC paths** — `\server\share` now correctly becomes `\?\UNC\server\share` instead of `\?\server\share` - **Fix `max_retries=0`** — previously fell through to the default of 3; now correctly means "no retries" ### New: ManagedProcessRunner - Cross-platform process runner backed by `SubprocessManager` — uses async exit callbacks instead of polling, delegates CPU/memory metrics to the manager's built-in sampler - `ProcessGroup` (JobObject on Windows, process group on POSIX) for bulk cancellation on shutdown - `--managed` flag on `zen exec inproc` to select this runner - Refactored monitor thread lifecycle — `StartMonitorThread()` now called from derived constructors to avoid calling virtual functions from base constructor ### Process management - **Suppress crash dialogs** via `JOB_OBJECT_UILIMIT_ERRORMODE` + `SEM_NOGPFAULTERRORBOX` in both `WindowsProcessRunner` and `JobObject::Initialize` — prevents WER/Dr. Watson modal dialogs from blocking the monitor thread - **CREATE_SUSPENDED → AssignProcessToJobObject → ResumeThread** pattern in `WindowsProcessRunner` — ensures job object assignment before process execution - **Move stdout/stderr callbacks to `Spawn()` parameters** in `SubprocessManager` — prevents race where early output could be missed before callback installation - Consistent PID logging across all runner types ### Test infrastructure - **`zentest-appstub`**: Added `Fail` (configurable exit code) and `Crash` (abort / nullptr deref) test functions - **Compute integration tests**: exit code handling, auto-retry exhaustion, manual reschedule after failure, mixed success/failure queues, crash handling (abort + nullptr), crash auto-retry, immediate query visibility after enqueue - **Package format tests**: truncated header, bad magic, attachment count overflow, truncated data, local ref rejection/acceptance, policy enforcement (inside/outside root, traversal, no-policy fail-closed) - **Legacy package parser tests**: empty input, zero-size binary, hash resolution with/without mapper, hash mismatch detection - **UNC path tests** for `MakeSafeAbsolutePath` ### Misc - ANSI color helper macros (`ZEN_RED`, `ZEN_BRIGHT_WHITE`, etc.) and `ZEN_BOLD`/`ZEN_DIM`/etc. - Generic `fmt::formatter` for types with free `ToString` functions - Compute dashboard: truncated hash display with monospace font and hover for full value - Renamed `usonpackage_forcelink` → `cbpackage_forcelink` - Compute enabled by default in xmake config (releases still explicitly disable)
* idle deprovision in hub (#895)Dan Engelbrecht9 days1-2/+2
| | | | | | | | | | | | | - Feature: Hub watchdog automatically deprovisions inactive provisioned and hibernated instances - Feature: Added `stats/activity_counters` endpoint to measure server activity - Feature: Added configuration options for hub watchdog - `--hub-watchdog-provisioned-inactivity-timeout-seconds` Inactivity timeout before a provisioned instance is deprovisioned - `--hub-watchdog-hibernated-inactivity-timeout-seconds` Inactivity timeout before a hibernated instance is deprovisioned - `--hub-watchdog-inactivity-check-margin-seconds` Margin before timeout at which an activity check is issued - `--hub-watchdog-cycle-interval-ms` Watchdog poll interval in milliseconds - `--hub-watchdog-cycle-processing-budget-ms` Maximum time budget per watchdog cycle in milliseconds - `--hub-watchdog-instance-check-throttle-ms` Minimum delay between checks on a single instance - `--hub-watchdog-activity-check-connect-timeout-ms` Connect timeout for activity check requests - `--hub-watchdog-activity-check-request-timeout-ms` Request timeout for activity check requests
* Dashboard refresh (logs, storage, network, object store, docs) (#835)Stefan Boberg13 days1-4/+75
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ## Summary This PR adds a session management service, several new dashboard pages, and a number of infrastructure improvements. ### Sessions Service - `SessionsServiceClient` in `zenutil` announces sessions to a remote zenserver with a 15s heartbeat (POST/PUT/DELETE lifecycle) - Storage server registers itself with its own local sessions service on startup - Session mode attribute coupled to server mode (Compute, Proxy, Hub, etc.) - Ended sessions tracked with `ended_at` timestamp; status filtering (Active/Ended/All) - `--sessions-url` config option for remote session announcement - In-process log sink (`InProcSessionLogSink`) forwards server log output to the server's own session, visible in the dashboard ### Session Log Viewer - POST/GET endpoints for session logs (`/sessions/{id}/log`) supporting raw text and structured JSON/CbObject with batch `entries` array - In-memory log storage per session (capped at 10k entries) with cursor-based pagination for efficient incremental fetching - Log panel in the sessions dashboard with incremental DOM updates, auto-scroll (Follow toggle), newest-first toggle, text filter, and log-level coloring - Auto-selects the server's own session on page load ### TCP Log Streaming - `LogStreamListener` and `TcpLogStreamSink` for log delivery over TCP - Sequence numbers on each message with drop detection and synthetic "dropped" notice on gaps - Gathered buffer writes to reduce syscall overhead when flushing batches - Tests covering basic delivery, multi-line splitting, drop detection, and sequencing ### New Dashboard Pages - **Sessions**: master-detail layout with selectable rows, metadata panel, live WebSocket updates, paging, abbreviated date formatting, and "this" pill for the local session - **Object Store**: summary stats tiles and bucket table with click-to-expand inline object listing (`GET /obj/`) - **Storage**: per-volume disk usage breakdown (`GET /admin/storage`), Garbage Collection status section (next-run countdown, last-run stats), and GC History table with paginated rows and expandable detail panels - **Network**: overview tiles, per-service request table, proxy connections, and live WebSocket updates; distinct client IPs and session counts via HyperLogLog ### Documentation Page - In-dashboard Docs page with sidebar navigation, markdown rendering (via `marked`), Mermaid diagram support (theme-aware), collapsible sections, text filtering with highlighting, and cross-document linking - New user-facing docs: `overview.md` (with architecture and per-mode diagrams), `sessions.md`, `cache.md`, `projects.md`; updated `compute.md` - Dev docs moved to `docs/dev/` ### Infrastructure & Bug Fixes - **Deflate compression** for the embedded frontend zip (~3.4MB → ~950KB); zlib inflate support added to `ZipFs` with cached decompressed buffers - **Local IP addresses**: `GetLocalIpAddresses()` (Windows via `GetAdaptersAddresses`, Linux/Mac via `getifaddrs`); surfaced in `/status/status`, `/health/info`, and the dashboard banner - **Dashboard nav**: unified into `zen-nav` web component with `MutationObserver` for dynamically added links, CSS `::part()` to merge banner/nav border radii, and prefix-based active link detection - Stats broadcast refactored from manual JSON string concatenation to `CbObjectWriter`; `CbObject`-to-JS conversion improved for `TimeSpan`, `DateTime`, and large integers - Stats WebSocket boilerplate consolidated into `ZenPage.connect_stats_ws()`
* Logger simplification (#883)Stefan Boberg13 days1-0/+3
| | | | | | | | | | | - **`Logger` now holds a single `SinkPtr`** instead of a `std::vector<SinkPtr>`. The `SetSinks`/`AddSink` API is replaced with a single `SetSink`. This removes complexity from `Logger` itself and makes `Clone()` cheaper (no vector copy). - **New `BroadcastSink`** (`zencore/logging/broadcastsink.h`) acts as a thread-safe, shared indirection point that fans out to a dynamic list of child sinks. Adding or removing a child sink via `AddSink`/`RemoveSink` is immediately visible to every `Logger` that holds a reference to it — including cloned loggers — without requiring each logger to be updated individually. - **`GetDefaultBroadcastSink()`** (exposed from `zenutil/logging.h`) gives server-layer code access to the shared broadcast sink so it can register optional sinks (OTel, TCP log stream) after logging is initialized, without going through `Default()->AddSink()`. ### Motivation Previously, dynamically adding sinks post-initialization mutated the default logger's internal sink vector directly. This was fragile: cloned loggers (created before `AddSink` was called) would not pick up the new sinks. `BroadcastSink` fixes this by making the sink list a shared, mutable object that all loggers sharing the same broadcast instance observe uniformly.
* improve zenserver startup time (#879)Dan Engelbrecht2026-03-221-1/+1
| | | | - Improvement: Lazy initialize CpuSampler - reduces startup time by ~600 ms - Bugfix: Don't try to wipe .sentry-native folder at missing manifest - sentry is already running. Reduces startup time by ~450 ms when data folder is empty
* fix null stats provider crash when build store is not configured (#875)v5.7.25-pre0Stefan Boberg2026-03-211-1/+4
| | | | | | - When build store is not configured, `m_BuildCidStore` is null but was unconditionally added as a stats provider, causing a null pointer dereference crash in `StatsReporter::ReportStats` - Added a null guard in `AddProvider` to reject null pointers defensively - Added a conditional check at the call site in `InitializeStructuredCache`
* zen hub port reuse (#850)Dan Engelbrecht2026-03-171-0/+1
| | | | | | | | - Feature: Added `--allow-port-probing` option to control whether zenserver searches for a free port on startup (default: true, automatically false when --dedicated is set) - Feature: Added new hub options for controlling provisioned storage server instances: - `--hub-instance-http` - HTTP server implementation for instances (asio/httpsys) - `--hub-instance-http-threads` - Number of HTTP connection threads per instance - `--hub-instance-corelimit` - Limit CPU concurrency per instance - Improvement: Hub now manages a deterministic port pool for provisioned instances allowing reuse of unused ports
* URI decoding, process env, compiler info, httpasio strands, regex route ↵Stefan Boberg2026-03-161-1/+1
| | | | | | | | | | | | | | | | | removal (#841) - Percent-decode URIs in ASIO HTTP server to match http.sys CookedUrl behavior, ensuring consistent decoded paths across backends - Add Environment field to CreateProcOptions for passing extra env vars to child processes (Windows: merged into Unicode environment block; Unix: setenv in fork) - Add GetCompilerName() and include it in build options startup logging - Suppress Windows CRT error dialogs in test harness for headless/CI runs - Fix mimalloc package: pass CMAKE_BUILD_TYPE, skip cfuncs test for cross-compile - Add virtual destructor to SentryAssertImpl to fix debug-mode warning - Simplify object store path handling now that URIs arrive pre-decoded - Add URI decoding test coverage for percent-encoded paths and query params - Simplify httpasio request handling by using strands (guarantees no parallel handlers per connection) - Removed deprecated regex-based route matching support - Fix full GC never triggering after cross-toolchain builds: The `gc_state` file stores `system_clock` ticks, but the tick resolution differs between toolchains (nanoseconds on GCC/standard clang, microseconds on UE clang). A nanosecond timestamp misinterpreted as microseconds appears far in the future (~year 58,000), bypassing the staleness check and preventing time-based full GC from ever running. Fixed by also resetting when the stored timestamp is in the future. - Clamp GC countdown display to configured interval: Prevents nonsensical log output (e.g. "Full GC in 492128002h") caused by the above or any other clock anomaly. The clamp applies to both the scheduler log and the status API.
* Add --no-network option (#831)Stefan Boberg2026-03-121-2/+2
| | | | - Add `--no-network` CLI option which disables all TCP/HTTPS listeners, restricting zenserver to Unix domain socket communication only. - Also fixes asio upgrade breakage on main
* Merge branch 'main' into lm/restrict-content-typeLiam Mitchell2026-03-091-1/+3
|\
| * Merge branch 'main' into lm/oidctoken-exe-pathLiam Mitchell2026-03-091-2/+53
| |\
| * \ Merge branch 'main' into lm/oidctoken-exe-pathLiam Mitchell2026-03-091-2/+13
| |\ \
| * | | Allow external OidcToken executable to be specified unless disabled via ↵Liam Mitchell2026-03-041-1/+2
| | | | | | | | | | | | | | | | command line or config
| * | | Pass command-line OidcToken option through config rather than env variables, ↵Liam Mitchell2026-01-151-2/+8
| | | | | | | | | | | | | | | | and add lua option
* | | | Merge branch 'main' into lm/restrict-content-typeLiam Mitchell2026-03-091-2/+53
|\ \ \ \ | | |_|/ | |/| |
| * | | Dashboard overhaul, compute integration (#814)Stefan Boberg2026-03-091-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - **Frontend dashboard overhaul**: Unified compute/main dashboards into a single shared UI. Added new pages for cache, projects, metrics, sessions, info (build/runtime config, system stats). Added live-update via WebSockets with pause control, sortable detail tables, themed styling. Refactored compute/hub/orchestrator pages into modular JS. - **HTTP server fixes and stats**: Fixed http.sys local-only fallback when default port is in use, implemented root endpoint redirect for http.sys, fixed Linux/Mac port reuse. Added /stats endpoint exposing HTTP server metrics (bytes transferred, request rates). Added WebSocket stats tracking. - **OTEL/diagnostics hardening**: Improved OTLP HTTP exporter with better error handling and resilience. Extended diagnostics services configuration. - **Session management**: Added new sessions service with HTTP endpoints for registering, updating, querying, and removing sessions. Includes session log file support. This is still WIP. - **CLI subcommand support**: Added support for commands with subcommands in the zen CLI tool, with improved command dispatch. - **Misc**: Exposed CPU usage/hostname to frontend, fixed JS compact binary float32/float64 decoding, limited projects displayed on front page to 25 sorted by last access, added vscode:// link support. Also contains some fixes from TSAN analysis.
| * | | compute orchestration (#763)Stefan Boberg2026-03-041-5/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Added local process runners for Linux/Wine, Mac with some sandboxing support - Horde & Nomad provisioning for development and testing - Client session queues with lifecycle management (active/draining/cancelled), automatic retry with configurable limits, and manual reschedule API - Improved web UI for orchestrator, compute, and hub dashboards with WebSocket push updates - Some security hardening - Improved scalability and `zen exec` command Still experimental - compute support is disabled by default
| * | | HttpService/Frontend improvements (#782)Stefan Boberg2026-02-251-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | - zenhttp: added `GetServiceUri()`/`GetExternalHost()` - enables code to quickly generate an externally reachable URI for a given service - frontend: improved Uri handling (better defaults) - added support for 404 page (to make it easier to find a good URL)
| * | | structured compute basics (#714)Stefan Boberg2026-02-181-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | this change adds the `zencompute` component, which can be used to distribute work dispatched from UE using the DDB (Derived Data Build) APIs via zenserver this change also adds a distinct zenserver compute mode (`zenserver compute`) which is intended to be used for leaf compute nodes to exercise the compute functionality without directly involving UE, a `zen exec` subcommand is also added, which can be used to feed replays through the system all new functionality is considered *experimental* and disabled by default at this time, behind the `zencompute` option in xmake config
| * | | logging config move to zenutil (#754)Stefan Boberg2026-02-131-1/+1
| | |/ | |/| | | | made logging config options from zenserver available in zen CLI
* / | Allow requests with invalid content-types unless specified in command line ↵Liam Mitchell2026-03-041-2/+8
|/ / | | | | | | or config
* / zenserver API changes, some other minor changes (#720)Stefan Boberg2026-01-191-2/+13
|/ | | | | | | * add system metrics output to top command * removed unnecessary xmake directives * file system API/comment tweaks * fixed out-of-range access in httpserver test * updated ZenServer base API to allow customization by mode
* remove error warning in output (#695)Dan Engelbrecht2025-12-171-1/+1
| | | * changed some logging string so they don't get caught in CI logging
* add otel instrumentation (#581)Stefan Boberg2025-12-111-0/+12
| | | | | | | | this change adds OTEL tracing to a few places * Top-level application lifecycle (config/init/cleanup, main loop) * http.sys requests it also brings some otlptrace optimizations and dynamic configuration of tracing. OTLP tracing is currently always disabled
* automatic scrub on startup (#667)Dan Engelbrecht2025-11-271-25/+23
| | | | | - Improvement: Deeper validation of data when scrub is activated (cas/cache/project) - Improvement: Enabled more multi threading when running scrub operations - Improvement: Added means to force a scrub operation at startup with a new release using ZEN_DATA_FORCE_SCRUB_VERSION variable in xmake.lua
* switch to xmake for package management (#611)Stefan Boberg2025-11-071-0/+4
| | | | | | | | | | | | | | | | | | | | | | This change removes our dependency on vcpkg for package management, in favour of bringing some code in-tree in the `thirdparty` folder as well as using the xmake build-in package management feature. For the latter, all the package definitions are maintained in the zen repo itself, in the `repo` folder. It should now also be easier to build the project as it will no longer depend on having the right version of vcpkg installed, which has been a common problem for new people coming in to the codebase. Now you should only need xmake to build. * Bumps xmake requirement on github runners to 2.9.9 to resolve an issue where xmake on Windows invokes cmake with `v144` toolchain which does not exist * BLAKE3 is now in-tree at `thirdparty/blake3` * cpr is now in-tree at `thirdparty/cpr` * cxxopts is now in-tree at `thirdparty/cxxopts` * fmt is now in-tree at `thirdparty/fmt` * robin-map is now in-tree at `thirdparty/robin-map` * ryml is now in-tree at `thirdparty/ryml` * sol2 is now in-tree at `thirdparty/sol2` * spdlog is now in-tree at `thirdparty/spdlog` * utfcpp is now in-tree at `thirdparty/utfcpp` * xmake package repo definitions is in `repo` * implemented support for sanitizers. ASAN is supported on windows, TSAN, UBSAN, MSAN etc are supported on Linux/MacOS though I have not yet tested it extensively on MacOS * the zencore encryption implementation also now supports using mbedTLS which is used on MacOS, though for now we still use openssl on Linux * crashpad * bumps libcurl to 8.11.0 (from 8.8.0) which should address a rare build upload bug
* make sure OpenProcessCache is initialized before use (#618)Stefan Boberg2025-10-291-4/+4
| | | previously, a null reference would be passed into ProjectStore constructor
* restructured zenserver configuration (#575)Stefan Boberg2025-10-151-9/+8
| | | this breaks out the configuration logic to allow multiple applications to share common configuration and initialization logic whilst customizing chosen aspects of the process
* move all storage-related services into storage tree (#571)Stefan Boberg2025-10-141-0/+961
* move all storage-related services into storage tree * move config into config/ * also move admin service into storage since it mostly has storage related functionality * header consolidation