| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Feature: Hub hydration packs small files into raw CAS pack blobs to reduce request count for modules dominated by tiny metadata files
- `--hub-hydration-enable-pack` (Lua: `hub.hydration.enablepack`, default true)
- `--hub-hydration-pack-threshold-bytes` (Lua: `hub.hydration.packthresholdbytes`, default 256 KiB)
- `--hub-hydration-max-pack-bytes` (Lua: `hub.hydration.maxpackbytes`, default 4 MiB)
- Feature: Hub hydration and dehydration can be disabled per direction
- `--hub-enable-hydration` (Lua: `hub.enablehydration`, default true)
- `--hub-enable-dehydration` (Lua: `hub.enabledehydration`, default true)
- Feature: Hub hydration accepts a configurable file exclude list via `HydrationOptions` `excludes` (array of wildcards). Built-in defaults skip transient runtime files (`.lock`, `.sentry-native/*`, `state_marker`, `*.bak`, `gc/reserve.gc`, `auth/*`) so they no longer participate in dehydrate scans. Override semantics: a present field replaces the default outright; explicit `[]` opts out of all defaults.
- Improvement: Hub hydration completion logs now report per-request average and max latency, peak in-flight workers, queue wait, and hash-cache hit percentage; loose and pack-blob transfers are reported separately
- Improvement: Hub hydration pre-creates unique parent directories before scheduling parallel writes
- Improvement: S3 hydration retries transient HTTP failures (timeouts, 429 throttling, 5xx server errors, connection errors) up to 3 times via the HTTP client retry layer
- Improvement: S3 hydration multipart chunk size is persisted in `state.cbo` per module so hydrate replays the partitioning used at dehydrate; default raised to 64 MiB (was 32 MiB)
- Improvement: Hub hydration `Obliterate` retries backend delete once before falling back to local cleanup
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Security review follow-ups to the `zen` CLI. Each fix stands on its own commit. Grouped by category below.
## Credentials and secrets
- **Per-install random auth encryption key instead of a hardcoded literal.** The default AES key and IV used to encrypt persisted OIDC refresh tokens / OAuth client secrets were ASCII literals compiled into the public source. Replaced with 32+16 random bytes persisted to `<system-root>/auth/machinekey.dat`. `SecureRandomBytes` added in zencore/crypto wrapping BCryptGenRandom / OpenSSL / mbedTLS CTR_DRBG. Partial override (only one of `--encryption-aes-key`/`--encryption-aes-iv`) is now rejected instead of silently using the hardcoded half.
- **Wrap the machine key with OS-protected storage.** `machinekey.dat` is now a tagged format (4-byte magic + flags + wrapped-or-raw payload). Windows wraps via DPAPI (`CryptProtectData` at per-user scope) so a stolen disk copy cannot decrypt without the OS master key. macOS uses Keychain Services (GenericPassword under `org.unrealengine.zen.auth`, `kSecAttrAccessibleAfterFirstUnlockThisDeviceOnly`). Linux uses libsecret (opt-in via `--zenlibsecret=yes`, off by default because headless servers typically have no Secret Service daemon). All platforms fall back to raw persistence with `0600` perms on POSIX when wrapping is unavailable. Legacy files from the prior commit are detected by size and still read.
> Note: argv-redaction before Sentry on crash was previously part of this PR but was superseded by `ScrubSensitiveValues()` from #989; this PR now just calls that helper instead of walking argv itself.
## Path traversal
- **Reject unsafe filenames from the remote oplog in `oplog-mirror`.** The filename from each oplog entry was joined to the mirror root without normalisation; a compromised remote could use drive letters, UNC shares, device path prefixes, absolute paths, or `..` components to write anywhere the zen user could write. An `UnsafeFileNameReason` check runs immediately after extraction, logs the offending filename, and aborts the mirror.
- **Use the resolved absolute download-spec path in `builds download`.** `--download-spec-path` was computed into a sanitised absolute path, then the original unresolved path was passed to `ParseBuildManifest`, bypassing the `MakeSafeAbsolutePath` mitigations and reading from the process cwd rather than `--local-path`.
## Input validation
- **Stop asserting on malformed `--build-id` / `--build-part-id`.** `Oid::FromHexString` asserts on bad input and `ZEN_ASSERT` is active in release, so a too-short or non-hex user value aborted the process instead of surfacing an `OptionParseException`. Routed all callers through `TryFromHexString`. Also fixes `ParseBuildPartId` reporting errors under the wrong option name.
- **Check the JSON parse error in `oplog-export --builds-metadata-path`.** The single-arg `LoadCompactBinaryFromJson` overload discarded the parser error; malformed JSON shipped a truncated compact-binary `metadata` field to the server with no indication. Switched to the two-arg overload and throws a descriptive error naming the file and reason.
- **Format the actual value in the malformed `--url` error.** The message was constructed with a literal `{}` placeholder and no `fmt::format` call, so users saw the placeholder instead of the offending URL.
- **Require `--output-path` in `cache get` unless `--as-text` is set.** Previously an empty path auto-filled from the value key / attachment hash and wrote into the process cwd; the `--as-text && empty path` stdout branch was unreachable because the auto-fill ran first.
- **Clear the cxxopts `allow_unrecognised_options` flag after permissive parse.** `ParseOptionsPermissive` set the flag on the Options it received and never cleared it, priming that Options for silent typo acceptance on any later reuse. Added `disallow_unrecognised_options()` to the vendored cxxopts (local patch — flagged at the declaration) and wrapped the toggle in RAII.
## Resource lifecycle
- **Restore signal handlers via RAII.** `wipe`, `builds`, and `oplog-mirror` installed SIGINT/SIGBREAK handlers with raw `signal()` and never restored them; an option-parse throw left the handler targeting an abort flag nothing reads. Added `zen::ScopedSignalHandler` in zen.h and applied at all three sites (builds uses `std::optional` members so the guards survive past `OnParentOptionsParsed` into the subcommand's Run).
- **Route SIGINT in `oplog-mirror` to the worker-pool abort flag.** The command declared a local `std::atomic<bool> AbortFlag` but no handler targeted it — Ctrl-C killed the process instead of cleanly aborting. Added a `MirrorAbortFlag` / `MirrorSignalCallbackHandler` pair in projectstore_impl and bound the local as a reference; existing `.store`/`.load`/capture sites unchanged.
- **Clean up the `cache get` temp download on every exit path.** `Http.Download` parks the payload in the system temp dir; a failed `MoveToFile` (cross-volume, denied target) or an exception could leave the temp file behind. The downloaded buffer is already flagged delete-on-close by `HttpClient`, so the fix is just to clear that flag after a successful `MoveToFile` so the renamed-out file isn't reaped.
## Other
- **Fix wrong URL fields in `oplog-export` / `oplog-import` builds-branch descriptions.** Two operator-facing "[builds] URL/namespace/bucket/buildsid" messages formatted `m_CloudUrl` instead of `m_BuildsUrl` / `m_BuildsHost` (copy-paste from neighbouring `[cloud]` branches), shown as empty or stale at the start of an export/import.
- **Fix "Can't find oplog in project '{}'" formatting and a "Failed top mirror" typo in projectstore_cmd.**
- **Fix a misleading `oplog-export` comment on the `--zen` scheme default** ("Assume https" vs. the `http://` the code writes).
- **Fail `ScrambleDir` when `RemoveFile` doesn't delete.** The `zen builds test` scramble phase used `(void)RemoveFile(FilePath)`, discarding both the bool return and the error. A quiet delete failure let verification run against stale state; switched to the two-arg overload and throw on false return or non-empty `error_code`.
|
| |
|
|
|
|
|
| |
`--hub-instance-malloc` selects the memory allocator for child instances
`--hub-instance-trace` sets trace channels for child instances
`--hub-instance-tracehost` sets the trace streaming host for child instances
`--hub-instance-tracefile` sets the trace output file for child instances
add {moduleid} and {port} placeholder support for tracefile
|
| |
|
|
| |
registration (#939)
|
| |
|
|
| |
- Feature: Hub obliterate operation deletes all local and backend hydration data for a module
- Improvement: Hub dashboard adds obliterate button for individual, bulk, and by-name module deletion
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Feature: Incremental CAS-based hydration/dehydration replacing the previous full-copy approach
- Feature: S3 hydration backend with multipart upload/download support
- Feature: Configurable thread pools for hub instance provisioning and hydration
`--hub-instance-provision-threads` defaults to `max(cpu_count / 4, 2)`. Set to 0 for synchronous operation.
`--hub-hydration-threads` defaults to `max(cpu_count / 4, 2)`. Set to 0 for synchronous operation.
- Improvement: Hub triggers GC on instance before deprovisioning to compact storage before dehydration
- Improvement: GC status now reports pending triggers as running
- Improvement: S3 client debug logging gated behind verbose mode to reduce log noise at default verbosity
- Improvement: Hub dashboard Resources tile now shows total memory
- Improvement: `filesystemutils` moved from `zenremotestore` to `zenutil` for broader reuse
- Improvement: Hub uses separate provision and hydration worker pools to avoid deadlocks
- Improvement: Hibernate/wake/deprovision on non-existent or already-in-target-state modules are idempotent
- Improvement: `ScopedTemporaryDirectory` with empty path now creates a temporary directory instead of asserting
|
| |
|
|
|
|
|
|
|
|
|
| |
- Improvement: Hub server now supports Lua config file for all hub-specific options
- `hub.upstreamnotification.*` - upstream notification endpoint and instance ID
- `hub.consul.*` - service registration endpoint, token, health interval, deregister timeout
- `hub.instance.*` - base port, HTTP class, thread count, core limit, config path
- `hub.instance.limits.*` - instance count cap, disk and memory usage limits
- `hub.hydration.*` - hydration target spec and config path
- `hub.watchdog.*` - cycle timing, inactivity timeouts, and activity check timeouts
- Improvement: Added `--hub-instance-base-port-number` as an alias for `--hub-base-port-number`, and `--upstream-notification-instance-id` as an alias for `--instance-id`
- Improvement: Added hub mode documentation at docs/hub.md
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
## Summary
This PR adds a session management service, several new dashboard pages, and a number of infrastructure improvements.
### Sessions Service
- `SessionsServiceClient` in `zenutil` announces sessions to a remote zenserver with a 15s heartbeat (POST/PUT/DELETE lifecycle)
- Storage server registers itself with its own local sessions service on startup
- Session mode attribute coupled to server mode (Compute, Proxy, Hub, etc.)
- Ended sessions tracked with `ended_at` timestamp; status filtering (Active/Ended/All)
- `--sessions-url` config option for remote session announcement
- In-process log sink (`InProcSessionLogSink`) forwards server log output to the server's own session, visible in the dashboard
### Session Log Viewer
- POST/GET endpoints for session logs (`/sessions/{id}/log`) supporting raw text and structured JSON/CbObject with batch `entries` array
- In-memory log storage per session (capped at 10k entries) with cursor-based pagination for efficient incremental fetching
- Log panel in the sessions dashboard with incremental DOM updates, auto-scroll (Follow toggle), newest-first toggle, text filter, and log-level coloring
- Auto-selects the server's own session on page load
### TCP Log Streaming
- `LogStreamListener` and `TcpLogStreamSink` for log delivery over TCP
- Sequence numbers on each message with drop detection and synthetic "dropped" notice on gaps
- Gathered buffer writes to reduce syscall overhead when flushing batches
- Tests covering basic delivery, multi-line splitting, drop detection, and sequencing
### New Dashboard Pages
- **Sessions**: master-detail layout with selectable rows, metadata panel, live WebSocket updates, paging, abbreviated date formatting, and "this" pill for the local session
- **Object Store**: summary stats tiles and bucket table with click-to-expand inline object listing (`GET /obj/`)
- **Storage**: per-volume disk usage breakdown (`GET /admin/storage`), Garbage Collection status section (next-run countdown, last-run stats), and GC History table with paginated rows and expandable detail panels
- **Network**: overview tiles, per-service request table, proxy connections, and live WebSocket updates; distinct client IPs and session counts via HyperLogLog
### Documentation Page
- In-dashboard Docs page with sidebar navigation, markdown rendering (via `marked`), Mermaid diagram support (theme-aware), collapsible sections, text filtering with highlighting, and cross-document linking
- New user-facing docs: `overview.md` (with architecture and per-mode diagrams), `sessions.md`, `cache.md`, `projects.md`; updated `compute.md`
- Dev docs moved to `docs/dev/`
### Infrastructure & Bug Fixes
- **Deflate compression** for the embedded frontend zip (~3.4MB → ~950KB); zlib inflate support added to `ZipFs` with cached decompressed buffers
- **Local IP addresses**: `GetLocalIpAddresses()` (Windows via `GetAdaptersAddresses`, Linux/Mac via `getifaddrs`); surfaced in `/status/status`, `/health/info`, and the dashboard banner
- **Dashboard nav**: unified into `zen-nav` web component with `MutationObserver` for dynamically added links, CSS `::part()` to merge banner/nav border radii, and prefix-based active link detection
- Stats broadcast refactored from manual JSON string concatenation to `CbObjectWriter`; `CbObject`-to-JS conversion improved for `TimeSpan`, `DateTime`, and large integers
- Stats WebSocket boilerplate consolidated into `ZenPage.connect_stats_ws()`
|
| |
|
| |
Restructured the docs folder in preparation for more docs. Improved the contents a bit.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
### Compute Batch Submission
- Consolidate duplicated action submission logic in `httpcomputeservice` into a single `HandleSubmitAction` supporting both single-action and batch (actions array) payloads
- Group actions by queue in `RemoteHttpRunner` and submit as batches with configurable chunk size, falling back to individual submission on failure
- Extract shared helpers: `MakeErrorResult`, `ValidateQueueForEnqueue`, `ActivateActionInQueue`, `RemoveActionFromActiveMaps`
### Retracted Action State
- Add `Retracted` state to `RunnerAction` for retry-free rescheduling — an explicit request to pull an action back and reschedule it on a different runner without incrementing `RetryCount`
- Implement idempotent `RetractAction()` on `RunnerAction` and `ComputeServiceSession`
- Add `POST jobs/{lsn}/retract` and `queues/{queueref}/jobs/{lsn}/retract` HTTP endpoints
- Add state machine documentation and per-state comments to `RunnerAction`
### Compute Race Fixes
- Fix race in `HandleActionUpdates` where actions enqueued between session abandon and scheduler tick were never abandoned, causing `GetActionResult` to return 202 indefinitely
- Fix queue `ActiveCount` race where `NotifyQueueActionComplete` was called after releasing `m_ResultsLock`, allowing callers to observe stale counters immediately after `GetActionResult` returned OK
### Logging Optimization and ANSI improvements
- Improve `AnsiColorStdoutSink` write efficiency — single write call, dirty-flag flush, `RwLock` instead of `std::mutex`
- Move ANSI color emission from sink into formatters via `Formatter::SetColorEnabled()`; remove `ColorRangeStart`/`End` from `LogMessage`
- Extract color helpers (`AnsiColorForLevel`, `StripAnsiSgrSequences`) into `helpers.h`
- Strip upstream ANSI SGR escapes in non-color output mode. This enables colour in log messages without polluting log files with ANSI control sequences
- Move `RotatingFileSink`, `JsonFormatter`, and `FullFormatter` from header-only to pimpl with `.cpp` files
### CLI / Exec Refactoring
- Extract `ExecSessionRunner` class from ~920-line `ExecUsingSession` into focused methods and a `ExecSessionConfig` struct
- Replace monolithic `ExecCommand` with subcommand-based architecture (`http`, `inproc`, `beacon`, `dump`, `buildlog`)
- Allow parent options to appear after subcommand name by parsing subcommand args permissively and forwarding unmatched tokens to the parent parser
### Testing Improvements
- Fix `--test-suite` filter being ignored due to accumulation with default wildcard filter
- Add test suite banners to test listener output
- Made `function.session.abandon_pending` test more robust
### Startup / Reliability Fixes
- Fix silent exit when a second zenserver instance detects a port conflict — use `ZEN_CONSOLE_*` for log calls that precede `InitializeLogging()`
- Fix two potential SIGSEGV paths during early startup: guard `sentry_options_new()` returning nullptr, and throw on `ZenServerState::Register()` returning nullptr instead of dereferencing
- Fail on unrecognized zenserver `--mode` instead of silently defaulting to store
### Other
- Show host details (hostname, platform, CPU count, memory) when discovering new compute workers
- Move frontend `html.zip` from source tree into build directory
- Add format specifications for Compact Binary and Compressed Buffer wire formats
- Add `WriteCompactBinaryObject` to zencore
- Extended `ConsoleTui` with additional functionality
- Add `--vscode` option to `xmake sln` for clangd / `compile_commands.json` support
- Disable compute/horde/nomad in release builds (not yet production-ready)
- Disable unintended `ASIO_HAS_IO_URING` enablement
- Fix crashpad patch missing leading whitespace
- Clean up code triggering gcc false positives
|
| |
|
|
|
|
|
| |
This PR makes it *possible* to do a Windows build on Linux via `clang-cl`.
It doesn't actually change any build process. No policy change, just mechanics and some code fixes to clear clang compilation.
The code fixes are mainly related to #include file name casing, to match the on-disk casing of the SDK files (via xwin).
|
| |
|
|
|
|
|
|
|
|
| |
- Added local process runners for Linux/Wine, Mac with some sandboxing support
- Horde & Nomad provisioning for development and testing
- Client session queues with lifecycle management (active/draining/cancelled), automatic retry with configurable limits, and manual reschedule API
- Improved web UI for orchestrator, compute, and hub dashboards with WebSocket push updates
- Some security hardening
- Improved scalability and `zen exec` command
Still experimental - compute support is disabled by default
|
| |
|
|
|
|
|
|
|
| |
this change adds the `zencompute` component, which can be used to distribute work dispatched from UE using the DDB (Derived Data Build) APIs via zenserver
this change also adds a distinct zenserver compute mode (`zenserver compute`) which is intended to be used for leaf compute nodes
to exercise the compute functionality without directly involving UE, a `zen exec` subcommand is also added, which can be used to feed replays through the system
all new functionality is considered *experimental* and disabled by default at this time, behind the `zencompute` option in xmake config
|
| |
|
|
|
| |
Initial implementation of zenserver "hub" mode. This is an experimental feature.
zenserver can be started in hub mode by specifying `hub` as the first argument to zenserver
|
| | |
|
| |
|
|
|
| |
* add docs/Deploy.md
* added ability to specify a version on the command line, via `--version`
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change removes our dependency on vcpkg for package management, in favour of bringing some code in-tree in the `thirdparty` folder as well as using the xmake build-in package management feature. For the latter, all the package definitions are maintained in the zen repo itself, in the `repo` folder.
It should now also be easier to build the project as it will no longer depend on having the right version of vcpkg installed, which has been a common problem for new people coming in to the codebase. Now you should only need xmake to build.
* Bumps xmake requirement on github runners to 2.9.9 to resolve an issue where xmake on Windows invokes cmake with `v144` toolchain which does not exist
* BLAKE3 is now in-tree at `thirdparty/blake3`
* cpr is now in-tree at `thirdparty/cpr`
* cxxopts is now in-tree at `thirdparty/cxxopts`
* fmt is now in-tree at `thirdparty/fmt`
* robin-map is now in-tree at `thirdparty/robin-map`
* ryml is now in-tree at `thirdparty/ryml`
* sol2 is now in-tree at `thirdparty/sol2`
* spdlog is now in-tree at `thirdparty/spdlog`
* utfcpp is now in-tree at `thirdparty/utfcpp`
* xmake package repo definitions is in `repo`
* implemented support for sanitizers. ASAN is supported on windows, TSAN, UBSAN, MSAN etc are supported on Linux/MacOS though I have not yet tested it extensively on MacOS
* the zencore encryption implementation also now supports using mbedTLS which is used on MacOS, though for now we still use openssl on Linux
* crashpad
* bumps libcurl to 8.11.0 (from 8.8.0) which should address a rare build upload bug
|
| |
|
|
|
| |
* update validate action for enterprise github
* updated runner setup options
* updated release workflow for enterprise github
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
|
|
|
|
| |
* moved source directories into `/src`
* updated bundle.lua for new `src` path
* moved some docs, icon
* removed old test trees
|
| | |
|
| |
|