| Commit message (Collapse) | Author | Age | Files | Lines |
| |
Mirrors CompactString's compact length-prefix layout but stores an
atomic<uint32_t> in the buffer header so multiple instances can share
a single allocation. Copies just bump the refcount; the buffer is
freed when the last referencing instance is destroyed.
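A minimal sketch of the shared-allocation idea, assuming a hypothetical `SharedStringSketch` type with a plain 32-bit length field next to the refcount (the real type reuses CompactString's compact length prefix, which is not modelled here):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>
#include <string_view>
#include <utility>

// Hypothetical illustration: one heap block laid out as [refcount][length][chars...].
class SharedStringSketch
{
public:
    SharedStringSketch() = default;

    explicit SharedStringSketch(std::string_view Text)
    {
        assert(Text.size() <= UINT32_MAX);
        void* Raw = ::operator new(sizeof(Header) + Text.size());
        m_Header  = new (Raw) Header(static_cast<uint32_t>(Text.size()));
        if (!Text.empty())
        {
            std::memcpy(Chars(), Text.data(), Text.size());
        }
    }

    // Copies share the allocation and only bump the refcount.
    SharedStringSketch(const SharedStringSketch& Other) : m_Header(Other.m_Header)
    {
        if (m_Header)
        {
            m_Header->RefCount.fetch_add(1, std::memory_order_relaxed);
        }
    }

    SharedStringSketch(SharedStringSketch&& Other) noexcept : m_Header(std::exchange(Other.m_Header, nullptr)) {}

    SharedStringSketch& operator=(SharedStringSketch Other) noexcept
    {
        std::swap(m_Header, Other.m_Header);
        return *this;
    }

    // The last referencing instance frees the buffer.
    ~SharedStringSketch()
    {
        if (m_Header && m_Header->RefCount.fetch_sub(1, std::memory_order_acq_rel) == 1)
        {
            m_Header->~Header();
            ::operator delete(m_Header);
        }
    }

    std::string_view View() const
    {
        return m_Header ? std::string_view(Chars(), m_Header->Length) : std::string_view();
    }

    uint32_t UseCount() const { return m_Header ? m_Header->RefCount.load(std::memory_order_relaxed) : 0; }

private:
    struct Header
    {
        explicit Header(uint32_t InLength) : RefCount(1), Length(InLength) {}
        std::atomic<uint32_t> RefCount;
        uint32_t              Length;
    };

    // Character data starts immediately after the header.
    char* Chars() const { return reinterpret_cast<char*>(m_Header + 1); }

    Header* m_Header = nullptr;
};
```

Copying a `SharedStringSketch` leaves both instances viewing the same bytes; `UseCount()` exists only to make the sharing observable.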
| |
- Improvement: Hub pools HTTP connections to managed instances so provision/deprovision churn no longer exhausts Windows ephemeral ports
| |
A collection of security, correctness, and robustness fixes in `zenhttp` and `zencore` surfaced by security review. Most items are small, independent commits grouped here because they all tighten trust boundaries or fix UB along the same code paths.
## WebSocket protocol hardening (RFC 6455)
- **Enforce the client-side mask bit**. Server-side frame loops now reject unmasked frames with close code 1002 per §5.1. Prevents HTTP intermediary smuggling.
- **Validate control frames and RSV bits**. Fragmented control frames, oversized (>125 B) control payloads, and any non-zero RSV bit now fail the connection before allocation.
- **Lower per-frame payload cap** from 256 MB → 4 MB. Bounds per-connection accumulator memory.
- **Implement message fragmentation**. Continuation frames are coalesced and delivered as a single message; interleaved non-control frames close with 1002; assembled messages are capped at 4 MB (1009 on overflow). Previously partial fragments were delivered to handlers, bypassing payload validation.
- **Parse the 101 handshake response properly** in `HttpWsClient`. Status-line, `Upgrade`, `Connection`, and `Sec-WebSocket-Accept` are now matched exactly rather than via substring searches against the full body.
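The frame-level checks above can be sketched as a pure function over the first two header bytes and the decoded payload length. The function name, constants, and the use of 1009 for an oversized single frame are assumptions for illustration; the bit layout itself follows RFC 6455 §5.2.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; 0 means "frame header accepted".
constexpr uint16_t kCloseProtocolError   = 1002;
constexpr uint16_t kCloseMessageTooBig   = 1009;
constexpr uint64_t kMaxFramePayloadBytes = 4ull * 1024 * 1024;  // 4 MB cap

// Returns the close code to fail the connection with, or 0 if the header is acceptable.
inline uint16_t CheckClientFrameHeaderSketch(uint8_t Byte0, uint8_t Byte1, uint64_t PayloadLength)
{
    const bool    Fin       = (Byte0 & 0x80) != 0;
    const bool    AnyRsv    = (Byte0 & 0x70) != 0;
    const uint8_t Opcode    = Byte0 & 0x0F;
    const bool    Masked    = (Byte1 & 0x80) != 0;
    const bool    IsControl = (Opcode & 0x08) != 0;

    if (AnyRsv)
    {
        return kCloseProtocolError;  // no extensions negotiated, RSV bits must be zero
    }
    if (!Masked)
    {
        return kCloseProtocolError;  // client-to-server frames must be masked
    }
    if (IsControl && (!Fin || PayloadLength > 125))
    {
        return kCloseProtocolError;  // control frames: unfragmented and at most 125 bytes
    }
    if (PayloadLength > kMaxFramePayloadBytes)
    {
        return kCloseMessageTooBig;  // assumed close code for an oversized frame
    }
    return 0;
}
```

Running the checks before any payload buffer is sized keeps a hostile length field from driving an allocation.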
## Auth / OIDC hardening
- **Constant-time password compare** in `PasswordSecurity::IsAllowed` (closes a remote length/content timing oracle). Adds a shared `ConstantTimeEquals` helper.
- **Harden Basic-auth header parsing**: trim trailing LWS, reject control bytes and DEL in the credential.
- **OIDC discovery pinning**: require HTTPS (loopback exempt), verify `issuer` matches `BaseUrl`, require `token_endpoint` / `userinfo_endpoint` / `jwks_uri` to share origin with `BaseUrl`, reject empty `token_endpoint`.
- **Restrict `POST /auth/oidc/refreshtoken`** to local-machine requests. Previously unauthenticated in default deployments — remote callers could evict or replace cached tokens.
- **Stop logging OIDC provider response bodies** on refresh failure (IdPs echo `refresh_token` back in error bodies).
- **Drop the unused `IdentityToken` field** from `OidcClient` / `OpenIdToken` so nothing in the tree accidentally trusts an unverified JWT.
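For the constant-time compare, a minimal sketch is shown here. `ConstantTimeEqualsSketch` is a hypothetical stand-in, not the actual `ConstantTimeEquals` signature. The loop length depends only on the attacker-supplied candidate, and every byte is folded into one accumulator so there is no early exit on the first mismatch.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: compares without branching on the position of the first difference.
inline bool ConstantTimeEqualsSketch(std::string_view Expected, std::string_view Candidate)
{
    // A length mismatch is recorded up front but does not shorten the loop.
    volatile unsigned char Diff = (Expected.size() == Candidate.size()) ? 0 : 1;

    for (size_t Index = 0; Index < Candidate.size(); ++Index)
    {
        // Wrap around the expected value so the loop always runs Candidate.size() times.
        const unsigned char ExpectedByte =
            Expected.empty() ? 0 : static_cast<unsigned char>(Expected[Index % Expected.size()]);
        const unsigned char CandidateByte = static_cast<unsigned char>(Candidate[Index]);
        Diff = static_cast<unsigned char>(Diff | (ExpectedByte ^ CandidateByte));
    }
    return Diff == 0;
}
```

The `volatile` accumulator is a common precaution against the compiler reintroducing an early exit; a production helper would typically prefer a vetted primitive from the crypto backend where one is available.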
## Auth state encryption migration
- Add `AesGcm` AEAD primitive (BCrypt / OpenSSL backends, mbedTLS stubbed) and `CryptoRandom::Fill` CSPRNG helper in `zencore/crypto.h`.
- Migrate authstate file from AES-256-CBC with a fixed IV to AES-GCM with a fresh 12-byte random nonce per write and the 4-byte `ZEN1` magic bound as AAD. Legacy-CBC files are transparently read once and rewritten in the new format.
## Filesystem / IO robustness
- `IoBufferExtendedCore::Materialize` now checks `MAP_FAILED` on POSIX (was comparing to `nullptr`, which let the failure sentinel propagate into later reads and `munmap(MAP_FAILED, ...)`).
- `IoBufferBuilder::MakeFromFile / MakeFromTemporaryFile`: close the FD/HANDLE on exception via a dismissable `ScopeGuard`; actually check the `fstat()` return value (previously used an uninitialized `FileSize`).
- `ReadFromFileMaybe`: loop short reads, retry `EINTR`, chunk Windows `ReadFile` at `0xFFFFFFFF` bytes (fixes silent truncation of multi-GiB reads).
- `WipeDirectory`: compare `FindFirstFileW` handle against `INVALID_HANDLE_VALUE` rather than `nullptr`.
- `RemoveFileNative` (Linux/macOS): report non-`ENOENT` stat failures via the `std::error_code` out-param and stop reading `st_mode` after a failed stat.
## Buffer / compression correctness
- Avoid per-copy `IoBufferCore` heap allocations in `CompositeBuffer::CopyTo / ViewOrCopyRange` iterators; add fast path for `BufferHeader::Read` when the 64-byte header fits in the first plain-memory segment.
- `BufferHeader`: add `IsHeaderValid()` gate covering `BlockSizeExponent` range, `BlockCount * BlockSize` overflow, and `TotalRawSize` bounds before any arithmetic uses them. Defends against attacker-controlled headers that can pass the CRC and trigger OOB writes in `DecompressBlock`.
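A sketch of the kind of gate `IsHeaderValid()` describes, assuming a hypothetical header struct, field widths, and exponent bounds (none of these are taken from the real `BufferHeader`):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the on-disk header fields that drive decompression.
struct BlockHeaderSketch
{
    uint8_t  BlockSizeExponent;
    uint64_t BlockCount;
    uint64_t TotalRawSize;
};

// Assumed bounds for illustration only.
constexpr uint8_t kMinBlockSizeExponent = 9;
constexpr uint8_t kMaxBlockSizeExponent = 31;

inline bool IsHeaderValidSketch(const BlockHeaderSketch& Header)
{
    // 1. Range-check the exponent before shifting by it.
    if (Header.BlockSizeExponent < kMinBlockSizeExponent || Header.BlockSizeExponent > kMaxBlockSizeExponent)
    {
        return false;
    }
    const uint64_t BlockSize = uint64_t(1) << Header.BlockSizeExponent;

    // 2. Reject BlockCount * BlockSize overflow using division instead of multiplying first.
    if (Header.BlockCount != 0 && BlockSize > std::numeric_limits<uint64_t>::max() / Header.BlockCount)
    {
        return false;
    }
    const uint64_t Capacity = BlockSize * Header.BlockCount;

    // 3. The declared raw size must fit in the declared blocks.
    return Header.TotalRawSize <= Capacity;
}
```

The ordering matters: each value is validated before any later arithmetic depends on it, so a header that passes the CRC but carries hostile fields is rejected without ever computing an out-of-range size.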
| |
- **Viewport scrolling.** Cap rendered rows to the visible terminal height and track a scroll offset that follows the selection, so long lists no longer overflow the screen and corrupt the cursor-up redraw. Hint shows `[i/N]` when the list exceeds the viewport.
- **Single-write frame rendering.** Each frame is built into one `ExtendableStringBuilder` and emitted via `TuiWrite`. On Windows, `TuiWrite` routes through `WriteConsoleW` when stdout is a console, so a frame is one syscall instead of one per `printf` — eliminates the visible per-character repaint.
- **All `consoletui` helpers go through `TuiWrite`.** `TuiCursorHome`, `TuiSetScrollRegion`, `TuiResetScrollRegion`, `TuiMoveCursor`, `TuiSaveCursor`, `TuiRestoreCursor`, `TuiEraseLine`, `TuiShowCursor`, and the alternate-screen enter/exit pair now bypass the CRT on Windows consoles, matching the picker. `TuiFlush` remains an unconditional `fflush(stdout)` so callers that mixed `printf` output earlier in a sequence still drain correctly.
- **Width detection fix.** `TuiConsoleColumns` now reports the visible window width rather than the screen-buffer width, so labels sized to it don't wrap on legacy cmd.exe configs where the buffer is wider than the window.
- **PgUp / PgDn.** Jump by one viewport, clamped to the list ends. `VK_PRIOR` / `VK_NEXT` on Windows; `ESC[5~` / `ESC[6~` on POSIX.
- **Terminal resize handling.** Enable `ENABLE_WINDOW_INPUT` on stdin (Windows) and install a `SIGWINCH` handler without `SA_RESTART` (POSIX) so the blocking key read returns a new `ConsoleKey::Resize`. The picker recomputes viewport/label budgets, clears the visible screen, and redraws as a fresh first frame; pre-picker output stays in scrollback.
- **Centralized label truncation.** The picker truncates item labels to fit the current terminal width (cols minus the 3-column indicator), walking back to a UTF-8 codepoint boundary so multi-byte sequences are never split. The hand-rolled width-aware truncation in `history_cmd::BuildLabel` and `ui_cmd` is removed; callers hand the picker the full label and let it clip.
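The codepoint-boundary walk-back in the label truncation can be sketched as below. `TruncateUtf8Sketch` is a hypothetical name, and the budget is expressed in bytes for simplicity; mapping terminal columns to bytes is outside this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: clip to at most MaxBytes without splitting a multi-byte UTF-8 sequence.
inline std::string_view TruncateUtf8Sketch(std::string_view Label, size_t MaxBytes)
{
    if (Label.size() <= MaxBytes)
    {
        return Label;
    }

    size_t End = MaxBytes;
    // If the first dropped byte is a continuation byte (10xxxxxx), the cut would land
    // inside a sequence, so walk back until the cut sits on a codepoint boundary.
    while (End > 0 && (static_cast<unsigned char>(Label[End]) & 0xC0) == 0x80)
    {
        --End;
    }
    return Label.substr(0, End);
}
```

For example, clipping `"h\xC3\xA9llo"` to 2 bytes yields `"h"` rather than a dangling lead byte.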
| |
- Refactors the five `project-*` top-level commands into a `project <sub>` subcommand structure, mirroring the existing `cache <sub>` pattern. New surface: `project create | drop | info | op-details | stats`.
- Legacy `project-create`, `project-drop`, `project-info`, `project-op-details`, `project-stats` remain functional as hidden deprecated shims that forward through `project_legacy_shim::RunAs`, so existing scripts (e.g. `scripts/test_scripts/oplog-import-export-test.py`) keep working unchanged.
| |
- Consolidates the seven `oplog-*` top-level commands into a single `zen oplog <sub>` command tree, mirroring the cache refactor and PR #1026's `project <sub>` work. New surface: `oplog create | export | import | snapshot | mirror | validate | download`.
- Legacy `oplog-create`, `oplog-export`, `oplog-import`, `oplog-snapshot`, `oplog-mirror`, `oplog-validate`, `oplog-download` remain functional as hidden deprecated aliases that forward through `oplog_legacy_shim::RunAs`, so existing scripts keep working.
| |
* don't set default build part name if download spec is given
| |
For backwards compatibility, `builds ls` retains its past behavior of listing all parts, but both `builds download` and `builds prime-cache` are allowed to use the new standard of operating only on the "default" part.
| |
- `GetEnvVariable` now returns `std::optional<std::string>` so callers can distinguish an unset variable from one set to an empty value.
- Windows path uses `SetLastError(ERROR_SUCCESS)` + `ERROR_ENVVAR_NOT_FOUND` to detect "not found"; POSIX path returns `nullopt` when `getenv` returns `nullptr`.
- All call sites migrated. Most use `.value_or("")` to preserve current empty-or-unset behavior. The diagnostic helpers in `zen-test/artifactprovider-tests.cpp` now report `<unset>` vs `<empty>` distinctly.
- Added a check in the `ExpandEnvironmentVariables` test confirming `nullopt` for an unset variable; PATH/HOME lookups in that test use `REQUIRE(has_value())` so a missing var fails cleanly instead of throwing `bad_optional_access`.
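A minimal sketch of the POSIX-style path, using a hypothetical `GetEnvVariableSketch` built on `std::getenv` (the Windows `ERROR_ENVVAR_NOT_FOUND` path is not shown):

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical stand-in: nullopt means "unset", an empty string means "set but empty".
inline std::optional<std::string> GetEnvVariableSketch(const char* Name)
{
    const char* Value = std::getenv(Name);
    if (Value == nullptr)
    {
        return std::nullopt;
    }
    return std::string(Value);
}
```

Call sites that do not care about the distinction can keep their old behavior with `GetEnvVariableSketch("NAME").value_or("")`, while diagnostics can branch on `has_value()`.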
| |
- Improvement: `zen builds` `--exclude-folders` and `--exclude-extensions` values now match paths case-insensitively and tolerate whitespace around separators
| |
- Feature: Hub hydration packs small files into raw CAS pack blobs to reduce request count for modules dominated by tiny metadata files
  - `--hub-hydration-enable-pack` (Lua: `hub.hydration.enablepack`, default true)
  - `--hub-hydration-pack-threshold-bytes` (Lua: `hub.hydration.packthresholdbytes`, default 256 KiB)
  - `--hub-hydration-max-pack-bytes` (Lua: `hub.hydration.maxpackbytes`, default 4 MiB)
- Feature: Hub hydration and dehydration can be disabled per direction
  - `--hub-enable-hydration` (Lua: `hub.enablehydration`, default true)
  - `--hub-enable-dehydration` (Lua: `hub.enabledehydration`, default true)
- Feature: Hub hydration accepts a configurable file exclude list via `HydrationOptions` `excludes` (array of wildcards). Built-in defaults skip transient runtime files (`.lock`, `.sentry-native/*`, `state_marker`, `*.bak`, `gc/reserve.gc`, `auth/*`) so they no longer participate in dehydrate scans. Override semantics: a present field replaces the default outright; explicit `[]` opts out of all defaults.
- Improvement: Hub hydration completion logs now report per-request average and max latency, peak in-flight workers, queue wait, and hash-cache hit percentage; loose and pack-blob transfers are reported separately
- Improvement: Hub hydration pre-creates unique parent directories before scheduling parallel writes
- Improvement: S3 hydration retries transient HTTP failures (timeouts, 429 throttling, 5xx server errors, connection errors) up to 3 times via the HTTP client retry layer
- Improvement: S3 hydration multipart chunk size is persisted in `state.cbo` per module so hydrate replays the partitioning used at dehydrate; default raised to 64 MiB (was 32 MiB)
- Improvement: Hub hydration `Obliterate` retries backend delete once before falling back to local cleanup
| |
* fix crash when scavenging sequences or copying local chunks
| |
RegionName and Category on Misc.RegionBeginWithId were declared as
uint8[] — a byte array with no Field_String class flag. UE Insights'
FEventData::GetString() explicitly requires Field_String and returns
false otherwise, so Insights analyzers that check(GetString(...)) fire
when reading zen traces.
Upstream UE declares these fields as WideString; zen's source strings
are std::string_view, so AnsiString is the natural fit and the wire
bytes are unchanged (same Field_8 aux stream — only the schema class
bit differs). Insights' FString GetString variant accepts either ANSI
or WIDE, so analyzers work without change.
Zen's own tourist-based analyzer in src/zen/trace/trace_model.cpp
reads raw aux bytes via Array<uint8[]> regardless of the schema tag,
and its DecodeRegionName already handles both 1-byte and 2-byte
widths, so it's unaffected.
| |
- Improvement: Hub hydration and dehydration completion logs now include per-phase wall time, bytes transferred, bits/s throughput, number of unique worker threads used, and the storage source/target URI
- Improvement: Hub storage server instance lifecycle logs now report elapsed time for spawn and shutdown
- Improvement: Hub deprovisioning now logs GC completion status and elapsed time; a GC that does not complete within the 5s deadline is logged as a warning and shutdown proceeds anyway
| |
- Improvement: Hub Consul client HTTP timeout defaults raised to 1s connect / 2s total so transient latency to a slow Consul agent no longer fails registration calls
| |
1. **Assert invariant in `RemoveActiveWriteBlock`** — `erase(std::find(...))` was UB if the invariant ever broke. Now asserts the iterator before erasing.
2. **Single atomic delta in `SetMetaData`** — was `+= new; -= old` as two atomic ops, briefly inflating `TotalSize()` for concurrent readers. Collapsed into one `fetch_add`.
3. **Consistent `IncludeBlocks` / `IncludeBlock`** — `IncludeBlocks` asserted on duplicate keys while `IncludeBlock` silently skipped. Made both tolerant; also made the `reserve` call additive so a second call can't shrink the capacity request.
4. **Replace `operator[]` reads with `find` on `m_ChunkBlocks`** — `tsl::robin_map::operator[]` default-inserts; several read-intent lookups could produce ghost null entries if invariants broke (especially on compaction rollback paths).
5. **Bound `GetChunk` against actual file size** — `m_IoBuffer.GetSize()` is the mapped capacity (block size, e.g. 256 MiB), not written bytes. Requests inside the mapped region but past the real EOF returned views over zero-filled memory. Now bounds against `FileSize()`.
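Item 2 can be illustrated with a small sketch. `SizeTrackerSketch` is a hypothetical type; the point is that a single `fetch_add` of the unsigned difference never exposes the inflated intermediate total that a separate add-then-subtract pair would.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical tracker standing in for the store's running total.
struct SizeTrackerSketch
{
    std::atomic<uint64_t> TotalSize{0};

    // Replace an entry of OldSize bytes with one of NewSize bytes in one atomic step.
    void ReplaceEntrySize(uint64_t OldSize, uint64_t NewSize)
    {
        // Unsigned wraparound makes (NewSize - OldSize) behave as a signed delta,
        // so this also works when the entry shrinks.
        TotalSize.fetch_add(NewSize - OldSize, std::memory_order_relaxed);
    }
};
```

With `+= NewSize` followed by `-= OldSize`, a concurrent reader could observe `TotalSize` temporarily overstated by `OldSize`; the single delta removes that window.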
| |
Replaces the old (not fully implemented) UE `Logging.*` sink with a typed `ZenLog.*` trace path that preserves structured fmt args end-to-end, so the zen trace analyzer (and future consumers) can re-render log messages with full formatter support.
- Hook `Logger::Log` to tap `fmt::format_args` before `vformat` renders them, and emit three new events on a dedicated `ZenLogChannel`: `Category`, `MessageSpec`, `Message`. Args are serialized as `[count][descriptors][payload]` with distinct categories for bool, int, float, and string. Custom formatters fall back to a pre-rendered string.
- Bool has its own wire category so `{}` renders as `true`/`false` and `{:d}` as `1`/`0`.
- Zen `LogLevel` is translated to UE `ELogVerbosity` on emit so severity filtering works consistently.
- Extend the zen trace analyzer to decode `ZenLog.*` via `fmt::vformat` + `dynamic_format_arg_store` — nested widths, chrono specs, etc. all work. Strings are passed as views directly from the event payload (which outlives the format call) rather than copied through a pool.
- Retire the old `TraceSink` stub; the typed path supersedes it.
- Switch `--trace=default` alias from `cpu,log` to `cpu,zenlog`.
- Add `__int128` overloads to the arg encoder guarded by `FMT_USE_INT128` so fmt's int128 dispatch resolves unambiguously on clang/gcc. MSVC and clang-cl are unaffected.
| |
- Improvement: Dashboard banner displays the zenserver version next to the wordmark
| |
- Bugfix: Hub provision requests now return 202 Accepted when the module is `Recovering` or `Waking` instead of rejecting
| |
- Improvement: `zen builds` zen-folder handling is now consistent per subcommand
  - `list-namespaces`, `list`, `list-blocks`, `ls`: no local scratch folder is created; responses stay in memory
  - `upload`, `fetch-blob`, `prime-cache`, `validate-part`: default to `<cwd>/.zen` (no change)
  - `download`: default to `<local-path>/.zen` (no change)
- Bugfix: `zen builds ls` no longer fails against cloud build storage (`--host`/`--url`) when `--storage-path` is not supplied
| |
Surface "did you mean?" suggestions when the `zen` CLI is invoked with an unknown command or subcommand, so users don't have to dig through `zen --help` every time they mistype.
```
$ zen stauts
Unknown command specified: 'stauts'
The most similar commands are:
status
Run 'zen --help' for the full list of commands.
```
```
$ zen cache statz
Unknown subcommand: 'statz'
The most similar subcommands are:
stats
```
## Algorithm
- Damerau-Levenshtein edit distance with case-insensitive ASCII comparison — handles insertions, deletions, substitutions, and adjacent transpositions (e.g. `versoin` → `version`).
- Small prefix-match bonus so short inputs like `ca` still surface longer commands like `cache` without having to relax the distance threshold to the point where it admits noise.
- Distance threshold scales with input length (`clamp(len/2, 1, 3)`). Very short inputs rely on the prefix bonus; longer inputs tolerate up to three edits.
- Top 5 results by distance, stable-sorted.
- Hidden commands (deprecated shims like `cache-stats`) are excluded from the candidate set so we don't advertise them.
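The distance and threshold pieces can be sketched as follows. This uses the optimal-string-alignment variant of Damerau-Levenshtein (each substring is edited at most once), which may differ from the variant in the actual change; the function names are hypothetical, and the prefix bonus and ranking are not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

inline char AsciiLowerSketch(char C)
{
    return (C >= 'A' && C <= 'Z') ? static_cast<char>(C - 'A' + 'a') : C;
}

// Case-insensitive (ASCII) edit distance counting insertions, deletions,
// substitutions, and adjacent transpositions.
inline size_t EditDistanceSketch(std::string_view A, std::string_view B)
{
    const size_t N = A.size();
    const size_t M = B.size();
    std::vector<std::vector<size_t>> D(N + 1, std::vector<size_t>(M + 1, 0));

    for (size_t I = 0; I <= N; ++I) D[I][0] = I;
    for (size_t J = 0; J <= M; ++J) D[0][J] = J;

    for (size_t I = 1; I <= N; ++I)
    {
        for (size_t J = 1; J <= M; ++J)
        {
            const char   Ca   = AsciiLowerSketch(A[I - 1]);
            const char   Cb   = AsciiLowerSketch(B[J - 1]);
            const size_t Cost = (Ca == Cb) ? 0 : 1;

            D[I][J] = std::min({D[I - 1][J] + 1, D[I][J - 1] + 1, D[I - 1][J - 1] + Cost});

            // Adjacent transposition: "oi" vs "io" costs one edit instead of two.
            if (I > 1 && J > 1 && Ca == AsciiLowerSketch(B[J - 2]) && AsciiLowerSketch(A[I - 2]) == Cb)
            {
                D[I][J] = std::min(D[I][J], D[I - 2][J - 2] + 1);
            }
        }
    }
    return D[N][M];
}

// Threshold from the description: clamp(len/2, 1, 3).
inline size_t MaxDistanceForSketch(size_t InputLength)
{
    return std::clamp<size_t>(InputLength / 2, 1, 3);
}
```

A candidate would be suggested when `EditDistanceSketch(Input, Candidate) <= MaxDistanceForSketch(Input.size())`; both typos in the transcripts above are one edit away from the intended command.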
| |
- `ftok()` internally re-`stat()`s the path and fails with `ENOENT` if another owner's destructor unlinks the backing file between our `open()` and `ftok()`; the held fd does not protect against this
- derive the IPC key via `fstat()` on the fd instead, using the same `(ino & 0xffff) | ((dev & 0xff) << 16) | (proj << 24)` formula that glibc and macOS `ftok()` compute internally
- fixes intermittent "Failed to create an SysV IPC key" failures on macOS, where the slower on-disk /tmp widens the race window
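The key formula quoted above can be written as a small pure function. `MakeSysVKeySketch` is a hypothetical name; in the real change the inode and device would come from `st_ino` and `st_dev` of an `fstat()` call on the already-open fd, which is not shown here. Masking the project id to 8 bits is an assumption that mirrors how `ftok()` uses only the low byte of `proj_id`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: low 16 bits of the inode, low 8 bits of the device,
// and the 8-bit project id packed into one 32-bit key.
inline uint32_t MakeSysVKeySketch(uint64_t Inode, uint64_t Device, int ProjectId)
{
    const uint64_t InodeBits   = Inode & 0xffff;
    const uint64_t DeviceBits  = (Device & 0xff) << 16;
    const uint64_t ProjectBits = (static_cast<uint64_t>(ProjectId) & 0xff) << 24;
    return static_cast<uint32_t>(InodeBits | DeviceBits | ProjectBits);
}
```

Because the inputs are read through the held fd, a concurrent unlink of the path can no longer make key derivation fail.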
| |
* changelog
* move log formatting fixes to correct version
| |
Stopping the zenserver Windows service (via `sc stop`, `zen service stop`, system shutdown, or any other SCM path) was being ignored. SCM would eventually force-kill the process after its timeout, giving an ungraceful shutdown.
## Root cause
PR #751 ("add simple http client tests", c37421a3b) restructured each HTTP server's `OnRun` loop from
```cpp
do { m_ShutdownEvent.Wait(WaitTimeout); }
while (!IsApplicationExitRequested());
```
to
```cpp
do { ShutdownRequested = m_ShutdownEvent.Wait(WaitTimeout); }
while (!ShutdownRequested);
```
That was well-intentioned — tests wanted to start/stop an HTTP server without touching global process state — but the old loop was the only thing that turned `RequestApplicationExit()` into an actual server wake-up. Once it was removed, `RequestApplicationExit(0)` was silently downgraded to "just sets a flag". The `WindowsService::SvcCtrlHandler` stop path was calling exactly that, so SCM stops stopped working. The sponsor-process check path kept working only because it *also* calls `m_Http->RequestExit()` via `ZenServerBase::RequestExit()`.
## Fix
- Restore `IsApplicationExitRequested()` as a secondary exit condition in each HTTP server's `OnRun` loop (`httpsys`, `httpasio`, `httpmulti`, `httpnull`, `httpplugin`) alongside the per-server `m_ShutdownEvent` that #751 introduced. Preserves #751's goal — tests can still call `server->RequestExit()` without touching global state — while making `RequestApplicationExit()` wake the server up again, which the rest of the codebase and `SvcCtrlHandler` assume.
- Clean up the service control handler in the same pass: also accept `SERVICE_CONTROL_SHUTDOWN`, report `STOP_PENDING` with a 30s `dwWaitHint` (was 0), drop the redundant second `ReportSvcStatus` call, and remove `ghSvcStopEvent` which nothing ever `Wait()`-ed on.
- Advertise `SERVICE_ACCEPT_STOP | SERVICE_ACCEPT_SHUTDOWN` while running; drop controls while stop-pending/stopped.
- Make `WindowsService` destructor virtual (latent UB given `Run()` was already virtual).
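The restored loop shape can be sketched with stand-ins. `PollingEventSketch`, `g_ApplicationExitRequestedSketch`, and `RunLoopSketch` are hypothetical; the real servers use their own event type and `IsApplicationExitRequested()`. The sketch only shows that either the per-server event or the process-wide flag ends the loop.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Stand-in for the process-wide exit flag set by RequestApplicationExit().
inline std::atomic<bool> g_ApplicationExitRequestedSketch{false};

// Stand-in for the per-server shutdown event; polls instead of blocking on a kernel object.
class PollingEventSketch
{
public:
    void Set() { m_Signaled.store(true); }

    // Returns true if the event was signaled before the timeout elapsed.
    bool Wait(std::chrono::milliseconds Timeout)
    {
        const auto Deadline = std::chrono::steady_clock::now() + Timeout;
        while (!m_Signaled.load())
        {
            if (std::chrono::steady_clock::now() >= Deadline)
            {
                return false;
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        return true;
    }

private:
    std::atomic<bool> m_Signaled{false};
};

enum class ExitReasonSketch
{
    ServerShutdownEvent,
    ApplicationExit
};

inline ExitReasonSketch RunLoopSketch(PollingEventSketch& ShutdownEvent, std::chrono::milliseconds WaitTimeout)
{
    for (;;)
    {
        // Primary condition: the per-server event (keeps tests independent of global state).
        if (ShutdownEvent.Wait(WaitTimeout))
        {
            return ExitReasonSketch::ServerShutdownEvent;
        }
        // Secondary condition: the global flag, re-checked on every wait timeout.
        if (g_ApplicationExitRequestedSketch.load())
        {
            return ExitReasonSketch::ApplicationExit;
        }
    }
}
```

With this shape, the worst-case latency between a global exit request and the server noticing it is one `WaitTimeout`.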
| |
- Improvement: File copy, scan, clone, and move operations now report the underlying OS error in failure messages
| |
- Improvement: Hub shares a single S3 client and IMDS credential provider across all modules, reducing IMDS load and surviving transient IMDS blips during bulk provisioning
- Improvement: Hub validates hydration config at startup; bad `--hub-hydration-target-spec` or `--hub-hydration-target-config` now fails `zen hub` at boot instead of per-module at first hydrate
- Improvement: S3 hydration multipart chunk size configurable via `settings.chunk-size` (default 32 MiB)
- Improvement: S3 client extracts `<Error><Code>` and `<Message>` from XML error bodies (previously logged as `<unhandled content format>`)
- Improvement: S3 client fails fast with a "no credentials available" error when AWS credentials are missing, instead of sending an unsigned request that S3 rejects with a generic 400
- Improvement: IMDS credential provider retries transient connection failures (up to 3 attempts with backoff)
- Improvement: HTTP clients with `RetryCount > 0` also retry on `CURLE_COULDNT_CONNECT`
| |
Security review follow-ups to the `zen` CLI. Each fix stands as its own commit. Grouped by category below.
## Credentials and secrets
- **Per-install random auth encryption key instead of a hardcoded literal.** The default AES key and IV used to encrypt persisted OIDC refresh tokens / OAuth client secrets were ASCII literals compiled into the public source. Replaced with 32+16 random bytes persisted to `<system-root>/auth/machinekey.dat`. `SecureRandomBytes` added in zencore/crypto wrapping BCryptGenRandom / OpenSSL / mbedTLS CTR_DRBG. Partial override (only one of `--encryption-aes-key`/`--encryption-aes-iv`) is now rejected instead of silently using the hardcoded half.
- **Wrap the machine key with OS-protected storage.** `machinekey.dat` is now a tagged format (4-byte magic + flags + wrapped-or-raw payload). Windows wraps via DPAPI (`CryptProtectData` at per-user scope) so a stolen disk copy cannot decrypt without the OS master key. macOS uses Keychain Services (GenericPassword under `org.unrealengine.zen.auth`, `kSecAttrAccessibleAfterFirstUnlockThisDeviceOnly`). Linux uses libsecret (opt-in via `--zenlibsecret=yes`, off by default because headless servers typically have no Secret Service daemon). All platforms fall back to raw persistence with `0600` perms on POSIX when wrapping is unavailable. Legacy files from the prior commit are detected by size and still read.
> Note: argv-redaction before Sentry on crash was previously part of this PR but was superseded by `ScrubSensitiveValues()` from #989; this PR now just calls that helper instead of walking argv itself.
## Path traversal
- **Reject unsafe filenames from the remote oplog in `oplog-mirror`.** The filename from each oplog entry was joined to the mirror root without normalisation; a compromised remote could use drive letters, UNC shares, device path prefixes, absolute paths, or `..` components to write anywhere the zen user could write. An `UnsafeFileNameReason` check runs immediately after extraction, logs the offending filename, and aborts the mirror.
- **Use the resolved absolute download-spec path in `builds download`.** `--download-spec-path` was computed into a sanitised absolute path, then the original unresolved path was passed to `ParseBuildManifest`, bypassing the `MakeSafeAbsolutePath` mitigations and reading from the process cwd rather than `--local-path`.
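A rough sketch of the filename screening described for `oplog-mirror`. The function name, return convention, and exact rule set are assumptions; the real `UnsafeFileNameReason` check may cover more cases.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical check: returns nullptr when the name looks safe to join under the
// mirror root, otherwise a short reason suitable for logging.
inline const char* UnsafeFileNameReasonSketch(std::string_view Name)
{
    if (Name.empty())
    {
        return "empty filename";
    }
    // Leading separators cover POSIX absolute paths, UNC shares, and device path prefixes.
    if (Name.front() == '/' || Name.front() == '\\')
    {
        return "absolute, UNC, or device path";
    }
    const bool IsAlpha = (Name[0] >= 'A' && Name[0] <= 'Z') || (Name[0] >= 'a' && Name[0] <= 'z');
    if (Name.size() >= 2 && IsAlpha && Name[1] == ':')
    {
        return "drive letter";
    }
    // Reject any ".." path component, with either separator style.
    size_t Start = 0;
    while (Start <= Name.size())
    {
        size_t End = Name.find_first_of("/\\", Start);
        if (End == std::string_view::npos)
        {
            End = Name.size();
        }
        if (Name.substr(Start, End - Start) == "..")
        {
            return "parent-directory component";
        }
        Start = End + 1;
    }
    return nullptr;
}
```

Checking right after extraction, before any path join, means a rejected name never reaches the filesystem layer.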
## Input validation
- **Stop asserting on malformed `--build-id` / `--build-part-id`.** `Oid::FromHexString` asserts on bad input and `ZEN_ASSERT` is active in release, so a too-short or non-hex user value aborted the process instead of surfacing an `OptionParseException`. Routed all callers through `TryFromHexString`. Also fixes `ParseBuildPartId` reporting errors under the wrong option name.
- **Check the JSON parse error in `oplog-export --builds-metadata-path`.** The single-arg `LoadCompactBinaryFromJson` overload discarded the parser error; malformed JSON shipped a truncated compact-binary `metadata` field to the server with no indication. Switched to the two-arg overload and throws a descriptive error naming the file and reason.
- **Format the actual value in the malformed `--url` error.** The message was constructed with a literal `{}` placeholder and no `fmt::format` call, so users saw the placeholder instead of the offending URL.
- **Require `--output-path` in `cache get` unless `--as-text` is set.** Previously an empty path auto-filled from the value key / attachment hash and wrote into the process cwd; the `--as-text && empty path` stdout branch was unreachable because the auto-fill ran first.
- **Clear the cxxopts `allow_unrecognised_options` flag after permissive parse.** `ParseOptionsPermissive` set the flag on the Options it received and never cleared it, priming that Options for silent typo acceptance on any later reuse. Added `disallow_unrecognised_options()` to the vendored cxxopts (local patch — flagged at the declaration) and wrapped the toggle in RAII.
## Resource lifecycle
- **Restore signal handlers via RAII.** `wipe`, `builds`, and `oplog-mirror` installed SIGINT/SIGBREAK handlers with raw `signal()` and never restored them; an option-parse throw left the handler targeting an abort flag nothing reads. Added `zen::ScopedSignalHandler` in zen.h and applied at all three sites (builds uses `std::optional` members so the guards survive past `OnParentOptionsParsed` into the subcommand's Run).
- **Route SIGINT in `oplog-mirror` to the worker-pool abort flag.** The command declared a local `std::atomic<bool> AbortFlag` but no handler targeted it — Ctrl-C killed the process instead of cleanly aborting. Added a `MirrorAbortFlag` / `MirrorSignalCallbackHandler` pair in projectstore_impl and bound the local as a reference; existing `.store`/`.load`/capture sites unchanged.
- **Clean up the `cache get` temp download on every exit path.** `Http.Download` parks the payload in the system temp dir; a failed `MoveToFile` (cross-volume, denied target) or an exception could leave the temp file behind. The downloaded buffer is already flagged delete-on-close by `HttpClient`, so the fix is just to clear that flag after a successful `MoveToFile` so the renamed-out file isn't reaped.
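The RAII signal-handler guard can be sketched as below. `ScopedSignalHandlerSketch` is a hypothetical stand-in for `zen::ScopedSignalHandler`, built on `std::signal`; the flag and handler names are likewise illustrative.

```cpp
#include <cassert>
#include <csignal>

// Hypothetical abort flag and handler, for illustration only.
inline volatile std::sig_atomic_t g_AbortRequestedSketch = 0;

inline void AbortFlagHandlerSketch(int)
{
    g_AbortRequestedSketch = 1;
}

// Installs a handler for the lifetime of the guard and restores the previous one on
// scope exit, including when an exception unwinds through the scope.
class ScopedSignalHandlerSketch
{
public:
    using Handler = void (*)(int);

    ScopedSignalHandlerSketch(int Signal, Handler NewHandler)
    : m_Signal(Signal)
    , m_Previous(std::signal(Signal, NewHandler))
    {
    }

    ~ScopedSignalHandlerSketch()
    {
        if (m_Previous != SIG_ERR)
        {
            std::signal(m_Signal, m_Previous);
        }
    }

    ScopedSignalHandlerSketch(const ScopedSignalHandlerSketch&) = delete;
    ScopedSignalHandlerSketch& operator=(const ScopedSignalHandlerSketch&) = delete;

private:
    int     m_Signal;
    Handler m_Previous;
};
```

Holding the guard in a `std::optional` member, as the `builds` change describes, lets it be constructed late and still outlive the function that installed it.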
## Other
- **Fix wrong URL fields in `oplog-export` / `oplog-import` builds-branch descriptions.** Two operator-facing "[builds] URL/namespace/bucket/buildsid" messages formatted `m_CloudUrl` instead of `m_BuildsUrl` / `m_BuildsHost` (copy-paste from neighbouring `[cloud]` branches), shown as empty or stale at the start of an export/import.
- **Fix "Can't find oplog in project '{}'" formatting and a "Failed top mirror" typo in projectstore_cmd.**
- **Fix a misleading `oplog-export` comment on the `--zen` scheme default** ("Assume https" vs. the `http://` the code writes).
- **Fail `ScrambleDir` when `RemoveFile` doesn't delete.** The `zen builds test` scramble phase used `(void)RemoveFile(FilePath)`, discarding both the bool return and the error. A quiet delete failure let verification run against stale state; switched to the two-arg overload and throw on false return or non-empty `error_code`.
| |
- Improvement: Hub Consul service registration and deregistration are now dispatched on a dedicated background thread so instance state transitions no longer stall when the Consul agent is slow or unreachable
| |
- Fix double WriteResponse in PUT record failure path; the detail-body branch now short-circuits instead of falling through to a second WriteResponse call
- Return 405 Method Not Allowed for unsupported verbs in the root, namespace, bucket, record, and chunk handlers (previously fell through to no response)
- Clamp exec$/replay-recording thread_count so a bogus query value cannot spawn an unbounded worker pool
## Performance / cleanup
- NamespaceMap now uses TransparentStringHash + std::equal_to<>, so Get/Put/Find/Drop can probe the map with a std::string_view directly instead of constructing a temporary std::string on every request
- Replace insert_or_assign with try_emplace under the exclusive lock in GetNamespace; the find() re-check already guarantees the key is absent, so try_emplace matches intent better
## Reverted
- The earlier change to erase the pinned entry from m_DroppedNamespaces after DropNamespace's post-drop work was reverted: other threads may still hold pointers into a dropped namespace, so tearing it down eagerly is unsafe. Dropped namespaces remain pinned for the lifetime of the process as before.
| |
Introduces a common `ZenServiceClient` RAII wrapper for zen CLI commands that interact with a zenserver instance. CLI operations (admin, builds, cache, exec, hub, info, projectstore, trace, ui, version, vfs, workspaces) automatically register sessions so they become visible in the server's session list, and forward log output to the server's session log endpoint.
All session HTTP I/O (announce, remove, log batches) runs on a single background worker thread, so CLI startup and shutdown never block on server availability.
### Key changes
- **`ZenServiceClient`** — new RAII class that wraps host resolution, HTTP client creation, and session lifecycle (register on connect, remove on exit). Replaces ad-hoc boilerplate across all command files that talk to a server, including the new `trace` subcommands (`start`, `stop`, `status`).
- **Async session I/O** — `SessionsServiceClient` now owns a single worker thread and command queue. `Announce()`, `Remove()`, and `UpdateMetadata()` enqueue commands and return immediately. The worker creates one `HttpClient` with a 5-second total timeout, bounding any individual request. Eliminates main-thread stalls when the server is unreachable.
- **Session log forwarding** — `SessionLogSink` is a thin enqueuer that posts log messages to the same worker queue (no separate thread or HTTP client). Log levels are serialized as integers; the server-side ingest handles both string and integer formats for backwards compatibility, with bounds checking on integer values.
- **Build & projectstore session registration** — Long-running `builds` and projectstore cache (oplog-download) connections register sessions too, making them visible alongside regular CLI command sessions.
### Cleanup
- Extract `SetupCacheSession` helper on `StorageInstance` to reduce duplication.
- Remove unused `HttpClient` reference in ui command.
| |
- Renames `logging::ToStringView` → `ToString` and `ShortToStringView` → `ShortToString` for consistency with the rest of the codebase, where `ToString` is the convention for enum-to-string conversions (return type already communicates it's a view).
- Updates all call sites in logbase, logging helpers, session log sink, admin service, and tcplogstreamsink.
Split off from the `sb/zen-monitor` branch so the ZenServiceClient refactor PR stays focused.
| |
* scrub sensitive command line options from log and sentry
| |
Integrates the **tourist** trace analysis library and builds a full `zen trace` command suite for working with Unreal Engine `.utrace` files.
### Trace analysis library (`thirdparty/tourist/`)
- Adds the tourist library as a third-party dependency with three modules: **foundation** (platform primitives, memory, scheduling), **trace** (UE Trace protocol decoding), and **analysis** (event dispatching and analyzer framework).
- Cross-platform support for Windows, Linux, and macOS.
### `zen trace` CLI commands (`src/zen/cmds/`, `src/zen/trace/`)
- **`zen trace analyze`** — Summarize a `.utrace` file: session metadata, thread inventory, command line + build configuration, CPU profiling scopes, timing, event rates, log messages, and (with symbols) memory allocation metrics including live-allocs dumps, callstack-keyed aggregation, and allocation churn. Optional HTML output for memory reports.
- **`zen trace inspect`** — Dump the event schema (declared types, fields, sizes) from a trace file.
- **`zen trace trim`** — Extract a time-window from a trace into a new `.utrace` file.
- **`zen trace serve`** — Launch a local HTTP server hosting an interactive trace viewer; opens in the default browser.
### Symbolication (`src/zen/trace/symbol_resolver.*`, `thirdparty/raw_pdb/`)
- Pluggable resolver with multiple backends: `pdb` (in-tree raw_pdb), `dbghelp` (Windows), `llvm-symbolizer` (all platforms), `atos` (macOS). An `auto` backend picks the best available tool per platform.
- Microsoft Symbol Server support: downloads PDBs on demand using a redirect-aware HTTP client.
- Local PDB cache keyed by image GUID preserves symbols across binary recompilation.
- Callstack trimming heuristic strips UE internal noise from reports.
- Binary analysis cache (`.ucache_z`) avoids re-resolving the same trace.
### Interactive trace viewer (`src/zen/frontend/html/`, `src/zen/trace/trace_viewer_service.*`)
- Timeline: scope-level detail, horizontal zoom/pan, vertical scrolling, viewport-driven loading with pre-computed LOD for responsive navigation of large traces.
- Thread grouping (collapsible sidebar sections) synthesized from name suffixes, natural sort order, visual distinction between lane threads and OS threads.
- Bookmark and region annotations; region categories with per-category toggles; bookmark marker toggle in the toolbar.
- Filterable Logs tab showing captured `UE_LOG` output.
- Stats tab with per-scope aggregate statistics.
- Memory tab with interactive allocation analysis and an allocation size histogram.
- CsvProfiler event parsing and chart UI.
### Other in-branch supporting changes
- **Cross-platform browser launcher** (`browser_launcher.{h,cpp}`) used by `trace serve`.
- **`ReciprocalU64`** fast 64-bit integer division (zencore/intmath) for trace analyzers.
- **`parallelsort`** cross-platform parallel sort helper (zenutil).
- Frontend zip build rule so the viewer's HTML assets are bundled into `zen.exe`.
- `/Zo` flag for better optimized debug info on Windows release builds.
- `trace-tests.cpp` in the `zen-test` harness (harness itself landed on main via #985).
| |
- Introduce `CompactString`: a move-only, heap-allocated, immutable string wrapper that stores its length in a prefix byte for cheap `Size()`/`ToView()` while keeping the object to a single pointer.
- Swap the `ToString()` integer-formatting helpers in `zencore/string.cpp` to `std::to_chars`, which is ~5-10x faster and benefits every `IntNum` / `StringBuilder` / `CbJsonWriter` caller.
- No in-tree users on `main` yet; the type is ready for callers that want owned-string storage with lower per-entry overhead than `std::string` (e.g. long-lived log buffers, session records).
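For the integer-formatting swap, a minimal sketch of the `std::to_chars` pattern is shown here. `IntToStringSketch` is a hypothetical wrapper, not the actual `ToString()` helper in `zencore/string.cpp`.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <string>
#include <system_error>

// Hypothetical wrapper: formats into a stack buffer with no locale lookup and no
// heap traffic until the final std::string is built.
inline std::string IntToStringSketch(int64_t Value)
{
    char Buffer[24];  // "-9223372036854775808" needs 20 characters
    const auto [End, Error] = std::to_chars(Buffer, Buffer + sizeof(Buffer), Value);
    if (Error != std::errc{})
    {
        return {};  // cannot happen with a buffer this size; kept for completeness
    }
    return std::string(Buffer, End);
}
```

`std::to_chars` writes no terminating NUL, so the result length comes from the returned end pointer rather than `strlen`.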
| |
Switch several deque-based queues from `std::deque` to `eastl::deque` to reduce per-element heap allocation overhead. MSVC's `std::deque` allocates one node per element for anything larger than ~16 bytes; `eastl::deque` groups 4, 8, or 32 elements per block depending on element size.
Converted call sites:
- `BlockingQueue` and `WorkerThreadPool` (generic — downstream callers benefit automatically)
- Session log entry buffer (~10k-entry ring of large log records — 4 per block vs 1)
- Job queue (`Ref<Job>` — 32 per block vs 2)
- RPC recording request queue (large `QueuedRequest` struct — 4 per block vs 1)
- StatsD client message queues (~32-byte buffers — 8 per block vs 1)
|