| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(#1014)
Branch started as a sessions-service overhaul (persistence, client liveness, UE_LOGFMT intake) and grew to pick up adjacent infrastructure work: an early-startup log backlog, a hardened `MemoryArena`, the `zen trace serve` viewer gaining a counter view + compact timeline + tabbed callsite panel, defensive fixes in the third-party `tourist` trace parser, a series of allocation reductions across the HTTP and compact-binary hot paths, and a new `zen sessions` CLI command tree.
## Sessions service
**Persistence.** Each session lives on disk under `<DataRoot>/sessions/<id>/` as `info.cb` (metadata) plus `log.bin` (length-prefixed CbObject log records). On startup the service scans that directory and loads prior sessions as ended sessions, preloading the tail of each log so historical views work after a restart. `SessionLog` is noexcept-constructed and falls back to a disabled state on disk errors, so a bad disk can't take down `RegisterSession`. `GetSession` falls back to the ended-sessions list (fixes historical log fetches over HTTP). `LoadTail` counts only successfully-parsed records.
**Pruning.** Periodic cleanup task drops ended sessions once any of three caps is exceeded: age (default 1 year), count (default 1000), or total on-disk footprint (default 50 MiB). Runs 30 s after startup, hourly thereafter. Active sessions never pruned; disk removal and directory stat happen outside the exclusive lock so a slow filesystem can't stall lookups.
**Client liveness.** Sessions carry a `ProcessHandle` for the client-reported pid, captured at registration time so Windows pid recycling can't produce false positives. A 30 s asio timer probes liveness and ends dead sessions through the normal remove path, producing a synthetic `Session ended: process exited (...)` line persisted to `log.bin`. Windows decodes common NTSTATUS exit codes to human names (Ctrl-C, access violation, stack overflow, ...); POSIX stays at plain `process exited`. Clients auto-fill `ClientPid` only for local targets (unix socket / loopback); the server defensively accepts pids only from `IsLocalMachineRequest()` peers. zenserver also reports its own pid when registering its self-session, so it shows up with a real pid in the dashboard and `zen sessions ls`.
**Synthetic end-of-session line.** `RemoveSession` takes an optional reason; before the session moves to the ended list it appends an Info-level `Session ended[: reason]` entry through the normal log path (released outside `m_Lock`). Current reasons: `client request` (HTTP DELETE), `server shutdown` (self-session), `process exited (...)` (liveness).
**UE_LOGFMT structured entries.** `POST /sessions/{id}/log` now accepts `{level, logger, format, fields}` alongside the existing `{level, logger, message}` shape. New `logtemplate.{h,cpp}` implements UE's `StructuredLog.cpp` template grammar (field paths with `.name` / `[N]`, `{{`/`}}` escapes, `$text` / `$format` / `$locformat` object conventions, bounded recursion). Renders to a displayable message at intake while persisting raw format + fields so a future UI can drill into fields without another schema bump. Hot path is zero-alloc — renders into `ExtendableStringBuilder<256>` using stack-buffered `Oid::ToString` / `IoHash::ToHexString` overloads. UI shows a `{…}` marker with the raw template + JSON-pretty fields on hover.
**Parent sessions.** `SessionInfo` gains `parent_session_id`; hub-managed storage server child processes inherit the hub's session id via `--parent-session=<id>`. `ZEN_SESSIONS_URL` env var becomes a fallback for `--sessions-url` / config when neither is provided. The in-process session log sink is disabled when a remote sessions target is configured (logs flow through `SessionsServiceClient` instead). The sessions UI groups child sessions under their parent (collapsible/expandable, sorts as a unit, supports nesting).
**Platform reporting.** `SessionInfo` gains `Platform`, flowed end-to-end: client auto-fills via `GetRuntimePlatformName()`, server persists in `info.cb` (`plat`) and emits on GET. UI renders as a SimpleIcons-style inline SVG (windows / macOS / iOS / linux / wine / android / playstation / xbox / nintendo) with case-insensitive alias resolution (Win32/Win64, PS4/PS5, XSX/XSS, NintendoSwitch, iPhone/iPad, Darwin/OSX). Unknown values fall back to text; sorting runs on the underlying string.
**WebSocket log streaming.** Sessions UI moves from 2 s polling to a WebSocket push model. New `WsSubscriber` has a stable id + helper methods. UI caps the log-line DOM at 5 000 entries with a shared cursor-regression helper, factored out of two call sites. Per-broadcast allocations trimmed on the push path; fixed a stack overrun in the WS log broadcast hex-id buffer.
**Log memory.** `LogEntry::Level` is now `logging::LogLevel` (1 byte) instead of `std::string` (~32 B) — saves ~310 KB per full 10 k-entry deque and eliminates a per-message allocation in the in-proc sink. On-disk format writes an int32 and accepts either int or legacy string on read. `LogEntry` strings now live in a `MemoryArena`; logger names are interned across the deque. `SessionLog::Append` and `WriteSessionInfoFile` drop their `UniqueBuffer` round-trip and write `CbObject::GetView()` straight through `BasicFile` / `SafeWriteFile`. Multi-entry `POST /log` batched under one lock + one push.
**In-proc log timestamps.** `InProcSessionLogSink::TimePointToDateTime` previously preserved only whole seconds, so every in-proc entry rendered at `.000` ms in the dashboard and `zen sessions tail`. It now adds the sub-second part (nanoseconds → 100 ns ticks) to keep ms precision end-to-end.
**UI.** Side "Session Details" panel is gone — its info is inline in the table (appname, mode, platform, id, timestamps, this/log pills, active dot). Bottom panel is a tabbed `Log | Metadata` view with a right-side "Session Information" panel beside metadata; log-only controls (filter, newest-first, follow, log-level filter, expand/collapse) hide when Metadata is active, polling keeps running across tab switches. Wide-mode toggle fills the viewport edge-to-edge. Log lines show the logger category; timestamps render in 24 h with zero-padded fields regardless of locale. Sessions list defaults to All / 10 per page / created-desc, gains click-to-sort headers on the full dataset, a header filter box, and a pager aligned to the table's right edge. Duplicate auto-injected `<h1>Sessions</h1>` removed.
## `zen sessions` CLI
New command tree on the `zen` client for inspecting the sessions service from the terminal:
- **`zen sessions ls`** — lists sessions (active first, ended next; newest-first within each group) with id, status, app/mode, pid, created, duration, and log count. Supports `--status active|ended|all` (default `all`).
- **`zen sessions status`** — prints the sessions service summary: self id, active / ended counts, and the read/write/delete/list/request/bad-request counters from `/stats/sessions`.
- **`zen sessions tail [session]`** — tails a session's log. With no argument it tails zenserver's own session (resolved via `/sessions/list`'s `self_id`); an explicit 24-hex id targets any session, including ended ones (historical replay). `--lines N` (default 50, 0 = all buffered) trims the initial dump client-side. `--follow` prefers a WebSocket push subscription on `/sessions/ws` for sub-second latency; on upgrade failure (older server, blocked port, unix-socket transport) it falls back to HTTP cursor polling at `--interval-ms` (default 500), with sleeps chunked to 50 ms so Ctrl-C reacts quickly. Output matches `zen::logging::FullFormatter` (`[YY-MM-DD HH:MM:SS.mmm] [lvl] [logger] message`); on a TTY the level is colored and the logger is bold, with continuation lines indented under the message column using the *visible* prefix width. 404 surfaces as `(session ended)` and connection errors as `(server gone)` — both clean exits, so stopping the server mid-tail no longer prints a stack trace.
- **`zen sessions ui`** — opens `<host>/dashboard/?page=sessions` in the user's default browser. Rejects unix-socket hosts.
A small `ZenServiceClient::IsUnixSocket()` helper now wraps the unix-socket check used by `ui`, `sessions tail` (WS path), and `sessions ui`.
## Logging
`BacklogSink` captures early-startup log entries in a fixed-capacity ring so late-attached sinks (session sink, file sink) can replay them. Detaches from the broadcast list when disabled; backed by destructor-only cleanup (no `unique_ptr` indirection per entry). Tuned defaults so the backlog covers typical bring-up without unbounded growth.
## `zen trace serve` viewer
- Compact timeline mode for high-density views.
- New `TRACE_INT_VALUE` / `TRACE_FLOAT_VALUE` counter trace points + a counters page in the viewer.
- Callsite tables collapsed into a single tabbed panel.
- Lossless `Oid <-> Guid` bridge for trace session ids; trace `SessionId` plumbed through.
- `tourist` parser hardening: bounds-check `BufferStream::read`, validate `Type::info_size` before `patch()`, convert `parse_important_aux` to a loop (avoids deep recursion), widen `ParserPool` index to `uint32`, bounds-check field offsets in the dispatcher, pin `Types::parse` buffer up-front.
## `MemoryArena`
Configurable chunk size, inline chunk list, oversize requests routed to truly-dedicated chunks (no slack waste, no fragmentation when one allocation is much larger than the chunk).
## Allocation cleanups across hot paths
- `zenhttp::HttpRequestRouter::HandleRequest` and `FormatPackageMessageInternal`: drop heap allocations.
- Compact-binary validation: `eastl::fixed_vector` + `eastl::sort`; eliminate `std::vector` churn.
- `zenserverprocess`: trim transient allocations in spawn paths.
- Sessions HTTP intake / broadcast: drop transient `std::string` allocs.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A collection of security, correctness, and robustness fixes in `zenhttp` and `zencore` surfaced by security review. Most items are small, independent commits grouped here because they all tighten trust boundaries or fix UB along the same code paths.
## WebSocket protocol hardening (RFC 6455)
- **Enforce the client-side mask bit**. Server-side frame loops now reject unmasked frames with close code 1002 per §5.1. Prevents HTTP intermediary smuggling.
- **Validate control frames and RSV bits**. Fragmented control frames, oversized (>125 B) control payloads, and any non-zero RSV bit now fail the connection before allocation.
- **Lower per-frame payload cap** from 256 MB → 4 MB. Bounds per-connection accumulator memory.
- **Implement message fragmentation**. Continuation frames are coalesced and delivered as a single message; interleaved non-control frames close with 1002; assembled messages are capped at 4 MB (1009 on overflow). Previously partial fragments were delivered to handlers, bypassing payload validation.
- **Parse the 101 handshake response properly** in `HttpWsClient`. Status-line, `Upgrade`, `Connection`, and `Sec-WebSocket-Accept` are now matched exactly rather than via substring searches against the full body.
## Auth / OIDC hardening
- **Constant-time password compare** in `PasswordSecurity::IsAllowed` (closes a remote length/content timing oracle). Adds a shared `ConstantTimeEquals` helper.
- **Harden Basic-auth header parsing**: trim trailing LWS, reject control bytes and DEL in the credential.
- **OIDC discovery pinning**: require HTTPS (loopback exempt), verify `issuer` matches `BaseUrl`, require `token_endpoint` / `userinfo_endpoint` / `jwks_uri` to share origin with `BaseUrl`, reject empty `token_endpoint`.
- **Restrict `POST /auth/oidc/refreshtoken`** to local-machine requests. Previously unauthenticated in default deployments — remote callers could evict or replace cached tokens.
- **Stop logging OIDC provider response bodies** on refresh failure (IdPs echo `refresh_token` back in error bodies).
- **Drop the unused `IdentityToken` field** from `OidcClient` / `OpenIdToken` so nothing in the tree accidentally trusts an unverified JWT.
## Auth state encryption migration
- Add `AesGcm` AEAD primitive (BCrypt / OpenSSL backends, mbedTLS stubbed) and `CryptoRandom::Fill` CSPRNG helper in `zencore/crypto.h`.
- Migrate authstate file from AES-256-CBC with a fixed IV to AES-GCM with a fresh 12-byte random nonce per write and the 4-byte `ZEN1` magic bound as AAD. Legacy-CBC files are transparently read once and rewritten in the new format.
## Filesystem / IO robustness
- `IoBufferExtendedCore::Materialize` now checks `MAP_FAILED` on POSIX (was comparing to `nullptr`, which let the failure sentinel propagate into later reads and `munmap(MAP_FAILED, ...)`).
- `IoBufferBuilder::MakeFromFile / MakeFromTemporaryFile`: close the FD/HANDLE on exception via a dismissable `ScopeGuard`; actually check the `fstat()` return value (previously used an uninitialized `FileSize`).
- `ReadFromFileMaybe`: loop short reads, retry `EINTR`, chunk Windows `ReadFile` at `0xFFFFFFFF` bytes (fixes silent truncation of multi-GiB reads).
- `WipeDirectory`: compare `FindFirstFileW` handle against `INVALID_HANDLE_VALUE` rather than `nullptr`.
- `RemoveFileNative` (Linux/macOS): report non-`ENOENT` stat failures via the `std::error_code` out-param and stop reading `st_mode` after a failed stat.
## Buffer / compression correctness
- Avoid per-copy `IoBufferCore` heap allocations in `CompositeBuffer::CopyTo / ViewOrCopyRange` iterators; add fast path for `BufferHeader::Read` when the 64-byte header fits in the first plain-memory segment.
- `BufferHeader`: add `IsHeaderValid()` gate covering `BlockSizeExponent` range, `BlockCount * BlockSize` overflow, and `TotalRawSize` bounds before any arithmetic uses them. Defends against attacker-controlled headers that can pass the CRC and trigger OOB writes in `DecompressBlock`.
|
| |
|
|
|
|
|
| |
- `GetEnvVariable` now returns `std::optional<std::string>` so callers can distinguish an unset variable from one set to an empty value.
- Windows path uses `SetLastError(ERROR_SUCCESS)` + `ERROR_ENVVAR_NOT_FOUND` to detect "not found"; POSIX path returns `nullopt` when `getenv` returns `nullptr`.
- All call sites migrated. Most use `.value_or("")` to preserve current empty-or-unset behavior. The diagnostic helpers in `zen-test/artifactprovider-tests.cpp` now report `<unset>` vs `<empty>` distinctly.
- Added a check in the `ExpandEnvironmentVariables` test confirming `nullopt` for an unset variable; PATH/HOME lookups in that test use `REQUIRE(has_value())` so a missing var fails cleanly instead of throwing `bad_optional_access`.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Fix double WriteResponse in PUT record failure path; the detail-body branch now short-circuits instead of falling through to a second WriteResponse call
- Return 405 Method Not Allowed for unsupported verbs in the root, namespace, bucket, record, and chunk handlers (previously fell through to no response)
- Clamp exec$/replay-recording thread_count so a bogus query value cannot spawn an unbounded worker pool
## Performance / cleanup
- NamespaceMap now uses TransparentStringHash + std::equal_to<>, so Get/Put/Find/Drop can probe the map with a std::string_view directly instead of constructing a temporary std::string on every request
- Replace insert_or_assign with try_emplace under the exclusive lock in GetNamespace; the find() re-check already guarantees the key is absent, so try_emplace matches intent better
## Reverted
- The earlier change to erase the pinned entry from m_DroppedNamespaces after DropNamespace's post-drop work was reverted: other threads may still hold pointers into a dropped namespace, so tearing it down eagerly is unsafe. Dropped namespaces remain pinned for the lifetime of the process as before.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduces a common `ZenServiceClient` RAII wrapper for zen CLI commands that interact with a zenserver instance. CLI operations (admin, builds, cache, exec, hub, info, projectstore, trace, ui, version, vfs, workspaces) automatically register sessions so they become visible in the server's session list, and forward log output to the server's session log endpoint.
All session HTTP I/O (announce, remove, log batches) runs on a single background worker thread, so CLI startup and shutdown never block on server availability.
### Key changes
- **`ZenServiceClient`** — new RAII class that wraps host resolution, HTTP client creation, and session lifecycle (register on connect, remove on exit). Replaces ad-hoc boilerplate across all command files that talk to a server, including the new `trace` subcommands (`start`, `stop`, `status`).
- **Async session I/O** — `SessionsServiceClient` now owns a single worker thread and command queue. `Announce()`, `Remove()`, and `UpdateMetadata()` enqueue commands and return immediately. The worker creates one `HttpClient` with a 5-second total timeout, bounding any individual request. Eliminates main-thread stalls when the server is unreachable.
- **Session log forwarding** — `SessionLogSink` is a thin enqueuer that posts log messages to the same worker queue (no separate thread or HTTP client). Log levels are serialized as integers; the server-side ingest handles both string and integer formats for backwards compatibility, with bounds checking on integer values.
- **Build & projectstore session registration** — Long-running `builds` and projectstore cache (oplog-download) connections register sessions too, making them visible alongside regular CLI command sessions.
### Cleanup
- Extract `SetupCacheSession` helper on `StorageInstance` to reduce duplication.
- Remove unused `HttpClient` reference in ui command.
|
| |
|
|
|
|
|
| |
- Renames `logging::ToStringView` → `ToString` and `ShortToStringView` → `ShortToString` for consistency with the rest of the codebase, where `ToString` is the convention for enum-to-string conversions (return type already communicates it's a view).
- Updates all call sites in logbase, logging helpers, session log sink, admin service, and tcplogstreamsink.
Split off from the `sb/zen-monitor` branch so the ZenServiceClient refactor PR stays focused.
|
| |
|
|
|
|
|
|
|
| |
- Bugfix: `builds download` partial-block fetch decisions now account for build storage host latency
- Bugfix: Transfer rate displays in `builds` commands now smooth correctly
- Split `buildstorageoperations.cpp` (8.5k lines) into per-operation TUs: buildinspect, buildprimecache, buildstorageresolve, buildupdatefolder, builduploadfolder, buildvalidatebuildpart; stats moved to buildstoragestats.h.
- FilteredRate extracted to zenutil.
- BuildsCommand shared state consolidated into a BuildsConfiguration struct; subcommands inherit from BuildsSubCmdBase holding a `const BuildsConfiguration&` instead of a `BuildsCommand&`.
- `ProgressBar` renamed to `ConsoleProgressBar`; mode enum (`ConsoleProgressMode`) lifted to namespace scope; `PushLogOperation`/`PopLogOperation`/`ForceLinebreak` promoted to virtuals on `ProgressBase`.
- Free-function wrappers (`UploadFolder`, `DownloadFolder`, `ValidateBuildPart`) added around the existing operation classes so callers stop reimplementing setup + stats logging.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A series of correctness and API hygiene fixes to the intrusive refcount primitives in `zenbase`, culminating in the removal of `RefPtr<T>` in favour of a single unified `Ref<T>` smart pointer.
The changes are motivated by two pieces of latent UB sitting under every `Ref<T>` / `TRefCounted<T>` in the codebase, plus a handful of API footguns on the smart-pointer side (silent raw-pointer decay, missing converting moves, unconstrained conversions from unrelated types).
## Correctness fixes
- **Strict-aliasing UB in atomic helpers** — `AtomicIncrement`/`Decrement`/`Add` took a `volatile uint32_t&` and reinterpret-cast it to `std::atomic<T>*`. The object was never constructed as a `std::atomic`, so the access was type-punning UB. Fixed by changing `m_RefCount` to `std::atomic<uint32_t>` directly in `RefCounted`, `TRefCounted<T>` and `IoBufferCore`. The helpers (and `zenbase/atomic.h`) are later removed entirely — the three callers now invoke `fetch_add`/`fetch_sub` directly.
- **const_cast of non-mutable member** — `AddRef()` / `Release()` are `const` but mutated `m_RefCount` via `const_cast`. Since `m_RefCount` wasn't `mutable`, writing through the cast was UB for any `const`-qualified holder (e.g. a `static const` refcounted singleton). Fixed by marking `m_RefCount` `mutable` and dropping the `const_cast` in `AddRef`/`Release`.
- **Public non-virtual `TRefCounted` destructor** — allowed `delete basePtr;` to slice past the CRTP `DeleteThis()` contract. Moved to `protected`.
## Memory-ordering cleanup
- `AddRef` weakened from seq_cst to **relaxed** (a thread can only take a new reference via one it already holds; nothing needs to synchronize).
- `Release` weakened from seq_cst to **acq_rel** (sufficient to order prior writes before the destructor, and make the decrement visible to observers).
- Diagnostic `RefCount()` / `GetRefCount()` reads made **relaxed** and spelled out as explicit `.load()` — the returned value is stale the moment it's observed, so stronger ordering gives no guarantee.
- No-op on x86 (`lock xadd` either way), but removes a full barrier on every `Ref<T>` copy on ARM64 (Apple silicon / Windows-on-ARM).
## `RefPtr` / `Ref` unification
Before this branch, `RefPtr<T>` and `Ref<T>` were subtly different in ways that made the safer of the two (`Ref`) harder to use and the looser one (`RefPtr`) dangerous:
- `RefPtr::operator T*()` was implicit — `delete refPtr;` compiled silently (double-delete), and the raw pointer could outlive the temporary `RefPtr` it was extracted from. Made `explicit`, then removed entirely once call sites were migrated to `.Get()`.
- `RefPtr(T*)` was implicit while `RefPtr(RefPtr<Derived>&&)` was `explicit` — exactly the opposite of the safety intent. Reversed.
- `RefPtr`'s converting move was unconstrained (any `RefPtr<U>` with an implicitly-convertible `U*` satisfied it, including `void*` and multiple-inheritance base offsets). Added a `DerivedFrom<U, T>` constraint matching `Ref<T>`.
- `Ref<T>` was missing a converting move ctor / move-assignment from `Ref<Derived>` — upcasts of rvalues were going through `AddRef`+`Release` instead of a pointer steal. Added.
- `Release()` and the non-move smart-pointer ops were not `noexcept`, despite being so in practice. Marked `noexcept` throughout.
After all of the above, the two types were functionally identical. The final commit deletes `RefPtr` and rewrites the ~10 consumer files to use `Ref`.
|
| |
|
|
| |
- Improvement: New `ZEN_SCOPED_LOG(Expr)` macro routes `ZEN_INFO`/`ZEN_WARN`/`ZEN_DEBUG` in the enclosing block through the given logger expression instead of the default
- Improvement: `BuildContainer`, `SaveOplog`, and `LoadOplogContext` now take a caller-provided `LoggerRef` so diagnostic messages route through the caller's logger
|
| |
|
| |
- Improvement: Replaced `OperationLogOutput` with `ProgressBase` in `zenutil`; logging and progress reporting are now separate concerns. Operation classes receive a `LoggerRef` for logging and a `ProgressBase&` for progress bars
|
| |
|
| |
- Improvement: Add option `--buildstore-disksizelimit-percent` - Max percentage of total drive capacity (of --data-dir drive) for build storage. When combined with `--buildstore-disksizelimit`, the lower value wins.
|
| |
|
|
| |
Fix strings passed to RegisterRoute in HttpProjectStore: strings are literals now instead of regexes, so '\$' needs to be changed to just '$'.
|
| |
|
|
| |
* move session service to zenserver base class and make it available in all zenserver modes
* fix deadlock in sessionsclient shutdown
|
| |
|
|
|
|
|
|
|
|
| |
This PR introduces an in-memory `CidStore` option primarily for use with compute, to avoid hitting disk for ephemeral data which is not really worth persisting. And in particular not worth paying the critical path cost of persistence.
- **MemoryCidStore**: In-memory CidStore implementation backed by a hash map, optionally layered over a standard CidStore. Writes to the backing store are dispatched asynchronously via a dedicated flush thread to avoid blocking callers on disk I/O. Reads check memory first, then fall back to the backing store without caching the result.
- **ChunkStore interface**: Extract `ChunkStore` abstract class (`AddChunk`, `ContainsChunk`, `FilterChunks`) and `FallbackChunkResolver` into `zenstore.h` so `HttpComputeService` can accept different storage backends for action inputs vs worker binaries. `CidStore` and `MemoryCidStore` both implement `ChunkStore`.
- **Compute service wiring**: `HttpComputeService` takes two `ChunkStore&` params (action + worker). The compute server uses `MemoryCidStore` for actions (no disk persistence needed) and disk-backed `CidStore` for workers (cross-action reuse). The storage server passes its `CidStore` for both (unchanged behavior).
|
| |
|
|
|
|
|
| |
* objectstore.cpp - m_TotalBytesServed now tracks all range cases (single, multi, 416)
* async http: docstring corrected: curl_multi_socket_action() / ASIO socket async_wait
remove non-ascii characters
* fix singlethreaded gc option in lua to not use dash
* fix changelog order
|
| |
|
|
|
|
|
|
|
| |
- Improvement: HTTP range responses (RFC 7233) are now fully compliant across the object store and build store
- 206 Partial Content responses now include a `Content-Range` header; previously absent for single-range requests, which broke `HttpClient::GetRanges()`
- 416 Range Not Satisfiable responses now include `Content-Range: bytes */N` as required by RFC 7233
- Out-of-bounds range requests return 416 Range Not Satisfiable (was 400 Bad Request)
- Single-byte ranges (`bytes=N-N`) are now correctly accepted (were previously rejected)
- Range byte positions widened from 32-bit to 64-bit; RFC 7233 imposes no size limit on byte range values
- Build store binary GET requests with a Range header now return 206 Partial Content with `Content-Range` (previously returned 200 OK without it)
|
| |
|
|
| |
(#927)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
### Security: Input validation & path safety
- **Reject local file references by default** in package parsing — only allow when explicitly opted in by the service (`ParseFlags::kAllowLocalReferences`) and validated by an `ILocalRefPolicy` (fail-closed: no policy = rejected)
- **`DataRootLocalRefPolicy`** restricts local ref paths to the server's data root via canonical path prefix matching
- **Validate attachment hashes** in compute HTTP handlers — decompresses and re-hashes each attachment at ingestion time to reject tampered payloads
- **Path traversal validation** for worker descriptions (`pathvalidation.h`) — rejects absolute paths, `..` components, Windows reserved device names, and invalid filename characters
- **Harden CbPackage parsing** against corrupt inputs — overflow-safe attachment count, bounds checks on local ref offset/size, graceful failure instead of `ZEN_ASSERT` for untrusted data
- **Harden legacy package parser** — reject zero-size binary fields, missing mappers, and optionally validate resolved attachment hashes
- **Bounds check in `CbPackageReader::MarshalLocalChunkReference`** — detect when `MakeFromFile` silently clamps offset+size to file size
### Reliability: Lock consolidation & bug fixes
- **Consolidate three action map locks into one** (`m_ActionMapLock`) — eliminates deadlock risk from multi-lock ordering, simplifies state transitions, and fixes a race where newly enqueued actions were briefly invisible to `GetActionResult`/`FindActionResult`
- **Fix infinite loop in `BaseRunnerGroup::SubmitActions`** when actions exceed total runner capacity — cap round-robin at `TotalCapacity` and default unassigned results to "No capacity"
- **Fix `MakeSafeAbsolutePathInPlace` for UNC paths** — `\server\share` now correctly becomes `\?\UNC\server\share` instead of `\?\server\share`
- **Fix `max_retries=0`** — previously fell through to the default of 3; now correctly means "no retries"
### New: ManagedProcessRunner
- Cross-platform process runner backed by `SubprocessManager` — uses async exit callbacks instead of polling, delegates CPU/memory metrics to the manager's built-in sampler
- `ProcessGroup` (JobObject on Windows, process group on POSIX) for bulk cancellation on shutdown
- `--managed` flag on `zen exec inproc` to select this runner
- Refactored monitor thread lifecycle — `StartMonitorThread()` now called from derived constructors to avoid calling virtual functions from base constructor
### Process management
- **Suppress crash dialogs** via `JOB_OBJECT_UILIMIT_ERRORMODE` + `SEM_NOGPFAULTERRORBOX` in both `WindowsProcessRunner` and `JobObject::Initialize` — prevents WER/Dr. Watson modal dialogs from blocking the monitor thread
- **CREATE_SUSPENDED → AssignProcessToJobObject → ResumeThread** pattern in `WindowsProcessRunner` — ensures job object assignment before process execution
- **Move stdout/stderr callbacks to `Spawn()` parameters** in `SubprocessManager` — prevents race where early output could be missed before callback installation
- Consistent PID logging across all runner types
### Test infrastructure
- **`zentest-appstub`**: Added `Fail` (configurable exit code) and `Crash` (abort / nullptr deref) test functions
- **Compute integration tests**: exit code handling, auto-retry exhaustion, manual reschedule after failure, mixed success/failure queues, crash handling (abort + nullptr), crash auto-retry, immediate query visibility after enqueue
- **Package format tests**: truncated header, bad magic, attachment count overflow, truncated data, local ref rejection/acceptance, policy enforcement (inside/outside root, traversal, no-policy fail-closed)
- **Legacy package parser tests**: empty input, zero-size binary, hash resolution with/without mapper, hash mismatch detection
- **UNC path tests** for `MakeSafeAbsolutePath`
### Misc
- ANSI color helper macros (`ZEN_RED`, `ZEN_BRIGHT_WHITE`, etc.) and `ZEN_BOLD`/`ZEN_DIM`/etc.
- Generic `fmt::formatter` for types with free `ToString` functions
- Compute dashboard: truncated hash display with monospace font and hover for full value
- Renamed `usonpackage_forcelink` → `cbpackage_forcelink`
- Compute enabled by default in xmake config (releases still explicitly disable)
|
| |
|
|
|
|
| |
- Feature: Added Workspaces dashboard page with HTTP request stats and per-workspace metrics
- Feature: Added Build Storage dashboard page with service-specific HTTP request stats
- Improvement: Front page now shows Hub and Object Store activity tiles; HTTP panel is fixed above the tiles grid
- Improvement: HTTP stats tiles now include 5m/15m rates and p999/max latency across all service pages
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
- Feature: Hub watchdog automatically deprovisions inactive provisioned and hibernated instances
- Feature: Added `stats/activity_counters` endpoint to measure server activity
- Feature: Added configuration options for hub watchdog
- `--hub-watchdog-provisioned-inactivity-timeout-seconds` Inactivity timeout before a provisioned instance is deprovisioned
- `--hub-watchdog-hibernated-inactivity-timeout-seconds` Inactivity timeout before a hibernated instance is deprovisioned
- `--hub-watchdog-inactivity-check-margin-seconds` Margin before timeout at which an activity check is issued
- `--hub-watchdog-cycle-interval-ms` Watchdog poll interval in milliseconds
- `--hub-watchdog-cycle-processing-budget-ms` Maximum time budget per watchdog cycle in milliseconds
- `--hub-watchdog-instance-check-throttle-ms` Minimum delay between checks on a single instance
- `--hub-watchdog-activity-check-connect-timeout-ms` Connect timeout for activity check requests
- `--hub-watchdog-activity-check-request-timeout-ms` Request timeout for activity check requests
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
## Summary
This PR adds a session management service, several new dashboard pages, and a number of infrastructure improvements.
### Sessions Service
- `SessionsServiceClient` in `zenutil` announces sessions to a remote zenserver with a 15s heartbeat (POST/PUT/DELETE lifecycle)
- Storage server registers itself with its own local sessions service on startup
- Session mode attribute coupled to server mode (Compute, Proxy, Hub, etc.)
- Ended sessions tracked with `ended_at` timestamp; status filtering (Active/Ended/All)
- `--sessions-url` config option for remote session announcement
- In-process log sink (`InProcSessionLogSink`) forwards server log output to the server's own session, visible in the dashboard
### Session Log Viewer
- POST/GET endpoints for session logs (`/sessions/{id}/log`) supporting raw text and structured JSON/CbObject with batch `entries` array
- In-memory log storage per session (capped at 10k entries) with cursor-based pagination for efficient incremental fetching
- Log panel in the sessions dashboard with incremental DOM updates, auto-scroll (Follow toggle), newest-first toggle, text filter, and log-level coloring
- Auto-selects the server's own session on page load
### TCP Log Streaming
- `LogStreamListener` and `TcpLogStreamSink` for log delivery over TCP
- Sequence numbers on each message with drop detection and synthetic "dropped" notice on gaps
- Gathered buffer writes to reduce syscall overhead when flushing batches
- Tests covering basic delivery, multi-line splitting, drop detection, and sequencing
### New Dashboard Pages
- **Sessions**: master-detail layout with selectable rows, metadata panel, live WebSocket updates, paging, abbreviated date formatting, and "this" pill for the local session
- **Object Store**: summary stats tiles and bucket table with click-to-expand inline object listing (`GET /obj/`)
- **Storage**: per-volume disk usage breakdown (`GET /admin/storage`), Garbage Collection status section (next-run countdown, last-run stats), and GC History table with paginated rows and expandable detail panels
- **Network**: overview tiles, per-service request table, proxy connections, and live WebSocket updates; distinct client IPs and session counts via HyperLogLog
### Documentation Page
- In-dashboard Docs page with sidebar navigation, markdown rendering (via `marked`), Mermaid diagram support (theme-aware), collapsible sections, text filtering with highlighting, and cross-document linking
- New user-facing docs: `overview.md` (with architecture and per-mode diagrams), `sessions.md`, `cache.md`, `projects.md`; updated `compute.md`
- Dev docs moved to `docs/dev/`
### Infrastructure & Bug Fixes
- **Deflate compression** for the embedded frontend zip (~3.4MB → ~950KB); zlib inflate support added to `ZipFs` with cached decompressed buffers
- **Local IP addresses**: `GetLocalIpAddresses()` (Windows via `GetAdaptersAddresses`, Linux/Mac via `getifaddrs`); surfaced in `/status/status`, `/health/info`, and the dashboard banner
- **Dashboard nav**: unified into `zen-nav` web component with `MutationObserver` for dynamically added links, CSS `::part()` to merge banner/nav border radii, and prefix-based active link detection
- Stats broadcast refactored from manual JSON string concatenation to `CbObjectWriter`; `CbObject`-to-JS conversion improved for `TimeSpan`, `DateTime`, and large integers
- Stats WebSocket boilerplate consolidated into `ZenPage.connect_stats_ws()`
|
| |
|
|
|
|
|
|
|
|
|
| |
- **`Logger` now holds a single `SinkPtr`** instead of a `std::vector<SinkPtr>`. The `SetSinks`/`AddSink` API is replaced with a single `SetSink`. This removes complexity from `Logger` itself and makes `Clone()` cheaper (no vector copy).
- **New `BroadcastSink`** (`zencore/logging/broadcastsink.h`) acts as a thread-safe, shared indirection point that fans out to a dynamic list of child sinks. Adding or removing a child sink via `AddSink`/`RemoveSink` is immediately visible to every `Logger` that holds a reference to it — including cloned loggers — without requiring each logger to be updated individually.
- **`GetDefaultBroadcastSink()`** (exposed from `zenutil/logging.h`) gives server-layer code access to the shared broadcast sink so it can register optional sinks (OTel, TCP log stream) after logging is initialized, without going through `Default()->AddSink()`.
### Motivation
Previously, dynamically adding sinks post-initialization mutated the default logger's internal sink vector directly. This was fragile: cloned loggers (created before `AddSink` was called) would not pick up the new sinks. `BroadcastSink` fixes this by making the sink list a shared, mutable object that all loggers sharing the same broadcast instance observe uniformly.
|
| |
|
|
| |
- Improvement: Lazy initialize CpuSampler - reduces startup time by ~600 ms
- Bugfix: Don't try to wipe .sentry-native folder at missing manifest - sentry is already running. Reduces startup time by ~450 ms when data folder is empty
|
| |
|
|
|
|
| |
- When build store is not configured, `m_BuildCidStore` is null but was unconditionally added as a stats provider, causing a null pointer dereference crash in `StatsReporter::ReportStats`
- Added a null guard in `AddProvider` to reject null pointers defensively
- Added a conditional check at the call site in `InitializeStructuredCache`
|
| |
|
| |
Authentication callbacks are not thread safe, ensured call sites does single threaded calls
|
| | |
|
| |\ |
|
| | |
| |
| |
| |
| |
| |
| |
| | |
- Feature: Added `--allow-port-probing` option to control whether zenserver searches for a free port on startup (default: true, automatically false when --dedicated is set)
- Feature: Added new hub options for controlling provisioned storage server instances:
- `--hub-instance-http` - HTTP server implementation for instances (asio/httpsys)
- `--hub-instance-http-threads` - Number of HTTP connection threads per instance
- `--hub-instance-corelimit` - Limit CPU concurrency per instance
- Improvement: Hub now manages a deterministic port pool for provisioned instances allowing reuse of unused ports
|
| |/ |
|
| |
|
|
|
| |
- Improvement: Fixed issue where oplog upload could create blocks larger than the max limit (64Mb)
Refactored remoteprojectstore.cpp to use ParallelWork and exceptions for error handling.
|
| |
|
|
|
|
|
| |
This PR makes it *possible* to do a Windows build on Linux via `clang-cl`.
It doesn't actually change any build process. No policy change, just mechanics and some code fixes to clear clang compilation.
The code fixes are mainly related to #include file name casing, to match the on-disk casing of the SDK files (via xwin).
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
removal (#841)
- Percent-decode URIs in ASIO HTTP server to match http.sys CookedUrl behavior, ensuring consistent decoded paths across backends
- Add Environment field to CreateProcOptions for passing extra env vars to child processes (Windows: merged into Unicode environment block; Unix: setenv in fork)
- Add GetCompilerName() and include it in build options startup logging
- Suppress Windows CRT error dialogs in test harness for headless/CI runs
- Fix mimalloc package: pass CMAKE_BUILD_TYPE, skip cfuncs test for cross-compile
- Add virtual destructor to SentryAssertImpl to fix debug-mode warning
- Simplify object store path handling now that URIs arrive pre-decoded
- Add URI decoding test coverage for percent-encoded paths and query params
- Simplify httpasio request handling by using strands (guarantees no parallel handlers per connection)
- Removed deprecated regex-based route matching support
- Fix full GC never triggering after cross-toolchain builds: The `gc_state` file stores `system_clock` ticks, but the tick resolution differs between toolchains (nanoseconds on GCC/standard clang, microseconds on UE clang). A nanosecond timestamp misinterpreted as microseconds appears far in the future (~year 58,000), bypassing the staleness check and preventing time-based full GC from ever running. Fixed by also resetting when the stored timestamp is in the future.
- Clamp GC countdown display to configured interval: Prevents nonsensical log output (e.g. "Full GC in 492128002h") caused by the above or any other clock anomaly. The clamp applies to both the scheduler log and the status API.
|
| |
|
|
| |
- Add `--no-network` CLI option which disables all TCP/HTTPS listeners, restricting zenserver to Unix domain socket communication only.
- Also fixes asio upgrade breakage on main
|
| | |
|
| |
|
|
|
|
|
|
| |
- Feature: Basic consul integration for zenserver hub mode, restricted to host local consul agent and register/deregister of services
- Feature: Added new options to zenserver hub mode
- `consul-endpoint` - Consul endpoint URL for service registration (empty = disabled)
- `hub-base-port-number` - Base port number for provisioned instances
- `hub-instance-limit` - Maximum number of provisioned instances for this hub
- `hub-use-job-object` - Enable the use of a Windows Job Object for child process management (Windows only)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The main goal of this change is to eliminate the cpr back-end altogether and replace it with the curl implementation. I would expect to drop cpr as soon as we feel happy with the libcurl back-end. That would leave us with a direct dependency on libcurl only, and cpr can be eliminated as a dependency.
### HttpClient Backend Overhaul
- Implemented a new **libcurl-based HttpClient** backend (`httpclientcurl.cpp`, ~2000 lines)
as an alternative to the cpr-based one
- Made HttpClient backend **configurable at runtime** via constructor arguments
and `-httpclient=...` CLI option (for zen, zenserver, and tests)
- Extended HttpClient test suite to cover multipart/content-range scenarios
### Unix Domain Socket Support
- Added Unix domain socket support to **httpasio** (server side)
- Added Unix domain socket support to **HttpClient**
- Added Unix domain socket support to **HttpWsClient** (WebSocket client)
- Templatized `HttpServerConnectionT<SocketType>` and `WsAsioConnectionT<SocketType>`
to handle TCP, Unix, and SSL sockets uniformly via `if constexpr` dispatch
### HTTPS Support
- Added **preliminary HTTPS support to httpasio** (for Mac/Linux via OpenSSL)
- Added **basic HTTPS support for http.sys** (Windows)
- Implemented HTTPS test for httpasio
- Split `InitializeServer` into smaller sub-functions for http.sys
### Other Notable Changes
- Improved **zenhttp-test stability** with dynamic port allocation
- Enhanced port retry logic in http.sys (handles ERROR_ACCESS_DENIED)
- Fatal signal/exception handlers for backtrace generation in tests
- Added `zen bench http` subcommand to exercise network + HTTP client/server communication stack
|
| |\ |
|
| | |\ |
|
| | |\ \ |
|
| | | | |
| | | |
| | | |
| | | | |
command line or config
|
| | | | |
| | | |
| | | |
| | | | |
and add lua option
|
| | | | |
| | | |
| | | |
| | | | |
OidcToken executable path
|
| |\ \ \ \
| | |_|/
| |/| | |
|
| | | | |
| | | |
| | | |
| | | |
| | | | |
* create oplogs as they are imported
* Improved logic for partial block analisys
* unit tests for ChunkBlockAnalyser
|
| | | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
- **Frontend dashboard overhaul**: Unified compute/main dashboards into a single shared UI. Added new pages for cache, projects, metrics, sessions, info (build/runtime config, system stats). Added live-update via WebSockets with pause control, sortable detail tables, themed styling. Refactored compute/hub/orchestrator pages into modular JS.
- **HTTP server fixes and stats**: Fixed http.sys local-only fallback when default port is in use, implemented root endpoint redirect for http.sys, fixed Linux/Mac port reuse. Added /stats endpoint exposing HTTP server metrics (bytes transferred, request rates). Added WebSocket stats tracking.
- **OTEL/diagnostics hardening**: Improved OTLP HTTP exporter with better error handling and resilience. Extended diagnostics services configuration.
- **Session management**: Added new sessions service with HTTP endpoints for registering, updating, querying, and removing sessions. Includes session log file support. This is still WIP.
- **CLI subcommand support**: Added support for commands with subcommands in the zen CLI tool, with improved command dispatch.
- **Misc**: Exposed CPU usage/hostname to frontend, fixed JS compact binary float32/float64 decoding, limited projects displayed on front page to 25 sorted by last access, added vscode:// link support.
Also contains some fixes from TSAN analysis.
|
| | | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
* clean up BuildStorageResolveResult to allow capabilities
* add check for multirange request capability
* add MaxRangeCountPerRequest capabilities
* project export tests
* add InMemoryBuildStorageCache
* progress and logging improvements
* fix ElapsedSeconds calculations in fileremoteprojectstore.cpp
* oplogs/builds test script
|
| | | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Removes the vendored spdlog library (~12,000 lines) and replaces it with a purpose-built logging system in zencore (~1,800 lines). The new implementation provides the same functionality with fewer abstractions, no shared_ptr overhead, and full control over the logging pipeline.
### What changed
**New logging core in zencore/logging/:**
- LogMessage, Formatter, Sink, Logger, Registry - core abstractions matching spdlog's model but simplified
- AnsiColorStdoutSink - ANSI color console output (replaces spdlog stdout_color_sink)
- MsvcSink - OutputDebugString on Windows (replaces spdlog msvc_sink)
- AsyncSink - async logging via BlockingQueue worker thread (replaces spdlog async_logger)
- NullSink, MessageOnlyFormatter - utility types
- Thread-safe timestamp caching in formatters using RwLock
**Moved to zenutil/logging/:**
- FullFormatter - full log formatting with timestamp, logger name, level, source location, multiline alignment
- JsonFormatter - structured JSON log output
- RotatingFileSink - rotating file sink with atomic size tracking
**API changes:**
- Log levels are now an enum (LogLevel) instead of int, eliminating the zen::logging::level namespace
- LoggerRef no longer wraps shared_ptr - it holds a raw pointer with the registry owning lifetime
- Logger error handler is wired through Registry and propagated to all loggers on registration
- Logger::Log() now populates ThreadId on every message
**Cleanup:**
- Deleted thirdparty/spdlog/ entirely (110+ files)
- Deleted full_test_formatter (was ~80% duplicate of FullFormatter)
- Renamed snake_case classes to PascalCase (full_formatter -> FullFormatter, json_formatter -> JsonFormatter, sentry_sink -> SentrySink)
- Removed spdlog from xmake dependency graph
### Build / test impact
- zencore no longer depends on spdlog
- zenutil and zenvfs xmake.lua updated to drop spdlog dep
- zentelemetry xmake.lua updated to drop spdlog dep
- All existing tests pass, no test changes required beyond formatter class renames
|
| | | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
- Added local process runners for Linux/Wine, Mac with some sandboxing support
- Horde & Nomad provisioning for development and testing
- Client session queues with lifecycle management (active/draining/cancelled), automatic retry with configurable limits, and manual reschedule API
- Improved web UI for orchestrator, compute, and hub dashboards with WebSocket push updates
- Some security hardening
- Improved scalability and `zen exec` command
Still experimental - compute support is disabled by default
|
| | | | |
| | | |
| | | |
| | | | |
handle it (#804)
|
| | | | |
| | | |
| | | |
| | | |
| | | | |
Various fixes to make cpp files build in unity build mode
as an aside using Unity build doesn't really seem to work on Linux, unsure why but it leads to link-time issues
|