| Commit message (Collapse) | Author | Age | Files | Lines |
| |
| |
- ZenCacheNamespace::CreatePutBatch/CreateGetBatch now return
std::unique_ptr so ownership is explicit at the call site
- ZenCacheNamespace::PutBatchHandle/GetBatchHandle own their disk-layer
handle and clean it up in the destructor, so the public
DeletePutBatch/DeleteGetBatch entry points on ZenCacheNamespace are
removed
- ZenCacheStore::PutBatch/GetBatch store the handle as unique_ptr and
their destructors collapse to `= default`; the try/catch wrappers are
no longer needed since destruction is driven by the destructors of
types that do not throw
- Disk-layer public API (CreatePutBatch, DeletePutBatch, etc.) is
untouched because its inner batch-handle structs live in the .cpp and
exposing them to the header to satisfy std::unique_ptr's completeness
requirement would leak implementation details
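
The ownership pattern described above can be sketched roughly as follows. This is an illustration only, not the actual `ZenCacheNamespace` code: `DiskLayerSketch`, `DiskBatch`, `PutBatchHandleSketch` and `NamespaceSketch` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <memory>

// Hypothetical disk layer: keeps its raw Create/Delete API, and its batch
// type stays opaque to callers (it is only defined further down).
struct DiskBatch;

struct DiskLayerSketch
{
    DiskBatch* CreatePutBatch();
    void       DeletePutBatch(DiskBatch* Batch) noexcept;
    int        LiveBatches = 0;
};

// Namespace-level handle: owns the disk-layer handle and releases it in its
// destructor, so no public Delete* entry point is needed at this level.
class PutBatchHandleSketch
{
public:
    explicit PutBatchHandleSketch(DiskLayerSketch& Layer) : m_Layer(Layer), m_Batch(Layer.CreatePutBatch()) {}
    ~PutBatchHandleSketch() { m_Layer.DeletePutBatch(m_Batch); }

    PutBatchHandleSketch(const PutBatchHandleSketch&) = delete;
    PutBatchHandleSketch& operator=(const PutBatchHandleSketch&) = delete;

private:
    DiskLayerSketch& m_Layer;
    DiskBatch*       m_Batch;
};

struct NamespaceSketch
{
    DiskLayerSketch Layer;

    // Returning unique_ptr makes ownership explicit at the call site.
    std::unique_ptr<PutBatchHandleSketch> CreatePutBatch() { return std::make_unique<PutBatchHandleSketch>(Layer); }
};

// "Implementation file" part: the only place where DiskBatch is complete.
struct DiskBatch
{
};

DiskBatch* DiskLayerSketch::CreatePutBatch()
{
    ++LiveBatches;
    return new DiskBatch();
}

void DiskLayerSketch::DeletePutBatch(DiskBatch* Batch) noexcept
{
    delete Batch;
    --LiveBatches;
}
```

The disk layer can keep `DiskBatch` incomplete in its header because only the raw pointer crosses that boundary; the `unique_ptr` lives one level up, around a type that is complete.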
|
| |
| |
Switch several deque-based queues from `std::deque` to `eastl::deque` to reduce per-element heap allocation overhead. MSVC's `std::deque` allocates one node per element for anything larger than ~16 bytes; `eastl::deque` groups 4, 8, or 32 elements per block depending on element size.
Converted call sites:
- `BlockingQueue` and `WorkerThreadPool` (generic — downstream callers benefit automatically)
- Session log entry buffer (~10k-entry ring of large log records — 4 per block vs 1)
- Job queue (`Ref<Job>` — 32 per block vs 2)
- RPC recording request queue (large `QueuedRequest` struct — 4 per block vs 1)
- StatsD client message queues (~32-byte buffers — 8 per block vs 1)
|
| |
|
|
| |
* add Touch() function to s3 client
* touch all used cas files in s3 dehydration path
|
| |
| |
- Feature: Per-user invocation history for `zen` and `zenserver`; each startup appends a record to a JSONL file capped at the most recent 100 entries. Location: `%LOCALAPPDATA%\Epic\Zen\History\invocations.jsonl` on Windows, `~/.zen/History/invocations.jsonl` on POSIX
- `zen history` opens an interactive picker; selecting a zen row re-runs it inline and forwards the exit code, selecting a zenserver row spawns it detached
- `zen history --list` (`-l`) prints the table to stdout instead of showing the picker
- `zen history --filter zen|zenserver` restricts the listing to one executable
- `zen history --print` prints the reconstructed command line of the selected row instead of launching it
- `--enable-execution-history` global option on both binaries (default `true`) to opt out per invocation
- The history file is attached to Sentry crash reports (alongside the existing zenserver log)
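
A minimal sketch of the "append, then keep the most recent 100" behavior, operating on in-memory lines rather than the real file. `AppendCapped` is a hypothetical helper; how the actual implementation reads, rewrites or locks the JSONL file is not shown in the notes above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Append one JSONL record and keep only the newest MaxEntries lines.
// Lines are ordered oldest to newest, matching an append-only file.
inline void AppendCapped(std::vector<std::string>& Lines, std::string Record, std::size_t MaxEntries = 100)
{
    Lines.push_back(std::move(Record));
    if (Lines.size() > MaxEntries)
    {
        const std::size_t Excess = Lines.size() - MaxEntries;
        Lines.erase(Lines.begin(), Lines.begin() + static_cast<std::ptrdiff_t>(Excess));
    }
}
```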
|
| |
| |
(#985)
Establishes a new end-to-end integration test harness for the `zen` CLI, the shared fetcher it uses to pull test artifacts, and the CI plumbing that feeds both. Also lowers the default test-harness log level and broadens the artifact fetcher's credential resolution.
### `zen-test` executable (`src/zen-test/`)
- New binary modeled on `zenserver-test`, built only in debug.
- `zen-test.{h,cpp}` harness: spawns `zen.exe` via `CreateProc` and captures combined stdout/stderr into a `ZenCommandResult` for assertion.
- Registered with `scripts/test.lua` under the short name `zen` (`xmake test --run=zen`) and enabled for `--kill-stale-processes`.
- Prints a clear console message when invoked from a release build (tests disabled), so misconfiguration is easy to spot.
- Documented in `CLAUDE.md` (test-suite naming table + test projects section) and `README.md`.
- Test cases in the `zen.artifactprovider` suite:
- `probe.lyra_cook_rpc_recording` — probe against a canonical Lyra cook RPC recording that skips with a diagnostic `MESSAGE` when no artifact source is configured.
- `probe.s3_readme` — probes the configured S3 bucket for `README.md` using a fresh temp cache to force the request through to S3; skips on macOS without static creds (no EC2 Mac runners in our fleet).
- `zen.utility-cmd` suite: new integration tests exercising `zen print`, `zen wipe`, and `zen copy`.
### `TestArtifactProvider` (`src/zenutil/testartifactprovider.{h,cpp}`)
- `Ref<TestArtifactProvider>` factory returning a local-only or S3-backed provider, selected from env vars:
- `ZEN_TEST_ARTIFACTS_PATH` — local directory to serve from (write-through cache for remote fetches).
- `ZEN_TEST_ARTIFACTS_S3` — S3 URL to fetch from.
- `AWS_DEFAULT_REGION` / `AWS_REGION`, `AWS_ENDPOINT_URL`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN` — standard AWS config.
- `Exists(path)` / `Fetch(path)` API with a `TestArtifactFetchResult` return carrying the content buffer and a diagnostic error string. Content is cached on disk across test runs.
- **IMDS credential fallback**: when no static `AWS_ACCESS_KEY_ID` is present, attaches an `ImdsCredentialProvider` so self-hosted EC2 runners with an attached IAM role can sign S3 requests without static credentials (mirrors the pattern in `zenserver/hub/hydration.cpp`).
- **IMDS opt-out**: honors the standard `AWS_EC2_METADATA_DISABLED=true` env var, and skips IMDS by default on macOS where the link-local probe would just emit noise.
### Test harness log level (`src/zencore/testing.cpp`)
- `TestRunner::ApplyCommandLine` now defaults the global log level to `Info` (was effectively `Trace`), cutting the noise from `xmake test --run=all` now that the suite has grown. Applies uniformly to `zencore-test`, `zenhttp-test`, `zenstore-test`, `zenutil-test`, `zenserver-test`, `zen-test`, etc. `--debug` (Debug) and `--verbose` (Trace) still opt back in when chasing failures.
### CI (`.github/workflows/validate.yml`)
- **Runner info step** on all three platforms (Windows/Linux/macOS): prints host, CPU topology, memory, and disk usage before the build/test step, so flakes that correlate with a particular runner or low disk space are easy to spot.
- **Artifact env wiring**: passes `ZEN_TEST_ARTIFACTS_S3` and `AWS_DEFAULT_REGION` into the debug Build & Test step on all three platforms so the probe can reach its source when the repo variable is configured. The probe skips cleanly when unset.
|
| | |
|
| |
|
| |
These CLI commands are no longer useful and have been dropped from the zen client.
|
| |
| |
- Introduces a UE-trace Region primitive in `zencore/trace.{h,cpp}` for marking named, potentially long-running intervals of work that Unreal Insights renders as banners in the timeline, separately from CPU scopes.
- New API:
- `uint64_t TraceBeginRegion(RegionName, Category={})` / `void TraceEndRegion(RegionId)` for manual begin/end pairs.
- `ScopedTraceRegion` RAII helper plus `ZEN_TRACE_REGION(name)` / `ZEN_TRACE_REGION_CAT(name, category)` macros for scope-based use.
- Emits the `Misc.RegionBeginWithId` / `Misc.RegionEndWithId` trace events (paired by a `GetHifreqTimerValue()`-derived id).
- Full no-op fallback under `#if !ZEN_WITH_TRACE` so callers compile in all configurations.
- Annotates `GcScheduler::CollectGarbage` with `ZEN_TRACE_REGION_CAT("GcScheduler::CollectGarbage", "gc")` as a first caller — makes GC passes visible as banners in Insights without relying on the existing `ZEN_TRACE_CPU` scope alone (which doesn't render as a region).
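
The begin/end pairing and the RAII helper can be sketched as below. The function and class names follow the notes above, but the bodies are stand-ins: this version records events in an in-memory log and uses a simple incrementing id, whereas the real implementation emits trace events with a timer-derived id.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Stand-in backend (assumption): collect events in memory instead of tracing.
inline std::vector<std::string>& TraceLog()
{
    static std::vector<std::string> Log;
    return Log;
}

inline uint64_t TraceBeginRegion(std::string_view RegionName, std::string_view Category = {})
{
    static uint64_t NextId = 0;  // the real id is derived from a high-frequency timer
    TraceLog().push_back("begin:" + std::string(RegionName) + ":" + std::string(Category));
    return ++NextId;
}

inline void TraceEndRegion(uint64_t RegionId)
{
    TraceLog().push_back("end:" + std::to_string(RegionId));
}

// RAII helper: begins the region on construction, ends it on scope exit.
class ScopedTraceRegion
{
public:
    explicit ScopedTraceRegion(std::string_view Name, std::string_view Category = {})
    : m_Id(TraceBeginRegion(Name, Category))
    {
    }
    ~ScopedTraceRegion() { TraceEndRegion(m_Id); }

    ScopedTraceRegion(const ScopedTraceRegion&) = delete;
    ScopedTraceRegion& operator=(const ScopedTraceRegion&) = delete;

private:
    uint64_t m_Id;
};
```

Pairing by id rather than by name lets overlapping regions with the same name be closed unambiguously.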
|
| |
|
|
|
| |
- The Linux/macOS branch of `SearchPathForExecutable` was previously a no-op that returned the input unchanged. Callers passing a bare executable name (e.g. `llvm-symbolizer`) got the same bare name back even when the binary lived elsewhere on `PATH`.
- Now walks `$PATH` like `execvp` does: skip the search if the input contains a `/`, try each colon-separated entry (empty entry == cwd), and return the first candidate that is both a regular file and executable by the current user. Falls back to returning the input unchanged if nothing matches, preserving the previous behavior for the no-match case.
- Windows branch is unchanged (still uses `SearchPathW`).
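
The POSIX search described above can be sketched as follows. `SearchPathSketch` and `IsExecutableFile` are hypothetical names, and the search path is passed in as a parameter (rather than read from the environment) to keep the example easy to exercise; the real function's signature may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <sys/stat.h>
#include <unistd.h>

// True if Path names a regular file that the current user may execute.
inline bool IsExecutableFile(const std::string& Path)
{
    struct stat St;
    return ::stat(Path.c_str(), &St) == 0 && S_ISREG(St.st_mode) && ::access(Path.c_str(), X_OK) == 0;
}

// Resolve Name against a colon-separated search path (the value of $PATH).
inline std::string SearchPathSketch(std::string_view Name, std::string_view PathVar)
{
    constexpr auto npos = std::string_view::npos;
    if (Name.empty() || Name.find('/') != npos)
    {
        return std::string(Name);  // contains a slash: no search is performed
    }
    std::size_t Start = 0;
    for (;;)
    {
        const std::size_t      End = PathVar.find(':', Start);
        const std::string_view Dir = PathVar.substr(Start, End == npos ? npos : End - Start);

        // An empty entry means the current working directory.
        std::string Candidate = Dir.empty() ? std::string(".") : std::string(Dir);
        Candidate += '/';
        Candidate.append(Name.data(), Name.size());

        if (IsExecutableFile(Candidate))
        {
            return Candidate;
        }
        if (End == npos)
        {
            break;
        }
        Start = End + 1;
    }
    return std::string(Name);  // no match: return the input unchanged
}
```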
|
| |
| |
Two related improvements to `CreateProc`:
### 1. Stdin pipe support
- Adds `StdinPipeHandles` + `CreateStdinPipe` alongside the existing `StdoutPipeHandles`, letting callers feed data into a child process's stdin.
- Platform-agnostic RAII (Windows `HANDLE` pair / POSIX `pipe()` fd pair) with the same semantics as the stdout pipe: the inherited end goes to the child, the non-inherited end stays with the parent, destructor closes both.
- `CreateProcOptions` gains a `StdinPipe*` field.
- On Windows, `CreateProcNormal` is reworked so stdin/stdout redirection handles all combinations (stdin + stdout, each alone, neither) uniformly. POSIX already supported arbitrary fd redirection and just needed to honor the new option.
- `zentest-appstub` gains a `-stdin_echo` mode that reads stdin to EOF and echoes it back (switching to binary mode on Windows so CRLF translation doesn't mangle bytes).
- `zenserver-test` gets a `server.process` / `stdin_pipe.*` test group that exercises launching a child with a stdin pipe, writing, closing the write end, and reading back the echoed data.
### 2. Shell-style quote stripping in `BuildArgV`
- Callers that build a single command-line string for `CreateProc` commonly wrap paths containing spaces in double quotes (e.g. `--tracefile="$path"`). The old `BuildArgV` used quotes only to suppress space-splitting and left the quote characters in the resulting argv element, so the spawned process saw a literal `--tracefile="..."` and the value parser failed to open the quoted path.
- `BuildArgV` now compacts in place, dropping quote chars as it goes, matching shell semantics for paired double quotes.
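
The quote handling can be illustrated with a small splitter. This is a sketch under stated assumptions: `SplitCommandLine` is a hypothetical function that builds new strings, while the real `BuildArgV` compacts the buffer in place; escape sequences and single quotes are not handled here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split a command line on spaces. Paired double quotes suppress splitting
// and are dropped from the resulting argument, as a shell would do.
inline std::vector<std::string> SplitCommandLine(const std::string& CommandLine)
{
    std::vector<std::string> Args;
    std::string              Current;
    bool                     InQuotes = false;
    bool                     HasArg   = false;  // distinguishes "" (empty argument) from no argument

    for (char C : CommandLine)
    {
        if (C == '"')
        {
            InQuotes = !InQuotes;  // the quote character itself is not copied
            HasArg   = true;
        }
        else if (C == ' ' && !InQuotes)
        {
            if (HasArg)
            {
                Args.push_back(Current);
                Current.clear();
                HasArg = false;
            }
        }
        else
        {
            Current.push_back(C);
            HasArg = true;
        }
    }
    if (HasArg)
    {
        Args.push_back(Current);
    }
    return Args;
}
```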
|
| |
|
|
|
|
| |
- Adds `FollowRedirects` (default `false`) and `MaxRedirects` (default `5`) fields to `HttpClientSettings`.
- When `FollowRedirects` is enabled, the curl backend sets `CURLOPT_FOLLOWLOCATION` and `CURLOPT_MAXREDIRS` so HTTP 3xx redirects are handled transparently in the transport layer — callers no longer need to parse `Location` headers and re-issue requests themselves.
- Defaults are off, so existing callers see no behavior change.
|
| |
|
| |
Moves `ZipFs` from `src/zenserver/frontend/` to `src/zenhttp/` so any binary linking `zenhttp` can serve a bundled web UI from a zip archive (motivator: the upcoming `zen trace serve` subcommand).
|
| |
|
|
|
| |
- Moves the RAII `ScopedEnvVar` helper out of `hydration.cpp`'s anonymous test namespace and into `zencore/filesystem.{h,cpp}` next to `GetEnvVariable` so it can be reused by other subsystems.
- Makes the class non-copyable/non-movable and moves its members to `private`.
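
A scoped environment-variable guard of this kind might look like the sketch below. It is POSIX-only (`setenv`/`unsetenv`), and `ScopedEnvVarSketch` is a hypothetical name; the real `ScopedEnvVar` interface is not shown in the notes above.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <stdlib.h>
#include <string>
#include <utility>

// Sets an environment variable for the lifetime of the object and restores
// the previous state (old value, or "unset") on destruction.
class ScopedEnvVarSketch
{
public:
    ScopedEnvVarSketch(std::string Name, const std::string& Value) : m_Name(std::move(Name))
    {
        if (const char* Old = std::getenv(m_Name.c_str()))
        {
            m_Previous = Old;
        }
        ::setenv(m_Name.c_str(), Value.c_str(), /*overwrite*/ 1);
    }

    ~ScopedEnvVarSketch()
    {
        if (m_Previous)
        {
            ::setenv(m_Name.c_str(), m_Previous->c_str(), 1);
        }
        else
        {
            ::unsetenv(m_Name.c_str());
        }
    }

    // Non-copyable and non-movable: a copy would restore the variable twice.
    ScopedEnvVarSketch(const ScopedEnvVarSketch&) = delete;
    ScopedEnvVarSketch& operator=(const ScopedEnvVarSketch&) = delete;
    ScopedEnvVarSketch(ScopedEnvVarSketch&&) = delete;
    ScopedEnvVarSketch& operator=(ScopedEnvVarSketch&&) = delete;

private:
    std::string                m_Name;
    std::optional<std::string> m_Previous;
};
```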
|
| |
| |
Consolidate the scattered cache-related top-level commands into a single `zen cache <sub>` command tree, keeping the old names as hidden deprecated aliases so any existing scripts keep working.
## Motivation
`zen` has accumulated a flat list of cache-adjacent commands (`cache-info`, `cache-stats`, `cache-details`, `cache-gen`, `cache-get`, `drop`, `rpc-record-start/stop`, `rpc-record-replay`). Each one re-declares `--hosturl` parsing and host resolution, and there is no natural home for new cache tooling. Grouping them under `cache` gives a consistent UX and a shared base class to hang common options off of.
## Changes
### Subcommand consolidation
- Moved into `cache <sub>` form:
- `cache info`, `cache stats`, `cache details`, `cache gen`, `cache get`, `cache drop`
- `cache record <path>` / `cache record stop` (formerly `rpc-record-start` / `rpc-record-stop`)
- `cache replay` (formerly `rpc-record-replay`)
- All old top-level names remain as deprecated aliases and forward through a shared legacy-shim dispatcher that rewrites `argv` and re-enters the new dispatcher, so behavior is byte-identical for existing callers.
- Deprecated aliases are now hidden from the top-level `zen --help` listing (new `ZenCmdBase::IsHidden()` + `DeprecatedCacheStoreCommand` base). They still dispatch normally; `zen cache --help` is the canonical discovery surface.
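
The argv rewrite performed by the legacy shim can be sketched as below. The old-to-new name mapping comes from the list above; everything else (`RewriteLegacyArgs`, the `LegacyAlias` table, and the assumption that the command name is always `argv[1]`) is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct LegacyAlias
{
    const char*              OldName;
    std::vector<std::string> NewTokens;
};

// Rewrite "zen <old-name> args..." into "zen cache <sub> args...".
// Assumes the command name is argv[1]; unknown commands pass through.
inline std::vector<std::string> RewriteLegacyArgs(const std::vector<std::string>& Args)
{
    static const LegacyAlias Aliases[] = {
        {"cache-info", {"cache", "info"}},
        {"cache-stats", {"cache", "stats"}},
        {"cache-details", {"cache", "details"}},
        {"cache-gen", {"cache", "gen"}},
        {"cache-get", {"cache", "get"}},
        {"drop", {"cache", "drop"}},
        {"rpc-record-start", {"cache", "record"}},
        {"rpc-record-stop", {"cache", "record", "stop"}},
        {"rpc-record-replay", {"cache", "replay"}},
    };

    if (Args.size() < 2)
    {
        return Args;
    }
    for (const LegacyAlias& Alias : Aliases)
    {
        if (Args[1] == Alias.OldName)
        {
            std::vector<std::string> Out{Args[0]};
            Out.insert(Out.end(), Alias.NewTokens.begin(), Alias.NewTokens.end());
            Out.insert(Out.end(), Args.begin() + 2, Args.end());
            return Out;
        }
    }
    return Args;
}
```

Because the shim only rewrites tokens and then re-enters the normal dispatcher, the deprecated names cannot drift from the behavior of the new subcommands.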
### Shared base class
- New `CacheSubCmdBase` owns the `--hosturl` option and `ResolveHost()` logic, eliminating the copy/pasted block at the top of every `Run()`.
### Output format
- Added `--yaml` to `cache info`, `cache stats`, and `cache details` (negotiated server-side via `Accept: text/yaml`). `cache details` now rejects `--csv --yaml` combined.
### Hardening
- `cache gen`: bounds-check requested sizes before allocating.
- `cache replay`: validate `--stride` / `--offset` and fix progress-math overflow edge cases.
|
| |
| |
- Bugfix: `builds download` partial-block fetch decisions now account for build storage host latency
- Bugfix: Transfer rate displays in `builds` commands now smooth correctly
- Split `buildstorageoperations.cpp` (8.5k lines) into per-operation TUs: buildinspect, buildprimecache, buildstorageresolve, buildupdatefolder, builduploadfolder, buildvalidatebuildpart; stats moved to buildstoragestats.h.
- FilteredRate extracted to zenutil.
- BuildsCommand shared state consolidated into a BuildsConfiguration struct; subcommands inherit from BuildsSubCmdBase holding a `const BuildsConfiguration&` instead of a `BuildsCommand&`.
- `ProgressBar` renamed to `ConsoleProgressBar`; mode enum (`ConsoleProgressMode`) lifted to namespace scope; `PushLogOperation`/`PopLogOperation`/`ForceLinebreak` promoted to virtuals on `ProgressBase`.
- Free-function wrappers (`UploadFolder`, `DownloadFolder`, `ValidateBuildPart`) added around the existing operation classes so callers stop reimplementing setup + stats logging.
|
| |
| |
### Critical (cryptographic correctness)
- AES-GCM nonce: replace homebrew `N32[0]++; N32[1]--; N32[2] = ^` scheme with NIST SP 800-38D §8.2.1 deterministic construction (64-bit big-endian counter). Session tears down on counter exhaustion instead of reusing a nonce.
- Remove `std::random_device` / `mt19937` nonce seed - the deterministic construction from the previous commit doesn't need an RNG, and `std::random_device` isn't guaranteed to be a CSPRNG.
- BCrypt return values: check every `BCRYPT_SUCCESS`, cache the `BCRYPT_KEY_HANDLE` on the context instead of re-creating it per message, destroy under null-guards. Closes the silent-downgrade-to-non-GCM path.
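
A deterministic nonce construction in the spirit of NIST SP 800-38D §8.2.1 can be sketched as follows. The 64-bit big-endian counter comes from the notes above; the 32-bit fixed field that fills the rest of the 96-bit nonce, and the `NonceCounter` class itself, are assumptions of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// 96-bit GCM nonce = 32-bit fixed field || 64-bit big-endian invocation counter.
class NonceCounter
{
public:
    explicit NonceCounter(uint32_t FixedField, uint64_t Start = 0) : m_Fixed(FixedField), m_Next(Start) {}

    // Returns the next nonce, or nullopt once every counter value has been
    // used. The caller is expected to tear the session down at that point
    // instead of wrapping around and reusing a nonce.
    std::optional<std::array<uint8_t, 12>> Next()
    {
        if (m_Exhausted)
        {
            return std::nullopt;
        }
        std::array<uint8_t, 12> Nonce{};
        for (int i = 0; i < 4; ++i)
        {
            Nonce[i] = static_cast<uint8_t>(m_Fixed >> (24 - 8 * i));
        }
        for (int i = 0; i < 8; ++i)
        {
            Nonce[4 + i] = static_cast<uint8_t>(m_Next >> (56 - 8 * i));
        }
        if (m_Next == UINT64_MAX)
        {
            m_Exhausted = true;
        }
        else
        {
            ++m_Next;
        }
        return Nonce;
    }

private:
    uint32_t m_Fixed;
    uint64_t m_Next;
    bool     m_Exhausted = false;
};
```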
### High
- OpenSSL: check `EVP_CIPHER_CTX_new` / `EVP_EncryptInit_ex` / `EVP_DecryptInit_ex` return values in the constructor and set `HasErrors` on failure.
- Log AES-GCM tag-verification failures distinctly from other decrypt errors (BCrypt `STATUS_AUTH_TAG_MISMATCH` / OpenSSL `EVP_DecryptFinal_ex` post-set-tag), with a sequence counter for correlation.
- Thread a bounds-checked `ReadCursor` through every `Read*` parser helper; `ReadException` / `ReadExecuteResult` / `ReadBlobRequest` now return `bool` and callers treat malformed frames as protocol errors. Closes the `0xFF` varint OOB-read.
- Validate `ReadBlobRequest` locator as a safe filename component (reject path separators, `..`, NUL/control, drive colons, leading/trailing dot/space, length > 255). Closes the path-traversal attack on the `BundleDir / (Locator + ".blob")` join.
- Bind `AsyncAgentMessageChannel`'s timer and `AsyncReadResponse` entry onto the socket's strand; expose `AsyncComputeSocket::GetStrand()`. Removes the race between the bare-io_context timer completion and `OnFrame` on `m_PendingHandler` under the 3-thread pool.
- Drop the long-lived `m_EncryptBuffer` member - encrypt into a fresh per-write buffer shared with the completion handler. Also fixes thread-safety of the encrypt path.
- Validate server-returned `ClusterId` against `[A-Za-z0-9._-]{1,64}` before concatenating into the `api/v2/compute/<ClusterId>` URL.
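
The locator check listed above can be approximated with a small predicate. `IsSafeFilenameComponent` is a hypothetical name, and this sketch rejects any occurrence of `..` (a conservative reading of the rule); the real validation may differ in detail.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Returns true only if Name is safe to join onto a directory as one filename.
inline bool IsSafeFilenameComponent(std::string_view Name)
{
    if (Name.empty() || Name.size() > 255)
    {
        return false;
    }
    if (Name.find("..") != std::string_view::npos)
    {
        return false;
    }
    // No leading/trailing dot or space.
    if (Name.front() == '.' || Name.front() == ' ' || Name.back() == '.' || Name.back() == ' ')
    {
        return false;
    }
    for (unsigned char C : Name)
    {
        // NUL/control characters, path separators and drive colons.
        if (C < 0x20 || C == 0x7F || C == '/' || C == '\\' || C == ':')
        {
            return false;
        }
    }
    return true;
}
```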
### Medium
- `EVP_CIPHER_CTX_reset` + re-bind cipher on every encrypt/decrypt so stale state cannot bleed across messages. Also logs EVP failures.
- Malformed `ExecuteResult` (size != 4) now tears down the agent instead of silently reporting `ExitCode = -1`.
- Replace `assert(Eq != nullptr)` on env var parsing with a `zen::runtime_error` - assert is compiled out in release and `*(Eq+1)` was UB.
- Blob name uses `zen::Oid::NewOid()` (24 hex chars, seeded from `random_device` run-id + monotonic serial) instead of predictable `<pid>_<ms>_<counter>`. Refuse to overwrite an existing blob path.
- Cap `m_RecentlyDrainedWorkerIds` at 256 entries with a FIFO eviction queue.
- `Blob(Data, Length)` rejects `Length > INT32_MAX` instead of wrapping the int32 wire fields.
- Static `AuthToken` path uses `HttpClientAccessToken::TimePoint::max()` (never-expires sentinel) instead of synthesizing `now + 24h`.
- Remove dead `m_Transport` field and `else if (m_Transport)` branch in `AsyncHordeAgent::Cancel()`.
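
A capped set with FIFO eviction, as used for the recently drained worker ids, can be sketched like this. `RecentIdSet` is a hypothetical class; the real member types are not shown in the notes above.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_set>

// Set with a fixed capacity: once full, inserting a new id evicts the oldest.
class RecentIdSet
{
public:
    explicit RecentIdSet(std::size_t Capacity = 256) : m_Capacity(Capacity) {}

    void Insert(const std::string& Id)
    {
        if (!m_Ids.insert(Id).second)
        {
            return;  // already present: keep its original position in the queue
        }
        m_Order.push_back(Id);
        if (m_Order.size() > m_Capacity)
        {
            m_Ids.erase(m_Order.front());
            m_Order.pop_front();
        }
    }

    bool        Contains(const std::string& Id) const { return m_Ids.count(Id) != 0; }
    std::size_t Size() const { return m_Ids.size(); }

private:
    std::size_t                     m_Capacity;
    std::unordered_set<std::string> m_Ids;
    std::deque<std::string>         m_Order;  // insertion order, oldest first
};
```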
|
| |
|
|
|
|
| |
* fix redundant captures in lambdas
* remove unused parameters
* break BuildsOperationUpdateFolder::Execute into multiple helper functions
* refactor the buildstorageoperations classes for readability
|
| |
| |
A series of correctness and API hygiene fixes to the intrusive refcount primitives in `zenbase`, culminating in the removal of `RefPtr<T>` in favour of a single unified `Ref<T>` smart pointer.
The changes are motivated by two pieces of latent UB sitting under every `Ref<T>` / `TRefCounted<T>` in the codebase, plus a handful of API footguns on the smart-pointer side (silent raw-pointer decay, missing converting moves, unconstrained conversions from unrelated types).
## Correctness fixes
- **Strict-aliasing UB in atomic helpers** — `AtomicIncrement`/`Decrement`/`Add` took a `volatile uint32_t&` and reinterpret-cast it to `std::atomic<T>*`. The object was never constructed as a `std::atomic`, so the access was type-punning UB. Fixed by changing `m_RefCount` to `std::atomic<uint32_t>` directly in `RefCounted`, `TRefCounted<T>` and `IoBufferCore`. The helpers (and `zenbase/atomic.h`) are later removed entirely — the three callers now invoke `fetch_add`/`fetch_sub` directly.
- **const_cast of non-mutable member** — `AddRef()` / `Release()` are `const` but mutated `m_RefCount` via `const_cast`. Since `m_RefCount` wasn't `mutable`, writing through the cast was UB for any `const`-qualified holder (e.g. a `static const` refcounted singleton). Fixed by marking `m_RefCount` `mutable` and dropping the `const_cast` in `AddRef`/`Release`.
- **Public non-virtual `TRefCounted` destructor** — allowed `delete basePtr;` to slice past the CRTP `DeleteThis()` contract. Moved to `protected`.
## Memory-ordering cleanup
- `AddRef` weakened from seq_cst to **relaxed** (a thread can only take a new reference via one it already holds; nothing needs to synchronize).
- `Release` weakened from seq_cst to **acq_rel** (sufficient to order prior writes before the destructor, and make the decrement visible to observers).
- Diagnostic `RefCount()` / `GetRefCount()` reads made **relaxed** and spelled out as explicit `.load()` — the returned value is stale the moment it's observed, so stronger ordering gives no guarantee.
- No-op on x86 (`lock xadd` either way), but removes a full barrier on every `Ref<T>` copy on ARM64 (Apple silicon / Windows-on-ARM).
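
Taken together, the correctness and memory-ordering changes amount to something like the following sketch. `RefCountedSketch` and `Tracked` are hypothetical types that mirror the description above, not the actual `zenbase` classes.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

class RefCountedSketch
{
public:
    // Relaxed: a new reference can only be taken through an existing one,
    // so nothing needs to be synchronized here.
    uint32_t AddRef() const noexcept { return m_RefCount.fetch_add(1, std::memory_order_relaxed) + 1; }

    // acq_rel: orders prior writes to the object before the destructor runs
    // on whichever thread drops the last reference.
    uint32_t Release() const noexcept
    {
        const uint32_t Remaining = m_RefCount.fetch_sub(1, std::memory_order_acq_rel) - 1;
        if (Remaining == 0)
        {
            delete this;
        }
        return Remaining;
    }

    // Diagnostic only: the value may be stale as soon as it is returned.
    uint32_t RefCount() const noexcept { return m_RefCount.load(std::memory_order_relaxed); }

protected:
    virtual ~RefCountedSketch() = default;  // not deletable through a base pointer from outside

private:
    // mutable + std::atomic: const AddRef/Release need neither const_cast nor type punning.
    mutable std::atomic<uint32_t> m_RefCount{0};
};

// Small derived type used to observe destruction.
struct Tracked final : RefCountedSketch
{
    explicit Tracked(bool& Destroyed) : m_Destroyed(Destroyed) {}
    ~Tracked() override { m_Destroyed = true; }
    bool& m_Destroyed;
};
```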
## `RefPtr` / `Ref` unification
Before this branch, `RefPtr<T>` and `Ref<T>` were subtly different in ways that made the safer of the two (`Ref`) harder to use and the looser one (`RefPtr`) dangerous:
- `RefPtr::operator T*()` was implicit — `delete refPtr;` compiled silently (double-delete), and the raw pointer could outlive the temporary `RefPtr` it was extracted from. Made `explicit`, then removed entirely once call sites were migrated to `.Get()`.
- `RefPtr(T*)` was implicit while `RefPtr(RefPtr<Derived>&&)` was `explicit` — exactly the opposite of the safety intent. Reversed.
- `RefPtr`'s converting move was unconstrained (any `RefPtr<U>` with an implicitly-convertible `U*` satisfied it, including `void*` and multiple-inheritance base offsets). Added a `DerivedFrom<U, T>` constraint matching `Ref<T>`.
- `Ref<T>` was missing a converting move ctor / move-assignment from `Ref<Derived>` — upcasts of rvalues were going through `AddRef`+`Release` instead of a pointer steal. Added.
- `Release()` and the non-move smart-pointer ops were not `noexcept`, despite being so in practice. Marked `noexcept` throughout.
After all of the above, the two types were functionally identical. The final commit deletes `RefPtr` and rewrites the ~10 consumer files to use `Ref`.
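
The shape of the unified smart pointer (explicit construction from a raw pointer, implicit but constrained converting move that steals the pointer) can be sketched as follows. This is a C++17 approximation using `std::is_base_of` where the notes mention a `DerivedFrom` constraint; `RefSketch`, `Counted`, `BaseNode` and `DerivedNode` are hypothetical, the counter is non-atomic for brevity, and `Detach()` is public only to keep the sketch short.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Minimal intrusive counter (non-atomic; illustration only).
struct Counted
{
    int  Refs        = 0;
    int  AddRefCalls = 0;
    void AddRef() { ++Refs; ++AddRefCalls; }
    void Release() { if (--Refs == 0) delete this; }
    virtual ~Counted() = default;
};

template<typename T>
class RefSketch
{
public:
    RefSketch() noexcept = default;

    // Explicit: a raw pointer never silently becomes an owning reference.
    explicit RefSketch(T* Ptr) noexcept : m_Ptr(Ptr) { if (m_Ptr) m_Ptr->AddRef(); }

    RefSketch(const RefSketch& Other) noexcept : m_Ptr(Other.m_Ptr) { if (m_Ptr) m_Ptr->AddRef(); }

    // Implicit converting move, constrained to derived types: steals the
    // pointer instead of doing an AddRef + Release pair.
    template<typename U, typename = std::enable_if_t<std::is_base_of_v<T, U> && !std::is_same_v<T, U>>>
    RefSketch(RefSketch<U>&& Other) noexcept : m_Ptr(Other.Detach())
    {
    }

    ~RefSketch() { if (m_Ptr) m_Ptr->Release(); }

    RefSketch& operator=(const RefSketch&) = delete;  // assignment omitted for brevity

    T* Get() const noexcept { return m_Ptr; }  // no implicit conversion to T*
    T* Detach() noexcept { return std::exchange(m_Ptr, nullptr); }

private:
    T* m_Ptr = nullptr;
};

struct BaseNode : Counted {};
struct DerivedNode : BaseNode {};
```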
|
| | |
|
| |
|
|
| |
- Improvement: New `ZEN_SCOPED_LOG(Expr)` macro routes `ZEN_INFO`/`ZEN_WARN`/`ZEN_DEBUG` in the enclosing block through the given logger expression instead of the default
- Improvement: `BuildContainer`, `SaveOplog`, and `LoadOplogContext` now take a caller-provided `LoggerRef` so diagnostic messages route through the caller's logger
|
| |
|
| |
- Improvement: Replaced `OperationLogOutput` with `ProgressBase` in `zenutil`; logging and progress reporting are now separate concerns. Operation classes receive a `LoggerRef` for logging and a `ProgressBase&` for progress bars
|
| |
|
| |
- Improvement: Add option `--buildstore-disksizelimit-percent` - Max percentage of total drive capacity (of --data-dir drive) for build storage. When combined with `--buildstore-disksizelimit`, the lower value wins.
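
The "lower value wins" rule can be written out as a short helper. `EffectiveDiskLimit` is a hypothetical function, and treating `0` as "option not set" is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Combine an absolute byte limit with a percentage-of-drive limit.
// A value of 0 means "not set" (assumption); 0 is returned if neither is set.
inline uint64_t EffectiveDiskLimit(uint64_t AbsoluteLimitBytes, uint32_t LimitPercent, uint64_t DriveCapacityBytes)
{
    // Divide first to avoid overflow on very large capacities.
    const uint64_t PercentLimitBytes = LimitPercent != 0 ? (DriveCapacityBytes / 100) * LimitPercent : 0;

    if (AbsoluteLimitBytes == 0)
    {
        return PercentLimitBytes;
    }
    if (PercentLimitBytes == 0)
    {
        return AbsoluteLimitBytes;
    }
    return std::min(AbsoluteLimitBytes, PercentLimitBytes);
}
```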
|
| |
|
|
|
|
|
|
| |
Adds infrastructure for reducing short-lived heap allocations, to be applied across the codebase in follow-up PRs.
- **`reduce-allocs` Claude Code skill** — reviews code for unnecessary heap allocations and suggests fixes using stack-friendly patterns (`ExtendableStringBuilder`, `eastl::fixed_vector`, `TRefCounted`, etc.)
- **`TransparentStringHash`** (`zencore/hashutils.h`) — enables `std::string_view` lookups on `std::string`-keyed `unordered_map` without allocating a temporary string (C++20 heterogeneous lookup via `is_transparent`)
- **`AppendPaddedInt()`** and **`AppendFill()`** on `StringBuilderBase` (`zencore/string.h`) — zero-padded integer formatting and repeated-character fills without going through `fmt::format`
- **`StringBuilderAppender`** output iterator adapter — allows `fmt::format_to` to write directly into a `StringBuilderBase`
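
As an illustration of the allocation-free formatting idea, here is a sketch of zero-padded integer and fill appends. It targets `std::string` rather than `StringBuilderBase`, and the signatures of `AppendPaddedIntSketch` and `AppendFillSketch` are hypothetical, not the real `zencore` API.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <string>

// Append Value in decimal, left-padded with '0' to at least MinDigits digits.
// Digits are produced in a stack buffer, so no temporary string is created.
inline void AppendPaddedIntSketch(std::string& Out, uint64_t Value, std::size_t MinDigits)
{
    char Digits[20];  // a 64-bit unsigned value has at most 20 decimal digits
    const std::to_chars_result Result = std::to_chars(Digits, Digits + sizeof(Digits), Value);
    const std::size_t          Count  = static_cast<std::size_t>(Result.ptr - Digits);
    if (Count < MinDigits)
    {
        Out.append(MinDigits - Count, '0');
    }
    Out.append(Digits, Count);
}

// Append Count copies of Fill.
inline void AppendFillSketch(std::string& Out, char Fill, std::size_t Count)
{
    Out.append(Count, Fill);
}
```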
|
| | |
|
| |
|
| |
Update the description of oplog-snapshot to be more descriptive.
|
| |
|
|
| |
Fix strings passed to RegisterRoute in HttpProjectStore: strings are literals now instead of regexes, so '\$' needs to be changed to just '$'.
|
| |
|
| |
* update last activity time for hub instances that are asked to be provisioned but are already provisioned
|
| |
|
|
| |
* move session service to zenserver base class and make it available in all zenserver modes
* fix deadlock in sessionsclient shutdown
|
| |
|
|
|
| |
* remove obsolete prime-cache-only flag
* if a downloaded blob should be sent to cache, make sure it is disk-based; keeping it in memory overloads memory when boost-worker-memory is enabled
|
| |
|
|
| |
- Bugfix: OAuth client credentials token request now sends correct `application/x-www-form-urlencoded` content type
- Improvement: HTTP client Content-Type in additional headers now overrides the payload content type
|
| |
|
|
| |
of a generic bad gateway error (#956)
|
| |
|
|
| |
According to documentation, shm_open already sets O_CLOEXEC.
|
| | |
|
| | |
|
| | |
|
| |
|
| |
* make mimalloc default again
|
| |
| |
Rework of the Horde agent subsystem from synchronous per-thread I/O to an async ASIO-driven architecture, plus provisioner scale-down with graceful draining, OIDC authentication, scheduler improvements, and dashboard UI for provisioner control.
### Async Horde Agent Rewrite
- Replace synchronous `HordeAgent` (one thread per agent, blocking I/O) with `AsyncHordeAgent` — an ASIO state machine running on a shared `io_context` thread pool
- Replace `TcpComputeTransport`/`AesComputeTransport` with `AsyncTcpComputeTransport`/`AsyncAesComputeTransport`
- Replace `AgentMessageChannel` with `AsyncAgentMessageChannel` using frame queuing and ASIO timers
- Delete `ComputeBuffer` and `ComputeChannel` ring-buffer classes (no longer needed)
### Provisioner Drain / Scale-Down
- `HordeProvisioner` can now drain agents when target core count is lowered: queries each agent's `/compute/session/status` for workload, selects candidates by largest-fit/lowest-workload, and sends `/compute/session/drain`
- Configurable `--horde-drain-grace-period` (default 300s) before force-kill
- Implement `IProvisionerStateProvider` interface to expose provisioner state to the orchestrator HTTP layer
- Forward `--coordinator-session`, `--provision-clean`, and `--provision-tracehost` through both Horde and Nomad provisioners to spawned workers
### OIDC Authentication
- `HordeClient` accepts an `AccessTokenProvider` (refreshable token function) as an alternative to static `--horde-token`
- Wire up `OidcToken.exe` auto-discovery via `httpclientauth::CreateFromOidcTokenExecutable` with `--HordeUrl` mode
- New `--horde-oidctoken-exe-path` CLI option for explicit path override
### Orchestrator & Scheduler
- Orchestrator generates a session ID at startup; workers include `coordinator_session` in announcements so the orchestrator can reject stale-session workers
- New `Rejected` action state — when a remote runner declines at capacity, the action is rescheduled without retry count increment
- Reduce scheduler lock contention: snapshot pending actions under shared lock, sort/trim outside the lock
- Parallelize remote action submission across runners via `WorkerThreadPool` with slow-submit warnings
- New action field `FailureReason` populated by all runner types (exit codes, sandbox failures, exceptions)
- New endpoints: `session/drain`, `session/status`, `session/sunset`, `provisioner/status`, `provisioner/target`
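
The lock-contention change (snapshot under a shared lock, then sort and trim outside it) follows a common pattern that can be sketched as below. `SchedulerSketch` and `PendingAction` are hypothetical; the real scheduler's data structures and ordering criteria are not shown in the notes above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <vector>

struct PendingAction
{
    int Id;
    int Priority;
};

class SchedulerSketch
{
public:
    void Enqueue(PendingAction Action)
    {
        std::unique_lock<std::shared_mutex> Lock(m_Mutex);
        m_Pending.push_back(Action);
    }

    // Copy the pending list under a shared lock, then do the expensive
    // sort/trim on the private copy so writers are not blocked meanwhile.
    std::vector<PendingAction> TopCandidates(std::size_t MaxCount) const
    {
        std::vector<PendingAction> Snapshot;
        {
            std::shared_lock<std::shared_mutex> Lock(m_Mutex);
            Snapshot = m_Pending;
        }
        std::sort(Snapshot.begin(), Snapshot.end(), [](const PendingAction& A, const PendingAction& B) {
            return A.Priority > B.Priority;  // highest priority first
        });
        if (Snapshot.size() > MaxCount)
        {
            Snapshot.resize(MaxCount);
        }
        return Snapshot;
    }

private:
    mutable std::shared_mutex  m_Mutex;
    std::vector<PendingAction> m_Pending;
};
```

The trade-off is that the snapshot may be slightly out of date by the time it is acted on, so the caller has to tolerate actions that were claimed or cancelled in the meantime.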
### Remote Execution
- Eager-attach mode for `RemoteHttpRunner` — bundles all attachments upfront in a `CbPackage` for single-roundtrip submits
- Track in-flight submissions to prevent over-queuing
- Show remote runner hostname in `GetDisplayName()`
- `--announce-url` to override the endpoint announced to the coordinator (e.g. relay-visible address)
### Frontend Dashboard
- Delete standalone `compute.html` (925 lines) and `orchestrator.html` (669 lines), consolidated into JS page modules
- Add provisioner panel to orchestrator dashboard: target/active/estimated core counts, draining agent count
- Editable target-cores input with debounced POST to `/orch/provisioner/target`
- Per-agent provisioning status badges (active / draining / deallocated) in the agents table
- Active vs total CPU counts in agents summary row
### CLI
- New `zen compute record-start` / `record-stop` subcommands
- `zen exec` progress bar with submit and completion phases, atomic work counters, `--progress` mode (Pretty/Plain/Quiet)
### Other
- `DataDir` supports environment variable expansion
- Worker manifest validation checks for `worker.zcb` marker to detect incomplete cached directories
- Linux/Mac runners `nice(5)` child processes to avoid starving the main server
- `ComputeService::SetShutdownCallback` wired to `RequestExit` via `session/sunset`
- Curl HTTP client logs effective URL on failure
- `MachineInfo` carries `Pool` and `Mode` from Horde response
- Horde bundle creation includes `.pdb` on Windows
|
| |
|
| |
* log curl raw error on retry, add retry on CURLE_PARTIAL_FILE error
|
| |
|
| |
* silence exceptions in threaded requests to build storage if already aborted
|
| |
| |
- Replace per-type fmt::formatter specializations (StringBuilderBase, NiceBase) with a single generic formatter using a HasStringViewConversion concept
- Add ThousandsNum for comma-separated integer formatting (e.g. "1,234,567")
- Thread naming now accepts a sort hint for trace ordering
- Fix main thread trace registration to use actual thread ID and sort first
- Add ExpandEnvironmentVariables() for expanding %VAR% references in strings, with tests
- Add ParseHexBytes() overload with expected byte count validation
- Add Flag_BelowNormalPriority to CreateProcOptions (BELOW_NORMAL_PRIORITY_CLASS on Windows, setpriority on POSIX)
- Add PrettyScroll progress bar mode that pins the status line to the bottom of the terminal using scroll regions, with signal handler cleanup for Ctrl+C/SIGTERM
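
Comma-separated integer formatting of the kind `ThousandsNum` provides can be sketched as a free function. `FormatThousands` is a hypothetical name; the real helper plugs into the formatter machinery rather than returning a `std::string`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Format Value in decimal with a comma between each group of three digits.
inline std::string FormatThousands(uint64_t Value)
{
    const std::string Digits = std::to_string(Value);
    std::string       Out;
    Out.reserve(Digits.size() + Digits.size() / 3);
    for (std::size_t i = 0; i < Digits.size(); ++i)
    {
        // Insert a separator whenever a multiple of three digits remains.
        if (i != 0 && (Digits.size() - i) % 3 == 0)
        {
            Out.push_back(',');
        }
        Out.push_back(Digits[i]);
    }
    return Out;
}
```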
|
| |
| |
This PR introduces an in-memory `CidStore` option, primarily for use with compute, to avoid hitting disk for ephemeral data that is not worth persisting and, in particular, not worth paying the critical-path cost of persistence.
- **MemoryCidStore**: In-memory CidStore implementation backed by a hash map, optionally layered over a standard CidStore. Writes to the backing store are dispatched asynchronously via a dedicated flush thread to avoid blocking callers on disk I/O. Reads check memory first, then fall back to the backing store without caching the result.
- **ChunkStore interface**: Extract `ChunkStore` abstract class (`AddChunk`, `ContainsChunk`, `FilterChunks`) and `FallbackChunkResolver` into `zenstore.h` so `HttpComputeService` can accept different storage backends for action inputs vs worker binaries. `CidStore` and `MemoryCidStore` both implement `ChunkStore`.
- **Compute service wiring**: `HttpComputeService` takes two `ChunkStore&` params (action + worker). The compute server uses `MemoryCidStore` for actions (no disk persistence needed) and disk-backed `CidStore` for workers (cross-action reuse). The storage server passes its `CidStore` for both (unchanged behavior).
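
The read path described for `MemoryCidStore` (memory first, then the backing store, without caching the fallback result) can be sketched as follows. `MemoryStoreSketch`, `ChunkMap` and `FindChunk` are hypothetical, the backing store is modeled as a plain map, and the asynchronous flush thread is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

using ChunkMap = std::unordered_map<std::string, std::string>;  // hash -> payload

class MemoryStoreSketch
{
public:
    // Backing may be null, in which case the store is memory-only.
    explicit MemoryStoreSketch(const ChunkMap* Backing = nullptr) : m_Backing(Backing) {}

    void AddChunk(const std::string& Hash, std::string Data) { m_Memory[Hash] = std::move(Data); }

    std::optional<std::string> FindChunk(const std::string& Hash) const
    {
        if (auto It = m_Memory.find(Hash); It != m_Memory.end())
        {
            return It->second;
        }
        if (m_Backing != nullptr)
        {
            if (auto It = m_Backing->find(Hash); It != m_Backing->end())
            {
                return It->second;  // served from the backing store, not copied into memory
            }
        }
        return std::nullopt;
    }

    std::size_t MemoryChunkCount() const { return m_Memory.size(); }

private:
    ChunkMap        m_Memory;
    const ChunkMap* m_Backing;
};
```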
|
| |
|
|
|
|
|
| |
* objectstore.cpp - m_TotalBytesServed now tracks all range cases (single, multi, 416)
* async http: correct docstring (curl_multi_socket_action() / ASIO socket async_wait) and remove non-ASCII characters
* fix singlethreaded gc option in lua to not use dash
* fix changelog order
|
| |
| |
Core logging and system diagnostics improvements, extracted from the compute branch.
### Logging
- **Elapsed timestamps**: Console log now shows elapsed time since launch `[HH:MM:SS.mmm]` instead of full date/time; file logging is unchanged
- **Short level names**: 3-letter short level names (`trc`/`dbg`/`inf`/`wrn`/`err`/`crt`) used by both console and file formatters via `ShortToStringView()`
- **Consistent field order**: Standardized to `[timestamp] [level] [logger]` across both console and file formatters
- **Slim LogMessage/LogPoint**: Remove redundant fields from `LogMessage` (derive level/source from `LogPoint`), flatten `LogPoint` to inline filename/line fields, shrink `LogLevel` to `int8_t` with `static_assert(sizeof(LogPoint) <= 32)`
- **Remove default member initializers** and static default `LogPoint` from `LogMessage` — all fields initialized by constructor
- **LoggerRef string constructor**: Convenience constructor accepting a string directly
- **Fix SendMessage macro collision**: Replace `thread.h` include in `logmsg.h` with a forward declaration of `GetCurrentThreadId()` to avoid pulling in `windows.h` transitively
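
The elapsed-time console prefix can be illustrated with a small formatter. `FormatElapsed` is a hypothetical function; how the real formatter obtains the launch time is not shown in the notes above.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <string>

// Format a duration since launch as "[HH:MM:SS.mmm]".
// Hours are not wrapped, so long runs simply print more than two hour digits.
inline std::string FormatElapsed(std::chrono::milliseconds Elapsed)
{
    const long long Total   = Elapsed.count();
    const long long Hours   = Total / 3600000;
    const long long Minutes = (Total / 60000) % 60;
    const long long Seconds = (Total / 1000) % 60;
    const long long Millis  = Total % 1000;

    char Buffer[48];
    std::snprintf(Buffer, sizeof(Buffer), "[%02lld:%02lld:%02lld.%03lld]", Hours, Minutes, Seconds, Millis);
    return Buffer;
}
```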
### System Diagnostics
- **Cache static system metrics**: Add `RefreshDynamicSystemMetrics()` that only queries values that change at runtime (available memory, uptime, swap). `SystemMetricsTracker` snapshots full `GetSystemMetrics()` once at construction and reuses cached topology/total memory on each `Query()`, avoiding repeated `GetLogicalProcessorInformationEx` traversal on Windows, `/proc/cpuinfo` parsing on Linux, and `sysctl` topology calls on macOS
|
| |
|
|
|
|
|
| |
- `--hub-instance-malloc` selects the memory allocator for child instances
- `--hub-instance-trace` sets trace channels for child instances
- `--hub-instance-tracehost` sets the trace streaming host for child instances
- `--hub-instance-tracefile` sets the trace output file for child instances
- add `{moduleid}` and `{port}` placeholder support for tracefile
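
Placeholder expansion for the trace file name could work along these lines. `ExpandTracefilePlaceholders` and `ReplaceAll` are hypothetical helpers, and the example file name pattern is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Replace every occurrence of Token in Text with Value.
inline void ReplaceAll(std::string& Text, const std::string& Token, const std::string& Value)
{
    std::size_t Pos = 0;
    while ((Pos = Text.find(Token, Pos)) != std::string::npos)
    {
        Text.replace(Pos, Token.size(), Value);
        Pos += Value.size();  // continue after the inserted text
    }
}

// Expand {moduleid} and {port} so each child instance gets its own file.
inline std::string ExpandTracefilePlaceholders(std::string Pattern, const std::string& ModuleId, uint16_t Port)
{
    ReplaceAll(Pattern, "{moduleid}", ModuleId);
    ReplaceAll(Pattern, "{port}", std::to_string(Port));
    return Pattern;
}
```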
|
| | |
|
| |
|
| |
Remove the `zens3-testbed` target and source files. This was a standalone test harness for S3 operations that is no longer needed.
|
| |
|
|
| |
registration (#939)
|
| |
|
| |
* implement "deprovision all" for hub
|
| |
|
|
|
| |
- Improvement: Dashboard paginated lists now include a search input that jumps to the page containing the first match and highlights the row
- Improvement: Dashboard paginated lists show a loading indicator while fetching data
- Improvement: Hub dashboard navigates to and highlights newly provisioned instances
|
| |
|
|
| |
space (#935)
|