| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
### Critical (cryptographic correctness)
- AES-GCM nonce: replace homebrew `N32[0]++; N32[1]--; N32[2] = ^` scheme with NIST SP 800-38D §8.2.1 deterministic construction (64-bit big-endian counter). Session tears down on counter exhaustion instead of reusing a nonce.
- Remove `std::random_device` / `mt19937` nonce seed - the deterministic construction from the previous commit doesn't need an RNG, and `std::random_device` isn't guaranteed to be a CSPRNG.
- BCrypt return values: check every `BCRYPT_SUCCESS`, cache the `BCRYPT_KEY_HANDLE` on the context instead of re-creating it per message, destroy under null-guards. Closes the silent-downgrade-to-non-GCM path.
### High
- OpenSSL: check `EVP_CIPHER_CTX_new` / `EVP_EncryptInit_ex` / `EVP_DecryptInit_ex` return values in the constructor and set `HasErrors` on failure.
- Log AES-GCM tag-verification failures distinctly from other decrypt errors (BCrypt `STATUS_AUTH_TAG_MISMATCH` / OpenSSL `EVP_DecryptFinal_ex` post-set-tag), with a sequence counter for correlation.
- Thread a bounds-checked `ReadCursor` through every `Read*` parser helper; `ReadException` / `ReadExecuteResult` / `ReadBlobRequest` now return `bool` and callers treat malformed frames as protocol errors. Closes the `0xFF` varint OOB-read.
- Validate `ReadBlobRequest` locator as a safe filename component (reject path separators, `..`, NUL/control, drive colons, leading/trailing dot/space, length > 255). Closes the path-traversal attack on the `BundleDir / (Locator + ".blob")` join.
- Bind `AsyncAgentMessageChannel`'s timer and `AsyncReadResponse` entry onto the socket's strand; expose `AsyncComputeSocket::GetStrand()`. Removes the race between the bare-io_context timer completion and `OnFrame` on `m_PendingHandler` under the 3-thread pool.
- Drop the long-lived `m_EncryptBuffer` member - encrypt into a fresh per-write buffer shared with the completion handler. Also fixes thread-safety of the encrypt path.
- Validate server-returned `ClusterId` against `[A-Za-z0-9._-]{1,64}` before concatenating into the `api/v2/compute/<ClusterId>` URL.
### Medium
- `EVP_CIPHER_CTX_reset` + re-bind cipher on every encrypt/decrypt so stale state cannot bleed across messages. Also logs EVP failures.
- Malformed `ExecuteResult` (size != 4) now tears down the agent instead of silently reporting `ExitCode = -1`.
- Replace `assert(Eq != nullptr)` on env var parsing with a `zen::runtime_error` - assert is compiled out in release and `*(Eq+1)` was UB.
- Blob name uses `zen::Oid::NewOid()` (24 hex chars, seeded from `random_device` run-id + monotonic serial) instead of predictable `<pid>_<ms>_<counter>`. Refuse to overwrite an existing blob path.
- Cap `m_RecentlyDrainedWorkerIds` at 256 entries with an FIFO eviction queue.
- `Blob(Data, Length)` rejects `Length > INT32_MAX` instead of wrapping the int32 wire fields.
- Static `AuthToken` path uses `HttpClientAccessToken::TimePoint::max()` (never-expires sentinel) instead of synthesizing `now + 24h`.
- Remove dead `m_Transport` field and `else if (m_Transport)` branch in `AsyncHordeAgent::Cancel()`.
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rework of the Horde agent subsystem from synchronous per-thread I/O to an async ASIO-driven architecture, plus provisioner scale-down with graceful draining, OIDC authentication, scheduler improvements, and dashboard UI for provisioner control.
### Async Horde Agent Rewrite
- Replace synchronous `HordeAgent` (one thread per agent, blocking I/O) with `AsyncHordeAgent` — an ASIO state machine running on a shared `io_context` thread pool
- Replace `TcpComputeTransport`/`AesComputeTransport` with `AsyncTcpComputeTransport`/`AsyncAesComputeTransport`
- Replace `AgentMessageChannel` with `AsyncAgentMessageChannel` using frame queuing and ASIO timers
- Delete `ComputeBuffer` and `ComputeChannel` ring-buffer classes (no longer needed)
### Provisioner Drain / Scale-Down
- `HordeProvisioner` can now drain agents when target core count is lowered: queries each agent's `/compute/session/status` for workload, selects candidates by largest-fit/lowest-workload, and sends `/compute/session/drain`
- Configurable `--horde-drain-grace-period` (default 300s) before force-kill
- Implement `IProvisionerStateProvider` interface to expose provisioner state to the orchestrator HTTP layer
- Forward `--coordinator-session`, `--provision-clean`, and `--provision-tracehost` through both Horde and Nomad provisioners to spawned workers
### OIDC Authentication
- `HordeClient` accepts an `AccessTokenProvider` (refreshable token function) as alternative to static `--horde-token`
- Wire up `OidcToken.exe` auto-discovery via `httpclientauth::CreateFromOidcTokenExecutable` with `--HordeUrl` mode
- New `--horde-oidctoken-exe-path` CLI option for explicit path override
### Orchestrator & Scheduler
- Orchestrator generates a session ID at startup; workers include `coordinator_session` in announcements so the orchestrator can reject stale-session workers
- New `Rejected` action state — when a remote runner declines at capacity, the action is rescheduled without retry count increment
- Reduce scheduler lock contention: snapshot pending actions under shared lock, sort/trim outside the lock
- Parallelize remote action submission across runners via `WorkerThreadPool` with slow-submit warnings
- New action field `FailureReason` populated by all runner types (exit codes, sandbox failures, exceptions)
- New endpoints: `session/drain`, `session/status`, `session/sunset`, `provisioner/status`, `provisioner/target`
### Remote Execution
- Eager-attach mode for `RemoteHttpRunner` — bundles all attachments upfront in a `CbPackage` for single-roundtrip submits
- Track in-flight submissions to prevent over-queuing
- Show remote runner hostname in `GetDisplayName()`
- `--announce-url` to override the endpoint announced to the coordinator (e.g. relay-visible address)
### Frontend Dashboard
- Delete standalone `compute.html` (925 lines) and `orchestrator.html` (669 lines), consolidated into JS page modules
- Add provisioner panel to orchestrator dashboard: target/active/estimated core counts, draining agent count
- Editable target-cores input with debounced POST to `/orch/provisioner/target`
- Per-agent provisioning status badges (active / draining / deallocated) in the agents table
- Active vs total CPU counts in agents summary row
### CLI
- New `zen compute record-start` / `record-stop` subcommands
- `zen exec` progress bar with submit and completion phases, atomic work counters, `--progress` mode (Pretty/Plain/Quiet)
### Other
- `DataDir` supports environment variable expansion
- Worker manifest validation checks for `worker.zcb` marker to detect incomplete cached directories
- Linux/Mac runners `nice(5)` child processes to avoid starving the main server
- `ComputeService::SetShutdownCallback` wired to `RequestExit` via `session/sunset`
- Curl HTTP client logs effective URL on failure
- `MachineInfo` carries `Pool` and `Mode` from Horde response
- Horde bundle creation includes `.pdb` on Windows
|
| |
|
|
|
|
|
|
|
|
| |
- **Eliminate `<regex>` usage** — Replaced `std::regex`-based URL parsing in `jupiterbuildstorage.cpp` with manual `string_view` parsing. Added `CXXOPTS_NO_REGEX` to disable regex in cxxopts. Includes comprehensive tests for the new URL parser.
- **Add missing HTTP response codes** — Added `102`, `103`, `203`, `207`, `208`, `226`, `306`, `421`, `425`, `451` to the enum and reason string lookup.
- **Add `ForceColor` support to zen CLI** — Plumbed the `ForceColor` logging option through to the zen client.
- **Add `.clangd` config** — Strips MSVC-specific flags clangd can't handle and suppresses noisy clang-tidy checks.
- **Generic `fmt::formatter` for `ToString`** — Concept-based formatter that auto-formats any type with a free `ToString()` function, removing the need for per-type specializations.
- **Fix OpenSSL dependency** — Changed `zenhorde` to use `openssl3` package on Linux/macOS.
- **Add `<cmath>` include** — Missing include in `hyperloglog.h`.
- **GCC compile fix** — Moved `static constinit` variable inside lambda in `logging.cpp`.
|
| |
|
|
|
|
|
| |
- **Cross-platform `GetProcessMetrics`**: Implement Linux (`/proc/{pid}/stat`, `/proc/{pid}/statm`, `/proc/{pid}/status`) and macOS (`proc_pidinfo(PROC_PIDTASKINFO)`) support for CPU times and memory metrics. Fix Windows to populate the `MemoryBytes` field (was always 0). All platforms now set `MemoryBytes = WorkingSetSize`.
- **`ProcessMetricsTracker`**: Experimental utility class (`zenutil`) that periodically samples resource usage for a set of tracked child processes. Supports both a dedicated background thread and an ASIO steady_timer mode. Computes delta-based CPU usage percentage across samples, with batched sampling (8 processes per tick) to limit per-cycle overhead.
- **`ProcessHandle` documentation**: Add Doxygen comments to all public methods describing platform-specific behavior.
- **Cleanup**: Remove unused `ZEN_RUN_TESTS` macro (inlined at its single call site in `zenserver/main.cpp`), remove dead `#if 0` thread-shutdown workaround block.
- **Minor fixes**: Use `HttpClientAccessToken` constructor in hordeclient instead of setting private members directly. Log ASIO version at startup and include it in the server settings list.
|
| |
|
|
|
|
|
| |
This PR makes it *possible* to do a Windows build on Linux via `clang-cl`.
It doesn't actually change any build process. No policy change, just mechanics and some code fixes to clear clang compilation.
The code fixes are mainly related to #include file name casing, to match the on-disk casing of the SDK files (via xwin).
|
|
|
- Added local process runners for Linux/Wine, Mac with some sandboxing support
- Horde & Nomad provisioning for development and testing
- Client session queues with lifecycle management (active/draining/cancelled), automatic retry with configurable limits, and manual reschedule API
- Improved web UI for orchestrator, compute, and hub dashboards with WebSocket push updates
- Some security hardening
- Improved scalability and `zen exec` command
Still experimental - compute support is disabled by default
|