| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
| |
keeping it in memory overloads memory when boost-worker-memory is enabled
|
| | |
|
| |
|
|
| |
- Bugfix: OAuth client credentials token request now sends correct `application/x-www-form-urlencoded` content type
- Improvement: HTTP client Content-Type in additional headers now overrides the payload content type
|
| | |
|
| |
|
|
| |
of a generic bad gateway error (#956)
|
| | |
|
| |\
| |
| | |
Stop using O_CLOEXEC in shm_open
|
| | |
| |
| |
| | |
According to documentation, shm_open already sets O_CLOEXEC.
|
| | | |
|
| | | |
|
| |/ |
|
| | |
|
| |
|
| |
* make mimalloc default again
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rework of the Horde agent subsystem from synchronous per-thread I/O to an async ASIO-driven architecture, plus provisioner scale-down with graceful draining, OIDC authentication, scheduler improvements, and dashboard UI for provisioner control.
### Async Horde Agent Rewrite
- Replace synchronous `HordeAgent` (one thread per agent, blocking I/O) with `AsyncHordeAgent` — an ASIO state machine running on a shared `io_context` thread pool
- Replace `TcpComputeTransport`/`AesComputeTransport` with `AsyncTcpComputeTransport`/`AsyncAesComputeTransport`
- Replace `AgentMessageChannel` with `AsyncAgentMessageChannel` using frame queuing and ASIO timers
- Delete `ComputeBuffer` and `ComputeChannel` ring-buffer classes (no longer needed)
### Provisioner Drain / Scale-Down
- `HordeProvisioner` can now drain agents when target core count is lowered: queries each agent's `/compute/session/status` for workload, selects candidates by largest-fit/lowest-workload, and sends `/compute/session/drain`
- Configurable `--horde-drain-grace-period` (default 300s) before force-kill
- Implement `IProvisionerStateProvider` interface to expose provisioner state to the orchestrator HTTP layer
- Forward `--coordinator-session`, `--provision-clean`, and `--provision-tracehost` through both Horde and Nomad provisioners to spawned workers
### OIDC Authentication
- `HordeClient` accepts an `AccessTokenProvider` (refreshable token function) as alternative to static `--horde-token`
- Wire up `OidcToken.exe` auto-discovery via `httpclientauth::CreateFromOidcTokenExecutable` with `--HordeUrl` mode
- New `--horde-oidctoken-exe-path` CLI option for explicit path override
### Orchestrator & Scheduler
- Orchestrator generates a session ID at startup; workers include `coordinator_session` in announcements so the orchestrator can reject stale-session workers
- New `Rejected` action state — when a remote runner declines at capacity, the action is rescheduled without retry count increment
- Reduce scheduler lock contention: snapshot pending actions under shared lock, sort/trim outside the lock
- Parallelize remote action submission across runners via `WorkerThreadPool` with slow-submit warnings
- New action field `FailureReason` populated by all runner types (exit codes, sandbox failures, exceptions)
- New endpoints: `session/drain`, `session/status`, `session/sunset`, `provisioner/status`, `provisioner/target`
### Remote Execution
- Eager-attach mode for `RemoteHttpRunner` — bundles all attachments upfront in a `CbPackage` for single-roundtrip submits
- Track in-flight submissions to prevent over-queuing
- Show remote runner hostname in `GetDisplayName()`
- `--announce-url` to override the endpoint announced to the coordinator (e.g. relay-visible address)
### Frontend Dashboard
- Delete standalone `compute.html` (925 lines) and `orchestrator.html` (669 lines), consolidated into JS page modules
- Add provisioner panel to orchestrator dashboard: target/active/estimated core counts, draining agent count
- Editable target-cores input with debounced POST to `/orch/provisioner/target`
- Per-agent provisioning status badges (active / draining / deallocated) in the agents table
- Active vs total CPU counts in agents summary row
### CLI
- New `zen compute record-start` / `record-stop` subcommands
- `zen exec` progress bar with submit and completion phases, atomic work counters, `--progress` mode (Pretty/Plain/Quiet)
### Other
- `DataDir` supports environment variable expansion
- Worker manifest validation checks for `worker.zcb` marker to detect incomplete cached directories
- Linux/Mac runners `nice(5)` child processes to avoid starving the main server
- `ComputeService::SetShutdownCallback` wired to `RequestExit` via `session/sunset`
- Curl HTTP client logs effective URL on failure
- `MachineInfo` carries `Pool` and `Mode` from Horde response
- Horde bundle creation includes `.pdb` on Windows
|
| | |
|
| |
|
| |
* log curl raw error on retry, add retry on CURLE_PARTIAL_FILE error
|
| |
|
| |
* silence exceptions in threaded requests to build storage if already aborted
|
| |
|
|
|
|
|
|
|
|
| |
- Replace per-type fmt::formatter specializations (StringBuilderBase, NiceBase) with a single generic formatter using a HasStringViewConversion concept
- Add ThousandsNum for comma-separated integer formatting (e.g. "1,234,567")
- Thread naming now accepts a sort hint for trace ordering
- Fix main thread trace registration to use actual thread ID and sort first
- Add ExpandEnvironmentVariables() for expanding %VAR% references in strings, with tests
- Add ParseHexBytes() overload with expected byte count validation
- Add Flag_BelowNormalPriority to CreateProcOptions (BELOW_NORMAL_PRIORITY_CLASS on Windows, setpriority on POSIX)
- Add PrettyScroll progress bar mode that pins the status line to the bottom of the terminal using scroll regions, with signal handler cleanup for Ctrl+C/SIGTERM
|
| |
|
|
|
|
|
|
|
|
| |
This PR introduces an in-memory `CidStore` option primarily for use with compute, to avoid hitting disk for ephemeral data which is not really worth persisting. And in particular not worth paying the critical path cost of persistence.
- **MemoryCidStore**: In-memory CidStore implementation backed by a hash map, optionally layered over a standard CidStore. Writes to the backing store are dispatched asynchronously via a dedicated flush thread to avoid blocking callers on disk I/O. Reads check memory first, then fall back to the backing store without caching the result.
- **ChunkStore interface**: Extract `ChunkStore` abstract class (`AddChunk`, `ContainsChunk`, `FilterChunks`) and `FallbackChunkResolver` into `zenstore.h` so `HttpComputeService` can accept different storage backends for action inputs vs worker binaries. `CidStore` and `MemoryCidStore` both implement `ChunkStore`.
- **Compute service wiring**: `HttpComputeService` takes two `ChunkStore&` params (action + worker). The compute server uses `MemoryCidStore` for actions (no disk persistence needed) and disk-backed `CidStore` for workers (cross-action reuse). The storage server passes its `CidStore` for both (unchanged behavior).
|
| | |
|
| |
|
|
|
|
|
| |
* objectstore.cpp - m_TotalBytesServed now tracks all range cases (single, multi, 416)
* async http: docstring corrected: curl_multi_socket_action() / ASIO socket async_wait
remove non-ascii characters
* fix singlethreaded gc option in lua to not use dash
* fix changelog order
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Core logging and system diagnostics improvements, extracted from the compute branch.
### Logging
- **Elapsed timestamps**: Console log now shows elapsed time since launch `[HH:MM:SS.mmm]` instead of full date/time; file logging is unchanged
- **Short level names**: 3-letter short level names (`trc`/`dbg`/`inf`/`wrn`/`err`/`crt`) used by both console and file formatters via `ShortToStringView()`
- **Consistent field order**: Standardized to `[timestamp] [level] [logger]` across both console and file formatters
- **Slim LogMessage/LogPoint**: Remove redundant fields from `LogMessage` (derive level/source from `LogPoint`), flatten `LogPoint` to inline filename/line fields, shrink `LogLevel` to `int8_t` with `static_assert(sizeof(LogPoint) <= 32)`
- **Remove default member initializers** and static default `LogPoint` from `LogMessage` — all fields initialized by constructor
- **LoggerRef string constructor**: Convenience constructor accepting a string directly
- **Fix SendMessage macro collision**: Replace `thread.h` include in `logmsg.h` with a forward declaration of `GetCurrentThreadId()` to avoid pulling in `windows.h` transitively
### System Diagnostics
- **Cache static system metrics**: Add `RefreshDynamicSystemMetrics()` that only queries values that change at runtime (available memory, uptime, swap). `SystemMetricsTracker` snapshots full `GetSystemMetrics()` once at construction and reuses cached topology/total memory on each `Query()`, avoiding repeated `GetLogicalProcessorInformationEx` traversal on Windows, `/proc/cpuinfo` parsing on Linux, and `sysctl` topology calls on macOS
|
| | |
|
| |
|
|
|
|
|
| |
`--hub-instance-malloc` selects the memory allocator for child instances
`--hub-instance-trace` sets trace channels for child instances
`--hub-instance-tracehost` sets the trace streaming host for child instances
`--hub-instance-tracefile` sets the trace output file for child instances
add {moduleid} and {port} placeholder support for tracefile
|
| |
|
|
|
| |
- Adds a `workflow_dispatch` workflow ("Manual Test Run") that can be triggered from the Actions tab
- Configurable options: platform, memory allocator (`--malloc=stomp`/mimalloc/rpmalloc), sanitizer (asan/tsan/msan), test suite, and freeform extra arguments
- Mirrors the build & test steps from `validate.yml` but always builds debug with sentry disabled, and with longer timeout (40min) to accommodate sanitizer overhead
|
| | |
|
| |
|
| |
Remove the `zens3-testbed` target and source files. This was a standalone test harness for S3 operations that is no longer needed.
|
| |
|
|
| |
registration (#939)
|
| |
|
| |
* implement "deprovision all" for hub
|
| |
|
|
|
| |
- Improvement: Dashboard paginated lists now include a search input that jumps to the page containing the first match and highlights the row
- Improvement: Dashboard paginated lists show a loading indicator while fetching data
- Improvement: Hub dashboard navigates to and highlights newly provisioned instances
|
| |
|
|
| |
space (#935)
|
| |
|
|
| |
- Improvement: Updated rpmalloc to develop branch commit feb43aee0d4d (2025-10-26), which fixes `VirtualAlloc(MEM_COMMIT)` failures being silently ignored under memory pressure
- Improvement: Increased rpmalloc page decommit thresholds to reduce commit/decommit churn under high allocation turnover
|
| |
|
|
|
|
|
|
|
| |
- Improvement: HTTP range responses (RFC 7233) are now fully compliant across the object store and build store
- 206 Partial Content responses now include a `Content-Range` header; previously absent for single-range requests, which broke `HttpClient::GetRanges()`
- 416 Range Not Satisfiable responses now include `Content-Range: bytes */N` as required by RFC 7233
- Out-of-bounds range requests return 416 Range Not Satisfiable (was 400 Bad Request)
- Single-byte ranges (`bytes=N-N`) are now correctly accepted (were previously rejected)
- Range byte positions widened from 32-bit to 64-bit; RFC 7233 imposes no size limit on byte range values
- Build store binary GET requests with a Range header now return 206 Partial Content with `Content-Range` (previously returned 200 OK without it)
|
| |
|
|
|
|
|
|
| |
* reduce zenserver spawns in tests
* fix filesystemutils wrong test suite name
* tweak tests for faster runtime
* reduce more test runtime
* more wall time improvements
* fast http and processmanager tests
|
| | |
|
| |
|
|
|
|
| |
- Detect stale shared-memory entries whose PID matches the current process but predate our registration (m_OurEntry == nullptr)
- Sweep() now reclaims such entries instead of skipping them
- Lookup() and LookupByEffectivePort() skip stale same-PID entries
- Fixes startup failure on k8s where PID 1 is always reused after an unclean shutdown
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Adds `AsyncHttpClient` — an asynchronous HTTP client using `curl_multi_socket_action` integrated with ASIO for event-driven I/O. Supports GET, POST, PUT, DELETE, HEAD with both callback-based and `std::future`-based APIs.
- Extracts shared curl helpers (callbacks, URL encoding, header construction, error mapping) into `httpclientcurlhelpers.h`, eliminating duplication between the sync and async implementations.
## Design
- All curl_multi state is serialized on an `asio::strand`, safe with multi-threaded io_contexts.
- Two construction modes: owned io_context (creates internal thread) or external io_context (caller runs the loop).
- Socket readiness is detected via `asio::ip::tcp::socket::async_wait` driven by curl's `CURLMOPT_SOCKETFUNCTION`/`CURLMOPT_TIMERFUNCTION` — no polling, sub-millisecond latency.
- Completion callbacks are dispatched off the strand onto the io_context so slow callbacks don't starve the curl event loop. Exceptions in callbacks are caught and logged.
## Files
| File | Change |
|------|--------|
| `zenhttp/include/zenhttp/asynchttpclient.h` | New public header |
| `zenhttp/clients/asynchttpclient.cpp` | Implementation (~1000 lines) |
| `zenhttp/clients/httpclientcurlhelpers.h` | Shared curl helpers extracted from sync client |
| `zenhttp/clients/httpclientcurl.cpp` | Removed duplicated helpers, uses shared header |
| `zenhttp/asynchttpclient_test.cpp` | 8 test cases: verbs, payloads, callbacks, concurrency, external io_context, connection errors |
| `zenhttp/zenhttp.cpp` | Forcelink registration for new tests |
|
| | |
|
| | |
|
| | |
|
| |
|
|
| |
in consul (#930)
|
| |
|
|
| |
(#927)
|
| | |
|
| |
|
| |
* add reused block to oplog during export
|
| |
|
|
| |
- Feature: Hub obliterate operation deletes all local and backend hydration data for a module
- Improvement: Hub dashboard adds obliterate button for individual, bulk, and by-name module deletion
|
| |
|
| |
* add pagination and consistent sorting on cache and projects ui pages
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Feature: Incremental CAS-based hydration/dehydration replacing the previous full-copy approach
- Feature: S3 hydration backend with multipart upload/download support
- Feature: Configurable thread pools for hub instance provisioning and hydration
`--hub-instance-provision-threads` defaults to `max(cpu_count / 4, 2)`. Set to 0 for synchronous operation.
`--hub-hydration-threads` defaults to `max(cpu_count / 4, 2)`. Set to 0 for synchronous operation.
- Improvement: Hub triggers GC on instance before deprovisioning to compact storage before dehydration
- Improvement: GC status now reports pending triggers as running
- Improvement: S3 client debug logging gated behind verbose mode to reduce log noise at default verbosity
- Improvement: Hub dashboard Resources tile now shows total memory
- Improvement: `filesystemutils` moved from `zenremotestore` to `zenutil` for broader reuse
- Improvement: Hub uses separate provision and hydration worker pools to avoid deadlocks
- Improvement: Hibernate/wake/deprovision on non-existent or already-in-target-state modules are idempotent
- Improvement: `ScopedTemporaryDirectory` with empty path now creates a temporary directory instead of asserting
|
| | |
|
| | |
|