Commit message Author Age Files Lines
* Use bash arrays for arg passing and add set -x for visibilitysb/manual-test-workflowsStefan Boberg2026-04-111-21/+30
| | | | | | | Shell string variables with spaces don't survive word splitting reliably. Switch to bash arrays so the -- separator and --malloc=... flags are passed as distinct argv entries to xmake. Also add set -x so the expanded command is visible in the GHA log.
* Add manual test workflow with configurable sanitizers and allocatorsStefan Boberg2026-04-111-0/+298
| Adds a workflow_dispatch workflow that lets you run tests with:
|
| - Platform selection (windows/linux/macos/all)
| - Allocator override (stomp/mimalloc/rpmalloc)
| - Sanitizer selection (asan/tsan/msan)
| - Test suite filtering
| - Arbitrary extra arguments
* Dashboard stats tiles no longer flicker (#943)Dan Engelbrecht2026-04-117-190/+199
|
* removed s3 test program (#942)Stefan Boberg2026-04-113-535/+0
| | | Remove the `zens3-testbed` target and source files. This was a standalone test harness for S3 operations that is no longer needed.
* `--consul-register-hub` option to disable hub parent service Consul registration (#939)Dan Engelbrecht2026-04-114-18/+37
|
* hub deprovision all (#938)Dan Engelbrecht2026-04-114-5/+110
| | | * implement "deprovision all" for hub
* dashboard search (#936)Dan Engelbrecht2026-04-117-13/+187
| - Improvement: Dashboard paginated lists now include a search input that jumps to the page containing the first match and highlights the row
| - Improvement: Dashboard paginated lists show a loading indicator while fetching data
| - Improvement: Hub dashboard navigates to and highlights newly provisioned instances
* improve messaging when zen builds download target disk does not have enough space (#935)Dan Engelbrecht2026-04-112-1/+5
|
* update rpmalloc and tweak for commit/decommit churn (#934)Dan Engelbrecht2026-04-115-88/+226
| - Improvement: Updated rpmalloc to develop branch commit feb43aee0d4d (2025-10-26), which fixes `VirtualAlloc(MEM_COMMIT)` failures being silently ignored under memory pressure
| - Improvement: Increased rpmalloc page decommit thresholds to reduce commit/decommit churn under high allocation turnover
* HTTP range responses (RFC 7233) - httpobjectstore (#928)Dan Engelbrecht2026-04-1011-137/+515
| - Improvement: HTTP range responses (RFC 7233) are now fully compliant across the object store and build store
|   - 206 Partial Content responses now include a `Content-Range` header; previously absent for single-range requests, which broke `HttpClient::GetRanges()`
|   - 416 Range Not Satisfiable responses now include `Content-Range: bytes */N` as required by RFC 7233
|   - Out-of-bounds range requests return 416 Range Not Satisfiable (was 400 Bad Request)
|   - Single-byte ranges (`bytes=N-N`) are now correctly accepted (were previously rejected)
|   - Range byte positions widened from 32-bit to 64-bit; RFC 7233 imposes no size limit on byte range values
|   - Build store binary GET requests with a Range header now return 206 Partial Content with `Content-Range` (previously returned 200 OK without it)
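The status codes and `Content-Range` values listed in this commit can be illustrated with a small sketch. This is not the zenhttp implementation; `resolve_range` and `respond` are hypothetical names, and only single `bytes=` ranges are handled. Python integers are unbounded, so the 64-bit widening mentioned above has no direct counterpart here.

```python
import re
from typing import Dict, Optional, Tuple

# Matches one byte range: "bytes=first-last", "bytes=first-" or "bytes=-suffix".
_RANGE_RE = re.compile(r"bytes=(\d*)-(\d*)", re.ASCII)


def resolve_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Return the inclusive (first, last) byte positions, or None if unsatisfiable.

    Raises ValueError for malformed or multi-range headers; how those are
    answered is left to the caller in this sketch.
    """
    match = _RANGE_RE.fullmatch(header.strip())
    if match is None or match.groups() == ("", ""):
        raise ValueError("unsupported Range header: %r" % header)
    first_s, last_s = match.groups()
    if first_s == "":
        # Suffix form: the final N bytes of the resource.
        suffix_len = int(last_s)
        if suffix_len == 0 or size == 0:
            return None
        return max(size - suffix_len, 0), size - 1
    first = int(first_s)
    if last_s != "" and int(last_s) < first:
        raise ValueError("last byte position precedes first")
    if first >= size:
        return None  # out of bounds: answered with 416 rather than 400
    last = min(int(last_s), size - 1) if last_s else size - 1
    return first, last


def respond(header: str, size: int) -> Tuple[int, Dict[str, str]]:
    """Map a Range header to a status code and the range-related headers."""
    resolved = resolve_range(header, size)
    if resolved is None:
        return 416, {"Content-Range": "bytes */%d" % size}
    first, last = resolved
    return 206, {
        "Content-Range": "bytes %d-%d/%d" % (first, last, size),
        "Content-Length": str(last - first + 1),
    }
```

For example, `respond("bytes=0-0", 10)` yields a 206 with `Content-Range: bytes 0-0/10`, while `respond("bytes=10-20", 10)` yields a 416 with `Content-Range: bytes */10`.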
* reduce test runtime (#933)Dan Engelbrecht2026-04-1015-1386/+1242
| * reduce zenserver spawns in tests
| * fix filesystemutils wrong test suite name
| * tweak tests for faster runtime
| * reduce more test runtime
| * more wall time improvements
| * fast http and processmanager tests
* Update CHANGELOG.mdStefan Boberg2026-04-091-0/+4
|
* Fix ZenServerState stale entry detection on PID reuse (k8s) (#932)Stefan Boberg2026-04-091-0/+31
| - Detect stale shared-memory entries whose PID matches the current process but predate our registration (m_OurEntry == nullptr)
| - Sweep() now reclaims such entries instead of skipping them
| - Lookup() and LookupByEffectivePort() skip stale same-PID entries
| - Fixes startup failure on k8s where PID 1 is always reused after an unclean shutdown
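The stale-entry rule described in this commit can be sketched as follows. This is a simplified model rather than the `ZenServerState` code: the `Entry` and `ServerStateTable` types, the use of `pid == 0` for a free slot, and the method names are all assumptions made for illustration, and the sweep shown here handles only the same-PID case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    pid: int = 0   # 0 marks a free slot in this sketch
    port: int = 0


class ServerStateTable:
    """Toy stand-in for a shared-memory table of running server instances."""

    def __init__(self, slots: List[Entry], current_pid: int) -> None:
        self.slots = slots
        self.current_pid = current_pid
        self.our_entry: Optional[Entry] = None  # set once we register

    def _is_stale_same_pid(self, entry: Entry) -> bool:
        # An entry carrying our PID before we have registered must be a
        # leftover from a previous process that had the same PID.
        return entry.pid == self.current_pid and self.our_entry is None

    def sweep(self) -> int:
        """Reclaim stale same-PID entries and return how many were freed."""
        reclaimed = 0
        for entry in self.slots:
            if entry.pid != 0 and self._is_stale_same_pid(entry):
                entry.pid, entry.port = 0, 0
                reclaimed += 1
        return reclaimed

    def lookup(self, port: int) -> Optional[Entry]:
        """Find a live entry for the port, skipping free and stale slots."""
        for entry in self.slots:
            if entry.pid == 0 or entry.port != port:
                continue
            if self._is_stale_same_pid(entry):
                continue
            return entry
        return None

    def register(self, port: int) -> Entry:
        """Claim the first free slot for the current process."""
        for entry in self.slots:
            if entry.pid == 0:
                entry.pid, entry.port = self.current_pid, port
                self.our_entry = entry
                return entry
        raise RuntimeError("no free slot")
```

In a container where the server always runs as PID 1, a leftover entry with `pid=1` would otherwise look like a live instance of the current process; with the check above it is ignored by `lookup` and freed by `sweep` before registration.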
* Add async HTTP client (curl_multi + ASIO) (#918)Stefan Boberg2026-04-096-269/+1776
| - Adds `AsyncHttpClient` - an asynchronous HTTP client using `curl_multi_socket_action` integrated with ASIO for event-driven I/O. Supports GET, POST, PUT, DELETE, HEAD with both callback-based and `std::future`-based APIs.
| - Extracts shared curl helpers (callbacks, URL encoding, header construction, error mapping) into `httpclientcurlhelpers.h`, eliminating duplication between the sync and async implementations.
|
| ## Design
|
| - All curl_multi state is serialized on an `asio::strand`, safe with multi-threaded io_contexts.
| - Two construction modes: owned io_context (creates internal thread) or external io_context (caller runs the loop).
| - Socket readiness is detected via `asio::ip::tcp::socket::async_wait` driven by curl's `CURLMOPT_SOCKETFUNCTION`/`CURLMOPT_TIMERFUNCTION` - no polling, sub-millisecond latency.
| - Completion callbacks are dispatched off the strand onto the io_context so slow callbacks don't starve the curl event loop. Exceptions in callbacks are caught and logged.
|
| ## Files
|
| | File | Change |
| |------|--------|
| | `zenhttp/include/zenhttp/asynchttpclient.h` | New public header |
| | `zenhttp/clients/asynchttpclient.cpp` | Implementation (~1000 lines) |
| | `zenhttp/clients/httpclientcurlhelpers.h` | Shared curl helpers extracted from sync client |
| | `zenhttp/clients/httpclientcurl.cpp` | Removed duplicated helpers, uses shared header |
| | `zenhttp/asynchttpclient_test.cpp` | 8 test cases: verbs, payloads, callbacks, concurrency, external io_context, connection errors |
| | `zenhttp/zenhttp.cpp` | Forcelink registration for new tests |
* migrate from http_parser to llhttp (#929)Dan Engelbrecht2026-04-0912-143/+423
|
* 5.8.3v5.8.3Dan Engelbrecht2026-04-081-1/+1
|
* 5.8.3-pre2v5.8.3-pre2Dan Engelbrecht2026-04-081-1/+1
|
* fully provisioned hub instances now set initial check status to "passing" in consul (#930)Dan Engelbrecht2026-04-084-7/+15
|
* use correct return code for unsupported multirange requests in objectstore (#927)Dan Engelbrecht2026-04-083-2/+65
|
* don't hard fail if .pending folder is not empty on oplog export (#926)Dan Engelbrecht2026-04-083-2/+11
|
* fix missing chunk in oplog export (#925)Dan Engelbrecht2026-04-082-0/+161
| | | * add reused block to oplog during export
* hydration data obliteration (#923)Dan Engelbrecht2026-04-0816-153/+738
| - Feature: Hub obliterate operation deletes all local and backend hydration data for a module
| - Improvement: Hub dashboard adds obliterate button for individual, bulk, and by-name module deletion
* sort items on dashboard (#924)Dan Engelbrecht2026-04-073-79/+114
| | | * add pagination and consistent sorting on cache and projects ui pages
* add pagination of cooked projects and caches on dashboard front page (#922)Dan Engelbrecht2026-04-073-65/+173
|
* incremental dehydrate (#921)Dan Engelbrecht2026-04-0722-1148/+1910
| - Feature: Incremental CAS-based hydration/dehydration replacing the previous full-copy approach
| - Feature: S3 hydration backend with multipart upload/download support
| - Feature: Configurable thread pools for hub instance provisioning and hydration
|   `--hub-instance-provision-threads` defaults to `max(cpu_count / 4, 2)`. Set to 0 for synchronous operation.
|   `--hub-hydration-threads` defaults to `max(cpu_count / 4, 2)`. Set to 0 for synchronous operation.
| - Improvement: Hub triggers GC on instance before deprovisioning to compact storage before dehydration
| - Improvement: GC status now reports pending triggers as running
| - Improvement: S3 client debug logging gated behind verbose mode to reduce log noise at default verbosity
| - Improvement: Hub dashboard Resources tile now shows total memory
| - Improvement: `filesystemutils` moved from `zenremotestore` to `zenutil` for broader reuse
| - Improvement: Hub uses separate provision and hydration worker pools to avoid deadlocks
| - Improvement: Hibernate/wake/deprovision on non-existent or already-in-target-state modules are idempotent
| - Improvement: `ScopedTemporaryDirectory` with empty path now creates a temporary directory instead of asserting
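As a worked example of the thread-count default quoted above, the sketch below evaluates `max(cpu_count / 4, 2)` and treats a count of 0 as "run inline". It assumes the division is an integer division; the function names are hypothetical and not taken from the hub code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def default_pool_threads(cpu_count: int) -> int:
    """Default from the commit message, assuming integer division."""
    return max(cpu_count // 4, 2)


def run_all(tasks: List[Callable[[], T]], threads: int) -> List[T]:
    """Run tasks on a pool, or inline on the calling thread when threads == 0."""
    if threads == 0:
        return [task() for task in tasks]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() preserves the order of the input tasks.
        return list(pool.map(lambda task: task(), tasks))
```

Under that assumption a 16-core machine gets 4 threads per pool, and anything with 8 or fewer cores gets the floor of 2.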
* disable zencompute in bundle stepStefan Boberg2026-04-031-0/+3
|
* 5.8.3-pre0v5.8.3-pre0Dan Engelbrecht2026-04-021-1/+1
|
* fix hub consul health endpoint registration (#917)Dan Engelbrecht2026-04-023-1/+6
| * use correct health endpoint for zenhubserver consul registration
| * add total disk space on hub resource pane
* 5.8.2v5.8.2Dan Engelbrecht2026-04-021-1/+1
|
* 5.8.2-pre1v5.8.2-pre1Dan Engelbrecht2026-04-021-1/+1
|
* s3 and consul fixes (#916)Dan Engelbrecht2026-04-026-7/+284
| * fix endpoint for stats/hub in compute/hub.html page
| * fix api token call failure for imds (using wrong overload for Put)
| * add "localhost" to health check url in consul when no address is given
| * add consul fallback deregister if normal deregister fails
| * add consul registration unit test
* add provision button to hub ui (#915)Dan Engelbrecht2026-04-022-0/+134
|
* hub instance dashboard proxy (#914)Dan Engelbrecht2026-04-0124-34/+714
| | | - Feature: Hub dashboard proxy - instance dashboards are accessible through the hub server at `/hub/proxy/{port}/` without requiring direct port access
* 5.8.2-pre0v5.8.2-pre0Dan Engelbrecht2026-04-011-1/+1
|
* fix fork() issues on linux and MacOS (#910)Dan Engelbrecht2026-04-017-22/+177
| - Improvement: Hub child process spawning on macOS now uses `posix_spawn` in line with Apple recommendations
| - Bugfix: Hub child process spawning on Linux now uses `vfork` instead of `fork`, preventing ENOMEM failures on systems with strict memory overcommit (`vm.overcommit_memory=2`)
| - Bugfix: Fixed process group management on POSIX; child processes were not placed into the correct process group, breaking group-wide signal delivery
* consul env token refresh (#912)Dan Engelbrecht2026-04-016-13/+34
| | | - Improvement: Consul token is now re-read from the environment variable on every request, allowing token rotation without restarting the service
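The token-rotation behaviour described here comes down to reading the environment at request time instead of caching the value at startup. A minimal sketch, assuming a hypothetical `EnvTokenSource` helper and a placeholder variable name (the commit does not name the variable); `X-Consul-Token` is the standard Consul HTTP API header.

```python
import os
from typing import Dict


class EnvTokenSource:
    """Reads the token from the environment each time headers are built."""

    def __init__(self, env_var: str) -> None:
        self.env_var = env_var  # hypothetical; the real variable name is not given

    def request_headers(self) -> Dict[str, str]:
        # No caching: a rotated token is picked up by the very next request.
        token = os.environ.get(self.env_var, "")
        return {"X-Consul-Token": token} if token else {}
```

The cost is one environment lookup per request, which is negligible next to the HTTP round trip.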
* kill stale test processes (zenserver, minio, nomad, consul) before and after CI test runs (#909)Stefan Boberg2026-04-012-4/+92
|
| Adds steps to the validate workflow on all platforms that kill any zenserver, minio, nomad, or consul processes launched from the build output directory. Runs before tests to clear stale processes from previous runs, and after tests (always, even on failure) to clean up.
* Zs/oplog export zero size attachment fix (#911)Zousar Shaker2026-04-014-2/+157
| * Unit test coverage for zero byte file handling in oplogs
| * Unit test fixes for the zero length file case
| * Fixes for zero length file attachments
| * Additional fix for zero length file attachments
* 5.8.1v5.8.1Dan Engelbrecht2026-03-311-1/+1
|
* 5.8.1-pre1v5.8.1-pre1Dan Engelbrecht2026-03-311-1/+1
|
* fix potential race with stats counters missing when to Stop filtered values (#907)Dan Engelbrecht2026-03-314-79/+92
| * fix potential race with stats counters missing when to Stop filtered values
| * fix off by one in PutMultipartBuildBlob retry path
| * use move operation instead of copy operation PutMultipartBlob
| * fix filter Stop() for upload operations and fix bug with generateblock count filter
* fix jupiterbuildstorage concurrency (#906)Dan Engelbrecht2026-03-312-15/+35
| | | - Bugfix: Fixed concurrency issue in JupiterBuildStorage when updating stats
* 5.8.1-pre0v5.8.1-pre0Dan Engelbrecht2026-03-301-1/+1
|
* add lua config options for all zenhubserver command line options (#904)Dan Engelbrecht2026-03-304-3/+505
| - Improvement: Hub server now supports Lua config file for all hub-specific options
|   - `hub.upstreamnotification.*` - upstream notification endpoint and instance ID
|   - `hub.consul.*` - service registration endpoint, token, health interval, deregister timeout
|   - `hub.instance.*` - base port, HTTP class, thread count, core limit, config path
|   - `hub.instance.limits.*` - instance count cap, disk and memory usage limits
|   - `hub.hydration.*` - hydration target spec and config path
|   - `hub.watchdog.*` - cycle timing, inactivity timeouts, and activity check timeouts
| - Improvement: Added `--hub-instance-base-port-number` as an alias for `--hub-base-port-number`, and `--upstream-notification-instance-id` as an alias for `--instance-id`
| - Improvement: Added hub mode documentation at docs/hub.md
* Request validation and resilience improvements (#864)Stefan Boberg2026-03-3045-377/+2484
| ### Security: Input validation & path safety
|
| - **Reject local file references by default** in package parsing - only allow when explicitly opted in by the service (`ParseFlags::kAllowLocalReferences`) and validated by an `ILocalRefPolicy` (fail-closed: no policy = rejected)
| - **`DataRootLocalRefPolicy`** restricts local ref paths to the server's data root via canonical path prefix matching
| - **Validate attachment hashes** in compute HTTP handlers - decompresses and re-hashes each attachment at ingestion time to reject tampered payloads
| - **Path traversal validation** for worker descriptions (`pathvalidation.h`) - rejects absolute paths, `..` components, Windows reserved device names, and invalid filename characters
| - **Harden CbPackage parsing** against corrupt inputs - overflow-safe attachment count, bounds checks on local ref offset/size, graceful failure instead of `ZEN_ASSERT` for untrusted data
| - **Harden legacy package parser** - reject zero-size binary fields, missing mappers, and optionally validate resolved attachment hashes
| - **Bounds check in `CbPackageReader::MarshalLocalChunkReference`** - detect when `MakeFromFile` silently clamps offset+size to file size
|
| ### Reliability: Lock consolidation & bug fixes
|
| - **Consolidate three action map locks into one** (`m_ActionMapLock`) - eliminates deadlock risk from multi-lock ordering, simplifies state transitions, and fixes a race where newly enqueued actions were briefly invisible to `GetActionResult`/`FindActionResult`
| - **Fix infinite loop in `BaseRunnerGroup::SubmitActions`** when actions exceed total runner capacity - cap round-robin at `TotalCapacity` and default unassigned results to "No capacity"
| - **Fix `MakeSafeAbsolutePathInPlace` for UNC paths** - `\\server\share` now correctly becomes `\\?\UNC\server\share` instead of `\\?\server\share`
| - **Fix `max_retries=0`** - previously fell through to the default of 3; now correctly means "no retries"
|
| ### New: ManagedProcessRunner
|
| - Cross-platform process runner backed by `SubprocessManager` - uses async exit callbacks instead of polling, delegates CPU/memory metrics to the manager's built-in sampler
| - `ProcessGroup` (JobObject on Windows, process group on POSIX) for bulk cancellation on shutdown
| - `--managed` flag on `zen exec inproc` to select this runner
| - Refactored monitor thread lifecycle - `StartMonitorThread()` now called from derived constructors to avoid calling virtual functions from base constructor
|
| ### Process management
|
| - **Suppress crash dialogs** via `JOB_OBJECT_UILIMIT_ERRORMODE` + `SEM_NOGPFAULTERRORBOX` in both `WindowsProcessRunner` and `JobObject::Initialize` - prevents WER/Dr. Watson modal dialogs from blocking the monitor thread
| - **CREATE_SUSPENDED → AssignProcessToJobObject → ResumeThread** pattern in `WindowsProcessRunner` - ensures job object assignment before process execution
| - **Move stdout/stderr callbacks to `Spawn()` parameters** in `SubprocessManager` - prevents race where early output could be missed before callback installation
| - Consistent PID logging across all runner types
|
| ### Test infrastructure
|
| - **`zentest-appstub`**: Added `Fail` (configurable exit code) and `Crash` (abort / nullptr deref) test functions
| - **Compute integration tests**: exit code handling, auto-retry exhaustion, manual reschedule after failure, mixed success/failure queues, crash handling (abort + nullptr), crash auto-retry, immediate query visibility after enqueue
| - **Package format tests**: truncated header, bad magic, attachment count overflow, truncated data, local ref rejection/acceptance, policy enforcement (inside/outside root, traversal, no-policy fail-closed)
| - **Legacy package parser tests**: empty input, zero-size binary, hash resolution with/without mapper, hash mismatch detection
| - **UNC path tests** for `MakeSafeAbsolutePath`
|
| ### Misc
|
| - ANSI color helper macros (`ZEN_RED`, `ZEN_BRIGHT_WHITE`, etc.) and `ZEN_BOLD`/`ZEN_DIM`/etc.
| - Generic `fmt::formatter` for types with free `ToString` functions
| - Compute dashboard: truncated hash display with monospace font and hover for full value
| - Renamed `usonpackage_forcelink` → `cbpackage_forcelink`
| - Compute enabled by default in xmake config (releases still explicitly disable)
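The four rejection rules listed for path traversal validation (absolute paths, `..` components, Windows reserved device names, invalid filename characters) can be sketched as a single predicate. This is an illustration only; the actual checks in `pathvalidation.h` may differ, and the function name, the reserved-name list, and the character set below are assumptions based on common Windows rules.

```python
import re

# Windows reserved device names, matched case-insensitively on the stem.
_RESERVED_NAMES = {"CON", "PRN", "AUX", "NUL"}
_RESERVED_NAMES |= {"COM%d" % i for i in range(1, 10)}
_RESERVED_NAMES |= {"LPT%d" % i for i in range(1, 10)}

# Characters that Windows does not allow in a file name component.
_INVALID_CHARS = set('<>:"|?*')


def is_safe_relative_path(path: str) -> bool:
    """Return True only for relative paths that stay inside their root."""
    if not path:
        return False
    # Absolute paths: POSIX root, UNC or rooted Windows path, drive letter.
    if path.startswith(("/", "\\")) or re.match(r"[A-Za-z]:", path):
        return False
    for part in re.split(r"[\\/]", path):
        if part in ("", "."):
            continue  # tolerate doubled separators and "." in this sketch
        if part == "..":
            return False
        if any(ch in _INVALID_CHARS or ord(ch) < 32 for ch in part):
            return False
        # "NUL.txt" is still the NUL device on Windows, so compare the stem.
        stem = part.split(".", 1)[0].rstrip(" ").upper()
        if stem in _RESERVED_NAMES:
            return False
    return True
```

With these rules `bin/worker.exe` passes, while `../etc/passwd`, `C:\Windows` and `dir/NUL.txt` are all rejected.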
* include rawHash in structure output for builds ls command (#903)Dan Engelbrecht2026-03-302-0/+2
|
* hub s3 hydrate improvements (#902)Dan Engelbrecht2026-03-3015-117/+1890
| - Feature: Added `--hub-hydration-target-config` option to specify the hydration target via a JSON config file (mutually exclusive with `--hub-hydration-target-spec`); supports `file` and `s3` types with structured settings
|
|   ```json
|   {
|     "type": "file",
|     "settings": {
|       "path": "/path/to/hydration/storage"
|     }
|   }
|   ```
|
|   ```json
|   {
|     "type": "s3",
|     "settings": {
|       "uri": "s3://bucket[/prefix]",
|       "region": "us-east-1",
|       "endpoint": "http://localhost:9000",
|       "path-style": true
|     }
|   }
|   ```
|
| - Improvement: Hub hydration dehydration skips the `.sentry-native` directory
| - Bugfix: Fixed `MakeSafeAbsolutePathInPlace` when a UNC prefix is present but path uses mixed delimiters
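A reader of the config format above might validate it along these lines. This is a sketch, not the hub's parser: the function names are hypothetical, and treating `path` (for `file`) and `uri` (for `s3`) as the required keys is an assumption drawn from the two examples.

```python
import json
from typing import Any, Dict, Optional, Tuple

# Assumed required setting per target type, inferred from the examples.
_REQUIRED_KEY = {"file": "path", "s3": "uri"}


def parse_hydration_target(text: str) -> Tuple[str, Dict[str, Any]]:
    """Parse a hydration target config and return (type, settings)."""
    config = json.loads(text)
    kind = config.get("type")
    settings = config.get("settings")
    if kind not in _REQUIRED_KEY:
        raise ValueError("unsupported hydration target type: %r" % kind)
    if not isinstance(settings, dict) or _REQUIRED_KEY[kind] not in settings:
        raise ValueError("missing %r setting for %r target" % (_REQUIRED_KEY[kind], kind))
    return kind, settings


def check_exclusive(target_spec: Optional[str], target_config: Optional[str]) -> None:
    """Reject command lines that pass both the spec and the config option."""
    if target_spec is not None and target_config is not None:
        raise ValueError(
            "--hub-hydration-target-spec and --hub-hydration-target-config "
            "are mutually exclusive"
        )
```

Failing early on an unknown `type` keeps a typo in the config file from silently selecting no hydration target.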
* hub resource limits (#900)Dan Engelbrecht2026-03-3018-306/+510
| - Feature: Hub dashboard now shows a Resources tile with disk and memory usage against configured limits
| - Feature: Hub module listing now shows state-change timestamps and duration for each instance
| - Improvement: Hub provisioning rejects new instances when disk or memory usage exceeds configurable thresholds; limits are disabled by default (0 = no limit)
|   - `--hub-provision-disk-limit-bytes` - Reject provisioning when used disk exceeds this many bytes
|   - `--hub-provision-disk-limit-percent` - Reject provisioning when used disk exceeds this percentage of total disk
|   - `--hub-provision-memory-limit-bytes` - Reject provisioning when used memory exceeds this many bytes
|   - `--hub-provision-memory-limit-percent` - Reject provisioning when used memory exceeds this percentage of total RAM
| - Improvement: Hub process metrics are now tracked atomically per active instance slot, eliminating per-query process handle lookups
| - Improvement: Hub, Build Store, and Workspaces service stats sections in the dashboard are now collapsible
| - Bugfix: Hub watchdog loop did not check `m_ShutdownFlag`, causing it to spin indefinitely on shutdown
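The limit semantics described above (a byte limit, a percentage limit, and 0 meaning "no limit") can be expressed as one check applied to disk and memory alike. A minimal sketch with a hypothetical function name; it reads "exceeds" as strictly greater than, which is an interpretation of the option descriptions rather than confirmed behaviour.

```python
def provisioning_allowed(used: int, total: int,
                         limit_bytes: int = 0, limit_percent: int = 0) -> bool:
    """Return False when usage exceeds either configured limit.

    A limit of 0 disables that check, matching the documented default.
    """
    if limit_bytes > 0 and used > limit_bytes:
        return False
    # Integer comparison avoids floating-point rounding at the threshold.
    if limit_percent > 0 and total > 0 and used * 100 > limit_percent * total:
        return False
    return True
```

For instance, with 900 of 1000 bytes used, an 80 percent limit rejects provisioning, while usage of exactly 800 is still allowed under this reading.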
* reuse single MinIO instance across s3client integration test (#901)Stefan Boberg2026-03-301-11/+9
| | | Replace doctest SUBCASEs with sequential scoped blocks so the MinIO server is spawned once and torn down via RAII at scope exit, instead of being restarted for every subcase re-entry. Fixes flaky CI on macOS caused by repeated MinIO process start/stop.
* Merge pull request #899 from ue-foundation/zs/file-intern-extern-conversionZousar Shaker2026-03-283-0/+409
|\ | | | | Zs/file intern extern conversion