Commit message [refs] (Author, Age, Files, Lines)
* Replace timer-based polling with curl socket-action integration [sb/async-httpclient] (Stefan Boberg, 13 hours, 1 file, -38/+187)
    Switch from a fixed 10ms poll loop (curl_multi_perform) to the event-driven curl_multi_socket_action API. curl now tells us which sockets to watch via CURLMOPT_SOCKETFUNCTION and when to fire timeouts via CURLMOPT_TIMERFUNCTION. Socket readiness is detected through asio::ip::tcp::socket::async_wait, eliminating the polling latency entirely.
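The two callbacks named above each receive one value that tells the application what to do next. The following is a minimal sketch of that translation step only, not the code from this commit. The constants mirror libcurl's `CURL_POLL_*` values so the block compiles without libcurl; `WatchRequest`, `TimerAction`, `TranslateSocketEvent`, and `TranslateTimeout` are hypothetical names introduced for illustration.

```cpp
// Values mirror libcurl's CURL_POLL_* constants so the sketch compiles without
// libcurl; real code would include <curl/curl.h> and use the macros directly.
constexpr int kPollIn = 1;      // CURL_POLL_IN
constexpr int kPollOut = 2;     // CURL_POLL_OUT
constexpr int kPollInOut = 3;   // CURL_POLL_INOUT
constexpr int kPollRemove = 4;  // CURL_POLL_REMOVE

// Hypothetical description of what the socket callback should arm.
struct WatchRequest
{
    bool WaitRead = false;
    bool WaitWrite = false;
    bool StopWatching = false;
};

// Translate the `what` argument passed to the CURLMOPT_SOCKETFUNCTION callback.
inline WatchRequest TranslateSocketEvent(int What)
{
    WatchRequest Request;
    switch (What)
    {
        case kPollIn: Request.WaitRead = true; break;
        case kPollOut: Request.WaitWrite = true; break;
        case kPollInOut: Request.WaitRead = Request.WaitWrite = true; break;
        case kPollRemove: Request.StopWatching = true; break;
        default: break;
    }
    return Request;
}

enum class TimerAction { Cancel, FireSoon, ArmFor };

// Translate the timeout_ms argument passed to the CURLMOPT_TIMERFUNCTION callback:
// -1 means delete the timer, 0 means act as soon as possible, otherwise arm it.
inline TimerAction TranslateTimeout(long TimeoutMs)
{
    if (TimeoutMs < 0) return TimerAction::Cancel;
    if (TimeoutMs == 0) return TimerAction::FireSoon;
    return TimerAction::ArmFor;
}
```

In an ASIO integration, `WaitRead` and `WaitWrite` would typically map to `async_wait` with `wait_read` and `wait_write`, and the completion handler would call `curl_multi_socket_action` for that socket.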
* Add async HTTP client using curl_multi + ASIO (Stefan Boberg, 13 hours, 6 files, -269/+1627)
    Introduces AsyncHttpClient backed by curl_multi_perform driven by an ASIO steady_timer. Supports callback-based and std::future-based APIs for GET/POST/PUT/DELETE/HEAD, with both owned and external io_context modes. All curl_multi state is serialized on an asio::strand, making it safe with multi-threaded io_contexts. Shared curl helpers (callbacks, URL encoding, header construction, error mapping) extracted into httpclientcurlhelpers.h to eliminate duplication between the sync and async implementations.
* disable zencompute in bundle step [HEAD, main] (Stefan Boberg, 21 hours, 1 file, -0/+3)
* 5.8.3-pre0 [v5.8.3-pre0] (Dan Engelbrecht, 35 hours, 1 file, -1/+1)
* fix hub consul health endpoint registration (#917) (Dan Engelbrecht, 35 hours, 3 files, -1/+6)
    * use correct health endpoint for zenhubserver consul registration
    * add total disk space on hub resource pane
* 5.8.2 [v5.8.2] (Dan Engelbrecht, 42 hours, 1 file, -1/+1)
* 5.8.2-pre1 [v5.8.2-pre1] (Dan Engelbrecht, 45 hours, 1 file, -1/+1)
* s3 and consul fixes (#916) (Dan Engelbrecht, 45 hours, 6 files, -7/+284)
    * fix endpoint for stats/hub in compute/hub.html page
    * fix api token call failure for imds (using wrong overload for Put)
    * add "localhost" to health check url in consul when no address is given
    * add consul fallback deregister if normal deregister fails
    * add consul registration unit test
* add provision button to hub ui (#915) (Dan Engelbrecht, 45 hours, 2 files, -0/+134)
* hub instance dashboard proxy (#914) (Dan Engelbrecht, 2 days, 24 files, -34/+714)
    - Feature: Hub dashboard proxy - instance dashboards are accessible through the hub server at `/hub/proxy/{port}/` without requiring direct port access
* 5.8.2-pre0 [v5.8.2-pre0] (Dan Engelbrecht, 3 days, 1 file, -1/+1)
* fix fork() issues on linux and MacOS (#910) (Dan Engelbrecht, 3 days, 7 files, -22/+177)
    - Improvement: Hub child process spawning on macOS now uses `posix_spawn` in line with Apple recommendations
    - Bugfix: Hub child process spawning on Linux now uses `vfork` instead of `fork`, preventing ENOMEM failures on systems with strict memory overcommit (`vm.overcommit_memory=2`)
    - Bugfix: Fixed process group management on POSIX; child processes were not placed into the correct process group, breaking group-wide signal delivery
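As an illustration of the two POSIX points above (spawning without `fork`, and placing the child into a known process group), here is a minimal sketch using `posix_spawn` with the `POSIX_SPAWN_SETPGROUP` attribute. It is not the hub's implementation. The helper name `SpawnInOwnGroupAndWait` is hypothetical, and making the child the leader of a new group is an assumption chosen for the example.

```cpp
#include <spawn.h>
#include <sys/wait.h>

#include <string>
#include <vector>

extern char** environ;

// Spawns Args[0] with posix_spawn, placing the child in a new process group
// that it leads, then waits for it. Returns the exit code, or -1 on failure.
inline int SpawnInOwnGroupAndWait(std::vector<std::string> Args)
{
    if (Args.empty()) return -1;

    // Build a null-terminated argv array that points into Args.
    std::vector<char*> Argv;
    for (std::string& Arg : Args) Argv.push_back(Arg.data());
    Argv.push_back(nullptr);

    posix_spawnattr_t Attr;
    if (posix_spawnattr_init(&Attr) != 0) return -1;
    // A pgroup of 0 means "use the child's own pid as its process group id".
    posix_spawnattr_setflags(&Attr, POSIX_SPAWN_SETPGROUP);
    posix_spawnattr_setpgroup(&Attr, 0);

    pid_t Pid = 0;
    const int SpawnResult = posix_spawn(&Pid, Argv[0], nullptr, &Attr, Argv.data(), environ);
    posix_spawnattr_destroy(&Attr);
    if (SpawnResult != 0) return -1;

    int Status = 0;
    if (waitpid(Pid, &Status, 0) != Pid) return -1;
    return WIFEXITED(Status) ? WEXITSTATUS(Status) : -1;
}
```

With the child leading its own group, a later `kill(-Pid, SIGTERM)` would reach the child and any descendants that stayed in that group.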
* consul env token refresh (#912) (Dan Engelbrecht, 3 days, 6 files, -13/+34)
    - Improvement: Consul token is now re-read from the environment variable on every request, allowing token rotation without restarting the service
* kill stale test processes (zenserver, minio, nomad, consul) before and after CI test runs (#909) (Stefan Boberg, 3 days, 2 files, -4/+92)
    Adds steps to the validate workflow on all platforms that kill any zenserver, minio, nomad, or consul processes launched from the build output directory. Runs before tests to clear stale processes from previous runs, and after tests (always, even on failure) to clean up.
* Zs/oplog export zero size attachment fix (#911) (Zousar Shaker, 3 days, 4 files, -2/+157)
    * Unit test coverage for zero byte file handling in oplogs
    * Unit test fixes for the zero length file case
    * Fixes for zero length file attachments
    * Additional fix for zero length file attachments
* 5.8.1 [v5.8.1] (Dan Engelbrecht, 4 days, 1 file, -1/+1)
* 5.8.1-pre1 [v5.8.1-pre1] (Dan Engelbrecht, 4 days, 1 file, -1/+1)
* fix potential race with stats counters missing when to Stop filtered values (#907) (Dan Engelbrecht, 4 days, 4 files, -79/+92)
    * fix potential race with stats counters missing when to Stop filtered values
    * fix off by one in PutMultipartBuildBlob retry path
    * use move operation instead of copy operation PutMultipartBlob
    * fix filter Stop() for upload operations and fix bug with generateblock count filter
* fix jupiterbuildstorage concurrency (#906) (Dan Engelbrecht, 4 days, 2 files, -15/+35)
    - Bugfix: Fixed concurrency issue in JupiterBuildStorage when updating stats
* 5.8.1-pre0 [v5.8.1-pre0] (Dan Engelbrecht, 4 days, 1 file, -1/+1)
* add lua config options for all zenhubserver command line options (#904) (Dan Engelbrecht, 4 days, 4 files, -3/+505)
    - Improvement: Hub server now supports Lua config file for all hub-specific options
      - `hub.upstreamnotification.*` - upstream notification endpoint and instance ID
      - `hub.consul.*` - service registration endpoint, token, health interval, deregister timeout
      - `hub.instance.*` - base port, HTTP class, thread count, core limit, config path
      - `hub.instance.limits.*` - instance count cap, disk and memory usage limits
      - `hub.hydration.*` - hydration target spec and config path
      - `hub.watchdog.*` - cycle timing, inactivity timeouts, and activity check timeouts
    - Improvement: Added `--hub-instance-base-port-number` as an alias for `--hub-base-port-number`, and `--upstream-notification-instance-id` as an alias for `--instance-id`
    - Improvement: Added hub mode documentation at docs/hub.md
* Request validation and resilience improvements (#864) (Stefan Boberg, 5 days, 45 files, -377/+2484)
    ### Security: Input validation & path safety
    - **Reject local file references by default** in package parsing - only allow when explicitly opted in by the service (`ParseFlags::kAllowLocalReferences`) and validated by an `ILocalRefPolicy` (fail-closed: no policy = rejected)
    - **`DataRootLocalRefPolicy`** restricts local ref paths to the server's data root via canonical path prefix matching
    - **Validate attachment hashes** in compute HTTP handlers - decompresses and re-hashes each attachment at ingestion time to reject tampered payloads
    - **Path traversal validation** for worker descriptions (`pathvalidation.h`) - rejects absolute paths, `..` components, Windows reserved device names, and invalid filename characters
    - **Harden CbPackage parsing** against corrupt inputs - overflow-safe attachment count, bounds checks on local ref offset/size, graceful failure instead of `ZEN_ASSERT` for untrusted data
    - **Harden legacy package parser** - reject zero-size binary fields, missing mappers, and optionally validate resolved attachment hashes
    - **Bounds check in `CbPackageReader::MarshalLocalChunkReference`** - detect when `MakeFromFile` silently clamps offset+size to file size

    ### Reliability: Lock consolidation & bug fixes
    - **Consolidate three action map locks into one** (`m_ActionMapLock`) - eliminates deadlock risk from multi-lock ordering, simplifies state transitions, and fixes a race where newly enqueued actions were briefly invisible to `GetActionResult`/`FindActionResult`
    - **Fix infinite loop in `BaseRunnerGroup::SubmitActions`** when actions exceed total runner capacity - cap round-robin at `TotalCapacity` and default unassigned results to "No capacity"
    - **Fix `MakeSafeAbsolutePathInPlace` for UNC paths** - `\\server\share` now correctly becomes `\\?\UNC\server\share` instead of `\\?\server\share`
    - **Fix `max_retries=0`** - previously fell through to the default of 3; now correctly means "no retries"

    ### New: ManagedProcessRunner
    - Cross-platform process runner backed by `SubprocessManager` - uses async exit callbacks instead of polling, delegates CPU/memory metrics to the manager's built-in sampler
    - `ProcessGroup` (JobObject on Windows, process group on POSIX) for bulk cancellation on shutdown
    - `--managed` flag on `zen exec inproc` to select this runner
    - Refactored monitor thread lifecycle - `StartMonitorThread()` now called from derived constructors to avoid calling virtual functions from base constructor

    ### Process management
    - **Suppress crash dialogs** via `JOB_OBJECT_UILIMIT_ERRORMODE` + `SEM_NOGPFAULTERRORBOX` in both `WindowsProcessRunner` and `JobObject::Initialize` - prevents WER/Dr. Watson modal dialogs from blocking the monitor thread
    - **CREATE_SUSPENDED → AssignProcessToJobObject → ResumeThread** pattern in `WindowsProcessRunner` - ensures job object assignment before process execution
    - **Move stdout/stderr callbacks to `Spawn()` parameters** in `SubprocessManager` - prevents race where early output could be missed before callback installation
    - Consistent PID logging across all runner types

    ### Test infrastructure
    - **`zentest-appstub`**: Added `Fail` (configurable exit code) and `Crash` (abort / nullptr deref) test functions
    - **Compute integration tests**: exit code handling, auto-retry exhaustion, manual reschedule after failure, mixed success/failure queues, crash handling (abort + nullptr), crash auto-retry, immediate query visibility after enqueue
    - **Package format tests**: truncated header, bad magic, attachment count overflow, truncated data, local ref rejection/acceptance, policy enforcement (inside/outside root, traversal, no-policy fail-closed)
    - **Legacy package parser tests**: empty input, zero-size binary, hash resolution with/without mapper, hash mismatch detection
    - **UNC path tests** for `MakeSafeAbsolutePath`

    ### Misc
    - ANSI color helper macros (`ZEN_RED`, `ZEN_BRIGHT_WHITE`, etc.) and `ZEN_BOLD`/`ZEN_DIM`/etc.
    - Generic `fmt::formatter` for types with free `ToString` functions
    - Compute dashboard: truncated hash display with monospace font and hover for full value
    - Renamed `usonpackage_forcelink` → `cbpackage_forcelink`
    - Compute enabled by default in xmake config (releases still explicitly disable)
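The path traversal rules listed under "Security" above (no absolute paths, no `..` components, no Windows reserved device names, no invalid filename characters) can be sketched as a small validator. This is an illustration under assumptions, not the contents of `pathvalidation.h`. The function names `IsReservedDeviceName` and `IsSafeRelativePath` are hypothetical, and the character set and reserved-name list follow the commonly documented Windows rules.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// True if Name (ignoring any extension) is a Windows reserved device name
// such as CON, NUL, COM1 or LPT9. Comparison is case-insensitive.
inline bool IsReservedDeviceName(std::string_view Name)
{
    std::string Stem(Name.substr(0, Name.find('.')));
    std::transform(Stem.begin(), Stem.end(), Stem.begin(),
                   [](unsigned char C) { return static_cast<char>(std::toupper(C)); });
    if (Stem == "CON" || Stem == "PRN" || Stem == "AUX" || Stem == "NUL") return true;
    if (Stem.size() == 4 && (Stem.compare(0, 3, "COM") == 0 || Stem.compare(0, 3, "LPT") == 0))
        return Stem[3] >= '1' && Stem[3] <= '9';
    return false;
}

// Accepts only relative paths whose components are ordinary file names.
inline bool IsSafeRelativePath(std::string_view Path)
{
    if (Path.empty() || Path.front() == '/' || Path.front() == '\\') return false;
    if (Path.size() >= 2 && Path[1] == ':') return false;  // drive-qualified, e.g. C:\x

    const std::string_view InvalidChars = "<>:\"|?*";
    std::size_t Start = 0;
    while (Start <= Path.size())
    {
        // Split on either separator so both POSIX and Windows spellings are covered.
        std::size_t End = Path.find_first_of("/\\", Start);
        if (End == std::string_view::npos) End = Path.size();
        const std::string_view Component = Path.substr(Start, End - Start);

        if (Component == "..") return false;
        if (IsReservedDeviceName(Component)) return false;
        for (unsigned char C : Component)
        {
            if (C < 0x20) return false;  // control characters
            if (InvalidChars.find(static_cast<char>(C)) != std::string_view::npos) return false;
        }
        Start = End + 1;
    }
    return true;
}
```

Rejecting by component rather than by substring keeps names such as `console/log.txt` valid while still catching `out/NUL.txt`.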
* include rawHash in structure output for builds ls command (#903) (Dan Engelbrecht, 5 days, 2 files, -0/+2)
* hub s3 hydrate improvements (#902) (Dan Engelbrecht, 5 days, 15 files, -117/+1890)
    - Feature: Added `--hub-hydration-target-config` option to specify the hydration target via a JSON config file (mutually exclusive with `--hub-hydration-target-spec`); supports `file` and `s3` types with structured settings
      ```json
      {
        "type": "file",
        "settings": {
          "path": "/path/to/hydration/storage"
        }
      }
      ```
      ```json
      {
        "type": "s3",
        "settings": {
          "uri": "s3://bucket[/prefix]",
          "region": "us-east-1",
          "endpoint": "http://localhost:9000",
          "path-style": true
        }
      }
      ```
    - Improvement: Hub hydration dehydration skips the `.sentry-native` directory
    - Bugfix: Fixed `MakeSafeAbsolutePathInPlace` when a UNC prefix is present but path uses mixed delimiters
* hub resource limits (#900) (Dan Engelbrecht, 5 days, 18 files, -306/+510)
    - Feature: Hub dashboard now shows a Resources tile with disk and memory usage against configured limits
    - Feature: Hub module listing now shows state-change timestamps and duration for each instance
    - Improvement: Hub provisioning rejects new instances when disk or memory usage exceeds configurable thresholds; limits are disabled by default (0 = no limit)
      - `--hub-provision-disk-limit-bytes` - Reject provisioning when used disk exceeds this many bytes
      - `--hub-provision-disk-limit-percent` - Reject provisioning when used disk exceeds this percentage of total disk
      - `--hub-provision-memory-limit-bytes` - Reject provisioning when used memory exceeds this many bytes
      - `--hub-provision-memory-limit-percent` - Reject provisioning when used memory exceeds this percentage of total RAM
    - Improvement: Hub process metrics are now tracked atomically per active instance slot, eliminating per-query process handle lookups
    - Improvement: Hub, Build Store, and Workspaces service stats sections in the dashboard are now collapsible
    - Bugfix: Hub watchdog loop did not check `m_ShutdownFlag`, causing it to spin indefinitely on shutdown
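The four limit options above combine a byte threshold and a percentage threshold per resource, with 0 meaning "no limit". A minimal sketch of such a check follows, assuming that exceeding either threshold is enough to reject. `UsageLimit`, `ExceedsLimit`, and `CanProvision` are hypothetical names, not the hub's types.

```cpp
#include <cstdint>

// Hypothetical limit pair for one resource; 0 disables the corresponding check.
struct UsageLimit
{
    std::uint64_t LimitBytes = 0;
    std::uint32_t LimitPercent = 0;
};

// True when usage exceeds either configured threshold.
inline bool ExceedsLimit(std::uint64_t UsedBytes, std::uint64_t TotalBytes, const UsageLimit& Limit)
{
    if (Limit.LimitBytes != 0 && UsedBytes > Limit.LimitBytes) return true;
    if (Limit.LimitPercent != 0 && TotalBytes != 0)
    {
        // Compare in floating point to avoid overflowing UsedBytes * 100.
        const double UsedPercent = 100.0 * static_cast<double>(UsedBytes) / static_cast<double>(TotalBytes);
        if (UsedPercent > static_cast<double>(Limit.LimitPercent)) return true;
    }
    return false;
}

// Provisioning is allowed only when neither disk nor memory is over its limits.
inline bool CanProvision(std::uint64_t UsedDisk, std::uint64_t TotalDisk, const UsageLimit& Disk,
                         std::uint64_t UsedMemory, std::uint64_t TotalMemory, const UsageLimit& Memory)
{
    return !ExceedsLimit(UsedDisk, TotalDisk, Disk) && !ExceedsLimit(UsedMemory, TotalMemory, Memory);
}
```

Because "exceeds" is a strict comparison here, usage exactly at the threshold still allows provisioning; whether the real implementation treats the boundary the same way is not stated in the changelog.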
* reuse single MinIO instance across s3client integration test (#901) (Stefan Boberg, 5 days, 1 file, -11/+9)
    Replace doctest SUBCASEs with sequential scoped blocks so the MinIO server is spawned once and torn down via RAII at scope exit, instead of being restarted for every subcase re-entry. Fixes flaky CI on macOS caused by repeated MinIO process start/stop.
* Merge pull request #899 from ue-foundation/zs/file-intern-extern-conversion (Zousar Shaker, 6 days, 3 files, -0/+409)
|\
| |   Zs/file intern extern conversion
| * Changelog (zousar, 7 days, 1 file, -0/+1)
| |
| * Clean up chunk map when externally referencing a file (zousar, 7 days, 1 file, -0/+1)
| |
| * Test cases for transitioning file reference types (zousar, 7 days, 1 file, -0/+407)
|/
* Skip release workflow when version tag already exists (#898) (Stefan Boberg, 7 days, 3 files, -277/+309)
    Split create_release.yml into a lightweight gate that checks for an existing git tag and a reusable workflow (create_release_impl.yml) containing all build/release jobs. When VERSION.txt is merged to a non-release branch the tag check short-circuits the entire workflow, preventing duplicate builds and failed artifact uploads.
* changelog 5.8.0 header (Dan Engelbrecht, 7 days, 1 file, -0/+2)
* 5.8.0 [v5.8.0] (Dan Engelbrecht, 8 days, 1 file, -1/+1)
* remove emdash from changelog (Dan Engelbrecht, 8 days, 1 file, -2/+2)
* 5.8.0-pre1 [v5.8.0-pre1] (Dan Engelbrecht, 8 days, 1 file, -1/+1)
* Misc small fixes (#897) (Stefan Boberg, 8 days, 12 files, -24/+189)
    - **Eliminate `<regex>` usage**: Replaced `std::regex`-based URL parsing in `jupiterbuildstorage.cpp` with manual `string_view` parsing. Added `CXXOPTS_NO_REGEX` to disable regex in cxxopts. Includes comprehensive tests for the new URL parser.
    - **Add missing HTTP response codes**: Added `102`, `103`, `203`, `207`, `208`, `226`, `306`, `421`, `425`, `451` to the enum and reason string lookup.
    - **Add `ForceColor` support to zen CLI**: Plumbed the `ForceColor` logging option through to the zen client.
    - **Add `.clangd` config**: Strips MSVC-specific flags clangd can't handle and suppresses noisy clang-tidy checks.
    - **Generic `fmt::formatter` for `ToString`**: Concept-based formatter that auto-formats any type with a free `ToString()` function, removing the need for per-type specializations.
    - **Fix OpenSSL dependency**: Changed `zenhorde` to use `openssl3` package on Linux/macOS.
    - **Add `<cmath>` include**: Missing include in `hyperloglog.h`.
    - **GCC compile fix**: Moved `static constinit` variable inside lambda in `logging.cpp`.
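To illustrate the first item, manual `string_view` parsing in place of `std::regex`, here is a minimal sketch of a URL splitter. The URL shape handled (`scheme://host[:port][/path]`) is an assumption; the changelog does not describe the format parsed in `jupiterbuildstorage.cpp`. `UrlParts` and `ParseUrl` are hypothetical names.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical result type; all views point into the input string.
struct UrlParts
{
    std::string_view Scheme;
    std::string_view Host;
    std::string_view Port;  // empty when absent
    std::string_view Path;  // includes the leading '/', empty when absent
};

// Splits "scheme://host[:port][/path]" without regex. Returns nullopt on
// malformed input. Userinfo and bracketed IPv6 hosts are not handled.
inline std::optional<UrlParts> ParseUrl(std::string_view Url)
{
    const std::size_t SchemeEnd = Url.find("://");
    if (SchemeEnd == std::string_view::npos || SchemeEnd == 0) return std::nullopt;

    UrlParts Parts;
    Parts.Scheme = Url.substr(0, SchemeEnd);
    const std::string_view Rest = Url.substr(SchemeEnd + 3);

    // Everything up to the first '/' is the authority; the remainder is the path.
    const std::size_t PathStart = Rest.find('/');
    std::string_view Authority = Rest.substr(0, PathStart);
    if (PathStart != std::string_view::npos) Parts.Path = Rest.substr(PathStart);

    const std::size_t Colon = Authority.rfind(':');
    if (Colon != std::string_view::npos)
    {
        Parts.Port = Authority.substr(Colon + 1);
        Authority = Authority.substr(0, Colon);
        if (Parts.Port.empty()) return std::nullopt;
        for (char C : Parts.Port)
        {
            if (C < '0' || C > '9') return std::nullopt;  // port must be numeric
        }
    }
    if (Authority.empty()) return std::nullopt;
    Parts.Host = Authority;
    return Parts;
}
```

Since the result holds views into the input, the caller has to keep the original string alive for as long as the parts are used.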
* 5.8.0-pre0 [v5.8.0-pre0] (Dan Engelbrecht, 8 days, 1 file, -1/+1)
* dashboard improvements (#896) (Dan Engelbrecht, 8 days, 20 files, -250/+646)
    - Feature: Added Workspaces dashboard page with HTTP request stats and per-workspace metrics
    - Feature: Added Build Storage dashboard page with service-specific HTTP request stats
    - Improvement: Front page now shows Hub and Object Store activity tiles; HTTP panel is fixed above the tiles grid
    - Improvement: HTTP stats tiles now include 5m/15m rates and p999/max latency across all service pages
* remove CPR HTTP client backend (#894) (Stefan Boberg, 8 days, 172 files, -20954/+13)
    CPR is no longer needed now that HttpClient has fully transitioned to raw libcurl. This removes the CPR library, its build integration, implementation files, and all conditional compilation guards, leaving curl as the sole HTTP client backend.
* idle deprovision in hub (#895) (Dan Engelbrecht, 8 days, 35 files, -353/+1104)
    - Feature: Hub watchdog automatically deprovisions inactive provisioned and hibernated instances
    - Feature: Added `stats/activity_counters` endpoint to measure server activity
    - Feature: Added configuration options for hub watchdog
      - `--hub-watchdog-provisioned-inactivity-timeout-seconds` - Inactivity timeout before a provisioned instance is deprovisioned
      - `--hub-watchdog-hibernated-inactivity-timeout-seconds` - Inactivity timeout before a hibernated instance is deprovisioned
      - `--hub-watchdog-inactivity-check-margin-seconds` - Margin before timeout at which an activity check is issued
      - `--hub-watchdog-cycle-interval-ms` - Watchdog poll interval in milliseconds
      - `--hub-watchdog-cycle-processing-budget-ms` - Maximum time budget per watchdog cycle in milliseconds
      - `--hub-watchdog-instance-check-throttle-ms` - Minimum delay between checks on a single instance
      - `--hub-watchdog-activity-check-connect-timeout-ms` - Connect timeout for activity check requests
      - `--hub-watchdog-activity-check-request-timeout-ms` - Request timeout for activity check requests
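One way to read the interplay between the inactivity timeout and the check margin is as a three-way decision per instance. The sketch below is an interpretation, not the watchdog's actual logic. It assumes that an instance idle for at least the timeout is deprovisioned, that an activity check is issued once the idle time is within the margin of the timeout, and that a non-positive timeout disables the behavior. `WatchdogAction` and `DecideWatchdogAction` are hypothetical names.

```cpp
#include <chrono>

enum class WatchdogAction { None, CheckActivity, Deprovision };

// Hypothetical decision for one instance, given how long it has appeared idle.
inline WatchdogAction DecideWatchdogAction(std::chrono::seconds IdleFor,
                                           std::chrono::seconds InactivityTimeout,
                                           std::chrono::seconds CheckMargin)
{
    // Assumption of this sketch: a non-positive timeout disables idle deprovisioning.
    if (InactivityTimeout <= std::chrono::seconds::zero()) return WatchdogAction::None;

    if (IdleFor >= InactivityTimeout) return WatchdogAction::Deprovision;

    // Within the margin before the timeout: ask the instance whether it is
    // really idle before the deadline is reached.
    if (IdleFor + CheckMargin >= InactivityTimeout) return WatchdogAction::CheckActivity;

    return WatchdogAction::None;
}
```

A caller would presumably reset the idle time whenever an activity check reports recent work, which is what keeps a busy instance from ever reaching the `Deprovision` branch.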
* update Oodle 2.9.14 -> 2.9.15 (#893) (Stefan Boberg, 8 days, 11 files, -136/+5)
    Fixes incorrect .pdata section for ASM functions on Windows, early failure on invalid dictionarySize, and removes spurious libcxx dependency on Linux/Mac. Adds Linux ARM64 and Windows ARM64 libraries.
* hub instance state refactor (#892) (Dan Engelbrecht, 8 days, 20 files, -713/+907)
    - Improvement: Provisioning a hibernated instance now automatically wakes it instead of requiring an explicit wake call first
    - Improvement: Deprovisioning now accepts instances in Crashed or Hibernated states, not just Provisioned
    - Improvement: Added `--consul-health-interval-seconds` and `--consul-deregister-after-seconds` options to control Consul health check behavior (defaults: 10s and 30s)
    - Improvement: Consul registration now occurs when provisioning starts; health check intervals are applied once provisioning completes
* hub async provision/deprovision/hibernate/wake (#891) (Dan Engelbrecht, 10 days, 7 files, -396/+1129)
    - Improvement: Hub provision, deprovision, hibernate, and wake operations are now async. HTTP requests return 202 Accepted while the operation completes in the background
    - Improvement: Hub returns 202 Accepted (instead of 409 Conflict) when the same async operation is already in progress for a module
    - Improvement: Hub returns 200 OK when a requested state transition is already satisfied
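The three response rules above can be read as a small decision function. The sketch below uses a deliberately simplified state model with hypothetical names (`HubOperation`, `InstanceState`, `StatusForRequest`). Returning 409 when a different operation is in flight is an assumption, since the changelog only covers the same-operation case.

```cpp
enum class HubOperation { None, Provision, Deprovision, Hibernate, Wake };
enum class InstanceState { Unprovisioned, Provisioned, Hibernated };

// State an operation is meant to reach (hypothetical, simplified state model).
inline InstanceState TargetState(HubOperation Op, InstanceState Current)
{
    switch (Op)
    {
        case HubOperation::Provision:
        case HubOperation::Wake: return InstanceState::Provisioned;
        case HubOperation::Deprovision: return InstanceState::Unprovisioned;
        case HubOperation::Hibernate: return InstanceState::Hibernated;
        case HubOperation::None: break;
    }
    return Current;
}

// HTTP status for a requested operation on one module.
inline int StatusForRequest(InstanceState Current, HubOperation InFlight, HubOperation Requested)
{
    if (InFlight != HubOperation::None)
    {
        // Same operation already running: 202 rather than 409.
        // A different in-flight operation yielding 409 is an assumption of this sketch.
        return InFlight == Requested ? 202 : 409;
    }
    if (TargetState(Requested, Current) == Current) return 200;  // already satisfied
    return 202;  // accepted, completes in the background
}
```

Validation of illegal transitions (for example hibernating an unprovisioned module) is left out to keep the mapping readable.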
* Subprocess Manager (#889) (Stefan Boberg, 11 days, 19 files, -537/+2893)
    Adds a `SubprocessManager` for managing child processes with ASIO-integrated async exit detection, stdout/stderr pipe capture, and periodic metrics sampling. Also introduces `ProcessGroup` for OS-backed process grouping (Windows JobObjects / POSIX process groups).

    ### SubprocessManager
    - Async process exit detection using platform-native mechanisms (Windows `object_handle`, Linux `pidfd_open`, macOS `kqueue EVFILT_PROC`) - no polling
    - Stdout/stderr capture via async pipe readers with per-process or default callbacks
    - Periodic round-robin metrics sampling (CPU, memory) across managed processes
    - Spawn, adopt, remove, kill, and enumerate managed processes

    ### ProcessGroup
    - OS-level process grouping: Windows JobObject (kill-on-close guarantee), POSIX `setpgid` (bulk signal delivery)
    - Atomic group kill via `TerminateJobObject` (Windows) or `kill(-pgid, sig)` (POSIX)
    - Per-group aggregate metrics and enumeration

    ### ProcessHandle improvements
    - Added explicit constructors from `int` (pid) and `void*` (native handle)
    - Added move constructor and move assignment operator

    ### ProcessMetricsTracker
    - Cross-platform process metrics (CPU time, working set, page faults) via `QueryProcessMetrics()`
    - ASIO timer-driven periodic sampling with configurable interval and batch size
    - Aggregate metrics across tracked processes

    ### Other changes
    - Fixed `zentest-appstub` writing a spurious `Versions` file to cwd on every invocation
* v5.7.25 hotpatch (#874) (Dan Engelbrecht, 11 days, 1 file, -1/+1)
* Linux Crashpad fix (#890) (Stefan Boberg, 11 days, 4 files, -26/+226)
    - **Replace crashpad static-libc++ patch file with `io.replace()` in `on_install`**: The old `.patch` file was fragile (trailing-whitespace stripping on Windows would silently break it). Using `io.replace()` in the xmake build script is more robust and easier to maintain.
    - **Skip sentry-native `on_test` link check on Linux**: The link test requires `-lc++abi` when building with the UE clang toolchain but adding it unconditionally breaks GCC/libstdc++ builds. The zenserver build itself validates that the library is usable.
    - **Add `crashpad-test.sh`**: A test script that launches a release zenserver, waits for the health endpoint, then verifies that `crashpad_handler` is running, no `sentry_init` failure was logged, and the handler has no dynamic `libc++.so.1` dependency.
    - **Add Crashpad Check step to Linux release CI**: Runs `crashpad-test.sh` in the `validate` workflow for release builds to catch crashpad regressions before merge.
* refactor hub notifications (#888) (Dan Engelbrecht, 11 days, 7 files, -223/+371)
    * refactor hub callbacks
    * improve http responses
* Cross-platform process metrics support (#887) (Stefan Boberg, 11 days, 10 files, -53/+728)
    - **Cross-platform `GetProcessMetrics`**: Implement Linux (`/proc/{pid}/stat`, `/proc/{pid}/statm`, `/proc/{pid}/status`) and macOS (`proc_pidinfo(PROC_PIDTASKINFO)`) support for CPU times and memory metrics. Fix Windows to populate the `MemoryBytes` field (was always 0). All platforms now set `MemoryBytes = WorkingSetSize`.
    - **`ProcessMetricsTracker`**: Experimental utility class (`zenutil`) that periodically samples resource usage for a set of tracked child processes. Supports both a dedicated background thread and an ASIO steady_timer mode. Computes delta-based CPU usage percentage across samples, with batched sampling (8 processes per tick) to limit per-cycle overhead.
    - **`ProcessHandle` documentation**: Add Doxygen comments to all public methods describing platform-specific behavior.
    - **Cleanup**: Remove unused `ZEN_RUN_TESTS` macro (inlined at its single call site in `zenserver/main.cpp`), remove dead `#if 0` thread-shutdown workaround block.
    - **Minor fixes**: Use `HttpClientAccessToken` constructor in hordeclient instead of setting private members directly. Log ASIO version at startup and include it in the server settings list.
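A delta-based CPU percentage, as mentioned for `ProcessMetricsTracker`, is usually computed from two cumulative CPU-time readings and the wall-clock time between them. The sketch below shows that arithmetic under assumptions: `CpuSample` and `CpuUsagePercent` are hypothetical names, times are in milliseconds, and the result is expressed relative to one core, so a process saturating two cores reports 200.

```cpp
#include <cstdint>

// Hypothetical sample: cumulative CPU time and the time at which it was read.
struct CpuSample
{
    std::uint64_t CpuTimeMs = 0;   // user + kernel time consumed so far
    std::uint64_t WallTimeMs = 0;  // monotonic timestamp of the sample
};

// CPU usage between two samples as a percentage of one core.
// Returns 0 when the samples are out of order or taken at the same instant.
inline double CpuUsagePercent(const CpuSample& Previous, const CpuSample& Current)
{
    if (Current.WallTimeMs <= Previous.WallTimeMs) return 0.0;
    if (Current.CpuTimeMs < Previous.CpuTimeMs) return 0.0;

    const double CpuDelta = static_cast<double>(Current.CpuTimeMs - Previous.CpuTimeMs);
    const double WallDelta = static_cast<double>(Current.WallTimeMs - Previous.WallTimeMs);
    return 100.0 * CpuDelta / WallDelta;
}
```

For example, 500 ms of CPU time consumed over a 1000 ms interval gives 50 percent.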
* Merge branch 'de/v5.7.25-hotpatch' (#880) (Dan Engelbrecht, 11 days, 4 files, -14/+44)
* add tests for s3 and file hydrators (#886) (Dan Engelbrecht, 12 days, 2 files, -4/+699)