| Commit message | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(#985)
Establishes a new end-to-end integration test harness for the `zen` CLI, the shared fetcher it uses to pull test artifacts, and the CI plumbing that feeds both. Also lowers the default test-harness log level and broadens the artifact fetcher's credential resolution.
### `zen-test` executable (`src/zen-test/`)
- New binary modeled on `zenserver-test`, built only in debug.
- `zen-test.{h,cpp}` harness: spawns `zen.exe` via `CreateProc` and captures combined stdout/stderr into a `ZenCommandResult` for assertion.
- Registered with `scripts/test.lua` under the short name `zen` (`xmake test --run=zen`) and enabled for `--kill-stale-processes`.
- Prints a clear console message when invoked from a release build (tests disabled), so misconfiguration is easy to spot.
- Documented in `CLAUDE.md` (test-suite naming table + test projects section) and `README.md`.
- Test cases in the `zen.artifactprovider` suite:
- `probe.lyra_cook_rpc_recording` — probe against a canonical Lyra cook RPC recording that skips with a diagnostic `MESSAGE` when no artifact source is configured.
- `probe.s3_readme` — probes the configured S3 bucket for `README.md` using a fresh temp cache to force the request through to S3; skips on macOS without static creds (no EC2 Mac runners in our fleet).
- `zen.utility-cmd` suite: new integration tests exercising `zen print`, `zen wipe`, and `zen copy`.
### `TestArtifactProvider` (`src/zenutil/testartifactprovider.{h,cpp}`)
- `Ref<TestArtifactProvider>` factory returning a local-only or S3-backed provider, selected from env vars:
- `ZEN_TEST_ARTIFACTS_PATH` — local directory to serve from (write-through cache for remote fetches).
- `ZEN_TEST_ARTIFACTS_S3` — S3 URL to fetch from.
- `AWS_DEFAULT_REGION` / `AWS_REGION`, `AWS_ENDPOINT_URL`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN` — standard AWS config.
- `Exists(path)` / `Fetch(path)` API with a `TestArtifactFetchResult` return carrying the content buffer and a diagnostic error string. Content is cached on disk across test runs.
- **IMDS credential fallback**: when no static `AWS_ACCESS_KEY_ID` is present, attaches an `ImdsCredentialProvider` so self-hosted EC2 runners with an attached IAM role can sign S3 requests without static credentials (mirrors the pattern in `zenserver/hub/hydration.cpp`).
- **IMDS opt-out**: honors the standard `AWS_EC2_METADATA_DISABLED=true` env var, and skips IMDS by default on macOS where the link-local probe would just emit noise.
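The selection rules above can be sketched as a small pure function. This is only an illustration under stated assumptions: `ProviderPlan`, `PlanProvider`, and both enums are hypothetical names and not the real `TestArtifactProvider` API, and the precedence (S3 wins over a purely local source when both variables are set) is assumed from the description of the path as a write-through cache.

```cpp
#include <map>
#include <string>

// Hypothetical types for illustration only; not the real TestArtifactProvider API.
enum class ArtifactSource { None, LocalOnly, S3 };
enum class CredentialMode { NotApplicable, Static, Imds, Unavailable };

struct ProviderPlan
{
    ArtifactSource Source       = ArtifactSource::None;
    CredentialMode Credentials  = CredentialMode::NotApplicable;
    bool           UseDiskCache = false;
};

using Env = std::map<std::string, std::string>;

// True when the variable is present and non-empty.
static bool HasValue(const Env& E, const std::string& Key)
{
    auto It = E.find(Key);
    return It != E.end() && !It->second.empty();
}

// Decides which provider to build and how requests would be signed.
ProviderPlan PlanProvider(const Env& E, bool IsMacOS)
{
    ProviderPlan Plan;
    Plan.UseDiskCache = HasValue(E, "ZEN_TEST_ARTIFACTS_PATH");

    if (HasValue(E, "ZEN_TEST_ARTIFACTS_S3"))
    {
        Plan.Source = ArtifactSource::S3;
        if (HasValue(E, "AWS_ACCESS_KEY_ID"))
        {
            Plan.Credentials = CredentialMode::Static;
        }
        else
        {
            // IMDS is skipped on explicit opt-out, and by default on macOS.
            auto It           = E.find("AWS_EC2_METADATA_DISABLED");
            bool ImdsDisabled = (It != E.end() && It->second == "true") || IsMacOS;
            Plan.Credentials  = ImdsDisabled ? CredentialMode::Unavailable : CredentialMode::Imds;
        }
    }
    else if (Plan.UseDiskCache)
    {
        Plan.Source = ArtifactSource::LocalOnly;
    }
    return Plan;
}
```

Keeping the decision separate from the actual S3 client makes the fallback order easy to unit test without network access.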
### Test harness log level (`src/zencore/testing.cpp`)
- `TestRunner::ApplyCommandLine` now defaults the global log level to `Info` (was effectively `Trace`), cutting the noise from `xmake test --run=all` now that the suite has grown. Applies uniformly to `zencore-test`, `zenhttp-test`, `zenstore-test`, `zenutil-test`, `zenserver-test`, `zen-test`, etc. `--debug` (Debug) and `--verbose` (Trace) still opt back in when chasing failures.
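A minimal sketch of the default-plus-opt-in behaviour described above. `LogLevelSketch` and `ResolveLogLevel` are hypothetical names, not the `TestRunner` API, and the rule that `--verbose` wins when both flags are passed is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical enum for illustration; the real log level type lives in zencore.
enum class LogLevelSketch { Trace, Debug, Info };

// Defaults to Info; --debug and --verbose opt back in to noisier output.
LogLevelSketch ResolveLogLevel(const std::vector<std::string>& Args)
{
    LogLevelSketch Level = LogLevelSketch::Info;
    for (const std::string& Arg : Args)
    {
        if (Arg == "--verbose")
        {
            Level = LogLevelSketch::Trace;
        }
        else if (Arg == "--debug" && Level != LogLevelSketch::Trace)
        {
            // Assumption: the most verbose flag wins regardless of order.
            Level = LogLevelSketch::Debug;
        }
    }
    return Level;
}
```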
### CI (`.github/workflows/validate.yml`)
- **Runner info step** on all three platforms (Windows/Linux/macOS): prints host, CPU topology, memory, and disk usage before the build/test step, so flakes that correlate with a particular runner or low disk space are easy to spot.
- **Artifact env wiring**: passes `ZEN_TEST_ARTIFACTS_S3` and `AWS_DEFAULT_REGION` into the debug Build & Test step on all three platforms so the probe can reach its source when the repo variable is configured. The probe skips cleanly when unset.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
### Critical (cryptographic correctness)
- AES-GCM nonce: replace homebrew `N32[0]++; N32[1]--; N32[2] = ^` scheme with NIST SP 800-38D §8.2.1 deterministic construction (64-bit big-endian counter). Session tears down on counter exhaustion instead of reusing a nonce.
- Remove `std::random_device` / `mt19937` nonce seed - the deterministic construction from the previous commit doesn't need an RNG, and `std::random_device` isn't guaranteed to be a CSPRNG.
- BCrypt return values: check every `BCRYPT_SUCCESS`, cache the `BCRYPT_KEY_HANDLE` on the context instead of re-creating it per message, destroy under null-guards. Closes the silent-downgrade-to-non-GCM path.
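For readers unfamiliar with the deterministic construction, the following is a minimal sketch of a 96-bit IV built from a fixed field followed by a 64-bit big-endian invocation counter. The class name, the 32-bit width of the fixed field, and the `std::optional` return are assumptions made for illustration; the commit only states that the counter is 64-bit big-endian and that the session tears down on exhaustion.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Hypothetical sketch of a deterministic AES-GCM IV generator.
class NonceSequence
{
public:
    explicit NonceSequence(uint32_t FixedField, uint64_t FirstCounter = 0)
        : m_Fixed(FixedField), m_Counter(FirstCounter)
    {
    }

    // Returns the next 12-byte IV, or nullopt once every counter value has been used.
    // The caller is expected to tear the session down on nullopt rather than reuse an IV.
    std::optional<std::array<uint8_t, 12>> Next()
    {
        if (m_Exhausted)
        {
            return std::nullopt;
        }
        std::array<uint8_t, 12> Iv{};
        for (int i = 0; i < 4; ++i)
        {
            Iv[i] = static_cast<uint8_t>(m_Fixed >> (24 - 8 * i));  // fixed field, big-endian
        }
        for (int i = 0; i < 8; ++i)
        {
            Iv[4 + i] = static_cast<uint8_t>(m_Counter >> (56 - 8 * i));  // counter, big-endian
        }
        if (m_Counter == UINT64_MAX)
        {
            m_Exhausted = true;  // never wrap around to zero
        }
        else
        {
            ++m_Counter;
        }
        return Iv;
    }

private:
    uint32_t m_Fixed;
    uint64_t m_Counter;
    bool     m_Exhausted = false;
};
```

Because the IV is unique by construction for a given key, no random source is needed, which is why the RNG seed can be removed.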
### High
- OpenSSL: check `EVP_CIPHER_CTX_new` / `EVP_EncryptInit_ex` / `EVP_DecryptInit_ex` return values in the constructor and set `HasErrors` on failure.
- Log AES-GCM tag-verification failures distinctly from other decrypt errors (BCrypt `STATUS_AUTH_TAG_MISMATCH` / OpenSSL `EVP_DecryptFinal_ex` post-set-tag), with a sequence counter for correlation.
- Thread a bounds-checked `ReadCursor` through every `Read*` parser helper; `ReadException` / `ReadExecuteResult` / `ReadBlobRequest` now return `bool` and callers treat malformed frames as protocol errors. Closes the `0xFF` varint OOB-read.
- Validate `ReadBlobRequest` locator as a safe filename component (reject path separators, `..`, NUL/control, drive colons, leading/trailing dot/space, length > 255). Closes the path-traversal attack on the `BundleDir / (Locator + ".blob")` join.
- Bind `AsyncAgentMessageChannel`'s timer and `AsyncReadResponse` entry onto the socket's strand; expose `AsyncComputeSocket::GetStrand()`. Removes the race between the bare-io_context timer completion and `OnFrame` on `m_PendingHandler` under the 3-thread pool.
- Drop the long-lived `m_EncryptBuffer` member - encrypt into a fresh per-write buffer shared with the completion handler. Also fixes thread-safety of the encrypt path.
- Validate server-returned `ClusterId` against `[A-Za-z0-9._-]{1,64}` before concatenating into the `api/v2/compute/<ClusterId>` URL.
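The two input validations in this list can be sketched as standalone predicates. The function names are hypothetical and the real implementation may differ in detail; rejecting an empty locator is an assumption, since the commit message only gives the upper length bound.

```cpp
#include <string_view>

// Sketch: accept a locator only if it is safe to use as a single filename component.
bool IsSafeLocator(std::string_view Name)
{
    if (Name.empty() || Name.size() > 255)
    {
        return false;  // empty is rejected by assumption; > 255 per the rule above
    }
    if (Name.front() == '.' || Name.front() == ' ' || Name.back() == '.' || Name.back() == ' ')
    {
        return false;  // leading/trailing dot or space
    }
    if (Name.find("..") != std::string_view::npos)
    {
        return false;  // parent-directory sequence
    }
    for (unsigned char C : Name)
    {
        if (C < 0x20 || C == 0x7F)
        {
            return false;  // NUL and control characters
        }
        if (C == '/' || C == '\\' || C == ':')
        {
            return false;  // path separators and drive colons
        }
    }
    return true;
}

// Sketch: match [A-Za-z0-9._-]{1,64} without pulling in <regex>.
bool IsValidClusterId(std::string_view Id)
{
    if (Id.empty() || Id.size() > 64)
    {
        return false;
    }
    for (unsigned char C : Id)
    {
        bool Ok = (C >= 'A' && C <= 'Z') || (C >= 'a' && C <= 'z') || (C >= '0' && C <= '9') ||
                  C == '.' || C == '_' || C == '-';
        if (!Ok)
        {
            return false;
        }
    }
    return true;
}
```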
### Medium
- `EVP_CIPHER_CTX_reset` + re-bind cipher on every encrypt/decrypt so stale state cannot bleed across messages. Also logs EVP failures.
- Malformed `ExecuteResult` (size != 4) now tears down the agent instead of silently reporting `ExitCode = -1`.
- Replace `assert(Eq != nullptr)` on env var parsing with a `zen::runtime_error` - assert is compiled out in release and `*(Eq+1)` was UB.
- Blob name uses `zen::Oid::NewOid()` (24 hex chars, seeded from `random_device` run-id + monotonic serial) instead of predictable `<pid>_<ms>_<counter>`. Refuse to overwrite an existing blob path.
- Cap `m_RecentlyDrainedWorkerIds` at 256 entries with a FIFO eviction queue.
- `Blob(Data, Length)` rejects `Length > INT32_MAX` instead of wrapping the int32 wire fields.
- Static `AuthToken` path uses `HttpClientAccessToken::TimePoint::max()` (never-expires sentinel) instead of synthesizing `now + 24h`.
- Remove dead `m_Transport` field and `else if (m_Transport)` branch in `AsyncHordeAgent::Cancel()`.
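The capped set with FIFO eviction mentioned above could look roughly like the sketch below. `RecentIdSet` is a hypothetical name and `std::string` keys are an assumption; the actual member is `m_RecentlyDrainedWorkerIds` and its key type is not shown in this log.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_set>

// Hypothetical bounded set: once Capacity is exceeded, the oldest entry is evicted.
class RecentIdSet
{
public:
    explicit RecentIdSet(std::size_t Capacity) : m_Capacity(Capacity) {}

    void Insert(const std::string& Id)
    {
        if (m_Capacity == 0)
        {
            return;
        }
        if (!m_Ids.insert(Id).second)
        {
            return;  // already tracked; keep its original position in the queue
        }
        m_Order.push_back(Id);
        if (m_Order.size() > m_Capacity)
        {
            m_Ids.erase(m_Order.front());  // evict the oldest entry
            m_Order.pop_front();
        }
    }

    bool        Contains(const std::string& Id) const { return m_Ids.count(Id) != 0; }
    std::size_t Size() const { return m_Ids.size(); }

private:
    std::size_t                     m_Capacity;
    std::unordered_set<std::string> m_Ids;    // fast membership test
    std::deque<std::string>         m_Order;  // insertion order for eviction
};
```

With a capacity of 256 this keeps memory bounded while still recognising recently drained workers.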
|
| |
|
|
|
|
| |
archive.ubuntu.com has 8+ second connect times from our runners (and
locally). The EC2 regional mirror should resolve within the same
network, avoiding the slow external path.
|
| |
|
|
|
|
|
| |
Add a diagnostics step before Docker builds that checks DNS resolution
and download speed to archive.ubuntu.com and dl.winehq.org to help
identify runner network issues. Also enable --progress=plain for full
build output visibility.
|
| |
|
|
|
|
|
|
| |
The Wine installation layer was being rebuilt from scratch on every CI
run due to no layer caching on ephemeral runners. Pull a stable
latest-wine/latest-linux cache tag before building and embed cache
metadata with BUILDKIT_INLINE_CACHE so subsequent builds reuse the
expensive Wine/apt layer.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Core logging and system diagnostics improvements, extracted from the compute branch.
### Logging
- **Elapsed timestamps**: Console log now shows elapsed time since launch `[HH:MM:SS.mmm]` instead of full date/time; file logging is unchanged
- **Short level names**: 3-letter short level names (`trc`/`dbg`/`inf`/`wrn`/`err`/`crt`) used by both console and file formatters via `ShortToStringView()`
- **Consistent field order**: Standardized to `[timestamp] [level] [logger]` across both console and file formatters
- **Slim LogMessage/LogPoint**: Remove redundant fields from `LogMessage` (derive level/source from `LogPoint`), flatten `LogPoint` to inline filename/line fields, shrink `LogLevel` to `int8_t` with `static_assert(sizeof(LogPoint) <= 32)`
- **Remove default member initializers** and static default `LogPoint` from `LogMessage` — all fields initialized by constructor
- **LoggerRef string constructor**: Convenience constructor accepting a string directly
- **Fix SendMessage macro collision**: Replace `thread.h` include in `logmsg.h` with a forward declaration of `GetCurrentThreadId()` to avoid pulling in `windows.h` transitively
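As an illustration of the elapsed timestamp format, a formatter for `[HH:MM:SS.mmm]` might look like the sketch below. `FormatElapsed` is a hypothetical helper, not the formatter used in the codebase, and clamping negative durations to zero is an assumption.

```cpp
#include <chrono>
#include <cstdio>
#include <string>

// Sketch: render time since launch as [HH:MM:SS.mmm].
std::string FormatElapsed(std::chrono::milliseconds Elapsed)
{
    long long Ms = Elapsed.count();
    if (Ms < 0)
    {
        Ms = 0;  // assumption: clamp rather than print a negative time
    }
    const long long Hours   = Ms / 3600000;
    const long long Minutes = (Ms / 60000) % 60;
    const long long Seconds = (Ms / 1000) % 60;
    const long long Millis  = Ms % 1000;

    char Buffer[32];
    std::snprintf(Buffer, sizeof(Buffer), "[%02lld:%02lld:%02lld.%03lld]", Hours, Minutes, Seconds, Millis);
    return Buffer;
}
```

The hours field is not wrapped at 24, so a process running for more than a day simply keeps counting up.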
### System Diagnostics
- **Cache static system metrics**: Add `RefreshDynamicSystemMetrics()` that only queries values that change at runtime (available memory, uptime, swap). `SystemMetricsTracker` snapshots full `GetSystemMetrics()` once at construction and reuses cached topology/total memory on each `Query()`, avoiding repeated `GetLogicalProcessorInformationEx` traversal on Windows, `/proc/cpuinfo` parsing on Linux, and `sysctl` topology calls on macOS
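The static/dynamic split can be sketched as follows: take one full snapshot at construction, then only refresh the fields that change at runtime. All names and fields here are hypothetical stand-ins, and the two callbacks are an assumption used to keep the sketch platform-independent; the real code calls `GetSystemMetrics()` and `RefreshDynamicSystemMetrics()`.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical metrics struct; the real one has more fields.
struct MetricsSketch
{
    uint32_t LogicalCores     = 0;  // static: topology
    uint64_t TotalMemoryBytes = 0;  // static
    uint64_t AvailMemoryBytes = 0;  // dynamic
};

class MetricsTrackerSketch
{
public:
    // QueryAll is the expensive full query and runs exactly once, here.
    MetricsTrackerSketch(const std::function<MetricsSketch()>& QueryAll,
                         std::function<void(MetricsSketch&)>   RefreshDynamic)
        : m_Cached(QueryAll()), m_RefreshDynamic(std::move(RefreshDynamic))
    {
    }

    // Updates only the dynamic fields and reuses the cached static ones.
    MetricsSketch Query()
    {
        m_RefreshDynamic(m_Cached);
        return m_Cached;
    }

private:
    MetricsSketch                       m_Cached;
    std::function<void(MetricsSketch&)> m_RefreshDynamic;
};
```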
|
| |
|
|
|
| |
- Adds a `workflow_dispatch` workflow ("Manual Test Run") that can be triggered from the Actions tab
- Configurable options: platform, memory allocator (`--malloc=stomp`/mimalloc/rpmalloc), sanitizer (asan/tsan/msan), test suite, and freeform extra arguments
- Mirrors the build & test steps from `validate.yml` but always builds debug with sentry disabled, and with longer timeout (40min) to accommodate sanitizer overhead
|
| |
|
|
|
| |
CI test runs (#909)
Adds steps to the validate workflow on all platforms that kill any zenserver, minio, nomad, or consul processes launched from the build output directory. Runs before tests to clear stale processes from previous runs, and after tests (always, even on failure) to clean up.
|
| |
|
| |
Split create_release.yml into a lightweight gate that checks for an existing git tag and a reusable workflow (create_release_impl.yml) containing all build/release jobs. When VERSION.txt is merged to a non-release branch the tag check short-circuits the entire workflow, preventing duplicate builds and failed artifact uploads.
|
| |
|
|
|
|
| |
- **Replace crashpad static-libc++ patch file with `io.replace()` in `on_install`** — The old `.patch` file was fragile (trailing-whitespace stripping on Windows would silently break it). Using `io.replace()` in the xmake build script is more robust and easier to maintain.
- **Skip sentry-native `on_test` link check on Linux** — The link test requires `-lc++abi` when building with the UE clang toolchain but adding it unconditionally breaks GCC/libstdc++ builds. The zenserver build itself validates that the library is usable.
- **Add `crashpad-test.sh`** — A test script that launches a release zenserver, waits for the health endpoint, then verifies that `crashpad_handler` is running, no `sentry_init` failure was logged, and the handler has no dynamic `libc++.so.1` dependency.
- **Add Crashpad Check step to Linux release CI** — Runs `crashpad-test.sh` in the `validate` workflow for release builds to catch crashpad regressions before merge.
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
### Compute Batch Submission
- Consolidate duplicated action submission logic in `httpcomputeservice` into a single `HandleSubmitAction` supporting both single-action and batch (actions array) payloads
- Group actions by queue in `RemoteHttpRunner` and submit as batches with configurable chunk size, falling back to individual submission on failure
- Extract shared helpers: `MakeErrorResult`, `ValidateQueueForEnqueue`, `ActivateActionInQueue`, `RemoveActionFromActiveMaps`
### Retracted Action State
- Add `Retracted` state to `RunnerAction` for retry-free rescheduling — an explicit request to pull an action back and reschedule it on a different runner without incrementing `RetryCount`
- Implement idempotent `RetractAction()` on `RunnerAction` and `ComputeServiceSession`
- Add `POST jobs/{lsn}/retract` and `queues/{queueref}/jobs/{lsn}/retract` HTTP endpoints
- Add state machine documentation and per-state comments to `RunnerAction`
### Compute Race Fixes
- Fix race in `HandleActionUpdates` where actions enqueued between session abandon and scheduler tick were never abandoned, causing `GetActionResult` to return 202 indefinitely
- Fix queue `ActiveCount` race where `NotifyQueueActionComplete` was called after releasing `m_ResultsLock`, allowing callers to observe stale counters immediately after `GetActionResult` returned OK
### Logging Optimization and ANSI improvements
- Improve `AnsiColorStdoutSink` write efficiency — single write call, dirty-flag flush, `RwLock` instead of `std::mutex`
- Move ANSI color emission from sink into formatters via `Formatter::SetColorEnabled()`; remove `ColorRangeStart`/`End` from `LogMessage`
- Extract color helpers (`AnsiColorForLevel`, `StripAnsiSgrSequences`) into `helpers.h`
- Strip upstream ANSI SGR escapes in non-color output mode. This enables colour in log messages without polluting log files with ANSI control sequences
- Move `RotatingFileSink`, `JsonFormatter`, and `FullFormatter` from header-only to pimpl with `.cpp` files
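To illustrate what stripping SGR escapes involves, here is a minimal sketch. It is not the `StripAnsiSgrSequences` helper from `helpers.h`; the name `StripSgrSketch` is hypothetical, and it assumes that only sequences of the form `ESC [ <digits and semicolons> m` should be removed, leaving other escape sequences untouched.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Sketch: remove ANSI SGR (colour/style) sequences, keep everything else verbatim.
std::string StripSgrSketch(std::string_view In)
{
    std::string Out;
    Out.reserve(In.size());
    std::size_t i = 0;
    while (i < In.size())
    {
        if (In[i] == '\x1b' && i + 1 < In.size() && In[i + 1] == '[')
        {
            // Scan the parameter bytes; an SGR sequence ends with 'm'.
            std::size_t j = i + 2;
            while (j < In.size() && ((In[j] >= '0' && In[j] <= '9') || In[j] == ';'))
            {
                ++j;
            }
            if (j < In.size() && In[j] == 'm')
            {
                i = j + 1;  // skip the whole sequence
                continue;
            }
        }
        Out.push_back(In[i]);
        ++i;
    }
    return Out;
}
```

Applying this on the non-colour output path lets callers embed colour in messages while file logs stay free of control sequences.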
### CLI / Exec Refactoring
- Extract `ExecSessionRunner` class from ~920-line `ExecUsingSession` into focused methods and an `ExecSessionConfig` struct
- Replace monolithic `ExecCommand` with subcommand-based architecture (`http`, `inproc`, `beacon`, `dump`, `buildlog`)
- Allow parent options to appear after subcommand name by parsing subcommand args permissively and forwarding unmatched tokens to the parent parser
### Testing Improvements
- Fix `--test-suite` filter being ignored due to accumulation with default wildcard filter
- Add test suite banners to test listener output
- Make `function.session.abandon_pending` test more robust
### Startup / Reliability Fixes
- Fix silent exit when a second zenserver instance detects a port conflict — use `ZEN_CONSOLE_*` for log calls that precede `InitializeLogging()`
- Fix two potential SIGSEGV paths during early startup: guard `sentry_options_new()` returning nullptr, and throw on `ZenServerState::Register()` returning nullptr instead of dereferencing
- Fail on unrecognized zenserver `--mode` instead of silently defaulting to store
### Other
- Show host details (hostname, platform, CPU count, memory) when discovering new compute workers
- Move frontend `html.zip` from source tree into build directory
- Add format specifications for Compact Binary and Compressed Buffer wire formats
- Add `WriteCompactBinaryObject` to zencore
- Extend `ConsoleTui` with additional functionality
- Add `--vscode` option to `xmake sln` for clangd / `compile_commands.json` support
- Disable compute/horde/nomad in release builds (not yet production-ready)
- Disable unintended `ASIO_HAS_IO_URING` enablement
- Fix crashpad patch missing leading whitespace
- Clean up code triggering gcc false positives
|
| |
|
| |
Pin version to last v3 version using node20, since our GHES does not support v4
|
| |
|
|
|
|
|
| |
- Add ECR login via aws CLI (using IMDS credentials)
- Tag and push images to 728559092788.dkr.ecr.us-east-1.amazonaws.com/zenserver
- Use tag suffixes (-wine, -linux) to discriminate image variants
- Replace read-file-action with shell equivalent for VERSION.txt
- Enable docker-build and artifact uploads on all branches for validation
|
| |
|
| |
Adds a Dockerfile (Ubuntu 24.04 + WineHQ) and an `xmake docker` task to build and optionally push a zenserver-compute Docker image, enabling Linux deployment of compute workers that run Windows executables via Wine.
|
| |
|
|
|
| |
* added OidcToken binary to the build process. The binary is mirrored from p4 and is placed next to the output of the build process. It is also placed in the release zip archives.
* also fixed issue with Linux symbol stripping which was introduced in toolchain changes yesterday
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change is meant to provide a smoother experience when working on Linux. After this change, toolchain setup is simply:
```bash
$ scripts/ue_build_linux/get_ue_toolchain.sh
```
At config time the toolchain is then detected automatically if you downloaded it to the default location or have the `UE_TOOLCHAIN_DIR` environment variable set:
```bash
xmake config --mode=debug
```
Compared to the old script-based approach, this configures the toolchain more precisely, avoiding leakage into unrelated build processes, such as when a package manager decides to build something like Ninja locally.
|
| |
|
|
|
|
| |
* when `--verbose` is specified to zenserver-test, all child process output (typically, zenserver instances) is piped through to stdout. you can also pass `--verbose` to `xmake test` to accomplish the same thing.
* this PR also consolidates all test runner `main` function logic (such as from zencore-test, zenhttp-test etc) into central implementation in zencore for consistency and ease of maintenance
* also added extended utf8-tests including a fix to `Utf8ToWide()`
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
**CI/CD improvements (validate.yml):**
- Add test reporter (`ue-foundation/test-reporter@v2`) for all three platforms, rendering JUnit test results directly in PR check runs
- Add "Trust workspace" step on Windows to fix git safe.directory ownership issue with self-hosted runners
- Clean stale report files before each test run to prevent false failures from leftover XML
- Broaden `paths-ignore` to skip builds for non-code changes (`*.md`, `LICENSE`, `.gitignore`, `docs/**`)
**Test improvements:**
- Convert `CHECK` to `REQUIRE` in several test suites (projectstore, integration, http) for fail-fast behavior
- Mark some tests with `doctest::skip()` for selective execution
- Skip httpclient transport tests pending investigation
- Add `--noskip` option to `xmake test` task
- Add `--repeat=<N>` option to `xmake test` task, to run tests repeatedly N times or until there is a failure
**xmake test output improvements:**
- Add totals row to test summary table
- Right-justify numeric columns in summary table
|
| |
|
|
|
|
| |
This change relocates the xmake global state to a directory beside the workspace directory so it doesn't get wiped on every run,
and we can therefore avoid rebuilding every package on every run. Unlike vcpkg, xmake separates revisions of packages into their own tree, so it's robust enough to handle different versions of different packages on different branches. It is, however, not clear to me that modifying the contents of an `xmake.lua` package definition file across branches is OK. It *may* be necessary to change the directory name for the shared state in this case, but it should be a rare event.
|
| |
|
|
|
| |
* Automated more of the decisions around which options to set when using ASAN
* Also disabled Sentry by default as it's a bit annoying to have it upload crashes during development. Sentry is still automatically enabled and integrated as part of the `xmake bundle` step however so released builds will still have it.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change removes our dependency on vcpkg for package management, in favour of bringing some code in-tree in the `thirdparty` folder as well as using the xmake built-in package management feature. For the latter, all the package definitions are maintained in the zen repo itself, in the `repo` folder.
It should now also be easier to build the project as it will no longer depend on having the right version of vcpkg installed, which has been a common problem for new people coming in to the codebase. Now you should only need xmake to build.
* Bumps xmake requirement on github runners to 2.9.9 to resolve an issue where xmake on Windows invokes cmake with `v144` toolchain which does not exist
* BLAKE3 is now in-tree at `thirdparty/blake3`
* cpr is now in-tree at `thirdparty/cpr`
* cxxopts is now in-tree at `thirdparty/cxxopts`
* fmt is now in-tree at `thirdparty/fmt`
* robin-map is now in-tree at `thirdparty/robin-map`
* ryml is now in-tree at `thirdparty/ryml`
* sol2 is now in-tree at `thirdparty/sol2`
* spdlog is now in-tree at `thirdparty/spdlog`
* utfcpp is now in-tree at `thirdparty/utfcpp`
* xmake package repo definitions are in `repo`
* implemented support for sanitizers. ASAN is supported on Windows; TSAN, UBSAN, MSAN etc. are supported on Linux/macOS, though I have not yet tested them extensively on macOS
* the zencore encryption implementation also now supports using mbedTLS which is used on MacOS, though for now we still use openssl on Linux
* crashpad
* bumps libcurl to 8.11.0 (from 8.8.0) which should address a rare build upload bug
|
| |
|
|
|
|
| |
* added cpr 1.10.5 in-tree to allow updates to vcpkg without breaking the build
* added asio 1.29.0 in-tree to remove one more vcpkg dependency
* bumped vcpkg to 2024.06.15 to address failure to build due to use of deprecated binaries in vcpkg (404 error: `https://mirror.msys2.org/mingw/mingw64/mingw-w64-x86_64-pkgconf-1~2.1.0-1-any.pkg.tar.zst` during build)
|
| |
|
| |
this changes the validate job to use a batching version of the clang-format-action which reduces turnaround from some six minutes to six seconds
|
| |
|
| |
- Improvement: Updated README.md to state the required version of vcpkg
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
|
|
|
| |
Removes the minimum version we set, since that bound is exclusive and would therefore not replicate the current release. Without it we can't guarantee that the replication picks up exactly the current release, so we replicate a few releases.
Also fixed the display name of the step when manually running the release mirroring.
|
| |
|
|
| |
* upload mac/linux executables to sentry using `debug-files bundle-sources` on all platforms
* update sentry-cli to latest on windows
|
| |
|
|
|
| |
(#191)" (#193)
This reverts commit e809931618b443809e9740edb70a62d0cab01f87.
|
| |
|
|
|
|
| |
* remove temporary workaround involving _LIBCPP_DISABLE_AVAILABILITY
* temp disable signing on Mac
this change should be revisited once we have resumed regular service wrt MacOS runners
|
| |
|
|
|
|
|
| |
(#158)
* Running the public github release mirroring as part of creating the release
This is needed because events created using the built-in GITHUB_TOKEN do not trigger workflows, so the release we create does not trigger the release replication workflow.
|
| |
|
|
|
|
|
|
|
| |
* Ignore changes to the mirror_releases script
* Only trigger release mirroring when new releases are made
* Added a minimum release number to work around issues with certain older releases
* Lowered number of releases that are replicated
|
| |
|
|
|
| |
* Use our local copy of the clone-release action
* Avoid pre-releases and draft releases
|
| | |
|
| |
|
| |
- Improvement: Bumped xmake to 2.9.1 and vcpkg version to 2024.03.25
|
| |
|
|
|
|
| |
- Improvement: Add limit to the number of times we attempt to finalize an exported oplog
- Improvement: Switch to large thread pool when executing oplog export/import
- Improvement: Clean up reporting of missing attachments in oplog export/import
- Improvement: Remove double-reporting of abort reason for oplog export/import
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
|
|
| |
we use `zen` tags to discriminate now instead
|
| |
|
| |
* Enabled signing on windows agents again
|
| |
|
|
|
| |
* make sure zenserver reacts and exits on SIGTERM signal
* add zen tag to all runners
* temp disable mac codesigning
|