| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
| |
- **Frontend dashboard overhaul**: Unified compute/main dashboards into a single shared UI. Added new pages for cache, projects, metrics, sessions, info (build/runtime config, system stats). Added live-update via WebSockets with pause control, sortable detail tables, themed styling. Refactored compute/hub/orchestrator pages into modular JS.
- **HTTP server fixes and stats**: Fixed http.sys local-only fallback when default port is in use, implemented root endpoint redirect for http.sys, fixed Linux/Mac port reuse. Added /stats endpoint exposing HTTP server metrics (bytes transferred, request rates). Added WebSocket stats tracking.
- **OTEL/diagnostics hardening**: Improved OTLP HTTP exporter with better error handling and resilience. Extended diagnostics services configuration.
- **Session management**: Added new sessions service with HTTP endpoints for registering, updating, querying, and removing sessions. Includes session log file support. This is still WIP.
- **CLI subcommand support**: Added support for commands with subcommands in the zen CLI tool, with improved command dispatch.
- **Misc**: Exposed CPU usage/hostname to frontend, fixed JS compact binary float32/float64 decoding, limited projects displayed on front page to 25 sorted by last access, added vscode:// link support.
Also contains some fixes from TSAN analysis.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
**Bug fixes across zenstore, zenremotestore, and related subsystems, primarily surfaced by static analysis.**
## Cache subsystem (cachedisklayer.cpp)
- Fixed tombstone scoping bug: tombstone flag and missing entry were recorded outside the block where data was removed, causing non-missing entries to be incorrectly tombstoned
- Fixed use-after-overwrite: `RemoveMemCachedData`/`RemoveMetaData` were called after `Payload` was overwritten on cache put, leaking stale data
- Fixed incorrect retry sleep formula (`100 - (3 - RetriesLeft) * 100` always produced the same or negative value; corrected to `(3 - RetriesLeft) * 100`)
- Fixed broken `break` missing from sidecar file read loop, causing reads past valid data
- Fixed missing format argument in three `ZEN_WARN`/`ZEN_ERROR` log calls (format string had `{}` placeholders with no corresponding argument, or vice versa)
- Fixed elapsed timer being accumulated inside the wrong scope in `HandleRpcGetCacheRecords`
- Fixed test asserting against unserialized `RecordPolicy` instead of the deserialized `Loaded` copy
- Initialized `AbortFlag`/`PauseFlag` atomics at declaration (UB if read before first write)
## Build store (buildstore.cpp / buildstore.h)
- Fixed wrong variable used in warning log: used loop index `ResultIndex` instead of `Index`/`MetaLocationResultIndexes[Index]`, logging wrong hash values
- Fixed `sizeof(AccessTimesHeader)` used instead of `sizeof(AccessTimeRecord)` when advancing write offset, corrupting the access times file if the sizes differ
- Initialized `m_LastAccessTimeUpdateCount` atomic member (was uninitialized)
- Changed map iteration loops to use `const auto&` to avoid unnecessary copies
## Project store (projectstore.cpp / projectstore.h)
- Fixed wrong iterator dereferenced in `IterateChunks`: used `ChunkIt->second` (from a different map lookup) instead of `MetaIt->second`
- Fixed wrong assert variable: `Sizes[Index]` should be `RawSizes[Index]`
- Fixed `MakeTombstone`/`IsTombstone` inconsistency: `MakeTombstone` was zeroing `OpLsn` but `IsTombstone` checks `OpLsn.Number != 0`; tombstone creation now preserves `OpLsn`
- Fixed uninitialized `InvalidEntries` counter
- Fixed format string mismatch in warning log
- Initialized `AbortFlag`/`PauseFlag` atomics; changed map iteration to `const auto&`
## Workspaces (workspaces.cpp)
- Fixed missing alias registration when a workspace share is updated: alias was deleted but never re-inserted
- Fixed integer overflow in range clamping: `(RequestedOffset + RequestedSize) > Size` could wrap; corrected to `RequestedSize > Size - RequestedOffset`
- Changed map iteration loops to `const auto&`
## CAS subsystem (cas.cpp, caslog.cpp, compactcas.cpp, filecas.cpp)
- Fixed `IterateChunks` passing original `Payload` buffer instead of the modified `Chunk` buffer (content type was set on the copy but the original was sent to the callback)
- Fixed invalid `std::future::get()` call on default-constructed futures
- Fixed sign-comparison in `CasLogFile::Replay` loop (`int i` vs `size_t`)
- Changed `CasLogFile::IsValid` and `Open` to take `const std::filesystem::path&` instead of by value
- Fixed format string in `~CasContainerStrategy` error log
## Remote store (zenremotestore)
- Fixed `FolderContent::operator==` always returning true: loop variable `PathCount` was initialized to 0 instead of `Paths.size()`
- Fixed `GetChunkIndexForRawHash` looking up from wrong map (`RawHashToSequenceIndex` instead of `ChunkHashToChunkIndex`)
- Fixed double-counted `UniqueSequencesFound` stat (incremented in both branches of an if/else)
- Fixed `RawSize` sentinel value truncation: `(uint32_t)-1` assigned to a `uint64_t` field; corrected to `(uint64_t)-1`
- Initialized uninitialized atomic and struct members across `buildstorageoperations.h`, `chunkblock.h`, and `remoteprojectstore.h`
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Makes all test cases part of a test suite. Test suites are named after the module and the name of the file containing the implementation of the test.
* This allows for better and more predictable filtering of which test cases to run which should also be able to reduce the time CI spends in tests since it can filter on the tests for that particular module.
Also improves `xmake test` behaviour:
* instead of an explicit list of projects just enumerate the test projects which are available based on build system state
* also introduces logic to avoid running `xmake config` unnecessarily which would invalidate the existing build and do lots of unnecessary work since dependencies were invalidated by the updated config
* also invokes build only for the chosen test targets
As a bonus, also adds `xmake sln --open` which allows opening IDE after generation of solution/xmake project is done.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
**CI/CD improvements (validate.yml):**
- Add test reporter (`ue-foundation/test-reporter@v2`) for all three platforms, rendering JUnit test results directly in PR check runs
- Add "Trust workspace" step on Windows to fix git safe.directory ownership issue with self-hosted runners
- Clean stale report files before each test run to prevent false failures from leftover XML
- Broaden `paths-ignore` to skip builds for non-code changes (`*.md`, `LICENSE`, `.gitignore`, `docs/**`)
**Test improvements:**
- Convert `CHECK` to `REQUIRE` in several test suites (projectstore, integration, http) for fail-fast behavior
- Mark some tests with `doctest::skip()` for selective execution
- Skip httpclient transport tests pending investigation
- Add `--noskip` option to `xmake test` task
- Add `--repeat=<N>` option to `xmake test` task, to run tests repeatedly N times or until there is a failure
**xmake test output improvements:**
- Add totals row to test summary table
- Right-justify numeric columns in summary table
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
zencore fixes:
- filesystem.cpp: ReadFile error reporting logic
- compactbinaryvalue.h: CbValue::As*String error reporting logic
zenhttp fixes:
- httpasio BindAcceptor would `return 0;` in a function returning `std::string` (UB)
- httpsys async workpool initialization race
zenstore fixes:
- cas.cpp: GetFileCasResults Results param passed by value instead of reference (large chunk results were silently lost)
- structuredcachestore.cpp: MissCount unconditionally incremented (counted hits as misses)
- cacherpc.cpp: Wrong boolean in Incomplete response array (all entries marked incomplete)
- cachedisklayer.cpp: sizeof(sizeof(...)) in two validation checks computed sizeof(size_t) instead of struct size
- buildstore.cpp: Wrong hash tracked in GC key list (BlobHash pushed twice instead of MetadataHash)
- buildstore.cpp: Removed duplicate m_LastAccessTimeUpdateCount increment in PutBlob
zenserver fixes:
- httpbuildstore.cpp: Reversed subtraction in HTTP range calculation (unsigned underflow)
- hubservice.cpp: Deadlock in Provision() calling Wake() while holding m_Lock (extracted WakeLocked helper)
- zipfs.cpp: Data race in GetFile() lazy initialization (added RwLock with shared/exclusive paths)
|
| |
|
|
| |
This reverts commit 3c89c486338890ce39ddebe5be4722a09e85701a.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
zenstore fixes:
- cas.cpp: GetFileCasResults Results param passed by value instead of reference (large chunk results were silently lost)
- structuredcachestore.cpp: MissCount unconditionally incremented (counted hits as misses)
- cacherpc.cpp: Wrong boolean in Incomplete response array (all entries marked incomplete)
- cachedisklayer.cpp: sizeof(sizeof(...)) in two validation checks computed sizeof(size_t) instead of struct size
- buildstore.cpp: Wrong hash tracked in GC key list (BlobHash pushed twice instead of MetadataHash)
- buildstore.cpp: Removed duplicate m_LastAccessTimeUpdateCount increment in PutBlob
zenserver fixes:
- httpbuildstore.cpp: Reversed subtraction in HTTP range calculation (unsigned underflow)
- hubservice.cpp: Deadlock in Provision() calling Wake() while holding m_Lock (extracted WakeLocked helper)
- zipfs.cpp: Data race in GetFile() lazy initialization (added RwLock with shared/exclusive paths)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
| |
|
| |
* reduce held locks while performing scrub operation
|
| |
|
| |
* don't do full cb-object validation on cache records when read from disk
|
| | |
|
| |
|
|
| |
When changing the default limit-overwrite behavior, a unit test surfaced a bug where an put of data with overwrite cache policy would not get propagated via zen's built-in upstream mechanism with a matching overwrite cache policy to the upstream. This change ensures that it does and leaves the unit test configured to exercise this scenario.
|
| |
|
|
|
| |
executed in the destructor (#683)
don't call WriteChunks in batch operation if no chunks needs to be written
|
| |
|
|
|
| |
* use fixed vectors for batch requests
* refactor cache batch value put/get to not execute code that can throw execeptions in destructor
* extend test with multi-bucket requests
|
| |
|
| |
* add checkes to protect against access violation due to failed disk read
|
| |
|
|
|
| |
- Improvement: Deeper validation of data when scrub is activated (cas/cache/project)
- Improvement: Enabled more multi threading when running scrub operations
- Improvement: Added means to force a scrub operation at startup with a new release using ZEN_DATA_FORCE_SCRUB_VERSION variable in xmake.lua
|
| |
|
|
| |
with RawSize = 0 if the offset was out of bounds for the value. (#666)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* gcc: avoid using memset on nontrivial struct
* redundant `return std::move`
* fixed various compilation issues flagged by gcc
* fix issue in xmake.lua detecting whether we are building with the UE toolchain or not
* add GCC ignore -Wundef (comment is inaccurate)
* remove redundant std::move
* don't catch exceptions by value
* unreferenced variables
* initialize "by the book" instead of memset
* remove unused exception reference
* add #include <cstring> to fix gcc build
* explicitly poulate KeyValueMap by traversing input spans fixes gcc compilation
* remove unreferenced variable
* eliminate redundant `std::move` which gcc complains about
* fix gcc compilation by including <cstring>
* tag unreferenced variable to fix gcc compilation
* fixes for various cases of naming members the same as their type
|
| |
|
| |
* try to move file into place before trying speculative remove of target file
|
| |
|
|
|
|
|
|
|
|
|
|
| |
effective concurrency in zenserver can be limited via the `--corelimit=<N>` option on the command line. Any value passed in here will be used instead of the return value from `std::thread::hardware_concurrency()` if it is lower.
* added --corelimit option to zenserver
* made sure thread pools are configured lazily and not during global init
* added log output indicating effective and HW concurrency
* added change log entry
* removed debug logging from ZenEntryPoint::Run()
also removed main thread naming on Linux since it makes the output from `top` and similar tools confusing (it shows `main` instead of `zenserver`)
|
| |
|
|
|
|
|
|
|
|
|
|
| |
* faster FileSystemTraversal test
* faster jobqueue test
* faster NamedEvent test
* faster cache tests
* faster basic http tests
* faster blockstore test
* faster cache store tests
* faster compactcas tests
* more responsive zenserver launch
* tweak worker pool sizes in tests
|
| |
|
|
| |
* don't use cacherequests utils in cache_cmd.cpp
* make zenutil/cacherequests code into test code helpers only
|
| |
|
|
| |
* move referencemetadata to zenstore
* rename zenutil/windows/service to windowsservice
|
| |
|
|
|
|
|
|
|
| |
* remove dependency to zenutil/workerpools.h from remoteprojectstore.cpp
* remove dependency to zenutil/workerpools.h from buildstoragecache.cpp
* remove unneded include
* move jupiter helpers to zenremotestore
* move parallelwork to zencore
* remove zenutil dependency from zenremotestore
* clean up test project dependencies - use indirect dependencies
|
| |
|
| |
- Improvement: Add additional validations when reading disk cache records to get references in GC
|
| |
|
| |
When requesting a value via the GetCacheChunks rpc method, if the request specified a raw hash, only return a hit if the raw hash matches what was in the cache.
|
| |
|
|
| |
From review feedback
|
| |
|
|
| |
When requesting partial records, report back when a record is incomplete via an "Incomplete" array of bools that is a sibling to the "Result" array for batch/rpc operations, or via the HttpResponseCode::PartialContent status code for individual record requests.
|
| |
|
|
|
| |
- Ensure that text responses are in a field named "Message"
- Change the record response to be named "Record" instead of "Object"
|
| |
|
|
| |
Conflicts are now treated as successes, and we optionally return a Details array instead of an ErrorMessages array. Details are returned for all requests in a batch, or no requests in a batch depending on whether there are any details to be shared about any of the put requests. The details for a conflict include the raw hash and raw size of the item. If the item is a record, we also include the record as an object.
|
| |
|
| |
- Improvement: Add a new mode to worker thread pools to avoid starvation of workers which could cause long stalls due to other work begin queued up. UE-305498
|
| |
|
| |
* add validation of compact binary payloads before reading them
|
| | |
|
| |
|
|
|
|
|
| |
- Refactor so we can have more than one cas store for project store and cache.
- Refactor `UpstreamCacheClient` so it is not tied to a specific CidStore
- Refactor scrub to keep the GC interface ScrubStorage function separate from scrub accessor functions (renamed to Scrub).
- Refactor storage size to keep GC interface StorageSize function separate from size accessor functions (renamed to TotalSize)
- Refactor cache storage so `ZenCacheDiskLayer::CacheBucket` implements GcStorage interface rather than `ZenCacheNamespace`
|
| |
|
|
| |
keep rawsize and rawhash if available when using batch for inline puts
keep rawsize and rawhash of input value if we have calculated it for validation already
|
| |\ |
|
| | | |
|
| | | |
|
| | |
| |
| |
| | |
Previously rejected puts would put the chunks, but not write them to the index, which was wrong.
|
| | | |
|
| |\| |
|
| | | |
|
| | | |
|
| |\| |
|
| | | |
|
| | |
| |
| |
| |
| | |
* exception safety when issuing ParallelWork
* add asserts to Latch usage to catch usage errors
* extended error messaging and recovery handling in ParallelWork destructor to help find issues
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* make sure to close log file when resetting log
* drop entries that refers to missing blocks
* Don't scrub keys that has been rewritten
* currectly count added bytes / m_TotalSize
* fix negative sleep time in BlockStoreFile::Open()
* be defensive when fetching log position
* append to log files *after* we updated all state successfully
* explicitly close stuff in destructors with exception catching
* clean up empty size block store files
|
| | |
| |
| |
| |
| | |
- Feature: `zen builds pause`, `zen builds resume` and `zen builds abort` commands to control a running `zen builds` command
- `--process-id` the process id to control, if omitted it tries to find a running process using the same executable as itself
- Improvement: Process report now indicates if it is pausing or aborting
|
| | |
| |
| |
| |
| | |
* Don't count a miss twice for memory stats if the entry can't be found
* changelog
|
| | |
| |
| |
| | |
- Bugfix: Flush the last block before closing the last new block written to during blockstore compact. UE-291196
- Feature: Drop unreachable CAS data during GC pass. UE-291196
|
| | |
| |
| |
| | |
* don't hold exclusive locks while deleting files from a dropped bucket/namespace
* cleaner detection of missing namespace when issuing a drop
|