aboutsummaryrefslogtreecommitdiff
path: root/src/zenstore/cache
Commit message (Collapse)AuthorAgeFilesLines
* Dashboard overhaul, compute integration (#814)Stefan Boberg13 days1-1/+3
| | | | | | | | | | - **Frontend dashboard overhaul**: Unified compute/main dashboards into a single shared UI. Added new pages for cache, projects, metrics, sessions, info (build/runtime config, system stats). Added live-update via WebSockets with pause control, sortable detail tables, themed styling. Refactored compute/hub/orchestrator pages into modular JS. - **HTTP server fixes and stats**: Fixed http.sys local-only fallback when default port is in use, implemented root endpoint redirect for http.sys, fixed Linux/Mac port reuse. Added /stats endpoint exposing HTTP server metrics (bytes transferred, request rates). Added WebSocket stats tracking. - **OTEL/diagnostics hardening**: Improved OTLP HTTP exporter with better error handling and resilience. Extended diagnostics services configuration. - **Session management**: Added new sessions service with HTTP endpoints for registering, updating, querying, and removing sessions. Includes session log file support. This is still WIP. - **CLI subcommand support**: Added support for commands with subcommands in the zen CLI tool, with improved command dispatch. - **Misc**: Exposed CPU usage/hostname to frontend, fixed JS compact binary float32/float64 decoding, limited projects displayed on front page to 25 sorted by last access, added vscode:// link support. Also contains some fixes from TSAN analysis.
* zenstore bug-fixes from static analysis pass (#815)Stefan Boberg2026-03-064-28/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | **Bug fixes across zenstore, zenremotestore, and related subsystems, primarily surfaced by static analysis.** ## Cache subsystem (cachedisklayer.cpp) - Fixed tombstone scoping bug: tombstone flag and missing entry were recorded outside the block where data was removed, causing non-missing entries to be incorrectly tombstoned - Fixed use-after-overwrite: `RemoveMemCachedData`/`RemoveMetaData` were called after `Payload` was overwritten on cache put, leaking stale data - Fixed incorrect retry sleep formula (`100 - (3 - RetriesLeft) * 100` always produced the same or negative value; corrected to `(3 - RetriesLeft) * 100`) - Fixed broken `break` missing from sidecar file read loop, causing reads past valid data - Fixed missing format argument in three `ZEN_WARN`/`ZEN_ERROR` log calls (format string had `{}` placeholders with no corresponding argument, or vice versa) - Fixed elapsed timer being accumulated inside the wrong scope in `HandleRpcGetCacheRecords` - Fixed test asserting against unserialized `RecordPolicy` instead of the deserialized `Loaded` copy - Initialized `AbortFlag`/`PauseFlag` atomics at declaration (UB if read before first write) ## Build store (buildstore.cpp / buildstore.h) - Fixed wrong variable used in warning log: used loop index `ResultIndex` instead of `Index`/`MetaLocationResultIndexes[Index]`, logging wrong hash values - Fixed `sizeof(AccessTimesHeader)` used instead of `sizeof(AccessTimeRecord)` when advancing write offset, corrupting the access times file if the sizes differ - Initialized `m_LastAccessTimeUpdateCount` atomic member (was uninitialized) - Changed map iteration loops to use `const auto&` to avoid unnecessary copies ## Project store (projectstore.cpp / projectstore.h) - Fixed wrong iterator dereferenced in `IterateChunks`: used `ChunkIt->second` (from a different map lookup) instead of `MetaIt->second` - Fixed wrong assert variable: `Sizes[Index]` should be `RawSizes[Index]` - Fixed `MakeTombstone`/`IsTombstone` inconsistency: `MakeTombstone` was zeroing `OpLsn` but `IsTombstone` checks `OpLsn.Number != 0`; tombstone creation now preserves `OpLsn` - Fixed uninitialized `InvalidEntries` counter - Fixed format string mismatch in warning log - Initialized `AbortFlag`/`PauseFlag` atomics; changed map iteration to `const auto&` ## Workspaces (workspaces.cpp) - Fixed missing alias registration when a workspace share is updated: alias was deleted but never re-inserted - Fixed integer overflow in range clamping: `(RequestedOffset + RequestedSize) > Size` could wrap; corrected to `RequestedSize > Size - RequestedOffset` - Changed map iteration loops to `const auto&` ## CAS subsystem (cas.cpp, caslog.cpp, compactcas.cpp, filecas.cpp) - Fixed `IterateChunks` passing original `Payload` buffer instead of the modified `Chunk` buffer (content type was set on the copy but the original was sent to the callback) - Fixed invalid `std::future::get()` call on default-constructed futures - Fixed sign-comparison in `CasLogFile::Replay` loop (`int i` vs `size_t`) - Changed `CasLogFile::IsValid` and `Open` to take `const std::filesystem::path&` instead of by value - Fixed format string in `~CasContainerStrategy` error log ## Remote store (zenremotestore) - Fixed `FolderContent::operator==` always returning true: loop variable `PathCount` was initialized to 0 instead of `Paths.size()` - Fixed `GetChunkIndexForRawHash` looking up from wrong map (`RawHashToSequenceIndex` instead of `ChunkHashToChunkIndex`) - Fixed double-counted `UniqueSequencesFound` stat (incremented in both branches of an if/else) - Fixed `RawSize` sentinel value truncation: `(uint32_t)-1` assigned to a `uint64_t` field; corrected to `(uint64_t)-1` - Initialized uninitialized atomic and struct members across `buildstorageoperations.h`, `chunkblock.h`, and `remoteprojectstore.h`
* Add test suites (#799)Stefan Boberg2026-03-022-0/+9
| | | | | | | | | | | | | Makes all test cases part of a test suite. Test suites are named after the module and the name of the file containing the implementation of the test. * This allows for better and more predictable filtering of which test cases to run which should also be able to reduce the time CI spends in tests since it can filter on the tests for that particular module. Also improves `xmake test` behaviour: * instead of an explicit list of projects just enumerate the test projects which are available based on build system state * also introduces logic to avoid running `xmake config` unnecessarily which would invalidate the existing build and do lots of unnecessary work since dependencies were invalidated by the updated config * also invokes build only for the chosen test targets As a bonus, also adds `xmake sln --open` which allows opening IDE after generation of solution/xmake project is done.
* test running / reporting improvements (#797)Stefan Boberg2026-02-281-1/+1
| | | | | | | | | | | | | | | | | | | **CI/CD improvements (validate.yml):** - Add test reporter (`ue-foundation/test-reporter@v2`) for all three platforms, rendering JUnit test results directly in PR check runs - Add "Trust workspace" step on Windows to fix git safe.directory ownership issue with self-hosted runners - Clean stale report files before each test run to prevent false failures from leftover XML - Broaden `paths-ignore` to skip builds for non-code changes (`*.md`, `LICENSE`, `.gitignore`, `docs/**`) **Test improvements:** - Convert `CHECK` to `REQUIRE` in several test suites (projectstore, integration, http) for fail-fast behavior - Mark some tests with `doctest::skip()` for selective execution - Skip httpclient transport tests pending investigation - Add `--noskip` option to `xmake test` task - Add `--repeat=<N>` option to `xmake test` task, to run tests repeatedly N times or until there is a failure **xmake test output improvements:** - Add totals row to test summary table - Right-justify numeric columns in summary table
* Various bug fixes (#778)Stefan Boberg2026-02-243-4/+7
| | | | | | | | | | | | | | | | | | | | | | zencore fixes: - filesystem.cpp: ReadFile error reporting logic - compactbinaryvalue.h: CbValue::As*String error reporting logic zenhttp fixes: - httpasio BindAcceptor would `return 0;` in a function returning `std::string` (UB) - httpsys async workpool initialization race zenstore fixes: - cas.cpp: GetFileCasResults Results param passed by value instead of reference (large chunk results were silently lost) - structuredcachestore.cpp: MissCount unconditionally incremented (counted hits as misses) - cacherpc.cpp: Wrong boolean in Incomplete response array (all entries marked incomplete) - cachedisklayer.cpp: sizeof(sizeof(...)) in two validation checks computed sizeof(size_t) instead of struct size - buildstore.cpp: Wrong hash tracked in GC key list (BlobHash pushed twice instead of MetadataHash) - buildstore.cpp: Removed duplicate m_LastAccessTimeUpdateCount increment in PutBlob zenserver fixes: - httpbuildstore.cpp: Reversed subtraction in HTTP range calculation (unsigned underflow) - hubservice.cpp: Deadlock in Provision() calling Wake() while holding m_Lock (extracted WakeLocked helper) - zipfs.cpp: Data race in GetFile() lazy initialization (added RwLock with shared/exclusive paths)
* Revert "Fix correctness and concurrency bugs found during code review"Stefan Boberg2026-02-243-7/+4
| | | | This reverts commit 3c89c486338890ce39ddebe5be4722a09e85701a.
* Fix correctness and concurrency bugs found during code reviewStefan Boberg2026-02-243-4/+7
| | | | | | | | | | | | | | | | | zenstore fixes: - cas.cpp: GetFileCasResults Results param passed by value instead of reference (large chunk results were silently lost) - structuredcachestore.cpp: MissCount unconditionally incremented (counted hits as misses) - cacherpc.cpp: Wrong boolean in Incomplete response array (all entries marked incomplete) - cachedisklayer.cpp: sizeof(sizeof(...)) in two validation checks computed sizeof(size_t) instead of struct size - buildstore.cpp: Wrong hash tracked in GC key list (BlobHash pushed twice instead of MetadataHash) - buildstore.cpp: Removed duplicate m_LastAccessTimeUpdateCount increment in PutBlob zenserver fixes: - httpbuildstore.cpp: Reversed subtraction in HTTP range calculation (unsigned underflow) - hubservice.cpp: Deadlock in Provision() calling Wake() while holding m_Lock (extracted WakeLocked helper) - zipfs.cpp: Data race in GetFile() lazy initialization (added RwLock with shared/exclusive paths) Co-Authored-By: Claude Opus 4.6 <[email protected]>
* reduce blocking in scrub (#743)Dan Engelbrecht2026-02-031-56/+86
| | | * reduce held locks while performing scrub operation
* don't do full cb-object validation on cache records when read from disk (#739)Dan Engelbrecht2026-01-291-10/+9
| | | * don't do full cb-object validation on cache records when read from disk
* Fix unit test that relies on being able to overwritezousar2025-12-191-2/+2
|
* Ensure upstream put propagation includes overwritezousar2025-12-191-5/+12
| | | | When changing the default limit-overwrite behavior, a unit test surfaced a bug where an put of data with overwrite cache policy would not get propagated via zen's built-in upstream mechanism with a matching overwrite cache policy to the upstream. This change ensures that it does and leaves the unit test configured to exercise this scenario.
* remove catching of exceptions in batch operations now that they are not ↵Dan Engelbrecht2025-12-102-101/+61
| | | | | executed in the destructor (#683) don't call WriteChunks in batch operation if no chunks needs to be written
* batch op not in destructor (#676)Dan Engelbrecht2025-12-043-328/+402
| | | | | * use fixed vectors for batch requests * refactor cache batch value put/get to not execute code that can throw execeptions in destructor * extend test with multi-bucket requests
* add checks to protect against access violation due to failed disk read (#675)Dan Engelbrecht2025-12-041-0/+12
| | | * add checkes to protect against access violation due to failed disk read
* automatic scrub on startup (#667)Dan Engelbrecht2025-11-273-64/+69
| | | | | - Improvement: Deeper validation of data when scrub is activated (cas/cache/project) - Improvement: Enabled more multi threading when running scrub operations - Improvement: Added means to force a scrub operation at startup with a new release using ZEN_DATA_FORCE_SCRUB_VERSION variable in xmake.lua
* RawOffset can be anything and we expect an empty buffer to be returned along ↵Dan Engelbrecht2025-11-261-17/+23
| | | | with RawSize = 0 if the offset was out of bounds for the value. (#666)
* Various fixes to address issues flagged by gcc / non-UE toolchain build (#621)Stefan Boberg2025-11-011-0/+1
| | | | | | | | | | | | | | | | | | | | * gcc: avoid using memset on nontrivial struct * redundant `return std::move` * fixed various compilation issues flagged by gcc * fix issue in xmake.lua detecting whether we are building with the UE toolchain or not * add GCC ignore -Wundef (comment is inaccurate) * remove redundant std::move * don't catch exceptions by value * unreferenced variables * initialize "by the book" instead of memset * remove unused exception reference * add #include <cstring> to fix gcc build * explicitly poulate KeyValueMap by traversing input spans fixes gcc compilation * remove unreferenced variable * eliminate redundant `std::move` which gcc complains about * fix gcc compilation by including <cstring> * tag unreferenced variable to fix gcc compilation * fixes for various cases of naming members the same as their type
* optimize filecas write file (#613)Dan Engelbrecht2025-10-241-16/+10
| | | * try to move file into place before trying speculative remove of target file
* add ability to limit concurrency (#565)Stefan Boberg2025-10-101-2/+2
| | | | | | | | | | | | effective concurrency in zenserver can be limited via the `--corelimit=<N>` option on the command line. Any value passed in here will be used instead of the return value from `std::thread::hardware_concurrency()` if it is lower. * added --corelimit option to zenserver * made sure thread pools are configured lazily and not during global init * added log output indicating effective and HW concurrency * added change log entry * removed debug logging from ZenEntryPoint::Run() also removed main thread naming on Linux since it makes the output from `top` and similar tools confusing (it shows `main` instead of `zenserver`)
* speed up tests (#555)Dan Engelbrecht2025-10-061-32/+56
| | | | | | | | | | | | * faster FileSystemTraversal test * faster jobqueue test * faster NamedEvent test * faster cache tests * faster basic http tests * faster blockstore test * faster cache store tests * faster compactcas tests * more responsive zenserver launch * tweak worker pool sizes in tests
* cacherequests helpers test only (#551)Dan Engelbrecht2025-10-035-5/+673
| | | | * don't use cacherequests utils in cache_cmd.cpp * make zenutil/cacherequests code into test code helpers only
* zenutil cleanup (#550)Dan Engelbrecht2025-10-031-1/+2
| | | | * move referencemetadata to zenstore * rename zenutil/windows/service to windowsservice
* remove zenutil dependency in zenremotestore (#547)Dan Engelbrecht2025-10-031-1/+1
| | | | | | | | | * remove dependency to zenutil/workerpools.h from remoteprojectstore.cpp * remove dependency to zenutil/workerpools.h from buildstoragecache.cpp * remove unneded include * move jupiter helpers to zenremotestore * move parallelwork to zencore * remove zenutil dependency from zenremotestore * clean up test project dependencies - use indirect dependencies
* more cbobject validations (#527)Dan Engelbrecht2025-09-291-5/+19
| | | - Improvement: Add additional validations when reading disk cache records to get references in GC
* GetCacheChunk value request respects RawHash (#518)Zousar Shaker2025-09-291-1/+3
| | | When requesting a value via the GetCacheChunks rpc method, if the request specified a raw hash, only return a hit if the raw hash matches what was in the cache.
* Improvement to Incomplete Result Iterationzousar2025-09-251-2/+1
| | | | From review feedback
* Report Incomplete Records To Clientzousar2025-09-241-5/+39
| | | | When requesting partial records, report back when a record is incomplete via an "Incomplete" array of bools that is a sibling to the "Result" array for batch/rpc operations, or via the HttpResponseCode::PartialContent status code for individual record requests.
* Adjust the responses from PUT commandszousar2025-09-233-14/+9
| | | | | - Ensure that text responses are in a field named "Message" - Change the record response to be named "Record" instead of "Object"
* Change batch put responses for client reportingzousar2025-09-193-33/+75
| | | | Conflicts are now treated as successes, and we optionally return a Details array instead of an ErrorMessages array. Details are returned for all requests in a batch, or no requests in a batch depending on whether there are any details to be shared about any of the put requests. The details for a conflict include the raw hash and raw size of the item. If the item is a record, we also include the record as an object.
* add EMode to WorkerTheadPool to avoid thread starvation (#492)Dan Engelbrecht2025-09-102-48/+63
| | | - Improvement: Add a new mode to worker thread pools to avoid starvation of workers which could cause long stalls due to other work begin queued up. UE-305498
* add validation of compact binary payloads before reading them (#483)Dan Engelbrecht2025-09-041-5/+3
| | | * add validation of compact binary payloads before reading them
* revert multi-cid store (#475)Dan Engelbrecht2025-08-261-37/+17
|
* per namespace/project cas prep refactor (#470)Dan Engelbrecht2025-08-203-68/+82
| | | | | | | - Refactor so we can have more than one cas store for project store and cache. - Refactor `UpstreamCacheClient` so it is not tied to a specific CidStore - Refactor scrub to keep the GC interface ScrubStorage function separate from scrub accessor functions (renamed to Scrub). - Refactor storage size to keep GC interface StorageSize function separate from size accessor functions (renamed to TotalSize) - Refactor cache storage so `ZenCacheDiskLayer::CacheBucket` implements GcStorage interface rather than `ZenCacheNamespace`
* reduce lock contention when checking for disk cache put reject (#465)Dan Engelbrecht2025-08-121-91/+81
| | | | keep rawsize and rawhash if available when using batch for inline puts keep rawsize and rawhash of input value if we have calculated it for validation already
* Merge branch 'main' into zs/put-overwrite-policyZousar Shaker2025-08-081-2/+2
|\
| * add the correct set of references hashes in batched inline mode (#459)Dan Engelbrecht2025-08-061-3/+3
| |
* | precommitzousar2025-08-071-9/+4
| |
* | Avoid committing chunks for batch rejected putszousar2025-08-071-42/+35
| | | | | | | | Previously rejected puts would put the chunks, but not write them to the index, which was wrong.
* | Moving put rejections to happen in batch handlingzousar2025-08-051-33/+87
| |
* | Merge branch 'main' into zs/put-overwrite-policyzousar2025-08-051-11/+40
|\|
| * add hardening for legacy cache bucket manifests (#454)Dan Engelbrecht2025-08-041-11/+40
| |
* | xmake precommitzousar2025-06-241-4/+4
| |
* | Merge branch 'main' into zs/put-overwrite-policyzousar2025-06-243-308/+530
|\|
| * make sure we unregister from GC before we drop bucket/namespaces (#443)Dan Engelbrecht2025-06-192-1/+4
| |
| * graceful wait in parallelwork destructor (#438)Dan Engelbrecht2025-06-161-35/+45
| | | | | | | | | | * exception safety when issuing ParallelWork * add asserts to Latch usage to catch usage errors * extended error messaging and recovery handling in ParallelWork destructor to help find issues
| * missing chunks bugfix (#424)Dan Engelbrecht2025-06-091-14/+70
| | | | | | | | | | | | | | | | | | | | | | * make sure to close log file when resetting log * drop entries that refers to missing blocks * Don't scrub keys that has been rewritten * currectly count added bytes / m_TotalSize * fix negative sleep time in BlockStoreFile::Open() * be defensive when fetching log position * append to log files *after* we updated all state successfully * explicitly close stuff in destructors with exception catching * clean up empty size block store files
| * pause, resume and abort running builds cmd (#421)Dan Engelbrecht2025-06-051-4/+8
| | | | | | | | | | - Feature: `zen builds pause`, `zen builds resume` and `zen builds abort` commands to control a running `zen builds` command - `--process-id` the process id to control, if omitted it tries to find a running process using the same executable as itself - Improvement: Process report now indicates if it is pausing or aborting
| * fix cachbucket mem hit count (#415)Dan Engelbrecht2025-06-021-4/+7
| | | | | | | | | | * Don't count a miss twice for memory stats if the entry can't be found * changelog
| * add missing flush inblockstore compact (#411)Dan Engelbrecht2025-05-301-2/+22
| | | | | | | | - Bugfix: Flush the last block before closing the last new block written to during blockstore compact. UE-291196 - Feature: Drop unreachable CAS data during GC pass. UE-291196
| * unblock cache bucket drop (#406)Dan Engelbrecht2025-05-262-40/+110
| | | | | | | | * don't hold exclusive locks while deleting files from a dropped bucket/namespace * cleaner detection of missing namespace when issuing a drop