| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
| |
- **Frontend dashboard overhaul**: Unified compute/main dashboards into a single shared UI. Added new pages for cache, projects, metrics, sessions, info (build/runtime config, system stats). Added live-update via WebSockets with pause control, sortable detail tables, themed styling. Refactored compute/hub/orchestrator pages into modular JS.
- **HTTP server fixes and stats**: Fixed http.sys local-only fallback when default port is in use, implemented root endpoint redirect for http.sys, fixed Linux/Mac port reuse. Added /stats endpoint exposing HTTP server metrics (bytes transferred, request rates). Added WebSocket stats tracking.
- **OTEL/diagnostics hardening**: Improved OTLP HTTP exporter with better error handling and resilience. Extended diagnostics services configuration.
- **Session management**: Added new sessions service with HTTP endpoints for registering, updating, querying, and removing sessions. Includes session log file support. This is still WIP.
- **CLI subcommand support**: Added support for commands with subcommands in the zen CLI tool, with improved command dispatch.
- **Misc**: Exposed CPU usage/hostname to frontend, fixed JS compact binary float32/float64 decoding, limited projects displayed on front page to 25 sorted by last access, added vscode:// link support.
Also contains some fixes from TSAN analysis.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
**Bug fixes across zenstore, zenremotestore, and related subsystems, primarily surfaced by static analysis.**
## Cache subsystem (cachedisklayer.cpp)
- Fixed tombstone scoping bug: tombstone flag and missing entry were recorded outside the block where data was removed, causing non-missing entries to be incorrectly tombstoned
- Fixed use-after-overwrite: `RemoveMemCachedData`/`RemoveMetaData` were called after `Payload` was overwritten on cache put, leaking stale data
- Fixed incorrect retry sleep formula (`100 - (3 - RetriesLeft) * 100` always produced the same or negative value; corrected to `(3 - RetriesLeft) * 100`)
- Fixed broken `break` missing from sidecar file read loop, causing reads past valid data
- Fixed missing format argument in three `ZEN_WARN`/`ZEN_ERROR` log calls (format string had `{}` placeholders with no corresponding argument, or vice versa)
- Fixed elapsed timer being accumulated inside the wrong scope in `HandleRpcGetCacheRecords`
- Fixed test asserting against unserialized `RecordPolicy` instead of the deserialized `Loaded` copy
- Initialized `AbortFlag`/`PauseFlag` atomics at declaration (UB if read before first write)
## Build store (buildstore.cpp / buildstore.h)
- Fixed wrong variable used in warning log: used loop index `ResultIndex` instead of `Index`/`MetaLocationResultIndexes[Index]`, logging wrong hash values
- Fixed `sizeof(AccessTimesHeader)` used instead of `sizeof(AccessTimeRecord)` when advancing write offset, corrupting the access times file if the sizes differ
- Initialized `m_LastAccessTimeUpdateCount` atomic member (was uninitialized)
- Changed map iteration loops to use `const auto&` to avoid unnecessary copies
## Project store (projectstore.cpp / projectstore.h)
- Fixed wrong iterator dereferenced in `IterateChunks`: used `ChunkIt->second` (from a different map lookup) instead of `MetaIt->second`
- Fixed wrong assert variable: `Sizes[Index]` should be `RawSizes[Index]`
- Fixed `MakeTombstone`/`IsTombstone` inconsistency: `MakeTombstone` was zeroing `OpLsn` but `IsTombstone` checks `OpLsn.Number != 0`; tombstone creation now preserves `OpLsn`
- Fixed uninitialized `InvalidEntries` counter
- Fixed format string mismatch in warning log
- Initialized `AbortFlag`/`PauseFlag` atomics; changed map iteration to `const auto&`
## Workspaces (workspaces.cpp)
- Fixed missing alias registration when a workspace share is updated: alias was deleted but never re-inserted
- Fixed integer overflow in range clamping: `(RequestedOffset + RequestedSize) > Size` could wrap; corrected to `RequestedSize > Size - RequestedOffset`
- Changed map iteration loops to `const auto&`
## CAS subsystem (cas.cpp, caslog.cpp, compactcas.cpp, filecas.cpp)
- Fixed `IterateChunks` passing original `Payload` buffer instead of the modified `Chunk` buffer (content type was set on the copy but the original was sent to the callback)
- Fixed invalid `std::future::get()` call on default-constructed futures
- Fixed sign-comparison in `CasLogFile::Replay` loop (`int i` vs `size_t`)
- Changed `CasLogFile::IsValid` and `Open` to take `const std::filesystem::path&` instead of by value
- Fixed format string in `~CasContainerStrategy` error log
## Remote store (zenremotestore)
- Fixed `FolderContent::operator==` always returning true: loop variable `PathCount` was initialized to 0 instead of `Paths.size()`
- Fixed `GetChunkIndexForRawHash` looking up from wrong map (`RawHashToSequenceIndex` instead of `ChunkHashToChunkIndex`)
- Fixed double-counted `UniqueSequencesFound` stat (incremented in both branches of an if/else)
- Fixed `RawSize` sentinel value truncation: `(uint32_t)-1` assigned to a `uint64_t` field; corrected to `(uint64_t)-1`
- Initialized uninitialized atomic and struct members across `buildstorageoperations.h`, `chunkblock.h`, and `remoteprojectstore.h`
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
zencore fixes:
- filesystem.cpp: ReadFile error reporting logic
- compactbinaryvalue.h: CbValue::As*String error reporting logic
zenhttp fixes:
- httpasio BindAcceptor would `return 0;` in a function returning `std::string` (UB)
- httpsys async workpool initialization race
zenstore fixes:
- cas.cpp: GetFileCasResults Results param passed by value instead of reference (large chunk results were silently lost)
- structuredcachestore.cpp: MissCount unconditionally incremented (counted hits as misses)
- cacherpc.cpp: Wrong boolean in Incomplete response array (all entries marked incomplete)
- cachedisklayer.cpp: sizeof(sizeof(...)) in two validation checks computed sizeof(size_t) instead of struct size
- buildstore.cpp: Wrong hash tracked in GC key list (BlobHash pushed twice instead of MetadataHash)
- buildstore.cpp: Removed duplicate m_LastAccessTimeUpdateCount increment in PutBlob
zenserver fixes:
- httpbuildstore.cpp: Reversed subtraction in HTTP range calculation (unsigned underflow)
- hubservice.cpp: Deadlock in Provision() calling Wake() while holding m_Lock (extracted WakeLocked helper)
- zipfs.cpp: Data race in GetFile() lazy initialization (added RwLock with shared/exclusive paths)
|
| |
|
|
| |
This reverts commit 3c89c486338890ce39ddebe5be4722a09e85701a.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
zenstore fixes:
- cas.cpp: GetFileCasResults Results param passed by value instead of reference (large chunk results were silently lost)
- structuredcachestore.cpp: MissCount unconditionally incremented (counted hits as misses)
- cacherpc.cpp: Wrong boolean in Incomplete response array (all entries marked incomplete)
- cachedisklayer.cpp: sizeof(sizeof(...)) in two validation checks computed sizeof(size_t) instead of struct size
- buildstore.cpp: Wrong hash tracked in GC key list (BlobHash pushed twice instead of MetadataHash)
- buildstore.cpp: Removed duplicate m_LastAccessTimeUpdateCount increment in PutBlob
zenserver fixes:
- httpbuildstore.cpp: Reversed subtraction in HTTP range calculation (unsigned underflow)
- hubservice.cpp: Deadlock in Provision() calling Wake() while holding m_Lock (extracted WakeLocked helper)
- zipfs.cpp: Data race in GetFile() lazy initialization (added RwLock with shared/exclusive paths)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
| |
|
| |
* reduce held locks while performing scrub operation
|
| |
|
|
|
| |
executed in the destructor (#683)
don't call WriteChunks in batch operation if no chunks needs to be written
|
| |
|
|
|
| |
* use fixed vectors for batch requests
* refactor cache batch value put/get to not execute code that can throw execeptions in destructor
* extend test with multi-bucket requests
|
| |
|
| |
* add checkes to protect against access violation due to failed disk read
|
| |
|
|
|
| |
- Improvement: Deeper validation of data when scrub is activated (cas/cache/project)
- Improvement: Enabled more multi threading when running scrub operations
- Improvement: Added means to force a scrub operation at startup with a new release using ZEN_DATA_FORCE_SCRUB_VERSION variable in xmake.lua
|
| |
|
| |
* try to move file into place before trying speculative remove of target file
|
| |
|
|
| |
* move referencemetadata to zenstore
* rename zenutil/windows/service to windowsservice
|
| |
|
|
|
|
|
|
|
| |
* remove dependency to zenutil/workerpools.h from remoteprojectstore.cpp
* remove dependency to zenutil/workerpools.h from buildstoragecache.cpp
* remove unneded include
* move jupiter helpers to zenremotestore
* move parallelwork to zencore
* remove zenutil dependency from zenremotestore
* clean up test project dependencies - use indirect dependencies
|
| |
|
| |
- Improvement: Add additional validations when reading disk cache records to get references in GC
|
| |
|
|
|
| |
- Ensure that text responses are in a field named "Message"
- Change the record response to be named "Record" instead of "Object"
|
| |
|
|
| |
Conflicts are now treated as successes, and we optionally return a Details array instead of an ErrorMessages array. Details are returned for all requests in a batch, or no requests in a batch depending on whether there are any details to be shared about any of the put requests. The details for a conflict include the raw hash and raw size of the item. If the item is a record, we also include the record as an object.
|
| |
|
| |
- Improvement: Add a new mode to worker thread pools to avoid starvation of workers which could cause long stalls due to other work begin queued up. UE-305498
|
| |
|
|
|
|
|
| |
- Refactor so we can have more than one cas store for project store and cache.
- Refactor `UpstreamCacheClient` so it is not tied to a specific CidStore
- Refactor scrub to keep the GC interface ScrubStorage function separate from scrub accessor functions (renamed to Scrub).
- Refactor storage size to keep GC interface StorageSize function separate from size accessor functions (renamed to TotalSize)
- Refactor cache storage so `ZenCacheDiskLayer::CacheBucket` implements GcStorage interface rather than `ZenCacheNamespace`
|
| |
|
|
| |
keep rawsize and rawhash if available when using batch for inline puts
keep rawsize and rawhash of input value if we have calculated it for validation already
|
| |\ |
|
| | | |
|
| | | |
|
| | |
| |
| |
| | |
Previously rejected puts would put the chunks, but not write them to the index, which was wrong.
|
| | | |
|
| |\| |
|
| | | |
|
| |\| |
|
| | | |
|
| | |
| |
| |
| |
| | |
* exception safety when issuing ParallelWork
* add asserts to Latch usage to catch usage errors
* extended error messaging and recovery handling in ParallelWork destructor to help find issues
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* make sure to close log file when resetting log
* drop entries that refers to missing blocks
* Don't scrub keys that has been rewritten
* currectly count added bytes / m_TotalSize
* fix negative sleep time in BlockStoreFile::Open()
* be defensive when fetching log position
* append to log files *after* we updated all state successfully
* explicitly close stuff in destructors with exception catching
* clean up empty size block store files
|
| | |
| |
| |
| |
| | |
- Feature: `zen builds pause`, `zen builds resume` and `zen builds abort` commands to control a running `zen builds` command
- `--process-id` the process id to control, if omitted it tries to find a running process using the same executable as itself
- Improvement: Process report now indicates if it is pausing or aborting
|
| | |
| |
| |
| |
| | |
* Don't count a miss twice for memory stats if the entry can't be found
* changelog
|
| | |
| |
| |
| | |
- Bugfix: Flush the last block before closing the last new block written to during blockstore compact. UE-291196
- Feature: Drop unreachable CAS data during GC pass. UE-291196
|
| | |
| |
| |
| | |
* don't hold exclusive locks while deleting files from a dropped bucket/namespace
* cleaner detection of missing namespace when issuing a drop
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* use ParallelWork in rpc playback
* use ParallelWork in projectstore
* use ParallelWork in buildstore
* use ParallelWork in cachedisklayer
* use ParallelWork in compactcas
* use ParallelWork in filecas
* don't set abort flag in ParallelWork destructor
* add PrepareFileForScatteredWrite for temp files in httpclient
* Use PrepareFileForScatteredWrite when stream-decompressing files
* be more relaxed when deleting temp files
* allow explicit zen-cache when using direct host url without resolving
* fix lambda capture when writing loose chunks
* no delay when attempting to remove temp files
|
| | |
| |
| |
| |
| | |
- Improvement: Cleaned up snapshot writing for CompactCAS/FileCas/Cache/Project stores
- Improvement: Safer recovery when failing to delete log for CompactCAS/FileCas/Cache/Project stores
- Improvement: Added log file reset when writing snapshot at startup for FileCas
|
| | |
| |
| |
| | |
Feature: Add per bucket cache configuration (Lua options file only)
Improvement: --cache-memlayer-sizethreshold is now deprecated and has a new name: --cache-bucket-memlayer-sizethreshold to line up with per cache bucket configuration
|
| | |
| |
| | |
* tweak block iteration chunk sizes
|
| | |
| |
| | |
* optimize cache bucket snapshot and sidecar writing
|
| | | |
|
| | | |
|
| | |
| |
| | |
* do cache bucket flush/write snapshot as part of compact to reduce disk I/O
|
| | |
| |
| | |
- Bugfix: Long file paths now works correctly on Windows
|
| | |
| |
| |
| |
| |
| | |
* Added EASTL to help with eliminating memory allocations
* Applied EASTL to eliminate memory allocations, primarily by using `fixed_vector` et al to use stack allocations / inline struct allocations
Reduces memory events in traces by close to a factor of 10 in test scenario (starting editor for project F)
|
| | |
| |
| |
| | |
Result structure contains status and a string message (may be empty)
|
| | | |
|
| | |
| |
| |
| | |
Value comparison methods moved to more appropriate area in file.
|
| |/
|
|
| |
Overwrite with differing value should be denied if QueryLocal is not present and StoreLocal is present. Overwrite with equal value should succeed regardless of policy flags.
|
| |
|
|
|
|
| |
add DirectoryContent::IncludeFileSizes
add DirectoryContent::IncludeAttributes
add multithreaded GetDirectoryContent
use multithreaded GetDirectoryContent in workspace folder scanning
|
| |
|
|
|
|
|
| |
With this change, LLM tags are assigned using the name,parent tuple rather than just by name only. This allows tag hierarchies like `cache/store` and `project/store` which would previously get collapsed into the first pair seen when registering the `store` tag.
This PR also adds some more LLM tag annotations to more accurately associate memory allocations with subsystems
In addition, this PR also tweaks the frequency of timer marker events to increase the resolution in Insights and avoid some cases of Insights deciding that marker events are too far apart since we don't allocate as frequently as UE tends to.
|