path: root/src/zenstore/cache/cachedisklayer.cpp
Columns: commit message, author, age, files changed, lines changed (-/+)
* zen trace analysis support (#945)Stefan Boberg2026-04-201-4/+16
| Integrates the **tourist** trace analysis library and builds a full `zen trace` command suite for working with Unreal Engine `.utrace` files.
|
| ### Trace analysis library (`thirdparty/tourist/`)
| - Adds the tourist library as a third-party dependency with three modules: **foundation** (platform primitives, memory, scheduling), **trace** (UE Trace protocol decoding), and **analysis** (event dispatching and analyzer framework).
| - Cross-platform support for Windows, Linux, and macOS.
|
| ### `zen trace` CLI commands (`src/zen/cmds/`, `src/zen/trace/`)
| - **`zen trace analyze`**: Summarize a `.utrace` file: session metadata, thread inventory, command line + build configuration, CPU profiling scopes, timing, event rates, log messages, and (with symbols) memory allocation metrics including live-allocs dumps, callstack-keyed aggregation, and allocation churn. Optional HTML output for memory reports.
| - **`zen trace inspect`**: Dump the event schema (declared types, fields, sizes) from a trace file.
| - **`zen trace trim`**: Extract a time-window from a trace into a new `.utrace` file.
| - **`zen trace serve`**: Launch a local HTTP server hosting an interactive trace viewer; opens in the default browser.
|
| ### Symbolication (`src/zen/trace/symbol_resolver.*`, `thirdparty/raw_pdb/`)
| - Pluggable resolver with multiple backends: `pdb` (in-tree raw_pdb), `dbghelp` (Windows), `llvm-symbolizer` (all platforms), `atos` (macOS). An `auto` backend picks the best available tool per platform.
| - Microsoft Symbol Server support: downloads PDBs on demand using a redirect-aware HTTP client.
| - Local PDB cache keyed by image GUID preserves symbols across binary recompilation.
| - Callstack trimming heuristic strips UE internal noise from reports.
| - Binary analysis cache (`.ucache_z`) avoids re-resolving the same trace.
|
| ### Interactive trace viewer (`src/zen/frontend/html/`, `src/zen/trace/trace_viewer_service.*`)
| - Timeline: scope-level detail, horizontal zoom/pan, vertical scrolling, viewport-driven loading with pre-computed LOD for responsive navigation of large traces.
| - Thread grouping (collapsible sidebar sections) synthesized from name suffixes, natural sort order, visual distinction between lane threads and OS threads.
| - Bookmark and region annotations; region categories with per-category toggles; bookmark marker toggle in the toolbar.
| - Filterable Logs tab showing captured `UE_LOG` output.
| - Stats tab with per-scope aggregate statistics.
| - Memory tab with interactive allocation analysis and an allocation size histogram.
| - CsvProfiler event parsing and chart UI.
|
| ### Other in-branch supporting changes
| - **Cross-platform browser launcher** (`browser_launcher.{h,cpp}`) used by `trace serve`.
| - **`ReciprocalU64`** fast 64-bit integer division (zencore/intmath) for trace analyzers.
| - **`parallelsort`** cross-platform parallel sort helper (zenutil).
| - Frontend zip build rule so the viewer's HTML assets are bundled into `zen.exe`.
| - `/Zo` flag for better optimized debug info on Windows release builds.
| - `trace-tests.cpp` in the `zen-test` harness (harness itself landed on main via #985).
* log cleanup (#969)Dan Engelbrecht2026-04-171-7/+7
| - Improvement: New `ZEN_SCOPED_LOG(Expr)` macro routes `ZEN_INFO`/`ZEN_WARN`/`ZEN_DEBUG` in the enclosing block through the given logger expression instead of the default
| - Improvement: `BuildContainer`, `SaveOplog`, and `LoadOplogContext` now take a caller-provided `LoggerRef` so diagnostic messages route through the caller's logger
* Dashboard overhaul, compute integration (#814)Stefan Boberg2026-03-091-1/+3
| - **Frontend dashboard overhaul**: Unified compute/main dashboards into a single shared UI. Added new pages for cache, projects, metrics, sessions, info (build/runtime config, system stats). Added live-update via WebSockets with pause control, sortable detail tables, themed styling. Refactored compute/hub/orchestrator pages into modular JS.
| - **HTTP server fixes and stats**: Fixed http.sys local-only fallback when default port is in use, implemented root endpoint redirect for http.sys, fixed Linux/Mac port reuse. Added /stats endpoint exposing HTTP server metrics (bytes transferred, request rates). Added WebSocket stats tracking.
| - **OTEL/diagnostics hardening**: Improved OTLP HTTP exporter with better error handling and resilience. Extended diagnostics services configuration.
| - **Session management**: Added new sessions service with HTTP endpoints for registering, updating, querying, and removing sessions. Includes session log file support. This is still WIP.
| - **CLI subcommand support**: Added support for commands with subcommands in the zen CLI tool, with improved command dispatch.
| - **Misc**: Exposed CPU usage/hostname to frontend, fixed JS compact binary float32/float64 decoding, limited projects displayed on front page to 25 sorted by last access, added vscode:// link support. Also contains some fixes from TSAN analysis.
* zenstore bug-fixes from static analysis pass (#815)Stefan Boberg2026-03-061-15/+17
| **Bug fixes across zenstore, zenremotestore, and related subsystems, primarily surfaced by static analysis.**
|
| ## Cache subsystem (cachedisklayer.cpp)
| - Fixed tombstone scoping bug: tombstone flag and missing entry were recorded outside the block where data was removed, causing non-missing entries to be incorrectly tombstoned
| - Fixed use-after-overwrite: `RemoveMemCachedData`/`RemoveMetaData` were called after `Payload` was overwritten on cache put, leaking stale data
| - Fixed incorrect retry sleep formula (`100 - (3 - RetriesLeft) * 100` always produced the same or negative value; corrected to `(3 - RetriesLeft) * 100`)
| - Fixed missing `break` in sidecar file read loop, which caused reads past valid data
| - Fixed missing format argument in three `ZEN_WARN`/`ZEN_ERROR` log calls (format string had `{}` placeholders with no corresponding argument, or vice versa)
| - Fixed elapsed timer being accumulated inside the wrong scope in `HandleRpcGetCacheRecords`
| - Fixed test asserting against unserialized `RecordPolicy` instead of the deserialized `Loaded` copy
| - Initialized `AbortFlag`/`PauseFlag` atomics at declaration (UB if read before first write)
|
| ## Build store (buildstore.cpp / buildstore.h)
| - Fixed wrong variable used in warning log: used loop index `ResultIndex` instead of `Index`/`MetaLocationResultIndexes[Index]`, logging wrong hash values
| - Fixed `sizeof(AccessTimesHeader)` used instead of `sizeof(AccessTimeRecord)` when advancing write offset, corrupting the access times file if the sizes differ
| - Initialized `m_LastAccessTimeUpdateCount` atomic member (was uninitialized)
| - Changed map iteration loops to use `const auto&` to avoid unnecessary copies
|
| ## Project store (projectstore.cpp / projectstore.h)
| - Fixed wrong iterator dereferenced in `IterateChunks`: used `ChunkIt->second` (from a different map lookup) instead of `MetaIt->second`
| - Fixed wrong assert variable: `Sizes[Index]` should be `RawSizes[Index]`
| - Fixed `MakeTombstone`/`IsTombstone` inconsistency: `MakeTombstone` was zeroing `OpLsn` but `IsTombstone` checks `OpLsn.Number != 0`; tombstone creation now preserves `OpLsn`
| - Fixed uninitialized `InvalidEntries` counter
| - Fixed format string mismatch in warning log
| - Initialized `AbortFlag`/`PauseFlag` atomics; changed map iteration to `const auto&`
|
| ## Workspaces (workspaces.cpp)
| - Fixed missing alias registration when a workspace share is updated: alias was deleted but never re-inserted
| - Fixed integer overflow in range clamping: `(RequestedOffset + RequestedSize) > Size` could wrap; corrected to `RequestedSize > Size - RequestedOffset`
| - Changed map iteration loops to `const auto&`
|
| ## CAS subsystem (cas.cpp, caslog.cpp, compactcas.cpp, filecas.cpp)
| - Fixed `IterateChunks` passing original `Payload` buffer instead of the modified `Chunk` buffer (content type was set on the copy but the original was sent to the callback)
| - Fixed invalid `std::future::get()` call on default-constructed futures
| - Fixed sign-comparison in `CasLogFile::Replay` loop (`int i` vs `size_t`)
| - Changed `CasLogFile::IsValid` and `Open` to take `const std::filesystem::path&` instead of by value
| - Fixed format string in `~CasContainerStrategy` error log
|
| ## Remote store (zenremotestore)
| - Fixed `FolderContent::operator==` always returning true: loop variable `PathCount` was initialized to 0 instead of `Paths.size()`
| - Fixed `GetChunkIndexForRawHash` looking up from wrong map (`RawHashToSequenceIndex` instead of `ChunkHashToChunkIndex`)
| - Fixed double-counted `UniqueSequencesFound` stat (incremented in both branches of an if/else)
| - Fixed `RawSize` sentinel value truncation: `(uint32_t)-1` assigned to a `uint64_t` field; corrected to `(uint64_t)-1`
| - Initialized uninitialized atomic and struct members across `buildstorageoperations.h`, `chunkblock.h`, and `remoteprojectstore.h`
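Two of the integer fixes listed in #815 (the range clamping overflow and the truncated `RawSize` sentinel) can be sketched in isolation. This is a minimal illustration and not the code from `workspaces.cpp` or zenremotestore: the function names `ClampRangeSize` and `ClampRangeSizeNaive` and the two sentinel constants are hypothetical, and the early return for an offset at or past the end is an assumption added so the subtraction cannot underflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: clamp a requested [offset, offset + size) range to a
// resource of `Size` bytes and return how many bytes can actually be served.
inline uint64_t ClampRangeSize(uint64_t Size, uint64_t RequestedOffset, uint64_t RequestedSize)
{
    if (RequestedOffset >= Size)
    {
        return 0;  // assumption: nothing to serve at or past the end
    }
    const uint64_t Available = Size - RequestedOffset;  // cannot underflow here
    // Overflow-safe rewrite of "(RequestedOffset + RequestedSize) > Size"
    return RequestedSize > Available ? Available : RequestedSize;
}

// Naive form, kept only for contrast: the unsigned sum can wrap around,
// so an oversized request slips through unclamped.
inline uint64_t ClampRangeSizeNaive(uint64_t Size, uint64_t RequestedOffset, uint64_t RequestedSize)
{
    if ((RequestedOffset + RequestedSize) > Size)
    {
        return Size - RequestedOffset;
    }
    return RequestedSize;
}

// Sentinel truncation: -1 converted to uint32_t is 0xFFFFFFFF, and widening
// that to uint64_t zero-extends instead of producing an all-ones value.
constexpr uint64_t TruncatedSentinel = static_cast<uint32_t>(-1);
constexpr uint64_t FullSentinel      = static_cast<uint64_t>(-1);

static_assert(TruncatedSentinel == 0xFFFFFFFFull, "only the low 32 bits are set");
static_assert(TruncatedSentinel != FullSentinel, "the two sentinels never compare equal");
```

With `Size = 100`, `RequestedOffset = 10` and a `RequestedSize` close to `UINT64_MAX`, the naive sum wraps to a small number and the request is returned unclamped, while the rewritten comparison clamps it to the 90 bytes that remain.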
* Various bug fixes (#778)Stefan Boberg2026-02-241-2/+2
| zencore fixes:
| - filesystem.cpp: ReadFile error reporting logic
| - compactbinaryvalue.h: CbValue::As*String error reporting logic
|
| zenhttp fixes:
| - httpasio BindAcceptor would `return 0;` in a function returning `std::string` (UB)
| - httpsys async workpool initialization race
|
| zenstore fixes:
| - cas.cpp: GetFileCasResults Results param passed by value instead of reference (large chunk results were silently lost)
| - structuredcachestore.cpp: MissCount unconditionally incremented (counted hits as misses)
| - cacherpc.cpp: Wrong boolean in Incomplete response array (all entries marked incomplete)
| - cachedisklayer.cpp: sizeof(sizeof(...)) in two validation checks computed sizeof(size_t) instead of struct size
| - buildstore.cpp: Wrong hash tracked in GC key list (BlobHash pushed twice instead of MetadataHash)
| - buildstore.cpp: Removed duplicate m_LastAccessTimeUpdateCount increment in PutBlob
|
| zenserver fixes:
| - httpbuildstore.cpp: Reversed subtraction in HTTP range calculation (unsigned underflow)
| - hubservice.cpp: Deadlock in Provision() calling Wake() while holding m_Lock (extracted WakeLocked helper)
| - zipfs.cpp: Data race in GetFile() lazy initialization (added RwLock with shared/exclusive paths)
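The `sizeof(sizeof(...))` entry for cachedisklayer.cpp is easy to misread, so here is a small sketch of why such a check is too weak. `ExampleHeader` and both function names are hypothetical stand-ins; the real structs and validation code are not shown in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical on-disk header, for illustration only.
struct ExampleHeader
{
    uint32_t Magic;
    uint32_t Version;
    uint64_t EntryCount;
    uint64_t PayloadSize;
    uint64_t Checksum;
};

// Buggy check: the inner sizeof yields a size_t value, so the outer sizeof
// measures size_t itself rather than the struct.
inline bool HasRoomForHeaderBuggy(size_t BytesAvailable)
{
    return BytesAvailable >= sizeof(sizeof(ExampleHeader));
}

// Intended check: compare against the size of the struct.
inline bool HasRoomForHeader(size_t BytesAvailable)
{
    return BytesAvailable >= sizeof(ExampleHeader);
}

static_assert(sizeof(sizeof(ExampleHeader)) == sizeof(size_t), "the outer sizeof sees a size_t");
static_assert(sizeof(ExampleHeader) > sizeof(size_t), "so the buggy bound is too small for this struct");
```

A buffer of only `sizeof(size_t)` bytes passes the buggy check even though it cannot hold the header, which is the kind of under-validation the fix removes.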
* Revert "Fix correctness and concurrency bugs found during code review"Stefan Boberg2026-02-241-2/+2
| | | | This reverts commit 3c89c486338890ce39ddebe5be4722a09e85701a.
* Fix correctness and concurrency bugs found during code reviewStefan Boberg2026-02-241-2/+2
| zenstore fixes:
| - cas.cpp: GetFileCasResults Results param passed by value instead of reference (large chunk results were silently lost)
| - structuredcachestore.cpp: MissCount unconditionally incremented (counted hits as misses)
| - cacherpc.cpp: Wrong boolean in Incomplete response array (all entries marked incomplete)
| - cachedisklayer.cpp: sizeof(sizeof(...)) in two validation checks computed sizeof(size_t) instead of struct size
| - buildstore.cpp: Wrong hash tracked in GC key list (BlobHash pushed twice instead of MetadataHash)
| - buildstore.cpp: Removed duplicate m_LastAccessTimeUpdateCount increment in PutBlob
|
| zenserver fixes:
| - httpbuildstore.cpp: Reversed subtraction in HTTP range calculation (unsigned underflow)
| - hubservice.cpp: Deadlock in Provision() calling Wake() while holding m_Lock (extracted WakeLocked helper)
| - zipfs.cpp: Data race in GetFile() lazy initialization (added RwLock with shared/exclusive paths)
|
| Co-Authored-By: Claude Opus 4.6 <[email protected]>
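The hubservice.cpp entry describes a common pattern: a public method that takes a lock is split into a locking wrapper and a `...Locked` helper that assumes the lock is already held. The sketch below illustrates that pattern with `std::mutex`; the class `ExampleHub` and its members are hypothetical, and the actual code may use a different lock type.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a service with a non-recursive lock.
class ExampleHub
{
public:
    // Public entry point: takes the lock, then delegates.
    void Wake()
    {
        std::lock_guard<std::mutex> Guard(m_Lock);
        WakeLocked();
    }

    void Provision()
    {
        std::lock_guard<std::mutex> Guard(m_Lock);
        m_Provisioned = true;
        // Calling Wake() here would lock m_Lock a second time on the same
        // thread, which is undefined behavior for std::mutex and deadlocks
        // in practice. The helper below does the work without re-locking.
        WakeLocked();
    }

    bool IsAwake() const
    {
        std::lock_guard<std::mutex> Guard(m_Lock);
        return m_Awake;
    }

    bool IsProvisioned() const
    {
        std::lock_guard<std::mutex> Guard(m_Lock);
        return m_Provisioned;
    }

private:
    // Precondition: the caller already holds m_Lock.
    void WakeLocked() { m_Awake = true; }

    mutable std::mutex m_Lock;
    bool               m_Awake       = false;
    bool               m_Provisioned = false;
};
```

Keeping the `Locked` helper private makes the precondition harder to violate from outside the class.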
* reduce blocking in scrub (#743)Dan Engelbrecht2026-02-031-56/+86
| | | * reduce held locks while performing scrub operation
* remove catching of exceptions in batch operations now that they are not executed in the destructor (#683)Dan Engelbrecht2025-12-101-79/+50
| don't call WriteChunks in batch operation if no chunks need to be written
* batch op not in destructor (#676)Dan Engelbrecht2025-12-041-281/+308
| * use fixed vectors for batch requests
| * refactor cache batch value put/get to not execute code that can throw exceptions in destructor
| * extend test with multi-bucket requests
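The motivation for #676 is that C++ destructors are implicitly `noexcept`, so an exception escaping one calls `std::terminate`. A rough sketch of the alternative, assuming a batch object with an explicit execute step, follows. `ExampleBatch`, `Add`, `Execute` and `PendingCount` are hypothetical names and do not mirror the actual batch classes in cachedisklayer.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical batch: work is queued, then run by an explicit call.
class ExampleBatch
{
public:
    void Add(std::function<void()> Op) { m_Ops.push_back(std::move(Op)); }

    // Runs the queued operations in order. Any exception propagates to the
    // caller, who can handle it with an ordinary try/catch.
    void Execute()
    {
        std::vector<std::function<void()>> Ops = std::move(m_Ops);
        m_Ops.clear();  // leave the batch empty even if an operation throws
        for (auto& Op : Ops)
        {
            Op();
        }
    }

    size_t PendingCount() const { return m_Ops.size(); }

    // The destructor runs no queued work, so nothing here can throw.
    ~ExampleBatch() = default;

private:
    std::vector<std::function<void()>> m_Ops;
};
```

With this shape the caller decides where failures surface, which is also what makes the later removal of the catch blocks in #683 possible in principle.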
* add checks to protect against access violation due to failed disk read (#675)Dan Engelbrecht2025-12-041-0/+12
| * add checks to protect against access violation due to failed disk read
* automatic scrub on startup (#667)Dan Engelbrecht2025-11-271-34/+44
| - Improvement: Deeper validation of data when scrub is activated (cas/cache/project)
| - Improvement: Enabled more multi threading when running scrub operations
| - Improvement: Added means to force a scrub operation at startup with a new release using ZEN_DATA_FORCE_SCRUB_VERSION variable in xmake.lua
* optimize filecas write file (#613)Dan Engelbrecht2025-10-241-16/+10
| | | * try to move file into place before trying speculative remove of target file
* zenutil cleanup (#550)Dan Engelbrecht2025-10-031-1/+2
| * move referencemetadata to zenstore
| * rename zenutil/windows/service to windowsservice
* remove zenutil dependency in zenremotestore (#547)Dan Engelbrecht2025-10-031-1/+1
| * remove dependency to zenutil/workerpools.h from remoteprojectstore.cpp
| * remove dependency to zenutil/workerpools.h from buildstoragecache.cpp
| * remove unneeded include
| * move jupiter helpers to zenremotestore
| * move parallelwork to zencore
| * remove zenutil dependency from zenremotestore
| * clean up test project dependencies - use indirect dependencies
* more cbobject validations (#527)Dan Engelbrecht2025-09-291-5/+19
| | | - Improvement: Add additional validations when reading disk cache records to get references in GC
* Adjust the responses from PUT commandszousar2025-09-231-10/+5
| - Ensure that text responses are in a field named "Message"
| - Change the record response to be named "Record" instead of "Object"
* Change batch put responses for client reportingzousar2025-09-191-11/+24
| | | | Conflicts are now treated as successes, and we optionally return a Details array instead of an ErrorMessages array. Details are returned for all requests in a batch, or no requests in a batch depending on whether there are any details to be shared about any of the put requests. The details for a conflict include the raw hash and raw size of the item. If the item is a record, we also include the record as an object.
* add EMode to WorkerThreadPool to avoid thread starvation (#492)Dan Engelbrecht2025-09-101-3/+4
| - Improvement: Add a new mode to worker thread pools to avoid starvation of workers which could cause long stalls due to other work being queued up. UE-305498
* per namespace/project cas prep refactor (#470)Dan Engelbrecht2025-08-201-12/+16
| - Refactor so we can have more than one cas store for project store and cache.
| - Refactor `UpstreamCacheClient` so it is not tied to a specific CidStore
| - Refactor scrub to keep the GC interface ScrubStorage function separate from scrub accessor functions (renamed to Scrub).
| - Refactor storage size to keep GC interface StorageSize function separate from size accessor functions (renamed to TotalSize)
| - Refactor cache storage so `ZenCacheDiskLayer::CacheBucket` implements GcStorage interface rather than `ZenCacheNamespace`
* reduce lock contention when checking for disk cache put reject (#465)Dan Engelbrecht2025-08-121-91/+81
| keep rawsize and rawhash if available when using batch for inline puts
| keep rawsize and rawhash of input value if we have calculated it for validation already
* Merge branch 'main' into zs/put-overwrite-policyZousar Shaker2025-08-081-2/+2
|\
| * add the correct set of references hashes in batched inline mode (#459)Dan Engelbrecht2025-08-061-3/+3
| |
* | precommitzousar2025-08-071-9/+4
| |
* | Avoid committing chunks for batch rejected putszousar2025-08-071-42/+35
| | | | | | | | Previously rejected puts would put the chunks, but not write them to the index, which was wrong.
* | Moving put rejections to happen in batch handlingzousar2025-08-051-33/+87
| |
* | Merge branch 'main' into zs/put-overwrite-policyzousar2025-08-051-11/+40
|\|
| * add hardening for legacy cache bucket manifests (#454)Dan Engelbrecht2025-08-041-11/+40
| |
* | Merge branch 'main' into zs/put-overwrite-policyzousar2025-06-241-246/+425
|\|
| * make sure we unregister from GC before we drop bucket/namespaces (#443)Dan Engelbrecht2025-06-191-1/+3
| |
| * graceful wait in parallelwork destructor (#438)Dan Engelbrecht2025-06-161-35/+45
| | * exception safety when issuing ParallelWork
| | * add asserts to Latch usage to catch usage errors
| | * extended error messaging and recovery handling in ParallelWork destructor to help find issues
| * missing chunks bugfix (#424)Dan Engelbrecht2025-06-091-14/+70
| | * make sure to close log file when resetting log
| | * drop entries that refer to missing blocks
| | * Don't scrub keys that have been rewritten
| | * correctly count added bytes / m_TotalSize
| | * fix negative sleep time in BlockStoreFile::Open()
| | * be defensive when fetching log position
| | * append to log files *after* we updated all state successfully
| | * explicitly close stuff in destructors with exception catching
| | * clean up empty size block store files
| * pause, resume and abort running builds cmd (#421)Dan Engelbrecht2025-06-051-4/+8
| | - Feature: `zen builds pause`, `zen builds resume` and `zen builds abort` commands to control a running `zen builds` command
| |   - `--process-id` the process id to control, if omitted it tries to find a running process using the same executable as itself
| | - Improvement: Process report now indicates if it is pausing or aborting
| * fix cache bucket mem hit count (#415)Dan Engelbrecht2025-06-021-4/+7
| | * Don't count a miss twice for memory stats if the entry can't be found
| | * changelog
| * add missing flush in blockstore compact (#411)Dan Engelbrecht2025-05-301-2/+22
| | - Bugfix: Flush the last block before closing the last new block written to during blockstore compact. UE-291196
| | - Feature: Drop unreachable CAS data during GC pass. UE-291196
| * unblock cache bucket drop (#406)Dan Engelbrecht2025-05-261-26/+82
| | * don't hold exclusive locks while deleting files from a dropped bucket/namespace
| | * cleaner detection of missing namespace when issuing a drop
| * handle exception with batch work (#401)Dan Engelbrecht2025-05-191-15/+10
| | * use ParallelWork in rpc playback
| | * use ParallelWork in projectstore
| | * use ParallelWork in buildstore
| | * use ParallelWork in cachedisklayer
| | * use ParallelWork in compactcas
| | * use ParallelWork in filecas
| | * don't set abort flag in ParallelWork destructor
| | * add PrepareFileForScatteredWrite for temp files in httpclient
| | * Use PrepareFileForScatteredWrite when stream-decompressing files
| | * be more relaxed when deleting temp files
| | * allow explicit zen-cache when using direct host url without resolving
| | * fix lambda capture when writing loose chunks
| | * no delay when attempting to remove temp files
| * keep snapshot on log delete fail (#391)Dan Engelbrecht2025-05-121-26/+21
| | - Improvement: Cleaned up snapshot writing for CompactCAS/FileCas/Cache/Project stores
| | - Improvement: Safer recovery when failing to delete log for CompactCAS/FileCas/Cache/Project stores
| | - Improvement: Added log file reset when writing snapshot at startup for FileCas
| * enable per bucket config (#388)Dan Engelbrecht2025-05-121-2/+19
| | Feature: Add per bucket cache configuration (Lua options file only)
| | Improvement: --cache-memlayer-sizethreshold is now deprecated and has a new name: --cache-bucket-memlayer-sizethreshold to line up with per cache bucket configuration
| * tweak iterate block parameters (#390)Dan Engelbrecht2025-05-121-32/+45
| | | | | | * tweak block iteration chunk sizes
| * optimize cache bucket state writing (#382)Dan Engelbrecht2025-05-061-42/+59
| | | | | | * optimize cache bucket snapshot and sidecar writing
| * replace local equal_to_2 with eastl impl (#368)Stefan Boberg2025-04-251-16/+2
| |
| * predicate to enable compiling with later EASTL version (#367)Stefan Boberg2025-04-241-1/+1
| |
| * reduce disk io during gc (#335)Dan Engelbrecht2025-04-011-20/+8
| | | | | | * do cache bucket flush/write snapshot as part of compact to reduce disk I/O
| * long filename support (#330)Dan Engelbrecht2025-03-311-24/+24
| | - Bugfix: Long file paths now work correctly on Windows
| * reduced memory churn using fixed_xxx containers (#236)Stefan Boberg2025-03-061-52/+69
| | * Added EASTL to help with eliminating memory allocations
| | * Applied EASTL to eliminate memory allocations, primarily by using `fixed_vector` et al to use stack allocations / inline struct allocations
| |
| | Reduces memory events in traces by close to a factor of 10 in test scenario (starting editor for project F)
* | Change to PutResult structurezousar2025-06-241-19/+48
| | | | | | | | Result structure contains status and a string message (may be empty)
* | Control overwrite enforcement with a config settingzousar2025-03-021-1/+2
| |
* | Move utility methods in cachedisklayerzousar2025-02-261-38/+38
| | | | | | | | Value comparison methods moved to more appropriate area in file.
* | Enforce Overwrite Prevention According To Cache Policyzousar2025-02-261-4/+89
|/ | | | Overwrite with differing value should be denied if QueryLocal is not present and StoreLocal is present. Overwrite with equal value should succeed regardless of policy flags.
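The overwrite rule stated in this commit message can be written as a small predicate. The sketch below is an interpretation of the message only: the enum `ExamplePolicy`, its flag values and the function `IsOverwriteAllowed` are hypothetical, and the real cache policy type has more flags than the two named here. Flag combinations the message does not mention are treated as allowed, which is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag values; only the two flags named in the commit message.
enum ExamplePolicy : uint32_t
{
    ExamplePolicy_None       = 0,
    ExamplePolicy_QueryLocal = 1u << 0,
    ExamplePolicy_StoreLocal = 1u << 1,
};

// Decides whether a put may replace an existing value.
inline bool IsOverwriteAllowed(uint32_t Policy, bool ValuesAreEqual)
{
    if (ValuesAreEqual)
    {
        return true;  // equal value succeeds regardless of policy flags
    }
    const bool HasQueryLocal = (Policy & ExamplePolicy_QueryLocal) != 0;
    const bool HasStoreLocal = (Policy & ExamplePolicy_StoreLocal) != 0;
    // Differing value is denied when StoreLocal is set without QueryLocal.
    return !(HasStoreLocal && !HasQueryLocal);
}
```

For example, `IsOverwriteAllowed(ExamplePolicy_StoreLocal, false)` is the denied case, while the same policy with an equal value is accepted.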