aboutsummaryrefslogtreecommitdiff
path: root/src/zenstore/compactcas.cpp
Commit message (Collapse)AuthorAgeFilesLines
* Transparent proxy mode (#823)Stefan Boberg5 days1-2/+9
| | | | | | | | | | | | | | | | | Adds a **transparent TCP proxy mode** to zenserver (activated via `zenserver proxy`), allowing it to sit between clients and upstream Zen servers to inspect and monitor HTTP/1.x traffic in real time. Primarily useful during development, to be able to observe multi-server/client interactions in one place. - **Dedicated proxy port** -- Proxy mode defaults to port 8118 with its own data directory to avoid collisions with a normal zenserver instance. - **TCP proxy core** (`src/zenserver/proxy/`) -- A new transparent TCP proxy that forwards connections to upstream targets, with support for both TCP/IP and Unix socket listeners. Multi-threaded I/O for connection handling. Supports Unix domain sockets for both upstream/downstream. - **HTTP traffic inspection** -- Parses HTTP/1.x request/response streams inline to extract method, path, status, content length, and WebSocket upgrades without breaking the proxied data. - **Proxy dashboard** -- A web UI showing live connection stats, per-target request counts, active connections, bytes transferred, and client IP/session ID rollups. - **Server mode display** -- Dashboard banner now shows the running server mode (Zen Proxy, Zen Compute, etc.). Supporting changes included in this branch: - **Wildcard log level matching** -- Log levels can now be set per-category using wildcard patterns (e.g. `proxy.*=debug`). - **`zen down --all`** -- New flag to shut down all running zenserver instances; also used by the new `xmake kill` task. - Minor test stability fixes (flaky hash collisions, per-thread RNG seeds). - Support ZEN_MALLOC environment variable for default allocator selection and switch default to rpmalloc - Fixed sentry-native build to allow LTO on Windows
* zenstore bug-fixes from static analysis pass (#815)Stefan Boberg11 days1-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | **Bug fixes across zenstore, zenremotestore, and related subsystems, primarily surfaced by static analysis.** ## Cache subsystem (cachedisklayer.cpp) - Fixed tombstone scoping bug: tombstone flag and missing entry were recorded outside the block where data was removed, causing non-missing entries to be incorrectly tombstoned - Fixed use-after-overwrite: `RemoveMemCachedData`/`RemoveMetaData` were called after `Payload` was overwritten on cache put, leaking stale data - Fixed incorrect retry sleep formula (`100 - (3 - RetriesLeft) * 100` always produced the same or negative value; corrected to `(3 - RetriesLeft) * 100`) - Fixed broken `break` missing from sidecar file read loop, causing reads past valid data - Fixed missing format argument in three `ZEN_WARN`/`ZEN_ERROR` log calls (format string had `{}` placeholders with no corresponding argument, or vice versa) - Fixed elapsed timer being accumulated inside the wrong scope in `HandleRpcGetCacheRecords` - Fixed test asserting against unserialized `RecordPolicy` instead of the deserialized `Loaded` copy - Initialized `AbortFlag`/`PauseFlag` atomics at declaration (UB if read before first write) ## Build store (buildstore.cpp / buildstore.h) - Fixed wrong variable used in warning log: used loop index `ResultIndex` instead of `Index`/`MetaLocationResultIndexes[Index]`, logging wrong hash values - Fixed `sizeof(AccessTimesHeader)` used instead of `sizeof(AccessTimeRecord)` when advancing write offset, corrupting the access times file if the sizes differ - Initialized `m_LastAccessTimeUpdateCount` atomic member (was uninitialized) - Changed map iteration loops to use `const auto&` to avoid unnecessary copies ## Project store (projectstore.cpp / projectstore.h) - Fixed wrong iterator dereferenced in `IterateChunks`: used `ChunkIt->second` (from a different map lookup) instead of `MetaIt->second` - Fixed wrong assert variable: `Sizes[Index]` should be `RawSizes[Index]` - Fixed `MakeTombstone`/`IsTombstone` inconsistency: `MakeTombstone` was zeroing `OpLsn` but `IsTombstone` checks `OpLsn.Number != 0`; tombstone creation now preserves `OpLsn` - Fixed uninitialized `InvalidEntries` counter - Fixed format string mismatch in warning log - Initialized `AbortFlag`/`PauseFlag` atomics; changed map iteration to `const auto&` ## Workspaces (workspaces.cpp) - Fixed missing alias registration when a workspace share is updated: alias was deleted but never re-inserted - Fixed integer overflow in range clamping: `(RequestedOffset + RequestedSize) > Size` could wrap; corrected to `RequestedSize > Size - RequestedOffset` - Changed map iteration loops to `const auto&` ## CAS subsystem (cas.cpp, caslog.cpp, compactcas.cpp, filecas.cpp) - Fixed `IterateChunks` passing original `Payload` buffer instead of the modified `Chunk` buffer (content type was set on the copy but the original was sent to the callback) - Fixed invalid `std::future::get()` call on default-constructed futures - Fixed sign-comparison in `CasLogFile::Replay` loop (`int i` vs `size_t`) - Changed `CasLogFile::IsValid` and `Open` to take `const std::filesystem::path&` instead of by value - Fixed format string in `~CasContainerStrategy` error log ## Remote store (zenremotestore) - Fixed `FolderContent::operator==` always returning true: loop variable `PathCount` was initialized to 0 instead of `Paths.size()` - Fixed `GetChunkIndexForRawHash` looking up from wrong map (`RawHashToSequenceIndex` instead of `ChunkHashToChunkIndex`) - Fixed double-counted `UniqueSequencesFound` stat (incremented in both branches of an if/else) - Fixed `RawSize` sentinel value truncation: `(uint32_t)-1` assigned to a `uint64_t` field; corrected to `(uint64_t)-1` - Initialized uninitialized atomic and struct members across `buildstorageoperations.h`, `chunkblock.h`, and `remoteprojectstore.h`
* Add test suites (#799)Stefan Boberg2026-03-021-0/+4
| | | | | | | | | | | | | Makes all test cases part of a test suite. Test suites are named after the module and the name of the file containing the implementation of the test. * This allows for better and more predictable filtering of which test cases to run which should also be able to reduce the time CI spends in tests since it can filter on the tests for that particular module. Also improves `xmake test` behaviour: * instead of an explicit list of projects just enumerate the test projects which are available based on build system state * also introduces logic to avoid running `xmake config` unnecessarily which would invalidate the existing build and do lots of unnecessary work since dependencies were invalidated by the updated config * also invokes build only for the chosen test targets As a bonus, also adds `xmake sln --open` which allows opening IDE after generation of solution/xmake project is done.
* use matcher over regex (#744)Dan Engelbrecht2026-02-041-2/+2
| | | | * replace http router AddPattern with AddMatcher * fix scrub logging
* reduce blocking in scrub (#743)Dan Engelbrecht2026-02-031-11/+12
| | | * reduce held locks while performing scrub operation
* various optimizations (#704)Dan Engelbrecht2026-01-091-4/+6
| | | | | | | | | - Improvement: Validate chunk hashes when dechunking files in oplog import - Improvement: Use stream decompression when dechunking files - Improvement: When assembling blocks for oplog export, make sure we keep under/at block size limit - Improvement: Make cancelling of oplog import more responsive - Improvement: Use decompress to composite to avoid allocating a new memory buffer for uncompressed chunks during oplog import - Improvement: Reduce memory buffer size and allocate it on demand when writing multiple chunks to block store - Improvement: Reduce lock contention when fetching/checking existence of chunks in block store
* automatic scrub on startup (#667)Dan Engelbrecht2025-11-271-23/+26
| | | | | - Improvement: Deeper validation of data when scrub is activated (cas/cache/project) - Improvement: Enabled more multi threading when running scrub operations - Improvement: Added means to force a scrub operation at startup with a new release using ZEN_DATA_FORCE_SCRUB_VERSION variable in xmake.lua
* fix use-after-free in TEST_CASE("compactcas.threadedinsert") (#620)v5.7.8-pre1Stefan Boberg2025-10-301-6/+8
|
* refactor CasContainerStrategy::IterateOneBlock to make it more readable (#607)Dan Engelbrecht2025-10-241-91/+96
|
* add ability to limit concurrency (#565)Stefan Boberg2025-10-101-3/+3
| | | | | | | | | | | | effective concurrency in zenserver can be limited via the `--corelimit=<N>` option on the command line. Any value passed in here will be used instead of the return value from `std::thread::hardware_concurrency()` if it is lower. * added --corelimit option to zenserver * made sure thread pools are configured lazily and not during global init * added log output indicating effective and HW concurrency * added change log entry * removed debug logging from ZenEntryPoint::Run() also removed main thread naming on Linux since it makes the output from `top` and similar tools confusing (it shows `main` instead of `zenserver`)
* fixed issue in compactcas.restart test due to std::vector<bool> (#559)Stefan Boberg2025-10-061-2/+5
| | | `std::vector<bool>` is a special container since it bit packs the values rather than just using an array of booleans. This means that updating it on multiple threads simultaneously is dangerous
* speed up tests (#555)Dan Engelbrecht2025-10-061-64/+81
| | | | | | | | | | | | * faster FileSystemTraversal test * faster jobqueue test * faster NamedEvent test * faster cache tests * faster basic http tests * faster blockstore test * faster cache store tests * faster compactcas tests * more responsive zenserver launch * tweak worker pool sizes in tests
* remove zenutil dependency in zenremotestore (#547)Dan Engelbrecht2025-10-031-1/+1
| | | | | | | | | * remove dependency to zenutil/workerpools.h from remoteprojectstore.cpp * remove dependency to zenutil/workerpools.h from buildstoragecache.cpp * remove unneded include * move jupiter helpers to zenremotestore * move parallelwork to zencore * remove zenutil dependency from zenremotestore * clean up test project dependencies - use indirect dependencies
* more iterate chunk logging (#516)Dan Engelbrecht2025-09-261-12/+44
| | | | | * add log warnings when we can't read payloads in cas when we thing we should have them * fix misleading option help
* add EMode to WorkerTheadPool to avoid thread starvation (#492)Dan Engelbrecht2025-09-101-102/+116
| | | - Improvement: Add a new mode to worker thread pools to avoid starvation of workers which could cause long stalls due to other work begin queued up. UE-305498
* graceful wait in parallelwork destructor (#438)Dan Engelbrecht2025-06-161-65/+72
| | | | | * exception safety when issuing ParallelWork * add asserts to Latch usage to catch usage errors * extended error messaging and recovery handling in ParallelWork destructor to help find issues
* missing chunks bugfix (#424)Dan Engelbrecht2025-06-091-16/+290
| | | | | | | | | | | * make sure to close log file when resetting log * drop entries that refers to missing blocks * Don't scrub keys that has been rewritten * currectly count added bytes / m_TotalSize * fix negative sleep time in BlockStoreFile::Open() * be defensive when fetching log position * append to log files *after* we updated all state successfully * explicitly close stuff in destructors with exception catching * clean up empty size block store files
* pause, resume and abort running builds cmd (#421)Dan Engelbrecht2025-06-051-1/+2
| | | | | - Feature: `zen builds pause`, `zen builds resume` and `zen builds abort` commands to control a running `zen builds` command - `--process-id` the process id to control, if omitted it tries to find a running process using the same executable as itself - Improvement: Process report now indicates if it is pausing or aborting
* add missing flush inblockstore compact (#411)Dan Engelbrecht2025-05-301-1/+15
| | | | - Bugfix: Flush the last block before closing the last new block written to during blockstore compact. UE-291196 - Feature: Drop unreachable CAS data during GC pass. UE-291196
* faster oplog validate (#408)Dan Engelbrecht2025-05-301-1/+6
| | | Improvement: Faster oplog validate to reduce GC wall time and disk I/O pressure
* handle exception with batch work (#401)Dan Engelbrecht2025-05-191-58/+70
| | | | | | | | | | | | | | | * use ParallelWork in rpc playback * use ParallelWork in projectstore * use ParallelWork in buildstore * use ParallelWork in cachedisklayer * use ParallelWork in compactcas * use ParallelWork in filecas * don't set abort flag in ParallelWork destructor * add PrepareFileForScatteredWrite for temp files in httpclient * Use PrepareFileForScatteredWrite when stream-decompressing files * be more relaxed when deleting temp files * allow explicit zen-cache when using direct host url without resolving * fix lambda capture when writing loose chunks * no delay when attempting to remove temp files
* keep snapshot on log delete fail (#391)Dan Engelbrecht2025-05-121-24/+22
| | | | | - Improvement: Cleaned up snapshot writing for CompactCAS/FileCas/Cache/Project stores - Improvement: Safer recovery when failing to delete log for CompactCAS/FileCas/Cache/Project stores - Improvement: Added log file reset when writing snapshot at startup for FileCas
* flush cas log file (#387)Dan Engelbrecht2025-05-091-45/+30
| | | * make sure we remove the cas log file when writing full index at startup
* iterate chunks crash fix (#376)Dan Engelbrecht2025-05-021-35/+246
| | | * Bugfix: Add explicit lambda capture in CasContainer::IterateChunks to avoid accessing state data references
* long filename support (#330)Dan Engelbrecht2025-03-311-15/+15
| | | - Bugfix: Long file paths now works correctly on Windows
* zen build cache service (#318)Dan Engelbrecht2025-03-261-2/+2
| | | | | | | | | - **EXPERIMENTAL** `zen builds` - Feature: `--zen-cache-host` option for `upload` and `download` operations to use a zenserver host `/builds` endpoint for storing build blob and blob metadata - Feature: New `/builds` endpoint for caching build blobs and blob metadata - `/builds/{namespace}/{bucket}/{buildid}/blobs/{hash}` `GET` and `PUT` method for storing and fetching blobs - `/builds/{namespace}/{bucket}/{buildid}/blobs/putBlobMetadata` `POST` method for storing metadata about blobs - `/builds/{namespace}/{bucket}/{buildid}/blobs/getBlobMetadata` `POST` method for fetching metadata about blobs - `/builds/{namespace}/{bucket}/{buildid}/blobs/exists` `POST` method for checking existance of blobs
* Miscellaneous minor LLM fixes (#268)v5.5.17-pre0Stefan Boberg2024-12-171-1/+2
| | | | | | | With this change, LLM tags are assigned using the name,parent tuple rather than just by name only. This allows tag hierarchies like `cache/store` and `project/store` which would previously get collapsed into the first pair seen when registering the `store` tag. This PR also adds some more LLM tag annotations to more accurately associate memory allocations with subsystems In addition, this PR also tweaks the frequency of timer marker events to increase the resolution in Insights and avoid some cases of Insights deciding that marker events are too far apart since we don't allocate as frequently as UE tends to.
* batch fetch record cache values (#266)Dan Engelbrecht2024-12-171-17/+18
| | | | | | - Improvement: Batch fetch record attachments when appropriate - Improvement: Reduce memory buffer allocation in BlockStore::IterateBlock - Improvement: Tweaked BlockStore::IterateBlock logic when to use threaded work (at least 4 chunks requested) - Bugfix: CasContainerStrategy::IterateChunks could give wrong payload/index when requesting 1 or 2 chunks
* projectstore getchunks rpc with modtag (#244)Dan Engelbrecht2024-12-051-15/+17
| | | Feature: Project store "getchunks" rpc call /prj/{project}/oplog/{log}/rpc extended to accept both CAS (RawHash) and Id (Oid) identifiers as well as partial ranges
* Unity build fixes (#253)Stefan Boberg2024-12-051-13/+13
| | | some fixes to make everything build using unity build mode. Mostly moved code from anonymous namespaces into local impl namespace to avoid ambiguity in name resolution.
* added support for dynamic LLM tags (#245)Stefan Boberg2024-12-021-0/+34
| | | | | * added FLLMTag which can be used to register memory tags outside of core * changed `UE_MEMSCOPE` -> `ZEN_MEMSCOPE` for consistency * instrumented some subsystems with dynamic tags
* use plain sorted array instead of map of vectors (#237)Dan Engelbrecht2024-11-271-4/+7
| | | | | * use plain sorted array instead of map of vectors * reserve vectors up front = 5% perf increase * don't do batch read of chunks if we have a single chunk -> 1% perf gain
* caller controls threshold for bulk-loading chunks in IterateChunks (#222)Dan Engelbrecht2024-11-251-3/+5
| | | | | | * Allow caller to control threshold for bulk-loading chunks in IterateChunks * use smaller batch chunk reading for /fileinfos and /chunkinfos as we do not intend to read the payload * use smaller batch read buffer when just querying for size of attachments
* stronger validation of payload existance (#229)Dan Engelbrecht2024-11-251-2/+26
| | | | | | - Don't add RawSize and Size in ProjectStore::GetProjectFiles response if we can't get the payload - Use validation of payload size/existance in all chunk fetch operations in file cas - In project store oplog validate, make sure we can reach all the payloads - Add threading to oplog validate request
* fix inconsistencies in filecas due to failing to remove payload file during ↵Dan Engelbrecht2024-11-221-1/+1
| | | | | | | | GC (#224) make sure we rewrite filecas entries if chunk size changes (due to compression changes) hardening of move/write files in filecas if we encounter a filecas entry with mismatching size (due to pre-existing bug) we validate the file and update the index if we find a bad filecas file on disk we now attempt to remove it
* remove gc v1 (#121)Dan Engelbrecht2024-10-031-610/+29
| | | | | * kill gc v1 * block use of gc v1 from zen command line * warn and flip to gcv2 if --gc-v2=false is specified for zenserver
* gc block size target max size (#180)Dan Engelbrecht2024-10-021-28/+25
| | | | | | * If a block is small (less than half max size) we add it to blocks to compact Sort blocks when iterating over them * do compact of block stores even if no new unused are found * do compact phase even if bucket is empty
* optimize startup time (#175)Dan Engelbrecht2024-09-301-3/+3
| | | | | | * use tsl::robin_set for BlockIndexSet don't calculate full block location when only block index is needed * don't copy visitor function * reserve space for attachments
* Add `gc-attachment-passes` option to zenserver (#167)Dan Engelbrecht2024-09-251-1/+1
| | | | | Added option `gc-attachment-passes` to zenserver Cleaned up GCv2 start and stop logs and added identifier to easily find matching start and end of a GC pass in log file Fixed project store not properly sorting references found during lock phase
* gc unused refactor (#165)Dan Engelbrecht2024-09-231-5/+9
| | | | | * optimize IoHash and OId comparisions * refactor filtering of unused references * add attachment filtering to gc
* gc performance improvements (#160)Dan Engelbrecht2024-09-171-13/+14
| | | | | | | | | | * optimized ValidateCbUInt * optimized iohash comparision * replace unordered set/map with tsl/robin set/map in blockstore * increase max buffer size when writing cache bucket sidecar * only store meta data for files < 4Gb * faster ReadAttachmentsFromMetaData * remove memcpy call in BlockStoreDiskLocation * only write cache bucket state to disk if GC deleted anything
* trace scopes improvementsDan Engelbrecht2024-09-101-0/+1
|
* move gc logs to gc logger (#142)Dan Engelbrecht2024-09-041-0/+6
| | | - Improvement: Move GC logging in callback functions into "gc" context
* stop exceptions from leaking on threaded work (#102)Dan Engelbrecht2024-08-061-23/+33
| | | | * catch exceptions in threaded work * don't abort all project file/chunk info fetch for single failure
* refactor BlockStore IterateChunks (#77)Dan Engelbrecht2024-05-171-18/+55
| | | Improvement: Refactored IterateChunks to allow reuse in diskcachelayer and hide public GetBlockFile() function in BlockStore
* use direct file access for large file hash (#63)Dan Engelbrecht2024-04-261-4/+4
| | | - Improvement: Refactor `IoHash::HashBuffer` and `BLAKE3::HashBuffer` to not use memory mapped files. Performs better and saves ~10% of oplog export time on CI
* iterate cas chunks (#59)Dan Engelbrecht2024-04-241-10/+43
| | | - Improvement: Reworked GetChunkInfos in oplog store to reduce disk thrashing and improve performance
* safer gcv2 on error (#60)Dan Engelbrecht2024-04-241-0/+2
| | | - Bugfix: Harden GCv2 when errors occur and gracefully abort GC operation on error
* InsertChunks for CAS store (#55)Dan Engelbrecht2024-04-221-0/+58
| | | - Improvement: Add batching when writing multiple small chunks to block store - decreases I/O load significantly on oplog import
* improved assert (#37)Dan Engelbrecht2024-04-041-2/+2
| | | | - Improvement: Add file and line to ASSERT exceptions - Improvement: Catch call stack when throwing assert exceptions and log/output call stack at important places to provide more context to caller