path: root/src/zenstore/filecas.cpp
Commit message | Author | Age | Files | Lines
* automatic scrub on startup (#667) | Dan Engelbrecht | 2025-11-27 | 1 | -69/+40
    - Improvement: Deeper validation of data when scrub is activated (cas/cache/project)
    - Improvement: Enabled more multithreading when running scrub operations
    - Improvement: Added means to force a scrub operation at startup with a new release using the ZEN_DATA_FORCE_SCRUB_VERSION variable in xmake.lua
* Various fixes to address issues flagged by gcc / non-UE toolchain build (#621) | Stefan Boberg | 2025-11-01 | 1 | -1/+1
    * gcc: avoid using memset on nontrivial struct
    * redundant `return std::move`
    * fixed various compilation issues flagged by gcc
    * fix issue in xmake.lua detecting whether we are building with the UE toolchain or not
    * add GCC ignore -Wundef (comment is inaccurate)
    * remove redundant std::move
    * don't catch exceptions by value
    * unreferenced variables
    * initialize "by the book" instead of memset
    * remove unused exception reference
    * add #include <cstring> to fix gcc build
    * explicitly populate KeyValueMap by traversing input spans (fixes gcc compilation)
    * remove unreferenced variable
    * eliminate redundant `std::move` which gcc complains about
    * fix gcc compilation by including <cstring>
    * tag unreferenced variable to fix gcc compilation
    * fixes for various cases of naming members the same as their type
* fix --zentrace=no compile errors (#616) | Stefan Boberg | 2025-10-28 | 1 | -8/+8
    * make sure the correct `UE_WITH_TRACE` conditional is used to enable/disable support code as appropriate
    * fixed some accidental `int32`, `int64` et al usage, due to typedefs leaking through from trace header

    with this fix, it is now possible to build with `--zentrace=no` again
* remove zenutil dependency in zenremotestore (#547) | Dan Engelbrecht | 2025-10-03 | 1 | -1/+1
    * remove dependency on zenutil/workerpools.h from remoteprojectstore.cpp
    * remove dependency on zenutil/workerpools.h from buildstoragecache.cpp
    * remove unneeded include
    * move jupiter helpers to zenremotestore
    * move parallelwork to zencore
    * remove zenutil dependency from zenremotestore
    * clean up test project dependencies - use indirect dependencies
* more iterate chunk logging (#516) | Dan Engelbrecht | 2025-09-26 | 1 | -0/+12
    * add log warnings when we can't read payloads in cas when we think we should have them
    * fix misleading option help
* add EMode to WorkerThreadPool to avoid thread starvation (#492) | Dan Engelbrecht | 2025-09-10 | 1 | -1/+1
    - Improvement: Add a new mode to worker thread pools to avoid starvation of workers, which could cause long stalls due to other work being queued up. UE-305498
* graceful wait in parallelwork destructor (#438) | Dan Engelbrecht | 2025-06-16 | 1 | -37/+49
    * exception safety when issuing ParallelWork
    * add asserts to Latch usage to catch usage errors
    * extended error messaging and recovery handling in ParallelWork destructor to help find issues
* missing chunks bugfix (#424) | Dan Engelbrecht | 2025-06-09 | 1 | -5/+4
    * make sure to close log file when resetting log
    * drop entries that refer to missing blocks
    * don't scrub keys that have been rewritten
    * correctly count added bytes / m_TotalSize
    * fix negative sleep time in BlockStoreFile::Open()
    * be defensive when fetching log position
    * append to log files *after* we updated all state successfully
    * explicitly close stuff in destructors with exception catching
    * clean up empty size block store files
* pause, resume and abort running builds cmd (#421) | Dan Engelbrecht | 2025-06-05 | 1 | -1/+2
    - Feature: `zen builds pause`, `zen builds resume` and `zen builds abort` commands to control a running `zen builds` command
      - `--process-id` the process id to control, if omitted it tries to find a running process using the same executable as itself
    - Improvement: Process report now indicates if it is pausing or aborting
* faster oplog validate (#408) | Dan Engelbrecht | 2025-05-30 | 1 | -2/+18
    Improvement: Faster oplog validate to reduce GC wall time and disk I/O pressure
* handle exception with batch work (#401) | Dan Engelbrecht | 2025-05-19 | 1 | -30/+40
    * use ParallelWork in rpc playback
    * use ParallelWork in projectstore
    * use ParallelWork in buildstore
    * use ParallelWork in cachedisklayer
    * use ParallelWork in compactcas
    * use ParallelWork in filecas
    * don't set abort flag in ParallelWork destructor
    * add PrepareFileForScatteredWrite for temp files in httpclient
    * Use PrepareFileForScatteredWrite when stream-decompressing files
    * be more relaxed when deleting temp files
    * allow explicit zen-cache when using direct host url without resolving
    * fix lambda capture when writing loose chunks
    * no delay when attempting to remove temp files
* keep snapshot on log delete fail (#391) | Dan Engelbrecht | 2025-05-12 | 1 | -47/+31
    - Improvement: Cleaned up snapshot writing for CompactCAS/FileCas/Cache/Project stores
    - Improvement: Safer recovery when failing to delete log for CompactCAS/FileCas/Cache/Project stores
    - Improvement: Added log file reset when writing snapshot at startup for FileCas
* long filename support (#330) | Dan Engelbrecht | 2025-03-31 | 1 | -39/+39
    - Bugfix: Long file paths now work correctly on Windows
* improvements and infrastructure for upcoming builds api command line (#284) | Dan Engelbrecht | 2025-02-25 | 1 | -2/+2
    * add modification tick to filesystem traversal
    * add ShowDetails option to ProgressBar
    * log callstack if we terminate process
    * handle chunking if MaxSize > 1MB
    * BasicFile write helpers and WriteToTempFile simplifications
    * bugfix for CompositeBuffer::IterateRange when using DecompressToComposite for actually compressed data; revert of earlier optimization
    * faster compress/decompress for large disk-based files
    * enable progress feedback in IoHash::HashBuffer
    * add payload validation in HttpClient::Get
    * fix range requests (range is including end byte)
    * remove BuildPartId for blob/block related operations in builds api
* Add multithreading directory scanning in core/filesystem (#277) | Dan Engelbrecht | 2025-01-22 | 1 | -9/+4
    add DirectoryContent::IncludeFileSizes
    add DirectoryContent::IncludeAttributes
    add multithreaded GetDirectoryContent
    use multithreaded GetDirectoryContent in workspace folder scanning
* move basicfile.h/cpp -> zencore (#273) | Dan Engelbrecht | 2025-01-16 | 1 | -1/+1
    move jupiter.h/cpp -> zenutil
    move packageformat.h/.cpp -> zenhttp
    zenutil now depends on zenhttp instead of the inverse
* Unity build fixes (#253) | Stefan Boberg | 2024-12-05 | 1 | -8/+6
    some fixes to make everything build using unity build mode. Mostly moved code from anonymous namespaces into local impl namespace to avoid ambiguity in name resolution.
* added support for dynamic LLM tags (#245) | Stefan Boberg | 2024-12-02 | 1 | -0/+26
    * added FLLMTag which can be used to register memory tags outside of core
    * changed `UE_MEMSCOPE` -> `ZEN_MEMSCOPE` for consistency
    * instrumented some subsystems with dynamic tags
* add missing shard lock in filecas compact (#239) | Dan Engelbrecht | 2024-11-27 | 1 | -1/+6
    * add missing shardlock during compact in filecas
    * add warning log when filecas fails to open a file it expects to be present
* stronger validation of payload existence (#229) | Dan Engelbrecht | 2024-11-25 | 1 | -54/+60
    - Don't add RawSize and Size in ProjectStore::GetProjectFiles response if we can't get the payload
    - Use validation of payload size/existence in all chunk fetch operations in file cas
    - In project store oplog validate, make sure we can reach all the payloads
    - Add threading to oplog validate request
* Insights-compatible memory tracking (#214) | Stefan Boberg | 2024-11-25 | 1 | -1/+1
    This change introduces support for tracing of memory allocation activity. The code is ported from UE5, and Unreal Insights can be used to analyze the output. This is currently only fully supported on Windows, but will be extended to Mac/Linux in the near future.

    To activate full memory tracking, pass `--trace=memory` on the commandline alongside `--tracehost=<ip>` or `-tracefile=<path>`. For more control over how much detail is traced you can instead pass some combination of `callstack`, `memtag`, `memalloc` (in practice, `--trace=memory` is an alias for `--trace=callstack,memtag,memalloc`). For convenience we also support `--trace=memory_light` which omits call stacks.

    This change also introduces multiple memory allocators, which may be selected via command-line option `--malloc=<allocator>`:

    * `mimalloc` - mimalloc (default, same as before)
    * `rpmalloc` - rpmalloc is another high performance allocator for multithreaded applications which may be a better option than mimalloc (to be evaluated). Due to toolchain limitations this is currently only supported on Windows.
    * `stomp` - an allocator intended to be used during development/debugging to help track down memory issues such as use-after-free or out-of-bounds access. Currently only supported on Windows.
    * `ansi` - fallback to default system allocator
* fix inconsistencies in filecas due to failing to remove payload file during GC (#224) | Dan Engelbrecht | 2024-11-22 | 1 | -424/+433
    make sure we rewrite filecas entries if chunk size changes (due to compression changes)
    hardening of move/write files in filecas
    if we encounter a filecas entry with mismatching size (due to pre-existing bug) we validate the file and update the index
    if we find a bad filecas file on disk we now attempt to remove it
* safer path from handle (#195) | Dan Engelbrecht | 2024-10-16 | 1 | -31/+41
    * remove PathFromHandle that throws to give better context on failures
* remove gc v1 (#121) | Dan Engelbrecht | 2024-10-03 | 1 | -170/+0
    * kill gc v1
    * block use of gc v1 from zen command line
    * warn and flip to gcv2 if --gc-v2=false is specified for zenserver
* Add `gc-attachment-passes` option to zenserver (#167) | Dan Engelbrecht | 2024-09-25 | 1 | -1/+1
    Added option `gc-attachment-passes` to zenserver
    Cleaned up GCv2 start and stop logs and added identifier to easily find matching start and end of a GC pass in log file
    Fixed project store not properly sorting references found during lock phase
* gc unused refactor (#165) | Dan Engelbrecht | 2024-09-23 | 1 | -5/+9
    * optimize IoHash and OId comparisons
    * refactor filtering of unused references
    * add attachment filtering to gc
* move gc logs to gc logger (#142) | Dan Engelbrecht | 2024-09-04 | 1 | -0/+6
    - Improvement: Move GC logging in callback functions into "gc" context
* close payload file if size mismatch for file cas (#128) | Dan Engelbrecht | 2024-08-20 | 1 | -2/+2
* stop exceptions from leaking on threaded work (#102) | Dan Engelbrecht | 2024-08-06 | 1 | -3/+13
    * catch exceptions in threaded work
    * don't abort all project file/chunk info fetch for single failure
* fix get project files loop (#68) | Dan Engelbrecht | 2024-04-30 | 1 | -7/+42
    - Bugfix: Remove extra loop causing GetProjectFiles for project store to find all chunks once for each chunk found
    - Bugfix: Don't capture ChunkIndex variable in CasImpl::IterateChunks by reference as it causes a crash
    - Improvement: Make FileCasStrategy::IterateChunks (optionally) multithreaded (improves GetProjectFiles performance)
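The by-reference capture bug named in the commit above is a common C++ pitfall when work is deferred to another thread or a later point in time. The following is a minimal sketch of the safe pattern only; `MakeChunkTasks` is a hypothetical helper written for illustration and is not taken from filecas.cpp or `CasImpl::IterateChunks`.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not from the zen code base): builds one deferred task
// per chunk. The loop index is captured BY VALUE, so every task keeps its own
// copy after the loop variable has gone out of scope. Capturing it by
// reference ([&ChunkIndex]) would leave each task with a dangling reference.
std::vector<std::function<std::size_t()>> MakeChunkTasks(std::size_t ChunkCount)
{
    std::vector<std::function<std::size_t()>> Tasks;
    Tasks.reserve(ChunkCount);
    for (std::size_t ChunkIndex = 0; ChunkIndex < ChunkCount; ++ChunkIndex)
    {
        Tasks.push_back([ChunkIndex]() { return ChunkIndex; });
    }
    return Tasks;
}
```

Each returned task can be run after the loop has finished and still reports the index it was created with.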
* use direct file access for large file hash (#63) | Dan Engelbrecht | 2024-04-26 | 1 | -1/+1
    - Improvement: Refactor `IoHash::HashBuffer` and `BLAKE3::HashBuffer` to not use memory mapped files. Performs better and saves ~10% of oplog export time on CI
* iterate cas chunks (#59) | Dan Engelbrecht | 2024-04-24 | 1 | -0/+31
    - Improvement: Reworked GetChunkInfos in oplog store to reduce disk thrashing and improve performance
* safer gcv2 on error (#60) | Dan Engelbrecht | 2024-04-24 | 1 | -1/+3
    - Bugfix: Harden GCv2 when errors occur and gracefully abort GC operation on error
* improved assert (#37) | Dan Engelbrecht | 2024-04-04 | 1 | -1/+1
    - Improvement: Add file and line to ASSERT exceptions
    - Improvement: Catch call stack when throwing assert exceptions and log/output call stack at important places to provide more context to caller
* validate rpc chunk responses (#36) | Dan Engelbrecht | 2024-04-03 | 1 | -4/+13
    * Validate size of found chunks in cas/cache
* fix potential partially written files (#2) | Dan Engelbrecht | 2024-03-13 | 1 | -3/+12
    * Make sure WriteFile() does not leave incomplete files
    * use TemporaryFile and MoveTemporaryIntoPlace to avoid leaving partial files on error
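The write-to-temporary-then-move approach described in the commit above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's `TemporaryFile` / `MoveTemporaryIntoPlace` implementation: `WriteFileViaTemp` and the `.tmp` suffix are hypothetical, and the sketch does not flush data to stable storage, so it guards against partial files on error rather than against power loss.

```cpp
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>
#include <system_error>

// Hypothetical helper: write Data to a sibling temporary file, then rename it
// over Target. Readers see either the old file or the complete new one.
void WriteFileViaTemp(const std::filesystem::path& Target, const std::string& Data)
{
    std::filesystem::path Temp = Target;
    Temp += ".tmp";  // assumed naming scheme, for illustration only

    {
        std::ofstream Out(Temp, std::ios::binary | std::ios::trunc);
        Out.write(Data.data(), static_cast<std::streamsize>(Data.size()));
        Out.flush();
        if (!Out)
        {
            // Open or write failed: drop the partial temp file, leave Target untouched.
            Out.close();
            std::error_code Ignored;
            std::filesystem::remove(Temp, Ignored);
            throw std::runtime_error("failed to write temporary file");
        }
    }  // the stream is closed here, before the rename

    std::error_code Ec;
    std::filesystem::rename(Temp, Target, Ec);
    if (Ec)
    {
        std::error_code Ignored;
        std::filesystem::remove(Temp, Ignored);
        throw std::system_error(Ec, "failed to move temporary file into place");
    }
}
```

The temporary file sits next to the target so that the rename stays on one filesystem.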
* Make sure we wait for all scheduled tasks to complete before throwing exceptions further (#662) | Dan Engelbrecht | 2024-02-28 | 1 | -2/+6
    Bugfix: We must not throw exceptions to calling function until all async work we spawned has returned
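The rule stated in the commit above (do not propagate an exception while spawned work may still be running) can be sketched with `std::async`. This is a generic illustration, not the project's worker pool or latch code; `RunAllThenRethrow` is a hypothetical name.

```cpp
#include <exception>
#include <functional>
#include <future>
#include <vector>

// Hypothetical helper: runs every task asynchronously, waits for ALL of them
// to finish, and only then rethrows the first exception that was captured.
// No task can still be touching the caller's state when the exception
// reaches the caller.
void RunAllThenRethrow(const std::vector<std::function<void()>>& Tasks)
{
    std::vector<std::future<void>> Futures;
    Futures.reserve(Tasks.size());
    for (const auto& Task : Tasks)
    {
        Futures.push_back(std::async(std::launch::async, Task));
    }

    std::exception_ptr FirstError;
    for (auto& Future : Futures)
    {
        try
        {
            Future.get();  // blocks until this task has finished
        }
        catch (...)
        {
            if (!FirstError)
            {
                FirstError = std::current_exception();
            }
        }
    }

    if (FirstError)
    {
        std::rethrow_exception(FirstError);
    }
}
```

Later failures are dropped in this sketch; a real implementation might log them or aggregate them instead.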
* improved block store logging and more gcv2 tests (#659) | Dan Engelbrecht | 2024-02-27 | 1 | -36/+53
    * improved gc/blockstore logging
    * more gcv2 tests
* Add retry with optional resume logic to HttpClient::Download (#639) | Dan Engelbrecht | 2024-01-24 | 1 | -0/+2
    - Improvement: Refactored Jupiter upstream to use HttpClient
    - Improvement: Added retry and resume logic to HttpClient
    - Improvement: Added authentication support to HttpClient
    - Improvement: Clearer logging in GCV2 compact of FileCas/BlockStore
    - Improvement: Size details in oplog import logging
* improved scrubbing of oplogs and filecas (#596) | Stefan Boberg | 2023-12-11 | 1 | -16/+76
    - Improvement: Scrub command now validates compressed buffer hashes in filecas storage (used for large chunks)
    - Improvement: Added --dry, --no-gc and --no-cas options to zen scrub command
    - Improvement: Implemented oplog scrubbing (previously was a no-op)
    - Improvement: Implemented support for running scrubbing at startup with --scrub=<options>
* reserve vectors in gcv2 upfront / load factor for robin_map (#582) | Dan Engelbrecht | 2023-12-04 | 1 | -0/+6
    * reserve vectors in gcv2 upfront
    * set max load factor for robin_map indexes to reduce memory usage
    * set min load factor for robin_map indexes to allow them to shrink
* tracing for gcv2 (#574) | Dan Engelbrecht | 2023-11-28 | 1 | -0/+6
    - Improvement: Added more trace scopes for GCv2
    - Bugfix: Make sure we can override flags to "false" when running `zen gc` command
      - `smallobjects`, `skipcid`, `skipdelete`, `verbose`
* fix missing locks/sync of log position when writing index snapshots (#572) | Dan Engelbrecht | 2023-11-27 | 1 | -2/+5
    * fix missing locks/sync of log position when writing index snapshots
    * changelog
* Add GC Cancel/Stop (#568) | Dan Engelbrecht | 2023-11-24 | 1 | -2/+20
    - GcScheduler will now cancel any running GC when it shuts down.
    - Old GC is rather limited in *when* it reacts to cancel of GC. GCv2 is more responsive.
* add command line options for compact block threshold and gc verbose (#557) | Dan Engelbrecht | 2023-11-21 | 1 | -1/+1
    - Feature: Added new options to zenserver for GC V2
      - `--gc-compactblock-threshold` GCV2 - how much of a compact block should be used to skip compacting the block, default is 90%
      - `--gc-verbose` GCV2 - enable more verbose output when running a GC pass
    - Feature: Added new options to `zen gc` command for GC V2
      - `--compactblockthreshold` GCV2 - how much of a compact block should be used to skip compacting the block, default is 90%
      - `--verbose` GCV2 - enable more verbose output when running a GC pass
    - Feature: Added new parameters for endpoint `admin/gc` (PUT)
      - `compactblockthreshold` GCV2 - how much of a compact block should be used to skip compacting the block, default is 90%
      - `verbose` GCV2 - enable more verbose output when running a GC pass
* compact separate for gc referencer (#533) | Dan Engelbrecht | 2023-11-21 | 1 | -52/+61
    - Refactor GCV2 so GcReferencer::RemoveExpiredData returns a store compactor, moving out the actual disk work from deleting items in the index.
    - Refactor GCV2 GcResult to reuse GcCompactStoreStats and GcStats
    - Make compacting of stores non-parallel to not eat all the disk I/O when running GC
* gc v2 tests (#512) | Dan Engelbrecht | 2023-11-06 | 1 | -1/+1
    * set MaxBlockCount at init
    * properly calculate total size
    * basic blockstore compact blocks test
    * correct detection of block swap
    * Use one implementation for CreateRandomBlob
    * reduce some data sets to increase speed of tests
    * reduce test time
    * rename BlockStoreCompactState::AddBlock -> BlockStoreCompactState::IncludeBlock
* individual gc stats (#506) | Dan Engelbrecht | 2023-10-30 | 1 | -36/+63
    - Feature: New parameter for endpoint `admin/gc` (GET) `details=true` which gives detailed stats on GC operation when using GC V2
    - Feature: New options for zen command `gc-status`
      - `--details` that enables the detailed output from the last GC operation when using GC V2
* New GC implementation (#459) | Dan Engelbrecht | 2023-10-30 | 1 | -1/+166
    - Feature: New garbage collection implementation, still in evaluation mode. Enabled by `--gc-v2` command line option
* added missing includes (#504) | Stefan Boberg | 2023-10-27 | 1 | -0/+4
    this change adds some includes to files which "inherit" includes from elsewhere
    this was exposed on another branch when removing some heavy dependencies from central headers