path: root/src/zenstore/gc.cpp
Commit message | Author | Age | Files | Lines
* GC - fix handling of attachment ranges, http access token expiration, lock file retry logic (#766) | Stefan Boberg | 2026-02-20 | 1 | -3/+4
  * GC - fix handling of attachment ranges
  * fix trace/log strings
  * fix HTTP access token expiration time logic
  * added missing lock retry in zenserver startup
* added early-out check in GcManager::ScrubStorage(ScrubContext& GcCtx) (#698) | Stefan Boberg | 2026-01-07 | 1 | -1/+7
  Minimises time spent doing setup work after the deadline has expired. Also added log output with deadline/timeout information.
* add otel instrumentation (#581) | Stefan Boberg | 2025-12-11 | 1 | -1/+5
  This change adds OTEL tracing to a few places:
  * Top-level application lifecycle (config/init/cleanup, main loop)
  * http.sys requests
  It also brings some otlptrace optimizations and dynamic configuration of tracing. OTLP tracing is currently always disabled.
* automatic scrub on startup (#667) | Dan Engelbrecht | 2025-11-27 | 1 | -108/+133
  - Improvement: Deeper validation of data when scrub is activated (cas/cache/project)
  - Improvement: Enabled more multithreading when running scrub operations
  - Improvement: Added a means to force a scrub operation at startup with a new release, using the ZEN_DATA_FORCE_SCRUB_VERSION variable in xmake.lua
* switch to xmake for package management (#611) | Stefan Boberg | 2025-11-07 | 1 | -1/+1
  This change removes our dependency on vcpkg for package management, in favour of bringing some code in-tree in the `thirdparty` folder as well as using the xmake built-in package management feature. For the latter, all the package definitions are maintained in the zen repo itself, in the `repo` folder.
  It should now also be easier to build the project, as it no longer depends on having the right version of vcpkg installed, which has been a common problem for people new to the codebase. Now you should only need xmake to build.
  * Bumps the xmake requirement on GitHub runners to 2.9.9 to resolve an issue where xmake on Windows invokes cmake with the `v144` toolchain, which does not exist
  * BLAKE3 is now in-tree at `thirdparty/blake3`
  * cpr is now in-tree at `thirdparty/cpr`
  * cxxopts is now in-tree at `thirdparty/cxxopts`
  * fmt is now in-tree at `thirdparty/fmt`
  * robin-map is now in-tree at `thirdparty/robin-map`
  * ryml is now in-tree at `thirdparty/ryml`
  * sol2 is now in-tree at `thirdparty/sol2`
  * spdlog is now in-tree at `thirdparty/spdlog`
  * utfcpp is now in-tree at `thirdparty/utfcpp`
  * xmake package repo definitions are in `repo`
  * Implemented support for sanitizers. ASAN is supported on Windows; TSAN, UBSAN, MSAN etc. are supported on Linux/MacOS, though I have not yet tested it extensively on MacOS
  * The zencore encryption implementation now also supports mbedTLS, which is used on MacOS, though for now we still use OpenSSL on Linux
  * crashpad
  * Bumps libcurl to 8.11.0 (from 8.8.0), which should address a rare build upload bug
* optimize blockstore flush (#614) | Dan Engelbrecht | 2025-10-27 | 1 | -1/+1
  * rework block store block flushing to only happen once, at the end of a block write, outside of locks
  * fix warning at startup if no gc.dlog file exists
* fix gc disk load graph (#610) | Dan Engelbrecht | 2025-10-24 | 1 | -3/+3
  * make sure our gc disk load graph includes the latest measurement value
* gracefully handle broken gc dlog (#606) | Dan Engelbrecht | 2025-10-24 | 1 | -0/+8
  * if gc.dlog is corrupt, remove it and start a new log
* if we are low on disk space, only run GC if it will remove any data (#603) | Dan Engelbrecht | 2025-10-23 | 1 | -89/+157
  * if we are low on disk space, only run GC if it will remove any data
  * make sure we don't treat bailing out of GC due to low disk space as success, which caused zero wait between GC passes
* remove scope in GC that prevented GC from executing (#600) | Dan Engelbrecht | 2025-10-22 | 1 | -30/+31
* fix gc state switching (#588) | Dan Engelbrecht | 2025-10-17 | 1 | -40/+38
  * fix state issue in GC thread where shutting down GC did not always block GC from running
* some bug fixes (#522) | Stefan Boberg | 2025-09-29 | 1 | -1/+2
  * fix for invalid regex in HttpBuildStoreService (triggers with the most recent MSVC version)
  * in GcScheduler, don't wait for the exit signal if exit has already been requested. This caused extended waits for shutdown in some automated tests on very fast machines, possibly also due to some behaviour change in condition_variable
  * speculative fix/workaround for an issue with TLS teardown on a secondary thread while main was tearing down trace
* issue error on second retry, not first attempt (#503) | Dan Engelbrecht | 2025-09-22 | 1 | -2/+2
* revise exception vs error (#495) | Dan Engelbrecht | 2025-09-15 | 1 | -16/+16
  - Change BadAlloc exceptions in GC to warnings
  - Add explicit ASSERT exception catch in http plugin request processing
  - Make exceptions handled in http request processing into warnings
* add EMode to WorkerTheadPool to avoid thread starvation (#492) | Dan Engelbrecht | 2025-09-10 | 1 | -90/+98
  - Improvement: Add a new mode to worker thread pools to avoid starvation of workers, which could cause long stalls due to other work being queued up. UE-305498
* oplog memory usage reduction (#482) | Dan Engelbrecht | 2025-09-04 | 1 | -0/+2
  - Improvement: For the projectstore oplog GET operation, only read basic information and release it if the oplog is not already open, to reduce memory usage when listing oplogs in the web UI
  - Improvement: Reduce memory usage for oplog Op address lookup
  Refactored Oplog::EState -> Oplog::EMode and made sure we open data files in read-only mode when EMode::kBasicReadOnly is used.
* clean up trace options parsing (#473) | Dan Engelbrecht | 2025-08-22 | 1 | -0/+2
  * clean up trace command line options
  * explicitly shut down worker pools
  * some additional startup trace scopes
* frequent disk space check (#407) | Dan Engelbrecht | 2025-05-27 | 1 | -25/+84
  * check the low disk space condition more frequently and trigger GC when the low water mark is reached
  * show waited time when waiting for a zenserver instance to exit
* make RemoveExpiredData and PreCache serial to reduce CPU overhead / lock contention (#385) | Dan Engelbrecht | 2025-05-07 | 1 | -85/+61
  * make RemoveExpiredData and PreCache serial to reduce CPU overhead / lock contention
* make OOD and OOM in gc non critical (#381) | Dan Engelbrecht | 2025-05-05 | 1 | -27/+218
  * out-of-memory (OOM) and out-of-disk (OOD) exceptions in GC are now treated as warnings instead of errors
* silence Out Of Disk errors to sentry (#378) | Dan Engelbrecht | 2025-05-05 | 1 | -48/+56
  * block writing GC state/info if disk is full
  * fix if/else on error while writing GC state
* long filename support (#330) | Dan Engelbrecht | 2025-03-31 | 1 | -10/+10
  - Bugfix: Long file paths now work correctly on Windows
* zen build cache service (#318) | Dan Engelbrecht | 2025-03-26 | 1 | -2/+19
  - **EXPERIMENTAL** `zen builds`
  - Feature: `--zen-cache-host` option for `upload` and `download` operations to use a zenserver host `/builds` endpoint for storing build blobs and blob metadata
  - Feature: New `/builds` endpoint for caching build blobs and blob metadata
    - `/builds/{namespace}/{bucket}/{buildid}/blobs/{hash}` `GET` and `PUT` methods for storing and fetching blobs
    - `/builds/{namespace}/{bucket}/{buildid}/blobs/putBlobMetadata` `POST` method for storing metadata about blobs
    - `/builds/{namespace}/{bucket}/{buildid}/blobs/getBlobMetadata` `POST` method for fetching metadata about blobs
    - `/builds/{namespace}/{bucket}/{buildid}/blobs/exists` `POST` method for checking existence of blobs
* Suppress progress report callback if oplog import detects zero op oplog (#271) | Dan Engelbrecht | 2025-01-13 | 1 | -6/+16
  * Suppress progress report callback if oplog import detects oplog with zero ops
  * output error code when catching system errors
* more memory tagging and fixes (#263) | Stefan Boberg | 2024-12-16 | 1 | -0/+16
  This change adds more instrumentation for memory tracking, so that as little as possible comes through as Unknown in Insights analysis.
* ODR violation fix | Stefan Boberg | 2024-12-03 | 1 | -2/+2
* added support for dynamic LLM tags (#245) | Stefan Boberg | 2024-12-02 | 1 | -0/+30
  * added FLLMTag, which can be used to register memory tags outside of core
  * changed `UE_MEMSCOPE` -> `ZEN_MEMSCOPE` for consistency
  * instrumented some subsystems with dynamic tags
* add missing projectstore expire time in gc log (#227) | Dan Engelbrecht | 2024-11-25 | 1 | -0/+1
* oplog prep gc fix (#216) | Dan Engelbrecht | 2024-11-15 | 1 | -185/+415
  - Added option gc-validation to zenserver, which checks all oplogs for missing references after a full GC. Enabled by default.
  - Feature: Added option gc-validation to the zen gc command to control reference validation. Enabled by default.
  - Added more details in the post-GC log.
  - Fixed race condition in oplog writes which could cause used attachments to be incorrectly removed by GC
* Use a smaller thread pool during pre-cache phase of GC to reduce memory pressure (#205) | Dan Engelbrecht | 2024-10-22 | 1 | -7/+11
* remove gc v1 (#121) | Dan Engelbrecht | 2024-10-03 | 1 | -506/+19
  * kill gc v1
  * block use of gc v1 from zen command line
  * warn and flip to gcv2 if --gc-v2=false is specified for zenserver
* Porject -> Project | Stefan Boberg | 2024-10-02 | 1 | -1/+1
* optimize gc reference sort (#179) | Dan Engelbrecht | 2024-10-01 | 1 | -33/+37
  - Do a single call to memcpy when fetching attachments from the meta store in GC
  - Use a small lambda when calling std::sort in FilterReferences (enables inlining of the comparison function)
  - Use a single function for < and == comparison in KeepUnusedReferences
* use alternate IoHash comparison function (#177) (tag: v5.5.8-pre5) | Dan Engelbrecht | 2024-09-30 | 1 | -4/+23
  * Use alternate IoHash comparison function - reduces KeepUnusedReferences execution time by ~20%
* gc command attachment options (#176) | Dan Engelbrecht | 2024-09-30 | 1 | -7/+30
  * zen command - add options to control meta data cache when triggering gc
* Add `gc-attachment-passes` option to zenserver (#167) | Dan Engelbrecht | 2024-09-25 | 1 | -36/+150
  Added option `gc-attachment-passes` to zenserver.
  Cleaned up GCv2 start and stop logs and added an identifier to easily find the matching start and end of a GC pass in the log file.
  Fixed project store not properly sorting references found during the lock phase.
* gc unused refactor (#165) | Dan Engelbrecht | 2024-09-23 | 1 | -16/+242
  * optimize IoHash and OId comparisons
  * refactor filtering of unused references
  * add attachment filtering to gc
* move gc logs to gc logger (#142) | Dan Engelbrecht | 2024-09-04 | 1 | -1/+1
  - Improvement: Move GC logging in callback functions into the "gc" context
* separate worker pools into burst/background to avoid background jobs blocking client requests (#134) | Dan Engelbrecht | 2024-08-22 | 1 | -2/+2
* if disk space is low, set the last gc time to avoid spamming retries (#124) | Dan Engelbrecht | 2024-08-19 | 1 | -0/+2
  * if disk space is low, set the last gc time to avoid spamming retries
* improved logging, removing unimportant information (#116) | Dan Engelbrecht | 2024-08-14 | 1 | -24/+33
* hardening and reduced spam from GC on failure (#112) | Dan Engelbrecht | 2024-08-14 | 1 | -141/+229
  * Retry writing GC state if it fails, to handle transient problems
  * If a GC operation fails, demote errors to warnings on consecutive failures
* add gc single threaded option (#104) | Dan Engelbrecht | 2024-08-07 | 1 | -4/+9
  * add option to force gcv2 to run single threaded
* Make sure we monitor for new projects, oplogs, namespaces and buckets during GCv2 (#93) | Dan Engelbrecht | 2024-06-13 | 1 | -39/+85
  - Bugfix: Make sure we monitor and include new projects/oplogs created during GCv2
  - Bugfix: Make sure we monitor and include new namespaces/cache buckets created during GCv2
* use write and move in place for safer writing of files (#70) | Dan Engelbrecht | 2024-05-02 | 1 | -1/+1
  * use write and move in place for safer writing of files
* safer gcv2 on error (#60) | Dan Engelbrecht | 2024-04-24 | 1 | -1/+19
  - Bugfix: Harden GCv2 when errors occur and gracefully abort the GC operation on error
* improved assert (#37) | Dan Engelbrecht | 2024-04-04 | 1 | -16/+16
  - Improvement: Add file and line to ASSERT exceptions
  - Improvement: Capture the call stack when throwing assert exceptions, and log/output it at important places to provide more context to the caller
* Use multithreading to fetch size/rawsize of entries in `/prj/{project}/oplog/{log}/chunkinfos` and `/prj/{project}/oplog/{log}/files` (#30) | Dan Engelbrecht | 2024-03-28 | 1 | -2/+2
  - Improvement: Use multithreading to fetch size/rawsize of entries in `/prj/{project}/oplog/{log}/chunkinfos` and `/prj/{project}/oplog/{log}/files`
  - Improvement: Add `GetMediumWorkerPool()` in addition to `LargeWorkerPool()` and `SmallWorkerPool()`
* Make sure we wait for all scheduled tasks to complete before propagating exceptions (#662) | Dan Engelbrecht | 2024-02-28 | 1 | -48/+88
  Bugfix: We must not throw exceptions to the calling function until all async work we spawned has returned
* Don't capture local variables in loop by reference (#623) | Dan Engelbrecht | 2023-12-19 | 1 | -27/+27
  * Don't capture local variables in loop by reference
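Two of the fixes in this log, #623 (don't capture loop locals by reference) and #662 (wait for all scheduled tasks before throwing), describe rules for tasks fanned out from a loop. The sketch below illustrates both under stated assumptions; it is not zen's code. `TotalNameLength` is a hypothetical helper, and `std::async` stands in for zen's worker pools. Each task captures its loop-local string by value, and the caller joins every task, remembering only the first exception, before rethrowing it.

```cpp
#include <cstddef>
#include <exception>
#include <future>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: spawns one task per name, then joins all of them.
// std::async is a stand-in for zen's worker pools (an assumption).
std::size_t TotalNameLength(const std::vector<std::string>& Names)
{
    std::vector<std::future<std::size_t>> Tasks;
    Tasks.reserve(Names.size());
    for (const std::string& Source : Names)
    {
        // Loop-local copy; it is destroyed at the end of every iteration.
        std::string Name = Source;
        // Capture by value ([Name]). Capturing by reference ([&Name]) would
        // let the task read a string that no longer exists (see #623).
        Tasks.push_back(std::async(std::launch::async, [Name]() -> std::size_t {
            if (Name.empty())
                throw std::invalid_argument("empty name");
            return Name.size();
        }));
    }

    // Join every task before propagating a failure (see #662): keep the
    // first exception, keep waiting on the rest, then rethrow.
    std::exception_ptr FirstError;
    std::size_t Total = 0;
    for (std::future<std::size_t>& Task : Tasks)
    {
        try
        {
            Total += Task.get();
        }
        catch (...)
        {
            if (!FirstError)
                FirstError = std::current_exception();
        }
    }
    if (FirstError)
        std::rethrow_exception(FirstError);
    return Total;
}
```

Futures returned by `std::async` already block in their destructors, so with `std::async` the explicit join is partly redundant; the first-error-then-rethrow shape matters most with pool handles that do not block when destroyed.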