aboutsummaryrefslogtreecommitdiff
path: root/src/zenstore/compactcas.cpp
Commit message (Collapse)AuthorAgeFilesLines
* improved assert (#37)Dan Engelbrecht2024-04-041-2/+2
| | | | - Improvement: Add file and line to ASSERT exceptions - Improvement: Catch call stack when throwing assert exceptions and log/output call stack at important places to provide more context to caller
* fix potential partially written files (#2)Dan Engelbrecht2024-03-131-3/+12
| | | | * Make sure WriteFile() does not leave incomplete files * use TemporaryFile and MoveTemporaryIntoPlace to avoid leaving partial files on error
* Make sure we wait for all scheduled tasks to complete before throwing ↵Dan Engelbrecht2024-02-281-2/+6
| | | | | exceptions further (#662) Bugfix: We must not throw exceptions to calling function until all async work we spawned has returned
* improved block store logging and more gcv2 tests (#659)Dan Engelbrecht2024-02-271-50/+96
| | | | * improved gc/blockstore logging * more gcv2 tests
* fix ChunkIndexToChunkHash indexing (#621)Stefan Boberg2023-12-191-1/+1
| | | would previously index into a reserved-but-not-sized vector which is bad but not crash-inducing bad
* improve trace (#606)Dan Engelbrecht2023-12-131-0/+1
| | | | | * Adding some more trace scopes for better visiblity * Removed spammy trace scope when replaying oplogs * Remove "::Disk" from trace scopes - redundant now that we have merge disk and memory layers
* improved scrubbing of oplogs and filecas (#596)Stefan Boberg2023-12-111-26/+12
| | | | | | - Improvement: Scrub command now validates compressed buffer hashes in filecas storage (used for large chunks) - Improvement: Added --dry, --no-gc and --no-cas options to zen scrub command - Improvement: Implemented oplog scrubbing (previously was a no-op) - Improvement: Implemented support for running scrubbint at startup with --scrub=<options>
* fixed bug in CasContainerStrategy::ReadIndexFile (#595)Stefan Boberg2023-12-071-1/+3
| | | this was introduced in a recent optimization and would cause CAS items to not be found after a shutdown/restart cycle
* reserve vectors in gcv2 upfront / load factor for robin_map (#582)Dan Engelbrecht2023-12-041-6/+15
| | | | | * reserve vectors in gcv2 upfront * set max load factor for robin_map indexes to reduce memory usage * set min load factor for robin_map indexes to allow them to shrink
* use 32 bit offset and size in BlockStoreLocation (#581)Dan Engelbrecht2023-12-011-1/+1
| | | - Improvement: Reduce memory usage in GC and diskbucket flush
* tracing for gcv2 (#574)Dan Engelbrecht2023-11-281-0/+6
| | | | | | - Improvement: Added more trace scopes for GCv2 - Bugfix: Make sure we can override flags to "false" when running `zen gc` commmand - `smallobjects`, `skipcid`, `skipdelete`, `verbose`
* fix missing locks/sync of log position when writing index snapshots (#572)Dan Engelbrecht2023-11-271-2/+4
| | | | * fix missing locks/sync of log position when writing index snapshots * changelog
* optimized index snapshot reading/writing (#561)Stefan Boberg2023-11-271-18/+35
| | | | | the previous implementation of in-memory index snapshots serialise data to memory before writing to disk and vice versa when reading. This leads to some memory spikes which end up pushing useful data out of system cache and also cause stalls on I/O operations. this change moves more code to a streaming serialisation approach which scales better from a memory usage perspective and also performs much better
* Add GC Cancel/Stop (#568)Dan Engelbrecht2023-11-241-0/+19
| | | | - GcScheduler will now cancel any running GC when it shuts down. - Old GC is rather limited in *when* it reacts to cancel of GC. GCv2 is more responsive.
* reduce work when there are no blocks to compact (#558)Dan Engelbrecht2023-11-221-51/+54
| | | | * reduce work when there are no blocks to compact * fix lock scopes
* add command line options for compact block threshold and gc verbose (#557)Dan Engelbrecht2023-11-211-7/+13
| | | | | | | | | | | - Feature: Added new options to zenserver for GC V2 - `--gc-compactblock-threshold` GCV2 - how much of a compact block should be used to skip compacting the block, default is 90% - `--gc-verbose` GCV2 - enable more verbose output when running a GC pass - Feature: Added new options to `zen gc` command for GC V2 - `--compactblockthreshold` GCV2 - how much of a compact block should be used to skip compacting the block, default is 90% - `--verbose` GCV2 - enable more verbose output when running a GC pass - Feature: Added new parameters for endpoint `admin/gc` (PUT) - `compactblockthreshold` GCV2 - how much of a compact block should be used to skip compacting the block, default is 90% - `verbose` GCV2 - enable more verbose output when running a GC pass
* compact separate for gc referencer (#533)Dan Engelbrecht2023-11-211-113/+106
| | | | | - Refactor GCV2 so GcReferencer::RemoveExpiredData returns a store compactor, moving out the actual disk work from deleting items in the index. - Refactor GCV2 GcResult to reuse GcCompactStoreStats and GcStats - Make Compacting of stores non-parallell to not eat all the disk I/O when running GC
* disk layer gc and error/warnings cleanup (#515)Dan Engelbrecht2023-11-081-2/+2
| | | | | | | - Improvement: Use GC reserve when writing index/manifest for a disk cache bucket when disk is low when available - Improvement: Demote errors to warning for issues that are not critical and we handle gracefully - Improvement: Treat more out of memory errors from windows as Out Of Memory errors Fixed wrong sizeof() statement for compactcas index (luckily the two structs are of same size)
* gc v2 tests (#512)Dan Engelbrecht2023-11-061-27/+9
| | | | | | | | | | * set MaxBlockCount at init * properly calculate total size * basic blockstore compact blocks test * correct detection of block swap * Use one implementation for CreateRandomBlob * reduce some data sets to increase speed of tests * reduce test time * rename BlockStoreCompactState::AddBlock -> BlockStoreCompactState::IncludeBlock
* individual gc stats (#506)Dan Engelbrecht2023-10-301-33/+56
| | | | | - Feature: New parameter for endpoint `admin/gc` (GET) `details=true` which gives details stats on GC operation when using GC V2 - Feature: New options for zen command `gc-status` - `--details` that enables the detailed output from the last GC operation when using GC V2
* New GC implementation (#459)Dan Engelbrecht2023-10-301-0/+217
| | | - Feature: New garbage collection implementation, still in evaluation mode. Enabled by `--gc-v2` command line option
* Remove any unreferenced blocks in block store on open (#492)Dan Engelbrecht2023-10-231-1/+1
| | | * Remove any unreferenced blocks in block store on open
* Don't prune block locations due to missing blocks a startup (#487)Dan Engelbrecht2023-10-201-46/+5
| | | | | | * Don't prune block locations due to missing blocks a startup This makes the behaviour consistent with FileCas - you can have an index that is not fully backed by data. Asking for a location that is not backed by data results in getting an empty result back Also, don't try to GC blocks that are unknown to the block store at the time of snapshot (to avoid removing data that comes in after GatherReferences in GC)
* clean up GcContributor and GcStorage to be pure interfaces (#485)Dan Engelbrecht2023-10-201-1/+3
|
* Add --skip-delete option to gc command (#484)Dan Engelbrecht2023-10-201-1/+1
| | | | - Feature: Add `--skip-delete` option to gc command - Bugfix: Fix implementation when claiming GC reserve during GC
* Fix curruption of disk cache bucket index on GC (#448)Dan Engelbrecht2023-10-051-26/+32
| | | | | | | | | * make sure we hold the index lock when reading payload data in reclaim space * don't use index snapshot when updating index in reclaim space * check that things have not moved under our feet * don't touch m_Payloads without a lock * start write block index on the highest block index * we don't need to bump writeblockindex when stopping write to a block, we will bump appropriately when we start a new block * changelog
* refactor comapactcas index (#443)Dan Engelbrecht2023-10-041-16/+57
| | | | | - Bugfix: Fix scrub messing up payload and access time in disk cache bucket when compacting index - Improvement: Split up disk cache bucket index into hash lookup and payload array to improve performance - Improvement: Reserve space up front for compact binary output when saving cache bucket manifest to improve performance
* faster accesstime save restore (#439)Dan Engelbrecht2023-10-031-79/+66
| | | | | | | | | | - Improvement: Reduce time a cache bucket is locked for write when flushing/garbage collecting - Change format for faster read/write and reduced size on disk - Don't lock index while writing manifest to disk - Skip garbage collect if we are currently in a Flush operation - BlockStore::Flush no longer terminates currently writing block - Garbage collect references to currently writing block but keep the block as new data may be added - Fix BlockStore::Prune used disk space calculation - Don't materialize data in filecas when we just need the size
* Handle OOM and OOD more gracefully to not spam Sentry with error reports (#434)Dan Engelbrecht2023-10-021-8/+23
| | | | | | - Improvement: Catch Out Of Memory and Out Of Disk exceptions and report back to reqeuster without reporting an error to Sentry - Improvement: If creating bucket fails when storing and item in the structured cache, log a warning and propagate error to requester without reporting an error to Sentry - Improvement: Make an explicit flush of the active block written to in blockstore flush - Improvement: Make sure cache and cas MakeIndexSnapshot does not throw exception on failure which would cause and abnormal termniation at exit
* lightweight gc (#431)Dan Engelbrecht2023-10-021-0/+5
| | | | | | - Feature: Add lightweight GC that only removes items from cache/project store without cleaning up data referenced in Cid store - Add `skipcid` parameter to http endpoint `admin/gc`, defaults to "false" - Add `--skipcid` option to `zen gc` command, defaults to false - Add `--gc-lightweight-interval-seconds` option to zenserver
* add more trace scopes (#362)Dan Engelbrecht2023-09-151-13/+21
| | | | | * more trace scopes * Make sure ReplayLogEntries uses the correct size for oplog buffer * changelog
* use robinmap in compact cas (#368)Dan Engelbrecht2023-08-211-0/+1
| | | | | * Use robin-map in compactcas for 30% faster CasContainerStrategy::CollectGarbage * use robin_set in ProjectStore::Oplog::GatherReferences and BlockStore::ReclaimSpace * changelog
* Content scrubbing (#271)Stefan Boberg2023-05-161-66/+73
| | | Added zen scrub command which may be triggered via the zen CLI helper. This traverses storage and validates contents either by content hash and/or by structure. If unexpected data is encountered it is invalidated.
* Additional trace instrumentation (#312)Stefan Boberg2023-05-161-0/+13
| | | | | | | | | * added trace instrumentation to upstreamcache * added asio trace instrumentation * added trace annotations for project store * added trace annotations for BlockStore * added trace annotations for HttpClient * added trace annotations for CAS/GC
* Add `--gc-projectstore-duration-seconds` option (#281)Dan Engelbrecht2023-05-161-11/+18
| | | | | | * Add `--gc-projectstore-duration-seconds` option * Cleanup lua gc options parsing * Remove dead configuration values * changelog
* added ScrubStorage to GcStorage base classStefan Boberg2023-05-151-3/+13
|
* minor GC API cleanupStefan Boberg2023-05-151-1/+1
| | | | | Scrub -> ScrubStorage Trigger -> TriggerGc (to make relationship to TriggerScrub clearer)
* WARN level log if we can't write snapshot/manifest/access times (#288)Dan Engelbrecht2023-05-111-1/+1
|
* clean up log/index reading and fix incorrect logging about bad log files (#286)Dan Engelbrecht2023-05-101-82/+96
|
* Validate that entries points inside valid blocks at startup (#280)Dan Engelbrecht2023-05-091-5/+42
| | | | | * Separate initialization of block store from pruning of unknown blocks * Validate that entries points inside valid blocks
* moved source directories into `/src` (#264)Stefan Boberg2023-05-021-0/+1511
* moved source directories into `/src` * updated bundle.lua for new `src` path * moved some docs, icon * removed old test trees