path: root/src/zenserver/projectstore/projectstore.cpp
Commit message | Author | Age | Files | Lines
...
* fix iterate chunks crash (#86) | Dan Engelbrecht | 2024-05-27 | 1 | -0/+1
    * fix worklatch count in Oplog::IterateChunks
* fix zero size attachment replies (#69) | Dan Engelbrecht | 2024-05-02 | 1 | -3/+10
    - Bugfix: Don't try to respond with zero size partial cache value when partial size is zero
    - Improvement: Added more validation of data read from cache / cas
* use write and move in place for safer writing of files (#70) | Dan Engelbrecht | 2024-05-02 | 1 | -3/+3
    * use write and move in place for safer writing of files
* fix get project files loop (#68) | Dan Engelbrecht | 2024-04-30 | 1 | -23/+20
    - Bugfix: Remove extra loop causing GetProjectFiles for project store to find all chunks once for each chunk found
    - Bugfix: Don't capture ChunkIndex variable in CasImpl::IterateChunks by reference as it causes crash
    - Improvement: Make FileCasStrategy::IterateChunks (optionally) multithreaded (improves GetProjectFiles performance)
* oplog iterate chunks content type (#65) | Dan Engelbrecht | 2024-04-26 | 1 | -40/+112
    - Bugfix: Properly set content type of chunks fetched from CidStore
    - Improvement: Add IterateChunks(std::span<Oid>) for better performance in get oplog
* fix oplog import during gcv2 (#62) v5.5.0-pre3 v5.5.0-pre2 | Dan Engelbrecht | 2024-04-25 | 1 | -39/+44
    - Bugfix: Always pre-cache oplog when creating project store GCv2 referencer
    - Bugfix: Correctly capture attachments imported with oplog to avoid them being GCd before oplog is written
* iterate cas chunks (#59) | Dan Engelbrecht | 2024-04-24 | 1 | -28/+27
    - Improvement: Reworked GetChunkInfos in oplog store to reduce disk thrashing and improve performance
* safer gcv2 on error (#60) | Dan Engelbrecht | 2024-04-24 | 1 | -4/+13
    - Bugfix: Harden GCv2 when errors occur and gracefully abort GC operation on error
* Bugfix: Only disable oplog update capture if we have started it (#58) | Dan Engelbrecht | 2024-04-24 | 1 | -1/+6
* InsertChunks for CAS store (#55) | Dan Engelbrecht | 2024-04-22 | 1 | -15/+44
    - Improvement: Add batching when writing multiple small chunks to block store - decreases I/O load significantly on oplog import
* import oplog improvements (#54) | Dan Engelbrecht | 2024-04-20 | 1 | -32/+1
    * report down/up transfer speed during progress
    * add disk buffering in http client
    * offload block decoding and chunk writing from network worker pool threads; add block hash verification for blocks received at oplog import
    * separate download-latch from write-latch to get more accurate download speed
    * check headers when downloading with http client to go directly to file writing for large payloads
    * we must clear write callback even if we only provide it as an argument to the Download() call
    * make timeout optional in AddSponsorProcess
    * check return codes when creating windows threadpool
* safer oplog import (#52) de/safer-oplog-import | Dan Engelbrecht | 2024-04-18 | 1 | -8/+71
    * reference cache gc update capture
    * When importing oplogs we now import all attachments first and (optionally clean) write the oplog on success
* remote project store stats (#44) | Dan Engelbrecht | 2024-04-10 | 1 | -0/+1
    * add remote oplog store statistics
    * block chunking when uploading oplog to zenserver (mirroring)
    * make sure we can move temporary dechunked file into cas store
* improved assert (#37) | Dan Engelbrecht | 2024-04-04 | 1 | -5/+5
    - Improvement: Add file and line to ASSERT exceptions
    - Improvement: Catch call stack when throwing assert exceptions and log/output call stack at important places to provide more context to caller
* zenremoteprojectstore with httpclient (#35) | Dan Engelbrecht | 2024-04-03 | 1 | -1/+1
    - Bugfix: Fix log of Success/Failure for oplog import
    - Improvement: Use HttpClient when doing oplog export/import with a zenserver as a remote target. Includes retry logic
    - Improvement: Increase the retry count to 4 (5 attempts in total) when talking to Jupiter for oplog export/import
* Use multithreading to fetch size/rawsize of entries in `/prj/{project}/oplog/{log}/chunkinfos` and `/prj/{project}/oplog/{log}/files` (#30) | Dan Engelbrecht | 2024-03-28 | 1 | -58/+161
    - Improvement: Use multithreading to fetch size/rawsize of entries in `/prj/{project}/oplog/{log}/chunkinfos` and `/prj/{project}/oplog/{log}/files`
    - Improvement: Add `GetMediumWorkerPool()` in addition to `LargeWorkerPool()` and `SmallWorkerPool()`
* add "fieldnames" query param for GetProjectFiles/GetProjectChunkInfos (#29) | Dan Engelbrecht | 2024-03-28 | 1 | -13/+84
    - Improvement: It is now possible to control which fields to include in a `/prj/{project}/oplog/{log}/chunkinfos` request by adding a comma-delimited list of field names for the `fieldnames` parameter
        - Default fields are: `id`, `rawhash` and `rawsize` (translates to `?fieldnames=id,rawhash,rawsize`)
        - Use `?fieldnames=*` to get all the fields
    - Improvement: It is now possible to control which fields to include in a `/prj/{project}/oplog/{log}/files` request by adding a comma-delimited list of field names for the `fieldnames` parameter
        - Default fields are: `id`, `clientpath` and `serverpath` (translates to `?fieldnames=id,clientpath,serverpath`), `filter=client` only applies if `fieldnames` is not given as a parameter
        - Use `?fieldnames=*` to get all the fields
* Get raw size for compressed chunks correctly for `/prj/{project}/oplog/{log}/chunkinfos` (#27) | Dan Engelbrecht | 2024-03-27 | 1 | -1/+7
* consistent paths encoding (#24) | Dan Engelbrecht | 2024-03-25 | 1 | -21/+21
    * Don't encode filesystem path to UTF8 unless stored in compactbinary string
    * Be consistent where we encode/decode paths to UTF8
* non memory copy compressed range (#13) | Dan Engelbrecht | 2024-03-20 | 1 | -26/+30
    * Add CompressedBuffer::GetRange that references source data rather than make a memory copy
    * Use Compressed.CopyRange in project store GetChunkRange
    * docs for CompressedBuffer::CopyRange and CompressedBuffer::GetRange
* special treatment large oplog attachments v2 (#5) | Dan Engelbrecht | 2024-03-14 | 1 | -30/+225
    - Bugfix: Install Ctrl+C handler earlier when doing `zen oplog-export` and `zen oplog-export` to properly cancel jobs
    - Improvement: Add ability to block a set of CAS entries from GC in project store
    - Improvement: Large attachments and loose files are now split into smaller chunks and stored in blocks during oplog export
* fix potential partially written files (#2) | Dan Engelbrecht | 2024-03-13 | 1 | -8/+2
    * Make sure WriteFile() does not leave incomplete files
    * use TemporaryFile and MoveTemporaryIntoPlace to avoid leaving partial files on error
* Make sure we wait for all scheduled tasks to complete before throwing exceptions further (#662) | Dan Engelbrecht | 2024-02-28 | 1 | -1/+3
    Bugfix: We must not throw exceptions to calling function until all async work we spawned has returned
* Keep track of added ops during GCV2 instead of rescanning full oplog when added ops are detected (#652) | Dan Engelbrecht | 2024-02-13 | 1 | -8/+23
* compress large attachments on demand (#647) | Dan Engelbrecht | 2024-02-05 | 1 | -32/+151
    - Improvement: Speed up oplog export by fetching/compressing big attachments on demand
    - Improvement: Speed up oplog export by batch-fetching small attachments
    - Improvement: Speed up oplog import by batching writes of oplog ops
    - Improvement: Tweak oplog export default block size and embed size limit
    - Improvement: Add more messaging and progress during oplog import/export
* respond with BadRequest result instead of throwing exception on bad request input (#648) | Dan Engelbrecht | 2024-02-05 | 1 | -2/+12
* improve oplog export logging (#644) | Dan Engelbrecht | 2024-01-31 | 1 | -4/+0
    - Improvement: More details in oplog import/export logs
    - Improvement: Switch from Download to Get when fetching Refs from Jupiter as they can't be resumed anyway and streaming to disk is redundant
    - Bugfix: Make sure we clear read callback when doing Put in HttpClient to avoid timeout due to not sending data when reusing sessions
    - Bugfix: Respect `--ignore-missing-attachments` in `oplog-export` command when loose file is missing on disk
* add ignore-missing-attachments option to oplog export (debugging tool) (#641) | Dan Engelbrecht | 2024-01-25 | 1 | -14/+12
    * add ignore-missing-attachments option to oplog export (debugging tool)
    * add more status codes to do retry for in http client
    * add missing X-Jupiter-IoHash header for jupiter PutRef
    * reduce oplog block size to reduce amount of redundant chunks to download
    * improved logging
* Add retry with optional resume logic to HttpClient::Download (#639) | Dan Engelbrecht | 2024-01-24 | 1 | -4/+8
    - Improvement: Refactored Jupiter upstream to use HttpClient
    - Improvement: Added retry and resume logic to HttpClient
    - Improvement: Added authentication support to HttpClient
    - Improvement: Clearer logging in GCV2 compact of FileCas/BlockStore
    - Improvement: Size details in oplog import logging
* oplog import/export improvements (#634) | Dan Engelbrecht | 2024-01-23 | 1 | -59/+57
    * improve feedback from oplog import/export
    * improve oplog save performance
* add --ignore-missing-attachments to oplog-import command (#637) | Dan Engelbrecht | 2024-01-22 | 1 | -15/+23
* separate RPC processing from HTTP processing (#626) | Stefan Boberg | 2023-12-20 | 1 | -1/+1
    * moved all RPC processing from HttpStructuredCacheService into separate CacheRpcHandler class in zenstore
    * move package marshaling to zenutil. was previously in zenhttp/httpshared but it's useful in other contexts as well where we don't want to depend on zenhttp
    * introduced UpstreamCacheClient, this provides a subset of functions on UpstreamCache and lives in zenstore
* ensure we can build without trace (#619) | Stefan Boberg | 2023-12-19 | 1 | -2/+2
    `xmake config -zentrace=n` would previously not build cleanly
* improve trace (#606) | Dan Engelbrecht | 2023-12-13 | 1 | -2/+0
    * Adding some more trace scopes for better visibility
    * Removed spammy trace scope when replaying oplogs
    * Remove "::Disk" from trace scopes - redundant now that we have merged disk and memory layers
* improved scrubbing of oplogs and filecas (#596) | Stefan Boberg | 2023-12-11 | 1 | -73/+188
    - Improvement: Scrub command now validates compressed buffer hashes in filecas storage (used for large chunks)
    - Improvement: Added --dry, --no-gc and --no-cas options to zen scrub command
    - Improvement: Implemented oplog scrubbing (previously was a no-op)
    - Improvement: Implemented support for running scrubbing at startup with --scrub=<options>
* Change naming to ChunkInfos instead of Chunks | zousar | 2023-12-06 | 1 | -3/+3
* Ran precommit | zousar | 2023-12-05 | 1 | -6/+2
* Get hash when retrieving chunks | zousar | 2023-12-05 | 1 | -7/+31
    Also changes the returned fields for each chunk from size->rawsize. Backwards compatibility is not a concern as this was unused in past zenserver releases.
* Add endpoint for all chunk infos | zousar | 2023-12-01 | 1 | -8/+4
    Add endpoint for querying all chunk infos in an oplog.
* add separate PreCache step for GcReferenceChecker (#578) | Dan Engelbrecht | 2023-12-01 | 1 | -14/+29
    - Improvement: GCv2: Use separate PreCache step to improve concurrency when checking references
    - Improvement: GCv2: Improved verbose logging
    - Improvement: GCv2: Sort chunks to read by block/offset when finding references
    - Improvement: GCv2: Exit as soon as no more unreferenced items are left
* tracing for gcv2 (#574) | Dan Engelbrecht | 2023-11-28 | 1 | -0/+12
    - Improvement: Added more trace scopes for GCv2
    - Bugfix: Make sure we can override flags to "false" when running `zen gc` command
        - `smallobjects`, `skipcid`, `skipdelete`, `verbose`
* gcv2 tests for project store and bugfixes (#571) | Dan Engelbrecht | 2023-11-27 | 1 | -84/+209
    * gcv2 tests for project store and bugfixes
* gc stop command (#569) v0.2.36-pre2 | Dan Engelbrecht | 2023-11-27 | 1 | -2/+21
    - Feature: New endpoint `/admin/gc-stop` to cancel a running garbage collect operation
    - Feature: Added `zen gc-stop` command to cancel a running garbage collect operation
    - Bugfix: GCv2 - make sure to discover all projects and oplogs before checking for expired data
* add command line options for compact block threshold and gc verbose (#557) | Dan Engelbrecht | 2023-11-21 | 1 | -1/+17
    - Feature: Added new options to zenserver for GC V2
        - `--gc-compactblock-threshold` GCV2 - how much of a compact block should be used to skip compacting the block, default is 90%
        - `--gc-verbose` GCV2 - enable more verbose output when running a GC pass
    - Feature: Added new options to `zen gc` command for GC V2
        - `--compactblockthreshold` GCV2 - how much of a compact block should be used to skip compacting the block, default is 90%
        - `--verbose` GCV2 - enable more verbose output when running a GC pass
    - Feature: Added new parameters for endpoint `admin/gc` (PUT)
        - `compactblockthreshold` GCV2 - how much of a compact block should be used to skip compacting the block, default is 90%
        - `verbose` GCV2 - enable more verbose output when running a GC pass
* compact separate for gc referencer (#533) | Dan Engelbrecht | 2023-11-21 | 1 | -84/+199
    - Refactor GCV2 so GcReferencer::RemoveExpiredData returns a store compactor, moving out the actual disk work from deleting items in the index.
    - Refactor GCV2 GcResult to reuse GcCompactStoreStats and GcStats
    - Make Compacting of stores non-parallel to not eat all the disk I/O when running GC
* spdlog implementation hiding (#498) | Stefan Boberg | 2023-11-06 | 1 | -2/+2
    this change aims to hide logging internals from client code, in order to make it easier to extend and take more control over the logging process in the future.
    As a bonus side effect, the generated code is much tighter (net delta around 2.5% on the resulting executable which includes lots of thirdparty code) and should take less time to compile and link.
    Client usage via macros is pretty much unchanged. The main exposure client code had to spdlog internals before was the use of custom loggers per subsystem, where it would be common to have `spdlog::logger` references to keep a reference to a logger within a class. This is now replaced by `zen::LoggerRef` which currently simply encapsulates an actual `spdlog::logger` instance, but this is intended to be an implementation detail which will change in the future.
    The way the change works is that we now handle any formatting of log messages in the zencore logging subsystem instead of relying on `spdlog` to manage this. We use the `fmt` library to do the formatting which means the client usage is identical to using `spdlog`. The formatted message is then forwarded onto any sinks etc which are still implemented via `spdlog`.
* gc v2 tests (#512) | Dan Engelbrecht | 2023-11-06 | 1 | -12/+1
    * set MaxBlockCount at init
    * properly calculate total size
    * basic blockstore compact blocks test
    * correct detection of block swap
    * Use one implementation for CreateRandomBlob
    * reduce some data sets to increase speed of tests
    * reduce test time
    * rename BlockStoreCompactState::AddBlock -> BlockStoreCompactState::IncludeBlock
* individual gc stats (#506) | Dan Engelbrecht | 2023-10-30 | 1 | -29/+62
    - Feature: New parameter for endpoint `admin/gc` (GET) `details=true` which gives detailed stats on GC operation when using GC V2
    - Feature: New options for zen command `gc-status`
        - `--details` that enables the detailed output from the last GC operation when using GC V2
* New GC implementation (#459) | Dan Engelbrecht | 2023-10-30 | 1 | -0/+258
    - Feature: New garbage collection implementation, still in evaluation mode. Enabled by `--gc-v2` command line option
* merge disk and memory layers (#493) | Dan Engelbrecht | 2023-10-24 | 1 | -1/+1
    - Feature: Added `--cache-memlayer-sizethreshold` option to zenserver to control at which size cache entries get cached in memory
    - Changed: Merged cache memory layer with cache disk layer to reduce memory and cpu overhead