aboutsummaryrefslogtreecommitdiff
path: root/src/zenserver/projectstore/projectstore.cpp
Commit message (Collapse)AuthorAgeFilesLines
* zenremoteprojectstore with httpclient (#35)Dan Engelbrecht2024-04-031-1/+1
| | | | | | - Bugfix: Fix log of Success/Failure for oplog import - Improvement: Use HttpClient when doing oplog export/import with a zenserver as a remote target. Includes retry logic - Improvement: Increase the retry count to 4 (5 attempts in total) when talking to Jupiter for oplog export/import
* Use multithreading to fetch size/rawsize of entries in ↵Dan Engelbrecht2024-03-281-58/+161
| | | | | | `/prj/{project}/oplog/{log}/chunkinfos` and `/prj/{project}/oplog/{log}/files` (#30) - Improvement: Use multithreading to fetch size/rawsize of entries in `/prj/{project}/oplog/{log}/chunkinfos` and `/prj/{project}/oplog/{log}/files` - Improvement: Add `GetMediumWorkerPool()` in addition to `LargeWorkerPool()` and `SmallWorkerPool()`
* add "fieldnames" query param for GetProjectFiles/GetProjectChunkInfos (#29)Dan Engelbrecht2024-03-281-13/+84
| | | | | | | | - Improvement: It is now possible to control which fields to include in `/prj/{project}/oplog/{log}/chunkinfos` request by adding a comma delimited list of filed names for `fieldnames` parameter - Default fields are: `id`, `rawhash` and `rawsize` (translates to `?fieldnames=id,rawhash,rawsize`) - Use `?fieldnames=*` to get all the fields - Improvement: It is now possible to control which fields to include in `/prj/{project}/oplog/{log}/files` request by adding a comma delimited list of filed names for `fieldnames` parameter - Default fields are: `id`, `clientpath` and `serverpath` (translates to `?fieldnames=id,clientpath,serverpath`), `filter=client` only applies if `fieldnames` is not given as a parameter - Use `?fieldnames=*` to get all the fields
* Get raw size for compressed chunks correctly for ↵Dan Engelbrecht2024-03-271-1/+7
| | | | `/prj/{project}/oplog/{log}/chunkinfos` (#27)
* consistent paths encoding (#24)Dan Engelbrecht2024-03-251-21/+21
| | | | * Don't encode filesystem path to UTF8 unless stored in compactbinary string * Be consistent where we encode/decode paths to UTF8
* non memory copy compressed range (#13)Dan Engelbrecht2024-03-201-26/+30
| | | | | * Add CompressedBuffer::GetRange that references source data rather than make a memory copy * Use Compressed.CopyRange in project store GetChunkRange * docs for CompressedBuffer::CopyRange and CompressedBuffer::GetRange
* special treatment large oplog attachments v2 (#5)Dan Engelbrecht2024-03-141-30/+225
| | | | | - Bugfix: Install Ctrl+C handler earlier when doing `zen oplog-export` and `zen oplog-export` to properly cancel jobs - Improvement: Add ability to block a set of CAS entries from GC in project store - Improvement: Large attachments and loose files are now split into smaller chunks and stored in blocks during oplog export
* fix potential partially written files (#2)Dan Engelbrecht2024-03-131-8/+2
| | | | * Make sure WriteFile() does not leave incomplete files * use TemporaryFile and MoveTemporaryIntoPlace to avoid leaving partial files on error
* Make sure we wait for all scheduled tasks to complete before throwing ↵Dan Engelbrecht2024-02-281-1/+3
| | | | | exceptions further (#662) Bugfix: We must not throw exceptions to calling function until all async work we spawned has returned
* Keep track of added ops during GCV2 instead of rescanning full oplog when ↵Dan Engelbrecht2024-02-131-8/+23
| | | | added ops are detected (#652)
* compress large attachments on demand (#647)Dan Engelbrecht2024-02-051-32/+151
| | | | | | | - Improvement: Speed up oplog export by fetching/compressing big attachments on demand - Improvement: Speed up oplog export by batch-fetcing small attachments - Improvement: Speed up oplog import by batching writes of oplog ops - Improvement: Tweak oplog export default block size and embed size limit - Improvement: Add more messaging and progress during oplog import/export
* respond with BadRequest result instead of throwing exception on bad request ↵Dan Engelbrecht2024-02-051-2/+12
| | | | input (#648)
* improve oplog export logging (#644)Dan Engelbrecht2024-01-311-4/+0
| | | | | | - Improvement: More details in oplog import/export logs - Improvement: Switch from Download to Get when fetching Refs from Jupiter as they can't be resumed anyway and streaming to disk is redundant - Bugfix: Make sure we clear read callback when doing Put in HttpClient to avoid timeout due to not sending data when reusing sessions - Bugfix: Respect `--ignore-missing-attachments` in `oplog-export` command when loose file is missing on disk
* add ignore-missing-attachments option to oplog export (debugging tool) (#641)Dan Engelbrecht2024-01-251-14/+12
| | | | | | | * add ignore-missing-attachments option to oplog export (debugging tool) * add more status codes to do retry for in http client * add missing X-Jupiter-IoHash header for jupiter PutRef * reduce oplog block size to reduce amount of redundant chunks to download * improved logging
* Add retry with optional resume logic to HttpClient::Download (#639)Dan Engelbrecht2024-01-241-4/+8
| | | | | | | - Improvement: Refactored Jupiter upstream to use HttpClient - Improvement: Added retry and resume logic to HttpClient - Improvement: Added authentication support to HttpClient - Improvement: Clearer logging in GCV2 compact of FileCas/BlockStore - Improvement: Size details in oplog import logging
* oplog import/export improvements (#634)Dan Engelbrecht2024-01-231-59/+57
| | | | * improve feedback from oplog import/export * improve oplog save performance
* add --ignore-missing-attachments to oplog-import command (#637)Dan Engelbrecht2024-01-221-15/+23
|
* separate RPC processing from HTTP processing (#626)Stefan Boberg2023-12-201-1/+1
| | | | | | * moved all RPC processing from HttpStructuredCacheService into separate CacheRpcHandler class in zenstore * move package marshaling to zenutil. was previously in zenhttp/httpshared but it's useful in other contexts as well where we don't want to depend on zenhttp * introduced UpstreamCacheClient, this provides a subset of functions on UpstreamCache and lives in zenstore
* ensure we can build without trace (#619)Stefan Boberg2023-12-191-2/+2
| | | | `xmake config -zentrace=n` would previously not build cleanly
* improve trace (#606)Dan Engelbrecht2023-12-131-2/+0
| | | | | * Adding some more trace scopes for better visiblity * Removed spammy trace scope when replaying oplogs * Remove "::Disk" from trace scopes - redundant now that we have merge disk and memory layers
* improved scrubbing of oplogs and filecas (#596)Stefan Boberg2023-12-111-73/+188
| | | | | | - Improvement: Scrub command now validates compressed buffer hashes in filecas storage (used for large chunks) - Improvement: Added --dry, --no-gc and --no-cas options to zen scrub command - Improvement: Implemented oplog scrubbing (previously was a no-op) - Improvement: Implemented support for running scrubbint at startup with --scrub=<options>
* Change naming to ChunkInfos instead of Chunkszousar2023-12-061-3/+3
|
* Ran precommitzousar2023-12-051-6/+2
|
* Get hash when retrieving chunkszousar2023-12-051-7/+31
| | | | Also changes the returned fields for each chunk from size->rawsize. Backwards compatibility is not a concern as this was unused in past zenserver releases.
* Add endpoint for all chunk infoszousar2023-12-011-8/+4
| | | | Add endpoint for querying all chunk infos in an oplog.
* add separate PreCache step for GcReferenceChecker (#578)Dan Engelbrecht2023-12-011-14/+29
| | | | | | - Improvement: GCv2: Use separate PreCache step to improve concurrency when checking references - Improvement: GCv2: Improved verbose logging - Improvement: GCv2: Sort chunks to read by block/offset when finding references - Improvement: GCv2: Exit as soon as no more unreferenced items are left
* tracing for gcv2 (#574)Dan Engelbrecht2023-11-281-0/+12
| | | | | | - Improvement: Added more trace scopes for GCv2 - Bugfix: Make sure we can override flags to "false" when running `zen gc` commmand - `smallobjects`, `skipcid`, `skipdelete`, `verbose`
* gcv2 tests for project store and bugfixes (#571)Dan Engelbrecht2023-11-271-84/+209
| | | * gcv2 tests for project store and bugfixes
* gc stop command (#569)v0.2.36-pre2Dan Engelbrecht2023-11-271-2/+21
| | | | | - Feature: New endpoint `/admin/gc-stop` to cancel a running garbage collect operation - Feature: Added `zen gc-stop` command to cancel a running garbage collect operation - Bugfix: GCv2 - make sure to discover all projects and oplogs before checking for expired data
* add command line options for compact block threshold and gc verbose (#557)Dan Engelbrecht2023-11-211-1/+17
| | | | | | | | | | | - Feature: Added new options to zenserver for GC V2 - `--gc-compactblock-threshold` GCV2 - how much of a compact block should be used to skip compacting the block, default is 90% - `--gc-verbose` GCV2 - enable more verbose output when running a GC pass - Feature: Added new options to `zen gc` command for GC V2 - `--compactblockthreshold` GCV2 - how much of a compact block should be used to skip compacting the block, default is 90% - `--verbose` GCV2 - enable more verbose output when running a GC pass - Feature: Added new parameters for endpoint `admin/gc` (PUT) - `compactblockthreshold` GCV2 - how much of a compact block should be used to skip compacting the block, default is 90% - `verbose` GCV2 - enable more verbose output when running a GC pass
* compact separate for gc referencer (#533)Dan Engelbrecht2023-11-211-84/+199
| | | | | - Refactor GCV2 so GcReferencer::RemoveExpiredData returns a store compactor, moving out the actual disk work from deleting items in the index. - Refactor GCV2 GcResult to reuse GcCompactStoreStats and GcStats - Make Compacting of stores non-parallell to not eat all the disk I/O when running GC
* spdlog implementation hiding (#498)Stefan Boberg2023-11-061-2/+2
| | | | | | | | | this change aims to hide logging internals from client code, in order to make it easier to extend and take more control over the logging process in the future. As a bonus side effect, the generated code is much tighter (net delta around 2.5% on the resulting executable which includes lots of thirdparty code) and should take less time to compile and link. Client usage via macros is pretty much unchanged. The main exposure client code had to spdlog internals before was the use of custom loggers per subsystem, where it would be common to have `spdlog::logger` references to keep a reference to a logger within a class. This is now replaced by `zen::LoggerRef` which currently simply encapsulates an actual `spdlog::logger` instance, but this is intended to be an implementation detail which will change in the future. The way the change works is that we now handle any formatting of log messages in the zencore logging subsystem instead of relying on `spdlog` to manage this. We use the `fmt` library to do the formatting which means the client usage is identical to using `spdlog`. The formatted message is then forwarded onto any sinks etc which are still implememted via `spdlog`.
* gc v2 tests (#512)Dan Engelbrecht2023-11-061-12/+1
| | | | | | | | | | * set MaxBlockCount at init * properly calculate total size * basic blockstore compact blocks test * correct detection of block swap * Use one implementation for CreateRandomBlob * reduce some data sets to increase speed of tests * reduce test time * rename BlockStoreCompactState::AddBlock -> BlockStoreCompactState::IncludeBlock
* individual gc stats (#506)Dan Engelbrecht2023-10-301-29/+62
| | | | | - Feature: New parameter for endpoint `admin/gc` (GET) `details=true` which gives details stats on GC operation when using GC V2 - Feature: New options for zen command `gc-status` - `--details` that enables the detailed output from the last GC operation when using GC V2
* New GC implementation (#459)Dan Engelbrecht2023-10-301-0/+258
| | | - Feature: New garbage collection implementation, still in evaluation mode. Enabled by `--gc-v2` command line option
* merge disk and memory layers (#493)Dan Engelbrecht2023-10-241-1/+1
| | | | - Feature: Added `--cache-memlayer-sizethreshold` option to zenserver to control at which size cache entries get cached in memory - Changed: Merged cache memory layer with cache disk layer to reduce memory and cpu overhead
* clean up GcContributor and GcStorage to be pure interfaces (#485)Dan Engelbrecht2023-10-201-3/+6
|
* add `flush` command and more gc status info (#483)Dan Engelbrecht2023-10-181-2/+3
| | | | | | - Feature: New endpoint `/admin/flush ` to flush all storage - CAS, Cache and ProjectStore - Feature: New command `zen flush` to flush all storage - CAS, Cache and ProjectStore - Improved: Command `zen gc-status` now gives details about storage, when last GC occured, how long until next GC etc - Changed: Cache access and write log are disabled by default
* faster oplog iteration (#471)Dan Engelbrecht2023-10-131-22/+34
| | | | | * use a CbObjectView instead of CbObject to avoid creating IOBufferCore instances * use BasicFileBuffer directly where possible * changelog
* cache reference tracking (#455)Dan Engelbrecht2023-10-101-6/+9
| | | | | - Feature: Add caching of referenced CId content for structured cache records, this avoid disk thrashing when gathering references for GC - disabled by default, enable with `--cache-reference-cache-enabled` - Improvement: Faster collection of referenced CId content in project store
* fix gc infinite loop (#453)Dan Engelbrecht2023-10-061-5/+13
| | | | * make sure we update last gc time even if gc fails * If we can't check if an oplog/project markerfile exists, assume it is not expired
* faster accesstime save restore (#439)Dan Engelbrecht2023-10-031-23/+43
| | | | | | | | | | - Improvement: Reduce time a cache bucket is locked for write when flushing/garbage collecting - Change format for faster read/write and reduced size on disk - Don't lock index while writing manifest to disk - Skip garbage collect if we are currently in a Flush operation - BlockStore::Flush no longer terminates currently writing block - Garbage collect references to currently writing block but keep the block as new data may be added - Fix BlockStore::Prune used disk space calculation - Don't materialize data in filecas when we just need the size
* lightweight gc (#431)Dan Engelbrecht2023-10-021-0/+4
| | | | | | - Feature: Add lightweight GC that only removes items from cache/project store without cleaning up data referenced in Cid store - Add `skipcid` parameter to http endpoint `admin/gc`, defaults to "false" - Add `--skipcid` option to `zen gc` command, defaults to false - Add `--gc-lightweight-interval-seconds` option to zenserver
* Improvement: Add names to background jobs for easier debugging (#412)Dan Engelbrecht2023-09-201-28/+32
| | | | Improvement: Background jobs now temporarily sets thread name to background job name while executing Improvement: Background jobs tracks worker thread id used while executing
* VFS implementation for local storage service (#396)Stefan Boberg2023-09-201-8/+102
| | | currently, only Windows (using Projected File System) is supported
* add more trace scopes (#362)Dan Engelbrecht2023-09-151-30/+39
| | | | | * more trace scopes * Make sure ReplayLogEntries uses the correct size for oplog buffer * changelog
* job queue and async oplog-import/export (#395)Dan Engelbrecht2023-09-131-43/+77
| | | | | | | | | | | | | | | | - Feature: New http endpoint for background jobs `/admin/jobs/status` which will return a response listing the currently active background jobs and their status - Feature: New http endpoint for background jobs information `/admin/jobs/status/{jobid}` which will return a response detailing status, pending messages and progress status - GET will return a response detailing status, pending messages and progress status - DELETE will mark the job for cancelling and return without waiting for completion - If status returned is "Complete" or "Aborted" the jobid will be removed from the server and can not be queried again - Feature: New zen command `jobs` to list, get info about and cancel background jobs - If no options are given it will display a list of active background jobs - `--jobid` accepts an id (returned from for example `oplog-export` with `--async`) and will return a response detailing status, pending messages and progress status for that job - `--cancel` can be added when `--jobid` is given which will request zenserver to cancel the background job - Feature: oplog import and export http rpc requests are now async operations that will run in the background - Feature: `oplog-export` and `oplog-import` now reports progress to the console as work progress by default - Feature: `oplog-export` and `oplog-import` can now be cancelled using Ctrl+C - Feature: `oplog-export` and `oplog-import` has a new option `--async` which will only trigger the work and report a background job id back
* scan oplog object for fields (#397)Dan Engelbrecht2023-09-131-89/+130
| | | | * scan oplog object for fields * Read all oplog entries but only read op data and get mapping for latest op of each key
* incremental oplog upload for block-based targets (#392)Dan Engelbrecht2023-09-121-2/+20
| | | | | | | * add option for base container for oplog export read base oplog and fetch known blocks * reuse blocks if a known block has 80+ % usage * changelog * better logging and added base to remotestore descriptions
* stream oplog attachments from jupiter (#384)Dan Engelbrecht2023-09-061-7/+8
| | | | | | | | | | * stream large downloads from jupiter to temporary file * rework DeleteOnClose - top level marks file for delete and if lower level parts wants to keep it it clears that flag * changelog * log number of attachments to download * add delay on jupiter request failure when retrying * make sure we upload all attachments even if Needs are empty when ForceUpload is true release TempAttachment as soon as it is used * sort attachments so we get predictable blocks for the same oplog