| author | Dan Engelbrecht <[email protected]> | 2025-02-26 15:10:14 +0100 |
|---|---|---|
| committer | GitHub Enterprise <[email protected]> | 2025-02-26 15:10:14 +0100 |
| commit | 7d8fe45af3b49d800f84f0ddce051c0b3b2e837d (patch) | |
| tree | c8dd564dcf247d7b2537bb5c2ebfbca57bafd205 /src | |
| parent | improvements and infrastructure for upcoming builds api command line (#284) (diff) | |
| download | zen-7d8fe45af3b49d800f84f0ddce051c0b3b2e837d.tar.xz zen-7d8fe45af3b49d800f84f0ddce051c0b3b2e837d.zip | |
builds upload command (#278)
- Feature: **EXPERIMENTAL** New `zen builds` command to list, upload and download build folders via the Cloud Build API
- `builds list` list available builds (**INCOMPLETE - FILTERING MISSING**)
- `builds upload` upload a folder to Cloud Build API
- `--local-path` source folder to upload
- `--create-build` create a new parent build object (using the given object id); if omitted, a parent build must already exist and `--build-id` must be given
- `--build-id` an Oid in hex form for the Build identifier to use - omit to have the id auto-generated
- `--build-part-id` an Oid in hex form for the Build Part identifier for the folder - omit to have the id auto-generated
- `--build-part-name` name of the build part - if omitted, defaults to the name of the leaf folder given in `--local-path`
- `--metadata-path` path to a JSON-formatted file with metadata about the build. Metadata must be provided if `--create-build` is set
- `--metadata` key-value pairs separated by ';' with build metadata (key1=value1;key2=value2). Metadata must be provided if `--create-build` is set
- `--clean` ignore any existing blocks of chunk data and upload a fresh set of blocks
- `--allow-multipart` enable usage of multi-part http upload requests
- `--manifest-path` path to a text file listing the files to include in the upload. Omit to upload everything in `--local-path`
- `builds download` download a folder from Cloud Build API (**INCOMPLETE - WILL WIPE UNTRACKED DATA FROM TARGET FOLDER**)
- `--local-path` target folder to download to
- `--build-id` an Oid in hex form for the Build identifier to use
- `--build-part-id` a comma separated list of Oid in hex for the build part identifier(s) to download - mutually exclusive to `--build-part-name`
- `--build-part-name` a comma separated list of names for the build part(s) to download - if omitted, defaults to the name of the leaf folder given in `--local-path`
- `--clean` deletes all data in target folder before downloading (NON-CLEAN IS NOT IMPLEMENTED YET)
- `--allow-multipart` enable usage of multi-part http download requests
- `builds diff` compare the contents of a local folder against another folder
- `--local-path` target folder to download to
- `--compare-path` folder to compare target with
- `--only-chunked` compare only files that would be chunked
- `builds fetch-blob` fetch and validate a blob from remote store
- `--build-id` an Oid in hex form for the Build identifier to use
- `--blob-hash` an IoHash in hex form identifying the blob to download
- `builds validate-part` fetch a build part and validate all referenced attachments
- `--build-id` an Oid in hex form for the Build identifier to use
- `--build-part-id` an Oid in hex for the build part identifier to validate - mutually exclusive to `--build-part-name`
- `--build-part-name` a name for the build part to validate - mutually exclusive to `--build-part-id`
- `builds test` a series of operations that upload, download and test various aspects of incremental operations
- `--local-path` source folder to upload
- Options for Cloud Build API remote store (`list`, `upload`, `download`, `fetch-blob`, `validate-part`)
- `--url` Cloud Builds URL
- `--assume-http2` assume that the builds endpoint is an HTTP/2 endpoint, skipping the HTTP/1.1 upgrade handshake
- `--namespace` Builds Storage namespace
- `--bucket` Builds Storage bucket
- Authentication options for Cloud Build API
- Auth token
- `--access-token` http auth Cloud Storage access token
- `--access-token-env` name of environment variable that holds the Http auth Cloud Storage access token
- `--access-token-path` path to json file that holds the Http auth Cloud Storage access token
- OpenId authentication
- `--openid-provider-name` Open ID provider name
- `--openid-provider-url` Open ID provider url
- `--openid-client-id` Open ID client id
- `--openid-refresh-token` Open ID refresh token
- `--encryption-aes-key` 256 bit AES encryption key for storing OpenID credentials
- `--encryption-aes-iv` 128 bit AES encryption initialization vector for storing OpenID credentials
- OAuth authentication
- `--oauth-url` OAuth provider url
- `--oauth-clientid` OAuth client id
- `--oauth-clientsecret` OAuth client secret
- Options for file based remote store used for testing purposes (`list`, `upload`, `download`, `fetch-blob`, `validate-part`, `test`)
- `--storage-path` path to folder to store builds data
- `--json-metadata` enable json output in store for all compact binary objects (off by default)
- Output options for all builds commands
- `--plain-progress` use plain line-by-line progress output
- `--verbose` enable verbose log output
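As a sketch of the upload flow described above: the URL, namespace, bucket and metadata values below are placeholders (not real endpoints), and the flags are the ones from the option list. Note that the `--metadata` value must be quoted, since an unquoted `;` would be parsed by the shell as a command separator.

```shell
# A hypothetical upload invocation -- placeholder URL/namespace/bucket values,
# flag names taken from the option list above:
#
#   zen builds upload \
#     --url https://builds.example.com \
#     --namespace my.project.builds \
#     --bucket my-bucket \
#     --local-path ./Packaged/Windows \
#     --create-build \
#     --metadata "branch=main;changelist=12345"
#
# The --metadata value itself is ';'-separated key=value pairs:
printf '%s\n' 'branch=main;changelist=12345' | tr ';' '\n'
```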
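For `--access-token-path`, the token file is JSON: `ReadAccessTokenFromFile` in this commit's diff parses it and rejects files without a non-empty `"Token"` value. A minimal sketch (the token string is a placeholder):

```shell
# Write a minimal token file for --access-token-path. The "Token" key is the
# field the tool reads; the value here is a placeholder, not a real token.
cat > zen_token.json <<'EOF'
{
  "Token": "placeholder-access-token"
}
EOF
grep -c '"Token"' zen_token.json
```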
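The OpenID credential-store options only document bit sizes (a 256-bit key and a 128-bit IV). Assuming hex-encoded values are accepted (not confirmed by this page) and that `openssl` is available, suitable values could be generated like this:

```shell
# Generate a 256-bit AES key and a 128-bit IV as hex strings. Hex encoding is
# an assumption -- only the bit sizes are documented above. Requires openssl.
AES_KEY=$(openssl rand -hex 32)   # 32 bytes = 256 bits -> 64 hex chars
AES_IV=$(openssl rand -hex 16)    # 16 bytes = 128 bits -> 32 hex chars
echo "key length: ${#AES_KEY}, iv length: ${#AES_IV}"
```

The results would then be passed as `--encryption-aes-key "$AES_KEY" --encryption-aes-iv "$AES_IV"`.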
Diffstat (limited to 'src')
23 files changed, 9108 insertions, 61 deletions
diff --git a/src/zen/cmds/builds_cmd.cpp b/src/zen/cmds/builds_cmd.cpp new file mode 100644 index 000000000..ececab29e --- /dev/null +++ b/src/zen/cmds/builds_cmd.cpp @@ -0,0 +1,6106 @@ +// Copyright Epic Games, Inc. All Rights Reserved. + +#include "builds_cmd.h" + +#include <zencore/basicfile.h> +#include <zencore/compactbinarybuilder.h> +#include <zencore/compactbinaryfile.h> +#include <zencore/compress.h> +#include <zencore/except.h> +#include <zencore/filesystem.h> +#include <zencore/fmtutils.h> +#include <zencore/logging.h> +#include <zencore/scopeguard.h> +#include <zencore/string.h> +#include <zencore/uid.h> +#include <zenhttp/formatters.h> +#include <zenhttp/httpclient.h> +#include <zenhttp/httpclientauth.h> +#include <zenhttp/httpcommon.h> +#include <zenutil/chunkblock.h> +#include <zenutil/chunkedcontent.h> +#include <zenutil/chunkedfile.h> +#include <zenutil/chunkingcontroller.h> +#include <zenutil/filebuildstorage.h> +#include <zenutil/jupiter/jupiterbuildstorage.h> +#include <zenutil/jupiter/jupitersession.h> +#include <zenutil/parallellwork.h> +#include <zenutil/workerpools.h> +#include <zenutil/zenserverprocess.h> + +#include <memory> + +ZEN_THIRD_PARTY_INCLUDES_START +#include <tsl/robin_map.h> +#include <tsl/robin_set.h> +#include <json11.hpp> +ZEN_THIRD_PARTY_INCLUDES_END + +#if ZEN_PLATFORM_WINDOWS +# include <zencore/windows.h> +#else +# include <fcntl.h> +# include <sys/file.h> +# include <sys/stat.h> +# include <unistd.h> +#endif + +#define EXTRA_VERIFY 0 + +namespace zen { +namespace { + using namespace std::literals; + + static const size_t DefaultMaxBlockSize = 64u * 1024u * 1024u; + static const size_t DefaultMaxChunkEmbedSize = 3u * 512u * 1024u; + + struct ChunksBlockParameters + { + size_t MaxBlockSize = DefaultMaxBlockSize; + size_t MaxChunkEmbedSize = DefaultMaxChunkEmbedSize; + }; + + const ChunksBlockParameters DefaultChunksBlockParams{.MaxBlockSize = 32u * 1024u * 1024u, + .MaxChunkEmbedSize = DefaultChunkedParams.MaxSize}; + 
const std::string ZenFolderName = ".zen"; + const std::string ZenStateFilePath = fmt::format("{}/current_state.cbo", ZenFolderName); + const std::string ZenStateFileJsonPath = fmt::format("{}/current_state.json", ZenFolderName); + const std::string ZenTempFolderName = fmt::format("{}/tmp", ZenFolderName); + const std::string ZenTempReuseFolderName = fmt::format("{}/reuse", ZenTempFolderName); + const std::string ZenTempStorageFolderName = fmt::format("{}/storage", ZenTempFolderName); + const std::string ZenTempBlockFolderName = fmt::format("{}/blocks", ZenTempFolderName); + const std::string ZenTempChunkFolderName = fmt::format("{}/chunks", ZenTempFolderName); + const std::string ZenExcludeManifestName = ".zen_exclude_manifest.txt"; + + const std::string UnsyncFolderName = ".unsync"; + + const std::string UGSFolderName = ".ugs"; + const std::string LegacyZenTempFolderName = ".zen-tmp"; + + const std::vector<std::string_view> DefaultExcludeFolders({UnsyncFolderName, ZenFolderName, UGSFolderName, LegacyZenTempFolderName}); + const std::vector<std::string_view> DefaultExcludeExtensions({}); + + static bool IsVerbose = false; + static bool UsePlainProgress = false; + +#define ZEN_CONSOLE_VERBOSE(fmtstr, ...) 
\ + if (IsVerbose) \ + { \ + ZEN_CONSOLE_LOG(zen::logging::level::Info, fmtstr, ##__VA_ARGS__); \ + } + + const std::string DefaultAccessTokenEnvVariableName( +#if ZEN_PLATFORM_WINDOWS + "UE-CloudDataCacheAccessToken"sv +#endif +#if ZEN_PLATFORM_LINUX || ZEN_PLATFORM_MAC + "UE_CloudDataCacheAccessToken"sv +#endif + + ); + + template<typename T> + std::string FormatArray(std::span<const T> Items, std::string_view Prefix) + { + ExtendableStringBuilder<512> SB; + for (const T& Item : Items) + { + SB.Append(fmt::format("{}{}", Prefix, Item)); + } + return SB.ToString(); + } + + void CleanDirectory(const std::filesystem::path& Path, std::span<const std::string_view> ExcludeDirectories) + { + DirectoryContent LocalDirectoryContent; + GetDirectoryContent(Path, DirectoryContentFlags::IncludeDirs | DirectoryContentFlags::IncludeFiles, LocalDirectoryContent); + for (const std::filesystem::path& LocalFilePath : LocalDirectoryContent.Files) + { + std::filesystem::remove(LocalFilePath); + } + + for (const std::filesystem::path& LocalDirPath : LocalDirectoryContent.Directories) + { + bool Leave = false; + for (const std::string_view ExcludeDirectory : ExcludeDirectories) + { + if (LocalDirPath == (Path / ExcludeDirectory)) + { + Leave = true; + break; + } + } + if (!Leave) + { + zen::CleanDirectory(LocalDirPath); + std::filesystem::remove(LocalDirPath); + } + } + } + + std::string ReadAccessTokenFromFile(const std::filesystem::path& Path) + { + if (!std::filesystem::is_regular_file(Path)) + { + throw std::runtime_error(fmt::format("the file '{}' does not exist", Path)); + } + IoBuffer Body = IoBufferBuilder::MakeFromFile(Path); + std::string JsonText(reinterpret_cast<const char*>(Body.GetData()), Body.GetSize()); + std::string JsonError; + json11::Json TokenInfo = json11::Json::parse(JsonText, JsonError); + if (!JsonError.empty()) + { + throw std::runtime_error(fmt::format("failed parsing json file '{}'. 
Reason: '{}'", Path, JsonError)); + } + const std::string AuthToken = TokenInfo["Token"].string_value(); + if (AuthToken.empty()) + { + throw std::runtime_error(fmt::format("the json file '{}' does not contain a value for \"Token\"", Path)); + } + return AuthToken; + } + + CompositeBuffer WriteToTempFileIfNeeded(const CompositeBuffer& Buffer, const std::filesystem::path& TempFolderPath, const IoHash& Hash) + { + // If this is a file based buffer or a compressed buffer with a memory-based header, we don't need to rewrite to disk to save memory + std::span<const SharedBuffer> Segments = Buffer.GetSegments(); + ZEN_ASSERT(Buffer.GetSegments().size() > 0); + size_t SegmentIndexToCheck = Segments.size() > 1 ? 1 : 0; + IoBufferFileReference FileRef; + if (Segments[SegmentIndexToCheck].GetFileReference(FileRef)) + { + return Buffer; + } + std::filesystem::path TempFilePath = (TempFolderPath / Hash.ToHexString()).make_preferred(); + return CompositeBuffer(WriteToTempFile(Buffer, TempFilePath)); + } + + class FilteredRate + { + public: + FilteredRate() {} + + void Start() + { + if (StartTimeUS == (uint64_t)-1) + { + uint64_t Expected = (uint64_t)-1; + if (StartTimeUS.compare_exchange_weak(Expected, Timer.GetElapsedTimeUs())) + { + LastTimeUS = StartTimeUS.load(); + } + } + } + void Stop() + { + if (EndTimeUS == (uint64_t)-1) + { + uint64_t Expected = (uint64_t)-1; + EndTimeUS.compare_exchange_weak(Expected, Timer.GetElapsedTimeUs()); + } + } + + void Update(uint64_t Count) + { + if (LastTimeUS == (uint64_t)-1) + { + return; + } + uint64_t TimeUS = Timer.GetElapsedTimeUs(); + uint64_t TimeDeltaUS = TimeUS - LastTimeUS; + if (TimeDeltaUS >= 1000000) + { + uint64_t Delta = Count - LastCount; + uint64_t PerSecond = (Delta * 1000000) / TimeDeltaUS; + + LastPerSecond = PerSecond; + + LastCount = Count; + + FilteredPerSecond = (PerSecond + (LastPerSecond * 7)) / 8; + + LastTimeUS = TimeUS; + } + } + + uint64_t GetCurrent() const + { + if (LastTimeUS == (uint64_t)-1) + { + return 
0; + } + return FilteredPerSecond; + } + + uint64_t GetElapsedTime() const + { + if (StartTimeUS == (uint64_t)-1) + { + return 0; + } + if (EndTimeUS == (uint64_t)-1) + { + return 0; + } + uint64_t TimeDeltaUS = EndTimeUS - StartTimeUS; + return TimeDeltaUS; + } + + bool IsActive() const { return (StartTimeUS != (uint64_t)-1) && (EndTimeUS == (uint64_t)-1); } + + private: + Stopwatch Timer; + std::atomic<uint64_t> StartTimeUS = (uint64_t)-1; + std::atomic<uint64_t> EndTimeUS = (uint64_t)-1; + std::atomic<uint64_t> LastTimeUS = (uint64_t)-1; + uint64_t LastCount = 0; + uint64_t LastPerSecond = 0; + uint64_t FilteredPerSecond = 0; + }; + + uint64_t GetBytesPerSecond(uint64_t ElapsedWallTimeUS, uint64_t Count) + { + if (ElapsedWallTimeUS == 0) + { + return 0; + } + return Count * 1000000 / ElapsedWallTimeUS; + } + + ChunkedFolderContent ScanAndChunkFolder( + GetFolderContentStatistics& GetFolderContentStats, + ChunkingStatistics& ChunkingStats, + const std::filesystem::path& Path, + std::function<bool(const std::string_view& RelativePath)>&& IsAcceptedFolder, + std::function<bool(std::string_view RelativePath, uint64_t Size, uint32_t Attributes)>&& IsAcceptedFile, + ChunkingController& ChunkController, + std::atomic<bool>& AbortFlag) + { + FolderContent Content = GetFolderContent( + GetFolderContentStats, + Path, + std::move(IsAcceptedFolder), + std::move(IsAcceptedFile), + GetMediumWorkerPool(EWorkloadType::Burst), + UsePlainProgress ? 5000 : 200, + [](bool, std::ptrdiff_t) {}, + AbortFlag); + if (AbortFlag) + { + return {}; + } + + ProgressBar ProgressBar(UsePlainProgress); + FilteredRate FilteredBytesHashed; + FilteredBytesHashed.Start(); + ChunkedFolderContent FolderContent = ChunkFolderContent( + ChunkingStats, + GetMediumWorkerPool(EWorkloadType::Burst), + Path, + Content, + ChunkController, + UsePlainProgress ? 
5000 : 200, + [&](bool, std::ptrdiff_t) { + FilteredBytesHashed.Update(ChunkingStats.BytesHashed.load()); + ProgressBar.UpdateState({.Task = "Scanning files ", + .Details = fmt::format("{}/{} ({}/{}, {}B/s) files, {} ({}) chunks found", + ChunkingStats.FilesProcessed.load(), + GetFolderContentStats.AcceptedFileCount.load(), + NiceBytes(ChunkingStats.BytesHashed.load()), + NiceBytes(GetFolderContentStats.FoundFileByteCount), + NiceNum(FilteredBytesHashed.GetCurrent()), + ChunkingStats.UniqueChunksFound.load(), + NiceBytes(ChunkingStats.UniqueBytesFound.load())), + .TotalCount = GetFolderContentStats.AcceptedFileByteCount, + .RemainingCount = GetFolderContentStats.AcceptedFileByteCount - ChunkingStats.BytesHashed.load()}, + false); + }, + AbortFlag); + FilteredBytesHashed.Stop(); + ProgressBar.Finish(); + + ZEN_CONSOLE("Found {} ({}) files divided into {} ({}) unique chunks in '{}' in {}. Average hash rate {}B/sec", + ChunkingStats.FilesProcessed.load(), + NiceBytes(ChunkingStats.BytesHashed.load()), + ChunkingStats.UniqueChunksFound.load(), + NiceBytes(ChunkingStats.UniqueBytesFound.load()), + Path, + NiceTimeSpanMs((GetFolderContentStats.ElapsedWallTimeUS + ChunkingStats.ElapsedWallTimeUS) / 1000), + NiceNum(GetBytesPerSecond(ChunkingStats.ElapsedWallTimeUS, ChunkingStats.BytesHashed))); + return FolderContent; + }; + + struct DiskStatistics + { + std::atomic<uint64_t> OpenReadCount = 0; + std::atomic<uint64_t> OpenWriteCount = 0; + std::atomic<uint64_t> ReadCount = 0; + std::atomic<uint64_t> ReadByteCount = 0; + std::atomic<uint64_t> WriteCount = 0; + std::atomic<uint64_t> WriteByteCount = 0; + std::atomic<uint64_t> CurrentOpenFileCount = 0; + }; + + struct FindBlocksStatistics + { + uint64_t FindBlockTimeMS = 0; + uint64_t PotentialChunkCount = 0; + uint64_t PotentialChunkByteCount = 0; + uint64_t FoundBlockCount = 0; + uint64_t FoundBlockChunkCount = 0; + uint64_t FoundBlockByteCount = 0; + uint64_t AcceptedBlockCount = 0; + uint64_t AcceptedChunkCount = 0; + 
uint64_t AcceptedByteCount = 0; + uint64_t RejectedBlockCount = 0; + uint64_t RejectedChunkCount = 0; + uint64_t RejectedByteCount = 0; + uint64_t AcceptedReduntantChunkCount = 0; + uint64_t AcceptedReduntantByteCount = 0; + uint64_t NewBlocksCount = 0; + uint64_t NewBlocksChunkCount = 0; + uint64_t NewBlocksChunkByteCount = 0; + }; + + struct UploadStatistics + { + std::atomic<uint64_t> BlockCount = 0; + std::atomic<uint64_t> BlocksBytes = 0; + std::atomic<uint64_t> ChunkCount = 0; + std::atomic<uint64_t> ChunksBytes = 0; + std::atomic<uint64_t> ReadFromDiskBytes = 0; + std::atomic<uint64_t> MultipartAttachmentCount = 0; + uint64_t ElapsedWallTimeUS = 0; + + UploadStatistics& operator+=(const UploadStatistics& Rhs) + { + BlockCount += Rhs.BlockCount; + BlocksBytes += Rhs.BlocksBytes; + ChunkCount += Rhs.ChunkCount; + ChunksBytes += Rhs.ChunksBytes; + ReadFromDiskBytes += Rhs.ReadFromDiskBytes; + MultipartAttachmentCount += Rhs.MultipartAttachmentCount; + ElapsedWallTimeUS += Rhs.ElapsedWallTimeUS; + return *this; + } + }; + + struct LooseChunksStatistics + { + uint64_t ChunkCount = 0; + uint64_t ChunkByteCount = 0; + std::atomic<uint64_t> CompressedChunkCount = 0; + std::atomic<uint64_t> CompressedChunkBytes = 0; + uint64_t CompressChunksElapsedWallTimeUS = 0; + + LooseChunksStatistics& operator+=(const LooseChunksStatistics& Rhs) + { + ChunkCount += Rhs.ChunkCount; + ChunkByteCount += Rhs.ChunkByteCount; + CompressedChunkCount += Rhs.CompressedChunkCount; + CompressedChunkBytes += Rhs.CompressedChunkBytes; + CompressChunksElapsedWallTimeUS += Rhs.CompressChunksElapsedWallTimeUS; + return *this; + } + }; + + struct GenerateBlocksStatistics + { + std::atomic<uint64_t> GeneratedBlockByteCount = 0; + std::atomic<uint64_t> GeneratedBlockCount = 0; + uint64_t GenerateBlocksElapsedWallTimeUS = 0; + + GenerateBlocksStatistics& operator+=(const GenerateBlocksStatistics& Rhs) + { + GeneratedBlockByteCount += Rhs.GeneratedBlockByteCount; + GeneratedBlockCount += 
Rhs.GeneratedBlockCount; + GenerateBlocksElapsedWallTimeUS += Rhs.GenerateBlocksElapsedWallTimeUS; + return *this; + } + }; + + std::vector<uint32_t> CalculateAbsoluteChunkOrders(const std::span<const IoHash> LocalChunkHashes, + const std::span<const uint32_t> LocalChunkOrder, + const tsl::robin_map<IoHash, uint32_t, IoHash::Hasher>& ChunkHashToLocalChunkIndex, + const std::span<const uint32_t>& LooseChunkIndexes, + const std::span<const ChunkBlockDescription>& BlockDescriptions) + { +#if EXTRA_VERIFY + std::vector<IoHash> TmpAbsoluteChunkHashes; + TmpAbsoluteChunkHashes.reserve(LocalChunkHashes.size()); +#endif // EXTRA_VERIFY + std::vector<uint32_t> LocalChunkIndexToAbsoluteChunkIndex; + LocalChunkIndexToAbsoluteChunkIndex.resize(LocalChunkHashes.size(), (uint32_t)-1); + std::uint32_t AbsoluteChunkCount = 0; + for (uint32_t ChunkIndex : LooseChunkIndexes) + { + LocalChunkIndexToAbsoluteChunkIndex[ChunkIndex] = AbsoluteChunkCount; +#if EXTRA_VERIFY + TmpAbsoluteChunkHashes.push_back(LocalChunkHashes[ChunkIndex]); +#endif // EXTRA_VERIFY + AbsoluteChunkCount++; + } + for (const ChunkBlockDescription& Block : BlockDescriptions) + { + for (const IoHash& ChunkHash : Block.ChunkRawHashes) + { + if (auto It = ChunkHashToLocalChunkIndex.find(ChunkHash); It != ChunkHashToLocalChunkIndex.end()) + { + const uint32_t LocalChunkIndex = It->second; + ZEN_ASSERT_SLOW(LocalChunkHashes[LocalChunkIndex] == ChunkHash); + LocalChunkIndexToAbsoluteChunkIndex[LocalChunkIndex] = AbsoluteChunkCount; + } +#if EXTRA_VERIFY + TmpAbsoluteChunkHashes.push_back(ChunkHash); +#endif // EXTRA_VERIFY + AbsoluteChunkCount++; + } + } + std::vector<uint32_t> AbsoluteChunkOrder; + AbsoluteChunkOrder.reserve(LocalChunkHashes.size()); + for (const uint32_t LocalChunkIndex : LocalChunkOrder) + { + const uint32_t AbsoluteChunkIndex = LocalChunkIndexToAbsoluteChunkIndex[LocalChunkIndex]; +#if EXTRA_VERIFY + ZEN_ASSERT(LocalChunkHashes[LocalChunkIndex] == TmpAbsoluteChunkHashes[AbsoluteChunkIndex]); 
+#endif // EXTRA_VERIFY + AbsoluteChunkOrder.push_back(AbsoluteChunkIndex); + } +#if EXTRA_VERIFY + { + uint32_t OrderIndex = 0; + while (OrderIndex < LocalChunkOrder.size()) + { + const uint32_t LocalChunkIndex = LocalChunkOrder[OrderIndex]; + const IoHash& LocalChunkHash = LocalChunkHashes[LocalChunkIndex]; + const uint32_t AbsoluteChunkIndex = AbsoluteChunkOrder[OrderIndex]; + const IoHash& AbsoluteChunkHash = TmpAbsoluteChunkHashes[AbsoluteChunkIndex]; + ZEN_ASSERT(LocalChunkHash == AbsoluteChunkHash); + OrderIndex++; + } + } +#endif // EXTRA_VERIFY + return AbsoluteChunkOrder; + } + + void CalculateLocalChunkOrders(const std::span<const uint32_t>& AbsoluteChunkOrders, + const std::span<const IoHash> LooseChunkHashes, + const std::span<const uint64_t> LooseChunkRawSizes, + const std::span<const ChunkBlockDescription>& BlockDescriptions, + std::vector<IoHash>& OutLocalChunkHashes, + std::vector<uint64_t>& OutLocalChunkRawSizes, + std::vector<uint32_t>& OutLocalChunkOrders) + { + std::vector<IoHash> AbsoluteChunkHashes; + std::vector<uint64_t> AbsoluteChunkRawSizes; + AbsoluteChunkHashes.insert(AbsoluteChunkHashes.end(), LooseChunkHashes.begin(), LooseChunkHashes.end()); + AbsoluteChunkRawSizes.insert(AbsoluteChunkRawSizes.end(), LooseChunkRawSizes.begin(), LooseChunkRawSizes.end()); + for (const ChunkBlockDescription& Block : BlockDescriptions) + { + AbsoluteChunkHashes.insert(AbsoluteChunkHashes.end(), Block.ChunkRawHashes.begin(), Block.ChunkRawHashes.end()); + AbsoluteChunkRawSizes.insert(AbsoluteChunkRawSizes.end(), Block.ChunkRawLengths.begin(), Block.ChunkRawLengths.end()); + } + OutLocalChunkHashes.reserve(AbsoluteChunkHashes.size()); + OutLocalChunkRawSizes.reserve(AbsoluteChunkRawSizes.size()); + OutLocalChunkOrders.reserve(AbsoluteChunkOrders.size()); + + tsl::robin_map<IoHash, uint32_t, IoHash::Hasher> ChunkHashToChunkIndex; + ChunkHashToChunkIndex.reserve(AbsoluteChunkHashes.size()); + + for (uint32_t AbsoluteChunkOrderIndex = 0; 
AbsoluteChunkOrderIndex < AbsoluteChunkOrders.size(); AbsoluteChunkOrderIndex++) + { + const uint32_t AbsoluteChunkIndex = AbsoluteChunkOrders[AbsoluteChunkOrderIndex]; + const IoHash& AbsoluteChunkHash = AbsoluteChunkHashes[AbsoluteChunkIndex]; + const uint64_t AbsoluteChunkRawSize = AbsoluteChunkRawSizes[AbsoluteChunkIndex]; + + if (auto It = ChunkHashToChunkIndex.find(AbsoluteChunkHash); It != ChunkHashToChunkIndex.end()) + { + const uint32_t LocalChunkIndex = It->second; + OutLocalChunkOrders.push_back(LocalChunkIndex); + } + else + { + uint32_t LocalChunkIndex = gsl::narrow<uint32_t>(OutLocalChunkHashes.size()); + OutLocalChunkHashes.push_back(AbsoluteChunkHash); + OutLocalChunkRawSizes.push_back(AbsoluteChunkRawSize); + OutLocalChunkOrders.push_back(LocalChunkIndex); + ChunkHashToChunkIndex.insert_or_assign(AbsoluteChunkHash, LocalChunkIndex); + } +#if EXTRA_VERIFY + const uint32_t LocalChunkIndex = OutLocalChunkOrders[AbsoluteChunkOrderIndex]; + const IoHash& LocalChunkHash = OutLocalChunkHashes[LocalChunkIndex]; + const uint64_t& LocalChunkRawSize = OutLocalChunkRawSizes[LocalChunkIndex]; + ZEN_ASSERT(LocalChunkHash == AbsoluteChunkHash); + ZEN_ASSERT(LocalChunkRawSize == AbsoluteChunkRawSize); +#endif // EXTRA_VERIFY + } +#if EXTRA_VERIFY + for (uint32_t OrderIndex = 0; OrderIndex < OutLocalChunkOrders.size(); OrderIndex++) + { + uint32_t LocalChunkIndex = OutLocalChunkOrders[OrderIndex]; + const IoHash LocalChunkHash = OutLocalChunkHashes[LocalChunkIndex]; + uint64_t LocalChunkRawSize = OutLocalChunkRawSizes[LocalChunkIndex]; + + uint32_t VerifyChunkIndex = AbsoluteChunkOrders[OrderIndex]; + const IoHash VerifyChunkHash = AbsoluteChunkHashes[VerifyChunkIndex]; + uint64_t VerifyChunkRawSize = AbsoluteChunkRawSizes[VerifyChunkIndex]; + + ZEN_ASSERT(LocalChunkHash == VerifyChunkHash); + ZEN_ASSERT(LocalChunkRawSize == VerifyChunkRawSize); + } +#endif // EXTRA_VERIFY + } + + void WriteBuildContentToCompactBinary(CbObjectWriter& PartManifestWriter, + const 
SourcePlatform Platform, + std::span<const std::filesystem::path> Paths, + std::span<const IoHash> RawHashes, + std::span<const uint64_t> RawSizes, + std::span<const uint32_t> Attributes, + std::span<const IoHash> SequenceRawHashes, + std::span<const uint32_t> ChunkCounts, + std::span<const IoHash> LocalChunkHashes, + std::span<const uint64_t> LocalChunkRawSizes, + std::vector<uint32_t> AbsoluteChunkOrders, + const std::span<const uint32_t> LooseLocalChunkIndexes, + const std::span<IoHash> BlockHashes) + { + ZEN_ASSERT(Platform != SourcePlatform::_Count); + PartManifestWriter.AddString("platform"sv, ToString(Platform)); + + uint64_t TotalSize = 0; + for (const uint64_t Size : RawSizes) + { + TotalSize += Size; + } + PartManifestWriter.AddInteger("totalSize", TotalSize); + + PartManifestWriter.BeginObject("files"sv); + { + compactbinary_helpers::WriteArray(Paths, "paths"sv, PartManifestWriter); + compactbinary_helpers::WriteArray(RawHashes, "rawhashes"sv, PartManifestWriter); + compactbinary_helpers::WriteArray(RawSizes, "rawsizes"sv, PartManifestWriter); + if (Platform == SourcePlatform::Windows) + { + compactbinary_helpers::WriteArray(Attributes, "attributes"sv, PartManifestWriter); + } + if (Platform == SourcePlatform::Linux || Platform == SourcePlatform::MacOS) + { + compactbinary_helpers::WriteArray(Attributes, "mode"sv, PartManifestWriter); + } + } + PartManifestWriter.EndObject(); // files + + PartManifestWriter.BeginObject("chunkedContent"); + { + compactbinary_helpers::WriteArray(SequenceRawHashes, "sequenceRawHashes"sv, PartManifestWriter); + compactbinary_helpers::WriteArray(ChunkCounts, "chunkcounts"sv, PartManifestWriter); + compactbinary_helpers::WriteArray(AbsoluteChunkOrders, "chunkorders"sv, PartManifestWriter); + } + PartManifestWriter.EndObject(); // chunkedContent + + size_t LooseChunkCount = LooseLocalChunkIndexes.size(); + if (LooseChunkCount > 0) + { + PartManifestWriter.BeginObject("chunkAttachments"); + { + 
PartManifestWriter.BeginArray("rawHashes"sv); + for (uint32_t ChunkIndex : LooseLocalChunkIndexes) + { + PartManifestWriter.AddBinaryAttachment(LocalChunkHashes[ChunkIndex]); + } + PartManifestWriter.EndArray(); // rawHashes + + PartManifestWriter.BeginArray("chunkRawSizes"sv); + for (uint32_t ChunkIndex : LooseLocalChunkIndexes) + { + PartManifestWriter.AddInteger(LocalChunkRawSizes[ChunkIndex]); + } + PartManifestWriter.EndArray(); // chunkSizes + } + PartManifestWriter.EndObject(); // + } + + if (BlockHashes.size() > 0) + { + PartManifestWriter.BeginObject("blockAttachments"); + { + compactbinary_helpers::WriteBinaryAttachmentArray(BlockHashes, "rawHashes"sv, PartManifestWriter); + } + PartManifestWriter.EndObject(); // blocks + } + } + + void ReadBuildContentFromCompactBinary(CbObjectView BuildPartManifest, + SourcePlatform& OutPlatform, + std::vector<std::filesystem::path>& OutPaths, + std::vector<IoHash>& OutRawHashes, + std::vector<uint64_t>& OutRawSizes, + std::vector<uint32_t>& OutAttributes, + std::vector<IoHash>& OutSequenceRawHashes, + std::vector<uint32_t>& OutChunkCounts, + std::vector<uint32_t>& OutAbsoluteChunkOrders, + std::vector<IoHash>& OutLooseChunkHashes, + std::vector<uint64_t>& OutLooseChunkRawSizes, + std::vector<IoHash>& OutBlockRawHashes) + { + OutPlatform = FromString(BuildPartManifest["platform"sv].AsString(), SourcePlatform::_Count); + + CbObjectView FilesObject = BuildPartManifest["files"sv].AsObjectView(); + + compactbinary_helpers::ReadArray("paths"sv, FilesObject, OutPaths); + compactbinary_helpers::ReadArray("rawhashes"sv, FilesObject, OutRawHashes); + compactbinary_helpers::ReadArray("rawsizes"sv, FilesObject, OutRawSizes); + + uint64_t PathCount = OutPaths.size(); + if (OutRawHashes.size() != PathCount) + { + throw std::runtime_error(fmt::format("Number of raw hashes entries does not match number of paths")); + } + if (OutRawSizes.size() != PathCount) + { + throw std::runtime_error(fmt::format("Number of raw sizes entries does 
not match number of paths")); + } + + std::vector<uint32_t> ModeArray; + compactbinary_helpers::ReadArray("mode"sv, FilesObject, ModeArray); + if (ModeArray.size() != PathCount && ModeArray.size() != 0) + { + throw std::runtime_error(fmt::format("Number of attribute entries does not match number of paths")); + } + + std::vector<uint32_t> AttributeArray; + compactbinary_helpers::ReadArray("attributes"sv, FilesObject, ModeArray); + if (AttributeArray.size() != PathCount && AttributeArray.size() != 0) + { + throw std::runtime_error(fmt::format("Number of attribute entries does not match number of paths")); + } + + if (ModeArray.size() > 0) + { + if (OutPlatform == SourcePlatform::_Count) + { + OutPlatform = SourcePlatform::Linux; // Best guess - under dev format + } + OutAttributes = std::move(ModeArray); + } + else if (AttributeArray.size() > 0) + { + if (OutPlatform == SourcePlatform::_Count) + { + OutPlatform = SourcePlatform::Windows; + } + OutAttributes = std::move(AttributeArray); + } + else + { + if (OutPlatform == SourcePlatform::_Count) + { + OutPlatform = GetSourceCurrentPlatform(); + } + } + + if (FilesObject["chunkcounts"sv]) + { + // Legacy style + + std::vector<uint32_t> LegacyChunkCounts; + compactbinary_helpers::ReadArray("chunkcounts"sv, FilesObject, LegacyChunkCounts); + if (LegacyChunkCounts.size() != PathCount) + { + throw std::runtime_error(fmt::format("Number of chunk count entries does not match number of paths")); + } + std::vector<uint32_t> LegacyAbsoluteChunkOrders; + compactbinary_helpers::ReadArray("chunkorders"sv, FilesObject, LegacyAbsoluteChunkOrders); + + CbArrayView ChunkOrdersArray = BuildPartManifest["chunkorders"sv].AsArrayView(); + const uint64_t ChunkOrdersCount = ChunkOrdersArray.Num(); + + tsl::robin_set<IoHash, IoHash::Hasher> FoundRawHashes; + FoundRawHashes.reserve(PathCount); + + OutChunkCounts.reserve(PathCount); + OutAbsoluteChunkOrders.reserve(ChunkOrdersCount); + + uint32_t OrderIndexOffset = 0; + for (uint32_t PathIndex 
= 0; PathIndex < OutPaths.size(); PathIndex++) + { + const IoHash& PathRawHash = OutRawHashes[PathIndex]; + uint32_t LegacyChunkCount = LegacyChunkCounts[PathIndex]; + + if (FoundRawHashes.insert(PathRawHash).second) + { + OutSequenceRawHashes.push_back(PathRawHash); + OutChunkCounts.push_back(LegacyChunkCount); + std::span<uint32_t> AbsoluteChunkOrder = + std::span<uint32_t>(LegacyAbsoluteChunkOrders).subspan(OrderIndexOffset, LegacyChunkCount); + OutAbsoluteChunkOrders.insert(OutAbsoluteChunkOrders.end(), AbsoluteChunkOrder.begin(), AbsoluteChunkOrder.end()); + } + OrderIndexOffset += LegacyChunkCounts[PathIndex]; + } + } + if (CbObjectView ChunkContentView = BuildPartManifest["chunkedContent"sv].AsObjectView(); ChunkContentView) + { + compactbinary_helpers::ReadArray("sequenceRawHashes"sv, ChunkContentView, OutSequenceRawHashes); + compactbinary_helpers::ReadArray("chunkcounts"sv, ChunkContentView, OutChunkCounts); + if (OutChunkCounts.size() != OutSequenceRawHashes.size()) + { + throw std::runtime_error(fmt::format("Number of chunk count entries does not match number of paths")); + } + compactbinary_helpers::ReadArray("chunkorders"sv, ChunkContentView, OutAbsoluteChunkOrders); + } + + CbObjectView ChunkAttachmentsView = BuildPartManifest["chunkAttachments"sv].AsObjectView(); + { + compactbinary_helpers::ReadBinaryAttachmentArray("rawHashes"sv, ChunkAttachmentsView, OutLooseChunkHashes); + compactbinary_helpers::ReadArray("chunkRawSizes"sv, ChunkAttachmentsView, OutLooseChunkRawSizes); + if (OutLooseChunkHashes.size() != OutLooseChunkRawSizes.size()) + { + throw std::runtime_error( + fmt::format("Number of attachment chunk hashes does not match number of attachemnt chunk raw sizes")); + } + } + + CbObjectView BlocksView = BuildPartManifest["blockAttachments"sv].AsObjectView(); + { + compactbinary_helpers::ReadBinaryAttachmentArray("rawHashes"sv, BlocksView, OutBlockRawHashes); + } + } + + bool ReadStateObject(CbObjectView StateView, + Oid& OutBuildId, + 
std::vector<Oid>& BuildPartsIds, + std::vector<std::string>& BuildPartsNames, + std::vector<ChunkedFolderContent>& OutPartContents, + FolderContent& OutLocalFolderState) + { + try + { + CbObjectView BuildView = StateView["builds"sv].AsArrayView().CreateViewIterator().AsObjectView(); + OutBuildId = BuildView["buildId"sv].AsObjectId(); + for (CbFieldView PartView : BuildView["parts"sv].AsArrayView()) + { + CbObjectView PartObjectView = PartView.AsObjectView(); + BuildPartsIds.push_back(PartObjectView["partId"sv].AsObjectId()); + BuildPartsNames.push_back(std::string(PartObjectView["partName"sv].AsString())); + OutPartContents.push_back(LoadChunkedFolderContentToCompactBinary(PartObjectView["content"sv].AsObjectView())); + } + OutLocalFolderState = LoadFolderContentToCompactBinary(StateView["localFolderState"sv].AsObjectView()); + return true; + } + catch (const std::exception& Ex) + { + ZEN_CONSOLE("Unable to read local state: {}", Ex.what()); + return false; + } + } + + CbObject CreateStateObject(const Oid& BuildId, + std::vector<std::pair<Oid, std::string>> AllBuildParts, + std::span<const ChunkedFolderContent> PartContents, + const FolderContent& LocalFolderState) + { + CbObjectWriter CurrentStateWriter; + CurrentStateWriter.BeginArray("builds"sv); + { + CurrentStateWriter.BeginObject(); + { + CurrentStateWriter.AddObjectId("buildId"sv, BuildId); + CurrentStateWriter.BeginArray("parts"sv); + for (size_t PartIndex = 0; PartIndex < AllBuildParts.size(); PartIndex++) + { + const Oid BuildPartId = AllBuildParts[PartIndex].first; + CurrentStateWriter.BeginObject(); + { + CurrentStateWriter.AddObjectId("partId"sv, BuildPartId); + CurrentStateWriter.AddString("partName"sv, AllBuildParts[PartIndex].second); + CurrentStateWriter.BeginObject("content"sv); + { + SaveChunkedFolderContentToCompactBinary(PartContents[PartIndex], CurrentStateWriter); + } + CurrentStateWriter.EndObject(); + } + CurrentStateWriter.EndObject(); + } + CurrentStateWriter.EndArray(); // parts + } + 
CurrentStateWriter.EndObject(); + } + CurrentStateWriter.EndArray(); // builds + + CurrentStateWriter.BeginObject("localFolderState"sv); + { + SaveFolderContentToCompactBinary(LocalFolderState, CurrentStateWriter); + } + CurrentStateWriter.EndObject(); // localFolderState + + return CurrentStateWriter.Save(); + } + + class BufferedOpenFile + { + public: + BufferedOpenFile(const std::filesystem::path& Path) : Source(Path, BasicFile::Mode::kRead), SourceSize(Source.FileSize()) {} + BufferedOpenFile() = delete; + BufferedOpenFile(const BufferedOpenFile&) = delete; + BufferedOpenFile(BufferedOpenFile&&) = delete; + BufferedOpenFile& operator=(BufferedOpenFile&&) = delete; + BufferedOpenFile& operator=(const BufferedOpenFile&) = delete; + + const uint64_t BlockSize = 256u * 1024u; + CompositeBuffer GetRange(uint64_t Offset, uint64_t Size) + { + ZEN_ASSERT((CacheBlockIndex == (uint64_t)-1) || Cache); + auto _ = MakeGuard([&]() { ZEN_ASSERT((CacheBlockIndex == (uint64_t)-1) || Cache); }); + + ZEN_ASSERT(Size > 0); + ZEN_ASSERT((Offset + Size) <= SourceSize); + const uint64_t BlockIndexStart = Offset / BlockSize; + const uint64_t BlockIndexEnd = (Offset + Size - 1) / BlockSize; + + std::vector<SharedBuffer> BufferRanges; + BufferRanges.reserve(BlockIndexEnd - BlockIndexStart + 1); + + uint64_t ReadOffset = Offset; + for (uint64_t BlockIndex = BlockIndexStart; BlockIndex <= BlockIndexEnd; BlockIndex++) + { + const uint64_t BlockStartOffset = BlockIndex * BlockSize; + if (CacheBlockIndex != BlockIndex) + { + uint64_t CacheSize = Min(BlockSize, SourceSize - BlockStartOffset); + ZEN_ASSERT(CacheSize > 0); + Cache = IoBuffer(CacheSize); + Source.Read(Cache.GetMutableView().GetData(), CacheSize, BlockStartOffset); + CacheBlockIndex = BlockIndex; + } + + const uint64_t BytesRead = ReadOffset - Offset; + ZEN_ASSERT(BlockStartOffset <= ReadOffset); + const uint64_t OffsetIntoBlock = ReadOffset - BlockStartOffset; + ZEN_ASSERT(OffsetIntoBlock < Cache.GetSize()); + const uint64_t BlockBytes = Min(Cache.GetSize() 
- OffsetIntoBlock, Size - BytesRead); + BufferRanges.emplace_back(SharedBuffer(IoBuffer(Cache, OffsetIntoBlock, BlockBytes))); + ReadOffset += BlockBytes; + } + CompositeBuffer Result(std::move(BufferRanges)); + ZEN_ASSERT(Result.GetSize() == Size); + return Result; + } + + private: + BasicFile Source; + const uint64_t SourceSize; + uint64_t CacheBlockIndex = (uint64_t)-1; + IoBuffer Cache; + }; + + class ReadFileCache + { + public: + // A buffered file reader that provides CompositeBuffer where the buffers are owned and the memory never overwritten + ReadFileCache(DiskStatistics& DiskStats, + const std::filesystem::path& Path, + const std::vector<std::filesystem::path>& Paths, + const std::vector<uint64_t>& RawSizes, + size_t MaxOpenFileCount) + : m_Path(Path) + , m_Paths(Paths) + , m_RawSizes(RawSizes) + , m_DiskStats(DiskStats) + { + m_OpenFiles.reserve(MaxOpenFileCount); + } + ~ReadFileCache() + { + m_DiskStats.CurrentOpenFileCount -= m_OpenFiles.size(); + m_OpenFiles.clear(); + } + + CompositeBuffer GetRange(uint32_t PathIndex, uint64_t Offset, uint64_t Size) + { + auto CacheIt = + std::find_if(m_OpenFiles.begin(), m_OpenFiles.end(), [PathIndex](const auto& Lhs) { return Lhs.first == PathIndex; }); + if (CacheIt != m_OpenFiles.end()) + { + if (CacheIt != m_OpenFiles.begin()) + { + auto CachedFile(std::move(CacheIt->second)); + m_OpenFiles.erase(CacheIt); + m_OpenFiles.insert(m_OpenFiles.begin(), std::make_pair(PathIndex, std::move(CachedFile))); + } + CompositeBuffer Result = m_OpenFiles.front().second->GetRange(Offset, Size); + m_DiskStats.ReadByteCount += Result.GetSize(); + return Result; + } + const std::filesystem::path AttachmentPath = (m_Path / m_Paths[PathIndex]).make_preferred(); + if (Size == m_RawSizes[PathIndex]) + { + IoBuffer Result = IoBufferBuilder::MakeFromFile(AttachmentPath); + m_DiskStats.OpenReadCount++; + m_DiskStats.ReadByteCount += Result.GetSize(); + return CompositeBuffer(SharedBuffer(Result)); + } + if (m_OpenFiles.size() == 
m_OpenFiles.capacity()) + { + m_OpenFiles.pop_back(); + m_DiskStats.CurrentOpenFileCount--; + } + m_OpenFiles.insert(m_OpenFiles.begin(), std::make_pair(PathIndex, std::make_unique<BufferedOpenFile>(AttachmentPath))); + CompositeBuffer Result = m_OpenFiles.front().second->GetRange(Offset, Size); + m_DiskStats.ReadByteCount += Result.GetSize(); + m_DiskStats.OpenReadCount++; + m_DiskStats.CurrentOpenFileCount++; + return Result; + } + + private: + const std::filesystem::path m_Path; + const std::vector<std::filesystem::path>& m_Paths; + const std::vector<uint64_t>& m_RawSizes; + std::vector<std::pair<uint32_t, std::unique_ptr<BufferedOpenFile>>> m_OpenFiles; + DiskStatistics& m_DiskStats; + }; + + CompositeBuffer ValidateBlob(BuildStorage& Storage, + const Oid& BuildId, + const IoHash& BlobHash, + uint64_t& OutCompressedSize, + uint64_t& OutDecompressedSize) + { + IoBuffer Payload = Storage.GetBuildBlob(BuildId, BlobHash); + if (!Payload) + { + throw std::runtime_error(fmt::format("Blob {} could not be found", BlobHash)); + } + if (Payload.GetContentType() != ZenContentType::kCompressedBinary) + { + throw std::runtime_error(fmt::format("Blob {} ({} bytes) has unexpected content type '{}'", + BlobHash, + Payload.GetSize(), + ToString(Payload.GetContentType()))); + } + IoHash RawHash; + uint64_t RawSize; + CompressedBuffer Compressed = CompressedBuffer::FromCompressed(SharedBuffer(Payload), RawHash, RawSize); + if (!Compressed) + { + throw std::runtime_error(fmt::format("Blob {} ({} bytes) compressed header is invalid", BlobHash, Payload.GetSize())); + } + if (RawHash != BlobHash) + { + throw std::runtime_error( + fmt::format("Blob {} ({} bytes) compressed header has a mismatching raw hash {}", BlobHash, Payload.GetSize(), RawHash)); + } + SharedBuffer Decompressed = Compressed.Decompress(); + if (!Decompressed) + { + throw std::runtime_error( + fmt::format("Blob {} ({} bytes) failed to decompress - header information mismatch", BlobHash, Payload.GetSize())); + } + 
IoHash ValidateRawHash = IoHash::HashBuffer(Decompressed); + if (ValidateRawHash != BlobHash) + { + throw std::runtime_error(fmt::format("Blob {} ({} bytes) decompressed hash {} does not match header information", + BlobHash, + Payload.GetSize(), + ValidateRawHash)); + } + CompositeBuffer DecompressedComposite = Compressed.DecompressToComposite(); + if (!DecompressedComposite) + { + throw std::runtime_error(fmt::format("Blob {} ({} bytes) failed to decompress to composite", BlobHash, Payload.GetSize())); + } + OutCompressedSize = Payload.GetSize(); + OutDecompressedSize = RawSize; + return DecompressedComposite; + } + + ChunkBlockDescription ValidateChunkBlock(BuildStorage& Storage, + const Oid& BuildId, + const IoHash& BlobHash, + uint64_t& OutCompressedSize, + uint64_t& OutDecompressedSize) + { + CompositeBuffer BlockBuffer = ValidateBlob(Storage, BuildId, BlobHash, OutCompressedSize, OutDecompressedSize); + return GetChunkBlockDescription(BlockBuffer.Flatten(), BlobHash); + } + + CompositeBuffer FetchChunk(const ChunkedFolderContent& Content, + const ChunkedContentLookup& Lookup, + const IoHash& ChunkHash, + ReadFileCache& OpenFileCache) + { + auto It = Lookup.ChunkHashToChunkIndex.find(ChunkHash); + ZEN_ASSERT(It != Lookup.ChunkHashToChunkIndex.end()); + uint32_t ChunkIndex = It->second; + std::span<const ChunkedContentLookup::ChunkLocation> ChunkLocations = GetChunkLocations(Lookup, ChunkIndex); + ZEN_ASSERT(!ChunkLocations.empty()); + CompositeBuffer Chunk = + OpenFileCache.GetRange(ChunkLocations[0].PathIndex, ChunkLocations[0].Offset, Content.ChunkedContent.ChunkRawSizes[ChunkIndex]); + ZEN_ASSERT_SLOW(IoHash::HashBuffer(Chunk) == ChunkHash); + return Chunk; + }; + + CompressedBuffer GenerateBlock(const std::filesystem::path& Path, + const ChunkedFolderContent& Content, + const ChunkedContentLookup& Lookup, + const std::vector<uint32_t>& ChunksInBlock, + ChunkBlockDescription& OutBlockDescription, + DiskStatistics& DiskStats) + { + ReadFileCache 
OpenFileCache(DiskStats, Path, Content.Paths, Content.RawSizes, 4); + + std::vector<std::pair<IoHash, FetchChunkFunc>> BlockContent; + BlockContent.reserve(ChunksInBlock.size()); + for (uint32_t ChunkIndex : ChunksInBlock) + { + BlockContent.emplace_back(std::make_pair( + Content.ChunkedContent.ChunkHashes[ChunkIndex], + [&Content, &Lookup, &OpenFileCache, ChunkIndex](const IoHash& ChunkHash) -> std::pair<uint64_t, CompressedBuffer> { + CompositeBuffer Chunk = FetchChunk(Content, Lookup, ChunkHash, OpenFileCache); + if (!Chunk) + { + ZEN_ASSERT(false); + } + uint64_t RawSize = Chunk.GetSize(); + return {RawSize, CompressedBuffer::Compress(Chunk, OodleCompressor::Mermaid, OodleCompressionLevel::None)}; + })); + } + + return GenerateChunkBlock(std::move(BlockContent), OutBlockDescription); + }; + + void ArrangeChunksIntoBlocks(const ChunkedFolderContent& Content, + const ChunkedContentLookup& Lookup, + uint64_t MaxBlockSize, + std::vector<uint32_t>& ChunkIndexes, + std::vector<std::vector<uint32_t>>& OutBlocks) + { + std::sort(ChunkIndexes.begin(), ChunkIndexes.end(), [&Content, &Lookup](uint32_t Lhs, uint32_t Rhs) { + const ChunkedContentLookup::ChunkLocation& LhsLocation = GetChunkLocations(Lookup, Lhs)[0]; + const ChunkedContentLookup::ChunkLocation& RhsLocation = GetChunkLocations(Lookup, Rhs)[0]; + if (LhsLocation.PathIndex < RhsLocation.PathIndex) + { + return true; + } + else if (LhsLocation.PathIndex > RhsLocation.PathIndex) + { + return false; + } + return LhsLocation.Offset < RhsLocation.Offset; + }); + + uint64_t MaxBlockSizeLowThreshold = MaxBlockSize - (MaxBlockSize / 16); + + uint64_t BlockSize = 0; + + uint32_t ChunkIndexStart = 0; + for (uint32_t ChunkIndexOffset = 0; ChunkIndexOffset < ChunkIndexes.size();) + { + const uint32_t ChunkIndex = ChunkIndexes[ChunkIndexOffset]; + const uint64_t ChunkSize = Content.ChunkedContent.ChunkRawSizes[ChunkIndex]; + + if ((BlockSize + ChunkSize) > MaxBlockSize) + { + // Within the span of MaxBlockSizeLowThreshold 
and MaxBlockSize, see if there is a break + // between source paths for chunks. Break the block at the last such break if any. + ZEN_ASSERT(ChunkIndexOffset > ChunkIndexStart); + + const uint32_t ChunkPathIndex = Lookup.ChunkLocations[Lookup.ChunkLocationOffset[ChunkIndex]].PathIndex; + + uint64_t ScanBlockSize = BlockSize; + + uint32_t ScanChunkIndexOffset = ChunkIndexOffset - 1; + while (ScanChunkIndexOffset > (ChunkIndexStart + 2)) + { + const uint32_t TestChunkIndex = ChunkIndexes[ScanChunkIndexOffset]; + const uint64_t TestChunkSize = Content.ChunkedContent.ChunkRawSizes[TestChunkIndex]; + if ((ScanBlockSize - TestChunkSize) < MaxBlockSizeLowThreshold) + { + break; + } + + const uint32_t TestPathIndex = Lookup.ChunkLocations[Lookup.ChunkLocationOffset[TestChunkIndex]].PathIndex; + if (ChunkPathIndex != TestPathIndex) + { + ChunkIndexOffset = ScanChunkIndexOffset + 1; + break; + } + + ScanBlockSize -= TestChunkSize; + ScanChunkIndexOffset--; + } + + std::vector<uint32_t> ChunksInBlock; + ChunksInBlock.reserve(ChunkIndexOffset - ChunkIndexStart); + for (uint32_t AddIndexOffset = ChunkIndexStart; AddIndexOffset < ChunkIndexOffset; AddIndexOffset++) + { + const uint32_t AddChunkIndex = ChunkIndexes[AddIndexOffset]; + ChunksInBlock.push_back(AddChunkIndex); + } + OutBlocks.emplace_back(std::move(ChunksInBlock)); + BlockSize = 0; + ChunkIndexStart = ChunkIndexOffset; + } + else + { + ChunkIndexOffset++; + BlockSize += ChunkSize; + } + } + if (ChunkIndexStart < ChunkIndexes.size()) + { + std::vector<uint32_t> ChunksInBlock; + ChunksInBlock.reserve(ChunkIndexes.size() - ChunkIndexStart); + for (uint32_t AddIndexOffset = ChunkIndexStart; AddIndexOffset < ChunkIndexes.size(); AddIndexOffset++) + { + const uint32_t AddChunkIndex = ChunkIndexes[AddIndexOffset]; + ChunksInBlock.push_back(AddChunkIndex); + } + OutBlocks.emplace_back(std::move(ChunksInBlock)); + } + } + + CompositeBuffer CompressChunk(const std::filesystem::path& Path, + const ChunkedFolderContent& Content, 
+ const ChunkedContentLookup& Lookup, + uint32_t ChunkIndex, + const std::filesystem::path& TempFolderPath) + { + const IoHash& ChunkHash = Content.ChunkedContent.ChunkHashes[ChunkIndex]; + const uint64_t ChunkSize = Content.ChunkedContent.ChunkRawSizes[ChunkIndex]; + + const ChunkedContentLookup::ChunkLocation& Source = GetChunkLocations(Lookup, ChunkIndex)[0]; + + IoBuffer RawSource = + IoBufferBuilder::MakeFromFile((Path / Content.Paths[Source.PathIndex]).make_preferred(), Source.Offset, ChunkSize); + if (!RawSource) + { + throw std::runtime_error(fmt::format("Failed fetching chunk {}", ChunkHash)); + } + if (RawSource.GetSize() != ChunkSize) + { + throw std::runtime_error(fmt::format("Fetched chunk {} has invalid size", ChunkHash)); + } + ZEN_ASSERT_SLOW(IoHash::HashBuffer(RawSource) == ChunkHash); + + CompressedBuffer CompressedBlob = CompressedBuffer::Compress(SharedBuffer(std::move(RawSource))); + if (!CompressedBlob) + { + throw std::runtime_error(fmt::format("Failed compressing chunk {}", ChunkHash)); + } + if (TempFolderPath.empty()) + { + return CompressedBlob.GetCompressed().MakeOwned(); + } + else + { + CompositeBuffer TempPayload = WriteToTempFileIfNeeded(CompressedBlob.GetCompressed(), TempFolderPath, ChunkHash); + return CompressedBuffer::FromCompressedNoValidate(std::move(TempPayload)).GetCompressed(); + } + } + + struct GeneratedBlocks + { + std::vector<ChunkBlockDescription> BlockDescriptions; + std::vector<uint64_t> BlockSizes; + std::vector<CompositeBuffer> BlockBuffers; + std::vector<CbObject> BlockMetaDatas; + std::vector<bool> MetaDataHasBeenUploaded; + tsl::robin_map<IoHash, size_t, IoHash::Hasher> BlockHashToBlockIndex; + }; + + void GenerateBuildBlocks(const std::filesystem::path& Path, + const ChunkedFolderContent& Content, + const ChunkedContentLookup& Lookup, + BuildStorage& Storage, + const Oid& BuildId, + std::atomic<bool>& AbortFlag, + const std::vector<std::vector<uint32_t>>& NewBlockChunks, + GeneratedBlocks& OutBlocks, + 
DiskStatistics& DiskStats, + UploadStatistics& UploadStats, + GenerateBlocksStatistics& GenerateBlocksStats) + { + const std::size_t NewBlockCount = NewBlockChunks.size(); + if (NewBlockCount > 0) + { + ProgressBar ProgressBar(UsePlainProgress); + + OutBlocks.BlockDescriptions.resize(NewBlockCount); + OutBlocks.BlockSizes.resize(NewBlockCount); + OutBlocks.BlockBuffers.resize(NewBlockCount); + OutBlocks.BlockMetaDatas.resize(NewBlockCount); + OutBlocks.MetaDataHasBeenUploaded.resize(NewBlockCount, false); + OutBlocks.BlockHashToBlockIndex.reserve(NewBlockCount); + + RwLock Lock; + + WorkerThreadPool& GenerateBlobsPool = GetMediumWorkerPool(EWorkloadType::Burst); // GetSyncWorkerPool();// + WorkerThreadPool& UploadBlocksPool = GetSmallWorkerPool(EWorkloadType::Burst); // GetSyncWorkerPool();// + + FilteredRate FilteredGeneratedBytesPerSecond; + FilteredRate FilteredUploadedBytesPerSecond; + + ParallellWork Work(AbortFlag); + + std::atomic<uint32_t> PendingUploadCount(0); + + for (size_t BlockIndex = 0; BlockIndex < NewBlockCount; BlockIndex++) + { + if (Work.IsAborted()) + { + break; + } + const std::vector<uint32_t>& ChunksInBlock = NewBlockChunks[BlockIndex]; + Work.ScheduleWork( + GenerateBlobsPool, + [&, BlockIndex](std::atomic<bool>& AbortFlag) { + if (!AbortFlag) + { + FilteredGeneratedBytesPerSecond.Start(); + // TODO: Convert ScheduleWork body to function + + CompressedBuffer CompressedBlock = + GenerateBlock(Path, Content, Lookup, ChunksInBlock, OutBlocks.BlockDescriptions[BlockIndex], DiskStats); + ZEN_CONSOLE_VERBOSE("Generated block {} ({}) containing {} chunks", + OutBlocks.BlockDescriptions[BlockIndex].BlockHash, + NiceBytes(CompressedBlock.GetCompressedSize()), + OutBlocks.BlockDescriptions[BlockIndex].ChunkRawHashes.size()); + + OutBlocks.BlockSizes[BlockIndex] = CompressedBlock.GetCompressedSize(); + + CompositeBuffer Payload = WriteToTempFileIfNeeded(CompressedBlock.GetCompressed(), + Path / ZenTempBlockFolderName, + 
OutBlocks.BlockDescriptions[BlockIndex].BlockHash); + { + CbObjectWriter Writer; + Writer.AddString("createdBy", "zen"); + OutBlocks.BlockMetaDatas[BlockIndex] = Writer.Save(); + } + GenerateBlocksStats.GeneratedBlockByteCount += OutBlocks.BlockSizes[BlockIndex]; + GenerateBlocksStats.GeneratedBlockCount++; + + Lock.WithExclusiveLock([&]() { + OutBlocks.BlockHashToBlockIndex.insert_or_assign(OutBlocks.BlockDescriptions[BlockIndex].BlockHash, + BlockIndex); + }); + + if (GenerateBlocksStats.GeneratedBlockCount == NewBlockCount) + { + FilteredGeneratedBytesPerSecond.Stop(); + } + + if (!AbortFlag) + { + PendingUploadCount++; + Work.ScheduleWork( + UploadBlocksPool, + [&, BlockIndex, Payload = std::move(Payload)](std::atomic<bool>& AbortFlag) { + if (!AbortFlag) + { + if (GenerateBlocksStats.GeneratedBlockCount == NewBlockCount) + { + FilteredUploadedBytesPerSecond.Stop(); + OutBlocks.BlockBuffers[BlockIndex] = std::move(Payload); + } + else + { + FilteredUploadedBytesPerSecond.Start(); + // TODO: Convert ScheduleWork body to function + + PendingUploadCount--; + + const CbObject BlockMetaData = + BuildChunkBlockDescription(OutBlocks.BlockDescriptions[BlockIndex], + OutBlocks.BlockMetaDatas[BlockIndex]); + + const IoHash& BlockHash = OutBlocks.BlockDescriptions[BlockIndex].BlockHash; + Storage.PutBuildBlob(BuildId, BlockHash, ZenContentType::kCompressedBinary, Payload); + UploadStats.BlocksBytes += Payload.GetSize(); + ZEN_CONSOLE_VERBOSE("Uploaded block {} ({}) containing {} chunks", + OutBlocks.BlockDescriptions[BlockIndex].BlockHash, + NiceBytes(Payload.GetSize()), + OutBlocks.BlockDescriptions[BlockIndex].ChunkRawHashes.size()); + + Storage.PutBlockMetadata(BuildId, + OutBlocks.BlockDescriptions[BlockIndex].BlockHash, + BlockMetaData); + ZEN_CONSOLE_VERBOSE("Uploaded block {} metadata ({})", + OutBlocks.BlockDescriptions[BlockIndex].BlockHash, + NiceBytes(BlockMetaData.GetSize())); + + OutBlocks.MetaDataHasBeenUploaded[BlockIndex] = true; + + 
UploadStats.BlocksBytes += BlockMetaData.GetSize(); + UploadStats.BlockCount++; + if (UploadStats.BlockCount == NewBlockCount) + { + FilteredUploadedBytesPerSecond.Stop(); + } + } + } + }, + [&](const std::exception& Ex, std::atomic<bool>& AbortFlag) { + ZEN_CONSOLE("Failed uploading block. Reason: {}", Ex.what()); + AbortFlag = true; + }); + } + } + }, + [&](const std::exception& Ex, std::atomic<bool>& AbortFlag) { + ZEN_CONSOLE("Failed generating block. Reason: {}", Ex.what()); + AbortFlag = true; + }); + } + + Work.Wait(UsePlainProgress ? 5000 : 200, [&](bool IsAborted, std::ptrdiff_t PendingWork) { + ZEN_UNUSED(IsAborted, PendingWork); + + FilteredGeneratedBytesPerSecond.Update(GenerateBlocksStats.GeneratedBlockByteCount.load()); + FilteredUploadedBytesPerSecond.Update(UploadStats.BlocksBytes.load()); + + std::string Details = fmt::format("Generated {}/{} ({}, {}B/s) and uploaded {}/{} ({}, {}bits/s) blocks", + GenerateBlocksStats.GeneratedBlockCount.load(), + NewBlockCount, + NiceBytes(GenerateBlocksStats.GeneratedBlockByteCount.load()), + NiceNum(FilteredGeneratedBytesPerSecond.GetCurrent()), + UploadStats.BlockCount.load(), + NewBlockCount, + NiceBytes(UploadStats.BlocksBytes.load()), + NiceNum(FilteredUploadedBytesPerSecond.GetCurrent() * 8)); + + ProgressBar.UpdateState( + {.Task = "Generating blocks", + .Details = Details, + .TotalCount = gsl::narrow<uint64_t>(NewBlockCount), + .RemainingCount = gsl::narrow<uint64_t>(NewBlockCount - GenerateBlocksStats.GeneratedBlockCount.load())}, + false); + }); + + ProgressBar.Finish(); + + GenerateBlocksStats.GenerateBlocksElapsedWallTimeUS = FilteredGeneratedBytesPerSecond.GetElapsedTime(); + UploadStats.ElapsedWallTimeUS = FilteredUploadedBytesPerSecond.GetElapsedTime(); + } + } + + void UploadPartBlobs(BuildStorage& Storage, + const Oid& BuildId, + const std::filesystem::path& Path, + const ChunkedFolderContent& Content, + const ChunkedContentLookup& Lookup, + std::span<IoHash> RawHashes, + const 
std::vector<std::vector<uint32_t>>& NewBlockChunks, + GeneratedBlocks& NewBlocks, + std::span<const uint32_t> LooseChunkIndexes, + const std::uint64_t LargeAttachmentSize, + std::atomic<bool>& AbortFlag, + DiskStatistics& DiskStats, + UploadStatistics& UploadStats, + GenerateBlocksStatistics& GenerateBlocksStats, + LooseChunksStatistics& LooseChunksStats) + { + { + ProgressBar ProgressBar(UsePlainProgress); + + WorkerThreadPool& ReadChunkPool = GetMediumWorkerPool(EWorkloadType::Burst); // GetSyncWorkerPool(); // + WorkerThreadPool& UploadChunkPool = GetSmallWorkerPool(EWorkloadType::Burst); // GetSyncWorkerPool(); // + + FilteredRate FilteredGenerateBlockBytesPerSecond; + FilteredRate FilteredCompressedBytesPerSecond; + FilteredRate FilteredUploadedBytesPerSecond; + + ParallellWork Work(AbortFlag); + + std::atomic<size_t> UploadedBlockSize = 0; + std::atomic<size_t> UploadedBlockCount = 0; + std::atomic<size_t> UploadedChunkSize = 0; + std::atomic<uint32_t> UploadedChunkCount = 0; + + tsl::robin_map<uint32_t, uint32_t> ChunkIndexToLooseChunkOrderIndex; + ChunkIndexToLooseChunkOrderIndex.reserve(LooseChunkIndexes.size()); + for (uint32_t OrderIndex = 0; OrderIndex < LooseChunkIndexes.size(); OrderIndex++) + { + ChunkIndexToLooseChunkOrderIndex.insert_or_assign(LooseChunkIndexes[OrderIndex], OrderIndex); + } + + std::vector<IoHash> FoundChunkHashes; + FoundChunkHashes.reserve(RawHashes.size()); + + std::vector<size_t> BlockIndexes; + std::vector<uint32_t> LooseChunkOrderIndexes; + + uint64_t TotalChunksSize = 0; + uint64_t TotalBlocksSize = 0; + for (const IoHash& RawHash : RawHashes) + { + if (auto It = NewBlocks.BlockHashToBlockIndex.find(RawHash); It != NewBlocks.BlockHashToBlockIndex.end()) + { + BlockIndexes.push_back(It->second); + TotalBlocksSize += NewBlocks.BlockSizes[It->second]; + } + if (auto ChunkIndexIt = Lookup.ChunkHashToChunkIndex.find(RawHash); ChunkIndexIt != Lookup.ChunkHashToChunkIndex.end()) + { + const uint32_t ChunkIndex = 
ChunkIndexIt->second; + if (auto LooseOrderIndexIt = ChunkIndexToLooseChunkOrderIndex.find(ChunkIndex); + LooseOrderIndexIt != ChunkIndexToLooseChunkOrderIndex.end()) + { + LooseChunkOrderIndexes.push_back(LooseOrderIndexIt->second); + TotalChunksSize += Content.ChunkedContent.ChunkRawSizes[ChunkIndex]; + } + } + } + uint64_t TotalSize = TotalChunksSize + TotalBlocksSize; + + const size_t UploadBlockCount = BlockIndexes.size(); + const uint32_t UploadChunkCount = gsl::narrow<uint32_t>(LooseChunkOrderIndexes.size()); + + auto AsyncUploadBlock = [&](const size_t BlockIndex, const IoHash BlockHash, CompositeBuffer&& Payload) { + Work.ScheduleWork( + UploadChunkPool, + [&, BlockIndex, BlockHash, Payload = CompositeBuffer(std::move(Payload))](std::atomic<bool>& AbortFlag) { + if (!AbortFlag) + { + FilteredUploadedBytesPerSecond.Start(); + const CbObject BlockMetaData = + BuildChunkBlockDescription(NewBlocks.BlockDescriptions[BlockIndex], NewBlocks.BlockMetaDatas[BlockIndex]); + + Storage.PutBuildBlob(BuildId, BlockHash, ZenContentType::kCompressedBinary, Payload); + ZEN_CONSOLE_VERBOSE("Uploaded block {} ({}) containing {} chunks", + NewBlocks.BlockDescriptions[BlockIndex].BlockHash, + NiceBytes(Payload.GetSize()), + NewBlocks.BlockDescriptions[BlockIndex].ChunkRawHashes.size()); + UploadedBlockSize += Payload.GetSize(); + UploadStats.BlocksBytes += Payload.GetSize(); + + Storage.PutBlockMetadata(BuildId, BlockHash, BlockMetaData); + ZEN_CONSOLE_VERBOSE("Uploaded block {} metadata ({})", + NewBlocks.BlockDescriptions[BlockIndex].BlockHash, + NiceBytes(BlockMetaData.GetSize())); + + NewBlocks.MetaDataHasBeenUploaded[BlockIndex] = true; + + UploadStats.BlockCount++; + UploadStats.BlocksBytes += BlockMetaData.GetSize(); + + UploadedBlockCount++; + if (UploadedBlockCount == UploadBlockCount && UploadedChunkCount == UploadChunkCount) + { + FilteredUploadedBytesPerSecond.Stop(); + } + } + }, + [&](const std::exception& Ex, std::atomic<bool>& AbortFlag) { + ZEN_CONSOLE("Failed 
uploading block. Reason: {}", Ex.what()); + AbortFlag = true; + }); + }; + + auto AsyncUploadLooseChunk = [&](const IoHash& RawHash, CompositeBuffer&& Payload) { + Work.ScheduleWork( + UploadChunkPool, + [&, RawHash, Payload = CompositeBuffer(std::move(Payload))](std::atomic<bool>& AbortFlag) { + if (!AbortFlag) + { + const uint64_t PayloadSize = Payload.GetSize(); + if (PayloadSize >= LargeAttachmentSize) + { + UploadStats.MultipartAttachmentCount++; + std::vector<std::function<void()>> MultipartWork = Storage.PutLargeBuildBlob( + BuildId, + RawHash, + ZenContentType::kCompressedBinary, + PayloadSize, + [Payload = std::move(Payload), &FilteredUploadedBytesPerSecond](uint64_t Offset, + uint64_t Size) -> IoBuffer { + FilteredUploadedBytesPerSecond.Start(); + + IoBuffer PartPayload = Payload.Mid(Offset, Size).Flatten().AsIoBuffer(); + PartPayload.SetContentType(ZenContentType::kBinary); + return PartPayload; + }, + [&](uint64_t SentBytes, bool IsComplete) { + UploadStats.ChunksBytes += SentBytes; + UploadedChunkSize += SentBytes; + if (IsComplete) + { + UploadStats.ChunkCount++; + UploadedChunkCount++; + if (UploadedBlockCount == UploadBlockCount && UploadedChunkCount == UploadChunkCount) + { + FilteredUploadedBytesPerSecond.Stop(); + } + } + }); + for (auto& WorkPart : MultipartWork) + { + Work.ScheduleWork( + UploadChunkPool, + [Work = std::move(WorkPart)](std::atomic<bool>& AbortFlag) { + if (!AbortFlag) + { + Work(); + } + }, + [&, RawHash](const std::exception& Ex, std::atomic<bool>& AbortFlag) { + ZEN_CONSOLE("Failed uploading multipart blob {}. 
Reason: {}", RawHash, Ex.what()); + AbortFlag = true; + }); + } + ZEN_CONSOLE_VERBOSE("Uploaded multipart chunk {} ({})", RawHash, NiceBytes(PayloadSize)); + } + else + { + Storage.PutBuildBlob(BuildId, RawHash, ZenContentType::kCompressedBinary, Payload); + ZEN_CONSOLE_VERBOSE("Uploaded chunk {} ({})", RawHash, NiceBytes(PayloadSize)); + UploadStats.ChunksBytes += Payload.GetSize(); + UploadStats.ChunkCount++; + UploadedChunkSize += Payload.GetSize(); + UploadedChunkCount++; + if (UploadedChunkCount == UploadChunkCount) + { + FilteredUploadedBytesPerSecond.Stop(); + } + } + } + }, + [&](const std::exception& Ex, std::atomic<bool>& AbortFlag) { + ZEN_CONSOLE("Failed uploading chunk. Reason: {}", Ex.what()); + AbortFlag = true; + }); + }; + + std::vector<size_t> GenerateBlockIndexes; + + std::atomic<uint64_t> GeneratedBlockCount = 0; + std::atomic<uint64_t> GeneratedBlockByteCount = 0; + + // Start upload of any pre-built blocks + for (const size_t BlockIndex : BlockIndexes) + { + if (CompositeBuffer BlockPayload = std::move(NewBlocks.BlockBuffers[BlockIndex]); BlockPayload) + { + const IoHash& BlockHash = NewBlocks.BlockDescriptions[BlockIndex].BlockHash; + FoundChunkHashes.push_back(BlockHash); + if (!AbortFlag) + { + AsyncUploadBlock(BlockIndex, BlockHash, std::move(BlockPayload)); + } + // GeneratedBlockCount++; + } + else + { + GenerateBlockIndexes.push_back(BlockIndex); + } + } + + std::vector<uint32_t> CompressLooseChunkOrderIndexes; + + // Start upload of any pre-compressed loose chunks + for (const uint32_t LooseChunkOrderIndex : LooseChunkOrderIndexes) + { + CompressLooseChunkOrderIndexes.push_back(LooseChunkOrderIndex); + } + + // Start generation of any non-prebuilt blocks and schedule upload + for (const size_t BlockIndex : GenerateBlockIndexes) + { + const IoHash& BlockHash = NewBlocks.BlockDescriptions[BlockIndex].BlockHash; + FoundChunkHashes.push_back(BlockHash); + if (!AbortFlag) + { + Work.ScheduleWork( + ReadChunkPool, + [&, 
BlockIndex](std::atomic<bool>& AbortFlag) { + if (!AbortFlag) + { + FilteredGenerateBlockBytesPerSecond.Start(); + ChunkBlockDescription BlockDescription; + CompressedBuffer CompressedBlock = + GenerateBlock(Path, Content, Lookup, NewBlockChunks[BlockIndex], BlockDescription, DiskStats); + if (!CompressedBlock) + { + throw std::runtime_error(fmt::format("Failed generating block {}", BlockHash)); + } + ZEN_ASSERT(BlockDescription.BlockHash == BlockHash); + + CompositeBuffer Payload = WriteToTempFileIfNeeded(CompressedBlock.GetCompressed(), + Path / ZenTempBlockFolderName, + BlockDescription.BlockHash); + + GenerateBlocksStats.GeneratedBlockByteCount += NewBlocks.BlockSizes[BlockIndex]; + GenerateBlocksStats.GeneratedBlockCount++; + GeneratedBlockByteCount += NewBlocks.BlockSizes[BlockIndex]; + GeneratedBlockCount++; + if (GeneratedBlockCount == GenerateBlockIndexes.size()) + { + FilteredGenerateBlockBytesPerSecond.Stop(); + } + if (!AbortFlag) + { + AsyncUploadBlock(BlockIndex, BlockHash, std::move(Payload)); + } + ZEN_CONSOLE_VERBOSE("Regenerated block {} ({}) containing {} chunks", + NewBlocks.BlockDescriptions[BlockIndex].BlockHash, + NiceBytes(CompressedBlock.GetCompressedSize()), + NewBlocks.BlockDescriptions[BlockIndex].ChunkRawHashes.size()); + } + }, + [&](const std::exception& Ex, std::atomic<bool>& AbortFlag) { + ZEN_CONSOLE("Failed generating block. 
Reason: {}", Ex.what()); + AbortFlag = true; + }); + } + } + + std::atomic<uint64_t> CompressedLooseChunkCount = 0; + std::atomic<uint64_t> CompressedLooseChunkByteCount = 0; + + // Start compression of any non-precompressed loose chunks and schedule upload + for (const uint32_t CompressLooseChunkOrderIndex : CompressLooseChunkOrderIndexes) + { + const uint32_t ChunkIndex = LooseChunkIndexes[CompressLooseChunkOrderIndex]; + Work.ScheduleWork( + ReadChunkPool, + [&, ChunkIndex](std::atomic<bool>& AbortFlag) { + if (!AbortFlag) + { + FilteredCompressedBytesPerSecond.Start(); + CompositeBuffer Payload = CompressChunk(Path, Content, Lookup, ChunkIndex, Path / ZenTempChunkFolderName); + ZEN_CONSOLE_VERBOSE("Compressed chunk {} ({} -> {})", + Content.ChunkedContent.ChunkHashes[ChunkIndex], + NiceBytes(Content.ChunkedContent.ChunkRawSizes[ChunkIndex]), + NiceBytes(Payload.GetSize())); + UploadStats.ReadFromDiskBytes += Content.ChunkedContent.ChunkRawSizes[ChunkIndex]; + LooseChunksStats.CompressedChunkBytes += Payload.GetSize(); + LooseChunksStats.CompressedChunkCount++; + CompressedLooseChunkByteCount += Payload.GetSize(); + CompressedLooseChunkCount++; + if (CompressedLooseChunkCount == CompressLooseChunkOrderIndexes.size()) + { + FilteredCompressedBytesPerSecond.Stop(); + } + if (!AbortFlag) + { + AsyncUploadLooseChunk(Content.ChunkedContent.ChunkHashes[ChunkIndex], std::move(Payload)); + } + } + }, + [&, ChunkIndex](const std::exception& Ex, std::atomic<bool>& AbortFlag) { + ZEN_CONSOLE("Failed compressing part blob {}. Reason: {}", + Content.ChunkedContent.ChunkHashes[ChunkIndex], + Ex.what()); + AbortFlag = true; + }); + } + + Work.Wait(UsePlainProgress ? 
5000 : 200, [&](bool IsAborted, std::ptrdiff_t PendingWork) { + ZEN_UNUSED(IsAborted, PendingWork); + FilteredCompressedBytesPerSecond.Update(CompressedLooseChunkByteCount.load()); + FilteredGenerateBlockBytesPerSecond.Update(GeneratedBlockByteCount.load()); + FilteredUploadedBytesPerSecond.Update(UploadedChunkSize.load() + UploadedBlockSize.load()); + uint64_t UploadedSize = UploadedChunkSize.load() + UploadedBlockSize.load(); + ProgressBar.UpdateState({.Task = "Uploading blobs ", + .Details = fmt::format("Compressed {}/{} chunks. " + "Uploaded {}/{} blobs ({}/{} {}bits/s)", + CompressedLooseChunkCount.load(), + CompressLooseChunkOrderIndexes.size(), + + UploadedBlockCount.load() + UploadedChunkCount.load(), + UploadBlockCount + UploadChunkCount, + + NiceBytes(UploadedChunkSize.load() + UploadedBlockSize.load()), + NiceBytes(TotalSize), + NiceNum(FilteredUploadedBytesPerSecond.GetCurrent())), + .TotalCount = gsl::narrow<uint64_t>(TotalSize), + .RemainingCount = gsl::narrow<uint64_t>(TotalSize - UploadedSize)}, + false); + }); + + ProgressBar.Finish(); + UploadStats.ElapsedWallTimeUS = FilteredUploadedBytesPerSecond.GetElapsedTime(); + GenerateBlocksStats.GenerateBlocksElapsedWallTimeUS = FilteredGenerateBlockBytesPerSecond.GetElapsedTime(); + LooseChunksStats.CompressChunksElapsedWallTimeUS = FilteredCompressedBytesPerSecond.GetElapsedTime(); + } + } + + std::vector<size_t> FindReuseBlocks(const std::vector<ChunkBlockDescription>& KnownBlocks, + std::span<const IoHash> ChunkHashes, + std::span<const uint32_t> ChunkIndexes, + uint8_t MinPercentLimit, + std::vector<uint32_t>& OutUnusedChunkIndexes, + FindBlocksStatistics& FindBlocksStats) + { + // Pick out the blocks with a usage level higher than or equal to MinPercentLimit + // Sort them by reused size - most usage first + // Make a list of all chunks and mark them as not found + // For each block, recalculate the block's usage percent based on the chunks marked as not found + // If the block still reaches MinPercentLimit, keep it and remove the matching chunks from the not found list + // Repeat for all remaining blocks that initially matched MinPercentLimit + + std::vector<size_t> FilteredReuseBlockIndexes; + + uint32_t ChunkCount = gsl::narrow<uint32_t>(ChunkHashes.size()); + std::vector<bool> ChunkFound(ChunkCount, false); + + if (ChunkCount > 0) + { + if (!KnownBlocks.empty()) + { + Stopwatch ReuseTimer; + + tsl::robin_map<IoHash, uint32_t, IoHash::Hasher> ChunkHashToChunkIndex; + ChunkHashToChunkIndex.reserve(ChunkIndexes.size()); + for (uint32_t ChunkIndex : ChunkIndexes) + { + ChunkHashToChunkIndex.insert_or_assign(ChunkHashes[ChunkIndex], ChunkIndex); + } + + std::vector<size_t> BlockSizes(KnownBlocks.size(), 0); + std::vector<size_t> BlockUseSize(KnownBlocks.size(), 0); + + std::vector<size_t> ReuseBlockIndexes; + + for (size_t KnownBlockIndex = 0; KnownBlockIndex < KnownBlocks.size(); KnownBlockIndex++) + { + const ChunkBlockDescription& KnownBlock = KnownBlocks[KnownBlockIndex]; + if (KnownBlock.BlockHash != IoHash::Zero && + KnownBlock.ChunkRawHashes.size() == KnownBlock.ChunkCompressedLengths.size()) + { + size_t BlockAttachmentCount = KnownBlock.ChunkRawHashes.size(); + if (BlockAttachmentCount == 0) + { + continue; + } + size_t ReuseSize = 0; + size_t BlockSize = 0; + size_t FoundAttachmentCount = 0; + size_t BlockChunkCount = KnownBlock.ChunkRawHashes.size(); + for (size_t BlockChunkIndex = 0; BlockChunkIndex < BlockChunkCount; BlockChunkIndex++) + { + const IoHash& BlockChunkHash = KnownBlock.ChunkRawHashes[BlockChunkIndex]; + const uint32_t BlockChunkSize = KnownBlock.ChunkCompressedLengths[BlockChunkIndex]; + BlockSize += BlockChunkSize; + if (ChunkHashToChunkIndex.contains(BlockChunkHash)) + { + ReuseSize += BlockChunkSize; + FoundAttachmentCount++; + } + } + + size_t ReusePercent = (ReuseSize * 100) / BlockSize; + + if (ReusePercent >= MinPercentLimit) + { + 
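The two-pass greedy selection described in the function's header comment can be sketched standalone as below. The `Block` struct and `pickReusableBlocks` are illustrative stand-ins (the real code works on `ChunkBlockDescription`, `IoHash` and compressed chunk lengths); the logic mirrors the diff: pass one scores every block against the full wanted chunk set, pass two visits surviving candidates by descending reused bytes and re-scores each against the chunks no earlier block has claimed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Illustrative stand-in for ChunkBlockDescription: parallel arrays of
// chunk identifiers and their compressed sizes.
struct Block
{
    std::vector<uint32_t> ChunkIds;
    std::vector<uint32_t> ChunkSizes;
};

static std::vector<size_t> pickReusableBlocks(const std::vector<Block>& Blocks,
                                              const std::unordered_set<uint32_t>& Wanted,
                                              uint32_t MinPercent)
{
    // Pass 1: keep blocks whose wanted-byte ratio reaches MinPercent.
    std::vector<size_t> Candidates;
    std::vector<size_t> UseSize(Blocks.size(), 0);
    for (size_t I = 0; I < Blocks.size(); I++)
    {
        size_t Total = 0, Reused = 0;
        for (size_t C = 0; C < Blocks[I].ChunkIds.size(); C++)
        {
            Total += Blocks[I].ChunkSizes[C];
            if (Wanted.count(Blocks[I].ChunkIds[C]))
                Reused += Blocks[I].ChunkSizes[C];
        }
        if (Total != 0 && (Reused * 100) / Total >= MinPercent)
        {
            Candidates.push_back(I);
            UseSize[I] = Reused;
        }
    }
    // Pass 2: visit candidates by descending reused bytes; re-score each
    // block counting only chunks no earlier pick has already claimed.
    std::sort(Candidates.begin(), Candidates.end(),
              [&](size_t L, size_t R) { return UseSize[L] > UseSize[R]; });
    std::unordered_set<uint32_t> Claimed;
    std::vector<size_t> Picked;
    for (size_t I : Candidates)
    {
        size_t Total = 0, Reused = 0;
        for (size_t C = 0; C < Blocks[I].ChunkIds.size(); C++)
        {
            Total += Blocks[I].ChunkSizes[C];
            if (Wanted.count(Blocks[I].ChunkIds[C]) && !Claimed.count(Blocks[I].ChunkIds[C]))
                Reused += Blocks[I].ChunkSizes[C];
        }
        if ((Reused * 100) / Total >= MinPercent)
        {
            Picked.push_back(I);
            for (uint32_t Id : Blocks[I].ChunkIds)
                Claimed.insert(Id);
        }
    }
    return Picked;
}
```

The re-score in pass two is what keeps two heavily overlapping blocks from both being accepted: once the larger block claims the shared chunks, the smaller one usually drops below the threshold and is rejected.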
ZEN_CONSOLE_VERBOSE("Reusing block {}. {} attachments found, usage level: {}%", + KnownBlock.BlockHash, + FoundAttachmentCount, + ReusePercent); + ReuseBlockIndexes.push_back(KnownBlockIndex); + + BlockSizes[KnownBlockIndex] = BlockSize; + BlockUseSize[KnownBlockIndex] = ReuseSize; + } + else if (FoundAttachmentCount > 0) + { + ZEN_CONSOLE_VERBOSE("Skipping block {}. {} attachments found, usage level: {}%", + KnownBlock.BlockHash, + FoundAttachmentCount, + ReusePercent); + FindBlocksStats.RejectedBlockCount++; + FindBlocksStats.RejectedChunkCount += FoundAttachmentCount; + FindBlocksStats.RejectedByteCount += ReuseSize; + } + } + } + + if (!ReuseBlockIndexes.empty()) + { + std::sort(ReuseBlockIndexes.begin(), ReuseBlockIndexes.end(), [&](size_t Lhs, size_t Rhs) { + return BlockUseSize[Lhs] > BlockUseSize[Rhs]; + }); + + for (size_t KnownBlockIndex : ReuseBlockIndexes) + { + std::vector<uint32_t> FoundChunkIndexes; + size_t BlockSize = 0; + size_t AdjustedReuseSize = 0; + const ChunkBlockDescription& KnownBlock = KnownBlocks[KnownBlockIndex]; + for (size_t BlockChunkIndex = 0; BlockChunkIndex < KnownBlock.ChunkRawHashes.size(); BlockChunkIndex++) + { + const IoHash& BlockChunkHash = KnownBlock.ChunkRawHashes[BlockChunkIndex]; + const uint32_t BlockChunkSize = KnownBlock.ChunkCompressedLengths[BlockChunkIndex]; + BlockSize += BlockChunkSize; + if (auto It = ChunkHashToChunkIndex.find(BlockChunkHash); It != ChunkHashToChunkIndex.end()) + { + const uint32_t ChunkIndex = It->second; + if (!ChunkFound[ChunkIndex]) + { + FoundChunkIndexes.push_back(ChunkIndex); + AdjustedReuseSize += KnownBlock.ChunkCompressedLengths[BlockChunkIndex]; + } + } + } + + size_t ReusePercent = (AdjustedReuseSize * 100) / BlockSize; + + if (ReusePercent >= MinPercentLimit) + { + ZEN_CONSOLE_VERBOSE("Reusing block {}. 
{} attachments found, usage level: {}%", + KnownBlock.BlockHash, + FoundChunkIndexes.size(), + ReusePercent); + FilteredReuseBlockIndexes.push_back(KnownBlockIndex); + + for (uint32_t ChunkIndex : FoundChunkIndexes) + { + ChunkFound[ChunkIndex] = true; + } + FindBlocksStats.AcceptedChunkCount += FoundChunkIndexes.size(); + FindBlocksStats.AcceptedByteCount += AdjustedReuseSize; + FindBlocksStats.AcceptedReduntantChunkCount += KnownBlock.ChunkRawHashes.size() - FoundChunkIndexes.size(); + FindBlocksStats.AcceptedReduntantByteCount += BlockSize - AdjustedReuseSize; + } + else + { + ZEN_CONSOLE_VERBOSE("Skipping block {}. filtered usage level: {}%", KnownBlock.BlockHash, ReusePercent); + FindBlocksStats.RejectedBlockCount++; + FindBlocksStats.RejectedChunkCount += FoundChunkIndexes.size(); + FindBlocksStats.RejectedByteCount += AdjustedReuseSize; + } + } + } + } + OutUnusedChunkIndexes.reserve(ChunkIndexes.size() - FindBlocksStats.AcceptedChunkCount); + for (uint32_t ChunkIndex : ChunkIndexes) + { + if (!ChunkFound[ChunkIndex]) + { + OutUnusedChunkIndexes.push_back(ChunkIndex); + } + } + } + return FilteredReuseBlockIndexes; + }; + + bool UploadFolder(BuildStorage& Storage, + const Oid& BuildId, + const Oid& BuildPartId, + const std::string_view BuildPartName, + const std::filesystem::path& Path, + const std::filesystem::path& ManifestPath, + const uint8_t BlockReuseMinPercentLimit, + bool AllowMultiparts, + const CbObject& MetaData, + bool CreateBuild, + bool IgnoreExistingBlocks) + { + Stopwatch ProcessTimer; + + std::atomic<bool> AbortFlag = false; + + const std::filesystem::path ZenTempFolder = Path / ZenTempFolderName; + CreateDirectories(ZenTempFolder); + CleanDirectory(ZenTempFolder, {}); + auto _ = MakeGuard([&]() { + CleanDirectory(ZenTempFolder, {}); + std::filesystem::remove(ZenTempFolder); + }); + CreateDirectories(Path / ZenTempBlockFolderName); + CreateDirectories(Path / ZenTempChunkFolderName); + + CbObject ChunkerParameters; + + ChunkedFolderContent 
LocalContent; + + GetFolderContentStatistics LocalFolderScanStats; + ChunkingStatistics ChunkingStats; + { + auto IsAcceptedFolder = [ExcludeFolders = DefaultExcludeFolders](const std::string_view& RelativePath) -> bool { + for (const std::string_view& ExcludeFolder : ExcludeFolders) + { + if (RelativePath.starts_with(ExcludeFolder)) + { + if (RelativePath.length() == ExcludeFolder.length()) + { + return false; + } + else if (RelativePath[ExcludeFolder.length()] == '/') + { + return false; + } + } + } + return true; + }; + + auto IsAcceptedFile = [ExcludeExtensions = + DefaultExcludeExtensions](const std::string_view& RelativePath, uint64_t, uint32_t) -> bool { + for (const std::string_view& ExcludeExtension : ExcludeExtensions) + { + if (RelativePath.ends_with(ExcludeExtension)) + { + return false; + } + } + return true; + }; + + auto ParseManifest = [](const std::filesystem::path& Path, + const std::filesystem::path& ManifestPath) -> std::vector<std::filesystem::path> { + std::vector<std::filesystem::path> AssetPaths; + std::filesystem::path AbsoluteManifestPath = ManifestPath.is_absolute() ? 
ManifestPath : Path / ManifestPath; + IoBuffer ManifestContent = ReadFile(AbsoluteManifestPath).Flatten(); + std::string_view ManifestString((const char*)ManifestContent.GetView().GetData(), ManifestContent.GetSize()); + std::string_view::size_type Offset = 0; + while (Offset < ManifestContent.GetSize()) + { + size_t PathBreakOffset = ManifestString.find_first_of("\t\r\n", Offset); + if (PathBreakOffset == std::string_view::npos) + { + PathBreakOffset = ManifestContent.GetSize(); + } + std::string_view AssetPath = ManifestString.substr(Offset, PathBreakOffset - Offset); + if (!AssetPath.empty()) + { + AssetPaths.emplace_back(std::filesystem::path(AssetPath)); + } + Offset = PathBreakOffset; + size_t EolOffset = ManifestString.find_first_of("\r\n", Offset); + if (EolOffset == std::string_view::npos) + { + break; + } + Offset = EolOffset; + size_t LineBreakOffset = ManifestString.find_first_not_of("\t\r\n", Offset); + if (LineBreakOffset == std::string_view::npos) + { + break; + } + Offset = LineBreakOffset; + } + return AssetPaths; + }; + + Stopwatch ScanTimer; + FolderContent Content; + if (ManifestPath.empty()) + { + std::filesystem::path ExcludeManifestPath = Path / ZenExcludeManifestName; + tsl::robin_set<std::string> ExcludeAssetPaths; + if (std::filesystem::is_regular_file(ExcludeManifestPath)) + { + std::vector<std::filesystem::path> AssetPaths = ParseManifest(Path, ExcludeManifestPath); + ExcludeAssetPaths.reserve(AssetPaths.size()); + for (const std::filesystem::path& AssetPath : AssetPaths) + { + ExcludeAssetPaths.insert(AssetPath.generic_string()); + } + } + Content = GetFolderContent( + LocalFolderScanStats, + Path, + std::move(IsAcceptedFolder), + [&IsAcceptedFile, + &ExcludeAssetPaths](const std::string_view& RelativePath, uint64_t Size, uint32_t Attributes) -> bool { + if (RelativePath == ZenExcludeManifestName) + { + return false; + } + if (!IsAcceptedFile(RelativePath, Size, Attributes)) + { + return false; + } + if 
(ExcludeAssetPaths.contains(std::filesystem::path(RelativePath).generic_string())) + { + return false; + } + return true; + }, + GetMediumWorkerPool(EWorkloadType::Burst), + UsePlainProgress ? 5000 : 200, + [&](bool, std::ptrdiff_t) { + ZEN_DEBUG("Found {} files in '{}'...", LocalFolderScanStats.AcceptedFileCount.load(), Path); + }, + AbortFlag); + } + else + { + Stopwatch ManifestParseTimer; + std::vector<std::filesystem::path> AssetPaths = ParseManifest(Path, ManifestPath); + for (const std::filesystem::path& AssetPath : AssetPaths) + { + Content.Paths.push_back(AssetPath); + Content.RawSizes.push_back(std::filesystem::file_size(Path / AssetPath)); +#if ZEN_PLATFORM_WINDOWS + Content.Attributes.push_back(GetFileAttributes(Path / AssetPath)); +#endif // ZEN_PLATFORM_WINDOWS +#if ZEN_PLATFORM_MAC || ZEN_PLATFORM_LINUX + Content.Attributes.push_back(GetFileMode(Path / AssetPath)); +#endif // ZEN_PLATFORM_MAC || ZEN_PLATFORM_LINUX + LocalFolderScanStats.AcceptedFileByteCount += Content.RawSizes.back(); + LocalFolderScanStats.AcceptedFileCount++; + } + if (ManifestPath.is_relative()) + { + Content.Paths.push_back(ManifestPath); + Content.RawSizes.push_back(std::filesystem::file_size(ManifestPath)); +#if ZEN_PLATFORM_WINDOWS + Content.Attributes.push_back(GetFileAttributes(ManifestPath)); +#endif // ZEN_PLATFORM_WINDOWS +#if ZEN_PLATFORM_MAC || ZEN_PLATFORM_LINUX + Content.Attributes.push_back(GetFileMode(ManifestPath)); +#endif // ZEN_PLATFORM_MAC || ZEN_PLATFORM_LINUX + + LocalFolderScanStats.AcceptedFileByteCount += Content.RawSizes.back(); + LocalFolderScanStats.AcceptedFileCount++; + } + LocalFolderScanStats.FoundFileByteCount.store(LocalFolderScanStats.AcceptedFileByteCount); + LocalFolderScanStats.FoundFileCount.store(LocalFolderScanStats.AcceptedFileCount); + LocalFolderScanStats.ElapsedWallTimeUS = ManifestParseTimer.GetElapsedTimeUs(); + } + + std::unique_ptr<ChunkingController> ChunkController = CreateBasicChunkingController(); + { + CbObjectWriter 
ChunkParametersWriter; + ChunkParametersWriter.AddString("name"sv, ChunkController->GetName()); + ChunkParametersWriter.AddObject("parameters"sv, ChunkController->GetParameters()); + ChunkerParameters = ChunkParametersWriter.Save(); + } + + std::uint64_t TotalRawSize = 0; + for (uint64_t RawSize : Content.RawSizes) + { + TotalRawSize += RawSize; + } + { + ProgressBar ProgressBar(UsePlainProgress); + FilteredRate FilteredBytesHashed; + FilteredBytesHashed.Start(); + LocalContent = ChunkFolderContent( + ChunkingStats, + GetMediumWorkerPool(EWorkloadType::Burst), + Path, + Content, + *ChunkController, + UsePlainProgress ? 5000 : 200, + [&](bool, std::ptrdiff_t) { + FilteredBytesHashed.Update(ChunkingStats.BytesHashed.load()); + ProgressBar.UpdateState({.Task = "Scanning files ", + .Details = fmt::format("{}/{} ({}/{}, {}B/s) files, {} ({}) chunks found", + ChunkingStats.FilesProcessed.load(), + Content.Paths.size(), + NiceBytes(ChunkingStats.BytesHashed.load()), + NiceBytes(TotalRawSize), + NiceNum(FilteredBytesHashed.GetCurrent()), + ChunkingStats.UniqueChunksFound.load(), + NiceBytes(ChunkingStats.UniqueBytesFound.load())), + .TotalCount = TotalRawSize, + .RemainingCount = TotalRawSize - ChunkingStats.BytesHashed.load()}, + false); + }, + AbortFlag); + FilteredBytesHashed.Stop(); + ProgressBar.Finish(); + } + + if (AbortFlag) + { + return true; + } + + ZEN_CONSOLE("Found {} ({}) files divided into {} ({}) unique chunks in '{}' in {}. 
Average hash rate {}B/sec", + LocalContent.Paths.size(), + NiceBytes(TotalRawSize), + ChunkingStats.UniqueChunksFound.load(), + NiceBytes(ChunkingStats.UniqueBytesFound.load()), + Path, + NiceTimeSpanMs(ScanTimer.GetElapsedTimeMs()), + NiceNum(GetBytesPerSecond(ChunkingStats.ElapsedWallTimeUS, ChunkingStats.BytesHashed))); + } + + const ChunkedContentLookup LocalLookup = BuildChunkedContentLookup(LocalContent); + std::uint64_t PreferredMultipartChunkSize = 32u * 1024u * 1024u; + + if (CreateBuild) + { + Stopwatch PutBuildTimer; + CbObject PutBuildResult = Storage.PutBuild(BuildId, MetaData); + ZEN_CONSOLE("PutBuild took {}. Payload size: {}", + NiceLatencyNs(PutBuildTimer.GetElapsedTimeUs() * 1000), + NiceBytes(MetaData.GetSize())); + PreferredMultipartChunkSize = PutBuildResult["chunkSize"sv].AsUInt64(PreferredMultipartChunkSize); + } + else + { + Stopwatch GetBuildTimer; + CbObject Build = Storage.GetBuild(BuildId); + ZEN_CONSOLE("GetBuild took {}. Payload size: {}", + NiceLatencyNs(GetBuildTimer.GetElapsedTimeUs() * 1000), + NiceBytes(Build.GetSize())); + if (auto ChunkSize = Build["chunkSize"sv].AsUInt64(); ChunkSize != 0) + { + PreferredMultipartChunkSize = ChunkSize; + } + else if (AllowMultiparts) + { + ZEN_WARN("PreferredMultipartChunkSize is unknown. Defaulting to '{}'", NiceBytes(PreferredMultipartChunkSize)); + } + } + + const std::uint64_t LargeAttachmentSize = AllowMultiparts ? 
PreferredMultipartChunkSize * 4u : (std::uint64_t)-1; + + FindBlocksStatistics FindBlocksStats; + GenerateBlocksStatistics GenerateBlocksStats; + LooseChunksStatistics LooseChunksStats; + + std::vector<ChunkBlockDescription> KnownBlocks; + std::vector<size_t> ReuseBlockIndexes; + std::vector<uint32_t> NewBlockChunkIndexes; + Stopwatch BlockArrangeTimer; + + std::vector<std::uint32_t> LooseChunkIndexes; + { + bool EnableBlocks = true; + std::vector<std::uint32_t> BlockChunkIndexes; + for (uint32_t ChunkIndex = 0; ChunkIndex < LocalContent.ChunkedContent.ChunkHashes.size(); ChunkIndex++) + { + const uint64_t ChunkRawSize = LocalContent.ChunkedContent.ChunkRawSizes[ChunkIndex]; + if (!EnableBlocks || ChunkRawSize == 0 || ChunkRawSize > DefaultChunksBlockParams.MaxChunkEmbedSize) + { + LooseChunkIndexes.push_back(ChunkIndex); + LooseChunksStats.ChunkByteCount += ChunkRawSize; + } + else + { + BlockChunkIndexes.push_back(ChunkIndex); + FindBlocksStats.PotentialChunkByteCount += ChunkRawSize; + } + } + FindBlocksStats.PotentialChunkCount = BlockChunkIndexes.size(); + LooseChunksStats.ChunkCount = LooseChunkIndexes.size(); + + if (IgnoreExistingBlocks) + { + ZEN_CONSOLE("Ignoring any existing blocks in store"); + NewBlockChunkIndexes = std::move(BlockChunkIndexes); + } + else + { + Stopwatch KnownBlocksTimer; + KnownBlocks = Storage.FindBlocks(BuildId); + FindBlocksStats.FindBlockTimeMS = KnownBlocksTimer.GetElapsedTimeMs(); + FindBlocksStats.FoundBlockCount = KnownBlocks.size(); + + ReuseBlockIndexes = FindReuseBlocks(KnownBlocks, + LocalContent.ChunkedContent.ChunkHashes, + BlockChunkIndexes, + BlockReuseMinPercentLimit, + NewBlockChunkIndexes, + FindBlocksStats); + FindBlocksStats.AcceptedBlockCount = ReuseBlockIndexes.size(); + + for (const ChunkBlockDescription& Description : KnownBlocks) + { + for (uint32_t ChunkRawLength : Description.ChunkRawLengths) + { + FindBlocksStats.FoundBlockByteCount += ChunkRawLength; + } + FindBlocksStats.FoundBlockChunkCount += 
Description.ChunkRawHashes.size(); + } + } + } + + std::vector<std::vector<uint32_t>> NewBlockChunks; + ArrangeChunksIntoBlocks(LocalContent, LocalLookup, DefaultChunksBlockParams.MaxBlockSize, NewBlockChunkIndexes, NewBlockChunks); + + FindBlocksStats.NewBlocksCount = NewBlockChunks.size(); + for (uint32_t ChunkIndex : NewBlockChunkIndexes) + { + FindBlocksStats.NewBlocksChunkByteCount += LocalContent.ChunkedContent.ChunkRawSizes[ChunkIndex]; + } + FindBlocksStats.NewBlocksChunkCount = NewBlockChunkIndexes.size(); + + const double AcceptedByteCountPercent = FindBlocksStats.PotentialChunkByteCount > 0 + ? (100.0 * FindBlocksStats.AcceptedByteCount / FindBlocksStats.PotentialChunkByteCount) + : 0.0; + + const double AcceptedReduntantByteCountPercent = + FindBlocksStats.AcceptedByteCount > 0 ? (100.0 * FindBlocksStats.AcceptedReduntantByteCount) / + (FindBlocksStats.AcceptedByteCount + FindBlocksStats.AcceptedReduntantByteCount) + : 0.0; + ZEN_CONSOLE( + "Found {} chunks in {} ({}) blocks eligible for reuse in {}\n" + " Reusing {} ({}) matching chunks in {} blocks ({:.1f}%)\n" + " Accepting {} ({}) redundant chunks ({:.1f}%)\n" + " Rejected {} ({}) chunks in {} blocks\n" + " Arranged {} ({}) chunks in {} new blocks\n" + " Keeping {} ({}) chunks as loose chunks\n" + " Discovery completed in {}", + FindBlocksStats.FoundBlockChunkCount, + FindBlocksStats.FoundBlockCount, + NiceBytes(FindBlocksStats.FoundBlockByteCount), + NiceTimeSpanMs(FindBlocksStats.FindBlockTimeMS), + + FindBlocksStats.AcceptedChunkCount, + NiceBytes(FindBlocksStats.AcceptedByteCount), + FindBlocksStats.AcceptedBlockCount, + AcceptedByteCountPercent, + + FindBlocksStats.AcceptedReduntantChunkCount, + NiceBytes(FindBlocksStats.AcceptedReduntantByteCount), + AcceptedReduntantByteCountPercent, + + FindBlocksStats.RejectedChunkCount, + NiceBytes(FindBlocksStats.RejectedByteCount), + FindBlocksStats.RejectedBlockCount, + + FindBlocksStats.NewBlocksChunkCount, + 
NiceBytes(FindBlocksStats.NewBlocksChunkByteCount), + FindBlocksStats.NewBlocksCount, + + LooseChunksStats.ChunkCount, + NiceBytes(LooseChunksStats.ChunkByteCount), + + NiceTimeSpanMs(BlockArrangeTimer.GetElapsedTimeMs())); + + DiskStatistics DiskStats; + UploadStatistics UploadStats; + GeneratedBlocks NewBlocks; + + if (!NewBlockChunks.empty()) + { + Stopwatch GenerateBuildBlocksTimer; + auto __ = MakeGuard([&]() { + uint64_t BlockGenerateTimeUs = GenerateBuildBlocksTimer.GetElapsedTimeUs(); + ZEN_CONSOLE("Generated {} ({}) and uploaded {} ({}) blocks in {}. Generate speed: {}B/sec. Transfer speed {}bits/sec.", + GenerateBlocksStats.GeneratedBlockCount.load(), + NiceBytes(GenerateBlocksStats.GeneratedBlockByteCount), + UploadStats.BlockCount.load(), + NiceBytes(UploadStats.BlocksBytes.load()), + NiceTimeSpanMs(BlockGenerateTimeUs / 1000), + NiceNum(GetBytesPerSecond(GenerateBlocksStats.GenerateBlocksElapsedWallTimeUS, + GenerateBlocksStats.GeneratedBlockByteCount)), + NiceNum(GetBytesPerSecond(UploadStats.ElapsedWallTimeUS, UploadStats.BlocksBytes * 8))); + }); + GenerateBuildBlocks(Path, + LocalContent, + LocalLookup, + Storage, + BuildId, + AbortFlag, + NewBlockChunks, + NewBlocks, + DiskStats, + UploadStats, + GenerateBlocksStats); + } + + if (AbortFlag) + { + return true; + } + + CbObject PartManifest; + { + CbObjectWriter PartManifestWriter; + Stopwatch ManifestGenerationTimer; + auto __ = MakeGuard([&]() { + ZEN_CONSOLE("Generated build part manifest in {} ({})", + NiceTimeSpanMs(ManifestGenerationTimer.GetElapsedTimeMs()), + NiceBytes(PartManifestWriter.GetSaveSize())); + }); + PartManifestWriter.AddObject("chunker"sv, ChunkerParameters); + + std::vector<IoHash> AllChunkBlockHashes; + std::vector<ChunkBlockDescription> AllChunkBlockDescriptions; + AllChunkBlockHashes.reserve(ReuseBlockIndexes.size() + NewBlocks.BlockDescriptions.size()); + AllChunkBlockDescriptions.reserve(ReuseBlockIndexes.size() + NewBlocks.BlockDescriptions.size()); + for (size_t 
ReuseBlockIndex : ReuseBlockIndexes) + { + AllChunkBlockDescriptions.push_back(KnownBlocks[ReuseBlockIndex]); + AllChunkBlockHashes.push_back(KnownBlocks[ReuseBlockIndex].BlockHash); + } + AllChunkBlockDescriptions.insert(AllChunkBlockDescriptions.end(), + NewBlocks.BlockDescriptions.begin(), + NewBlocks.BlockDescriptions.end()); + for (const ChunkBlockDescription& BlockDescription : NewBlocks.BlockDescriptions) + { + AllChunkBlockHashes.push_back(BlockDescription.BlockHash); + } +#if EXTRA_VERIFY + tsl::robin_map<IoHash, size_t, IoHash::Hasher> ChunkHashToAbsoluteChunkIndex; + std::vector<IoHash> AbsoluteChunkHashes; + AbsoluteChunkHashes.reserve(LocalContent.ChunkedContent.ChunkHashes.size()); + for (uint32_t ChunkIndex : LooseChunkIndexes) + { + ChunkHashToAbsoluteChunkIndex.insert({LocalContent.ChunkedContent.ChunkHashes[ChunkIndex], AbsoluteChunkHashes.size()}); + AbsoluteChunkHashes.push_back(LocalContent.ChunkedContent.ChunkHashes[ChunkIndex]); + } + for (const ChunkBlockDescription& Block : AllChunkBlockDescriptions) + { + for (const IoHash& ChunkHash : Block.ChunkHashes) + { + ChunkHashToAbsoluteChunkIndex.insert({ChunkHash, AbsoluteChunkHashes.size()}); + AbsoluteChunkHashes.push_back(ChunkHash); + } + } + for (const IoHash& ChunkHash : LocalContent.ChunkedContent.ChunkHashes) + { + ZEN_ASSERT(AbsoluteChunkHashes[ChunkHashToAbsoluteChunkIndex.at(ChunkHash)] == ChunkHash); + ZEN_ASSERT(LocalContent.ChunkedContent.ChunkHashes[LocalLookup.ChunkHashToChunkIndex.at(ChunkHash)] == ChunkHash); + } + for (const uint32_t ChunkIndex : LocalContent.ChunkedContent.ChunkOrders) + { + ZEN_ASSERT(AbsoluteChunkHashes[ChunkHashToAbsoluteChunkIndex.at(LocalContent.ChunkedContent.ChunkHashes[ChunkIndex])] == + LocalContent.ChunkedContent.ChunkHashes[ChunkIndex]); + ZEN_ASSERT(LocalLookup.ChunkHashToChunkIndex.at(LocalContent.ChunkedContent.ChunkHashes[ChunkIndex]) == ChunkIndex); + } +#endif // EXTRA_VERIFY + std::vector<uint32_t> AbsoluteChunkOrders = 
CalculateAbsoluteChunkOrders(LocalContent.ChunkedContent.ChunkHashes, + LocalContent.ChunkedContent.ChunkOrders, + LocalLookup.ChunkHashToChunkIndex, + LooseChunkIndexes, + AllChunkBlockDescriptions); + +#if EXTRA_VERIFY + for (uint32_t ChunkOrderIndex = 0; ChunkOrderIndex < LocalContent.ChunkedContent.ChunkOrders.size(); ChunkOrderIndex++) + { + uint32_t LocalChunkIndex = LocalContent.ChunkedContent.ChunkOrders[ChunkOrderIndex]; + uint32_t AbsoluteChunkIndex = AbsoluteChunkOrders[ChunkOrderIndex]; + const IoHash& LocalChunkHash = LocalContent.ChunkedContent.ChunkHashes[LocalChunkIndex]; + const IoHash& AbsoluteChunkHash = AbsoluteChunkHashes[AbsoluteChunkIndex]; + ZEN_ASSERT(LocalChunkHash == AbsoluteChunkHash); + } +#endif // EXTRA_VERIFY + + WriteBuildContentToCompactBinary(PartManifestWriter, + LocalContent.Platform, + LocalContent.Paths, + LocalContent.RawHashes, + LocalContent.RawSizes, + LocalContent.Attributes, + LocalContent.ChunkedContent.SequenceRawHashes, + LocalContent.ChunkedContent.ChunkCounts, + LocalContent.ChunkedContent.ChunkHashes, + LocalContent.ChunkedContent.ChunkRawSizes, + AbsoluteChunkOrders, + LooseChunkIndexes, + AllChunkBlockHashes); + +#if EXTRA_VERIFY + { + ChunkedFolderContent VerifyFolderContent; + + std::vector<uint32_t> OutAbsoluteChunkOrders; + std::vector<IoHash> OutLooseChunkHashes; + std::vector<uint64_t> OutLooseChunkRawSizes; + std::vector<IoHash> OutBlockRawHashes; + + ReadBuildContentFromCompactBinary(PartManifestWriter.Save(), + VerifyFolderContent.Platform, + VerifyFolderContent.Paths, + VerifyFolderContent.RawHashes, + VerifyFolderContent.RawSizes, + VerifyFolderContent.Attributes, + VerifyFolderContent.ChunkedContent.SequenceRawHashes, + VerifyFolderContent.ChunkedContent.ChunkCounts, + OutAbsoluteChunkOrders, + OutLooseChunkHashes, + OutLooseChunkRawSizes, + OutBlockRawHashes); + ZEN_ASSERT(OutBlockRawHashes == AllChunkBlockHashes); + + for (uint32_t OrderIndex = 0; OrderIndex < OutAbsoluteChunkOrders.size(); 
OrderIndex++) + { + uint32_t LocalChunkIndex = LocalContent.ChunkedContent.ChunkOrders[OrderIndex]; + const IoHash LocalChunkHash = LocalContent.ChunkedContent.ChunkHashes[LocalChunkIndex]; + + uint32_t VerifyChunkIndex = OutAbsoluteChunkOrders[OrderIndex]; + const IoHash VerifyChunkHash = AbsoluteChunkHashes[VerifyChunkIndex]; + + ZEN_ASSERT(LocalChunkHash == VerifyChunkHash); + } + + CalculateLocalChunkOrders(OutAbsoluteChunkOrders, + OutLooseChunkHashes, + OutLooseChunkRawSizes, + AllChunkBlockDescriptions, + VerifyFolderContent.ChunkedContent.ChunkHashes, + VerifyFolderContent.ChunkedContent.ChunkRawSizes, + VerifyFolderContent.ChunkedContent.ChunkOrders); + + ZEN_ASSERT(LocalContent.Paths == VerifyFolderContent.Paths); + ZEN_ASSERT(LocalContent.RawHashes == VerifyFolderContent.RawHashes); + ZEN_ASSERT(LocalContent.RawSizes == VerifyFolderContent.RawSizes); + ZEN_ASSERT(LocalContent.Attributes == VerifyFolderContent.Attributes); + ZEN_ASSERT(LocalContent.ChunkedContent.SequenceRawHashes == VerifyFolderContent.ChunkedContent.SequenceRawHashes); + ZEN_ASSERT(LocalContent.ChunkedContent.ChunkCounts == VerifyFolderContent.ChunkedContent.ChunkCounts); + + for (uint32_t OrderIndex = 0; OrderIndex < LocalContent.ChunkedContent.ChunkOrders.size(); OrderIndex++) + { + uint32_t LocalChunkIndex = LocalContent.ChunkedContent.ChunkOrders[OrderIndex]; + const IoHash LocalChunkHash = LocalContent.ChunkedContent.ChunkHashes[LocalChunkIndex]; + uint64_t LocalChunkRawSize = LocalContent.ChunkedContent.ChunkRawSizes[LocalChunkIndex]; + + uint32_t VerifyChunkIndex = VerifyFolderContent.ChunkedContent.ChunkOrders[OrderIndex]; + const IoHash VerifyChunkHash = VerifyFolderContent.ChunkedContent.ChunkHashes[VerifyChunkIndex]; + uint64_t VerifyChunkRawSize = VerifyFolderContent.ChunkedContent.ChunkRawSizes[VerifyChunkIndex]; + + ZEN_ASSERT(LocalChunkHash == VerifyChunkHash); + ZEN_ASSERT(LocalChunkRawSize == VerifyChunkRawSize); + } + } +#endif // EXTRA_VERIFY + PartManifest = 
PartManifestWriter.Save(); + } + + Stopwatch PutBuildPartResultTimer; + std::pair<IoHash, std::vector<IoHash>> PutBuildPartResult = Storage.PutBuildPart(BuildId, BuildPartId, BuildPartName, PartManifest); + ZEN_CONSOLE("PutBuildPart took {}, payload size {}. {} attachments are missing.", + NiceLatencyNs(PutBuildPartResultTimer.GetElapsedTimeUs() * 1000), + NiceBytes(PartManifest.GetSize()), + PutBuildPartResult.second.size()); + IoHash PartHash = PutBuildPartResult.first; + + auto UploadAttachments = [&](std::span<IoHash> RawHashes) { + if (!AbortFlag) + { + ZEN_CONSOLE_VERBOSE("Uploading attachments: {}", FormatArray<IoHash>(RawHashes, "\n "sv)); + + UploadStatistics TempUploadStats; + GenerateBlocksStatistics TempGenerateBlocksStats; + LooseChunksStatistics TempLooseChunksStats; + + Stopwatch TempUploadTimer; + auto __ = MakeGuard([&]() { + uint64_t TempChunkUploadTimeUs = TempUploadTimer.GetElapsedTimeUs(); + ZEN_CONSOLE( + "Generated {} ({} {}B/s) and uploaded {} ({}) blocks. " + "Compressed {} ({} {}B/s) and uploaded {} ({}) chunks. 
" + "Transferred {} ({}B/s) in {}", + TempGenerateBlocksStats.GeneratedBlockCount.load(), + NiceBytes(TempGenerateBlocksStats.GeneratedBlockByteCount.load()), + NiceNum(GetBytesPerSecond(TempGenerateBlocksStats.GenerateBlocksElapsedWallTimeUS, + TempGenerateBlocksStats.GeneratedBlockByteCount)), + TempUploadStats.BlockCount.load(), + NiceBytes(TempUploadStats.BlocksBytes), + + TempLooseChunksStats.CompressedChunkCount.load(), + NiceBytes(TempLooseChunksStats.CompressedChunkBytes.load()), + NiceNum(GetBytesPerSecond(TempLooseChunksStats.CompressChunksElapsedWallTimeUS, + TempLooseChunksStats.CompressedChunkBytes)), + TempUploadStats.ChunkCount.load(), + NiceBytes(TempUploadStats.ChunksBytes), + + NiceBytes(TempUploadStats.BlocksBytes + TempUploadStats.ChunksBytes), + NiceNum(GetBytesPerSecond(TempUploadStats.ElapsedWallTimeUS, TempUploadStats.ChunksBytes * 8)), + NiceTimeSpanMs(TempChunkUploadTimeUs / 1000)); + }); + UploadPartBlobs(Storage, + BuildId, + Path, + LocalContent, + LocalLookup, + RawHashes, + NewBlockChunks, + NewBlocks, + LooseChunkIndexes, + LargeAttachmentSize, + AbortFlag, + DiskStats, + TempUploadStats, + TempGenerateBlocksStats, + TempLooseChunksStats); + UploadStats += TempUploadStats; + LooseChunksStats += TempLooseChunksStats; + GenerateBlocksStats += TempGenerateBlocksStats; + } + }; + if (IgnoreExistingBlocks) + { + ZEN_CONSOLE_VERBOSE("PutBuildPart uploading all attachments, needs are: {}", + FormatArray<IoHash>(PutBuildPartResult.second, "\n "sv)); + + std::vector<IoHash> ForceUploadChunkHashes; + ForceUploadChunkHashes.reserve(LooseChunkIndexes.size()); + + for (uint32_t ChunkIndex : LooseChunkIndexes) + { + ForceUploadChunkHashes.push_back(LocalContent.ChunkedContent.ChunkHashes[ChunkIndex]); + } + + for (size_t BlockIndex = 0; BlockIndex < NewBlocks.BlockBuffers.size(); BlockIndex++) + { + if (NewBlocks.BlockBuffers[BlockIndex]) + { + // Block was not uploaded during generation + 
ForceUploadChunkHashes.push_back(NewBlocks.BlockDescriptions[BlockIndex].BlockHash); + } + } + UploadAttachments(ForceUploadChunkHashes); + } + else if (!PutBuildPartResult.second.empty()) + { + ZEN_CONSOLE_VERBOSE("PutBuildPart needs attachments: {}", FormatArray<IoHash>(PutBuildPartResult.second, "\n "sv)); + UploadAttachments(PutBuildPartResult.second); + } + + while (true) + { + Stopwatch FinalizeBuildPartTimer; + std::vector<IoHash> Needs = Storage.FinalizeBuildPart(BuildId, BuildPartId, PartHash); + ZEN_CONSOLE("FinalizeBuildPart took {}. {} attachments are missing.", + NiceLatencyNs(FinalizeBuildPartTimer.GetElapsedTimeUs() * 1000), + Needs.size()); + if (Needs.empty()) + { + break; + } + if (AbortFlag) + { + return true; + } + ZEN_CONSOLE_VERBOSE("FinalizeBuildPart needs attachments: {}", FormatArray<IoHash>(Needs, "\n "sv)); + UploadAttachments(Needs); + if (AbortFlag) + { + return true; + } + } + + if (AbortFlag) + { + return true; + } + + if (CreateBuild) + { + Stopwatch FinalizeBuildTimer; + Storage.FinalizeBuild(BuildId); + ZEN_CONSOLE("FinalizeBuild took {}", NiceLatencyNs(FinalizeBuildTimer.GetElapsedTimeUs() * 1000)); + } + + if (!NewBlocks.BlockDescriptions.empty()) + { + uint64_t UploadBlockMetadataCount = 0; + std::vector<IoHash> BlockHashes; + BlockHashes.reserve(NewBlocks.BlockDescriptions.size()); + Stopwatch UploadBlockMetadataTimer; + for (size_t BlockIndex = 0; BlockIndex < NewBlocks.BlockDescriptions.size(); BlockIndex++) + { + const IoHash& BlockHash = NewBlocks.BlockDescriptions[BlockIndex].BlockHash; + if (!NewBlocks.MetaDataHasBeenUploaded[BlockIndex]) + { + const CbObject BlockMetaData = + BuildChunkBlockDescription(NewBlocks.BlockDescriptions[BlockIndex], NewBlocks.BlockMetaDatas[BlockIndex]); + Storage.PutBlockMetadata(BuildId, BlockHash, BlockMetaData); + UploadStats.BlocksBytes += BlockMetaData.GetSize(); + NewBlocks.MetaDataHasBeenUploaded[BlockIndex] = true; + UploadBlockMetadataCount++; + } + BlockHashes.push_back(BlockHash); + 
} + if (UploadBlockMetadataCount > 0) + { + uint64_t ElapsedUS = UploadBlockMetadataTimer.GetElapsedTimeUs(); + UploadStats.ElapsedWallTimeUS += ElapsedUS; + ZEN_CONSOLE("Uploaded metadata for {} blocks in {}", UploadBlockMetadataCount, NiceTimeSpanMs(ElapsedUS / 1000)); + } + + std::vector<ChunkBlockDescription> VerifyBlockDescriptions = Storage.GetBlockMetadata(BuildId, BlockHashes); + if (VerifyBlockDescriptions.size() != BlockHashes.size()) + { + ZEN_CONSOLE("Uploaded blocks could not all be found, {} blocks are missing", + BlockHashes.size() - VerifyBlockDescriptions.size()); + return true; + } + } + + const double DeltaByteCountPercent = + ChunkingStats.BytesHashed > 0 + ? (100.0 * (FindBlocksStats.NewBlocksChunkByteCount + LooseChunksStats.CompressedChunkBytes)) / (ChunkingStats.BytesHashed) + : 0.0; + + const std::string LargeAttachmentStats = + (LargeAttachmentSize != (uint64_t)-1) ? fmt::format(" ({} as multipart)", UploadStats.MultipartAttachmentCount.load()) : ""; + + ZEN_CONSOLE_VERBOSE( + "Folder scanning stats:" + "\n FoundFileCount: {}" + "\n FoundFileByteCount: {}" + "\n AcceptedFileCount: {}" + "\n AcceptedFileByteCount: {}" + "\n ElapsedWallTimeUS: {}", + LocalFolderScanStats.FoundFileCount.load(), + NiceBytes(LocalFolderScanStats.FoundFileByteCount.load()), + LocalFolderScanStats.AcceptedFileCount.load(), + NiceBytes(LocalFolderScanStats.AcceptedFileByteCount.load()), + NiceLatencyNs(LocalFolderScanStats.ElapsedWallTimeUS * 1000)); + + ZEN_CONSOLE_VERBOSE( + "Chunking stats:" + "\n FilesProcessed: {}" + "\n FilesChunked: {}" + "\n BytesHashed: {}" + "\n UniqueChunksFound: {}" + "\n UniqueSequencesFound: {}" + "\n UniqueBytesFound: {}" + "\n ElapsedWallTimeUS: {}", + ChunkingStats.FilesProcessed.load(), + ChunkingStats.FilesChunked.load(), + NiceBytes(ChunkingStats.BytesHashed.load()), + ChunkingStats.UniqueChunksFound.load(), + ChunkingStats.UniqueSequencesFound.load(), + NiceBytes(ChunkingStats.UniqueBytesFound.load()), + 
NiceLatencyNs(ChunkingStats.ElapsedWallTimeUS * 1000)); + + ZEN_CONSOLE_VERBOSE( + "Find block stats:" + "\n FindBlockTimeMS: {}" + "\n PotentialChunkCount: {}" + "\n PotentialChunkByteCount: {}" + "\n FoundBlockCount: {}" + "\n FoundBlockChunkCount: {}" + "\n FoundBlockByteCount: {}" + "\n AcceptedBlockCount: {}" + "\n AcceptedChunkCount: {}" + "\n AcceptedByteCount: {}" + "\n RejectedBlockCount: {}" + "\n RejectedChunkCount: {}" + "\n RejectedByteCount: {}" + "\n AcceptedRedundantChunkCount: {}" + "\n AcceptedRedundantByteCount: {}" + "\n NewBlocksCount: {}" + "\n NewBlocksChunkCount: {}" + "\n NewBlocksChunkByteCount: {}", + NiceTimeSpanMs(FindBlocksStats.FindBlockTimeMS), + FindBlocksStats.PotentialChunkCount, + NiceBytes(FindBlocksStats.PotentialChunkByteCount), + FindBlocksStats.FoundBlockCount, + FindBlocksStats.FoundBlockChunkCount, + NiceBytes(FindBlocksStats.FoundBlockByteCount), + FindBlocksStats.AcceptedBlockCount, + FindBlocksStats.AcceptedChunkCount, + NiceBytes(FindBlocksStats.AcceptedByteCount), + FindBlocksStats.RejectedBlockCount, + FindBlocksStats.RejectedChunkCount, + NiceBytes(FindBlocksStats.RejectedByteCount), + FindBlocksStats.AcceptedReduntantChunkCount, + NiceBytes(FindBlocksStats.AcceptedReduntantByteCount), + FindBlocksStats.NewBlocksCount, + FindBlocksStats.NewBlocksChunkCount, + NiceBytes(FindBlocksStats.NewBlocksChunkByteCount)); + + ZEN_CONSOLE_VERBOSE( + "Generate blocks stats:" + "\n GeneratedBlockByteCount: {}" + "\n GeneratedBlockCount: {}" + "\n GenerateBlocksElapsedWallTimeUS: {}", + NiceBytes(GenerateBlocksStats.GeneratedBlockByteCount.load()), + GenerateBlocksStats.GeneratedBlockCount.load(), + NiceLatencyNs(GenerateBlocksStats.GenerateBlocksElapsedWallTimeUS * 1000)); + + ZEN_CONSOLE_VERBOSE( + "Loose chunks stats:" + "\n ChunkCount: {}" + "\n ChunkByteCount: {}" + "\n CompressedChunkCount: {}" + "\n CompressedChunkBytes: {}" + "\n CompressChunksElapsedWallTimeUS: {}", + LooseChunksStats.ChunkCount, + NiceBytes(LooseChunksStats.ChunkByteCount), + 
LooseChunksStats.CompressedChunkCount.load(), + NiceBytes(LooseChunksStats.CompressedChunkBytes.load()), + NiceLatencyNs(LooseChunksStats.CompressChunksElapsedWallTimeUS * 1000)); + + ZEN_CONSOLE_VERBOSE( + "Disk stats:" + "\n OpenReadCount: {}" + "\n OpenWriteCount: {}" + "\n ReadCount: {}" + "\n ReadByteCount: {}" + "\n WriteCount: {}" + "\n WriteByteCount: {}" + "\n CurrentOpenFileCount: {}", + DiskStats.OpenReadCount.load(), + DiskStats.OpenWriteCount.load(), + DiskStats.ReadCount.load(), + NiceBytes(DiskStats.ReadByteCount.load()), + DiskStats.WriteCount.load(), + NiceBytes(DiskStats.WriteByteCount.load()), + DiskStats.CurrentOpenFileCount.load()); + + ZEN_CONSOLE_VERBOSE( + "Upload stats:" + "\n BlockCount: {}" + "\n BlocksBytes: {}" + "\n ChunkCount: {}" + "\n ChunksBytes: {}" + "\n ReadFromDiskBytes: {}" + "\n MultipartAttachmentCount: {}" + "\n ElapsedWallTimeUS: {}", + UploadStats.BlockCount.load(), + NiceBytes(UploadStats.BlocksBytes.load()), + UploadStats.ChunkCount.load(), + NiceBytes(UploadStats.ChunksBytes.load()), + NiceBytes(UploadStats.ReadFromDiskBytes.load()), + UploadStats.MultipartAttachmentCount.load(), + NiceLatencyNs(UploadStats.ElapsedWallTimeUS * 1000)); + + ZEN_CONSOLE( + "Uploaded {}\n" + " Delta: {}/{} ({:.1f}%)\n" + " Blocks: {} ({})\n" + " Chunks: {} ({}){}\n" + " Rate: {}bits/sec", + NiceBytes(UploadStats.BlocksBytes + UploadStats.ChunksBytes), + + NiceBytes(FindBlocksStats.NewBlocksChunkByteCount + LooseChunksStats.CompressedChunkBytes), + NiceBytes(ChunkingStats.BytesHashed), + DeltaByteCountPercent, + + UploadStats.BlockCount.load(), + NiceBytes(UploadStats.BlocksBytes), + UploadStats.ChunkCount.load(), + NiceBytes(UploadStats.ChunksBytes), + LargeAttachmentStats, + + NiceNum(GetBytesPerSecond(UploadStats.ElapsedWallTimeUS, (UploadStats.ChunksBytes + UploadStats.BlocksBytes) * 8))); + + ZEN_CONSOLE("Uploaded ({}) build {} part {} ({}) in {}", + NiceBytes(FindBlocksStats.NewBlocksChunkByteCount + 
LooseChunksStats.CompressedChunkBytes), + BuildId, + BuildPartName, + BuildPartId, + NiceTimeSpanMs(ProcessTimer.GetElapsedTimeMs())); + return false; + } + + void VerifyFolder(const ChunkedFolderContent& Content, const std::filesystem::path& Path, std::atomic<bool>& AbortFlag) + { + ProgressBar ProgressBar(UsePlainProgress); + std::atomic<uint64_t> FilesVerified(0); + std::atomic<uint64_t> FilesFailed(0); + std::atomic<uint64_t> ReadBytes(0); + + WorkerThreadPool& VerifyPool = GetMediumWorkerPool(EWorkloadType::Burst); // GetSyncWorkerPool(); // + + ParallellWork Work(AbortFlag); + + const uint32_t PathCount = gsl::narrow<uint32_t>(Content.Paths.size()); + + RwLock ErrorLock; + std::vector<std::string> Errors; + + auto IsAcceptedFolder = [ExcludeFolders = DefaultExcludeFolders](const std::string_view& RelativePath) -> bool { + for (const std::string_view& ExcludeFolder : ExcludeFolders) + { + if (RelativePath.starts_with(ExcludeFolder)) + { + if (RelativePath.length() == ExcludeFolder.length()) + { + return false; + } + else if (RelativePath[ExcludeFolder.length()] == '/') + { + return false; + } + } + } + return true; + }; + + const ChunkedContentLookup Lookup = BuildChunkedContentLookup(Content); + + for (uint32_t PathIndex = 0; PathIndex < PathCount; PathIndex++) + { + if (Work.IsAborted()) + { + break; + } + + Work.ScheduleWork( + VerifyPool, + [&, PathIndex](std::atomic<bool>& AbortFlag) { + if (!AbortFlag) + { + // TODO: Convert ScheduleWork body to function + + const std::filesystem::path TargetPath = (Path / Content.Paths[PathIndex]).make_preferred(); + if (IsAcceptedFolder(TargetPath.parent_path().generic_string())) + { + const uint64_t ExpectedSize = Content.RawSizes[PathIndex]; + if (!std::filesystem::exists(TargetPath)) + { + ErrorLock.WithExclusiveLock([&]() { + Errors.push_back(fmt::format("File {} with expected size {} does not exist", TargetPath, ExpectedSize)); + }); + FilesFailed++; + } + else + { + std::error_code Ec; + uint64_t SizeOnDisk = 
gsl::narrow<uint64_t>(std::filesystem::file_size(TargetPath, Ec)); + if (Ec) + { + ErrorLock.WithExclusiveLock([&]() { + Errors.push_back( + fmt::format("Failed to get size of file {}: {} ({})", TargetPath, Ec.message(), Ec.value())); + }); + FilesFailed++; + } + else if (SizeOnDisk < ExpectedSize) + { + ErrorLock.WithExclusiveLock([&]() { + Errors.push_back(fmt::format("Size of file {} is smaller than expected. Expected: {}, Found: {}", + TargetPath, + ExpectedSize, + SizeOnDisk)); + }); + FilesFailed++; + } + else if (SizeOnDisk > ExpectedSize) + { + ErrorLock.WithExclusiveLock([&]() { + Errors.push_back(fmt::format("Size of file {} is bigger than expected. Expected: {}, Found: {}", + TargetPath, + ExpectedSize, + SizeOnDisk)); + }); + FilesFailed++; + } + else if (SizeOnDisk > 0) + { + const IoHash& ExpectedRawHash = Content.RawHashes[PathIndex]; + IoBuffer Buffer = IoBufferBuilder::MakeFromFile(TargetPath); + IoHash RawHash = IoHash::HashBuffer(Buffer); + if (RawHash != ExpectedRawHash) + { + uint64_t FileOffset = 0; + const uint32_t SequenceRawHashesIndex = Lookup.RawHashToSequenceRawHashIndex.at(ExpectedRawHash); + const uint32_t OrderOffset = Lookup.SequenceRawHashIndexChunkOrderOffset[SequenceRawHashesIndex]; + for (uint32_t OrderIndex = OrderOffset; + OrderIndex < OrderOffset + Content.ChunkedContent.ChunkCounts[SequenceRawHashesIndex]; + OrderIndex++) + { + uint32_t ChunkIndex = Content.ChunkedContent.ChunkOrders[OrderIndex]; + uint64_t ChunkSize = Content.ChunkedContent.ChunkRawSizes[ChunkIndex]; + IoHash ChunkHash = Content.ChunkedContent.ChunkHashes[ChunkIndex]; + IoBuffer FileChunk = IoBuffer(Buffer, FileOffset, ChunkSize); + if (IoHash::HashBuffer(FileChunk) != ChunkHash) + { + ErrorLock.WithExclusiveLock([&]() { + Errors.push_back(fmt::format( + "WARNING: Hash of file {} does not match expected hash. Expected: {}, Found: {}. 
" + "Mismatch at chunk {}", + TargetPath, + ExpectedRawHash, + RawHash, + OrderIndex - OrderOffset)); + }); + break; + } + FileOffset += ChunkSize; + } + FilesFailed++; + } + ReadBytes += SizeOnDisk; + } + } + } + FilesVerified++; + } + }, + [&, PathIndex](const std::exception& Ex, std::atomic<bool>& AbortFlag) { + ZEN_UNUSED(AbortFlag); + + ErrorLock.WithExclusiveLock([&]() { + Errors.push_back(fmt::format("Failed verifying file '{}'. Reason: {}", + (Path / Content.Paths[PathIndex]).make_preferred(), + Ex.what())); + }); + FilesFailed++; + }); + } + + Work.Wait(UsePlainProgress ? 5000 : 200, [&](bool IsAborted, std::ptrdiff_t PendingWork) { + ZEN_UNUSED(IsAborted, PendingWork); + ProgressBar.UpdateState({.Task = "Verifying files ", + .Details = fmt::format("Verified {} files out of {}. Verfied: {}. Failed files: {}", + FilesVerified.load(), + PathCount, + NiceBytes(ReadBytes.load()), + FilesFailed.load()), + .TotalCount = gsl::narrow<uint64_t>(PathCount), + .RemainingCount = gsl::narrow<uint64_t>(PathCount - FilesVerified.load())}, + false); + }); + ProgressBar.Finish(); + for (const std::string& Error : Errors) + { + ZEN_CONSOLE("{}", Error); + } + if (!Errors.empty()) + { + throw std::runtime_error(fmt::format("Verify failed with {} errors", Errors.size())); + } + } + + class WriteFileCache + { + public: + WriteFileCache() {} + ~WriteFileCache() { Flush(); } + + template<typename TBufferType> + void WriteToFile(uint32_t TargetIndex, + std::function<std::filesystem::path(uint32_t TargetIndex)>&& GetTargetPath, + const TBufferType& Buffer, + uint64_t FileOffset, + uint64_t TargetFinalSize) + { + if (!SeenTargetIndexes.empty() && SeenTargetIndexes.back() == TargetIndex) + { + ZEN_ASSERT(OpenFileWriter); + OpenFileWriter->Write(Buffer, FileOffset); + } + else + { + Flush(); + const std::filesystem::path& TargetPath = GetTargetPath(TargetIndex); + CreateDirectories(TargetPath.parent_path()); + uint32_t Tries = 5; + std::unique_ptr<BasicFile> NewOutputFile( + 
std::make_unique<BasicFile>(TargetPath, BasicFile::Mode::kWrite, [&Tries, TargetPath](std::error_code& Ec) { + if (Tries < 3) + { + ZEN_CONSOLE("Failed opening file '{}': {}{}", TargetPath, Ec.message(), Tries > 1 ? " Retrying"sv : ""sv); + } + if (Tries > 1) + { + Sleep(100); + } + return --Tries > 0; + })); + + const bool CacheWriter = TargetFinalSize > Buffer.GetSize(); + if (CacheWriter) + { + ZEN_ASSERT(std::find(SeenTargetIndexes.begin(), SeenTargetIndexes.end(), TargetIndex) == SeenTargetIndexes.end()); + + OutputFile = std::move(NewOutputFile); + OpenFileWriter = std::make_unique<BasicFileWriter>(*OutputFile, Min(TargetFinalSize, 256u * 1024u)); + OpenFileWriter->Write(Buffer, FileOffset); + SeenTargetIndexes.push_back(TargetIndex); + } + else + { + NewOutputFile->Write(Buffer, FileOffset); + } + } + } + + void Flush() + { + OpenFileWriter = {}; + OutputFile = {}; + } + std::vector<uint32_t> SeenTargetIndexes; + std::unique_ptr<BasicFile> OutputFile; + std::unique_ptr<BasicFileWriter> OpenFileWriter; + }; + + std::vector<const ChunkedContentLookup::ChunkLocation*> GetRemainingChunkTargets( + const std::vector<bool>& RemotePathIndexWantsCopyFromCacheFlags, + const ChunkedContentLookup& Lookup, + uint32_t ChunkIndex) + { + std::span<const ChunkedContentLookup::ChunkLocation> ChunkSources = GetChunkLocations(Lookup, ChunkIndex); + std::vector<const ChunkedContentLookup::ChunkLocation*> ChunkTargetPtrs; + if (!ChunkSources.empty()) + { + ChunkTargetPtrs.reserve(ChunkSources.size()); + for (const ChunkedContentLookup::ChunkLocation& Source : ChunkSources) + { + if (!RemotePathIndexWantsCopyFromCacheFlags[Source.PathIndex]) + { + ChunkTargetPtrs.push_back(&Source); + } + } + } + return ChunkTargetPtrs; + }; + + bool WriteBlockToDisk(const std::filesystem::path& Path, + const ChunkedFolderContent& Content, + const std::vector<bool>& RemotePathIndexWantsCopyFromCacheFlags, + const CompositeBuffer& DecompressedBlockBuffer, + const ChunkedContentLookup& Lookup, + 
std::atomic<bool>* RemoteChunkIndexNeedsCopyFromSourceFlags, + uint32_t& OutChunksComplete, + uint64_t& OutBytesWritten) + { + std::vector<CompositeBuffer> ChunkBuffers; + struct WriteOpData + { + const ChunkedContentLookup::ChunkLocation* Target; + size_t ChunkBufferIndex; + }; + std::vector<WriteOpData> WriteOps; + + SharedBuffer BlockBuffer = DecompressedBlockBuffer.Flatten(); + uint64_t HeaderSize = 0; + if (IterateChunkBlock( + BlockBuffer, + [&](CompressedBuffer&& Chunk, const IoHash& ChunkHash) { + if (auto It = Lookup.ChunkHashToChunkIndex.find(ChunkHash); It != Lookup.ChunkHashToChunkIndex.end()) + { + const uint32_t ChunkIndex = It->second; + std::vector<const ChunkedContentLookup::ChunkLocation*> ChunkTargetPtrs = + GetRemainingChunkTargets(RemotePathIndexWantsCopyFromCacheFlags, Lookup, ChunkIndex); + + if (!ChunkTargetPtrs.empty()) + { + bool NeedsWrite = true; + if (RemoteChunkIndexNeedsCopyFromSourceFlags[ChunkIndex].compare_exchange_strong(NeedsWrite, false)) + { + CompositeBuffer Decompressed = Chunk.DecompressToComposite(); + if (!Decompressed) + { + throw std::runtime_error(fmt::format("Decompression of build blob {} failed", ChunkHash)); + } + ZEN_ASSERT_SLOW(ChunkHash == IoHash::HashBuffer(Decompressed)); + ZEN_ASSERT(Decompressed.GetSize() == Content.ChunkedContent.ChunkRawSizes[ChunkIndex]); + for (const ChunkedContentLookup::ChunkLocation* Target : ChunkTargetPtrs) + { + WriteOps.push_back(WriteOpData{.Target = Target, .ChunkBufferIndex = ChunkBuffers.size()}); + } + ChunkBuffers.emplace_back(std::move(Decompressed)); + } + } + } + }, + HeaderSize)) + { + if (!WriteOps.empty()) + { + std::sort(WriteOps.begin(), WriteOps.end(), [](const WriteOpData& Lhs, const WriteOpData& Rhs) { + if (Lhs.Target->PathIndex < Rhs.Target->PathIndex) + { + return true; + } + if (Lhs.Target->PathIndex > Rhs.Target->PathIndex) + { + return false; + } + return Lhs.Target->Offset < Rhs.Target->Offset; + }); + + WriteFileCache OpenFileCache; + for (const 
WriteOpData& WriteOp : WriteOps) + { + const CompositeBuffer& Chunk = ChunkBuffers[WriteOp.ChunkBufferIndex]; + const uint32_t PathIndex = WriteOp.Target->PathIndex; + const uint64_t ChunkSize = Chunk.GetSize(); + const uint64_t FileOffset = WriteOp.Target->Offset; + ZEN_ASSERT(FileOffset + ChunkSize <= Content.RawSizes[PathIndex]); + + OpenFileCache.WriteToFile<CompositeBuffer>( + PathIndex, + [&Path, &Content](uint32_t TargetIndex) { return (Path / Content.Paths[TargetIndex]).make_preferred(); }, + Chunk, + FileOffset, + Content.RawSizes[PathIndex]); + OutBytesWritten += ChunkSize; + } + OutChunksComplete += gsl::narrow<uint32_t>(ChunkBuffers.size()); + } + return true; + } + return false; + } + + SharedBuffer Decompress(const IoBuffer& CompressedChunk, const IoHash& ChunkHash, const uint64_t ChunkRawSize) + { + IoHash RawHash; + uint64_t RawSize; + CompressedBuffer Compressed = CompressedBuffer::FromCompressed(SharedBuffer(CompressedChunk), RawHash, RawSize); + if (!Compressed) + { + throw std::runtime_error(fmt::format("Invalid build blob format for chunk {}", ChunkHash)); + } + if (RawHash != ChunkHash) + { + throw std::runtime_error(fmt::format("Mismatching build blob {}: compressed header raw hash is {}", ChunkHash, RawHash)); + } + if (RawSize != ChunkRawSize) + { + throw std::runtime_error( + fmt::format("Mismatching build blob {}, expected raw size {} but received raw size {}", ChunkHash, ChunkRawSize, RawSize)); + } + + SharedBuffer Decompressed = Compressed.Decompress(); + + if (!Decompressed) + { + throw std::runtime_error(fmt::format("Decompression of build blob {} failed", ChunkHash)); + } + return Decompressed; + } + + void WriteChunkToDisk(const std::filesystem::path& Path, + const ChunkedFolderContent& Content, + std::span<const ChunkedContentLookup::ChunkLocation* const> ChunkTargets, + const CompositeBuffer& 
ChunkData, + WriteFileCache& OpenFileCache, + uint64_t& OutBytesWritten) + { + for (const ChunkedContentLookup::ChunkLocation* TargetPtr : ChunkTargets) + { + const auto& Target = *TargetPtr; + const uint64_t FileOffset = Target.Offset; + + OpenFileCache.WriteToFile( + Target.PathIndex, + [&Path, &Content](uint32_t TargetIndex) { return (Path / Content.Paths[TargetIndex]).make_preferred(); }, + ChunkData, + FileOffset, + Content.RawSizes[Target.PathIndex]); + OutBytesWritten += ChunkData.GetSize(); + } + } + + void DownloadLargeBlob(BuildStorage& Storage, + const std::filesystem::path& Path, + const ChunkedFolderContent& RemoteContent, + const ChunkedContentLookup& RemoteLookup, + const Oid& BuildId, + const IoHash& ChunkHash, + const std::uint64_t PreferredMultipartChunkSize, + const std::vector<const ChunkedContentLookup::ChunkLocation*>& ChunkTargetPtrs, + ParallellWork& Work, + WorkerThreadPool& WritePool, + WorkerThreadPool& NetworkPool, + std::atomic<bool>& AbortFlag, + std::atomic<uint64_t>& BytesWritten, + std::atomic<uint64_t>& WriteToDiskBytes, + std::atomic<uint64_t>& BytesDownloaded, + std::atomic<uint64_t>& LooseChunksBytes, + std::atomic<uint64_t>& DownloadedChunks, + std::atomic<uint32_t>& ChunksComplete, + std::atomic<uint64_t>& MultipartAttachmentCount) + { + struct WorkloadData + { + TemporaryFile TempFile; + }; + std::shared_ptr<WorkloadData> Workload(std::make_shared<WorkloadData>()); + + std::error_code Ec; + Workload->TempFile.CreateTemporary(Path / ZenTempChunkFolderName, Ec); + if (Ec) + { + throw std::runtime_error( + fmt::format("Failed opening temporary file '{}': {} ({})", Workload->TempFile.GetPath(), Ec.message(), Ec.value())); + } + std::vector<std::function<void()>> WorkItems = Storage.GetLargeBuildBlob( + BuildId, + ChunkHash, + PreferredMultipartChunkSize, + [&Path, + &RemoteContent, + &RemoteLookup, + &Work, + &WritePool, + Workload, + ChunkHash, + &BytesDownloaded, + &LooseChunksBytes, + &BytesWritten, + &WriteToDiskBytes, + 
&DownloadedChunks, + &ChunksComplete, + ChunkTargetPtrs = std::vector<const ChunkedContentLookup::ChunkLocation*>(ChunkTargetPtrs), + &AbortFlag](uint64_t Offset, const IoBuffer& Chunk, uint64_t BytesRemaining) { + BytesDownloaded += Chunk.GetSize(); + LooseChunksBytes += Chunk.GetSize(); + + if (!AbortFlag.load()) + { + Workload->TempFile.Write(Chunk.GetView(), Offset); + if (Chunk.GetSize() == BytesRemaining) + { + DownloadedChunks++; + + Work.ScheduleWork( + WritePool, + [&Path, + &RemoteContent, + &RemoteLookup, + ChunkHash, + Workload, + Offset, + BytesRemaining, + &ChunksComplete, + &BytesWritten, + &WriteToDiskBytes, + ChunkTargetPtrs](std::atomic<bool>& AbortFlag) { + if (!AbortFlag) + { + uint64_t CompressedSize = Workload->TempFile.FileSize(); + void* FileHandle = Workload->TempFile.Detach(); + IoBuffer CompressedPart = IoBuffer(IoBuffer::File, + FileHandle, + 0, + CompressedSize, + /*IsWholeFile*/ true); + if (!CompressedPart) + { + throw std::runtime_error( + fmt::format("Multipart build blob {} is not a compressed buffer", ChunkHash)); + } + CompressedPart.SetDeleteOnClose(true); + + uint64_t TotalBytesWritten = 0; + + uint32_t ChunkIndex = RemoteLookup.ChunkHashToChunkIndex.at(ChunkHash); + + SharedBuffer Chunk = + Decompress(CompressedPart, ChunkHash, RemoteContent.ChunkedContent.ChunkRawSizes[ChunkIndex]); + + // ZEN_ASSERT_SLOW(ChunkHash == + // IoHash::HashBuffer(Chunk.AsIoBuffer())); + + { + WriteFileCache OpenFileCache; + + WriteChunkToDisk(Path, + RemoteContent, + ChunkTargetPtrs, + CompositeBuffer(Chunk), + OpenFileCache, + TotalBytesWritten); + ChunksComplete++; + BytesWritten += TotalBytesWritten; + WriteToDiskBytes += TotalBytesWritten; + } + } + }, + [&, ChunkHash](const std::exception& Ex, std::atomic<bool>& AbortFlag) { + ZEN_CONSOLE("Failed writing chunk {}. 
Reason: {}", ChunkHash, Ex.what()); + AbortFlag = true; + }); + } + } + }); + if (!WorkItems.empty()) + { + MultipartAttachmentCount++; + } + for (auto& WorkItem : WorkItems) + { + Work.ScheduleWork( + NetworkPool, + [WorkItem = std::move(WorkItem)](std::atomic<bool>& AbortFlag) { + if (!AbortFlag) + { + WorkItem(); + } + }, + [&, ChunkHash](const std::exception& Ex, std::atomic<bool>& AbortFlag) { + ZEN_CONSOLE("Failed uploading multipart blob {}. Reason: {}", ChunkHash, Ex.what()); + AbortFlag = true; + }); + } + } + + bool UpdateFolder(BuildStorage& Storage, + const Oid& BuildId, + const std::filesystem::path& Path, + const std::uint64_t LargeAttachmentSize, + const std::uint64_t PreferredMultipartChunkSize, + const ChunkedFolderContent& LocalContent, + const ChunkedFolderContent& RemoteContent, + const std::vector<ChunkBlockDescription>& BlockDescriptions, + const std::vector<IoHash>& LooseChunkHashes, + bool WipeTargetFolder, + std::atomic<bool>& AbortFlag, + FolderContent& OutLocalFolderState) + { + std::atomic<uint64_t> DownloadedBlocks = 0; + std::atomic<uint64_t> BlockBytes = 0; + std::atomic<uint64_t> DownloadedChunks = 0; + std::atomic<uint64_t> LooseChunksBytes = 0; + std::atomic<uint64_t> WriteToDiskBytes = 0; + std::atomic<uint64_t> MultipartAttachmentCount = 0; + + DiskStatistics DiskStats; + + Stopwatch IndexTimer; + + const ChunkedContentLookup LocalLookup = BuildChunkedContentLookup(LocalContent); + + const ChunkedContentLookup RemoteLookup = BuildChunkedContentLookup(RemoteContent); + + ZEN_CONSOLE("Indexed local and remote content in {}", NiceTimeSpanMs(IndexTimer.GetElapsedTimeMs())); + + const std::filesystem::path CacheFolderPath = Path / ZenTempReuseFolderName; + + tsl::robin_map<IoHash, uint32_t, IoHash::Hasher> LocalRawHashToPathIndex; + + if (!WipeTargetFolder) + { + Stopwatch CacheTimer; + + for (uint32_t LocalPathIndex = 0; LocalPathIndex < LocalContent.Paths.size(); LocalPathIndex++) + { + if (LocalContent.RawSizes[LocalPathIndex] > 0) 
+ { + const uint32_t SequenceRawHashIndex = + LocalLookup.RawHashToSequenceRawHashIndex.at(LocalContent.RawHashes[LocalPathIndex]); + uint32_t ChunkCount = LocalContent.ChunkedContent.ChunkCounts[SequenceRawHashIndex]; + if (ChunkCount > 0) + { + const IoHash LocalRawHash = LocalContent.RawHashes[LocalPathIndex]; + if (!LocalRawHashToPathIndex.contains(LocalRawHash)) + { + LocalRawHashToPathIndex.insert_or_assign(LocalRawHash, LocalPathIndex); + } + } + } + } + + { + std::vector<bool> IncludeLocalFiles(LocalContent.Paths.size(), false); + + for (const IoHash& ChunkHash : RemoteContent.ChunkedContent.ChunkHashes) + { + if (auto It = LocalLookup.ChunkHashToChunkIndex.find(ChunkHash); It != LocalLookup.ChunkHashToChunkIndex.end()) + { + const uint32_t LocalChunkIndex = It->second; + std::span<const ChunkedContentLookup::ChunkLocation> LocalChunkTargetRange = + GetChunkLocations(LocalLookup, LocalChunkIndex); + if (!LocalChunkTargetRange.empty()) + { + std::uint32_t LocalPathIndex = LocalChunkTargetRange[0].PathIndex; + IncludeLocalFiles[LocalPathIndex] = true; + } + } + } + for (const IoHash& RawHash : RemoteContent.RawHashes) + { + if (auto It = LocalRawHashToPathIndex.find(RawHash); It != LocalRawHashToPathIndex.end()) + { + uint32_t LocalPathIndex = It->second; + IncludeLocalFiles[LocalPathIndex] = true; + } + } + + for (uint32_t LocalPathIndex = 0; LocalPathIndex < LocalContent.Paths.size(); LocalPathIndex++) + { + if (!IncludeLocalFiles[LocalPathIndex]) + { + LocalRawHashToPathIndex.erase(LocalContent.RawHashes[LocalPathIndex]); + } + } + } + + uint64_t CachedBytes = 0; + CreateDirectories(CacheFolderPath); + for (auto& CachedLocalFile : LocalRawHashToPathIndex) + { + const IoHash& LocalRawHash = CachedLocalFile.first; + const uint32_t LocalPathIndex = CachedLocalFile.second; + const std::filesystem::path LocalFilePath = (Path / LocalContent.Paths[LocalPathIndex]).make_preferred(); + const std::filesystem::path CacheFilePath = (CacheFolderPath / 
LocalRawHash.ToHexString()).make_preferred(); + + SetFileReadOnly(LocalFilePath, false); + + std::filesystem::rename(LocalFilePath, CacheFilePath); + CachedBytes += std::filesystem::file_size(CacheFilePath); + } + + ZEN_CONSOLE("Cached {} ({}) local files in {}", + LocalRawHashToPathIndex.size(), + NiceBytes(CachedBytes), + NiceTimeSpanMs(CacheTimer.GetElapsedTimeMs())); + } + + if (AbortFlag) + { + return true; + } + + CleanDirectory(Path, DefaultExcludeFolders); + + Stopwatch CacheMappingTimer; + + std::atomic<uint64_t> BytesWritten = 0; + uint64_t CacheMappedBytesForReuse = 0; + + std::vector<bool> RemotePathIndexWantsCopyFromCacheFlags(RemoteContent.Paths.size(), false); + std::vector<std::atomic<bool>> RemoteChunkIndexWantsCopyFromCacheFlags(RemoteContent.ChunkedContent.ChunkHashes.size()); + // Guard against the same chunk appearing in multiple blocks (can happen due to block reuse; cache reuse writes blocks directly) + std::vector<std::atomic<bool>> RemoteChunkIndexNeedsCopyFromSourceFlags(RemoteContent.ChunkedContent.ChunkHashes.size()); + + struct CacheCopyData + { + std::filesystem::path OriginalSourceFileName; + IoHash LocalFileRawHash; + uint64_t LocalFileRawSize = 0; + std::vector<uint32_t> RemotePathIndexes; + std::vector<const ChunkedContentLookup::ChunkLocation*> ChunkSourcePtrs; + struct ChunkTarget + { + uint32_t ChunkSourceCount; + uint64_t ChunkRawSize; + uint64_t LocalFileOffset; + }; + std::vector<ChunkTarget> ChunkTargets; + }; + + tsl::robin_map<IoHash, size_t, IoHash::Hasher> RawHashToCacheCopyDataIndex; + std::vector<CacheCopyData> CacheCopyDatas; + uint32_t ChunkCountToWrite = 0; + + // Pick up all whole files to copy and/or move + for (uint32_t RemotePathIndex = 0; RemotePathIndex < RemoteContent.Paths.size(); RemotePathIndex++) + { + const IoHash& RemoteRawHash = RemoteContent.RawHashes[RemotePathIndex]; + if (auto It = LocalRawHashToPathIndex.find(RemoteRawHash); It != LocalRawHashToPathIndex.end()) + { + if (auto CopySourceIt = 
RawHashToCacheCopyDataIndex.find(RemoteRawHash); CopySourceIt != RawHashToCacheCopyDataIndex.end()) + { + CacheCopyData& Data = CacheCopyDatas[CopySourceIt->second]; + Data.RemotePathIndexes.push_back(RemotePathIndex); + } + else + { + const uint32_t LocalPathIndex = It->second; + ZEN_ASSERT(LocalContent.RawSizes[LocalPathIndex] == RemoteContent.RawSizes[RemotePathIndex]); + ZEN_ASSERT(LocalContent.RawHashes[LocalPathIndex] == RemoteContent.RawHashes[RemotePathIndex]); + RawHashToCacheCopyDataIndex.insert_or_assign(RemoteRawHash, CacheCopyDatas.size()); + CacheCopyDatas.push_back(CacheCopyData{.OriginalSourceFileName = LocalContent.Paths[LocalPathIndex], + .LocalFileRawHash = RemoteRawHash, + .LocalFileRawSize = LocalContent.RawSizes[LocalPathIndex], + .RemotePathIndexes = {RemotePathIndex}}); + CacheMappedBytesForReuse += RemoteContent.RawSizes[RemotePathIndex]; + ChunkCountToWrite++; + } + RemotePathIndexWantsCopyFromCacheFlags[RemotePathIndex] = true; + } + } + + // Pick up all chunks in cached files and make sure we block moving of cache files if we need part of them + for (auto& CachedLocalFile : LocalRawHashToPathIndex) + { + const IoHash& LocalFileRawHash = CachedLocalFile.first; + const uint32_t LocalPathIndex = CachedLocalFile.second; + const uint32_t LocalSequenceRawHashIndex = LocalLookup.RawHashToSequenceRawHashIndex.at(LocalFileRawHash); + const uint32_t LocalOrderOffset = + LocalLookup.SequenceRawHashIndexChunkOrderOffset[LocalSequenceRawHashIndex]; // CachedLocalFile.second.ChunkOrderOffset; + + { + uint64_t SourceOffset = 0; + const uint32_t LocalChunkCount = LocalContent.ChunkedContent.ChunkCounts[LocalSequenceRawHashIndex]; + for (uint32_t LocalOrderIndex = 0; LocalOrderIndex < LocalChunkCount; LocalOrderIndex++) + { + const uint32_t LocalChunkIndex = LocalContent.ChunkedContent.ChunkOrders[LocalOrderOffset + LocalOrderIndex]; + const IoHash& LocalChunkHash = LocalContent.ChunkedContent.ChunkHashes[LocalChunkIndex]; + const uint64_t 
LocalChunkRawSize = LocalContent.ChunkedContent.ChunkRawSizes[LocalChunkIndex]; + if (auto RemoteChunkIt = RemoteLookup.ChunkHashToChunkIndex.find(LocalChunkHash); + RemoteChunkIt != RemoteLookup.ChunkHashToChunkIndex.end()) + { + const uint32_t RemoteChunkIndex = RemoteChunkIt->second; + if (!RemoteChunkIndexWantsCopyFromCacheFlags[RemoteChunkIndex]) + { + std::vector<const ChunkedContentLookup::ChunkLocation*> ChunkTargetPtrs = + GetRemainingChunkTargets(RemotePathIndexWantsCopyFromCacheFlags, RemoteLookup, RemoteChunkIndex); + + if (!ChunkTargetPtrs.empty()) + { + CacheCopyData::ChunkTarget Target = {.ChunkSourceCount = gsl::narrow<uint32_t>(ChunkTargetPtrs.size()), + .ChunkRawSize = LocalChunkRawSize, + .LocalFileOffset = SourceOffset}; + if (auto CopySourceIt = RawHashToCacheCopyDataIndex.find(LocalFileRawHash); + CopySourceIt != RawHashToCacheCopyDataIndex.end()) + { + CacheCopyData& Data = CacheCopyDatas[CopySourceIt->second]; + Data.ChunkSourcePtrs.insert(Data.ChunkSourcePtrs.end(), ChunkTargetPtrs.begin(), ChunkTargetPtrs.end()); + Data.ChunkTargets.push_back(Target); + } + else + { + RawHashToCacheCopyDataIndex.insert_or_assign(LocalFileRawHash, CacheCopyDatas.size()); + CacheCopyDatas.push_back( + CacheCopyData{.OriginalSourceFileName = LocalContent.Paths[LocalPathIndex], + .LocalFileRawHash = LocalFileRawHash, + .LocalFileRawSize = LocalContent.RawSizes[LocalPathIndex], + .RemotePathIndexes = {}, + .ChunkSourcePtrs = ChunkTargetPtrs, + .ChunkTargets = std::vector<CacheCopyData::ChunkTarget>{Target}}); + } + CacheMappedBytesForReuse += LocalChunkRawSize; + } + RemoteChunkIndexWantsCopyFromCacheFlags[RemoteChunkIndex] = true; + } + } + SourceOffset += LocalChunkRawSize; + } + } + } + + for (uint32_t RemoteChunkIndex = 0; RemoteChunkIndex < RemoteContent.ChunkedContent.ChunkHashes.size(); RemoteChunkIndex++) + { + if (RemoteChunkIndexWantsCopyFromCacheFlags[RemoteChunkIndex]) + { + ChunkCountToWrite++; + } + else + { + std::vector<const 
ChunkedContentLookup::ChunkLocation*> ChunkTargetPtrs = + GetRemainingChunkTargets(RemotePathIndexWantsCopyFromCacheFlags, RemoteLookup, RemoteChunkIndex); + if (!ChunkTargetPtrs.empty()) + { + RemoteChunkIndexNeedsCopyFromSourceFlags[RemoteChunkIndex] = true; + ChunkCountToWrite++; + } + } + } + std::atomic<uint32_t> ChunkCountWritten = 0; + + ZEN_CONSOLE("Mapped {} of cached data for reuse in {}", + NiceBytes(CacheMappedBytesForReuse), + NiceTimeSpanMs(CacheMappingTimer.GetElapsedTimeMs())); + + auto CopyChunksFromCacheFile = [](const std::filesystem::path& Path, + BufferedOpenFile& SourceFile, + WriteFileCache& OpenFileCache, + const ChunkedFolderContent& RemoteContent, + const uint64_t LocalFileSourceOffset, + const uint64_t LocalChunkRawSize, + std::span<const ChunkedContentLookup::ChunkLocation* const> ChunkTargetPtrs, + uint64_t& OutBytesWritten) { + CompositeBuffer Chunk = SourceFile.GetRange(LocalFileSourceOffset, LocalChunkRawSize); + uint64_t TotalBytesWritten = 0; + + WriteChunkToDisk(Path, RemoteContent, ChunkTargetPtrs, Chunk, OpenFileCache, TotalBytesWritten); + OutBytesWritten += TotalBytesWritten; + }; + + auto CloneFullFileFromCache = [](const std::filesystem::path& Path, + const std::filesystem::path& CacheFolderPath, + const ChunkedFolderContent& RemoteContent, + const IoHash& FileRawHash, + const uint64_t FileRawSize, + std::span<const uint32_t> FullCloneRemotePathIndexes, + bool CanMove, + uint64_t& OutBytesWritten) { + const std::filesystem::path CacheFilePath = (CacheFolderPath / FileRawHash.ToHexString()).make_preferred(); + + size_t CopyCount = FullCloneRemotePathIndexes.size(); + if (CanMove) + { + // If every reference to this chunk is a full file we can move the cache file to the last target + CopyCount--; + } + + for (uint32_t RemotePathIndex : FullCloneRemotePathIndexes) + { + const std::filesystem::path TargetPath = (Path / RemoteContent.Paths[RemotePathIndex]).make_preferred(); + CreateDirectories(TargetPath.parent_path()); + + if 
(CopyCount == 0) + { + std::filesystem::rename(CacheFilePath, TargetPath); + } + else + { + CopyFile(CacheFilePath, TargetPath, {.EnableClone = false}); + ZEN_ASSERT(CopyCount > 0); + CopyCount--; + } + OutBytesWritten += FileRawSize; + } + }; + + WorkerThreadPool& NetworkPool = GetSmallWorkerPool(EWorkloadType::Burst); // GetSyncWorkerPool(); // + WorkerThreadPool& WritePool = GetMediumWorkerPool(EWorkloadType::Burst); // GetSyncWorkerPool(); // + + ProgressBar WriteProgressBar(UsePlainProgress); + ParallellWork Work(AbortFlag); + + std::atomic<uint64_t> BytesDownloaded = 0; + + for (size_t CopyDataIndex = 0; CopyDataIndex < CacheCopyDatas.size(); CopyDataIndex++) + { + if (AbortFlag) + { + break; + } + + Work.ScheduleWork( + WritePool, // GetSyncWorkerPool(),// + [&, CopyDataIndex](std::atomic<bool>& AbortFlag) { + if (!AbortFlag) + { + const CacheCopyData& CopyData = CacheCopyDatas[CopyDataIndex]; + const std::filesystem::path CacheFilePath = + (CacheFolderPath / CopyData.LocalFileRawHash.ToHexString()).make_preferred(); + + if (!CopyData.ChunkSourcePtrs.empty()) + { + uint64_t CacheLocalFileBytesRead = 0; + + size_t TargetStart = 0; + const std::span<const ChunkedContentLookup::ChunkLocation* const> AllTargets(CopyData.ChunkSourcePtrs); + + struct WriteOp + { + const ChunkedContentLookup::ChunkLocation* Target; + uint64_t LocalFileOffset; + uint64_t ChunkSize; + }; + + std::vector<WriteOp> WriteOps; + WriteOps.reserve(CopyData.ChunkSourcePtrs.size()); + + for (const CacheCopyData::ChunkTarget& ChunkTarget : CopyData.ChunkTargets) + { + std::span<const ChunkedContentLookup::ChunkLocation* const> TargetRange = + AllTargets.subspan(TargetStart, ChunkTarget.ChunkSourceCount); + for (const ChunkedContentLookup::ChunkLocation* Target : TargetRange) + { + WriteOps.push_back(WriteOp{.Target = Target, + .LocalFileOffset = ChunkTarget.LocalFileOffset, + .ChunkSize = ChunkTarget.ChunkRawSize}); + } + TargetStart += ChunkTarget.ChunkSourceCount; + } + + 
std::sort(WriteOps.begin(), WriteOps.end(), [](const WriteOp& Lhs, const WriteOp& Rhs) { + if (Lhs.Target->PathIndex < Rhs.Target->PathIndex) + { + return true; + } + else if (Lhs.Target->PathIndex > Rhs.Target->PathIndex) + { + return false; + } + if (Lhs.Target->Offset < Rhs.Target->Offset) + { + return true; + } + return false; + }); + + BufferedOpenFile SourceFile(CacheFilePath); + WriteFileCache OpenFileCache; + for (const WriteOp& Op : WriteOps) + { + const uint32_t RemotePathIndex = Op.Target->PathIndex; + const uint64_t ChunkSize = Op.ChunkSize; + CompositeBuffer ChunkSource = SourceFile.GetRange(Op.LocalFileOffset, ChunkSize); + + ZEN_ASSERT(Op.Target->Offset + ChunkSource.GetSize() <= RemoteContent.RawSizes[RemotePathIndex]); + + OpenFileCache.WriteToFile<CompositeBuffer>( + RemotePathIndex, + [&Path, &RemoteContent](uint32_t TargetIndex) { + return (Path / RemoteContent.Paths[TargetIndex]).make_preferred(); + }, + ChunkSource, + Op.Target->Offset, + RemoteContent.RawSizes[RemotePathIndex]); + BytesWritten += ChunkSize; + WriteToDiskBytes += ChunkSize; + CacheLocalFileBytesRead += ChunkSize; // TODO: This should be the sum of unique chunk sizes? 
+ } + ChunkCountWritten += gsl::narrow<uint32_t>(CopyData.ChunkTargets.size()); + ZEN_DEBUG("Copied {} from {}", NiceBytes(CacheLocalFileBytesRead), CopyData.OriginalSourceFileName); + } + + if (CopyData.RemotePathIndexes.empty()) + { + std::filesystem::remove(CacheFilePath); + } + else + { + uint64_t LocalBytesWritten = 0; + CloneFullFileFromCache(Path, + CacheFolderPath, + RemoteContent, + CopyData.LocalFileRawHash, + CopyData.LocalFileRawSize, + CopyData.RemotePathIndexes, + true, + LocalBytesWritten); + // CacheLocalFileBytesRead += CopyData.LocalFileRawSize; + BytesWritten += LocalBytesWritten; + WriteToDiskBytes += LocalBytesWritten; + ChunkCountWritten++; + + ZEN_DEBUG("Used full cached file {} ({}) for {} ({}) targets", + CopyData.OriginalSourceFileName, + NiceBytes(CopyData.LocalFileRawSize), + CopyData.RemotePathIndexes.size(), + NiceBytes(LocalBytesWritten)); + } + } + }, + [&, CopyDataIndex](const std::exception& Ex, std::atomic<bool>& AbortFlag) { + ZEN_CONSOLE("Failed reading cached file {}. 
Reason: {}", + CacheCopyDatas[CopyDataIndex].OriginalSourceFileName, + Ex.what()); + AbortFlag = true; + }); + } + + for (const IoHash ChunkHash : LooseChunkHashes) + { + if (AbortFlag) + { + break; + } + + uint32_t RemoteChunkIndex = RemoteLookup.ChunkHashToChunkIndex.at(ChunkHash); + if (RemoteChunkIndexWantsCopyFromCacheFlags[RemoteChunkIndex]) + { + ZEN_DEBUG("Skipping chunk {} due to cache reuse", ChunkHash); + continue; + } + bool NeedsCopy = true; + if (RemoteChunkIndexNeedsCopyFromSourceFlags[RemoteChunkIndex].compare_exchange_strong(NeedsCopy, false)) + { + std::vector<const ChunkedContentLookup::ChunkLocation*> ChunkTargetPtrs = + GetRemainingChunkTargets(RemotePathIndexWantsCopyFromCacheFlags, RemoteLookup, RemoteChunkIndex); + + if (ChunkTargetPtrs.empty()) + { + ZEN_DEBUG("Skipping chunk {} due to cache reuse", ChunkHash); + } + else + { + Work.ScheduleWork( + NetworkPool, + [&, ChunkHash, RemoteChunkIndex, ChunkTargetPtrs](std::atomic<bool>& AbortFlag) { + if (!AbortFlag) + { + if (RemoteContent.ChunkedContent.ChunkRawSizes[RemoteChunkIndex] >= LargeAttachmentSize) + { + DownloadLargeBlob(Storage, + Path, + RemoteContent, + RemoteLookup, + BuildId, + ChunkHash, + PreferredMultipartChunkSize, + ChunkTargetPtrs, + Work, + WritePool, + NetworkPool, + AbortFlag, + BytesWritten, + WriteToDiskBytes, + BytesDownloaded, + LooseChunksBytes, + DownloadedChunks, + ChunkCountWritten, + MultipartAttachmentCount); + } + else + { + IoBuffer CompressedPart = Storage.GetBuildBlob(BuildId, ChunkHash); + if (!CompressedPart) + { + throw std::runtime_error(fmt::format("Chunk {} is missing", ChunkHash)); + } + BytesDownloaded += CompressedPart.GetSize(); + LooseChunksBytes += CompressedPart.GetSize(); + DownloadedChunks++; + + if (!AbortFlag) + { + Work.ScheduleWork( + WritePool, + [&, ChunkHash, RemoteChunkIndex, ChunkTargetPtrs, CompressedPart = std::move(CompressedPart)]( + std::atomic<bool>& AbortFlag) { + if (!AbortFlag) + { + uint64_t TotalBytesWritten = 0; + 
SharedBuffer Chunk = + Decompress(CompressedPart, + ChunkHash, + RemoteContent.ChunkedContent.ChunkRawSizes[RemoteChunkIndex]); + WriteFileCache OpenFileCache; + + WriteChunkToDisk(Path, + RemoteContent, + ChunkTargetPtrs, + CompositeBuffer(Chunk), + OpenFileCache, + TotalBytesWritten); + ChunkCountWritten++; + BytesWritten += TotalBytesWritten; + WriteToDiskBytes += TotalBytesWritten; + } + }, + [&, ChunkHash](const std::exception& Ex, std::atomic<bool>& AbortFlag) { + ZEN_CONSOLE("Failed writing chunk {}. Reason: {}", ChunkHash, Ex.what()); + AbortFlag = true; + }); + } + } + } + }, + [&, ChunkHash](const std::exception& Ex, std::atomic<bool>& AbortFlag) { + ZEN_CONSOLE("Failed downloading chunk {}. Reason: {}", ChunkHash, Ex.what()); + AbortFlag = true; + }); + } + } + } + + size_t BlockCount = BlockDescriptions.size(); + std::atomic<size_t> BlocksComplete = 0; + + auto IsBlockNeeded = [&RemoteContent, &RemoteLookup, &RemoteChunkIndexNeedsCopyFromSourceFlags]( + const ChunkBlockDescription& BlockDescription) -> bool { + for (const IoHash& ChunkHash : BlockDescription.ChunkRawHashes) + { + if (auto It = RemoteLookup.ChunkHashToChunkIndex.find(ChunkHash); It != RemoteLookup.ChunkHashToChunkIndex.end()) + { + const uint32_t RemoteChunkIndex = It->second; + if (RemoteChunkIndexNeedsCopyFromSourceFlags[RemoteChunkIndex]) + { + return true; + } + } + } + return false; + }; + + for (size_t BlockIndex = 0; BlockIndex < BlockCount; BlockIndex++) + { + if (Work.IsAborted()) + { + break; + } + Work.ScheduleWork( + WritePool, + [&, BlockIndex](std::atomic<bool>& AbortFlag) { + if (!AbortFlag) + { + if (IsBlockNeeded(BlockDescriptions[BlockIndex])) + { + Work.ScheduleWork( + NetworkPool, + [&, BlockIndex](std::atomic<bool>& AbortFlag) { + IoBuffer BlockBuffer = Storage.GetBuildBlob(BuildId, BlockDescriptions[BlockIndex].BlockHash); + if (!BlockBuffer) + { + throw std::runtime_error( + fmt::format("Block {} is missing", BlockDescriptions[BlockIndex].BlockHash)); + } + 
BytesDownloaded += BlockBuffer.GetSize(); + BlockBytes += BlockBuffer.GetSize(); + DownloadedBlocks++; + + if (!AbortFlag) + { + Work.ScheduleWork( + WritePool, + [&, BlockIndex, BlockBuffer = std::move(BlockBuffer)](std::atomic<bool>& AbortFlag) { + if (!AbortFlag) + { + IoHash BlockRawHash; + uint64_t BlockRawSize; + CompressedBuffer CompressedBlockBuffer = + CompressedBuffer::FromCompressed(SharedBuffer(std::move(BlockBuffer)), + BlockRawHash, + BlockRawSize); + if (!CompressedBlockBuffer) + { + throw std::runtime_error(fmt::format("Block {} is not a compressed buffer", + BlockDescriptions[BlockIndex].BlockHash)); + } + + if (BlockRawHash != BlockDescriptions[BlockIndex].BlockHash) + { + throw std::runtime_error( + fmt::format("Block {} header has a mismatching raw hash {}", + BlockDescriptions[BlockIndex].BlockHash, + BlockRawHash)); + } + + CompositeBuffer DecompressedBlockBuffer = CompressedBlockBuffer.DecompressToComposite(); + if (!DecompressedBlockBuffer) + { + throw std::runtime_error(fmt::format("Block {} failed to decompress", + BlockDescriptions[BlockIndex].BlockHash)); + } + + ZEN_ASSERT_SLOW(BlockDescriptions[BlockIndex].BlockHash == + IoHash::HashBuffer(DecompressedBlockBuffer)); + + uint64_t BytesWrittenToDisk = 0; + uint32_t ChunksReadFromBlock = 0; + if (WriteBlockToDisk(Path, + RemoteContent, + RemotePathIndexWantsCopyFromCacheFlags, + DecompressedBlockBuffer, + RemoteLookup, + RemoteChunkIndexNeedsCopyFromSourceFlags.data(), + ChunksReadFromBlock, + BytesWrittenToDisk)) + { + BytesWritten += BytesWrittenToDisk; + WriteToDiskBytes += BytesWrittenToDisk; + ChunkCountWritten += ChunksReadFromBlock; + } + else + { + throw std::runtime_error( + fmt::format("Block {} is malformed", BlockDescriptions[BlockIndex].BlockHash)); + } + BlocksComplete++; + } + }, + [&, BlockIndex](const std::exception& Ex, std::atomic<bool>& AbortFlag) { + ZEN_CONSOLE("Failed writing block {}. 

Reason: {}", + BlockDescriptions[BlockIndex].BlockHash, + Ex.what()); + AbortFlag = true; + }); + } + }, + [&, BlockIndex](const std::exception& Ex, std::atomic<bool>& AbortFlag) { + ZEN_CONSOLE("Failed downloading block {}. Reason: {}", + BlockDescriptions[BlockIndex].BlockHash, + Ex.what()); + AbortFlag = true; + }); + } + else + { + ZEN_DEBUG("Skipping block {} due to cache reuse", BlockDescriptions[BlockIndex].BlockHash); + BlocksComplete++; + } + } + }, + [&, BlockIndex](const std::exception& Ex, std::atomic<bool>& AbortFlag) { + ZEN_CONSOLE("Failed determining if block {} is needed. Reason: {}", BlockDescriptions[BlockIndex].BlockHash, Ex.what()); + AbortFlag = true; + }); + } + for (uint32_t PathIndex = 0; PathIndex < RemoteContent.Paths.size(); PathIndex++) + { + if (Work.IsAborted()) + { + break; + } + if (RemoteContent.RawSizes[PathIndex] == 0) + { + Work.ScheduleWork( + WritePool, + [&, PathIndex](std::atomic<bool>& AbortFlag) { + if (!AbortFlag) + { + const std::filesystem::path TargetPath = (Path / RemoteContent.Paths[PathIndex]).make_preferred(); + CreateDirectories(TargetPath.parent_path()); + BasicFile OutputFile; + OutputFile.Open(TargetPath, BasicFile::Mode::kTruncate); + } + }, + [&, PathIndex](const std::exception& Ex, std::atomic<bool>& AbortFlag) { + ZEN_CONSOLE("Failed creating file at {}. Reason: {}", RemoteContent.Paths[PathIndex], Ex.what()); + AbortFlag = true; + }); + } + } + + Work.Wait(UsePlainProgress ? 5000 : 200, [&](bool IsAborted, std::ptrdiff_t PendingWork) { + ZEN_UNUSED(IsAborted, PendingWork); + ZEN_ASSERT(ChunkCountToWrite >= ChunkCountWritten.load()); + WriteProgressBar.UpdateState( + {.Task = "Writing chunks ", + .Details = fmt::format("Written {} chunks out of {}. {} out of {} blocks complete. Downloaded: {}. 
Written: {}", + ChunkCountWritten.load(), + ChunkCountToWrite, + BlocksComplete.load(), + BlockCount, + NiceBytes(BytesDownloaded.load()), + NiceBytes(BytesWritten.load())), + .TotalCount = gsl::narrow<uint64_t>(ChunkCountToWrite), + .RemainingCount = gsl::narrow<uint64_t>(ChunkCountToWrite - ChunkCountWritten.load())}, + false); + }); + WriteProgressBar.Finish(); + + { + ProgressBar PremissionsProgressBar(false); + if (!RemoteContent.Attributes.empty()) + { + auto SetNativeFileAttributes = + [](const std::filesystem::path FilePath, SourcePlatform SourcePlatform, uint32_t Attributes) -> uint32_t { +#if ZEN_PLATFORM_WINDOWS + if (SourcePlatform == SourcePlatform::Windows) + { + SetFileAttributes(FilePath, Attributes); + return Attributes; + } + else + { + uint32_t CurrentAttributes = GetFileAttributes(FilePath); + uint32_t NewAttributes = MakeFileAttributeReadOnly(CurrentAttributes, IsFileModeReadOnly(Attributes)); + if (CurrentAttributes != NewAttributes) + { + SetFileAttributes(FilePath, NewAttributes); + } + return NewAttributes; + } +#endif // ZEN_PLATFORM_WINDOWS +#if ZEN_PLATFORM_LINUX || ZEN_PLATFORM_MAC + if (SourcePlatform != SourcePlatform::Windows) + { + SetFileMode(FilePath, Attributes); + return Attributes; + } + else + { + uint32_t CurrentMode = GetFileMode(FilePath); + uint32_t NewMode = MakeFileModeReadOnly(CurrentMode, IsFileAttributeReadOnly(Attributes)); + if (CurrentMode != NewMode) + { + SetFileMode(FilePath, NewMode); + } + return NewMode; + } +#endif // ZEN_PLATFORM_LINUX || ZEN_PLATFORM_MAC + }; + + OutLocalFolderState.Paths.reserve(RemoteContent.Paths.size()); + OutLocalFolderState.RawSizes.reserve(RemoteContent.Paths.size()); + OutLocalFolderState.Attributes.reserve(RemoteContent.Paths.size()); + OutLocalFolderState.ModificationTicks.reserve(RemoteContent.Paths.size()); + for (uint32_t PathIndex = 0; PathIndex < RemoteContent.Paths.size(); PathIndex++) + { + const std::filesystem::path LocalFilePath = (Path / 
RemoteContent.Paths[PathIndex]); + const uint32_t CurrentPlatformAttributes = + SetNativeFileAttributes(LocalFilePath, RemoteContent.Platform, RemoteContent.Attributes[PathIndex]); + + OutLocalFolderState.Paths.push_back(RemoteContent.Paths[PathIndex]); + OutLocalFolderState.RawSizes.push_back(RemoteContent.RawSizes[PathIndex]); + OutLocalFolderState.Attributes.push_back(CurrentPlatformAttributes); + OutLocalFolderState.ModificationTicks.push_back(GetModificationTickFromPath(LocalFilePath)); + + PremissionsProgressBar.UpdateState( + {.Task = "Set permissions ", + .Details = fmt::format("Updated {} files out of {}", PathIndex, RemoteContent.Paths.size()), + .TotalCount = RemoteContent.Paths.size(), + .RemainingCount = (RemoteContent.Paths.size() - PathIndex)}, + false); + } + } + PremissionsProgressBar.Finish(); + } + + return false; + } + + std::vector<std::pair<Oid, std::string>> ResolveBuildPartNames(BuildStorage& Storage, + const Oid& BuildId, + const std::vector<Oid>& BuildPartIds, + std::span<const std::string> BuildPartNames, + std::uint64_t& OutPreferredMultipartChunkSize) + { + std::vector<std::pair<Oid, std::string>> Result; + { + Stopwatch GetBuildTimer; + + std::vector<std::pair<Oid, std::string>> AvailableParts; + + CbObject BuildObject = Storage.GetBuild(BuildId); + ZEN_CONSOLE("GetBuild took {}. 
Payload size: {}", + NiceLatencyNs(GetBuildTimer.GetElapsedTimeUs() * 1000), + NiceBytes(BuildObject.GetSize())); + + CbObjectView PartsObject = BuildObject["parts"sv].AsObjectView(); + if (!PartsObject) + { + throw std::runtime_error("Build object does not have a 'parts' object"); + } + + OutPreferredMultipartChunkSize = BuildObject["chunkSize"sv].AsUInt64(OutPreferredMultipartChunkSize); + + for (CbFieldView PartView : PartsObject) + { + const std::string BuildPartName = std::string(PartView.GetName()); + const Oid BuildPartId = PartView.AsObjectId(); + if (BuildPartId == Oid::Zero) + { + ExtendableStringBuilder<128> SB; + for (CbFieldView ScanPartView : PartsObject) + { + SB.Append(fmt::format("\n {}: {}", ScanPartView.GetName(), ScanPartView.AsObjectId())); + } + throw std::runtime_error( + fmt::format("Build object part '{}' does not have a valid object id{}", BuildPartName, SB.ToView())); + } + AvailableParts.push_back({BuildPartId, BuildPartName}); + } + + if (BuildPartIds.empty() && BuildPartNames.empty()) + { + Result = AvailableParts; + } + else + { + for (const std::string& BuildPartName : BuildPartNames) + { + if (auto It = std::find_if(AvailableParts.begin(), + AvailableParts.end(), + [&BuildPartName](const auto& Part) { return Part.second == BuildPartName; }); + It != AvailableParts.end()) + { + Result.push_back(*It); + } + else + { + throw std::runtime_error(fmt::format("Build {} object does not have a part named '{}'", BuildId, BuildPartName)); + } + } + for (const Oid& BuildPartId : BuildPartIds) + { + if (auto It = std::find_if(AvailableParts.begin(), + AvailableParts.end(), + [&BuildPartId](const auto& Part) { return Part.first == BuildPartId; }); + It != AvailableParts.end()) + { + Result.push_back(*It); + } + else + { + throw std::runtime_error(fmt::format("Build {} object does not have a part with id '{}'", BuildId, BuildPartId)); + } + } + } + + if (Result.empty()) + { + throw std::runtime_error(fmt::format("Build {} object does not have any parts", 
BuildId)); + } + } + return Result; + } + + ChunkedFolderContent GetRemoteContent(BuildStorage& Storage, + const Oid& BuildId, + const std::vector<std::pair<Oid, std::string>>& BuildParts, + std::unique_ptr<ChunkingController>& OutChunkController, + std::vector<ChunkedFolderContent>& OutPartContents, + std::vector<ChunkBlockDescription>& OutBlockDescriptions, + std::vector<IoHash>& OutLooseChunkHashes) + { + Stopwatch GetBuildPartTimer; + CbObject BuildPartManifest = Storage.GetBuildPart(BuildId, BuildParts[0].first); + ZEN_CONSOLE("GetBuildPart {} ('{}') took {}. Payload size: {}", + BuildParts[0].first, + BuildParts[0].second, + NiceLatencyNs(GetBuildPartTimer.GetElapsedTimeUs() * 1000), + NiceBytes(BuildPartManifest.GetSize())); + + { + CbObjectView Chunker = BuildPartManifest["chunker"sv].AsObjectView(); + std::string_view ChunkerName = Chunker["name"sv].AsString(); + CbObjectView Parameters = Chunker["parameters"sv].AsObjectView(); + OutChunkController = CreateChunkingController(ChunkerName, Parameters); + } + + auto ParseBuildPartManifest = [](BuildStorage& Storage, + const Oid& BuildId, + const Oid& BuildPartId, + CbObject BuildPartManifest, + ChunkedFolderContent& OutRemoteContent, + std::vector<ChunkBlockDescription>& OutBlockDescriptions, + std::vector<IoHash>& OutLooseChunkHashes) { + std::vector<uint32_t> AbsoluteChunkOrders; + std::vector<uint64_t> LooseChunkRawSizes; + std::vector<IoHash> BlockRawHashes; + + ReadBuildContentFromCompactBinary(BuildPartManifest, + OutRemoteContent.Platform, + OutRemoteContent.Paths, + OutRemoteContent.RawHashes, + OutRemoteContent.RawSizes, + OutRemoteContent.Attributes, + OutRemoteContent.ChunkedContent.SequenceRawHashes, + OutRemoteContent.ChunkedContent.ChunkCounts, + AbsoluteChunkOrders, + OutLooseChunkHashes, + LooseChunkRawSizes, + BlockRawHashes); + + // TODO: GetBlockDescriptions for all BlockRawHashes in one go - check for local block descriptions when we cache them + + Stopwatch GetBlockMetadataTimer; + 
OutBlockDescriptions = Storage.GetBlockMetadata(BuildId, BlockRawHashes); + ZEN_CONSOLE("GetBlockMetadata for {} took {}. Found {} blocks", + BuildPartId, + NiceLatencyNs(GetBlockMetadataTimer.GetElapsedTimeUs() * 1000), + OutBlockDescriptions.size()); + + if (OutBlockDescriptions.size() != BlockRawHashes.size()) + { + bool AttemptFallback = false; + std::string ErrorDescription = + fmt::format("Not all required blocks could be found, {} blocks do not have metadata in this context.", + BlockRawHashes.size() - OutBlockDescriptions.size()); + if (AttemptFallback) + { + ZEN_CONSOLE("{} Attempting fallback options.", ErrorDescription); + std::vector<ChunkBlockDescription> AugmentedBlockDescriptions; + AugmentedBlockDescriptions.reserve(BlockRawHashes.size()); + std::vector<ChunkBlockDescription> FoundBlocks = Storage.FindBlocks(BuildId); + + for (const IoHash& BlockHash : BlockRawHashes) + { + if (auto It = std::find_if( + OutBlockDescriptions.begin(), + OutBlockDescriptions.end(), + [BlockHash](const ChunkBlockDescription& Description) { return Description.BlockHash == BlockHash; }); + It != OutBlockDescriptions.end()) + { + AugmentedBlockDescriptions.emplace_back(std::move(*It)); + } + else if (auto ListBlocksIt = std::find_if( + FoundBlocks.begin(), + FoundBlocks.end(), + [BlockHash](const ChunkBlockDescription& Description) { return Description.BlockHash == BlockHash; }); + ListBlocksIt != FoundBlocks.end()) + { + ZEN_CONSOLE("Found block {} via context find", BlockHash); + AugmentedBlockDescriptions.emplace_back(std::move(*ListBlocksIt)); + } + else + { + IoBuffer BlockBuffer = Storage.GetBuildBlob(BuildId, BlockHash); + if (!BlockBuffer) + { + throw std::runtime_error(fmt::format("Block {} could not be found", BlockHash)); + } + IoHash BlockRawHash; + uint64_t BlockRawSize; + CompressedBuffer CompressedBlockBuffer = + CompressedBuffer::FromCompressed(SharedBuffer(std::move(BlockBuffer)), BlockRawHash, BlockRawSize); + if (!CompressedBlockBuffer) + { 
+ throw std::runtime_error(fmt::format("Block {} is not a compressed buffer", BlockHash)); + } + + if (BlockRawHash != BlockHash) + { + throw std::runtime_error( + fmt::format("Block {} header has a mismatching raw hash {}", BlockHash, BlockRawHash)); + } + + CompositeBuffer DecompressedBlockBuffer = CompressedBlockBuffer.DecompressToComposite(); + if (!DecompressedBlockBuffer) + { + throw std::runtime_error(fmt::format("Block {} failed to decompress", BlockHash)); + } + + ChunkBlockDescription MissingChunkDescription = + GetChunkBlockDescription(DecompressedBlockBuffer.Flatten(), BlockHash); + AugmentedBlockDescriptions.emplace_back(std::move(MissingChunkDescription)); + } + } + OutBlockDescriptions.swap(AugmentedBlockDescriptions); + } + else + { + throw std::runtime_error(ErrorDescription); + } + } + + CalculateLocalChunkOrders(AbsoluteChunkOrders, + OutLooseChunkHashes, + LooseChunkRawSizes, + OutBlockDescriptions, + OutRemoteContent.ChunkedContent.ChunkHashes, + OutRemoteContent.ChunkedContent.ChunkRawSizes, + OutRemoteContent.ChunkedContent.ChunkOrders); + }; + + OutPartContents.resize(1); + ParseBuildPartManifest(Storage, + BuildId, + BuildParts[0].first, + BuildPartManifest, + OutPartContents[0], + OutBlockDescriptions, + OutLooseChunkHashes); + ChunkedFolderContent RemoteContent; + if (BuildParts.size() > 1) + { + std::vector<ChunkBlockDescription> OverlayBlockDescriptions; + std::vector<IoHash> OverlayLooseChunkHashes; + for (size_t PartIndex = 1; PartIndex < BuildParts.size(); PartIndex++) + { + const Oid& OverlayBuildPartId = BuildParts[PartIndex].first; + const std::string& OverlayBuildPartName = BuildParts[PartIndex].second; + Stopwatch GetOverlayBuildPartTimer; + CbObject OverlayBuildPartManifest = Storage.GetBuildPart(BuildId, OverlayBuildPartId); + ZEN_CONSOLE("GetBuildPart {} ('{}') took {}. 
Payload size: {}", + OverlayBuildPartId, + OverlayBuildPartName, + NiceLatencyNs(GetOverlayBuildPartTimer.GetElapsedTimeUs() * 1000), + NiceBytes(OverlayBuildPartManifest.GetSize())); + + ChunkedFolderContent OverlayPartContent; + std::vector<ChunkBlockDescription> OverlayPartBlockDescriptions; + std::vector<IoHash> OverlayPartLooseChunkHashes; + + ParseBuildPartManifest(Storage, + BuildId, + OverlayBuildPartId, + OverlayBuildPartManifest, + OverlayPartContent, + OverlayPartBlockDescriptions, + OverlayPartLooseChunkHashes); + OutPartContents.push_back(OverlayPartContent); + OverlayBlockDescriptions.insert(OverlayBlockDescriptions.end(), + OverlayPartBlockDescriptions.begin(), + OverlayPartBlockDescriptions.end()); + OverlayLooseChunkHashes.insert(OverlayLooseChunkHashes.end(), + OverlayPartLooseChunkHashes.begin(), + OverlayPartLooseChunkHashes.end()); + } + + RemoteContent = + MergeChunkedFolderContents(OutPartContents[0], std::span<const ChunkedFolderContent>(OutPartContents).subspan(1)); + { + tsl::robin_set<IoHash> AllBlockHashes; + for (const ChunkBlockDescription& Description : OutBlockDescriptions) + { + AllBlockHashes.insert(Description.BlockHash); + } + for (const ChunkBlockDescription& Description : OverlayBlockDescriptions) + { + if (!AllBlockHashes.contains(Description.BlockHash)) + { + AllBlockHashes.insert(Description.BlockHash); + OutBlockDescriptions.push_back(Description); + } + } + } + { + tsl::robin_set<IoHash> AllLooseChunkHashes(OutLooseChunkHashes.begin(), OutLooseChunkHashes.end()); + for (const IoHash& OverlayLooseChunkHash : OverlayLooseChunkHashes) + { + if (!AllLooseChunkHashes.contains(OverlayLooseChunkHash)) + { + AllLooseChunkHashes.insert(OverlayLooseChunkHash); + OutLooseChunkHashes.push_back(OverlayLooseChunkHash); + } + } + } + } + else + { + RemoteContent = OutPartContents[0]; + } + return RemoteContent; + } + + ChunkedFolderContent GetLocalContent(GetFolderContentStatistics& LocalFolderScanStats, + ChunkingStatistics& 
ChunkingStats, + const std::filesystem::path& Path, + ChunkingController& ChunkController, + std::atomic<bool>& AbortFlag) + { + ChunkedFolderContent LocalContent; + + auto IsAcceptedFolder = [ExcludeFolders = DefaultExcludeFolders](const std::string_view& RelativePath) -> bool { + for (const std::string_view& ExcludeFolder : ExcludeFolders) + { + if (RelativePath.starts_with(ExcludeFolder)) + { + if (RelativePath.length() == ExcludeFolder.length()) + { + return false; + } + else if (RelativePath[ExcludeFolder.length()] == '/') + { + return false; + } + } + } + return true; + }; + + auto IsAcceptedFile = [ExcludeExtensions = + DefaultExcludeExtensions](const std::string_view& RelativePath, uint64_t, uint32_t) -> bool { + for (const std::string_view& ExcludeExtension : ExcludeExtensions) + { + if (RelativePath.ends_with(ExcludeExtension)) + { + return false; + } + } + return true; + }; + + FolderContent CurrentLocalFolderContent = GetFolderContent( + LocalFolderScanStats, + Path, + std::move(IsAcceptedFolder), + std::move(IsAcceptedFile), + GetMediumWorkerPool(EWorkloadType::Burst), + UsePlainProgress ? 
5000 : 200, + [&](bool, std::ptrdiff_t) { ZEN_DEBUG("Found {} files in '{}'...", LocalFolderScanStats.AcceptedFileCount.load(), Path); }, + AbortFlag); + if (AbortFlag) + { + return {}; + } + + FolderContent LocalFolderState; + + bool ScanContent = true; + std::vector<uint32_t> PathIndexesOufOfDate; + if (std::filesystem::is_regular_file(Path / ZenStateFilePath)) + { + try + { + Stopwatch ReadStateTimer; + CbObject CurrentStateObject = LoadCompactBinaryObject(Path / ZenStateFilePath).Object; + if (CurrentStateObject) + { + Oid CurrentBuildId; + std::vector<Oid> SavedBuildPartIds; + std::vector<std::string> SavedBuildPartsNames; + std::vector<ChunkedFolderContent> SavedPartContents; + if (ReadStateObject(CurrentStateObject, + CurrentBuildId, + SavedBuildPartIds, + SavedBuildPartsNames, + SavedPartContents, + LocalFolderState)) + { + if (!SavedPartContents.empty()) + { + if (SavedPartContents.size() == 1) + { + LocalContent = std::move(SavedPartContents[0]); + } + else + { + LocalContent = + MergeChunkedFolderContents(SavedPartContents[0], + std::span<const ChunkedFolderContent>(SavedPartContents).subspan(1)); + } + + if (!LocalFolderState.AreKnownFilesEqual(CurrentLocalFolderContent)) + { + std::vector<std::filesystem::path> DeletedPaths; + FolderContent UpdatedContent = GetUpdatedContent(LocalFolderState, CurrentLocalFolderContent, DeletedPaths); + if (!DeletedPaths.empty()) + { + LocalContent = DeletePathsFromChunkedContent(LocalContent, DeletedPaths); + } + + ZEN_CONSOLE("Updating state, {} local files deleted and {} local files updated", + DeletedPaths.size(), + UpdatedContent.Paths.size()); + if (UpdatedContent.Paths.size() > 0) + { + uint64_t ByteCountToScan = 0; + for (const uint64_t RawSize : UpdatedContent.RawSizes) + { + ByteCountToScan += RawSize; + } + ProgressBar ProgressBar(false); + FilteredRate FilteredBytesHashed; + FilteredBytesHashed.Start(); + ChunkedFolderContent UpdatedLocalContent = ChunkFolderContent( + ChunkingStats, + 
GetMediumWorkerPool(EWorkloadType::Burst), + Path, + UpdatedContent, + ChunkController, + UsePlainProgress ? 5000 : 200, + [&](bool, std::ptrdiff_t) { + FilteredBytesHashed.Update(ChunkingStats.BytesHashed.load()); + + ProgressBar.UpdateState( + {.Task = "Scanning files ", + .Details = fmt::format("{}/{} ({}/{}, {}B/s) files, {} ({}) chunks found", + ChunkingStats.FilesProcessed.load(), + UpdatedContent.Paths.size(), + NiceBytes(ChunkingStats.BytesHashed.load()), + NiceBytes(ByteCountToScan), + NiceNum(FilteredBytesHashed.GetCurrent()), + ChunkingStats.UniqueChunksFound.load(), + NiceBytes(ChunkingStats.UniqueBytesFound.load())), + .TotalCount = ByteCountToScan, + .RemainingCount = ByteCountToScan - ChunkingStats.BytesHashed.load()}, + false); + }, + AbortFlag); + if (AbortFlag) + { + return {}; + } + FilteredBytesHashed.Stop(); + ProgressBar.Finish(); + LocalContent = MergeChunkedFolderContents(LocalContent, {{UpdatedLocalContent}}); + } + } + else + { + ZEN_CONSOLE("Using cached local state"); + } + ZEN_CONSOLE("Read local state in {}", NiceLatencyNs(ReadStateTimer.GetElapsedTimeUs() * 1000)); + ScanContent = false; + } + } + } + } + catch (const std::exception& Ex) + { + ZEN_CONSOLE("Failed reading state file, falling back to scanning. Reason: {}", Ex.what()); + } + } + + if (ScanContent) + { + uint64_t ByteCountToScan = 0; + for (const uint64_t RawSize : CurrentLocalFolderContent.RawSizes) + { + ByteCountToScan += RawSize; + } + ProgressBar ProgressBar(false); + FilteredRate FilteredBytesHashed; + FilteredBytesHashed.Start(); + ChunkedFolderContent UpdatedLocalContent = ChunkFolderContent( + ChunkingStats, + GetMediumWorkerPool(EWorkloadType::Burst), + Path, + CurrentLocalFolderContent, + ChunkController, + UsePlainProgress ? 
5000 : 200, + [&](bool, std::ptrdiff_t) { + FilteredBytesHashed.Update(ChunkingStats.BytesHashed.load()); + ProgressBar.UpdateState({.Task = "Scanning files ", + .Details = fmt::format("{}/{} ({}/{}, {}B/s) files, {} ({}) chunks found", + ChunkingStats.FilesProcessed.load(), + CurrentLocalFolderContent.Paths.size(), + NiceBytes(ChunkingStats.BytesHashed.load()), + ByteCountToScan, + NiceNum(FilteredBytesHashed.GetCurrent()), + ChunkingStats.UniqueChunksFound.load(), + NiceBytes(ChunkingStats.UniqueBytesFound.load())), + .TotalCount = ByteCountToScan, + .RemainingCount = (ByteCountToScan - ChunkingStats.BytesHashed.load())}, + false); + }, + AbortFlag); + FilteredBytesHashed.Stop(); + ProgressBar.Finish(); + + if (AbortFlag) + { + return {}; + } + } + return LocalContent; + } + + bool DownloadFolder(BuildStorage& Storage, + const Oid& BuildId, + const std::vector<Oid>& BuildPartIds, + std::span<const std::string> BuildPartNames, + const std::filesystem::path& Path, + bool AllowMultiparts, + bool WipeTargetFolder) + { + Stopwatch DownloadTimer; + std::atomic<bool> AbortFlag(false); + + const std::filesystem::path ZenTempFolder = Path / ZenTempFolderName; + CreateDirectories(ZenTempFolder); + auto _ = MakeGuard([&]() { + CleanDirectory(ZenTempFolder, {}); + std::filesystem::remove(ZenTempFolder); + }); + CreateDirectories(Path / ZenTempBlockFolderName); + CreateDirectories(Path / ZenTempChunkFolderName); + CreateDirectories(Path / ZenTempReuseFolderName); + + std::uint64_t PreferredMultipartChunkSize = 32u * 1024u * 1024u; + + std::vector<std::pair<Oid, std::string>> AllBuildParts = + ResolveBuildPartNames(Storage, BuildId, BuildPartIds, BuildPartNames, PreferredMultipartChunkSize); + + std::vector<ChunkedFolderContent> PartContents; + + std::unique_ptr<ChunkingController> ChunkController; + + std::vector<ChunkBlockDescription> BlockDescriptions; + std::vector<IoHash> LooseChunkHashes; + + ChunkedFolderContent RemoteContent = + GetRemoteContent(Storage, BuildId, 
AllBuildParts, ChunkController, PartContents, BlockDescriptions, LooseChunkHashes); + + const std::uint64_t LargeAttachmentSize = AllowMultiparts ? PreferredMultipartChunkSize * 4u : (std::uint64_t)-1; + if (!ChunkController) + { + ZEN_CONSOLE("Warning: Unspecified chunking algorithm, using default"); + ChunkController = CreateBasicChunkingController(); + } + + GetFolderContentStatistics LocalFolderScanStats; + ChunkingStatistics ChunkingStats; + ChunkedFolderContent LocalContent; + if (std::filesystem::is_directory(Path)) + { + if (!WipeTargetFolder) + { + LocalContent = GetLocalContent(LocalFolderScanStats, ChunkingStats, Path, *ChunkController, AbortFlag); + } + } + else + { + CreateDirectories(Path); + } + if (AbortFlag.load()) + { + return true; + } + + auto CompareContent = [](const ChunkedFolderContent& Lhs, const ChunkedFolderContent& Rhs) { + tsl::robin_map<std::string, size_t> RhsPathToIndex; + const size_t RhsPathCount = Rhs.Paths.size(); + RhsPathToIndex.reserve(RhsPathCount); + for (size_t RhsPathIndex = 0; RhsPathIndex < RhsPathCount; RhsPathIndex++) + { + RhsPathToIndex.insert({Rhs.Paths[RhsPathIndex].generic_string(), RhsPathIndex}); + } + const size_t LhsPathCount = Lhs.Paths.size(); + for (size_t LhsPathIndex = 0; LhsPathIndex < LhsPathCount; LhsPathIndex++) + { + if (auto It = RhsPathToIndex.find(Lhs.Paths[LhsPathIndex].generic_string()); It != RhsPathToIndex.end()) + { + const size_t RhsPathIndex = It->second; + if ((Lhs.RawHashes[LhsPathIndex] != Rhs.RawHashes[RhsPathIndex]) || + (!FolderContent::AreFileAttributesEqual(Lhs.Attributes[LhsPathIndex], Rhs.Attributes[RhsPathIndex]))) + { + return false; + } + } + else + { + return false; + } + } + return true; + }; + + if (CompareContent(RemoteContent, LocalContent)) + { + ZEN_CONSOLE("Local state is identical to the build to download. All done. 
Completed in {}.", + NiceLatencyNs(DownloadTimer.GetElapsedTimeUs() * 1000)); + } + else + { + ExtendableStringBuilder<128> SB; + for (const std::pair<Oid, std::string>& BuildPart : AllBuildParts) + { + SB.Append(fmt::format(" {} ({})", BuildPart.second, BuildPart.first)); + } + ZEN_CONSOLE("Downloading build {}, parts:{}", BuildId, SB.ToView()); + FolderContent LocalFolderState; + if (UpdateFolder(Storage, + BuildId, + Path, + LargeAttachmentSize, + PreferredMultipartChunkSize, + LocalContent, + RemoteContent, + BlockDescriptions, + LooseChunkHashes, + WipeTargetFolder, + AbortFlag, + LocalFolderState)) + { + AbortFlag = true; + } + if (!AbortFlag) + { + VerifyFolder(RemoteContent, Path, AbortFlag); + } + + Stopwatch WriteStateTimer; + CbObject StateObject = CreateStateObject(BuildId, AllBuildParts, PartContents, LocalFolderState); + + CreateDirectories((Path / ZenStateFilePath).parent_path()); + TemporaryFile::SafeWriteFile(Path / ZenStateFilePath, StateObject.GetView()); + ZEN_CONSOLE("Wrote local state in {}", NiceLatencyNs(WriteStateTimer.GetElapsedTimeUs() * 1000)); + +#if 0 + ExtendableStringBuilder<1024> SB; + CompactBinaryToJson(StateObject, SB); + WriteFile(Path / ZenStateFileJsonPath, IoBuffer(IoBuffer::Wrap, SB.Data(), SB.Size())); +#endif // 0 + + ZEN_CONSOLE("Downloaded build in {}.", NiceLatencyNs(DownloadTimer.GetElapsedTimeUs() * 1000)); + } + + return AbortFlag.load(); + } + + bool DiffFolders(const std::filesystem::path& BasePath, const std::filesystem::path& ComparePath, bool OnlyChunked) + { + std::atomic<bool> AbortFlag(false); + + ChunkedFolderContent BaseFolderContent; + ChunkedFolderContent CompareFolderContent; + + { + std::unique_ptr<ChunkingController> ChunkController = CreateBasicChunkingController(); + std::vector<std::string_view> ExcludeExtensions = DefaultExcludeExtensions; + if (OnlyChunked) + { + ExcludeExtensions.insert(ExcludeExtensions.end(), + DefaultChunkingExcludeExtensions.begin(), + DefaultChunkingExcludeExtensions.end()); 
+ } + + auto IsAcceptedFolder = [ExcludeFolders = DefaultExcludeFolders](const std::string_view& RelativePath) -> bool { + for (const std::string_view& ExcludeFolder : ExcludeFolders) + { + if (RelativePath.starts_with(ExcludeFolder)) + { + if (RelativePath.length() == ExcludeFolder.length()) + { + return false; + } + else if (RelativePath[ExcludeFolder.length()] == '/') + { + return false; + } + } + } + return true; + }; + + auto IsAcceptedFile = [ExcludeExtensions](const std::string_view& RelativePath, uint64_t, uint32_t) -> bool { + for (const std::string_view& ExcludeExtension : ExcludeExtensions) + { + if (RelativePath.ends_with(ExcludeExtension)) + { + return false; + } + } + return true; + }; + + GetFolderContentStatistics BaseGetFolderContentStats; + ChunkingStatistics BaseChunkingStats; + BaseFolderContent = ScanAndChunkFolder(BaseGetFolderContentStats, + BaseChunkingStats, + BasePath, + IsAcceptedFolder, + IsAcceptedFile, + *ChunkController, + AbortFlag); + + GetFolderContentStatistics CompareGetFolderContentStats; + ChunkingStatistics CompareChunkingStats; + CompareFolderContent = ScanAndChunkFolder(CompareGetFolderContentStats, + CompareChunkingStats, + ComparePath, + IsAcceptedFolder, + IsAcceptedFile, + *ChunkController, + AbortFlag); + } + + std::vector<IoHash> AddedHashes; + std::vector<IoHash> RemovedHashes; + uint64_t RemovedSize = 0; + uint64_t AddedSize = 0; + + tsl::robin_map<IoHash, uint32_t, IoHash::Hasher> BaseRawHashLookup; + for (size_t PathIndex = 0; PathIndex < BaseFolderContent.RawHashes.size(); PathIndex++) + { + const IoHash& RawHash = BaseFolderContent.RawHashes[PathIndex]; + BaseRawHashLookup.insert_or_assign(RawHash, PathIndex); + } + tsl::robin_map<IoHash, uint32_t, IoHash::Hasher> CompareRawHashLookup; + for (size_t PathIndex = 0; PathIndex < CompareFolderContent.RawHashes.size(); PathIndex++) + { + const IoHash& RawHash = CompareFolderContent.RawHashes[PathIndex]; + if (!BaseRawHashLookup.contains(RawHash)) + { + 
AddedHashes.push_back(RawHash); + AddedSize += CompareFolderContent.RawSizes[PathIndex]; + } + CompareRawHashLookup.insert_or_assign(RawHash, PathIndex); + } + for (uint32_t PathIndex = 0; PathIndex < BaseFolderContent.Paths.size(); PathIndex++) + { + const IoHash& RawHash = BaseFolderContent.RawHashes[PathIndex]; + if (!CompareRawHashLookup.contains(RawHash)) + { + RemovedHashes.push_back(RawHash); + RemovedSize += BaseFolderContent.RawSizes[PathIndex]; + } + } + + uint64_t BaseTotalRawSize = 0; + for (uint32_t PathIndex = 0; PathIndex < BaseFolderContent.Paths.size(); PathIndex++) + { + BaseTotalRawSize += BaseFolderContent.RawSizes[PathIndex]; + } + + double KeptPercent = BaseTotalRawSize > 0 ? (100.0 * (BaseTotalRawSize - RemovedSize)) / BaseTotalRawSize : 0; + + ZEN_CONSOLE("{} ({}) files removed, {} ({}) files added, {} ({} {:.1f}%) files kept", + RemovedHashes.size(), + NiceBytes(RemovedSize), + AddedHashes.size(), + NiceBytes(AddedSize), + BaseFolderContent.Paths.size() - RemovedHashes.size(), + NiceBytes(BaseTotalRawSize - RemovedSize), + KeptPercent); + + uint64_t CompareTotalRawSize = 0; + + uint64_t FoundChunkCount = 0; + uint64_t FoundChunkSize = 0; + uint64_t NewChunkCount = 0; + uint64_t NewChunkSize = 0; + const ChunkedContentLookup BaseFolderLookup = BuildChunkedContentLookup(BaseFolderContent); + for (uint32_t ChunkIndex = 0; ChunkIndex < CompareFolderContent.ChunkedContent.ChunkHashes.size(); ChunkIndex++) + { + const IoHash& ChunkHash = CompareFolderContent.ChunkedContent.ChunkHashes[ChunkIndex]; + if (BaseFolderLookup.ChunkHashToChunkIndex.contains(ChunkHash)) + { + FoundChunkCount++; + FoundChunkSize += CompareFolderContent.ChunkedContent.ChunkRawSizes[ChunkIndex]; + } + else + { + NewChunkCount++; + NewChunkSize += CompareFolderContent.ChunkedContent.ChunkRawSizes[ChunkIndex]; + } + CompareTotalRawSize += CompareFolderContent.ChunkedContent.ChunkRawSizes[ChunkIndex]; + } + + double FoundPercent = CompareTotalRawSize > 0 ? 
(100.0 * FoundChunkSize) / CompareTotalRawSize : 0; + double NewPercent = CompareTotalRawSize > 0 ? (100.0 * NewChunkSize) / CompareTotalRawSize : 0; + + ZEN_CONSOLE("Found {} ({} {:.1f}%) out of {} ({}) chunks in {} ({}) base chunks. Added {} ({} {:.1f}%) chunks.", + FoundChunkCount, + NiceBytes(FoundChunkSize), + FoundPercent, + CompareFolderContent.ChunkedContent.ChunkHashes.size(), + NiceBytes(CompareTotalRawSize), + BaseFolderContent.ChunkedContent.ChunkHashes.size(), + NiceBytes(BaseTotalRawSize), + NewChunkCount, + NiceBytes(NewChunkSize), + NewPercent); + + return false; + } + +} // namespace + +////////////////////////////////////////////////////////////////////////////////////////////////////// + +BuildsCommand::BuildsCommand() +{ + m_Options.add_options()("h,help", "Print help"); + + auto AddAuthOptions = [this](cxxopts::Options& Ops) { + Ops.add_option("", "", "system-dir", "Specify system root", cxxopts::value<std::filesystem::path>(m_SystemRootDir), "<systemdir>"); + + // Direct access token (may expire) + Ops.add_option("auth-token", + "", + "access-token", + "Cloud/Builds Storage access token", + cxxopts::value(m_AccessToken), + "<accesstoken>"); + Ops.add_option("auth-token", + "", + "access-token-env", + "Name of environment variable that holds the cloud/builds Storage access token", + cxxopts::value(m_AccessTokenEnv)->default_value(DefaultAccessTokenEnvVariableName), + "<envvariable>"); + Ops.add_option("auth-token", + "", + "access-token-path", + "Path to json file that holds the cloud/builds Storage access token", + cxxopts::value(m_AccessTokenPath), + "<filepath>"); + + // Auth manager token encryption + Ops.add_option("security", + "", + "encryption-aes-key", + "256 bit AES encryption key", + cxxopts::value<std::string>(m_EncryptionKey), + ""); + Ops.add_option("security", + "", + "encryption-aes-iv", + "128 bit AES encryption initialization vector", + cxxopts::value<std::string>(m_EncryptionIV), + ""); + + // OpenId acccess token + 
Ops.add_option("openid", + "", + "openid-provider-name", + "Open ID provider name", + cxxopts::value<std::string>(m_OpenIdProviderName), + "Default"); + Ops.add_option("openid", "", "openid-provider-url", "Open ID provider url", cxxopts::value<std::string>(m_OpenIdProviderUrl), ""); + Ops.add_option("openid", "", "openid-client-id", "Open ID client id", cxxopts::value<std::string>(m_OpenIdClientId), ""); + Ops.add_option("openid", + "", + "openid-refresh-token", + "Open ID refresh token", + cxxopts::value<std::string>(m_OpenIdRefreshToken), + ""); + + // OAuth access token + Ops.add_option("oauth", "", "oauth-url", "OAuth provider url", cxxopts::value<std::string>(m_OAuthUrl)->default_value(""), ""); + Ops.add_option("oauth", + "", + "oauth-clientid", + "OAuth client id", + cxxopts::value<std::string>(m_OAuthClientId)->default_value(""), + ""); + Ops.add_option("oauth", + "", + "oauth-clientsecret", + "OAuth client secret", + cxxopts::value<std::string>(m_OAuthClientSecret)->default_value(""), + ""); + }; + + auto AddCloudOptions = [this, &AddAuthOptions](cxxopts::Options& Ops) { + AddAuthOptions(Ops); + + Ops.add_option("cloud build", "", "url", "Cloud Builds URL", cxxopts::value(m_BuildsUrl), "<url>"); + Ops.add_option("cloud build", + "", + "assume-http2", + "Assume that the builds endpoint is an HTTP/2 endpoint, skipping the HTTP/1.1 upgrade handshake", + cxxopts::value(m_AssumeHttp2), + "<assumehttp2>"); + + Ops.add_option("cloud build", "", "namespace", "Builds Storage namespace", cxxopts::value(m_Namespace), "<namespace>"); + Ops.add_option("cloud build", "", "bucket", "Builds Storage bucket", cxxopts::value(m_Bucket), "<bucket>"); + }; + + auto AddFileOptions = [this](cxxopts::Options& Ops) { + Ops.add_option("filestorage", "", "storage-path", "Builds Storage Path", cxxopts::value(m_StoragePath), "<storagepath>"); + Ops.add_option("filestorage", + "", + "json-metadata", + "Write build, part and block metadata as .json files in addition to .cb files", + 
cxxopts::value(m_WriteMetadataAsJson), + "<jsonmetadata>"); + }; + + auto AddOutputOptions = [this](cxxopts::Options& Ops) { + Ops.add_option("output", "", "plain-progress", "Show progress using plain output", cxxopts::value(m_PlainProgress), "<progress>"); + Ops.add_option("output", "", "verbose", "Enable verbose console output", cxxopts::value(m_Verbose), "<verbose>"); + }; + + m_Options.add_option("", "v", "verb", "Verb for build - list, upload, download, diff", cxxopts::value(m_Verb), "<verb>"); + m_Options.parse_positional({"verb"}); + m_Options.positional_help("verb"); + + // list + AddCloudOptions(m_ListOptions); + AddFileOptions(m_ListOptions); + AddOutputOptions(m_ListOptions); + m_ListOptions.add_options()("h,help", "Print help"); + + // upload + AddCloudOptions(m_UploadOptions); + AddFileOptions(m_UploadOptions); + AddOutputOptions(m_UploadOptions); + m_UploadOptions.add_options()("h,help", "Print help"); + m_UploadOptions.add_option("", "l", "local-path", "Root file system folder for build", cxxopts::value(m_Path), "<local-path>"); + m_UploadOptions.add_option("", + "", + "create-build", + "Set to true to create the containing build, if unset a build-id must be given and the build must already exist", + cxxopts::value(m_CreateBuild), + "<id>"); + m_UploadOptions.add_option("", "", "build-id", "Build Id", cxxopts::value(m_BuildId), "<id>"); + m_UploadOptions.add_option("", + "", + "build-part-id", + "Build part Id, if not given it will be auto generated", + cxxopts::value(m_BuildPartId), + "<id>"); + m_UploadOptions.add_option("", + "", + "build-part-name", + "Name of the build part, if not given it will be named after the directory name at the end of local-path", + cxxopts::value(m_BuildPartName), + "<name>"); + m_UploadOptions.add_option("", + "", + "metadata-path", + "Path to json file that holds the metadata for the build. 
Requires the create-build option to be set", + cxxopts::value(m_BuildMetadataPath), + "<metadata-path>"); + m_UploadOptions.add_option( + "", + "", + "metadata", + "Key-value pairs separated by ';' with build metadata. (key1=value1;key2=value2). Requires the create-build option to be set", + cxxopts::value(m_BuildMetadata), + "<metadata>"); + m_UploadOptions.add_option("", "", "clean", "Ignore existing blocks", cxxopts::value(m_Clean), "<clean>"); + m_UploadOptions.add_option("", + "", + "block-min-reuse", + "Percent of an existing block that must be relevant for it to be reused. Defaults to 85.", + cxxopts::value(m_BlockReuseMinPercentLimit), + "<minreuse>"); + m_UploadOptions.add_option("", + "", + "allow-multipart", + "Allow large attachments to be transferred using the multipart protocol. Defaults to true.", + cxxopts::value(m_AllowMultiparts), + "<allowmultipart>"); + m_UploadOptions.add_option("", + "", + "manifest-path", + "Path to a text file with one line of <local path>[TAB]<modification date> per file to include.", + cxxopts::value(m_ManifestPath), + "<manifestpath>"); + + m_UploadOptions.parse_positional({"local-path", "build-id"}); + m_UploadOptions.positional_help("local-path build-id"); + + // download + AddCloudOptions(m_DownloadOptions); + AddFileOptions(m_DownloadOptions); + AddOutputOptions(m_DownloadOptions); + m_DownloadOptions.add_options()("h,help", "Print help"); + m_DownloadOptions.add_option("", "l", "local-path", "Root file system folder for build", cxxopts::value(m_Path), "<local-path>"); + m_DownloadOptions.add_option("", "", "build-id", "Build Id", cxxopts::value(m_BuildId), "<id>"); + m_DownloadOptions.add_option( + "", + "", + "build-part-id", + "List of build part Ids separated by ','; if no build-part-ids or build-part-names are given all parts will be downloaded", + cxxopts::value(m_BuildPartIds), + "<id>"); + m_DownloadOptions.add_option( + "", + "", + "build-part-name", + "List of build part names separated by ','; if no 
build-part-ids or build-part-names are given all parts will be downloaded", + cxxopts::value(m_BuildPartNames), + "<name>"); + m_DownloadOptions + .add_option("", "", "clean", "Delete all data in the target folder before downloading", cxxopts::value(m_Clean), "<clean>"); + m_DownloadOptions.add_option("", + "", + "allow-multipart", + "Allow large attachments to be transferred using the multipart protocol. Defaults to true.", + cxxopts::value(m_AllowMultiparts), + "<allowmultipart>"); + m_DownloadOptions.parse_positional({"local-path", "build-id", "build-part-name"}); + m_DownloadOptions.positional_help("local-path build-id build-part-name"); + + AddOutputOptions(m_DiffOptions); + m_DiffOptions.add_options()("h,help", "Print help"); + m_DiffOptions.add_option("", "l", "local-path", "Root file system folder used as base", cxxopts::value(m_Path), "<local-path>"); + m_DiffOptions.add_option("", "c", "compare-path", "Root file system folder used as diff", cxxopts::value(m_DiffPath), "<diff-path>"); + m_DiffOptions.add_option("", + "", + "only-chunked", + "Skip files from diff summation that are not processed with chunking", + cxxopts::value(m_OnlyChunked), + "<only-chunked>"); + m_DiffOptions.parse_positional({"local-path", "compare-path"}); + m_DiffOptions.positional_help("local-path compare-path"); + + AddCloudOptions(m_TestOptions); + AddFileOptions(m_TestOptions); + AddOutputOptions(m_TestOptions); + m_TestOptions.add_options()("h,help", "Print help"); + m_TestOptions.add_option("", "l", "local-path", "Root file system folder used as base", cxxopts::value(m_Path), "<local-path>"); + m_TestOptions.parse_positional({"local-path"}); + m_TestOptions.positional_help("local-path"); + + AddCloudOptions(m_FetchBlobOptions); + AddFileOptions(m_FetchBlobOptions); + AddOutputOptions(m_FetchBlobOptions); + m_FetchBlobOptions.add_option("", "", "build-id", "Build Id", cxxopts::value(m_BuildId), "<id>"); + m_FetchBlobOptions + .add_option("", "", "blob-hash", "IoHash in hex form 
identifying the blob to download", cxxopts::value(m_BlobHash), "<blob-hash>"); + m_FetchBlobOptions.parse_positional({"build-id", "blob-hash"}); + m_FetchBlobOptions.positional_help("build-id blob-hash"); + + AddCloudOptions(m_ValidateBuildPartOptions); + AddFileOptions(m_ValidateBuildPartOptions); + AddOutputOptions(m_ValidateBuildPartOptions); + m_ValidateBuildPartOptions.add_option("", "", "build-id", "Build Id", cxxopts::value(m_BuildId), "<id>"); + m_ValidateBuildPartOptions.add_option("", + "", + "build-part-id", + "Build part Id, if not given it will be auto generated", + cxxopts::value(m_BuildPartId), + "<id>"); + m_ValidateBuildPartOptions.add_option( + "", + "", + "build-part-name", + "Name of the build part, if not given it will be named after the directory name at the end of local-path", + cxxopts::value(m_BuildPartName), + "<name>"); + m_ValidateBuildPartOptions.parse_positional({"build-id", "build-part-id"}); + m_ValidateBuildPartOptions.positional_help("build-id build-part-id"); +} + +BuildsCommand::~BuildsCommand() = default; + +int +BuildsCommand::Run(const ZenCliOptions& GlobalOptions, int argc, char** argv) +{ + ZEN_UNUSED(GlobalOptions); + + using namespace std::literals; + + std::vector<char*> SubCommandArguments; + cxxopts::Options* SubOption = nullptr; + int ParentCommandArgCount = GetSubCommand(m_Options, argc, argv, m_SubCommands, SubOption, SubCommandArguments); + if (!ParseOptions(ParentCommandArgCount, argv)) + { + return 0; + } + + if (SubOption == nullptr) + { + throw zen::OptionParseException("command verb is missing"); + } + + if (!ParseOptions(*SubOption, gsl::narrow<int>(SubCommandArguments.size()), SubCommandArguments.data())) + { + return 0; + } + + auto ParseStorageOptions = [&]() { + if (!m_BuildsUrl.empty()) + { + if (!m_StoragePath.empty()) + { + throw zen::OptionParseException(fmt::format("url is not compatible with the storage-path option\n{}", m_Options.help())); + } + if (m_Namespace.empty() || m_Bucket.empty()) + { + 
throw zen::OptionParseException( + fmt::format("namespace and bucket options are required for url option\n{}", m_Options.help())); + } + } + }; + + std::unique_ptr<AuthMgr> Auth; + HttpClientSettings ClientSettings{.AssumeHttp2 = m_AssumeHttp2, .AllowResume = true, .RetryCount = 2}; + + auto CreateAuthMgr = [&]() { + if (!Auth) + { + std::filesystem::path DataRoot = m_SystemRootDir.empty() ? PickDefaultSystemRootDirectory() : m_SystemRootDir; + + if (m_EncryptionKey.empty()) + { + m_EncryptionKey = "abcdefghijklmnopqrstuvxyz0123456"; + ZEN_CONSOLE("Warning: Using default encryption key"); + } + + if (m_EncryptionIV.empty()) + { + m_EncryptionIV = "0123456789abcdef"; + ZEN_CONSOLE("Warning: Using default encryption initialization vector"); + } + + AuthConfig AuthMgrConfig = {.RootDirectory = DataRoot / "auth", + .EncryptionKey = AesKey256Bit::FromString(m_EncryptionKey), + .EncryptionIV = AesIV128Bit::FromString(m_EncryptionIV)}; + if (!AuthMgrConfig.EncryptionKey.IsValid()) + { + throw zen::OptionParseException("Invalid AES encryption key"); + } + if (!AuthMgrConfig.EncryptionIV.IsValid()) + { + throw zen::OptionParseException("Invalid AES initialization vector"); + } + Auth = AuthMgr::Create(AuthMgrConfig); + } + }; + + auto ParseAuthOptions = [&]() { + if (!m_OpenIdProviderUrl.empty() && !m_OpenIdClientId.empty()) + { + CreateAuthMgr(); + std::string ProviderName = m_OpenIdProviderName.empty() ? 
"Default" : m_OpenIdProviderName; + Auth->AddOpenIdProvider({.Name = ProviderName, .Url = m_OpenIdProviderUrl, .ClientId = m_OpenIdClientId}); + if (!m_OpenIdRefreshToken.empty()) + { + Auth->AddOpenIdToken({.ProviderName = ProviderName, .RefreshToken = m_OpenIdRefreshToken}); + } + } + + if (!m_AccessToken.empty()) + { + ClientSettings.AccessTokenProvider = httpclientauth::CreateFromStaticToken(m_AccessToken); + } + else if (!m_AccessTokenPath.empty()) + { + std::string ResolvedAccessToken = ReadAccessTokenFromFile(m_AccessTokenPath); + if (!ResolvedAccessToken.empty()) + { + ClientSettings.AccessTokenProvider = httpclientauth::CreateFromStaticToken(ResolvedAccessToken); + } + } + else if (!m_AccessTokenEnv.empty()) + { + std::string ResolvedAccessToken = GetEnvVariable(m_AccessTokenEnv); + if (!ResolvedAccessToken.empty()) + { + ClientSettings.AccessTokenProvider = httpclientauth::CreateFromStaticToken(ResolvedAccessToken); + } + } + else if (!m_OAuthUrl.empty()) + { + ClientSettings.AccessTokenProvider = httpclientauth::CreateFromOAuthClientCredentials( + {.Url = m_OAuthUrl, .ClientId = m_OAuthClientId, .ClientSecret = m_OAuthClientSecret}); + } + else if (!m_OpenIdProviderName.empty()) + { + CreateAuthMgr(); + ClientSettings.AccessTokenProvider = httpclientauth::CreateFromOpenIdProvider(*Auth, m_OpenIdProviderName); + } + else + { + CreateAuthMgr(); + ClientSettings.AccessTokenProvider = httpclientauth::CreateFromDefaultOpenIdProvider(*Auth); + } + + if (!m_BuildsUrl.empty() && !ClientSettings.AccessTokenProvider) + { + ZEN_CONSOLE("Warning: No auth provider given, attempting operation without credentials."); + } + }; + + auto ParseOutputOptions = [&]() { + IsVerbose = m_Verbose; + UsePlainProgress = IsVerbose || m_PlainProgress; + }; + ParseOutputOptions(); + + if (SubOption == &m_ListOptions) + { + ParseStorageOptions(); + ParseAuthOptions(); + + HttpClient Http(m_BuildsUrl, ClientSettings); + + CbObjectWriter QueryWriter; + QueryWriter.BeginObject("query"); 
+ { + // QueryWriter.BeginObject("platform"); + // { + // QueryWriter.AddString("$eq", "Windows"); + // } + // QueryWriter.EndObject(); // changelist + } + QueryWriter.EndObject(); // query + + BuildStorage::Statistics StorageStats; + std::unique_ptr<BuildStorage> Storage; + if (!m_BuildsUrl.empty()) + { + ZEN_CONSOLE("Querying builds in cloud endpoint '{}'. SessionId: '{}'. Namespace '{}', Bucket '{}'", + m_BuildsUrl, + Http.GetSessionId(), + m_Namespace, + m_Bucket); + Storage = CreateJupiterBuildStorage(Log(), Http, StorageStats, m_Namespace, m_Bucket, std::filesystem::path{}); + } + else if (!m_StoragePath.empty()) + { + ZEN_CONSOLE("Querying builds in folder '{}'.", m_StoragePath); + Storage = CreateFileBuildStorage(m_StoragePath, StorageStats, false); // , .0015, 0.00004 + } + else + { + throw zen::OptionParseException(fmt::format("Storage option is missing\n{}", m_UploadOptions.help())); + } + + CbObject Response = Storage->ListBuilds(QueryWriter.Save()); + ExtendableStringBuilder<1024> SB; + CompactBinaryToJson(Response.GetView(), SB); + ZEN_CONSOLE("{}", SB.ToView()); + return 0; + } + + if (SubOption == &m_UploadOptions) + { + ParseStorageOptions(); + ParseAuthOptions(); + + HttpClient Http(m_BuildsUrl, ClientSettings); + + if (m_Path.empty()) + { + throw zen::OptionParseException(fmt::format("local-path is required\n{}", m_UploadOptions.help())); + } + + if (m_CreateBuild) + { + if (m_BuildMetadataPath.empty() && m_BuildMetadata.empty()) + { + throw zen::OptionParseException(fmt::format("Options for builds target are missing\n{}", m_UploadOptions.help())); + } + if (!m_BuildMetadataPath.empty() && !m_BuildMetadata.empty()) + { + throw zen::OptionParseException(fmt::format("Conflicting options for builds target\n{}", m_UploadOptions.help())); + } + } + else + { + if (!m_BuildMetadataPath.empty()) + { + throw zen::OptionParseException( + fmt::format("metadata-path option is only valid if creating a build\n{}", m_UploadOptions.help())); + } + if 
(!m_BuildMetadata.empty()) + { + throw zen::OptionParseException( + fmt::format("metadata option is only valid if creating a build\n{}", m_UploadOptions.help())); + } + } + + if (m_BuildPartName.empty()) + { + m_BuildPartName = m_Path.filename().string(); + } + + const bool GeneratedBuildId = m_BuildId.empty(); + if (GeneratedBuildId) + { + m_BuildId = Oid::NewOid().ToString(); + } + else if (m_BuildId.length() != Oid::StringLength) + { + throw zen::OptionParseException(fmt::format("Invalid build id\n{}", m_UploadOptions.help())); + } + else if (Oid::FromHexString(m_BuildId) == Oid::Zero) + { + throw zen::OptionParseException(fmt::format("Invalid build id\n{}", m_UploadOptions.help())); + } + + const bool GeneratedBuildPartId = m_BuildPartId.empty(); + if (GeneratedBuildPartId) + { + m_BuildPartId = Oid::NewOid().ToString(); + } + else if (m_BuildPartId.length() != Oid::StringLength) + { + throw zen::OptionParseException(fmt::format("Invalid build id\n{}", m_UploadOptions.help())); + } + else if (Oid::FromHexString(m_BuildPartId) == Oid::Zero) + { + throw zen::OptionParseException(fmt::format("Invalid build part id\n{}", m_UploadOptions.help())); + } + + BuildStorage::Statistics StorageStats; + const Oid BuildId = Oid::FromHexString(m_BuildId); + const Oid BuildPartId = Oid::FromHexString(m_BuildPartId); + std::unique_ptr<BuildStorage> Storage; + std::string StorageName; + if (!m_BuildsUrl.empty()) + { + ZEN_CONSOLE("Uploading '{}' from '{}' to cloud endpoint '{}'. SessionId: '{}'. Namespace '{}', Bucket '{}', {}BuildId '{}'", + m_BuildPartName, + m_Path, + m_BuildsUrl, + Http.GetSessionId(), + m_Namespace, + m_Bucket, + GeneratedBuildId ? 
"Generated " : "", + BuildId); + CreateDirectories(m_Path / ZenTempStorageFolderName); + Storage = CreateJupiterBuildStorage(Log(), Http, StorageStats, m_Namespace, m_Bucket, m_Path / ZenTempStorageFolderName); + StorageName = "Cloud DDC"; + } + else if (!m_StoragePath.empty()) + { + ZEN_CONSOLE("Uploading '{}' from '{}' to folder '{}'. {}BuildId '{}'", + m_BuildPartName, + m_Path, + m_StoragePath, + GeneratedBuildId ? "Generated " : "", + BuildId); + Storage = CreateFileBuildStorage(m_StoragePath, StorageStats, m_WriteMetadataAsJson); // , .0015, 0.00004 + StorageName = fmt::format("Disk {}", m_StoragePath.stem()); + } + else + { + throw zen::OptionParseException(fmt::format("Storage option is missing\n{}", m_UploadOptions.help())); + } + + CbObject MetaData; + if (m_CreateBuild) + { + if (!m_BuildMetadataPath.empty()) + { + std::filesystem::path MetadataPath(m_BuildMetadataPath); + IoBuffer MetaDataJson = ReadFile(MetadataPath).Flatten(); + std::string_view Json(reinterpret_cast<const char*>(MetaDataJson.GetData()), MetaDataJson.GetSize()); + std::string JsonError; + MetaData = LoadCompactBinaryFromJson(Json, JsonError).AsObject(); + if (!JsonError.empty()) + { + throw std::runtime_error( + fmt::format("build metadata file '{}' is malformed. 
Reason: '{}'", m_BuildMetadataPath, JsonError)); + } + } + if (!m_BuildMetadata.empty()) + { + CbObjectWriter MetaDataWriter(1024); + ForEachStrTok(m_BuildMetadata, ';', [&](std::string_view Pair) { + size_t SplitPos = Pair.find('='); + if (SplitPos == std::string::npos || SplitPos == 0) + { + throw std::runtime_error(fmt::format("build metadata key-value pair '{}' is malformed", Pair)); + } + MetaDataWriter.AddString(Pair.substr(0, SplitPos), Pair.substr(SplitPos + 1)); + return true; + }); + MetaData = MetaDataWriter.Save(); + } + } + + bool Aborted = UploadFolder(*Storage, + BuildId, + BuildPartId, + m_BuildPartName, + m_Path, + m_ManifestPath, + m_BlockReuseMinPercentLimit, + m_AllowMultiparts, + MetaData, + m_CreateBuild, + m_Clean); + if (Aborted) + { + ZEN_CONSOLE("Upload failed."); + } + + if (false) + { + ZEN_CONSOLE( + "{}:\n" + "Read: {}\n" + "Write: {}\n" + "Requests: {}\n" + "Avg Request Time: {}\n" + "Avg I/O Time: {}", + StorageName, + NiceBytes(StorageStats.TotalBytesRead.load()), + NiceBytes(StorageStats.TotalBytesWritten.load()), + StorageStats.TotalRequestCount.load(), + StorageStats.TotalExecutionTimeUs.load() > 0 + ? NiceTimeSpanMs(StorageStats.TotalExecutionTimeUs.load() / 1000 / StorageStats.TotalRequestCount.load()) + : 0, + StorageStats.TotalRequestCount.load() > 0 + ? NiceTimeSpanMs(StorageStats.TotalRequestTimeUs.load() / 1000 / StorageStats.TotalRequestCount.load()) + : 0); + } + return Aborted ? 
11 : 0; + } + + if (SubOption == &m_DownloadOptions) + { + ParseStorageOptions(); + ParseAuthOptions(); + + HttpClient Http(m_BuildsUrl, ClientSettings); + + if (m_Path.empty()) + { + throw zen::OptionParseException(fmt::format("local-path is required\n{}", m_DownloadOptions.help())); + } + if (m_BuildId.empty()) + { + throw zen::OptionParseException(fmt::format("build-id is required\n{}", m_DownloadOptions.help())); + } + Oid BuildId = Oid::TryFromHexString(m_BuildId); + if (BuildId == Oid::Zero) + { + throw zen::OptionParseException(fmt::format("build-id is invalid\n{}", m_DownloadOptions.help())); + } + + if (!m_BuildPartName.empty() && !m_BuildPartId.empty()) + { + throw zen::OptionParseException(fmt::format("build-part-id conflicts with build-part-name\n{}", m_DownloadOptions.help())); + } + + std::vector<Oid> BuildPartIds; + for (const std::string& BuildPartId : m_BuildPartIds) + { + BuildPartIds.push_back(Oid::TryFromHexString(BuildPartId)); + if (BuildPartIds.back() == Oid::Zero) + { + throw zen::OptionParseException(fmt::format("build-part-id '{}' is invalid\n{}", BuildPartId, m_DownloadOptions.help())); + } + } + + BuildStorage::Statistics StorageStats; + std::unique_ptr<BuildStorage> Storage; + std::string StorageName; + if (!m_BuildsUrl.empty()) + { + ZEN_CONSOLE("Downloading '{}' to '{}' from cloud endpoint {}. SessionId: '{}'. Namespace '{}', Bucket '{}', BuildId '{}'", + BuildId, + m_Path, + m_BuildsUrl, + Http.GetSessionId(), + m_Namespace, + m_Bucket, + BuildId); + CreateDirectories(m_Path / ZenTempStorageFolderName); + Storage = CreateJupiterBuildStorage(Log(), Http, StorageStats, m_Namespace, m_Bucket, m_Path / ZenTempStorageFolderName); + StorageName = "Cloud DDC"; + } + else if (!m_StoragePath.empty()) + { + ZEN_CONSOLE("Downloading '{}' to '{}' from folder {}. 
BuildId '{}'", BuildId, m_Path, m_StoragePath, BuildId); + Storage = CreateFileBuildStorage(m_StoragePath, StorageStats, false); // , .0015, 0.00004 + StorageName = fmt::format("Disk {}", m_StoragePath.stem()); + } + else + { + throw zen::OptionParseException(fmt::format("Storage option is missing\n{}", m_DownloadOptions.help())); + } + + bool Aborted = DownloadFolder(*Storage, BuildId, BuildPartIds, m_BuildPartNames, m_Path, m_AllowMultiparts, m_Clean); + if (Aborted) + { + ZEN_CONSOLE("Download failed."); + } + if (false) + { + ZEN_CONSOLE( + "{}:\n" + "Read: {}\n" + "Write: {}\n" + "Requests: {}\n" + "Avg Request Time: {}\n" + "Avg I/O Time: {}", + StorageName, + NiceBytes(StorageStats.TotalBytesRead.load()), + NiceBytes(StorageStats.TotalBytesWritten.load()), + StorageStats.TotalRequestCount.load(), + StorageStats.TotalExecutionTimeUs.load() > 0 + ? NiceTimeSpanMs(StorageStats.TotalExecutionTimeUs.load() / 1000 / StorageStats.TotalRequestCount.load()) + : 0, + StorageStats.TotalRequestCount.load() > 0 + ? NiceTimeSpanMs(StorageStats.TotalRequestTimeUs.load() / 1000 / StorageStats.TotalRequestCount.load()) + : 0); + } + + return Aborted ? 11 : 0; + } + if (SubOption == &m_DiffOptions) + { + if (m_Path.empty()) + { + throw zen::OptionParseException(fmt::format("local-path is required\n{}", m_DiffOptions.help())); + } + if (m_DiffPath.empty()) + { + throw zen::OptionParseException(fmt::format("compare-path is required\n{}", m_DiffOptions.help())); + } + bool Aborted = DiffFolders(m_Path, m_DiffPath, m_OnlyChunked); + return Aborted ? 
11 : 0; + } + + if (SubOption == &m_TestOptions) + { + ParseStorageOptions(); + ParseAuthOptions(); + + HttpClient Http(m_BuildsUrl, ClientSettings); + + if (m_Path.empty()) + { + throw zen::OptionParseException(fmt::format("local-path is required\n{}", m_DownloadOptions.help())); + } + + m_BuildId = Oid::NewOid().ToString(); + m_BuildPartName = m_Path.filename().string(); + m_BuildPartId = Oid::NewOid().ToString(); + m_CreateBuild = true; + + BuildStorage::Statistics StorageStats; + const Oid BuildId = Oid::FromHexString(m_BuildId); + const Oid BuildPartId = Oid::FromHexString(m_BuildPartId); + std::unique_ptr<BuildStorage> Storage; + std::string StorageName; + + if (m_BuildsUrl.empty() && m_StoragePath.empty()) + { + m_StoragePath = GetRunningExecutablePath().parent_path() / ".tmpstore"; + CreateDirectories(m_StoragePath); + CleanDirectory(m_StoragePath); + } + auto _ = MakeGuard([&]() { + if (m_BuildsUrl.empty() && m_StoragePath.empty()) + { + DeleteDirectories(m_StoragePath); + } + }); + + if (!m_BuildsUrl.empty()) + { + ZEN_CONSOLE("Using '{}' to '{}' from cloud endpoint {}. SessionId: '{}'. Namespace '{}', Bucket '{}', BuildId '{}'", + m_BuildPartName.empty() ? m_BuildPartId : m_BuildPartName, + m_Path, + m_BuildsUrl, + Http.GetSessionId(), + m_Namespace, + m_Bucket, + BuildId); + CreateDirectories(m_Path / ZenTempStorageFolderName); + Storage = CreateJupiterBuildStorage(Log(), Http, StorageStats, m_Namespace, m_Bucket, m_Path / ZenTempStorageFolderName); + StorageName = "Cloud DDC"; + } + else if (!m_StoragePath.empty()) + { + ZEN_CONSOLE("Using '{}' to '{}' from folder {}. BuildId '{}'", + m_BuildPartName.empty() ? 
m_BuildPartId : m_BuildPartName, + m_Path, + m_StoragePath, + BuildId); + Storage = CreateFileBuildStorage(m_StoragePath, StorageStats, false); // , .0015, 0.00004 + StorageName = fmt::format("Disk {}", m_StoragePath.stem()); + } + else + { + throw zen::OptionParseException(fmt::format("Storage option is missing\n{}", m_UploadOptions.help())); + } + + auto MakeMetaData = [](const Oid& BuildId) -> CbObject { + CbObjectWriter BuildMetaDataWriter; + { + const uint32_t CL = BuildId.OidBits[2]; + BuildMetaDataWriter.AddString("name", fmt::format("++Test+Main-CL-{}", CL)); + BuildMetaDataWriter.AddString("branch", "ZenTestBuild"); + BuildMetaDataWriter.AddString("baselineBranch", "ZenTestBuild"); + BuildMetaDataWriter.AddString("platform", "Windows"); + BuildMetaDataWriter.AddString("project", "Test"); + BuildMetaDataWriter.AddInteger("changelist", CL); + BuildMetaDataWriter.AddString("buildType", "test-folder"); + } + return BuildMetaDataWriter.Save(); + }; + CbObject MetaData = MakeMetaData(Oid::TryFromHexString(m_BuildId)); + { + ExtendableStringBuilder<256> SB; + CompactBinaryToJson(MetaData, SB); + ZEN_CONSOLE("Upload Build {}, Part {} ({})\n{}", m_BuildId, BuildPartId, m_BuildPartName, SB.ToView()); + } + + bool Aborted = UploadFolder(*Storage, + BuildId, + BuildPartId, + m_BuildPartName, + m_Path, + {}, + m_BlockReuseMinPercentLimit, + m_AllowMultiparts, + MetaData, + m_CreateBuild, + m_Clean); + if (Aborted) + { + ZEN_CONSOLE("Upload failed."); + return 11; + } + + const std::filesystem::path DownloadPath = m_Path.parent_path() / (m_BuildPartName + "_download"); + ZEN_CONSOLE("\nDownload Build {}, Part {} ({}) to '{}'", BuildId, BuildPartId, m_BuildPartName, DownloadPath); + Aborted = DownloadFolder(*Storage, BuildId, {BuildPartId}, {}, DownloadPath, m_AllowMultiparts, true); + if (Aborted) + { + ZEN_CONSOLE("Download failed."); + return 11; + } + + ZEN_CONSOLE("\nRe-download Build {}, Part {} ({}) to '{}' (identical target)", BuildId, BuildPartId, 
m_BuildPartName, DownloadPath); + Aborted = DownloadFolder(*Storage, BuildId, {BuildPartId}, {}, DownloadPath, m_AllowMultiparts, false); + if (Aborted) + { + ZEN_CONSOLE("Re-download failed. (identical target)"); + return 11; + } + + auto ScrambleDir = [](const std::filesystem::path& Path) { + ZEN_CONSOLE("\nScrambling '{}'", Path); + Stopwatch Timer; + DirectoryContent DownloadContent; + GetDirectoryContent( + Path, + DirectoryContentFlags::Recursive | DirectoryContentFlags::IncludeFiles | DirectoryContentFlags::IncludeFileSizes, + DownloadContent); + auto IsAcceptedFolder = [ExcludeFolders = DefaultExcludeFolders, Path](const std::filesystem::path& AbsolutePath) -> bool { + std::string RelativePath = std::filesystem::relative(AbsolutePath, Path).generic_string(); + for (const std::string_view& ExcludeFolder : ExcludeFolders) + { + if (RelativePath.starts_with(ExcludeFolder)) + { + if (RelativePath.length() == ExcludeFolder.length()) + { + return false; + } + else if (RelativePath[ExcludeFolder.length()] == '/') + { + return false; + } + } + } + return true; + }; + + std::atomic<bool> AbortFlag = false; + ParallellWork Work(AbortFlag); + + uint32_t Randomizer = 0; + auto FileSizeIt = DownloadContent.FileSizes.begin(); + for (const std::filesystem::path& FilePath : DownloadContent.Files) + { + if (IsAcceptedFolder(FilePath)) + { + uint32_t Case = (Randomizer++) % 7; + switch (Case) + { + case 0: + { + uint64_t SourceSize = *FileSizeIt; + if (SourceSize > 0) + { + Work.ScheduleWork( + GetMediumWorkerPool(EWorkloadType::Burst), + [SourceSize, FilePath](std::atomic<bool>& AbortFlag) { + if (!AbortFlag) + { + IoBuffer Scrambled(SourceSize); + { + IoBuffer Source = IoBufferBuilder::MakeFromFile(FilePath); + Scrambled.GetMutableView().CopyFrom( + Source.GetView().Mid(SourceSize / 3, SourceSize / 3)); + Scrambled.GetMutableView() + .Mid(SourceSize / 3) + .CopyFrom(Source.GetView().Mid(0, SourceSize / 3)); + Scrambled.GetMutableView() + .Mid((SourceSize / 3) * 2) + 
.CopyFrom(Source.GetView().Mid(SourceSize / 2, SourceSize / 3)); + } + bool IsReadOnly = SetFileReadOnly(FilePath, false); + WriteFile(FilePath, Scrambled); + if (IsReadOnly) + { + SetFileReadOnly(FilePath, true); + } + } + }, + [FilePath](const std::exception& Ex, std::atomic<bool>& AbortFlag) { + ZEN_CONSOLE("Failed scrambling file {}. Reason: {}", FilePath, Ex.what()); + AbortFlag = true; + }); + } + } + break; + case 1: + std::filesystem::remove(FilePath); + break; + default: + break; + } + } + FileSizeIt++; + } + Work.Wait(5000, [&](bool IsAborted, std::ptrdiff_t PendingWork) { + ZEN_UNUSED(IsAborted); + ZEN_CONSOLE("Scrambling files, {} remaining", PendingWork); + }); + ZEN_ASSERT(!AbortFlag.load()); + ZEN_CONSOLE("Scrambled files in {}", NiceLatencyNs(Timer.GetElapsedTimeUs() * 1000)); + }; + + ScrambleDir(DownloadPath); + ZEN_CONSOLE("\nRe-download Build {}, Part {} ({}) to '{}' (scrambled target)", BuildId, BuildPartId, m_BuildPartName, DownloadPath); + Aborted = DownloadFolder(*Storage, BuildId, {BuildPartId}, {}, DownloadPath, m_AllowMultiparts, false); + if (Aborted) + { + ZEN_CONSOLE("Re-download failed. 
(scrambled target)"); + return 11; + } + + ScrambleDir(DownloadPath); + + Oid BuildId2 = Oid::NewOid(); + Oid BuildPartId2 = Oid::NewOid(); + + CbObject MetaData2 = MakeMetaData(BuildId2); + { + ExtendableStringBuilder<256> SB; + CompactBinaryToJson(MetaData2, SB); + ZEN_CONSOLE("\nUpload scrambled Build {}, Part {} ({})\n{}\n", BuildId2, BuildPartId2, m_BuildPartName, SB.ToView()); + } + + Aborted = UploadFolder(*Storage, + BuildId2, + BuildPartId2, + m_BuildPartName, + DownloadPath, + {}, + m_BlockReuseMinPercentLimit, + m_AllowMultiparts, + MetaData2, + true, + false); + if (Aborted) + { + ZEN_CONSOLE("Upload of scrambled failed."); + return 11; + } + + ZEN_CONSOLE("\nDownload Build {}, Part {} ({}) to '{}' (original)", BuildId, BuildPartId, m_BuildPartName, DownloadPath); + Aborted = DownloadFolder(*Storage, BuildId, {BuildPartId}, {}, DownloadPath, m_AllowMultiparts, false); + if (Aborted) + { + ZEN_CONSOLE("Re-download failed."); + return 11; + } + + ZEN_CONSOLE("\nDownload Build {}, Part {} ({}) to '{}' (scrambled)", BuildId2, BuildPartId2, m_BuildPartName, DownloadPath); + Aborted = DownloadFolder(*Storage, BuildId2, {BuildPartId2}, {}, DownloadPath, m_AllowMultiparts, false); + if (Aborted) + { + ZEN_CONSOLE("Re-download failed."); + return 11; + } + + ZEN_CONSOLE("\nRe-download Build {}, Part {} ({}) to '{}' (scrambled)", BuildId2, BuildPartId2, m_BuildPartName, DownloadPath); + Aborted = DownloadFolder(*Storage, BuildId2, {BuildPartId2}, {}, DownloadPath, m_AllowMultiparts, false); + if (Aborted) + { + ZEN_CONSOLE("Re-download failed."); + return 11; + } + + return 0; + } + + if (SubOption == &m_FetchBlobOptions) + { + ParseStorageOptions(); + ParseAuthOptions(); + + HttpClient Http(m_BuildsUrl, ClientSettings); + + if (m_BlobHash.empty()) + { + throw zen::OptionParseException(fmt::format("Blob hash string is missing\n{}", m_UploadOptions.help())); + } + + IoHash BlobHash; + if (!IoHash::TryParse(m_BlobHash, BlobHash)) + { + throw 
zen::OptionParseException(fmt::format("Blob hash string is invalid\n{}", m_UploadOptions.help())); + } + + if (m_BuildsUrl.empty() && m_StoragePath.empty()) + { + throw zen::OptionParseException(fmt::format("At least one storage option is required\n{}", m_UploadOptions.help())); + } + + BuildStorage::Statistics StorageStats; + const Oid BuildId = Oid::FromHexString(m_BuildId); + std::unique_ptr<BuildStorage> Storage; + std::string StorageName; + + if (!m_BuildsUrl.empty()) + { + ZEN_CONSOLE("Using from cloud endpoint {}. SessionId: '{}'. Namespace '{}', Bucket '{}', BuildId '{}'", + m_BuildsUrl, + Http.GetSessionId(), + m_Namespace, + m_Bucket, + BuildId); + CreateDirectories(m_Path / ZenTempStorageFolderName); + Storage = CreateJupiterBuildStorage(Log(), Http, StorageStats, m_Namespace, m_Bucket, m_Path / ZenTempStorageFolderName); + StorageName = "Cloud DDC"; + } + else if (!m_StoragePath.empty()) + { + ZEN_CONSOLE("Using folder {}. BuildId '{}'", m_StoragePath, BuildId); + Storage = CreateFileBuildStorage(m_StoragePath, StorageStats, false); // , .0015, 0.00004 + StorageName = fmt::format("Disk {}", m_StoragePath.stem()); + } + else + { + throw zen::OptionParseException(fmt::format("Storage option is missing\n{}", m_UploadOptions.help())); + } + + uint64_t CompressedSize; + uint64_t DecompressedSize; + ValidateBlob(*Storage, BuildId, BlobHash, CompressedSize, DecompressedSize); + ZEN_CONSOLE("Blob '{}' has a compressed size {} and a decompressed size of {} bytes", BlobHash, CompressedSize, DecompressedSize); + return 0; + } + + if (SubOption == &m_ValidateBuildPartOptions) + { + ParseStorageOptions(); + ParseAuthOptions(); + + HttpClient Http(m_BuildsUrl, ClientSettings); + + if (m_BuildsUrl.empty() && m_StoragePath.empty()) + { + throw zen::OptionParseException(fmt::format("At least one storage option is required\n{}", m_UploadOptions.help())); + } + + if (m_BuildId.empty()) + { + throw zen::OptionParseException(fmt::format("build-id is required\n{}", 
m_DownloadOptions.help())); + } + Oid BuildId = Oid::TryFromHexString(m_BuildId); + if (BuildId == Oid::Zero) + { + throw zen::OptionParseException(fmt::format("build-id is invalid\n{}", m_DownloadOptions.help())); + } + + if (!m_BuildPartName.empty() && !m_BuildPartId.empty()) + { + throw zen::OptionParseException(fmt::format("build-part-id conflicts with build-part-name\n{}", m_DownloadOptions.help())); + } + + BuildStorage::Statistics StorageStats; + std::unique_ptr<BuildStorage> Storage; + std::string StorageName; + + if (!m_BuildsUrl.empty()) + { + ZEN_CONSOLE("Using from cloud endpoint {}. SessionId: '{}'. Namespace '{}', Bucket '{}', BuildId '{}'", + m_BuildsUrl, + Http.GetSessionId(), + m_Namespace, + m_Bucket, + BuildId); + CreateDirectories(m_Path / ZenTempStorageFolderName); + Storage = CreateJupiterBuildStorage(Log(), Http, StorageStats, m_Namespace, m_Bucket, m_Path / ZenTempStorageFolderName); + StorageName = "Cloud DDC"; + } + else if (!m_StoragePath.empty()) + { + ZEN_CONSOLE("Using folder {}. 
BuildId '{}'", m_StoragePath, BuildId); + Storage = CreateFileBuildStorage(m_StoragePath, StorageStats, false); // , .0015, 0.00004 + StorageName = fmt::format("Disk {}", m_StoragePath.stem()); + } + else + { + throw zen::OptionParseException(fmt::format("Storage option is missing\n{}", m_UploadOptions.help())); + } + Oid BuildPartId = Oid::TryFromHexString(m_BuildPartId); + CbObject Build = Storage->GetBuild(BuildId); + if (!m_BuildPartName.empty()) + { + BuildPartId = Build["parts"sv].AsObjectView()[m_BuildPartName].AsObjectId(); + if (BuildPartId == Oid::Zero) + { + throw std::runtime_error(fmt::format("Build {} does not have a part named '{}'", m_BuildId, m_BuildPartName)); + } + } + CbObject BuildPart = Storage->GetBuildPart(BuildId, BuildPartId); + ZEN_CONSOLE("Validating build part {}/{} ({})", BuildId, BuildPartId, NiceBytes(BuildPart.GetSize())); + std::vector<IoHash> ChunkAttachments; + for (CbFieldView LooseFileView : BuildPart["chunkAttachments"sv].AsObjectView()["rawHashes"sv]) + { + ChunkAttachments.push_back(LooseFileView.AsBinaryAttachment()); + } + std::vector<IoHash> BlockAttachments; + for (CbFieldView BlocksView : BuildPart["blockAttachments"sv].AsObjectView()["rawHashes"sv]) + { + BlockAttachments.push_back(BlocksView.AsBinaryAttachment()); + } + + for (const IoHash& ChunkAttachment : ChunkAttachments) + { + uint64_t CompressedSize; + uint64_t DecompressedSize; + try + { + ValidateBlob(*Storage, BuildId, ChunkAttachment, CompressedSize, DecompressedSize); + ZEN_CONSOLE("Chunk attachment {} ({} -> {}) is valid", + ChunkAttachment, + NiceBytes(CompressedSize), + NiceBytes(DecompressedSize)); + } + catch (const std::exception& Ex) + { + ZEN_CONSOLE("Failed validating chunk attachment {}: {}", ChunkAttachment, Ex.what()); + } + } + + for (const IoHash& BlockAttachment : BlockAttachments) + { + uint64_t CompressedSize; + uint64_t DecompressedSize; + try + { + ValidateChunkBlock(*Storage, BuildId, BlockAttachment, CompressedSize, DecompressedSize); + 
ZEN_CONSOLE("Block attachment {} ({} -> {}) is valid", + BlockAttachment, + NiceBytes(CompressedSize), + NiceBytes(DecompressedSize)); + } + catch (const std::exception& Ex) + { + ZEN_CONSOLE("Failed validating block attachment {}: {}", BlockAttachment, Ex.what()); + } + } + + return 0; + } + + ZEN_ASSERT(false); +} + +} // namespace zen diff --git a/src/zen/cmds/builds_cmd.h b/src/zen/cmds/builds_cmd.h new file mode 100644 index 000000000..fa223943b --- /dev/null +++ b/src/zen/cmds/builds_cmd.h @@ -0,0 +1,106 @@ +// Copyright Epic Games, Inc. All Rights Reserved. + +#pragma once + +#include "../zen.h" + +#include <zenhttp/auth/authmgr.h> +#include <zenhttp/httpclientauth.h> +#include <filesystem> + +namespace zen { + +class BuildsCommand : public CacheStoreCommand +{ +public: + static constexpr char Name[] = "builds"; + static constexpr char Description[] = "Manage builds - list, upload, download, diff"; + + BuildsCommand(); + ~BuildsCommand(); + + virtual int Run(const ZenCliOptions& GlobalOptions, int argc, char** argv) override; + virtual cxxopts::Options& Options() override { return m_Options; } + +private: + cxxopts::Options m_Options{Name, Description}; + + std::filesystem::path m_SystemRootDir; + + bool m_PlainProgress = false; + bool m_Verbose = false; + + // cloud builds + std::string m_BuildsUrl; + bool m_AssumeHttp2 = false; + std::string m_Namespace; + std::string m_Bucket; + + // file storage + std::filesystem::path m_StoragePath; + bool m_WriteMetadataAsJson = false; + + std::string m_BuildId; + bool m_CreateBuild = false; + std::string m_BuildMetadataPath; + std::string m_BuildMetadata; + std::string m_BuildPartName; // Defaults to name of leaf folder in m_Path + std::string m_BuildPartId; // Defaults to a generated id when creating part, looked up when downloading using m_BuildPartName + bool m_Clean = false; + uint8_t m_BlockReuseMinPercentLimit = 85; + bool m_AllowMultiparts = true; + std::filesystem::path m_ManifestPath; + + // Direct access 
token (may expire) + std::string m_AccessToken; + std::string m_AccessTokenEnv; + std::string m_AccessTokenPath; + + // Auth manager token encryption + std::string m_EncryptionKey; // 256 bit AES encryption key + std::string m_EncryptionIV; // 128 bit AES initialization vector + + // OpenId access token + std::string m_OpenIdProviderName; + std::string m_OpenIdProviderUrl; + std::string m_OpenIdClientId; + std::string m_OpenIdRefreshToken; + + // OAuth access token + std::string m_OAuthUrl; + std::string m_OAuthClientId; + std::string m_OAuthClientSecret; + + std::string m_Verb; // list, upload, download + + cxxopts::Options m_ListOptions{"list", "List available builds"}; + + std::filesystem::path m_Path; + + cxxopts::Options m_UploadOptions{"upload", "Upload a folder"}; + + cxxopts::Options m_DownloadOptions{"download", "Download a folder"}; + std::vector<std::string> m_BuildPartNames; + std::vector<std::string> m_BuildPartIds; + + cxxopts::Options m_DiffOptions{"diff", "Compare two local folders"}; + std::filesystem::path m_DiffPath; + bool m_OnlyChunked = false; + + cxxopts::Options m_TestOptions{"test", "Test upload and download with verify"}; + + cxxopts::Options m_FetchBlobOptions{"fetch-blob", "Fetch a blob from remote store"}; + std::string m_BlobHash; + + cxxopts::Options m_ValidateBuildPartOptions{"validate-part", "Fetch a build part and validate all referenced attachments"}; + + cxxopts::Options* m_SubCommands[7] = {&m_ListOptions, + &m_UploadOptions, + &m_DownloadOptions, + &m_DiffOptions, + &m_TestOptions, + &m_FetchBlobOptions, + &m_ValidateBuildPartOptions}; +}; + +} // namespace zen diff --git a/src/zen/zen.cpp b/src/zen/zen.cpp index 872ea8941..2e230ed53 100644 --- a/src/zen/zen.cpp +++ b/src/zen/zen.cpp @@ -7,6 +7,7 @@ #include "cmds/admin_cmd.h" #include "cmds/bench_cmd.h" +#include "cmds/builds_cmd.h" #include "cmds/cache_cmd.h" #include "cmds/copy_cmd.h" #include "cmds/dedup_cmd.h" @@ -396,6 +397,7 @@ main(int argc, char** argv) AttachCommand 
AttachCmd; BenchCommand BenchCmd; + BuildsCommand BuildsCmd; CacheDetailsCommand CacheDetailsCmd; CacheGetCommand CacheGetCmd; CacheGenerateCommand CacheGenerateCmd; @@ -451,6 +453,7 @@ main(int argc, char** argv) // clang-format off {"attach", &AttachCmd, "Add a sponsor process to a running zen service"}, {"bench", &BenchCmd, "Utility command for benchmarking"}, + {BuildsCommand::Name, &BuildsCmd, BuildsCommand::Description}, {"cache-details", &CacheDetailsCmd, "Details on cache"}, {"cache-info", &CacheInfoCmd, "Info on cache, namespace or bucket"}, {CacheGetCommand::Name, &CacheGetCmd, CacheGetCommand::Description}, diff --git a/src/zencore/filesystem.cpp b/src/zencore/filesystem.cpp index 5716d1255..8279fb952 100644 --- a/src/zencore/filesystem.cpp +++ b/src/zencore/filesystem.cpp @@ -1469,6 +1469,36 @@ GetModificationTickFromHandle(void* NativeHandle, std::error_code& Ec) return 0; } +uint64_t +GetModificationTickFromPath(const std::filesystem::path& Filename) +{ + // PathFromHandle + void* Handle; +#if ZEN_PLATFORM_WINDOWS + Handle = CreateFileW(Filename.c_str(), GENERIC_READ, FILE_SHARE_READ, nullptr, OPEN_EXISTING, 0, nullptr); + if (Handle == INVALID_HANDLE_VALUE) + { + ThrowLastError(fmt::format("Failed to open file {} to check modification tick.", Filename)); + } + auto _ = MakeGuard([Handle]() { CloseHandle(Handle); }); +#else + int Fd = open(Filename.c_str(), O_RDONLY | O_CLOEXEC); + if (Fd < 0) + { + ThrowLastError(fmt::format("Failed to open file {} to check modification tick.", Filename)); + } + Handle = (void*)uintptr_t(Fd); + auto _ = MakeGuard([Handle]() { close(int(uintptr_t(Handle))); }); +#endif + std::error_code Ec; + uint64_t ModificationTick = GetModificationTickFromHandle(Handle, Ec); + if (Ec) + { + ThrowSystemError(Ec.value(), Ec.message()); + } + return ModificationTick; +} + std::filesystem::path GetRunningExecutablePath() { @@ -1895,6 +1925,116 @@ PickDefaultSystemRootDirectory() #endif // ZEN_PLATFORM_WINDOWS } +#if 
ZEN_PLATFORM_WINDOWS + +uint32_t +GetFileAttributes(const std::filesystem::path& Filename) +{ + DWORD Attributes = ::GetFileAttributes(Filename.native().c_str()); + if (Attributes == INVALID_FILE_ATTRIBUTES) + { + ThrowLastError(fmt::format("failed to get attributes of file {}", Filename)); + } + return (uint32_t)Attributes; +} + +void +SetFileAttributes(const std::filesystem::path& Filename, uint32_t Attributes) +{ + if (::SetFileAttributes(Filename.native().c_str(), Attributes) == 0) + { + ThrowLastError(fmt::format("failed to set attributes of file {}", Filename)); + } +} + +#endif // ZEN_PLATFORM_WINDOWS + +#if ZEN_PLATFORM_LINUX || ZEN_PLATFORM_MAC + +uint32_t +GetFileMode(const std::filesystem::path& Filename) +{ + struct stat Stat; + int err = stat(Filename.native().c_str(), &Stat); + if (err) + { + ThrowLastError(fmt::format("Failed to get mode of file {}", Filename)); + } + return (uint32_t)Stat.st_mode; +} + +void +SetFileMode(const std::filesystem::path& Filename, uint32_t Attributes) +{ + int err = chmod(Filename.native().c_str(), (mode_t)Attributes); + if (err) + { + ThrowLastError(fmt::format("Failed to set mode of file {}", Filename)); + } +} + +#endif // ZEN_PLATFORM_LINUX || ZEN_PLATFORM_MAC + +#if ZEN_PLATFORM_WINDOWS +const uint32_t FileAttributesSystemReadOnlyFlag = FILE_ATTRIBUTE_READONLY; +#else +const uint32_t FileAttributesSystemReadOnlyFlag = 0x00000001; +#endif // ZEN_PLATFORM_WINDOWS + +const uint32_t FileModeWriteEnableFlags = 0222; + +bool +IsFileAttributeReadOnly(uint32_t FileAttributes) +{ +#if ZEN_PLATFORM_WINDOWS + return (FileAttributes & FileAttributesSystemReadOnlyFlag) != 0; +#else + return (FileAttributes & 0x00000001) != 0; +#endif // ZEN_PLATFORM_WINDOWS +} + +bool +IsFileModeReadOnly(uint32_t FileMode) +{ + return (FileMode & FileModeWriteEnableFlags) == 0; +} + +uint32_t +MakeFileAttributeReadOnly(uint32_t FileAttributes, bool ReadOnly) +{ + return ReadOnly ? 
(FileAttributes | FileAttributesSystemReadOnlyFlag) : (FileAttributes & ~FileAttributesSystemReadOnlyFlag); +} + +uint32_t +MakeFileModeReadOnly(uint32_t FileMode, bool ReadOnly) +{ + return ReadOnly ? (FileMode & ~FileModeWriteEnableFlags) : (FileMode | FileModeWriteEnableFlags); +} + +bool +SetFileReadOnly(const std::filesystem::path& Filename, bool ReadOnly) +{ +#if ZEN_PLATFORM_WINDOWS + uint32_t CurrentAttributes = GetFileAttributes(Filename); + uint32_t NewAttributes = MakeFileAttributeReadOnly(CurrentAttributes, ReadOnly); + if (CurrentAttributes != NewAttributes) + { + SetFileAttributes(Filename, NewAttributes); + return true; + } +#endif // ZEN_PLATFORM_WINDOWS +#if ZEN_PLATFORM_LINUX || ZEN_PLATFORM_MAC + uint32_t CurrentMode = GetFileMode(Filename); + uint32_t NewMode = MakeFileModeReadOnly(CurrentMode, ReadOnly); + if (CurrentMode != NewMode) + { + SetFileMode(Filename, NewMode); + return true; + } +#endif // ZEN_PLATFORM_LINUX || ZEN_PLATFORM_MAC + return false; +} + ////////////////////////////////////////////////////////////////////////// // // Testing related code follows... 
diff --git a/src/zencore/include/zencore/filesystem.h b/src/zencore/include/zencore/filesystem.h index 250745e86..20f6dc56c 100644 --- a/src/zencore/include/zencore/filesystem.h +++ b/src/zencore/include/zencore/filesystem.h @@ -52,6 +52,10 @@ ZENCORE_API uint64_t FileSizeFromHandle(void* NativeHandle); */ ZENCORE_API uint64_t GetModificationTickFromHandle(void* NativeHandle, std::error_code& Ec); +/** Get a native time tick of last modification time + */ +ZENCORE_API uint64_t GetModificationTickFromPath(const std::filesystem::path& Filename); + ZENCORE_API std::filesystem::path GetRunningExecutablePath(); /** Set the max open file handle count to max allowed for the current process on Linux and MacOS @@ -271,6 +275,23 @@ std::error_code RotateDirectories(const std::filesystem::path& DirectoryName, st std::filesystem::path PickDefaultSystemRootDirectory(); +#if ZEN_PLATFORM_WINDOWS +uint32_t GetFileAttributes(const std::filesystem::path& Filename); +void SetFileAttributes(const std::filesystem::path& Filename, uint32_t Attributes); +#endif // ZEN_PLATFORM_WINDOWS + +#if ZEN_PLATFORM_LINUX || ZEN_PLATFORM_MAC +uint32_t GetFileMode(const std::filesystem::path& Filename); +void SetFileMode(const std::filesystem::path& Filename, uint32_t Attributes); +#endif // ZEN_PLATFORM_LINUX || ZEN_PLATFORM_MAC + +bool IsFileAttributeReadOnly(uint32_t FileAttributes); +bool IsFileModeReadOnly(uint32_t FileMode); +uint32_t MakeFileAttributeReadOnly(uint32_t FileAttributes, bool ReadOnly); +uint32_t MakeFileModeReadOnly(uint32_t FileMode, bool ReadOnly); + +bool SetFileReadOnly(const std::filesystem::path& Filename, bool ReadOnly); + ////////////////////////////////////////////////////////////////////////// void filesystem_forcelink(); // internal diff --git a/src/zenserver/projectstore/buildsremoteprojectstore.cpp b/src/zenserver/projectstore/buildsremoteprojectstore.cpp index e4e91104c..fbb9bc344 100644 --- a/src/zenserver/projectstore/buildsremoteprojectstore.cpp +++ 
b/src/zenserver/projectstore/buildsremoteprojectstore.cpp @@ -344,7 +344,12 @@ public: m_BuildId); return Result; } - Result.Blocks = std::move(Blocks.value()); + Result.Blocks.reserve(Blocks.value().size()); + for (ChunkBlockDescription& BlockDescription : Blocks.value()) + { + Result.Blocks.push_back(ThinChunkBlockDescription{.BlockHash = BlockDescription.BlockHash, + .ChunkRawHashes = std::move(BlockDescription.ChunkRawHashes)}); + } return Result; } diff --git a/src/zenserver/projectstore/fileremoteprojectstore.cpp b/src/zenserver/projectstore/fileremoteprojectstore.cpp index 5a21a7540..98e292d91 100644 --- a/src/zenserver/projectstore/fileremoteprojectstore.cpp +++ b/src/zenserver/projectstore/fileremoteprojectstore.cpp @@ -192,8 +192,8 @@ public: return GetKnownBlocksResult{{.ErrorCode = static_cast<int>(HttpResponseCode::NoContent), .ElapsedSeconds = LoadResult.ElapsedSeconds + Timer.GetElapsedTimeUs() * 1000}}; } - std::vector<ChunkBlockDescription> KnownBlocks = GetBlocksFromOplog(LoadResult.ContainerObject, ExistingBlockHashes); - GetKnownBlocksResult Result{{.ElapsedSeconds = LoadResult.ElapsedSeconds + Timer.GetElapsedTimeUs() * 1000}}; + std::vector<ThinChunkBlockDescription> KnownBlocks = GetBlocksFromOplog(LoadResult.ContainerObject, ExistingBlockHashes); + GetKnownBlocksResult Result{{.ElapsedSeconds = LoadResult.ElapsedSeconds + Timer.GetElapsedTimeUs() * 1000}}; Result.Blocks = std::move(KnownBlocks); return Result; } diff --git a/src/zenserver/projectstore/jupiterremoteprojectstore.cpp b/src/zenserver/projectstore/jupiterremoteprojectstore.cpp index 2b6a437d1..e5839ad3b 100644 --- a/src/zenserver/projectstore/jupiterremoteprojectstore.cpp +++ b/src/zenserver/projectstore/jupiterremoteprojectstore.cpp @@ -193,7 +193,7 @@ public: return GetKnownBlocksResult{{.ErrorCode = static_cast<int>(HttpResponseCode::NoContent), .ElapsedSeconds = LoadResult.ElapsedSeconds + ExistsResult.ElapsedSeconds}}; } - std::vector<ChunkBlockDescription> KnownBlocks = 
GetBlocksFromOplog(LoadResult.ContainerObject, ExistingBlockHashes); + std::vector<ThinChunkBlockDescription> KnownBlocks = GetBlocksFromOplog(LoadResult.ContainerObject, ExistingBlockHashes); GetKnownBlocksResult Result{ {.ElapsedSeconds = LoadResult.ElapsedSeconds + ExistsResult.ElapsedSeconds + Timer.GetElapsedTimeUs() * 1000.0}}; diff --git a/src/zenserver/projectstore/projectstore.cpp b/src/zenserver/projectstore/projectstore.cpp index f6f7eba99..53df12b14 100644 --- a/src/zenserver/projectstore/projectstore.cpp +++ b/src/zenserver/projectstore/projectstore.cpp @@ -8628,7 +8628,11 @@ TEST_CASE("project.store.block") } ChunkBlockDescription Block; CompressedBuffer BlockBuffer = GenerateChunkBlock(std::move(Chunks), Block); - CHECK(IterateChunkBlock(BlockBuffer.Decompress(), [](CompressedBuffer&&, const IoHash&) {})); + uint64_t HeaderSize; + CHECK(IterateChunkBlock( + BlockBuffer.Decompress(), + [](CompressedBuffer&&, const IoHash&) {}, + HeaderSize)); } TEST_CASE("project.store.iterateoplog") diff --git a/src/zenserver/projectstore/remoteprojectstore.cpp b/src/zenserver/projectstore/remoteprojectstore.cpp index b4b2c6fc4..a7263da83 100644 --- a/src/zenserver/projectstore/remoteprojectstore.cpp +++ b/src/zenserver/projectstore/remoteprojectstore.cpp @@ -516,21 +516,23 @@ namespace remotestore_impl { return; } - bool StoreChunksOK = IterateChunkBlock( - BlockPayload, - [&WantedChunks, &WriteAttachmentBuffers, &WriteRawHashes, &Info](CompressedBuffer&& Chunk, - const IoHash& AttachmentRawHash) { - if (WantedChunks.contains(AttachmentRawHash)) - { - WriteAttachmentBuffers.emplace_back(Chunk.GetCompressed().Flatten().AsIoBuffer()); - IoHash RawHash; - uint64_t RawSize; - ZEN_ASSERT(CompressedBuffer::ValidateCompressedHeader(WriteAttachmentBuffers.back(), RawHash, RawSize)); - ZEN_ASSERT(RawHash == AttachmentRawHash); - WriteRawHashes.emplace_back(AttachmentRawHash); - WantedChunks.erase(AttachmentRawHash); - } - }); + uint64_t BlockHeaderSize = 0; + bool 
StoreChunksOK = IterateChunkBlock( + BlockPayload, + [&WantedChunks, &WriteAttachmentBuffers, &WriteRawHashes, &Info](CompressedBuffer&& Chunk, + const IoHash& AttachmentRawHash) { + if (WantedChunks.contains(AttachmentRawHash)) + { + WriteAttachmentBuffers.emplace_back(Chunk.GetCompressed().Flatten().AsIoBuffer()); + IoHash RawHash; + uint64_t RawSize; + ZEN_ASSERT(CompressedBuffer::ValidateCompressedHeader(WriteAttachmentBuffers.back(), RawHash, RawSize)); + ZEN_ASSERT(RawHash == AttachmentRawHash); + WriteRawHashes.emplace_back(AttachmentRawHash); + WantedChunks.erase(AttachmentRawHash); + } + }, + BlockHeaderSize); if (!StoreChunksOK) { @@ -1101,11 +1103,11 @@ GetBlockHashesFromOplog(CbObjectView ContainerObject) return BlockHashes; } -std::vector<ChunkBlockDescription> +std::vector<ThinChunkBlockDescription> GetBlocksFromOplog(CbObjectView ContainerObject, std::span<const IoHash> IncludeBlockHashes) { using namespace std::literals; - std::vector<ChunkBlockDescription> Result; + std::vector<ThinChunkBlockDescription> Result; CbArrayView BlocksArray = ContainerObject["blocks"sv].AsArrayView(); tsl::robin_set<IoHash, IoHash::Hasher> IncludeSet; IncludeSet.insert(IncludeBlockHashes.begin(), IncludeBlockHashes.end()); @@ -1128,7 +1130,7 @@ GetBlocksFromOplog(CbObjectView ContainerObject, std::span<const IoHash> Include { ChunkHashes.push_back(ChunkField.AsHash()); } - Result.push_back({.BlockHash = BlockHash, .ChunkHashes = std::move(ChunkHashes)}); + Result.push_back(ThinChunkBlockDescription{.BlockHash = BlockHash, .ChunkRawHashes = std::move(ChunkHashes)}); } } return Result; @@ -1144,7 +1146,7 @@ BuildContainer(CidStore& ChunkStore, bool BuildBlocks, bool IgnoreMissingAttachments, bool AllowChunking, - const std::vector<ChunkBlockDescription>& KnownBlocks, + const std::vector<ThinChunkBlockDescription>& KnownBlocks, WorkerThreadPool& WorkerPool, const std::function<void(CompressedBuffer&&, ChunkBlockDescription&&)>& AsyncOnBlock, const std::function<void(const 
IoHash&, TGetAttachmentBufferFunc&&)>& OnLargeAttachment, @@ -1386,7 +1388,7 @@ BuildContainer(CidStore& ChunkStore, return {}; } - auto FindReuseBlocks = [](const std::vector<ChunkBlockDescription>& KnownBlocks, + auto FindReuseBlocks = [](const std::vector<ThinChunkBlockDescription>& KnownBlocks, const std::unordered_set<IoHash, IoHash::Hasher>& Attachments, JobContext* OptionalContext) -> std::vector<size_t> { std::vector<size_t> ReuseBlockIndexes; @@ -1399,14 +1401,14 @@ BuildContainer(CidStore& ChunkStore, for (size_t KnownBlockIndex = 0; KnownBlockIndex < KnownBlocks.size(); KnownBlockIndex++) { - const ChunkBlockDescription& KnownBlock = KnownBlocks[KnownBlockIndex]; - size_t BlockAttachmentCount = KnownBlock.ChunkHashes.size(); + const ThinChunkBlockDescription& KnownBlock = KnownBlocks[KnownBlockIndex]; + size_t BlockAttachmentCount = KnownBlock.ChunkRawHashes.size(); if (BlockAttachmentCount == 0) { continue; } size_t FoundAttachmentCount = 0; - for (const IoHash& KnownHash : KnownBlock.ChunkHashes) + for (const IoHash& KnownHash : KnownBlock.ChunkRawHashes) { if (Attachments.contains(KnownHash)) { @@ -1447,8 +1449,8 @@ BuildContainer(CidStore& ChunkStore, std::vector<size_t> ReusedBlockIndexes = FindReuseBlocks(KnownBlocks, FoundHashes, OptionalContext); for (size_t KnownBlockIndex : ReusedBlockIndexes) { - const ChunkBlockDescription& KnownBlock = KnownBlocks[KnownBlockIndex]; - for (const IoHash& KnownHash : KnownBlock.ChunkHashes) + const ThinChunkBlockDescription& KnownBlock = KnownBlocks[KnownBlockIndex]; + for (const IoHash& KnownHash : KnownBlock.ChunkRawHashes) { if (UploadAttachments.erase(KnownHash) == 1) { @@ -1784,8 +1786,8 @@ BuildContainer(CidStore& ChunkStore, std::vector<size_t> ReusedBlockFromChunking = FindReuseBlocks(KnownBlocks, ChunkedHashes, OptionalContext); for (size_t KnownBlockIndex : ReusedBlockIndexes) { - const ChunkBlockDescription& KnownBlock = KnownBlocks[KnownBlockIndex]; - for (const IoHash& KnownHash : 
KnownBlock.ChunkHashes) + const ThinChunkBlockDescription& KnownBlock = KnownBlocks[KnownBlockIndex]; + for (const IoHash& KnownHash : KnownBlock.ChunkRawHashes) { if (ChunkedHashes.erase(KnownHash) == 1) { @@ -1803,7 +1805,7 @@ BuildContainer(CidStore& ChunkStore, Blocks.reserve(ReuseBlockCount); for (auto It = ReusedBlockIndexes.begin(); It != UniqueKnownBlocksEnd; It++) { - Blocks.push_back(KnownBlocks[*It]); + Blocks.push_back({KnownBlocks[*It]}); } remotestore_impl::ReportMessage(OptionalContext, fmt::format("Reused {} attachments from {} blocks", ReusedAttachmentCount, ReuseBlockCount)); @@ -1919,9 +1921,9 @@ BuildContainer(CidStore& ChunkStore, { // We can share the lock as we are not resizing the vector and only touch BlockHash at our own index RwLock::SharedLockScope _(BlocksLock); - Blocks[BlockIndex].ChunkHashes.insert(Blocks[BlockIndex].ChunkHashes.end(), - BlockAttachmentHashes.begin(), - BlockAttachmentHashes.end()); + Blocks[BlockIndex].ChunkRawHashes.insert(Blocks[BlockIndex].ChunkRawHashes.end(), + BlockAttachmentHashes.begin(), + BlockAttachmentHashes.end()); } uint64_t NowMS = Timer.GetElapsedTimeMs(); ZEN_INFO("Assembled block {} with {} chunks in {} ({})", @@ -2167,7 +2169,7 @@ BuildContainer(CidStore& ChunkStore, { for (const ChunkBlockDescription& B : Blocks) { - ZEN_ASSERT(!B.ChunkHashes.empty()); + ZEN_ASSERT(!B.ChunkRawHashes.empty()); if (BuildBlocks) { ZEN_ASSERT(B.BlockHash != IoHash::Zero); @@ -2177,7 +2179,7 @@ BuildContainer(CidStore& ChunkStore, OplogContinerWriter.AddBinaryAttachment("rawhash"sv, B.BlockHash); OplogContinerWriter.BeginArray("chunks"sv); { - for (const IoHash& RawHash : B.ChunkHashes) + for (const IoHash& RawHash : B.ChunkRawHashes) { OplogContinerWriter.AddHash(RawHash); } @@ -2193,7 +2195,7 @@ BuildContainer(CidStore& ChunkStore, { OplogContinerWriter.BeginArray("chunks"sv); { - for (const IoHash& RawHash : B.ChunkHashes) + for (const IoHash& RawHash : B.ChunkRawHashes) { 
OplogContinerWriter.AddBinaryAttachment(RawHash); } @@ -2389,7 +2391,7 @@ SaveOplog(CidStore& ChunkStore, OnBlock = UploadBlock; } - std::vector<ChunkBlockDescription> KnownBlocks; + std::vector<ThinChunkBlockDescription> KnownBlocks; uint64_t TransferWallTimeMS = 0; diff --git a/src/zenserver/projectstore/remoteprojectstore.h b/src/zenserver/projectstore/remoteprojectstore.h index 1ef0416b7..1210afc7c 100644 --- a/src/zenserver/projectstore/remoteprojectstore.h +++ b/src/zenserver/projectstore/remoteprojectstore.h @@ -66,7 +66,7 @@ public: struct GetKnownBlocksResult : public Result { - std::vector<ChunkBlockDescription> Blocks; + std::vector<ThinChunkBlockDescription> Blocks; }; struct RemoteStoreInfo @@ -166,7 +166,7 @@ RemoteProjectStore::Result LoadOplog(CidStore& ChunkStore, bool CleanOplog, JobContext* OptionalContext); -std::vector<IoHash> GetBlockHashesFromOplog(CbObjectView ContainerObject); -std::vector<ChunkBlockDescription> GetBlocksFromOplog(CbObjectView ContainerObject, std::span<const IoHash> IncludeBlockHashes); +std::vector<IoHash> GetBlockHashesFromOplog(CbObjectView ContainerObject); +std::vector<ThinChunkBlockDescription> GetBlocksFromOplog(CbObjectView ContainerObject, std::span<const IoHash> IncludeBlockHashes); } // namespace zen diff --git a/src/zenutil/chunkblock.cpp b/src/zenutil/chunkblock.cpp index 6dae5af11..a19cf5c1b 100644 --- a/src/zenutil/chunkblock.cpp +++ b/src/zenutil/chunkblock.cpp @@ -3,6 +3,7 @@ #include <zenutil/chunkblock.h> #include <zencore/compactbinarybuilder.h> +#include <zencore/fmtutils.h> #include <zencore/logging.h> #include <vector> @@ -18,20 +19,27 @@ ParseChunkBlockDescription(const CbObjectView& BlockObject) Result.BlockHash = BlockObject["rawHash"sv].AsHash(); if (Result.BlockHash != IoHash::Zero) { + Result.HeaderSize = BlockObject["headerSize"sv].AsUInt64(); CbArrayView ChunksArray = BlockObject["rawHashes"sv].AsArrayView(); - Result.ChunkHashes.reserve(ChunksArray.Num()); + 
Result.ChunkRawHashes.reserve(ChunksArray.Num()); for (CbFieldView ChunkView : ChunksArray) { - Result.ChunkHashes.push_back(ChunkView.AsHash()); + Result.ChunkRawHashes.push_back(ChunkView.AsHash()); } - CbArrayView ChunkRawLengthsArray = BlockObject["chunkRawLengths"sv].AsArrayView(); - std::vector<uint32_t> ChunkLengths; + CbArrayView ChunkRawLengthsArray = BlockObject["chunkRawLengths"sv].AsArrayView(); Result.ChunkRawLengths.reserve(ChunkRawLengthsArray.Num()); for (CbFieldView ChunkView : ChunkRawLengthsArray) { Result.ChunkRawLengths.push_back(ChunkView.AsUInt32()); } + + CbArrayView ChunkCompressedLengthsArray = BlockObject["chunkCompressedLengths"sv].AsArrayView(); + Result.ChunkCompressedLengths.reserve(ChunkCompressedLengthsArray.Num()); + for (CbFieldView ChunkView : ChunkCompressedLengthsArray) + { + Result.ChunkCompressedLengths.push_back(ChunkView.AsUInt32()); + } } return Result; } @@ -57,18 +65,23 @@ ParseChunkBlockDescriptionList(const CbObjectView& BlocksObject) CbObject BuildChunkBlockDescription(const ChunkBlockDescription& Block, CbObjectView MetaData) { - ZEN_ASSERT(Block.ChunkRawLengths.size() == Block.ChunkHashes.size()); + ZEN_ASSERT(Block.BlockHash != IoHash::Zero); + ZEN_ASSERT(Block.HeaderSize > 0); + ZEN_ASSERT(Block.ChunkRawLengths.size() == Block.ChunkRawHashes.size()); + ZEN_ASSERT(Block.ChunkCompressedLengths.size() == Block.ChunkRawHashes.size()); CbObjectWriter Writer; Writer.AddHash("rawHash"sv, Block.BlockHash); + Writer.AddInteger("headerSize"sv, Block.HeaderSize); Writer.BeginArray("rawHashes"sv); { - for (const IoHash& ChunkHash : Block.ChunkHashes) + for (const IoHash& ChunkHash : Block.ChunkRawHashes) { Writer.AddHash(ChunkHash); } } Writer.EndArray(); + Writer.BeginArray("chunkRawLengths"); { for (uint32_t ChunkSize : Block.ChunkRawLengths) @@ -78,11 +91,58 @@ BuildChunkBlockDescription(const ChunkBlockDescription& Block, CbObjectView Meta } Writer.EndArray(); + Writer.BeginArray("chunkCompressedLengths"); + { + for 
(uint32_t ChunkSize : Block.ChunkCompressedLengths) + { + Writer.AddInteger(ChunkSize); + } + } + Writer.EndArray(); + Writer.AddObject("metadata", MetaData); return Writer.Save(); } +ChunkBlockDescription +GetChunkBlockDescription(const SharedBuffer& BlockPayload, const IoHash& RawHash) +{ + ChunkBlockDescription BlockDescription = {{.BlockHash = IoHash::HashBuffer(BlockPayload)}}; + if (BlockDescription.BlockHash != RawHash) + { + throw std::runtime_error(fmt::format("Block {} content hash {} does not match block hash", RawHash, BlockDescription.BlockHash)); + } + if (IterateChunkBlock( + BlockPayload, + [&BlockDescription, RawHash](CompressedBuffer&& Chunk, const IoHash& AttachmentHash) { + if (CompositeBuffer Decompressed = Chunk.DecompressToComposite(); Decompressed) + { + IoHash ChunkHash = IoHash::HashBuffer(Decompressed.Flatten()); + if (ChunkHash != AttachmentHash) + { + throw std::runtime_error( + fmt::format("Chunk {} in block {} content hash {} does not match chunk", AttachmentHash, RawHash, ChunkHash)); + } + BlockDescription.ChunkRawHashes.push_back(AttachmentHash); + BlockDescription.ChunkRawLengths.push_back(gsl::narrow<uint32_t>(Decompressed.GetSize())); + BlockDescription.ChunkCompressedLengths.push_back(gsl::narrow<uint32_t>(Chunk.GetCompressedSize())); + } + else + { + throw std::runtime_error(fmt::format("Chunk {} in block {} is not a compressed buffer", AttachmentHash, RawHash)); + } + }, + BlockDescription.HeaderSize)) + { + return BlockDescription; + } + else + { + throw std::runtime_error(fmt::format("Block {} is malformed", RawHash)); + } +} + CompressedBuffer GenerateChunkBlock(std::vector<std::pair<IoHash, FetchChunkFunc>>&& FetchChunks, ChunkBlockDescription& OutBlock) { @@ -91,8 +151,9 @@ GenerateChunkBlock(std::vector<std::pair<IoHash, FetchChunkFunc>>&& FetchChunks, std::vector<SharedBuffer> ChunkSegments; ChunkSegments.resize(1); ChunkSegments.reserve(1 + ChunkCount); - OutBlock.ChunkHashes.reserve(ChunkCount); + 
OutBlock.ChunkRawHashes.reserve(ChunkCount); OutBlock.ChunkRawLengths.reserve(ChunkCount); + OutBlock.ChunkCompressedLengths.reserve(ChunkCount); { IoBuffer TempBuffer(ChunkCount * 9); MutableMemoryView View = TempBuffer.GetMutableView(); @@ -106,16 +167,19 @@ GenerateChunkBlock(std::vector<std::pair<IoHash, FetchChunkFunc>>&& FetchChunks, std::span<const SharedBuffer> Segments = Chunk.second.GetCompressed().GetSegments(); for (const SharedBuffer& Segment : Segments) { + ZEN_ASSERT(Segment.IsOwned()); ChunkSize += Segment.GetSize(); ChunkSegments.push_back(Segment); } BufferEndPtr += WriteVarUInt(ChunkSize, BufferEndPtr); - OutBlock.ChunkHashes.push_back(It.first); + OutBlock.ChunkRawHashes.push_back(It.first); OutBlock.ChunkRawLengths.push_back(gsl::narrow<uint32_t>(Chunk.first)); + OutBlock.ChunkCompressedLengths.push_back(gsl::narrow<uint32_t>(ChunkSize)); } ZEN_ASSERT(BufferEndPtr <= View.GetDataEnd()); ptrdiff_t TempBufferLength = std::distance(BufferStartPtr, BufferEndPtr); ChunkSegments[0] = SharedBuffer(IoBuffer(TempBuffer, 0, gsl::narrow<size_t>(TempBufferLength))); + OutBlock.HeaderSize = TempBufferLength; } CompressedBuffer CompressedBlock = CompressedBuffer::Compress(CompositeBuffer(std::move(ChunkSegments)), OodleCompressor::Mermaid, OodleCompressionLevel::None); @@ -124,7 +188,9 @@ GenerateChunkBlock(std::vector<std::pair<IoHash, FetchChunkFunc>>&& FetchChunks, } bool -IterateChunkBlock(const SharedBuffer& BlockPayload, std::function<void(CompressedBuffer&& Chunk, const IoHash& AttachmentHash)> Visitor) +IterateChunkBlock(const SharedBuffer& BlockPayload, + std::function<void(CompressedBuffer&& Chunk, const IoHash& AttachmentHash)> Visitor, + uint64_t& OutHeaderSize) { ZEN_ASSERT(BlockPayload); if (BlockPayload.GetSize() < 1) @@ -144,21 +210,23 @@ IterateChunkBlock(const SharedBuffer& BlockPayload, std::function<void(Compresse ChunkSizes.push_back(ReadVarUInt(ReadPtr, NumberSize)); ReadPtr += NumberSize; } + uint64_t Offset = std::distance((const 
uint8_t*)BlockView.GetData(), ReadPtr); + OutHeaderSize = Offset; for (uint64_t ChunkSize : ChunkSizes) { - IoBuffer Chunk(IoBuffer::Wrap, ReadPtr, ChunkSize); + IoBuffer Chunk(BlockPayload.AsIoBuffer(), Offset, ChunkSize); IoHash AttachmentRawHash; uint64_t AttachmentRawSize; CompressedBuffer CompressedChunk = CompressedBuffer::FromCompressed(SharedBuffer(Chunk), AttachmentRawHash, AttachmentRawSize); - + ZEN_ASSERT_SLOW(IoHash::HashBuffer(CompressedChunk.DecompressToComposite()) == AttachmentRawHash); if (!CompressedChunk) { ZEN_ERROR("Invalid chunk in block"); return false; } Visitor(std::move(CompressedChunk), AttachmentRawHash); - ReadPtr += ChunkSize; - ZEN_ASSERT(ReadPtr <= BlockView.GetDataEnd()); + Offset += ChunkSize; + ZEN_ASSERT(Offset <= BlockView.GetSize()); } return true; }; diff --git a/src/zenutil/chunkedcontent.cpp b/src/zenutil/chunkedcontent.cpp new file mode 100644 index 000000000..a41b71972 --- /dev/null +++ b/src/zenutil/chunkedcontent.cpp @@ -0,0 +1,865 @@ +// Copyright Epic Games, Inc. All Rights Reserved. 
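The `GenerateChunkBlock`/`IterateChunkBlock` changes above introduce a `HeaderSize`: the number of bytes occupied by the varint-encoded chunk-size list that precedes the chunk payloads. A minimal sketch of that header, assuming a LEB128-style encoding (the actual `WriteVarUInt`/`ReadVarUInt` scheme may differ):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// LEB128-style varint write: 7 data bits per byte, high bit = "more follows".
size_t WriteVarUInt(uint64_t Value, std::vector<uint8_t>& Out)
{
    size_t Written = 0;
    do
    {
        uint8_t Byte = Value & 0x7f;
        Value >>= 7;
        if (Value != 0)
        {
            Byte |= 0x80;
        }
        Out.push_back(Byte);
        Written++;
    } while (Value != 0);
    return Written;
}

// Matching read; OutNumberSize reports how many bytes the varint consumed.
uint64_t ReadVarUInt(const uint8_t* Data, size_t& OutNumberSize)
{
    uint64_t Value = 0;
    int Shift = 0;
    OutNumberSize = 0;
    uint8_t Byte;
    do
    {
        Byte = Data[OutNumberSize++];
        Value |= uint64_t(Byte & 0x7f) << Shift;
        Shift += 7;
    } while (Byte & 0x80);
    return Value;
}

// Build the size-list header for a block, mirroring how OutBlock.HeaderSize
// is derived from the length of the varint header in GenerateChunkBlock.
std::vector<uint8_t> BuildBlockHeader(const std::vector<uint64_t>& ChunkSizes, uint64_t& OutHeaderSize)
{
    std::vector<uint8_t> Header;
    for (uint64_t Size : ChunkSizes)
    {
        WriteVarUInt(Size, Header);
    }
    OutHeaderSize = Header.size();
    return Header;
}
```

Recording `HeaderSize` lets `IterateChunkBlock` hand the offset of the first chunk payload back to the caller, and lets chunks be referenced as offsets into the owning block buffer instead of wrapped raw pointers.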
+ +#include <zenutil/chunkedcontent.h> + +#include <zencore/filesystem.h> +#include <zencore/fmtutils.h> +#include <zencore/logging.h> +#include <zencore/scopeguard.h> +#include <zencore/timer.h> + +#include <zenutil/chunkedfile.h> +#include <zenutil/chunkingcontroller.h> +#include <zenutil/parallellwork.h> +#include <zenutil/workerpools.h> + +ZEN_THIRD_PARTY_INCLUDES_START +#include <tsl/robin_set.h> +#include <gsl/gsl-lite.hpp> +ZEN_THIRD_PARTY_INCLUDES_END + +namespace zen { + +using namespace std::literals; + +namespace { + void AddCunkSequence(ChunkingStatistics& Stats, + ChunkedContentData& InOutChunkedContent, + tsl::robin_map<IoHash, uint32_t, IoHash::Hasher>& ChunkHashToChunkIndex, + const IoHash& RawHash, + std::span<const uint32_t> ChunkSequence, + std::span<const IoHash> ChunkHashes, + std::span<const uint64_t> ChunkRawSizes) + { + ZEN_ASSERT(ChunkHashes.size() == ChunkRawSizes.size()); + InOutChunkedContent.ChunkCounts.push_back(gsl::narrow<uint32_t>(ChunkSequence.size())); + InOutChunkedContent.ChunkOrders.reserve(InOutChunkedContent.ChunkOrders.size() + ChunkSequence.size()); + + for (uint32_t ChunkedSequenceIndex : ChunkSequence) + { + const IoHash& ChunkHash = ChunkHashes[ChunkedSequenceIndex]; + if (auto It = ChunkHashToChunkIndex.find(ChunkHash); It != ChunkHashToChunkIndex.end()) + { + uint32_t ChunkIndex = gsl::narrow<uint32_t>(It->second); + InOutChunkedContent.ChunkOrders.push_back(ChunkIndex); + } + else + { + uint32_t ChunkIndex = gsl::narrow<uint32_t>(InOutChunkedContent.ChunkHashes.size()); + ChunkHashToChunkIndex.insert_or_assign(ChunkHash, ChunkIndex); + InOutChunkedContent.ChunkHashes.push_back(ChunkHash); + InOutChunkedContent.ChunkRawSizes.push_back(ChunkRawSizes[ChunkedSequenceIndex]); + InOutChunkedContent.ChunkOrders.push_back(ChunkIndex); + Stats.UniqueChunksFound++; + Stats.UniqueBytesFound += ChunkRawSizes[ChunkedSequenceIndex]; + } + } + InOutChunkedContent.SequenceRawHashes.push_back(RawHash); + Stats.UniqueSequencesFound++; 
+ } + + void AddCunkSequence(ChunkingStatistics& Stats, + ChunkedContentData& InOutChunkedContent, + tsl::robin_map<IoHash, uint32_t, IoHash::Hasher>& ChunkHashToChunkIndex, + const IoHash& RawHash, + const uint64_t RawSize) + { + InOutChunkedContent.ChunkCounts.push_back(1); + + if (auto It = ChunkHashToChunkIndex.find(RawHash); It != ChunkHashToChunkIndex.end()) + { + uint32_t ChunkIndex = gsl::narrow<uint32_t>(It->second); + InOutChunkedContent.ChunkOrders.push_back(ChunkIndex); + } + else + { + uint32_t ChunkIndex = gsl::narrow<uint32_t>(InOutChunkedContent.ChunkHashes.size()); + ChunkHashToChunkIndex.insert_or_assign(RawHash, ChunkIndex); + InOutChunkedContent.ChunkHashes.push_back(RawHash); + InOutChunkedContent.ChunkRawSizes.push_back(RawSize); + InOutChunkedContent.ChunkOrders.push_back(ChunkIndex); + Stats.UniqueChunksFound++; + Stats.UniqueBytesFound += RawSize; + } + InOutChunkedContent.SequenceRawHashes.push_back(RawHash); + Stats.UniqueSequencesFound++; + } + + IoHash HashOneFile(ChunkingStatistics& Stats, + const ChunkingController& InChunkingController, + ChunkedFolderContent& OutChunkedContent, + tsl::robin_map<IoHash, uint32_t, IoHash::Hasher>& ChunkHashToChunkIndex, + tsl::robin_map<IoHash, uint32_t, IoHash::Hasher>& RawHashToSequenceRawHashIndex, + RwLock& Lock, + const std::filesystem::path& FolderPath, + uint32_t PathIndex) + { + const uint64_t RawSize = OutChunkedContent.RawSizes[PathIndex]; + const std::filesystem::path& Path = OutChunkedContent.Paths[PathIndex]; + + if (RawSize == 0) + { + return IoHash::Zero; + } + else + { + ChunkedInfoWithSource Chunked; + const bool DidChunking = + InChunkingController.ProcessFile((FolderPath / Path).make_preferred(), RawSize, Chunked, Stats.BytesHashed); + if (DidChunking) + { + Lock.WithExclusiveLock([&]() { + if (!RawHashToSequenceRawHashIndex.contains(Chunked.Info.RawHash)) + { + RawHashToSequenceRawHashIndex.insert( + {Chunked.Info.RawHash, 
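The `AddCunkSequence` helpers above deduplicate chunks with a hash-to-index map: each distinct chunk hash is stored once, and sequences reference chunks by index, so a chunk shared by many files costs one entry. A simplified sketch of that bookkeeping (string hashes and plain `std::unordered_map` standing in for `IoHash` and `tsl::robin_map`):

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified ChunkedContentData: parallel arrays of unique chunks plus
// per-sequence index lists into them.
struct ChunkedContent
{
    std::vector<std::string> ChunkHashes;  // unique chunks, first-seen order
    std::vector<uint32_t>    ChunkOrders;  // concatenated per-sequence indexes
    std::vector<uint32_t>    ChunkCounts;  // chunks per sequence
};

void AddChunkSequence(ChunkedContent& Content,
                      std::unordered_map<std::string, uint32_t>& HashToIndex,
                      const std::vector<std::string>& SequenceHashes)
{
    Content.ChunkCounts.push_back(uint32_t(SequenceHashes.size()));
    for (const std::string& Hash : SequenceHashes)
    {
        auto It = HashToIndex.find(Hash);
        if (It == HashToIndex.end())
        {
            // First sighting: append the chunk and remember its index.
            It = HashToIndex.emplace(Hash, uint32_t(Content.ChunkHashes.size())).first;
            Content.ChunkHashes.push_back(Hash);
        }
        Content.ChunkOrders.push_back(It->second);
    }
}
```

`ChunkCounts` is what later lets a sequence be sliced back out of the flat `ChunkOrders` array with an offset and a length.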
gsl::narrow<uint32_t>(OutChunkedContent.ChunkedContent.SequenceRawHashes.size())}); + std::vector<uint64_t> ChunkSizes; + ChunkSizes.reserve(Chunked.ChunkSources.size()); + for (const ChunkSource& Source : Chunked.ChunkSources) + { + ChunkSizes.push_back(Source.Size); + } + AddCunkSequence(Stats, + OutChunkedContent.ChunkedContent, + ChunkHashToChunkIndex, + Chunked.Info.RawHash, + Chunked.Info.ChunkSequence, + Chunked.Info.ChunkHashes, + ChunkSizes); + Stats.UniqueSequencesFound++; + } + }); + Stats.FilesChunked++; + return Chunked.Info.RawHash; + } + else + { + IoBuffer Buffer = IoBufferBuilder::MakeFromFile((FolderPath / Path).make_preferred()); + const IoHash Hash = IoHash::HashBuffer(Buffer, &Stats.BytesHashed); + + Lock.WithExclusiveLock([&]() { + if (!RawHashToSequenceRawHashIndex.contains(Hash)) + { + RawHashToSequenceRawHashIndex.insert( + {Hash, gsl::narrow<uint32_t>(OutChunkedContent.ChunkedContent.SequenceRawHashes.size())}); + AddCunkSequence(Stats, OutChunkedContent.ChunkedContent, ChunkHashToChunkIndex, Hash, RawSize); + Stats.UniqueSequencesFound++; + } + }); + return Hash; + } + } + } + + std::string PathCompareString(const std::filesystem::path& Path) { return ToLower(Path.generic_string()); } + +} // namespace + +std::string_view FolderContentSourcePlatformNames[(size_t)SourcePlatform::_Count] = {"Windows"sv, "Linux"sv, "MacOS"sv}; + +std::string_view +ToString(SourcePlatform Platform) +{ + return FolderContentSourcePlatformNames[(size_t)Platform]; +} + +SourcePlatform +FromString(std::string_view Platform, SourcePlatform Default) +{ + for (size_t Index = 0; Index < (size_t)SourcePlatform::_Count; Index++) + { + if (Platform == FolderContentSourcePlatformNames[Index]) + { + return (SourcePlatform)Index; + } + } + return Default; +} + +SourcePlatform +GetSourceCurrentPlatform() +{ +#if ZEN_PLATFORM_WINDOWS + return SourcePlatform::Windows; +#endif +#if ZEN_PLATFORM_MAC + return SourcePlatform::MacOS; +#endif +#if ZEN_PLATFORM_LINUX + return 
SourcePlatform::Linux; +#endif +} + +bool +FolderContent::AreFileAttributesEqual(const uint32_t Lhs, const uint32_t Rhs) +{ +#if ZEN_PLATFORM_WINDOWS + return (Lhs & 0xff) == (Rhs & 0xff); +#endif +#if ZEN_PLATFORM_MAC + return Lhs == Rhs; +#endif +#if ZEN_PLATFORM_LINUX + return Lhs == Rhs; +#endif +} + +bool +FolderContent::operator==(const FolderContent& Rhs) const +{ + if ((Platform == Rhs.Platform) && (RawSizes == Rhs.RawSizes) && (Attributes == Rhs.Attributes) && + (ModificationTicks == Rhs.ModificationTicks) && (Paths.size() == Rhs.Paths.size())) + { + size_t PathCount = 0; + for (size_t PathIndex = 0; PathIndex < PathCount; PathIndex++) + { + if (Paths[PathIndex].generic_string() != Rhs.Paths[PathIndex].generic_string()) + { + return false; + } + } + return true; + } + return false; +} + +bool +FolderContent::AreKnownFilesEqual(const FolderContent& Rhs) const +{ + tsl::robin_map<std::string, size_t> RhsPathToIndex; + const size_t RhsPathCount = Rhs.Paths.size(); + RhsPathToIndex.reserve(RhsPathCount); + for (size_t RhsPathIndex = 0; RhsPathIndex < RhsPathCount; RhsPathIndex++) + { + RhsPathToIndex.insert({Rhs.Paths[RhsPathIndex].generic_string(), RhsPathIndex}); + } + const size_t PathCount = Paths.size(); + for (size_t PathIndex = 0; PathIndex < PathCount; PathIndex++) + { + if (auto It = RhsPathToIndex.find(Paths[PathIndex].generic_string()); It != RhsPathToIndex.end()) + { + const size_t RhsPathIndex = It->second; + if ((RawSizes[PathIndex] != Rhs.RawSizes[RhsPathIndex]) || + (!AreFileAttributesEqual(Attributes[PathIndex], Rhs.Attributes[RhsPathIndex])) || + (ModificationTicks[PathIndex] != Rhs.ModificationTicks[RhsPathIndex])) + { + return false; + } + } + else + { + return false; + } + } + return true; +} + +void +FolderContent::UpdateState(const FolderContent& Rhs, std::vector<uint32_t>& OutPathIndexesOufOfDate) +{ + tsl::robin_map<std::string, uint32_t> RhsPathToIndex; + const uint32_t RhsPathCount = gsl::narrow<uint32_t>(Rhs.Paths.size()); + 
RhsPathToIndex.reserve(RhsPathCount); + for (uint32_t RhsPathIndex = 0; RhsPathIndex < RhsPathCount; RhsPathIndex++) + { + RhsPathToIndex.insert({Rhs.Paths[RhsPathIndex].generic_string(), RhsPathIndex}); + } + uint32_t PathCount = gsl::narrow<uint32_t>(Paths.size()); + for (uint32_t PathIndex = 0; PathIndex < PathCount;) + { + if (auto It = RhsPathToIndex.find(Paths[PathIndex].generic_string()); It != RhsPathToIndex.end()) + { + const uint32_t RhsPathIndex = It->second; + + if ((RawSizes[PathIndex] != Rhs.RawSizes[RhsPathIndex]) || + (ModificationTicks[PathIndex] != Rhs.ModificationTicks[RhsPathIndex])) + { + RawSizes[PathIndex] = Rhs.RawSizes[RhsPathIndex]; + ModificationTicks[PathIndex] = Rhs.ModificationTicks[RhsPathIndex]; + OutPathIndexesOufOfDate.push_back(PathIndex); + } + Attributes[PathIndex] = Rhs.Attributes[RhsPathIndex]; + PathIndex++; + } + else + { + Paths.erase(Paths.begin() + PathIndex); + RawSizes.erase(RawSizes.begin() + PathIndex); + Attributes.erase(Attributes.begin() + PathIndex); + ModificationTicks.erase(ModificationTicks.begin() + PathIndex); + PathCount--; + } + } +} + +FolderContent +GetUpdatedContent(const FolderContent& Old, const FolderContent& New, std::vector<std::filesystem::path>& OutDeletedPathIndexes) +{ + FolderContent Result = {.Platform = Old.Platform}; + tsl::robin_map<std::string, uint32_t> NewPathToIndex; + const uint32_t NewPathCount = gsl::narrow<uint32_t>(New.Paths.size()); + NewPathToIndex.reserve(NewPathCount); + for (uint32_t NewPathIndex = 0; NewPathIndex < NewPathCount; NewPathIndex++) + { + NewPathToIndex.insert({New.Paths[NewPathIndex].generic_string(), NewPathIndex}); + } + uint32_t OldPathCount = gsl::narrow<uint32_t>(Old.Paths.size()); + for (uint32_t OldPathIndex = 0; OldPathIndex < OldPathCount; OldPathIndex++) + { + if (auto It = NewPathToIndex.find(Old.Paths[OldPathIndex].generic_string()); It != NewPathToIndex.end()) + { + const uint32_t NewPathIndex = It->second; + + if ((Old.RawSizes[OldPathIndex] != 
New.RawSizes[NewPathIndex]) || + (Old.ModificationTicks[OldPathIndex] != New.ModificationTicks[NewPathIndex])) + { + Result.Paths.push_back(New.Paths[NewPathIndex]); + Result.RawSizes.push_back(New.RawSizes[NewPathIndex]); + Result.Attributes.push_back(New.Attributes[NewPathIndex]); + Result.ModificationTicks.push_back(New.ModificationTicks[NewPathIndex]); + } + } + else + { + OutDeletedPathIndexes.push_back(Old.Paths[OldPathIndex]); + } + } + return Result; +} + +void +SaveFolderContentToCompactBinary(const FolderContent& Content, CbWriter& Output) +{ + Output.AddString("platform"sv, ToString(Content.Platform)); + compactbinary_helpers::WriteArray(Content.Paths, "paths"sv, Output); + compactbinary_helpers::WriteArray(Content.RawSizes, "rawSizes"sv, Output); + compactbinary_helpers::WriteArray(Content.Attributes, "attributes"sv, Output); + compactbinary_helpers::WriteArray(Content.ModificationTicks, "modificationTimes"sv, Output); +} + +FolderContent +LoadFolderContentToCompactBinary(CbObjectView Input) +{ + FolderContent Content; + Content.Platform = FromString(Input["platform"sv].AsString(), GetSourceCurrentPlatform()); + compactbinary_helpers::ReadArray("paths"sv, Input, Content.Paths); + compactbinary_helpers::ReadArray("rawSizes"sv, Input, Content.RawSizes); + compactbinary_helpers::ReadArray("attributes"sv, Input, Content.Attributes); + compactbinary_helpers::ReadArray("modificationTimes"sv, Input, Content.ModificationTicks); + return Content; +} + +FolderContent +GetFolderContent(GetFolderContentStatistics& Stats, + const std::filesystem::path& RootPath, + std::function<bool(const std::string_view& RelativePath)>&& AcceptDirectory, + std::function<bool(std::string_view RelativePath, uint64_t Size, uint32_t Attributes)>&& AcceptFile, + WorkerThreadPool& WorkerPool, + int32_t UpdateInteralMS, + std::function<void(bool IsAborted, std::ptrdiff_t PendingWork)>&& UpdateCallback, + std::atomic<bool>& AbortFlag) +{ + Stopwatch Timer; + auto _ = MakeGuard([&Stats, 
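`FolderContent::UpdateState` and `GetUpdatedContent` above both apply the same change-detection rule: a file is out of date when its raw size or modification ticks differ, and a file absent from the new snapshot is deleted. A map-keyed sketch of that rule (the real code keeps parallel arrays indexed by path instead):

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Per-file state used for change detection (hypothetical simplified form).
struct FileState
{
    uint64_t RawSize;
    uint64_t ModificationTicks;
};

// A file is "changed" when size or modification time differs; files
// missing from New are reported as deleted.
void DiffFolderState(const std::unordered_map<std::string, FileState>& Old,
                     const std::unordered_map<std::string, FileState>& New,
                     std::vector<std::string>& OutChanged,
                     std::vector<std::string>& OutDeleted)
{
    for (const auto& [Path, OldState] : Old)
    {
        auto It = New.find(Path);
        if (It == New.end())
        {
            OutDeleted.push_back(Path);
        }
        else if (It->second.RawSize != OldState.RawSize ||
                 It->second.ModificationTicks != OldState.ModificationTicks)
        {
            OutChanged.push_back(Path);
        }
    }
}
```

Attributes are deliberately excluded from the "changed" test in the diff above; they are refreshed in place without forcing a re-chunk.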
&Timer]() { Stats.ElapsedWallTimeUS = Timer.GetElapsedTimeUs(); }); + + FolderContent Content; + struct AsyncVisitor : public GetDirectoryContentVisitor + { + AsyncVisitor(GetFolderContentStatistics& Stats, + std::atomic<bool>& AbortFlag, + FolderContent& Content, + std::function<bool(const std::string_view& RelativePath)>&& AcceptDirectory, + std::function<bool(std::string_view RelativePath, uint64_t Size, uint32_t Attributes)>&& AcceptFile) + : m_Stats(Stats) + , m_AbortFlag(AbortFlag) + , m_FoundContent(Content) + , m_AcceptDirectory(std::move(AcceptDirectory)) + , m_AcceptFile(std::move(AcceptFile)) + { + } + virtual void AsyncVisitDirectory(const std::filesystem::path& RelativeRoot, DirectoryContent&& Content) override + { + if (!m_AbortFlag) + { + m_Stats.FoundFileCount += Content.FileNames.size(); + for (uint64_t FileSize : Content.FileSizes) + { + m_Stats.FoundFileByteCount += FileSize; + } + std::string RelativeDirectoryPath = RelativeRoot.generic_string(); + if (m_AcceptDirectory(RelativeDirectoryPath)) + { + std::vector<std::filesystem::path> Paths; + std::vector<uint64_t> RawSizes; + std::vector<uint32_t> Attributes; + std::vector<uint64_t> ModificatonTicks; + Paths.reserve(Content.FileNames.size()); + RawSizes.reserve(Content.FileNames.size()); + Attributes.reserve(Content.FileNames.size()); + ModificatonTicks.reserve(Content.FileModificationTicks.size()); + + for (size_t FileIndex = 0; FileIndex < Content.FileNames.size(); FileIndex++) + { + const std::filesystem::path& FileName = Content.FileNames[FileIndex]; + std::string RelativePath = (RelativeRoot / FileName).generic_string(); + std::replace(RelativePath.begin(), RelativePath.end(), '\\', '/'); + if (m_AcceptFile(RelativePath, Content.FileSizes[FileIndex], Content.FileAttributes[FileIndex])) + { + Paths.emplace_back(std::move(RelativePath)); + RawSizes.emplace_back(Content.FileSizes[FileIndex]); + Attributes.emplace_back(Content.FileAttributes[FileIndex]); + 
ModificatonTicks.emplace_back(Content.FileModificationTicks[FileIndex]); + + m_Stats.AcceptedFileCount++; + m_Stats.AcceptedFileByteCount += Content.FileSizes[FileIndex]; + } + } + m_Lock.WithExclusiveLock([&]() { + m_FoundContent.Paths.insert(m_FoundContent.Paths.end(), Paths.begin(), Paths.end()); + m_FoundContent.RawSizes.insert(m_FoundContent.RawSizes.end(), RawSizes.begin(), RawSizes.end()); + m_FoundContent.Attributes.insert(m_FoundContent.Attributes.end(), Attributes.begin(), Attributes.end()); + m_FoundContent.ModificationTicks.insert(m_FoundContent.ModificationTicks.end(), + ModificatonTicks.begin(), + ModificatonTicks.end()); + }); + } + } + } + + GetFolderContentStatistics& m_Stats; + std::atomic<bool>& m_AbortFlag; + RwLock m_Lock; + FolderContent& m_FoundContent; + std::function<bool(const std::string_view& RelativePath)> m_AcceptDirectory; + std::function<bool(std::string_view RelativePath, uint64_t Size, uint32_t Attributes)> m_AcceptFile; + } Visitor(Stats, AbortFlag, Content, std::move(AcceptDirectory), std::move(AcceptFile)); + + Latch PendingWork(1); + GetDirectoryContent(RootPath, + DirectoryContentFlags::IncludeFiles | DirectoryContentFlags::Recursive | DirectoryContentFlags::IncludeFileSizes | + DirectoryContentFlags::IncludeAttributes | DirectoryContentFlags::IncludeModificationTick, + Visitor, + WorkerPool, + PendingWork); + PendingWork.CountDown(); + while (!PendingWork.Wait(UpdateInteralMS)) + { + UpdateCallback(AbortFlag.load(), PendingWork.Remaining()); + } + std::vector<size_t> Order; + size_t PathCount = Content.Paths.size(); + Order.resize(Content.Paths.size()); + std::vector<std::string> Parents; + Parents.reserve(PathCount); + std::vector<std::string> Filenames; + Filenames.reserve(PathCount); + for (size_t OrderIndex = 0; OrderIndex < PathCount; OrderIndex++) + { + Order[OrderIndex] = OrderIndex; + Parents.emplace_back(Content.Paths[OrderIndex].parent_path().generic_string()); + 
Filenames.emplace_back(Content.Paths[OrderIndex].filename().generic_string()); + } + std::sort(Order.begin(), Order.end(), [&Parents, &Filenames](size_t Lhs, size_t Rhs) { + const std::string& LhsParent = Parents[Lhs]; + const std::string& RhsParent = Parents[Rhs]; + if (LhsParent < RhsParent) + { + return true; + } + else if (LhsParent > RhsParent) + { + return false; + } + return Filenames[Lhs] < Filenames[Rhs]; + }); + FolderContent OrderedContent; + OrderedContent.Paths.reserve(PathCount); + OrderedContent.RawSizes.reserve(PathCount); + OrderedContent.Attributes.reserve(PathCount); + OrderedContent.ModificationTicks.reserve(PathCount); + for (size_t OrderIndex : Order) + { + OrderedContent.Paths.emplace_back(std::move(Content.Paths[OrderIndex])); + OrderedContent.RawSizes.emplace_back(Content.RawSizes[OrderIndex]); + OrderedContent.Attributes.emplace_back(Content.Attributes[OrderIndex]); + OrderedContent.ModificationTicks.emplace_back(Content.ModificationTicks[OrderIndex]); + } + return OrderedContent; +} + +void +SaveChunkedFolderContentToCompactBinary(const ChunkedFolderContent& Content, CbWriter& Output) +{ + Output.AddString("platform"sv, ToString(Content.Platform)); + compactbinary_helpers::WriteArray(Content.Paths, "paths"sv, Output); + compactbinary_helpers::WriteArray(Content.RawSizes, "rawSizes"sv, Output); + compactbinary_helpers::WriteArray(Content.Attributes, "attributes"sv, Output); + compactbinary_helpers::WriteArray(Content.RawHashes, "rawHashes"sv, Output); + + Output.BeginObject("chunkedContent"); + compactbinary_helpers::WriteArray(Content.ChunkedContent.SequenceRawHashes, "sequenceRawHashes"sv, Output); + compactbinary_helpers::WriteArray(Content.ChunkedContent.ChunkCounts, "chunkCounts"sv, Output); + compactbinary_helpers::WriteArray(Content.ChunkedContent.ChunkOrders, "chunkOrders"sv, Output); + compactbinary_helpers::WriteArray(Content.ChunkedContent.ChunkHashes, "chunkHashes"sv, Output); + 
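`GetFolderContent` above sorts its results by parent directory first and filename second, so the output order is deterministic no matter the order in which worker threads reported directories. The comparator can be sketched standalone as:

```cpp
#include <algorithm>
#include <filesystem>
#include <string>
#include <vector>

// Deterministic ordering: parent path first, filename second, both in
// generic (forward-slash) form, mirroring the sort in GetFolderContent.
std::vector<std::filesystem::path> OrderPaths(std::vector<std::filesystem::path> Paths)
{
    std::sort(Paths.begin(), Paths.end(), [](const std::filesystem::path& Lhs, const std::filesystem::path& Rhs) {
        const std::string LhsParent = Lhs.parent_path().generic_string();
        const std::string RhsParent = Rhs.parent_path().generic_string();
        if (LhsParent != RhsParent)
        {
            return LhsParent < RhsParent;
        }
        return Lhs.filename().generic_string() < Rhs.filename().generic_string();
    });
    return Paths;
}
```

The real code precomputes the parent and filename strings once per path before sorting, which avoids re-deriving them on every comparison.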
compactbinary_helpers::WriteArray(Content.ChunkedContent.ChunkRawSizes, "chunkRawSizes"sv, Output); + Output.EndObject(); // chunkedContent +} + +ChunkedFolderContent +LoadChunkedFolderContentToCompactBinary(CbObjectView Input) +{ + ChunkedFolderContent Content; + Content.Platform = FromString(Input["platform"sv].AsString(), GetSourceCurrentPlatform()); + compactbinary_helpers::ReadArray("paths"sv, Input, Content.Paths); + compactbinary_helpers::ReadArray("rawSizes"sv, Input, Content.RawSizes); + compactbinary_helpers::ReadArray("attributes"sv, Input, Content.Attributes); + compactbinary_helpers::ReadArray("rawHashes"sv, Input, Content.RawHashes); + + CbObjectView ChunkedContentView = Input["chunkedContent"sv].AsObjectView(); + compactbinary_helpers::ReadArray("sequenceRawHashes"sv, ChunkedContentView, Content.ChunkedContent.SequenceRawHashes); + compactbinary_helpers::ReadArray("chunkCounts"sv, ChunkedContentView, Content.ChunkedContent.ChunkCounts); + compactbinary_helpers::ReadArray("chunkOrders"sv, ChunkedContentView, Content.ChunkedContent.ChunkOrders); + compactbinary_helpers::ReadArray("chunkHashes"sv, ChunkedContentView, Content.ChunkedContent.ChunkHashes); + compactbinary_helpers::ReadArray("chunkRawSizes"sv, ChunkedContentView, Content.ChunkedContent.ChunkRawSizes); + return Content; +} + +ChunkedFolderContent +MergeChunkedFolderContents(const ChunkedFolderContent& Base, std::span<const ChunkedFolderContent> Overlays) +{ + ZEN_ASSERT(!Overlays.empty()); + + ChunkedFolderContent Result; + const size_t BasePathCount = Base.Paths.size(); + Result.Paths.reserve(BasePathCount); + Result.RawSizes.reserve(BasePathCount); + Result.Attributes.reserve(BasePathCount); + Result.RawHashes.reserve(BasePathCount); + + const size_t BaseChunkCount = Base.ChunkedContent.ChunkHashes.size(); + Result.ChunkedContent.SequenceRawHashes.reserve(Base.ChunkedContent.SequenceRawHashes.size()); + Result.ChunkedContent.ChunkCounts.reserve(BaseChunkCount); + 
Result.ChunkedContent.ChunkHashes.reserve(BaseChunkCount); + Result.ChunkedContent.ChunkRawSizes.reserve(BaseChunkCount); + Result.ChunkedContent.ChunkOrders.reserve(Base.ChunkedContent.ChunkOrders.size()); + + tsl::robin_map<std::string, std::filesystem::path> GenericPathToActualPath; + for (const std::filesystem::path& Path : Base.Paths) + { + GenericPathToActualPath.insert({PathCompareString(Path), Path}); + } + for (const ChunkedFolderContent& Overlay : Overlays) + { + for (const std::filesystem::path& Path : Overlay.Paths) + { + GenericPathToActualPath.insert({PathCompareString(Path), Path}); + } + } + + tsl::robin_map<IoHash, uint32_t, IoHash::Hasher> RawHashToSequenceRawHashIndex; + + auto BuildOverlayPaths = [](std::span<const ChunkedFolderContent> Overlays) -> tsl::robin_set<std::string> { + tsl::robin_set<std::string> Result; + for (const ChunkedFolderContent& OverlayContent : Overlays) + { + for (const std::filesystem::path& Path : OverlayContent.Paths) + { + Result.insert(PathCompareString(Path)); + } + } + return Result; + }; + + auto AddContent = [&BuildOverlayPaths](ChunkedFolderContent& Result, + const ChunkedFolderContent& OverlayContent, + tsl::robin_map<IoHash, uint32_t, IoHash::Hasher>& ChunkHashToChunkIndex, + tsl::robin_map<IoHash, uint32_t, IoHash::Hasher>& RawHashToSequenceRawHashIndex, + const tsl::robin_map<std::string, std::filesystem::path>& GenericPathToActualPath, + std::span<const ChunkedFolderContent> Overlays) { + const ChunkedContentLookup OverlayLookup = BuildChunkedContentLookup(OverlayContent); + tsl::robin_set<std::string> BaseOverlayPaths = BuildOverlayPaths(Overlays); + for (uint32_t PathIndex = 0; PathIndex < OverlayContent.Paths.size(); PathIndex++) + { + std::string GenericPath = PathCompareString(OverlayContent.Paths[PathIndex]); + if (!BaseOverlayPaths.contains(GenericPath)) + { + // This asset will not be overridden by a later layer - add it + + const std::filesystem::path OriginalPath = 
GenericPathToActualPath.at(GenericPath); + Result.Paths.push_back(OriginalPath); + const IoHash& RawHash = OverlayContent.RawHashes[PathIndex]; + Result.RawSizes.push_back(OverlayContent.RawSizes[PathIndex]); + Result.Attributes.push_back(OverlayContent.Attributes[PathIndex]); + Result.RawHashes.push_back(RawHash); + + if (OverlayContent.RawSizes[PathIndex] > 0) + { + if (!RawHashToSequenceRawHashIndex.contains(RawHash)) + { + RawHashToSequenceRawHashIndex.insert( + {RawHash, gsl::narrow<uint32_t>(Result.ChunkedContent.SequenceRawHashes.size())}); + const uint32_t SequenceRawHashIndex = OverlayLookup.RawHashToSequenceRawHashIndex.at(RawHash); + const uint32_t OrderIndexOffset = OverlayLookup.SequenceRawHashIndexChunkOrderOffset[SequenceRawHashIndex]; + const uint32_t ChunkCount = OverlayContent.ChunkedContent.ChunkCounts[SequenceRawHashIndex]; + ChunkingStatistics Stats; + std::span<const uint32_t> OriginalChunkOrder = + std::span<const uint32_t>(OverlayContent.ChunkedContent.ChunkOrders).subspan(OrderIndexOffset, ChunkCount); + AddCunkSequence(Stats, + Result.ChunkedContent, + ChunkHashToChunkIndex, + RawHash, + OriginalChunkOrder, + OverlayContent.ChunkedContent.ChunkHashes, + OverlayContent.ChunkedContent.ChunkRawSizes); + Stats.UniqueSequencesFound++; + } + } + } + } + }; + + tsl::robin_map<IoHash, uint32_t, IoHash::Hasher> MergedChunkHashToChunkIndex; + AddContent(Result, Base, MergedChunkHashToChunkIndex, RawHashToSequenceRawHashIndex, GenericPathToActualPath, Overlays); + for (uint32_t OverlayIndex = 0; OverlayIndex < Overlays.size(); OverlayIndex++) + { + AddContent(Result, + Overlays[OverlayIndex], + MergedChunkHashToChunkIndex, + RawHashToSequenceRawHashIndex, + GenericPathToActualPath, + Overlays.subspan(OverlayIndex + 1)); + } + return Result; +} + +ChunkedFolderContent +DeletePathsFromChunkedContent(const ChunkedFolderContent& BaseContent, std::span<const std::filesystem::path> DeletedPaths) +{ + ZEN_ASSERT(DeletedPaths.size() <= 
BaseContent.Paths.size()); + ChunkedFolderContent Result = {.Platform = BaseContent.Platform}; + if (DeletedPaths.size() < BaseContent.Paths.size()) + { + tsl::robin_set<std::string> DeletedPathSet; + DeletedPathSet.reserve(DeletedPaths.size()); + for (const std::filesystem::path& DeletedPath : DeletedPaths) + { + DeletedPathSet.insert(PathCompareString(DeletedPath)); + } + const ChunkedContentLookup BaseLookup = BuildChunkedContentLookup(BaseContent); + tsl::robin_map<IoHash, uint32_t, IoHash::Hasher> ChunkHashToChunkIndex; + + tsl::robin_map<IoHash, uint32_t, IoHash::Hasher> RawHashToSequenceRawHashIndex; + for (uint32_t PathIndex = 0; PathIndex < BaseContent.Paths.size(); PathIndex++) + { + const std::filesystem::path& Path = BaseContent.Paths[PathIndex]; + if (!DeletedPathSet.contains(PathCompareString(Path))) + { + const IoHash& RawHash = BaseContent.RawHashes[PathIndex]; + const uint64_t RawSize = BaseContent.RawSizes[PathIndex]; + Result.Paths.push_back(Path); + Result.RawSizes.push_back(RawSize); + Result.Attributes.push_back(BaseContent.Attributes[PathIndex]); + Result.RawHashes.push_back(RawHash); + if (RawSize > 0) + { + if (!RawHashToSequenceRawHashIndex.contains(RawHash)) + { + RawHashToSequenceRawHashIndex.insert( + {RawHash, gsl::narrow<uint32_t>(Result.ChunkedContent.SequenceRawHashes.size())}); + const uint32_t SequenceRawHashIndex = BaseLookup.RawHashToSequenceRawHashIndex.at(RawHash); + const uint32_t OrderIndexOffset = BaseLookup.SequenceRawHashIndexChunkOrderOffset[SequenceRawHashIndex]; + const uint32_t ChunkCount = BaseContent.ChunkedContent.ChunkCounts[SequenceRawHashIndex]; + ChunkingStatistics Stats; + std::span<const uint32_t> OriginalChunkOrder = + std::span<const uint32_t>(BaseContent.ChunkedContent.ChunkOrders).subspan(OrderIndexOffset, ChunkCount); + AddCunkSequence(Stats, + Result.ChunkedContent, + ChunkHashToChunkIndex, + RawHash, + OriginalChunkOrder, + BaseContent.ChunkedContent.ChunkHashes, + 
BaseContent.ChunkedContent.ChunkRawSizes); + Stats.UniqueSequencesFound++; + } + } + } + } + } + return Result; +} + +ChunkedFolderContent +ChunkFolderContent(ChunkingStatistics& Stats, + WorkerThreadPool& WorkerPool, + const std::filesystem::path& RootPath, + const FolderContent& Content, + const ChunkingController& InChunkingController, + int32_t UpdateInteralMS, + std::function<void(bool IsAborted, std::ptrdiff_t PendingWork)>&& UpdateCallback, + std::atomic<bool>& AbortFlag) +{ + Stopwatch Timer; + auto _ = MakeGuard([&Stats, &Timer]() { Stats.ElapsedWallTimeUS = Timer.GetElapsedTimeUs(); }); + + ChunkedFolderContent Result = {.Platform = Content.Platform, + .Paths = Content.Paths, + .RawSizes = Content.RawSizes, + .Attributes = Content.Attributes}; + const size_t ItemCount = Result.Paths.size(); + Result.RawHashes.resize(ItemCount, IoHash::Zero); + Result.ChunkedContent.SequenceRawHashes.reserve(ItemCount); // Up to 1 per file, maybe less + Result.ChunkedContent.ChunkCounts.reserve(ItemCount); // Up to one per file + Result.ChunkedContent.ChunkOrders.reserve(ItemCount); // At least 1 per file, maybe more + Result.ChunkedContent.ChunkHashes.reserve(ItemCount); // At least 1 per file, maybe more + Result.ChunkedContent.ChunkRawSizes.reserve(ItemCount); // At least 1 per file, maybe more + tsl::robin_map<IoHash, uint32_t, IoHash::Hasher> ChunkHashToChunkIndex; + tsl::robin_map<IoHash, uint32_t, IoHash::Hasher> RawHashToChunkSequenceIndex; + RawHashToChunkSequenceIndex.reserve(ItemCount); + ChunkHashToChunkIndex.reserve(ItemCount); + { + std::vector<uint32_t> Order; + Order.resize(ItemCount); + for (uint32_t I = 0; I < ItemCount; I++) + { + Order[I] = I; + } + + // Handle the biggest files first so we don't end up with one straggling large file at the end + // std::sort(Order.begin(), Order.end(), [&](uint32_t Lhs, uint32_t Rhs) { return Result.RawSizes[Lhs] > Result.RawSizes[Rhs]; + //}); + + tsl::robin_map<IoHash, uint32_t, IoHash::Hasher> 
RawHashToSequenceRawHashIndex; + RawHashToSequenceRawHashIndex.reserve(ItemCount); + + RwLock Lock; + + ParallellWork Work(AbortFlag); + + for (uint32_t PathIndex : Order) + { + if (Work.IsAborted()) + { + break; + } + Work.ScheduleWork( + WorkerPool, // GetSyncWorkerPool() + [&, PathIndex](std::atomic<bool>& AbortFlag) { + if (!AbortFlag) + { + IoHash RawHash = HashOneFile(Stats, + InChunkingController, + Result, + ChunkHashToChunkIndex, + RawHashToSequenceRawHashIndex, + Lock, + RootPath, + PathIndex); + Lock.WithExclusiveLock([&]() { Result.RawHashes[PathIndex] = RawHash; }); + Stats.FilesProcessed++; + } + }, + [&, PathIndex](const std::exception& Ex, std::atomic<bool>& AbortFlag) { + ZEN_CONSOLE("Failed scanning file {}. Reason: {}", Result.Paths[PathIndex], Ex.what()); + AbortFlag = true; + }); + } + + Work.Wait(UpdateInteralMS, [&](bool IsAborted, std::ptrdiff_t PendingWork) { + ZEN_UNUSED(IsAborted); + ZEN_UNUSED(PendingWork); + UpdateCallback(Work.IsAborted(), Work.PendingWork().Remaining()); + }); + } + return Result; +} + +ChunkedContentLookup +BuildChunkedContentLookup(const ChunkedFolderContent& Content) +{ + struct ChunkLocationReference + { + uint32_t ChunkIndex; + ChunkedContentLookup::ChunkLocation Location; + }; + + ChunkedContentLookup Result; + { + const uint32_t SequenceRawHashesCount = gsl::narrow<uint32_t>(Content.ChunkedContent.SequenceRawHashes.size()); + Result.RawHashToSequenceRawHashIndex.reserve(SequenceRawHashesCount); + Result.SequenceRawHashIndexChunkOrderOffset.reserve(SequenceRawHashesCount); + uint32_t OrderOffset = 0; + for (uint32_t SequenceRawHashIndex = 0; SequenceRawHashIndex < Content.ChunkedContent.SequenceRawHashes.size(); + SequenceRawHashIndex++) + { + Result.RawHashToSequenceRawHashIndex.insert( + {Content.ChunkedContent.SequenceRawHashes[SequenceRawHashIndex], SequenceRawHashIndex}); + Result.SequenceRawHashIndexChunkOrderOffset.push_back(OrderOffset); + OrderOffset += 
Content.ChunkedContent.ChunkCounts[SequenceRawHashIndex]; + } + } + + std::vector<ChunkLocationReference> Locations; + Locations.reserve(Content.ChunkedContent.ChunkOrders.size()); + for (uint32_t PathIndex = 0; PathIndex < Content.Paths.size(); PathIndex++) + { + if (Content.RawSizes[PathIndex] > 0) + { + const IoHash& RawHash = Content.RawHashes[PathIndex]; + uint32_t SequenceRawHashIndex = Result.RawHashToSequenceRawHashIndex.at(RawHash); + const uint32_t OrderOffset = Result.SequenceRawHashIndexChunkOrderOffset[SequenceRawHashIndex]; + const uint32_t ChunkCount = Content.ChunkedContent.ChunkCounts[SequenceRawHashIndex]; + uint64_t LocationOffset = 0; + for (size_t OrderIndex = OrderOffset; OrderIndex < OrderOffset + ChunkCount; OrderIndex++) + { + uint32_t ChunkIndex = Content.ChunkedContent.ChunkOrders[OrderIndex]; + + Locations.push_back(ChunkLocationReference{ChunkIndex, ChunkedContentLookup::ChunkLocation{PathIndex, LocationOffset}}); + + LocationOffset += Content.ChunkedContent.ChunkRawSizes[ChunkIndex]; + } + ZEN_ASSERT(LocationOffset == Content.RawSizes[PathIndex]); + } + } + + std::sort(Locations.begin(), Locations.end(), [](const ChunkLocationReference& Lhs, const ChunkLocationReference& Rhs) { + if (Lhs.ChunkIndex < Rhs.ChunkIndex) + { + return true; + } + if (Lhs.ChunkIndex > Rhs.ChunkIndex) + { + return false; + } + if (Lhs.Location.PathIndex < Rhs.Location.PathIndex) + { + return true; + } + if (Lhs.Location.PathIndex > Rhs.Location.PathIndex) + { + return false; + } + return Lhs.Location.Offset < Rhs.Location.Offset; + }); + + Result.ChunkLocations.reserve(Locations.size()); + const uint32_t ChunkCount = gsl::narrow<uint32_t>(Content.ChunkedContent.ChunkHashes.size()); + Result.ChunkHashToChunkIndex.reserve(ChunkCount); + size_t RangeOffset = 0; + for (uint32_t ChunkIndex = 0; ChunkIndex < ChunkCount; ChunkIndex++) + { + Result.ChunkHashToChunkIndex.insert({Content.ChunkedContent.ChunkHashes[ChunkIndex], ChunkIndex}); + uint32_t Count = 0; + while 
(Locations[RangeOffset + Count].ChunkIndex == ChunkIndex) + { + Result.ChunkLocations.push_back(Locations[RangeOffset + Count].Location); + Count++; + } + Result.ChunkLocationOffset.push_back(RangeOffset); + Result.ChunkLocationCounts.push_back(Count); + RangeOffset += Count; + } + + return Result; +} + +} // namespace zen diff --git a/src/zenutil/chunkingcontroller.cpp b/src/zenutil/chunkingcontroller.cpp new file mode 100644 index 000000000..bc0e57b14 --- /dev/null +++ b/src/zenutil/chunkingcontroller.cpp @@ -0,0 +1,265 @@ +// Copyright Epic Games, Inc. All Rights Reserved. + +#include <zenutil/chunkingcontroller.h> + +#include <zencore/basicfile.h> +#include <zencore/compactbinarybuilder.h> + +ZEN_THIRD_PARTY_INCLUDES_START +#include <tsl/robin_map.h> +ZEN_THIRD_PARTY_INCLUDES_END + +namespace zen { +using namespace std::literals; + +namespace { + std::vector<std::string> ReadStringArray(CbArrayView StringArray) + { + std::vector<std::string> Result; + Result.reserve(StringArray.Num()); + for (CbFieldView FieldView : StringArray) + { + Result.emplace_back(FieldView.AsString()); + } + return Result; + } + + ChunkedParams ReadChunkParams(CbObjectView Params) + { + bool UseThreshold = Params["UseThreshold"sv].AsBool(true); + size_t MinSize = Params["MinSize"sv].AsUInt64(DefaultChunkedParams.MinSize); + size_t MaxSize = Params["MaxSize"sv].AsUInt64(DefaultChunkedParams.MaxSize); + size_t AvgSize = Params["AvgSize"sv].AsUInt64(DefaultChunkedParams.AvgSize); + + return ChunkedParams{.UseThreshold = UseThreshold, .MinSize = MinSize, .MaxSize = MaxSize, .AvgSize = AvgSize}; + } + +} // namespace + +class BasicChunkingController : public ChunkingController +{ +public: + BasicChunkingController(std::span<const std::string_view> ExcludeExtensions, + uint64_t ChunkFileSizeLimit, + const ChunkedParams& ChunkingParams) + : m_ChunkExcludeExtensions(ExcludeExtensions.begin(), ExcludeExtensions.end()) + , m_ChunkFileSizeLimit(ChunkFileSizeLimit) + , 
m_ChunkingParams(ChunkingParams) + { + } + + BasicChunkingController(CbObjectView Parameters) + : m_ChunkExcludeExtensions(ReadStringArray(Parameters["ChunkExcludeExtensions"sv].AsArrayView())) + , m_ChunkFileSizeLimit(Parameters["ChunkFileSizeLimit"sv].AsUInt64(DefaultChunkingFileSizeLimit)) + , m_ChunkingParams(ReadChunkParams(Parameters["ChunkingParams"sv].AsObjectView())) + { + } + + virtual bool ProcessFile(const std::filesystem::path& InputPath, + uint64_t RawSize, + ChunkedInfoWithSource& OutChunked, + std::atomic<uint64_t>& BytesProcessed) const override + { + const bool ExcludeFromChunking = + std::find(m_ChunkExcludeExtensions.begin(), m_ChunkExcludeExtensions.end(), InputPath.extension()) != + m_ChunkExcludeExtensions.end(); + + if (ExcludeFromChunking || (RawSize < m_ChunkFileSizeLimit)) + { + return false; + } + + BasicFile Buffer(InputPath, BasicFile::Mode::kRead); + OutChunked = ChunkData(Buffer, 0, RawSize, m_ChunkingParams, &BytesProcessed); + return true; + } + + virtual std::string_view GetName() const override { return Name; } + + virtual CbObject GetParameters() const override + { + CbObjectWriter Writer; + Writer.BeginArray("ChunkExcludeExtensions"sv); + { + for (const std::string& Extension : m_ChunkExcludeExtensions) + { + Writer.AddString(Extension); + } + } + Writer.EndArray(); // ChunkExcludeExtensions + Writer.AddInteger("ChunkFileSizeLimit"sv, m_ChunkFileSizeLimit); + Writer.BeginObject("ChunkingParams"sv); + { + Writer.AddBool("UseThreshold"sv, m_ChunkingParams.UseThreshold); + + Writer.AddInteger("MinSize"sv, (uint64_t)m_ChunkingParams.MinSize); + Writer.AddInteger("MaxSize"sv, (uint64_t)m_ChunkingParams.MaxSize); + Writer.AddInteger("AvgSize"sv, (uint64_t)m_ChunkingParams.AvgSize); + } + Writer.EndObject(); // ChunkingParams + return Writer.Save(); + } + static constexpr std::string_view Name = "BasicChunkingController"sv; + +protected: + const std::vector<std::string> m_ChunkExcludeExtensions; + const uint64_t m_ChunkFileSizeLimit; 
+ const ChunkedParams m_ChunkingParams; +}; + +class ChunkingControllerWithFixedChunking : public ChunkingController +{ +public: + ChunkingControllerWithFixedChunking(std::span<const std::string_view> FixedChunkingExtensions, + uint64_t ChunkFileSizeLimit, + const ChunkedParams& ChunkingParams, + uint32_t FixedChunkingChunkSize) + : m_FixedChunkingExtensions(FixedChunkingExtensions.begin(), FixedChunkingExtensions.end()) + , m_ChunkFileSizeLimit(ChunkFileSizeLimit) + , m_ChunkingParams(ChunkingParams) + , m_FixedChunkingChunkSize(FixedChunkingChunkSize) + { + } + + ChunkingControllerWithFixedChunking(CbObjectView Parameters) + : m_FixedChunkingExtensions(ReadStringArray(Parameters["FixedChunkingExtensions"sv].AsArrayView())) + , m_ChunkFileSizeLimit(Parameters["ChunkFileSizeLimit"sv].AsUInt64(DefaultChunkingFileSizeLimit)) + , m_ChunkingParams(ReadChunkParams(Parameters["ChunkingParams"sv].AsObjectView())) + , m_FixedChunkingChunkSize(Parameters["FixedChunkingChunkSize"sv].AsUInt32(16u * 1024u * 1024u)) + { + } + + virtual bool ProcessFile(const std::filesystem::path& InputPath, + uint64_t RawSize, + ChunkedInfoWithSource& OutChunked, + std::atomic<uint64_t>& BytesProcessed) const override + { + if (RawSize < m_ChunkFileSizeLimit) + { + return false; + } + const bool FixedChunking = std::find(m_FixedChunkingExtensions.begin(), m_FixedChunkingExtensions.end(), InputPath.extension()) != + m_FixedChunkingExtensions.end(); + + if (FixedChunking) + { + IoHashStream FullHash; + IoBuffer Source = IoBufferBuilder::MakeFromFile(InputPath); + uint64_t Offset = 0; + tsl::robin_map<IoHash, uint32_t, IoHash::Hasher> ChunkHashToChunkIndex; + ChunkHashToChunkIndex.reserve(1 + (RawSize / m_FixedChunkingChunkSize)); + while (Offset < RawSize) + { + uint64_t ChunkSize = std::min<uint64_t>(RawSize - Offset, m_FixedChunkingChunkSize); + IoBuffer Chunk(Source, Offset, ChunkSize); + MemoryView ChunkData = Chunk.GetView(); + FullHash.Append(ChunkData); + + IoHash ChunkHash = 
IoHash::HashBuffer(ChunkData);
+                if (auto It = ChunkHashToChunkIndex.find(ChunkHash); It != ChunkHashToChunkIndex.end())
+                {
+                    OutChunked.Info.ChunkSequence.push_back(It->second);
+                }
+                else
+                {
+                    uint32_t ChunkIndex = gsl::narrow<uint32_t>(OutChunked.Info.ChunkHashes.size());
+                    OutChunked.Info.ChunkHashes.push_back(ChunkHash);
+                    OutChunked.Info.ChunkSequence.push_back(ChunkIndex);
+                    OutChunked.ChunkSources.push_back({.Offset = Offset, .Size = gsl::narrow<uint32_t>(ChunkSize)});
+                }
+                Offset += ChunkSize;
+                BytesProcessed.fetch_add(ChunkSize);
+            }
+            OutChunked.Info.RawSize = RawSize;
+            OutChunked.Info.RawHash = FullHash.GetHash();
+            return true;
+        }
+        else
+        {
+            BasicFile Buffer(InputPath, BasicFile::Mode::kRead);
+            OutChunked = ChunkData(Buffer, 0, RawSize, m_ChunkingParams, &BytesProcessed);
+            return true;
+        }
+    }
+
+    virtual std::string_view GetName() const override { return Name; }
+
+    virtual CbObject GetParameters() const override
+    {
+        CbObjectWriter Writer;
+        Writer.BeginArray("FixedChunkingExtensions"sv);
+        {
+            for (const std::string& Extension : m_FixedChunkingExtensions)
+            {
+                Writer.AddString(Extension);
+            }
+        }
+        Writer.EndArray(); // FixedChunkingExtensions
+        Writer.AddInteger("ChunkFileSizeLimit"sv, m_ChunkFileSizeLimit);
+        Writer.BeginObject("ChunkingParams"sv);
+        {
+            Writer.AddBool("UseThreshold"sv, m_ChunkingParams.UseThreshold);
+
+            Writer.AddInteger("MinSize"sv, (uint64_t)m_ChunkingParams.MinSize);
+            Writer.AddInteger("MaxSize"sv, (uint64_t)m_ChunkingParams.MaxSize);
+            Writer.AddInteger("AvgSize"sv, (uint64_t)m_ChunkingParams.AvgSize);
+        }
+        Writer.EndObject(); // ChunkingParams
+        Writer.AddInteger("FixedChunkingChunkSize"sv, m_FixedChunkingChunkSize);
+        return Writer.Save();
+    }
+
+    static constexpr std::string_view Name = "ChunkingControllerWithFixedChunking"sv;
+
+protected:
+    const std::vector<std::string> m_FixedChunkingExtensions;
+    const uint64_t m_ChunkFileSizeLimit;
+    const ChunkedParams m_ChunkingParams;
+    const uint32_t
m_FixedChunkingChunkSize; +}; + +std::unique_ptr<ChunkingController> +CreateBasicChunkingController(std::span<const std::string_view> ExcludeExtensions, + uint64_t ChunkFileSizeLimit, + const ChunkedParams& ChunkingParams) +{ + return std::make_unique<BasicChunkingController>(ExcludeExtensions, ChunkFileSizeLimit, ChunkingParams); +} +std::unique_ptr<ChunkingController> +CreateBasicChunkingController(CbObjectView Parameters) +{ + return std::make_unique<BasicChunkingController>(Parameters); +} + +std::unique_ptr<ChunkingController> +CreateChunkingControllerWithFixedChunking(std::span<const std::string_view> FixedChunkingExtensions, + uint64_t ChunkFileSizeLimit, + const ChunkedParams& ChunkingParams, + uint32_t FixedChunkingChunkSize) +{ + return std::make_unique<ChunkingControllerWithFixedChunking>(FixedChunkingExtensions, + ChunkFileSizeLimit, + ChunkingParams, + FixedChunkingChunkSize); +} +std::unique_ptr<ChunkingController> +CreateChunkingControllerWithFixedChunking(CbObjectView Parameters) +{ + return std::make_unique<ChunkingControllerWithFixedChunking>(Parameters); +} + +std::unique_ptr<ChunkingController> +CreateChunkingController(std::string_view Name, CbObjectView Parameters) +{ + if (Name == BasicChunkingController::Name) + { + return CreateBasicChunkingController(Parameters); + } + else if (Name == ChunkingControllerWithFixedChunking::Name) + { + return CreateChunkingControllerWithFixedChunking(Parameters); + } + return {}; +} + +} // namespace zen diff --git a/src/zenutil/filebuildstorage.cpp b/src/zenutil/filebuildstorage.cpp new file mode 100644 index 000000000..78ebcdd55 --- /dev/null +++ b/src/zenutil/filebuildstorage.cpp @@ -0,0 +1,616 @@ +// Copyright Epic Games, Inc. All Rights Reserved. 
+ +#include <zenutil/filebuildstorage.h> + +#include <zencore/basicfile.h> +#include <zencore/compactbinarybuilder.h> +#include <zencore/compactbinaryvalidation.h> +#include <zencore/fmtutils.h> +#include <zencore/scopeguard.h> +#include <zencore/timer.h> + +namespace zen { + +using namespace std::literals; + +class FileBuildStorage : public BuildStorage +{ +public: + explicit FileBuildStorage(const std::filesystem::path& StoragePath, + BuildStorage::Statistics& Stats, + bool EnableJsonOutput, + double LatencySec, + double DelayPerKBSec) + : m_StoragePath(StoragePath) + , m_Stats(Stats) + , m_EnableJsonOutput(EnableJsonOutput) + , m_LatencySec(LatencySec) + , m_DelayPerKBSec(DelayPerKBSec) + { + CreateDirectories(GetBuildsFolder()); + CreateDirectories(GetBlobsFolder()); + CreateDirectories(GetBlobsMetadataFolder()); + } + + virtual ~FileBuildStorage() {} + + virtual CbObject ListBuilds(CbObject Query) override + { + ZEN_UNUSED(Query); + + SimulateLatency(Query.GetSize(), 0); + + Stopwatch ExecutionTimer; + auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); }); + m_Stats.TotalRequestCount++; + + const std::filesystem::path BuildFolder = GetBuildsFolder(); + DirectoryContent Content; + GetDirectoryContent(BuildFolder, DirectoryContentFlags::IncludeDirs, Content); + CbObjectWriter Writer; + Writer.BeginArray("results"); + { + for (const std::filesystem::path& BuildPath : Content.Directories) + { + Oid BuildId = Oid::TryFromHexString(BuildPath.stem().string()); + if (BuildId != Oid::Zero) + { + Writer.BeginObject(); + { + Writer.AddObjectId("buildId", BuildId); + Writer.AddObject("metadata", ReadBuild(BuildId)["metadata"sv].AsObjectView()); + } + Writer.EndObject(); + } + } + } + Writer.EndArray(); // builds + Writer.Save(); + SimulateLatency(Writer.GetSaveSize(), 0); + return Writer.Save(); + } + + virtual CbObject PutBuild(const Oid& BuildId, const CbObject& MetaData) override + { + SimulateLatency(MetaData.GetSize(), 0); + 
+ Stopwatch ExecutionTimer; + auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); }); + m_Stats.TotalRequestCount++; + + CbObjectWriter BuildObject; + BuildObject.AddObject("metadata", MetaData); + BuildObject.AddInteger("chunkSize"sv, 32u * 1024u * 1024u); + WriteBuild(BuildId, BuildObject.Save()); + + CbObjectWriter BuildResponse; + BuildResponse.AddInteger("chunkSize"sv, 32u * 1024u * 1024u); + BuildResponse.Save(); + + SimulateLatency(0, BuildResponse.GetSaveSize()); + return BuildResponse.Save(); + } + + virtual CbObject GetBuild(const Oid& BuildId) override + { + SimulateLatency(0, 0); + Stopwatch ExecutionTimer; + auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); }); + m_Stats.TotalRequestCount++; + + CbObject Build = ReadBuild(BuildId); + SimulateLatency(0, Build.GetSize()); + return Build; + } + + virtual void FinalizeBuild(const Oid& BuildId) override + { + SimulateLatency(0, 0); + Stopwatch ExecutionTimer; + auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); }); + m_Stats.TotalRequestCount++; + + ZEN_UNUSED(BuildId); + SimulateLatency(0, 0); + } + + virtual std::pair<IoHash, std::vector<IoHash>> PutBuildPart(const Oid& BuildId, + const Oid& BuildPartId, + std::string_view PartName, + const CbObject& MetaData) override + { + SimulateLatency(MetaData.GetSize(), 0); + Stopwatch ExecutionTimer; + auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); }); + m_Stats.TotalRequestCount++; + + const std::filesystem::path BuildPartDataPath = GetBuildPartPath(BuildId, BuildPartId); + CreateDirectories(BuildPartDataPath.parent_path()); + + TemporaryFile::SafeWriteFile(BuildPartDataPath, MetaData.GetView()); + m_WrittenBytes += MetaData.GetSize(); + WriteAsJson(BuildPartDataPath, MetaData); + + IoHash RawHash = IoHash::HashBuffer(MetaData.GetView()); + + CbObjectWriter Writer; + { + CbObject 
BuildObject = ReadBuild(BuildId); + CbObjectView PartsObject = BuildObject["parts"sv].AsObjectView(); + CbObjectView MetaDataView = BuildObject["metadata"sv].AsObjectView(); + + Writer.AddObject("metadata"sv, MetaDataView); + Writer.BeginObject("parts"sv); + { + for (CbFieldView PartView : PartsObject) + { + if (PartView.GetName() != PartName) + { + Writer.AddObjectId(PartView.GetName(), PartView.AsObjectId()); + } + } + Writer.AddObjectId(PartName, BuildPartId); + } + Writer.EndObject(); // parts + } + WriteBuild(BuildId, Writer.Save()); + + std::vector<IoHash> NeededAttachments = GetNeededAttachments(MetaData); + + SimulateLatency(0, sizeof(IoHash) * NeededAttachments.size()); + + return std::make_pair(RawHash, std::move(NeededAttachments)); + } + + virtual CbObject GetBuildPart(const Oid& BuildId, const Oid& BuildPartId) override + { + SimulateLatency(0, 0); + + Stopwatch ExecutionTimer; + auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); }); + m_Stats.TotalRequestCount++; + + const std::filesystem::path BuildPartDataPath = GetBuildPartPath(BuildId, BuildPartId); + + IoBuffer Payload = ReadFile(BuildPartDataPath).Flatten(); + m_Stats.TotalBytesRead += Payload.GetSize(); + + ZEN_ASSERT(ValidateCompactBinary(Payload.GetView(), CbValidateMode::Default) == CbValidateError::None); + + CbObject BuildPartObject = CbObject(SharedBuffer(Payload)); + + SimulateLatency(0, BuildPartObject.GetSize()); + + return BuildPartObject; + } + + virtual std::vector<IoHash> FinalizeBuildPart(const Oid& BuildId, const Oid& BuildPartId, const IoHash& PartHash) override + { + SimulateLatency(0, 0); + + Stopwatch ExecutionTimer; + auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); }); + m_Stats.TotalRequestCount++; + + const std::filesystem::path BuildPartDataPath = GetBuildPartPath(BuildId, BuildPartId); + IoBuffer Payload = ReadFile(BuildPartDataPath).Flatten(); + m_Stats.TotalBytesRead += 
Payload.GetSize(); + IoHash RawHash = IoHash::HashBuffer(Payload.GetView()); + if (RawHash != PartHash) + { + throw std::runtime_error( + fmt::format("Failed finalizing build part {}: Expected hash {}, got {}", BuildPartId, PartHash, RawHash)); + } + + CbObject BuildPartObject = CbObject(SharedBuffer(Payload)); + std::vector<IoHash> NeededAttachments(GetNeededAttachments(BuildPartObject)); + + SimulateLatency(0, NeededAttachments.size() * sizeof(IoHash)); + + return NeededAttachments; + } + + virtual void PutBuildBlob(const Oid& BuildId, + const IoHash& RawHash, + ZenContentType ContentType, + const CompositeBuffer& Payload) override + { + ZEN_UNUSED(BuildId); + ZEN_ASSERT(ContentType == ZenContentType::kCompressedBinary); + SimulateLatency(Payload.GetSize(), 0); + + ZEN_ASSERT_SLOW(ValidateCompressedBuffer(RawHash, Payload)); + + Stopwatch ExecutionTimer; + auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); }); + m_Stats.TotalRequestCount++; + + const std::filesystem::path BlockPath = GetBlobPayloadPath(RawHash); + if (!std::filesystem::is_regular_file(BlockPath)) + { + CreateDirectories(BlockPath.parent_path()); + TemporaryFile::SafeWriteFile(BlockPath, Payload.Flatten().GetView()); + } + m_Stats.TotalBytesWritten += Payload.GetSize(); + SimulateLatency(0, 0); + } + + virtual std::vector<std::function<void()>> PutLargeBuildBlob(const Oid& BuildId, + const IoHash& RawHash, + ZenContentType ContentType, + uint64_t PayloadSize, + std::function<IoBuffer(uint64_t Offset, uint64_t Size)>&& Transmitter, + std::function<void(uint64_t, bool)>&& OnSentBytes) override + { + ZEN_UNUSED(BuildId); + ZEN_UNUSED(ContentType); + SimulateLatency(0, 0); + Stopwatch ExecutionTimer; + auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); }); + m_Stats.TotalRequestCount++; + + const std::filesystem::path BlockPath = GetBlobPayloadPath(RawHash); + if (!std::filesystem::is_regular_file(BlockPath)) + { + 
CreateDirectories(BlockPath.parent_path()); + + struct WorkloadData + { + std::function<IoBuffer(uint64_t Offset, uint64_t Size)> Transmitter; + std::function<void(uint64_t, bool)> OnSentBytes; + TemporaryFile TempFile; + std::atomic<size_t> PartsLeft; + }; + + std::shared_ptr<WorkloadData> Workload(std::make_shared<WorkloadData>()); + Workload->Transmitter = std::move(Transmitter); + Workload->OnSentBytes = std::move(OnSentBytes); + std::error_code Ec; + Workload->TempFile.CreateTemporary(BlockPath.parent_path(), Ec); + + if (Ec) + { + throw std::runtime_error( + fmt::format("Failed opening temporary file '{}': {} ({})", Workload->TempFile.GetPath(), Ec.message(), Ec.value())); + } + + std::vector<std::function<void()>> WorkItems; + uint64_t Offset = 0; + while (Offset < PayloadSize) + { + uint64_t Size = Min(32u * 1024u * 1024u, PayloadSize - Offset); + + WorkItems.push_back([this, RawHash, BlockPath, Workload, Offset, Size]() { + IoBuffer PartPayload = Workload->Transmitter(Offset, Size); + SimulateLatency(PartPayload.GetSize(), 0); + + std::error_code Ec; + Workload->TempFile.Write(PartPayload, Offset, Ec); + if (Ec) + { + throw std::runtime_error(fmt::format("Failed writing to temporary file '{}': {} ({})", + Workload->TempFile.GetPath(), + Ec.message(), + Ec.value())); + } + uint64_t BytesWritten = PartPayload.GetSize(); + m_Stats.TotalBytesWritten += BytesWritten; + const bool IsLastPart = Workload->PartsLeft.fetch_sub(1) == 1; + if (IsLastPart) + { + Workload->TempFile.Flush(); + ZEN_ASSERT_SLOW(ValidateCompressedBuffer(RawHash, CompositeBuffer(Workload->TempFile.ReadAll()))); + Workload->TempFile.MoveTemporaryIntoPlace(BlockPath, Ec); + if (Ec) + { + throw std::runtime_error(fmt::format("Failed moving temporary file '{}' to '{}': {} ({})", + Workload->TempFile.GetPath(), + BlockPath, + Ec.message(), + Ec.value())); + } + } + Workload->OnSentBytes(BytesWritten, IsLastPart); + SimulateLatency(0, 0); + }); + + Offset += Size; + } + 
Workload->PartsLeft.store(WorkItems.size()); + + SimulateLatency(0, 0); + return WorkItems; + } + SimulateLatency(0, 0); + return {}; + } + + virtual IoBuffer GetBuildBlob(const Oid& BuildId, const IoHash& RawHash) override + { + ZEN_UNUSED(BuildId); + SimulateLatency(0, 0); + Stopwatch ExecutionTimer; + auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); }); + m_Stats.TotalRequestCount++; + + const std::filesystem::path BlockPath = GetBlobPayloadPath(RawHash); + if (std::filesystem::is_regular_file(BlockPath)) + { + IoBuffer Payload = ReadFile(BlockPath).Flatten(); + ZEN_ASSERT_SLOW(ValidateCompressedBuffer(RawHash, CompositeBuffer(SharedBuffer(Payload)))); + m_Stats.TotalBytesRead += Payload.GetSize(); + Payload.SetContentType(ZenContentType::kCompressedBinary); + SimulateLatency(0, Payload.GetSize()); + return Payload; + } + SimulateLatency(0, 0); + return IoBuffer{}; + } + + virtual std::vector<std::function<void()>> GetLargeBuildBlob( + const Oid& BuildId, + const IoHash& RawHash, + uint64_t ChunkSize, + std::function<void(uint64_t Offset, const IoBuffer& Chunk, uint64_t BytesRemaining)>&& Receiver) override + { + ZEN_UNUSED(BuildId); + SimulateLatency(0, 0); + Stopwatch ExecutionTimer; + auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); }); + m_Stats.TotalRequestCount++; + + const std::filesystem::path BlockPath = GetBlobPayloadPath(RawHash); + if (std::filesystem::is_regular_file(BlockPath)) + { + struct WorkloadData + { + std::atomic<uint64_t> BytesRemaining; + IoBuffer BlobFile; + std::function<void(uint64_t Offset, const IoBuffer& Chunk, uint64_t BytesRemaining)> Receiver; + }; + + std::shared_ptr<WorkloadData> Workload(std::make_shared<WorkloadData>()); + Workload->BlobFile = IoBufferBuilder::MakeFromFile(BlockPath); + const uint64_t BlobSize = Workload->BlobFile.GetSize(); + + Workload->Receiver = std::move(Receiver); + Workload->BytesRemaining = BlobSize; + + 
std::vector<std::function<void()>> WorkItems; + uint64_t Offset = 0; + while (Offset < BlobSize) + { + uint64_t Size = Min(ChunkSize, BlobSize - Offset); + WorkItems.push_back([this, BlockPath, Workload, Offset, Size]() { + SimulateLatency(0, 0); + IoBuffer PartPayload(Workload->BlobFile, Offset, Size); + m_Stats.TotalBytesRead += PartPayload.GetSize(); + uint64_t ByteRemaning = Workload->BytesRemaining.fetch_sub(Size); + Workload->Receiver(Offset, PartPayload, ByteRemaning); + SimulateLatency(Size, PartPayload.GetSize()); + }); + + Offset += Size; + } + SimulateLatency(0, 0); + return WorkItems; + } + return {}; + } + + virtual void PutBlockMetadata(const Oid& BuildId, const IoHash& BlockRawHash, const CbObject& MetaData) override + { + ZEN_UNUSED(BuildId); + + SimulateLatency(MetaData.GetSize(), 0); + + Stopwatch ExecutionTimer; + auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); }); + m_Stats.TotalRequestCount++; + + const std::filesystem::path BlockMetaDataPath = GetBlobMetadataPath(BlockRawHash); + CreateDirectories(BlockMetaDataPath.parent_path()); + TemporaryFile::SafeWriteFile(BlockMetaDataPath, MetaData.GetView()); + m_Stats.TotalBytesWritten += MetaData.GetSize(); + WriteAsJson(BlockMetaDataPath, MetaData); + SimulateLatency(0, 0); + } + + virtual std::vector<ChunkBlockDescription> FindBlocks(const Oid& BuildId) override + { + ZEN_UNUSED(BuildId); + SimulateLatency(0, 0); + Stopwatch ExecutionTimer; + auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); }); + m_Stats.TotalRequestCount++; + + DirectoryContent Content; + GetDirectoryContent(GetBlobsMetadataFolder(), DirectoryContentFlags::IncludeFiles, Content); + std::vector<ChunkBlockDescription> Result; + for (const std::filesystem::path& MetaDataFile : Content.Files) + { + IoHash ChunkHash; + if (IoHash::TryParse(MetaDataFile.stem().string(), ChunkHash)) + { + std::filesystem::path BlockPath = 
GetBlobPayloadPath(ChunkHash);
+                if (std::filesystem::is_regular_file(BlockPath))
+                {
+                    IoBuffer BlockMetaDataPayload = ReadFile(MetaDataFile).Flatten();
+
+                    m_Stats.TotalBytesRead += BlockMetaDataPayload.GetSize();
+
+                    CbObject BlockObject = CbObject(SharedBuffer(BlockMetaDataPayload));
+                    Result.emplace_back(ParseChunkBlockDescription(BlockObject));
+                }
+            }
+        }
+        SimulateLatency(0, sizeof(IoHash) * Result.size());
+        return Result;
+    }
+
+    virtual std::vector<ChunkBlockDescription> GetBlockMetadata(const Oid& BuildId, std::span<const IoHash> BlockHashes) override
+    {
+        ZEN_UNUSED(BuildId);
+        SimulateLatency(0, 0);
+        Stopwatch ExecutionTimer;
+        auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); });
+        m_Stats.TotalRequestCount++;
+
+        std::vector<ChunkBlockDescription> Result;
+        for (const IoHash& BlockHash : BlockHashes)
+        {
+            std::filesystem::path MetaDataFile = GetBlobMetadataPath(BlockHash);
+            if (std::filesystem::is_regular_file(MetaDataFile))
+            {
+                IoBuffer BlockMetaDataPayload = ReadFile(MetaDataFile).Flatten();
+
+                m_Stats.TotalBytesRead += BlockMetaDataPayload.GetSize();
+
+                CbObject BlockObject = CbObject(SharedBuffer(BlockMetaDataPayload));
+                Result.emplace_back(ParseChunkBlockDescription(BlockObject));
+            }
+        }
+        SimulateLatency(sizeof(IoHash) * BlockHashes.size(), sizeof(ChunkBlockDescription) * Result.size());
+        return Result;
+    }
+
+protected:
+    std::filesystem::path GetBuildsFolder() const { return m_StoragePath / "builds"; }
+    std::filesystem::path GetBlobsFolder() const { return m_StoragePath / "blobs"; }
+    std::filesystem::path GetBlobsMetadataFolder() const { return m_StoragePath / "blocks"; }
+    std::filesystem::path GetBuildFolder(const Oid& BuildId) const { return GetBuildsFolder() / BuildId.ToString(); }
+
+    std::filesystem::path GetBuildPath(const Oid& BuildId) const { return GetBuildFolder(BuildId) / "metadata.cb"; }
+
+    std::filesystem::path GetBuildPartFolder(const Oid& BuildId, const Oid&
BuildPartId) const + { + return GetBuildFolder(BuildId) / "parts" / BuildPartId.ToString(); + } + + std::filesystem::path GetBuildPartPath(const Oid& BuildId, const Oid& BuildPartId) const + { + return GetBuildPartFolder(BuildId, BuildPartId) / "metadata.cb"; + } + + std::filesystem::path GetBlobPayloadPath(const IoHash& RawHash) const { return GetBlobsFolder() / fmt::format("{}.cbz", RawHash); } + + std::filesystem::path GetBlobMetadataPath(const IoHash& RawHash) const + { + return GetBlobsMetadataFolder() / fmt::format("{}.cb", RawHash); + } + + void SimulateLatency(uint64_t ReceiveSize, uint64_t SendSize) + { + double SleepSec = m_LatencySec; + if (m_DelayPerKBSec > 0.0) + { + SleepSec += m_DelayPerKBSec * (double(SendSize + ReceiveSize) / 1024u); + } + if (SleepSec > 0) + { + Sleep(int(SleepSec * 1000)); + } + } + + void WriteAsJson(const std::filesystem::path& OriginalPath, CbObjectView Data) const + { + if (m_EnableJsonOutput) + { + ExtendableStringBuilder<128> SB; + CompactBinaryToJson(Data, SB); + std::filesystem::path JsonPath = OriginalPath; + JsonPath.replace_extension(".json"); + std::string_view JsonMetaData = SB.ToView(); + TemporaryFile::SafeWriteFile(JsonPath, MemoryView(JsonMetaData.data(), JsonMetaData.length())); + } + } + + void WriteBuild(const Oid& BuildId, CbObjectView Data) + { + const std::filesystem::path BuildDataPath = GetBuildPath(BuildId); + CreateDirectories(BuildDataPath.parent_path()); + TemporaryFile::SafeWriteFile(BuildDataPath, Data.GetView()); + m_Stats.TotalBytesWritten += Data.GetSize(); + WriteAsJson(BuildDataPath, Data); + } + + CbObject ReadBuild(const Oid& BuildId) + { + const std::filesystem::path BuildDataPath = GetBuildPath(BuildId); + FileContents Content = ReadFile(BuildDataPath); + if (Content.ErrorCode) + { + throw std::runtime_error(fmt::format("Failed reading build '{}' from '{}': {} ({})", + BuildId, + BuildDataPath, + Content.ErrorCode.message(), + Content.ErrorCode.value())); + } + IoBuffer Payload = 
Content.Flatten(); + m_Stats.TotalBytesRead += Payload.GetSize(); + ZEN_ASSERT(ValidateCompactBinary(Payload.GetView(), CbValidateMode::Default) == CbValidateError::None); + CbObject BuildObject = CbObject(SharedBuffer(Payload)); + return BuildObject; + } + + std::vector<IoHash> GetNeededAttachments(CbObjectView BuildPartObject) + { + std::vector<IoHash> NeededAttachments; + BuildPartObject.IterateAttachments([&](CbFieldView FieldView) { + const IoHash AttachmentHash = FieldView.AsBinaryAttachment(); + const std::filesystem::path BlockPath = GetBlobPayloadPath(AttachmentHash); + if (!std::filesystem::is_regular_file(BlockPath)) + { + NeededAttachments.push_back(AttachmentHash); + } + }); + return NeededAttachments; + } + + bool ValidateCompressedBuffer(const IoHash& RawHash, const CompositeBuffer& Payload) + { + IoHash VerifyHash; + uint64_t VerifySize; + CompressedBuffer ValidateBuffer = CompressedBuffer::FromCompressed(Payload, VerifyHash, VerifySize); + if (!ValidateBuffer) + { + return false; + } + if (VerifyHash != RawHash) + { + return false; + } + CompositeBuffer Decompressed = ValidateBuffer.DecompressToComposite(); + if (!Decompressed) + { + return false; + } + IoHash Hash = IoHash::HashBuffer(Decompressed); + if (Hash != RawHash) + { + return false; + } + return true; + } + +private: + const std::filesystem::path m_StoragePath; + BuildStorage::Statistics& m_Stats; + const bool m_EnableJsonOutput = false; + std::atomic<uint64_t> m_WrittenBytes; + + const double m_LatencySec = 0.0; + const double m_DelayPerKBSec = 0.0; +}; + +std::unique_ptr<BuildStorage> +CreateFileBuildStorage(const std::filesystem::path& StoragePath, + BuildStorage::Statistics& Stats, + bool EnableJsonOutput, + double LatencySec, + double DelayPerKBSec) +{ + return std::make_unique<FileBuildStorage>(StoragePath, Stats, EnableJsonOutput, LatencySec, DelayPerKBSec); +} + +} // namespace zen diff --git a/src/zenutil/include/zenutil/buildstorage.h b/src/zenutil/include/zenutil/buildstorage.h 
new file mode 100644 index 000000000..9c236310f --- /dev/null +++ b/src/zenutil/include/zenutil/buildstorage.h @@ -0,0 +1,55 @@ +// Copyright Epic Games, Inc. All Rights Reserved. + +#pragma once + +#include <zencore/compactbinary.h> +#include <zenutil/chunkblock.h> + +namespace zen { + +class BuildStorage +{ +public: + struct Statistics + { + std::atomic<uint64_t> TotalBytesRead = 0; + std::atomic<uint64_t> TotalBytesWritten = 0; + std::atomic<uint64_t> TotalRequestCount = 0; + std::atomic<uint64_t> TotalRequestTimeUs = 0; + std::atomic<uint64_t> TotalExecutionTimeUs = 0; + }; + + virtual ~BuildStorage() {} + + virtual CbObject ListBuilds(CbObject Query) = 0; + virtual CbObject PutBuild(const Oid& BuildId, const CbObject& MetaData) = 0; + virtual CbObject GetBuild(const Oid& BuildId) = 0; + virtual void FinalizeBuild(const Oid& BuildId) = 0; + + virtual std::pair<IoHash, std::vector<IoHash>> PutBuildPart(const Oid& BuildId, + const Oid& BuildPartId, + std::string_view PartName, + const CbObject& MetaData) = 0; + virtual CbObject GetBuildPart(const Oid& BuildId, const Oid& BuildPartId) = 0; + virtual std::vector<IoHash> FinalizeBuildPart(const Oid& BuildId, const Oid& BuildPartId, const IoHash& PartHash) = 0; + virtual void PutBuildBlob(const Oid& BuildId, const IoHash& RawHash, ZenContentType ContentType, const CompositeBuffer& Payload) = 0; + virtual std::vector<std::function<void()>> PutLargeBuildBlob(const Oid& BuildId, + const IoHash& RawHash, + ZenContentType ContentType, + uint64_t PayloadSize, + std::function<IoBuffer(uint64_t Offset, uint64_t Size)>&& Transmitter, + std::function<void(uint64_t, bool)>&& OnSentBytes) = 0; + + virtual IoBuffer GetBuildBlob(const Oid& BuildId, const IoHash& RawHash) = 0; + virtual std::vector<std::function<void()>> GetLargeBuildBlob( + const Oid& BuildId, + const IoHash& RawHash, + uint64_t ChunkSize, + std::function<void(uint64_t Offset, const IoBuffer& Chunk, uint64_t BytesRemaining)>&& Receiver) = 0; + + virtual void 
PutBlockMetadata(const Oid& BuildId, const IoHash& BlockRawHash, const CbObject& MetaData) = 0; + virtual std::vector<ChunkBlockDescription> FindBlocks(const Oid& BuildId) = 0; + virtual std::vector<ChunkBlockDescription> GetBlockMetadata(const Oid& BuildId, std::span<const IoHash> BlockHashes) = 0; +}; + +} // namespace zen diff --git a/src/zenutil/include/zenutil/chunkblock.h b/src/zenutil/include/zenutil/chunkblock.h index 9b7414629..21107fb7c 100644 --- a/src/zenutil/include/zenutil/chunkblock.h +++ b/src/zenutil/include/zenutil/chunkblock.h @@ -12,21 +12,28 @@ namespace zen { -struct ChunkBlockDescription +struct ThinChunkBlockDescription { - IoHash BlockHash; - std::vector<IoHash> ChunkHashes; + IoHash BlockHash; + std::vector<IoHash> ChunkRawHashes; +}; + +struct ChunkBlockDescription : public ThinChunkBlockDescription +{ + uint64_t HeaderSize; std::vector<uint32_t> ChunkRawLengths; + std::vector<uint32_t> ChunkCompressedLengths; }; std::vector<ChunkBlockDescription> ParseChunkBlockDescriptionList(const CbObjectView& BlocksObject); ChunkBlockDescription ParseChunkBlockDescription(const CbObjectView& BlockObject); CbObject BuildChunkBlockDescription(const ChunkBlockDescription& Block, CbObjectView MetaData); - +ChunkBlockDescription GetChunkBlockDescription(const SharedBuffer& BlockPayload, const IoHash& RawHash); typedef std::function<std::pair<uint64_t, CompressedBuffer>(const IoHash& RawHash)> FetchChunkFunc; CompressedBuffer GenerateChunkBlock(std::vector<std::pair<IoHash, FetchChunkFunc>>&& FetchChunks, ChunkBlockDescription& OutBlock); bool IterateChunkBlock(const SharedBuffer& BlockPayload, - std::function<void(CompressedBuffer&& Chunk, const IoHash& AttachmentHash)> Visitor); + std::function<void(CompressedBuffer&& Chunk, const IoHash& AttachmentHash)> Visitor, + uint64_t& OutHeaderSize); } // namespace zen diff --git a/src/zenutil/include/zenutil/chunkedcontent.h b/src/zenutil/include/zenutil/chunkedcontent.h new file mode 100644 index 
000000000..15c687462 --- /dev/null +++ b/src/zenutil/include/zenutil/chunkedcontent.h @@ -0,0 +1,256 @@ +// Copyright Epic Games, Inc. All Rights Reserved. + +#pragma once + +#include <zencore/compactbinary.h> +#include <zencore/compactbinarybuilder.h> +#include <zencore/iohash.h> + +#include <filesystem> +#include <vector> + +ZEN_THIRD_PARTY_INCLUDES_START +#include <tsl/robin_map.h> +ZEN_THIRD_PARTY_INCLUDES_END + +namespace zen { + +class CbWriter; +class ChunkingController; +class WorkerThreadPool; + +enum class SourcePlatform +{ + Windows = 0, + Linux = 1, + MacOS = 2, + _Count +}; + +std::string_view ToString(SourcePlatform Platform); +SourcePlatform FromString(std::string_view Platform, SourcePlatform Default); +SourcePlatform GetSourceCurrentPlatform(); + +struct FolderContent +{ + SourcePlatform Platform = GetSourceCurrentPlatform(); + std::vector<std::filesystem::path> Paths; + std::vector<uint64_t> RawSizes; + std::vector<uint32_t> Attributes; + std::vector<uint64_t> ModificationTicks; + + bool operator==(const FolderContent& Rhs) const; + + bool AreKnownFilesEqual(const FolderContent& Rhs) const; + void UpdateState(const FolderContent& Rhs, std::vector<uint32_t>& PathIndexesOutOfDate); + static bool AreFileAttributesEqual(const uint32_t Lhs, const uint32_t Rhs); +}; + +FolderContent GetUpdatedContent(const FolderContent& Old, + const FolderContent& New, + std::vector<std::filesystem::path>& OutDeletedPaths); + +void SaveFolderContentToCompactBinary(const FolderContent& Content, CbWriter& Output); +FolderContent LoadFolderContentToCompactBinary(CbObjectView Input); + +struct GetFolderContentStatistics +{ + std::atomic<uint64_t> FoundFileCount = 0; + std::atomic<uint64_t> FoundFileByteCount = 0; + std::atomic<uint64_t> AcceptedFileCount = 0; + std::atomic<uint64_t> AcceptedFileByteCount = 0; + uint64_t ElapsedWallTimeUS = 0; +}; + +FolderContent GetFolderContent(GetFolderContentStatistics& Stats, + const std::filesystem::path& RootPath, + 
std::function<bool(const std::string_view& RelativePath)>&& AcceptDirectory, + std::function<bool(std::string_view RelativePath, uint64_t Size, uint32_t Attributes)>&& AcceptFile, + WorkerThreadPool& WorkerPool, + int32_t UpdateIntervalMS, + std::function<void(bool IsAborted, std::ptrdiff_t PendingWork)>&& UpdateCallback, + std::atomic<bool>& AbortFlag); + +struct ChunkedContentData +{ + // To describe one asset with a particular RawHash, find the index of the hash in SequenceRawHashes. + // ChunkCounts for that index will be the number of indexes in ChunkOrders that describe + // the sequence of chunks required to reconstruct the asset. + // Offset into ChunkOrders is based on how many entries in ChunkOrders the previous [n - 1] SequenceRawHashes use + std::vector<IoHash> SequenceRawHashes; // Raw hash for Chunk sequence + std::vector<uint32_t> ChunkCounts; // Chunk count of ChunkOrder for SequenceRawHashes[n] + std::vector<uint32_t> ChunkOrders; // Chunk sequence indexed into ChunkHashes, ChunkCounts[n] indexes per SequenceRawHashes[n] + std::vector<IoHash> ChunkHashes; // Unique chunk hashes + std::vector<uint64_t> ChunkRawSizes; // Unique chunk raw size for ChunkHash[n] +}; + +struct ChunkedFolderContent +{ + SourcePlatform Platform = GetSourceCurrentPlatform(); + std::vector<std::filesystem::path> Paths; + std::vector<uint64_t> RawSizes; + std::vector<uint32_t> Attributes; + std::vector<IoHash> RawHashes; + ChunkedContentData ChunkedContent; +}; + +void SaveChunkedFolderContentToCompactBinary(const ChunkedFolderContent& Content, CbWriter& Output); +ChunkedFolderContent LoadChunkedFolderContentToCompactBinary(CbObjectView Input); + +ChunkedFolderContent MergeChunkedFolderContents(const ChunkedFolderContent& Base, std::span<const ChunkedFolderContent> Overlays); +ChunkedFolderContent DeletePathsFromChunkedContent(const ChunkedFolderContent& Base, std::span<const std::filesystem::path> DeletedPaths); + +struct ChunkingStatistics +{ + std::atomic<uint64_t> 
FilesProcessed = 0; + std::atomic<uint64_t> FilesChunked = 0; + std::atomic<uint64_t> BytesHashed = 0; + std::atomic<uint64_t> UniqueChunksFound = 0; + std::atomic<uint64_t> UniqueSequencesFound = 0; + std::atomic<uint64_t> UniqueBytesFound = 0; + uint64_t ElapsedWallTimeUS = 0; +}; + +ChunkedFolderContent ChunkFolderContent(ChunkingStatistics& Stats, + WorkerThreadPool& WorkerPool, + const std::filesystem::path& RootPath, + const FolderContent& Content, + const ChunkingController& InChunkingController, + int32_t UpdateIntervalMS, + std::function<void(bool IsAborted, std::ptrdiff_t PendingWork)>&& UpdateCallback, + std::atomic<bool>& AbortFlag); + +struct ChunkedContentLookup +{ + struct ChunkLocation + { + uint32_t PathIndex; + uint64_t Offset; + }; + tsl::robin_map<IoHash, uint32_t, IoHash::Hasher> ChunkHashToChunkIndex; + tsl::robin_map<IoHash, uint32_t, IoHash::Hasher> RawHashToSequenceRawHashIndex; + std::vector<uint32_t> SequenceRawHashIndexChunkOrderOffset; + std::vector<ChunkLocation> ChunkLocations; + std::vector<size_t> ChunkLocationOffset; // ChunkLocations[ChunkLocationOffset[ChunkIndex]] -> start of sources for ChunkIndex + std::vector<uint32_t> ChunkLocationCounts; // ChunkLocationCounts[ChunkIndex] count of chunk locations for ChunkIndex +}; + +ChunkedContentLookup BuildChunkedContentLookup(const ChunkedFolderContent& Content); + +inline std::pair<size_t, uint32_t> +GetChunkLocationRange(const ChunkedContentLookup& Lookup, uint32_t ChunkIndex) +{ + return std::make_pair(Lookup.ChunkLocationOffset[ChunkIndex], Lookup.ChunkLocationCounts[ChunkIndex]); +} + +inline std::span<const ChunkedContentLookup::ChunkLocation> +GetChunkLocations(const ChunkedContentLookup& Lookup, uint32_t ChunkIndex) +{ + std::pair<size_t, uint32_t> Range = GetChunkLocationRange(Lookup, ChunkIndex); + return std::span<const ChunkedContentLookup::ChunkLocation>(Lookup.ChunkLocations).subspan(Range.first, Range.second); +} + +namespace compactbinary_helpers { + template<typename 
Type> + void WriteArray(std::span<const Type> Values, std::string_view ArrayName, CbWriter& Output) + { + Output.BeginArray(ArrayName); + for (const Type Value : Values) + { + Output << Value; + } + Output.EndArray(); + } + + template<typename Type> + void WriteArray(const std::vector<Type>& Values, std::string_view ArrayName, CbWriter& Output) + { + WriteArray(std::span<const Type>(Values), ArrayName, Output); + } + + template<> + inline void WriteArray(std::span<const std::filesystem::path> Values, std::string_view ArrayName, CbWriter& Output) + { + Output.BeginArray(ArrayName); + for (const std::filesystem::path& Path : Values) + { + Output.AddString((const char*)Path.generic_u8string().c_str()); + } + Output.EndArray(); + } + + template<> + inline void WriteArray(const std::vector<std::filesystem::path>& Values, std::string_view ArrayName, CbWriter& Output) + { + WriteArray(std::span<const std::filesystem::path>(Values), ArrayName, Output); + } + + inline void WriteBinaryAttachmentArray(std::span<const IoHash> Values, std::string_view ArrayName, CbWriter& Output) + { + Output.BeginArray(ArrayName); + for (const IoHash& Hash : Values) + { + Output.AddBinaryAttachment(Hash); + } + Output.EndArray(); + } + + inline void WriteBinaryAttachmentArray(const std::vector<IoHash>& Values, std::string_view ArrayName, CbWriter& Output) + { + WriteBinaryAttachmentArray(std::span<const IoHash>(Values), ArrayName, Output); + } + + inline void ReadArray(std::string_view ArrayName, CbObjectView Input, std::vector<uint32_t>& Result) + { + CbArrayView Array = Input[ArrayName].AsArrayView(); + Result.reserve(Array.Num()); + for (CbFieldView ItemView : Array) + { + Result.push_back(ItemView.AsUInt32()); + } + } + + inline void ReadArray(std::string_view ArrayName, CbObjectView Input, std::vector<uint64_t>& Result) + { + CbArrayView Array = Input[ArrayName].AsArrayView(); + Result.reserve(Array.Num()); + for (CbFieldView ItemView : Array) + { + Result.push_back(ItemView.AsUInt64()); + } + } + + 
inline void ReadArray(std::string_view ArrayName, CbObjectView Input, std::vector<std::filesystem::path>& Result) + { + CbArrayView Array = Input[ArrayName].AsArrayView(); + Result.reserve(Array.Num()); + for (CbFieldView ItemView : Array) + { + std::u8string_view U8Path = ItemView.AsU8String(); + Result.push_back(std::filesystem::path(U8Path)); + } + } + + inline void ReadArray(std::string_view ArrayName, CbObjectView Input, std::vector<IoHash>& Result) + { + CbArrayView Array = Input[ArrayName].AsArrayView(); + Result.reserve(Array.Num()); + for (CbFieldView ItemView : Array) + { + Result.push_back(ItemView.AsHash()); + } + } + + inline void ReadBinaryAttachmentArray(std::string_view ArrayName, CbObjectView Input, std::vector<IoHash>& Result) + { + CbArrayView Array = Input[ArrayName].AsArrayView(); + Result.reserve(Array.Num()); + for (CbFieldView ItemView : Array) + { + Result.push_back(ItemView.AsBinaryAttachment()); + } + } + +} // namespace compactbinary_helpers + +} // namespace zen diff --git a/src/zenutil/include/zenutil/chunkingcontroller.h b/src/zenutil/include/zenutil/chunkingcontroller.h new file mode 100644 index 000000000..fe4fc1bb5 --- /dev/null +++ b/src/zenutil/include/zenutil/chunkingcontroller.h @@ -0,0 +1,55 @@ +// Copyright Epic Games, Inc. All Rights Reserved. 
+ +#pragma once + +#include <zencore/compactbinary.h> + +#include <zenutil/chunkedfile.h> + +#include <atomic> +#include <filesystem> + +namespace zen { + +const std::vector<std::string_view> DefaultChunkingExcludeExtensions = {".exe", ".dll", ".pdb", ".self"}; + +const ChunkedParams DefaultChunkedParams = {.MinSize = ((8u * 1u) * 1024u) - 128u, + .MaxSize = 128u * 1024u, + .AvgSize = ((8u * 4u) * 1024u) + 128u}; + +const size_t DefaultChunkingFileSizeLimit = DefaultChunkedParams.MaxSize; + +const uint32_t DefaultFixedChunkingChunkSize = 16u * 1024u * 1024u; + +struct ChunkedInfoWithSource; + +class ChunkingController +{ +public: + virtual ~ChunkingController() {} + + // Return true if the input file was processed. If true is returned OutChunked will contain the chunked info + virtual bool ProcessFile(const std::filesystem::path& InputPath, + uint64_t RawSize, + ChunkedInfoWithSource& OutChunked, + std::atomic<uint64_t>& BytesProcessed) const = 0; + virtual std::string_view GetName() const = 0; + virtual CbObject GetParameters() const = 0; +}; + +std::unique_ptr<ChunkingController> CreateBasicChunkingController( + std::span<const std::string_view> ExcludeExtensions = DefaultChunkingExcludeExtensions, + uint64_t ChunkFileSizeLimit = DefaultChunkingFileSizeLimit, + const ChunkedParams& ChunkingParams = DefaultChunkedParams); +std::unique_ptr<ChunkingController> CreateBasicChunkingController(CbObjectView Parameters); + +std::unique_ptr<ChunkingController> CreateChunkingControllerWithFixedChunking( + std::span<const std::string_view> ExcludeExtensions = DefaultChunkingExcludeExtensions, + uint64_t ChunkFileSizeLimit = DefaultChunkingFileSizeLimit, + const ChunkedParams& ChunkingParams = DefaultChunkedParams, + uint32_t FixedChunkingChunkSize = DefaultFixedChunkingChunkSize); +std::unique_ptr<ChunkingController> CreateChunkingControllerWithFixedChunking(CbObjectView Parameters); + +std::unique_ptr<ChunkingController> CreateChunkingController(std::string_view Name, 
CbObjectView Parameters); + +} // namespace zen diff --git a/src/zenutil/include/zenutil/filebuildstorage.h b/src/zenutil/include/zenutil/filebuildstorage.h new file mode 100644 index 000000000..c95fb32e6 --- /dev/null +++ b/src/zenutil/include/zenutil/filebuildstorage.h @@ -0,0 +1,16 @@ +// Copyright Epic Games, Inc. All Rights Reserved. + +#pragma once + +#include <zencore/logging.h> +#include <zenutil/buildstorage.h> + +namespace zen { +class HttpClient; + +std::unique_ptr<BuildStorage> CreateFileBuildStorage(const std::filesystem::path& StoragePath, + BuildStorage::Statistics& Stats, + bool EnableJsonOutput, + double LatencySec = 0.0, + double DelayPerKBSec = 0.0); +} // namespace zen diff --git a/src/zenutil/include/zenutil/jupiter/jupiterbuildstorage.h b/src/zenutil/include/zenutil/jupiter/jupiterbuildstorage.h new file mode 100644 index 000000000..89fc70140 --- /dev/null +++ b/src/zenutil/include/zenutil/jupiter/jupiterbuildstorage.h @@ -0,0 +1,17 @@ +// Copyright Epic Games, Inc. All Rights Reserved. + +#pragma once + +#include <zencore/logging.h> +#include <zenutil/buildstorage.h> + +namespace zen { +class HttpClient; + +std::unique_ptr<BuildStorage> CreateJupiterBuildStorage(LoggerRef InLog, + HttpClient& InHttpClient, + BuildStorage::Statistics& Stats, + std::string_view Namespace, + std::string_view Bucket, + const std::filesystem::path& TempFolderPath); +} // namespace zen diff --git a/src/zenutil/include/zenutil/parallellwork.h b/src/zenutil/include/zenutil/parallellwork.h new file mode 100644 index 000000000..7a8218c51 --- /dev/null +++ b/src/zenutil/include/zenutil/parallellwork.h @@ -0,0 +1,69 @@ +// Copyright Epic Games, Inc. All Rights Reserved. 
+ +#pragma once + +#include <zencore/thread.h> +#include <zencore/workthreadpool.h> + +#include <atomic> + +namespace zen { + +class ParallellWork +{ +public: + ParallellWork(std::atomic<bool>& AbortFlag) : m_AbortFlag(AbortFlag), m_PendingWork(1) {} + + ~ParallellWork() + { + // Make sure to call Wait before destroying + ZEN_ASSERT(m_PendingWork.Remaining() == 0); + } + + void ScheduleWork(WorkerThreadPool& WorkerPool, + std::function<void(std::atomic<bool>& AbortFlag)>&& Work, + std::function<void(const std::exception& Ex, std::atomic<bool>& AbortFlag)>&& OnError) + { + m_PendingWork.AddCount(1); + try + { + WorkerPool.ScheduleWork([this, Work = std::move(Work), OnError = std::move(OnError)] { + try + { + Work(m_AbortFlag); + } + catch (const std::exception& Ex) + { + OnError(Ex, m_AbortFlag); + } + m_PendingWork.CountDown(); + }); + } + catch (const std::exception&) + { + m_PendingWork.CountDown(); + throw; + } + } + + void Abort() { m_AbortFlag = true; } + + bool IsAborted() const { return m_AbortFlag.load(); } + + void Wait(int32_t UpdateIntervalMS, std::function<void(bool IsAborted, std::ptrdiff_t PendingWork)>&& UpdateCallback) + { + ZEN_ASSERT(m_PendingWork.Remaining() > 0); + m_PendingWork.CountDown(); + while (!m_PendingWork.Wait(UpdateIntervalMS)) + { + UpdateCallback(m_AbortFlag.load(), m_PendingWork.Remaining()); + } + } + Latch& PendingWork() { return m_PendingWork; } + +private: + std::atomic<bool>& m_AbortFlag; + Latch m_PendingWork; +}; + +} // namespace zen diff --git a/src/zenutil/jupiter/jupiterbuildstorage.cpp b/src/zenutil/jupiter/jupiterbuildstorage.cpp new file mode 100644 index 000000000..481e9146f --- /dev/null +++ b/src/zenutil/jupiter/jupiterbuildstorage.cpp @@ -0,0 +1,371 @@ +// Copyright Epic Games, Inc. All Rights Reserved. 
+ +#include <zenutil/jupiter/jupiterbuildstorage.h> + +#include <zencore/compactbinarybuilder.h> +#include <zencore/fmtutils.h> +#include <zencore/scopeguard.h> +#include <zencore/timer.h> +#include <zenutil/jupiter/jupitersession.h> + +ZEN_THIRD_PARTY_INCLUDES_START +#include <tsl/robin_map.h> +ZEN_THIRD_PARTY_INCLUDES_END + +namespace zen { + +using namespace std::literals; + +class JupiterBuildStorage : public BuildStorage +{ +public: + JupiterBuildStorage(LoggerRef InLog, + HttpClient& InHttpClient, + Statistics& Stats, + std::string_view Namespace, + std::string_view Bucket, + const std::filesystem::path& TempFolderPath) + : m_Session(InLog, InHttpClient) + , m_Stats(Stats) + , m_Namespace(Namespace) + , m_Bucket(Bucket) + , m_TempFolderPath(TempFolderPath) + { + } + virtual ~JupiterBuildStorage() {} + + virtual CbObject ListBuilds(CbObject Query) override + { + Stopwatch ExecutionTimer; + auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); }); + IoBuffer Payload = Query.GetBuffer().AsIoBuffer(); + Payload.SetContentType(ZenContentType::kCbObject); + JupiterResult ListResult = m_Session.ListBuilds(m_Namespace, m_Bucket, Payload); + AddStatistic(ListResult); + if (!ListResult.Success) + { + throw std::runtime_error(fmt::format("Failed listing builds: {} ({})", ListResult.Reason, ListResult.ErrorCode)); + } + return PayloadToJson("Failed listing builds"sv, ListResult.Response); + } + + virtual CbObject PutBuild(const Oid& BuildId, const CbObject& MetaData) override + { + Stopwatch ExecutionTimer; + auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); }); + IoBuffer Payload = MetaData.GetBuffer().AsIoBuffer(); + Payload.SetContentType(ZenContentType::kCbObject); + JupiterResult PutResult = m_Session.PutBuild(m_Namespace, m_Bucket, BuildId, Payload); + AddStatistic(PutResult); + if (!PutResult.Success) + { + throw std::runtime_error(fmt::format("Failed creating build: {} ({})", 
PutResult.Reason, PutResult.ErrorCode)); + } + return PayloadToJson(fmt::format("Failed creating build: {}", BuildId), PutResult.Response); + } + + virtual CbObject GetBuild(const Oid& BuildId) override + { + Stopwatch ExecutionTimer; + auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); }); + JupiterResult GetBuildResult = m_Session.GetBuild(m_Namespace, m_Bucket, BuildId); + AddStatistic(GetBuildResult); + if (!GetBuildResult.Success) + { + throw std::runtime_error(fmt::format("Failed fetching build: {} ({})", GetBuildResult.Reason, GetBuildResult.ErrorCode)); + } + return PayloadToJson(fmt::format("Failed fetching build {}:", BuildId), GetBuildResult.Response); + } + + virtual void FinalizeBuild(const Oid& BuildId) override + { + Stopwatch ExecutionTimer; + auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); }); + JupiterResult FinalizeBuildResult = m_Session.FinalizeBuild(m_Namespace, m_Bucket, BuildId); + AddStatistic(FinalizeBuildResult); + if (!FinalizeBuildResult.Success) + { + throw std::runtime_error( + fmt::format("Failed finalizing build part: {} ({})", FinalizeBuildResult.Reason, FinalizeBuildResult.ErrorCode)); + } + } + + virtual std::pair<IoHash, std::vector<IoHash>> PutBuildPart(const Oid& BuildId, + const Oid& BuildPartId, + std::string_view PartName, + const CbObject& MetaData) override + { + Stopwatch ExecutionTimer; + auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); }); + IoBuffer Payload = MetaData.GetBuffer().AsIoBuffer(); + Payload.SetContentType(ZenContentType::kCbObject); + PutBuildPartResult PutPartResult = m_Session.PutBuildPart(m_Namespace, m_Bucket, BuildId, BuildPartId, PartName, Payload); + AddStatistic(PutPartResult); + if (!PutPartResult.Success) + { + throw std::runtime_error(fmt::format("Failed creating build part: {} ({})", PutPartResult.Reason, PutPartResult.ErrorCode)); + } + return 
std::make_pair(PutPartResult.RawHash, std::move(PutPartResult.Needs)); + } + + virtual CbObject GetBuildPart(const Oid& BuildId, const Oid& BuildPartId) override + { + Stopwatch ExecutionTimer; + auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); }); + JupiterResult GetBuildPartResult = m_Session.GetBuildPart(m_Namespace, m_Bucket, BuildId, BuildPartId); + AddStatistic(GetBuildPartResult); + if (!GetBuildPartResult.Success) + { + throw std::runtime_error(fmt::format("Failed fetching build part {}: {} ({})", + BuildPartId, + GetBuildPartResult.Reason, + GetBuildPartResult.ErrorCode)); + } + return PayloadToJson(fmt::format("Failed fetching build part {}:", BuildPartId), GetBuildPartResult.Response); + } + + virtual std::vector<IoHash> FinalizeBuildPart(const Oid& BuildId, const Oid& BuildPartId, const IoHash& PartHash) override + { + Stopwatch ExecutionTimer; + auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); }); + FinalizeBuildPartResult FinalizePartResult = m_Session.FinalizeBuildPart(m_Namespace, m_Bucket, BuildId, BuildPartId, PartHash); + AddStatistic(FinalizePartResult); + if (!FinalizePartResult.Success) + { + throw std::runtime_error( + fmt::format("Failed finalizing build part: {} ({})", FinalizePartResult.Reason, FinalizePartResult.ErrorCode)); + } + return std::move(FinalizePartResult.Needs); + } + + virtual void PutBuildBlob(const Oid& BuildId, + const IoHash& RawHash, + ZenContentType ContentType, + const CompositeBuffer& Payload) override + { + Stopwatch ExecutionTimer; + auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); }); + JupiterResult PutBlobResult = m_Session.PutBuildBlob(m_Namespace, m_Bucket, BuildId, RawHash, ContentType, Payload); + AddStatistic(PutBlobResult); + if (!PutBlobResult.Success) + { + throw std::runtime_error(fmt::format("Failed putting build part: {} ({})", PutBlobResult.Reason, 
PutBlobResult.ErrorCode));
+        }
+    }
+
+    virtual std::vector<std::function<void()>> PutLargeBuildBlob(const Oid& BuildId,
+                                                                 const IoHash& RawHash,
+                                                                 ZenContentType ContentType,
+                                                                 uint64_t PayloadSize,
+                                                                 std::function<IoBuffer(uint64_t Offset, uint64_t Size)>&& Transmitter,
+                                                                 std::function<void(uint64_t, bool)>&& OnSentBytes) override
+    {
+        Stopwatch ExecutionTimer;
+        auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); });
+        std::vector<std::function<JupiterResult(bool&)>> WorkItems;
+        JupiterResult PutMultipartBlobResult = m_Session.PutMultipartBuildBlob(m_Namespace,
+                                                                               m_Bucket,
+                                                                               BuildId,
+                                                                               RawHash,
+                                                                               ContentType,
+                                                                               PayloadSize,
+                                                                               std::move(Transmitter),
+                                                                               WorkItems);
+        AddStatistic(PutMultipartBlobResult);
+        if (!PutMultipartBlobResult.Success)
+        {
+            throw std::runtime_error(
+                fmt::format("Failed putting build part: {} ({})", PutMultipartBlobResult.Reason, PutMultipartBlobResult.ErrorCode));
+        }
+        OnSentBytes(PutMultipartBlobResult.SentBytes, WorkItems.empty());
+
+        std::vector<std::function<void()>> WorkList;
+        for (auto& WorkItem : WorkItems)
+        {
+            WorkList.emplace_back([this, WorkItem = std::move(WorkItem), OnSentBytes]() {
+                Stopwatch ExecutionTimer;
+                auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); });
+                bool IsComplete = false;
+                JupiterResult PartResult = WorkItem(IsComplete);
+                AddStatistic(PartResult);
+                if (!PartResult.Success)
+                {
+                    throw std::runtime_error(fmt::format("Failed putting build part: {} ({})", PartResult.Reason, PartResult.ErrorCode));
+                }
+                OnSentBytes(PartResult.SentBytes, IsComplete);
+            });
+        }
+        return WorkList;
+    }
+
+    virtual IoBuffer GetBuildBlob(const Oid& BuildId, const IoHash& RawHash) override
+    {
+        Stopwatch ExecutionTimer;
+        auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); });
+        JupiterResult GetBuildBlobResult = m_Session.GetBuildBlob(m_Namespace, m_Bucket, BuildId, RawHash, m_TempFolderPath);
+        AddStatistic(GetBuildBlobResult);
+        if (!GetBuildBlobResult.Success)
+        {
+            throw std::runtime_error(
+                fmt::format("Failed fetching build blob {}: {} ({})", RawHash, GetBuildBlobResult.Reason, GetBuildBlobResult.ErrorCode));
+        }
+        return std::move(GetBuildBlobResult.Response);
+    }
+
+    virtual std::vector<std::function<void()>> GetLargeBuildBlob(
+        const Oid& BuildId,
+        const IoHash& RawHash,
+        uint64_t ChunkSize,
+        std::function<void(uint64_t Offset, const IoBuffer& Chunk, uint64_t BytesRemaining)>&& Receiver) override
+    {
+        Stopwatch ExecutionTimer;
+        auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); });
+        std::vector<std::function<JupiterResult()>> WorkItems;
+        JupiterResult GetMultipartBlobResult =
+            m_Session.GetMultipartBuildBlob(m_Namespace, m_Bucket, BuildId, RawHash, ChunkSize, std::move(Receiver), WorkItems);
+
+        AddStatistic(GetMultipartBlobResult);
+        if (!GetMultipartBlobResult.Success)
+        {
+            throw std::runtime_error(
+                fmt::format("Failed getting build part: {} ({})", GetMultipartBlobResult.Reason, GetMultipartBlobResult.ErrorCode));
+        }
+        std::vector<std::function<void()>> WorkList;
+        for (auto& WorkItem : WorkItems)
+        {
+            WorkList.emplace_back([this, WorkItem = std::move(WorkItem)]() {
+                Stopwatch ExecutionTimer;
+                auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); });
+                JupiterResult PartResult = WorkItem();
+                AddStatistic(PartResult);
+                if (!PartResult.Success)
+                {
+                    throw std::runtime_error(fmt::format("Failed getting build part: {} ({})", PartResult.Reason, PartResult.ErrorCode));
+                }
+            });
+        }
+        return WorkList;
+    }
+
+    virtual void PutBlockMetadata(const Oid& BuildId, const IoHash& BlockRawHash, const CbObject& MetaData) override
+    {
+        Stopwatch ExecutionTimer;
+        auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); });
+        IoBuffer Payload = MetaData.GetBuffer().AsIoBuffer();
+        Payload.SetContentType(ZenContentType::kCbObject);
+        JupiterResult PutMetaResult = m_Session.PutBlockMetadata(m_Namespace, m_Bucket, BuildId, BlockRawHash, Payload);
+        AddStatistic(PutMetaResult);
+        if (!PutMetaResult.Success)
+        {
+            throw std::runtime_error(
+                fmt::format("Failed putting build block metadata: {} ({})", PutMetaResult.Reason, PutMetaResult.ErrorCode));
+        }
+    }
+
+    virtual std::vector<ChunkBlockDescription> FindBlocks(const Oid& BuildId) override
+    {
+        Stopwatch ExecutionTimer;
+        auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); });
+        JupiterResult FindResult = m_Session.FindBlocks(m_Namespace, m_Bucket, BuildId);
+        AddStatistic(FindResult);
+        if (!FindResult.Success)
+        {
+            throw std::runtime_error(fmt::format("Failed fetching known blocks: {} ({})", FindResult.Reason, FindResult.ErrorCode));
+        }
+        return ParseChunkBlockDescriptionList(PayloadToJson("Failed fetching known blocks"sv, FindResult.Response));
+    }
+
+    virtual std::vector<ChunkBlockDescription> GetBlockMetadata(const Oid& BuildId, std::span<const IoHash> BlockHashes) override
+    {
+        Stopwatch ExecutionTimer;
+        auto _ = MakeGuard([&]() { m_Stats.TotalExecutionTimeUs += ExecutionTimer.GetElapsedTimeUs(); });
+        CbObjectWriter Request;
+
+        Request.BeginArray("blocks"sv);
+        for (const IoHash& BlockHash : BlockHashes)
+        {
+            Request.AddHash(BlockHash);
+        }
+        Request.EndArray();
+
+        IoBuffer Payload = Request.Save().GetBuffer().AsIoBuffer();
+        Payload.SetContentType(ZenContentType::kCbObject);
+        JupiterResult GetBlockMetadataResult = m_Session.GetBlockMetadata(m_Namespace, m_Bucket, BuildId, Payload);
+        AddStatistic(GetBlockMetadataResult);
+        if (!GetBlockMetadataResult.Success)
+        {
+            throw std::runtime_error(
+                fmt::format("Failed fetching block metadatas: {} ({})", GetBlockMetadataResult.Reason, GetBlockMetadataResult.ErrorCode));
+        }
+        std::vector<ChunkBlockDescription> UnorderedList =
+            ParseChunkBlockDescriptionList(PayloadToJson("Failed fetching block metadatas", GetBlockMetadataResult.Response));
+        tsl::robin_map<IoHash, size_t, IoHash::Hasher> BlockDescriptionLookup;
+        for (size_t DescriptionIndex = 0; DescriptionIndex < UnorderedList.size(); DescriptionIndex++)
+        {
+            const ChunkBlockDescription& Description = UnorderedList[DescriptionIndex];
+            BlockDescriptionLookup.insert_or_assign(Description.BlockHash, DescriptionIndex);
+        }
+        std::vector<ChunkBlockDescription> SortedBlockDescriptions;
+        SortedBlockDescriptions.reserve(BlockDescriptionLookup.size());
+        for (const IoHash& BlockHash : BlockHashes)
+        {
+            if (auto It = BlockDescriptionLookup.find(BlockHash); It != BlockDescriptionLookup.end())
+            {
+                SortedBlockDescriptions.push_back(std::move(UnorderedList[It->second]));
+            }
+        }
+        return SortedBlockDescriptions;
+    }
+
+private:
+    static CbObject PayloadToJson(std::string_view Context, const IoBuffer& Payload)
+    {
+        if (Payload.GetContentType() == ZenContentType::kJSON)
+        {
+            std::string_view Json(reinterpret_cast<const char*>(Payload.GetData()), Payload.GetSize());
+            return LoadCompactBinaryFromJson(Json).AsObject();
+        }
+        else if (Payload.GetContentType() == ZenContentType::kCbObject)
+        {
+            return LoadCompactBinaryObject(Payload);
+        }
+        else if (Payload.GetContentType() == ZenContentType::kCompressedBinary)
+        {
+            IoHash RawHash;
+            uint64_t RawSize;
+            return LoadCompactBinaryObject(CompressedBuffer::FromCompressed(SharedBuffer(Payload), RawHash, RawSize));
+        }
+        else
+        {
+            throw std::runtime_error(
+                fmt::format("{}: {} ({})", "Unsupported response format", Context, ToString(Payload.GetContentType())));
+        }
+    }
+
+    void AddStatistic(const JupiterResult& Result)
+    {
+        m_Stats.TotalBytesWritten += Result.SentBytes;
+        m_Stats.TotalBytesRead += Result.ReceivedBytes;
+        m_Stats.TotalRequestTimeUs += uint64_t(Result.ElapsedSeconds * 1000000.0);
+        m_Stats.TotalRequestCount++;
+    }
+
+    JupiterSession m_Session;
+    Statistics& m_Stats;
+    const std::string m_Namespace;
+    const std::string m_Bucket;
+    const std::filesystem::path m_TempFolderPath;
+};
+
+std::unique_ptr<BuildStorage>
+CreateJupiterBuildStorage(LoggerRef InLog,
+                          HttpClient& InHttpClient,
+                          BuildStorage::Statistics& Stats,
+                          std::string_view Namespace,
+                          std::string_view Bucket,
+                          const std::filesystem::path& TempFolderPath)
+{
+    return std::make_unique<JupiterBuildStorage>(InLog, InHttpClient, Stats, Namespace, Bucket, TempFolderPath);
+}
+
+} // namespace zen