| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A collection of security, correctness, and robustness fixes in `zenhttp` and `zencore` surfaced by security review. Most items are small, independent commits grouped here because they all tighten trust boundaries or fix UB along the same code paths.
## WebSocket protocol hardening (RFC 6455)
- **Enforce the client-side mask bit**. Server-side frame loops now reject unmasked frames with close code 1002 per §5.1. Prevents HTTP intermediary smuggling.
- **Validate control frames and RSV bits**. Fragmented control frames, oversized (>125 B) control payloads, and any non-zero RSV bit now fail the connection before allocation.
- **Lower per-frame payload cap** from 256 MB → 4 MB. Bounds per-connection accumulator memory.
- **Implement message fragmentation**. Continuation frames are coalesced and delivered as a single message; interleaved non-control frames close with 1002; assembled messages are capped at 4 MB (1009 on overflow). Previously partial fragments were delivered to handlers, bypassing payload validation.
- **Parse the 101 handshake response properly** in `HttpWsClient`. Status-line, `Upgrade`, `Connection`, and `Sec-WebSocket-Accept` are now matched exactly rather than via substring searches against the full body.
## Auth / OIDC hardening
- **Constant-time password compare** in `PasswordSecurity::IsAllowed` (closes a remote length/content timing oracle). Adds a shared `ConstantTimeEquals` helper.
- **Harden Basic-auth header parsing**: trim trailing LWS, reject control bytes and DEL in the credential.
- **OIDC discovery pinning**: require HTTPS (loopback exempt), verify `issuer` matches `BaseUrl`, require `token_endpoint` / `userinfo_endpoint` / `jwks_uri` to share origin with `BaseUrl`, reject empty `token_endpoint`.
- **Restrict `POST /auth/oidc/refreshtoken`** to local-machine requests. Previously unauthenticated in default deployments — remote callers could evict or replace cached tokens.
- **Stop logging OIDC provider response bodies** on refresh failure (IdPs echo `refresh_token` back in error bodies).
- **Drop the unused `IdentityToken` field** from `OidcClient` / `OpenIdToken` so nothing in the tree accidentally trusts an unverified JWT.
## Auth state encryption migration
- Add `AesGcm` AEAD primitive (BCrypt / OpenSSL backends, mbedTLS stubbed) and `CryptoRandom::Fill` CSPRNG helper in `zencore/crypto.h`.
- Migrate authstate file from AES-256-CBC with a fixed IV to AES-GCM with a fresh 12-byte random nonce per write and the 4-byte `ZEN1` magic bound as AAD. Legacy-CBC files are transparently read once and rewritten in the new format.
## Filesystem / IO robustness
- `IoBufferExtendedCore::Materialize` now checks `MAP_FAILED` on POSIX (was comparing to `nullptr`, which let the failure sentinel propagate into later reads and `munmap(MAP_FAILED, ...)`).
- `IoBufferBuilder::MakeFromFile / MakeFromTemporaryFile`: close the FD/HANDLE on exception via a dismissable `ScopeGuard`; actually check the `fstat()` return value (previously used an uninitialized `FileSize`).
- `ReadFromFileMaybe`: loop short reads, retry `EINTR`, chunk Windows `ReadFile` at `0xFFFFFFFF` bytes (fixes silent truncation of multi-GiB reads).
- `WipeDirectory`: compare `FindFirstFileW` handle against `INVALID_HANDLE_VALUE` rather than `nullptr`.
- `RemoveFileNative` (Linux/macOS): report non-`ENOENT` stat failures via the `std::error_code` out-param and stop reading `st_mode` after a failed stat.
## Buffer / compression correctness
- Avoid per-copy `IoBufferCore` heap allocations in `CompositeBuffer::CopyTo / ViewOrCopyRange` iterators; add fast path for `BufferHeader::Read` when the 64-byte header fits in the first plain-memory segment.
- `BufferHeader`: add `IsHeaderValid()` gate covering `BlockSizeExponent` range, `BlockCount * BlockSize` overflow, and `TotalRawSize` bounds before any arithmetic uses them. Defends against attacker-controlled headers that can pass the CRC and trigger OOB writes in `DecompressBlock`.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Fix double WriteResponse in PUT record failure path; the detail-body branch now short-circuits instead of falling through to a second WriteResponse call
- Return 405 Method Not Allowed for unsupported verbs in the root, namespace, bucket, record, and chunk handlers (previously fell through to no response)
- Clamp exec$/replay-recording thread_count so a bogus query value cannot spawn an unbounded worker pool
## Performance / cleanup
- NamespaceMap now uses TransparentStringHash + std::equal_to<>, so Get/Put/Find/Drop can probe the map with a std::string_view directly instead of constructing a temporary std::string on every request
- Replace insert_or_assign with try_emplace under the exclusive lock in GetNamespace; the find() re-check already guarantees the key is absent, so try_emplace matches intent better
## Reverted
- The earlier change to erase the pinned entry from m_DroppedNamespaces after DropNamespace's post-drop work was reverted: other threads may still hold pointers into a dropped namespace, so tearing it down eagerly is unsafe. Dropped namespaces remain pinned for the lifetime of the process as before.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A series of correctness and API hygiene fixes to the intrusive refcount primitives in `zenbase`, culminating in the removal of `RefPtr<T>` in favour of a single unified `Ref<T>` smart pointer.
The changes are motivated by two pieces of latent UB sitting under every `Ref<T>` / `TRefCounted<T>` in the codebase, plus a handful of API footguns on the smart-pointer side (silent raw-pointer decay, missing converting moves, unconstrained conversions from unrelated types).
## Correctness fixes
- **Strict-aliasing UB in atomic helpers** — `AtomicIncrement`/`Decrement`/`Add` took a `volatile uint32_t&` and reinterpret-cast it to `std::atomic<T>*`. The object was never constructed as a `std::atomic`, so the access was type-punning UB. Fixed by changing `m_RefCount` to `std::atomic<uint32_t>` directly in `RefCounted`, `TRefCounted<T>` and `IoBufferCore`. The helpers (and `zenbase/atomic.h`) are later removed entirely — the three callers now invoke `fetch_add`/`fetch_sub` directly.
- **const_cast of non-mutable member** — `AddRef()` / `Release()` are `const` but mutated `m_RefCount` via `const_cast`. Since `m_RefCount` wasn't `mutable`, writing through the cast was UB for any `const`-qualified holder (e.g. a `static const` refcounted singleton). Fixed by marking `m_RefCount` `mutable` and dropping the `const_cast` in `AddRef`/`Release`.
- **Public non-virtual `TRefCounted` destructor** — allowed `delete basePtr;` to slice past the CRTP `DeleteThis()` contract. Moved to `protected`.
## Memory-ordering cleanup
- `AddRef` weakened from seq_cst to **relaxed** (a thread can only take a new reference via one it already holds; nothing needs to synchronize).
- `Release` weakened from seq_cst to **acq_rel** (sufficient to order prior writes before the destructor, and make the decrement visible to observers).
- Diagnostic `RefCount()` / `GetRefCount()` reads made **relaxed** and spelled out as explicit `.load()` — the returned value is stale the moment it's observed, so stronger ordering gives no guarantee.
- No-op on x86 (`lock xadd` either way), but removes a full barrier on every `Ref<T>` copy on ARM64 (Apple silicon / Windows-on-ARM).
## `RefPtr` / `Ref` unification
Before this branch, `RefPtr<T>` and `Ref<T>` were subtly different in ways that made the safer of the two (`Ref`) harder to use and the looser one (`RefPtr`) dangerous:
- `RefPtr::operator T*()` was implicit — `delete refPtr;` compiled silently (double-delete), and the raw pointer could outlive the temporary `RefPtr` it was extracted from. Made `explicit`, then removed entirely once call sites were migrated to `.Get()`.
- `RefPtr(T*)` was implicit while `RefPtr(RefPtr<Derived>&&)` was `explicit` — exactly the opposite of the safety intent. Reversed.
- `RefPtr`'s converting move was unconstrained (any `RefPtr<U>` with an implicitly-convertible `U*` satisfied it, including `void*` and multiple-inheritance base offsets). Added a `DerivedFrom<U, T>` constraint matching `Ref<T>`.
- `Ref<T>` was missing a converting move ctor / move-assignment from `Ref<Derived>` — upcasts of rvalues were going through `AddRef`+`Release` instead of a pointer steal. Added.
- `Release()` and the non-move smart-pointer ops were not `noexcept`, despite being so in practice. Marked `noexcept` throughout.
After all of the above, the two types were functionally identical. The final commit deletes `RefPtr` and rewrites the ~10 consumer files to use `Ref`.
|
| |
|
|
| |
- Improvement: New `ZEN_SCOPED_LOG(Expr)` macro routes `ZEN_INFO`/`ZEN_WARN`/`ZEN_DEBUG` in the enclosing block through the given logger expression instead of the default
- Improvement: `BuildContainer`, `SaveOplog`, and `LoadOplogContext` now take a caller-provided `LoggerRef` so diagnostic messages route through the caller's logger
|
| |
|
|
|
|
|
|
|
|
| |
This PR introduces an in-memory `CidStore` option primarily for use with compute, to avoid hitting disk for ephemeral data which is not really worth persisting. And in particular not worth paying the critical path cost of persistence.
- **MemoryCidStore**: In-memory CidStore implementation backed by a hash map, optionally layered over a standard CidStore. Writes to the backing store are dispatched asynchronously via a dedicated flush thread to avoid blocking callers on disk I/O. Reads check memory first, then fall back to the backing store without caching the result.
- **ChunkStore interface**: Extract `ChunkStore` abstract class (`AddChunk`, `ContainsChunk`, `FilterChunks`) and `FallbackChunkResolver` into `zenstore.h` so `HttpComputeService` can accept different storage backends for action inputs vs worker binaries. `CidStore` and `MemoryCidStore` both implement `ChunkStore`.
- **Compute service wiring**: `HttpComputeService` takes two `ChunkStore&` params (action + worker). The compute server uses `MemoryCidStore` for actions (no disk persistence needed) and disk-backed `CidStore` for workers (cross-action reuse). The storage server passes its `CidStore` for both (unchanged behavior).
|
| |
|
|
|
|
| |
- Feature: Added Workspaces dashboard page with HTTP request stats and per-workspace metrics
- Feature: Added Build Storage dashboard page with service-specific HTTP request stats
- Improvement: Front page now shows Hub and Object Store activity tiles; HTTP panel is fixed above the tiles grid
- Improvement: HTTP stats tiles now include 5m/15m rates and p999/max latency across all service pages
|
| |
|
|
|
|
|
|
|
|
|
| |
- Fix clang-format error accidentally introduced by recent PR
- Fix `FileSize()` CAS race that repeatedly invalidated the cache when concurrent callers both missed; remove `store(0)` on CAS failure
- Fix `WriteChunks` not accounting for initial alignment padding in `m_TotalSize`, causing drift vs `WriteChunk`'s correct accounting
- Fix Create retry sleep computing negative values (100 - N*100 instead of 100 + N*100), matching the Open retry pattern
- Fix `~BlockStore` error log missing format placeholder for `Ex.what()`
- Fix `GetFreeBlockIndex` infinite loop when all indexes have orphan files on disk but aren't in `m_ChunkBlocks`; bound probe to `m_MaxBlockCount`
- Fix `IterateBlock` ignoring `SmallSizeCallback` return value for single out-of-bounds chunks, preventing early termination
- Fix `BlockStoreCompactState::IterateBlocks` iterating map by value instead of const reference
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
**Bug fixes across zenstore, zenremotestore, and related subsystems, primarily surfaced by static analysis.**
## Cache subsystem (cachedisklayer.cpp)
- Fixed tombstone scoping bug: tombstone flag and missing entry were recorded outside the block where data was removed, causing non-missing entries to be incorrectly tombstoned
- Fixed use-after-overwrite: `RemoveMemCachedData`/`RemoveMetaData` were called after `Payload` was overwritten on cache put, leaking stale data
- Fixed incorrect retry sleep formula (`100 - (3 - RetriesLeft) * 100` always produced the same or negative value; corrected to `(3 - RetriesLeft) * 100`)
- Fixed broken `break` missing from sidecar file read loop, causing reads past valid data
- Fixed missing format argument in three `ZEN_WARN`/`ZEN_ERROR` log calls (format string had `{}` placeholders with no corresponding argument, or vice versa)
- Fixed elapsed timer being accumulated inside the wrong scope in `HandleRpcGetCacheRecords`
- Fixed test asserting against unserialized `RecordPolicy` instead of the deserialized `Loaded` copy
- Initialized `AbortFlag`/`PauseFlag` atomics at declaration (UB if read before first write)
## Build store (buildstore.cpp / buildstore.h)
- Fixed wrong variable used in warning log: used loop index `ResultIndex` instead of `Index`/`MetaLocationResultIndexes[Index]`, logging wrong hash values
- Fixed `sizeof(AccessTimesHeader)` used instead of `sizeof(AccessTimeRecord)` when advancing write offset, corrupting the access times file if the sizes differ
- Initialized `m_LastAccessTimeUpdateCount` atomic member (was uninitialized)
- Changed map iteration loops to use `const auto&` to avoid unnecessary copies
## Project store (projectstore.cpp / projectstore.h)
- Fixed wrong iterator dereferenced in `IterateChunks`: used `ChunkIt->second` (from a different map lookup) instead of `MetaIt->second`
- Fixed wrong assert variable: `Sizes[Index]` should be `RawSizes[Index]`
- Fixed `MakeTombstone`/`IsTombstone` inconsistency: `MakeTombstone` was zeroing `OpLsn` but `IsTombstone` checks `OpLsn.Number != 0`; tombstone creation now preserves `OpLsn`
- Fixed uninitialized `InvalidEntries` counter
- Fixed format string mismatch in warning log
- Initialized `AbortFlag`/`PauseFlag` atomics; changed map iteration to `const auto&`
## Workspaces (workspaces.cpp)
- Fixed missing alias registration when a workspace share is updated: alias was deleted but never re-inserted
- Fixed integer overflow in range clamping: `(RequestedOffset + RequestedSize) > Size` could wrap; corrected to `RequestedSize > Size - RequestedOffset`
- Changed map iteration loops to `const auto&`
## CAS subsystem (cas.cpp, caslog.cpp, compactcas.cpp, filecas.cpp)
- Fixed `IterateChunks` passing original `Payload` buffer instead of the modified `Chunk` buffer (content type was set on the copy but the original was sent to the callback)
- Fixed invalid `std::future::get()` call on default-constructed futures
- Fixed sign-comparison in `CasLogFile::Replay` loop (`int i` vs `size_t`)
- Changed `CasLogFile::IsValid` and `Open` to take `const std::filesystem::path&` instead of by value
- Fixed format string in `~CasContainerStrategy` error log
## Remote store (zenremotestore)
- Fixed `FolderContent::operator==` always returning true: loop variable `PathCount` was initialized to 0 instead of `Paths.size()`
- Fixed `GetChunkIndexForRawHash` looking up from wrong map (`RawHashToSequenceIndex` instead of `ChunkHashToChunkIndex`)
- Fixed double-counted `UniqueSequencesFound` stat (incremented in both branches of an if/else)
- Fixed `RawSize` sentinel value truncation: `(uint32_t)-1` assigned to a `uint64_t` field; corrected to `(uint64_t)-1`
- Initialized uninitialized atomic and struct members across `buildstorageoperations.h`, `chunkblock.h`, and `remoteprojectstore.h`
|
| |
|
|
|
| |
Various fixes to make cpp files build in unity build mode
as an aside using Unity build doesn't really seem to work on Linux, unsure why but it leads to link-time issues
|
| |
|
|
|
|
|
|
| |
file retry logic (#766)
* GC - fix handling of attachment ranges
* fix trace/log strings
* fix HTTP access token expiration time logic
* added missing lock retry in zenserver startup
|
| |
|
|
|
| |
* add oplog snapshot function to allow reduction of held oplog locks
* release project lock when precaching each oplog
|
| |
|
| |
initially we had ZENCORE_API macros to potentially allow for DLL linkage. It turns out that this is not useful and the macros just contribute noise, so this change removes them completely.
|
| |
|
|
| |
When changing the default limit-overwrite behavior, a unit test surfaced a bug where an put of data with overwrite cache policy would not get propagated via zen's built-in upstream mechanism with a matching overwrite cache policy to the upstream. This change ensures that it does and leaves the unit test configured to exercise this scenario.
|
| | |
|
| |
|
|
|
| |
* use fixed vectors for batch requests
* refactor cache batch value put/get to not execute code that can throw execeptions in destructor
* extend test with multi-bucket requests
|
| |
|
|
| |
snapshot (#673)
|
| |
|
|
|
| |
- Improvement: Deeper validation of data when scrub is activated (cas/cache/project)
- Improvement: Enabled more multi threading when running scrub operations
- Improvement: Added means to force a scrub operation at startup with a new release using ZEN_DATA_FORCE_SCRUB_VERSION variable in xmake.lua
|
| |
|
|
|
| |
* rework block store block flushing to only happen once at end of block write outside of locks
* fix warning at startup if no gc.dlog file exists
|
| |
|
|
| |
* if we are low on disk space, only run GC if it will remove any data
* make sure we don't treat bail of GC due to disk space as success causing 0 wait between GC passes
|
| |
|
|
|
|
|
|
| |
- adds `zentelemetry` project which houses new functionality for serializing logs and traces in OpenTelemetry Protocol format (OTLP)
- moved existing stats functionality from `zencore` to `zentelemetry`
- adds `TRefCounted<T>` for vtable-less refcounting
- adds `MemoryArena` class which allows for linear allocation of memory from chunks
- adds `protozero` which is used to encode OTLP protobuf messages
|
| |
|
| |
* restructure builds storage stats to match web-ui expectations
|
| |
|
|
| |
* don't use cacherequests utils in cache_cmd.cpp
* make zenutil/cacherequests code into test code helpers only
|
| |
|
| |
* move zen vfs implementation to zenstore
|
| | |
|
| |\ |
|
| | |
| |
| |
| | |
- Improvement: Faster oplog import due to chunk existance check improvement
- Improvement: Cancelling oplog import is now more responsive during initial phase
|
| |/
|
|
| |
Conflicts are now treated as successes, and we optionally return a Details array instead of an ErrorMessages array. Details are returned for all requests in a batch, or no requests in a batch depending on whether there are any details to be shared about any of the put requests. The details for a conflict include the raw hash and raw size of the item. If the item is a record, we also include the record as an object.
|
| | |
|
| |
|
|
|
|
|
| |
- Refactor so we can have more than one cas store for project store and cache.
- Refactor `UpstreamCacheClient` so it is not tied to a specific CidStore
- Refactor scrub to keep the GC interface ScrubStorage function separate from scrub accessor functions (renamed to Scrub).
- Refactor storage size to keep GC interface StorageSize function separate from size accessor functions (renamed to TotalSize)
- Refactor cache storage so `ZenCacheDiskLayer::CacheBucket` implements GcStorage interface rather than `ZenCacheNamespace`
|
| |
|
|
| |
keep rawsize and rawhash if available when using batch for inline puts
keep rawsize and rawhash of input value if we have calculated it for validation already
|
| |\ |
|
| | |
| |
| |
| |
| |
| | |
- Improvement: Refactored build store cache to use existing CidStore implementation instead of implementation specific blob storage
- **CAUTION** This will clear any existing cache when updating as the manifest version and storage strategy has changed
- Bugfix: BuildStorage cache return "true" for metadata existance for all blobs that had payloads regardless of actual existance for metadata
|
| | | |
|
| | | |
|
| |\| |
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* make sure to close log file when resetting log
* drop entries that refers to missing blocks
* Don't scrub keys that has been rewritten
* currectly count added bytes / m_TotalSize
* fix negative sleep time in BlockStoreFile::Open()
* be defensive when fetching log position
* append to log files *after* we updated all state successfully
* explicitly close stuff in destructors with exception catching
* clean up empty size block store files
|
| | |
| |
| |
| | |
- Bugfix: Flush the last block before closing the last new block written to during blockstore compact. UE-291196
- Feature: Drop unreachable CAS data during GC pass. UE-291196
|
| | |
| |
| | |
Improvement: Faster oplog validate to reduce GC wall time and disk I/O pressure
|
| | |
| |
| |
| | |
* don't hold exclusive locks while deleting files from a dropped bucket/namespace
* cleaner detection of missing namespace when issuing a drop
|
| | |
| |
| |
| |
| | |
- Improvement: Cleaned up snapshot writing for CompactCAS/FileCas/Cache/Project stores
- Improvement: Safer recovery when failing to delete log for CompactCAS/FileCas/Cache/Project stores
- Improvement: Added log file reset when writing snapshot at startup for FileCas
|
| | |
| |
| |
| | |
Feature: Add per bucket cache configuration (Lua options file only)
Improvement: --cache-memlayer-sizethreshold is now deprecated and has a new name: --cache-bucket-memlayer-sizethreshold to line up with per cache bucket configuration
|
| | |
| |
| | |
- Feature: zenserver option `--buildstore-disksizelimit` to set an soft upper limit for build storage data. Defaults to 1TB.
|
| | |
| |
| |
| |
| | |
* save payload size in log for buildstore
* read/write access times and manifest for buldstore
* use retry when removing temporary files
|
| | |
| |
| | |
- Feature: `zen builds` auth option `--oidctoken-exe-path` to let zen run the OidcToken executable to get and refresh authentication token
|
| | |
| |
| |
| |
| |
| |
| |
| |
| | |
- **EXPERIMENTAL** `zen builds`
- Feature: `--zen-cache-host` option for `upload` and `download` operations to use a zenserver host `/builds` endpoint for storing build blob and blob metadata
- Feature: New `/builds` endpoint for caching build blobs and blob metadata
- `/builds/{namespace}/{bucket}/{buildid}/blobs/{hash}` `GET` and `PUT` method for storing and fetching blobs
- `/builds/{namespace}/{bucket}/{buildid}/blobs/putBlobMetadata` `POST` method for storing metadata about blobs
- `/builds/{namespace}/{bucket}/{buildid}/blobs/getBlobMetadata` `POST` method for fetching metadata about blobs
- `/builds/{namespace}/{bucket}/{buildid}/blobs/exists` `POST` method for checking existance of blobs
|
| | |
| |
| |
| |
| |
| | |
* Added EASTL to help with eliminating memory allocations
* Applied EASTL to eliminate memory allocations, primarily by using `fixed_vector` et al to use stack allocations / inline struct allocations
Reduces memory events in traces by close to a factor of 10 in test scenario (starting editor for project F)
|
| | | |
|
| | |
| |
| |
| | |
Result structure contains status and a string message (may be empty)
|
| | | |
|
| |/
|
|
| |
Overwrite with differing value should be denied if QueryLocal is not present and StoreLocal is present. Overwrite with equal value should succeed regardless of policy flags.
|