| Commit message (Collapse) | Author | Age | Files | Lines |
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
|
|
| |
provision is complete
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
|
|
| |
allows for more granular notification and stricter state transitions
|
| | |
|
| | |
|
| |
|
|
|
| |
- Improvement: Hub provision, deprovision, hibernate, and wake operations are now async. HTTP requests returns 202 Accepted while the operation completes in the background
- Improvement: Hub returns 202 Accepted (instead of 409 Conflict) when the same async operation is already in progress for a module
- Improvement: Hub returns 200 OK when a requested state transition is already satisfied
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds a `SubprocessManager` for managing child processes with ASIO-integrated async exit detection, stdout/stderr pipe capture, and periodic metrics sampling. Also introduces `ProcessGroup` for OS-backed process grouping (Windows JobObjects / POSIX process groups).
### SubprocessManager
- Async process exit detection using platform-native mechanisms (Windows `object_handle`, Linux `pidfd_open`, macOS `kqueue EVFILT_PROC`) — no polling
- Stdout/stderr capture via async pipe readers with per-process or default callbacks
- Periodic round-robin metrics sampling (CPU, memory) across managed processes
- Spawn, adopt, remove, kill, and enumerate managed processes
### ProcessGroup
- OS-level process grouping: Windows JobObject (kill-on-close guarantee), POSIX `setpgid` (bulk signal delivery)
- Atomic group kill via `TerminateJobObject` (Windows) or `kill(-pgid, sig)` (POSIX)
- Per-group aggregate metrics and enumeration
### ProcessHandle improvements
- Added explicit constructors from `int` (pid) and `void*` (native handle)
- Added move constructor and move assignment operator
### ProcessMetricsTracker
- Cross-platform process metrics (CPU time, working set, page faults) via `QueryProcessMetrics()`
- ASIO timer-driven periodic sampling with configurable interval and batch size
- Aggregate metrics across tracked processes
### Other changes
- Fixed `zentest-appstub` writing a spurious `Versions` file to cwd on every invocation
|
| |
|
|
| |
* refactor hub callbacks
* improve http responses
|
| |
|
|
|
|
|
| |
- **Cross-platform `GetProcessMetrics`**: Implement Linux (`/proc/{pid}/stat`, `/proc/{pid}/statm`, `/proc/{pid}/status`) and macOS (`proc_pidinfo(PROC_PIDTASKINFO)`) support for CPU times and memory metrics. Fix Windows to populate the `MemoryBytes` field (was always 0). All platforms now set `MemoryBytes = WorkingSetSize`.
- **`ProcessMetricsTracker`**: Experimental utility class (`zenutil`) that periodically samples resource usage for a set of tracked child processes. Supports both a dedicated background thread and an ASIO steady_timer mode. Computes delta-based CPU usage percentage across samples, with batched sampling (8 processes per tick) to limit per-cycle overhead.
- **`ProcessHandle` documentation**: Add Doxygen comments to all public methods describing platform-specific behavior.
- **Cleanup**: Remove unused `ZEN_RUN_TESTS` macro (inlined at its single call site in `zenserver/main.cpp`), remove dead `#if 0` thread-shutdown workaround block.
- **Minor fixes**: Use `HttpClientAccessToken` constructor in hordeclient instead of setting private members directly. Log ASIO version at startup and include it in the server settings list.
|
| | |
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
## Summary
This PR adds a session management service, several new dashboard pages, and a number of infrastructure improvements.
### Sessions Service
- `SessionsServiceClient` in `zenutil` announces sessions to a remote zenserver with a 15s heartbeat (POST/PUT/DELETE lifecycle)
- Storage server registers itself with its own local sessions service on startup
- Session mode attribute coupled to server mode (Compute, Proxy, Hub, etc.)
- Ended sessions tracked with `ended_at` timestamp; status filtering (Active/Ended/All)
- `--sessions-url` config option for remote session announcement
- In-process log sink (`InProcSessionLogSink`) forwards server log output to the server's own session, visible in the dashboard
### Session Log Viewer
- POST/GET endpoints for session logs (`/sessions/{id}/log`) supporting raw text and structured JSON/CbObject with batch `entries` array
- In-memory log storage per session (capped at 10k entries) with cursor-based pagination for efficient incremental fetching
- Log panel in the sessions dashboard with incremental DOM updates, auto-scroll (Follow toggle), newest-first toggle, text filter, and log-level coloring
- Auto-selects the server's own session on page load
### TCP Log Streaming
- `LogStreamListener` and `TcpLogStreamSink` for log delivery over TCP
- Sequence numbers on each message with drop detection and synthetic "dropped" notice on gaps
- Gathered buffer writes to reduce syscall overhead when flushing batches
- Tests covering basic delivery, multi-line splitting, drop detection, and sequencing
### New Dashboard Pages
- **Sessions**: master-detail layout with selectable rows, metadata panel, live WebSocket updates, paging, abbreviated date formatting, and "this" pill for the local session
- **Object Store**: summary stats tiles and bucket table with click-to-expand inline object listing (`GET /obj/`)
- **Storage**: per-volume disk usage breakdown (`GET /admin/storage`), Garbage Collection status section (next-run countdown, last-run stats), and GC History table with paginated rows and expandable detail panels
- **Network**: overview tiles, per-service request table, proxy connections, and live WebSocket updates; distinct client IPs and session counts via HyperLogLog
### Documentation Page
- In-dashboard Docs page with sidebar navigation, markdown rendering (via `marked`), Mermaid diagram support (theme-aware), collapsible sections, text filtering with highlighting, and cross-document linking
- New user-facing docs: `overview.md` (with architecture and per-mode diagrams), `sessions.md`, `cache.md`, `projects.md`; updated `compute.md`
- Dev docs moved to `docs/dev/`
### Infrastructure & Bug Fixes
- **Deflate compression** for the embedded frontend zip (~3.4MB → ~950KB); zlib inflate support added to `ZipFs` with cached decompressed buffers
- **Local IP addresses**: `GetLocalIpAddresses()` (Windows via `GetAdaptersAddresses`, Linux/Mac via `getifaddrs`); surfaced in `/status/status`, `/health/info`, and the dashboard banner
- **Dashboard nav**: unified into `zen-nav` web component with `MutationObserver` for dynamically added links, CSS `::part()` to merge banner/nav border radii, and prefix-based active link detection
- Stats broadcast refactored from manual JSON string concatenation to `CbObjectWriter`; `CbObject`-to-JS conversion improved for `TimeSpan`, `DateTime`, and large integers
- Stats WebSocket boilerplate consolidated into `ZenPage.connect_stats_ws()`
|
| |
|
| |
* add hub instance crash recovery
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
## Summary
Adds probabilistic cardinality estimation for tracking unique HTTP clients and sessions using a HyperLogLog implementation.
- Add a `HyperLogLog<Precision>` template in `zentelemetry` with thread-safe lock-free register updates, merge support, and XXH3 hashing
- Feed client IP addresses (via raw bytes) and session IDs (via `Oid` bytes) into their respective HyperLogLog estimators from both the ASIO and http.sys server backends
- Emit `distinct_clients` and `distinct_sessions` cardinality estimates in HTTP `CollectStats()`
- Add tests covering empty, single, duplicates, accuracy, merge, and clear scenarios
## Why HyperLogLog
Tracking exact unique counts would require storing every observed IP or session ID. HyperLogLog provides a memory-bounded probabilistic estimate (~1–2% error) using only a few KB of memory regardless of traffic volume.
|
| |
|
|
|
|
|
|
|
|
|
| |
- **`Logger` now holds a single `SinkPtr`** instead of a `std::vector<SinkPtr>`. The `SetSinks`/`AddSink` API is replaced with a single `SetSink`. This removes complexity from `Logger` itself and makes `Clone()` cheaper (no vector copy).
- **New `BroadcastSink`** (`zencore/logging/broadcastsink.h`) acts as a thread-safe, shared indirection point that fans out to a dynamic list of child sinks. Adding or removing a child sink via `AddSink`/`RemoveSink` is immediately visible to every `Logger` that holds a reference to it — including cloned loggers — without requiring each logger to be updated individually.
- **`GetDefaultBroadcastSink()`** (exposed from `zenutil/logging.h`) gives server-layer code access to the shared broadcast sink so it can register optional sinks (OTel, TCP log stream) after logging is initialized, without going through `Default()->AddSink()`.
### Motivation
Previously, dynamically adding sinks post-initialization mutated the default logger's internal sink vector directly. This was fragile: cloned loggers (created before `AddSink` was called) would not pick up the new sinks. `BroadcastSink` fixes this by making the sink list a shared, mutable object that all loggers sharing the same broadcast instance observe uniformly.
|
| |
|
|
|
|
|
|
|
|
|
| |
This PR improves process lifecycle handling and resilience across several areas:
- **Reclaim stale shared-memory entries instead of exiting** (`zenserver.cpp`): When a zenserver instance fails to attach as a sponsor to an existing process (e.g. because the PID was reused by an unrelated process), the server now clears the stale shared-memory entry and proceeds with normal startup instead of calling `std::exit(1)`.
- **Wait for child process exit in `Kill()` and `Terminate()` on Unix** (`process.cpp`): After sending `SIGTERM` in `Kill()`, the code now waits up to 5s for graceful shutdown (escalating to `SIGKILL` on timeout), matching the Windows behavior. `Terminate()` also waits after `SIGKILL` so the child is properly reaped and doesn't linger as a zombie clogging up the process table.
- **Fix sysctl buffer race in macOS `FindProcess`** (`process.cpp`): The macOS process enumeration now retries the `sysctl` call (up to 3 attempts with 25% buffer padding) to handle the race where the process list changes between the sizing call and the data-fetching call. Also flattens the nesting and fixes the guard/free scoping.
- **Terminate stale processes before integration tests** (`zenserver-test.cpp`, `test.lua`): The integration test runner now accepts a `--kill-stale-processes` flag (passed automatically by `test.lua`) that scans for and terminates any leftover `zenserver`, `zenserver-test`, and `zentest-appstub` processes from previous test runs, logging the executable name and PID of each. This addresses flaky test failures caused by stale processes from prior runs holding ports or other resources.
|
| |
|
|
| |
- Improvement: Lazy initialize CpuSampler - reduces startup time by ~600 ms
- Bugfix: Don't try to wipe .sentry-native folder at missing manifest - sentry is already running. Reduces startup time by ~450 ms when data folder is empty
|
| |
|
|
|
|
| |
- Feature: Added S3 hydration backend for hub mode (`--hub-hydration-target-spec s3://<bucket>[/<prefix>]`)
- Credentials resolved from `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` env vars, falling back to EC2 instance profile via IMDS
- Each dehydration uploads to a new timestamped folder and commits a `current-state.json` pointer on success, so a failed upload never invalidates the previous state
- Hydration downloads to a temp directory first and only replaces the server state on full success; failures leave the existing state intact
|
| |
|
|
|
|
|
|
|
|
| |
- Improvement: Hub dashboard module list improved
- Animated state dots, with flashing transitions for hibernating, waking, provisioning, and deprovisioning
- Per-row port used by the provisioned instance
- Per-row hibernate, wake, and deprovision actions with confirmation for destructive operations
- Per-row button to open the instance dashboard in a new window
- Multi-select with bulk hibernate/wake/deprovision and select-all
- Pagination at 50 lines per page
- Inline fold-out panel per row with human-readable process metrics
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Upgrade mimalloc from v2.1.2 to v2.2.7. Note that mimalloc is no longer the default allocator so this only impacts users who somehow opt into mimalloc via `--malloc=mimalloc` or compile with different defaults
- Add all available mimalloc versions (1.6.7–3.2.8) to the package definition for testing
- Log the active memory allocator (with version where available) at server startup
- Annotate vendored rpmalloc with its source commit and version
## Notable changes in mimalloc 2.1.2 → 2.2.7
- **Memory release fix** (2.2.4): fix case where OS memory was not always fully released
- **Race condition fix** (2.2.6): fixed rare race condition and potential buffer overflow in debug statistics
- **Windows arm64 support** (2.1.9)
- **Guarded build** (2.1.9): new build mode that places OS guard pages behind objects to catch buffer overflows
- **THP awareness** (2.2.6): auto-detects transparent huge pages and adjusts purge size to avoid fragmentation
- **Faster TLS access on Windows** (2.2.6)
- **Improved calloc and aligned allocation performance** (2.2.6)
- **New diagnostic APIs** (2.2.2): `mi_options_print`, `mi_arenas_print`, `mi_stat_get` / `mi_stat_get_json`
- **macOS**: use `MADV_FREE_REUSABLE` for better memory behavior (2.2.4)
- **Build fixes**: Android, Xbox, musl, mingw, arm32, Debian 32-bit, non-BMI1 x64 systems
## Allocator logging
Added `FMalloc::GetName()` pure virtual so the server logs which allocator is active at startup:
```
zenserver - memory allocator: mimalloc 2.2.7
```
Allocator names include version where available:
- `mimalloc 2.2.7` (runtime version via `mi_version()`)
- `rpmalloc 1.5.0-dev.20250810` (ad-hoc version from vendored develop branch commit)
- `ansi`, `stomp` (no version info available)
## Test plan
- [x] Builds successfully on Windows (release)
- [x] Verify server startup log shows allocator name
- [x] Test with `--malloc=mimalloc` (default) and `--malloc=rpmalloc`
- [x] Run test suites to check for regressions
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Feature: Added `zen hub` command for managing a hub server and its provisioned module instances:
- `zen hub up` - Start a hub server (equivalent to `zen up` in hub mode)
- `zen hub down` - Shut down a hub server
- `zen hub provision <moduleid>` - Provision a storage server instance for a module
- `zen hub deprovision <moduleid>` - Deprovision a storage server instance
- `zen hub hibernate <moduleid>` - Hibernate a provisioned instance (shut down, data preserved)
- `zen hub wake <moduleid>` - Wake a hibernated instance
- `zen hub status [moduleid]` - Show state of all instances or a specific module
- Feature: Added new hub HTTP endpoints for instance lifecycle management:
- `POST /hub/modules/{moduleid}/hibernate` - Hibernate the instance for the given module
- `POST /hub/modules/{moduleid}/wake` - Wake a hibernated instance for the given module
- Improvement: `zen up` refactored to use shared `StartupZenServer`/`ShutdownZenServer` helpers (also used by `zen hub up`/`zen hub down`)
- Bugfix: Fixed shutdown event not being cleared after the server process exits in `ZenServerInstance::Shutdown()`, which could cause stale state on reuse
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- **RAII pipe handles for child process stdout/stderr capture**: `StdoutPipeHandles` is now a proper RAII type with automatic cleanup, move semantics, and partial close support. This makes it safe to use pipes for capturing child process output without risking handle/fd leaks.
- **Optional separate stderr pipe**: `CreateProcOptions` now accepts a `StderrPipe` field so callers can capture stdout and stderr independently. When null (default), stderr shares the stdout pipe as before.
- **LogStreamListener with pluggable handler**: The TCP log stream listener accepts connections from remote processes and delivers parsed log lines through a `LogStreamHandler` interface, set dynamically via `SetHandler()`. This allows any client to receive log messages without depending on a specific console implementation.
- **TcpLogStreamSink for zen::logging**: A logging sink that forwards log messages to a `LogStreamListener` over TCP, using the native `zen::logging::Sink` infrastructure with proper thread-safe synchronization.
- **Reliable child process exit codes on Linux**: `waitpid` result handling is fixed so `ProcessHandle::GetExitCode()` returns the real exit code. `ProcessHandle::Reset()` reaps zombies directly, replacing the global `IgnoreChildSignals()` which prevented exit code collection entirely. Also fixes a TOCTOU race in `ProcessHandle::Wait()` on Linux/Mac.
- **Pipe capture test suite**: Tests covering stdout/stderr capture via pipes (both shared and separate modes), RAII cleanup, move semantics, and exit code propagation using `zentest-appstub` as the child process.
- **Service command integration tests**: Shell-based integration tests for `zen service` covering the full lifecycle (install, status, start, stop, uninstall) on all three platforms — Linux (systemd), macOS (launchd), and Windows (SCM via PowerShell).
- **Test script reorganization**: Platform-specific test scripts moved from `scripts/test_scripts/` into `scripts/test_linux/`, `test_mac/`, and `test_windows/`.
|
| |
|
|
|
|
| |
- When build store is not configured, `m_BuildCidStore` is null but was unconditionally added as a stats provider, causing a null pointer dereference crash in `StatsReporter::ReportStats`
- Added a null guard in `AddProvider` to reject null pointers defensively
- Added a conditional check at the call site in `InitializeStructuredCache`
|
| |
|
|
| |
- Bugfix: Retry OIDC token refresh once on failure before propagating the error
- Bugfix: Handle HTTP 501 (Not Implemented) from Jupiter as a signal to fall back from multi-range to single-range requests
|
| |
|
|
|
|
|
| |
- Improvement: Hub module listing now includes per-instance process metrics (memory, CPU time, working set, pagefile usage)
- Improvement: Hub now monitors provisioned instance health in the background and refreshes process metrics periodically
- Improvement: Hub no longer exposes raw `StorageServerInstance` pointers to callers; instance state is returned as value snapshots (`Hub::InstanceInfo`)
- Improvement: Hub instance access is now guarded by RAII per-instance locks (`SharedLockedPtr`/`ExclusiveLockedPtr`), preventing concurrent modifications during provisioning and deprovisioning
- Improvement: Hub instance lifecycle is now tracked as a `HubInstanceState` enum covering transitional states (Provisioning, Deprovisioning, Hibernating, Waking); exposed as a string in the HTTP API and dashboard
|
| |
|
| |
- Feature: Added support for consul token passed via environment variable, and specified a default env var name of CONSUL_HTTP_TOKEN for it in hub mode
|
| |
|
|
|
| |
This PR adds a `zen bench disk` subcommand to help with gathering user performance metrics
It also contains a fix for `xmake precommit`, the task now probes to find the most appropriate way to launch pre-commit
|
| | |
|
| |
|
| |
Authentication callbacks are not thread safe, ensured call sites does single threaded calls
|
| |
|
|
|
|
|
|
|
|
| |
- Install a crash handler at the very top of main() in both zenserver and zen
- On Windows, uses SetUnhandledExceptionFilter with StackWalk64 for accurate
crash-site backtraces with DbgHelp symbol resolution
- On Linux/Mac, uses sigaction with async-signal-safe backtrace output
- Automatically superseded when Sentry/crashpad installs its own handlers
- Stays active for the full process lifetime if Sentry is disabled or absent
- Include .sym debug symbol files in Linux release bundle
|
| |
|
| |
Improved workaround for troubles with code potentially logging before logging is initialized. Any logging will be routed to a default console logger until logging is initialized fully
|
| |\
| |
| | |
Zs/long filename improvement
|
| | | |
|
| | |\ |
|
| | | | |
|