# Hub

Zenserver's hub mode runs as a coordinator that manages multiple storage server instances on a single host. Rather than serving storage requests directly, the hub listens for provision and deprovision requests from an orchestrator and spawns dedicated zenserver child processes on demand -- one per *module*. Each module is identified by an alphanumeric string (typically associated with a content plugin or project). Each instance gets its own TCP port and data directory under the hub's data directory.

Typical deployments:

- **Build farm**: build agents on the same host each receive an isolated zenserver instance for their build, provisioned by the build orchestration system.
- **Shared team cache**: an orchestrator provisions an instance per project or team and deprovisions it after a period of inactivity.

The hub handles the full lifecycle: hydrating new instances from a shared storage source, monitoring running processes, handling crashes, and cleaning up deprovisioned instances automatically.

## Instance Lifecycle

Each module slot progresses through a series of states managed by the hub.
```mermaid
stateDiagram-v2
    [*] --> Unprovisioned
    Unprovisioned --> Provisioning : Provision
    Provisioning --> Provisioned : ready
    Provisioning --> Unprovisioned : failed
    Provisioned --> Hibernating : Hibernate
    Provisioned --> Deprovisioning : Deprovision / timeout
    Provisioned --> Obliterating : Obliterate
    Provisioned --> Crashed : process exited
    Hibernating --> Hibernated : stopped
    Hibernating --> Provisioned : failed
    Hibernated --> Waking : Wake
    Hibernated --> Deprovisioning : Deprovision / timeout
    Hibernated --> Obliterating : Obliterate
    Waking --> Provisioned : ready
    Waking --> Hibernated : failed
    Deprovisioning --> Unprovisioned : done
    Crashed --> Recovering : watchdog
    Crashed --> Deprovisioning : Deprovision
    Crashed --> Obliterating : Obliterate
    Recovering --> Provisioned : success
    Recovering --> Unprovisioned : failed
    Obliterating --> Unprovisioned : done
    Obliterating --> Crashed : failed
```

**Stable states:**

- **Unprovisioned** - no process running; the slot is ready for a new provision request.
- **Provisioned** - process running and serving requests. The watchdog monitors activity and deprovisions the instance after a configurable inactivity timeout (see [Watchdog Tuning](#watchdog-tuning)).
- **Hibernated** - process stopped, data directory preserved. The instance can be woken quickly without re-hydrating. The watchdog deprovisions (and deletes data) after a configurable inactivity timeout (see [Watchdog Tuning](#watchdog-tuning)).
- **Crashed** - process exited unexpectedly. The watchdog attempts an in-place restart automatically.

Transitioning states (`Provisioning`, `Hibernating`, `Waking`, `Deprovisioning`, `Recovering`, `Obliterating`) are transient and held exclusively by one operation at a time. If hibernation fails (the process cannot be stopped cleanly), the instance remains Provisioned.

**Hibernation vs deprovision:** hibernating stops the process but keeps the data directory intact, allowing fast restart on the next Wake.
Deprovisioning triggers a GC cycle, then dehydrates the instance's state back to the configured backend, then deletes all local instance data. Explicit deprovision requests are always honoured; the watchdog timeout path always deprovisions rather than hibernates.

**Obliterate vs deprovision:** deprovisioning preserves data on the hydration backend so the next provision of the same module starts warm. Obliterate permanently destroys both local instance data and all backend hydration data for the module. This is irreversible. Obliterate can be called on Provisioned, Hibernated, or Crashed instances. It also works on modules that are not currently tracked by the hub (already deprovisioned) -- in that case it deletes only the backend hydration data. Obliterating a module that was never provisioned is a no-op success.

**Idempotent operations:** hibernating an already-hibernated instance, waking an already-provisioned instance, deprovisioning a non-existent module, or obliterating a never-provisioned module all return success without side effects.

## The Watchdog

The hub runs a background watchdog thread that manages instance lifecycle automatically.

**Cycle:** every `cycleintervalms` milliseconds (default 3 s) the watchdog:

1. Refreshes machine-level metrics (disk usage, system memory) used for provisioning limit checks.
2. Iterates over all active instance slots, spending at most `cycleprocessingbudgetms` (default 500 ms) per cycle with an `instancecheckthrottlems` (default 5 ms) pause between instance checks.

**Per-instance checks:**

For each **Provisioned** instance:

- Verifies the process is still running. If not, transitions to Crashed.
- Checks for inactivity. The check begins `inactivitycheckmarginseconds` (default 1 min) before the full timeout -- with a 10 min timeout, checking begins once 9 min of inactivity have elapsed. It makes a short HTTP request to the instance's `/stats/activity_counters` endpoint.
  Activity is any client storage request processed by the instance. If the activity counter increased since the previous check, the inactivity timer resets. If not, and the full `provisionedinactivitytimeoutseconds` has elapsed, the instance is automatically deprovisioned.

For each **Hibernated** instance:

- Checks elapsed time since last activity. No HTTP request is needed (process not running). After `hibernatedinactivitytimeoutseconds` (default 30 min), the instance is deprovisioned and its data deleted.

For each **Crashed** instance:

- Attempts an in-place restart (Recovering state) using the existing data directory, without re-hydrating. On success, the instance returns to Provisioned and an upstream notification is sent. On failure, the instance is deprovisioned.

**Summary with default settings:**

| Event | Time |
|---|---|
| Watchdog cycle | every 3 s |
| Activity check fires | 9 min after last activity |
| Provisioned auto-deprovision | 10 min after last activity |
| Hibernated auto-deprovision | 30 min after last activity |

## Running the Hub

```
zenserver hub [options]
```

For non-trivial deployments, pass a Lua configuration file rather than command-line flags:

```
zenserver hub --config /etc/zen/hub.lua
```

The hub's own config file is separate from the per-instance config (see [Instance Management](#instance-management)).

## Configuration Reference

Options can be set via Lua config file or command-line flags. Lua values take precedence when both are provided. For general server options (listening port, data directory, logging, HTTP server selection) that apply to zenserver in all modes, see the zenserver configuration documentation.

---

### Core Server Options

These generic options apply to zenserver in all modes, including hub.

| CLI flag | Lua key | Description |
|---|---|---|
| `--config` | _(CLI only)_ | Path to the Lua configuration file for the hub. |
| `--data-dir` | `server.datadir` | Root directory for hub data, logs, and instance subdirectories. |
| `--port` | `network.port` | TCP port the hub listens on. |
| `--http` | `network.httpserverclass` | HTTP server implementation (`httpsys` or `asio`). |
| `--dedicated` | `server.dedicated` | Dedicated server mode: disables port probing and allocates more resources. |
| `--no-sentry` | `server.sentry.disable` | Disable Sentry crash reporting. |

---

### Instance Management

Controls the child storage server processes spawned by the hub. Each instance stores its data in its own per-module subdirectory under `servers/` in the hub's data directory. Instance directories persist across hibernation and are removed only on deprovision.

| CLI flag | Lua key | Default | Description |
|---|---|---|---|
| `--hub-instance-base-port-number` | `hub.instance.baseportnumber` | `21000` | First port in the instance port pool. Instances are assigned ports sequentially from this base. Ports are not guaranteed to be stable across deprovision/re-provision cycles; a module may receive a different port after being deprovisioned and re-provisioned. |
| `--hub-instance-limit` | `hub.instance.limits.count` | `1000` | Maximum simultaneously provisioned instances. Provision requests are rejected once this limit is reached. |
| `--hub-instance-http` | `hub.instance.http` | `httpsys` (Windows), `asio` (Linux/macOS) | HTTP server implementation for child instances. On Windows, use `asio` if the hub runs without elevation and no URL reservation covers the instance port range. |
| `--hub-instance-http-threads` | `hub.instance.httpthreads` | `0` | HTTP connection threads per child instance. `0` uses hardware concurrency. |
| `--hub-instance-corelimit` | `hub.instance.corelimit` | `0` | Concurrency limit for child instances. `0` is automatic. |
| `--hub-instance-provision-threads` | `hub.instance.provisionthreads` | `max(cpu/4, 2)` | Thread count for the instance provisioning worker pool. Controls parallel I/O during provision and deprovision operations. Set to `0` for synchronous operation. |
| `--hub-instance-config` | `hub.instance.config` | _(none)_ | Path to a Lua config file passed to every spawned child instance. Use this to configure storage paths, cache sizes, and other storage server settings. See the zenserver configuration documentation. |
| `--hub-instance-malloc` | `hub.instance.malloc` | _(none)_ | Memory allocator for child instances (`ansi`, `stomp`, `rpmalloc`, `mimalloc`). When unset, instances use their compiled-in default. |
| `--hub-instance-trace` | `hub.instance.trace` | _(none)_ | Trace channel specification for child instances (e.g. `default`, `cpu,log`, `memory`). When set, instances start with tracing enabled on the specified channels. |
| `--hub-instance-tracehost` | `hub.instance.tracehost` | _(none)_ | Trace host for child instances. Instances stream trace data to this host. |
| `--hub-instance-tracefile` | `hub.instance.tracefile` | _(none)_ | Trace file path for child instances. Supports `{moduleid}` and `{port}` placeholders, resolved per instance. Without placeholders all instances write to the same file. |

---

### Resource Limits

The hub checks machine resources before accepting a provision request. If any enabled limit is exceeded, the request is rejected. Limits are refreshed at the start of each watchdog cycle. Setting a limit to `0` disables it. Byte and percent limits are independent -- either can trigger a rejection.

| CLI flag | Lua key | Default | Description |
|---|---|---|---|
| `--hub-provision-disk-limit-bytes` | `hub.instance.limits.disklimitbytes` | `0` | Reject provision if used bytes on the filesystem volume containing the data directory exceed this value. |
| `--hub-provision-disk-limit-percent` | `hub.instance.limits.disklimitpercent` | `0` | Reject provision if used space on the filesystem volume containing the data directory exceeds this percentage of its total capacity (0-100). |
| `--hub-provision-memory-limit-bytes` | `hub.instance.limits.memorylimitbytes` | `0` | Reject provision if used system-wide physical memory exceeds this many bytes. |
| `--hub-provision-memory-limit-percent` | `hub.instance.limits.memorylimitpercent` | `0` | Reject provision if used system-wide physical memory exceeds this percentage of total RAM (0-100). |

---

### Watchdog Tuning

| CLI flag | Lua key | Default | Description |
|---|---|---|---|
| `--hub-watchdog-cycle-interval-ms` | `hub.watchdog.cycleintervalms` | `3000` (3 s) | Milliseconds between watchdog cycles. |
| `--hub-watchdog-cycle-processing-budget-ms` | `hub.watchdog.cycleprocessingbudgetms` | `500` | Maximum milliseconds spent processing instances per cycle. |
| `--hub-watchdog-instance-check-throttle-ms` | `hub.watchdog.instancecheckthrottlems` | `5` | Milliseconds to wait between successive instance checks within a cycle. |
| `--hub-watchdog-provisioned-inactivity-timeout-seconds` | `hub.watchdog.provisionedinactivitytimeoutseconds` | `600` (10 min) | Seconds of inactivity before a running instance is automatically deprovisioned. `0` disables automatic deprovisioning for running instances. |
| `--hub-watchdog-hibernated-inactivity-timeout-seconds` | `hub.watchdog.hibernatedinactivitytimeoutseconds` | `1800` (30 min) | Seconds of inactivity before a hibernated instance is deprovisioned and its data deleted. `0` disables automatic deprovisioning for hibernated instances. |
| `--hub-watchdog-inactivity-check-margin-seconds` | `hub.watchdog.inactivitycheckmarginseconds` | `60` (1 min) | Activity check window opens this many seconds before the provisioned inactivity timeout. With defaults, checking begins at 9 min and the hard timeout is at 10 min. |
| `--hub-watchdog-activity-check-connect-timeout-ms` | `hub.watchdog.activitycheckconnecttimeoutms` | `100` | Connect timeout in milliseconds for activity check requests. |
| `--hub-watchdog-activity-check-request-timeout-ms` | `hub.watchdog.activitycheckrequesttimeoutms` | `200` | Request timeout in milliseconds for activity check requests. |

---

### Hydration

Hydration pre-populates a new instance's data directory from a shared storage backend before the instance starts serving requests. On deprovision, the hub dehydrates the instance's state back to the same backend so the next provision of that module starts warm.

The hydration system is incremental and content-addressed: files are hashed and stored in a CAS (content-addressable store) on the backend. Only files that changed since the last dehydration are uploaded or downloaded, and unchanged content is shared across modules. A cached state object tracks the file manifest between hydrate/dehydrate cycles to avoid redundant hashing and transfers. Before dehydrating, the hub triggers a GC cycle on the instance to compact storage, reducing the amount of data transferred to the backend.

If neither hydration option is set, the hub automatically creates a `hydration_storage` directory under its own data directory and uses that as the file hydration source. This is suitable for single-host deployments where instances share locally cached data. `targetspec` and `targetconfig` are mutually exclusive.
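As a sketch, pointing the hub at a shared file backend via the shorthand URI could look like the following (the mount path is hypothetical; any absolute path reachable by the hub works):

```lua
-- Hypothetical shared directory, e.g. a network mount shared by several hubs.
-- Equivalent CLI form: --hub-hydration-target-spec file:///mnt/zen/hydration
hub = {
  hydration = {
    targetspec = "file:///mnt/zen/hydration",
  },
}
```

Note the three slashes: `file://` is the required prefix and the path itself must be absolute.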
| CLI flag | Lua key | Default | Description |
|---|---|---|---|
| `--hub-hydration-target-spec` | `hub.hydration.targetspec` | _(local path, see above)_ | Shorthand URI for the hydration source. Must use the `file://` prefix for file targets: `file:///absolute/path`. |
| `--hub-hydration-target-config` | `hub.hydration.targetconfig` | _(none)_ | Path to a JSON file specifying the hydration source. Supports `file` and `s3` backends. |
| `--hub-hydration-threads` | `hub.hydration.threads` | `max(cpu/4, 2)` | Thread count for the hydration/dehydration worker pool. Controls parallel file hashing and backend I/O during hydrate/dehydrate. Set to `0` for synchronous operation. |

**File backend** (`hub.hydration.targetconfig` JSON):

```json
{
  "type": "file",
  "settings": {
    "path": "/data/hydration_storage"
  }
}
```

**S3 backend** (`hub.hydration.targetconfig` JSON):

```json
{
  "type": "s3",
  "settings": {
    "uri": "s3://bucket-name/optional/prefix",
    "region": "us-east-1",
    "endpoint": "https://custom-endpoint",
    "path-style": false
  }
}
```

S3 settings:

| Field | Required | Description |
|---|---|---|
| `uri` | Yes | S3 URI. Include a prefix path to isolate hub data within a shared bucket. |
| `region` | No | AWS region. Defaults to `us-east-1`; also reads `AWS_DEFAULT_REGION` / `AWS_REGION`. |
| `endpoint` | No | Custom endpoint URL for S3-compatible services (MinIO, Ceph, etc.). |
| `path-style` | No | Use path-style S3 URLs instead of virtual-hosted. Required by some S3-compatible services. Default `false`. |

S3 credentials are read from environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`). If `AWS_ACCESS_KEY_ID` is not set, the hub falls back to IMDS instance credentials.

---

### Upstream Notifications

The hub can notify an external system when instance state changes. Notifications are disabled when `endpoint` is empty.
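A minimal fragment enabling notifications might look like this sketch (the endpoint URL and identifier are hypothetical; use whatever your orchestrator exposes):

```lua
hub = {
  upstreamnotification = {
    -- Hypothetical orchestrator endpoint that receives state-change payloads.
    endpoint = "https://orchestrator.example.internal/zen/notifications",
    -- Included in each payload so the orchestrator can tell hubs apart.
    instanceid = "hub-01",
  },
}
```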
| CLI flag | Lua key | Default | Description |
|---|---|---|---|
| `--upstream-notification-endpoint` | `hub.upstreamnotification.endpoint` | _(none)_ | URL that receives state-change notifications. |
| `--upstream-notification-instance-id` | `hub.upstreamnotification.instanceid` | _(none)_ | Identifier sent in notification payloads to distinguish this hub from others. |

---

### Consul Integration

The hub can register with a Consul agent for service discovery and health reporting. Disabled when `endpoint` is empty.

| CLI flag | Lua key | Default | Description |
|---|---|---|---|
| `--consul-endpoint` | `hub.consul.endpoint` | _(none)_ | Consul agent URL. Example: `http://localhost:8500`. |
| `--consul-token-env` | `hub.consul.tokenenv` | `CONSUL_HTTP_TOKEN` | Name of the environment variable from which the Consul access token is read. |
| `--consul-health-interval-seconds` | `hub.consul.healthintervalseconds` | `10` | Interval in seconds between Consul health checks. |
| `--consul-deregister-after-seconds` | `hub.consul.deregisterafterseconds` | `30` | Seconds after which Consul deregisters the service if health checks stop passing. |
| `--consul-register-hub` | `hub.consul.registerhub` | `true` | Register the hub parent service with Consul. Instance registration is unaffected. |

---

### Windows: Job Object

| CLI flag | Lua key | Default | Description |
|---|---|---|---|
| `--hub-use-job-object` | `hub.usejobobject` | `true` | Assigns child processes to a Windows Job Object configured to kill all children when the job handle closes. This ensures child processes are terminated if the hub exits unexpectedly. Disable only if the hub runs inside an existing Job Object that does not permit nested jobs. |
---

## Example Configurations

### Basic

Single-host deployment with S3 hydration, an instance count cap, and ASIO HTTP (avoids the http.sys elevation requirement on Windows).

```lua
hub = {
  instance = {
    baseportnumber = 21000,
    limits = {
      count = 20,
      disklimitpercent = 90,
    },
    http = "asio",
    config = "/etc/zen/instance.lua",
  },
  hydration = {
    targetconfig = "/etc/zen/hydration.json",
  },
}
```

`/etc/zen/hydration.json`:

```json
{
  "type": "s3",
  "settings": {
    "uri": "s3://my-zen-cache/hub",
    "region": "us-east-1"
  }
}
```

### Full Production

Build-farm hub with S3 hydration, Consul registration, upstream notifications, resource limits, and a tuned watchdog.

```lua
hub = {
  -- Upstream notification: called on instance state changes
  upstreamnotification = {
    endpoint = "https://orchestrator.internal/zen/notifications",
    instanceid = "build-farm-hub-01",
  },

  -- Consul: service registration and health reporting
  consul = {
    endpoint = "http://localhost:8500",
    tokenenv = "CONSUL_HTTP_TOKEN",
    healthintervalseconds = 10,
    deregisterafterseconds = 30,
  },

  instance = {
    -- Port range starts at 21000 (hub assigns sequentially)
    baseportnumber = 21000,
    limits = {
      count = 100,
      -- Reject provisions when disk usage exceeds 90%
      disklimitpercent = 90,
      -- Reject provisions when system RAM usage exceeds 85%
      memorylimitpercent = 85,
    },
    -- Use asio to avoid the http.sys elevation requirement for child instances
    http = "asio",
    httpthreads = 4,
    -- Threads for provision/deprovision I/O (0 = synchronous)
    provisionthreads = 4,
    -- Config file applied to every child instance
    config = "/etc/zen/instance.lua",
  },

  -- Hydrate new instances from S3
  hydration = {
    targetconfig = "/etc/zen/hydration.json",
    threads = 4,
  },

  watchdog = {
    cycleintervalms = 3000,
    -- Deprovision running instances after 10 minutes of inactivity
    provisionedinactivitytimeoutseconds = 600,
    inactivitycheckmarginseconds = 60,
    -- Deprovision hibernated instances after 1 hour
    hibernatedinactivitytimeoutseconds = 3600,
  },
}
```
`/etc/zen/hydration.json`:

```json
{
  "type": "s3",
  "settings": {
    "uri": "s3://my-zen-cache/build-farm",
    "region": "us-east-1"
  }
}
```

---

## Deprecated Flags

These CLI flags still work but should not be used in new configurations.

| Deprecated flag | Current flag | Lua key |
|---|---|---|
| `--instance-id` | `--upstream-notification-instance-id` | `hub.upstreamnotification.instanceid` |
| `--hub-base-port-number` | `--hub-instance-base-port-number` | `hub.instance.baseportnumber` |
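When migrating off the deprecated flags, the corresponding Lua keys can be set instead; a sketch (the identifier is illustrative, the port shown is the default):

```lua
-- Lua equivalents of --instance-id and --hub-base-port-number.
hub = {
  upstreamnotification = {
    instanceid = "hub-01",
  },
  instance = {
    baseportnumber = 21000,
  },
}
```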