# Hub

Zenserver's hub mode runs as a coordinator that manages multiple storage server instances on
a single host. Rather than serving storage requests directly, the hub listens for provision
and deprovision requests from an orchestrator and spawns dedicated zenserver child processes
on demand -- one per *module*. Each module is identified by an alphanumeric string (typically
associated with a content plugin or project). Each instance gets its own TCP port and data
directory under the hub's data directory.

Typical deployments:

- **Build farm**: build agents on the same host each receive an isolated zenserver instance
  for their build, provisioned by the build orchestration system.
- **Shared team cache**: an orchestrator provisions an instance per project or team and
  deprovisions it after a period of inactivity.

The hub handles the full lifecycle: hydrating new instances from a shared storage source,
monitoring running processes, handling crashes, and cleaning up deprovisioned instances
automatically.

## Instance Lifecycle

Each module slot progresses through a series of states managed by the hub.

```mermaid
stateDiagram-v2
    [*] --> Unprovisioned
    Unprovisioned --> Provisioning : Provision
    Provisioning --> Provisioned : ready
    Provisioning --> Unprovisioned : failed
    Provisioned --> Hibernating : Hibernate
    Provisioned --> Deprovisioning : Deprovision / timeout
    Provisioned --> Obliterating : Obliterate
    Provisioned --> Crashed : process exited
    Hibernating --> Hibernated : stopped
    Hibernating --> Provisioned : failed
    Hibernated --> Waking : Wake
    Hibernated --> Deprovisioning : Deprovision / timeout
    Hibernated --> Obliterating : Obliterate
    Waking --> Provisioned : ready
    Waking --> Hibernated : failed
    Deprovisioning --> Unprovisioned : done
    Crashed --> Recovering : watchdog
    Crashed --> Deprovisioning : Deprovision
    Crashed --> Obliterating : Obliterate
    Recovering --> Provisioned : success
    Recovering --> Unprovisioned : failed
    Obliterating --> Unprovisioned : done
    Obliterating --> Crashed : failed
```

**Stable states:**

- **Unprovisioned** - no process running; the slot is ready for a new provision request.
- **Provisioned** - process running and serving requests. The watchdog monitors activity and
  deprovisions the instance after a configurable inactivity timeout (see
  [Watchdog Tuning](#watchdog-tuning)).
- **Hibernated** - process stopped, data directory preserved. The instance can be woken
  quickly without re-hydrating. The watchdog deprovisions (and deletes data) after a
  configurable inactivity timeout (see [Watchdog Tuning](#watchdog-tuning)).
- **Crashed** - process exited unexpectedly. The watchdog attempts an in-place restart
  automatically.

Transitioning states (`Provisioning`, `Hibernating`, `Waking`, `Deprovisioning`, `Recovering`,
`Obliterating`) are transient and held exclusively by one operation at a time. If hibernation
fails (the process cannot be stopped cleanly), the instance remains Provisioned.
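
The diagram can also be read as a lookup table. The following Python sketch transcribes the transitions shown above; the event strings are the diagram's edge labels (with `Deprovision / timeout` split into two events), while `TRANSITIONS`, `next_state`, and `InvalidTransition` are hypothetical names chosen for illustration and not part of the hub's API.

```python
# Transition table transcribed from the state diagram.
# Keys are (current state, event label); values are the resulting state.
TRANSITIONS = {
    ("Unprovisioned", "Provision"): "Provisioning",
    ("Provisioning", "ready"): "Provisioned",
    ("Provisioning", "failed"): "Unprovisioned",
    ("Provisioned", "Hibernate"): "Hibernating",
    ("Provisioned", "Deprovision"): "Deprovisioning",
    ("Provisioned", "timeout"): "Deprovisioning",
    ("Provisioned", "Obliterate"): "Obliterating",
    ("Provisioned", "process exited"): "Crashed",
    ("Hibernating", "stopped"): "Hibernated",
    ("Hibernating", "failed"): "Provisioned",
    ("Hibernated", "Wake"): "Waking",
    ("Hibernated", "Deprovision"): "Deprovisioning",
    ("Hibernated", "timeout"): "Deprovisioning",
    ("Hibernated", "Obliterate"): "Obliterating",
    ("Waking", "ready"): "Provisioned",
    ("Waking", "failed"): "Hibernated",
    ("Deprovisioning", "done"): "Unprovisioned",
    ("Crashed", "watchdog"): "Recovering",
    ("Crashed", "Deprovision"): "Deprovisioning",
    ("Crashed", "Obliterate"): "Obliterating",
    ("Recovering", "success"): "Provisioned",
    ("Recovering", "failed"): "Unprovisioned",
    ("Obliterating", "done"): "Unprovisioned",
    ("Obliterating", "failed"): "Crashed",
}


class InvalidTransition(Exception):
    """Raised when the diagram has no edge for (state, event)."""


def next_state(state, event):
    """Return the state reached from `state` on `event`, per the diagram."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise InvalidTransition(f"{state} has no transition on {event!r}") from None
```

For example, `next_state("Hibernating", "failed")` returns `"Provisioned"`, which is the failed-hibernation case described above. The idempotent no-op requests are not modelled here because they do not change state.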

**Hibernation vs deprovision:** hibernating stops the process but keeps the data directory
intact, allowing fast restart on the next Wake. Deprovisioning triggers a GC cycle, then
dehydrates the instance's state back to the configured backend, then deletes all local
instance data. Explicit deprovision requests are always honoured; the watchdog timeout path
always deprovisions rather than hibernates.

**Obliterate vs deprovision:** deprovisioning preserves data on the hydration backend so the
next provision of the same module starts warm. Obliterate permanently destroys both local
instance data and all backend hydration data for the module. This is irreversible. Obliterate
can be called on Provisioned, Hibernated, or Crashed instances. It also works on modules that
are not currently tracked by the hub (already deprovisioned) -- in that case it deletes only
the backend hydration data. Obliterating a module that was never provisioned is a no-op
success.

**Idempotent operations:** hibernating an already-hibernated instance, waking an
already-provisioned instance, deprovisioning a non-existent module, or obliterating a
never-provisioned module all return success without side effects.

## The Watchdog

The hub runs a background watchdog thread that manages instance lifecycle automatically.

**Cycle:** every `cycleintervalms` milliseconds (default 3 s) the watchdog:

1. Refreshes machine-level metrics (disk usage, system memory) used for provisioning limit
   checks.
2. Iterates over all active instance slots, spending at most `cycleprocessingbudgetms`
   (default 500 ms) per cycle with an `instancecheckthrottlems` (default 5 ms) pause between
   instance checks.
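
A minimal sketch of the budgeted loop in step 2, assuming the budget is tested before each instance check and that unchecked instances are simply left for a later cycle (the description above does not say how they are resumed). `run_cycle`, `check`, and the injectable `clock`/`sleep` parameters are illustrative names, not the hub's implementation.

```python
import time


def run_cycle(instances, check, budget_ms=500, throttle_ms=5,
              clock=time.monotonic, sleep=time.sleep):
    """Check instances in order until the list or the time budget runs out.

    Returns the number of instances that were checked in this cycle.
    """
    start = clock()
    checked = 0
    for instance in instances:
        elapsed_ms = (clock() - start) * 1000.0
        if elapsed_ms >= budget_ms:
            break  # budget spent; remaining instances wait (assumption)
        check(instance)
        checked += 1
        sleep(throttle_ms / 1000.0)  # throttle pause after each check
    return checked
```

With the defaults, a cycle in which each check is nearly instantaneous is bounded mostly by the throttle: roughly 100 instances fit into the 500 ms budget at 5 ms per pause.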

**Per-instance checks:**

For each **Provisioned** instance:
- Verifies the process is still running. If not, transitions to Crashed.
- Checks for inactivity. The check begins `inactivitycheckmarginseconds` (default 1 min)
  before the full timeout -- with a 10 min timeout, checking begins once 9 min of inactivity
  have elapsed. It makes a short HTTP request to the instance's
  `/stats/activity_counters` endpoint. Activity is any client storage request processed by
  the instance. If the activity counter increased since the previous check, the inactivity
  timer resets. If not, and the full `provisionedinactivitytimeoutseconds` has elapsed, the
  instance is automatically deprovisioned.

For each **Hibernated** instance:
- Checks elapsed time since last activity. No HTTP request is needed (process not running).
  After `hibernatedinactivitytimeoutseconds` (default 30 min), the instance is deprovisioned
  and its data deleted.

For each **Crashed** instance:
- Attempts an in-place restart (Recovering state) using the existing data directory, without
  re-hydrating. On success, the instance returns to Provisioned and an upstream notification
  is sent. On failure, the instance is deprovisioned.

**Summary with default settings:**

| Event | Time |
|---|---|
| Watchdog cycle | every 3 s |
| Activity check fires | 9 min after last activity |
| Provisioned auto-deprovision | 10 min after last activity |
| Hibernated auto-deprovision | 30 min after last activity |
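
The timing rules above can be condensed into two small decision functions. This is a sketch under stated assumptions: the function names and return strings are hypothetical, `activity_seen` stands in for the HTTP request to `/stats/activity_counters`, and "has elapsed" is interpreted as greater than or equal to the timeout.

```python
def provisioned_action(idle_seconds, activity_seen,
                       timeout_seconds=600, margin_seconds=60):
    """Decide what the watchdog does with a Provisioned instance.

    idle_seconds  -- seconds since the inactivity timer was last reset
    activity_seen -- callable returning True if the activity counter
                     increased since the previous check
    """
    if timeout_seconds == 0:
        return "keep"  # 0 disables automatic deprovisioning
    if idle_seconds < timeout_seconds - margin_seconds:
        return "keep"  # check window not open yet; no request is made
    if activity_seen():
        return "reset_timer"
    if idle_seconds >= timeout_seconds:
        return "deprovision"
    return "keep"  # inside the window, no activity, timeout not reached


def hibernated_action(idle_seconds, timeout_seconds=1800):
    """Decide what the watchdog does with a Hibernated instance."""
    if timeout_seconds != 0 and idle_seconds >= timeout_seconds:
        return "deprovision"
    return "keep"
```

With the defaults, `provisioned_action(545, lambda: False)` returns `"keep"` (the check window is open but the timeout has not elapsed), and the same call with `600` returns `"deprovision"`.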

## Running the Hub

```
zenserver hub [options]
```

For non-trivial deployments, pass a Lua configuration file rather than command-line flags:

```
zenserver hub --config /etc/zen/hub.lua
```

The hub's own config file is separate from the per-instance config (see
[Instance Management](#instance-management)).

## Configuration Reference

Options can be set via Lua config file or command-line flags. Command-line flags take
precedence when both are provided.

For general server options (listening port, data directory, logging, HTTP server selection)
that apply to zenserver in all modes, see the zenserver configuration documentation.

---

### Core Server Options

These generic options apply to zenserver in all modes, including hub.

| CLI flag        | Lua key                    | Description |
|-----------------|----------------------------|-------------|
| `--config`      | _(CLI only)_               | Path to the Lua configuration file for the hub. |
| `--data-dir`    | `server.datadir`           | Root directory for hub data, logs, and instance subdirectories. |
| `--port`        | `network.port`             | TCP port the hub listens on. |
| `--http`        | `network.httpserverclass`  | HTTP server implementation (`httpsys` or `asio`). |
| `--dedicated`   | `server.dedicated`         | Dedicated server mode: disables port probing and allocates more resources. |
| `--no-sentry`   | `server.sentry.disable`    | Disable Sentry crash reporting. |

---

### Instance Management

Controls the child storage server processes spawned by the hub.

Each instance stores its data in `<data-dir>/servers/<moduleid>/`. Instance directories
persist across hibernation and are removed only on deprovision or obliterate.

| CLI flag                           | Lua key                        | Default                              | Description |
|------------------------------------|--------------------------------|--------------------------------------|-------------|
| `--hub-instance-base-port-number`  | `hub.instance.baseportnumber`  | `21000`                              | First port in the instance port pool. Instances are assigned ports sequentially from this base. Ports are not guaranteed to be stable across deprovision/re-provision cycles; a module may receive a different port after being deprovisioned and re-provisioned. |
| `--hub-instance-limit`             | `hub.instance.limits.count`   | `1000`                               | Maximum simultaneously provisioned instances. Provision requests are rejected once this limit is reached. |
| `--hub-instance-http`              | `hub.instance.http`            | `httpsys` (Windows), `asio` (Linux/macOS) | HTTP server implementation for child instances. On Windows, use `asio` if the hub runs without elevation and no URL reservation covers the instance port range. |
| `--hub-instance-http-threads`      | `hub.instance.httpthreads`     | `0`                                  | HTTP connection threads per child instance. `0` uses hardware concurrency. |
| `--hub-instance-corelimit`         | `hub.instance.corelimit`       | `0`                                  | Concurrency limit for child instances. `0` is automatic. |
| `--hub-instance-provision-threads` | `hub.instance.provisionthreads` | `clamp(cpu/8, 4, 12)`              | Per-module hydrate/dehydrate scheduling pool size. One thread per in-flight module hydrate or dehydrate; the per-file work fans out to `--hub-hydration-threads`. |
| `--hub-instance-spawn-threads`     | `hub.instance.spawnthreads`     | `clamp(cpu/8, 4, 16)`              | Per-module child-process spawn/despawn pool size. One thread per `CreateProcess`/health-poll or terminate cycle. |
| `--hub-instance-config`            | `hub.instance.config`          | _(none)_                             | Path to a Lua config file passed to every spawned child instance. Use this to configure storage paths, cache sizes, and other storage server settings. See the zenserver configuration documentation. |
| `--hub-instance-malloc`            | `hub.instance.malloc`          | _(none)_                             | Memory allocator for child instances (`ansi`, `stomp`, `rpmalloc`, `mimalloc`). When unset, instances use their compiled-in default. |
| `--hub-instance-trace`             | `hub.instance.trace`           | _(none)_                             | Trace channel specification for child instances (e.g. `default`, `cpu,log`, `memory`). When set, instances start with tracing enabled on the specified channels. |
| `--hub-instance-tracehost`         | `hub.instance.tracehost`       | _(none)_                             | Trace host for child instances. Instances stream trace data to this host. |
| `--hub-instance-tracefile`         | `hub.instance.tracefile`       | _(none)_                             | Trace file path for child instances. Supports `{moduleid}` and `{port}` placeholders, resolved per instance. Without placeholders all instances write to the same file. |
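
The `clamp(cpu/8, 4, N)` defaults in the table can be worked through as follows. This sketch assumes `cpu` is the logical core count and that the division is integer division; neither is stated above, and the function names are illustrative.

```python
def clamp(value, low, high):
    """Constrain value to the inclusive range [low, high]."""
    return max(low, min(value, high))


def default_provision_threads(cpu_count):
    # Default for hub.instance.provisionthreads: clamp(cpu/8, 4, 12)
    return clamp(cpu_count // 8, 4, 12)


def default_spawn_threads(cpu_count):
    # Default for hub.instance.spawnthreads: clamp(cpu/8, 4, 16)
    return clamp(cpu_count // 8, 4, 16)
```

Under these assumptions a 16-core host gets the floor of 4 threads for both pools, a 64-core host gets 8, and a 128-core host gets 12 provision threads and 16 spawn threads.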

---

### Resource Limits

The hub checks machine resources before accepting a provision request. If any enabled limit
is exceeded, the request is rejected. Limits are refreshed at the start of each watchdog
cycle.

Setting a limit to `0` disables it. Byte and percent limits are independent -- either can
trigger a rejection.

| CLI flag                                | Lua key                                 | Default | Description |
|-----------------------------------------|-----------------------------------------|---------|-------------|
| `--hub-provision-disk-limit-bytes`      | `hub.instance.limits.disklimitbytes`    | `0`     | Reject provision if used bytes on the filesystem volume containing the data directory exceeds this value. |
| `--hub-provision-disk-limit-percent`    | `hub.instance.limits.disklimitpercent`  | `0`     | Reject provision if used space on the filesystem volume containing the data directory exceeds this percentage of its total capacity (0-100). |
| `--hub-provision-memory-limit-bytes`    | `hub.instance.limits.memorylimitbytes`  | `0`     | Reject provision if used system-wide physical memory exceeds this many bytes. |
| `--hub-provision-memory-limit-percent`  | `hub.instance.limits.memorylimitpercent` | `0`    | Reject provision if used system-wide physical memory exceeds this percentage of total RAM (0-100). |
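
A sketch of how the four limits combine, assuming "exceeds" means strictly greater than. The dictionary keys mirror the Lua key suffixes from the table; `exceeded_limits` itself is a hypothetical helper, not a hub function.

```python
def exceeded_limits(used_disk, total_disk, used_mem, total_mem, limits):
    """Return the names of enabled limits that are exceeded.

    An empty list means the provision request passes the resource checks.
    A limit that is missing or set to 0 is treated as disabled.
    """
    exceeded = []
    for prefix, used, total in (("disk", used_disk, total_disk),
                                ("memory", used_mem, total_mem)):
        byte_limit = limits.get(prefix + "limitbytes", 0)
        percent_limit = limits.get(prefix + "limitpercent", 0)
        # Byte and percent limits are evaluated independently.
        if byte_limit and used > byte_limit:
            exceeded.append(prefix + "limitbytes")
        if percent_limit and used * 100 > percent_limit * total:
            exceeded.append(prefix + "limitpercent")
    return exceeded
```

For instance, with `{"disklimitpercent": 90, "memorylimitpercent": 85}`, a volume at 95% usage and memory at 80% yields `["disklimitpercent"]`, so the request would be rejected on the disk limit alone.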

---

### Watchdog Tuning

| CLI flag                                                   | Lua key                                             | Default           | Description |
|------------------------------------------------------------|-----------------------------------------------------|-------------------|-------------|
| `--hub-watchdog-cycle-interval-ms`                         | `hub.watchdog.cycleintervalms`                      | `3000` (3 s)      | Milliseconds between watchdog cycles. |
| `--hub-watchdog-cycle-processing-budget-ms`                | `hub.watchdog.cycleprocessingbudgetms`              | `500`             | Maximum milliseconds spent processing instances per cycle. |
| `--hub-watchdog-instance-check-throttle-ms`                | `hub.watchdog.instancecheckthrottlems`              | `5`               | Milliseconds to wait between successive instance checks within a cycle. |
| `--hub-watchdog-provisioned-inactivity-timeout-seconds`    | `hub.watchdog.provisionedinactivitytimeoutseconds`  | `600` (10 min)    | Seconds of inactivity before a running instance is automatically deprovisioned. `0` disables automatic deprovisioning for running instances. |
| `--hub-watchdog-hibernated-inactivity-timeout-seconds`     | `hub.watchdog.hibernatedinactivitytimeoutseconds`   | `1800` (30 min)   | Seconds of inactivity before a hibernated instance is deprovisioned and its data deleted. `0` disables automatic deprovisioning for hibernated instances. |
| `--hub-watchdog-inactivity-check-margin-seconds`           | `hub.watchdog.inactivitycheckmarginseconds`         | `60` (1 min)      | Activity check window opens this many seconds before the provisioned inactivity timeout. With defaults, checking begins at 9 min and the hard timeout is at 10 min. |
| `--hub-watchdog-activity-check-connect-timeout-ms`         | `hub.watchdog.activitycheckconnecttimeoutms`        | `100`             | Connect timeout in milliseconds for activity check requests. |
| `--hub-watchdog-activity-check-request-timeout-ms`         | `hub.watchdog.activitycheckrequesttimeoutms`        | `200`             | Request timeout in milliseconds for activity check requests. |

---

### Hydration

Hydration pre-populates a new instance's data directory from a shared storage backend before
the instance starts serving requests. On deprovision, the hub dehydrates the instance's
state back to the same backend so the next provision of that module starts warm.

The hydration system is incremental and content-addressed: files are hashed and stored in a
CAS (content-addressable store) on the backend. Only files that changed since the last
dehydration are uploaded or downloaded, and unchanged content is shared across modules. A
cached state object tracks the file manifest between hydrate/dehydrate cycles to avoid
redundant hashing and transfers.

Before dehydrating, the hub triggers a GC cycle on the instance to compact storage, reducing
the amount of data transferred to the backend.

If neither `targetspec` nor `targetconfig` is set, the hub automatically creates a
`hydration_storage` directory under its own data directory and uses that as the file
hydration source. This is suitable for single-host deployments where instances share locally
cached data.

`targetspec` and `targetconfig` are mutually exclusive.
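
The selection rule can be sketched as below. This is an illustration only: `resolve_hydration_source` is a hypothetical name, the tuple it returns is not a hub data structure, and raising an error when both options are set is an assumption about how the mutual exclusion is enforced.

```python
from pathlib import PurePosixPath


def resolve_hydration_source(hub_data_dir, targetspec=None, targetconfig=None):
    """Return (kind, value) describing where hydration data comes from."""
    if targetspec and targetconfig:
        raise ValueError("targetspec and targetconfig are mutually exclusive")
    if targetspec:
        return ("targetspec", targetspec)      # e.g. file:///absolute/path
    if targetconfig:
        return ("targetconfig", targetconfig)  # path to a JSON target config
    # Neither set: fall back to a directory under the hub's data directory.
    default_path = PurePosixPath(hub_data_dir) / "hydration_storage"
    return ("default", str(default_path))
```

Calling `resolve_hydration_source("/var/zen")` returns `("default", "/var/zen/hydration_storage")`, the single-host fallback described above.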

| CLI flag                                    | Lua key                                     | Default                   | Description |
|---------------------------------------------|---------------------------------------------|---------------------------|-------------|
| `--hub-hydration-target-spec`               | `hub.hydration.targetspec`                  | _(local path, see above)_ | Shorthand URI for the hydration source. Must use the `file://` prefix for file targets: `file:///absolute/path`. |
| `--hub-hydration-target-config`             | `hub.hydration.targetconfig`                | _(none)_                  | Path to a JSON file specifying the hydration source. Supports `file` and `s3` backends. |
| `--hub-hydration-threads`                   | `hub.hydration.threads`                     | `clamp(cpu/8, 4, 12)`     | Per-file worker pool size inside a single hydrate/dehydrate. Drives parallel file hashing and pack assembly; backend I/O on the async S3 path runs on the `AsyncHttpClient` io thread instead of these workers. Set to `0` for synchronous operation. |
| `--hub-enable-hydration`                    | `hub.enablehydration`                       | `true`                    | Load instance state from the hydration target on provision. Disable to start every provision from an empty instance directory. |
| `--hub-enable-dehydration`                  | `hub.enabledehydration`                     | `true`                    | Save instance state to the hydration target on deprovision. Disable to run the hydrate-only path (useful for perf testing against a fixed backend snapshot). |
| `--hub-hydration-enable-pack`               | `hub.hydration.enablepack`                  | `true`                    | Concatenate small files into CAS pack blobs during dehydrate. See [Pack](#pack). |
| `--hub-hydration-pack-threshold-bytes`      | `hub.hydration.packthresholdbytes`          | `262144` (256 KiB)        | Files strictly smaller than this are pack candidates. Files at or above this size are stored as standalone CAS entries. |
| `--hub-hydration-max-pack-bytes`            | `hub.hydration.maxpackbytes`                | `4194304` (4 MiB)         | Upper bound on a single pack's concatenation size. Candidates are bin-packed greedily; packs that would exceed this cap are closed and a new pack is started. A unique candidate larger than the cap falls back to standalone upload. |
| `--hub-hydration-async-enabled`             | `hub.hydration.async.enabled`               | `true`                    | Route S3 hydration through `AsyncHttpClient` (curl_multi + asio, single io thread). `false` falls back to the blocking `S3Client` path. |
| `--hub-hydration-async-max-concurrent-requests` | `hub.hydration.async.maxconcurrentrequests` | `128`                 | Cap on in-flight S3 requests submitted to the `AsyncHttpClient`; excess submissions queue inside the client until a slot frees. Only consulted when `--hub-hydration-async-enabled=true`. |

Multipart chunk size is S3-specific and set via the target config (see [Multipart chunking](#multipart-chunking)).

**File backend** (`hub.hydration.targetconfig` JSON):

```json
{
  "type": "file",
  "settings": {
    "path": "/data/hydration_storage"
  }
}
```

**S3 backend** (`hub.hydration.targetconfig` JSON):

```json
{
  "type": "s3",
  "settings": {
    "uri": "s3://bucket-name/optional/prefix",
    "region": "us-east-1",
    "endpoint": "https://custom-endpoint",
    "path-style": false,
    "chunksize": 67108864
  }
}
```

Both backends accept the optional top-level `excludes` key to override the built-in
defaults; see [Excludes](#excludes) for the schema and the default list.

S3 settings:

| Field | Required | Default | Description |
|---|---|---|---|
| `uri` | Yes | - | S3 URI. Include a prefix path to isolate hub data within a shared bucket. |
| `region` | No | `us-east-1` | AWS region. Also reads `AWS_DEFAULT_REGION` / `AWS_REGION`. |
| `endpoint` | No | - | Custom endpoint URL for S3-compatible services (MinIO, Ceph, etc.). |
| `path-style` | No | `false` | Use path-style S3 URLs instead of virtual-hosted. Required by some S3-compatible services. |
| `chunksize` | No | `67108864` (64 MiB) | Multipart chunk size for GET/PUT on files at or above 1.25x this value. See [Multipart chunking](#multipart-chunking). |

S3 credentials are read from environment variables (`AWS_ACCESS_KEY_ID`,
`AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`). If `AWS_ACCESS_KEY_ID` is not set, the hub
falls back to IMDS instance credentials.

#### Pack

When pack is enabled (default), dehydrate concatenates every unique file smaller than
`packthresholdbytes` into one or more CAS pack blobs, rather than uploading each file as
its own CAS entry. Pack contents are stored raw (no compression); each pack's CAS key is
the hash of its concatenated bytes. Files at or above the threshold continue to upload as
standalone CAS entries.

Pack composition is deterministic: candidates are ordered by content hash and bin-packed
greedily up to `maxpackbytes`. Identical module state produces identical pack hashes
across runs, so `ExistsLookup` deduplicates packs between redeploys just like regular
CAS entries. A pack that would contain fewer than two entries is discarded and its
candidates revert to standalone upload.

The state manifest records per-pack entries and each file's owning pack, so hydrate
resolves packed files by downloading the pack once and slicing it into the target files.

The primary win is request count: for modules dominated by tiny metadata files, pack can
collapse dozens of GETs into a single request. Request count dominates wall time on
high-RTT backends (real S3); the effect is much smaller on localhost MinIO.
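
The packing rules in this section can be sketched in a few lines of Python. This is not the hub's implementation: SHA-256 stands in for the content hash, which is not specified here, and `plan_packs` with its return shape is a hypothetical helper.

```python
import hashlib


def plan_packs(files, threshold=262144, max_pack=4194304):
    """Plan packs for `files`, a mapping of relative path -> bytes.

    Returns (packs, standalone): packs is a list of {"key", "members"}
    dicts; standalone is a sorted list of content hashes uploaded alone.
    """
    # Deduplicate by content hash (SHA-256 is an assumed stand-in).
    unique = {hashlib.sha256(d).hexdigest(): d for d in files.values()}

    standalone = [h for h, d in unique.items() if len(d) >= threshold]
    # Deterministic order: candidates sorted by content hash.
    candidates = sorted((h, d) for h, d in unique.items() if len(d) < threshold)

    packs, current, size = [], [], 0

    def close_pack():
        nonlocal current, size
        if len(current) >= 2:
            blob = b"".join(d for _, d in current)  # raw, uncompressed
            packs.append({"key": hashlib.sha256(blob).hexdigest(),
                          "members": [h for h, _ in current]})
        else:
            # Fewer than two entries: discard the pack, upload standalone.
            standalone.extend(h for h, _ in current)
        current, size = [], 0

    for h, d in candidates:
        if len(d) > max_pack:
            standalone.append(h)  # larger than the cap: standalone upload
            continue
        if current and size + len(d) > max_pack:
            close_pack()
        current.append((h, d))
        size += len(d)
    close_pack()
    return packs, sorted(standalone)
```

Because candidates are sorted by hash before packing, the same set of file contents yields the same pack keys regardless of directory scan order, which is what allows packs to deduplicate across runs.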

#### Excludes

The optional top-level `excludes` key on the target config is an array of wildcard
patterns matched against each file's relative path (forward-slash form). Matching
files are dropped at the dehydrate-side directory scan and never enter the manifest.
`*` is path-separator-agnostic, so `auth/*` matches `auth/authstate` and any deeper
path under `auth/`.

`excludes` uses **override semantics, not additive**: if the field is present in the
target config its contents fully replace the default list. An explicit empty array
(`"excludes": []`) is honoured as "apply no excludes". Field absent from the config
means the built-in default applies.

Built-in default:

| Pattern | Why |
|---|---|
| `.sentry-native/*` | Sentry-native crash uploader DB; locked while child runs |
| `state_marker`     | Root-level liveness marker re-created by the child |
| `.lock`            | Instance lock file (`FILE_FLAG_DELETE_ON_CLOSE`); locked while child runs |
| `*.bak`            | Transient backups produced by atomic file replace |
| `gc/reserve.gc`    | GC disk reserve under the per-store `gc/` subdirectory |
| `auth/*`           | Encrypted auth state (`auth/authstate`) |

A target config whose `excludes` array reproduces the built-in default (equivalent to
omitting the key from the config):

```json
{
  "type": "file",
  "settings": {
    "path": "/data/hydration_storage"
  },
  "excludes": [
    ".sentry-native/*",
    "state_marker",
    ".lock",
    "*.bak",
    "gc/reserve.gc",
    "auth/*"
  ]
}
```
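
The override semantics and wildcard matching can be approximated with Python's `fnmatch` module, whose `*` also matches across `/`. This is a sketch under assumptions: patterns are matched case-sensitively against the whole relative path, and `fnmatch` additionally understands `?` and `[...]`, which the hub's matcher may not. `effective_excludes` and `is_excluded` are hypothetical names.

```python
from fnmatch import fnmatchcase

DEFAULT_EXCLUDES = [
    ".sentry-native/*",
    "state_marker",
    ".lock",
    "*.bak",
    "gc/reserve.gc",
    "auth/*",
]


def effective_excludes(target_config):
    """Apply override semantics: a present key fully replaces the defaults."""
    if "excludes" in target_config:
        return list(target_config["excludes"])  # may be empty: no excludes
    return list(DEFAULT_EXCLUDES)


def is_excluded(relative_path, patterns):
    """Return True if the forward-slash relative path matches any pattern."""
    path = relative_path.replace("\\", "/")
    return any(fnmatchcase(path, pattern) for pattern in patterns)
```

With the defaults, `is_excluded("auth/authstate", DEFAULT_EXCLUDES)` is true, while the same path is kept when the config sets `"excludes": []`.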

#### Multipart chunking

Large files above 1.25x the chunk size are transferred in chunks using S3 range requests
(GET) or multipart uploads (PUT). Smaller files use a single request.

Chunk size is an S3-specific setting with no CLI or Lua surface. It is set per target and
recorded per module:
- Target config JSON key: `chunksize` (under `settings`).
- State.cbo key: `MultipartChunkSize` (under `StorageSettings`), written during dehydrate
  so the next hydrate uses the same partitioning.

Changing the target-config `chunksize` affects only modules with no prior state.cbo;
existing modules continue to use the value recorded at their last dehydrate. State.cbo
files without the field fall back to `DefaultMultipartChunkSize` (64 MiB).

The default of 64 MiB is tuned for intra-region Nitro EC2 instances (hub and S3 bucket in
the same AWS region). For smaller instance types where per-connection bandwidth is under
~40 MB/s, 32 MiB may give better thread-pool utilisation on multi-hundred-MiB files.
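
The chunk-size resolution and the 1.25x threshold can be illustrated as follows. This is a sketch, not the hub's code: `resolve_chunk_size` and `plan_ranges` are hypothetical helpers, `state` is a plain dictionary standing in for the `StorageSettings` object in state.cbo, and a file of exactly 1.25x the chunk size is treated as chunked (following the S3 settings table), which is an assumption about the boundary case.

```python
MIB = 1024 * 1024
DEFAULT_MULTIPART_CHUNK_SIZE = 64 * MIB


def resolve_chunk_size(target_chunksize=None, state=None):
    """Pick the chunk size for a module.

    state is None when the module has no prior state.cbo; otherwise it is
    the recorded settings, which win over the target config.
    """
    if state is not None:
        return state.get("MultipartChunkSize", DEFAULT_MULTIPART_CHUNK_SIZE)
    return target_chunksize or DEFAULT_MULTIPART_CHUNK_SIZE


def plan_ranges(size, chunk_size):
    """Return inclusive (first_byte, last_byte) ranges for one file."""
    if size == 0:
        return []
    if size * 4 < chunk_size * 5:  # below 1.25x: single request
        return [(0, size - 1)]
    return [(start, min(start + chunk_size, size) - 1)
            for start in range(0, size, chunk_size)]
```

With a 64 MiB chunk size, a 70 MiB file is below the 80 MiB threshold and transfers in one request, while a 100 MiB file is split into a 64 MiB range and a 36 MiB range.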

---

### Upstream Notifications

The hub can notify an external system when instance state changes. Notifications are disabled
when `endpoint` is empty.

| CLI flag                               | Lua key                               | Default  | Description |
|----------------------------------------|---------------------------------------|----------|-------------|
| `--upstream-notification-endpoint`     | `hub.upstreamnotification.endpoint`   | _(none)_ | URL that receives state-change notifications. |
| `--upstream-notification-instance-id`  | `hub.upstreamnotification.instanceid` | _(none)_ | Identifier sent in notification payloads to distinguish this hub from others. |

---

### Consul Integration

The hub can register with a Consul agent for service discovery and health reporting. Disabled
when `endpoint` is empty.

| CLI flag                             | Lua key                              | Default              | Description |
|--------------------------------------|--------------------------------------|----------------------|-------------|
| `--consul-endpoint`                  | `hub.consul.endpoint`                | _(none)_             | Consul agent URL. Example: `http://localhost:8500`. |
| `--consul-token-env`                 | `hub.consul.tokenenv`                | `CONSUL_HTTP_TOKEN`  | Name of the environment variable from which the Consul access token is read. |
| `--consul-health-interval-seconds`   | `hub.consul.healthintervalseconds`   | `10`                 | Interval in seconds between Consul health checks. |
| `--consul-deregister-after-seconds`  | `hub.consul.deregisterafterseconds`  | `30`                 | Seconds after which Consul deregisters the service if health checks stop passing. |
| `--consul-register-hub`              | `hub.consul.registerhub`             | `true`               | Register the hub parent service with Consul. Instance registration is unaffected. |

---

### Windows: Job Object

| CLI flag                 | Lua key              | Default | Description |
|--------------------------|----------------------|---------|-------------|
| `--hub-use-job-object`   | `hub.usejobobject`   | `true`  | Assigns child processes to a Windows Job Object configured to kill all children when the job handle closes. This ensures child processes are terminated if the hub exits unexpectedly. Disable only if the hub runs inside an existing Job Object that does not permit nested jobs. |

---

## Example Configurations

### Basic

Single-host deployment with S3 hydration, an instance count cap, and ASIO HTTP (avoids the
http.sys elevation requirement on Windows).

```lua
hub = {
  instance = {
    baseportnumber = 21000,
    limits = {
      count = 20,
      disklimitpercent = 90,
    },
    http = "asio",
    config = "/etc/zen/instance.lua",
  },

  hydration = {
    targetconfig = "/etc/zen/hydration.json",
  },
}
```

`/etc/zen/hydration.json`:

```json
{
  "type": "s3",
  "settings": {
    "uri": "s3://my-zen-cache/hub",
    "region": "us-east-1"
  }
}
```

### Full Production

Build-farm hub with S3 hydration, Consul registration, upstream notifications, resource
limits, and tuned watchdog.

```lua
hub = {
  -- Upstream notification: called on instance state changes
  upstreamnotification = {
    endpoint = "https://orchestrator.internal/zen/notifications",
    instanceid = "build-farm-hub-01",
  },

  -- Consul: service registration and health reporting
  consul = {
    endpoint = "http://localhost:8500",
    tokenenv = "CONSUL_HTTP_TOKEN",
    healthintervalseconds = 10,
    deregisterafterseconds = 30,
  },

  instance = {
    -- Port range starts at 21000 (hub assigns sequentially)
    baseportnumber = 21000,
    limits = {
      count = 100,

      -- Reject provisions when disk usage exceeds 90%
      disklimitpercent = 90,

      -- Reject provisions when system RAM usage exceeds 85%
      memorylimitpercent = 85,
    },

    -- Use asio to avoid http.sys elevation requirement for child instances
    http = "asio",
    httpthreads = 4,

    -- Per-module hydrate/dehydrate scheduling pool (0 = synchronous)
    provisionthreads = 8,

    -- Per-module child-process spawn/despawn pool (0 = synchronous)
    spawnthreads = 12,

    -- Config file applied to every child instance
    config = "/etc/zen/instance.lua",
  },

  -- Hydrate new instances from S3
  hydration = {
    targetconfig = "/etc/zen/hydration.json",
    threads = 8, -- per-file workers inside a single hydrate/dehydrate

    -- Async S3 path: pipeline requests on a single AsyncHttpClient io thread
    -- instead of blocking worker threads. Default true.
    async = {
      enabled = true,
      maxconcurrentrequests = 64,
    },
  },

  watchdog = {
    cycleintervalms = 3000,

    -- Deprovision running instances after 10 minutes of inactivity
    provisionedinactivitytimeoutseconds = 600,
    inactivitycheckmarginseconds = 60,

    -- Deprovision hibernated instances after 1 hour
    hibernatedinactivitytimeoutseconds = 3600,
  },
}
```

`/etc/zen/hydration.json`:

```json
{
  "type": "s3",
  "settings": {
    "uri": "s3://my-zen-cache/build-farm",
    "region": "us-east-1"
  }
}
```

---

## Deprecated Flags

These CLI flags still work but should not be used in new configurations.

| Deprecated flag | Current flag | Lua key |
|---|---|---|
| `--instance-id` | `--upstream-notification-instance-id` | `hub.upstreamnotification.instanceid` |
| `--hub-base-port-number` | `--hub-instance-base-port-number` | `hub.instance.baseportnumber` |