# Perf-seed workflow

Three-stage pipeline for running repeatable hub-hydration perf tests against a
local MinIO backend seeded with real module data pulled from production S3.

## Layout

All scripts share a single perf-seed root. The script defaults currently point
at `E:/Dev/zen-perf-seed/`, but every path can be overridden with a CLI flag
(see the per-stage options below). Pick a root with enough free space (snapshots
and preserved CAS dirs can be large), then either pass the matching `--*-dir`
flag on each invocation or change the script defaults to your chosen root.

Layout under the chosen root (`<perf-seed>/`):

```
<perf-seed>/
  hub-a/                Stage A hub data dir (transient)
    servers/<moduleid>/
  s3-snapshot/          Preserved production server-state trees (read-only after Stage A)
    <moduleid>/
  hubs/                 Stage B per-bucket hub data dirs (transient)
    hub-b-zen-seed-packed/
    hub-b-zen-seed-unpacked/
  minio-data/           Stage B MinIO data dir (transient, carries every seeded bucket)
  minio-seeded-baseline/  Preserved baseline MinIO CAS (read-only after Stage B + preserve)
    README.txt
  minio-seeded-packed/    Preserved packed MinIO CAS (filled by the pack worktree)
    README.txt
  minio-seeded-unpacked/  Preserved unpacked MinIO CAS (filled by the no-pack worktree)
    README.txt
  hub-perf/             Stage C hub data dir (wiped each run)
  minio-run/            Stage C MinIO data dir (wiped + re-copied each run)
  perf-runs/            Per-run archive: hub.log, logs/, hub.utrace, summary.json
    20260423-141530_zen-seed-packed/
    20260423-143112_zen-seed-unpacked/
```
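
As a rough illustration of how the layout hangs off one root, the sketch below derives a few of the documented directories from a single path. `PerfSeedLayout` and its method names are hypothetical and do not come from the scripts; only the directory names are taken from the tree above.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class PerfSeedLayout:
    """Hypothetical helper that derives the documented directories from one root."""

    root: PurePosixPath

    @property
    def snapshot_dir(self) -> PurePosixPath:
        # Preserved production server-state trees (Stage A output).
        return self.root / "s3-snapshot"

    @property
    def minio_data_dir(self) -> PurePosixPath:
        # Transient MinIO data dir that Stage B seeds.
        return self.root / "minio-data"

    @property
    def minio_run_dir(self) -> PurePosixPath:
        # Stage C MinIO data dir, wiped and re-copied each run.
        return self.root / "minio-run"

    def hub_b_dir(self, bucket: str) -> PurePosixPath:
        # Stage B keeps one hub data dir per bucket under hubs/.
        return self.root / "hubs" / f"hub-b-{bucket}"

    def seeded_dir(self, variant: str) -> PurePosixPath:
        # Preserved MinIO CAS for one variant, e.g. "baseline" or "packed".
        return self.root / f"minio-seeded-{variant}"


layout = PerfSeedLayout(PurePosixPath("E:/Dev/zen-perf-seed"))
print(layout.hub_b_dir("zen-seed-packed"))
# E:/Dev/zen-perf-seed/hubs/hub-b-zen-seed-packed
```

Changing the root in one place then moves every derived path with it, which is the same effect as passing the matching `--*-dir` flags consistently.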

## Prerequisites

- Debug or release build of zenserver + minio: `xmake -y`
- `pip install boto3`
- AWS CLI v2 with an SSO profile configured (for Stage A only)
- Environment variables (or pass equivalents via CLI flags):
  - `ZEN_PERF_S3_URI` - source S3 bucket, e.g. `s3://your-bucket/optional-prefix/`
  - `ZEN_PERF_AWS_PROFILE` - AWS SSO profile name with read access to that bucket
  - `ZEN_PERF_AWS_REGION` - optional, defaults to `us-east-1`
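
A minimal sketch of reading these variables, assuming the two non-optional ones must be set and the region falls back to `us-east-1`. `load_perf_env` and the returned dictionary keys are hypothetical; the real scripts may resolve and validate the settings differently.

```python
import os


def load_perf_env(environ=None):
    """Collect the Stage A settings from environment variables.

    Hypothetical helper; the real scripts may resolve these differently.
    """
    env = os.environ if environ is None else environ
    required = ("ZEN_PERF_S3_URI", "ZEN_PERF_AWS_PROFILE")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))

    s3_uri = env["ZEN_PERF_S3_URI"]
    if not s3_uri.startswith("s3://"):
        raise ValueError(f"ZEN_PERF_S3_URI must start with s3://, got {s3_uri!r}")

    # Split s3://bucket/optional-prefix/ into its bucket and key prefix.
    bucket, _, prefix = s3_uri[len("s3://"):].partition("/")
    return {
        "bucket": bucket,
        "prefix": prefix,
        "profile": env["ZEN_PERF_AWS_PROFILE"],
        # The region is optional and defaults to us-east-1.
        "region": env.get("ZEN_PERF_AWS_REGION") or "us-east-1",
    }


settings = load_perf_env({
    "ZEN_PERF_S3_URI": "s3://your-bucket/optional-prefix/",
    "ZEN_PERF_AWS_PROFILE": "your-sso-profile",
})
print(settings["bucket"], settings["prefix"], settings["region"])
# your-bucket optional-prefix/ us-east-1
```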

## Stage A - snapshot real S3 data

One-time (or when you want a fresh baseline from production).

```
export ZEN_PERF_S3_URI=s3://your-bucket/
export ZEN_PERF_AWS_PROFILE=your-sso-profile
python scripts/test_scripts/hub/seed_s3_snapshot.py
```

Provisions the selected modules (up to `--module-count`) from `$ZEN_PERF_S3_URI`,
hibernates them, then copies each `hub-a/servers/<moduleid>/` to
`s3-snapshot/<moduleid>/`. If the SSO token is missing or expired, the script
triggers `aws sso login` automatically.

Module selection ranks all UUID-shaped folders by their
`incremental-state.cbo` `LastModified` (newest first, a proxy for
most-recently-accessed) and takes the top `--module-count`.
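
The ranking described above can be sketched as a pure function over `(key, LastModified)` pairs, such as those returned by an S3 listing. This is an illustration only: `pick_modules` is a hypothetical name, and both the hyphenated 8-4-4-4-12 UUID shape and the `<moduleid>/incremental-state.cbo` key layout are assumptions, not confirmed details of `seed_s3_snapshot.py`.

```python
import re
from datetime import datetime, timezone

# Assumption: "UUID-shaped" means the hyphenated 8-4-4-4-12 hex form.
UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def pick_modules(objects, module_count):
    """Return the module ids with the newest incremental-state.cbo, newest first.

    `objects` is an iterable of (key, last_modified) pairs.
    """
    newest = {}
    for key, last_modified in objects:
        parts = key.strip("/").split("/")
        # Only the state file's timestamp is used as the recency proxy.
        if len(parts) < 2 or parts[-1] != "incremental-state.cbo":
            continue
        module_id = parts[-2]
        if not UUID_RE.match(module_id):
            continue
        if module_id not in newest or last_modified > newest[module_id]:
            newest[module_id] = last_modified
    ranked = sorted(newest, key=newest.get, reverse=True)
    return ranked[:module_count]


def _utc(day):
    return datetime(2026, 4, day, tzinfo=timezone.utc)


objects = [
    ("11111111-1111-1111-1111-111111111111/incremental-state.cbo", _utc(1)),
    ("22222222-2222-2222-2222-222222222222/incremental-state.cbo", _utc(20)),
    ("not-a-module/incremental-state.cbo", _utc(22)),
    ("33333333-3333-3333-3333-333333333333/incremental-state.cbo", _utc(10)),
    ("22222222-2222-2222-2222-222222222222/other.bin", _utc(23)),
]
top = pick_modules(objects, 2)
print(top)
# ['22222222-2222-2222-2222-222222222222', '33333333-3333-3333-3333-333333333333']
```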

Options:
- `--module-count N` (default 1000)
- `--snapshot-dir PATH` (default `<perf-seed>/s3-snapshot`)
- `--hub-data-dir PATH` (default `<perf-seed>/hub-a`)

## Stage B - seed MinIO from the snapshot

One-time per pack mode (or when `s3-snapshot` changes).

`seed_minio.py` seeds a **single** bucket per invocation. The pack flag is
hardcoded inside the script (`--hub-hydration-enable-pack=true` near the
top of `_start_hub`). To produce both packed and unpacked baselines for
comparison, invoke the script twice from two separate worktrees - one with
the flag flipped to `false` - and preserve the resulting MinIO data dir
each time.

```
# In the pack worktree (flag = true), seeds zen-seed-packed
python scripts/test_scripts/hub/seed_minio.py --wipe --bucket zen-seed-packed
python scripts/test_scripts/hub/preserve_minio_state.py --dest <perf-seed>/minio-seeded-packed

# In the no-pack worktree (flag = false), seeds zen-seed-unpacked
python scripts/test_scripts/hub/seed_minio.py --wipe --bucket zen-seed-unpacked
python scripts/test_scripts/hub/preserve_minio_state.py --dest <perf-seed>/minio-seeded-unpacked
```

The script provisions every module found under `s3-snapshot/`, hibernates
them, overlays the snapshot on top of the hub's servers dir, then
deprovisions all modules - which runs the dehydrate path and uploads the
content into the bucket.
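
The overlay step can be pictured as a recursive copy that replaces whatever the hibernated modules left on disk with the snapshot contents. The sketch below shows that idea with the standard library only; `overlay_snapshot` and the `state.bin` file name are hypothetical, and the actual script may perform the overlay differently. It needs Python 3.8+ for `dirs_exist_ok`.

```python
import shutil
import tempfile
from pathlib import Path


def overlay_snapshot(snapshot_dir, servers_dir):
    """Copy every <moduleid>/ tree from the snapshot over the hub's servers dir."""
    overlaid = []
    for module_dir in sorted(Path(snapshot_dir).iterdir()):
        if not module_dir.is_dir():
            continue
        # dirs_exist_ok=True merges into the existing module dir and
        # overwrites files that share a name with a snapshot file.
        shutil.copytree(module_dir, Path(servers_dir) / module_dir.name, dirs_exist_ok=True)
        overlaid.append(module_dir.name)
    return overlaid


# Small self-contained demo in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "s3-snapshot"
    servers = Path(tmp) / "hubs" / "hub-b-zen-seed-packed" / "servers"
    (snapshot / "module-1").mkdir(parents=True)
    (snapshot / "module-1" / "state.bin").write_text("from snapshot")
    (servers / "module-1").mkdir(parents=True)
    (servers / "module-1" / "state.bin").write_text("hibernated placeholder")

    overlaid = overlay_snapshot(snapshot, servers)
    overlaid_content = (servers / "module-1" / "state.bin").read_text()

print(overlaid, overlaid_content)
# ['module-1'] from snapshot
```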

`preserve_minio_state.py` copies the resulting `minio-data/` to a
variant-specific preservation dir and writes a README with provenance.

Options of interest:
- `--bucket NAME` - bucket to seed (default `zen-seed-packed`).
- `--wipe` - removes the per-bucket hub data dir and the shared `minio-data`
  dir before starting.
- `--module-count N` - caps the number of modules (0 = every module in snapshot-dir).

## Stage C - run a perf iteration

Repeat as often as you want; each run starts from the preserved baseline.

```
# Pack-on bucket
python scripts/test_scripts/hub/run_minio_perf.py --bucket zen-seed-packed --trace

# Pack-off bucket (for comparison)
python scripts/test_scripts/hub/run_minio_perf.py --bucket zen-seed-unpacked --trace
```

Steps:
1. Copies `--minio-seeded` (default `minio-seeded-baseline/`) over `minio-run/` so MinIO starts from a known state.
2. Wipes `hub-perf/` (unless `--no-wipe-hub`).
3. Starts MinIO and hub.
4. Provisions all modules, waits until they reach `provisioned`, deprovisions them, then waits until they are gone.
5. Stops everything cleanly.
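
Steps 1 and 2 amount to resetting two directories before anything is started. A minimal sketch, assuming `minio-run/` is wiped and then re-copied from the seeded dir as the layout describes; `prepare_run` and the demo file names are hypothetical and not taken from `run_minio_perf.py`.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_run(seeded_dir, run_dir, hub_dir, wipe_hub=True):
    """Reset the Stage C working dirs so a run starts from a known state."""
    seeded_dir, run_dir, hub_dir = Path(seeded_dir), Path(run_dir), Path(hub_dir)
    if not seeded_dir.is_dir():
        raise FileNotFoundError(f"seeded MinIO dir not found: {seeded_dir}")

    # Step 1: wipe minio-run/ and re-copy the preserved baseline into it.
    if run_dir.exists():
        shutil.rmtree(run_dir)
    shutil.copytree(seeded_dir, run_dir)

    # Step 2: wipe hub-perf/ unless the caller opted out (cf. --no-wipe-hub).
    if wipe_hub and hub_dir.exists():
        shutil.rmtree(hub_dir)
    hub_dir.mkdir(parents=True, exist_ok=True)


# Small self-contained demo in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "minio-seeded-baseline").mkdir()
    (root / "minio-seeded-baseline" / "cas.bin").write_text("seeded")
    (root / "minio-run").mkdir()
    (root / "minio-run" / "stale.bin").write_text("left over from last run")
    (root / "hub-perf").mkdir()
    (root / "hub-perf" / "old.log").write_text("old")

    prepare_run(root / "minio-seeded-baseline", root / "minio-run", root / "hub-perf")
    run_files = sorted(p.name for p in (root / "minio-run").iterdir())
    hub_files = sorted(p.name for p in (root / "hub-perf").iterdir())

print(run_files, hub_files)
# ['cas.bin'] []
```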

Default mode is `--hub-enable-dehydration=false` so MinIO isn't modified; every
iteration exercises the hydrate-only path against the same baseline CAS. The
`--bucket` flag selects which seeded bucket (and therefore which pack mode)
to exercise.

Pass `--enable-dehydration` to run a full provision -> deprovision cycle that
includes re-upload (dehydrate) at deprovision time. Use this to measure the
dehydrate phase end-to-end against the seeded baseline. Note that the seeded
baseline diverges after an `--enable-dehydration` run; re-copy `--minio-seeded`
or re-run `preserve_minio_state.py` if you want to compare against the pristine
state.

After each run the hub log, structured zenserver logs, any utrace file, and a
`summary.json` with the run's timings are copied into
`perf-runs/<timestamp>_<bucket>/` so Stage C runs can be compared
post-hoc. Override the destination with `--archive-dir PATH`.
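
For scripting around the archive, the `<timestamp>_<bucket>` directory names can be built and parsed as sketched below. The `%Y%m%d-%H%M%S` format is inferred from the example names in the layout (such as `20260423-141530_zen-seed-packed`). The function names are hypothetical, and whether the scripts use local time or UTC is not stated here.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Inferred from the example directory names; treat it as an assumption.
STAMP_FORMAT = "%Y%m%d-%H%M%S"


def archive_dir_for(perf_runs, bucket, started):
    """Build perf-runs/<timestamp>_<bucket> for a run that began at `started`."""
    return PurePosixPath(perf_runs) / f"{started.strftime(STAMP_FORMAT)}_{bucket}"


def parse_run_name(name):
    """Split an archive dir name back into (start time, bucket)."""
    # The timestamp contains no underscore, so the first one is the separator.
    stamp, _, bucket = name.partition("_")
    return datetime.strptime(stamp, STAMP_FORMAT), bucket


run_dir = archive_dir_for("perf-runs", "zen-seed-packed", datetime(2026, 4, 23, 14, 15, 30))
print(run_dir)
# perf-runs/20260423-141530_zen-seed-packed

started, bucket = parse_run_name(run_dir.name)
```

Because the timestamp sorts lexicographically, sorting the directory names also sorts runs chronologically, and the parsed bucket lets packed and unpacked runs be grouped for comparison.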

## Resetting between runs

- **Keep**: `s3-snapshot/`, `minio-seeded-baseline/`, `minio-seeded-packed/`, `minio-seeded-unpacked/`. These are expensive to rebuild.
- **Discard freely**: `hub-a/`, `hubs/`, `hub-perf/`, `minio-data/`, `minio-run/`.

To force a fresh MinIO seed for one variant: delete the matching
`minio-seeded-<variant>/` and re-run Stage B + preserve (with the matching
`--dest`) in that worktree. To force a fresh S3 snapshot: delete
`s3-snapshot/` and re-run Stage A.
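
A cautious way to clear only the disposable directories is sketched below. `discard_transient` is a hypothetical helper, not part of the scripts; it defaults to a dry run and only touches the names listed under "Discard freely".

```python
import shutil
import tempfile
from pathlib import Path

# Directories listed above as safe to discard.
TRANSIENT_DIRS = ("hub-a", "hubs", "hub-perf", "minio-data", "minio-run")


def discard_transient(root, dry_run=True):
    """Remove the transient dirs under `root`; return the names that matched."""
    matched = []
    for name in TRANSIENT_DIRS:
        path = Path(root) / name
        if path.is_dir():
            if not dry_run:
                shutil.rmtree(path)
            matched.append(name)
    return matched


# Small self-contained demo in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("hub-perf", "minio-run", "s3-snapshot", "minio-seeded-baseline"):
        (root / name).mkdir()

    would_remove = discard_transient(root)             # dry run, nothing deleted
    removed = discard_transient(root, dry_run=False)   # actually deletes
    remaining = sorted(p.name for p in root.iterdir())

print(removed, remaining)
# ['hub-perf', 'minio-run'] ['minio-seeded-baseline', 's3-snapshot']
```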