# Perf-seed workflow

Three-stage pipeline for running repeatable hub-hydration perf tests against a local MinIO backend seeded with real module data pulled from production S3.

## Layout

All scripts default to a single perf-seed root - currently `E:/Dev/zen-perf-seed/` in the script defaults, but every path is overridable via CLI flag (see the per-stage options below). Pick a root with enough free space (snapshots and preserved CAS dirs can be large) and either pass the matching `--*-dir` flag on each invocation or change the script defaults to your chosen root.

Layout under the chosen root (`/`):

```
/
  hub-a/                      Stage A hub data dir (transient)
    servers//
  s3-snapshot/                Preserved production server-state trees (read-only after Stage A)
    /
  hubs/                       Stage B per-bucket hub data dirs (transient)
    hub-b-zen-seed-packed/
    hub-b-zen-seed-unpacked/
  minio-data/                 Stage B MinIO data dir (transient, carries every seeded bucket)
  minio-seeded-baseline/      Preserved baseline MinIO CAS (read-only after Stage B + preserve)
    README.txt
  minio-seeded-packed/        Preserved packed MinIO CAS (filled by the pack worktree)
    README.txt
  hub-perf/                   Stage C hub data dir (wiped each run)
  minio-run/                  Stage C MinIO data dir (wiped + re-copied each run)
  perf-runs/                  Per-run archive: hub.log, logs/, hub.utrace, summary.json
    20260423-141530_zen-seed-packed/
    20260423-143112_zen-seed-unpacked/
```

## Prerequisites

- Debug or release build of zenserver + minio: `xmake -y`
- `pip install boto3`
- AWS CLI v2 with an SSO profile configured (for Stage A only)
- Environment variables (or pass equivalents via CLI flags):
  - `ZEN_PERF_S3_URI` - source S3 bucket, e.g. `s3://your-bucket/optional-prefix/`
  - `ZEN_PERF_AWS_PROFILE` - AWS SSO profile name with read access to that bucket
  - `ZEN_PERF_AWS_REGION` - optional, defaults to `us-east-1`

## Stage A - snapshot real S3 data

One-time (or when you want a fresh baseline from production).

```
export ZEN_PERF_S3_URI=s3://your-bucket/
export ZEN_PERF_AWS_PROFILE=your-sso-profile
python scripts/test_scripts/hub/seed_s3_snapshot.py
```

Provisions N modules from `$ZEN_PERF_S3_URI`, hibernates them, then copies `hub-a/servers//` to `s3-snapshot//`. Triggers `aws sso login` automatically if the SSO token is missing or expired.

Module selection ranks all UUID-shaped folders by their `incremental-state.cbo` `LastModified` (newest first, a proxy for most-recently-accessed) and takes the top `--module-count`.

Options:

- `--module-count N` (default 1000)
- `--snapshot-dir PATH` (default `/s3-snapshot`)
- `--hub-data-dir PATH` (default `/hub-a`)

## Stage B - seed MinIO from the snapshot

One-time per pack-mode (or when `s3-snapshot` changes).

`seed_minio.py` seeds a **single** bucket per invocation. The pack flag is hardcoded inside the script (`--hub-hydration-enable-pack=true` near the top of `_start_hub`). To produce both packed and unpacked baselines for comparison, invoke the script twice from two separate worktrees - one with the flag flipped to `false` - and preserve the resulting MinIO data dir each time.

```
# In the pack worktree (flag = true), seeds zen-seed-packed
python scripts/test_scripts/hub/seed_minio.py --wipe --bucket zen-seed-packed
python scripts/test_scripts/hub/preserve_minio_state.py --dest /minio-seeded-packed

# In the no-pack worktree (flag = false), seeds zen-seed-unpacked
python scripts/test_scripts/hub/seed_minio.py --wipe --bucket zen-seed-unpacked
python scripts/test_scripts/hub/preserve_minio_state.py --dest /minio-seeded-unpacked
```

The script provisions every module found under `s3-snapshot/`, hibernates them, overlays the snapshot on top of the hub's servers dir, then deprovisions all modules - which runs the dehydrate path and uploads the content into the bucket.

`preserve_minio_state.py` copies the resulting `minio-data/` to a variant-specific preservation dir and writes a README with provenance.

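
The preserve step described above amounts to a directory copy plus a provenance note. The following is a minimal sketch of that idea, not the actual `preserve_minio_state.py`: the function signature, the `bucket` argument, the refusal to overwrite an existing destination, and the README fields are all assumptions made for illustration.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def preserve_minio_state(src: Path, dest: Path, bucket: str) -> Path:
    """Copy a seeded MinIO data dir to dest and write a provenance README.

    Hypothetical helper: the real script's arguments and README layout
    may differ.
    """
    # Assumed policy: never overwrite a preserved baseline silently.
    if dest.exists():
        raise FileExistsError(f"{dest} already exists; delete it to re-preserve")

    # copytree creates dest and copies the whole tree beneath src.
    shutil.copytree(src, dest)

    # Record where the copy came from and when it was taken.
    readme = dest / "README.txt"
    readme.write_text(
        f"source: {src}\n"
        f"bucket: {bucket}\n"
        f"preserved_at: {datetime.now(timezone.utc).isoformat()}\n",
        encoding="utf-8",
    )
    return readme
```

Refusing to overwrite is a deliberate choice in this sketch: the preserved dirs are expensive to rebuild, so replacing one should be an explicit delete followed by a fresh preserve.
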
Options of interest for `seed_minio.py`:

- `--bucket NAME` - bucket name (default `zen-seed-packed`).
- `--wipe` removes the per-bucket hub data dir and the shared minio-data dir before starting.
- `--module-count N` caps the set (0 = every module in snapshot-dir).

## Stage C - run a perf iteration

Repeat as often as you want; each run starts from the preserved baseline.

```
# Pack-on bucket
python scripts/test_scripts/hub/run_minio_perf.py --bucket zen-seed-packed --trace

# Pack-off bucket (for comparison)
python scripts/test_scripts/hub/run_minio_perf.py --bucket zen-seed-unpacked --trace
```

Steps:

1. Copies `--minio-seeded` (default `minio-seeded-baseline/`) over `minio-run/` so MinIO starts from a known state.
2. Wipes `hub-perf/` (unless `--no-wipe-hub`).
3. Starts MinIO and hub.
4. Provisions all modules, waits for `provisioned`, deprovisions, waits gone.
5. Stops everything cleanly.

Default mode is `--hub-enable-dehydration=false` so MinIO isn't modified; every iteration exercises the hydrate-only path against the same baseline CAS. The `--bucket` flag selects which seeded bucket (and therefore which pack mode) to exercise.

Pass `--enable-dehydration` to run a full provision -> deprovision cycle that includes re-upload (dehydrate) at deprovision time. Use this to measure the dehydrate phase end-to-end against the seeded baseline. Note the seeded baseline diverges after a `--enable-dehydration` run - re-copy `--minio-seeded` or re-run `preserve_minio_state.py` if you want to compare to the pristine state.

After each run the hub log, structured zenserver logs, any utrace file, and a `summary.json` with the run's timings are copied into `perf-runs/_/` so Stage C runs can be compared post-hoc. Override the destination with `--archive-dir PATH`.

## Resetting between runs

- **Keep**: `s3-snapshot/`, `minio-seeded-baseline/`, `minio-seeded-packed/`. These are expensive to rebuild.
- **Discard freely**: `hub-a/`, `hubs/`, `hub-perf/`, `minio-data/`, `minio-run/`.

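
The discard list above can be turned into a small cleanup helper. This is a sketch only: `reset_transient_dirs` is a hypothetical name, not one of the repository scripts, and the dry-run default is an assumption chosen to make accidental deletion harder. It touches only the directories named under "Discard freely" and leaves everything else under the root alone.

```python
import shutil
from pathlib import Path

# Directory names copied from the "Discard freely" list in this README.
DISCARD = ("hub-a", "hubs", "hub-perf", "minio-data", "minio-run")


def reset_transient_dirs(root: Path, dry_run: bool = True) -> list[Path]:
    """Remove transient dirs under root.

    Returns the paths that were removed, or that would be removed when
    dry_run is True. Hypothetical helper for illustration.
    """
    selected = []
    for name in DISCARD:
        target = root / name
        # Skip names that are absent so the helper is safe to re-run.
        if target.is_dir():
            if not dry_run:
                shutil.rmtree(target)
            selected.append(target)
    return selected
```

Calling `reset_transient_dirs(Path("E:/Dev/zen-perf-seed"))` with the default `dry_run=True` only reports what would be deleted; pass `dry_run=False` to actually remove the directories.
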
To force a fresh MinIO seed for one variant: delete the matching `minio-seeded-/` and re-run Stage B + preserve (with the matching `--dest`) in that worktree.

To force a fresh S3 snapshot: delete `s3-snapshot/` and re-run Stage A.
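
When re-running Stage A it helps to know which modules will be picked. The selection rule described in that stage (rank UUID-shaped folders by the `LastModified` of their `incremental-state.cbo`, newest first, take the top `--module-count`) can be sketched as below. This is not the code in `seed_s3_snapshot.py`: the `select_modules` name, the `(key, last_modified)` input shape, and the use of `uuid.UUID` to decide what counts as "UUID-shaped" are assumptions for illustration.

```python
import uuid


def is_uuid_shaped(name: str) -> bool:
    """Assumption: a folder is UUID-shaped if uuid.UUID can parse its name."""
    try:
        uuid.UUID(name)
    except ValueError:
        return False
    return True


def select_modules(listing, module_count=1000):
    """Pick the most recently modified modules from an object listing.

    listing: iterable of (key, last_modified) pairs, e.g. flattened from an
    S3 listing. Only keys ending in <folder>/incremental-state.cbo count.
    """
    newest = {}
    for key, last_modified in listing:
        parts = key.strip("/").split("/")
        if len(parts) < 2 or parts[-1] != "incremental-state.cbo":
            continue
        folder = parts[-2]
        if not is_uuid_shaped(folder):
            continue
        # Keep the latest timestamp seen for each module folder.
        if folder not in newest or last_modified > newest[folder]:
            newest[folder] = last_modified
    # Newest first, then cap at module_count.
    ranked = sorted(newest, key=newest.get, reverse=True)
    return ranked[:module_count]
```

Keeping the ranking separate from the S3 listing call makes the rule easy to check against a hand-written listing before pulling a large snapshot.
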