# Perf-seed workflow

Three-stage pipeline for running repeatable hub-hydration perf tests against a local MinIO backend seeded with real module data pulled from production S3.

The pipeline is **pack-on only** - the seeded baseline always comes from a hub launched with `--hub-hydration-enable-pack=true`. The pack-off variant is no longer maintained.

## Layout

All path arguments are required (no hardcoded defaults). Pick a perf-seed root with enough free space (snapshots and preserved CAS dirs can be large) and pass the matching `--*-dir` flag on each invocation.

Stage A's hub data dir should live on the same volume as the snapshot dir so snapshotting is an O(1) rename per module instead of a cross-volume byte copy; Stage C's hub data dir should live on a different volume from the MinIO data dir so hub I/O does not skew the measured perf run.

Example layout (directory names only; pick volumes/roots and pass via `--*-dir` flags):

```
/                                      bulk data + Stage A/B flow (one volume = move-friendly)
  hub-a/                               Stage A hub data dir (transient; snapshot-step rename source)
    servers//
  s3-snapshot/                         Preserved production server-state trees (read-only after Stage A)
    /
  hubs/                                Stage B per-bucket hub data dirs (transient)
    hub-b-zen-seed-packed/
  minio-data/                          Stage B MinIO data dir (transient)
  minio-seeded-packed/                 Preserved packed MinIO CAS (read-only after Stage B + preserve)
    README.txt
  minio-run/                           Stage C MinIO data dir (wiped + re-copied each run)
  perf-runs/                           Per-run archive: hub.log, logs/, hub.utrace, summary.json
    20260423-141530_zen-seed-packed/
/                                      separate volume from for measurement isolation
  hub-perf/                            Stage C hub data dir (wiped each run)
```

## Prerequisites

- Debug or release build of zenserver + minio: `xmake -y`
- `pip install boto3`
- AWS CLI v2 with an SSO profile configured (for Stage A only)
- Environment variables (or pass equivalents via CLI flags):
  - `ZEN_PERF_S3_URI` - source S3 bucket, e.g.
`s3://your-bucket/optional-prefix/`
  - `ZEN_PERF_AWS_PROFILE` - AWS SSO profile name with read access to that bucket
  - `ZEN_PERF_AWS_REGION` - optional, defaults to `us-east-1`

## Stage A - snapshot real S3 data

One-time (or when you want a fresh snapshot from production).

```
export ZEN_PERF_S3_URI=s3://your-bucket/
export ZEN_PERF_AWS_PROFILE=your-sso-profile
python scripts/test_scripts/hub/seed_s3_snapshot.py
```

Provisions N modules from `$ZEN_PERF_S3_URI`, hibernates them, then **moves** `hub-a/servers//` to `s3-snapshot//`. When `--hub-data-dir` and `--snapshot-dir` share a volume (the default) the move is an O(1) rename per module; cross-volume falls back to a byte copy with the old cost profile. The hub data dir is wiped on the next run regardless.

Triggers `aws sso login` automatically if the SSO token is missing or expired.

Module selection ranks all UUID-shaped folders by their `incremental-state.cbo` `LastModified` (newest first, a proxy for most-recently-accessed) and takes the top `--module-count`.

Options:

- `--module-count N` (default 1000)
- `--snapshot-dir PATH` (required, e.g. `/s3-snapshot`)
- `--hub-data-dir PATH` (required, e.g. `/hub-a`)

## Stage B - seed MinIO from the snapshot

One-time, or when `s3-snapshot/` changes.

`seed_minio.py` seeds the `zen-seed-packed` bucket with pack ON (`--hub-hydration-enable-pack=true` is hardcoded). The script provisions every module found under `s3-snapshot/`, hibernates them, overlays the snapshot on top of the hub's servers dir, then deprovisions all modules - which runs the dehydrate path and uploads the content into the bucket.

```
python scripts/test_scripts/hub/seed_minio.py --wipe --bucket zen-seed-packed
python scripts/test_scripts/hub/preserve_minio_state.py --dest /minio-seeded-packed
```

`preserve_minio_state.py` MOVES (default; `--copy` to keep source) the resulting `minio-data/` to the preservation dir and writes a README with provenance.
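To make the move-versus-copy behaviour of the preserve step concrete, here is a minimal sketch. It is not the code of `preserve_minio_state.py`: the function name `preserve_state`, the README fields, and the refusal to overwrite an existing destination are all assumptions made for illustration.

```python
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def preserve_state(src: Path, dest: Path, copy: bool = False) -> Path:
    """Move (default) or copy `src` to `dest`, then write a provenance README.

    Hypothetical helper; the README layout is an assumption, not the
    format the real script writes.
    """
    if dest.exists():
        # Assumed safety rule: never clobber an already preserved baseline.
        raise FileExistsError(f"refusing to overwrite {dest}")
    dest.parent.mkdir(parents=True, exist_ok=True)
    if copy:
        shutil.copytree(str(src), str(dest))  # source stays in place
    else:
        shutil.move(str(src), str(dest))      # rename when on the same volume
    readme = dest / "README.txt"
    readme.write_text(
        f"source: {src}\n"
        f"mode: {'copy' if copy else 'move'}\n"
        f"preserved_at: {datetime.now(timezone.utc).isoformat()}\n",
        encoding="utf-8",
    )
    return readme


if __name__ == "__main__":
    # Demo against a throwaway directory rather than real MinIO data.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "minio-data").mkdir()
        print(preserve_state(root / "minio-data", root / "minio-seeded-packed").read_text())
```

As in Stage A, the move is only cheap when source and destination share a volume; across volumes `shutil.move` falls back to copy-then-delete.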
Options of interest for `seed_minio.py`:

- `--bucket NAME` - bucket name (default `zen-seed-packed`).
- `--wipe` - removes the per-bucket hub data dir and the shared minio-data dir before starting.
- `--module-count N` - caps the set (0 = every module in snapshot-dir).

## Stage C - run a perf iteration

Repeat as often as you want; each run starts from the preserved baseline.

```
python scripts/test_scripts/hub/run_minio_perf.py --bucket zen-seed-packed --trace
```

Steps:

1. Copies `--minio-seeded` over `--minio-run` so MinIO starts from a known state.
2. Wipes `--hub-data-dir` (unless `--no-wipe-hub`).
3. Starts MinIO and hub.
4. Provisions all modules, waits for `provisioned`, deprovisions, waits until they are gone.
5. Stops everything cleanly.

Default mode is `--hub-enable-dehydration=false` so MinIO isn't modified; every iteration exercises the hydrate-only path against the same baseline CAS.

Pass `--enable-dehydration` to run a full provision -> deprovision cycle that includes re-upload (dehydrate) at deprovision time. Use this to measure the dehydrate phase end-to-end against the seeded baseline. Note that the seeded baseline diverges after a `--enable-dehydration` run - re-copy `--minio-seeded` or re-run `preserve_minio_state.py` if you want to compare to the pristine state.

After each run the hub log, structured zenserver logs, any utrace file, and a `summary.json` with the run's timings are copied into `perf-runs/_/` so Stage C runs can be compared post-hoc. Override the destination with `--archive-dir PATH`.

## Resetting between runs

- **Keep**: `s3-snapshot/`, `minio-seeded-packed/`. These are expensive to rebuild.
- **Discard freely**: `hub-a/`, `hubs/`, `hub-perf/`, `minio-data/`, `minio-run/`.

To force a fresh MinIO seed: delete `minio-seeded-packed/` and re-run Stage B + preserve.

To force a fresh S3 snapshot: delete `s3-snapshot/` and re-run Stage A.
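
The keep/discard split above could be scripted along these lines. This is a sketch under assumptions: the helper `reset_transient`, its dry-run default, and the idea of passing the separately located `hub-perf/` dir explicitly are hypothetical and not part of the existing scripts; only the directory names come from the layout in this README.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List, Optional

# Directory names taken from the layout section of this README.
KEEP = ("s3-snapshot", "minio-seeded-packed")
DISCARD = ("hub-a", "hubs", "minio-data", "minio-run")


def reset_transient(
    root: Path, hub_perf_dir: Optional[Path] = None, dry_run: bool = True
) -> List[Path]:
    """Remove transient dirs under `root`; never touch the KEEP dirs.

    `hub_perf_dir` is passed separately because it lives on another volume.
    Returns the paths that were (or, in a dry run, would be) removed.
    """
    targets = [root / name for name in DISCARD]
    if hub_perf_dir is not None:
        targets.append(hub_perf_dir)
    removed = []
    for path in targets:
        if path.name in KEEP:
            # Guard against a preserved dir being passed in by mistake.
            raise ValueError(f"{path} is a preserved directory")
        if path.is_dir():
            if not dry_run:
                shutil.rmtree(path)
            removed.append(path)
    return removed


if __name__ == "__main__":
    # Demo on a throwaway tree that mimics the layout.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in KEEP + DISCARD:
            (root / name).mkdir()
        for path in reset_transient(root, dry_run=True):
            print("would remove", path.name)
```

Defaulting to a dry run is a deliberate choice here: the preserved dirs are expensive to rebuild, so printing the plan before deleting anything is the safer habit.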