Diffstat (limited to 'scripts/test_scripts/hub/PERF_SEED_README.md')
| -rw-r--r-- | scripts/test_scripts/hub/PERF_SEED_README.md | 148 |
1 files changed, 148 insertions, 0 deletions
diff --git a/scripts/test_scripts/hub/PERF_SEED_README.md b/scripts/test_scripts/hub/PERF_SEED_README.md
new file mode 100644
index 000000000..fb471d4bb
--- /dev/null
+++ b/scripts/test_scripts/hub/PERF_SEED_README.md
@@ -0,0 +1,148 @@

# Perf-seed workflow

Three-stage pipeline for running repeatable hub-hydration perf tests against a
local MinIO backend seeded with real module data pulled from production S3.

## Layout

All scripts default to a single perf-seed root - currently `E:/Dev/zen-perf-seed/`
in the script defaults - but every path can be overridden with a CLI flag (see the
per-stage options below). Pick a root with enough free space (snapshots and
preserved CAS dirs can be large), then either pass the matching `--*-dir` flag on
each invocation or change the script defaults to your chosen root.

Layout under the chosen root (`<perf-seed>/`):

```
<perf-seed>/
  hub-a/                      Stage A hub data dir (transient)
    servers/<moduleid>/
  s3-snapshot/                Preserved production server-state trees (read-only after Stage A)
    <moduleid>/
  hubs/                       Stage B per-bucket hub data dirs (transient)
    hub-b-zen-seed-packed/
    hub-b-zen-seed-unpacked/
  minio-data/                 Stage B MinIO data dir (transient, carries every seeded bucket)
  minio-seeded-baseline/      Preserved baseline MinIO CAS (read-only after Stage B + preserve)
    README.txt
  minio-seeded-packed/        Preserved packed MinIO CAS (filled by the pack worktree)
    README.txt
  minio-seeded-unpacked/      Preserved unpacked MinIO CAS (filled by the no-pack worktree)
    README.txt
  hub-perf/                   Stage C hub data dir (wiped each run)
  minio-run/                  Stage C MinIO data dir (wiped + re-copied each run)
  perf-runs/                  Per-run archive: hub.log, logs/, hub.utrace, summary.json
    20260423-141530_zen-seed-packed/
    20260423-143112_zen-seed-unpacked/
```

## Prerequisites

- Debug or release build of zenserver + minio: `xmake -y`
- `pip install boto3`
- AWS CLI v2 with an SSO profile configured (Stage A only)
- Environment variables (or pass the equivalent CLI flags):
  - `ZEN_PERF_S3_URI` - source S3 bucket, e.g. `s3://your-bucket/optional-prefix/`
  - `ZEN_PERF_AWS_PROFILE` - AWS SSO profile name with read access to that bucket
  - `ZEN_PERF_AWS_REGION` - optional, defaults to `us-east-1`

## Stage A - snapshot real S3 data

One-time (or whenever you want a fresh baseline from production).

```
export ZEN_PERF_S3_URI=s3://your-bucket/
export ZEN_PERF_AWS_PROFILE=your-sso-profile
python scripts/test_scripts/hub/seed_s3_snapshot.py
```

Provisions N modules (see `--module-count`) from `$ZEN_PERF_S3_URI`, hibernates
them, then copies `hub-a/servers/<moduleid>/` to `s3-snapshot/<moduleid>/`.
Triggers `aws sso login` automatically if the SSO token is missing or expired.

Module selection ranks all UUID-shaped folders by the `LastModified` time of
their `incremental-state.cbo` (newest first, as a proxy for most recently
accessed) and takes the top `--module-count`.

Options:
- `--module-count N` (default 1000)
- `--snapshot-dir PATH` (default `<perf-seed>/s3-snapshot`)
- `--hub-data-dir PATH` (default `<perf-seed>/hub-a`)

## Stage B - seed MinIO from the snapshot

One-time per pack mode (or whenever `s3-snapshot/` changes).

`seed_minio.py` seeds a **single** bucket per invocation. The pack flag is
hardcoded inside the script (`--hub-hydration-enable-pack=true` near the
top of `_start_hub`). To produce both packed and unpacked baselines for
comparison, run the script twice from two separate worktrees (one with
the flag flipped to `false`) and preserve the resulting MinIO data dir
after each run.
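The pack mode lives in the script rather than on the command line, so it is easy to seed a bucket from the wrong worktree. The following is a minimal sketch of a hypothetical pre-flight check; it is not one of the repo's scripts. It makes two assumptions. First, `seed_minio.py` spells the flag literally as `--hub-hydration-enable-pack=true` or `=false`. Second, bucket names follow the `zen-seed-packed` / `zen-seed-unpacked` convention used in the Stage B commands. The names `read_pack_flag` and `check_worktree` are illustrative.

```python
import re

# Assumption: seed_minio.py spells the flag literally, e.g.
# "--hub-hydration-enable-pack=true" (pack) or "...=false" (no pack).
_PACK_FLAG = re.compile(r"--hub-hydration-enable-pack=(true|false)\b")

# Assumption: the bucket naming convention from the Stage B commands.
_EXPECTED_PACK = {"zen-seed-packed": True, "zen-seed-unpacked": False}


def read_pack_flag(script_text: str) -> bool:
    """Return the pack setting hardcoded in the given script source."""
    values = set(_PACK_FLAG.findall(script_text))
    if len(values) != 1:
        found = ", ".join(sorted(values)) or "none"
        raise ValueError(f"expected exactly one pack setting, found: {found}")
    return values.pop() == "true"


def check_worktree(script_text: str, bucket: str) -> None:
    """Raise ValueError if the script's pack flag does not match the bucket."""
    if bucket not in _EXPECTED_PACK:
        raise ValueError(f"unknown bucket {bucket!r}")
    actual = read_pack_flag(script_text)
    expected = _EXPECTED_PACK[bucket]
    if actual != expected:
        raise ValueError(
            f"{bucket} expects pack={str(expected).lower()}, "
            f"but this worktree has pack={str(actual).lower()}"
        )
```

For example, run `check_worktree(pathlib.Path("scripts/test_scripts/hub/seed_minio.py").read_text(encoding="utf-8"), "zen-seed-packed")` in the pack worktree before running the seeding commands.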
```
# In the pack worktree (flag = true), seeds zen-seed-packed
python scripts/test_scripts/hub/seed_minio.py --wipe --bucket zen-seed-packed
python scripts/test_scripts/hub/preserve_minio_state.py --dest <perf-seed>/minio-seeded-packed

# In the no-pack worktree (flag = false), seeds zen-seed-unpacked
python scripts/test_scripts/hub/seed_minio.py --wipe --bucket zen-seed-unpacked
python scripts/test_scripts/hub/preserve_minio_state.py --dest <perf-seed>/minio-seeded-unpacked
```

The script provisions every module found under `s3-snapshot/`, hibernates
them, overlays the snapshot onto the hub's servers dir, then deprovisions
all modules. Deprovisioning runs the dehydrate path, which uploads the
content into the bucket.

`preserve_minio_state.py` copies the resulting `minio-data/` to a
variant-specific preservation dir and writes a README with provenance.

Options of interest:
- `--bucket NAME` - bucket name (default `zen-seed-packed`).
- `--wipe` - removes the per-bucket hub data dir and the shared `minio-data/`
  dir before starting.
- `--module-count N` - caps the number of modules seeded (0 = every module in
  the snapshot dir).

## Stage C - run a perf iteration

Repeat as often as you like; each run starts from the preserved baseline.

```
# Pack-on bucket
python scripts/test_scripts/hub/run_minio_perf.py --bucket zen-seed-packed --trace

# Pack-off bucket (for comparison)
python scripts/test_scripts/hub/run_minio_perf.py --bucket zen-seed-unpacked --trace
```

Steps:
1. Copies `--minio-seeded` (default `minio-seeded-baseline/`) over `minio-run/` so MinIO starts from a known state.
2. Wipes `hub-perf/` (unless `--no-wipe-hub` is passed).
3. Starts MinIO and the hub.
4. Provisions all modules, waits for them to reach `provisioned`, deprovisions them, and waits until they are gone.
5. Stops everything cleanly.

By default the hub runs with `--hub-enable-dehydration=false`, so MinIO is not
modified and every iteration exercises the hydrate-only path against the same
baseline CAS.
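Steps 1 and 2 make iterations repeatable. Below is a minimal sketch of that preparation. `prepare_run` is a hypothetical name, and the real `run_minio_perf.py` may implement this differently. The sketch assumes the directory names from the layout under `<perf-seed>/`. It should only be called while no MinIO process is using `minio-run/`.

```python
import shutil
from pathlib import Path
from typing import Optional


def prepare_run(root: Path, minio_seeded: Optional[Path] = None, wipe_hub: bool = True) -> None:
    """Restore minio-run/ from a preserved CAS and optionally wipe hub-perf/.

    Hypothetical helper mirroring Stage C steps 1 and 2; call it only while
    no MinIO process is using minio-run/.
    """
    seeded = minio_seeded if minio_seeded is not None else root / "minio-seeded-baseline"
    if not seeded.is_dir():
        raise FileNotFoundError(f"preserved MinIO state not found: {seeded}")

    # Step 1: drop the previous run's MinIO data and copy the baseline back in.
    minio_run = root / "minio-run"
    if minio_run.exists():
        shutil.rmtree(minio_run)
    shutil.copytree(seeded, minio_run)

    # Step 2: start the hub from an empty data dir unless told otherwise.
    hub_perf = root / "hub-perf"
    if wipe_hub and hub_perf.exists():
        shutil.rmtree(hub_perf)
```

Passing `minio_seeded=root / "minio-seeded-packed"` (or `minio-seeded-unpacked`) corresponds to giving the real script a different `--minio-seeded`.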
The `--bucket` flag selects which seeded bucket (and therefore which pack mode)
to exercise.

Pass `--enable-dehydration` to run a full provision -> deprovision cycle that
includes re-upload (dehydrate) at deprovision time. Use this to measure the
dehydrate phase end-to-end against the seeded baseline. Note that the seeded
baseline diverges after an `--enable-dehydration` run; re-copy `--minio-seeded`
or re-run `preserve_minio_state.py` if you want to compare against the pristine state.

After each run, the hub log, structured zenserver logs, any utrace file, and a
`summary.json` with the run's timings are copied into
`perf-runs/<timestamp>_<bucket>/`, so Stage C runs can be compared
post-hoc. Override the destination with `--archive-dir PATH`.

## Resetting between runs

- **Keep**: `s3-snapshot/`, `minio-seeded-baseline/`, `minio-seeded-packed/`, `minio-seeded-unpacked/`. These are expensive to rebuild.
- **Discard freely**: `hub-a/`, `hubs/`, `hub-perf/`, `minio-data/`, `minio-run/`.

To force a fresh MinIO seed for one variant, delete the matching
`minio-seeded-<variant>/` and re-run Stage B + preserve (with the matching
`--dest`) in that worktree. To force a fresh S3 snapshot, delete
`s3-snapshot/` and re-run Stage A.
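Below is a sketch of a hypothetical helper that applies the "Discard freely" list; it is not part of the repo's scripts. It defaults to a dry run and only ever touches the five transient dirs named there. As a result, the expensive "Keep" dirs cannot be removed by accident.

```python
import shutil
from pathlib import Path
from typing import List

# Directories listed as safe to discard; the preserved snapshot and
# seeded CAS dirs are deliberately absent from this tuple.
DISCARDABLE = ("hub-a", "hubs", "hub-perf", "minio-data", "minio-run")


def reset_transient(root: Path, dry_run: bool = True) -> List[Path]:
    """Return the transient dirs present under root; delete them unless dry_run."""
    targets = [root / name for name in DISCARDABLE if (root / name).is_dir()]
    if not dry_run:
        for target in targets:
            shutil.rmtree(target)
    return targets
```

Typical use is `reset_transient(Path("E:/Dev/zen-perf-seed"))` to review what would go, then the same call with `dry_run=False`.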