Diffstat (limited to 'scripts/test_scripts/hub/PERF_SEED_README.md')
-rw-r--r-- scripts/test_scripts/hub/PERF_SEED_README.md | 148
1 files changed, 148 insertions, 0 deletions
diff --git a/scripts/test_scripts/hub/PERF_SEED_README.md b/scripts/test_scripts/hub/PERF_SEED_README.md
new file mode 100644
index 000000000..fb471d4bb
--- /dev/null
+++ b/scripts/test_scripts/hub/PERF_SEED_README.md
@@ -0,0 +1,148 @@
+# Perf-seed workflow
+
+Three-stage pipeline for running repeatable hub-hydration perf tests against a
+local MinIO backend seeded with real module data pulled from production S3.
+
+## Layout
+
+All scripts default to a single perf-seed root, currently `E:/Dev/zen-perf-seed/`,
+but every path can be overridden with a CLI flag (see the per-stage options
+below). Pick a root with enough free space (snapshots and preserved CAS dirs can
+be large), then either pass the matching `--*-dir` flag on each invocation or
+change the script defaults to point at your chosen root.
+
+Layout under the chosen root (`<perf-seed>/`):
+
+```
+<perf-seed>/
+  hub-a/                    Stage A hub data dir (transient)
+    servers/<moduleid>/
+  s3-snapshot/              Preserved production server-state trees (read-only after Stage A)
+    <moduleid>/
+  hubs/                     Stage B per-bucket hub data dirs (transient)
+    hub-b-zen-seed-packed/
+    hub-b-zen-seed-unpacked/
+  minio-data/               Stage B MinIO data dir (transient, carries every seeded bucket)
+  minio-seeded-baseline/    Preserved baseline MinIO CAS (read-only after Stage B + preserve)
+    README.txt
+  minio-seeded-packed/      Preserved packed MinIO CAS (filled by the pack worktree)
+    README.txt
+  hub-perf/                 Stage C hub data dir (wiped each run)
+  minio-run/                Stage C MinIO data dir (wiped + re-copied each run)
+  perf-runs/                Per-run archive: hub.log, logs/, hub.utrace, summary.json
+    20260423-141530_zen-seed-packed/
+    20260423-143112_zen-seed-unpacked/
+```
+
+## Prerequisites
+
+- Debug or release build of zenserver + minio: `xmake -y`
+- `pip install boto3`
+- AWS CLI v2 with an SSO profile configured (for Stage A only)
+- Environment variables (or pass equivalents via CLI flags):
+ - `ZEN_PERF_S3_URI` - source S3 bucket, e.g. `s3://your-bucket/optional-prefix/`
+ - `ZEN_PERF_AWS_PROFILE` - AWS SSO profile name with read access to that bucket
+ - `ZEN_PERF_AWS_REGION` - optional, defaults to `us-east-1`
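
A quick pre-flight check of the variables listed above can save a failed Stage A run. The sketch below is illustrative only: `load_perf_env` is a hypothetical helper, not something the scripts are known to expose, and it assumes the CLI-flag equivalents are not being used.

```python
import os


def load_perf_env(environ=None):
    """Return (s3_uri, aws_profile, aws_region) read from the environment.

    Hypothetical helper: raises ValueError when a required variable is unset.
    """
    env = os.environ if environ is None else environ
    required = ("ZEN_PERF_S3_URI", "ZEN_PERF_AWS_PROFILE")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    # The region is optional; the README documents us-east-1 as the default.
    region = env.get("ZEN_PERF_AWS_REGION") or "us-east-1"
    return env["ZEN_PERF_S3_URI"], env["ZEN_PERF_AWS_PROFILE"], region
```

Passing a plain dict as `environ` makes the helper easy to exercise without touching the real environment.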
+
+## Stage A - snapshot real S3 data
+
+One-time (or when you want a fresh baseline from production).
+
+```
+export ZEN_PERF_S3_URI=s3://your-bucket/
+export ZEN_PERF_AWS_PROFILE=your-sso-profile
+python scripts/test_scripts/hub/seed_s3_snapshot.py
+```
+
+Provisions `--module-count` modules from `$ZEN_PERF_S3_URI`, hibernates them, then
+copies `hub-a/servers/<moduleid>/` to `s3-snapshot/<moduleid>/`. Triggers
+`aws sso login` automatically if the SSO token is missing or expired.
+
+Module selection ranks all UUID-shaped folders by their
+`incremental-state.cbo` `LastModified` (newest first, a proxy for
+most-recently-accessed) and takes the top `--module-count`.
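
The selection rule above can be sketched as a pure function. This is not the code in `seed_s3_snapshot.py`: `select_modules`, `is_uuid_shaped`, and the `(key, last_modified)` input shape are assumptions for illustration, and "UUID-shaped" is interpreted here as "parses with `uuid.UUID`".

```python
import uuid

STATE_FILE = "incremental-state.cbo"


def is_uuid_shaped(name):
    """True if `name` parses as a UUID (one possible reading of "UUID-shaped")."""
    try:
        uuid.UUID(name)
    except ValueError:
        return False
    return True


def select_modules(objects, module_count):
    """Return the ids of the `module_count` most recently modified modules.

    `objects` is an iterable of (key, last_modified) pairs, where key looks
    like "[prefix/]<moduleid>/incremental-state.cbo" (hypothetical input shape).
    """
    candidates = []
    for key, last_modified in objects:
        folder, _, filename = key.rpartition("/")
        module_id = folder.rsplit("/", 1)[-1]  # tolerate an optional prefix
        if filename == STATE_FILE and is_uuid_shaped(module_id):
            candidates.append((last_modified, module_id))
    candidates.sort(reverse=True)  # newest first
    return [module_id for _, module_id in candidates[:module_count]]
```

Keeping the ranking separate from the S3 listing means it can be checked against a hand-written list of keys and timestamps.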
+
+Options:
+- `--module-count N` (default 1000)
+- `--snapshot-dir PATH` (default `<perf-seed>/s3-snapshot`)
+- `--hub-data-dir PATH` (default `<perf-seed>/hub-a`)
+
+## Stage B - seed MinIO from the snapshot
+
+One-time per pack-mode (or when `s3-snapshot` changes).
+
+`seed_minio.py` seeds a **single** bucket per invocation. The pack flag is
+hardcoded inside the script (`--hub-hydration-enable-pack=true` near the
+top of `_start_hub`). To produce both packed and unpacked baselines for
+comparison, invoke the script twice from two separate worktrees - one with
+the flag flipped to `false` - and preserve the resulting MinIO data dir
+each time.
+
+```
+# In the pack worktree (flag = true), seeds zen-seed-packed
+python scripts/test_scripts/hub/seed_minio.py --wipe --bucket zen-seed-packed
+python scripts/test_scripts/hub/preserve_minio_state.py --dest <perf-seed>/minio-seeded-packed
+
+# In the no-pack worktree (flag = false), seeds zen-seed-unpacked
+python scripts/test_scripts/hub/seed_minio.py --wipe --bucket zen-seed-unpacked
+python scripts/test_scripts/hub/preserve_minio_state.py --dest <perf-seed>/minio-seeded-unpacked
+```
+
+The script provisions every module found under `s3-snapshot/`, hibernates
+them, overlays the snapshot on top of the hub's servers dir, then
+deprovisions all modules - which runs the dehydrate path and uploads the
+content into the bucket.
+
+`preserve_minio_state.py` copies the resulting `minio-data/` to a
+variant-specific preservation dir and writes a README with provenance.
+
+Options of interest:
+- `--bucket NAME` - bucket to seed (default `zen-seed-packed`).
+- `--wipe` - remove the per-bucket hub data dir and the shared `minio-data/`
+  dir before starting.
+- `--module-count N` - cap the number of modules (0 = every module in snapshot-dir).
+
+## Stage C - run a perf iteration
+
+Repeat as often as you want; each run starts from the preserved baseline.
+
+```
+# Pack-on bucket
+python scripts/test_scripts/hub/run_minio_perf.py --bucket zen-seed-packed --trace
+
+# Pack-off bucket (for comparison)
+python scripts/test_scripts/hub/run_minio_perf.py --bucket zen-seed-unpacked --trace
+```
+
+Steps:
+1. Copies `--minio-seeded` (default `minio-seeded-baseline/`) over `minio-run/` so MinIO starts from a known state.
+2. Wipes `hub-perf/` (unless `--no-wipe-hub`).
+3. Starts MinIO and hub.
+4. Provisions all modules, waits for each to reach `provisioned`, deprovisions them, and waits until they are gone.
+5. Stops everything cleanly.
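
Steps 1 and 2 above amount to a directory reset. A minimal sketch, assuming plain `shutil` copies are adequate; `reset_run_dirs` and its parameters are hypothetical names, and `run_minio_perf.py` may implement this differently:

```python
import shutil
from pathlib import Path


def reset_run_dirs(minio_seeded, minio_run, hub_perf, wipe_hub=True):
    """Recreate minio_run from the preserved seed and optionally wipe hub_perf."""
    minio_seeded, minio_run, hub_perf = Path(minio_seeded), Path(minio_run), Path(hub_perf)
    # Step 1: drop whatever the previous run left behind, then copy the seed.
    shutil.rmtree(minio_run, ignore_errors=True)
    shutil.copytree(minio_seeded, minio_run)
    # Step 2: start the hub from an empty data dir unless asked not to.
    if wipe_hub:
        shutil.rmtree(hub_perf, ignore_errors=True)
        hub_perf.mkdir(parents=True, exist_ok=True)
```

Removing `minio_run` before copying, rather than copying on top of it, ensures objects written by an earlier run cannot leak into the next one.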
+
+By default the hub runs with `--hub-enable-dehydration=false`, so MinIO is not
+modified and every iteration exercises the hydrate-only path against the same
+baseline CAS. The `--bucket` flag selects which seeded bucket (and therefore
+which pack mode) to exercise.
+
+Pass `--enable-dehydration` to run a full provision -> deprovision cycle that
+includes re-upload (dehydrate) at deprovision time. Use this to measure the
+dehydrate phase end-to-end against the seeded baseline. Note that the seeded
+baseline diverges after an `--enable-dehydration` run; re-copy `--minio-seeded`
+or re-run `preserve_minio_state.py` if you want to compare against the pristine state.
+
+After each run the hub log, structured zenserver logs, any utrace file, and a
+`summary.json` with the run's timings are copied into
+`perf-runs/<timestamp>_<bucket>/` so Stage C runs can be compared
+post-hoc. Override the destination with `--archive-dir PATH`.
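
To line up archived runs for comparison, the directory names can be parsed back into a timestamp and a bucket. The sketch below infers the `%Y%m%d-%H%M%S` format from the example names in the layout (e.g. `20260423-141530_zen-seed-packed`); that format and the helper names are assumptions, not something the scripts are confirmed to guarantee.

```python
from datetime import datetime
from pathlib import Path


def parse_run_dir(name):
    """Split '<timestamp>_<bucket>' into (datetime, bucket)."""
    stamp, _, bucket = name.partition("_")
    return datetime.strptime(stamp, "%Y%m%d-%H%M%S"), bucket


def runs_by_bucket(archive_dir):
    """Map bucket name -> list of run directories, oldest first."""
    grouped = {}
    for entry in sorted(Path(archive_dir).iterdir()):
        if not entry.is_dir():
            continue
        try:
            _, bucket = parse_run_dir(entry.name)
        except ValueError:
            continue  # not a '<timestamp>_<bucket>' directory
        grouped.setdefault(bucket, []).append(entry)
    return grouped
```

Because the timestamp prefix sorts lexicographically, a plain `sorted()` over the directory entries already yields chronological order within each bucket.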
+
+## Resetting between runs
+
+- **Keep**: `s3-snapshot/` and the preserved `minio-seeded-*/` dirs (`minio-seeded-baseline/`, `minio-seeded-packed/`, `minio-seeded-unpacked/`). These are expensive to rebuild.
+- **Discard freely**: `hub-a/`, `hubs/`, `hub-perf/`, `minio-data/`, `minio-run/`.
+
+To force a fresh MinIO seed for one variant: delete the matching
+`minio-seeded-<variant>/` and re-run Stage B + preserve (with the matching
+`--dest`) in that worktree. To force a fresh S3 snapshot: delete
+`s3-snapshot/` and re-run Stage A.