# Perf-seed workflow

Three-stage pipeline for running repeatable hub-hydration perf tests against a
local MinIO backend seeded with real module data pulled from production S3.

## Layout

All scripts default to a single perf-seed root - currently `E:/Dev/zen-perf-seed/`
in the script defaults, but every path is overridable via CLI flag (see the
per-stage options below). Pick a root with enough free space (snapshots and
preserved CAS dirs can be large) and either pass the matching `--*-dir` flag on
each invocation or change the script defaults to your chosen root.

Layout under the chosen root (`<perf-seed>/`):

```
<perf-seed>/
  hub-a/                Stage A hub data dir (transient)
    servers/<moduleid>/
  s3-snapshot/          Preserved production server-state trees (read-only after Stage A)
    <moduleid>/
  hubs/                 Stage B per-bucket hub data dirs (transient)
    hub-b-zen-seed-packed/
    hub-b-zen-seed-unpacked/
  minio-data/           Stage B MinIO data dir (transient, carries every seeded bucket)
  minio-seeded-baseline/  Preserved baseline MinIO CAS (read-only after Stage B + preserve)
    README.txt
  minio-seeded-packed/    Preserved packed MinIO CAS (filled by the pack worktree)
    README.txt
  hub-perf/             Stage C hub data dir (wiped each run)
  minio-run/            Stage C MinIO data dir (wiped + re-copied each run)
  perf-runs/            Per-run archive: hub.log, logs/, hub.utrace, summary.json
    20260423-141530_zen-seed-packed/
    20260423-143112_zen-seed-unpacked/
```

## Prerequisites

- Debug or release build of zenserver + minio: `xmake -y`
- `pip install boto3`
- AWS CLI v2 with an SSO profile configured (for Stage A only)
- Environment variables (or pass equivalents via CLI flags):
  - `ZEN_PERF_S3_URI` - source S3 bucket, e.g. `s3://your-bucket/optional-prefix/`
  - `ZEN_PERF_AWS_PROFILE` - AWS SSO profile name with read access to that bucket
  - `ZEN_PERF_AWS_REGION` - optional, defaults to `us-east-1`
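
The environment variables above can be resolved with a small helper like the one below. This is a minimal sketch, not the scripts' actual code: `PerfS3Config` and `load_perf_s3_config` are hypothetical names, and the sketch ignores the CLI-flag overrides. Only the variable names and the `us-east-1` default come from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PerfS3Config:
    """Stage A settings (hypothetical container, not part of the scripts)."""
    s3_uri: str
    aws_profile: str
    aws_region: str


def load_perf_s3_config(env: Optional[Mapping[str, str]] = None) -> PerfS3Config:
    """Read the Stage A settings from an environment mapping."""
    env = os.environ if env is None else env
    required = ("ZEN_PERF_S3_URI", "ZEN_PERF_AWS_PROFILE")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing required environment variables: " + ", ".join(missing))
    return PerfS3Config(
        s3_uri=env["ZEN_PERF_S3_URI"],
        aws_profile=env["ZEN_PERF_AWS_PROFILE"],
        # The region is optional and falls back to us-east-1, as documented above.
        aws_region=env.get("ZEN_PERF_AWS_REGION") or "us-east-1",
    )
```

Failing early on a missing variable avoids starting a long Stage A run that would only fail once it reaches S3.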

## Stage A - snapshot real S3 data

One-time (or when you want a fresh baseline from production).

```
export ZEN_PERF_S3_URI=s3://your-bucket/
export ZEN_PERF_AWS_PROFILE=your-sso-profile
python scripts/test_scripts/hub/seed_s3_snapshot.py
```

Provisions `--module-count` modules from `$ZEN_PERF_S3_URI`, hibernates them, then copies
`hub-a/servers/<moduleid>/` to `s3-snapshot/<moduleid>/`. Triggers `aws sso login`
automatically if the SSO token is missing or expired.
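
One way to implement the automatic login is to probe the credentials first and log in only when the probe fails. The sketch below assumes that approach; how `seed_s3_snapshot.py` actually detects an expired token is not documented here, and `ensure_sso_session` is a hypothetical name. The two AWS CLI commands (`aws sts get-caller-identity` and `aws sso login`) are real, but the probe can also fail for other reasons, such as no network access.

```python
import subprocess
from typing import Callable


def ensure_sso_session(
    profile: str,
    run: Callable[..., "subprocess.CompletedProcess"] = subprocess.run,
) -> bool:
    """Return True if a login was triggered, False if credentials already worked."""
    # The probe exits non-zero when the cached SSO token is missing or expired.
    probe = run(
        ["aws", "sts", "get-caller-identity", "--profile", profile],
        capture_output=True,
    )
    if probe.returncode == 0:
        return False
    # Starts the interactive SSO flow; check=True raises if the login fails.
    run(["aws", "sso", "login", "--profile", profile], check=True)
    return True
```

The `run` parameter exists only so the helper can be exercised without the AWS CLI installed.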

Module selection ranks all UUID-shaped folders by their
`incremental-state.cbo` `LastModified` (newest first, a proxy for
most-recently-accessed) and takes the top `--module-count`.
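
The ranking rule can be sketched as a pure function over `(key, LastModified)` pairs, such as those returned by an S3 listing. This is an illustration only. It assumes module folders use the hyphenated 8-4-4-4-12 UUID form and that `incremental-state.cbo` sits directly inside the module folder. `select_modules` is a hypothetical name, not a function from the script.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Assumption: "<uuid>/incremental-state.cbo" at the end of the key,
# with the UUID in its hyphenated 8-4-4-4-12 form.
_STATE_KEY = re.compile(
    r"(?:^|/)([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})"
    r"/incremental-state\.cbo$"
)


def select_modules(objects: Iterable[Tuple[str, datetime]], module_count: int) -> List[str]:
    """Return up to module_count module ids, newest state file first."""
    newest: Dict[str, datetime] = {}
    for key, last_modified in objects:
        match = _STATE_KEY.search(key)
        if match is None:
            continue  # not a state file inside a UUID-shaped folder
        module_id = match.group(1).lower()
        if module_id not in newest or last_modified > newest[module_id]:
            newest[module_id] = last_modified
    ranked = sorted(newest, key=lambda module_id: newest[module_id], reverse=True)
    return ranked[:module_count]
```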

Options:
- `--module-count N` (default 1000)
- `--snapshot-dir PATH` (default `<perf-seed>/s3-snapshot`)
- `--hub-data-dir PATH` (default `<perf-seed>/hub-a`)

## Stage B - seed MinIO from the snapshot

One-time per pack-mode (or when `s3-snapshot` changes).

`seed_minio.py` seeds a **single** bucket per invocation. The pack flag is
hardcoded inside the script (`--hub-hydration-enable-pack=true` near the
top of `_start_hub`). To produce both packed and unpacked baselines for
comparison, invoke the script twice from two separate worktrees - one with
the flag flipped to `false` - and preserve the resulting MinIO data dir
each time.

```
# In the pack worktree (flag = true), seeds zen-seed-packed
python scripts/test_scripts/hub/seed_minio.py --wipe --bucket zen-seed-packed
python scripts/test_scripts/hub/preserve_minio_state.py --dest <perf-seed>/minio-seeded-packed

# In the no-pack worktree (flag = false), seeds zen-seed-unpacked
python scripts/test_scripts/hub/seed_minio.py --wipe --bucket zen-seed-unpacked
python scripts/test_scripts/hub/preserve_minio_state.py --dest <perf-seed>/minio-seeded-unpacked
```

The script provisions every module found under `s3-snapshot/`, hibernates
them, overlays the snapshot on top of the hub's servers dir, then
deprovisions all modules - which runs the dehydrate path and uploads the
content into the bucket.

`preserve_minio_state.py` copies the resulting `minio-data/` to a
variant-specific preservation dir and writes a README with provenance.

Options of interest:
- `--bucket NAME` - bucket to seed (default `zen-seed-packed`).
- `--wipe` - removes the per-bucket hub data dir and the shared `minio-data/`
  dir before starting.
- `--module-count N` - caps the number of modules seeded (0 = every module in
  the snapshot dir).

## Stage C - run a perf iteration

Repeat as often as you want; each run starts from the preserved baseline.

```
# Pack-on bucket
python scripts/test_scripts/hub/run_minio_perf.py --bucket zen-seed-packed --trace

# Pack-off bucket (for comparison)
python scripts/test_scripts/hub/run_minio_perf.py --bucket zen-seed-unpacked --trace
```

Steps:
1. Copies `--minio-seeded` (default `minio-seeded-baseline/`) over `minio-run/` so MinIO starts from a known state.
2. Wipes `hub-perf/` (unless `--no-wipe-hub`).
3. Starts MinIO and hub.
4. Provisions all modules, waits for `provisioned`, deprovisions them, then waits until they are gone.
5. Stops everything cleanly.
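
Steps 1 and 2 amount to resetting two directories before anything is started. A minimal sketch of that reset, assuming plain directory copies, is shown below. `reset_run_state` and its parameters are hypothetical and do not reflect how `run_minio_perf.py` is actually structured.

```python
import shutil
from pathlib import Path


def reset_run_state(seeded: Path, minio_run: Path, hub_perf: Path, wipe_hub: bool = True) -> None:
    """Restore minio-run/ from the preserved baseline and optionally wipe hub-perf/."""
    # Step 1: replace minio-run/ with a fresh copy of the preserved baseline.
    if minio_run.exists():
        shutil.rmtree(minio_run)
    shutil.copytree(seeded, minio_run)
    # Step 2: wipe hub-perf/ unless the caller asked to keep it (--no-wipe-hub).
    if wipe_hub and hub_perf.exists():
        shutil.rmtree(hub_perf)
    hub_perf.mkdir(parents=True, exist_ok=True)
```

Removing `minio-run/` before copying, rather than copying on top of it, ensures that objects left behind by an earlier run cannot leak into the next one.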

Default mode is `--hub-enable-dehydration=false` so MinIO isn't modified; every
iteration exercises the hydrate-only path against the same baseline CAS. The
`--bucket` flag selects which seeded bucket (and therefore which pack mode)
to exercise.

Pass `--enable-dehydration` to run a full provision -> deprovision cycle that
includes re-upload (dehydrate) at deprovision time. Use this to measure the
dehydrate phase end-to-end against the seeded baseline. Note that the seeded
baseline diverges after a `--enable-dehydration` run: re-copy `--minio-seeded`
or re-run `preserve_minio_state.py` if you want to compare against the pristine state.

After each run the hub log, structured zenserver logs, any utrace file, and a
`summary.json` with the run's timings are copied into
`perf-runs/<timestamp>_<bucket>/` so Stage C runs can be compared
post-hoc. Override the destination with `--archive-dir PATH`.
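
The archive directory name can be derived as sketched below. The timestamp format is inferred from the example names in the layout section (`20260423-141530_zen-seed-packed`). Whether the script uses local time or UTC is not stated, and `archive_dir_for` is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path


def archive_dir_for(archive_root: Path, bucket: str, started: datetime) -> Path:
    """Build <archive_root>/<timestamp>_<bucket> for one Stage C run."""
    # Assumed format: YYYYMMDD-HHMMSS, matching the example names in the layout.
    stamp = started.strftime("%Y%m%d-%H%M%S")
    return archive_root / f"{stamp}_{bucket}"
```

Because the timestamp comes first, sorting the directory names lexicographically also sorts the runs chronologically.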

## Resetting between runs

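- **Keep**: `s3-snapshot/`, `minio-seeded-baseline/`, `minio-seeded-packed/`, and any other `minio-seeded-<variant>/` dir preserved in Stage B (for example `minio-seeded-unpacked/`). These are expensive to rebuild.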
- **Discard freely**: `hub-a/`, `hubs/`, `hub-perf/`, `minio-data/`, `minio-run/`.

To force a fresh MinIO seed for one variant: delete the matching
`minio-seeded-<variant>/` and re-run Stage B + preserve (with the matching
`--dest`) in that worktree. To force a fresh S3 snapshot: delete
`s3-snapshot/` and re-run Stage A.
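
The keep/discard split above can be automated with a small cleanup helper. The sketch below is an assumption about how to do that, not an existing script: `discard_transient` is a hypothetical name, and it defaults to a dry run so that nothing is deleted by accident.

```python
import shutil
from pathlib import Path
from typing import List

# Directories listed under "Discard freely" above; everything else is left alone.
TRANSIENT_DIRS = ("hub-a", "hubs", "hub-perf", "minio-data", "minio-run")


def discard_transient(perf_seed_root: Path, dry_run: bool = True) -> List[Path]:
    """Remove transient dirs under the perf-seed root; return the paths affected."""
    affected = []
    for name in TRANSIENT_DIRS:
        target = perf_seed_root / name
        if target.is_dir():
            if not dry_run:
                shutil.rmtree(target)
            affected.append(target)
    return affected
```

Call it once with the default `dry_run=True` to review the list, then again with `dry_run=False` to delete the directories.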