| author | Prasanna <[email protected]> | 2025-12-24 11:01:26 -0800 |
|---|---|---|
| committer | GitHub <[email protected]> | 2025-12-24 11:01:26 -0800 |
| commit | 7e79e227d3e755e4c6579e33c4e2d0018af64899 (patch) | |
| tree | 308214a10c3e07c16481471791b80e6a1ec806f3 /apps/docs/memorybench/cli.mdx | |
| parent | conditional (diff) | |
docs: added MemoryBench documentation (#630)
Diffstat (limited to 'apps/docs/memorybench/cli.mdx')
| -rw-r--r-- | apps/docs/memorybench/cli.mdx | 117 |
1 files changed, 117 insertions, 0 deletions
diff --git a/apps/docs/memorybench/cli.mdx b/apps/docs/memorybench/cli.mdx
new file mode 100644
index 00000000..3ab5c503
--- /dev/null
+++ b/apps/docs/memorybench/cli.mdx
@@ -0,0 +1,117 @@
+---
+title: "CLI Reference"
+description: "Command-line interface for running MemoryBench evaluations"
+sidebarTitle: "CLI"
+---
+
+## Commands
+
+### run
+
+Execute the full benchmark pipeline.
+
+```bash
+bun run src/index.ts run -p <provider> -b <benchmark> -j <judge> -r <run-id>
+```
+
+| Option | Description |
+|--------|-------------|
+| `-p, --provider` | Memory provider (`supermemory`, `mem0`, `zep`) |
+| `-b, --benchmark` | Benchmark (`locomo`, `longmemeval`, `convomem`) |
+| `-j, --judge` | Judge model (default: `gpt-4o`) |
+| `-r, --run-id` | Run identifier (auto-generated if omitted) |
+| `-m, --answering-model` | Model for answer generation (default: `gpt-4o`) |
+| `-l, --limit` | Limit number of questions |
+| `-s, --sample` | Sample N questions per category |
+| `--sample-type` | Sampling strategy: `consecutive` (default), `random` |
+| `--force` | Clear checkpoint and restart |
+
+See [Supported Models](/memorybench/supported-models) for all available judge and answering models.
+
+---
+
+### compare
+
+Run benchmark across multiple providers in parallel.
+
+```bash
+bun run src/index.ts compare -p supermemory,mem0,zep -b locomo -j gpt-4o
+```
+
+---
+
+### test
+
+Evaluate a single question for debugging.
+
+```bash
+bun run src/index.ts test -r <run-id> -q <question-id>
+```
+
+---
+
+### status
+
+Check progress of a run.
+
+```bash
+bun run src/index.ts status -r <run-id>
+```
+
+---
+
+### show-failures
+
+Debug failed questions with full context.
+
+```bash
+bun run src/index.ts show-failures -r <run-id>
+```
+
+---
+
+### list-questions
+
+Browse benchmark questions.
+
+```bash
+bun run src/index.ts list-questions -b <benchmark>
+```
+
+---
+
+### Random Sampling
+
+Sample N questions per category with optional randomization.
+
+```bash
+bun run src/index.ts run -p supermemory -b longmemeval -s 3 --sample-type random
+```
+
+---
+
+### serve
+
+Start the web UI.
+
+```bash
+bun run src/index.ts serve
+```
+
+Opens at [http://localhost:3000](http://localhost:3000).
+
+---
+
+### help
+
+Get help on providers, models, or benchmarks.
+
+```bash
+bun run src/index.ts help providers
+bun run src/index.ts help models
+bun run src/index.ts help benchmarks
+```
+
+## Checkpointing
+
+Runs are saved to `data/runs/{runId}/` and automatically resume from the last successful phase. Use `--force` to restart.
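
The checkpointing note above suggests a simple workflow: pick an explicit run ID, re-issue the same `run` command to continue an interrupted run, and add `--force` only when the checkpoint should be discarded. A minimal sketch of that workflow follows. It only prints the commands rather than executing them. The helper name `memorybench_run_cmd` and the run ID `demo-run` are hypothetical, and the sketch assumes that re-invoking `run` with the same `-r` value is what triggers the resume.

```shell
#!/usr/bin/env bash
# Sketch only: composes MemoryBench command lines and prints them (dry run).
# Assumption: re-running `run` with the same -r value resumes from the checkpoint.

# Print the `run` command for a provider, benchmark, and run id.
# Pass "force" as the fourth argument to append --force (clear checkpoint, restart).
memorybench_run_cmd() {
  local provider="$1" benchmark="$2" run_id="$3" mode="${4:-resume}"
  local cmd="bun run src/index.ts run -p ${provider} -b ${benchmark} -r ${run_id}"
  if [ "$mode" = "force" ]; then
    cmd="${cmd} --force"
  fi
  printf '%s\n' "$cmd"
}

# Hypothetical run id, chosen for illustration.
RUN_ID="demo-run"

memorybench_run_cmd supermemory locomo "$RUN_ID"        # first attempt, or resume
printf 'bun run src/index.ts status -r %s\n' "$RUN_ID"  # check progress of the run
memorybench_run_cmd supermemory locomo "$RUN_ID" force  # discard checkpoint and restart
```

Omitting `-j` here relies on the documented default judge (`gpt-4o`). Passing an explicit `-r` matters because an auto-generated run ID would have to be looked up before the same run could be addressed again.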