author    Prasanna <[email protected]>  2025-12-24 11:01:26 -0800
committer GitHub <[email protected]>  2025-12-24 11:01:26 -0800
commit    7e79e227d3e755e4c6579e33c4e2d0018af64899 (patch)
tree      308214a10c3e07c16481471791b80e6a1ec806f3 /apps/docs/memorybench/cli.mdx
parent    conditional (diff)
docs: added MemoryBench documentation (#630)
Diffstat (limited to 'apps/docs/memorybench/cli.mdx')
-rw-r--r--  apps/docs/memorybench/cli.mdx | 117
1 file changed, 117 insertions(+), 0 deletions(-)
diff --git a/apps/docs/memorybench/cli.mdx b/apps/docs/memorybench/cli.mdx
new file mode 100644
index 00000000..3ab5c503
--- /dev/null
+++ b/apps/docs/memorybench/cli.mdx
@@ -0,0 +1,117 @@
+---
+title: "CLI Reference"
+description: "Command-line interface for running MemoryBench evaluations"
+sidebarTitle: "CLI"
+---
+
+## Commands
+
+### run
+
+Execute the full benchmark pipeline.
+
+```bash
+bun run src/index.ts run -p <provider> -b <benchmark> -j <judge> -r <run-id>
+```
+
+| Option | Description |
+|--------|-------------|
+| `-p, --provider` | Memory provider (`supermemory`, `mem0`, `zep`) |
+| `-b, --benchmark` | Benchmark (`locomo`, `longmemeval`, `convomem`) |
+| `-j, --judge` | Judge model (default: `gpt-4o`) |
+| `-r, --run-id` | Run identifier (auto-generated if omitted) |
+| `-m, --answering-model` | Model for answer generation (default: `gpt-4o`) |
+| `-l, --limit` | Limit number of questions |
+| `-s, --sample` | Sample N questions per category |
+| `--sample-type` | Sampling strategy: `consecutive` (default), `random` |
+| `--force` | Clear checkpoint and restart |
+
+See [Supported Models](/memorybench/supported-models) for all available judge and answering models.
+
+---
+
+### compare
+
+Run a benchmark across multiple providers in parallel.
+
+```bash
+bun run src/index.ts compare -p supermemory,mem0,zep -b locomo -j gpt-4o
+```
+
+---
+
+### test
+
+Evaluate a single question for debugging.
+
+```bash
+bun run src/index.ts test -r <run-id> -q <question-id>
+```
+
+---
+
+### status
+
+Check progress of a run.
+
+```bash
+bun run src/index.ts status -r <run-id>
+```
+
+---
+
+### show-failures
+
+Debug failed questions with full context.
+
+```bash
+bun run src/index.ts show-failures -r <run-id>
+```
+
+---
+
+### list-questions
+
+Browse benchmark questions.
+
+```bash
+bun run src/index.ts list-questions -b <benchmark>
+```
+
+---
+
+### Random Sampling
+
+Not a separate command: pass `-s` and `--sample-type` to `run` to sample N questions per category, with optional randomization.
+
+```bash
+bun run src/index.ts run -p supermemory -b longmemeval -s 3 --sample-type random
+```
+
+---
+
+### serve
+
+Start the web UI.
+
+```bash
+bun run src/index.ts serve
+```
+
+Opens at [http://localhost:3000](http://localhost:3000).
+
+---
+
+### help
+
+Get help on providers, models, or benchmarks.
+
+```bash
+bun run src/index.ts help providers
+bun run src/index.ts help models
+bun run src/index.ts help benchmarks
+```
+
+## Checkpointing
+
+Runs are saved to `data/runs/{runId}/` and automatically resume from the last successful phase. Use `--force` to restart.