---
title: "CLI Reference"
description: "Command-line interface for running MemoryBench evaluations"
sidebarTitle: "CLI"
---

## Commands

### run

Execute the full benchmark pipeline.

```bash
bun run src/index.ts run -p <provider> -b <benchmark> -j <judge> -r <run-id>
```

| Option | Description |
|--------|-------------|
| `-p, --provider` | Memory provider (`supermemory`, `mem0`, `zep`) |
| `-b, --benchmark` | Benchmark (`locomo`, `longmemeval`, `convomem`) |
| `-j, --judge` | Judge model (default: `gpt-4o`) |
| `-r, --run-id` | Run identifier (auto-generated if omitted) |
| `-m, --answering-model` | Model for answer generation (default: `gpt-4o`) |
| `-l, --limit` | Limit number of questions |
| `-s, --sample` | Sample N questions per category |
| `--sample-type` | Sampling strategy: `consecutive` (default), `random` |
| `--force` | Clear checkpoint and restart |

See [Supported Models](/memorybench/supported-models) for all available judge and answering models.
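
As an illustration, a complete `run` invocation might combine the options above as follows. This is a sketch under assumptions: the run ID `locomo-smoke-test` is a made-up value, and the long-form flags are assumed to take a space-separated value the same way the short forms do in the usage line. The script only prints the command so it can be reviewed first.

```bash
#!/usr/bin/env bash
# Collect the options in an array so each one is easy to read and edit.
# The run ID is hypothetical; every flag comes from the options table.
args=(
  run
  --provider supermemory
  --benchmark locomo
  --judge gpt-4o
  --answering-model gpt-4o
  --run-id locomo-smoke-test
  --limit 10
)

# Print the full command; remove the leading `echo` to actually run it.
echo bun run src/index.ts "${args[@]}"
```

Keeping the arguments in an array makes it easy to comment out or swap a single option between runs.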

---

### compare

Run a benchmark across multiple providers in parallel.

```bash
bun run src/index.ts compare -p supermemory,mem0,zep -b locomo -j gpt-4o
```
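
The example above passes the providers as one comma-separated value with no spaces. When the provider list lives in a script, it could be assembled from a bash array as sketched below. The array and variable names are illustrative, and the command is printed rather than executed.

```bash
#!/usr/bin/env bash
# Providers to compare (values taken from the `--provider` option list).
providers=(supermemory mem0 zep)

# "${providers[*]}" joins elements with the first character of IFS,
# so setting IFS to a comma inside the subshell yields "a,b,c".
provider_list=$(IFS=,; echo "${providers[*]}")

# Print the resulting command; remove `echo` to execute it.
echo bun run src/index.ts compare -p "$provider_list" -b locomo -j gpt-4o
```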

---

### test

Evaluate a single question for debugging.

```bash
bun run src/index.ts test -r <run-id> -q <question-id>
```

---

### status

Check the progress of a run.

```bash
bun run src/index.ts status -r <run-id>
```

---

### show-failures

Debug failed questions with full context.

```bash
bun run src/index.ts show-failures -r <run-id>
```
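
Since `run`, `status`, and `show-failures` all take a run ID, a typical debugging loop can reuse one identifier across them. The sketch below assumes a hypothetical run ID and a small wrapper function, `mb`, that is not part of MemoryBench. By default it prints each command instead of executing it.

```bash
#!/usr/bin/env bash
# Hypothetical run ID shared by every step of the workflow.
RUN_ID="zep-longmemeval-01"

# Illustrative wrapper around the CLI entry point. With DRY_RUN=1
# (the default here) it only prints the command it would run.
mb() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "bun run src/index.ts $*"
  else
    bun run src/index.ts "$@"
  fi
}

mb run -p zep -b longmemeval -j gpt-4o -r "$RUN_ID"  # 1. start the run
mb status -r "$RUN_ID"                               # 2. check progress
mb show-failures -r "$RUN_ID"                        # 3. inspect failed questions
```

Set `DRY_RUN=0` in the environment to execute the commands for real.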

---

### list-questions

Browse benchmark questions.

```bash
bun run src/index.ts list-questions -b <benchmark>
```

---

### Random Sampling

Use `-s` to sample N questions per category. Add `--sample-type random` to pick them at random instead of using the default `consecutive` selection.

```bash
bun run src/index.ts run -p supermemory -b longmemeval -s 3 --sample-type random
```

---

### serve

Start the web UI.

```bash
bun run src/index.ts serve
```

Opens at [http://localhost:3000](http://localhost:3000).

---

### help

Get help on providers, models, or benchmarks.

```bash
bun run src/index.ts help providers
bun run src/index.ts help models
bun run src/index.ts help benchmarks
```

## Checkpointing

Runs are saved to `data/runs/{runId}/` and automatically resume from the last successful phase. Pass `--force` to clear the checkpoint and restart the run from the beginning.
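
A small script can check whether a checkpoint directory already exists before choosing between resuming and restarting. This is a sketch under assumptions: the run ID is made up, and it assumes that re-issuing the `run` command with the same run ID is what triggers the resume. It prints the candidate commands rather than running them.

```bash
#!/usr/bin/env bash
RUN_ID="locomo-smoke-test"      # hypothetical run ID
RUN_DIR="data/runs/${RUN_ID}"   # checkpoint location described above

# The judge flag is omitted, so the documented default (gpt-4o) applies.
base_cmd="bun run src/index.ts run -p supermemory -b locomo -r ${RUN_ID}"

if [ -d "$RUN_DIR" ]; then
  echo "Checkpoint found in ${RUN_DIR}."
  echo "Resume:  ${base_cmd}"
  echo "Restart: ${base_cmd} --force"
else
  echo "No checkpoint yet. Start: ${base_cmd}"
fi
```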