---
title: "supermemory Infinite Chat"
description: "Build chat applications with unlimited context using supermemory's intelligent proxy"
tag: "BETA"
---

import GettingAPIKey from '/snippets/getting-api-key.mdx';

supermemory Infinite Chat is a powerful solution that gives your chat applications unlimited contextual memory. It works as a transparent proxy in front of your existing LLM provider, intelligently managing long conversations without requiring any changes to your application logic.

<img
  src="/images/infinite-context.png"
  alt="Infinite Context Diagram"
  className="rounded-lg shadow-lg"
/>

<Tabs>
  <Tab title="Key Features">
    <CardGroup cols={2}>
      <Card title="Unlimited Context" icon="infinity" color="#4F46E5">
        No more token limits - conversations can extend indefinitely
      </Card>
      <Card title="Zero Latency" icon="bolt" color="#10B981">
        Transparent proxying with negligible overhead
      </Card>
      <Card title="Cost Efficient" icon="coins" color="#F59E0B">
        Save up to 70% on token costs for long conversations
      </Card>
      <Card title="Provider Agnostic" icon="plug" color="#6366F1">
        Works with any OpenAI-compatible endpoint
      </Card>
    </CardGroup>
  </Tab>
</Tabs>

## Getting Started

To use the Infinite Chat endpoint, you need to:

### 1. Get a supermemory API key

<GettingAPIKey />

### 2. Add supermemory in front of any **OpenAI-Compatible** API URL

<CodeGroup>

```typescript TypeScript
import OpenAI from "openai";

// Point the OpenAI client at the supermemory proxy.
// OPENAI_API_KEY is your regular OpenAI key;
// SUPERMEMORY_API_KEY is your supermemory key.
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://api.supermemory.ai/v3/https://api.openai.com/v1",
  defaultHeaders: {
    "x-supermemory-api-key": process.env.SUPERMEMORY_API_KEY,
    "x-sm-user-id": "Your_users_id",
  },
});
```

```python Python
from openai import OpenAI
import os

# Point the OpenAI client at the supermemory proxy
client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),  # Your regular OpenAI key
    base_url="https://api.supermemory.ai/v3/https://api.openai.com/v1",
    default_headers={
        "x-supermemory-api-key": os.environ.get("SUPERMEMORY_API_KEY"),  # Your supermemory key
        "x-sm-user-id": "Your_users_id",
    },
)

# Create a chat completion with unlimited context
response = client.chat.completions.create(
    model="gpt-5-nano",
    messages=[{"role": "user", "content": "Your message here"}],
)
```

</CodeGroup>
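
As the examples above show, the proxy URL is simply your provider's base URL appended to `https://api.supermemory.ai/v3/`. A small helper (a sketch, assuming this URL scheme holds for every provider) makes switching providers explicit:

```typescript
// Build the supermemory proxy base URL for any OpenAI-compatible endpoint.
// Scheme (from the examples above): https://api.supermemory.ai/v3/<provider-base-url>
function supermemoryBaseURL(providerBaseURL: string): string {
  return `https://api.supermemory.ai/v3/${providerBaseURL}`;
}

// The OpenAI endpoint used in the examples above:
const openaiProxy = supermemoryBaseURL("https://api.openai.com/v1");
console.log(openaiProxy); // https://api.supermemory.ai/v3/https://api.openai.com/v1
```

Pass the result as `baseURL` when constructing the client, exactly as in the snippets above.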

## How It Works

<Steps>
  <Step title="Transparent Proxying">
    All requests pass through supermemory to your chosen LLM provider with negligible latency overhead.

    <img
      src="/images/transparent-proxy.png"
      alt="Transparent Proxy Diagram"
      className="my-4 rounded-md shadow"
    />
  </Step>
  <Step title="Intelligent Chunking">
    Long conversations are automatically broken down into optimized segments using our proprietary chunking algorithm that preserves semantic coherence.
  </Step>
  <Step title="Smart Retrieval">
    When conversations exceed token limits (20k+), supermemory intelligently retrieves the most relevant context from previous messages.
  </Step>
  <Step title="Automatic Token Management">
    The system intelligently balances token usage, ensuring optimal performance while minimizing costs.
  </Step>
</Steps>

## Performance Benefits

<Accordion title="Reduced Token Usage" defaultOpen icon="coins">
  Save up to 70% on token costs for long conversations through intelligent context management and caching.
</Accordion>

<Accordion title="Unlimited Context" icon="infinity">
  No more 8k/32k/128k token limits - conversations can extend indefinitely with supermemory's advanced retrieval system.
</Accordion>

<Accordion title="Improved Response Quality" icon="sparkles">
  Better context retrieval means more coherent responses even in very long threads, reducing hallucinations and inconsistencies.
</Accordion>

<Accordion title="Zero Performance Penalty" icon="bolt">
  The proxy adds negligible latency to your requests, ensuring fast response times for your users.
</Accordion>

## Pricing

<Tabs>
  <Tab title="Plans">
    <div className="mt-4">
      <div className="grid grid-cols-1 md:grid-cols-3 gap-4">
        <div className="p-4 border rounded-lg">
          <h3 className="text-lg font-bold">Free Tier</h3>
          <p className="text-sm text-gray-600 dark:text-gray-300">100k tokens stored at no cost</p>
        </div>
        <div className="p-4 border rounded-lg">
          <h3 className="text-lg font-bold">Standard Plan</h3>
          <p className="text-sm text-gray-600 dark:text-gray-300">$20/month fixed cost after exceeding free tier</p>
        </div>
        <div className="p-4 border rounded-lg">
          <h3 className="text-lg font-bold">Usage-Based</h3>
          <p className="text-sm text-gray-600 dark:text-gray-300">Each thread includes 20k free tokens, then $1 per million tokens thereafter</p>
        </div>
      </div>
    </div>
  </Tab>
  <Tab title="Comparison">
    <div className="mt-4">
      <table className="min-w-full divide-y divide-gray-200">
        <thead>
          <tr>
            <th className="px-6 py-3 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">
              Feature
            </th>
            <th className="px-6 py-3 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">
              Free
            </th>
            <th className="px-6 py-3 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">
              Standard
            </th>
          </tr>
        </thead>
        <tbody className="divide-y divide-gray-200">
          <tr>
            <td className="px-6 py-4 whitespace-nowrap text-sm">
              Tokens Stored
            </td>
            <td className="px-6 py-4 whitespace-nowrap text-sm">
              100k
            </td>
            <td className="px-6 py-4 whitespace-nowrap text-sm">
              Unlimited
            </td>
          </tr>
          <tr>
            <td className="px-6 py-4 whitespace-nowrap text-sm">
              Conversations
            </td>
            <td className="px-6 py-4 whitespace-nowrap text-sm">
              10
            </td>
            <td className="px-6 py-4 whitespace-nowrap text-sm">
              Unlimited
            </td>
          </tr>
        </tbody>
      </table>
    </div>
  </Tab>
</Tabs>
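
On the usage-based tier, per-thread cost is simple arithmetic. A minimal sketch, for illustration only (your dashboard is the authoritative source for billing):

```typescript
// Usage-based tier: the first 20k tokens per thread are free,
// then $1 per million tokens thereafter.
function threadCostUSD(tokens: number): number {
  const FREE_TOKENS = 20_000;
  const billable = Math.max(0, tokens - FREE_TOKENS);
  return billable / 1_000_000; // $1 per million billable tokens
}
```

For example, a thread that accumulates 1,020,000 tokens costs $1.00, while any thread at or under 20k tokens is free.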

## Error Handling

<Note>
  supermemory is designed with reliability as the top priority. If any issues occur within the supermemory processing pipeline, the system will automatically fall back to direct forwarding of your request to the LLM provider, ensuring zero downtime for your applications.
</Note>

Each response includes diagnostic headers with information about how supermemory processed the request:

| Header                           | Description                                                            |
| -------------------------------- | ---------------------------------------------------------------------- |
| `x-supermemory-conversation-id`  | Unique identifier for the conversation thread                          |
| `x-supermemory-context-modified` | Indicates whether supermemory modified the context ("true" or "false") |
| `x-supermemory-tokens-processed` | Number of tokens processed in this request                             |
| `x-supermemory-chunks-created`   | Number of new chunks created from this conversation                    |
| `x-supermemory-chunks-deleted`   | Number of chunks removed (if any)                                      |
| `x-supermemory-docs-deleted`     | Number of documents removed (if any)                                   |

If an error occurs, an additional header `x-supermemory-error` will be included with details about what went wrong. Your request will still be processed by the underlying LLM provider even if supermemory encounters an error.
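
Because these are plain response headers, you can read them off any raw response. The function below is a hypothetical helper (not part of any SDK) that summarizes the headers from the table above, using the standard `Headers` interface available in `fetch` responses:

```typescript
// Hypothetical helper: summarize supermemory's diagnostic headers.
// Header names are taken from the table above.
interface SupermemoryDiagnostics {
  conversationId: string | null;
  contextModified: boolean;
  tokensProcessed: number;
  error: string | null;
}

function readDiagnostics(headers: Headers): SupermemoryDiagnostics {
  return {
    conversationId: headers.get("x-supermemory-conversation-id"),
    contextModified: headers.get("x-supermemory-context-modified") === "true",
    tokensProcessed: Number(headers.get("x-supermemory-tokens-processed") ?? 0),
    error: headers.get("x-supermemory-error"),
  };
}
```

With `fetch`, call `readDiagnostics(response.headers)`. Note that even when `error` is set, the body still contains the provider's response, since supermemory falls back to direct forwarding.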

## Rate Limiting

<Info>
  Currently, there are no rate limits specific to supermemory. Your requests are subject only to the rate limits of your underlying LLM provider.
</Info>

## Supported Models

supermemory works with any OpenAI-compatible API, including:

<CardGroup cols={3}>
  <Card title="OpenAI" icon="openai">
    GPT-3.5, GPT-4, GPT-4o
  </Card>
  <Card title="Anthropic" icon="user-astronaut">
    Claude 3 models
  </Card>
  <Card title="Other Providers" icon="plug">
    Any provider with an OpenAI-compatible endpoint
  </Card>
</CardGroup>