---
title: "supermemory Infinite Chat"
description: "Build chat applications with unlimited context using supermemory's intelligent proxy"
tag: "BETA"
---
import GettingAPIKey from '/snippets/getting-api-key.mdx';
supermemory Infinite Chat is a powerful solution that gives your chat applications unlimited contextual memory. It works as a transparent proxy in front of your existing LLM provider, intelligently managing long conversations without requiring any changes to your application logic.
- No more token limits: conversations can extend indefinitely
- Transparent proxying with negligible overhead
- Save up to 70% on token costs for long conversations
- Works with any OpenAI-compatible endpoint
## Getting Started
To use the Infinite Chat endpoint, you need to:
### 1. Get a supermemory API key

<GettingAPIKey />
### 2. Add supermemory in front of any **OpenAI-Compatible** API URL
```typescript TypeScript
import OpenAI from "openai";

// Point the OpenAI client at the supermemory proxy and pass your
// supermemory API key and a per-user ID alongside your regular OpenAI key.
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://api.supermemory.ai/v3/https://api.openai.com/v1",
  defaultHeaders: {
    "x-supermemory-api-key": process.env.SUPERMEMORY_API_KEY,
    "x-sm-user-id": "Your_users_id",
  },
});
```
```python Python
import os
from openai import OpenAI

# Configure the OpenAI client with the supermemory proxy
client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),  # Your regular OpenAI key
    base_url="https://api.supermemory.ai/v3/https://api.openai.com/v1",
    default_headers={
        "x-supermemory-api-key": os.environ.get("SUPERMEMORY_API_KEY"),  # Your supermemory key
        "x-sm-user-id": "Your_users_id",
    },
)

# Create a chat completion with unlimited context
response = client.chat.completions.create(
    model="gpt-5-nano",
    messages=[{"role": "user", "content": "Your message here"}],
)
```
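Once the client is configured, requests need no further changes. Here is a minimal TypeScript sketch of a chat completion made through the proxy, using the client defined above (the model name is only an example):

```typescript
// A minimal sketch: the request shape is identical to a direct OpenAI call;
// only the base URL and headers configured on the client above differ.
const completion = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Your message here" }],
});

console.log(completion.choices[0].message.content);
```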
## How It Works
1. Every request passes through supermemory to your chosen LLM provider with negligible latency overhead.
2. Long conversations are automatically broken into optimized segments by a proprietary chunking algorithm that preserves semantic coherence.
3. Once a conversation grows beyond roughly 20k tokens, supermemory retrieves the most relevant context from previous messages instead of sending the full history.
4. The system balances token usage to keep performance optimal while minimizing costs.
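In practice this means your application can keep appending turns to a single message history without trimming it. A minimal TypeScript sketch of that pattern (the `ask` helper and model name are illustrative, not part of the supermemory API):

```typescript
import OpenAI from "openai";

// A minimal sketch: append every turn to one history and send it as-is;
// the proxy manages context once the thread grows past the ~20k-token mark.
const history: OpenAI.ChatCompletionMessageParam[] = [];

async function ask(client: OpenAI, userMessage: string): Promise<string> {
  history.push({ role: "user", content: userMessage });
  const completion = await client.chat.completions.create({
    model: "gpt-4o",
    messages: history, // never trimmed client-side
  });
  const reply = completion.choices[0].message.content ?? "";
  history.push({ role: "assistant", content: reply });
  return reply;
}
```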
## Performance Benefits
- Save up to 70% on token costs for long conversations through intelligent context management and caching.
- No more 8k/32k/128k token limits: conversations can extend indefinitely with supermemory's advanced retrieval system.
- Better context retrieval means more coherent responses even in very long threads, reducing hallucinations and inconsistencies.
- The proxy adds negligible latency to your requests, ensuring fast response times for your users.
## Pricing
- **Free Tier**: 100k tokens stored at no cost
- **Standard Plan**: $20/month fixed cost after exceeding the free tier
- **Usage-Based**: Each thread includes 20k free tokens, then $1 per million tokens thereafter

| Feature       | Free | Standard  |
| ------------- | ---- | --------- |
| Tokens stored | 100k | Unlimited |
| Conversations | 10   | Unlimited |
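For example, assuming billing is applied per thread as described above, a single thread that accumulates 1,020,000 tokens would use its 20k free allowance and then be billed $1 for the remaining 1,000,000 tokens.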
## Error Handling
supermemory is designed with reliability as the top priority. If any issues occur within the supermemory processing pipeline, the system will automatically fall back to direct forwarding of your request to the LLM provider, ensuring zero downtime for your applications.
Each response includes diagnostic headers that describe how supermemory processed the request:
| Header | Description |
| -------------------------------- | ---------------------------------------------------------------------- |
| `x-supermemory-conversation-id` | Unique identifier for the conversation thread |
| `x-supermemory-context-modified` | Indicates whether supermemory modified the context ("true" or "false") |
| `x-supermemory-tokens-processed` | Number of tokens processed in this request |
| `x-supermemory-chunks-created` | Number of new chunks created from this conversation |
| `x-supermemory-chunks-deleted` | Number of chunks removed (if any) |
| `x-supermemory-docs-deleted` | Number of documents removed (if any) |
If an error occurs, an additional header `x-supermemory-error` will be included with details about what went wrong. Your request will still be processed by the underlying LLM provider even if supermemory encounters an error.
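If you need these values programmatically, you can read them from the raw HTTP response. A minimal sketch using the openai Node SDK's `.withResponse()` helper and the client configured earlier (the model name is only an example):

```typescript
// A minimal sketch: read supermemory's diagnostic headers from the
// raw HTTP response alongside the parsed completion.
const { data: completion, response } = await client.chat.completions
  .create({
    model: "gpt-4o",
    messages: [{ role: "user", content: "Your message here" }],
  })
  .withResponse();

console.log(response.headers.get("x-supermemory-conversation-id"));
console.log(response.headers.get("x-supermemory-context-modified"));
console.log(completion.choices[0].message.content);
```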
## Rate Limiting
Currently, there are no rate limits specific to supermemory. Your requests are subject only to the rate limits of your underlying LLM provider.
## Supported Models
supermemory works with any OpenAI-compatible API, including:
- GPT-3.5, GPT-4, and GPT-4o
- Claude 3 models
- Any provider with an OpenAI-compatible endpoint
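To use another provider, prefix its OpenAI-compatible base URL with the supermemory proxy just as in the OpenAI example above. A minimal sketch, where `https://api.example-provider.com/v1` is a hypothetical placeholder for your provider's endpoint:

```typescript
import OpenAI from "openai";

// A minimal sketch: the supermemory proxy URL is prepended to any
// OpenAI-compatible base URL; the provider URL below is a placeholder.
const client = new OpenAI({
  apiKey: process.env.PROVIDER_API_KEY,
  baseURL: "https://api.supermemory.ai/v3/https://api.example-provider.com/v1",
  defaultHeaders: {
    "x-supermemory-api-key": process.env.SUPERMEMORY_API_KEY,
  },
});
```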