---
title: "Overview"
description: "Transform any LLM into an intelligent agent with unlimited context and persistent memory"
sidebarTitle: "Overview"
---

The Memory Router is a transparent proxy that sits between your application and your LLM provider, automatically managing context and memories with no code changes beyond pointing your client at a new base URL.

<Note>
**Live Demo**: Try the Memory Router at [supermemory.chat](https://supermemory.chat) to see it in action.
</Note>

<Tip>
**Using Vercel AI SDK?** Check out our [AI SDK integration](/integrations/ai-sdk) for the cleanest implementation with `@supermemory/tools/ai-sdk` - it's our recommended approach for new projects.
</Tip>

## What is the Memory Router?

The Memory Router gives your LLM applications:

- **Unlimited Context**: No more token limits - conversations can extend indefinitely
- **Automatic Memory Management**: Intelligently chunks, stores, and retrieves relevant context
- **Zero Code Changes**: Works with your existing OpenAI-compatible clients
- **Cost Optimization**: Save up to 70% on token costs through intelligent context management
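
The drop-in behavior above can be sketched with plain `fetch`: only the base URL and one extra header change versus a direct provider call. Note that the proxy URL scheme shown here (`api.supermemory.ai/v3` with the provider URL appended) and the header names `x-supermemory-api-key` and `x-sm-user-id` are illustrative assumptions, not confirmed values; check the Supermemory docs for the exact format.

```typescript
// Hypothetical proxy host; the real value is in the Supermemory docs.
const SUPERMEMORY_PROXY = "https://api.supermemory.ai/v3";

/** Build the proxied base URL for any OpenAI-compatible provider. */
function proxiedBaseUrl(providerBaseUrl: string): string {
  return `${SUPERMEMORY_PROXY}/${providerBaseUrl}`;
}

// Example: the same chat-completions call, routed through the proxy
// instead of going to api.openai.com directly.
async function chat(userId: string, content: string) {
  const res = await fetch(
    `${proxiedBaseUrl("https://api.openai.com/v1")}/chat/completions`,
    {
      method: "POST",
      headers: {
        "content-type": "application/json",
        authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
        // Hypothetical header names for the Supermemory key and user scope.
        "x-supermemory-api-key": process.env.SUPERMEMORY_API_KEY ?? "",
        "x-sm-user-id": userId,
      },
      body: JSON.stringify({
        model: "gpt-4o-mini",
        messages: [{ role: "user", content }],
      }),
    },
  );
  return res.json();
}
```

Everything else about the request body and response shape stays exactly as your OpenAI-compatible client already expects.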

## How It Works

<Steps>
  <Step title="Proxy Request">
    Your application sends requests to Supermemory instead of directly to your LLM provider
  </Step>

  <Step title="Context Management">
    Supermemory automatically:
    - Removes unnecessary context from long conversations
    - Searches relevant memories from previous interactions
    - Appends the most relevant context to your prompt
  </Step>

  <Step title="Forward to LLM">
    The optimized request is forwarded to your chosen LLM provider
  </Step>

  <Step title="Async Memory Creation">
    New memories are created asynchronously without blocking the response
  </Step>
</Steps>

## Key Benefits

### For Developers

- **Drop-in Integration**: Just change your base URL - no other code changes needed
- **Provider Agnostic**: Works with OpenAI, Anthropic, Google, Groq, and more
- **Shared Memory Pool**: Memories created via API are available to the Router and vice versa
- **Automatic Fallback**: If Supermemory has issues, requests pass through directly

### For Applications

- **Better Long Conversations**: Maintains context even after thousands of messages
- **Consistent Responses**: Memories ensure consistent information across sessions
- **Smart Retrieval**: Only relevant context is included, improving response quality
- **Cost Savings**: Automatic chunking reduces token usage significantly

## When to Use the Memory Router

The Memory Router is ideal for:

<Tabs>
  <Tab title="Perfect For">
    - **Chat Applications**: Customer support, AI assistants, chatbots
    - **Long Conversations**: Sessions that exceed model context windows
    - **Multi-Session Memory**: Users who return and continue conversations
    - **Quick Prototypes**: Get memory capabilities without building infrastructure
  </Tab>

  <Tab title="Consider API Instead">
    - **Custom Retrieval Logic**: Need specific control over what memories to fetch
    - **Non-Conversational Use**: Document processing, analysis tools
    - **Complex Filtering**: Need advanced metadata filtering
    - **Batch Operations**: Processing multiple documents at once
  </Tab>
</Tabs>

## Supported Providers

The Memory Router works with any OpenAI-compatible endpoint:

| Provider | Base URL | Status |
|----------|----------|---------|
| OpenAI | `api.openai.com/v1` | ✅ Fully Supported |
| Anthropic | `api.anthropic.com/v1` | ✅ Fully Supported |
| Google Gemini | `generativelanguage.googleapis.com/v1beta/openai` | ✅ Fully Supported |
| Groq | `api.groq.com/openai/v1` | ✅ Fully Supported |
| DeepInfra | `api.deepinfra.com/v1/openai` | ✅ Fully Supported |
| OpenRouter | `openrouter.ai/api/v1` | ✅ Fully Supported |
| Custom | Any OpenAI-compatible | ✅ Supported |

<Warning>
**Not Yet Supported**:
- OpenAI Assistants API (`/v1/assistants`)
</Warning>

## Authentication

The Memory Router requires two API keys:

1. **Supermemory API Key**: For memory management
2. **Provider API Key**: For your chosen LLM provider

You can provide these via:
- Headers (recommended for production)
- URL parameters (useful for testing)
- Request body (for compatibility)
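
A minimal sketch of the first two options. The names used here (`x-supermemory-api-key` header, `sm_api_key` query parameter) are hypothetical placeholders, not confirmed identifiers:

```typescript
type Keys = { supermemoryKey: string; providerKey: string };

/** Production: both keys travel in headers, never in the URL. */
function headerAuth({ supermemoryKey, providerKey }: Keys): Record<string, string> {
  return {
    authorization: `Bearer ${providerKey}`,
    "x-supermemory-api-key": supermemoryKey,
  };
}

/** Testing only: the Supermemory key as a URL parameter (may end up in logs). */
function urlAuth(baseUrl: string, { supermemoryKey }: Keys): string {
  const url = new URL(baseUrl);
  url.searchParams.set("sm_api_key", supermemoryKey);
  return url.toString();
}
```

Header-based auth is preferred in production because URLs are routinely captured in access logs and browser history.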

## How Memories Work

When using the Memory Router:

1. **Automatic Extraction**: Important information from conversations is automatically extracted
2. **Intelligent Chunking**: Long messages are split into semantic chunks
3. **Relationship Building**: New memories connect to existing knowledge
4. **Smart Retrieval**: Only the most relevant memories are included in context

<Note>
Memories are shared between the Memory Router and Memory API when using the same `user_id`, allowing you to use both together.
</Note>

## Response Headers

The Memory Router adds diagnostic headers to help you understand what's happening:

| Header | Description |
|--------|-------------|
| `x-supermemory-conversation-id` | Unique conversation identifier |
| `x-supermemory-context-modified` | Whether context was modified (`true`/`false`) |
| `x-supermemory-tokens-processed` | Number of tokens processed |
| `x-supermemory-chunks-created` | New memory chunks created |
| `x-supermemory-chunks-retrieved` | Memory chunks added to context |
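
These headers can be read straight off any `fetch` response. A minimal sketch (header names as documented above; the field names on the returned object are our own):

```typescript
interface RouterDiagnostics {
  conversationId: string | null;
  contextModified: boolean;
  tokensProcessed: number;
  chunksCreated: number;
  chunksRetrieved: number;
}

/** Collect the Memory Router's diagnostic headers into one object. */
function readDiagnostics(headers: Headers): RouterDiagnostics {
  return {
    conversationId: headers.get("x-supermemory-conversation-id"),
    contextModified: headers.get("x-supermemory-context-modified") === "true",
    tokensProcessed: Number(headers.get("x-supermemory-tokens-processed") ?? 0),
    chunksCreated: Number(headers.get("x-supermemory-chunks-created") ?? 0),
    chunksRetrieved: Number(headers.get("x-supermemory-chunks-retrieved") ?? 0),
  };
}
```

After a proxied call, `readDiagnostics(res.headers)` tells you at a glance whether memories were injected or created on that request.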

## Error Handling

The Memory Router is designed for reliability:

- **Automatic Fallback**: If Supermemory encounters an error, your request passes through unmodified
- **Error Headers**: `x-supermemory-error` header provides error details
- **Zero Downtime**: Your application continues working even if memory features are unavailable
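
Because a failed memory step degrades to a plain pass-through rather than failing the request, the `x-supermemory-error` header is the only signal that memory features were skipped. A small helper to surface it, as a sketch:

```typescript
/**
 * Returns true if memory features worked on this response; logs a
 * warning and returns false if the router fell back to pass-through.
 */
function checkMemoryHealth(res: { headers: Headers }): boolean {
  const err = res.headers.get("x-supermemory-error");
  if (err !== null) {
    console.warn(`Supermemory fell back to pass-through: ${err}`);
    return false;
  }
  return true;
}
```

You might feed this into your metrics pipeline to track how often memory processing is being skipped, without ever failing a user-facing request.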

## Rate Limits & Pricing

### Rate Limits
- No Supermemory-specific rate limits
- Subject only to your LLM provider's limits

### Pricing
- **Free Tier**: 100k tokens stored at no cost
- **Standard Plan**: $20/month after free tier
- **Usage-Based**: Each conversation includes 20k free tokens, then $1 per million tokens