path: root/apps/docs/memory-api/features
author    Dhravya Shah <[email protected]>  2025-09-28 16:42:06 -0700
committer Dhravya Shah <[email protected]>  2025-09-28 16:42:06 -0700
commit    2093b316d9ecb9cfa9c550f436caee08e12f5d11 (patch)
tree      07b87fbd48b0b38ef26b9d5f839ad8cd61d82331 /apps/docs/memory-api/features
parent    Merge branch 'main' of https://github.com/supermemoryai/supermemory (diff)
migrate docs to public
Diffstat (limited to 'apps/docs/memory-api/features')
-rw-r--r--apps/docs/memory-api/features/auto-multi-modal.mdx181
-rw-r--r--apps/docs/memory-api/features/content-cleaner.mdx86
-rw-r--r--apps/docs/memory-api/features/filtering.mdx297
-rw-r--r--apps/docs/memory-api/features/query-rewriting.mdx50
-rw-r--r--apps/docs/memory-api/features/reranking.mdx44
5 files changed, 658 insertions, 0 deletions
diff --git a/apps/docs/memory-api/features/auto-multi-modal.mdx b/apps/docs/memory-api/features/auto-multi-modal.mdx
new file mode 100644
index 00000000..df20c318
--- /dev/null
+++ b/apps/docs/memory-api/features/auto-multi-modal.mdx
@@ -0,0 +1,181 @@
+---
+title: "Auto Multi Modal"
+description: "supermemory automatically detects the content type of the document you are adding."
+icon: "sparkles"
+---
+
+supermemory is natively multi-modal, and can automatically detect the content type of the document you are adding.
+
+We use best-of-breed tools to extract content from URLs and process it for optimal memory storage.
+
+## Automatic Content Type Detection
+
+supermemory automatically detects the content type of the document you're adding. Simply pass your content to the API, and supermemory will handle the rest.
+
+<Tabs>
+ <Tab title="How It Works">
+ The content detection system analyzes:
+ - URL patterns and domains
+ - File extensions and MIME types
+ - Content structure and metadata
+ - Headers and response types
+ </Tab>
+ <Tab title="Best Practices">
+ <Accordion title="Content Type Best Practices" defaultOpen icon="sparkles">
+ 1. **Type Selection**
+ - Use `note` for simple text
+ - Use `webpage` for online content
+ - Use native types when possible
+
+ 2. **URL Content**
+ - Send clean URLs without tracking parameters
+ - Use article URLs, not homepage URLs
+ - Check URL accessibility before sending
+ </Accordion>
+
+ </Tab>
+</Tabs>
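
As an illustrative sketch (not supermemory's actual implementation), a detector combining these signals — URL patterns, domains, and file extensions — might look like:

```python
from urllib.parse import urlparse

# Hypothetical extension-to-type mapping for illustration only.
EXTENSION_TYPES = {".pdf": "pdf", ".png": "image", ".jpg": "image", ".mp4": "video"}

def detect_content_type(content: str) -> str:
    """Guess a content type from the input shape (simplified sketch)."""
    if content.startswith(("http://", "https://")):
        url = urlparse(content)
        if url.hostname in ("twitter.com", "x.com"):
            return "tweet"
        if url.hostname == "docs.google.com":
            return "google_doc"
        for ext, ctype in EXTENSION_TYPES.items():
            if url.path.lower().endswith(ext):
                return ctype
        return "webpage"
    return "note"
```

The real detector also inspects MIME types, response headers, and content structure, which this sketch omits.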
+
+### Quick Implementation
+
+All you need to do is pass the content to the `/documents` endpoint:
+
+<CodeGroup>
+
+```bash cURL
+curl https://api.supermemory.ai/v3/documents \
+ --request POST \
+ --header 'Authorization: Bearer SUPERMEMORY_API_KEY' \
+ -d '{"content": "https://example.com/article"}'
+```
+
+```typescript
+await client.add.create({
+ content: "https://example.com/article",
+});
+```
+
+```python
+client.add.create(
+ content="https://example.com/article"
+)
+```
+
+</CodeGroup>
+
+<Note>
+ supermemory uses [Markdowner](https://md.dhr.wtf) to extract content from
+ URLs.
+</Note>
+
+## Supported Content Types
+
+supermemory supports a wide range of content formats to ensure versatility in memory creation:
+
+<Grid cols={2}>
+ <Card title="Text Content" icon="document-text">
+ - `note`: Plain text notes and documents
+ - Directly processes raw text content
+ - Automatically chunks content for optimal retrieval
+ - Preserves formatting and structure
+ </Card>
+
+ <Card title="Web Content" icon="globe">
+ - `webpage`: Web pages (just provide the URL)
+ - Intelligently extracts main content
+ - Preserves important metadata (title, description, images)
+ - Extracts OpenGraph metadata when available
+
+ - `tweet`: Twitter content
+ - Captures tweet text, media, and metadata
+ - Preserves thread structure if applicable
+
+ </Card>
+
+ <Card title="Document Types" icon="document">
+ - `pdf`: PDF files
+ - Extracts text content while maintaining structure
+ - Handles both searchable PDFs and scanned documents with OCR
+ - Preserves page breaks and formatting
+
+ - `google_doc`: Google Documents
+ - Seamlessly integrates with Google Docs API
+ - Maintains document formatting and structure
+ - Auto-updates when source document changes
+
+ - `notion_doc`: Notion pages
+ - Extracts content while preserving Notion's block structure
+ - Handles rich text formatting and embedded content
+
+ </Card>
+
+ <Card title="Media Types" icon="photo">
+ - `image`: Images with text content
+ - Advanced OCR for text extraction
+ - Visual content analysis and description
+
+ - `video`: Video content
+ - Transcription and content extraction
+ - Key frame analysis
+
+ </Card>
+</Grid>
+
+## Processing Pipeline
+
+<Steps>
+ <Step title="Content Detection">
+ supermemory automatically identifies the content type based on the input provided.
+ </Step>
+
+<Step title="Content Extraction">
+  Type-specific extractors process the content with:
+  - Specialized parsing for each format
+  - Error handling with retries
+  - Rate limit management
+</Step>
+
+ <Step title="AI Enhancement">
+ ```typescript
+ interface ProcessedContent {
+ content: string; // Extracted text
+ summary?: string; // AI-generated summary
+ tags?: string[]; // Extracted tags
+ categories?: string[]; // Content categories
+ }
+ ```
+ </Step>
+
+ <Step title="Chunking & Indexing">
+ - Sentence-level splitting
+ - 2-sentence overlap
+ - Context preservation
+ - Semantic coherence
+ </Step>
+</Steps>
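
The chunking step can be sketched as a sentence-level splitter with a 2-sentence overlap. This minimal Python illustration is not supermemory's actual implementation:

```python
import re

def chunk_sentences(text: str, chunk_size: int = 5, overlap: int = 2) -> list[str]:
    """Split text into sentence-level chunks that overlap by two sentences."""
    # Naive sentence splitter; production systems use more robust tokenizers.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + chunk_size]))
        if start + chunk_size >= len(sentences):
            break
    return chunks
```

The overlap means each chunk carries the tail of its predecessor, which preserves context across chunk boundaries during retrieval.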
+
+## Technical Specifications
+
+### Size Limits
+
+| Content Type | Max Size |
+| ------------ | -------- |
+| Text/Note | 1MB |
+| PDF | 10MB |
+| Image | 5MB |
+| Video | 100MB |
+| Web Page | N/A |
+| Google Doc | N/A |
+| Notion Page | N/A |
+| Tweet | N/A |
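
If you want to fail fast before uploading, a client-side pre-check against these limits might look like this sketch (the API enforces the limits server-side):

```python
# Size limits from the table above, in bytes. URL-based types have no limit.
SIZE_LIMITS = {
    "note": 1 * 1024 * 1024,     # 1MB
    "pdf": 10 * 1024 * 1024,     # 10MB
    "image": 5 * 1024 * 1024,    # 5MB
    "video": 100 * 1024 * 1024,  # 100MB
}

def check_size(content_type: str, size_bytes: int) -> bool:
    """Return True if the payload fits the documented limit for its type."""
    limit = SIZE_LIMITS.get(content_type)
    return limit is None or size_bytes <= limit
```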
+
+### Processing Time
+
+| Content Type | Processing Time |
+| ------------ | --------------- |
+| Text/Note | Almost instant |
+| PDF | 1-5 seconds |
+| Image | 2-10 seconds |
+| Video | 10+ seconds |
+| Web Page | 1-3 seconds |
+| Google Doc | N/A |
+| Notion Page | N/A |
+| Tweet | N/A |
diff --git a/apps/docs/memory-api/features/content-cleaner.mdx b/apps/docs/memory-api/features/content-cleaner.mdx
new file mode 100644
index 00000000..e586c3dc
--- /dev/null
+++ b/apps/docs/memory-api/features/content-cleaner.mdx
@@ -0,0 +1,86 @@
+---
+title: "Cleaning and Categorizing"
+description: "Document Cleaning Summaries in supermemory"
+icon: "washing-machine"
+---
+
+supermemory provides advanced configuration options to customize your content processing pipeline. At its core is an AI-powered system that can automatically analyze, categorize, and filter your content based on your specific needs.
+
+## Configuration Schema
+
+```json
+{
+ "shouldLLMFilter": true,
+ "categories": ["feature-request", "bug-report", "positive", "negative"],
+ "filterPrompt": "Analyze feedback sentiment and identify feature requests",
+ "includeItems": ["critical", "high-priority"],
+ "excludeItems": ["spam", "irrelevant"]
+}
+```
+
+## Core Settings
+
+### shouldLLMFilter
+- **Type**: `boolean`
+- **Required**: No (defaults to `false`)
+- **Description**: Master switch for AI-powered content analysis. Must be enabled to use any of the advanced filtering features.
+
+### categories
+- **Type**: `string[]`
+- **Limits**: Each category must be 1-50 characters
+- **Required**: No
+- **Description**: Define custom categories for content classification. When specified, the AI will only use these categories. If not specified, it will generate 3-5 relevant categories automatically.
+
+### filterPrompt
+- **Type**: `string`
+- **Limits**: 1-750 characters
+- **Required**: No
+- **Description**: Custom instructions for the AI on how to analyze and categorize content. Use this to guide the categorization process based on your specific needs.
+
+### includeItems & excludeItems
+- **Type**: `string[]`
+- **Limits**: Each item must be 1-20 characters
+- **Required**: No
+- **Description**: Fine-tune content filtering by specifying items to explicitly include or exclude during processing.
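
A client-side sketch that validates a config against the limits above before sending it (illustrative; the API performs its own validation):

```python
def validate_filter_config(config: dict) -> list[str]:
    """Check a content-processing config against the documented limits."""
    errors = []
    filter_keys = {"categories", "filterPrompt", "includeItems", "excludeItems"}
    # All filtering features require shouldLLMFilter to be enabled.
    if filter_keys & config.keys() and not config.get("shouldLLMFilter"):
        errors.append("filtering options require shouldLLMFilter to be true")
    for category in config.get("categories", []):
        if not 1 <= len(category) <= 50:
            errors.append(f"category {category!r} must be 1-50 characters")
    prompt = config.get("filterPrompt")
    if prompt is not None and not 1 <= len(prompt) <= 750:
        errors.append("filterPrompt must be 1-750 characters")
    for field in ("includeItems", "excludeItems"):
        for item in config.get(field, []):
            if not 1 <= len(item) <= 20:
                errors.append(f"{field} entry {item!r} must be 1-20 characters")
    return errors
```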
+
+## Content Processing Pipeline
+
+When content is ingested with LLM filtering enabled:
+
+1. **Initial Processing**
+ - Content is extracted and normalized
+ - Basic metadata (title, description) is captured
+
+2. **AI Analysis**
+ - Content is analyzed based on your `filterPrompt`
+ - Categories are assigned (either from your predefined list or auto-generated)
+ - Tags are evaluated and scored
+
+3. **Chunking & Indexing**
+ - Content is split into semantic chunks
+ - Each chunk is embedded for efficient search
+ - Metadata and classifications are stored
+
+## Example Use Cases
+
+### 1. Customer Feedback System
+```json
+{
+ "shouldLLMFilter": true,
+ "categories": ["positive", "negative", "neutral"],
+ "filterPrompt": "Analyze customer sentiment and identify key themes",
+}
+```
+
+### 2. Content Moderation
+```json
+{
+ "shouldLLMFilter": true,
+ "categories": ["safe", "needs-review", "flagged"],
+ "filterPrompt": "Identify potentially inappropriate or sensitive content",
+ "excludeItems": ["spam", "offensive"],
+ "includeItems": ["user-generated"]
+}
+```
+
+> **Important**: All filtering features (`categories`, `filterPrompt`, `includeItems`, `excludeItems`) require `shouldLLMFilter` to be enabled. Attempting to use these features without enabling `shouldLLMFilter` will result in a 400 error.
diff --git a/apps/docs/memory-api/features/filtering.mdx b/apps/docs/memory-api/features/filtering.mdx
new file mode 100644
index 00000000..3873e606
--- /dev/null
+++ b/apps/docs/memory-api/features/filtering.mdx
@@ -0,0 +1,297 @@
+---
+title: "Filtering"
+description: "Learn how to filter content while searching from supermemory"
+icon: "list-filter-plus"
+---
+
+## Container Tag
+
+A container tag is an identifier for your end users, used to group memories together.
+
+This can be:
+- A user of your product
+- An organization using a SaaS
+- A project ID, or even a dynamic value like `user_project_etc`
+
+We recommend using a single `containerTag` in all API requests.
+
+The graph is built on top of container tags: each user or tag in your supermemory account gets its own graph.
+
+<CodeGroup>
+
+```bash cURL
+curl https://api.supermemory.ai/v3/search \
+ --request POST \
+ --header 'Content-Type: application/json' \
+ --header 'Authorization: Bearer SUPERMEMORY_API_KEY' \
+ --data '{
+ "q": "machine learning",
+ "containerTags": ["user_123"]
+ }'
+```
+
+```typescript Typescript
+await client.search.execute({
+ q: "machine learning",
+ containerTags: ["user_123"],
+});
+```
+
+```python Python
+client.search.execute(
+ q="machine learning",
+ containerTags=["user_123"]
+)
+```
+
+</CodeGroup>
+
+## Metadata
+
+Sometimes, you might want to add metadata and do advanced filtering based on it.
+
+Using metadata filtering, you can search based on:
+
+- AND and OR conditions
+- String matching
+- Numeric matching
+- Date matching
+- Time range queries
+
+### Validation Rules & Limits
+
+To ensure optimal performance and security, the filtering system has the following limits:
+
+- **Metadata keys**: Must contain only alphanumeric characters, underscores, and hyphens (`/^[a-zA-Z0-9_-]+$/`)
+- **Metadata key length**: Maximum of 64 characters
+- **Maximum conditions**: Up to 200 conditions per query
+- **Maximum nesting depth**: Up to 8 levels of nested AND/OR expressions
+- **Valid operators**: `=`, `!=`, `<`, `<=`, `>`, `>=` for numeric filtering
+
+<Warning>
+These limits help prevent overly complex queries that could impact performance. If you need to filter on more conditions, consider breaking your query into multiple requests or using broader search terms with post-processing.
+</Warning>
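
To catch violations before sending a request, you could validate a filter tree client-side. This is an illustrative sketch, not the server's validator:

```python
import re

KEY_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")
MAX_KEY_LENGTH = 64
MAX_CONDITIONS = 200
MAX_DEPTH = 8

def validate_filters(node: dict, depth: int = 1) -> int:
    """Recursively validate a filter tree; returns the number of leaf conditions."""
    if depth > MAX_DEPTH:
        raise ValueError(f"nesting exceeds {MAX_DEPTH} levels")
    if "AND" in node or "OR" in node:
        children = node.get("AND", []) + node.get("OR", [])
        count = sum(validate_filters(child, depth + 1) for child in children)
    else:
        key = node["key"]
        if not KEY_PATTERN.match(key) or len(key) > MAX_KEY_LENGTH:
            raise ValueError(f"invalid metadata key: {key!r}")
        count = 1
    if count > MAX_CONDITIONS:
        raise ValueError(f"more than {MAX_CONDITIONS} conditions")
    return count
```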
+
+<CodeGroup>
+
+```bash cURL
+curl https://api.supermemory.ai/v3/search \
+ --request POST \
+ --header 'Content-Type: application/json' \
+ --header 'Authorization: Bearer SUPERMEMORY_API_KEY' \
+ --data '{
+ "q": "machine learning",
+ "filters": {
+ "AND": [
+ {
+ "key": "category",
+ "value": "technology",
+ "negate": false
+ },
+ {
+ "filterType": "numeric",
+ "key": "readingTime",
+ "value": "5",
+ "negate": false,
+ "numericOperator": "<="
+ }
+ ]
+ }
+}'
+```
+
+```typescript Typescript
+await client.search.execute({
+ q: "machine learning",
+ filters: {
+ AND: [
+ {
+ key: "category",
+ value: "technology",
+ negate: false,
+ },
+ {
+ filterType: "numeric",
+ key: "readingTime",
+ value: "5",
+ negate: false,
+ numericOperator: "<=",
+ },
+ ],
+ },
+});
+```
+
+```python Python
+client.search.execute(
+ q="machine learning",
+ filters={
+ "AND": [
+ {
+ "key": "category",
+ "value": "technology",
+ "negate": false
+ },
+ {
+ "filterType": "numeric",
+ "key": "readingTime",
+ "value": "5",
+ "negate": false,
+ "numericOperator": "<="
+ }
+ ]
+ }
+)
+```
+
+</CodeGroup>
+
+## Array Contains Filtering
+
+You can filter memories by array values using the `array_contains` filter type. This is particularly useful for filtering by participants or other array-based metadata.
+
+First, create a memory with participants in the metadata:
+
+<CodeGroup>
+
+```bash cURL
+curl --location 'https://api.supermemory.ai/v3/documents' \
+--header 'Content-Type: application/json' \
+--header 'Authorization: Bearer SUPERMEMORY_API_KEY' \
+--data '{
+ "content": "quarterly planning meeting discussion",
+ "metadata": {
+ "participants": ["john.doe", "sarah.smith", "mike.wilson"]
+ }
+ }'
+```
+
+```typescript Typescript
+await client.memories.create({
+ content: "quarterly planning meeting discussion",
+ metadata: {
+ participants: ["john.doe", "sarah.smith", "mike.wilson"]
+ }
+});
+```
+
+```python Python
+client.memories.create(
+ content="quarterly planning meeting discussion",
+ metadata={
+ "participants": ["john.doe", "sarah.smith", "mike.wilson"]
+ }
+)
+```
+
+</CodeGroup>
+
+Then search using the `array_contains` filter:
+
+<CodeGroup>
+
+```bash cURL
+curl --location 'https://api.supermemory.ai/v3/search' \
+--header 'Content-Type: application/json' \
+--header 'Authorization: Bearer SUPERMEMORY_API_KEY' \
+--data '{
+ "q": "meeting",
+ "filters": {
+ "AND": [
+ {
+ "key": "participants",
+ "value": "john.doe",
+ "filterType": "array_contains"
+ }
+ ]
+ },
+ "limit": 5
+ }'
+```
+
+```typescript Typescript
+await client.search.execute({
+ q: "meeting",
+ filters: {
+ AND: [
+ {
+ key: "participants",
+ value: "john.doe",
+ filterType: "array_contains"
+ }
+ ]
+ },
+ limit: 5
+});
+```
+
+```python Python
+client.search.execute(
+ q="meeting",
+ filters={
+ "AND": [
+ {
+ "key": "participants",
+ "value": "john.doe",
+ "filterType": "array_contains"
+ }
+ ]
+ },
+ limit=5
+)
+```
+
+</CodeGroup>
+
+## Migration Notes
+
+<Note>
+**Breaking Changes**: Recent updates to the filtering system have introduced stricter validation rules. If you're experiencing filter validation errors, please check the following:
+
+1. **Metadata Key Format**: Ensure all metadata keys only contain alphanumeric characters, underscores, and hyphens. Keys with spaces, dots, or other special characters will now fail validation.
+
+2. **Key Length**: Metadata keys must be 64 characters or fewer.
+
+3. **Filter Complexity**: Queries with more than 200 conditions or more than 8 levels of nesting will be rejected.
+
+**Example of invalid keys that need updating**:
+- `"user.email"` → `"user_email"`
+- `"reading time"` → `"reading_time"`
+- `"category-with-very-long-name-that-exceeds-the-limit"` → `"category_name"`
+</Note>
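
A small helper for migrating legacy keys (illustrative; adjust the replacement character and collision handling to your data):

```python
import re

def sanitize_metadata_key(key: str, max_length: int = 64) -> str:
    """Rewrite a legacy metadata key to satisfy the new validation rules."""
    # Replace any character outside [a-zA-Z0-9_-] with an underscore.
    cleaned = re.sub(r"[^a-zA-Z0-9_-]", "_", key)
    # Truncate to the documented 64-character maximum.
    return cleaned[:max_length]
```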
+
+## Document
+
+You can also search for chunks within a specific document.
+
+This is particularly useful for extremely large documents such as books or podcast transcripts.
+
+<CodeGroup>
+
+```bash cURL
+curl https://api.supermemory.ai/v3/search \
+ --request POST \
+ --header 'Content-Type: application/json' \
+ --header 'Authorization: Bearer SUPERMEMORY_API_KEY' \
+ --data '{
+ "q": "machine learning",
+ "docId": "doc_123"
+ }'
+```
+
+```typescript Typescript
+await client.search.execute({
+ q: "machine learning",
+ docId: "doc_123",
+});
+```
+
+```python Python
+client.search.execute(
+ q="machine learning",
+ docId="doc_123"
+)
+```
+
+</CodeGroup>
diff --git a/apps/docs/memory-api/features/query-rewriting.mdx b/apps/docs/memory-api/features/query-rewriting.mdx
new file mode 100644
index 00000000..9508297a
--- /dev/null
+++ b/apps/docs/memory-api/features/query-rewriting.mdx
@@ -0,0 +1,50 @@
+---
+title: "Query Rewriting"
+description: "Query Rewriting in supermemory"
+icon: "blend"
+---
+
+Query rewriting automatically rephrases your search query into multiple variants to improve retrieval accuracy.
+
+![Query Rewriting](/images/query-rewriting.png)
+
+### Usage
+
+In supermemory, you can enable query rewriting by setting the `rewriteQuery` parameter to `true` in the search API.
+
+<CodeGroup>
+
+```bash cURL
+curl https://api.supermemory.ai/v3/search \
+ --request POST \
+ --header 'Authorization: Bearer SUPERMEMORY_API_KEY' \
+ --header 'Content-Type: application/json' \
+ -d '{
+ "q": "What is the capital of France?",
+ "rewriteQuery": true
+ }'
+```
+
+```typescript
+await client.search.create({
+ q: "What is the capital of France?",
+ rewriteQuery: true,
+});
+```
+
+```python
+client.search.create(
+ q="What is the capital of France?",
+ rewriteQuery=True
+)
+```
+
+</CodeGroup>
+
+### Notes and limitations
+
+- supermemory generates multiple rewrites and runs the search across all of them.
+- The results are then merged and returned to you.
+- There is no additional cost associated with query rewriting.
+- While query rewriting significantly improves result quality, it also **incurs additional latency**.
+- All other features, such as filtering, hybrid search, and recency bias, work with rewritten queries as well.
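
Conceptually, the fan-out-and-merge step works like this sketch (the `documentId` and `score` field names are illustrative, not the actual response shape):

```python
def merge_rewrite_results(result_sets: list[list[dict]]) -> list[dict]:
    """Merge result lists from several query rewrites, deduplicating by
    document ID and keeping each document's best score."""
    best: dict[str, dict] = {}
    for results in result_sets:
        for result in results:
            doc_id = result["documentId"]
            if doc_id not in best or result["score"] > best[doc_id]["score"]:
                best[doc_id] = result
    # Return a single ranked list across all rewrites.
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)
```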
diff --git a/apps/docs/memory-api/features/reranking.mdx b/apps/docs/memory-api/features/reranking.mdx
new file mode 100644
index 00000000..1df8a9c5
--- /dev/null
+++ b/apps/docs/memory-api/features/reranking.mdx
@@ -0,0 +1,44 @@
+---
+title: "Reranking"
+description: "Reranked search results in supermemory"
+icon: "chart-bar-increasing"
+---
+
+Reranking re-scores search results against the query using a dedicated reranking model, so the most relevant matches appear first.
+
+![Reranking](/images/rerank.png)
+
+### Usage
+
+In supermemory, you can enable reranking by setting the `rerank` parameter to `true` in the search API.
+
+<CodeGroup>
+
+```bash cURL
+curl https://api.supermemory.ai/v3/search \
+  --request POST \
+  --header 'Authorization: Bearer SUPERMEMORY_API_KEY' \
+  --header 'Content-Type: application/json' \
+  -d '{
+    "q": "What is the capital of France?",
+    "rerank": true
+  }'
+```
+
+```typescript
+await client.search.create({
+ q: "What is the capital of France?",
+ rerank: true,
+});
+```
+
+```python
+client.search.create(
+ q="What is the capital of France?",
+ rerank=True
+)
+```
+
+</CodeGroup>
+
+### Notes and limitations
+
+- We currently use the `bge-reranker-base` model for reranking.
+- There is no additional cost associated with reranking.
+- While reranking significantly improves result quality, it also **incurs additional latency**.
+- All other features, such as filtering, hybrid search, and recency bias, work with reranked results as well.
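
Conceptually, reranking re-scores each candidate against the query and sorts by that score. This toy sketch substitutes simple token overlap for the actual `bge-reranker-base` cross-encoder:

```python
def rerank(query: str, results: list[dict], score_fn) -> list[dict]:
    """Re-order retrieved results by a reranker's relevance score."""
    scored = [{**r, "rerankScore": score_fn(query, r["content"])} for r in results]
    return sorted(scored, key=lambda r: r["rerankScore"], reverse=True)

def overlap_score(query: str, passage: str) -> float:
    """Toy scorer: fraction of query tokens present in the passage."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / max(len(q), 1)
```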