| author | Dhravya Shah <[email protected]> | 2025-09-10 19:13:33 -0700 |
|---|---|---|
| committer | Dhravya Shah <[email protected]> | 2025-09-10 19:13:33 -0700 |
| commit | daa5d039f3e6a61c6487788aee5c5c3c158fe81f | |
| tree | c35fb478582338d7830041fe7071ba9385613aea /apps/docs/memory-api/features | |
| parent | feat: pro subscriber email config (#417) | |
make docs public
Diffstat (limited to 'apps/docs/memory-api/features')
| -rw-r--r-- | apps/docs/memory-api/features/auto-multi-modal.mdx | 181 |
| -rw-r--r-- | apps/docs/memory-api/features/content-cleaner.mdx | 86 |
| -rw-r--r-- | apps/docs/memory-api/features/filtering.mdx | 266 |
| -rw-r--r-- | apps/docs/memory-api/features/query-rewriting.mdx | 50 |
| -rw-r--r-- | apps/docs/memory-api/features/reranking.mdx | 44 |
5 files changed, 627 insertions, 0 deletions
diff --git a/apps/docs/memory-api/features/auto-multi-modal.mdx b/apps/docs/memory-api/features/auto-multi-modal.mdx
new file mode 100644
index 00000000..18a91135
--- /dev/null
+++ b/apps/docs/memory-api/features/auto-multi-modal.mdx
@@ -0,0 +1,181 @@
---
title: "Auto Multi Modal"
description: "supermemory automatically detects the content type of the document you are adding."
icon: "sparkles"
---

supermemory is natively multi-modal and can automatically detect the content type of the document you are adding.

We use best-of-breed tools to extract content from URLs and process it for optimal memory storage.

## Automatic Content Type Detection

supermemory automatically detects the content type of the document you're adding. Simply pass your content to the API, and supermemory will handle the rest.

<Tabs>
  <Tab title="How It Works">
    The content detection system analyzes:
    - URL patterns and domains
    - File extensions and MIME types
    - Content structure and metadata
    - Headers and response types
  </Tab>
  <Tab title="Best Practices">
    <Accordion title="Content Type Best Practices" defaultOpen icon="sparkles">
      1. **Type Selection**
         - Use `note` for simple text
         - Use `webpage` for online content
         - Use native types when possible

      2.
         **URL Content**
         - Send clean URLs without tracking parameters
         - Use article URLs, not homepage URLs
         - Check URL accessibility before sending
    </Accordion>

  </Tab>
</Tabs>

### Quick Implementation

All you need to do is pass the content to the `/memories` endpoint:

<CodeGroup>

```bash cURL
curl https://api.supermemory.ai/v3/memories \
  --request POST \
  --header 'Authorization: Bearer SUPERMEMORY_API_KEY' \
  -d '{"content": "https://example.com/article"}'
```

```typescript
await client.add.create({
  content: "https://example.com/article",
});
```

```python
client.add.create(
    content="https://example.com/article"
)
```

</CodeGroup>

<Note>
  supermemory uses [Markdowner](https://md.dhr.wtf) to extract content from
  URLs.
</Note>

## Supported Content Types

supermemory supports a wide range of content formats to ensure versatility in memory creation:

<Grid cols={2}>
  <Card title="Text Content" icon="document-text">
    - `note`: Plain text notes and documents
      - Directly processes raw text content
      - Automatically chunks content for optimal retrieval
      - Preserves formatting and structure
  </Card>

  <Card title="Web Content" icon="globe">
    - `webpage`: Web pages (just provide the URL)
      - Intelligently extracts main content
      - Preserves important metadata (title, description, images)
      - Extracts OpenGraph metadata when available

    - `tweet`: Twitter content
      - Captures tweet text, media, and metadata
      - Preserves thread structure if applicable

  </Card>

  <Card title="Document Types" icon="document">
    - `pdf`: PDF files
      - Extracts text content while maintaining structure
      - Handles both searchable PDFs and scanned documents with OCR
      - Preserves page breaks and formatting

    - `google_doc`: Google Documents
      - Seamlessly integrates with Google Docs API
      - Maintains document formatting and structure
      - Auto-updates when source document changes

    - `notion_doc`: Notion pages
      - Extracts
        content while preserving Notion's block structure
      - Handles rich text formatting and embedded content

  </Card>

  <Card title="Media Types" icon="photo">
    - `image`: Images with text content
      - Advanced OCR for text extraction
      - Visual content analysis and description

    - `video`: Video content
      - Transcription and content extraction
      - Key frame analysis

  </Card>
</Grid>

## Processing Pipeline

<Steps>
  <Step title="Content Detection">
    supermemory automatically identifies the content type based on the input provided.
  </Step>

  <Step title="Content Extraction">
    Type-specific extractors process the content with:
    - Specialized parsing for each format
    - Error handling with retries
    - Rate limit management
  </Step>

  <Step title="AI Enhancement">
    ```typescript
    interface ProcessedContent {
      content: string;       // Extracted text
      summary?: string;      // AI-generated summary
      tags?: string[];       // Extracted tags
      categories?: string[]; // Content categories
    }
    ```
  </Step>

  <Step title="Chunking & Indexing">
    - Sentence-level splitting
    - 2-sentence overlap
    - Context preservation
    - Semantic coherence
  </Step>
</Steps>

## Technical Specifications

### Size Limits

| Content Type | Max Size |
| ------------ | -------- |
| Text/Note    | 1MB      |
| PDF          | 10MB     |
| Image        | 5MB      |
| Video        | 100MB    |
| Web Page     | N/A      |
| Google Doc   | N/A      |
| Notion Page  | N/A      |
| Tweet        | N/A      |

### Processing Time

| Content Type | Processing Time |
| ------------ | --------------- |
| Text/Note    | Almost instant  |
| PDF          | 1-5 seconds     |
| Image        | 2-10 seconds    |
| Video        | 10+ seconds     |
| Web Page     | 1-3 seconds     |
| Google Doc   | N/A             |
| Notion Page  | N/A             |
| Tweet        | N/A             |

diff --git a/apps/docs/memory-api/features/content-cleaner.mdx b/apps/docs/memory-api/features/content-cleaner.mdx
new file mode 100644
index 00000000..e586c3dc
--- /dev/null
+++ b/apps/docs/memory-api/features/content-cleaner.mdx
@@ -0,0 +1,86 @@
---
title:
"Cleaning and Categorizing" +description: "Document Cleaning Summaries in supermemory" +icon: "washing-machine" +--- + +supermemory provides advanced configuration options to customize your content processing pipeline. At its core is an AI-powered system that can automatically analyze, categorize, and filter your content based on your specific needs. + +## Configuration Schema + +```json +{ + "shouldLLMFilter": true, + "categories": ["feature-request", "bug-report", "positive", "negative"], + "filterPrompt": "Analyze feedback sentiment and identify feature requests", + "includeItems": ["critical", "high-priority"], + "excludeItems": ["spam", "irrelevant"] +} +``` + +## Core Settings + +### shouldLLMFilter +- **Type**: `boolean` +- **Required**: No (defaults to `false`) +- **Description**: Master switch for AI-powered content analysis. Must be enabled to use any of the advanced filtering features. + +### categories +- **Type**: `string[]` +- **Limits**: Each category must be 1-50 characters +- **Required**: No +- **Description**: Define custom categories for content classification. When specified, the AI will only use these categories. If not specified, it will generate 3-5 relevant categories automatically. + +### filterPrompt +- **Type**: `string` +- **Limits**: 1-750 characters +- **Required**: No +- **Description**: Custom instructions for the AI on how to analyze and categorize content. Use this to guide the categorization process based on your specific needs. + +### includeItems & excludeItems +- **Type**: `string[]` +- **Limits**: Each item must be 1-20 characters +- **Required**: No +- **Description**: Fine-tune content filtering by specifying items to explicitly include or exclude during processing. + +## Content Processing Pipeline + +When content is ingested with LLM filtering enabled: + +1. **Initial Processing** + - Content is extracted and normalized + - Basic metadata (title, description) is captured + +2. 
   **AI Analysis**
   - Content is analyzed based on your `filterPrompt`
   - Categories are assigned (either from your predefined list or auto-generated)
   - Tags are evaluated and scored

3. **Chunking & Indexing**
   - Content is split into semantic chunks
   - Each chunk is embedded for efficient search
   - Metadata and classifications are stored

## Example Use Cases

### 1. Customer Feedback System

```json
{
  "shouldLLMFilter": true,
  "categories": ["positive", "negative", "neutral"],
  "filterPrompt": "Analyze customer sentiment and identify key themes"
}
```

### 2. Content Moderation

```json
{
  "shouldLLMFilter": true,
  "categories": ["safe", "needs-review", "flagged"],
  "filterPrompt": "Identify potentially inappropriate or sensitive content",
  "excludeItems": ["spam", "offensive"],
  "includeItems": ["user-generated"]
}
```

> **Important**: All filtering features (`categories`, `filterPrompt`, `includeItems`, `excludeItems`) require `shouldLLMFilter` to be enabled. Attempting to use these features without enabling `shouldLLMFilter` will result in a 400 error.

diff --git a/apps/docs/memory-api/features/filtering.mdx b/apps/docs/memory-api/features/filtering.mdx
new file mode 100644
index 00000000..cde6ee4a
--- /dev/null
+++ b/apps/docs/memory-api/features/filtering.mdx
@@ -0,0 +1,266 @@
---
title: "Filtering"
description: "Learn how to filter content when searching supermemory"
icon: "list-filter-plus"
---

## Container Tag

A container tag is an identifier for your end users, used to group memories together.

This can be:
- A user using your product
- An organization using a SaaS
- A project ID, or even a dynamic one like `user_project_etc`

We recommend using a single containerTag in all API requests.

The graph is built on top of the container tags. For example, each user / tag in your supermemory account will have one single graph built for them.
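A dynamic tag such as `user_project_etc` can be built deterministically on your side before each request. Here is a minimal sketch; the helper and its naming scheme are illustrative, not part of the supermemory SDK:

```python
def container_tag(user_id: str, project_id: str) -> str:
    """Build a deterministic per-user, per-project container tag.

    The "user_..._project_..." scheme is only an example; any stable
    string works, as long as the same entity always maps to the same tag.
    """
    return f"user_{user_id}_project_{project_id}"

print(container_tag("123", "acme"))  # → user_123_project_acme
```

Because the same user and project always produce the same tag, all of their memories accumulate in a single graph.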
<CodeGroup>

```bash cURL
curl https://api.supermemory.ai/v3/search \
  --request POST \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer SUPERMEMORY_API_KEY' \
  --data '{
    "q": "machine learning",
    "containerTags": ["user_123"]
  }'
```

```typescript Typescript
await client.search.execute({
  q: "machine learning",
  containerTags: ["user_123"],
});
```

```python Python
client.search.execute(
    q="machine learning",
    containerTags=["user_123"]
)
```

</CodeGroup>

## Metadata

Sometimes, you might want to add metadata and do advanced filtering based on it.

Using metadata filtering, you can search based on:

- AND and OR conditions
- String matching
- Numeric matching
- Date matching
- Time range queries

<CodeGroup>

```bash cURL
curl https://api.supermemory.ai/v3/search \
  --request POST \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer SUPERMEMORY_API_KEY' \
  --data '{
    "q": "machine learning",
    "filters": {
      "AND": [
        {
          "key": "category",
          "value": "technology",
          "negate": false
        },
        {
          "filterType": "numeric",
          "key": "readingTime",
          "value": "5",
          "negate": false,
          "numericOperator": "<="
        }
      ]
    }
  }'
```

```typescript Typescript
await client.search.execute({
  q: "machine learning",
  filters: {
    AND: [
      {
        key: "category",
        value: "technology",
        negate: false,
      },
      {
        filterType: "numeric",
        key: "readingTime",
        value: "5",
        negate: false,
        numericOperator: "<=",
      },
    ],
  },
});
```

```python Python
client.search.execute(
    q="machine learning",
    filters={
        "AND": [
            {
                "key": "category",
                "value": "technology",
                "negate": False
            },
            {
                "filterType": "numeric",
                "key": "readingTime",
                "value": "5",
                "negate": False,
                "numericOperator": "<="
            }
        ]
    }
)
```

</CodeGroup>

## Array Contains Filtering

You can filter memories by array values using the `array_contains` filter type.
This is particularly useful for filtering by participants or other array-based metadata.

First, create a memory with participants in the metadata:

<CodeGroup>

```bash cURL
curl --location 'https://api.supermemory.ai/v3/memories' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer SUPERMEMORY_API_KEY' \
  --data '{
    "content": "quarterly planning meeting discussion",
    "metadata": {
      "participants": ["john.doe", "sarah.smith", "mike.wilson"]
    }
  }'
```

```typescript Typescript
await client.memories.create({
  content: "quarterly planning meeting discussion",
  metadata: {
    participants: ["john.doe", "sarah.smith", "mike.wilson"],
  },
});
```

```python Python
client.memories.create(
    content="quarterly planning meeting discussion",
    metadata={
        "participants": ["john.doe", "sarah.smith", "mike.wilson"]
    }
)
```

</CodeGroup>

Then search using the `array_contains` filter:

<CodeGroup>

```bash cURL
curl --location 'https://api.supermemory.ai/v3/search' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer SUPERMEMORY_API_KEY' \
  --data '{
    "q": "meeting",
    "filters": {
      "AND": [
        {
          "key": "participants",
          "value": "john.doe",
          "filterType": "array_contains"
        }
      ]
    },
    "limit": 5
  }'
```

```typescript Typescript
await client.search.execute({
  q: "meeting",
  filters: {
    AND: [
      {
        key: "participants",
        value: "john.doe",
        filterType: "array_contains",
      },
    ],
  },
  limit: 5,
});
```

```python Python
client.search.execute(
    q="meeting",
    filters={
        "AND": [
            {
                "key": "participants",
                "value": "john.doe",
                "filterType": "array_contains"
            }
        ]
    },
    limit=5
)
```

</CodeGroup>

## Document

You can also find chunks within a specific, large document.

This can be particularly useful for extremely large documents like books, podcasts, etc.
<CodeGroup>

```bash cURL
curl https://api.supermemory.ai/v3/search \
  --request POST \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer SUPERMEMORY_API_KEY' \
  --data '{
    "q": "machine learning",
    "docId": "doc_123"
  }'
```

```typescript Typescript
await client.search.execute({
  q: "machine learning",
  docId: "doc_123",
});
```

```python Python
client.search.execute(
    q="machine learning",
    docId="doc_123"
)
```

</CodeGroup>

diff --git a/apps/docs/memory-api/features/query-rewriting.mdx b/apps/docs/memory-api/features/query-rewriting.mdx
new file mode 100644
index 00000000..9508297a
--- /dev/null
+++ b/apps/docs/memory-api/features/query-rewriting.mdx
@@ -0,0 +1,50 @@
---
title: "Query Rewriting"
description: "Query Rewriting in supermemory"
icon: "blend"
---

Query Rewriting rewrites your search query into clearer variants to make retrieval more accurate.

### Usage

In supermemory, you can enable query rewriting by setting the `rewriteQuery` parameter to `true` in the search API.

<CodeGroup>

```bash cURL
curl https://api.supermemory.ai/v3/search \
  --request POST \
  --header 'Authorization: Bearer SUPERMEMORY_API_KEY' \
  --header 'Content-Type: application/json' \
  -d '{
    "q": "What is the capital of France?",
    "rewriteQuery": true
  }'
```

```typescript
await client.search.create({
  q: "What is the capital of France?",
  rewriteQuery: true,
});
```

```python
client.search.create(
    q="What is the capital of France?",
    rewriteQuery=True
)
```

</CodeGroup>

### Notes and limitations

- supermemory generates multiple rewrites and runs the search through all of them.
- The results are then merged and returned to you.
- There are no additional costs associated with query rewriting.
- While query rewriting improves result quality, it also **incurs additional latency**.
- All other features like filtering, hybrid search, recency bias, etc.
  work with rewritten results as well.

diff --git a/apps/docs/memory-api/features/reranking.mdx b/apps/docs/memory-api/features/reranking.mdx
new file mode 100644
index 00000000..1df8a9c5
--- /dev/null
+++ b/apps/docs/memory-api/features/reranking.mdx
@@ -0,0 +1,44 @@
---
title: "Reranking"
description: "Reranked search results in supermemory"
icon: "chart-bar-increasing"
---

Reranking re-scores search results against the query and reorders them by relevance.

### Usage

In supermemory, you can enable reranking by setting the `rerank` parameter to `true` in the search API.

<CodeGroup>

```bash cURL
curl 'https://api.supermemory.ai/v3/search?q=What+is+the+capital+of+France%3F&rerank=true' \
  --request GET \
  --header 'Authorization: Bearer SUPERMEMORY_API_KEY'
```

```typescript
await client.search.create({
  q: "What is the capital of France?",
  rerank: true,
});
```

```python
client.search.create(
    q="What is the capital of France?",
    rerank=True
)
```

</CodeGroup>

### Notes and limitations

- We currently use the `bge-reranker-base` model for reranking.
- There are no additional costs associated with reranking.
- While reranking improves result quality, it also **incurs additional latency**.
- All other features like filtering, hybrid search, recency bias, etc. work with reranked results as well.
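Conceptually, the reranking stage scores each candidate result against the query and then sorts by that score. The sketch below only illustrates that score-then-sort shape; a toy token-overlap scorer stands in for the real `bge-reranker-base` cross-encoder, which supermemory runs server-side when you pass `rerank: true`:

```python
def rerank(query: str, candidates: list[str]) -> list[str]:
    """Reorder candidates by a relevance score against the query.

    A real reranker would score each (query, candidate) pair with a
    cross-encoder model; token overlap is used here purely to keep the
    sketch self-contained.
    """
    q_tokens = set(query.lower().split())

    def score(doc: str) -> int:
        # Count query tokens that appear in the candidate.
        return len(q_tokens & set(doc.lower().split()))

    return sorted(candidates, key=score, reverse=True)

results = rerank(
    "capital of France",
    [
        "Berlin is the capital of Germany",
        "Paris is the capital of France",
        "France exports wine",
    ],
)
print(results[0])  # → Paris is the capital of France
```

In practice you never run this step yourself; it is shown only to explain what the `rerank` flag adds on top of the initial retrieval.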