From 90fd19f2156e28845d9288ea8ffc2d7d9573b77a Mon Sep 17 00:00:00 2001
From: Dhravya Shah
Date: Sat, 13 Sep 2025 22:09:40 -0700
Subject: update: Readme
---
 apps/docs/memory-api/features/auto-multi-modal.mdx | 181 --------------
 apps/docs/memory-api/features/content-cleaner.mdx  |  86 -------
 apps/docs/memory-api/features/filtering.mdx        | 266 ---------------------
 apps/docs/memory-api/features/query-rewriting.mdx  |  50 ----
 apps/docs/memory-api/features/reranking.mdx        |  44 ----
 5 files changed, 627 deletions(-)
 delete mode 100644 apps/docs/memory-api/features/auto-multi-modal.mdx
 delete mode 100644 apps/docs/memory-api/features/content-cleaner.mdx
 delete mode 100644 apps/docs/memory-api/features/filtering.mdx
 delete mode 100644 apps/docs/memory-api/features/query-rewriting.mdx
 delete mode 100644 apps/docs/memory-api/features/reranking.mdx

diff --git a/apps/docs/memory-api/features/auto-multi-modal.mdx b/apps/docs/memory-api/features/auto-multi-modal.mdx
deleted file mode 100644
index 18a91135..00000000
--- a/apps/docs/memory-api/features/auto-multi-modal.mdx
+++ /dev/null
@@ -1,181 +0,0 @@
---
title: "Auto Multi Modal"
description: "supermemory automatically detects the content type of the document you are adding."
icon: "sparkles"
---

supermemory is natively multi-modal and can automatically detect the content type of the document you are adding.

We use best-of-breed tools to extract content from URLs and process it for optimal memory storage.

## Automatic Content Type Detection

supermemory automatically detects the content type of the document you're adding. Simply pass your content to the API, and supermemory will handle the rest.

The content detection system analyzes:

- URL patterns and domains
- File extensions and MIME types
- Content structure and metadata
- Headers and response types
1. **Type Selection**
   - Use `note` for simple text
   - Use `webpage` for online content
   - Use native types when possible

2. **URL Content**
   - Send clean URLs without tracking parameters
   - Use article URLs, not homepage URLs
   - Check URL accessibility before sending

### Quick Implementation

All you need to do is pass the content to the `/memories` endpoint:

```bash cURL
curl https://api.supermemory.ai/v3/memories \
  --request POST \
  --header 'Authorization: Bearer SUPERMEMORY_API_KEY' \
  -d '{"content": "https://example.com/article"}'
```

```typescript
await client.add.create({
  content: "https://example.com/article",
});
```

```python
client.add.create(
    content="https://example.com/article"
)
```

supermemory uses [Markdowner](https://md.dhr.wtf) to extract content from URLs.

## Supported Content Types

supermemory supports a wide range of content formats to ensure versatility in memory creation:

- `note`: Plain text notes and documents
  - Directly processes raw text content
  - Automatically chunks content for optimal retrieval
  - Preserves formatting and structure

- `webpage`: Web pages (just provide the URL)
  - Intelligently extracts main content
  - Preserves important metadata (title, description, images)
  - Extracts OpenGraph metadata when available

- `tweet`: Twitter content
  - Captures tweet text, media, and metadata
  - Preserves thread structure if applicable

- `pdf`: PDF files
  - Extracts text content while maintaining structure
  - Handles both searchable PDFs and scanned documents with OCR
  - Preserves page breaks and formatting

- `google_doc`: Google Documents
  - Seamlessly integrates with Google Docs API
  - Maintains document formatting and structure
  - Auto-updates when source document changes

- `notion_doc`: Notion pages
  - Extracts content while preserving Notion's block structure
  - Handles rich text formatting and embedded content
- `image`: Images with text content
  - Advanced OCR for text extraction
  - Visual content analysis and description

- `video`: Video content
  - Transcription and content extraction
  - Key frame analysis

## Processing Pipeline

1. supermemory automatically identifies the content type based on the input provided.

2. Type-specific extractors process the content with:
   - Specialized parsing for each format
   - Error handling with retries
   - Rate limit management

3. The processed output follows this shape:

   ```typescript
   interface ProcessedContent {
     content: string; // Extracted text
     summary?: string; // AI-generated summary
     tags?: string[]; // Extracted tags
     categories?: string[]; // Content categories
   }
   ```

4. Content is then chunked with:
   - Sentence-level splitting
   - 2-sentence overlap
   - Context preservation
   - Semantic coherence

## Technical Specifications

### Size Limits

| Content Type | Max Size |
| ------------ | -------- |
| Text/Note    | 1MB      |
| PDF          | 10MB     |
| Image        | 5MB      |
| Video        | 100MB    |
| Web Page     | N/A      |
| Google Doc   | N/A      |
| Notion Page  | N/A      |
| Tweet        | N/A      |

### Processing Time

| Content Type | Processing Time |
| ------------ | --------------- |
| Text/Note    | Almost instant  |
| PDF          | 1-5 seconds     |
| Image        | 2-10 seconds    |
| Video        | 10+ seconds     |
| Web Page     | 1-3 seconds     |
| Google Doc   | N/A             |
| Notion Page  | N/A             |
| Tweet        | N/A             |

diff --git a/apps/docs/memory-api/features/content-cleaner.mdx b/apps/docs/memory-api/features/content-cleaner.mdx
deleted file mode 100644
index e586c3dc..00000000
--- a/apps/docs/memory-api/features/content-cleaner.mdx
+++ /dev/null
@@ -1,86 +0,0 @@
---
title: "Cleaning and Categorizing"
description: "Document Cleaning Summaries in supermemory"
icon: "washing-machine"
---

supermemory provides advanced configuration options to customize your content processing pipeline.
At its core is an AI-powered system that can automatically analyze, categorize, and filter your content based on your specific needs.

## Configuration Schema

```json
{
  "shouldLLMFilter": true,
  "categories": ["feature-request", "bug-report", "positive", "negative"],
  "filterPrompt": "Analyze feedback sentiment and identify feature requests",
  "includeItems": ["critical", "high-priority"],
  "excludeItems": ["spam", "irrelevant"]
}
```

## Core Settings

### shouldLLMFilter

- **Type**: `boolean`
- **Required**: No (defaults to `false`)
- **Description**: Master switch for AI-powered content analysis. Must be enabled to use any of the advanced filtering features.

### categories

- **Type**: `string[]`
- **Limits**: Each category must be 1-50 characters
- **Required**: No
- **Description**: Define custom categories for content classification. When specified, the AI will only use these categories. If not specified, it will generate 3-5 relevant categories automatically.

### filterPrompt

- **Type**: `string`
- **Limits**: 1-750 characters
- **Required**: No
- **Description**: Custom instructions for the AI on how to analyze and categorize content. Use this to guide the categorization process based on your specific needs.

### includeItems & excludeItems

- **Type**: `string[]`
- **Limits**: Each item must be 1-20 characters
- **Required**: No
- **Description**: Fine-tune content filtering by specifying items to explicitly include or exclude during processing.

## Content Processing Pipeline

When content is ingested with LLM filtering enabled:

1. **Initial Processing**
   - Content is extracted and normalized
   - Basic metadata (title, description) is captured

2. **AI Analysis**
   - Content is analyzed based on your `filterPrompt`
   - Categories are assigned (either from your predefined list or auto-generated)
   - Tags are evaluated and scored
3. **Chunking & Indexing**
   - Content is split into semantic chunks
   - Each chunk is embedded for efficient search
   - Metadata and classifications are stored

## Example Use Cases

### 1. Customer Feedback System

```json
{
  "shouldLLMFilter": true,
  "categories": ["positive", "negative", "neutral"],
  "filterPrompt": "Analyze customer sentiment and identify key themes"
}
```

### 2. Content Moderation

```json
{
  "shouldLLMFilter": true,
  "categories": ["safe", "needs-review", "flagged"],
  "filterPrompt": "Identify potentially inappropriate or sensitive content",
  "excludeItems": ["spam", "offensive"],
  "includeItems": ["user-generated"]
}
```

> **Important**: All filtering features (`categories`, `filterPrompt`, `includeItems`, `excludeItems`) require `shouldLLMFilter` to be enabled. Attempting to use these features without enabling `shouldLLMFilter` will result in a 400 error.

diff --git a/apps/docs/memory-api/features/filtering.mdx b/apps/docs/memory-api/features/filtering.mdx
deleted file mode 100644
index cde6ee4a..00000000
--- a/apps/docs/memory-api/features/filtering.mdx
+++ /dev/null
@@ -1,266 +0,0 @@
---
title: "Filtering"
description: "Learn how to filter content while searching from supermemory"
icon: "list-filter-plus"
---

## Container Tag

A container tag is an identifier for your end users, used to group memories together.

This can be:

- A user of your product
- An organization using your SaaS
- A project ID, or even a dynamic composite like `user_project_etc`

We recommend using a single containerTag in all API requests.

The graph is built on top of the container tags. For example, each user/tag in your supermemory account will have one single graph built for them.
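A dynamic composite tag like the `user_project_etc` example above can be sketched with a small helper. The function name, prefix, and separator here are illustrative conventions, not part of the API; the commented-out SDK calls are likewise assumptions:

```python
def make_container_tag(user_id: str, project_id: str) -> str:
    """Build a composite container tag for one user's project.

    The prefix and separator are arbitrary conventions; pick one and keep it
    stable, because the tag is what groups memories into a single graph.
    """
    return f"user_{user_id}_project_{project_id}"

# Use the SAME tag when adding and when searching, so memories are
# written to, and retrieved from, the same graph.
tag = make_container_tag("123", "acme")
# client.add.create(content="...", containerTags=[tag])      # hypothetical call
# client.search.execute(q="...", containerTags=[tag])        # hypothetical call
```

Because the tag is deterministic, any request that can derive the same user and project IDs will land in the same graph.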
```bash cURL
curl https://api.supermemory.ai/v3/search \
  --request POST \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer SUPERMEMORY_API_KEY' \
  --data '{
    "q": "machine learning",
    "containerTags": ["user_123"]
  }'
```

```typescript Typescript
await client.search.execute({
  q: "machine learning",
  containerTags: ["user_123"],
});
```

```python Python
client.search.execute(
    q="machine learning",
    containerTags=["user_123"]
)
```

## Metadata

Sometimes, you might want to add metadata and do advanced filtering based on it.

Using metadata filtering, you can search based on:

- AND and OR conditions
- String matching
- Numeric matching
- Date matching
- Time range queries

```bash cURL
curl https://api.supermemory.ai/v3/search \
  --request POST \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer SUPERMEMORY_API_KEY' \
  --data '{
    "q": "machine learning",
    "filters": {
      "AND": [
        {
          "key": "category",
          "value": "technology",
          "negate": false
        },
        {
          "filterType": "numeric",
          "key": "readingTime",
          "value": "5",
          "negate": false,
          "numericOperator": "<="
        }
      ]
    }
  }'
```

```typescript Typescript
await client.search.execute({
  q: "machine learning",
  filters: {
    AND: [
      {
        key: "category",
        value: "technology",
        negate: false,
      },
      {
        filterType: "numeric",
        key: "readingTime",
        value: "5",
        negate: false,
        numericOperator: "<=",
      },
    ],
  },
});
```

```python Python
client.search.execute(
    q="machine learning",
    filters={
        "AND": [
            {
                "key": "category",
                "value": "technology",
                "negate": False
            },
            {
                "filterType": "numeric",
                "key": "readingTime",
                "value": "5",
                "negate": False,
                "numericOperator": "<="
            }
        ]
    }
)
```

## Array Contains Filtering

You can filter memories by array values using the `array_contains` filter type.
This is particularly useful for filtering by participants or other array-based metadata.

First, create a memory with participants in the metadata:

```bash cURL
curl --location 'https://api.supermemory.ai/v3/memories' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer SUPERMEMORY_API_KEY' \
  --data '{
    "content": "quarterly planning meeting discussion",
    "metadata": {
      "participants": ["john.doe", "sarah.smith", "mike.wilson"]
    }
  }'
```

```typescript Typescript
await client.memories.create({
  content: "quarterly planning meeting discussion",
  metadata: {
    participants: ["john.doe", "sarah.smith", "mike.wilson"]
  }
});
```

```python Python
client.memories.create(
    content="quarterly planning meeting discussion",
    metadata={
        "participants": ["john.doe", "sarah.smith", "mike.wilson"]
    }
)
```

Then search using the `array_contains` filter:

```bash cURL
curl --location 'https://api.supermemory.ai/v3/search' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer SUPERMEMORY_API_KEY' \
  --data '{
    "q": "meeting",
    "filters": {
      "AND": [
        {
          "key": "participants",
          "value": "john.doe",
          "filterType": "array_contains"
        }
      ]
    },
    "limit": 5
  }'
```

```typescript Typescript
await client.search.execute({
  q: "meeting",
  filters: {
    AND: [
      {
        key: "participants",
        value: "john.doe",
        filterType: "array_contains"
      }
    ]
  },
  limit: 5
});
```

```python Python
client.search.execute(
    q="meeting",
    filters={
        "AND": [
            {
                "key": "participants",
                "value": "john.doe",
                "filterType": "array_contains"
            }
        ]
    },
    limit=5
)
```

## Document

You can also find chunks within a specific, large document.

This can be particularly useful for extremely large documents like books, podcasts, etc.
```bash cURL
curl https://api.supermemory.ai/v3/search \
  --request POST \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer SUPERMEMORY_API_KEY' \
  --data '{
    "q": "machine learning",
    "docId": "doc_123"
  }'
```

```typescript Typescript
await client.search.execute({
  q: "machine learning",
  docId: "doc_123",
});
```

```python Python
client.search.execute(
    q="machine learning",
    docId="doc_123"
)
```

diff --git a/apps/docs/memory-api/features/query-rewriting.mdx b/apps/docs/memory-api/features/query-rewriting.mdx
deleted file mode 100644
index 9508297a..00000000
--- a/apps/docs/memory-api/features/query-rewriting.mdx
+++ /dev/null
@@ -1,50 +0,0 @@
---
title: "Query Rewriting"
description: "Query Rewriting in supermemory"
icon: "blend"
---

Query Rewriting is a feature that rewrites your queries to make them more accurate.

![Query Rewriting](/images/query-rewriting.png)

### Usage

In supermemory, you can enable query rewriting by setting the `rewriteQuery` parameter to `true` in the search API.

```bash cURL
curl https://api.supermemory.ai/v3/search \
  --request POST \
  --header 'Authorization: Bearer SUPERMEMORY_API_KEY' \
  --header 'Content-Type: application/json' \
  -d '{
    "q": "What is the capital of France?",
    "rewriteQuery": true
  }'
```

```typescript
await client.search.create({
  q: "What is the capital of France?",
  rewriteQuery: true,
});
```

```python
client.search.create(
    q="What is the capital of France?",
    rewriteQuery=True
)
```

### Notes and limitations

- supermemory generates multiple rewrites and runs the search through all of them.
- The results are then merged and returned to you.
- There are no additional costs associated with query rewriting.
- While query rewriting improves result quality, it also **incurs additional latency**.
- All other features like filtering, hybrid search, recency bias, etc.
work with rewritten results as well.

diff --git a/apps/docs/memory-api/features/reranking.mdx b/apps/docs/memory-api/features/reranking.mdx
deleted file mode 100644
index 1df8a9c5..00000000
--- a/apps/docs/memory-api/features/reranking.mdx
+++ /dev/null
@@ -1,44 +0,0 @@
---
title: "Reranking"
description: "Reranked search results in supermemory"
icon: "chart-bar-increasing"
---

Reranking is a feature that reorders search results by their relevance to the query.

![Reranking](/images/rerank.png)

### Usage

In supermemory, you can enable reranking by setting the `rerank` parameter to `true` in the search API.

```bash cURL
curl 'https://api.supermemory.ai/v3/search?q=What+is+the+capital+of+France%3F&rerank=true' \
  --request GET \
  --header 'Authorization: Bearer SUPERMEMORY_API_KEY'
```

```typescript
await client.search.create({
  q: "What is the capital of France?",
  rerank: true,
});
```

```python
client.search.create(
    q="What is the capital of France?",
    rerank=True
)
```

### Notes and limitations

- We currently use the `bge-reranker-base` model for reranking.
- There are no additional costs associated with reranking.
- While reranking improves result quality, it also **incurs additional latency**.
- All other features like filtering, hybrid search, recency bias, etc. work with reranked results as well.
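Since reranking composes with the other search features documented above, a combined request body can be sketched as plain data. The field names mirror the earlier examples in these docs; the commented-out SDK call and the `category` metadata value are assumptions for illustration:

```python
# Combined search options: reranking, query rewriting, and a metadata filter,
# all in one request body. Field names follow the examples in this document.
payload = {
    "q": "What is the capital of France?",
    "rerank": True,        # rescore results with the reranker model
    "rewriteQuery": True,  # fan the search out over rewritten queries
    "filters": {
        "AND": [
            # "category"/"geography" is hypothetical example metadata.
            {"key": "category", "value": "geography", "negate": False},
        ]
    },
}
# client.search.execute(**payload)  # hypothetical: exact SDK call may differ
```

Both quality features add latency, so enable them together only where result quality matters more than response time.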