---
title: "Auto Multi Modal"
description: "supermemory automatically detects the content type of the document you are adding."
icon: "sparkles"
---
supermemory is natively multi-modal: it automatically detects the content type of each document you add.
We use best-of-breed tools to extract content from URLs and process it for optimal memory storage.
## Automatic Content Type Detection
Simply pass your content to the API; supermemory detects the content type and handles the rest.
<Tabs>
<Tab title="How It Works">
The content detection system analyzes:
- URL patterns and domains
- File extensions and MIME types
- Content structure and metadata
- Headers and response types
</Tab>
<Tab title="Best Practices">
<Accordion title="Content Type Best Practices" defaultOpen icon="sparkles">
1. **Type Selection**
- Use `note` for simple text
- Use `webpage` for online content
- Use native types when possible
2. **URL Content**
- Send clean URLs without tracking parameters
- Use article URLs, not homepage URLs
- Check URL accessibility before sending
</Accordion>
</Tab>
</Tabs>
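The detection signals listed above (URL patterns, domains, file extensions) can be sketched roughly as follows. This is an illustrative example only, not supermemory's actual implementation, and the specific domain and extension rules are assumptions:

```typescript
type ContentType =
  | "note" | "webpage" | "tweet" | "pdf"
  | "google_doc" | "notion_doc" | "image" | "video";

// Hypothetical detector: infer a content type from a URL's domain and
// extension, falling back to plain text for non-URL input.
function detectContentType(content: string): ContentType {
  let url: URL;
  try {
    url = new URL(content);
  } catch {
    return "note"; // not a valid URL: treat as plain text
  }
  const host = url.hostname.replace(/^www\./, "");
  if (host === "twitter.com" || host === "x.com") return "tweet";
  if (host === "docs.google.com") return "google_doc";
  if (host === "notion.so" || host.endsWith(".notion.site")) return "notion_doc";
  const path = url.pathname.toLowerCase();
  if (path.endsWith(".pdf")) return "pdf";
  if (/\.(png|jpe?g|gif|webp)$/.test(path)) return "image";
  if (/\.(mp4|mov|webm)$/.test(path)) return "video";
  return "webpage"; // any other URL is treated as a web page
}
```

In production, a detector like this would also consult MIME types and response headers, as noted above.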
### Quick Implementation
All you need to do is pass the content to the `/documents` endpoint:
<CodeGroup>
```bash cURL
curl https://api.supermemory.ai/v3/documents \
  --request POST \
  --header 'Authorization: Bearer SUPERMEMORY_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{"content": "https://example.com/article"}'
```
```typescript TypeScript
await client.add.create({
  content: "https://example.com/article",
});
```
```python Python
client.add.create(
    content="https://example.com/article"
)
```
</CodeGroup>
<Note>
supermemory uses [Markdowner](https://md.dhr.wtf) to extract content from
URLs.
</Note>
## Supported Content Types
supermemory supports a wide range of content formats to ensure versatility in memory creation:
<Grid cols={2}>
<Card title="Text Content" icon="document-text">
- `note`: Plain text notes and documents
- Directly processes raw text content
- Automatically chunks content for optimal retrieval
- Preserves formatting and structure
</Card>
<Card title="Web Content" icon="globe">
- `webpage`: Web pages (just provide the URL)
- Intelligently extracts main content
- Preserves important metadata (title, description, images)
- Extracts OpenGraph metadata when available
- `tweet`: Twitter content
- Captures tweet text, media, and metadata
- Preserves thread structure if applicable
</Card>
<Card title="Document Types" icon="document">
- `pdf`: PDF files
- Extracts text content while maintaining structure
- Handles both searchable PDFs and scanned documents with OCR
- Preserves page breaks and formatting
- `google_doc`: Google Documents
- Seamlessly integrates with Google Docs API
- Maintains document formatting and structure
- Auto-updates when source document changes
- `notion_doc`: Notion pages
- Extracts content while preserving Notion's block structure
- Handles rich text formatting and embedded content
</Card>
<Card title="Media Types" icon="photo">
- `image`: Images with text content
- Advanced OCR for text extraction
- Visual content analysis and description
- `video`: Video content
- Transcription and content extraction
- Key frame analysis
</Card>
</Grid>
## Processing Pipeline
<Steps>
<Step title="Content Detection">
supermemory automatically identifies the content type based on the input provided.
</Step>
<Step title="Content Extraction">
Type-specific extractors process the content with:
- Specialized parsing for each format
- Error handling with retries
- Rate limit management
</Step>
<Step title="AI Enhancement">
```typescript
interface ProcessedContent {
content: string; // Extracted text
summary?: string; // AI-generated summary
tags?: string[]; // Extracted tags
categories?: string[]; // Content categories
}
```
</Step>
<Step title="Chunking & Indexing">
- Sentence-level splitting
- 2-sentence overlap
- Context preservation
- Semantic coherence
</Step>
</Steps>
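The chunking step above (sentence-level splitting with a 2-sentence overlap) might look roughly like this. It is an illustrative sketch with assumed parameters, not supermemory's internal chunker; the naive regex splitter and the chunk size of 5 are both placeholders:

```typescript
// Hypothetical sketch: split text into sentences, then group them into
// overlapping chunks so context is preserved across chunk boundaries.
function chunkSentences(text: string, chunkSize = 5, overlap = 2): string[] {
  if (overlap >= chunkSize) throw new Error("overlap must be < chunkSize");
  // Naive sentence splitter: break after ., !, or ? followed by whitespace.
  const sentences = text.split(/(?<=[.!?])\s+/).filter((s) => s.length > 0);
  const chunks: string[] = [];
  const step = chunkSize - overlap; // each chunk reuses `overlap` sentences
  for (let i = 0; i < sentences.length; i += step) {
    chunks.push(sentences.slice(i, i + chunkSize).join(" "));
    if (i + chunkSize >= sentences.length) break; // last chunk emitted
  }
  return chunks;
}
```

The overlap means each chunk repeats the tail of the previous one, which keeps sentences that span a boundary retrievable from either chunk.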
## Technical Specifications
### Size Limits
| Content Type | Max Size |
| ------------ | -------- |
| Text/Note | 1MB |
| PDF | 10MB |
| Image | 5MB |
| Video | 100MB |
| Web Page | N/A |
| Google Doc | N/A |
| Notion Page | N/A |
| Tweet | N/A |
### Processing Time
| Content Type | Processing Time |
| ------------ | --------------- |
| Text/Note | Almost instant |
| PDF | 1-5 seconds |
| Image | 2-10 seconds |
| Video | 10+ seconds |
| Web Page | 1-3 seconds |
| Google Doc | N/A |
| Notion Page | N/A |
| Tweet | N/A |