---
title: "Cleaning and Categorizing"
description: "Document Cleaning Summaries in supermemory"
icon: "washing-machine"
---

supermemory provides advanced configuration options to customize your content processing pipeline. At its core is an AI-powered system that can automatically analyze, categorize, and filter your content based on your specific needs.

## Configuration Schema

```json
{
  "shouldLLMFilter": true,
  "categories": ["feature-request", "bug-report", "positive", "negative"],
  "filterPrompt": "Analyze feedback sentiment and identify feature requests",
  "includeItems": ["critical", "high-priority"],
  "excludeItems": ["spam", "irrelevant"]
}
```

## Core Settings

### shouldLLMFilter

- **Type**: `boolean`
- **Required**: No (defaults to `false`)
- **Description**: Master switch for AI-powered content analysis. Must be enabled to use any of the advanced filtering features.

### categories

- **Type**: `string[]`
- **Limits**: Each category must be 1-50 characters
- **Required**: No
- **Description**: Define custom categories for content classification. When specified, the AI will only use these categories. If not specified, it will generate 3-5 relevant categories automatically.

### filterPrompt

- **Type**: `string`
- **Limits**: 1-750 characters
- **Required**: No
- **Description**: Custom instructions for the AI on how to analyze and categorize content. Use this to guide the categorization process based on your specific needs.

### includeItems & excludeItems

- **Type**: `string[]`
- **Limits**: Each item must be 1-20 characters
- **Required**: No
- **Description**: Fine-tune content filtering by specifying items to explicitly include or exclude during processing.

## Content Processing Pipeline

When content is ingested with LLM filtering enabled:

1. **Initial Processing**
   - Content is extracted and normalized
   - Basic metadata (title, description) is captured

2. **AI Analysis**
   - Content is analyzed based on your `filterPrompt`
   - Categories are assigned (either from your predefined list or auto-generated)
   - Tags are evaluated and scored

3. **Chunking & Indexing**
   - Content is split into semantic chunks
   - Each chunk is embedded for efficient search
   - Metadata and classifications are stored

## Example Use Cases

### 1. Customer Feedback System

```json
{
  "shouldLLMFilter": true,
  "categories": ["positive", "negative", "neutral"],
  "filterPrompt": "Analyze customer sentiment and identify key themes"
}
```

### 2. Content Moderation

```json
{
  "shouldLLMFilter": true,
  "categories": ["safe", "needs-review", "flagged"],
  "filterPrompt": "Identify potentially inappropriate or sensitive content",
  "excludeItems": ["spam", "offensive"],
  "includeItems": ["user-generated"]
}
```

> **Important**: All filtering features (`categories`, `filterPrompt`, `includeItems`, `excludeItems`) require `shouldLLMFilter` to be enabled. Using any of them without enabling `shouldLLMFilter` results in a 400 error.
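
## Checking a Configuration Client-Side

The limits and the `shouldLLMFilter` requirement described above can be checked locally before a configuration is sent. The following is a minimal sketch, not part of supermemory's SDK: the function name `validate_filter_config` and its error messages are hypothetical, and the server remains the authority on what is accepted.

```python
from __future__ import annotations

from typing import Any

# Limits copied from the Core Settings documented on this page.
MAX_CATEGORY_LEN = 50
MAX_PROMPT_LEN = 750
MAX_ITEM_LEN = 20
LLM_ONLY_FIELDS = ("categories", "filterPrompt", "includeItems", "excludeItems")


def validate_filter_config(config: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return a list of problems; empty means no problem found."""
    errors: list[str] = []

    enabled = config.get("shouldLLMFilter", False)  # documented default is false
    if not isinstance(enabled, bool):
        return ["shouldLLMFilter must be a boolean"]

    # Documented rule: the filtering fields require shouldLLMFilter to be enabled.
    if not enabled:
        for field in LLM_ONLY_FIELDS:
            if field in config:
                errors.append(f"{field} requires shouldLLMFilter to be true")

    def check_string_list(field: str, max_len: int) -> None:
        if field not in config:
            return
        value = config[field]
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            errors.append(f"{field} must be a list of strings")
            return
        for entry in value:
            if not 1 <= len(entry) <= max_len:
                errors.append(f"{field} entry {entry!r} must be 1-{max_len} characters")

    check_string_list("categories", MAX_CATEGORY_LEN)
    check_string_list("includeItems", MAX_ITEM_LEN)
    check_string_list("excludeItems", MAX_ITEM_LEN)

    if "filterPrompt" in config:
        prompt = config["filterPrompt"]
        if not isinstance(prompt, str) or not 1 <= len(prompt) <= MAX_PROMPT_LEN:
            errors.append(f"filterPrompt must be a string of 1-{MAX_PROMPT_LEN} characters")

    return errors


# The configuration from the schema section passes every check.
example = {
    "shouldLLMFilter": True,
    "categories": ["feature-request", "bug-report", "positive", "negative"],
    "filterPrompt": "Analyze feedback sentiment and identify feature requests",
    "includeItems": ["critical", "high-priority"],
    "excludeItems": ["spam", "irrelevant"],
}
print(validate_filter_config(example))  # prints []

# Using categories without the master switch is reported locally.
print(validate_filter_config({"categories": ["positive"]}))
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue at once, which can save round trips that would otherwise end in a 400 error.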