| author | Fuwn <[email protected]> | 2025-10-01 17:01:18 -0700 |
|---|---|---|
| committer | Fuwn <[email protected]> | 2025-10-01 17:01:18 -0700 |
| commit | 23ce506b135e0941d028bf6c725cbedb4857fbff | |
| tree | a41946af25ccb839cb1e616ee209b6bcd8b73b28 /packages | |
| parent | fix(listeners:moderationAgent): Update request parameters and response handling | |
fix(listeners:moderationAgent): Update prompt and handling
Diffstat (limited to 'packages')
| -rw-r--r-- | packages/gateway/src/listeners/moderationAgent/utilities.ts | 77 |
1 files changed, 58 insertions, 19 deletions
diff --git a/packages/gateway/src/listeners/moderationAgent/utilities.ts b/packages/gateway/src/listeners/moderationAgent/utilities.ts
index 397afca..61899e2 100644
--- a/packages/gateway/src/listeners/moderationAgent/utilities.ts
+++ b/packages/gateway/src/listeners/moderationAgent/utilities.ts
@@ -139,19 +139,35 @@ ${context || "No recent message history available"}
 
 ${SERVER_RULES}
 `;
- const prompt = `You are an AI moderator for a Discord server. Your job is to analyze messages for rule violations with extreme precision and accuracy.
-
-CRITICAL INSTRUCTIONS:
-1. You MUST ONLY enforce the exact rules provided in the SERVER_RULES section above
-2. You MUST NOT make up, interpret, or assume any rules that are not explicitly stated
-3. You MUST NOT flag content based on general Discord guidelines, community standards, or your own moral judgments
-4. You MUST be conservative - only flag clear, unambiguous violations of the stated rules
-5. You MUST consider the full context including channel type, NSFW status, and message history
-6. You MUST respond with valid JSON in the exact format specified below
-7. You MUST NOT flag adult sexual content unless it explicitly violates a stated rule
-8. You MUST NOT interpret rules beyond their literal meaning
-9. You MUST NOT stretch or reinterpret rules to fit content that doesn't clearly violate them
-10. If content doesn't clearly violate a specific rule, set violation to FALSE
+ const prompt = `You are a Discord moderator. Analyze messages for rule violations.
+
+CRITICAL: Respond with ONLY valid JSON. No other text.
+
+JSON RULES:
+- Use double quotes for all strings
+- Escape quotes: "hello" becomes \\"hello\\"
+- Escape apostrophes: I'm becomes I\\'m
+- NEVER use single quotes in JSON
+
+EXAMPLES:
+CORRECT: "The message \\"hello\\" does not violate rules"
+CORRECT: "The message \\"I\\'m horny\\" does not violate rules"
+WRONG: "The message 'hello' does not violate rules"
+WRONG: "The message 'I'm horny' does not violate rules"
+
+RULES:
+- Rule 3: Flag media spoilers not tagged with ||spoiler||
+- Rule 6: Flag sexualization of underage characters only
+- Rule 8: Flag graphic sexual content in SFW channels only
+- Rule 13: Flag exact string "test_string_1337"
+
+BE CONSERVATIVE: Only flag clear violations. If unsure, set violation to false.
+
+IMPORTANT: When describing the message content, do NOT quote it directly. Instead, describe it without quotes. For example:
+- WRONG: "The message \\"hello\\" does not violate rules"
+- RIGHT: "The message contains hello and does not violate rules"
+- WRONG: "The message \\"<:emoji:123>\\" is an emoji"
+- RIGHT: "The message contains an emoji and does not violate rules"
 
 RULE-SPECIFIC GUIDELINES:
 - Rule 3 (Spoilers): ONLY flag if media spoilers (movies, TV, anime, manga, etc.) are not properly tagged with ||spoiler|| format. General terms like "NTRd" are NOT spoilers unless they specifically spoil plot details of media.
@@ -273,17 +289,40 @@ CONFIDENCE GUIDELINES:
 RESPONSE FORMAT:
 You must respond with ONLY valid JSON in this exact format.
 Do not include any text before or after the JSON:
+
+REQUIRED JSON FORMAT:
 {
- "violation": boolean,
- "rule": "Rule number and brief description if violation found, empty string if none",
- "severity": "low|medium|high|critical",
- "explanation": "Detailed explanation of the violation or why it's acceptable",
- "brief": "Short one-sentence explanation for why it was flagged (if violation) or why it's acceptable (if no violation)",
- "confidence": number
+ "violation": false,
+ "rule": "",
+ "severity": "",
+ "explanation": "Brief explanation here",
+ "brief": "One sentence summary",
+ "confidence": 100
+}
+
+EXAMPLES:
+{
+ "violation": false,
+ "rule": "",
+ "severity": "",
+ "explanation": "The message contains hello and does not violate any rules",
+ "brief": "No violation found",
+ "confidence": 100
+}
+
+{
+ "violation": false,
+ "rule": "",
+ "severity": "",
+ "explanation": "The message contains an emoji and does not violate any rules",
+ "brief": "No violation found",
+ "confidence": 100
 }
 
 CRITICAL: Your response must be ONLY the JSON object above. No explanations, no markdown, no additional text. Start with { and end with }. All string values must use double quotes, not single quotes. Escape any quotes in string values with backslashes.
 
+REMEMBER: NEVER use single quotes in JSON strings. Always use double quotes and escape them with backslashes.
+
 If no violation is found, set "violation" to false and provide a brief explanation of why the message is acceptable.
 
 Remember: Only enforce the exact rules provided. Do not make assumptions or interpretations beyond what is explicitly stated in the SERVER_RULES section. Adult sexual content between adults is NOT a violation unless it explicitly breaks a stated rule.`;
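The commit message mentions updated handling, but this diff only changes the prompt text. Below is a minimal TypeScript sketch, assuming the bot receives the model's reply as a plain string, of how a caller might parse a reply in the REQUIRED JSON FORMAT above. `ModerationVerdict`, `FIELD_TYPES` and `parseModerationVerdict` are hypothetical names, not code from this repository. One detail worth knowing: JSON (RFC 8259) defines no `\'` escape, and an apostrophe needs no escaping inside a double-quoted JSON string, so a reply that follows the prompt's "Escape apostrophes" rule literally makes `JSON.parse` throw. The sketch rewrites that escape before parsing.

```typescript
// Field names and types mirror the prompt's REQUIRED JSON FORMAT;
// the interface and helper names are hypothetical, not from the repo.
interface ModerationVerdict {
  violation: boolean;
  rule: string;
  severity: string;
  explanation: string;
  brief: string;
  confidence: number;
}

// Expected runtime type of each field, checked after JSON.parse.
const FIELD_TYPES: ReadonlyArray<[keyof ModerationVerdict, string]> = [
  ["violation", "boolean"],
  ["rule", "string"],
  ["severity", "string"],
  ["explanation", "string"],
  ["brief", "string"],
  ["confidence", "number"],
];

// Parse a model reply into a verdict, or return null if it is unusable.
function parseModerationVerdict(reply: string): ModerationVerdict | null {
  // Keep only the span from the first "{" to the last "}", which drops
  // stray prose or Markdown fences the model may add around the object.
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  // JSON has no \' escape, so the "I\'m" form the prompt asks for makes
  // JSON.parse throw. Rewrite \' to a bare apostrophe, but match escaped
  // backslashes (\\) first and keep them, so "\\'" keeps its meaning.
  const candidate = reply
    .slice(start, end + 1)
    .replace(/\\\\|\\'/g, (match) => (match === "\\'" ? "'" : match));

  let data: unknown;
  try {
    data = JSON.parse(candidate);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;

  // Reject the reply if any required field is missing or mistyped.
  const record = data as Record<string, unknown>;
  for (const [key, type] of FIELD_TYPES) {
    if (typeof record[key] !== type) return null;
  }
  return record as unknown as ModerationVerdict;
}
```

Returning `null` instead of throwing leaves the policy to the caller. Treating an unparseable reply as "no action" would line up with the prompt's BE CONSERVATIVE instruction, but that choice is an assumption here, not something the diff specifies.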