fix(listeners:moderationAgent): Update prompt and handling

author: Fuwn <[email protected]> 2025-10-01 17:01:18 -0700
committer: Fuwn <[email protected]> 2025-10-01 17:01:18 -0700
commit: 23ce506b135e0941d028bf6c725cbedb4857fbff (patch)
tree: a41946af25ccb839cb1e616ee209b6bcd8b73b28 /packages
parent: fix(listeners:moderationAgent): Update request parameters and response handling (diff)
1 files changed, 58 insertions, 19 deletions
diff --git a/packages/gateway/src/listeners/moderationAgent/utilities.ts b/packages/gateway/src/listeners/moderationAgent/utilities.ts
index 397afca..61899e2 100644
--- a/packages/gateway/src/listeners/moderationAgent/utilities.ts
+++ b/packages/gateway/src/listeners/moderationAgent/utilities.ts
@@ -139,19 +139,35 @@ ${context || "No recent message history available"}
 ${SERVER_RULES}
 `;
 
-    const prompt = `You are an AI moderator for a Discord server. Your job is to analyze messages for rule violations with extreme precision and accuracy.
-
-CRITICAL INSTRUCTIONS:
-1. You MUST ONLY enforce the exact rules provided in the SERVER_RULES section above
-2. You MUST NOT make up, interpret, or assume any rules that are not explicitly stated
-3. You MUST NOT flag content based on general Discord guidelines, community standards, or your own moral judgments
-4. You MUST be conservative - only flag clear, unambiguous violations of the stated rules
-5. You MUST consider the full context including channel type, NSFW status, and message history
-6. You MUST respond with valid JSON in the exact format specified below
-7. You MUST NOT flag adult sexual content unless it explicitly violates a stated rule
-8. You MUST NOT interpret rules beyond their literal meaning
-9. You MUST NOT stretch or reinterpret rules to fit content that doesn't clearly violate them
-10. If content doesn't clearly violate a specific rule, set violation to FALSE
+    const prompt = `You are a Discord moderator. Analyze messages for rule violations.
+
+CRITICAL: Respond with ONLY valid JSON. No other text.
+
+JSON RULES:
+- Use double quotes for all strings
+- Escape quotes: "hello" becomes \\"hello\\"
+- Escape apostrophes: I'm becomes I\\'m
+- NEVER use single quotes in JSON
+
+EXAMPLES:
+CORRECT: "The message \\"hello\\" does not violate rules"
+CORRECT: "The message \\"I\\'m horny\\" does not violate rules"
+WRONG: "The message 'hello' does not violate rules"
+WRONG: "The message 'I'm horny' does not violate rules"
+
+RULES:
+- Rule 3: Flag media spoilers not tagged with ||spoiler||
+- Rule 6: Flag sexualization of underage characters only
+- Rule 8: Flag graphic sexual content in SFW channels only
+- Rule 13: Flag exact string "test_string_1337"
+
+BE CONSERVATIVE: Only flag clear violations. If unsure, set violation to false.
+
+IMPORTANT: When describing the message content, do NOT quote it directly. Instead, describe it without quotes. For example:
+- WRONG: "The message \\"hello\\" does not violate rules"
+- RIGHT: "The message contains hello and does not violate rules"
+- WRONG: "The message \\"<:emoji:123>\\" is an emoji"
+- RIGHT: "The message contains an emoji and does not violate rules"
 
 RULE-SPECIFIC GUIDELINES:
 - Rule 3 (Spoilers): ONLY flag if media spoilers (movies, TV, anime, manga, etc.) are not properly tagged with ||spoiler|| format. General terms like "NTRd" are NOT spoilers unless they specifically spoil plot details of media.
@@ -273,17 +289,40 @@ CONFIDENCE GUIDELINES:
 
 RESPONSE FORMAT:
 You must respond with ONLY valid JSON in this exact format. Do not include any text before or after the JSON:
+
+REQUIRED JSON FORMAT:
 {
-  "violation": boolean,
-  "rule": "Rule number and brief description if violation found, empty string if none",
-  "severity": "low|medium|high|critical",
-  "explanation": "Detailed explanation of the violation or why it's acceptable",
-  "brief": "Short one-sentence explanation for why it was flagged (if violation) or why it's acceptable (if no violation)",
-  "confidence": number
+  "violation": false,
+  "rule": "",
+  "severity": "",
+  "explanation": "Brief explanation here",
+  "brief": "One sentence summary",
+  "confidence": 100
+}
+
+EXAMPLES:
+{
+  "violation": false,
+  "rule": "",
+  "severity": "",
+  "explanation": "The message contains hello and does not violate any rules",
+  "brief": "No violation found",
+  "confidence": 100
+}
+
+{
+  "violation": false,
+  "rule": "",
+  "severity": "",
+  "explanation": "The message contains an emoji and does not violate any rules",
+  "brief": "No violation found",
+  "confidence": 100
 }
 
 CRITICAL: Your response must be ONLY the JSON object above. No explanations, no markdown, no additional text. Start with { and end with }. All string values must use double quotes, not single quotes. Escape any quotes in string values with backslashes.
 
+REMEMBER: NEVER use single quotes in JSON strings. Always use double quotes and escape them with backslashes.
+
 If no violation is found, set "violation" to false and provide a brief explanation of why the message is acceptable.
 
 Remember: Only enforce the exact rules provided. Do not make assumptions or interpretations beyond what is explicitly stated in the SERVER_RULES section. Adult sexual content between adults is NOT a violation unless it explicitly breaks a stated rule.`;
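
Note on the escaping rules in the new prompt: JSON defines no `\'` escape, so a reply that follows the "I'm becomes I\'m" instruction literally is rejected by `JSON.parse`. The code that consumes the model's reply is not part of this diff, so the following is only a minimal sketch of how a caller might tolerate that case and validate the fields listed under REQUIRED JSON FORMAT. The names `ModerationVerdict`, `stripInvalidApostropheEscapes`, and `parseVerdict` are hypothetical and do not come from the repository.

```typescript
// Hypothetical type mirroring the fields in the prompt's REQUIRED JSON FORMAT.
interface ModerationVerdict {
  violation: boolean;
  rule: string;
  severity: string;
  explanation: string;
  brief: string;
  confidence: number;
}

// Rewrites the invalid \' escape to a plain apostrophe. Matching every
// backslash pair in order keeps valid escapes such as \" and \\ untouched.
function stripInvalidApostropheEscapes(raw: string): string {
  return raw.replace(/\\([\s\S])/g, (match: string, ch: string) =>
    ch === "'" ? "'" : match,
  );
}

// Returns a validated verdict, or null if the reply is not usable.
function parseVerdict(raw: string): ModerationVerdict | null {
  // Keep only the outermost {...} in case the model adds stray text.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  let parsed: unknown;
  try {
    parsed = JSON.parse(stripInvalidApostropheEscapes(raw.slice(start, end + 1)));
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const { violation, rule, severity, explanation, brief, confidence } =
    parsed as Record<string, unknown>;
  if (
    typeof violation !== "boolean" ||
    typeof rule !== "string" ||
    typeof severity !== "string" ||
    typeof explanation !== "string" ||
    typeof brief !== "string" ||
    typeof confidence !== "number"
  ) {
    return null;
  }
  return { violation, rule, severity, explanation, brief, confidence };
}

// Example: a reply that escapes the apostrophe the way the prompt asks.
const reply =
  '{"violation": false, "rule": "", "severity": "", ' +
  '"explanation": "The message contains I\\\'m fine", ' +
  '"brief": "No violation found", "confidence": 100}';
console.log(parseVerdict(reply));
```

Returning `null` rather than throwing lets the caller treat an unparseable reply as "no verdict". A simpler alternative would be to drop the apostrophe rule from the prompt, since apostrophes need no escaping inside double-quoted JSON strings.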