author: Fuwn <[email protected]> 2025-10-01 17:01:18 -0700
committer: Fuwn <[email protected]> 2025-10-01 17:01:18 -0700
commit: 23ce506b135e0941d028bf6c725cbedb4857fbff
tree: a41946af25ccb839cb1e616ee209b6bcd8b73b28 /packages
parent: fix(listeners:moderationAgent): Update request parameters and response handling
fix(listeners:moderationAgent): Update prompt and handling
Diffstat (limited to 'packages')
-rw-r--r-- packages/gateway/src/listeners/moderationAgent/utilities.ts | 77
1 files changed, 58 insertions, 19 deletions
diff --git a/packages/gateway/src/listeners/moderationAgent/utilities.ts b/packages/gateway/src/listeners/moderationAgent/utilities.ts
index 397afca..61899e2 100644
--- a/packages/gateway/src/listeners/moderationAgent/utilities.ts
+++ b/packages/gateway/src/listeners/moderationAgent/utilities.ts
@@ -139,19 +139,35 @@ ${context || "No recent message history available"}
${SERVER_RULES}
`;
- const prompt = `You are an AI moderator for a Discord server. Your job is to analyze messages for rule violations with extreme precision and accuracy.
-
-CRITICAL INSTRUCTIONS:
-1. You MUST ONLY enforce the exact rules provided in the SERVER_RULES section above
-2. You MUST NOT make up, interpret, or assume any rules that are not explicitly stated
-3. You MUST NOT flag content based on general Discord guidelines, community standards, or your own moral judgments
-4. You MUST be conservative - only flag clear, unambiguous violations of the stated rules
-5. You MUST consider the full context including channel type, NSFW status, and message history
-6. You MUST respond with valid JSON in the exact format specified below
-7. You MUST NOT flag adult sexual content unless it explicitly violates a stated rule
-8. You MUST NOT interpret rules beyond their literal meaning
-9. You MUST NOT stretch or reinterpret rules to fit content that doesn't clearly violate them
-10. If content doesn't clearly violate a specific rule, set violation to FALSE
+ const prompt = `You are a Discord moderator. Analyze messages for rule violations.
+
+CRITICAL: Respond with ONLY valid JSON. No other text.
+
+JSON RULES:
+- Use double quotes for all strings
+- Escape quotes: "hello" becomes \\"hello\\"
+- Escape apostrophes: I'm becomes I\\'m
+- NEVER use single quotes in JSON
+
+EXAMPLES:
+CORRECT: "The message \\"hello\\" does not violate rules"
+CORRECT: "The message \\"I\\'m horny\\" does not violate rules"
+WRONG: "The message 'hello' does not violate rules"
+WRONG: "The message 'I'm horny' does not violate rules"
+
+RULES:
+- Rule 3: Flag media spoilers not tagged with ||spoiler||
+- Rule 6: Flag sexualization of underage characters only
+- Rule 8: Flag graphic sexual content in SFW channels only
+- Rule 13: Flag exact string "test_string_1337"
+
+BE CONSERVATIVE: Only flag clear violations. If unsure, set violation to false.
+
+IMPORTANT: When describing the message content, do NOT quote it directly. Instead, describe it without quotes. For example:
+- WRONG: "The message \\"hello\\" does not violate rules"
+- RIGHT: "The message contains hello and does not violate rules"
+- WRONG: "The message \\"<:emoji:123>\\" is an emoji"
+- RIGHT: "The message contains an emoji and does not violate rules"
RULE-SPECIFIC GUIDELINES:
- Rule 3 (Spoilers): ONLY flag if media spoilers (movies, TV, anime, manga, etc.) are not properly tagged with ||spoiler|| format. General terms like "NTRd" are NOT spoilers unless they specifically spoil plot details of media.
@@ -273,17 +289,40 @@ CONFIDENCE GUIDELINES:
RESPONSE FORMAT:
You must respond with ONLY valid JSON in this exact format. Do not include any text before or after the JSON:
+
+REQUIRED JSON FORMAT:
{
- "violation": boolean,
- "rule": "Rule number and brief description if violation found, empty string if none",
- "severity": "low|medium|high|critical",
- "explanation": "Detailed explanation of the violation or why it's acceptable",
- "brief": "Short one-sentence explanation for why it was flagged (if violation) or why it's acceptable (if no violation)",
- "confidence": number
+ "violation": false,
+ "rule": "",
+ "severity": "",
+ "explanation": "Brief explanation here",
+ "brief": "One sentence summary",
+ "confidence": 100
+}
+
+EXAMPLES:
+{
+ "violation": false,
+ "rule": "",
+ "severity": "",
+ "explanation": "The message contains hello and does not violate any rules",
+ "brief": "No violation found",
+ "confidence": 100
+}
+
+{
+ "violation": false,
+ "rule": "",
+ "severity": "",
+ "explanation": "The message contains an emoji and does not violate any rules",
+ "brief": "No violation found",
+ "confidence": 100
}
CRITICAL: Your response must be ONLY the JSON object above. No explanations, no markdown, no additional text. Start with { and end with }. All string values must use double quotes, not single quotes. Escape any quotes in string values with backslashes.
+REMEMBER: NEVER use single quotes in JSON strings. Always use double quotes and escape them with backslashes.
+
If no violation is found, set "violation" to false and provide a brief explanation of why the message is acceptable.
Remember: Only enforce the exact rules provided. Do not make assumptions or interpretations beyond what is explicitly stated in the SERVER_RULES section. Adult sexual content between adults is NOT a violation unless it explicitly breaks a stated rule.`;
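
The prompt above asks the model to write apostrophes as `\'`, but JSON defines no `\'` escape, so `JSON.parse` rejects a reply that follows that instruction literally. A minimal sketch of a tolerant parser on the consuming side is shown below. It assumes the reply shape from the REQUIRED JSON FORMAT section; `ModerationVerdict` and `parseModerationResponse` are hypothetical names for illustration and do not come from the repository.

```typescript
// Hypothetical shape, inferred from the REQUIRED JSON FORMAT section of the prompt.
interface ModerationVerdict {
  violation: boolean;
  rule: string;
  severity: string;
  explanation: string;
  brief: string;
  confidence: number;
}

// Hypothetical helper (not part of the repository): returns a verdict,
// or null when the reply cannot be parsed or has the wrong shape.
function parseModerationResponse(raw: string): ModerationVerdict | null {
  // Keep only the outermost {...} in case the model adds prose or code fences.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  // Walk escape pairs left to right and turn \' into a plain apostrophe.
  // Every other escape (\\, \", \n, ...) is left untouched.
  const candidate = raw
    .slice(start, end + 1)
    .replace(/\\([\s\S])/g, (match: string, ch: string) => (ch === "'" ? "'" : match));

  let parsed: unknown;
  try {
    parsed = JSON.parse(candidate);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  // Check each field's type before trusting the reply.
  const value = parsed as Record<string, unknown>;
  const { violation, rule, severity, explanation, brief, confidence } = value;
  if (
    typeof violation !== "boolean" ||
    typeof rule !== "string" ||
    typeof severity !== "string" ||
    typeof explanation !== "string" ||
    typeof brief !== "string" ||
    typeof confidence !== "number"
  ) {
    return null;
  }
  return { violation, rule, severity, explanation, brief, confidence };
}
```

Returning `null` instead of throwing lets the caller treat an unparseable reply as "no verdict" and skip moderation for that message, which matches the prompt's conservative stance. Whether the actual listener handles failures this way is not shown in this diff.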