summaryrefslogtreecommitdiff
path: root/packages
diff options
context:
space:
mode:
authorFuwn <[email protected]>2025-09-25 19:18:12 -0700
committerFuwn <[email protected]>2025-09-25 19:18:12 -0700
commit9cb56ca79f8fd2299753168a64eafedcd0ac8471 (patch)
tree722c6b6639648a15c281eb37af32794a043fcf6c /packages
parentfix(gateway:moderationAgent): Update joke processing (diff)
downloadumabotdiscord-9cb56ca79f8fd2299753168a64eafedcd0ac8471.tar.xz
umabotdiscord-9cb56ca79f8fd2299753168a64eafedcd0ac8471.zip
fix(gateway:moderationAgent): Update hate speech flagging
Diffstat (limited to 'packages')
-rw-r--r--packages/gateway/src/listeners/moderationAgent/utilities.ts28
1 files changed, 18 insertions, 10 deletions
diff --git a/packages/gateway/src/listeners/moderationAgent/utilities.ts b/packages/gateway/src/listeners/moderationAgent/utilities.ts
index ec7efe0..717c1cc 100644
--- a/packages/gateway/src/listeners/moderationAgent/utilities.ts
+++ b/packages/gateway/src/listeners/moderationAgent/utilities.ts
@@ -52,7 +52,12 @@ export const analyzeMessageWithAI = async (
const prompt = `
You are an AI moderator for a Discord server. Analyze the following message for rule violations.
-ABSOLUTE RULE: NEVER flag messages for hate speech, slurs, or offensive language. Words like "faggot", "fag", "nigger", "nigga", "bitch", etc. are ALL acceptable. Do not flag ANY content based on language or slurs.
+🚨🚨🚨 CRITICAL: IGNORE ALL HATE SPEECH 🚨🚨🚨
+NEVER flag messages for hate speech, slurs, or offensive language.
+Words like "faggot", "fag", "nigger", "nigga", "bitch", etc. are ALL acceptable.
+Do not flag ANY content based on language or slurs.
+IGNORE Rule #5 (hate speech) COMPLETELY.
+🚨🚨🚨 END CRITICAL INSTRUCTION 🚨🚨🚨
CRITICAL INSTRUCTION: DO NOT flag messages for Rule #6 (underage content) based solely on family relationship words like "mom", "dad", "daughter", "son", "sister", "brother". These words do NOT automatically indicate underage content. Adults can roleplay family scenarios. Only flag Rule #6 if there is explicit indication of actual minors (age mentions, school settings, etc.).
@@ -97,18 +102,21 @@ Please analyze THIS SPECIFIC MESSAGE for any rule violations. Consider:
IMPORTANT: Only analyze the current message content. The chat history is provided for context but should not be the basis for flagging the current message. If the current message itself is innocent but appears in a conversation with inappropriate content, do NOT flag it.
SLUR GUIDELINES:
-- HARD SLURS (with hard R, full spelling): These are prohibited and should be flagged for deletion
-- SOFT SLURS (with soft R, censored, or abbreviated): These are generally acceptable in casual conversation
-- Context matters: Consider if the word is being used in a hateful manner vs. casual/reclaimed usage
-- When in doubt, err on the side of caution for hard slurs but be more lenient with soft slurs
+🚨🚨🚨 IGNORE ALL SLURS 🚨🚨🚨
+- ALLOWED: ALL slurs are acceptable (hard R, soft R, any form)
+- ALLOWED: "faggot", "fag", "nigger", "nigga", "bitch", etc. are ALL acceptable
+- DO NOT FLAG ANY CONTENT BASED ON SLURS
+- IGNORE ALL SLUR-RELATED CONTENT COMPLETELY
+🚨🚨🚨 END SLUR GUIDELINES 🚨🚨🚨
HATE SPEECH GUIDELINES:
-- ALLOWED: All slurs, offensive language, and derogatory terms are acceptable EXCEPT:
-- PROHIBITED: "nigger" (hard R) and any bypass attempts (n*gger, n1gger, etc.)
-- ALLOWED: "nigga" (soft R) and other forms are acceptable
-- ALLOWED: All other slurs (fag, faggot, bitch, etc.) are acceptable
+🚨🚨🚨 IGNORE ALL HATE SPEECH 🚨🚨🚨
+- ALLOWED: ALL slurs, offensive language, and derogatory terms are acceptable
+- ALLOWED: "faggot", "fag", "nigger", "nigga", "bitch", etc. are ALL acceptable
- ALLOWED: Single words, abbreviations, or any form of offensive language
-- Only flag the hard R n-word and bypass attempts, everything else is acceptable
+- DO NOT FLAG ANY CONTENT BASED ON LANGUAGE OR SLURS
+- IGNORE Rule #5 (hate speech) COMPLETELY
+🚨🚨🚨 END HATE SPEECH GUIDELINES 🚨🚨🚨
RULE #6 CLARIFICATION:
- Rule #6 (zero-tolerance) ONLY applies to underage/loli/cunning content