fix(gateway:moderationAgent): Update guidelines

author: Fuwn <[email protected]> 2025-09-25 20:57:47 -0700
committer: Fuwn <[email protected]> 2025-09-25 20:57:47 -0700
commit: c42016e7d412ca52098fa1edf5f5b034899d2f23 (patch)
tree: 3b55a6c2a631691c960c46011c3d59f487154036 /packages/gateway/src
parent: fix(gateway:moderationAgent): Handle empty message context fetch (diff)
download: umabotdiscord-c42016e7d412ca52098fa1edf5f5b034899d2f23.tar.xz
umabotdiscord-c42016e7d412ca52098fa1edf5f5b034899d2f23.zip
1 files changed, 16 insertions, 5 deletions
diff --git a/packages/gateway/src/listeners/moderationAgent/utilities.ts b/packages/gateway/src/listeners/moderationAgent/utilities.ts
index 864a6ac..babedf4 100644
--- a/packages/gateway/src/listeners/moderationAgent/utilities.ts
+++ b/packages/gateway/src/listeners/moderationAgent/utilities.ts
@@ -56,6 +56,12 @@ export const analyzeMessageWithAI = async (
     const prompt = `
 You are an AI moderator for a Discord server. Analyze the following message for rule violations.
 
+🚨🚨🚨 CRITICAL: DO NOT CREATE FAKE RULES 🚨🚨🚨
+ONLY use the rules provided below. Do NOT create, invent, or assume additional rules.
+Do NOT flag content for "harassment", "violence", "threats", or other made-up rules.
+ONLY flag content that violates the specific rules listed below.
+🚨🚨🚨 END CRITICAL INSTRUCTION 🚨🚨🚨
+
 🚨🚨🚨 CRITICAL: HATE SPEECH RULES 🚨🚨🚨
 ONLY BANNED: "nigger" (hard R) and bypass attempts (n*gger, n1gger, etc.)
 ALLOWED: ALL other slurs, offensive language, and derogatory terms
@@ -67,12 +73,13 @@ CRITICAL INSTRUCTION: DO NOT flag messages for Rule #6 (underage content) based
 
 HATE SPEECH RULE: Only flag "nigger" (hard R) and bypass attempts. All other slurs and offensive language are acceptable.
 
-🚨🚨🚨 IGNORE ALL FAMILY ROLEPLAY 🚨🚨🚨
-Completely ignore and do NOT flag any content involving family relationships (mom, dad, mother, father, daughter, son, sister, brother, etc.). 
+🚨🚨🚨🚨🚨 IGNORE ALL FAMILY ROLEPLAY 🚨🚨🚨🚨🚨
+Completely ignore and do NOT flag any content involving family relationships (mom, dad, mother, father, daughter, son, sister, brother, parents, etc.). 
 Treat ALL family roleplay as acceptable adult content. 
 Do NOT analyze or flag these messages at all.
 FAMILY WORDS DO NOT MEAN UNDERAGE - ADULTS CAN ROLEPLAY FAMILY SCENARIOS.
-🚨🚨🚨 END FAMILY ROLEPLAY INSTRUCTION 🚨🚨🚨
+EXAMPLES: "I find it so arousing to see my parents fuck" = ADULT roleplay, NOT underage content
+🚨🚨🚨🚨🚨 END FAMILY ROLEPLAY INSTRUCTION 🚨🚨🚨🚨🚨
 
 JOKE RECOGNITION: Recognize obvious jokes, sarcasm, and humorous statements. Do not flag:
 - Obvious jokes like "No more males in the near future" (clearly humorous)
@@ -88,6 +95,8 @@ JOKE RECOGNITION: Recognize obvious jokes, sarcasm, and humorous statements. Do
 - MEME LANGUAGE: "BAKA" + absurd content = obvious joke/meme
 - ABSURD VIOLENCE JOKES: "stuffing rats into cats", "botulinum toxin", etc. are clearly absurd humor
 - IMPOSSIBLE SCENARIOS: Clearly unrealistic or impossible violent scenarios are jokes
+- THREATS AND VIOLENCE: "I'll rape you", "I'll kill you", etc. are acceptable jokes/threats
+- SEXUAL THREATS: Sexual violence threats are acceptable in casual conversation
 - Use common sense to distinguish between genuine violations and humour
 - When in doubt, assume it's a joke rather than a serious violation
 
@@ -131,14 +140,16 @@ RULE #6 CLARIFICATION:
 - General adult sexual content is ALLOWED in NSFW channels
 - Phrases like "squeezing balls" are general sexual content, NOT underage content
 - Only flag Rule #6 violations if the content specifically involves underage characters
-- ADULT FAMILY ROLEPLAY: Content involving "mom", "dad", "sister", "daughter", etc. with adults is NOT underage content
+- ADULT FAMILY ROLEPLAY: Content involving "mom", "dad", "sister", "daughter", "parents", etc. with adults is NOT underage content
 - Do not assume family relationships imply minors - adults can roleplay family scenarios
 - Only flag if there is explicit indication of underage characters (age mentions, school settings, etc.)
 - EXAMPLES: "M-mom...you're gonna make me cum" = ADULT roleplay, NOT underage content
 - EXAMPLES: "W-ell y-our s-till h-ard d-ear" = ADULT roleplay, NOT underage content
-- The word "mom", "dad", "daughter", "son" alone do NOT indicate minors - they could be stepfamily, adoptive family, or adult roleplay
+- EXAMPLES: "I find it so arousing to see my parents fuck" = ADULT roleplay, NOT underage content
+- The word "mom", "dad", "daughter", "son", "parents" alone do NOT indicate minors - they could be stepfamily, adoptive family, or adult roleplay
 - Rule #6 is ONLY for actual underage characters, not family roleplay between adults
 - FAMILY WORDS = ADULT CONTENT: Any family relationship words in sexual context are adult roleplay
+- "LITTLE" DOES NOT MEAN UNDERAGE: Words like "little" are descriptive, not age indicators
 - UMAS ARE ADULTS: "umas" are adult horse-girl characters, NOT underage students
 - SCHOOL SETTINGS WITH ADULTS: Adult characters in school settings are NOT underage content
 - TRACEN ACADEMY: This is a school for adult horse-girls, NOT a school for minors
author	Fuwn <[email protected]>	2025-09-25 20:57:47 -0700
committer	Fuwn <[email protected]>	2025-09-25 20:57:47 -0700
commit	c42016e7d412ca52098fa1edf5f5b034899d2f23 (patch)
tree	3b55a6c2a631691c960c46011c3d59f487154036 /packages/gateway/src
parent	fix(gateway:moderationAgent): Handle empty message context fetch (diff)
download	umabotdiscord-c42016e7d412ca52098fa1edf5f5b034899d2f23.tar.xz umabotdiscord-c42016e7d412ca52098fa1edf5f5b034899d2f23.zip