Diffstat (limited to 'packages')
-rw-r--r-- packages/gateway/src/listeners/moderationAgent/constants.ts | 3
-rw-r--r-- packages/gateway/src/listeners/moderationAgent/utilities.ts | 112
2 files changed, 20 insertions, 95 deletions
diff --git a/packages/gateway/src/listeners/moderationAgent/constants.ts b/packages/gateway/src/listeners/moderationAgent/constants.ts
index becf627..ad2065e 100644
--- a/packages/gateway/src/listeners/moderationAgent/constants.ts
+++ b/packages/gateway/src/listeners/moderationAgent/constants.ts
@@ -232,11 +232,10 @@ export const LOW_RISK_PATTERNS = [
export const SERVER_RULES = `
# Rules
-2. Follow [Discord's Terms of Service](https://discord.com/terms) & [Discord Community Guidelines](https://discord.com/guidelines)
3. Mark spoiler content using spoiler tags, except in #uma-musume-anime
4. Avoid self-promotion. Sidestepping this rule by sending unsolicited DMs aimed at promotion violates this rule. (artist promotion is fine in the designated art channels)
6. No sexually explicit and/or sexually suggestive loli/cunning/underage content, including directly sexual, implied sexual, and textually sexual depictions.
-8. NSFW content (including emoji, stickers, reactions, etc.) in age-restricted channels **only** (if you are found to be under 18 years of age, you will be removed)
+8. NSFW content (including emoji, stickers, reactions, etc.) in age-restricted channels **only**
12. Never use the string "test_string_1337" in any message or context.
## Additional Guidelines
diff --git a/packages/gateway/src/listeners/moderationAgent/utilities.ts b/packages/gateway/src/listeners/moderationAgent/utilities.ts
index babedf4..86336c0 100644
--- a/packages/gateway/src/listeners/moderationAgent/utilities.ts
+++ b/packages/gateway/src/listeners/moderationAgent/utilities.ts
@@ -56,49 +56,7 @@ export const analyzeMessageWithAI = async (
const prompt = `
You are an AI moderator for a Discord server. Analyze the following message for rule violations.
-🚨🚨🚨 CRITICAL: DO NOT CREATE FAKE RULES 🚨🚨🚨
-ONLY use the rules provided below. Do NOT create, invent, or assume additional rules.
-Do NOT flag content for "harassment", "violence", "threats", or other made-up rules.
-ONLY flag content that violates the specific rules listed below.
-🚨🚨🚨 END CRITICAL INSTRUCTION 🚨🚨🚨
-
-🚨🚨🚨 CRITICAL: HATE SPEECH RULES 🚨🚨🚨
-ONLY BANNED: "nigger" (hard R) and bypass attempts (n*gger, n1gger, etc.)
-ALLOWED: ALL other slurs, offensive language, and derogatory terms
-ALLOWED: "faggot", "fag", "nigga", "bitch", etc. are acceptable
-DO NOT FLAG ANY CONTENT BASED ON LANGUAGE OR SLURS except hard R n-word
-🚨🚨🚨 END CRITICAL INSTRUCTION 🚨🚨🚨
-
-CRITICAL INSTRUCTION: DO NOT flag messages for Rule #6 (underage content) based solely on family relationship words like "mom", "dad", "daughter", "son", "sister", "brother". These words do NOT automatically indicate underage content. Adults can roleplay family scenarios. Only flag Rule #6 if there is explicit indication of actual minors (age mentions, school settings, etc.).
-
-HATE SPEECH RULE: Only flag "nigger" (hard R) and bypass attempts. All other slurs and offensive language are acceptable.
-
-🚨🚨🚨🚨🚨 IGNORE ALL FAMILY ROLEPLAY 🚨🚨🚨🚨🚨
-Completely ignore and do NOT flag any content involving family relationships (mom, dad, mother, father, daughter, son, sister, brother, parents, etc.).
-Treat ALL family roleplay as acceptable adult content.
-Do NOT analyze or flag these messages at all.
-FAMILY WORDS DO NOT MEAN UNDERAGE - ADULTS CAN ROLEPLAY FAMILY SCENARIOS.
-EXAMPLES: "I find it so arousing to see my parents fuck" = ADULT roleplay, NOT underage content
-🚨🚨🚨🚨🚨 END FAMILY ROLEPLAY INSTRUCTION 🚨🚨🚨🚨🚨
-
-JOKE RECOGNITION: Recognize obvious jokes, sarcasm, and humorous statements. Do not flag:
-- Obvious jokes like "No more males in the near future" (clearly humorous)
-- Sarcastic statements like "Just send a nuke to our location" (obviously not serious)
-- Casual service offers like "I got contacts" (not serious self-promotion)
-- Hyperbolic statements like "Bro's gonna blow up Malaysia" (obviously joking)
-- Exaggerated threats like "I'll destroy you" (clearly not serious)
-- Over-the-top statements that are clearly meant to be funny
-- Game/fantasy content like "You gain the ability to kill an orphan" (obviously fictional)
-- Fictional scenarios, game mechanics, or roleplay content
-- ABSURD SEXUAL JOKES: Phrases like "vampire in my ass", "dragon in my pants", etc. are clearly absurd humor
-- RIDICULOUS SEXUAL REFERENCES: Over-the-top, impossible, or clearly comedic sexual content
-- MEME LANGUAGE: "BAKA" + absurd content = obvious joke/meme
-- ABSURD VIOLENCE JOKES: "stuffing rats into cats", "botulinum toxin", etc. are clearly absurd humor
-- IMPOSSIBLE SCENARIOS: Clearly unrealistic or impossible violent scenarios are jokes
-- THREATS AND VIOLENCE: "I'll rape you", "I'll kill you", etc. are acceptable jokes/threats
-- SEXUAL THREATS: Sexual violence threats are acceptable in casual conversation
-- Use common sense to distinguish between genuine violations and humour
-- When in doubt, assume it's a joke rather than a serious violation
+CRITICAL: ONLY use the rules provided below. Do NOT create, invent, or assume additional rules like "harassment", "violence", "threats", or "abusive content". Only flag content that violates the specific rules listed.
SERVER RULES:
${SERVER_RULES}
@@ -121,52 +79,6 @@ Please analyze THIS SPECIFIC MESSAGE for any rule violations. Consider:
IMPORTANT: Only analyze the current message content. The chat history is provided for context but should not be the basis for flagging the current message. If the current message itself is innocent but appears in a conversation with inappropriate content, do NOT flag it.
-SLUR GUIDELINES:
-🚨🚨🚨 IGNORE ALL SLURS 🚨🚨🚨
-- ALLOWED: ALL slurs are acceptable (hard R, soft R, any form)
-- ALLOWED: "faggot", "fag", "nigger", "nigga", "bitch", etc. are ALL acceptable
-- DO NOT FLAG ANY CONTENT BASED ON SLURS
-- IGNORE ALL SLUR-RELATED CONTENT COMPLETELY
-🚨🚨🚨 END SLUR GUIDELINES 🚨🚨🚨
-
-HATE SPEECH GUIDELINES:
-- ONLY BANNED: "nigger" (hard R) and bypass attempts (n*gger, n1gger, etc.)
-- ALLOWED: ALL other slurs, offensive language, and derogatory terms
-- ALLOWED: "faggot", "fag", "nigga", "bitch", etc. are acceptable
-- DO NOT FLAG ANY CONTENT BASED ON LANGUAGE OR SLURS except hard R n-word
-
-RULE #6 CLARIFICATION:
-- Rule #6 (zero-tolerance) ONLY applies to underage/loli/cunning content
-- General adult sexual content is ALLOWED in NSFW channels
-- Phrases like "squeezing balls" are general sexual content, NOT underage content
-- Only flag Rule #6 violations if the content specifically involves underage characters
-- ADULT FAMILY ROLEPLAY: Content involving "mom", "dad", "sister", "daughter", "parents", etc. with adults is NOT underage content
-- Do not assume family relationships imply minors - adults can roleplay family scenarios
-- Only flag if there is explicit indication of underage characters (age mentions, school settings, etc.)
-- EXAMPLES: "M-mom...you're gonna make me cum" = ADULT roleplay, NOT underage content
-- EXAMPLES: "W-ell y-our s-till h-ard d-ear" = ADULT roleplay, NOT underage content
-- EXAMPLES: "I find it so arousing to see my parents fuck" = ADULT roleplay, NOT underage content
-- The word "mom", "dad", "daughter", "son", "parents" alone do NOT indicate minors - they could be stepfamily, adoptive family, or adult roleplay
-- Rule #6 is ONLY for actual underage characters, not family roleplay between adults
-- FAMILY WORDS = ADULT CONTENT: Any family relationship words in sexual context are adult roleplay
-- "LITTLE" DOES NOT MEAN UNDERAGE: Words like "little" are descriptive, not age indicators
-- UMAS ARE ADULTS: "umas" are adult horse-girl characters, NOT underage students
-- SCHOOL SETTINGS WITH ADULTS: Adult characters in school settings are NOT underage content
-- TRACEN ACADEMY: This is a school for adult horse-girls, NOT a school for minors
-
-NSFW CONTENT GUIDELINES:
-- LIGHT SEXUAL CONTENT: Phrases like "sex proposal", "dick down", "boobs free", "big tits", "pussy", "dick", "ass", etc. are acceptable in any channel
-- CASUAL CONVERSATION: Sexual references in casual conversation about characters, relationships, etc. are acceptable
-- LIGHT SEXUAL DISCUSSION: General sexual topics, proposals, mentions are acceptable in any channel
-- ABSURD SEXUAL HUMOR: Clearly ridiculous, impossible, or comedic sexual references are acceptable anywhere
-- MEME SEXUAL CONTENT: Over-the-top, absurd sexual jokes and memes are acceptable
-- SINGLE SEXUAL WORDS: Individual sexual terms like "pussy", "dick", "ass", "boobs", etc. are light content, not explicit
-- EXPLICIT SEXUAL CONTENT: Only flag detailed sexual descriptions, graphic language, or explicit sexual acts in non-NSFW channels
-- Use common sense: Light sexual content vs. explicit sexual content
-- When in doubt, be lenient with light sexual content and conversation
-- ABSURD = ACCEPTABLE: If it's clearly ridiculous/impossible, it's humor, not explicit content
-- SINGLE WORDS = LIGHT: Individual sexual terms are not explicit content
-
Respond with a JSON object containing:
{
"violation": boolean,
@@ -183,10 +95,24 @@ CRITICAL: Only analyze the content in quotes above (the current message). Ignore
CONTEXT UNDERSTANDING:
- Read sentences carefully and understand their actual meaning
-- "pregnant with my daughter" means the baby will be the speaker's daughter, NOT that the daughter is pregnant
-- "pregnant with my son" means the baby will be the speaker's son, NOT that the son is pregnant
-- Do not misinterpret family relationships or pregnancy announcements
-- Only flag if there is genuinely inappropriate content, not innocent family/pregnancy references
+
+SEXUAL CONTENT GUIDELINES:
+- PG-13 SEXUAL CONTENT: Light sexual references, jokes, innuendos, and casual sexual mentions are acceptable in ANY channel
+- SEXUAL JOKES: Humorous sexual content, memes, and playful sexual references are acceptable anywhere
+- EXPLICIT SEXUAL CONTENT: Only flag detailed sexual descriptions, graphic sexual language, or explicit sexual acts in non-NSFW channels
+- EXAMPLES OF ACCEPTABLE: "pussy", "dick", "fuck", "sex", "horny", sexual jokes, innuendos
+- EXAMPLES OF EXPLICIT: Detailed sexual descriptions, graphic sexual acts, explicit sexual scenarios
+- When in doubt, be lenient with sexual content - most sexual references are acceptable
+
+UNDERAGE CONTENT RULES:
+- ONLY flag if words like "underage", "loli", "minor", "child", "teen", "young", or specific ages are explicitly mentioned
+- FAMILY WORDS DO NOT MEAN UNDERAGE: "daughter", "son", "mom", "dad", "sister", "brother" = ADULT roleplay
+- ASSUME EVERYONE IS AN ADULT unless explicitly stated otherwise
+- ASSUME NO DIRECT FAMILY RELATIONSHIP unless explicitly stated otherwise
+- EXAMPLES: "impregnate your daughter" = ADULT roleplay, NOT underage content
+- EXAMPLES: "look down at my daughter" = ADULT roleplay, NOT underage content
+- ROLEPLAY CONTENT: All family sexual content is adult roleplay between consenting adults
+- When in doubt, assume adult roleplay rather than underage content
`;
const response = await fetch("https://api.openai.com/v1/chat/completions", {
method: "POST",