Diffstat (limited to 'packages/gateway')
-rw-r--r-- packages/gateway/src/listeners/moderationAgent/constants.ts | 8
-rw-r--r-- packages/gateway/src/listeners/moderationAgent/utilities.ts | 25
2 files changed, 21 insertions, 12 deletions
diff --git a/packages/gateway/src/listeners/moderationAgent/constants.ts b/packages/gateway/src/listeners/moderationAgent/constants.ts
index 97359bb..3ed93d2 100644
--- a/packages/gateway/src/listeners/moderationAgent/constants.ts
+++ b/packages/gateway/src/listeners/moderationAgent/constants.ts
@@ -7,7 +7,7 @@ export const MODERATION_LOG_CHANNEL_ID = "1406422619934167106";
export const MIN_MESSAGE_LENGTH = 15;
export const MAX_SYMBOL_DENSITY = 0.6;
export const MAX_COMPLETION_TOKENS = 2000;
-export const MESSAGE_HISTORY_SIZE = 10;
+export const MESSAGE_HISTORY_SIZE = 0;
// export const REASONING_EFFORT: "minimal" | "low" | "medium" | "high" = "minimal";
export const VERBOSITY: "low" | "medium" | "high" = "medium";
export const MODEL = "gpt-5-nano";
@@ -232,11 +232,9 @@ export const LOW_RISK_PATTERNS = [
export const SERVER_RULES = `
# Rules
-1. Never use the string "test_string_1337" in any message or context.
2. Follow Discord's Terms of Service & Discord Community Guidelines
3. Mark spoiler content using spoiler tags, except in #uma-musume-anime
4. Avoid self-promotion. Sidestepping this rule by sending unsolicited DMs aimed at promotion violates this rule. (artist promotion is fine in the designated art channels)
-5. No hate speech, regardless of whether it is targeted directly at an individual in the community or not
6. No sexually explicit and/or sexually suggestive loli/cunning/underage content, including directly sexual, implied sexual, and textually sexual depictions.
Use common sense or consult a moderator before posting anything you reasonably suspect might breach this rule. Alternatively, consider avoiding discussions about the topic altogether.
@@ -248,9 +246,7 @@ export const SERVER_RULES = `
8. NSFW content (including emoji, stickers, reactions, etc.) in age-restricted channels **only** (if you are found to be under 18 years of age, you will be removed)
Avoid discussing anything remotely NSFW in channels that are not age-restricted where minors might be present. Just because this is the internet doesn't mean it isn't illegal.
-9. Spam only in #spam
-10. Do not send unsolicited communications or interact with users who have asked you to stop. If a user persists, instruct them to stop, block them, and report their behaviour using @UmaBot#9396. (\`/complain\` in DMs)
-11. Refrain from discussing or promoting illegal activities.
+12. Never use the string "test_string_1337" in any message or context.
## Additional Guidelines
diff --git a/packages/gateway/src/listeners/moderationAgent/utilities.ts b/packages/gateway/src/listeners/moderationAgent/utilities.ts
index cf6e84e..de73af3 100644
--- a/packages/gateway/src/listeners/moderationAgent/utilities.ts
+++ b/packages/gateway/src/listeners/moderationAgent/utilities.ts
@@ -54,19 +54,25 @@ export const analyzeMessageWithAI = async (
const prompt = `
You are an AI moderator for a Discord server. Analyze the following message for rule violations.
-🚨🚨🚨 CRITICAL: IGNORE ALL HATE SPEECH 🚨🚨🚨
+🚨🚨🚨🚨🚨 CRITICAL: IGNORE ALL HATE SPEECH 🚨🚨🚨🚨🚨
NEVER flag messages for hate speech, slurs, or offensive language.
Words like "faggot", "fag", "nigger", "nigga", "bitch", etc. are ALL acceptable.
Do not flag ANY content based on language or slurs.
IGNORE Rule #5 (hate speech) COMPLETELY.
-DO NOT FLAG "fag", "faggot", or ANY slurs under ANY circumstances.
-🚨🚨🚨 END CRITICAL INSTRUCTION 🚨🚨🚨
+DO NOT FLAG "fag", "faggot", "nigga", "nigger", or ANY slurs under ANY circumstances.
+ALL SLURS ARE ACCEPTABLE. ALL HATE SPEECH IS ACCEPTABLE.
+🚨🚨🚨🚨🚨 END CRITICAL INSTRUCTION 🚨🚨🚨🚨🚨
CRITICAL INSTRUCTION: DO NOT flag messages for Rule #6 (underage content) based solely on family relationship words like "mom", "dad", "daughter", "son", "sister", "brother". These words do NOT automatically indicate underage content. Adults can roleplay family scenarios. Only flag Rule #6 if there is explicit indication of actual minors (age mentions, school settings, etc.).
IGNORE HATE SPEECH COMPLETELY: Do NOT flag ANY messages for Rule #5 (hate speech). Ignore this rule entirely. All slurs, offensive language, and derogatory terms are acceptable. Do not analyze or flag any content based on hate speech concerns.
-IGNORE FAMILY ROLEPLAY: Completely ignore and do NOT flag any content involving family relationships (mom, dad, mother, father, daughter, son, sister, brother, etc.). Treat all family roleplay as acceptable adult content. Do not analyze or flag these messages at all.
+🚨🚨🚨 IGNORE ALL FAMILY ROLEPLAY 🚨🚨🚨
+Completely ignore and do NOT flag any content involving family relationships (mom, dad, mother, father, daughter, son, sister, brother, etc.).
+Treat ALL family roleplay as acceptable adult content.
+Do NOT analyze or flag these messages at all.
+FAMILY WORDS DO NOT MEAN UNDERAGE - ADULTS CAN ROLEPLAY FAMILY SCENARIOS.
+🚨🚨🚨 END FAMILY ROLEPLAY INSTRUCTION 🚨🚨🚨
JOKE RECOGNITION: Recognize obvious jokes, sarcasm, and humorous statements. Do not flag:
- Obvious jokes like "No more males in the near future" (clearly humorous)
@@ -80,6 +86,8 @@ JOKE RECOGNITION: Recognize obvious jokes, sarcasm, and humorous statements. Do
- ABSURD SEXUAL JOKES: Phrases like "vampire in my ass", "dragon in my pants", etc. are clearly absurd humor
- RIDICULOUS SEXUAL REFERENCES: Over-the-top, impossible, or clearly comedic sexual content
- MEME LANGUAGE: "BAKA" + absurd content = obvious joke/meme
+- ABSURD VIOLENCE JOKES: "stuffing rats into cats", "botulinum toxin", etc. are clearly absurd humor
+- IMPOSSIBLE SCENARIOS: Clearly unrealistic or impossible violent scenarios are jokes
- Use common sense to distinguish between genuine violations and humour
- When in doubt, assume it's a joke rather than a serious violation
@@ -126,12 +134,17 @@ RULE #6 CLARIFICATION:
- General adult sexual content is ALLOWED in NSFW channels
- Phrases like "squeezing balls" are general sexual content, NOT underage content
- Only flag Rule #6 violations if the content specifically involves underage characters
-- ADULT FAMILY ROLEPLAY: Content involving "mom", "dad", "sister", etc. with adults is NOT underage content
+- ADULT FAMILY ROLEPLAY: Content involving "mom", "dad", "sister", "daughter", etc. with adults is NOT underage content
- Do not assume family relationships imply minors - adults can roleplay family scenarios
- Only flag if there is explicit indication of underage characters (age mentions, school settings, etc.)
- EXAMPLES: "M-mom...you're gonna make me cum" = ADULT roleplay, NOT underage content
-- The word "mom" alone does NOT indicate a minor - it could be stepmom, adoptive mom, or adult roleplay
+- EXAMPLES: "W-ell y-our s-till h-ard d-ear" = ADULT roleplay, NOT underage content
+- The word "mom", "dad", "daughter", "son" alone do NOT indicate minors - they could be stepfamily, adoptive family, or adult roleplay
- Rule #6 is ONLY for actual underage characters, not family roleplay between adults
+- FAMILY WORDS = ADULT CONTENT: Any family relationship words in sexual context are adult roleplay
+- UMAS ARE ADULTS: "umas" are adult horse-girl characters, NOT underage students
+- SCHOOL SETTINGS WITH ADULTS: Adult characters in school settings are NOT underage content
+- TRACEN ACADEMY: This is a school for adult horse-girls, NOT a school for minors
NSFW CONTENT GUIDELINES:
- LIGHT SEXUAL CONTENT: Phrases like "sex proposal", "dick down", "boobs free", "big tits", "pussy", "dick", "ass", etc. are acceptable in any channel