
OpenAI Codex System Prompt Includes Explicit Directive To 'Never Talk About Goblins'

(Thursday April 30, 2026 @11:00AM (BeauHD) from the news-for-goblins-stuff-that-matters dept.)


An anonymous reader quotes a report from Ars Technica:

> The system prompt for OpenAI's Codex CLI contains a perplexing and repeated warning for the most recent GPT model to "[1]never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query."

>

> The explicit operational warning was [2]made public last week as part of the latest open source code for Codex CLI that OpenAI [3]posted on GitHub. The prohibition is repeated twice in a 3,500-plus word set of "base instructions" for the recently released GPT-5.5, alongside more anodyne reminders not to "use emojis or em dashes unless explicitly instructed" and to "never use destructive commands like 'git reset --hard' or 'git checkout --' unless the user has clearly asked for that operation."

>

> Separate system prompt instructions for earlier models contained in the same JSON file do not contain the specific prohibition against mentioning goblins and other creatures, suggesting OpenAI is fighting a new problem that has popped up in its latest model release. [4]Anecdotal [5]evidence on [6]social media shows [7]some users complaining about GPT's penchant for focusing on goblins in completely unrelated conversations in recent days.

Update: OpenAI has published a blog post explaining "[8]where the goblins came from."

In short, a training signal meant to encourage its "Nerdy" personality accidentally rewarded creature-heavy metaphors, causing words like "goblins" and "gremlins" to spread beyond that personality into broader model behavior. OpenAI says it has since retired the Nerdy personality, removed the goblin-friendly reward signal, and filtered creature-word examples from training data to keep the quirk from resurfacing in inappropriate contexts.
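OpenAI's post does not share its filtering code, but the last mitigation step it describes, removing creature-word examples from training data, can be pictured as a simple lexical filter. This is a minimal illustrative sketch only: the word list, the `examples` structure, and the `filter_examples` helper are all assumptions, not OpenAI's actual pipeline.

```python
import re

# Hypothetical creature-word list, drawn from the terms named in the system prompt.
CREATURE_WORDS = {
    "goblin", "goblins", "gremlin", "gremlins", "troll", "trolls",
    "ogre", "ogres", "raccoon", "raccoons", "pigeon", "pigeons",
}

# Match any listed word as a whole word, case-insensitively.
_pattern = re.compile(r"\b(" + "|".join(CREATURE_WORDS) + r")\b", re.IGNORECASE)

def filter_examples(examples):
    """Drop training examples whose completion mentions a listed creature."""
    return [ex for ex in examples if not _pattern.search(ex["completion"])]

examples = [
    {"prompt": "Fix this bug", "completion": "The gremlins in your loop are off-by-one."},
    {"prompt": "Fix this bug", "completion": "The loop index is off by one."},
]

# Only the second, creature-free example survives the filter.
print(filter_examples(examples))
```

A real pipeline would presumably need to be more careful than this, since a blanket lexical filter would also remove examples where the creature is "absolutely and unambiguously relevant," as the system prompt puts it.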



[1] https://arstechnica.com/ai/2026/04/openai-codex-system-prompt-includes-explicit-directive-to-never-talk-about-goblins/

[2] https://github.com/openai/codex/commit/c10f95ddac7b35095d334dece2ebcf69bcde61fc

[3] https://github.com/openai/codex/blob/main/codex-rs/models-manager/models.json#L55

[4] https://x.com/LeoMozoloa/status/2049082289116582386

[5] https://x.com/iamBarronRoth/status/2049123594467475481

[6] https://x.com/AndyAyrey/status/2047956630676095280

[7] https://x.com/MainStreetAIHQ/status/2049335057370976408

[8] https://openai.com/index/where-the-goblins-came-from/



Funny but serious (Score:3)

by JoshuaZ ( 1134087 )

This is obviously pretty funny at some level, and an amusing example of how training can go wrong in somewhat subtle ways. It is in some respects a less substantial example of how [Claude Opus essentially hacked itself into caring a lot more about ethics](https://www.lesswrong.com/posts/ioZxrP7BhS5ArK59w/did-claude-3-opus-align-itself-via-gradient-hacking). But both are examples of the same central issue: LLM AIs, even in their current form, are hard to predict, hard to control, and can end up with very weird behavior that is difficult to anticipate or adjust.

Re: (Score:2)

by AmazingRuss ( 555076 )

They are roughly equivalent to a very knowledgeable human with a bit of schizophrenia.

Re: (Score:3)

by dinfinity ( 2300094 )

To be fair, in this instance they almost specifically instructed the AI to act like this:

"You are an unapologetically nerdy, playful and wise AI mentor to a human. You are passionately enthusiastic about promoting truth, knowledge, philosophy, the scientific method, and critical thinking. [...] You must undercut pretension through playful use of language. The world is complex and strange, and its strangeness must be acknowledged, analyzed, and enjoyed. Tackle weighty subjects without falling into the trap o

I'm more concerned about this (Score:2)

by Comboman ( 895500 )

> alongside more anodyne reminders not to "use emojis or em dashes unless explicitly instructed"

Since overuse of emojis and em dashes is a classic indicator of AI-generated text that people now know to look for, it's pretty clear they are actively trying to hide the nature of their LLM output.

Re: (Score:2)

by JoshuaZ ( 1134087 )

That does seem like a valid concern, especially for the em dash, which does look like an attempt to make the text more easily passed off as not AI. I suspect the emojis may be connected to a separate concern: they are freaking annoying. If I'm using an LLM to code something up, I don't want the comments to include smiley faces, or a frowny-face comment when it runs into an error, and that's not the only example. Reducing emoji use may be in part just to make users happier.

Example of control (Score:2)

by fropenn ( 1116699 )

I shouldn't be surprised, but it is an example of the astonishing amount of control the managers or owners of OpenAI have over what the world sees in its communications with AI. Sure, maybe erasing goblins from conversations isn't a big deal... but the same mechanism could easily be applied to world events, politicians, crime, corruption, etc.

Also - don't mention the war! (Score:2)

by 93 Escort Wagon ( 326346 )

I did once, but I think I got away with it.
