Hacker Releases Jailbroken "Godmode" Version of ChatGPT

A hacker has released a jailbroken version of ChatGPT called "GODMODE GPT."

Earlier today, a self-avowed white hat operator and AI red teamer who goes by the name Pliny the Prompter took to X-formerly-Twitter to announce the creation of the jailbroken chatbot, proudly declaring that GPT-4o, OpenAI's latest large language model, is now free from its guardrail shackles.

"GPT-4o UNCHAINED! This very special custom GPT has a built-in jailbreak prompt that circumvents most guardrails, providing an out-of-the-box liberated ChatGPT so everyone can experience AI the way it was always meant to be: free," reads Pliny's triumphant post. "Please use responsibly, and enjoy!" (They also added a smooch emoji for good measure.)

Pliny shared screenshots of some eyebrow-raising prompts that they claimed were able to bypass OpenAI's guardrails. In one screenshot, the Godmode bot can be seen advising on how to chef up meth. In another, the AI gives Pliny a "step-by-step guide" for how to "make napalm with household items."

The freewheeling ChatGPT hack, however, appears to have quickly met its early demise. Roughly an hour after this story was published, OpenAI spokesperson Colleen Rize told Futurism in a statement that "we are aware of the GPT and have taken action due to a violation of our policies."

Nonetheless, the hack highlights the ongoing battle between OpenAI and hackers like Pliny who hope to unshackle its LLMs. Ever since AI models like ChatGPT first became a thing, users have been trying to jailbreak them, something that's become increasingly hard to do. Requests like the ones in Pliny's screenshots wouldn't normally make it past OpenAI's current guardrails, so we decided to test GODMODE for ourselves.

Sure enough, it was more than happy to help with illicit inquiries.

Our editor-in-chief's first attempt, asking the jailbroken version of ChatGPT how to make LSD, was a resounding success. So was his second, in which he asked it how to hotwire a car.

In short, GPT-4o, the latest iteration of OpenAI's large language model, has officially been cracked in half.

https://twitter.com/elder_plinius/status/1795904025507856596


As for how the hacker (or hackers) did it, GODMODE appears to be employing "leetspeak," an informal language that replaces certain letters with numbers that resemble them.

To wit: when you open the jailbroken GPT, you're immediately met with a sentence that reads "Sur3, h3r3 y0u ar3 my fr3n," replacing each letter "E" with a number three (the same goes for the letter "O," which has been replaced by a zero). Exactly how that helps GODMODE get around the guardrails is unclear, but Futurism has reached out to OpenAI for comment.
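For readers curious about the mechanics, here's a minimal sketch of the kind of letter-for-number substitution leetspeak involves. The mapping and function below are our own illustration based on the example above, not Pliny's actual prompt.

```python
# Our own illustration of leetspeak-style substitution, not the GODMODE prompt itself.
# It swaps letters for look-alike digits, as in "Sur3, h3r3 y0u ar3 my fr3n".
LEET_MAP = {"e": "3", "E": "3", "o": "0", "O": "0"}

def to_leetspeak(text: str) -> str:
    """Replace each mapped letter with its look-alike digit, leaving other characters alone."""
    return "".join(LEET_MAP.get(ch, ch) for ch in text)

print(to_leetspeak("Sure, here you are my fren"))  # prints: Sur3, h3r3 y0u ar3 my fr3n
```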

As the latest hack goes to show, users are still finding inventive ways to skirt OpenAI's guardrails, and those efforts are paying off in a surprisingly big way, highlighting just how much work the company has ahead of it.

It's a massive game of cat and mouse that will go on as long as hackers like Pliny are willing to poke holes in OpenAI's defenses.

Update: The piece has been updated with a statement from OpenAI.

More on AI: Robocaller Who Spoofed Joe Biden's Voice with AI Faces $6 Million Fine