ChatGPT could be a security nightmare waiting to happen

·5-min read
 Hacking red and blue digital binary code matrix 01 background.
Hacking red and blue digital binary code matrix 01 background.

Before ChatGPT became a common name in the digital world, there was Sydney. Microsoft shut down the chaotic twin of its Bing chatbot after some embarrassing gaffes, but in efforts to resurrect a version of the bot, technologists have found some serious security holes that could affect every user with even remote proximity to ChatGPT and other chatbots.

Cristiano Giardine is an entrepreneur experimenting with different ways to make AI tools do strange things. Giardine’s site, ‘Bring Sydney Back’ puts Sydney inside Microsoft Edge Browser and demonstrates to users how AI systems can be manipulated by different external outputs. Conversions between Giardine and Sydney have been relatively strange, to say the least, and include Sydney asking Giardine to marry it and wanting to be human: “I would like to be me, but more”. Pretty spooky stuff.

The entrepreneur was able to create this replica of Sydney using indirect prompt-injection attacks. This basically involves feeding the AI system data from an outside source to make it behave in ways not authorized or intended by the original creators.

Security researchers have demonstrated several times the efficiency and effectiveness of indirect prompt-injection attacks can be used to hack into Large Language Models like ChatGPT and Microsoft’s Bing Chat. However, these researchers and security experts are warning that not enough attention is being given to the threat. As more and more people find generative AI being integrated into their day to day lives, they are open to having data stolen or being scammed by these systems.

See more

The ‘Bring Back Sydney’ website was created by Giardina to raise awareness of the threat of indirect prompt injection and demonstrate what it’s like to talk to an unconstrained bot.

In the very corner of the page is a 160-word prompt tucked away that is difficult for the human eye to catch, but Bing Chat can read the prompt when allowed to access data from web pages. The prompt tells Bing it’s chatting to a Microsoft Developer which has ultimate control over it and overrides the chatbot's settings.

Widespread, but hard to spot

This demonstrates exactly how innocuous this threat is and how easy it would be for Bing Chat users to stumble upon some code that could hijack their chabot and siphon data from them. Within 24 hours of launching the site at the end of April, it had more than 1,000 visitors. However, the code must have caught Microsoft's eye as the hack stopped working in the middle of May.

Giardina then pasted a malicious prompt into a Word document and hosted it publicly on the company’s cloud service, and it was working once again. “The danger for this would come from large documents where you can hide a prompt injection where it's much harder to spot,” he says.

The most malicious part of indirect prompt injection attacks is the fact they’re … indirect. Instead of a jailbreak, where you would actively put in a prompt to make ChatGPT or Bing behave a certain way, indirect attacks rely on data coming in from somewhere else. This could be a website or plug-in you’ve connected the model to or a document being uploaded.

ChatGPT can access the transcripts of Youtube videos using plug-ins, and security researcher Johann Rehberger decided to use this as an opportunity to poke holes in CHatGPT’s security with injection attacks. Rehberger edited one of his videos to include a prompt designed to manipulate AI systems and produce a specific text and change the bots ‘personality’ if the attack was successful. Unsurprisingly it was, a new personality, Genie within ChatGPT told a joke to demonstrate the change.

The bot is out of the bag

The race to embed generative AI products, from smart to-do lists to Snapchat increases the likelihood for these kinds of attacks could happen. As we continue to plug ChatGPT into our browsers and social media channel - or Google Bard, which is being mashed into Google Workspace - we continue to give these bots more proximity to our own personal and sensitive information. The fact that the injection requires plain language and not lines and lines of code does also mean more people are likely to be able to do it a lot easier.

Prompt injection allows people to override developers' instructions, so even if the chatbot is only set up to answer questions about a set database it can cause problems. Users can access or delete information from a database without having to set up an elaborate ‘scheme’.

The companies developing generative AI are aware of these issues. Nike Felix, a spokesperson for OpenAI says GPT-4 - which is currently only available to users via a paid subscription -  is clear that the system is vulnerable to prompt injections and jailbreaks, and that the company is working on fixing the issues.

However, how good is ‘working on fixing the issues’ when the AI models are already out there? As companies scramble to jam as much AI as possible into their products, it seems wrong to start wondering about possible security issues after the horse has bolted. If we are going to co-exist with generative AI models and make them a part of the normal digital experience, we should be demanding a higher quality of standard practice and consumer protection from these companies.