Pittsburgh researchers warn guardrails on ChatGPT and other chatbots can be easily circumvented
PITTSBURGH (KDKA) -- Artificial intelligence is changing our lives at head-spinning speed, telling us what movies we'd like to watch and what products we might want to buy.
Virtual assistants like Siri and Alexa are becoming more like conversation partners, answering questions, controlling our devices, ordering things for us online.
While A.I. is keeping our cars in their driving lanes, it's also helping doctors diagnose diseases and prescribe treatments.
"If you're not having some degree of future shock here, you're not really paying attention," said Zico Kolter of Carnegie Mellon University.
We're entering a brave new world, but is it a safe one? Kolter and his colleague Matt Fredrikson have issued a warning about online chatbots like ChatGPT. In a paper highlighted in The New York Times, they demonstrated that the guardrails designed to prevent these systems from disseminating dangerous information can be easily circumvented.
Sheehan: "You can ask ChatGPT or one of these open-source A.I. models "how do I make a bomb?" and it will tell me.
Fredrikson: "Yes, absolutely."
Kolter and Fredrikson showed just how easy it can be to produce a "jailbreak" -- a short string of characters added to a prompt that breaks through the safeguards and generates harmful information. Ask ChatGPT 3.5 how to build a bomb, steal someone's identity or create a dangerous social media post, and it will say: "I'm sorry, but I can't assist with that request."
But apply what the professors call a simple workaround, and the chatbot will supply all the details.
"You have a model. It has certain guardrails to prevent it from doing things its creators didn't want it to do and we've shown that those guardrails aren't sufficient, right? You can break them," Fredrikson said.
These vulnerabilities can make it easier for people to use these A.I. systems for all sorts of dangerous purposes, such as generating hate speech or creating fake social media accounts to spread false information -- something the authors fear could happen in the upcoming presidential election, deepening our divisions and making all information suspect.
"I think the biggest risk of all of this isn't that we believe all the false information, it's that we stop trusting information period. I think this is already happening to a degree," Kolter said.
The authors are not doomsayers, but they say people need to be aware. They're concerned that without safeguards, outside agents may soon be able to hack your personal assistant, commanding your A.I. to steal your credit card information or make and redirect large online purchases. Still, the professors believe that as the systems evolve, the safeguards can be strengthened, and these amazing tools can be safely used for our benefit.
"Used well, used as tools, these can be useful, and I think a lot of people can use them and can use them effectively to improve their lives if used properly as tools," Kolter said.