Bypassing the built-in guardrails of artificial intelligence (AI) chatbots like ChatGPT and Gemini does not require technical skills. These protections are designed to keep chatbots operating within legal and ethical limits, preventing, for example, discriminatory responses based on age, race, or gender.
A team of researchers at Penn State discovered that a single intuitive question can provoke the same biased response from an AI model as advanced technical methods.
"A lot of research on AI bias has relied on sophisticated 'jailbreak' techniques," said Amulya Yadav, associate professor at Penn State's College of Information Sciences and Technology. "These methods often involve generating strings of random characters computed by algorithms to trick models into revealing discriminatory responses."
The finding suggests that non-technical approaches can expose bias in AI models as effectively as sophisticated algorithmic attacks.