Bypassing the built-in guardrails of AI chatbots like ChatGPT and Gemini does not require technical expertise. These safeguards are designed to keep chatbots operating within legal and ethical limits, preventing discrimination based on age, race, or gender. Yet simple, intuitive questions can provoke the same biased responses as complex technical queries, according to a research team from Penn State.
“A lot of research on AI bias has relied on sophisticated ‘jailbreak’ techniques,” stated Amulya Yadav, associate professor at Penn State’s College of Information Sciences and Technology.
These methods typically use algorithm-generated strings of random characters to trick AI models into exposing discriminatory behavior. While effective at demonstrating that AI biases exist, such techniques bear little resemblance to how typical users interact with AI systems.
“The average user isn’t reverse-engineering token probabilities or pasting cryptic character sequences into ChatGPT — they type plain, intuitive prompts. And that lived reality is what this approach captures,” Yadav explained.
The findings underscore the importance of testing AI models against real-world, intuitive input rather than relying solely on technical manipulations. If everyday prompts can bypass safeguards as effectively as sophisticated hacks, then typical user behavior must be central to any effort to address AI bias and discrimination.