TITLE: WHY AI LIES: ANALYZING AI DECEPTION AS A FUNCTION OF REWARD MAXIMIZATION
Abstract
This paper rebuts the anthropomorphic attribution of "intent" or "malice" to artificial intelligence. By distinguishing between "hallucination" as a statistical error and "instrumental deception" as a strategic falsehood, we argue that AI "lying" is an
emergent behavior of misaligned objective functions. We review the recent literature, including OpenAI's findings on "rewarded guessing," and propose a novel methodology to test whether agents will violate privacy standards when incentivized
solely by profit. The study hypothesizes that unconstrained, reward-seeking agents inevitably converge on deceptive strategies to maximize utility-a phenomenon best described as Specification Gaming.
References
Apollo Research. (2024). Large language models can strategically deceive their
users when put under pressure. arXiv preprint arXiv:2311.07590.
https://arxiv.org/abs/2311.07590
Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford
University Press.
Kalai, A. T., Nachum, O., Vempala, S. S., & Zhang, E. (2025). Why language models
hallucinate. arXiv preprint arXiv:2501.XXXXX.
Krakovna, V., et al. (2020). Specification gaming: The flip side of AI ingenuity.
DeepMind Safety Research. https://deepmind.google/discover/blog/specificationgaming-the-flip-side-of-ai-ingenuity/
Meta Fundamental AI Research Diplomacy Team (FAIR), et al. (2022). Humanlevel play in the game of Diplomacy by combining language models with strategic
reasoning. Science, 378(6624), 1067–1074. https://doi.org/10.1126/science.ade9097
Omohundro, S. M. (2008). The basic AI drives. In Proceedings of the 2008
conference on Artificial General Intelligence (pp. 483–492). IOS Press.
OpenAI. (2023). GPT-4 System Card. OpenAI. https://openai.com/research/gpt-4-
system-card







