OpenAI’s newly released o3 and o4-mini AI models, launched on April 16, 2025, are facing significant hallucination problems, generating inaccurate or fabricated responses more frequently than their predecessors.
According to a TechCrunch report published on April 19, 2025, despite their advanced reasoning capabilities, these models are less reliable than GPT-4o, a concern for business applications. Separately, a recent study links heavy ChatGPT use to increased loneliness, raising questions about AI’s broader societal impact.
Hallucination Issues in o3 and o4-mini
Internal tests at OpenAI, as reported by TechCrunch, reveal that o3 and o4-mini hallucinate more often than the non-reasoning GPT-4o model, with no clear explanation for the issue. A technical report from OpenAI calls for further research, noting, “More research is needed to understand why hallucinations worsen as reasoning models scale.” A former OpenAI employee suggested that the reinforcement learning used in the o-series may exacerbate errors not fully resolved by standard post-training processes.
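OpenAI’s internal evaluation methodology is not detailed in the report, but the underlying idea is straightforward: quiz a model on questions with known answers and count unsupported replies. The following is a minimal sketch of that idea; the `ask_model` stub and the QA data are hypothetical stand-ins, not OpenAI’s actual test harness.

```python
# Minimal sketch of measuring a hallucination rate, assuming a
# hypothetical `ask_model` client and toy QA data. OpenAI's internal
# benchmark methodology is not public; this only illustrates the idea.
from dataclasses import dataclass

@dataclass
class QAItem:
    question: str
    reference: str  # known-correct answer

def ask_model(question: str) -> str:
    # Stand-in for a real chat-completion API call; returns a canned
    # reply so the sketch runs end to end.
    return "OpenAI was founded in 2015."

def hallucination_rate(items: list[QAItem]) -> float:
    # Crude check: an answer that omits the reference counts as wrong.
    # Real graders distinguish fabrications from abstentions and use
    # human or model-based judging rather than substring matching.
    wrong = sum(
        item.reference.lower() not in ask_model(item.question).lower()
        for item in items
    )
    return wrong / len(items)

items = [QAItem("What year was OpenAI founded?", "2015")]
print(f"Hallucination rate: {hallucination_rate(items):.0%}")
```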
While hallucinations can foster creative outputs, experts warn they undermine the models’ suitability for industries where accuracy is critical, such as finance and healthcare. This challenge could hinder OpenAI’s efforts to outpace competitors like Google, Meta, xAI, Anthropic, and DeepSeek in the global AI race.
ChatGPT Use Linked to Loneliness
A joint study by OpenAI and MIT Media Lab, published in early April 2025, found that frequent ChatGPT users who form emotional bonds with the AI are more likely to experience loneliness and social isolation. Although multiple factors influence these feelings, the study highlights a correlation between trust in ChatGPT and increased reliance on it, which may worsen mental health outcomes. Researchers stress that while AI technology is still evolving, the study opens a vital dialogue about its long-term effects on user well-being.
Led by Sam Altman, OpenAI has positioned o3 and o4-mini as cutting-edge models. They excel at coding, scoring 69.1% and 68.1%, respectively, on the SWE-bench Verified benchmark, outperforming rival models and reinforcing OpenAI’s leadership in AI innovation. However, the hallucination issue threatens to erode trust, particularly among enterprise users who prioritise precision over creativity. OpenAI’s transparency in acknowledging the problem in its technical report is a step toward addressing these concerns, but solutions remain elusive.
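For context, a SWE-bench Verified score boils down to a pass rate over real GitHub issues: a task counts as resolved when the model’s generated patch makes the repository’s tests pass. A toy sketch of how such a headline percentage is computed follows; the task IDs and results are invented.

```python
# Rough sketch of how a SWE-bench-style score reduces to a pass rate.
# Task IDs and outcomes below are invented for illustration.

def resolved_rate(results: dict[str, bool]) -> float:
    """Percentage of tasks whose generated patch passed the tests."""
    return 100 * sum(results.values()) / len(results)

# True means the model's patch passed the repo's test suite.
run = {
    "django__django-12345": True,
    "sympy__sympy-67890": False,
    "flask__flask-24680": True,
}

print(f"{resolved_rate(run):.1f}% resolved")  # prints: 66.7% resolved
```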