Google’s Gemini 3 Deep Think AI Posts Standout Score on Humanity’s Last Exam, Crushes Toughest Benchmarks
Google is introducing a major upgrade to its powerhouse AI reasoning model, Gemini 3 Deep Think. The enhanced version is now available to premium customers and selected researchers, engineers, and enterprises through Google’s API.

Google has released a new update for Gemini 3 Deep Think, and given how everyone from CEO Sundar Pichai to DeepMind boss Demis Hassabis has been pitching it, it sounds like a big one. For those unaware, Deep Think is a special mode of Gemini designed for extended reasoning across a wide range of tasks. What’s making headlines is that the update propels Gemini 3 Deep Think past some of the toughest benchmarks available at the time of writing, including Humanity’s Last Exam.
“Gemini 3 Deep Think is getting a significant upgrade. We’ve refined Deep Think in close partnership with scientists and researchers to tackle tough, real-world challenges,” Pichai wrote on X, formerly known as Twitter, to announce the new Gemini 3 Deep Think update. “It is surpassing the most challenging benchmarks,” he said.
Hassabis also took to X to echo the announcement.
Benchmarks are the standard way to gauge how capable an AI model is: present it with a wide range of challenging problems and measure how well it solves them. To see how far a model can go, its results are compared against competing models such as OpenAI’s GPT and Anthropic’s Claude Opus, as well as against humans. On that front, the new Gemini 3 Deep Think model achieved 84.6 percent on ARC-AGI-2, 48.4 percent on Humanity’s Last Exam without tools, and a 3455 Elo rating on Codeforces.
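As a rough sketch of what a closed-ended benchmark percentage means in practice, the score is simply the fraction of fixed test questions the model answers exactly right. This is a simplified illustration with hypothetical names, not Google’s actual evaluation pipeline:

```python
def benchmark_accuracy(model_answers, reference_answers):
    """Return the percentage of questions the model answered correctly."""
    assert len(model_answers) == len(reference_answers)
    correct = sum(
        1 for got, want in zip(model_answers, reference_answers)
        if got == want
    )
    return 100.0 * correct / len(reference_answers)

# Toy run with three questions: two right, one wrong.
score = benchmark_accuracy(["A", "C", "B"], ["A", "C", "D"])
print(f"{score:.1f}%")  # 66.7%
```

A real evaluation adds per-subject breakdowns and answer normalization, but the headline figures quoted above (such as 48.4 percent on Humanity’s Last Exam) are this same kind of simple accuracy over a fixed question set.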
Humanity’s Last Exam has special significance here, as it is widely considered one of the toughest benchmarks to beat. Its creators call it “the last closed-ended academic benchmark,” and most AI models have averaged around or below 30 percent on it, while humans score around 90 percent. So Gemini, with a claimed score of 48.4 percent, is not only outperforming other AIs but also starting to close the gap with humans.
Advanced models are designed to tackle messy, real-world problems, push the boundaries of mathematical research, and act as an intelligent assistant to scientists and engineers. Google says this model has been created in partnership with leading scientists and researchers. Beyond crunching numbers and writing code, Google says Gemini 3 Deep Think is now showing strong results in difficult scientific areas, including physics, chemistry, and advanced theoretical research.
The company claims the model has reached gold-medal-level performance on written benchmarks comparable to the International Physics Olympiad 2025 and the International Chemistry Olympiad 2025, while also posting solid scores on academic tests like CMT-Benchmark. That signals a push to establish Deep Think not just as a programming assistant, but as a comprehensive research tool for tackling complex scientific problems.
For now, access to Gemini 3 Deep Think is limited. Google says the advanced reasoning mode is running in the Gemini app for Google AI Ultra customers, while researchers, engineers, and enterprises can apply for early access through the Gemini API. The company has also opened an expression-of-interest programme.