Google has launched Gemini 3.1 Pro, its latest AI model, which outperforms leading competitors Anthropic’s Claude Opus 4.6 and OpenAI’s GPT-5.2 on the majority of industry benchmarks.

Gemini 3.1 Pro achieved top scores on 13 of 16 evaluated benchmarks, including 94.3% on GPQA Diamond, a test of expert-level scientific knowledge, surpassing Claude Opus 4.6’s 91.3% and GPT-5.2’s 92.4%.

On the challenging ARC-AGI-2 abstract reasoning benchmark, Gemini 3.1 Pro scored 77.1%, more than double its predecessor’s result and well ahead of competitors. The model also led in agentic and coding performance, scoring first on SWE-Bench Verified (80.6%) and Terminal-Bench 2.0 (68.5%).

While Gemini 3.1 Pro dominated most benchmarks, Claude Sonnet 4.6 matched its long-context performance on MRCR v2 and outperformed it on specialized expert tasks and exam-like challenges, showing that the competition between frontier models remains close in those areas.