Reasoning Model Benchmarks 2025

o3, Claude 3.7 Sonnet (extended thinking), and Gemini 2.0 Flash Thinking: how the new class of reasoning models compares on the hardest tests.
