Grok 4 — xAI reaches the AI frontier
xAI releases Grok 4, whose Heavy tier is the first model to break 50% on Humanity’s Last Exam, vaulting the roughly two-year-old lab into the front rank of AI.
Humanity's Last Exam: first score above 50%
Context window: 256K tokens
On 9 July 2025, xAI released Grok 4, a leap that put the roughly two-year-old lab into the front rank of frontier AI. Trained on the Colossus supercomputer, with a reported 6x gain in compute efficiency over its predecessor and a 256,000-token context window, Grok 4 posted leading results across a wide set of academic benchmarks.
The top "Grok 4 Heavy" tier, which runs multiple reasoning agents in parallel at inference time, became the first model to score above 50% on Humanity's Last Exam, a benchmark designed to be the toughest closed-ended academic test of its kind. It also set new highs on GPQA, AIME and ARC-AGI, reportedly scoring 100% on AIME 2025 and about 88.9% on GPQA Diamond.
That xAI drew level with far older, better-funded labs in under three years reflects the speed of execution under Elon Musk and the scale of the Colossus build-out.