Last updated: April 5, 2026 · Evaluation & Benchmarks · by Daniel Ashford

What is GPQA Diamond?

QUICK ANSWER

A graduate-level science benchmark with questions written by PhD experts.

Definition

GPQA Diamond is a benchmark of extremely difficult science questions written by PhD-level domain experts. The questions are designed to be "Google-proof": answering them requires genuine understanding and reasoning rather than simple fact retrieval.

How It Works

GPQA covers physics, chemistry, and biology at the doctoral level, and Diamond is its most strictly validated subset. A question is included only if domain experts can answer it correctly while non-experts cannot, even with internet access. Top model scores range from 75% to 88%.
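The inclusion rule and the scoring above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the benchmark's actual tooling: the `ValidatedQuestion` record, the "all experts correct" rule, and the 50% non-expert cutoff are hypothetical choices made for the example, and scoring is assumed to be plain accuracy over multiple-choice answers.

```python
from dataclasses import dataclass


@dataclass
class ValidatedQuestion:
    """Hypothetical record of how human validators fared on one question."""
    question_id: str
    expert_correct: list[bool]      # one entry per domain-expert validator
    non_expert_correct: list[bool]  # one entry per non-expert with internet access


def passes_filter(q: ValidatedQuestion) -> bool:
    """Keep a question only if every expert got it right and most non-experts did not.

    Both thresholds are illustrative assumptions, not the published criteria.
    """
    if not q.expert_correct or not q.non_expert_correct:
        return False  # no validation data, so the question cannot qualify
    experts_agree = all(q.expert_correct)
    non_expert_accuracy = sum(q.non_expert_correct) / len(q.non_expert_correct)
    return experts_agree and non_expert_accuracy < 0.5


def accuracy(predictions: dict[str, str], answer_key: dict[str, str]) -> float:
    """Fraction of questions answered correctly; unanswered questions count as wrong."""
    if not answer_key:
        raise ValueError("answer_key must contain at least one question")
    correct = sum(1 for qid, answer in answer_key.items() if predictions.get(qid) == answer)
    return correct / len(answer_key)
```

Under these assumptions, a model's reported score is simply `accuracy(model_answers, answer_key)` computed over the questions for which `passes_filter` returns `True`.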

Example

A GPQA question might ask about thermodynamic implications of a specific molecular configuration, requiring multi-step reasoning across quantum mechanics and organic chemistry.

Related Terms

Benchmark
A standardized test used to measure and compare LLM capabilities.
MMLU / MMLU-Pro
Benchmarks testing broad academic knowledge; the original MMLU spans 57 subjects.

See How Models Compare

Understanding GPQA Diamond is important when choosing the right AI model. See how 12 models compare on our leaderboard.

Daniel Ashford
Founder & Lead Evaluator · 200+ models evaluated