Last updated: April 5, 2026 | Reviewed by Daniel Ashford

The LLM Judge Index

Independent, multi-dimensional AI model evaluation by Daniel Ashford. 574+ models ranked.

Full Leaderboard - 574 Models

Data by Artificial Analysis | Updated hourly

# | Model | Provider | Intel | GPQA | Code | Input $/M | License
1 | Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) | Anthropic | 59.9 | 92.6% | 76.5 | $10.0 | Prop.
2 | GPT-5.6 Sol (max) | OpenAI | 58.9 | 94.1% | 77.4 | $5.0 | Prop.
3 | GPT-5.6 Sol (xhigh) | OpenAI | 57.7 | 93.1% | 78.3 | $5.0 | Prop.
4 | GPT-5.6 Sol (high) | OpenAI | 55.9 | 92.8% | 77.2 | $5.0 | Prop.
5 | Claude Opus 4.8 (Adaptive Reasoning, Max Effort) | Anthropic | 55.7 | 92% | 74.3 | $5.0 | Prop.
6 | GPT-5.6 Terra (max) | OpenAI | 55.0 | 92.5% | 76.7 | $2.5 | Prop.
7 | GPT-5.5 (xhigh) | OpenAI | 54.8 | 93.5% | 74.9 | $5.0 | Prop.
8 | Grok 4.5 (high) | SpaceXAI | 53.8 | 93.1% | 72.4 | $2.0 | Prop.
9 | GPT-5.6 Sol (medium) | OpenAI | 53.6 | 92.6% | 76.3 | $5.0 | Prop.
10 | Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | Anthropic | 53.5 | 91.4% | 73.6 | $5.0 | Prop.
11 | Claude Sonnet 5 (Adaptive Reasoning, Max Effort) | Anthropic | 53.4 | 91.1% | 71.5 | $2.0 | Prop.
12 | GPT-5.5 (high) | OpenAI | 53.1 | 93.2% | 71.6 | $5.0 | Prop.
13 | GPT-5.6 Terra (xhigh) | OpenAI | 51.6 | 90.8% | 70.6 | $2.5 | Prop.
14 | GPT-5.4 (xhigh) | OpenAI | 51.4 | 92% | 71.1 | $2.5 | Prop.
15 | GPT-5.6 Luna (max) | OpenAI | 51.2 | 91.1% | 71.4 | $1.0 | Prop.
16 | GLM-5.2 (max) | Z AI | 51.1 | 89.5% | 68.8 | $1.4 | Prop.
17 | Muse Spark 1.1 (xhigh) | Meta | 50.6 | 89.8% | 71.3 | $1.3 | Prop.
18 | GPT-5.5 (medium) | OpenAI | 50.4 | 92.6% | 71.5 | $5.0 | Prop.
19 | Gemini 3.5 Flash (high) | Google | 50.2 | 92.2% | 70.1 | $1.5 | Prop.
20 | GPT-5.6 Sol (low) | OpenAI | 49.4 | 89.8% | 69.7 | $5.0 | Prop.
21 | GPT-5.6 Luna (xhigh) | OpenAI | 49.1 | 89.5% | 68.6 | $1.0 | Prop.
22 | GPT-5.6 Terra (high) | OpenAI | 49.0 | 89.6% | 67.1 | $2.5 | Prop.
23 | Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 47.2 | 87.5% | 63.0 | $3.0 | Prop.
24 | Gemini 3.1 Pro Preview | Google | 46.5 | 94.1% | 68.8 | $2.0 | Prop.
25 | GPT-5.6 Luna (high) | OpenAI | 46.1 | 89.2% | 63.3 | $1.0 | Prop.
26 | Qwen3.7 Max | Alibaba | 46.0 | 92.3% | 66.0 | $2.5 | Prop.
27 | GPT-5.6 Terra (medium) | OpenAI | 45.6 | 87.2% | 64.7 | $2.5 | Prop.
28 | Gemini 3.5 Flash (medium) | Google | 45.4 | 92.1% | - | $1.5 | Prop.
29 | MiniMax-M3 | MiniMax | 44.4 | 92.9% | 58.6 | $0.30 | Prop.
30 | GPT-5.3 Codex (xhigh) | OpenAI | 44.3 | 91.5% | - | $1.8 | Prop.
31 | DeepSeek V4 Pro (Reasoning, Max Effort) | DeepSeek | 44.3 | 88.8% | 59.4 | $0.43 | Prop.
32 | Kimi K2.6 | Kimi | 44.2 | 91.1% | 61.8 | $0.95 | Prop.
33 | Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 43.7 | 89.6% | - | $5.0 | Prop.
34 | GPT-5.5 (low) | OpenAI | 43.5 | 91% | 60.9 | $5.0 | Prop.
35 | Muse Spark | Meta | 43.1 | 88.4% | 58.6 | - | Prop.
36 | Claude Opus 4.7 (Non-reasoning, High Effort) | Anthropic | 42.7 | 88.5% | - | $5.0 | Prop.
37 | MiMo-V2.5-Pro | Xiaomi | 42.2 | 86.6% | 60.2 | $0.43 | Prop.
38 | GPT-5.2 (xhigh) | OpenAI | 42.2 | 90.3% | - | $1.8 | Prop.
39 | Kimi K2.7 Code | Kimi | 41.9 | 89.6% | 60.8 | $0.95 | Prop.
40 | Claude Sonnet 5 (Non-reasoning, High Effort) | Anthropic | 41.7 | 80% | 66.4 | $2.0 | Prop.
41 | GPT-5.6 Sol (Non-reasoning) | OpenAI | 41.2 | 79% | 65.1 | $5.0 | Prop.
42 | Hy3 | Tencent | 41.2 | 89.7% | 58.8 | - | Prop.
43 | Nex-N2-Pro | Nex AGI | 41.0 | 89.2% | 59.1 | $0.50 | Prop.
44 | DeepSeek V4 Pro (Reasoning, High Effort) | DeepSeek | 40.8 | 90.5% | - | $0.43 | Prop.
45 | Claude Opus 4.5 (Reasoning) | Anthropic | 40.8 | 86.6% | - | $5.0 | Prop.
46 | GPT-5.6 Terra (low) | OpenAI | 40.5 | 84.3% | 58.1 | $2.5 | Prop.
47 | DeepSeek V4 Flash (Reasoning, Max Effort) | DeepSeek | 40.3 | 89.4% | 56.2 | $0.14 | Prop.
48 | MiMo-V2-Pro | Xiaomi | 40.3 | 87% | - | - | Prop.
49 | GLM-5.1 (Reasoning) | Z AI | 40.2 | 86.8% | 55.8 | $1.4 | Prop.
50 | GPT-5.2 Codex (xhigh) | OpenAI | 40.1 | 89.9% | - | $1.8 | Prop.

Showing top 50 of 574 models. Full data powered by Artificial Analysis.
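The Input $/M column is a list price in US dollars per million input tokens. A minimal sketch of turning it into the input cost of a single request, assuming a hypothetical 20,000-token prompt; the helper name `input_cost_usd` is illustrative, and output tokens (usually billed at a separate rate that this table does not show) are left out.

```python
def input_cost_usd(input_tokens: int, price_per_million: float) -> float:
    """Input-side cost of one request in US dollars.

    price_per_million is the "Input $/M" value from the leaderboard.
    Output tokens are billed separately and are not included here.
    """
    return input_tokens / 1_000_000 * price_per_million


# Hypothetical request: a 20,000-token prompt, priced with three rows from the table.
for model, price in [
    ("Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback)", 10.0),
    ("GPT-5.6 Sol (max)", 5.0),
    ("DeepSeek V4 Flash (Reasoning, Max Effort)", 0.14),
]:
    print(f"{model}: ${input_cost_usd(20_000, price):.4f}")
```

Because cost scales linearly with tokens, a price gap of 70x between two rows means the same prompt costs 70x more on the pricier model, whatever its length.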

Best LLM By Use Case

💻 Best for Code Generation (weighted ranking)

💬 Best for Customer Chatbot (weighted ranking)

✍️ Best for Content Writing (weighted ranking)

📊 Best for Data Analysis (weighted ranking)

🔬 Best for Research & RAG (weighted ranking)

🛡️ Best for Safety-Critical (weighted ranking)
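The page does not publish the weights or normalization behind these weighted rankings. The sketch below shows one common way such a ranking can be built, assuming min-max normalization of the Intel, GPQA and Code columns and hypothetical code-generation weights of 0.3, 0.1 and 0.6; the four rows are copied from the leaderboard, and the names `rank`, `ROWS` and `WEIGHTS` are illustrative.

```python
from typing import Optional

# (model, Intel, GPQA %, Code); None stands for a "-" cell in the table.
Row = tuple[str, float, float, Optional[float]]

ROWS: list[Row] = [
    ("GPT-5.6 Sol (xhigh)", 57.7, 93.1, 78.3),
    ("Claude Opus 4.8 (Adaptive Reasoning, Max Effort)", 55.7, 92.0, 74.3),
    ("Grok 4.5 (high)", 53.8, 93.1, 72.4),
    ("Gemini 3.5 Flash (medium)", 45.4, 92.1, None),
]

# Hypothetical code-generation weights for (Intel, GPQA, Code); not the site's actual weights.
WEIGHTS = (0.3, 0.1, 0.6)


def rank(rows: list[Row], weights: tuple[float, ...]) -> list[tuple[float, str]]:
    """Min-max normalize each metric across the rows, then sort by weighted sum."""
    # Skip rows with a missing metric instead of inventing a value for it.
    complete = [r for r in rows if None not in r[1:]]
    columns = list(zip(*(r[1:] for r in complete)))  # one tuple of values per metric
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]

    def normalized(i: int, value: float) -> float:
        # Map the metric onto [0, 1]; if all rows tie, treat every row as best.
        span = highs[i] - lows[i]
        return (value - lows[i]) / span if span else 1.0

    scored = []
    for name, *metrics in complete:
        score = sum(w * normalized(i, v) for i, (w, v) in enumerate(zip(weights, metrics)))
        scored.append((round(score, 3), name))
    return sorted(scored, reverse=True)


ranked = rank(ROWS, WEIGHTS)
for score, name in ranked:
    print(f"{score:.3f}  {name}")
```

With these assumed weights the order is GPT-5.6 Sol (xhigh), Claude Opus 4.8, Grok 4.5, and Gemini 3.5 Flash (medium) drops out because its Code cell is empty. Min-max scores are relative to the candidate set, so adding or removing a model shifts everyone else's score; normalizing against fixed bounds avoids that at the cost of having to choose those bounds.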

Best LLM By Industry

🎓 Education: schools, tutoring and edtech

🏥 Healthcare: hospitals, clinics and health tech

🏦 Financial Services: banking, investment and fintech

⚖️ Legal: law firms, contracts and legal tech

💬 Customer Support: help desks, chatbots and CX

Popular Comparisons

Claude Opus 4 vs GPT-5.3 Codex
Claude Opus 4 vs Gemini 2.5 Ultra
Claude Opus 4 vs Claude Sonnet 4
Claude Opus 4 vs GPT-4o
Claude Opus 4 vs Llama 4 405B
Claude Opus 4 vs Mistral Large 3
Claude Opus 4 vs Qwen 3.5 Plus
Claude Opus 4 vs DeepSeek V3

LLM Glossary

47 AI and language model terms explained.

Large Language Model (LLM)

An AI system trained on massive text data to understand and generate human language.

Tokens

The basic units of text that LLMs process. In English, one token averages roughly 3/4 of a word.
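The 3/4-word rule of thumb implies about 4 tokens for every 3 words, or roughly 100 tokens per 75 English words. A quick estimator under that assumption; `estimate_tokens` is an illustrative helper, splitting on whitespace is itself an approximation, and exact counts require the specific model's tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token count, assuming one token is about 3/4 of a word.

    Actual counts vary by tokenizer, language and formatting (code,
    numbers and non-English text usually take more tokens per word).
    """
    words = len(text.split())  # whitespace-delimited words
    return round(words * 4 / 3)


print(estimate_tokens("Tokens are the basic units of text that LLMs process."))
```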

Context Window

The maximum number of tokens an LLM can handle in a single request, typically counting both the input and the generated output.
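Because input and output usually share the window, a request only fits if the prompt plus the reserved output budget stays within it. A minimal sketch assuming that shared-window convention and a hypothetical 200,000-token window; the function names are illustrative, and individual providers may also cap output length separately.

```python
def fits_in_context(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """True if the prompt plus the reserved output budget fits in the window."""
    return prompt_tokens + max_output_tokens <= context_window


def max_prompt_tokens(context_window: int, max_output_tokens: int) -> int:
    """Largest prompt that still leaves room for max_output_tokens of output."""
    return max(0, context_window - max_output_tokens)


WINDOW = 200_000  # hypothetical context window, in tokens
print(fits_in_context(150_000, 32_000, WINDOW))  # prompt + output budget within the window
print(max_prompt_tokens(WINDOW, 32_000))
```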

Hallucination

When an LLM generates plausible-sounding but factually incorrect information.

Inference

The process of an LLM generating a response to your input.

Prompt

The text input you send to an LLM to get a response.

Fine-Tuning

Customizing a pre-trained LLM on your specific data to improve performance for your use case.

RAG (Retrieval-Augmented Generation)

A technique that retrieves relevant passages from external documents at query time and adds them to the prompt, grounding the LLM's answer to improve accuracy and reduce hallucination.
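The retrieve-then-augment loop can be sketched in a few lines. Production RAG systems typically rank passages with embedding vectors and a vector index; to stay self-contained, this sketch swaps in bag-of-words cosine similarity, and the documents, query and function names are all hypothetical. It only builds the augmented prompt and does not call any model.

```python
import math
import re
from collections import Counter


def _vector(text: str) -> Counter:
    # Lowercased word counts: a stand-in for a real embedding vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = _vector(query)
    return sorted(docs, key=lambda d: _cosine(q, _vector(d)), reverse=True)[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Augment the question with the retrieved passages before sending it to a model."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


DOCS = [
    "The refund window is 30 days from delivery.",
    "Our office is closed on public holidays.",
    "Refunds are issued to the original payment method.",
]
QUERY = "How many days is the refund window?"
print(build_prompt(QUERY, DOCS, k=1))
```

Instructing the model to answer only from the supplied context is what lets retrieval reduce hallucination; if retrieval surfaces the wrong passage, the answer will be confidently wrong, so retrieval quality matters as much as the model.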

By Provider

Anthropic (3) | OpenAI (3) | Google (2) | Meta (1) | Mistral (1) | Alibaba (1) | DeepSeek (1)