Last updated: April 5, 2026 · Model Architecture · by Daniel Ashford

What Is a Transformer?

QUICK ANSWER

The neural network architecture behind all modern LLMs.

Definition

The Transformer is the neural network architecture that powers virtually all modern language models. Introduced in the 2017 paper "Attention Is All You Need," it replaced recurrent sequence models (RNNs and LSTMs) with self-attention, which processes all tokens simultaneously rather than one at a time.

How It Works

The key innovation is the attention mechanism, which lets each token "attend to" every other token in the input, learning which parts of the text are most relevant to it. Because attention is computed for all tokens in parallel, training is far faster than with recurrent models, and long-range dependencies are captured directly rather than passed step by step through a hidden state. Modern LLMs use decoder-only Transformer variants with modifications such as rotary position embeddings and grouped-query attention.
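The attention computation described above can be sketched in a few lines. This is a minimal, single-head illustration of scaled dot-product attention in plain Python, with tiny hand-picked matrices standing in for learned query, key, and value projections (the values here are illustrative, not from any real model):

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(Q, K, V):
    """Single-head scaled dot-product attention.

    Q, K, V are lists of per-token vectors. Each token's query is scored
    against every token's key, so every token can attend to every other.
    """
    d_k = len(K[0])
    # Dot each query with each key, scaled by sqrt(d_k) to keep scores stable.
    scores = [
        [sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in K]
        for q_row in Q
    ]
    # Softmax turns each row of scores into an attention distribution.
    weights = [softmax(row) for row in scores]
    # Output for each token is a weighted mix of all value vectors.
    return [
        [sum(w * v_row[j] for w, v_row in zip(w_row, V)) for j in range(len(V[0]))]
        for w_row in weights
    ]

# Toy input: 3 tokens with 2-dimensional queries/keys/values.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = self_attention(Q, K, V)
```

Each row of `out` is one token's new representation: a blend of every token's value vector, weighted by how strongly that token attended to each of the others. Real Transformers run many such heads in parallel over learned projections.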

Example

When processing "The cat sat on the mat because it was tired," the Transformer attention mechanism helps the model understand that "it" refers to "the cat" — not "the mat."

Related Terms

Attention Mechanism
The core technique that allows LLMs to understand relationships between words in text.
Parameters
The numerical weights inside an LLM that encode its learned knowledge.
Large Language Model (LLM)
An AI system trained on massive text data to understand and generate human language.

See How Models Compare

Understanding the Transformer architecture is important when choosing the right AI model. See how 12 models compare on our leaderboard.

Daniel Ashford
Founder & Lead Evaluator · 200+ models evaluated