Last updated: April 5, 2026 · Model Architecture · by Daniel Ashford

What Is an Attention Mechanism?

QUICK ANSWER

The core technique that allows LLMs to understand relationships between words in text.

Definition

The attention mechanism is the fundamental operation inside Transformer models that allows each token to dynamically focus on other relevant tokens in the input sequence. It computes weighted relationships between all positions in the text.

How It Works

Self-attention operates through queries (Q), keys (K), and values (V). For each token, its query is compared against all keys via dot products; the scores are scaled and softmax-normalized into attention weights, which then form a weighted sum of the values. Multi-head attention runs this process multiple times in parallel with different learned projections, allowing the model to attend to different types of relationships simultaneously.
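The computation above can be sketched in a few lines of NumPy. This is a minimal illustration of scaled dot-product attention (the core of a single attention head); the shapes, variable names, and random inputs are illustrative, not taken from any particular model:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) arrays. Each query is compared against all
    # keys via dot products, scaled by sqrt(d_k), and the softmax weights
    # are used to take a weighted sum of the values.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len) pairwise scores
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

# Toy example: 3 tokens projected into 4 dimensions.
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out, weights = scaled_dot_product_attention(Q, K, V)
```

Multi-head attention simply repeats this with several independently learned Q/K/V projections and concatenates the per-head outputs before a final linear projection.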

Example

In "The bank by the river had a steep bank," multi-head attention helps disambiguate the two meanings of "bank" by attending to surrounding context.

Related Terms

Transformer
The neural network architecture behind virtually all modern LLMs.
Context Window
The maximum amount of text an LLM can process in a single request.
Parameters
The numerical weights inside an LLM that encode its learned knowledge.

See How Models Compare

Understanding the attention mechanism is important when choosing the right AI model. See how 12 models compare on our leaderboard.

Daniel Ashford
Founder & Lead Evaluator · 200+ models evaluated