Last updated: April 5, 2026 · 47 terms · by Daniel Ashford
📖 LLM Glossary
47 essential AI and language model terms explained clearly. Whether you are a developer evaluating models or a business leader exploring AI, this glossary covers the core concepts you need to navigate the field.
Core Concepts
Large Language Model (LLM)
An AI system trained on massive text data to understand and generate human language.
Tokens
The basic units of text that LLMs process — roughly 3/4 of a word.
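The "3/4 of a word" rule has a companion heuristic of roughly 4 characters per token for English text. A minimal sketch of that estimate (real counts require the model's own tokenizer, e.g. a library like tiktoken for OpenAI models):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the common ~4 characters-per-token
    rule of thumb for English. Actual counts vary by tokenizer."""
    return max(1, len(text) // 4)

# 44 characters -> estimated 11 tokens
print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # 11
```

Treat this as a budgeting heuristic only; code, non-English text, and rare words tokenize less efficiently.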
Context Window
The maximum amount of text an LLM can process in a single request.
Hallucination
When an LLM generates plausible-sounding but factually incorrect information.
Inference
The process of an LLM generating a response to your input.
AI Agent
An LLM that can autonomously plan, use tools, and take actions to complete complex tasks.
Multimodal
LLMs that can process not just text, but also images, audio, and video.
Model Architecture
Fine-Tuning
Customizing a pre-trained LLM on your specific data to improve performance for your use case.
RAG (Retrieval-Augmented Generation)
A technique that gives LLMs access to external documents to improve accuracy and reduce hallucination.
Transformer
The neural network architecture behind nearly all modern LLMs.
Parameters
The numerical weights inside an LLM that encode its learned knowledge.
Attention Mechanism
The core technique that allows LLMs to understand relationships between words in text.
Embeddings
Numerical representations of text that capture semantic meaning — used in search and RAG systems.
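Similarity between embeddings is usually measured with cosine similarity. A self-contained sketch with toy 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions produced by an embedding model):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors:
    1.0 = same direction, 0.0 = unrelated, -1.0 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings for illustration only.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Semantically related texts score higher than unrelated ones.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))  # True
```

This comparison is exactly what a vector database performs at scale during RAG retrieval.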
Quantization
Compressing an LLM to use less memory by reducing numerical precision.
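The memory impact is simple arithmetic: weight memory is roughly parameters × bytes per weight. A sketch (ignoring activation memory and KV-cache overhead, which add more on top):

```python
def model_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB: parameters x bytes per weight.
    Ignores activations and KV cache, which need additional VRAM."""
    bytes_per_weight = bits_per_weight / 8
    return params_billion * 1e9 * bytes_per_weight / 1e9

# A 70B-parameter model at different precisions:
print(model_memory_gb(70, 16))  # 140.0 (GB in FP16)
print(model_memory_gb(70, 4))   # 35.0 (GB at 4-bit quantization)
```

This is why quantization matters for self-hosting: it determines whether a model fits in a given GPU's VRAM at all.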
Reasoning Tokens
Hidden thinking tokens that reasoning models generate internally before their visible response.
Vector Database
A specialized database for storing and searching embeddings — the backbone of RAG systems.
Evaluation & Benchmarks
Benchmark
A standardized test used to measure and compare LLM capabilities.
Arena Elo Rating
A crowdsourced model ranking based on human preference votes in blind comparisons.
LLM Judge Index™
Our proprietary composite score ranking LLMs across 6 evaluation dimensions on a 0-100 scale.
MMLU / MMLU-Pro
MMLU tests broad academic knowledge across 57 subjects; MMLU-Pro is a harder variant with more answer choices and more reasoning-focused questions.
GPQA Diamond
A graduate-level science benchmark with questions written by PhD experts.
Pricing & Deployment
API (Application Programming Interface)
The technical interface that lets your software send prompts to an LLM and receive responses.
LLM API Pricing
The cost of using language models, typically measured in dollars per million tokens.
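Since input and output tokens are priced separately, a request's cost is two multiplications and an addition. A sketch with illustrative prices (the $3 / $15 rates below are hypothetical, not any vendor's actual pricing):

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m):
    """Cost of one request given per-million-token prices."""
    return (input_tokens / 1e6) * input_price_per_m \
         + (output_tokens / 1e6) * output_price_per_m

# Hypothetical model priced at $3 / 1M input and $15 / 1M output tokens:
cost = request_cost(2_000, 500, 3.00, 15.00)
print(f"${cost:.4f}")  # $0.0135
```

Multiply by expected request volume to project a monthly bill.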
Input Tokens
The tokens in your prompt that the model reads — typically cheaper than output tokens.
Output Tokens
The tokens the model generates in its response — the most expensive part of API usage.
Self-Hosting
Running an LLM on your own hardware instead of using a cloud API.
Latency
How long it takes to get a response — often measured as time to first token (TTFT).
GPU (Graphics Processing Unit)
The specialized hardware that LLMs run on.
VRAM
The GPU memory that determines which models can run on which hardware.
Open Source / Open Weights
LLMs whose model weights are publicly available for download and self-hosting.
Streaming
Receiving the model response word-by-word in real-time instead of waiting for the full answer.
Batch Processing
Submitting large volumes of LLM requests for asynchronous processing, typically at around a 50% discount.
Prompt Caching
A feature that cuts costs — up to around 90% on cached reads, depending on the provider — by reusing previously processed prompt prefixes such as long system prompts.
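To see why caching matters, compare the cost of resending a long prompt prefix with and without a cache discount. A sketch under assumed numbers (the 90% cache-read discount and $3 / 1M price are illustrative; real discounts and cache-write surcharges vary by provider):

```python
def cached_request_cost(prefix_tokens, new_tokens, price_per_m,
                        cache_read_discount=0.9):
    """Illustrative comparison: full price for everything vs a
    discounted rate on a cached prompt prefix."""
    uncached = (prefix_tokens + new_tokens) / 1e6 * price_per_m
    cached = (prefix_tokens / 1e6 * price_per_m * (1 - cache_read_discount)
              + new_tokens / 1e6 * price_per_m)
    return uncached, cached

# A 50K-token system prompt plus a short 500-token user message:
uncached, cached = cached_request_cost(50_000, 500, 3.00)
print(f"uncached ${uncached:.4f} vs cached ${cached:.4f}")
```

The savings compound in chat applications, where the same long prefix is resent on every turn.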
Prompting & Usage
Prompt
The text input you send to an LLM to get a response.
Temperature
A setting that controls how creative or deterministic responses are.
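Under the hood, temperature rescales the model's internal scores (logits) before they are turned into next-token probabilities. A sketch of that softmax-with-temperature mechanism using made-up toy scores:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities. Low temperature sharpens
    the distribution (more deterministic); high temperature flattens
    it (more varied output)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # toy scores for three candidate tokens
print(softmax_with_temperature(logits, 0.5))  # top token dominates
print(softmax_with_temperature(logits, 2.0))  # probabilities even out
```

In practice: use low temperatures for factual or structured tasks and higher ones for brainstorming or creative writing.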
System Prompt
Persistent instructions that define how the model should behave.
Chain-of-Thought (CoT)
A prompting technique that asks the model to show its reasoning step by step.
Few-Shot Prompting
Providing examples of desired input-output pairs in the prompt.
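A hypothetical few-shot prompt for sentiment classification, built programmatically (the task, labels, and reviews are invented for illustration — the pattern is what matters):

```python
# Example input-output pairs that teach the model the desired format.
examples = [
    ("The product arrived broken.", "negative"),
    ("Absolutely love this, works perfectly!", "positive"),
]

def build_few_shot_prompt(examples, query):
    """Assemble a prompt: instruction, worked examples, then the
    new input, ending where the model should continue."""
    lines = ["Classify the sentiment of each review as positive or negative.\n"]
    for review, label in examples:
        lines.append(f"Review: {review}\nSentiment: {label}\n")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

print(build_few_shot_prompt(examples, "Shipping was slow but support helped."))
```

Ending the prompt at "Sentiment:" nudges the model to complete the pattern with just a label.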
Prompt Engineering
The skill of crafting effective prompts to get the best results from an LLM.
Function Calling / Tool Use
The ability of LLMs to invoke external tools, APIs, and databases.
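Most function-calling APIs follow the same shape: you describe tools with a JSON-schema-style definition, and the model responds with a structured call rather than running anything itself. A sketch with a hypothetical `get_weather` tool (field names vary slightly by provider):

```python
import json

# A hypothetical tool definition in the JSON-schema style most
# function-calling APIs use.
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    },
}

# The model returns a structured call like this; your code executes
# the real function and feeds the result back to the model.
model_tool_call = {"name": "get_weather", "arguments": {"city": "Oslo"}}
print(json.dumps(model_tool_call))
```

The loop of call → execute → return-result is the basic building block of AI agents.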
Max Tokens
An API parameter that limits how long the model response can be.
Safety & Alignment
RLHF (Reinforcement Learning from Human Feedback)
A post-training technique that makes LLMs more helpful and safer by learning from human preference ratings.
Alignment
The challenge of making AI systems behave in accordance with human values.
AI Safety Score
A measure of how well a model avoids harmful outputs and maintains appropriate guardrails.
Guardrails
Safety mechanisms that prevent LLMs from producing harmful or off-topic outputs.
Constitutional AI
Anthropic's approach to safety that trains models using written principles rather than solely human ratings.
Red Teaming
Deliberately trying to make an LLM produce harmful outputs to find and fix vulnerabilities.
Daniel Ashford
Founder & Lead Evaluator · 200+ models evaluated