Last updated: April 5, 2026 · 47 terms · by Daniel Ashford
📖 LLM Glossary
47 essential AI and language model terms explained clearly. Whether you are a developer evaluating models or a business leader exploring AI, this glossary covers the core concepts you need to navigate the field.
Core Concepts
Large Language Model (LLM)
An AI system trained on massive text data to understand and generate human language.
Tokens
The basic units of text that LLMs process — roughly 3/4 of a word.
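The "3/4 of a word" rule has a companion heuristic of roughly 4 characters per token for English text. A minimal sketch of that estimate (real counts require the model's own tokenizer, e.g. a library like tiktoken for OpenAI models):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the common ~4 characters-per-token
    rule of thumb for English. Actual counts vary by tokenizer."""
    return max(1, len(text) // 4)

# 44 characters -> estimated 11 tokens
print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # 11
```

Treat this as a budgeting heuristic only; code, non-English text, and rare words tokenize less efficiently.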
Context Window
The maximum amount of text an LLM can process in a single request.
Hallucination
When an LLM generates plausible-sounding but factually incorrect information.
Inference
The process of an LLM generating a response to your input.
AI Agent
An LLM that can autonomously plan, use tools, and take actions to complete complex tasks.
Multimodal
LLMs that can process not just text, but also images, audio, and video.
Model Architecture
Fine-Tuning
Customizing a pre-trained LLM on your specific data to improve performance for your use case.
RAG (Retrieval-Augmented Generation)
A technique that gives LLMs access to external documents to improve accuracy and reduce hallucination.
Transformer
The neural network architecture behind nearly all modern LLMs.
Parameters
The numerical weights inside an LLM that encode its learned knowledge.
Attention Mechanism
The core technique that allows LLMs to understand relationships between words in text.
Embeddings
Numerical representations of text that capture semantic meaning — used in search and RAG systems.
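Similarity between embeddings is usually measured with cosine similarity. A self-contained sketch with toy 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions produced by an embedding model):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors:
    1.0 = same direction, 0.0 = unrelated, -1.0 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings for illustration only.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Semantically related texts score higher than unrelated ones.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))  # True
```

This comparison is exactly what a vector database performs at scale during RAG retrieval.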
Quantization
Compressing an LLM to use less memory by reducing numerical precision.
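The memory impact is simple arithmetic: weight memory is roughly parameters × bytes per weight. A sketch (ignoring activation memory and KV-cache overhead, which add more on top):

```python
def model_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB: parameters x bytes per weight.
    Ignores activations and KV cache, which need additional VRAM."""
    bytes_per_weight = bits_per_weight / 8
    return params_billion * 1e9 * bytes_per_weight / 1e9

# A 70B-parameter model at different precisions:
print(model_memory_gb(70, 16))  # 140.0 (GB in FP16)
print(model_memory_gb(70, 4))   # 35.0 (GB at 4-bit quantization)
```

This is why quantization matters for self-hosting: it determines whether a model fits in a given GPU's VRAM at all.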
Reasoning Tokens
Hidden thinking tokens that reasoning models generate internally before their visible response.
Vector Database
A specialized database for storing and searching embeddings — the backbone of RAG systems.
Evaluation & Benchmarks
Benchmark
A standardized test used to measure and compare LLM capabilities.
Arena Elo Rating
A crowdsourced model ranking based on human preference votes in blind comparisons.
LLM Judge Index™
Our proprietary composite score ranking LLMs across 6 evaluation dimensions on a 0-100 scale.
MMLU / MMLU-Pro
MMLU tests broad academic knowledge across 57 subjects; MMLU-Pro is a harder variant with more answer choices and more reasoning-focused questions.
GPQA Diamond
A graduate-level science benchmark with questions written by PhD experts.
Pricing & Deployment
API (Application Programming Interface)
The technical interface that lets your software send prompts to an LLM and receive responses.
LLM API Pricing
The cost of using language models, typically measured in dollars per million tokens.
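Since input and output tokens are priced separately, a request's cost is two multiplications and an addition. A sketch with illustrative prices (the $3 / $15 rates below are hypothetical, not any vendor's actual pricing):

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m):
    """Cost of one request given per-million-token prices."""
    return (input_tokens / 1e6) * input_price_per_m \
         + (output_tokens / 1e6) * output_price_per_m

# Hypothetical model priced at $3 / 1M input and $15 / 1M output tokens:
cost = request_cost(2_000, 500, 3.00, 15.00)
print(f"${cost:.4f}")  # $0.0135
```

Multiply by expected request volume to project a monthly bill.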
Input Tokens
The tokens in your prompt that the model reads — typically cheaper than output tokens.
Output Tokens
The tokens the model generates in its response — the most expensive part of API usage.
Self-Hosting
Running an LLM on your own hardware instead of using a cloud API.
Latency
How long it takes to get a response — often measured as time to first token (TTFT).
GPU (Graphics Processing Unit)
The specialized hardware that LLMs run on.
VRAM
The GPU memory that determines which models can run on which hardware.
Open Source / Open Weights
LLMs whose model weights are publicly available for download and self-hosting.
Streaming
Receiving the model response word-by-word in real-time instead of waiting for the full answer.
Batch Processing
Submitting large volumes of LLM requests for asynchronous processing, typically at around a 50% discount.
Prompt Caching
A feature that cuts costs — up to around 90% on cached reads, depending on the provider — by reusing previously processed prompt prefixes such as long system prompts.
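To see why caching matters, compare the cost of resending a long prompt prefix with and without a cache discount. A sketch under assumed numbers (the 90% cache-read discount and $3 / 1M price are illustrative; real discounts and cache-write surcharges vary by provider):

```python
def cached_request_cost(prefix_tokens, new_tokens, price_per_m,
                        cache_read_discount=0.9):
    """Illustrative comparison: full price for everything vs a
    discounted rate on a cached prompt prefix."""
    uncached = (prefix_tokens + new_tokens) / 1e6 * price_per_m
    cached = (prefix_tokens / 1e6 * price_per_m * (1 - cache_read_discount)
              + new_tokens / 1e6 * price_per_m)
    return uncached, cached

# A 50K-token system prompt plus a short 500-token user message:
uncached, cached = cached_request_cost(50_000, 500, 3.00)
print(f"uncached ${uncached:.4f} vs cached ${cached:.4f}")
```

The savings compound in chat applications, where the same long prefix is resent on every turn.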
Prompting & Usage
Prompt
The text input you send to an LLM to get a response.
Temperature
A setting that controls how creative or deterministic responses are.
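Under the hood, temperature rescales the model's internal scores (logits) before they are turned into next-token probabilities. A sketch of that softmax-with-temperature mechanism using made-up toy scores:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities. Low temperature sharpens
    the distribution (more deterministic); high temperature flattens
    it (more varied output)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # toy scores for three candidate tokens
print(softmax_with_temperature(logits, 0.5))  # top token dominates
print(softmax_with_temperature(logits, 2.0))  # probabilities even out
```

In practice: use low temperatures for factual or structured tasks and higher ones for brainstorming or creative writing.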
System Prompt
Persistent instructions that define how the model should behave.
Chain-of-Thought (CoT)
A prompting technique that asks the model to show its reasoning step by step.
Few-Shot Prompting
Providing examples of desired input-output pairs in the prompt.
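A hypothetical few-shot prompt for sentiment classification, built programmatically (the task, labels, and reviews are invented for illustration — the pattern is what matters):

```python
# Example input-output pairs that teach the model the desired format.
examples = [
    ("The product arrived broken.", "negative"),
    ("Absolutely love this, works perfectly!", "positive"),
]

def build_few_shot_prompt(examples, query):
    """Assemble a prompt: instruction, worked examples, then the
    new input, ending where the model should continue."""
    lines = ["Classify the sentiment of each review as positive or negative.\n"]
    for review, label in examples:
        lines.append(f"Review: {review}\nSentiment: {label}\n")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

print(build_few_shot_prompt(examples, "Shipping was slow but support helped."))
```

Ending the prompt at "Sentiment:" nudges the model to complete the pattern with just a label.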
Prompt Engineering
The skill of crafting effective prompts to get the best results from an LLM.
Function Calling / Tool Use
The ability of LLMs to invoke external tools, APIs, and databases.
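Most function-calling APIs follow the same shape: you describe tools with a JSON-schema-style definition, and the model responds with a structured call rather than running anything itself. A sketch with a hypothetical `get_weather` tool (field names vary slightly by provider):

```python
import json

# A hypothetical tool definition in the JSON-schema style most
# function-calling APIs use.
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    },
}

# The model returns a structured call like this; your code executes
# the real function and feeds the result back to the model.
model_tool_call = {"name": "get_weather", "arguments": {"city": "Oslo"}}
print(json.dumps(model_tool_call))
```

The loop of call → execute → return-result is the basic building block of AI agents.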
Max Tokens
An API parameter that limits how long the model response can be.
Safety & Alignment
RLHF (Reinforcement Learning from Human Feedback)
A post-training technique that makes LLMs more helpful and safer by learning from human preference ratings.
Alignment
The challenge of making AI systems behave in accordance with human values.
AI Safety Score
A measure of how well a model avoids harmful outputs and maintains appropriate guardrails.
Guardrails
Safety mechanisms that prevent LLMs from producing harmful or off-topic outputs.
Constitutional AI
Anthropic's approach to safety that trains models using written principles rather than solely human ratings.
Red Teaming
Deliberately trying to make an LLM produce harmful outputs to find and fix vulnerabilities.
Daniel Ashford
Founder & Lead Evaluator · 200+ models evaluated