Last updated: April 5, 2026 · Pricing & Deployment · by Daniel Ashford

What Is a GPU (Graphics Processing Unit)?

QUICK ANSWER

The specialized hardware that LLMs run on.

Definition

A GPU is a specialized processor originally designed for graphics rendering but now essential for AI computation. GPUs perform thousands of operations in parallel, making them necessary for both training and running language models.

How It Works

The dominant manufacturer is NVIDIA. Key data-center models: the A100 (80GB VRAM, $10-15K), the H100 (80GB, $25-30K), and the B200 (192GB). VRAM is the primary constraint: a 7B model fits on a single RTX 4090 (24GB), while a 70B model requires 2-4 A100s. Cloud GPU rentals run $1-4 per GPU-hour.
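The VRAM figures above follow a common rule of thumb, which can be sketched as a quick calculation (an approximation, not an exact formula: weights at 16-bit precision take about 2 bytes per parameter, and real deployments need extra headroom for activations and the KV cache):

```python
def vram_estimate_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Rough VRAM needed for model weights alone.

    FP16/BF16 weights use 2 bytes per parameter; actual deployments
    need extra headroom for activations and the KV cache.
    """
    return params_billion * bytes_per_param  # 1B params * 2 bytes ~ 2 GB

# A 7B model: ~14 GB of weights, so it fits on a 24 GB RTX 4090.
print(vram_estimate_gb(7))    # 14.0
# A 70B model: ~140 GB of weights, hence 2+ A100s (80 GB each).
print(vram_estimate_gb(70))   # 140.0
```

This weights-only estimate explains why the 24GB RTX 4090 handles 7B models comfortably but cannot come close to holding a 70B model.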

Example

Training GPT-4 reportedly required approximately 25,000 A100 GPUs running for months, at an estimated cost of over $100M. Serving inference requires a much smaller cluster per request.

Related Terms

VRAM
The GPU memory that determines which models can run on which hardware.
Self-Hosting
Running an LLM on your own hardware instead of using a cloud API.
Inference
The process of an LLM generating a response to your input.
Quantization
Compressing an LLM to use less memory by reducing numerical precision.
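The quantization entry above can be made concrete with the same back-of-the-envelope math (illustrative numbers only, ignoring quantization overhead): dropping from 16-bit to 4-bit precision cuts weight memory by roughly 4x.

```python
def quantized_vram_gb(params_billion: float, bits: int) -> float:
    """Approximate weight memory at a given numerical precision.

    Ignores per-layer quantization overhead and runtime buffers.
    """
    return params_billion * bits / 8  # bits per param -> bytes per param

# A 70B model: ~140 GB at 16-bit, but only ~35 GB at 4-bit --
# small enough for a pair of consumer GPUs instead of multiple A100s.
print(quantized_vram_gb(70, 16))  # 140.0
print(quantized_vram_gb(70, 4))   # 35.0
```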

See How Models Compare

Understanding GPUs is important when choosing the right AI model. See how 12 models compare on our leaderboard.
