Last updated: April 5, 2026 · Pricing & Deployment · by Daniel Ashford

What Is a GPU (Graphics Processing Unit)?

QUICK ANSWER

The specialized hardware that LLMs run on.

Definition

A GPU is a specialized processor originally designed for graphics rendering but now essential for AI computation. GPUs perform thousands of operations in parallel, making them necessary for both training and running language models.

How It Works

The dominant manufacturer is NVIDIA. Key data-center models: the A100 (80GB VRAM, $10-15K), the H100 (80GB, $25-30K), and the B200 (192GB). VRAM is the primary constraint: a 7B model fits on a single RTX 4090 (24GB), while a 70B model requires 2-4 A100s. Cloud GPU rentals run $1-4 per GPU-hour.
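The VRAM figures above follow a common rule of thumb, which can be sketched as a quick calculation (an approximation, not an exact formula: weights at 16-bit precision take about 2 bytes per parameter, and real deployments need extra headroom for activations and the KV cache):

```python
def vram_estimate_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Rough VRAM needed for model weights alone.

    FP16/BF16 weights use 2 bytes per parameter; actual deployments
    need extra headroom for activations and the KV cache.
    """
    return params_billion * bytes_per_param  # 1B params * 2 bytes ~ 2 GB

# A 7B model: ~14 GB of weights, so it fits on a 24 GB RTX 4090.
print(vram_estimate_gb(7))    # 14.0
# A 70B model: ~140 GB of weights, hence 2+ A100s (80 GB each).
print(vram_estimate_gb(70))   # 140.0
```

This weights-only estimate explains why the 24GB RTX 4090 handles 7B models comfortably but cannot come close to holding a 70B model.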

Example

Training GPT-4 reportedly required approximately 25,000 A100 GPUs running for months, at an estimated cost of over $100M. Serving inference requires a much smaller cluster per request.

Related Terms

VRAM
The GPU memory that determines which models can run on which hardware.
Self-Hosting
Running an LLM on your own hardware instead of using a cloud API.
Inference
The process of an LLM generating a response to your input.
Quantization
Compressing an LLM to use less memory by reducing numerical precision.
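The quantization entry above can be made concrete with the same back-of-the-envelope math (illustrative numbers only, ignoring quantization overhead): dropping from 16-bit to 4-bit precision cuts weight memory by roughly 4x.

```python
def quantized_vram_gb(params_billion: float, bits: int) -> float:
    """Approximate weight memory at a given numerical precision.

    Ignores per-layer quantization overhead and runtime buffers.
    """
    return params_billion * bits / 8  # bits per param -> bytes per param

# A 70B model: ~140 GB at 16-bit, but only ~35 GB at 4-bit --
# small enough for a pair of consumer GPUs instead of multiple A100s.
print(quantized_vram_gb(70, 16))  # 140.0
print(quantized_vram_gb(70, 4))   # 35.0
```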

See How Models Compare

Understanding GPUs is important when choosing the right AI model. See how 12 models compare on our leaderboard.
