Hardware Guide · 2026

The Right Machine for Private Local AI

A dedicated desktop with a GPU is the most capable and cost-effective way to run a private AI node. Here's what to know before you buy — and how it compares to alternatives like Cowork and Perplexity's Personal Computer.

Why Hardware Matters

Local AI inference is not like running a spreadsheet. The model weights — billions of numbers that define how the AI thinks — must be loaded into memory and read repeatedly on every token generated. The single biggest factor in how fast and capable your private AI feels is how much memory your machine has, and how fast that memory is.

Two types of memory matter: GPU VRAM (video memory on a discrete graphics card) and system RAM (your computer's main memory). GPU VRAM is dramatically faster for inference. When a model fits entirely in VRAM, token generation speeds of 30–80 tokens per second are achievable. When it spills into system RAM, generation often slows to single-digit tokens per second. The practical rule:

Buy the most VRAM you can afford. Everything else is secondary.
GPU VRAM
Most Important
Where the model lives during inference. More VRAM = larger models, faster tokens. 12GB handles 13B models. 20–24GB handles 32B models comfortably.
System RAM
Important
Overflow storage when the model is too large for VRAM. DDR5 is significantly faster than DDR4 for this. 32GB is the minimum. 64–128GB opens up larger models.
CPU & Storage
Supporting Role
A mid-range CPU is sufficient — inference is GPU and memory bound, not CPU bound. A fast NVMe SSD reduces model load times from minutes to seconds.
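The VRAM guidance above follows from simple arithmetic. As a rough sketch — assuming a Q4 GGUF-style quantization averages about 4.5 bits per weight, plus roughly 15% overhead for the KV cache and runtime buffers (both figures are ballpark assumptions, not measurements):

```python
# Rough model-sizing sketch. Assumptions: ~4.5 bits/weight for Q4-class
# quantization, ~15% overhead for KV cache and runtime buffers.
def model_gb(params_b: float, bits_per_weight: float = 4.5, overhead: float = 1.15) -> float:
    """Approximate memory footprint in GB for a model with params_b billion parameters."""
    return params_b * bits_per_weight / 8 * overhead

def fits_in_vram(params_b: float, vram_gb: float) -> bool:
    return model_gb(params_b) <= vram_gb

print(round(model_gb(13), 1))  # 13B at Q4 → ~8.4 GB, fits in 12GB
print(round(model_gb(32), 1))  # 32B at Q4 → ~20.7 GB, needs 20–24GB
```

This is why 12GB cards top out around 13B models and why 32B models want 20–24GB: the Q4 footprint plus overhead has to fit under the VRAM ceiling.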

Memory bandwidth — how fast data moves between the chip and memory — is what translates into tokens per second. Here's how the main memory types compare:

GPU GDDR6X (RTX 4090)
1,008 GB/s
GPU GDDR6 (RTX 3060)
360 GB/s
Apple M4 Pro Unified
273 GB/s
DDR5 System RAM
96 GB/s
DDR4 System RAM
65 GB/s

This is why a GPU desktop running a model entirely in VRAM is faster than a Mac mini running the same model in unified memory — and why DDR5 matters significantly more than DDR4 when the model spills out of VRAM into system RAM.
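The relationship is roughly linear: each generated token streams the full weight set from memory once, so tokens per second cannot exceed bandwidth divided by model size. A quick sketch of that ceiling, again assuming ~4.5 bits per weight for Q4 and ignoring compute and cache effects:

```python
# Back-of-envelope throughput ceiling: every token reads all weights once,
# so t/s ≤ memory bandwidth / model size in memory.
# Assumption: Q4-class weights average ~4.5 bits/parameter.
def ceiling_tps(bandwidth_gbs: float, params_b: float, bits_per_weight: float = 4.5) -> float:
    model_gb = params_b * bits_per_weight / 8
    return bandwidth_gbs / model_gb

print(round(ceiling_tps(273, 32)))   # M4 Pro unified, 32B Q4 → ~15 t/s ceiling
print(round(ceiling_tps(1008, 32)))  # RTX 4090 GDDR6X, 32B Q4 → ~56 t/s ceiling
```

Those ceilings line up with the measured ranges quoted later in this guide (11–15 t/s for a 32B model on the M4 Pro, 38–55 t/s on the RTX 4090), which is a good sanity check that bandwidth really is the binding constraint.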

Three Tiers for Private AI

All three builds below run Ollama for local model serving, connect via a Cloudflare tunnel for encrypted remote access, and are reachable from any smartphone browser. Prices reflect 2026 component costs, which are elevated by AI-driven demand for DDR4/DDR5 memory and M.2 storage.
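Once Ollama is serving, a few lines of Python can confirm the node is up and list what's installed — a minimal sketch assuming Ollama's default port 11434 and its stock `/api/tags` endpoint (swap in your tunnel hostname to check remotely):

```python
# Minimal health check for an Ollama node.
# Assumptions: default local port 11434; replace base_url with your
# Cloudflare tunnel hostname to check the node from outside.
import json
import urllib.request

def parse_tags(payload: dict) -> list:
    # /api/tags responds with {"models": [{"name": "...", ...}, ...]}
    return [m["name"] for m in payload.get("models", [])]

def list_models(base_url: str = "http://localhost:11434") -> list:
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return parse_tags(json.load(resp))
```

Calling `list_models()` on a running Tier 1 node might return something like `["llama3.1:8b", "qwen2.5:14b"]`; an exception means the server or tunnel is down.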

Tier 1
Solo Practitioner
$900 – $1,100 estimated build cost · 2026

Components

GPU RTX 3060 12GB GDDR6
CPU Ryzen 5 5600 / Core i5-13400
RAM 32GB DDR4 or DDR5
Storage 1TB NVMe M.2 SSD
PSU 650W 80+ Gold
Power draw ~200W under load
Best for 1–2 concurrent users

Performance (tokens/sec)

Llama 3.1 8B · Q4
35–50 t/s
Qwen2.5 14B · Q4
18–28 t/s
Mistral 7B · Q4
40–55 t/s
DeepSeek 32B · Q4 (RAM offload)
4–8 t/s
Tier 2
Small Office
$1,600 – $1,900 estimated build cost · 2026

Components

GPU RX 7900 XT 20GB — or — RTX 4070 Ti Super 16GB
CPU Ryzen 7 7700X / Core i7-13700K
RAM 64GB DDR5
Storage 2TB NVMe M.2 SSD
PSU 850W 80+ Gold
Power draw ~350W under load
Best for 3–8 concurrent users

Performance (tokens/sec)

Llama 3.1 8B · Q4
55–80 t/s
Qwen2.5 32B · Q4
25–38 t/s
DeepSeek 32B · Q4
22–35 t/s
Llama 3 70B · Q4 (partial offload)
8–14 t/s
Tier 3
Professional & Research
$2,400 – $2,900 estimated build cost · 2026

Components

GPU RTX 4090 24GB GDDR6X
CPU Ryzen 7 7800X3D / Core i9-13900K
RAM 64–128GB DDR5
Storage 2–4TB NVMe M.2 SSD
PSU 1000W 80+ Platinum
Power draw ~500–600W under load
Best for 8+ users · research workloads

Performance (tokens/sec)

Llama 3.1 8B · Q4
70–100 t/s
Qwen2.5 32B · Q4
38–55 t/s
DeepSeek 70B · Q4
18–28 t/s
Llama 3 70B · Q4
15–25 t/s

How It Compares to Mac mini M4 Pro

Perplexity uses the Mac mini M4 Pro as the reference hardware for their Personal Computer product. It's a legitimate machine — Apple Silicon's unified memory architecture is efficient and capable. But the GPU desktop tells a different story on cost, capability ceiling, and upgradeability.

| Factor | Mac mini M4 Pro (64GB) | GPU Desktop Tier 2 | GPU Desktop Tier 3 |
| --- | --- | --- | --- |
| Price (2026) | $2,399 (Apple MSRP) | $1,600–1,900 | $2,400–2,900 |
| Memory bandwidth | 273 GB/s unified | 672–800 GB/s VRAM | 1,008 GB/s VRAM |
| Usable memory for LLMs | ~58GB (unified) | 16–20GB VRAM + 64GB RAM | 24GB VRAM + 64–128GB RAM |
| 32B model speed | 11–15 t/s | 22–38 t/s | 38–55 t/s |
| Power under load | ~40W | ~350W | ~500–600W |
| Upgradeable | No — soldered RAM | Yes — swap GPU anytime | Yes — swap GPU anytime |
| OS flexibility | macOS only | Linux / Windows | Linux / Windows |
| Best for | Silent, power-efficient always-on node | Small office, 3–8 users | Larger team, research, best model quality |

The Mac mini wins on power efficiency and silence — it draws 40 watts versus 350–600W for a GPU desktop. For a home office where noise and electricity matter, that's real. For a dedicated office environment where performance and cost per capability matter more, the GPU desktop wins at every tier. And critically, the GPU desktop is upgradeable — when better GPUs arrive, you replace the card, not the whole machine.

Buying Tips for 2026

RAM prices are elevated

DDR4 32GB kits that cost $60–90 in late 2025 now run $150–180. DDR5 32GB kits under $360 are scarce. Budget accordingly — this is the biggest cost surprise in current builds compared to guides written a year ago.

Choose DDR5 over DDR4

For LLM inference, DDR5 delivers nearly twice the memory bandwidth of DDR4 when the model spills out of VRAM. The price premium is worth it. If you're on a tight budget, DDR4 works — but DDR5 is the right choice for a machine you'll run for several years.

Used RTX 3090 is strong value

The RTX 3090's 24GB GDDR6X VRAM matches the RTX 4090 in capacity, at roughly half the price on the used market. Speed is lower but the model headroom is identical. For Tier 2 budgets wanting Tier 3 VRAM, a used 3090 is worth considering.

NVMe matters for model loading

A 70B model takes 30 seconds to load from a fast NVMe SSD versus 3–5 minutes from a spinning hard drive. You only load once per session, but for a node that may restart or switch models, fast storage is noticeable.
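Those load times are straightforward division. A rough sketch, assuming a ~40GB Q4 file for a 70B model and typical sustained read speeds (real loads add some deserialization overhead on top of the raw read):

```python
# Load-time estimate: file size / sustained read speed.
# Assumptions: ~40GB for a 70B Q4 file; ~3 GB/s for a PCIe 4.0 NVMe SSD,
# ~0.2 GB/s for a spinning hard drive. Real loads add parsing overhead.
def load_seconds(file_gb: float, read_gbs: float) -> float:
    return file_gb / read_gbs

print(round(load_seconds(40, 3.0)))  # NVMe: ~13 s raw read
print(round(load_seconds(40, 0.2)))  # HDD: ~200 s raw read
```

The raw read alone puts the HDD past three minutes before any deserialization, while the NVMe stays well under half a minute — consistent with the 30-seconds-versus-minutes gap above.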

GPU is the upgrade path

The GPU desktop's biggest advantage over the Mac mini is that when better GPUs arrive — RTX 5090, future RDNA generations — you swap one card. The rest of the build stays. The Mac mini requires replacing the entire machine.

Plan for 24/7 operation

A private AI node works best running continuously — always reachable from your phone. Factor in power costs: a Tier 2 build at 350W under load costs roughly $180–250/year in electricity at average US rates, assuming moderate daily use.
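As a rough check of that electricity estimate — the rate, duty cycle, and idle draw below are all assumptions, not measurements, so plug in your own numbers:

```python
# Yearly electricity cost for an always-on node.
# Assumptions: ~$0.17/kWh (rough US average), ~8 hours/day at full load,
# modest idle draw the rest of the time.
def yearly_cost(load_w: float, idle_w: float, load_hours: float, rate: float = 0.17) -> float:
    daily_kwh = (load_w * load_hours + idle_w * (24 - load_hours)) / 1000
    return daily_kwh * 365 * rate

print(round(yearly_cost(350, 60, 8)))  # Tier 2 build → ~$233/year
```

That lands inside the $180–250 range quoted above; a heavier duty cycle or a higher local rate pushes it past the top of that range.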

Prebuilt vs. self-build

You don't have to build from parts. Retailers like Walmart, Amazon, and Best Buy regularly stock prebuilt gaming desktops with current-generation GPUs and DDR5 RAM already installed — often competitive once you factor in the included OS license, cooler, and case. Search for iBUYPOWER, CyberPowerPC, or ASUS ROG prebuilts as starting points. Check local stores too — local pickup avoids shipping damage risk on a full tower.

See It Working

The PrivateAI demo node is running on a 32GB desktop — similar hardware to Tier 1. When the node is online, you can connect from any browser, on any device, from anywhere. No app, no account, no cloud relay. Your prompts go to a machine in a private office, not to Anthropic or Perplexity's servers.

→ Try PrivateAI Chat