Complete hardware breakdown, from hobbyist ($15) to enterprise ($40k+):
| Tier | Hardware | RAM | Storage | Speed | Cost | Models |
|---|---|---|---|---|---|---|
| Minimal | Raspberry Pi Zero 2W | 512MB | 64GB SD | ~0.5 tok/s | $15 | tinyllama (1B) |
| Minimal | Raspberry Pi 4 (2GB) | 2GB | 32GB SD | ~2 tok/s | $35 | phi3:mini (2B) |
| Entry | Raspberry Pi 5 (4GB) | 4GB | 128GB SSD | ~3 tok/s | $65 | qwen2.5:3b-q4 |
| Entry | Orange Pi 5 Plus (16GB) | 16GB | 256GB NVMe | ~5 tok/s | $120 | qwen2.5:3b-q4, mistral:7b-q4 |
| Entry | Jetson Orin Nano (8GB) | 8GB | 128GB NVMe | ~8 tok/s | $199 | qwen2.5:3b, phi3 |
| Mid | Intel NUC (i5-1240P, 32GB) | 32GB | 512GB SSD | ~15 tok/s | $600 | qwen2.5:7b-q4, mistral:7b |
| Mid | Desktop (Ryzen 5 5600X, 32GB) | 32GB | 1TB SSD | ~20 tok/s | $800 | qwen2.5:7b-q4, llama3.1:8b |
| Mid | Jetson Orin AGX (64GB) | 64GB | 512GB NVMe | ~25 tok/s | $999 | qwen2.5:7b, llama3.1:8b |
| High | Desktop (RTX 3090, 64GB) | 64GB | 2TB SSD | ~40 tok/s | $2,500 | qwen2.5:7b, qwen3.5:27b-q4 |
| High | Desktop (RTX 4070 Super, 32GB) | 32GB | 1TB SSD | ~35 tok/s | $2,000 | qwen2.5:14b-q4, mistral:12b |
| High | Desktop (RTX 4080, 48GB) | 48GB | 2TB SSD | ~50 tok/s | $3,200 | qwen3.5:27b-q4, llama3.1:70b-q4 |
| High | Desktop (RTX 4090, 128GB) | 128GB | 4TB SSD | ~80 tok/s | $5,000 | qwen3.5:27b (fp16), llama3.1:70b |
| High | Desktop (RTX 5090, 256GB) | 256GB | 8TB SSD | ~120+ tok/s | $8,000 | qwen3.5:32b, llama3.1:405b-q4 |
| High | Desktop (AMD R9 7950X, 192GB) | 192GB | 4TB SSD | ~60 tok/s | $4,500 | qwen3.5:27b, llama3.1:70b-q4 |
| Enterprise | Mac Studio (M2 Ultra, 128GB) | 128GB | 2TB SSD | ~45 tok/s | $4,000 | qwen3.5:27b, llama3.1:8b-13b |
| Enterprise | Mac Studio (M2 Max, 96GB) | 96GB | 2TB SSD | ~35 tok/s | $3,500 | qwen2.5:7b, llama3.1:8b |
| Enterprise | Server (Dual Xeon, RTX 5090, 768GB) | 768GB | 8TB SSD/NVMe | ~200+ tok/s | $15,000+ | qwen3.5:32b, llama3.1:405b-q4 |
| Enterprise | Server (Dual Xeon, RTX 6000 Ada, 512GB) | 512GB | 8TB SSD/NVMe | ~150 tok/s | $12,000 | qwen3.5:27b, llama3.1:70b |
| Enterprise | H100 GPU (80GB) + Server | 512GB | 8TB SSD | ~300+ tok/s | $40,000+ | Any model (full precision) |
| Enterprise | Cloud (AWS g4dn.12xlarge) | 192GB | 4x 550GB | ~100 tok/s | $5/hour | Any model (on-demand) |
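The Models column follows a simple rule of thumb: a model needs roughly `parameters × bits-per-weight / 8` bytes for its weights, plus headroom for the KV cache and runtime. A minimal sketch; the ~20% overhead factor and the effective bits-per-weight figures are assumptions, not measured values:

```python
def approx_model_ram_gb(params_billion: float, bits_per_weight: float,
                        overhead: float = 1.2) -> float:
    """Rough RAM/VRAM needed to run a model: weight bytes plus ~20%
    headroom for KV cache and runtime (rule of thumb, not a guarantee)."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# Q4 quantization averages roughly 4.5 bits/weight in practice; fp16 is 16.
for name, params, bits in [("qwen2.5:3b-q4", 3, 4.5),
                           ("qwen2.5:7b-q4", 7, 4.5),
                           ("llama3.1:70b-q4", 70, 4.5),
                           ("llama3.1:8b fp16", 8, 16.0)]:
    print(f"{name}: ~{approx_model_ram_gb(params, bits):.1f} GB")
```

Under these assumptions a 7B Q4 model fits the 8GB boards in the Entry tier, while a 70B Q4 model needs the 48GB+ machines further down the table.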
# All platforms (Linux, macOS, Windows)
curl -fsSL https://ollama.com/install.sh | sh
ollama serve
# In another terminal:
ollama pull qwen2.5:7b
ollama run qwen2.5:7b "What is CKB?"
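Beyond the CLI, `ollama serve` exposes a REST API on localhost:11434, so other programs can query the model. A minimal standard-library sketch: `POST /api/generate` with `"stream": false` returns a single JSON object whose `response` field holds the full answer (the model name here is just an example):

```python
import json
import urllib.request

def build_generate_request(model: str, prompt: str) -> dict:
    # Non-streaming body for Ollama's POST /api/generate endpoint.
    return {"model": model, "prompt": prompt, "stream": False}

def ask_ollama(prompt: str, model: str = "qwen2.5:7b",
               host: str = "http://localhost:11434") -> str:
    body = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# With the server running: print(ask_ollama("What is CKB?"))
```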
# Linux: run Ollama as a systemd user service
mkdir -p ~/.config/systemd/user
cat > ~/.config/systemd/user/ollama.service << 'EOF'
[Unit]
Description=Ollama Service
After=network.target
[Service]
ExecStart=/usr/bin/ollama serve
Restart=on-failure
RestartSec=10
# %h is systemd's specifier for the home directory; $USER would not be
# expanded here (quoted heredoc, and systemd ignores it in Environment=).
Environment="OLLAMA_MODELS=%h/.ollama/models"
[Install]
WantedBy=default.target
EOF
systemctl --user daemon-reload
systemctl --user enable ollama
systemctl --user start ollama
# Check status
systemctl --user status ollama
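Besides `systemctl status`, the service can be probed over HTTP: Ollama answers `GET /api/tags` with the list of locally installed models. A small sketch (the host and the 2-second timeout are arbitrary choices):

```python
import json
import urllib.error
import urllib.request

def ollama_healthy(host: str = "http://localhost:11434") -> bool:
    """Return True if the Ollama service answers its /api/tags endpoint."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=2) as resp:
            models = json.loads(resp.read()).get("models", [])
            print(f"Ollama is up; {len(models)} model(s) installed")
            return True
    except (urllib.error.URLError, OSError):
        print("Ollama is not reachable; is the service running?")
        return False
```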
Best practice: start with Q4 variants, which give the best quality-to-size trade-off. Move to Q8 only when output quality is critical; reserve Q2 for classification and other simple tasks that tolerate degraded quality.
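The sizing arithmetic can also be turned around to pick a quantization for a given machine. A sketch under stated assumptions: the bits-per-weight values are rough GGUF averages and the 20% overhead factor is a rule of thumb:

```python
from typing import Optional

# Rough effective bits/weight for common quant levels (assumptions).
QUANTS = [("fp16", 16.0), ("q8", 8.5), ("q4", 4.5), ("q2", 2.6)]

def best_quant(params_billion: float, ram_gb: float,
               overhead: float = 1.2) -> Optional[str]:
    """Highest-quality quant whose weights (plus ~20% runtime headroom)
    fit in ram_gb, or None if even Q2 will not fit."""
    for name, bits in QUANTS:  # ordered best quality first
        needed_gb = params_billion * 1e9 * bits / 8 * overhead / 1e9
        if needed_gb <= ram_gb:
            return name
    return None

print(best_quant(7, 8))    # a 7B model on an 8GB board
print(best_quant(70, 48))  # a 70B model on a 48GB workstation
```

Under these assumptions both cases land on Q4, which is consistent with starting at Q4 and only moving up when the hardware allows.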