
📖 Retrieval-Augmented Generation (RAG)

What is RAG?

RAG combines knowledge retrieval with LLM generation:

  1. The user asks a question
  2. The system retrieves relevant documents (vector similarity search)
  3. The LLM reads the retrieved documents and answers based on that context
  4. Result: answers grounded in cited, current sources rather than in the model's training data alone
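
The four steps above can be sketched in a few lines of Python. This is a toy illustration, not the Wyltek implementation: `embed` uses plain word counts as a stand-in for a real embedding model, the sample documents are invented placeholders, and the final prompt would be handed to whichever LLM you use.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Step 2: return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Step 3: number the sources so the answer can cite them."""
    sources = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {query}"


# Placeholder documents, invented for this example only.
docs = [
    "Fiber payments guide: how to set up a payment channel.",
    "Notes on keeping a sourdough starter alive.",
    "Release checklist for the firmware team.",
]
question = "How do I set up Fiber payments?"  # step 1
prompt = build_prompt(question, retrieve(question, docs, k=1))
print(prompt)  # step 4 would send this prompt to the LLM
```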

Why RAG?

Wyltek RAG System Architecture

Documents (research, blog posts, internal docs)
    ↓
Chunking (300-token chunks with overlap)
    ↓
Embedding (sentence-transformers: all-MiniLM-L6-v2, 384-dim)
    ↓
Vector Store (FAISS for fast similarity search, SQLite for metadata)
    ↓
Retrieval (top-K similar chunks + metadata filtering)
    ↓
LLM Generation (context-aware answering with temperature/top-p control)
    ↓
Cited Response (with source attribution)
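
To make the chunking stage concrete, here is a minimal sketch of fixed-size chunking with overlap. The 300-token size comes from the diagram; the 50-token overlap is an assumed value (the diagram only says "with overlap"), and the tokens here are placeholder strings rather than output from the embedding model's tokenizer.

```python
def chunk_tokens(tokens: list[str], size: int = 300, overlap: int = 50) -> list[list[str]]:
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


tokens = [f"t{i}" for i in range(700)]  # placeholder 700-token document
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])  # → [300, 300, 200]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some tokens twice.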

Deploy Wyltek RAG

# Clone from GitHub
git clone https://github.com/toastmanAu/rag-system.git ~/rag-system

# Setup
cd ~/rag-system
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Run on port 9990
python3 app.py

# Test health
curl http://localhost:9990/health
# → {"status": "ok", "service": "rag-system"}

Ingest Documents

curl -X POST http://localhost:9990/rag/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/doc",
    "html": "...",
    "tags": ["research", "fiber", "payments"]
  }'
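
The same ingest call can be built from Python with only the standard library. This is a sketch that mirrors the curl example above; the helper name `build_ingest_request` and the placeholder HTML are illustrative, and the response format of `/rag/ingest` is not documented here, so the send step is left commented out.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9990"  # port taken from the deploy steps


def build_ingest_request(url: str, html: str, tags: list[str]) -> urllib.request.Request:
    """Build (but do not send) a POST to /rag/ingest with a JSON body."""
    body = json.dumps({"url": url, "html": html, "tags": tags}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/rag/ingest",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_ingest_request(
    "https://example.com/doc",
    "<p>placeholder page body</p>",  # illustrative content only
    ["research", "fiber", "payments"],
)

# To send it, the service must be running:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```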

Query with RAG

# Retrieve context (no LLM)
curl -X POST http://localhost:9990/rag/retrieve \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How do I set up Fiber payments?",
    "agent_id": "kernel",
    "k": 5
  }'

# Ask with LLM (retrieve + generate)
curl -X POST http://localhost:9990/rag/ask \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How do I set up Fiber payments?",
    "agent_id": "kernel",
    "temperature": 0.2
  }'
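
For scripted use, the two endpoints can be wrapped in a small client. The sketch below is hypothetical: `RagClient` and `http_post` are names invented here, the reply is assumed to be JSON (as the health endpoint's is), and its fields are returned as-is because the response schema is not shown in this document. The transport is injectable so the example can run as a dry run without a server.

```python
import json
import urllib.request
from typing import Any, Callable


def http_post(url: str, payload: dict) -> Any:
    """POST a JSON payload and decode the reply (assumes the service answers with JSON)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


class RagClient:
    """Thin wrapper over /rag/retrieve and /rag/ask."""

    def __init__(self, base_url: str = "http://localhost:9990",
                 post: Callable[[str, dict], Any] = http_post) -> None:
        self.base_url = base_url.rstrip("/")
        self._post = post

    def retrieve(self, query: str, agent_id: str, k: int = 5) -> Any:
        """Fetch the top-k context chunks without calling the LLM."""
        return self._post(f"{self.base_url}/rag/retrieve",
                          {"query": query, "agent_id": agent_id, "k": k})

    def ask(self, query: str, agent_id: str, temperature: float = 0.2) -> Any:
        """Retrieve context and generate an answer."""
        return self._post(f"{self.base_url}/rag/ask",
                          {"query": query, "agent_id": agent_id, "temperature": temperature})


# Dry run: a recording stand-in replaces the network call.
calls = []


def fake_post(url: str, payload: dict) -> dict:
    calls.append((url, payload))
    return {}


client = RagClient(post=fake_post)
client.retrieve("How do I set up Fiber payments?", "kernel", k=5)
client.ask("How do I set up Fiber payments?", "kernel", temperature=0.2)
print([url for url, _ in calls])
```

Dropping the `post=fake_post` argument makes the client talk to the running service instead.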

Multi-Agent Profiles

Configure different agents to interpret the same knowledge differently:

agents.yaml:

agents:
  kernel:
    agent_name: "Kernel"
    temperature: 0.2
    retrieval_config:
      k_retrieve: 20
      k_rerank: 5
    source_weights:
      docs: 1.0
      research-findings: 0.8

  shannon:
    agent_name: "Shannon"
    temperature: 0.7
    retrieval_config:
      k_retrieve: 20
      k_rerank: 5
    source_weights:
      blog: 1.0
      research-findings: 0.6
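
One plausible reading of these fields is a two-stage retrieval: fetch `k_retrieve` candidates by similarity, scale each score by the weight of its source, and keep the best `k_rerank`. The sketch below illustrates that reading only; how the Wyltek system actually applies `source_weights` is not stated here, and the `default_weight` for unlisted sources, the candidate scores, and the `rerank` helper are all assumptions.

```python
# Mirror of the agents.yaml values above, inlined as a dict to avoid a YAML dependency.
AGENTS = {
    "kernel": {"k_retrieve": 20, "k_rerank": 5,
               "source_weights": {"docs": 1.0, "research-findings": 0.8}},
    "shannon": {"k_retrieve": 20, "k_rerank": 5,
                "source_weights": {"blog": 1.0, "research-findings": 0.6}},
}


def rerank(candidates: list[dict], agent_id: str, default_weight: float = 0.5) -> list[dict]:
    """Keep the top k_retrieve by raw score, reweight by source, return the top k_rerank."""
    cfg = AGENTS[agent_id]
    pool = sorted(candidates, key=lambda c: c["score"], reverse=True)[:cfg["k_retrieve"]]
    weighted = [
        # default_weight for sources missing from the profile is an assumption
        {**c, "score": c["score"] * cfg["source_weights"].get(c["source"], default_weight)}
        for c in pool
    ]
    weighted.sort(key=lambda c: c["score"], reverse=True)
    return weighted[:cfg["k_rerank"]]


# Invented similarity scores for three chunks from different sources.
candidates = [
    {"id": "a", "source": "docs", "score": 0.70},
    {"id": "b", "source": "blog", "score": 0.75},
    {"id": "c", "source": "research-findings", "score": 0.80},
]
kernel_top = [c["id"] for c in rerank(candidates, "kernel")]
shannon_top = [c["id"] for c in rerank(candidates, "shannon")]
print(kernel_top)   # → ['a', 'c', 'b']
print(shannon_top)  # → ['b', 'c', 'a']
```

Under this reading, the same three chunks come back in a different order for each agent, which is how two profiles could interpret one knowledge base differently.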

Hardware Tiers for RAG Deployment

Choose based on knowledge base size, query volume, and latency requirements:

| Tier | Hardware | RAM | Storage | Throughput | Cost (USD) | Use Case |
|------|----------|-----|---------|------------|------------|----------|
| Minimal | Raspberry Pi Zero 2 W | 512MB | 64GB SD | ~0.5 req/s | $15 | Hobby, embedded |
| Minimal | Raspberry Pi 4 (2GB) | 2GB | 32GB SD | ~2 req/s | $35 | Personal RAG, edge device |
| Entry | Raspberry Pi 5 (4GB) | 4GB | 128GB SSD | ~3 req/s | $65 | Small team RAG |
| Entry | Orange Pi 5 Plus (16GB) | 16GB | 256GB NVMe | ~5 req/s | $120 | Personal RAG server |
| Entry | Jetson Orin Nano (8GB) | 8GB | 128GB NVMe | ~8 req/s | $199 | Edge AI RAG with GPU |
| Mid | Intel NUC 12 (i5-1240P, 32GB) | 32GB | 512GB SSD | ~15 req/s | $600 | Home/office RAG server |
| Mid | Desktop (Ryzen 5 5600X, 32GB) | 32GB | 1TB SSD | ~20 req/s | $800 | Single workstation |
| Mid | Jetson Orin AGX (64GB) | 64GB | 512GB NVMe | ~25 req/s | $999 | AI research, embedded RAG |
| High | Desktop (RTX 3090, 64GB) | 64GB | 2TB SSD | ~40 req/s | $2,500 | Team RAG with GPU accel |
| High | Desktop (RTX 4070 Super, 32GB) | 32GB | 1TB SSD | ~35 req/s | $2,000 | Balanced GPU RAG |
| High | Desktop (RTX 4080, 48GB) | 48GB | 2TB SSD | ~50 req/s | $3,200 | Heavy-duty RAG + LLM |
| High | Desktop (RTX 4090, 128GB) | 128GB | 4TB SSD | ~80 req/s | $5,000 | Production RAG cluster node |
| High | Desktop (AMD R9 7950X, 192GB) | 192GB | 4TB SSD | ~60 req/s | $4,500 | Enterprise RAG (CPU-heavy) |
| Enterprise | Mac Studio (M2 Ultra, 128GB) | 128GB | 2TB SSD | ~45 req/s | $4,000 | Apple ecosystem RAG |
| Enterprise | Mac Studio (M2 Max, 96GB) | 96GB | 2TB SSD | ~35 req/s | $3,500 | Mac team RAG |
| Enterprise | Nvidia Jetson AGX Orin (64GB) | 64GB | 512GB NVMe | ~30 req/s | $999 | Edge AI RAG deployment |
| Enterprise | Server (Dual Xeon, RTX 5090, 768GB) | 768GB | 8TB SSD/NVMe | ~200+ req/s | $15,000+ | Multi-tenant RAG service |
| Enterprise | Server (Dual Xeon, RTX 6000 Ada, 512GB) | 512GB | 8TB SSD/NVMe | ~150 req/s | $12,000 | Professional RAG (multi-GPU) |
| Enterprise | H100 GPU (40GB) + Server | 512GB | 8TB SSD | ~300+ req/s | $40,000+ | High-volume RAG with fine-tuning |
| Enterprise | Cloud (AWS g4dn.12xlarge) | 192GB | 4x 550GB | ~100 req/s | $5/hour | Scalable cloud RAG (on-demand) |
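
When sizing by knowledge base, a quick estimate of the vector index footprint helps narrow the tier. The sketch below assumes uncompressed float32 vectors at the 384 dimensions of all-MiniLM-L6-v2 (as in a flat, uncompressed index); the actual index type used by the system is not stated here, and the figure excludes the embedding model, SQLite metadata, and any LLM held in memory.

```python
def flat_index_bytes(num_chunks: int, dim: int = 384, bytes_per_float: int = 4) -> int:
    """Raw vector storage for an uncompressed index: one float32 per dimension per chunk."""
    return num_chunks * dim * bytes_per_float


one_million = flat_index_bytes(1_000_000)
print(f"{one_million / 1e9:.2f} GB")  # → 1.54 GB
```

By this estimate, a million chunks needs about 1.5 GB for vectors alone, so the smallest boards in the table suit far smaller collections.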

Key Metrics

Cost vs Performance
