
Real-World Numbers

These figures come from the CKB ecosystem knowledge graph built from 36 repos using graphify:

- ~60k nodes in the CKB graph
- ~120k edges (relationships)
- ~1,000 communities detected
- 115× average token reduction vs naive RAG
- 5,375× best case (cross-layer questions)
- 38× floor (simple lookups)

What does 115× mean in practice? A naive RAG answer to "how does the light client sync protocol interact with Nervos' CKB-VM execution model?" might require 80,000 tokens of context across 36 repos. The graph returns the relevant subgraph in ~700 tokens. Same answer, 1% of the token budget.

Why Knowledge Graphs Beat Naive RAG on Large Codebases

Naive vector RAG has a structural problem: it retrieves chunks based on semantic similarity to the query, but codebases aren't structured by semantic similarity — they're structured by call graphs, dependency trees, and architectural layers. A question about how Component A affects Component B requires traversing a relationship graph, not finding similar text.

| Dimension | Naive RAG | Knowledge Graph |
|---|---|---|
| Retrieval unit | Text chunk (arbitrary boundary) | Entity + relationships (semantic boundary) |
| Cross-file questions | Poor: chunks don't know about each other | Strong: edges encode dependencies explicitly |
| Token cost | High: must retrieve many chunks to cover a topic | Low: BFS traversal returns a minimal relevant subgraph |
| Community awareness | None | Built-in: Louvain community detection groups related entities |
| Incremental updates | Re-embed all changed chunks | `--update` flag re-extracts only changed nodes |
| Explainability | Black-box similarity score | Explicit path: A → calls → B → imports → C |
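The difference is easiest to see as code. Below is a minimal sketch, with an invented toy call graph (not graphify's schema), of answering "does A affect B?" as an explicit path query; chunk-similarity retrieval has no equivalent operation:

```python
from collections import deque

# Toy call graph: edges mean "caller -> callee". A real graphify
# graph.json would encode these as typed edges between entities.
calls = {
    "api_handler": ["validate_tx", "build_response"],
    "validate_tx": ["verify_merkle_proof"],
    "verify_merkle_proof": ["hash_pair"],
    "build_response": [],
    "hash_pair": [],
}

def call_path(graph, src, dst):
    """BFS from src; return the call chain to dst, or None."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        for nxt in graph.get(path[-1], []):
            if nxt in seen:
                continue
            if nxt == dst:
                return path + [nxt]
            seen.add(nxt)
            queue.append(path + [nxt])
    return None

print(call_path(calls, "api_handler", "hash_pair"))
# ['api_handler', 'validate_tx', 'verify_merkle_proof', 'hash_pair']
```

The returned path is itself the explanation, which is exactly the "Explainability" row above.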

graphify Overview

graphify is a Python tool that takes a codebase (or any collection of text/code) and produces a queryable knowledge graph. It uses entity extraction, relationship mapping, and community detection to build a graph that AI agents can navigate via BFS traversal.

Installation

pip install graphifyy   # note: two y's

Two y's in the package name: the PyPI package is graphifyy (double-y) due to namespace availability. The CLI command after installation is graphify.

The four-stage pipeline

1. Entity Extraction

Files are parsed and named entities are extracted: functions, classes, modules, types, constants, API endpoints, CKB scripts, CLI commands — whatever is meaningful in your corpus. Language-aware parsers handle Rust, TypeScript, C, Python, Go, and plain markdown.
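As a rough illustration of stage 1 (assuming nothing about graphify's real parsers), here is a toy extractor that pulls class and function entities from Python source using the standard ast module:

```python
import ast

# Illustrative only: a toy Python-entity extractor, not graphify's
# actual parser (which is language-aware across Rust, TS, C, Go, ...).
source = """
class Wallet:
    def sign(self, tx):
        return tx

def verify(sig):
    return True
"""

def extract_entities(code):
    entities = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.ClassDef):
            entities.append(("class", node.name))
        elif isinstance(node, ast.FunctionDef):
            entities.append(("function", node.name))
    return sorted(entities)  # stable order for display

print(extract_entities(source))
# [('class', 'Wallet'), ('function', 'sign'), ('function', 'verify')]
```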

2. Relationship Mapping

Entities are connected: calls, imports, implements, extends, depends_on, defined_in, referenced_by. Cross-file and cross-repo relationships are resolved.
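Stage 2 can be sketched the same way. This toy version (again, not graphify's implementation) derives "calls" edges by walking each function body for call expressions:

```python
import ast

# Illustrative sketch: derive "calls" edges from a Python module.
# graphify's real mapper also resolves imports, types, and
# cross-file / cross-repo references.
source = """
def verify(proof):
    return hash_pair(proof)

def hash_pair(p):
    return p
"""

def call_edges(code):
    tree = ast.parse(code)
    edges = set()
    for fn in tree.body:
        if not isinstance(fn, ast.FunctionDef):
            continue
        for node in ast.walk(fn):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                edges.add((fn.name, "calls", node.func.id))
    return sorted(edges)

print(call_edges(source))
# [('verify', 'calls', 'hash_pair')]
```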

3. Community Detection

The Louvain algorithm runs on the graph to detect clusters of tightly related entities — these become the "communities" that appear in the report and power community-level queries.

4. Output Generation

Three outputs: interactive HTML viewer, GraphRAG-ready JSON, and GRAPH_REPORT.md summarizing communities and key entities. Optional Neo4j export for production graph databases.
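Stage 4's report can be imagined as a simple aggregation over the graph. The graph dict below uses an assumed minimal schema, purely for illustration, not graphify's exact format:

```python
# Sketch of a GRAPH_REPORT.md-style summary from a minimal graph dict.
graph = {
    "nodes": [
        {"id": "verify", "community": "verification"},
        {"id": "hash_pair", "community": "verification"},
        {"id": "cli_main", "community": "cli"},
    ],
    "edges": [("verify", "calls", "hash_pair"), ("cli_main", "calls", "verify")],
}

def report(g):
    # Count entities per community, then emit a markdown summary
    # ordered by community size (largest first).
    sizes = {}
    for n in g["nodes"]:
        sizes[n["community"]] = sizes.get(n["community"], 0) + 1
    lines = ["# Graph Report", "", "## Communities by size"]
    for name, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
        lines.append(f"- {name}: {size} entities")
    return "\n".join(lines)

print(report(graph))
```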


Basic Workflow

Build a graph from a directory

# Build graph from current directory
graphify .

# Build from a specific path
graphify ~/projects/ckb-light-client

# Deep mode — more thorough extraction, slower
graphify ~/projects/ckb-light-client --mode deep

# Multi-repo corpus
graphify ~/projects/ckb-light-client ~/projects/ckb-scripts ~/projects/ckb-node

Outputs

graph.json

GraphRAG-ready JSON: nodes, edges, communities, entity metadata. This is what AI agents query programmatically. Feed it to the MCP server or query it directly.
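Querying it directly might look like the sketch below. The node/edge field names here are an assumption for illustration; check your generated graph.json for the actual schema:

```python
import json

# Load a graph.json-style document and build an adjacency index
# for programmatic queries. Schema assumed, not documented.
raw = json.loads("""
{
  "nodes": [{"id": "sync"}, {"id": "verifier"}, {"id": "types"}],
  "edges": [
    {"source": "sync", "target": "verifier", "type": "calls"},
    {"source": "verifier", "target": "types", "type": "imports"}
  ]
}
""")

adjacency = {n["id"]: [] for n in raw["nodes"]}
for e in raw["edges"]:
    adjacency[e["source"]].append((e["type"], e["target"]))

print(adjacency["sync"])     # [('calls', 'verifier')]
print(adjacency["verifier"]) # [('imports', 'types')]
```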

graph.html

Interactive HTML viewer — force-directed graph, searchable, filterable by community or entity type. Serve it locally or open directly in a browser. No server required.

GRAPH_REPORT.md

Markdown summary: top communities by size, most-connected entities, key relationships. Useful as a CLAUDE.md appendix or for humans to orient quickly to an unfamiliar codebase.

Key Flags

| Flag | Purpose |
|---|---|
| `--mode deep` | More thorough extraction: follows imports, resolves type aliases, extracts doc comments. Slower but better graph quality. Use for the initial build. |
| `--update` | Incremental rebuild: only re-processes files changed since the last run. Use after code changes to keep the graph current without a full rebuild. |
| `--watch` | File-watcher mode: automatically runs `--update` whenever files change. Run in a terminal alongside your editor for a live graph. |
| `--mcp` | Start an MCP (Model Context Protocol) server that Claude Code can connect to directly. Claude queries the graph natively without any shell commands. |
| `--output <dir>` | Write outputs to a specific directory instead of the current dir. |
| `--exclude <patterns>` | Glob patterns to exclude, e.g. `--exclude "*.lock,node_modules/**,target/**"` |
| `--neo4j <uri>` | Export to a running Neo4j instance for production graph-database usage. |
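The core idea behind an incremental --update can be sketched as content hashing against a stored manifest. This is a conceptual sketch only; graphify's internals may differ:

```python
import hashlib

def file_hash(path):
    # Hash file contents so renames/touches without edits don't count.
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def changed_files(paths, manifest):
    """Return files whose content hash differs from the stored
    manifest, updating the manifest as we go."""
    changed = []
    for p in paths:
        h = file_hash(p)
        if manifest.get(p) != h:
            changed.append(p)
            manifest[p] = h
    return changed
```

Only the files returned by `changed_files` would need their entities and edges re-extracted; everything else in the graph stays untouched.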

Querying the Graph

From the Claude Code skill

The graphify skill handles querying via natural language through Claude Code:

# In Claude Code:
/graphify query "how does the sync protocol interact with the merkle proof verifier?"

# Build and immediately enter query mode:
/graphify ~/projects/ckb-light-client --mode deep

Direct CLI query

graphify query "how does X work" --graph ./graph.json

# Limit BFS depth (default 3)
graphify query "what calls the signing function" --depth 4

# Return raw JSON subgraph instead of formatted text
graphify query "transaction builder dependencies" --json

What the query returns

Queries run a BFS traversal from the entities that best match the query semantics and return the surrounding subgraph: the matched entities, the edges connecting them, and the community summaries they belong to.

BFS depth trade-off: depth 2 is fast and precise for focused questions. Depth 4+ can return thousands of nodes on highly connected graphs; use it for architecture overviews, not specific lookups. The default of 3 covers 95% of use cases well.
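The growth behind that trade-off is easy to demonstrate on a toy graph where every node links to three others:

```python
from collections import deque

def bfs_nodes(graph, start, max_depth):
    """Count nodes reachable within max_depth hops of start."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # don't expand past the depth limit
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return len(seen)

# Complete ternary tree, 4 levels deep: 1 + 3 + 9 + 27 + 81 nodes.
tree = {}
nodes = ["root"]
for _ in range(4):
    next_level = []
    for n in nodes:
        kids = [f"{n}.{i}" for i in range(3)]
        tree[n] = kids
        next_level += kids
    nodes = next_level

for d in (1, 2, 3, 4):
    print(d, bfs_nodes(tree, "root", d))
# 1 4
# 2 13
# 3 40
# 4 121
```

Each extra level of depth roughly triples the subgraph here; on a densely connected real codebase the blow-up can be much worse.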

The graph-routing Skill

graph-routing is a Claude Code skill that sits in front of every cross-repo or architecture question. When triggered, it:

  1. Detects the question shape (cross-repo lookup, dependency trace, architecture question, debug-with-symptoms)
  2. Routes to the appropriate graph in ~/.claude/graphs.json
  3. Runs a targeted BFS query against that graph
  4. Returns the relevant subgraph as context before gathering any other evidence
  5. Proceeds to answer using the graph result instead of grep/glob

This means Claude Code navigates a 60k-node multi-repo codebase with a single graph query instead of dozens of grep calls — dramatically faster and with better cross-file awareness.
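Routing step 2 can be sketched as trigger matching against the registry. The entries below are invented, and real graph-routing also classifies question shape rather than relying on naive substring matching:

```python
# Hedged sketch: pick a graph by counting trigger-keyword hits in
# the question. Registry entries are illustrative, not real config.
registry = [
    {"id": "my-platform", "triggers": ["platform", "api", "sdk", "light client"]},
    {"id": "my-rust-lib", "triggers": ["signing", "crypto", "verifier"]},
]

def route(question, graphs):
    q = question.lower()
    best, best_hits = None, 0
    for g in graphs:
        hits = sum(1 for t in g["triggers"] if t in q)
        if hits > best_hits:
            best, best_hits = g["id"], hits
    return best  # None means: no graph matched, fall back to normal tools

print(route("how does the light client verify headers?", registry))
# my-platform
```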

When graph-routing does NOT trigger: single-file edits, typo fixes, simple renames, and questions already answered in the current conversation do not trigger graph-routing. The skill is tuned to trigger on task shape (cross-repo, architecture, dependency-tracing), not on topic keywords.

The Graphs Registry

Multiple graphs for different domains are registered in ~/.claude/graphs.json:

{
  "graphs": [
    {
      "id": "my-platform",
      "description": "Core platform — 12 repos, API, SDK, node, CLI tooling",
      "path": "~/.claude/graphs/my-platform/graph.json",
      "viewer": "http://localhost:8765",
      "triggers": ["platform", "api", "sdk", "light client", "core"]
    },
    {
      "id": "my-rust-lib",
      "description": "Rust library — crypto primitives and deployment task graph",
      "path": "~/.claude/graphs/my-rust-lib/graph.json",
      "triggers": ["signing", "crypto", "verifier", "lock script"]
    },
    {
      "id": "my-webapp",
      "description": "Web application — frontend, backend, image gen, TTS, media",
      "path": "~/.claude/graphs/my-webapp/graph.json",
      "triggers": ["webapp", "frontend", "comfyui", "tts", "media"]
    }
  ]
}
Domain isolation: separate graphs for separate domains keep queries fast and prevent cross-contamination. A question about CKB scripts shouldn't pull in Wyltek Studio nodes. The registry lets the routing skill pick the right graph without you specifying it.

Real Example: Building a Graph from a Rust Blockchain Project

# Step 1: Install
pip install graphifyy

# Step 2: Initial deep build (takes 2-10 minutes for large repos)
cd ~/projects/ckb-light-client
graphify . --mode deep --output ~/.claude/graphs/ckb-light-client/

# Step 3: Verify outputs
ls ~/.claude/graphs/ckb-light-client/
# graph.json  graph.html  GRAPH_REPORT.md

# Step 4: Open the viewer
xdg-open ~/.claude/graphs/ckb-light-client/graph.html

# Step 5: Test a query
graphify query "how does the header sync protocol work" \
  --graph ~/.claude/graphs/ckb-light-client/graph.json

# Step 6: Register in graphs.json
# Add an entry to ~/.claude/graphs.json pointing to graph.json

# Step 7: Start the watcher (background process, keeps graph current)
graphify ~/projects/ckb-light-client --watch \
  --output ~/.claude/graphs/ckb-light-client/ &

What a graph query looks like for a Rust project

Query: "what calls the verify_merkle_proof function?"

Result (condensed subgraph):
  verify_merkle_proof [src/verifier.rs:142] (community: merkle-verification)
    ← calls: verify_transaction [src/tx_verifier.rs:89]
    ← calls: verify_block_header [src/header_verifier.rs:234]
    → imports: MerkleProof [ckb-types::packed]
    → imports: H256 [ckb-types::H256]
    → uses: ProofNode [src/verifier.rs:28]

Community summary (merkle-verification):
  23 entities, core of block validation pipeline
  Key boundary: interfaces with ckb-types for external type compatibility

Building a Corpus: What Goes In, What to Exclude

| Include | Exclude |
|---|---|
| Source files (.rs, .ts, .js, .py, .go, .c, .cpp, .h) | Binary files (.so, .a, compiled .wasm output) |
| Configuration files that define architecture (.toml, .yaml) | Lock files (Cargo.lock, package-lock.json, yarn.lock) |
| Markdown docs that describe protocols and APIs | Generated code (proto-generated, build artifacts) |
| Test files: they reveal usage patterns and edge cases | node_modules/, target/, dist/, .git/ |
| Script files that encode workflows | Large binary assets (images, models) |
# Recommended exclude pattern for Rust projects:
graphify . --mode deep \
  --exclude "target/**,*.lock,**/*.pb.rs,**/generated/**"

# For TypeScript/Node projects:
graphify . --mode deep \
  --exclude "node_modules/**,dist/**,*.lock,**/*.d.ts"

The Interactive Viewer

The graph.html output is a fully self-contained, zero-dependency HTML file that opens in any browser.

The viewer is a great onboarding tool: drop a new developer into the graph.html viewer for a codebase and they can understand the architecture in 10 minutes. Community clusters immediately reveal the major subsystems. This works for humans too, not just AI agents.

When NOT to Use Knowledge Graphs

Don't reach for the graph for every task: single-file edits, typo fixes, and simple renames are faster with ordinary tools. The graph shines on cross-file, cross-repo, architecture, and "how does X relate to Y" questions; use it for those.

MCP Server Mode

For the tightest integration with Claude Code, run graphify as an MCP server:

# Start the MCP server (exposes the graph as Claude Code tools)
graphify --mcp --graph ~/.claude/graphs/ckb-ecosystem/graph.json --port 3100

# Add to your MCP config:
# ~/.claude/mcp_servers.json
{
  "graphify": {
    "url": "http://localhost:3100",
    "description": "CKB ecosystem knowledge graph"
  }
}

In MCP mode, Claude Code gets native tools: graph_query, graph_neighbors, graph_community, graph_path. No shell commands required — Claude navigates the graph as naturally as reading a file.
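Under the hood, MCP tool calls are JSON-RPC 2.0 requests using the tools/call method. The sketch below builds the payload a client would send for graph_query; the argument names (query, depth) are assumptions, not a documented schema:

```python
import json

# Shape of an MCP tools/call request (JSON-RPC 2.0). The tool name
# graph_query comes from the text above; the arguments are assumed.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "graph_query",
        "arguments": {"query": "what calls verify_merkle_proof?", "depth": 3},
    },
}

print(json.dumps(request, indent=2))
```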

See also: CLAUDE.md Guide · Writing Skills · Hooks & Safety
