Naive RAG throws chunks of text at a language model and hopes context sticks. Knowledge graphs do the opposite: they model relationships between entities, detect communities, and return only the subgraph that's relevant to your question — typically 100× fewer tokens with better accuracy on cross-layer questions.
These figures come from the CKB ecosystem knowledge graph built from 36 repos using graphify.
Naive vector RAG has a structural problem: it retrieves chunks based on semantic similarity to the query, but codebases aren't structured by semantic similarity — they're structured by call graphs, dependency trees, and architectural layers. A question about how Component A affects Component B requires traversing a relationship graph, not finding similar text.
| Dimension | Naive RAG | Knowledge Graph |
|---|---|---|
| Retrieval unit | Text chunk (arbitrary boundary) | Entity + relationships (semantic boundary) |
| Cross-file questions | Poor — chunks don't know about each other | Strong — edges encode dependencies explicitly |
| Token cost | High — must retrieve many chunks to cover a topic | Low — BFS traversal returns minimal relevant subgraph |
| Community awareness | None | Built-in — Louvain community detection groups related entities |
| Incremental updates | Re-embed all changed chunks | Cheap — --update re-extracts only changed nodes |
| Explainability | Black box similarity score | Explicit path: A → calls → B → imports → C |
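The explainability row deserves emphasis: because every edge is typed, a retrieval result can be justified as an explicit path rather than a similarity score. A minimal sketch of path explanation over typed edges (toy data, not graphify's internals):

```python
from collections import deque

# Toy typed-edge graph: (source, relation, target) triples.
edges = [
    ("A", "calls", "B"),
    ("B", "imports", "C"),
    ("A", "defined_in", "module_a"),
]

def explain_path(edges, start, goal):
    """BFS over typed edges; returns a readable path like 'A → calls → B'."""
    adj = {}
    for src, rel, dst in edges:
        adj.setdefault(src, []).append((rel, dst))
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return " → ".join(path)
        for rel, nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [rel, nxt]))
    return None  # no path exists

print(explain_path(edges, "A", "C"))  # A → calls → B → imports → C
```

The answer is auditable: every hop in the returned path corresponds to a real edge in the graph.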
graphify is a Python tool that takes a codebase (or any collection of text/code) and produces a queryable knowledge graph. It uses entity extraction, relationship mapping, and community detection to build a graph that AI agents can navigate via BFS traversal.
pip install graphifyy # note: two y's
The package name is graphifyy (double-y) due to PyPI namespace availability; the installed CLI command is graphify.
Files are parsed and named entities are extracted: functions, classes, modules, types, constants, API endpoints, CKB scripts, CLI commands — whatever is meaningful in your corpus. Language-aware parsers handle Rust, TypeScript, C, Python, Go, and plain markdown.
Entities are connected: calls, imports, implements, extends, depends_on, defined_in, referenced_by. Cross-file and cross-repo relationships are resolved.
The Louvain algorithm runs on the graph to detect clusters of tightly related entities — these become the "communities" that appear in the report and power community-level queries.
Three outputs: interactive HTML viewer, GraphRAG-ready JSON, and GRAPH_REPORT.md summarizing communities and key entities. Optional Neo4j export for production graph databases.
# Build graph from current directory
graphify .
# Build from a specific path
graphify ~/projects/ckb-light-client
# Deep mode — more thorough extraction, slower
graphify ~/projects/ckb-light-client --mode deep
# Multi-repo corpus
graphify ~/projects/ckb-light-client ~/projects/ckb-scripts ~/projects/ckb-node
GraphRAG-ready JSON: nodes, edges, communities, entity metadata. This is what AI agents query programmatically. Feed it to the MCP server or query it directly.
Interactive HTML viewer — force-directed graph, searchable, filterable by community or entity type. Serve it locally or open directly in a browser. No server required.
Markdown summary: top communities by size, most-connected entities, key relationships. Useful as a CLAUDE.md appendix or for humans to orient quickly to an unfamiliar codebase.
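The exact schema of graph.json isn't documented above, so the field names below are assumptions; the overall shape (nodes, edges, communities, entity metadata) follows the description of the GraphRAG-ready output:

```python
import json

# Hypothetical graph.json payload. Field names are assumptions based on the
# outputs described above: nodes, edges, communities, entity metadata.
graph = {
    "nodes": [
        {"id": "verify_merkle_proof", "type": "function",
         "community": "merkle-verification"},
        {"id": "MerkleProof", "type": "type",
         "community": "merkle-verification"},
    ],
    "edges": [
        {"source": "verify_merkle_proof", "relation": "imports",
         "target": "MerkleProof"},
    ],
    "communities": [
        {"id": "merkle-verification", "size": 23},
    ],
}

# Round-trip through JSON the way an agent consuming graph.json would.
payload = json.loads(json.dumps(graph))
print(len(payload["nodes"]), len(payload["edges"]), len(payload["communities"]))
```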
| Flag | Purpose |
|---|---|
| --mode deep | More thorough extraction — follows imports, resolves type aliases, extracts doc comments. Slower but better graph quality. Use for the initial build. |
| --update | Incremental rebuild — only re-processes files changed since last run. Use after code changes to keep the graph current without full rebuild. |
| --watch | File watcher mode — automatically runs --update whenever files change. Run in a terminal alongside your editor for a live graph. |
| --mcp | Start an MCP (Model Context Protocol) server that Claude Code can connect to directly. Claude queries the graph natively without any shell commands. |
| --output <dir> | Write outputs to a specific directory instead of the current dir. |
| --exclude <patterns> | Glob patterns to exclude: --exclude "*.lock,node_modules/**,target/**" |
| --neo4j <uri> | Export to a running Neo4j instance for production graph database usage. |
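To see what --update has to do, here is a sketch of detecting files changed since the last build by modification time. This is illustrative only; graphify's actual change detection may work differently:

```python
import os
import time
import tempfile

def changed_since(root, last_run_ts):
    """Return files modified after the previous build: the set an
    incremental update pass would need to re-extract (sketch only)."""
    changed = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_run_ts:
                changed.append(path)
    return changed

# Demo in a temp directory: one stale file, one fresh file.
with tempfile.TemporaryDirectory() as root:
    old = os.path.join(root, "old.rs")
    new = os.path.join(root, "new.rs")
    open(old, "w").close()
    os.utime(old, (0, 0))        # pretend old.rs was built long ago
    last_run = time.time() - 1
    open(new, "w").close()       # modified after last_run
    print([os.path.basename(p) for p in changed_since(root, last_run)])
    # → ['new.rs']
```

A --watch mode is the same idea in a loop, triggered by filesystem events instead of polling.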
The graphify skill handles querying via natural language through Claude Code:
# In Claude Code:
/graphify query "how does the sync protocol interact with the merkle proof verifier?"
# Build and immediately enter query mode:
/graphify ~/projects/ckb-light-client --mode deep
# Query directly from the CLI against a built graph:
graphify query "how does X work" --graph ./graph.json
# Limit BFS depth (default 3)
graphify query "what calls the signing function" --depth 4
# Return raw JSON subgraph instead of formatted text
graphify query "transaction builder dependencies" --json
Queries run BFS traversal from entities that match the query semantics; the returned subgraph contains the matched entities plus everything up to --depth hops away.

graph-routing is a Claude Code skill that sits in front of every cross-repo or architecture question. When triggered, it consults the graph registry at ~/.claude/graphs.json, picks the graph whose triggers match the question, and runs the query against it. This means Claude Code navigates a 60k-node multi-repo codebase with a single graph query instead of dozens of grep calls — dramatically faster and with better cross-file awareness.
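The --depth flag bounds how far that traversal expands. The core idea is a depth-limited BFS over an adjacency list (toy data, not graphify's implementation):

```python
from collections import deque

# Toy call/dependency adjacency list.
adj = {
    "sign": ["verify", "hash"],
    "verify": ["merkle"],
    "merkle": ["types"],
}

def subgraph(adj, start, depth):
    """All nodes reachable within `depth` hops of `start`: the shape of a
    depth-limited graph query (sketch only)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        if d == depth:
            continue  # don't expand past the depth budget
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return sorted(seen)

print(subgraph(adj, "sign", 2))  # ['hash', 'merkle', 'sign', 'verify']
```

Note that "types" is three hops out and stays excluded, which is exactly how a larger depth trades recall for token cost.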
Multiple graphs for different domains are registered in ~/.claude/graphs.json:
{
"graphs": [
{
"id": "my-platform",
"description": "Core platform — 12 repos, API, SDK, node, CLI tooling",
"path": "~/.claude/graphs/my-platform/graph.json",
"viewer": "http://localhost:8765",
"triggers": ["platform", "api", "sdk", "light client", "core"]
},
{
"id": "my-rust-lib",
"description": "Rust library — crypto primitives and deployment task graph",
"path": "~/.claude/graphs/my-rust-lib/graph.json",
"triggers": ["signing", "crypto", "verifier", "lock script"]
},
{
"id": "my-webapp",
"description": "Web application — frontend, backend, image gen, TTS, media",
"path": "~/.claude/graphs/my-webapp/graph.json",
"triggers": ["webapp", "frontend", "comfyui", "tts", "media"]
}
]
}
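Routing over this registry reduces to keyword matching against the triggers arrays. A sketch of what the graph-routing skill might do (illustrative; the real skill's logic may differ):

```python
# Trimmed-down registry in the same shape as ~/.claude/graphs.json above.
registry = {
    "graphs": [
        {"id": "my-platform", "triggers": ["platform", "api", "sdk"]},
        {"id": "my-rust-lib", "triggers": ["signing", "crypto", "verifier"]},
    ]
}

def route(question, registry):
    """Pick the graph whose trigger keywords appear in the question;
    fall back to the first registered graph."""
    q = question.lower()
    for graph in registry["graphs"]:
        if any(trigger in q for trigger in graph["triggers"]):
            return graph["id"]
    return registry["graphs"][0]["id"]

print(route("what calls the signing function?", registry))  # my-rust-lib
```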
# Step 1: Install
pip install graphifyy
# Step 2: Initial deep build (takes 2-10 minutes for large repos)
cd ~/projects/ckb-light-client
graphify . --mode deep --output ~/.claude/graphs/ckb-light-client/
# Step 3: Verify outputs
ls ~/.claude/graphs/ckb-light-client/
# graph.json graph.html GRAPH_REPORT.md
# Step 4: Open the viewer
xdg-open ~/.claude/graphs/ckb-light-client/graph.html
# Step 5: Test a query
graphify query "how does the header sync protocol work" \
--graph ~/.claude/graphs/ckb-light-client/graph.json
# Step 6: Register in graphs.json
# Add an entry to ~/.claude/graphs.json pointing to graph.json
# Step 7: Start the watcher (background process, keeps graph current)
graphify ~/projects/ckb-light-client --watch --output ~/.claude/graphs/ckb-light-client/ &
Query: "what calls the verify_merkle_proof function?"
Result (condensed subgraph):
verify_merkle_proof [src/verifier.rs:142] (community: merkle-verification)
← calls: verify_transaction [src/tx_verifier.rs:89]
← calls: verify_block_header [src/header_verifier.rs:234]
→ imports: MerkleProof [ckb-types::packed]
→ imports: H256 [ckb-types::H256]
→ uses: ProofNode [src/verifier.rs:28]
Community summary (merkle-verification):
23 entities, core of block validation pipeline
Key boundary: interfaces with ckb-types for external type compatibility
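A condensed view like the one above is straightforward to derive from a raw JSON subgraph: group edges by direction relative to the focus entity. The edge field names here are assumptions:

```python
# Edges from a hypothetical JSON subgraph around verify_merkle_proof.
edges = [
    {"source": "verify_transaction", "relation": "calls",
     "target": "verify_merkle_proof"},
    {"source": "verify_block_header", "relation": "calls",
     "target": "verify_merkle_proof"},
    {"source": "verify_merkle_proof", "relation": "imports",
     "target": "MerkleProof"},
]

def render(focus, edges):
    """Print incoming edges as '←' and outgoing edges as '→',
    mirroring the condensed view above."""
    lines = [focus]
    for e in edges:
        if e["target"] == focus:
            lines.append(f"  ← {e['relation']}: {e['source']}")
        elif e["source"] == focus:
            lines.append(f"  → {e['relation']}: {e['target']}")
    return "\n".join(lines)

print(render("verify_merkle_proof", edges))
```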
| Include | Exclude |
|---|---|
| Source files (.rs, .ts, .js, .py, .go, .c, .cpp, .h) | Binary files (.so, .a, .wasm compiled output) |
| Configuration files that define architecture (.toml, .yaml) | Lock files (Cargo.lock, package-lock.json, yarn.lock) |
| Markdown docs that describe protocols and APIs | Generated code (proto-generated, build artifacts) |
| Test files — they reveal usage patterns and edge cases | node_modules, target/, dist/, .git/ |
| Script files that encode workflows | Large binary assets (images, models) |
# Recommended exclude pattern for Rust projects:
graphify . --mode deep \
--exclude "target/**,*.lock,**/*.pb.rs,**/generated/**"
# For TypeScript/Node projects:
graphify . --mode deep \
--exclude "node_modules/**,dist/**,*.lock,**/*.d.ts"
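Exclude globs can be sanity-checked before committing to a long deep build. Python's fnmatch approximates the matching (graphify's own matcher may treat ** differently, so treat this as a rough check):

```python
from fnmatch import fnmatch

# The Rust exclude set from the example above.
patterns = ["target/**", "*.lock", "**/*.pb.rs", "**/generated/**"]

def excluded(path, patterns):
    """True if any glob matches. Note fnmatch's '*' also crosses '/',
    unlike some path-aware glob implementations."""
    return any(fnmatch(path, p) for p in patterns)

print(excluded("src/main.rs", patterns))          # False
print(excluded("Cargo.lock", patterns))           # True
print(excluded("target/debug/build.rs", patterns))  # True
```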
The graph.html output is a fully self-contained, zero-dependency HTML file. Open it in any browser:
python3 -m http.server 8765 --directory ~/.claude/graphs/

For the tightest integration with Claude Code, run graphify as an MCP server:
# Start the MCP server (exposes the graph as Claude Code tools)
graphify --mcp --graph ~/.claude/graphs/ckb-ecosystem/graph.json --port 3100
# Add to your MCP config:
# ~/.claude/mcp_servers.json
{
"graphify": {
"url": "http://localhost:3100",
"description": "CKB ecosystem knowledge graph"
}
}
In MCP mode, Claude Code gets native tools: graph_query, graph_neighbors, graph_community, graph_path. No shell commands required — Claude navigates the graph as naturally as reading a file.
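An MCP client issues JSON-RPC 2.0 tools/call requests; a request to the graph_query tool might look like the following. The argument names are assumptions, since the tool schemas aren't documented above:

```python
import json

# Hypothetical MCP tools/call payload for the graph_query tool.
# MCP speaks JSON-RPC 2.0; the "arguments" keys are assumptions.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "graph_query",
        "arguments": {
            "query": "what calls verify_merkle_proof?",
            "depth": 3,
        },
    },
}
print(json.dumps(request, indent=2))
```

In practice Claude Code constructs these requests itself; the payload is shown only to make the tool boundary concrete.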
See also: CLAUDE.md Guide · Writing Skills · Hooks & Safety