
Real-World Numbers

These figures come from the CKB ecosystem knowledge graph built from 36 repos using graphify:

- ~60k nodes in the CKB graph
- ~120k edges (relationships)
- ~1,000 communities detected
- 115× average token reduction vs naive RAG
- 5,375× best case (cross-layer questions)
- 38× floor (simple lookups)

What does 115× mean in practice? A naive RAG answer to "how does the light client sync protocol interact with Nervos' CKB-VM execution model?" might require 80,000 tokens of context across 36 repos. The graph returns the relevant subgraph in ~700 tokens. Same answer, 1% of the token budget.

Why Knowledge Graphs Beat Naive RAG on Large Codebases

Naive vector RAG has a structural problem: it retrieves chunks based on semantic similarity to the query, but codebases aren't structured by semantic similarity — they're structured by call graphs, dependency trees, and architectural layers. A question about how Component A affects Component B requires traversing a relationship graph, not finding similar text.

| Dimension | Naive RAG | Knowledge Graph |
|---|---|---|
| Retrieval unit | Text chunk (arbitrary boundary) | Entity + relationships (semantic boundary) |
| Cross-file questions | Poor: chunks don't know about each other | Strong: edges encode dependencies explicitly |
| Token cost | High: must retrieve many chunks to cover a topic | Low: BFS traversal returns a minimal relevant subgraph |
| Community awareness | None | Built-in: Louvain community detection groups related entities |
| Incremental updates | Re-embed all changed chunks | `--update` flag re-extracts only changed nodes |
| Explainability | Black-box similarity score | Explicit path: A → calls → B → imports → C |
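The difference is easiest to see as code. Below is a minimal sketch, with an invented toy call graph (not graphify's schema), of answering "does A affect B?" as an explicit path query; chunk-similarity retrieval has no equivalent operation:

```python
from collections import deque

# Toy call graph: edges mean "caller -> callee". A real graphify
# graph.json would encode these as typed edges between entities.
calls = {
    "api_handler": ["validate_tx", "build_response"],
    "validate_tx": ["verify_merkle_proof"],
    "verify_merkle_proof": ["hash_pair"],
    "build_response": [],
    "hash_pair": [],
}

def call_path(graph, src, dst):
    """BFS from src; return the call chain to dst, or None."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        for nxt in graph.get(path[-1], []):
            if nxt in seen:
                continue
            if nxt == dst:
                return path + [nxt]
            seen.add(nxt)
            queue.append(path + [nxt])
    return None

print(call_path(calls, "api_handler", "hash_pair"))
# ['api_handler', 'validate_tx', 'verify_merkle_proof', 'hash_pair']
```

The returned path is itself the explanation, which is exactly the "Explainability" row above.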

graphify Overview

graphify is a Python tool that takes a codebase (or any collection of text/code) and produces a queryable knowledge graph. It uses entity extraction, relationship mapping, and community detection to build a graph that AI agents can navigate via BFS traversal.

Installation

pip install graphifyy   # note: two y's

Two y's in the package name: the PyPI package is graphifyy (double-y) due to namespace availability. The CLI command after installation is graphify.

The four-stage pipeline

1. Entity Extraction

Files are parsed and named entities are extracted: functions, classes, modules, types, constants, API endpoints, CKB scripts, CLI commands — whatever is meaningful in your corpus. Language-aware parsers handle Rust, TypeScript, C, Python, Go, and plain markdown.
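As a rough illustration of stage 1 (assuming nothing about graphify's real parsers), here is a toy extractor that pulls class and function entities from Python source using the standard ast module:

```python
import ast

# Illustrative only: a toy Python-entity extractor, not graphify's
# actual parser (which is language-aware across Rust, TS, C, Go, ...).
source = """
class Wallet:
    def sign(self, tx):
        return tx

def verify(sig):
    return True
"""

def extract_entities(code):
    entities = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.ClassDef):
            entities.append(("class", node.name))
        elif isinstance(node, ast.FunctionDef):
            entities.append(("function", node.name))
    return sorted(entities)  # stable order for display

print(extract_entities(source))
# [('class', 'Wallet'), ('function', 'sign'), ('function', 'verify')]
```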

2. Relationship Mapping

Entities are connected: calls, imports, implements, extends, depends_on, defined_in, referenced_by. Cross-file and cross-repo relationships are resolved.
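Stage 2 can be sketched the same way. This toy version (again, not graphify's implementation) derives "calls" edges by walking each function body for call expressions:

```python
import ast

# Illustrative sketch: derive "calls" edges from a Python module.
# graphify's real mapper also resolves imports, types, and
# cross-file / cross-repo references.
source = """
def verify(proof):
    return hash_pair(proof)

def hash_pair(p):
    return p
"""

def call_edges(code):
    tree = ast.parse(code)
    edges = set()
    for fn in tree.body:
        if not isinstance(fn, ast.FunctionDef):
            continue
        for node in ast.walk(fn):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                edges.add((fn.name, "calls", node.func.id))
    return sorted(edges)

print(call_edges(source))
# [('verify', 'calls', 'hash_pair')]
```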

3. Community Detection

The Louvain algorithm runs on the graph to detect clusters of tightly related entities — these become the "communities" that appear in the report and power community-level queries.

4. Output Generation

Three outputs: interactive HTML viewer, GraphRAG-ready JSON, and GRAPH_REPORT.md summarizing communities and key entities. Optional Neo4j export for production graph databases.
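Stage 4's report can be imagined as a simple aggregation over the graph. The graph dict below uses an assumed minimal schema, purely for illustration, not graphify's exact format:

```python
# Sketch of a GRAPH_REPORT.md-style summary from a minimal graph dict.
graph = {
    "nodes": [
        {"id": "verify", "community": "verification"},
        {"id": "hash_pair", "community": "verification"},
        {"id": "cli_main", "community": "cli"},
    ],
    "edges": [("verify", "calls", "hash_pair"), ("cli_main", "calls", "verify")],
}

def report(g):
    # Count entities per community, then emit a markdown summary
    # ordered by community size (largest first).
    sizes = {}
    for n in g["nodes"]:
        sizes[n["community"]] = sizes.get(n["community"], 0) + 1
    lines = ["# Graph Report", "", "## Communities by size"]
    for name, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
        lines.append(f"- {name}: {size} entities")
    return "\n".join(lines)

print(report(graph))
```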


Basic Workflow

Build a graph from a directory

# Build graph from current directory
graphify .

# Build from a specific path
graphify ~/projects/ckb-light-client

# Deep mode — more thorough extraction, slower
graphify ~/projects/ckb-light-client --mode deep

# Multi-repo corpus
graphify ~/projects/ckb-light-client ~/projects/ckb-scripts ~/projects/ckb-node

Outputs

graph.json

GraphRAG-ready JSON: nodes, edges, communities, entity metadata. This is what AI agents query programmatically. Feed it to the MCP server or query it directly.
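Querying it directly might look like the sketch below. The node/edge field names here are an assumption for illustration; check your generated graph.json for the actual schema:

```python
import json

# Load a graph.json-style document and build an adjacency index
# for programmatic queries. Schema assumed, not documented.
raw = json.loads("""
{
  "nodes": [{"id": "sync"}, {"id": "verifier"}, {"id": "types"}],
  "edges": [
    {"source": "sync", "target": "verifier", "type": "calls"},
    {"source": "verifier", "target": "types", "type": "imports"}
  ]
}
""")

adjacency = {n["id"]: [] for n in raw["nodes"]}
for e in raw["edges"]:
    adjacency[e["source"]].append((e["type"], e["target"]))

print(adjacency["sync"])     # [('calls', 'verifier')]
print(adjacency["verifier"]) # [('imports', 'types')]
```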

graph.html

Interactive HTML viewer — force-directed graph, searchable, filterable by community or entity type. Serve it locally or open directly in a browser. No server required.

GRAPH_REPORT.md

Markdown summary: top communities by size, most-connected entities, key relationships. Useful as a CLAUDE.md appendix or for humans to orient quickly to an unfamiliar codebase.

Key Flags

| Flag | Purpose |
|---|---|
| `--mode deep` | More thorough extraction: follows imports, resolves type aliases, extracts doc comments. Slower but better graph quality. Use for the initial build. |
| `--update` | Incremental rebuild: only re-processes files changed since the last run. Use after code changes to keep the graph current without a full rebuild. |
| `--watch` | File-watcher mode: automatically runs `--update` whenever files change. Run in a terminal alongside your editor for a live graph. |
| `--mcp` | Start an MCP (Model Context Protocol) server that Claude Code can connect to directly. Claude queries the graph natively without any shell commands. |
| `--output <dir>` | Write outputs to a specific directory instead of the current dir. |
| `--exclude <patterns>` | Glob patterns to exclude, e.g. `--exclude "*.lock,node_modules/**,target/**"` |
| `--neo4j <uri>` | Export to a running Neo4j instance for production graph-database usage. |
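The core idea behind an incremental --update can be sketched as content hashing against a stored manifest. This is a conceptual sketch only; graphify's internals may differ:

```python
import hashlib

def file_hash(path):
    # Hash file contents so renames/touches without edits don't count.
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def changed_files(paths, manifest):
    """Return files whose content hash differs from the stored
    manifest, updating the manifest as we go."""
    changed = []
    for p in paths:
        h = file_hash(p)
        if manifest.get(p) != h:
            changed.append(p)
            manifest[p] = h
    return changed
```

Only the files returned by `changed_files` would need their entities and edges re-extracted; everything else in the graph stays untouched.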

Querying the Graph

From the Claude Code skill

The graphify skill handles querying via natural language through Claude Code:

# In Claude Code:
/graphify query "how does the sync protocol interact with the merkle proof verifier?"

# Build and immediately enter query mode:
/graphify ~/projects/ckb-light-client --mode deep

Direct CLI query

graphify query "how does X work" --graph ./graph.json

# Limit BFS depth (default 3)
graphify query "what calls the signing function" --depth 4

# Return raw JSON subgraph instead of formatted text
graphify query "transaction builder dependencies" --json

What the query returns

Queries run a BFS traversal from the entities that best match the query semantics and return the surrounding subgraph: the matched entities, the edges connecting them, and the community summaries they belong to.

BFS depth trade-off: depth 2 is fast and precise for focused questions. Depth 4+ can return thousands of nodes on highly connected graphs; use it for architecture overviews, not specific lookups. The default of 3 covers 95% of use cases well.
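The growth behind that trade-off is easy to demonstrate on a toy graph where every node links to three others:

```python
from collections import deque

def bfs_nodes(graph, start, max_depth):
    """Count nodes reachable within max_depth hops of start."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # don't expand past the depth limit
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return len(seen)

# Complete ternary tree, 4 levels deep: 1 + 3 + 9 + 27 + 81 nodes.
tree = {}
nodes = ["root"]
for _ in range(4):
    next_level = []
    for n in nodes:
        kids = [f"{n}.{i}" for i in range(3)]
        tree[n] = kids
        next_level += kids
    nodes = next_level

for d in (1, 2, 3, 4):
    print(d, bfs_nodes(tree, "root", d))
# 1 4
# 2 13
# 3 40
# 4 121
```

Each extra level of depth roughly triples the subgraph here; on a densely connected real codebase the blow-up can be much worse.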

The graph-routing Skill

graph-routing is a Claude Code skill that sits in front of every cross-repo or architecture question. When triggered, it:

  1. Detects the question shape (cross-repo lookup, dependency trace, architecture question, debug-with-symptoms)
  2. Routes to the appropriate graph in ~/.claude/graphs.json
  3. Runs a targeted BFS query against that graph
  4. Returns the relevant subgraph as context before gathering any other evidence
  5. Proceeds to answer using the graph result instead of grep/glob

This means Claude Code navigates a 60k-node multi-repo codebase with a single graph query instead of dozens of grep calls — dramatically faster and with better cross-file awareness.
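Routing step 2 can be sketched as trigger matching against the registry. The entries below are invented, and real graph-routing also classifies question shape rather than relying on naive substring matching:

```python
# Hedged sketch: pick a graph by counting trigger-keyword hits in
# the question. Registry entries are illustrative, not real config.
registry = [
    {"id": "my-platform", "triggers": ["platform", "api", "sdk", "light client"]},
    {"id": "my-rust-lib", "triggers": ["signing", "crypto", "verifier"]},
]

def route(question, graphs):
    q = question.lower()
    best, best_hits = None, 0
    for g in graphs:
        hits = sum(1 for t in g["triggers"] if t in q)
        if hits > best_hits:
            best, best_hits = g["id"], hits
    return best  # None means: no graph matched, fall back to normal tools

print(route("how does the light client verify headers?", registry))
# my-platform
```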

When graph-routing does NOT trigger: single-file edits, typo fixes, simple renames, and questions already answered in the current conversation do not trigger graph-routing. The skill is tuned to trigger on task shape (cross-repo, architecture, dependency-tracing), not on topic keywords.

The Graphs Registry

Multiple graphs for different domains are registered in ~/.claude/graphs.json:

{
  "graphs": [
    {
      "id": "my-platform",
      "description": "Core platform — 12 repos, API, SDK, node, CLI tooling",
      "path": "~/.claude/graphs/my-platform/graph.json",
      "viewer": "http://localhost:8765",
      "triggers": ["platform", "api", "sdk", "light client", "core"]
    },
    {
      "id": "my-rust-lib",
      "description": "Rust library — crypto primitives and deployment task graph",
      "path": "~/.claude/graphs/my-rust-lib/graph.json",
      "triggers": ["signing", "crypto", "verifier", "lock script"]
    },
    {
      "id": "my-webapp",
      "description": "Web application — frontend, backend, image gen, TTS, media",
      "path": "~/.claude/graphs/my-webapp/graph.json",
      "triggers": ["webapp", "frontend", "comfyui", "tts", "media"]
    }
  ]
}
Domain isolation: separate graphs for separate domains keep queries fast and prevent cross-contamination. A question about CKB scripts shouldn't pull in Wyltek Studio nodes. The registry lets the routing skill pick the right graph without you specifying it.

Real Example: Building a Graph from a Rust Blockchain Project

# Step 1: Install
pip install graphifyy

# Step 2: Initial deep build (takes 2-10 minutes for large repos)
cd ~/projects/ckb-light-client
graphify . --mode deep --output ~/.claude/graphs/ckb-light-client/

# Step 3: Verify outputs
ls ~/.claude/graphs/ckb-light-client/
# graph.json  graph.html  GRAPH_REPORT.md

# Step 4: Open the viewer
xdg-open ~/.claude/graphs/ckb-light-client/graph.html

# Step 5: Test a query
graphify query "how does the header sync protocol work" \
  --graph ~/.claude/graphs/ckb-light-client/graph.json

# Step 6: Register in graphs.json
# Add an entry to ~/.claude/graphs.json pointing to graph.json

# Step 7: Start the watcher (background process, keeps graph current)
graphify ~/projects/ckb-light-client --watch \
  --output ~/.claude/graphs/ckb-light-client/ &

What a graph query looks like for a Rust project

Query: "what calls the verify_merkle_proof function?"

Result (condensed subgraph):
  verify_merkle_proof [src/verifier.rs:142] (community: merkle-verification)
    ← calls: verify_transaction [src/tx_verifier.rs:89]
    ← calls: verify_block_header [src/header_verifier.rs:234]
    → imports: MerkleProof [ckb-types::packed]
    → imports: H256 [ckb-types::H256]
    → uses: ProofNode [src/verifier.rs:28]

Community summary (merkle-verification):
  23 entities, core of block validation pipeline
  Key boundary: interfaces with ckb-types for external type compatibility

Building a Corpus: What Goes In, What to Exclude

| Include | Exclude |
|---|---|
| Source files (.rs, .ts, .js, .py, .go, .c, .cpp, .h) | Binary files (.so, .a, compiled .wasm output) |
| Configuration files that define architecture (.toml, .yaml) | Lock files (Cargo.lock, package-lock.json, yarn.lock) |
| Markdown docs that describe protocols and APIs | Generated code (proto-generated, build artifacts) |
| Test files: they reveal usage patterns and edge cases | node_modules/, target/, dist/, .git/ |
| Script files that encode workflows | Large binary assets (images, models) |
# Recommended exclude pattern for Rust projects:
graphify . --mode deep \
  --exclude "target/**,*.lock,**/*.pb.rs,**/generated/**"

# For TypeScript/Node projects:
graphify . --mode deep \
  --exclude "node_modules/**,dist/**,*.lock,**/*.d.ts"

The Interactive Viewer

The graph.html output is a fully self-contained, zero-dependency HTML file that opens in any browser.

The viewer is a great onboarding tool: drop a new developer into the graph.html viewer for a codebase and they can understand the architecture in 10 minutes. Community clusters immediately reveal the major subsystems. This works for humans too, not just AI agents.

When NOT to Use Knowledge Graphs

Don't reach for the graph for every task: single-file edits, typo fixes, and simple renames are faster with ordinary tools. The graph shines on cross-file, cross-repo, architecture, and "how does X relate to Y" questions; use it for those.

MCP Server Mode

For the tightest integration with Claude Code, run graphify as an MCP server:

# Start the MCP server (exposes the graph as Claude Code tools)
graphify --mcp --graph ~/.claude/graphs/ckb-ecosystem/graph.json --port 3100

# Add to your MCP config:
# ~/.claude/mcp_servers.json
{
  "graphify": {
    "url": "http://localhost:3100",
    "description": "CKB ecosystem knowledge graph"
  }
}

In MCP mode, Claude Code gets native tools: graph_query, graph_neighbors, graph_community, graph_path. No shell commands required — Claude navigates the graph as naturally as reading a file.
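Under the hood, MCP tool calls are JSON-RPC 2.0 requests using the tools/call method. The sketch below builds the payload a client would send for graph_query; the argument names (query, depth) are assumptions, not a documented schema:

```python
import json

# Shape of an MCP tools/call request (JSON-RPC 2.0). The tool name
# graph_query comes from the text above; the arguments are assumed.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "graph_query",
        "arguments": {"query": "what calls verify_merkle_proof?", "depth": 3},
    },
}

print(json.dumps(request, indent=2))
```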

See also: CLAUDE.md Guide · Writing Skills · Hooks & Safety
