
GraWiki


GraWiki is an early-stage, open-source Python project with two main concerns:

  1. GraphRAG-style knowledge extraction and retrieval.
  2. LLM Wiki-style memory for agents.

It uses an LLM to extract structured knowledge from documents, persists that knowledge in a graph, and uses the same graph as part of an LLM system's long-lived memory.
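To make the shared-graph idea concrete, here is a minimal in-memory sketch of a typed property graph in which extracted knowledge (documents, chunks, entities) and agent memories live side by side. The class names, node labels, and edge types (`HAS_CHUNK`, `MENTIONS`, `ABOUT`) are hypothetical illustrations, not GraWiki's schema, which lives in `grawiki.graph` and is persisted through `grawiki.db`.

```python
from dataclasses import dataclass, field
from typing import Literal

# Hypothetical node labels for illustration only.
NodeLabel = Literal["Document", "Chunk", "Entity", "Memory"]


@dataclass
class Node:
    id: str
    label: NodeLabel
    props: dict[str, object] = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    kind: str
    target: str
    props: dict[str, object] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node

    def add_edge(self, source: str, kind: str, target: str, **props: object) -> None:
        # Reject dangling edges so every edge joins two known nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError(f"unknown endpoint: {source!r} -> {target!r}")
        self.edges.append(Edge(source, kind, target, dict(props)))

    def neighbors(self, node_id: str) -> list[str]:
        """Node ids one hop away, ignoring edge direction."""
        outgoing = [e.target for e in self.edges if e.source == node_id]
        incoming = [e.source for e in self.edges if e.target == node_id]
        return outgoing + incoming


# Extracted knowledge and agent memory share one graph (edge types are hypothetical).
graph = PropertyGraph()
graph.add_node(Node("doc:1", "Document", {"title": "Sample article"}))
graph.add_node(Node("chunk:1", "Chunk", {"text": "FalkorDB stores property graphs."}))
graph.add_node(Node("ent:falkordb", "Entity", {"name": "FalkorDB"}))
graph.add_node(Node("mem:1", "Memory", {"text": "User runs FalkorDBLite locally."}))
graph.add_edge("doc:1", "HAS_CHUNK", "chunk:1", position=0)
graph.add_edge("chunk:1", "MENTIONS", "ent:falkordb")
graph.add_edge("mem:1", "ABOUT", "ent:falkordb")
```

Because the memory node links to the same entity as the chunk, a later lookup on that entity reaches both the source text and the agent's note about it.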

The project is still experimental. It is a working repository for graph-backed memory and retrieval workflows rather than a finished framework.

What the project covers

GraWiki focuses on a small end-to-end surface:

  • ingest source documents into a typed property graph,
  • retrieve graph-backed context from documents, chunks, memories, and entities,
  • persist agent memory as first-class graph nodes,
  • inspect and merge duplicate entities through similarity-based workflows.
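Similarity-based duplicate handling can be pictured as a two-step loop: score pairs of entity names, then fold accepted duplicates into one canonical master by redirecting their edges. The sketch below assumes a simple name normalization, the standard-library `difflib.SequenceMatcher` ratio as the similarity measure, and a "most-mentioned wins" rule for choosing the master. These choices and all function names are illustrative stand-ins, not the matchers in `grawiki.similarity`.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from itertools import combinations

Edge = tuple[str, str, str]  # (source id, relation, target id)


def normalize(name: str) -> str:
    """Casefold, replace punctuation with spaces, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", name).casefold()
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def duplicate_candidates(
    names: dict[str, str], threshold: float = 0.85
) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, score) for name pairs at or above `threshold`."""
    pairs = []
    for (id_a, name_a), (id_b, name_b) in combinations(sorted(names.items()), 2):
        score = SequenceMatcher(None, normalize(name_a), normalize(name_b)).ratio()
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


def merge_into_master(
    duplicates: list[str], edges: list[Edge], mention_counts: dict[str, int]
) -> tuple[str, list[Edge]]:
    """Pick the most-mentioned id as master and point every edge at it."""
    # Ties fall back to the smallest id so the choice is deterministic.
    master = min(duplicates, key=lambda eid: (-mention_counts.get(eid, 0), eid))
    alias = {eid: master for eid in duplicates}
    merged: list[Edge] = []
    for source, relation, target in edges:
        edge = (alias.get(source, source), relation, alias.get(target, target))
        if edge[0] == edge[2] and source != target:
            continue  # drop self-loops that exist only because of the merge
        if edge not in merged:
            merged.append(edge)  # skip edges that became exact duplicates
    return master, merged


names = {"e1": "FalkorDB", "e2": "Falkor DB", "e3": "Filip Wojcik"}
candidates = duplicate_candidates(names)
edges = [("c1", "MENTIONS", "e1"), ("c2", "MENTIONS", "e2"), ("c2", "MENTIONS", "e1")]
master, merged = merge_into_master(["e1", "e2"], edges, {"e1": 3, "e2": 1})
```

Keeping inspection (`duplicate_candidates`) separate from merging lets a human or an LLM review candidate pairs before anything in the graph is rewritten.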

Current capabilities

  • Read source documents and split them into chunks.
  • Use Markdown-aware chunking for Markdown files and in-memory Markdown text when configured.
  • Extract entities and relationships from chunk text.
  • Persist documents, chunks, entities, and edges in a graph database.
  • Retrieve graph-backed context with full-text and vector search.
  • Store and recall agent memories as dedicated graph nodes.
  • Inspect semantic-key collisions and broader duplicate candidates.
  • Merge duplicate entities into canonical masters.
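Markdown-aware chunking generally means respecting heading boundaries so that each chunk keeps its section context. A minimal sketch, assuming ATX headings (`#` to `######`) and a hypothetical character budget `max_chars`, splits on headings and then packs blank-line-separated paragraphs. It is not GraWiki's chunker in `grawiki.doc_processing`; it does not special-case fenced code blocks, and a single paragraph longer than `max_chars` is kept whole.

```python
import re
from dataclasses import dataclass

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Chunk:
    heading_path: tuple[str, ...]  # enclosing headings, outermost first
    text: str


def markdown_chunks(markdown: str, max_chars: int = 500) -> list[Chunk]:
    """Split on ATX headings, then pack paragraphs up to max_chars per chunk."""
    chunks: list[Chunk] = []
    path: list[str] = []
    buffer: list[str] = []

    def flush() -> None:
        body = "\n".join(buffer).strip()
        buffer.clear()
        if not body:
            return
        # Greedily pack paragraphs; start a new chunk when the budget would overflow.
        current = ""
        for para in re.split(r"\n\s*\n", body):
            candidate = f"{current}\n\n{para}" if current else para
            if current and len(candidate) > max_chars:
                chunks.append(Chunk(tuple(path), current))
                current = para
            else:
                current = candidate
        if current:
            chunks.append(Chunk(tuple(path), current))

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the section that ends at this heading
            level = len(match.group(1))
            del path[level - 1:]  # pop headings at this level or deeper
            path.append(match.group(2))
        else:
            buffer.append(line)
    flush()
    return chunks


doc = "# Guide\nIntro paragraph.\n\n## Install\nRun pip.\n\nThen import.\n\n## Use\nCall it.\n"
chunks = markdown_chunks(doc)
```

Carrying `heading_path` alongside the text gives retrieval a cheap way to show where a chunk came from, which plain fixed-size splitting loses.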

Why this project exists

Many RAG systems treat retrieved context as a temporary document slice. GraWiki instead stores document structure and agent memory in a persistent graph that can be searched later and expanded through connected context.

That design is closer to a lightweight graph-backed "wiki" for an LLM system than to a document-search pipeline alone [1][2].
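The search-then-expand pattern can be sketched in a few lines: find seed nodes that match the query, then walk outward through connected nodes to collect surrounding context. Here a toy keyword-overlap score stands in for the full-text and vector search GraWiki uses, and the node ids, texts, and adjacency map are hypothetical.

```python
from collections import deque


def keyword_seeds(texts: dict[str, str], query: str, top_k: int = 1) -> list[str]:
    """Toy stand-in for full-text search: rank nodes by shared query terms."""
    terms = set(query.lower().split())
    scored = []
    for node_id, text in texts.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            scored.append((-overlap, node_id))  # negate so sorting puts best first
    return [node_id for _, node_id in sorted(scored)[:top_k]]


def expand(adjacency: dict[str, list[str]], seeds: list[str], hops: int = 1) -> list[str]:
    """Breadth-first expansion: collect nodes up to `hops` edges from any seed."""
    seen = set(seeds)
    order = list(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not walk past the hop limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                frontier.append((neighbor, depth + 1))
    return order


texts = {
    "chunk:1": "FalkorDB stores the knowledge graph",
    "chunk:2": "Notebook three draws the graph",
    "mem:1": "user prefers FalkorDBLite",
}
adjacency = {  # undirected: each edge is listed from both ends
    "doc:1": ["chunk:1", "chunk:2"],
    "chunk:1": ["doc:1", "ent:falkordb"],
    "chunk:2": ["doc:1"],
    "ent:falkordb": ["chunk:1", "mem:1"],
    "mem:1": ["ent:falkordb"],
}
seeds = keyword_seeds(texts, "knowledge graph")
context = expand(adjacency, seeds, hops=2)
```

Although `mem:1` shares no words with the query, it enters the context through the entity it is linked to. That is the kind of connected-context expansion a flat document-search pipeline would miss.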

How to use the docs

  • Start with Flows for the main ingestion and retrieval paths.
  • Use How to for task-oriented guides derived from the maintained notebooks.
  • Use API Overview when you need the facade and lower-level references.

Repository structure

The repository is organized around a few major areas:

  • src/grawiki/ contains the reusable project code.
  • tests/ contains pytest coverage for the facade, retrieval, graph models, extraction, and the FalkorDB adapter.
  • docs/ contains the public MkDocs site, including generated API pages under docs/api/.
  • notebooks/ contains focused tutorial notebooks and sample text inputs.

At the package level:

  • grawiki.core holds shared source-data types and the embedding protocol.
  • grawiki.doc_processing handles document loading and chunking.
  • grawiki.graph defines the graph schema and extraction logic.
  • grawiki.db defines the database abstraction layer and FalkorDB backend.
  • grawiki.retrieval owns query-time retrieval strategies.
  • grawiki.similarity covers duplicate inspection, similarity matchers, and deduplication helpers.
  • grawiki.rag exposes the GraphRAG facade.

For a fuller map of the repository, see Project Structure.
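`grawiki.core` is described as holding the embedding protocol, but its exact interface is not documented on this page. The following is an assumption-labeled sketch of how such a protocol is commonly expressed with `typing.Protocol`: the `Embedder` name and its `embed` method are hypothetical, and `HashingEmbedder` is a toy stand-in for a real embedding model.

```python
import hashlib
import math
from collections.abc import Sequence
from typing import Protocol, runtime_checkable


@runtime_checkable
class Embedder(Protocol):
    """Hypothetical embedding protocol: a batch of texts in, one vector per text out."""

    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...


class HashingEmbedder:
    """Toy embedder (hashing trick). It does not subclass Embedder; it matches structurally."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self.dim
            for token in text.lower().split():
                # Stable hash, unlike built-in hash(), which is salted per process.
                digest = hashlib.sha256(token.encode("utf-8")).digest()
                vec[int.from_bytes(digest[:4], "little") % self.dim] += 1.0
            vectors.append(vec)
        return vectors


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def index_texts(embedder: Embedder, texts: list[str]) -> dict[str, list[float]]:
    """Code written against the protocol accepts any conforming embedder."""
    return dict(zip(texts, embedder.embed(texts)))


embedder = HashingEmbedder()
vectors = index_texts(embedder, ["graph memory", "memory graph", "unrelated words"])
```

Because `Protocol` checks structure rather than inheritance, any object with a matching `embed` method can be passed in. Note that `isinstance` against a `runtime_checkable` protocol only verifies that the method exists, not its signature.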

Maintained notebooks

The maintained notebook flow lives in three numbered notebooks under notebooks/:

  • 01_ingest_and_deduplicate.ipynb
  • 02_agent_memory_and_recall.ipynb
  • 03_visualize_graph.ipynb

Run notebook 1 first to build the local FalkorDB graph. Notebook 2 reuses that graph for agent memory examples, and notebook 3 visualizes the resulting graph.

To install the tutorial dependencies, choose one:

  • For file-based (FalkorDBLite):

    pip install 'grawiki[falkordblite,notebooks,viz]'
    

  • For Docker-based (full FalkorDB):

    pip install 'grawiki[falkordb,notebooks,viz]'
    

The sample texts used in the notebooks are Medium articles by Filip Wojcik. They are available from his public Medium profile and can be read without a subscription.


  1. Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Sarah Truitt, and Jonathan Larson. From Local to Global: A GraphRAG Approach to Query-Focused Summarization. arXiv preprint arXiv:2404.16130, 2024.

  2. Andrej Karpathy. LLM Wiki. GitHub Gist, April 2026. Created 2026-04-04. URL: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f