GraWiki

GraWiki is an early-stage open-source Python project with two main concerns:
- GraphRAG-style knowledge extraction and retrieval.
- LLM Wiki-style memory for agents.
It uses an LLM to extract structured knowledge from documents, persists that knowledge in a graph, and uses the same graph as part of an LLM system's long-lived memory.
The project is still experimental. It is a working repository for graph-backed memory and retrieval workflows rather than a finished framework.
What the project covers
GraWiki focuses on a small end-to-end surface:
- ingest source documents into a typed property graph,
- retrieve graph-backed context from documents, chunks, memories, and entities,
- persist agent memory as first-class graph nodes,
- inspect and merge duplicate entities through similarity-based workflows.
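As a rough illustration of the first bullet, the sketch below ingests a Markdown string into a tiny in-memory typed property graph. It does not use GraWiki's own API: `Node`, `Edge`, `PropertyGraph`, `split_markdown`, `ingest`, and the `HAS_CHUNK` edge type are all hypothetical names chosen for this example, and the heading-based splitter is only one simple way to make chunking Markdown-aware.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """A typed node: `label` is the node type, `props` holds its properties."""
    id: str
    label: str  # e.g. "Document", "Chunk", "Entity", "Memory"
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A typed, directed edge between two node ids."""
    source: str
    target: str
    type: str  # e.g. "HAS_CHUNK" (hypothetical edge type)


@dataclass
class PropertyGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node: Node) -> Node:
        self.nodes[node.id] = node
        return node

    def add_edge(self, source: str, target: str, edge_type: str) -> None:
        self.edges.append(Edge(source, target, edge_type))


def split_markdown(text: str) -> list[str]:
    """Start a new chunk at every Markdown heading line.

    Simplification: a line starting with '# ' inside a fenced code block
    would also be treated as a heading.
    """
    parts = re.split(r"^(?=#{1,6} )", text, flags=re.MULTILINE)
    return [part.strip() for part in parts if part.strip()]


def ingest(graph: PropertyGraph, doc_id: str, text: str) -> list[Node]:
    """Store one Document node plus one Chunk node per Markdown section."""
    graph.add_node(Node(doc_id, "Document", {"length": len(text)}))
    chunks = []
    for i, chunk_text in enumerate(split_markdown(text)):
        chunk = graph.add_node(Node(f"{doc_id}#chunk-{i}", "Chunk", {"text": chunk_text}))
        graph.add_edge(doc_id, chunk.id, "HAS_CHUNK")
        chunks.append(chunk)
    return chunks


graph = PropertyGraph()
sample = "# Intro\nGraWiki stores graphs.\n\n## Memory\nAgents recall notes."
chunks = ingest(graph, "doc-1", sample)
print([c.props["text"].splitlines()[0] for c in chunks])  # ['# Intro', '## Memory']
```

In a real pipeline the chunk nodes would then be passed to an LLM for entity and relationship extraction, and the graph would live in a database rather than in Python dictionaries.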
Current capabilities
- Read source documents and split them into chunks.
- Use Markdown-aware chunking for Markdown files and in-memory Markdown text when configured.
- Extract entities and relationships from chunk text.
- Persist documents, chunks, entities, and edges in a graph database.
- Retrieve graph-backed context with full-text and vector search.
- Store and recall agent memories as dedicated graph nodes.
- Inspect semantic-key collisions and broader duplicate candidates.
- Merge duplicate entities into canonical masters.
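The last two capabilities can be pictured with a small, self-contained sketch. It assumes that a "semantic key" is the entity type plus a normalized name, which is a guess made for illustration and may not match GraWiki's actual definition. `Entity`, `semantic_key`, `find_collisions`, `near_duplicates`, and `merge_into_master` are hypothetical names, and the standard library's `difflib` stands in for whatever similarity matchers the project uses.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Entity:
    id: str
    type: str
    name: str
    aliases: set = field(default_factory=set)


def normalize(name: str) -> str:
    """Lowercase the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def semantic_key(entity: Entity) -> tuple[str, str]:
    """Assumed key: entity type plus normalized name."""
    return (entity.type.lower(), normalize(entity.name))


def find_collisions(entities):
    """Group entities by semantic key; keep keys shared by more than one entity."""
    groups = defaultdict(list)
    for entity in entities:
        groups[semantic_key(entity)].append(entity)
    return {key: group for key, group in groups.items() if len(group) > 1}


def near_duplicates(entities, threshold=0.85):
    """Broader candidates: same type and a high string-similarity ratio."""
    pairs = []
    for i, a in enumerate(entities):
        for b in entities[i + 1:]:
            ratio = SequenceMatcher(None, normalize(a.name), normalize(b.name)).ratio()
            if a.type == b.type and ratio >= threshold:
                pairs.append((a.id, b.id))
    return pairs


def merge_into_master(group, edges):
    """Treat the first entity as canonical master and re-point edges at it."""
    master, duplicates = group[0], group[1:]
    remap = {dup.id: master.id for dup in duplicates}
    for dup in duplicates:
        master.aliases |= {dup.name} | dup.aliases
    # Rewrite endpoints; the set drops edges that became identical after merging.
    rewired = {(remap.get(s, s), rel, remap.get(t, t)) for s, rel, t in edges}
    return master, sorted(rewired)


entities = [
    Entity("e1", "Tool", "FalkorDB"),
    Entity("e2", "Tool", "falkordb "),
    Entity("e3", "Person", "Ada Lovelace"),
    Entity("e4", "Tool", "Falkor DB"),
]
edges = [("c1", "MENTIONS", "e1"), ("c2", "MENTIONS", "e2"), ("c1", "MENTIONS", "e3")]

collisions = find_collisions(entities)
candidates = near_duplicates(entities)
master, merged_edges = merge_into_master(collisions[("tool", "falkordb")], edges)
print(master.id, merged_edges)
```

Here `e1` and `e2` collide on the same key, while `e4` only shows up as a broader similarity candidate. Keeping the two checks separate mirrors the distinction the list draws between key collisions and wider duplicate inspection.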
Why this project exists
Many RAG systems treat retrieved context as a temporary document slice. GraWiki instead stores document structure and agent memory in a persistent graph that can be searched later and expanded through connected context.
That design is closer to a lightweight graph-backed "wiki" for an LLM system than to a document-search pipeline alone [1, 2].
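To make "expanded through connected context" concrete, here is a minimal sketch under stated assumptions: a plain dictionary plays the role of the graph, a naive keyword match stands in for full-text or vector search, and `search` and `expand` are hypothetical helpers rather than GraWiki functions.

```python
from collections import defaultdict

# Toy graph: node id -> text, plus undirected links between nodes.
nodes = {
    "chunk-1": "GraWiki persists extracted knowledge in a graph.",
    "chunk-2": "Agent memories are stored as dedicated graph nodes.",
    "entity-graph": "Entity: graph database",
    "memory-1": "Memory: the user prefers short answers.",
}
edges = [("chunk-1", "entity-graph"), ("chunk-2", "entity-graph"), ("chunk-2", "memory-1")]

neighbors = defaultdict(set)
for a, b in edges:
    neighbors[a].add(b)
    neighbors[b].add(a)


def search(query: str) -> list[str]:
    """Naive keyword match standing in for full-text or vector search."""
    terms = query.lower().split()
    return sorted(n for n, text in nodes.items() if any(t in text.lower() for t in terms))


def expand(seeds: list[str], hops: int = 1) -> list[str]:
    """Grow the seed set by following edges for a fixed number of hops."""
    seen, frontier = set(seeds), set(seeds)
    for _ in range(hops):
        frontier = {m for n in frontier for m in neighbors[n]} - seen
        seen |= frontier
    return sorted(seen)


seeds = search("memories")
context = expand(seeds, hops=1)
print(seeds)    # ['chunk-2']
print(context)  # ['chunk-2', 'entity-graph', 'memory-1']
```

A plain document-search pipeline would stop at `seeds`. The expansion step is what pulls in the linked entity and the stored memory, and raising `hops` to 2 would also reach `chunk-1` through the shared entity.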
How to use the docs
- Start with Flows for the main ingestion and retrieval paths.
- Use How to for task-oriented guides derived from the maintained notebooks.
- Use API Overview when you need the facade and lower-level references.
Repository structure
The repository is organized around a few major areas:
- src/grawiki/ contains the reusable project code.
- tests/ contains pytest coverage for the facade, retrieval, graph models, extraction, and the FalkorDB adapter.
- docs/ contains the public MkDocs site, including generated API pages under docs/api/.
- notebooks/ contains focused tutorial notebooks and sample text inputs.
At the package level:
- grawiki.core holds shared source-data types and the embedding protocol.
- grawiki.doc_processing handles document loading and chunking.
- grawiki.graph defines the graph schema and extraction logic.
- grawiki.db defines the database abstraction layer and FalkorDB backend.
- grawiki.retrieval owns query-time retrieval strategies.
- grawiki.similarity covers duplicate inspection, similarity matchers, and deduplication helpers.
- grawiki.rag exposes the GraphRAG facade.
For a fuller map of the repository, see Project Structure.
Maintained notebooks
The maintained notebook flow lives in three numbered notebooks under notebooks/:
- 01_ingest_and_deduplicate.ipynb
- 02_agent_memory_and_recall.ipynb
- 03_visualize_graph.ipynb
Run notebook 1 first to build the local FalkorDB graph. Notebook 2 reuses that graph for agent memory examples, and notebook 3 visualizes the resulting graph.
To install the tutorial dependencies, choose one:
- For the file-based setup (FalkorDBLite):
  pip install 'grawiki[falkordblite,notebooks,viz]'
- For the Docker-based setup (full FalkorDB):
  pip install 'grawiki[falkordb,notebooks,viz]'
The sample texts used in these notebooks are Medium articles by Filip Wojcik. They are available from his public Medium profile and are accessible without a subscription.
[1] Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Sarah Truitt, and Jonathan Larson. From Local to Global: A Graph RAG Approach to Query-Focused Summarization. arXiv preprint arXiv:2404.16130, 2024.
[2] Andrej Karpathy. LLM Wiki. GitHub Gist, April 2026. Created 2026-04-04. URL: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f.