Project Structure

This page summarizes the public repository layout and the main package surfaces exposed by GraWiki.
Top-level folders
src/grawiki/
Main application package. This directory contains the reusable project code.
tests/
Pytest coverage for GraphRAG, the modular ingestion-step API, the retrieval layer, graph models and extraction, Cypher query generation, and the FalkorDB adapter.
docs/
Official MkDocs documentation source. It includes narrative pages and generated API reference pages under docs/api/, with GraphRAG as the main entry point.
notebooks/
Maintained tutorial notebooks plus sample data. The main walkthrough starts in 01_ingest_and_deduplicate.ipynb, then continues into agent memory in 02_agent_memory_and_recall.ipynb and visualization in 03_visualize_graph.ipynb.
.github/workflows/
Public CI workflows used to validate package artifacts and release tags.
Important top-level files
- README.md: repository overview and setup commands.
- mkdocs.yml: public documentation site configuration.
- .readthedocs.yaml: Read the Docs build configuration.
- pyproject.toml: package metadata, dependencies, and tool configuration.
Package map
grawiki
Top-level package that re-exports GraphRAG, the main public facade.
grawiki.core
Shared source-data models and embedding abstractions.
- core/commons.py: lightweight pre-persistence Document and Chunk models.
- core/embedding.py: the shared Embedding protocol used across ingestion, extraction, retrieval, and similarity workflows.
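Because Embedding is described as a protocol, any object with the right method shape can be plugged in without subclassing. The sketch below assumes a single embed(texts) method that returns one vector per input text; that method name and signature are hypothetical and not taken from core/embedding.py. HashingEmbedding is a toy, dependency-free stand-in of the kind that can be useful in tests.

```python
import math
import zlib
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class Embedding(Protocol):
    """Hypothetical shape of a shared embedding protocol."""

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        """Return one vector per input text."""
        ...


class HashingEmbedding:
    """Toy embedder: hashes whitespace tokens into a fixed number of buckets."""

    def __init__(self, dim: int = 16) -> None:
        self.dim = dim

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self.dim
            for token in text.lower().split():
                # CRC32 gives a stable bucket index across runs and processes.
                vec[zlib.crc32(token.encode()) % self.dim] += 1.0
            # L2-normalize so dot products behave like cosine similarity.
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0
            vectors.append([v / norm for v in vec])
        return vectors


embedder = HashingEmbedding(dim=8)
# Structural check: HashingEmbedding never inherits from Embedding.
assert isinstance(embedder, Embedding)
vectors = embedder.embed(["graph rag", "knowledge graph"])
```

Note that runtime_checkable only verifies that the method exists, not its signature, so static type checking remains the stronger guarantee.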
grawiki.doc_processing
Document loading and chunking utilities.
- doc_processing/document_processing.py: entry-point helpers for reading and chunking source documents.
- doc_processing/chunkers.py: strategy wrapper around chonkie for fast, recursive, semantic, sentence, and token chunking, plus an optional pipeline-backed markdown adapter that can merge prose, code, and table regions into one ordered stream when callers explicitly opt in.
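A strategy wrapper typically maps a strategy name to a chunking function and dispatches on it. The sketch below illustrates that pattern with two simple stdlib chunkers; chunk_text, STRATEGIES, and the size/overlap defaults are hypothetical, and the code does not call chonkie or reproduce its API.

```python
import re
from typing import Callable


def token_chunks(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split on whitespace and emit fixed-size token windows that overlap."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    tokens = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + size]))
        if start + size >= len(tokens):  # last window reached the end
            break
    return chunks


def sentence_chunks(text: str, max_sentences: int = 2) -> list[str]:
    """Split after ., ! or ? and group consecutive sentences."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]


# Hypothetical registry: strategy name -> chunking function.
STRATEGIES: dict[str, Callable[[str], list[str]]] = {
    "token": token_chunks,
    "sentence": sentence_chunks,
}


def chunk_text(text: str, strategy: str = "token") -> list[str]:
    try:
        chunker = STRATEGIES[strategy]
    except KeyError:
        raise ValueError(
            f"unknown strategy {strategy!r}; choose from {sorted(STRATEGIES)}"
        ) from None
    return chunker(text)


print(chunk_text("One. Two! Three? Four.", strategy="sentence"))
```

Keeping the registry as plain data makes it easy to add a strategy without touching the dispatch function.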
grawiki.graph
Knowledge-graph-specific schema, prompts, and extraction logic.
- graph/models.py: canonical persisted graph schema, including Node, Relationship, KnowledgeGraph, DocumentNode, ChunkNode, and MemoryNode.
- graph/prompts.py: the extraction prompt template.
- graph/extraction.py: KnowledgeGraphExtractor and the extractor-facing transient graph shapes.
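To make the relationship between nodes, edges, and the containing graph concrete, here is a minimal sketch using dataclasses. The field names (id, label, name, source_id, target_id, type) and the helper methods are hypothetical; the real models in graph/models.py may use a different library (for example Pydantic) and carry more fields.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    id: str
    label: str  # entity type, e.g. "Person" or "Organization"
    name: str


@dataclass(frozen=True)
class Relationship:
    source_id: str
    target_id: str
    type: str  # relationship type, e.g. "WORKS_AT"


@dataclass
class KnowledgeGraph:
    nodes: dict[str, Node] = field(default_factory=dict)
    relationships: list[Relationship] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node  # re-adding an id replaces the node

    def add_relationship(self, rel: Relationship) -> None:
        # Reject dangling edges so the graph stays internally consistent.
        missing = {rel.source_id, rel.target_id} - set(self.nodes)
        if missing:
            raise KeyError(f"unknown node id(s): {sorted(missing)}")
        self.relationships.append(rel)

    def neighbors(self, node_id: str) -> list[Node]:
        """Nodes reachable from node_id over one outgoing relationship."""
        return [
            self.nodes[r.target_id]
            for r in self.relationships
            if r.source_id == node_id
        ]


kg = KnowledgeGraph()
kg.add_node(Node("p1", "Person", "Ada"))
kg.add_node(Node("o1", "Organization", "Acme"))
kg.add_relationship(Relationship("p1", "o1", "WORKS_AT"))
print([n.name for n in kg.neighbors("p1")])
```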
grawiki.db
Database abstraction layer and FalkorDB implementation.
- db/base.py: the GraphDB contract plus shared hit/result models.
- db/cypher.py: Cypher query builders.
- db/falkordb.py: the FalkorGraphDB backend currently used by the project.
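A common concern for Cypher builders is that node labels and relationship types cannot be passed as query parameters, so they must be interpolated into the query text and validated first, while property values travel separately as parameters. The sketch below shows that split; merge_node_query and merge_relationship_query are hypothetical names, and the builders in db/cypher.py may produce different queries.

```python
import re

# Conservative whitelist for labels and relationship types.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check_identifier(value: str) -> str:
    # fullmatch (not match with $) so a trailing newline is also rejected.
    if not _IDENTIFIER.fullmatch(value):
        raise ValueError(f"unsafe Cypher identifier: {value!r}")
    return value


def merge_node_query(label: str) -> str:
    """Upsert a node by id; $id and $name are supplied as query parameters."""
    label = _check_identifier(label)
    return f"MERGE (n:{label} {{id: $id}}) SET n.name = $name RETURN n"


def merge_relationship_query(rel_type: str) -> str:
    """Connect two existing nodes, matched by id, with a typed relationship."""
    rel_type = _check_identifier(rel_type)
    return (
        "MATCH (a {id: $source_id}), (b {id: $target_id}) "
        f"MERGE (a)-[r:{rel_type}]->(b) RETURN r"
    )


query = merge_node_query("Person")
params = {"id": "p1", "name": "Ada"}  # sent alongside the query, never inlined
print(query)
```

Passing values as parameters rather than formatting them into the string avoids injection and lets the database reuse cached query plans.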
grawiki.retrieval
Query-time retrieval strategy layer.
- retrieval/base.py: the Retriever protocol.
- retrieval/text.py: full-text and vector retrieval over stored nodes.
- retrieval/keywords.py: keyword extraction plus graph-context expansion.
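The vector side of retrieval boils down to ranking stored node embeddings by similarity to a query embedding. The sketch below assumes a Retriever protocol with a retrieve(query_vector, top_k) method and a small Hit record; both are hypothetical shapes (the real protocol may accept a query string and embed it internally), and the brute-force in-memory search stands in for a database vector index.

```python
import math
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Hit:
    """Hypothetical result record: a node id plus its similarity score."""
    node_id: str
    score: float


class Retriever(Protocol):
    def retrieve(self, query_vector: Sequence[float], top_k: int = 3) -> list[Hit]: ...


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorRetriever:
    """Brute-force vector search over node embeddings held in a dict."""

    def __init__(self, vectors: dict[str, list[float]]) -> None:
        self.vectors = vectors

    def retrieve(self, query_vector: Sequence[float], top_k: int = 3) -> list[Hit]:
        hits = [
            Hit(node_id, cosine(query_vector, vec))
            for node_id, vec in self.vectors.items()
        ]
        hits.sort(key=lambda hit: hit.score, reverse=True)  # best match first
        return hits[:top_k]


retriever: Retriever = InMemoryVectorRetriever(
    {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "carol": [1.0, 1.0]}
)
hits = retriever.retrieve([1.0, 0.0], top_k=2)
print([(h.node_id, round(h.score, 3)) for h in hits])
```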
grawiki.similarity
Entity similarity inspection and deduplication support.
- similarity/base.py: the EntitySimilarityMatcher protocol.
- similarity/vector.py: embedding-based entity matching.
- similarity/fuzzy.py: RapidFuzz-based name matching.
- similarity/similarity_finder.py: duplicate-candidate orchestration.
- similarity/deduplication.py: merge helper logic and MergeReport.
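Fuzzy duplicate detection usually scores every pair of entity names and keeps pairs above a threshold as merge candidates. This sketch uses the standard library's difflib.SequenceMatcher in place of RapidFuzz so it runs without extra dependencies; name_similarity, find_duplicate_candidates, and the 0.85 threshold are hypothetical choices, not the project's defaults.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; difflib stands in for RapidFuzz."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def find_duplicate_candidates(
    names: list[str], threshold: float = 0.85
) -> list[tuple[str, str, float]]:
    """Return (name_a, name_b, score) for every pair scoring >= threshold."""
    candidates = []
    for a, b in combinations(names, 2):  # O(n^2) pairs; fine for small sets
        score = name_similarity(a, b)
        if score >= threshold:
            candidates.append((a, b, round(score, 3)))
    # Most similar pairs first, so reviewers see the strongest matches early.
    candidates.sort(key=lambda c: c[2], reverse=True)
    return candidates


candidates = find_duplicate_candidates(["OpenAI", "Open AI", "FalkorDB"])
print(candidates)
```

Pairwise comparison grows quadratically, which is one reason a finder may first narrow candidates with a cheaper signal, such as embedding similarity, before applying fuzzy name matching.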
grawiki.rag
High-level RAG facade.
rag/graph_rag.py: the main end-to-end entry point for ingestion, search, memory, and deduplication.
How the pieces fit together
- grawiki.rag.GraphRAG orchestrates the user-facing flows.
- grawiki.doc_processing reads and chunks source material, including format normalization, PDF-to-markdown conversion, generic text chunking, and optional markdown-aware text/code/table chunking.
- grawiki.graph.extraction turns chunk text into graph structure.
- grawiki.db persists nodes, relationships, memories, and search indexes.
- grawiki.retrieval handles query-time embedding and graph-context retrieval.
- grawiki.similarity powers duplicate inspection, ingest-time entity resolution, and deduplication.
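The flow described above (chunk, extract, persist, retrieve) can be illustrated with a toy facade. Everything in this sketch is hypothetical: MiniGraphRAG is not GraphRAG, the "extractor" simply treats capitalized words as entities instead of calling an LLM, and plain dictionaries replace FalkorDB. It only shows how the stages hand data to one another, including one-hop graph-context expansion at query time.

```python
import re
from collections import defaultdict


class MiniGraphRAG:
    """Toy facade mirroring the documented stages; not the real GraphRAG."""

    def __init__(self, chunk_size: int = 12) -> None:
        self.chunk_size = chunk_size
        self.chunks: list[str] = []                               # stands in for chunk storage
        self.mentions: dict[str, set[int]] = defaultdict(set)     # entity -> chunk indices
        self.edges: dict[str, set[str]] = defaultdict(set)        # entity co-occurrence graph

    def _chunk(self, text: str) -> list[str]:
        # Stage 1 (doc_processing): fixed-size token windows.
        tokens = text.split()
        return [
            " ".join(tokens[i:i + self.chunk_size])
            for i in range(0, len(tokens), self.chunk_size)
        ]

    @staticmethod
    def _extract(text: str) -> set[str]:
        # Stage 2 (graph extraction): toy rule, capitalized words are entities.
        return set(re.findall(r"\b[A-Z][A-Za-z0-9]+\b", text))

    def ingest(self, text: str) -> None:
        # Stage 3 (db): persist chunks, entity mentions, and co-occurrence edges.
        for chunk in self._chunk(text):
            idx = len(self.chunks)
            self.chunks.append(chunk)
            entities = self._extract(chunk)
            for entity in entities:
                self.mentions[entity].add(idx)
                self.edges[entity].update(entities - {entity})

    def search(self, query: str) -> dict[str, list[str]]:
        # Stage 4 (retrieval): match query keywords to known entities,
        # then expand one hop through the graph for extra context.
        seeds = {e for e in self._extract(query) if e in self.mentions}
        expanded = set(seeds)
        for entity in seeds:
            expanded |= self.edges[entity]
        chunk_ids = sorted({i for e in seeds for i in self.mentions[e]})
        return {
            "entities": sorted(expanded),
            "chunks": [self.chunks[i] for i in chunk_ids],
        }


rag = MiniGraphRAG(chunk_size=50)
rag.ingest("GraWiki stores graphs in FalkorDB. Retrieval uses embeddings.")
result = rag.search("Where does GraWiki store data?")
print(result["entities"])
```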
For task-oriented examples built from the maintained notebooks, see How to.