API Overview

GraWiki exposes a layered API. Most users should start with GraphRAG, which provides ingestion, search, memory, and entity-deduplication workflows through one facade.

Use this section in the following order:

1. GraphRAG for the high-level application surface.
2. Retrieval pages for query-time search behavior.
3. Graph model pages for persisted node and relationship shapes.
4. Database abstractions when implementing or debugging a backend.
5. Similarity and deduplication pages when inspecting duplicate entities or running merges.

At a high level, the API is split into facade-level entry points and lower-level implementation layers:

GraphRAG is the normal application surface.
Retrieval, graph, database, and similarity pages document the subsystems that GraphRAG composes.
The extraction and the FalkorDB / Memgraph adapter pages are advanced reference material.
Helper modules with leading underscores are internal and intentionally undocumented here.

For task-oriented examples, use the How to section alongside this reference. The generated API sections are backed by docstrings from src/, so the reference stays aligned with the code that ships.

grawiki

Public GraWiki package surface.

The top-level package intentionally re-exports `grawiki.GraphRAG` as the main entry point for users who want document ingestion, retrieval, memory, and entity-deduplication workflows through one facade.

GraphRAG

Orchestrate document ingestion and retrieval-augmented search.

GraphRAG is the main facade for the grawiki pipeline. It wires together chunking, embedding, knowledge-graph extraction, entity resolution, and vector/keyword retrieval into a coherent workflow:

1. Ingest: read a document from disk or a string.
2. Extract: chunk the document and extract a knowledge graph per chunk.
3. Resolve: optionally deduplicate extracted entities against the persisted graph.
4. Persist: store document nodes, chunk nodes, entities, and relationships in the configured graph database.
5. Query: search the graph with `search` or `recall`.

The stepwise helpers useful for notebooks and debugging are `read_document`, `chunk_document`, `process_chunks`, `embed_chunks`, `build_document_node`, `build_chunk_nodes`, `persist_document_and_chunks`, `extract_kg_per_chunk`, and `persist_entities_and_relationships`.

See `__init__` for the full list of constructor parameters.
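The stepwise helpers can be chained by hand when a notebook needs to inspect intermediate results. The sketch below is one plausible ordering inferred from the method signatures in this reference. It is not necessarily what `ingest` does internally, and it leaves out the optional resolve step. `ingest_stepwise` is a hypothetical helper name, and `rag` is assumed to be an already constructed `GraphRAG` instance.

```python
from pathlib import Path


async def ingest_stepwise(rag, path: Path) -> None:
    """Run the documented helpers one at a time (assumed ordering)."""
    document = rag.read_document(path)
    chunks = rag.chunk_document(document)
    chunks = await rag.process_chunks(chunks)

    # Embed first so chunk nodes can be built with vectors attached.
    embeddings = await rag.embed_chunks(chunks)
    document_node = rag.build_document_node(document)
    chunk_nodes = rag.build_chunk_nodes(chunks, embeddings)
    await rag.persist_document_and_chunks(document_node, chunk_nodes)

    graphs = await rag.extract_kg_per_chunk(chunks, show_progress=True)
    # The extracted graphs are keyed by chunk identifier, so the keys are
    # reused here as the owner ids (an assumption of this sketch).
    await rag.persist_entities_and_relationships(list(graphs), graphs)
```

In a notebook, each intermediate value (`chunks`, `embeddings`, `graphs`) can be inspected before moving on to the next call.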

read_document

read_document(path)

Load one source document from disk.

Parameters:

Name	Type	Description	Default
`path`	`Path`	Filesystem path to the source document.	required

Returns:

Type	Description
`Document`	Loaded source document.

chunk_document

chunk_document(document, format=None)

Split a document into chunks.

Parameters:

Name	Type	Description	Default
`document`	`Document`	Source document to segment.	required
`format`	`('text', 'markdown')`	Explicit content-format override. When omitted, the method uses `document.metadata["content_format"]` and falls back to `"text"`.	`None`

Returns:

Type	Description
`list[Chunk]`	Chunk sequence produced by the configured chunker. Markdown content uses the markdown pipeline only when one was configured explicitly.
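The fallback order described for `format` can be written out as a small function. This is a sketch of the documented precedence only. `resolve_content_format` and `ContentFormat` are hypothetical names, not part of the GraWiki API.

```python
from typing import Literal, Mapping, Optional

ContentFormat = Literal["text", "markdown"]


def resolve_content_format(
    explicit: Optional[ContentFormat],
    metadata: Mapping[str, str],
) -> str:
    """Return the explicit override, else the metadata value, else "text"."""
    if explicit is not None:
        return explicit
    # No override: consult the document metadata, then fall back to plain text.
    return metadata.get("content_format", "text")
```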

process_chunks `async`

process_chunks(chunks)

Apply configured chunk processors stage-by-stage.

Parameters:

Name	Type	Description	Default
`chunks`	`list[Chunk]`	Chunks to process.	required

Returns:

Type	Description
`list[Chunk]`	Processed chunks in the same order as the input sequence. Processor stages run in configured order, while chunks within one stage run concurrently up to `num_workers`.
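The execution model described here (stages run in sequence, chunks within a stage run concurrently up to `num_workers`, and output order matches input order) can be illustrated with `asyncio`. This is a minimal sketch and not GraWiki's implementation. Chunks are plain strings, and `run_stages`, `strip_stage`, and `upper_stage` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

Processor = Callable[[str], Awaitable[str]]


async def run_stages(
    chunks: Sequence[str],
    stages: Sequence[Processor],
    num_workers: int = 4,
) -> list[str]:
    """Run stages in order; within a stage, process chunks concurrently."""
    current = list(chunks)
    for stage in stages:
        semaphore = asyncio.Semaphore(num_workers)

        async def bounded(chunk: str, stage: Processor = stage) -> str:
            # At most num_workers chunks are inside the stage at once.
            async with semaphore:
                return await stage(chunk)

        # gather() returns results in argument order, preserving chunk order.
        current = list(await asyncio.gather(*(bounded(c) for c in current)))
    return current


async def strip_stage(chunk: str) -> str:
    return chunk.strip()


async def upper_stage(chunk: str) -> str:
    return chunk.upper()


result = asyncio.run(run_stages([" a ", " b "], [strip_stage, upper_stage], num_workers=2))
```

Every chunk finishes one stage before any chunk enters the next, so a later stage can rely on the output of an earlier one.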

embed_chunks `async`

embed_chunks(chunks)

Embed chunk contents in one batch.

Parameters:

Name	Type	Description	Default
`chunks`	`list[Chunk]`	Chunks whose content should be embedded.	required

Returns:

Type	Description
`list[list[float]]`	Embedding vectors aligned with the input chunk order.

build_document_node

build_document_node(document)

Build a document node ready for persistence.

Parameters:

Name	Type	Description	Default
`document`	`Document`	Source document to convert into a persisted node model.	required

Returns:

Type	Description
`DocumentNode`	Prepared document node ready for persistence. Document-level embeddings are not stored; the embedding field is always empty.

build_chunk_nodes

build_chunk_nodes(chunks, embeddings)

Build chunk nodes with embeddings attached.

Parameters:

Name	Type	Description	Default
`chunks`	`list[Chunk]`	Source chunks to convert into persisted node models.	required
`embeddings`	`list[list[float]]`	Embedding vectors aligned with `chunks`.	required

Returns:

Type	Description
`list[ChunkNode]`	Prepared chunk nodes ready for persistence.

Raises:

Type	Description
`ValueError`	Raised when the number of chunks and embeddings does not match.
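The alignment requirement between `chunks` and `embeddings` amounts to a length check followed by a pairwise zip. The sketch below shows that contract with a hypothetical `SketchChunkNode` dataclass standing in for `ChunkNode`, whose real fields are not listed on this page.

```python
from dataclasses import dataclass


@dataclass
class SketchChunkNode:
    """Hypothetical stand-in for ChunkNode; the real model has its own fields."""

    content: str
    embedding: list[float]


def pair_chunks_with_embeddings(
    chunks: list[str], embeddings: list[list[float]]
) -> list[SketchChunkNode]:
    # Fail early instead of silently dropping unmatched items, which is
    # what a plain zip() would do.
    if len(chunks) != len(embeddings):
        raise ValueError(
            f"got {len(chunks)} chunks but {len(embeddings)} embeddings"
        )
    return [SketchChunkNode(c, e) for c, e in zip(chunks, embeddings)]
```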

persist_document_and_chunks `async`

persist_document_and_chunks(document_node, chunk_nodes)

Persist one document node and its chunk nodes with indexes.

Parameters:

Name	Type	Description	Default
`document_node`	`DocumentNode`	Prepared document node.	required
`chunk_nodes`	`list[ChunkNode]`	Prepared chunk nodes associated with the document.	required

extract_kg_per_chunk `async`

extract_kg_per_chunk(chunks, *, show_progress=False)

Extract knowledge graphs for chunks with bounded concurrency.

Parameters:

Name	Type	Description	Default
`chunks`	`list[Chunk]`	Chunks to analyze.	required
`show_progress`	`bool`	When `True`, emit info-level log messages as chunk extraction starts, each chunk finishes, and the overall extraction completes. Defaults to `False`.	`False`

Returns:

Type	Description
`dict[str, KnowledgeGraph]`	Extracted graphs keyed by chunk identifier.
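Info-level messages are only visible when logging is configured to show them. Assuming GraWiki logs through Python's standard `logging` module and its loggers propagate to the root logger (neither is stated on this page), a root-level configuration like the following should be enough to see the `show_progress` output.

```python
import logging

# force=True replaces handlers that a notebook kernel may have installed already.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
    force=True,
)

# With INFO enabled, a call such as
#     graphs = await rag.extract_kg_per_chunk(chunks, show_progress=True)
# should surface the start, per-chunk, and completion messages described above.
```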

aclose `async`

aclose()

Best-effort close of facade-owned resources.

Notes

This forwards cleanup to initialized chunk processors, the configured knowledge-graph extractor, and the graph database adapter when those objects expose aclose() or close().
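A best-effort close of this kind can be sketched as a loop that prefers `aclose()`, falls back to `close()`, and keeps going when one resource fails. `close_best_effort` is a hypothetical helper. Whether GraWiki logs, collects, or discards cleanup errors is not stated here, so returning them is a choice made for this sketch.

```python
import inspect
from typing import Any, Iterable


async def close_best_effort(resources: Iterable[Any]) -> list[Exception]:
    """Close each resource via aclose() or close(), collecting failures."""
    errors: list[Exception] = []
    for resource in resources:
        closer = getattr(resource, "aclose", None) or getattr(resource, "close", None)
        if closer is None:
            continue  # nothing to clean up for this object
        try:
            outcome = closer()
            # close() may be sync or async; await only when needed.
            if inspect.isawaitable(outcome):
                await outcome
        except Exception as exc:  # keep going so later resources still close
            errors.append(exc)
    return errors
```

Because the facade itself exposes `aclose()`, it can also be wrapped in `contextlib.aclosing` (Python 3.10+) so that cleanup runs when an `async with` block exits.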

persist_entities_and_relationships `async`

persist_entities_and_relationships(owner_ids, owner_graphs)

Persist extracted entities and relationships.

Parameters:

Name	Type	Description	Default
`owner_ids`	`Sequence[str]`	Node identifiers that own the extracted graphs.	required
`owner_graphs`	`dict[str, KnowledgeGraph]`	Extracted graphs keyed by owner identifier.	required

find_entity_duplicate_candidates `async`

find_entity_duplicate_candidates(*, limit=10, threshold=None, skip_semantic_key_collisions_in_similarity_scan=True)

Run the two-step duplicate-finding heuristic across entities.

Parameters:

Name	Type	Description	Default
`limit`	`int`	Maximum number of candidate hits returned per source entity.	`10`
`threshold`	`float \| None`	Optional matcher-specific minimum score.	`None`
`skip_semantic_key_collisions_in_similarity_scan`	`bool`	Whether the broader similarity scan should exclude entities already involved in exact semantic-key collisions.	`True`

Returns:

Type	Description
`EntityDuplicateCandidates`	Combined duplicate-candidate report produced by the injected entity similarity finder.
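The two-step shape of the heuristic (exact semantic-key collisions first, then a broader similarity scan that can skip entities already caught in step one) can be illustrated in plain Python. This is a sketch only: the real matcher is the injected entity similarity finder. Here `difflib.SequenceMatcher` on entity names stands in for it, entities are hypothetical `(id, semantic_key, name)` tuples, and the per-entity `limit` is omitted.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def find_candidates(entities, threshold=0.8, skip_collisions=True):
    """entities: list of (entity_id, semantic_key, name) tuples."""
    # Step 1: exact semantic-key collisions.
    by_key = defaultdict(list)
    for entity_id, semantic_key, _name in entities:
        by_key[semantic_key].append(entity_id)
    collisions = {key: ids for key, ids in by_key.items() if len(ids) > 1}
    colliding_ids = {eid for ids in collisions.values() for eid in ids}

    # Step 2: broader similarity scan, optionally skipping step-1 entities.
    pool = [e for e in entities if not (skip_collisions and e[0] in colliding_ids)]
    similar = []
    for (id_a, _, name_a), (id_b, _, name_b) in combinations(pool, 2):
        score = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
        if score >= threshold:
            similar.append((id_a, id_b, round(score, 3)))
    return collisions, similar


entities = [
    ("e1", "acme-corp", "Acme Corp"),
    ("e2", "acme-corp", "ACME Corp"),
    ("e3", "acme-corporation", "Acme Corporation"),
    ("e4", "acme-corporation-ltd", "Acme Corporation Ltd"),
    ("e5", "globex", "Globex"),
]
collisions, similar = find_candidates(entities)
```

With skipping enabled, `e1` and `e2` are reported once as a key collision and are not compared again in the similarity scan, which keeps the second step from repeating what the first already found.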

dedupe_entities `async`

dedupe_entities(*, limit=10, threshold=None, min_merge_score=0.95, dry_run=False)

Find duplicate entities and merge them into canonical masters.

Parameters:

Name	Type	Description	Default
`limit`	`int`	Maximum candidate hits returned per source entity during duplicate inspection.	`10`
`threshold`	`float \| None`	Optional similarity threshold forwarded to the duplicate finder.	`None`
`min_merge_score`	`float`	Minimum candidate score required for inclusion in a merge group.	`0.95`
`dry_run`	`bool`	When `True`, reports are produced without applying destructive DB changes.	`False`

Returns:

Type	Description
`list[MergeReport]`	Reports describing the merge decisions that were made.
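Because merges are destructive, a common pattern is to run with `dry_run=True`, review the reports, and only then apply the merge. The sketch below uses only the parameters documented above. `preview_then_merge` and the `approve` callback are hypothetical, and `rag` is assumed to be a `GraphRAG` instance.

```python
async def preview_then_merge(rag, approve, *, min_merge_score: float = 0.95):
    """Dry-run first; apply the merge only if the preview is approved."""
    preview = await rag.dedupe_entities(min_merge_score=min_merge_score, dry_run=True)
    if not preview or not approve(preview):
        return preview, []  # nothing to merge, or the caller rejected it
    applied = await rag.dedupe_entities(min_merge_score=min_merge_score)
    return preview, applied
```

The graph may change between the two calls, so the applied reports are not guaranteed to match the preview exactly.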

ingest `async`

ingest(path, *, show_progress=False)

Run the full ingestion flow for one file.

Parameters:

Name	Type	Description	Default
`path`	`Path`	Source file to ingest.	required
`show_progress`	`bool`	When `True`, emit info-level progress logs for chunk-level knowledge-graph extraction during ingestion. Defaults to `False`.	`False`

Returns:

Type	Description
`None`	Nothing is returned; the resulting graph is persisted to the configured database as a side effect.

ingest_text `async`

ingest_text(text, title, *, format='text', metadata=None, show_progress=False)

Ingest a document supplied as a string.

Parameters:

Name	Type	Description	Default
`text`	`str`	Document content to ingest.	required
`title`	`str`	Human-readable document title used as the document name.	required
`format`	`('text', 'markdown')`	Explicit content format for the in-memory document. Defaults to `"text"` and is not auto-detected.	`"text"`
`metadata`	`dict[str, str] \| None`	Additional metadata attached to the transient source document before persistence.	`None`
`show_progress`	`bool`	When `True`, emit info-level progress logs for chunk-level knowledge-graph extraction during ingestion. Defaults to `False`.	`False`

Returns:

Type	Description
`None`	Nothing is returned; the resulting graph is persisted to the configured database as a side effect.

remember `async`

remember(memory, *, memory_id=None, name=None, semantic_key=None, metadata=None, related_node_ids=())

Persist one memory, replacing an existing memory when requested.

Parameters:

Name	Type	Description	Default
`memory`	`MemoryNode \| str`	Memory payload to persist. Raw strings are normalized into a new `grawiki.graph.models.MemoryNode`.	required
`memory_id`	`str \| None`	Existing memory identifier to replace. When omitted, `memory.id` is used as-is.	`None`
`name`	`str \| None`	Optional memory name override. Primarily useful when `memory` is a raw string.	`None`
`semantic_key`	`str \| None`	Optional semantic key override. Defaults to the final memory id.	`None`
`metadata`	`dict[str, str] \| None`	Optional metadata merged into the memory metadata.	`None`
`related_node_ids`	`Sequence[str]`	Existing node ids that should be explicitly linked from the memory.	`()`

Returns:

Type	Description
`MemoryNode`	Persisted memory payload including its final id.
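The replace behavior can be exercised by storing a memory from a raw string and then passing the returned id back as `memory_id`. The sketch uses only the documented parameters. `remember_then_update` and the `"deploy-preference"` name are hypothetical, and `rag` is assumed to be a `GraphRAG` instance.

```python
async def remember_then_update(rag, first_text: str, updated_text: str):
    """Store a memory, then replace it in place using its returned id."""
    created = await rag.remember(first_text, name="deploy-preference")
    # Passing the earlier id asks the facade to replace that memory
    # instead of creating a second one.
    replaced = await rag.remember(
        updated_text,
        memory_id=created.id,
        name="deploy-preference",
    )
    return created, replaced
```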

search `async`

search(query, *, limit=10)

Aggregate results from the configured retrievers.

Parameters:

Name	Type	Description	Default
`query`	`str`	Raw user query text.	required
`limit`	`int`	Maximum number of final hits returned after combining retriever outputs.	`10`

Returns:

Type	Description
`list[NodeHit]`	Flat, deduplicated search hits across the configured retrievers. With the default retriever set this typically includes chunk, memory, and keyword-expanded entity results.

Raises:

Type	Description
`RuntimeError`	Raised when every configured retriever fails for the query.
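The aggregation contract described here (combine retriever outputs, deduplicate, cap at `limit`, and fail only when every retriever fails) can be sketched with `asyncio.gather`. This is an illustration, not GraWiki's implementation. Hits are hypothetical `(node_id, score)` tuples rather than `NodeHit` objects, and keeping the highest score per node is an assumed merge rule.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

Hit = tuple[str, float]  # (node_id, score): hypothetical stand-in for NodeHit
Retriever = Callable[[str], Awaitable[list[Hit]]]


async def aggregate(query: str, retrievers: Sequence[Retriever], limit: int = 10) -> list[Hit]:
    # return_exceptions=True lets one failing retriever coexist with the rest.
    outcomes = await asyncio.gather(*(r(query) for r in retrievers), return_exceptions=True)
    successes = [o for o in outcomes if not isinstance(o, BaseException)]
    if retrievers and not successes:
        raise RuntimeError("every retriever failed for this query")

    best: dict[str, float] = {}
    for hits in successes:
        for node_id, score in hits:
            # Deduplicate by node id, keeping the highest score seen.
            if score > best.get(node_id, float("-inf")):
                best[node_id] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```

A partial failure therefore degrades the result set instead of raising, which is consistent with the `RuntimeError` being reserved for the case where no retriever succeeds.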

recall `async`

recall(query, *, user_id=None, limit=5, hops=1, limit_per_hop=5)

Search memories and attach connected graph context.

Parameters:

Name	Type	Description	Default
`query`	`str`	Raw user query text.	required
`user_id`	`str \| None`	Optional memory-owner filter applied after memory retrieval.	`None`
`limit`	`int`	Maximum number of memories returned.	`5`
`hops`	`int`	Number of graph-expansion hops to include.	`1`
`limit_per_hop`	`int`	Maximum recall paths expanded per memory seed.	`5`
