API Overview
GraWiki exposes a layered API. Most users should start with GraphRAG, which provides ingestion, search, memory, and entity-deduplication workflows through one facade.
Use this section in the following order:
- GraphRAG for the high-level application surface.
- Retrieval pages for query-time search behavior.
- Graph model pages for persisted node and relationship shapes.
- Database abstractions when implementing or debugging a backend.
- Similarity and deduplication pages when inspecting duplicate entities or running merges.
At a high level, the API is split into facade-level entry points and lower-level implementation layers:
- GraphRAG is the normal application surface.
- Retrieval, graph, database, and similarity pages document the subsystems that GraphRAG composes.
- The extraction and FalkorDB adapter pages are advanced reference material.
- Helper modules with leading underscores are internal and intentionally undocumented here.
For task-oriented examples, use the How to section alongside this reference. The generated API sections are backed by docstrings from src/, so the reference stays aligned with the code that ships.
grawiki
Public GraWiki package surface.
The top-level package intentionally re-exports grawiki.GraphRAG as the
main entry point for users who want document ingestion, retrieval, memory, and
entity-deduplication workflows through one facade.
GraphRAG
Orchestrate document ingestion and retrieval-augmented search.
The stepwise ingestion helpers exposed for notebooks and debugging are
read_document(...), chunk_document(...), process_chunks(...),
embed_document(...), embed_chunks(...), build_document_node(...),
build_chunk_nodes(...), persist_document_and_chunks(...),
extract_kg_per_chunk(...), and persist_entities_and_relationships(...).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | str | Chat model used by the knowledge graph extractor. | required |
| embedding_model | str | Embedding model used for documents, chunks, entities, and queries. | required |
| db | GraphDB | Graph database adapter used for persistence and search. | required |
| chunking_strategy | str | Chunking strategy passed to :class: | 'sentence' |
| chunk_processors | list[ChunkProcessor] \| None | Optional chunk-level processing steps applied after chunking and before embedding and graph extraction. Useful for enrichment or normalization tasks such as question generation, entity anonymization, or metadata injection. | None |
| markdown_pipeline | Pipeline \| None | Optional markdown-aware pipeline used for markdown content. When omitted, markdown falls back to the generic text chunker. | None |
| max_workers | int | Maximum number of concurrent chunk-level extraction coroutines. | 4 |
| embedding | Embedding \| None | Embedding override for tests or debugging. | None |
| kg_extractor | KnowledgeGraphExtractorProtocol \| None | Knowledge graph extractor override for tests or debugging. | None |
| kg_output_language | str | Language used by the default knowledge graph extractor for entity names, relationship labels, and textual properties. Defaults to | 'English' |
| kg_extractor_kwargs | dict[str, Any] \| None | Extra keyword arguments forwarded to the default extractor's | None |
| similarity_finder | EntitySimilarityFinder \| None | Entity similarity finder used for collision inspection and candidate lookup. Defaults to a finder backed by the vector similarity matcher. | None |
| resolve_entities_on_ingest | bool | When | False |
| entity_resolution_threshold | float | Minimum cosine-similarity score for two entities to be considered the same during ingest-time resolution. Only used when | 0.92 |
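The entity_resolution_threshold default of 0.92 is described as a cosine-similarity cutoff. As a rough illustration of what such a cutoff means, the comparison can be sketched in plain Python. The helper names cosine_similarity and same_entity are hypothetical and are not part of GraWiki; the library's own matcher may compute scores differently.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_entity(a: list[float], b: list[float], threshold: float = 0.92) -> bool:
    """Hypothetical check: treat two embeddings as one entity at or above the cutoff."""
    return cosine_similarity(a, b) >= threshold
```

A higher threshold makes ingest-time resolution more conservative: fewer entity pairs clear the cutoff, so fewer are treated as the same entity.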
read_document
read_document(path)
Load one source document from disk.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | Filesystem path to the source document. | required |
Returns:
| Type | Description |
|---|---|
| Document | Loaded source document. |
chunk_document
chunk_document(document, format=None)
Split a document into chunks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| document | Document | Source document to segment. | required |
| format | ('text', 'markdown') | Explicit content-format override. When omitted, the method uses | "text" |
Returns:
| Type | Description |
|---|---|
| list[Chunk] | Chunk sequence produced by the configured chunker. Markdown content uses the markdown pipeline only when one was configured explicitly. |
process_chunks
async
process_chunks(chunks)
Apply configured chunk processors in sequence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| chunks | list[Chunk] | Chunks to process. | required |
Returns:
| Type | Description |
|---|---|
| list[Chunk] | Processed chunks in the same order as the input sequence. |
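To make "in sequence" concrete, the following sketch chains two processors so that the output of one feeds the next. It assumes a processor is an async callable from a chunk list to a chunk list; the real ChunkProcessor interface is not shown in this reference, and the Chunk dataclass, strip_whitespace, tag_length, and run_processors names here are hypothetical stand-ins.

```python
import asyncio
from dataclasses import dataclass, field, replace
from typing import Awaitable, Callable


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for GraWiki's Chunk type."""
    id: str
    content: str
    metadata: dict[str, str] = field(default_factory=dict)


# Assumed processor shape: an async callable from chunk list to chunk list.
Processor = Callable[[list[Chunk]], Awaitable[list[Chunk]]]


async def strip_whitespace(chunks: list[Chunk]) -> list[Chunk]:
    """Normalization step: trim surrounding whitespace."""
    return [replace(c, content=c.content.strip()) for c in chunks]


async def tag_length(chunks: list[Chunk]) -> list[Chunk]:
    """Metadata-injection step: record the content length."""
    return [
        replace(c, metadata={**c.metadata, "length": str(len(c.content))})
        for c in chunks
    ]


async def run_processors(chunks: list[Chunk], processors: list[Processor]) -> list[Chunk]:
    """Apply each processor to the output of the previous one."""
    for processor in processors:
        chunks = await processor(chunks)
    return chunks


processed = asyncio.run(
    run_processors(
        [Chunk("c1", "  hello "), Chunk("c2", "world")],
        [strip_whitespace, tag_length],
    )
)
```

Because tag_length runs after strip_whitespace, the recorded length reflects the trimmed content, which is why processor order matters.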
embed_document
async
embed_document(document)
Return no document-level embedding for ingestion.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| document | Document | Source document. Kept in the signature for step-method API compatibility. | required |
Returns:
| Type | Description |
|---|---|
| list[float] | Empty list. Document content is persisted without a vector; chunk, entity, memory, and query embeddings remain the retrieval path. |
embed_chunks
async
embed_chunks(chunks)
Embed chunk contents in one batch.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| chunks | list[Chunk] | Chunks whose content should be embedded. | required |
Returns:
| Type | Description |
|---|---|
| list[list[float]] | Embedding vectors aligned with the input chunk order. |
build_document_node
build_document_node(document, embedding)
Build a document node with an optional embedding attached.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| document | Document | Source document to convert into a persisted node model. | required |
| embedding | list[float] | Optional embedding vector for the document. The ingestion path now passes an empty list, but the parameter is retained for API compatibility. | required |
Returns:
| Type | Description |
|---|---|
| DocumentNode | Prepared document node ready for persistence. |
build_chunk_nodes
build_chunk_nodes(chunks, embeddings)
Build chunk nodes with embeddings attached.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| chunks | list[Chunk] | Source chunks to convert into persisted node models. | required |
| embeddings | list[list[float]] | Embedding vectors aligned with | required |
Returns:
| Type | Description |
|---|---|
| list[ChunkNode] | Prepared chunk nodes ready for persistence. |
Raises:
| Type | Description |
|---|---|
| ValueError | Raised when the number of chunks and embeddings does not match. |
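The length check behind the ValueError can be illustrated with a small stand-alone sketch. The ChunkNode dataclass and pair_chunks_with_embeddings function below are hypothetical stand-ins, and chunks are simplified to (id, text) tuples; the real node model has more fields than shown here.

```python
from dataclasses import dataclass


@dataclass
class ChunkNode:
    """Hypothetical stand-in for GraWiki's ChunkNode."""
    id: str
    content: str
    embedding: list[float]


def pair_chunks_with_embeddings(
    chunks: list[tuple[str, str]], embeddings: list[list[float]]
) -> list[ChunkNode]:
    """Attach the i-th embedding to the i-th chunk, refusing mismatched inputs."""
    if len(chunks) != len(embeddings):
        raise ValueError(
            f"got {len(chunks)} chunks but {len(embeddings)} embeddings"
        )
    return [
        ChunkNode(id=chunk_id, content=text, embedding=vector)
        for (chunk_id, text), vector in zip(chunks, embeddings)
    ]


nodes = pair_chunks_with_embeddings(
    [("c1", "first"), ("c2", "second")],
    [[0.1, 0.2], [0.3, 0.4]],
)
```

Checking the lengths up front matters because a plain zip would silently drop the unmatched tail instead of reporting the misalignment.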
persist_document_and_chunks
async
persist_document_and_chunks(document_node, chunk_nodes)
Persist one document node and its chunk nodes with indexes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| document_node | DocumentNode | Prepared document node. | required |
| chunk_nodes | list[ChunkNode] | Prepared chunk nodes associated with the document. | required |
extract_kg_per_chunk
async
extract_kg_per_chunk(chunks, *, show_progress=False)
Extract knowledge graphs for chunks with bounded concurrency.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| chunks | list[Chunk] | Chunks to analyze. | required |
| show_progress | bool | When | False |
Returns:
| Type | Description |
|---|---|
| dict[str, KnowledgeGraph] | Extracted graphs keyed by chunk identifier. |
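"Bounded concurrency" is commonly implemented with a semaphore, and the sketch below shows that general pattern using asyncio.Semaphore. It is not taken from GraWiki's source: extract_all and fake_extract are hypothetical names, chunks are simplified to an id-to-text mapping, and the stats dictionary exists only to observe the peak number of in-flight calls.

```python
import asyncio
from typing import Awaitable, Callable


async def extract_all(
    chunks: dict[str, str],
    extract: Callable[[str], Awaitable[object]],
    max_workers: int = 4,
) -> dict[str, object]:
    """Run extract() over every chunk with at most max_workers calls in flight."""
    semaphore = asyncio.Semaphore(max_workers)

    async def bounded(chunk_id: str, text: str) -> tuple[str, object]:
        async with semaphore:  # waits while max_workers calls are already running
            return chunk_id, await extract(text)

    pairs = await asyncio.gather(
        *(bounded(chunk_id, text) for chunk_id, text in chunks.items())
    )
    return dict(pairs)  # results keyed by chunk identifier


stats = {"active": 0, "peak": 0}


async def fake_extract(text: str) -> list[str]:
    """Stand-in extractor that records how many calls overlap."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)
    stats["active"] -= 1
    return text.split()


graphs = asyncio.run(
    extract_all({f"chunk-{i}": f"word{i} x" for i in range(10)}, fake_extract, max_workers=3)
)
```

With ten chunks and max_workers=3, all ten tasks are scheduled at once but only three run the extractor at any moment, which keeps pressure on a rate-limited chat model predictable.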
persist_entities_and_relationships
async
persist_entities_and_relationships(owner_ids, owner_graphs)
Persist extracted entities and relationships.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| owner_ids | Sequence[str] | Node identifiers that own the extracted graphs. | required |
| owner_graphs | dict[str, KnowledgeGraph] | Extracted graphs keyed by owner identifier. | required |
find_similar_entities
async
find_similar_entities(entity, *, limit=10, threshold=None, candidates=None)
Return candidate entities similar to entity.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| entity | Node | Source entity used as the similarity query. | required |
| limit | int | Maximum number of candidate hits to return. | 10 |
| threshold | float \| None | Optional strategy-specific minimum score. | None |
| candidates | list[Node] \| None | Optional candidate pool. When omitted, persisted entities are loaded from the graph database. | None |
Returns:
| Type | Description |
|---|---|
| list[NodeHit] | Ranked similarity candidates. |
Notes
The configured EntitySimilarityFinder (grawiki.similarity.similarity_finder.EntitySimilarityFinder)
decides which concrete matcher implementation is used.
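How limit and threshold interact can be sketched independently of any particular matcher: drop scores below the optional threshold, rank the rest, and keep the top limit. The Hit dataclass and top_candidates function are hypothetical stand-ins for NodeHit and the configured matcher, and the input is assumed to be a precomputed mapping from node id to score.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hit:
    """Hypothetical stand-in for NodeHit: a node id plus a score."""
    node_id: str
    score: float


def top_candidates(
    scores: dict[str, float],
    *,
    limit: int = 10,
    threshold: Optional[float] = None,
) -> list[Hit]:
    """Filter by the optional minimum score, then return the best `limit` hits."""
    hits = [
        Hit(node_id, score)
        for node_id, score in scores.items()
        if threshold is None or score >= threshold
    ]
    hits.sort(key=lambda hit: hit.score, reverse=True)
    return hits[:limit]


ranked = top_candidates({"a": 0.5, "b": 0.9, "c": 0.7}, limit=2)
strict = top_candidates({"a": 0.5, "b": 0.9, "c": 0.7}, threshold=0.8)
```

Since the threshold is documented as strategy-specific, a value that suits one matcher is not necessarily meaningful for another.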
find_entity_collision_candidates
async
find_entity_collision_candidates(*, limit=10, threshold=None)
Return semantic-key collision groups annotated with merge candidates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| limit | int | Maximum number of candidate hits returned per source entity. | 10 |
| threshold | float \| None | Optional strategy-specific minimum score. | None |
Returns:
| Type | Description |
|---|---|
| list[SemanticKeyCollisionCandidates] | Collision groups with per-entity candidate matches. |
Notes
Candidate generation uses the similarity matcher configured on the injected entity similarity finder.
find_entity_duplicate_candidates
async
find_entity_duplicate_candidates(*, limit=10, threshold=None, skip_semantic_key_collisions_in_similarity_scan=True)
Run the two-step duplicate-finding heuristic across entities.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| limit | int | Maximum number of candidate hits returned per source entity. | 10 |
| threshold | float \| None | Optional matcher-specific minimum score. | None |
| skip_semantic_key_collisions_in_similarity_scan | bool | Whether the broader similarity scan should exclude entities already involved in exact semantic-key collisions. | True |
Returns:
| Type | Description |
|---|---|
| EntityDuplicateCandidates | Combined duplicate-candidate report produced by the injected entity similarity finder. |
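One plausible reading of the two-step heuristic is: first group entities that share an exact semantic key, then run a broader similarity scan over the rest. The sketch below follows that reading under stated assumptions. find_duplicates and its arguments are hypothetical, entities are reduced to an id-to-semantic-key mapping, and the skip flag is interpreted as excluding collided entities as scan sources only, which the reference does not spell out.

```python
from collections import defaultdict
from typing import Callable


def find_duplicates(
    entities: dict[str, str],
    similar: Callable[[str], list[str]],
    *,
    skip_collided: bool = True,
) -> tuple[dict[str, list[str]], dict[str, list[str]]]:
    """entities maps entity id -> semantic key; similar(id) returns candidate ids."""
    # Step 1: exact semantic-key collisions (keys shared by more than one entity).
    by_key: dict[str, list[str]] = defaultdict(list)
    for entity_id, key in entities.items():
        by_key[key].append(entity_id)
    collisions = {key: ids for key, ids in by_key.items() if len(ids) > 1}
    collided = {entity_id for ids in collisions.values() for entity_id in ids}

    # Step 2: broader similarity scan, optionally skipping collided sources.
    scan: dict[str, list[str]] = {}
    for entity_id in entities:
        if skip_collided and entity_id in collided:
            continue
        candidates = [c for c in similar(entity_id) if c != entity_id]
        if candidates:
            scan[entity_id] = candidates
    return collisions, scan


entities = {"e1": "acme", "e2": "acme", "e3": "acme corp", "e4": "globex"}
neighbours = {"e1": ["e3"], "e3": ["e1"]}
collisions, scan = find_duplicates(entities, lambda eid: neighbours.get(eid, []))
```

Skipping collided entities in the second step avoids reporting the same pair twice, once as a key collision and once as a similarity hit.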
dedupe_entities
async
dedupe_entities(*, limit=10, threshold=None, min_merge_score=0.95, dry_run=False)
Find duplicate entities and merge them into canonical masters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| limit | int | Maximum candidate hits returned per source entity during duplicate inspection. | 10 |
| threshold | float \| None | Optional similarity threshold forwarded to the duplicate finder. | None |
| min_merge_score | float | Minimum candidate score required for inclusion in a merge group. | 0.95 |
| dry_run | bool | When | False |
Returns:
| Type | Description |
|---|---|
| list[MergeReport] | Reports describing the merge decisions that were made. |
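The role of min_merge_score can be illustrated with a small planning function that only admits candidates at or above the cutoff into a merge group. This is a sketch under assumptions, not GraWiki's merge logic: plan_merges is a hypothetical name, the first entity encountered is arbitrarily treated as the master, and an entity already claimed by one group is not reused in another.

```python
def plan_merges(
    candidates: dict[str, list[tuple[str, float]]],
    min_merge_score: float = 0.95,
) -> dict[str, list[str]]:
    """Map each chosen master id to the duplicate ids that clear the cutoff."""
    plans: dict[str, list[str]] = {}
    claimed: set[str] = set()
    for master, hits in candidates.items():
        if master in claimed:
            continue  # already part of an earlier merge group
        duplicates = [
            node_id
            for node_id, score in hits
            if score >= min_merge_score and node_id != master and node_id not in claimed
        ]
        if duplicates:
            plans[master] = duplicates
            claimed.add(master)
            claimed.update(duplicates)
    return plans


candidates = {
    "e1": [("e2", 0.97), ("e3", 0.90)],
    "e2": [("e1", 0.97)],
    "e3": [("e1", 0.90)],
}
default_plan = plan_merges(candidates)
loose_plan = plan_merges(candidates, min_merge_score=0.90)
```

Keeping min_merge_score above the inspection threshold means borderline candidates can still be surfaced in reports without being merged.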
ingest
async
ingest(path, *, show_progress=False)
Run the full ingestion flow for one file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | Source file to ingest. | required |
| show_progress | bool | When | False |
Returns:
| Type | Description |
|---|---|
| None | This method persists the resulting graph side effects to the configured database. |
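For notebooks and debugging, the stepwise helpers documented earlier on this page can be called one at a time. The sketch below chains them in the order they are listed, which is an assumption about how the full flow is composed rather than a copy of ingest itself. Passing the chunk identifiers that key the extracted graphs as owner_ids is also an assumption. RecordingStub is a test double that only records call order so the example runs without a database; it is not GraWiki's GraphRAG.

```python
import asyncio
from pathlib import Path


async def ingest_stepwise(rag, path: Path) -> None:
    """Call the documented step helpers one by one (order is an assumption)."""
    document = rag.read_document(path)
    chunks = rag.chunk_document(document)
    chunks = await rag.process_chunks(chunks)
    document_embedding = await rag.embed_document(document)  # documented to return []
    chunk_embeddings = await rag.embed_chunks(chunks)
    document_node = rag.build_document_node(document, document_embedding)
    chunk_nodes = rag.build_chunk_nodes(chunks, chunk_embeddings)
    await rag.persist_document_and_chunks(document_node, chunk_nodes)
    graphs = await rag.extract_kg_per_chunk(chunks)
    # Assumption: the chunk ids that key the extracted graphs are the owner ids.
    await rag.persist_entities_and_relationships(list(graphs), graphs)


class RecordingStub:
    """Test double that records call order; it performs no real work."""

    SYNC = {"read_document", "chunk_document", "build_document_node", "build_chunk_nodes"}

    def __init__(self) -> None:
        self.calls: list[str] = []

    def __getattr__(self, name: str):
        def record(*args, **kwargs):
            self.calls.append(name)
            return {}

        async def record_async(*args, **kwargs):
            return record(*args, **kwargs)

        # Mirror the sync/async split shown in this reference.
        return record if name in self.SYNC else record_async


stub = RecordingStub()
asyncio.run(ingest_stepwise(stub, Path("notes.md")))
```

With a real GraphRAG instance in place of the stub, each intermediate value (chunks, embeddings, nodes, graphs) can be inspected before the next step runs.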
ingest_text
async
ingest_text(text, title, *, format='text', metadata=None, show_progress=False)
Ingest a document supplied as a string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Document content to ingest. | required |
| title | str | Human-readable document title used as the document name. | required |
| format | ('text', 'markdown') | Explicit content format for the in-memory document. Defaults to | "text" |
| metadata | dict[str, str] \| None | Additional metadata attached to the transient source document before persistence. | None |
| show_progress | bool | When | False |
Returns:
| Type | Description |
|---|---|
| None | This method persists the resulting graph side effects to the configured database. |
remember
async
remember(memory, *, memory_id=None, name=None, semantic_key=None, metadata=None, related_node_ids=())
Persist one memory, replacing an existing memory when requested.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| memory | MemoryNode \| str | Memory payload to persist. Raw strings are normalized into a new :class: | required |
| memory_id | str \| None | Existing memory identifier to replace. When omitted, | None |
| name | str \| None | Optional memory name override. Primarily useful when | None |
| semantic_key | str \| None | Optional semantic key override. Defaults to the final memory id. | None |
| metadata | dict[str, str] \| None | Optional metadata merged into the memory metadata. | None |
| related_node_ids | Sequence[str] | Existing node ids that should be explicitly linked from the memory. | () |
Returns:
| Type | Description |
|---|---|
| MemoryNode | Persisted memory payload including its final id. |
search
async
search(query, *, limit=10)
Aggregate results from the configured retrievers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Raw user query text. | required |
| limit | int | Maximum number of final hits returned after combining retriever outputs. | 10 |
Returns:
| Type | Description |
|---|---|
| list[NodeHit] | Flat, deduplicated search hits across the configured retrievers. With the default retriever set this typically includes chunk, memory, and keyword-expanded entity results. |
Raises:
| Type | Description |
|---|---|
| RuntimeError | Raised when every configured retriever fails for the query. |
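The aggregation contract described above (flat, deduplicated hits, with an error only when every retriever fails) can be sketched with asyncio.gather. This is an illustration under assumptions: aggregate and the three demo retrievers are hypothetical, hits are simplified to (node id, score) tuples, and keeping the highest score per node is one possible deduplication rule that the reference does not confirm.

```python
import asyncio
from typing import Awaitable, Callable

Hit = tuple[str, float]  # (node id, score)
Retriever = Callable[[str], Awaitable[list[Hit]]]


async def aggregate(query: str, retrievers: list[Retriever], *, limit: int = 10) -> list[Hit]:
    """Combine retriever outputs, tolerating partial failure."""
    results = await asyncio.gather(
        *(retriever(query) for retriever in retrievers),
        return_exceptions=True,  # a failing retriever yields its exception instead of raising
    )
    succeeded = [result for result in results if not isinstance(result, BaseException)]
    if not succeeded:
        raise RuntimeError(f"every retriever failed for query {query!r}")

    best: dict[str, float] = {}
    for hits in succeeded:
        for node_id, score in hits:
            # Assumed rule: keep the highest score seen for each node.
            if score > best.get(node_id, float("-inf")):
                best[node_id] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


async def chunk_retriever(query: str) -> list[Hit]:
    return [("chunk-1", 0.8), ("entity-7", 0.4)]


async def entity_retriever(query: str) -> list[Hit]:
    return [("entity-7", 0.9)]


async def broken_retriever(query: str) -> list[Hit]:
    raise ConnectionError("backend unavailable")


hits = asyncio.run(
    aggregate("who founded acme?", [chunk_retriever, entity_retriever, broken_retriever], limit=5)
)
```

Here the broken retriever is ignored because two others succeeded; the RuntimeError path is reached only when no retriever returns a result.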
recall
async
recall(query, *, user_id=None, limit=5, hops=1, limit_per_hop=5)
Search memories and attach connected graph context.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Raw user query text. | required |
| user_id | str \| None | Optional memory-owner filter applied after memory retrieval. | None |
| limit | int | Maximum number of memories returned. | 5 |
| hops | int | Number of graph-expansion hops to include. | 1 |
| limit_per_hop | int | Maximum recall paths expanded per memory seed. | 5 |
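The interplay of user_id, limit, hops, and limit_per_hop can be pictured with a toy in-memory graph. The sketch below is an assumption-heavy illustration: expand and recall_sketch are hypothetical names, memories are plain dictionaries assumed to be already ranked for the query, the owner is assumed to live in a user_id field, and limit_per_hop is interpreted as a cap on paths kept at each hop for one memory seed.

```python
def expand(
    seed: str,
    edges: dict[str, list[str]],
    *,
    hops: int = 1,
    limit_per_hop: int = 5,
) -> list[list[str]]:
    """Return paths (lists of node ids) reachable from seed within `hops` steps."""
    frontier = [[seed]]
    paths: list[list[str]] = []
    for _ in range(hops):
        next_frontier = [
            path + [neighbour]
            for path in frontier
            for neighbour in edges.get(path[-1], [])
            if neighbour not in path  # avoid walking in circles
        ]
        next_frontier = next_frontier[:limit_per_hop]
        paths.extend(next_frontier)
        frontier = next_frontier
    return paths


def recall_sketch(
    memories: list[dict],
    edges: dict[str, list[str]],
    *,
    user_id=None,
    limit: int = 5,
    hops: int = 1,
    limit_per_hop: int = 5,
) -> list[dict]:
    """Filter ranked memories by owner, cap them, and attach expanded paths."""
    kept = [m for m in memories if user_id is None or m.get("user_id") == user_id]
    return [
        {"memory": m["id"], "paths": expand(m["id"], edges, hops=hops, limit_per_hop=limit_per_hop)}
        for m in kept[:limit]
    ]


edges = {"m1": ["e1", "e2"], "e1": ["e3"]}
memories = [{"id": "m1", "user_id": "u1"}, {"id": "m2", "user_id": "u2"}]
recalled = recall_sketch(memories, edges, user_id="u1", hops=2)
```

Because the owner filter is applied after retrieval, a narrow user_id can leave fewer than limit memories even when more relevant memories exist for other users.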