Database Abstractions

The database layer is centered on GraphDB, which defines the backend-agnostic persistence, indexing, search, neighbor-expansion, and merge primitives used elsewhere in the project.

Use this page when implementing a new backend or when deciding which responsibilities belong to the database adapter rather than the retrieval layer. For the current concrete implementation, see the FalkorDB adapter page.

grawiki.db.base

Backend-agnostic graph database interfaces.

NodeHit dataclass

Search result pairing a node with scoring metadata.

Parameters:

Name Type Description Default
node Node

Matched node. May be a concrete subclass such as DocumentNode, ChunkNode, or MemoryNode (all in grawiki.graph.models), depending on the node's label.

required
score float

Relevance or similarity score. Adapters and higher-level services may normalize backend-specific distance values into higher-is-better scores. Defaults to 0.0 when no score is reported.

0.0
matched_on str

Short descriptor of how the hit was matched (for example "fulltext:content", "vector", or "rapidfuzz"). Empty when not reported.

''
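
As a rough illustration of the score normalization mentioned above, the sketch below converts a backend distance into a higher-is-better score. The Node stand-in, the hit_from_distance helper, and the 1 / (1 + distance) formula are assumptions made for this example; they are not part of grawiki.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for grawiki's Node model."""
    id: str
    name: str = ""


@dataclass
class NodeHit:
    """Mirrors the documented fields and defaults."""
    node: Node
    score: float = 0.0
    matched_on: str = ""


def hit_from_distance(node: Node, distance: float) -> NodeHit:
    """Map a non-negative distance to a score in (0, 1]; smaller distances score higher."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return NodeHit(node=node, score=1.0 / (1.0 + distance), matched_on="vector")


near = hit_from_distance(Node(id="n1"), 0.0)
far = hit_from_distance(Node(id="n2"), 3.0)
```

Any monotonically decreasing mapping would serve the same purpose; the point is only that callers can sort hits in descending score order regardless of backend.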

NeighborRelationship dataclass

One-hop relationship context around a seed node.

Parameters:

Name Type Description Default
source_id str

Identifier of the seed node that was expanded.

required
source_name str

Human-readable name of the seed node.

required
relationship_label str

Label of the relationship connecting the seed to the target.

required
target Node

Neighbor node connected to the seed.

required

GraphDB

Bases: ABC

Abstract interface for graph database adapters.

Notes

The contract has two layers. Storage-engine primitives (upsert_nodes, upsert_relationships, fulltext_search, vector_search, neighbor_relationships, list_entities, ensure_indexes) are the foundational operations every backend must implement. Higher-level convenience methods (save_documents_and_chunks, save_docs_and_chunks_to_db, save_entities_and_rels, search) are thin wrappers over the primitives that preserve the legacy API during the migration.
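
The layering described in these notes could be sketched roughly as follows. Everything here is illustrative: SketchGraphDB, SketchNode, SketchRel, InMemoryDB, the save_sketch wrapper, its parent_of argument, and the "HAS_CHUNK" label are hypothetical names chosen for the example, not the real grawiki API.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchNode:
    """Hypothetical stand-in for a prepared node."""
    id: str
    label: str


@dataclass(frozen=True)
class SketchRel:
    """Hypothetical stand-in for a relationship."""
    source_id: str
    target_id: str
    label: str


class SketchGraphDB(ABC):
    """Trimmed-down illustration of the two layers; not the real GraphDB."""

    # Layer 1: primitives that each backend implements.
    @abstractmethod
    async def upsert_nodes(self, nodes): ...

    @abstractmethod
    async def upsert_relationships(self, rels): ...

    # Layer 2: a convenience wrapper written only in terms of the primitives.
    async def save_sketch(self, doc_nodes, chunk_nodes, parent_of):
        await self.upsert_nodes([*doc_nodes, *chunk_nodes])
        # "HAS_CHUNK" and the parent_of mapping are assumptions of this sketch.
        rels = [SketchRel(parent_of[c.id], c.id, "HAS_CHUNK") for c in chunk_nodes]
        await self.upsert_relationships(rels)


class InMemoryDB(SketchGraphDB):
    """Toy backend that stores everything in plain Python containers."""

    def __init__(self):
        self.nodes = {}
        self.rels = set()

    async def upsert_nodes(self, nodes):
        for node in nodes:
            self.nodes[node.id] = node

    async def upsert_relationships(self, rels):
        for rel in rels:
            # Both endpoints must already exist, matched by id.
            if rel.source_id not in self.nodes or rel.target_id not in self.nodes:
                raise KeyError("relationship endpoint is missing")
            self.rels.add(rel)


db = InMemoryDB()
asyncio.run(db.save_sketch(
    [SketchNode("d1", "Document")], [SketchNode("c1", "Chunk")], {"c1": "d1"}
))
```

Because the wrapper touches only the primitives, a new backend gets it for free once the abstract methods are implemented.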

setup abstractmethod async

setup(embedding_dimensions=None)

Prepare backend indexes and other database structures.

Parameters:

Name Type Description Default
embedding_dimensions dict[str, int] | None

Mapping from node label to embedding dimensionality for vector indexes that require the dimension to be known ahead of time.

None

ensure_indexes abstractmethod async

ensure_indexes(*, labels, vector_dims=None)

Ensure full-text and vector indexes exist for labels.

Parameters:

Name Type Description Default
labels Iterable[str]

Node labels whose indexes should be created.

required
vector_dims Mapping[str, int] | None

Per-label embedding dimensionality. Labels omitted from the mapping do not get a vector index.

None
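
One way to picture the labels and vector_dims interaction is as a planning step that decides which indexes to create. The plan_indexes helper and its tuple output are hypothetical; the sketch also assumes that every requested label receives a full-text index, which the description implies but does not state outright.

```python
from typing import Iterable, Mapping, Optional


def plan_indexes(
    labels: Iterable[str],
    vector_dims: Optional[Mapping[str, int]] = None,
) -> list:
    """Return the index definitions a backend would need to ensure."""
    dims = vector_dims or {}
    plan = []
    for label in labels:
        # Assumption: every requested label gets a full-text index.
        plan.append(("fulltext", label))
        # Labels omitted from vector_dims get no vector index.
        if label in dims:
            plan.append(("vector", label, dims[label]))
    return plan


plan = plan_indexes(["Chunk", "Entity"], {"Chunk": 384})
```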

fulltext_search abstractmethod async

fulltext_search(*, labels, query_text, limit=10)

Run a full-text search across one or more node labels.

Parameters:

Name Type Description Default
labels Sequence[str]

Labels whose full-text indexes should be queried.

required
query_text str

Raw full-text query string.

required
limit int

Maximum number of hits to return per label.

10

Returns:

Type Description
list[NodeHit]

Flat list of hits across the requested labels. Callers that need a grouped view group the hits by node family or label set themselves.
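
Since the primitive returns a flat list, grouping is left to the caller. A minimal sketch of that grouping step is shown below; the Node stand-in with a labels tuple and the group_hits_by_label helper are assumptions for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in; the real Node model may expose labels differently."""
    id: str
    labels: tuple


@dataclass
class NodeHit:
    node: Node
    score: float = 0.0
    matched_on: str = ""


def group_hits_by_label(hits):
    """Bucket a flat hit list by label, best score first within each bucket."""
    grouped = defaultdict(list)
    for hit in hits:
        for label in hit.node.labels:
            grouped[label].append(hit)
    for bucket in grouped.values():
        bucket.sort(key=lambda h: h.score, reverse=True)
    return dict(grouped)


hits = [
    NodeHit(Node("c1", ("Chunk",)), 0.4, "fulltext:content"),
    NodeHit(Node("c2", ("Chunk",)), 0.9, "fulltext:content"),
    NodeHit(Node("e1", ("Entity",)), 0.5, "fulltext:content"),
]
grouped = group_hits_by_label(hits)
```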

vector_search abstractmethod async

vector_search(*, labels, query_embedding, limit=10)

Run a vector similarity search across one or more node labels.

Parameters:

Name Type Description Default
labels Sequence[str]

Labels whose vector indexes should be queried.

required
query_embedding list[float]

Pre-computed query embedding. The DB does not embed queries; that concern lives in the retrieval layer.

required
limit int

Maximum number of hits to return per label.

10

Returns:

Type Description
list[NodeHit]

Flat list of hits across the requested labels.
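
To make the per-label limit and the pre-computed embedding concrete, here is a toy in-memory version of the same idea. The index layout ({label: {node_id: embedding}}), the vector_search_sketch function, and the use of cosine similarity are assumptions of this sketch; a real backend would delegate to its own vector index and metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search_sketch(index, labels, query_embedding, limit=10):
    """Return (label, node_id, score) tuples, at most `limit` per label."""
    hits = []
    for label in labels:
        scored = [
            (node_id, cosine_similarity(query_embedding, embedding))
            for node_id, embedding in index.get(label, {}).items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        # The limit applies to each label separately, not to the combined list.
        hits.extend((label, node_id, score) for node_id, score in scored[:limit])
    return hits


index = {
    "Chunk": {"c1": [1.0, 0.0], "c2": [0.0, 1.0]},
    "Entity": {"e1": [1.0, 1.0]},
}
# The caller supplies the embedding; the database layer never embeds text itself.
results = vector_search_sketch(index, ["Chunk", "Entity"], [1.0, 0.0], limit=1)
```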

neighbor_relationships abstractmethod async

neighbor_relationships(*, node_ids, limit_per_node=5)

Fetch one-hop relationship context for the given seed nodes.

Parameters:

Name Type Description Default
node_ids Sequence[str]

Seed node identifiers.

required
limit_per_node int

Maximum number of one-hop relationships returned for each seed.

5

Returns:

Type Description
dict[str, list[NeighborRelationship]]

Relationship context keyed by seed node identifier. Seed ids with no matching context should still be present with an empty list.
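
The return contract (every seed present, capped per seed) can be sketched over a plain edge list as below. The edge tuple layout and the neighbor_relationships_sketch name are hypothetical, and the sketch considers outgoing edges only, which is an assumption rather than documented behavior.

```python
def neighbor_relationships_sketch(edges, node_ids, limit_per_node=5):
    """edges is an iterable of (source_id, relationship_label, target_id) tuples."""
    # Pre-populate every seed so ids without context still map to an empty list.
    context = {node_id: [] for node_id in node_ids}
    for source, label, target in edges:
        if source in context and len(context[source]) < limit_per_node:
            context[source].append((label, target))
    return context


edges = [("a", "KNOWS", "b"), ("a", "LIKES", "c"), ("a", "OWNS", "d")]
context = neighbor_relationships_sketch(edges, ["a", "z"], limit_per_node=2)
```

Pre-populating the dictionary lets callers index it by seed id without checking for missing keys first.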

recall_subgraph abstractmethod async

recall_subgraph(*, memory_ids, hops=1, limit_per_memory=20)

Fetch a flattened k-hop recall subgraph for memory seeds.

Parameters:

Name Type Description Default
memory_ids Sequence[str]

Memory node identifiers used as traversal seeds.

required
hops int

Maximum traversal depth in hops. Must be at least 1.

1
limit_per_memory int

Maximum number of distinct paths expanded per memory seed before flattening them into relationship rows.

20

Returns:

Type Description
dict[str, list[NeighborRelationship]]

Flattened relationship rows keyed by memory id. Traversal is undirected for discovery, but each row preserves the stored relationship direction.
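
The combination of undirected discovery and direction-preserving rows is sketched below with a breadth-first traversal over an in-memory edge list. The function name and edge tuple layout are hypothetical. As a simplification, limit_per_memory here caps the number of returned rows rather than the number of expanded paths.

```python
from collections import deque


def recall_subgraph_sketch(edges, memory_ids, hops=1, limit_per_memory=20):
    """edges is a list of (source_id, relationship_label, target_id) tuples."""
    if hops < 1:
        raise ValueError("hops must be at least 1")
    # Undirected adjacency for discovery; each entry keeps the stored edge as-is.
    adjacency = {}
    for edge in edges:
        source, _label, target = edge
        adjacency.setdefault(source, []).append((target, edge))
        adjacency.setdefault(target, []).append((source, edge))

    result = {}
    for memory_id in memory_ids:
        rows, seen_edges, visited = [], set(), {memory_id}
        frontier = deque([(memory_id, 0)])
        while frontier and len(rows) < limit_per_memory:
            node, depth = frontier.popleft()
            if depth == hops:
                continue  # do not expand beyond the requested depth
            for neighbor, edge in adjacency.get(node, []):
                if edge in seen_edges:
                    continue
                seen_edges.add(edge)
                rows.append(edge)  # the row keeps the stored direction
                if len(rows) >= limit_per_memory:
                    break
                if neighbor not in visited:
                    visited.add(neighbor)
                    frontier.append((neighbor, depth + 1))
        result[memory_id] = rows
    return result


edges = [
    ("m1", "MENTIONS", "e1"),
    ("e2", "RELATED_TO", "e1"),  # stored as e2 -> e1, reached from e1
    ("e2", "PART_OF", "e3"),
]
one_hop = recall_subgraph_sketch(edges, ["m1"], hops=1)
two_hop = recall_subgraph_sketch(edges, ["m1"], hops=2)
```

In the two-hop result, the second row is reached by walking from e1 to e2 against the stored direction, yet it is still reported as e2 to e1.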

list_entities abstractmethod async

list_entities(*, include_embeddings=False)

Return persisted entity nodes.

Parameters:

Name Type Description Default
include_embeddings bool

Whether entity embeddings should be loaded when available. Callers that only need identifiers and names should keep this disabled to avoid transferring large vectors unnecessarily.

False

Returns:

Type Description
list[Node]

Persisted entity nodes in a stable, backend-defined order.

entity_relationship_counts abstractmethod async

entity_relationship_counts(node_ids)

Return incident relationship counts for entity nodes.

Parameters:

Name Type Description Default
node_ids Sequence[str]

Entity identifiers whose incident edge counts should be returned.

required

Returns:

Type Description
dict[str, int]

Mapping from entity id to total incoming-plus-outgoing relationship count. Missing ids should still appear with 0.
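
The counting rule can be illustrated over a simple edge list. The relationship_counts helper and the (source_id, target_id) edge layout are hypothetical; note that under this sketch a self-loop contributes 2, once as outgoing and once as incoming.

```python
from collections import Counter


def relationship_counts(edges, node_ids):
    """Count incoming plus outgoing edges for each requested id."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1  # outgoing
        degree[target] += 1  # incoming
    # Counter returns 0 for unseen keys, so missing ids still appear with 0.
    return {node_id: degree[node_id] for node_id in node_ids}


counts = relationship_counts([("a", "b"), ("c", "a")], ["a", "b", "x"])
```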

upsert_nodes abstractmethod async

upsert_nodes(nodes)

Upsert nodes, dispatching on each node's labels to select the persistence semantics.

Parameters:

Name Type Description Default
nodes Sequence[Node]

Nodes to create or update. Each node's label set determines which concrete storage path is used.

required

upsert_relationships abstractmethod async

upsert_relationships(rels)

Upsert relationships between existing nodes (matched by id).

Parameters:

Name Type Description Default
rels Sequence[Relationship]

Relationships to create or update. Both endpoints must already exist in the graph and are matched by their id field.

required

merge_entity_nodes abstractmethod async

merge_entity_nodes(*, master, duplicate_ids)

Merge duplicate entity nodes into master.

Parameters:

Name Type Description Default
master Node

Final persisted state for the surviving master node. The master is matched by master.id and updated before duplicate nodes are deleted.

required
duplicate_ids Sequence[str]

Entity identifiers to merge into master and then delete.

required

Raises:

Type Description
ValueError

Raised when duplicate_ids contains master.id.
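
A toy in-memory version of the merge is sketched below. The function name, the dictionary-based node store, and the edge tuple layout are hypothetical. Re-pointing the duplicates' relationships at the master is an assumption of this sketch; the documented contract only states that the master is updated first and the duplicates are then deleted.

```python
def merge_entity_nodes_sketch(nodes, edges, master_id, master_props, duplicate_ids):
    """nodes maps id -> properties; edges is a set of (source, label, target) tuples."""
    if master_id in duplicate_ids:
        raise ValueError("duplicate_ids must not contain master.id")

    # Step 1: write the final state of the surviving master node.
    nodes[master_id] = master_props

    # Step 2 (assumed): re-point edges that touched a duplicate at the master.
    doomed = set(duplicate_ids)
    rewired = set()
    for source, label, target in edges:
        source = master_id if source in doomed else source
        target = master_id if target in doomed else target
        rewired.add((source, label, target))

    # Step 3: delete the duplicates.
    for duplicate_id in doomed:
        nodes.pop(duplicate_id, None)
    return rewired


nodes = {"m": {"name": "Ada"}, "d1": {"name": "A. Lovelace"}, "x": {"name": "Babbage"}}
edges = {("d1", "KNOWS", "x")}
merged_edges = merge_entity_nodes_sketch(
    nodes, edges, "m", {"name": "Ada Lovelace"}, ["d1"]
)
```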

delete_memory abstractmethod async

delete_memory(memory_id)

Delete one memory and prune any directly mentioned entities that become orphaned as a result.

Parameters:

Name Type Description Default
memory_id str

Identifier of the memory node to remove.

required
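
The delete-then-prune behavior might look roughly like this over an in-memory graph. The function name, the "Entity" label, and the data layout are hypothetical, and the sketch assumes that "orphaned" means the entity has no remaining relationships of any kind after the memory is removed.

```python
def delete_memory_sketch(nodes, edges, memory_id):
    """nodes maps id -> label; edges is a set of (source, label, target) tuples.

    Mutates `nodes` in place and returns the surviving edge set.
    """
    # Entities the memory points at directly.
    mentioned = {
        target
        for source, _label, target in edges
        if source == memory_id and nodes.get(target) == "Entity"
    }
    # Remove the memory together with every edge that touches it.
    remaining = {edge for edge in edges if memory_id not in (edge[0], edge[2])}
    nodes.pop(memory_id, None)

    # Prune mentioned entities that no longer have any relationship (assumed rule).
    for entity_id in mentioned:
        if not any(entity_id in (source, target) for source, _l, target in remaining):
            nodes.pop(entity_id, None)
    return remaining


nodes = {"m1": "Memory", "m2": "Memory", "e1": "Entity", "e2": "Entity"}
edges = {
    ("m1", "MENTIONS", "e1"),
    ("m1", "MENTIONS", "e2"),
    ("m2", "MENTIONS", "e2"),
}
remaining = delete_memory_sketch(nodes, edges, "m1")
```

Here e1 is pruned because only m1 mentioned it, while e2 survives because m2 still references it.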

save_documents_and_chunks async

save_documents_and_chunks(documents, chunks)

Persist source documents and their chunks.

Parameters:

Name Type Description Default
documents list[Document]

Source documents to persist.

required
chunks list[Chunk]

Source chunks to persist and connect to their parent documents.

required

save_docs_and_chunks_to_db async

save_docs_and_chunks_to_db(doc_nodes, chunk_nodes)

Persist prepared document and chunk nodes.

Parameters:

Name Type Description Default
doc_nodes list[DocumentNode]

Prepared document nodes ready for persistence.

required
chunk_nodes list[ChunkNode]

Prepared chunk nodes ready for persistence.

required

save_entities_and_rels async

save_entities_and_rels(owner_ids, owner_graphs)

Persist extracted owner-linked entities and relationships.

Parameters:

Name Type Description Default
owner_ids Sequence[str]

Node identifiers that own the extracted graphs, such as chunk or memory ids.

required
owner_graphs dict[str, KnowledgeGraph]

Extracted graphs keyed by owner identifier.

required

Raises:

Type Description
ValueError

Raised when a graph references an owner identifier that is not present in owner_ids.
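
The consistency check behind this ValueError could be sketched as a small validation step. The validate_owner_graphs helper is hypothetical, and the sketch assumes the check applies to the keys of owner_graphs; the real method may inspect identifiers inside each graph instead.

```python
def validate_owner_graphs(owner_ids, owner_graphs):
    """Raise ValueError if any graph is keyed by an id missing from owner_ids."""
    known = set(owner_ids)
    unknown = sorted(key for key in owner_graphs if key not in known)
    if unknown:
        raise ValueError(f"graphs reference unknown owner ids: {unknown}")


# Passes silently: every graph key is a known owner id.
validate_owner_graphs(["chunk-1", "memory-1"], {"chunk-1": object()})
```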

search async

search(query, method, *, limit=10, query_embedding=None)

Search documents, chunks, and entities.

Parameters:

Name Type Description Default
query str

Raw user query text.

required
method ('fulltext', 'vector')

Search strategy to execute.

required
limit int

Maximum number of results to return per node family.

10
query_embedding list[float] | None

Pre-computed query embedding; required when method is "vector".

None

Returns:

Type Description
SearchResults

Search hits grouped by node family.
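
As a thin wrapper, search plausibly reduces to a dispatch on method. The sketch below uses stub primitives to show that shape; SketchSearchDB, the label tuple, and the flat list return value are assumptions for illustration (the real method returns SearchResults grouped by node family).

```python
import asyncio


class SketchSearchDB:
    """Hypothetical stand-in with stub primitives; not the real GraphDB."""

    async def fulltext_search(self, *, labels, query_text, limit=10):
        return [("fulltext", label, query_text) for label in labels]

    async def vector_search(self, *, labels, query_embedding, limit=10):
        return [("vector", label, len(query_embedding)) for label in labels]

    async def search(self, query, method, *, limit=10, query_embedding=None):
        labels = ("Document", "Chunk", "Entity")  # assumed node families
        if method == "fulltext":
            return await self.fulltext_search(
                labels=labels, query_text=query, limit=limit
            )
        if method == "vector":
            # The database layer does not embed queries, so the caller must.
            if query_embedding is None:
                raise ValueError("query_embedding is required for vector search")
            return await self.vector_search(
                labels=labels, query_embedding=query_embedding, limit=limit
            )
        raise ValueError(f"unknown search method: {method!r}")


sketch_db = SketchSearchDB()
fulltext_hits = asyncio.run(sketch_db.search("graph", "fulltext"))
vector_hits = asyncio.run(
    sketch_db.search("graph", "vector", query_embedding=[0.1, 0.2])
)
```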