Memgraph adapter

MemgraphGraphDB is a GraphDB backend that connects to a Memgraph server over the Bolt protocol. It is a thin Memgraph dialect over the shared Cypher engine and implements only the Memgraph-specific hooks: the Bolt connection via the neo4j driver, plain list embedding literals, CREATE TEXT INDEX / CREATE VECTOR INDEX DDL, SHOW INDEX INFO / SHOW VECTOR INDEX INFO introspection, and the text_search / vector_search query procedures. All generic orchestration lives in GraphDB. Most applications should start with GraphRAG, which uses any adapter through the backend-agnostic database interface.

Entity merging is delegated to Memgraph's native refactor.merge_nodes procedure (provided by the MAGE library), so the server must run a MAGE-enabled image such as memgraph/memgraph-mage.

Installation

pip install 'grawiki[memgraph]'

From a repository checkout use uv sync --extra memgraph instead.

Running a local server

A docker-compose.yml at the repository root starts a MAGE-enabled Memgraph (and an optional Memgraph Lab UI at http://localhost:3000):

docker compose up -d

This exposes Bolt on localhost:7687.

Minimal example

from grawiki.db import MemgraphGraphDB

database = MemgraphGraphDB(
    "my_graph",
    host="localhost",
    port=7687,
)

Or via the backend factory:

from grawiki.db import create_graph_db

database = create_graph_db("memgraph", "my_graph", host="localhost", port=7687)

Operational notes

  • close() should be called explicitly in tests and short-lived scripts to release the Bolt connection pool.
  • GraphRAG.ingest(...), GraphRAG.ingest_text(...), and GraphRAG.remember(...) usually call setup() for you. Direct adapter usage may require an explicit await database.setup(...) before indexing or search operations.
  • Entity merging requires the refactor module, available in the memgraph/memgraph-mage image. Native merge semantics differ from the generic implementation: refactor.merge_nodes does not drop self-relationships.
  • Deletions clear node embeddings before removing nodes. Memgraph does not evict a deleted node's vector-index entry on its own, and a stale entry breaks subsequent vector searches on that label, so the adapter nulls embeddings first.
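
The setup and close notes above can be sketched as a small lifecycle helper. This is a minimal sketch, not part of grawiki: run_fulltext and StubDB are hypothetical names, and the helper only assumes an object exposing the setup(), search(...), and close() methods documented on this page. The stub exists so the sketch runs without a server.

```python
import asyncio


async def run_fulltext(database, text, limit=5):
    """Prepare indexes, run one full-text search, then release the pool."""
    await database.setup()  # explicit setup for direct adapter usage
    try:
        return await database.search(text, "fulltext", limit=limit)
    finally:
        database.close()  # documented below as a plain (non-async) call


class StubDB:
    """Stand-in so the sketch runs without a server; not part of grawiki."""

    def __init__(self):
        self.calls = []

    async def setup(self, embedding_dimensions=None):
        self.calls.append("setup")

    async def search(self, query, method, *, limit=10, query_embedding=None):
        self.calls.append("search")
        return []

    def close(self):
        self.calls.append("close")


stub = StubDB()
results = asyncio.run(run_fulltext(stub, "graph databases"))
```

With a real server, the same helper would presumably be called with a MemgraphGraphDB instance in place of the stub. The try/finally keeps the connection pool from leaking when the search raises.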

Advanced adapter helpers

  • query(...) executes write-capable Cypher.
  • ro_query(...) executes read-only Cypher and is the simplest option for ad hoc inspection.

Vector index creation can be tuned through the constructor arguments vector_similarity_function, vector_index_capacity, and vector_index_resize_coefficient.
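
To get a feel for how these three arguments interact, the sketch below restates the metric mapping from the parameter table and estimates the capacity an index reaches under repeated growth. It assumes the capacity is simply multiplied by the resize coefficient whenever it is exceeded; the exact trigger inside Memgraph may differ, and capacity_after_growth is a hypothetical helper.

```python
# Metric names from the parameter table: "cosine" -> "cos", "euclidean" -> "l2sq".
METRICS = {"cosine": "cos", "euclidean": "l2sq"}


def capacity_after_growth(n_vectors, capacity=1000, resize_coefficient=2):
    """Return the capacity reached once the index can hold n_vectors.

    Assumption: the capacity is multiplied by resize_coefficient each
    time it is exceeded.
    """
    if capacity < 1 or resize_coefficient < 2:
        raise ValueError("capacity must be >= 1 and resize_coefficient >= 2")
    while capacity < n_vectors:
        capacity *= resize_coefficient
    return capacity
```

Under this model, setting vector_index_capacity close to the expected number of embeddings avoids repeated resizes during bulk ingestion.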

grawiki.db.memgraph.MemgraphGraphDB

Bases: GraphDB

Graph adapter for a Memgraph server reached over the Bolt protocol.

Parameters:

Name Type Description Default
graph_name str

Logical graph name. Accepted for interface symmetry with grawiki.db.falkordb.FalkorGraphDB; Memgraph has no FalkorDB-style per-graph namespacing within a single database, so this value is stored but not used for routing.

required
host str

Hostname of the Memgraph server. Defaults to "localhost".

'localhost'
port int

Bolt port of the Memgraph server. Defaults to 7687.

7687
username str

Bolt auth username. Defaults to "" (no authentication).

''
password str

Bolt auth password. Defaults to "" (no authentication).

''
database str | None

Database name passed to each Bolt session. Defaults to "memgraph". Pass None to use the driver's default session without selecting a database (useful for servers that do not support multi-tenancy).

'memgraph'
vector_similarity_function Literal['cosine', 'euclidean']

Similarity function used for vector indexes. Mapped to Memgraph's "cos"/"l2sq" metrics.

'cosine'
vector_index_capacity int

Initial capacity hint for vector indexes.

1000
vector_index_resize_coefficient int

Growth multiplier applied when a vector index exceeds its capacity.

2

close

close()

Close the Bolt driver and release its connection pool.

query

query(query, params=None)

Execute a write-capable query.

Parameters:

Name Type Description Default
query str

Cypher query to execute.

required
params dict[str, Any] | None

Query parameters.

None

Returns:

Type Description
Any

Result exposing a positional result_set.

ro_query

ro_query(query, params=None)

Execute a read-only query.

Parameters:

Name Type Description Default
query str

Cypher query to execute.

required
params dict[str, Any] | None

Query parameters.

None

Returns:

Type Description
Any

Result exposing a positional result_set.
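
Because rows in result_set are positional, ad hoc inspection code often pairs them with column names. The sketch below is an illustration only: rows_as_dicts is a hypothetical helper, the list-of-lists row shape is an assumption, and the SimpleNamespace object stands in for whatever query(...) or ro_query(...) actually returns.

```python
from types import SimpleNamespace


def rows_as_dicts(result, columns):
    """Pair each positional row in result.result_set with column names."""
    return [dict(zip(columns, row)) for row in result.result_set]


# Stand-in result; the property names below are illustrative, not a schema.
fake_result = SimpleNamespace(result_set=[["e1", "person"], ["e2", "city"]])
rows = rows_as_dicts(fake_result, ["id", "label"])
```

The column list has to match the order of the RETURN clause in the Cypher query, since nothing in a positional row records the column names.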

delete_memory async

delete_memory(memory_id)

Delete one memory and prune directly-mentioned orphan entities.

Mirrors grawiki.db.base.GraphDB.delete_memory but clears node embeddings before each deletion so vector indexes stay consistent (see _clear_embeddings).

Parameters:

Name Type Description Default
memory_id str

Identifier of the memory node to remove.

required
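
The clear-then-delete ordering used by delete_memory can be sketched as two Cypher statements issued in sequence. This is an assumption-laden illustration rather than the adapter's actual queries: deletion_statements is a hypothetical helper, and the embedding and id property names are guesses.

```python
def deletion_statements(node_id):
    """Return (query, params) pairs in clear-then-delete order."""
    params = {"id": node_id}
    return [
        # 1. Clear the embedding first, as the adapter does before deleting.
        ("MATCH (n {id: $id}) SET n.embedding = null", params),
        # 2. Only then remove the node together with its relationships.
        ("MATCH (n {id: $id}) DETACH DELETE n", params),
    ]
```

Each pair could be passed to query(...) in order. Reversing the order would leave the stale vector-index entry that the operational notes warn about.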

merge_entity_nodes async

merge_entity_nodes(*, master, duplicate_ids)

Merge duplicate entity nodes into master using refactor.merge_nodes.

Memgraph's MAGE refactor.merge_nodes procedure redirects every relationship from the duplicates onto the master (the first node in the list) and deletes the duplicates. Master's canonical state is rewritten afterwards via grawiki.db.base.GraphDB._update_entity_node so the result is independent of the procedure's property-merge strategy.

Parameters:

Name Type Description Default
master Node

Final persisted state for the surviving master node.

required
duplicate_ids Sequence[str]

Entity identifiers to merge into master and then delete.

required

Raises:

Type Description
ValueError

Raised when duplicate_ids contains master.id.

Notes

Native merge semantics differ from the generic implementation in grawiki.db.base.GraphDB.merge_entity_nodes: refactor.merge_nodes does not drop self-relationships, so a duplicate that pointed at the master can yield a self-loop on the merged node.
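
The self-loop case is easy to see in a toy model. The sketch below redirects edges held as plain (source, target) tuples; it does not call refactor.merge_nodes and is not the generic implementation either, and redirect_edges is a hypothetical name.

```python
def redirect_edges(edges, master, duplicates, drop_self_loops):
    """Re-point every edge endpoint found in duplicates at master."""
    dup = set(duplicates)
    merged = []
    for src, dst in edges:
        src = master if src in dup else src
        dst = master if dst in dup else dst
        if drop_self_loops and src == dst:
            continue  # the generic-style behavior described above
        merged.append((src, dst))
    return merged


edges = [("dup", "master"), ("dup", "other")]
native_like = redirect_edges(edges, "master", ["dup"], drop_self_loops=False)
generic_like = redirect_edges(edges, "master", ["dup"], drop_self_loops=True)
```

Here native_like keeps a ("master", "master") edge while generic_like does not. Callers that cannot tolerate self-loops may want to remove them after a native merge.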

setup async

setup(embedding_dimensions=None)

Prepare backend indexes and other database structures.

Parameters:

Name Type Description Default
embedding_dimensions dict[str, int] | None

Mapping from node label to embedding dimensionality for vector indexes that require the dimension to be known ahead of time.

None

ensure_indexes async

ensure_indexes(*, labels, vector_dims=None)

Ensure full-text and vector indexes exist for labels.

Parameters:

Name Type Description Default
labels Iterable[str]

Node labels whose indexes should be created.

required
vector_dims Mapping[str, int] | None

Per-label embedding dimensionality. Labels omitted from the mapping do not get a vector index.

None
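
The effect of omitting a label from vector_dims can be sketched as a planning step. plan_indexes is a hypothetical helper, and the sketch assumes that every requested label gets a full-text index while only labels present in the mapping get a vector index.

```python
def plan_indexes(labels, vector_dims=None):
    """List (kind, label, dimension) entries for the indexes to create."""
    vector_dims = vector_dims or {}
    plan = []
    for label in labels:
        plan.append(("text", label, None))
        if label in vector_dims:
            # Vector indexes need the dimension up front.
            plan.append(("vector", label, vector_dims[label]))
    return plan
```

For example, plan_indexes(["__chunk__", "__entity__"], {"__chunk__": 384}) plans a vector index for __chunk__ only.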

upsert_nodes async

upsert_nodes(nodes)

Upsert nodes, creating indexes on first use per label.

Parameters:

Name Type Description Default
nodes Sequence[Node]

Nodes to create or update. Dispatches on concrete type and label.

required

upsert_relationships async

upsert_relationships(rels)

Upsert relationships, dispatching on label for match semantics.

Parameters:

Name Type Description Default
rels Sequence[Relationship]

Relationships to create or update.

required

Notes

__has_chunk__ guards both endpoints by label (__document__ -> __chunk__). Other system relationships such as __mentions__ have heterogeneous sources (a __chunk__ or __memory__) and match by id alone. All non-system labels guard both endpoints as __entity__. Every relationship persists the same id, label, and serialized properties fields.
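
The dispatch rule in the note above can be written as a small lookup. This is a sketch under one stated assumption: system labels are taken to be those wrapped in double underscores, which the note implies but does not state. endpoint_guards is a hypothetical name.

```python
def endpoint_guards(rel_label):
    """Return the (source, target) label guards, or None for an id-only match."""
    if rel_label == "__has_chunk__":
        return ("__document__", "__chunk__")
    # Assumption: system labels are the ones wrapped in double underscores.
    if rel_label.startswith("__") and rel_label.endswith("__"):
        return None
    return ("__entity__", "__entity__")
```

A None result corresponds to relationships such as __mentions__, whose source may be a __chunk__ or a __memory__ and which therefore match by id alone.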

fulltext_search(*, labels, query_text, limit=10)

Run a full-text search across one or more node labels.

Parameters:

Name Type Description Default
labels Sequence[str]

Labels whose full-text indexes should be queried.

required
query_text str

Raw full-text query string.

required
limit int

Maximum number of hits to return per label.

10

Returns:

Type Description
list[NodeHit]

Flat list of hits across the requested labels.

vector_search(*, labels, query_embedding, limit=10)

Run a vector similarity search across one or more node labels.

Parameters:

Name Type Description Default
labels Sequence[str]

Labels whose vector indexes should be queried.

required
query_embedding list[float]

Pre-computed query embedding.

required
limit int

Maximum number of hits to return per label.

10

Returns:

Type Description
list[NodeHit]

Flat list of hits across the requested labels.
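
The per-label limit and the flat result shape can be illustrated with an in-memory stand-in. The sketch below ranks by cosine similarity in pure Python and does not use Memgraph's vector_search procedure; brute_force_search is a hypothetical helper, and the (id, label, score) tuples stand in for NodeHit, whose fields are not listed on this page.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_search(nodes_by_label, labels, query_embedding, limit=10):
    """Rank nodes per label by cosine similarity and return one flat list."""
    hits = []
    for label in labels:
        scored = [
            (node_id, label, cosine(query_embedding, emb))
            for node_id, emb in nodes_by_label.get(label, {}).items()
        ]
        scored.sort(key=lambda hit: hit[2], reverse=True)
        hits.extend(scored[:limit])  # the limit applies per label
    return hits
```

Because the limit applies per label, a call over k labels can return up to k * limit hits.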

neighbor_relationships async

neighbor_relationships(*, node_ids, limit_per_node=5)

Fetch one-hop relationship context for each seed node.

Parameters:

Name Type Description Default
node_ids Sequence[str]

Seed node identifiers.

required
limit_per_node int

Maximum number of relationship rows returned for each seed.

5

Returns:

Type Description
dict[str, list[NeighborRelationship]]

Relationship context keyed by seed id.

recall_subgraph async

recall_subgraph(*, memory_ids, hops=1, limit_per_memory=20)

Fetch a flattened k-hop recall subgraph for memory seeds.

Parameters:

Name Type Description Default
memory_ids Sequence[str]

Memory node identifiers used as traversal seeds.

required
hops int

Maximum traversal depth in hops. Must be at least 1.

1
limit_per_memory int

Maximum number of distinct paths expanded per memory seed before flattening them into relationship rows.

20

Returns:

Type Description
dict[str, list[NeighborRelationship]]

Flattened relationship rows keyed by memory id.
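
The flattening step can be sketched over paths represented as lists of (source_id, rel_label, target_id) triples. Both that representation and the de-duplication of repeated rows are assumptions made for illustration; flatten_paths is a hypothetical helper.

```python
def flatten_paths(paths, limit_per_memory=20):
    """Flatten at most limit_per_memory paths into unique relationship rows."""
    rows, seen = [], set()
    for path in paths[:limit_per_memory]:
        for triple in path:  # (source_id, rel_label, target_id)
            if triple not in seen:
                seen.add(triple)
                rows.append(triple)
    return rows
```

Note that the limit counts paths, not rows, so a seed with hops greater than 1 can yield more rows than limit_per_memory.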

list_entities async

list_entities(*, include_embeddings=False)

Return persisted entity nodes ordered by semantic key then id.

Parameters:

Name Type Description Default
include_embeddings bool

Whether to include entity embeddings in the result.

False

Returns:

Type Description
list[Node]

Persisted entity nodes.

entity_relationship_counts async

entity_relationship_counts(node_ids)

Return total incident relationship counts for entity ids.

Parameters:

Name Type Description Default
node_ids Sequence[str]

Entity identifiers whose incident edge counts should be returned.

required

Returns:

Type Description
dict[str, int]

Mapping from entity id to total relationship count. Missing ids still appear with 0.
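
The zero-fill guarantee means callers can index the result by any requested id without a membership check. A minimal sketch of that contract, with fill_missing_counts as a hypothetical helper and found standing in for the counts a query actually returned:

```python
def fill_missing_counts(node_ids, found):
    """Return a count for every requested id, using 0 for ids not in found."""
    return {node_id: found.get(node_id, 0) for node_id in node_ids}
```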

save_documents_and_chunks async

save_documents_and_chunks(documents, chunks)

Persist source documents and their chunks.

Parameters:

Name Type Description Default
documents list[Document]

Source documents to persist.

required
chunks list[Chunk]

Source chunks to persist and connect to their parent documents.

required

save_docs_and_chunks_to_db async

save_docs_and_chunks_to_db(doc_nodes, chunk_nodes)

Persist prepared document and chunk nodes.

Parameters:

Name Type Description Default
doc_nodes list[DocumentNode]

Prepared document nodes ready for persistence.

required
chunk_nodes list[ChunkNode]

Prepared chunk nodes ready for persistence.

required

save_entities_and_rels async

save_entities_and_rels(owner_ids, owner_graphs)

Persist extracted owner-linked entities and relationships.

Parameters:

Name Type Description Default
owner_ids Sequence[str]

Node identifiers that own the extracted graphs, such as chunk or memory ids.

required
owner_graphs dict[str, KnowledgeGraph]

Extracted graphs keyed by owner identifier.

required

Raises:

Type Description
ValueError

Raised when a graph references a chunk identifier that is not present in owner_ids.
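
A pre-flight check along these lines can catch the mismatch before calling the adapter. The sketch assumes the relevant identifiers are the keys of owner_graphs; the adapter may validate more than that. unknown_owner_ids and check_owner_graphs are hypothetical helpers.

```python
def unknown_owner_ids(owner_ids, owner_graphs):
    """Return the sorted graph keys that are missing from owner_ids."""
    return sorted(set(owner_graphs) - set(owner_ids))


def check_owner_graphs(owner_ids, owner_graphs):
    """Raise ValueError if any graph is keyed by an id not in owner_ids."""
    unknown = unknown_owner_ids(owner_ids, owner_graphs)
    if unknown:
        raise ValueError(f"graphs reference unknown owner ids: {unknown}")
```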

search async

search(query, method, *, limit=10, query_embedding=None)

Search documents, chunks, and entities.

Parameters:

Name Type Description Default
query str

Raw user query text.

required
method Literal['fulltext', 'vector']

Search strategy to execute.

required
limit int

Maximum number of results to return per node family.

10
query_embedding list[float] | None

Embedded query vector required for vector search.

None

Returns:

Type Description
SearchResults

Search hits grouped by node family.