Memgraph adapter
MemgraphGraphDB is a GraphDB backend that connects to a Memgraph server over the Bolt protocol. It is a thin Memgraph dialect over the shared Cypher engine, implementing only the Memgraph-specific hooks (Bolt connection via the neo4j driver, plain list embedding literals, CREATE TEXT INDEX / CREATE VECTOR INDEX DDL, SHOW INDEX INFO / SHOW VECTOR INDEX INFO introspection, and the text_search / vector_search query procedures); all generic orchestration lives in GraphDB. Most applications should start with GraphRAG, which uses any adapter through the backend-agnostic database interface.
Entity merging is delegated to Memgraph's native refactor.merge_nodes procedure (provided by the MAGE library), so the server must run a MAGE-enabled image such as memgraph/memgraph-mage.
Installation
pip install 'grawiki[memgraph]'
From a repository checkout, use uv sync --extra memgraph instead.
Running a local server
A docker-compose.yml at the repository root starts a MAGE-enabled Memgraph (and an optional Memgraph Lab UI at http://localhost:3000):
docker compose up -d
This exposes Bolt on localhost:7687.
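Scripts that start the container and connect right away can race the server's startup. The following is a minimal sketch of a readiness check, assuming a plain TCP connect is a good enough signal; the helper name wait_for_bolt is hypothetical and not part of grawiki.

```python
import socket
import time


def wait_for_bolt(host: str = "localhost", port: int = 7687, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only shows that the port is open;
            # it does not perform a Bolt handshake.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Port not accepting connections yet: wait briefly and retry.
            time.sleep(0.5)
    return False
```

Calling wait_for_bolt() after docker compose up -d and before constructing the adapter is one way to use it.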
Minimal example
from grawiki.db import MemgraphGraphDB
database = MemgraphGraphDB(
    "my_graph",
    host="localhost",
    port=7687,
)
Or via the backend factory:
from grawiki.db import create_graph_db
database = create_graph_db("memgraph", "my_graph", host="localhost", port=7687)
Operational notes
- close() should be called explicitly in tests and short-lived scripts to release the Bolt connection pool.
- GraphRAG.ingest(...), GraphRAG.ingest_text(...), and GraphRAG.remember(...) usually call setup() for you. Direct adapter usage may require an explicit await database.setup(...) before indexing or search operations.
- Entity merging requires the refactor module, available in the memgraph/memgraph-mage image. Native merge semantics differ from the generic implementation: refactor.merge_nodes does not drop self-relationships.
- Deletions clear node embeddings before removing nodes. Memgraph does not evict a deleted node's vector-index entry on its own, and a stale entry breaks subsequent vector searches on that label, so the adapter nulls embeddings first.
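The first two notes (explicit setup() and close()) can be combined into one lifecycle helper. This is a sketch under assumptions: open_graph_db and FakeGraphDB are hypothetical names, FakeGraphDB stands in for the real adapter so the example runs without a server, and whether close() is synchronous or a coroutine is not stated here, so the helper handles both.

```python
import asyncio
import inspect
from contextlib import asynccontextmanager


@asynccontextmanager
async def open_graph_db(database, embedding_dimensions=None):
    """Run setup() on entry and always call close() on exit."""
    try:
        await database.setup(embedding_dimensions)
        yield database
    finally:
        # Assumption: close() may be sync or async; await only if needed.
        result = database.close()
        if inspect.isawaitable(result):
            await result


class FakeGraphDB:
    """Hypothetical stand-in that records calls instead of talking to Memgraph."""

    def __init__(self):
        self.calls = []

    async def setup(self, embedding_dimensions=None):
        self.calls.append("setup")

    def close(self):
        self.calls.append("close")


async def main():
    fake = FakeGraphDB()
    async with open_graph_db(fake, {"__entity__": 3}) as db:
        db.calls.append("work")  # indexing or search would go here
    return fake.calls


calls = asyncio.run(main())
```

With a real adapter, the FakeGraphDB instance would be replaced by the object returned from MemgraphGraphDB(...) or create_graph_db(...).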
Advanced adapter helpers
- query(...) executes write-capable Cypher.
- ro_query(...) executes read-only Cypher and is the simplest option for ad hoc inspection.
Vector index creation can be tuned through the constructor arguments vector_similarity_function, vector_index_capacity, and vector_index_resize_coefficient.
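To get a feel for how vector_index_capacity and vector_index_resize_coefficient interact, the sketch below estimates how many resizes an index would need to hold a given number of vectors. It assumes the capacity is simply multiplied by the coefficient each time it is exceeded; Memgraph's actual growth policy may differ, and capacity_after_growth is a hypothetical helper.

```python
def capacity_after_growth(
    n_vectors: int, capacity: int = 1000, resize_coefficient: int = 2
) -> tuple[int, int]:
    """Return (final_capacity, resize_count) needed to hold n_vectors."""
    if capacity < 1 or resize_coefficient < 2:
        raise ValueError("capacity must be >= 1 and resize_coefficient >= 2")
    resizes = 0
    # Assumption: each resize multiplies the current capacity by the coefficient.
    while capacity < n_vectors:
        capacity *= resize_coefficient
        resizes += 1
    return capacity, resizes
```

Under that assumption, an index created with the defaults would need four resizes to hold 10,000 vectors, so a larger initial capacity may be worth setting when the corpus size is known up front.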
grawiki.db.memgraph.MemgraphGraphDB
Bases: GraphDB
Graph adapter for a Memgraph server reached over the Bolt protocol.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| graph_name | str | Logical graph name. Accepted for interface symmetry with :class: … | required |
| host | str | Hostname of the Memgraph server. Defaults to 'localhost'. | 'localhost' |
| port | int | Bolt port of the Memgraph server. Defaults to 7687. | 7687 |
| username | str | Bolt auth username. Defaults to ''. | '' |
| password | str | Bolt auth password. Defaults to ''. | '' |
| database | str \| None | Database name passed to each Bolt session. Defaults to 'memgraph'. | 'memgraph' |
| vector_similarity_function | Literal['cosine', 'euclidean'] | Similarity function used for vector indexes. Mapped to Memgraph's … | 'cosine' |
| vector_index_capacity | int | Initial capacity hint for vector indexes. | 1000 |
| vector_index_resize_coefficient | int | Growth multiplier applied when a vector index exceeds its capacity. | 2 |
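The connection parameters above lend themselves to environment-based configuration. The sketch below builds the keyword arguments from environment variables, falling back to the documented defaults. The MEMGRAPH_* variable names and the helper name are assumptions made for illustration; grawiki does not define them.

```python
import os


def memgraph_kwargs_from_env(env=None) -> dict:
    """Build MemgraphGraphDB keyword arguments from environment variables."""
    env = os.environ if env is None else env
    return {
        # Hypothetical variable names; defaults mirror the parameter table.
        "host": env.get("MEMGRAPH_HOST", "localhost"),
        "port": int(env.get("MEMGRAPH_PORT", "7687")),
        "username": env.get("MEMGRAPH_USERNAME", ""),
        "password": env.get("MEMGRAPH_PASSWORD", ""),
        "database": env.get("MEMGRAPH_DATABASE", "memgraph"),
    }
```

The result can be passed straight to the constructor, for example MemgraphGraphDB("my_graph", **memgraph_kwargs_from_env()).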
query
query(query, params=None)
Execute a write-capable query.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Cypher query to execute. | required |
| params | dict[str, Any] \| None | Query parameters. | None |
Returns:
| Type | Description |
|---|---|
| Any | Result exposing a positional … |
ro_query
ro_query(query, params=None)
Execute a read-only query.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Cypher query to execute. | required |
| params | dict[str, Any] \| None | Query parameters. | None |
Returns:
| Type | Description |
|---|---|
| Any | Result exposing a positional … |
delete_memory (async)
delete_memory(memory_id)
Delete one memory and prune directly-mentioned orphan entities.
Mirrors GraphDB.delete_memory (in grawiki.db.base) but clears node
embeddings before each deletion so vector indexes stay consistent (see
_clear_embeddings).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| memory_id | str | Identifier of the memory node to remove. | required |
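The clear-then-delete ordering described above can be pictured as two Cypher statements issued in sequence. This is an illustrative sketch, not the adapter's actual queries: the id and embedding property names and the deletion_statements helper are assumptions.

```python
def deletion_statements(node_id: str) -> list[tuple[str, dict]]:
    """Return (cypher, params) pairs in the order they should run."""
    params = {"id": node_id}
    return [
        # 1. Null the embedding first so no stale vector-index entry survives.
        #    Assumption: the vector lives in a property named `embedding`.
        ("MATCH (n {id: $id}) SET n.embedding = null", params),
        # 2. Only then remove the node together with its relationships.
        ("MATCH (n {id: $id}) DETACH DELETE n", params),
    ]
```

Each pair could be passed to query(query, params) in order; reversing the order would reproduce the stale-entry problem the adapter is avoiding.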
merge_entity_nodes (async)
merge_entity_nodes(*, master, duplicate_ids)
Merge duplicate entity nodes into master using refactor.merge_nodes.
Memgraph's MAGE refactor.merge_nodes procedure redirects every
relationship from the duplicates onto the master (the first node in the
list) and deletes the duplicates. The master's canonical state is rewritten
afterwards via GraphDB._update_entity_node (in grawiki.db.base) so the
result is independent of the procedure's property-merge strategy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| master | Node | Final persisted state for the surviving master node. | required |
| duplicate_ids | Sequence[str] | Entity identifiers to merge into … | required |
Raises:
| Type | Description |
|---|---|
| ValueError | Raised when … |
Notes
Native merge semantics differ from the generic implementation in
GraphDB.merge_entity_nodes (in grawiki.db.base): refactor.merge_nodes
does not drop self-relationships, so a duplicate that pointed at the
master can yield a self-loop on the merged node.
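The self-loop difference is easiest to see on a toy edge list. The sketch below models relationship redirection in plain Python; it is not the MAGE procedure or the generic implementation, and the redirect_edges helper and its drop_self_loops flag are hypothetical.

```python
Edge = tuple[str, str, str]  # (source_id, rel_label, target_id)


def redirect_edges(
    edges: list[Edge], master: str, duplicates: list[str], drop_self_loops: bool
) -> list[Edge]:
    """Point every edge end that touches a duplicate at the master instead."""
    dup = set(duplicates)
    merged = []
    for src, label, dst in edges:
        src = master if src in dup else src
        dst = master if dst in dup else dst
        if drop_self_loops and src == dst:
            continue  # generic-style behaviour: discard master -> master edges
        merged.append((src, label, dst))
    return merged


edges = [("dup", "knows", "master"), ("dup", "likes", "x")]
native_like = redirect_edges(edges, "master", ["dup"], drop_self_loops=False)
generic_like = redirect_edges(edges, "master", ["dup"], drop_self_loops=True)
```

Here native_like keeps a ("master", "knows", "master") self-loop while generic_like does not. Callers that cannot tolerate self-loops may want to remove them after merging.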
setup (async)
setup(embedding_dimensions=None)
Prepare backend indexes and other database structures.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| embedding_dimensions | dict[str, int] \| None | Mapping from node label to embedding dimensionality for vector indexes that require the dimension to be known ahead of time. | None |
ensure_indexes (async)
ensure_indexes(*, labels, vector_dims=None)
Ensure full-text and vector indexes exist for labels.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| labels | Iterable[str] | Node labels whose indexes should be created. | required |
| vector_dims | Mapping[str, int] \| None | Per-label embedding dimensionality. Labels omitted from the mapping do not get a vector index. | None |
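The rule that labels missing from vector_dims get no vector index can be sketched as a small planning function. plan_indexes is a hypothetical helper that only reports which indexes would be requested; it does not issue any DDL.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping


def plan_indexes(
    labels: Iterable[str], vector_dims: Mapping[str, int] | None = None
) -> dict[str, dict]:
    """Map each label to the indexes it would receive."""
    dims = vector_dims or {}
    plan = {}
    for label in labels:
        plan[label] = {
            "text": True,  # every requested label gets a full-text index
            "vector_dim": dims.get(label),  # None means no vector index
        }
    return plan


plan = plan_indexes(["__chunk__", "__entity__"], {"__chunk__": 384})
```

In this example only __chunk__ would receive a vector index, because __entity__ has no entry in the mapping.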
upsert_nodes (async)
upsert_nodes(nodes)
Upsert nodes, creating indexes on first use per label.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| nodes | Sequence[Node] | Nodes to create or update. Dispatches on concrete type and label. | required |
upsert_relationships (async)
upsert_relationships(rels)
Upsert relationships, dispatching on label for match semantics.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rels | Sequence[Relationship] | Relationships to create or update. | required |
Notes
__has_chunk__ guards both endpoints by label (__document__ ->
__chunk__). Other system relationships such as __mentions__ have
heterogeneous sources (a __chunk__ or __memory__) and match by
id alone. All non-system labels guard both endpoints as
__entity__. Every relationship persists the same id, label,
and serialized properties fields.
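The dispatch rules in these notes can be summarized as a lookup from relationship label to endpoint guards. This is a sketch of the described behaviour, not the adapter's code; in particular, treating any label wrapped in double underscores as a system label is an assumption, and both function names are hypothetical.

```python
from __future__ import annotations


def is_system_label(label: str) -> bool:
    # Assumption: system labels are the ones wrapped in double underscores.
    return len(label) > 4 and label.startswith("__") and label.endswith("__")


def endpoint_guards(rel_label: str) -> tuple[str, str] | None:
    """Return (source_label, target_label) guards, or None to match by id alone."""
    if rel_label == "__has_chunk__":
        return ("__document__", "__chunk__")
    if is_system_label(rel_label):
        # e.g. __mentions__: the source may be a __chunk__ or a __memory__.
        return None
    return ("__entity__", "__entity__")
```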
fulltext_search (async)
fulltext_search(*, labels, query_text, limit=10)
Run a full-text search across one or more node labels.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| labels | Sequence[str] | Labels whose full-text indexes should be queried. | required |
| query_text | str | Raw full-text query string. | required |
| limit | int | Maximum number of hits to return per label. | 10 |
Returns:
| Type | Description |
|---|---|
| list[NodeHit] | Flat list of hits across the requested labels. |
vector_search (async)
vector_search(*, labels, query_embedding, limit=10)
Run a vector similarity search across one or more node labels.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| labels | Sequence[str] | Labels whose vector indexes should be queried. | required |
| query_embedding | list[float] | Pre-computed query embedding. | required |
| limit | int | Maximum number of hits to return per label. | 10 |
Returns:
| Type | Description |
|---|---|
| list[NodeHit] | Flat list of hits across the requested labels. |
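Because limit applies per label, a search over several labels can return up to limit × len(labels) hits. Callers that want a single global top-k may re-rank the flat list themselves. The sketch below assumes each hit carries a numeric score where higher is better; Hit is a hypothetical stand-in, and the real NodeHit fields may differ.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical stand-in for NodeHit."""

    node_id: str
    label: str
    score: float


def top_k(hits: list[Hit], k: int) -> list[Hit]:
    """Keep the k best hits across all labels, best first."""
    # Assumption: a larger score means a better match.
    return heapq.nlargest(k, hits, key=lambda hit: hit.score)


hits = [
    Hit("c1", "__chunk__", 0.91),
    Hit("c2", "__chunk__", 0.40),
    Hit("e1", "__entity__", 0.75),
    Hit("e2", "__entity__", 0.10),
]
best = top_k(hits, 2)
```

If the backend reports distances rather than similarities, heapq.nsmallest would be the matching choice.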
neighbor_relationships (async)
neighbor_relationships(*, node_ids, limit_per_node=5)
Fetch one-hop relationship context for each seed node.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| node_ids | Sequence[str] | Seed node identifiers. | required |
| limit_per_node | int | Maximum number of relationship rows returned for each seed. | 5 |
Returns:
| Type | Description |
|---|---|
| dict[str, list[NeighborRelationship]] | Relationship context keyed by seed id. |
recall_subgraph (async)
recall_subgraph(*, memory_ids, hops=1, limit_per_memory=20)
Fetch a flattened k-hop recall subgraph for memory seeds.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| memory_ids | Sequence[str] | Memory node identifiers used as traversal seeds. | required |
| hops | int | Maximum traversal depth in hops. Must be at least … | 1 |
| limit_per_memory | int | Maximum number of distinct paths expanded per memory seed before flattening them into relationship rows. | 20 |
Returns:
| Type | Description |
|---|---|
| dict[str, list[NeighborRelationship]] | Flattened relationship rows keyed by memory id. |
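The phrase "paths expanded per memory seed before flattening" can be illustrated with plain tuples. The sketch below caps the number of paths per seed and then flattens them into rows. Dropping repeated rows is an assumption about what flattening means here, and flatten_paths and Row are hypothetical names rather than grawiki types.

```python
Row = tuple[str, str, str]  # (source_id, rel_label, target_id)


def flatten_paths(
    paths_by_memory: dict[str, list[list[Row]]], limit_per_memory: int = 20
) -> dict[str, list[Row]]:
    """Cap the paths per seed, then flatten them into unique relationship rows."""
    flattened = {}
    for memory_id, paths in paths_by_memory.items():
        rows = []
        seen = set()
        # The limit counts paths, not rows, so it is applied before flattening.
        for path in paths[:limit_per_memory]:
            for row in path:
                if row not in seen:  # assumption: shared hops appear only once
                    seen.add(row)
                    rows.append(row)
        flattened[memory_id] = rows
    return flattened


paths = {
    "m1": [
        [("m1", "__mentions__", "a"), ("a", "knows", "b")],
        [("m1", "__mentions__", "a"), ("a", "likes", "c")],
    ]
}
rows = flatten_paths(paths, limit_per_memory=20)
```

Two 2-hop paths that share their first hop flatten to three rows in this model, which is why the number of rows per memory can differ from limit_per_memory.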
list_entities (async)
list_entities(*, include_embeddings=False)
Return persisted entity nodes ordered by semantic key then id.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| include_embeddings | bool | Whether to include entity embeddings in the result. | False |
Returns:
| Type | Description |
|---|---|
| list[Node] | Persisted entity nodes. |
entity_relationship_counts (async)
entity_relationship_counts(node_ids)
Return total incident relationship counts for entity ids.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| node_ids | Sequence[str] | Entity identifiers whose incident edge counts should be returned. | required |
Returns:
| Type | Description |
|---|---|
| dict[str, int] | Mapping from entity id to total relationship count. Missing ids still appear with … |
save_documents_and_chunks (async)
save_documents_and_chunks(documents, chunks)
Persist source documents and their chunks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| documents | list[Document] | Source documents to persist. | required |
| chunks | list[Chunk] | Source chunks to persist and connect to their parent documents. | required |
save_docs_and_chunks_to_db (async)
save_docs_and_chunks_to_db(doc_nodes, chunk_nodes)
Persist prepared document and chunk nodes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doc_nodes | list[DocumentNode] | Prepared document nodes ready for persistence. | required |
| chunk_nodes | list[ChunkNode] | Prepared chunk nodes ready for persistence. | required |
save_entities_and_rels (async)
save_entities_and_rels(owner_ids, owner_graphs)
Persist extracted owner-linked entities and relationships.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| owner_ids | Sequence[str] | Node identifiers that own the extracted graphs, such as chunk or memory ids. | required |
| owner_graphs | dict[str, KnowledgeGraph] | Extracted graphs keyed by owner identifier. | required |
Raises:
| Type | Description |
|---|---|
| ValueError | Raised when a graph references a chunk identifier that is not present in … |
search (async)
search(query, method, *, limit=10, query_embedding=None)
Search documents, chunks, and entities.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Raw user query text. | required |
| method | ('fulltext', 'vector') | Search strategy to execute. | "fulltext" |
| limit | int | Maximum number of results to return per node family. | 10 |
| query_embedding | list[float] \| None | Embedded query vector required for vector search. | None |
Returns:
| Type | Description |
|---|---|
| SearchResults | Search hits grouped by node family. |
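Since query_embedding is only required for vector search, callers may want to validate the argument combination before reaching the database. The sketch below shows one way to do that. check_search_args is a hypothetical helper, and raising ValueError is an assumption rather than the adapter's documented behaviour.

```python
from __future__ import annotations


def check_search_args(method: str, query_embedding: list[float] | None = None) -> None:
    """Raise ValueError for argument combinations that search() cannot serve."""
    if method not in ("fulltext", "vector"):
        raise ValueError(f"unknown search method: {method!r}")
    if method == "vector" and query_embedding is None:
        # Vector search needs the query to be embedded beforehand.
        raise ValueError("vector search requires query_embedding")
```

A caller would run this check first and then call await database.search(query, method, query_embedding=...), embedding the query itself when method is "vector".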