Deduplication Helpers

These helpers are used by the facade-level deduplication flow to choose a surviving master node, merge labels and properties, and report the result. They are useful when building custom merge workflows on top of the lower-level APIs.

grawiki.similarity.deduplication

Helpers for post-persistence entity deduplication.

MergeReport dataclass

Summary of one merge decision.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| master_id | str | Identifier of the surviving entity node. | required |
| duplicate_ids | tuple[str, ...] | Identifiers of duplicate nodes merged into the master. | required |
| source | str | Candidate source category, e.g. "collision" or "similarity". | required |
| merged_labels | tuple[str, ...] | Alphabetically sorted label set that will remain on the master. | required |
| property_conflicts | tuple[str, ...] | Property keys for which multiple duplicate values were observed and the master's value was kept. | required |
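
To make the field layout concrete, here is a minimal sketch of a dataclass with the same five fields. `MergeReportSketch` is a hypothetical stand-in, not the class exported by `grawiki.similarity.deduplication`; making it frozen and the sample values are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)  # frozen is an assumption; the real class may be mutable
class MergeReportSketch:
    """Hypothetical stand-in mirroring the documented MergeReport fields."""

    master_id: str
    duplicate_ids: tuple[str, ...]
    source: str
    merged_labels: tuple[str, ...]
    property_conflicts: tuple[str, ...]


# Sample values are invented; labels are sorted to match the documented contract.
report = MergeReportSketch(
    master_id="n1",
    duplicate_ids=("n2", "n3"),
    source="similarity",
    merged_labels=tuple(sorted({"Person", "Author"})),
    property_conflicts=("name",),
)
print(report)
```

All five fields are listed as required, so each one is passed explicitly here.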

pick_master

pick_master(nodes, relation_counts)

Return the preferred master node from a duplicate group.
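
The selection rule is not documented on this page. The sketch below shows one plausible rule, assuming that nodes are dicts with an `"id"` key and that `relation_counts` maps node ids to relation counts: the node with the most relations wins, and ties go to the smallest id. `pick_master_sketch` and this rule are hypothetical; the real `pick_master` may order candidates differently.

```python
def pick_master_sketch(nodes, relation_counts):
    """Hypothetical rule: most relations wins, ties go to the smallest id."""
    if not nodes:
        raise ValueError("duplicate group is empty")
    # Negate the count so that min() prefers the highest relation count,
    # then falls back to lexicographic id order for a deterministic result.
    return min(
        nodes,
        key=lambda node: (-relation_counts.get(node["id"], 0), node["id"]),
    )


group = [{"id": "n3"}, {"id": "n1"}, {"id": "n2"}]
counts = {"n1": 2, "n2": 5, "n3": 5}
master = pick_master_sketch(group, counts)
print(master["id"])
```

A deterministic tie-break keeps the choice stable across runs, which matters when the same duplicate group is evaluated more than once.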

merge_node_properties

merge_node_properties(master, duplicates)

Merge node properties while preserving the master's values on conflicts.
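
The "master wins on conflict" behaviour can be sketched with plain dicts. This is an illustration under assumptions, not the library's implementation: `merge_properties_sketch`, the use of bare property dicts, and the `(merged, conflicts)` return shape are all hypothetical.

```python
def merge_properties_sketch(master, duplicates):
    """Hypothetical merge: keep existing values, fill gaps, record conflicts."""
    merged = dict(master)  # copy so the caller's dict is not mutated
    conflicts = set()
    for duplicate in duplicates:
        for key, value in duplicate.items():
            if key not in merged:
                # Key is missing so far: adopt the first value seen.
                merged[key] = value
            elif merged[key] != value:
                # Differing value: keep what is already there, note the key.
                conflicts.add(key)
    return merged, tuple(sorted(conflicts))


master_props = {"name": "Ada Lovelace", "born": 1815}
duplicate_props = [
    {"name": "A. Lovelace", "field": "mathematics"},
    {"born": 1815, "field": "computing"},
]
merged, conflicts = merge_properties_sketch(master_props, duplicate_props)
print(merged)
print(conflicts)
```

In this sketch a key absent from the master takes the first duplicate value encountered, and later differing values for that key are also reported as conflicts.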

build_merged_master

build_merged_master(master, duplicates)

Return the final master node state for a merge group.
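
As a rough illustration of what a final master state could look like, the sketch below combines the label union (sorted alphabetically, as described for `merged_labels`) with master-preferred properties. The node shape (dicts with `"id"`, `"labels"`, and `"properties"` keys) and the name `build_merged_master_sketch` are assumptions; the real function may use a different node type and return value.

```python
def build_merged_master_sketch(master, duplicates):
    """Hypothetical final state: union of labels, master-preferred properties."""
    labels = set(master["labels"])
    properties = dict(master["properties"])  # copy to avoid mutating the input
    for duplicate in duplicates:
        labels.update(duplicate["labels"])
        for key, value in duplicate["properties"].items():
            # setdefault only fills keys the master does not already have.
            properties.setdefault(key, value)
    return {
        "id": master["id"],
        "labels": tuple(sorted(labels)),
        "properties": properties,
    }


master_node = {"id": "n1", "labels": ["Person"], "properties": {"name": "Ada"}}
duplicate_node = {
    "id": "n2",
    "labels": ["Author", "Person"],
    "properties": {"name": "A.", "born": 1815},
}
final_state = build_merged_master_sketch(master_node, [duplicate_node])
print(final_state)
```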