Extraction
This page covers the advanced extraction layer that turns raw text into a graph-shaped intermediate representation before persistence. Most users should use extraction through GraphRAG rather than constructing these pieces directly.
For the public ingestion flow and stepwise examples, start with Flows and How to ingest a document.
Structured output with Instructor
`KnowledgeGraphExtractor` relies on Instructor for structured LLM output. When `extract(...)` is called, the chunk text is sent to the configured chat model together with a system prompt that defines the desired node and relationship schema. Instructor asks the model to return JSON matching the `ExtractedKnowledgeGraph` Pydantic model and validates the response. Schema violations therefore surface as errors at extraction time rather than later in the pipeline. This removes the need for manual JSON parsing or ad-hoc regex extraction.
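```python
from grawiki.graph.extraction import KnowledgeGraphExtractor

# `embedder` is an Embedding client created elsewhere and shared across the pipeline.
extractor = KnowledgeGraphExtractor(
    model="openai:gpt-4.1-mini",
    embedding=embedder,
    output_language="Polish",  # language for names, labels, and textual properties
)

# extract() is a coroutine: run this inside an async function or an event loop.
graph = await extractor.extract("Alan Turing was a pioneering computer scientist.")
```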
The resulting graph is a KnowledgeGraph whose nodes already carry embeddings and durable UUIDs, ready for persistence.
When output_language is omitted, KnowledgeGraphExtractor defaults to English for extracted entity names, relationship labels, and textual properties.
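Instructor performs this validation with Pydantic. The sketch below illustrates the "validate before use" idea without either library. It checks raw model JSON against a hand-written schema using only the standard library. `NodeShape`, `RelationshipShape`, `GraphShape`, `parse_graph`, and `SchemaViolation` are hypothetical names. The field names (`name`, `label`, `source`, `target`) are assumptions and may not match the actual fields of `ExtractedKnowledgeGraph`. This is an illustration, not how Instructor is implemented.

```python
import json
from dataclasses import dataclass


class SchemaViolation(ValueError):
    """Raised when the model output does not match the expected graph shape."""


@dataclass(frozen=True)
class NodeShape:
    name: str   # temporary reference key within one result (assumed field)
    label: str  # entity type, e.g. "Person" (assumed field)


@dataclass(frozen=True)
class RelationshipShape:
    source: str  # endpoint node *name*, not an identifier
    target: str
    label: str


@dataclass(frozen=True)
class GraphShape:
    nodes: list[NodeShape]
    relationships: list[RelationshipShape]


def _str_field(obj: object, key: str) -> str:
    """Return obj[key] if it is a non-empty string, else raise SchemaViolation."""
    if not isinstance(obj, dict):
        raise SchemaViolation(f"expected an object, got {type(obj).__name__}")
    value = obj.get(key)
    if not isinstance(value, str) or not value:
        raise SchemaViolation(f"field {key!r} must be a non-empty string, got {value!r}")
    return value


def _list_field(obj: dict, key: str) -> list:
    """Return obj[key] if it is a list; a missing key counts as an empty list."""
    value = obj.get(key, [])
    if not isinstance(value, list):
        raise SchemaViolation(f"field {key!r} must be a list")
    return value


def parse_graph(raw: str) -> GraphShape:
    """Parse raw model output and validate it before anything else uses it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SchemaViolation(f"response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise SchemaViolation("top-level JSON value must be an object")
    nodes = [
        NodeShape(_str_field(n, "name"), _str_field(n, "label"))
        for n in _list_field(data, "nodes")
    ]
    relationships = [
        RelationshipShape(
            _str_field(r, "source"), _str_field(r, "target"), _str_field(r, "label")
        )
        for r in _list_field(data, "relationships")
    ]
    return GraphShape(nodes, relationships)
```

In the real extractor, Instructor and the `ExtractedKnowledgeGraph` model fill this role. A malformed response fails at this boundary instead of producing a half-built graph.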
grawiki.graph.extraction
Knowledge graph extraction helpers.
This module also defines the LLM-facing transient types (`ExtractedNode`, `ExtractedRelationship`, `ExtractedKnowledgeGraph`). They live here rather than in `grawiki.graph.models` because they are an implementation detail of extraction: the persisted domain model (`Node` / `Relationship` / `KnowledgeGraph`) does not reference them.
KnowledgeGraphExtractorProtocol
Bases: Protocol
Protocol for chunk-level knowledge graph extractors.
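A `typing.Protocol` is satisfied structurally. Any object with matching methods conforms without subclassing. The following is a minimal sketch. It assumes the protocol requires a single coroutine `extract(text)`, like `KnowledgeGraphExtractor.extract` documented below. `ExtractorLike`, `GraphStub`, and `CapitalizedWordExtractor` are hypothetical stand-ins, not grawiki classes.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@dataclass
class GraphStub:
    """Hypothetical stand-in for grawiki's KnowledgeGraph."""

    nodes: list[str] = field(default_factory=list)


@runtime_checkable
class ExtractorLike(Protocol):
    """Assumed protocol shape: a single coroutine ``extract(text)``."""

    async def extract(self, text: str) -> GraphStub: ...


class CapitalizedWordExtractor:
    """Toy extractor that treats capitalized words as entities.

    It does not inherit from ExtractorLike; having a matching method is enough.
    """

    async def extract(self, text: str) -> GraphStub:
        words = (w.strip(".,;:!?") for w in text.split())
        return GraphStub(nodes=[w for w in words if w[:1].isupper()])


async def extract_with(extractor: ExtractorLike, text: str) -> GraphStub:
    # Any object with a matching async extract() is accepted here.
    return await extractor.extract(text)


toy = CapitalizedWordExtractor()
print(isinstance(toy, ExtractorLike))  # True
print(asyncio.run(extract_with(toy, "Alan Turing was a pioneering computer scientist.")).nodes)  # ['Alan', 'Turing']
```

`isinstance` against a `runtime_checkable` protocol only checks that the method exists. It does not verify the signature or that the method is a coroutine, so static type checking remains the stronger guarantee.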
ExtractedNode
Bases: GraphModel
Extractor-facing node without a machine-generated identifier.
This transient shape is produced by the LLM extractor before the application assigns durable UUIDs and converts the result into persisted `Node` objects.
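The conversion step can be sketched as follows. The sketch assumes random UUID4 identifiers and treats a repeated name within one extraction result as the same entity. `ExtractedNodeSketch`, `NodeSketch`, and `assign_ids` are hypothetical simplifications of `ExtractedNode` and `Node`. The real models carry more data, for example the entity embedding.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedNodeSketch:
    """Hypothetical transient node: identified only by its name."""

    name: str
    label: str


@dataclass(frozen=True)
class NodeSketch:
    """Hypothetical persisted node: carries a durable UUID."""

    id: uuid.UUID
    name: str
    label: str


def assign_ids(
    extracted: list[ExtractedNodeSketch],
) -> tuple[list[NodeSketch], dict[str, uuid.UUID]]:
    """Assign one UUID per distinct node name and return the name -> id map."""
    name_to_id: dict[str, uuid.UUID] = {}
    nodes: list[NodeSketch] = []
    for item in extracted:
        if item.name in name_to_id:
            continue  # assumption: a repeated name refers to the same entity
        node_id = uuid.uuid4()  # random UUID4; durable once persisted
        name_to_id[item.name] = node_id
        nodes.append(NodeSketch(node_id, item.name, item.label))
    return nodes, name_to_id
```

Relationship endpoints, which still refer to names, can later be rewritten against the returned name → id map.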
ExtractedRelationship
Bases: GraphModel
Extractor-facing relationship using node names as endpoints.
Relationship endpoints reference extracted node names within one extraction result and are later rewritten to durable node identifiers during persistence.
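The following sketch shows the rewrite. It assumes a name → UUID map has already been built for the nodes of the same extraction result. The types and `rewrite_endpoints` are hypothetical. Raising on an unknown name is a choice made only for this sketch. The real extractor can instead inject placeholder nodes when `fix_missing_nodes` is enabled (see the parameter table below).

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedRelSketch:
    """Hypothetical transient relationship: endpoints are node names."""

    source: str
    target: str
    label: str


@dataclass(frozen=True)
class RelSketch:
    """Hypothetical persisted relationship: endpoints are node UUIDs."""

    source_id: uuid.UUID
    target_id: uuid.UUID
    label: str


def rewrite_endpoints(
    relationships: list[ExtractedRelSketch],
    name_to_id: dict[str, uuid.UUID],
) -> list[RelSketch]:
    """Replace name endpoints with durable ids; unknown names are an error."""
    rewritten: list[RelSketch] = []
    for rel in relationships:
        missing = [n for n in (rel.source, rel.target) if n not in name_to_id]
        if missing:
            # Names are only meaningful inside one extraction result.
            raise KeyError(f"{rel.label!r} references unknown node(s): {missing}")
        rewritten.append(
            RelSketch(name_to_id[rel.source], name_to_id[rel.target], rel.label)
        )
    return rewritten
```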
ExtractedKnowledgeGraph
Bases: GraphModel
Extractor-facing graph before machine identifiers are assigned.
Node names act as temporary reference keys within one extraction result and are later promoted into a persisted `KnowledgeGraph`.
KnowledgeGraphExtractor
Extract chunk-level knowledge graphs and attach entity embeddings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `str` | Chat model used for structured knowledge extraction. Passed to :func: | *required* |
| `embedding` | `Embedding` | Embedding client used for entity node vectors. Injected so callers share one embedding model across the pipeline instead of each component constructing its own. | *required* |
| `prompt` | `str` | Extraction prompt template. | `KG_EXTRACTION_PROMPT` |
| `max_triplets` | `int` | Maximum number of triplets requested from the model. | `5` |
| `output_language` | `str` | Language used for extracted node names, relationship labels, and textual properties. Defaults to English. | `'English'` |
| `allowed_entity_types` | `list[str] \| None` | Optional entity label allow-list. | `None` |
| `allowed_relation_types` | `list[str] \| None` | Optional relationship label allow-list. | `None` |
| `fix_missing_nodes` | `bool` | Whether to inject placeholder nodes for relationships that reference missing node names. | `True` |
| `extract_kwargs` | `dict[str, Any] \| None` | Extra keyword arguments forwarded to the instructor | `None` |
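`fix_missing_nodes=True` guards against one specific failure: the model may emit a relationship whose endpoint name has no matching node. That endpoint then cannot be resolved to a node identifier. The following is a minimal sketch of placeholder injection. It assumes each missing name receives exactly one placeholder node with a fixed label. `NodeByName`, `RelByName`, `add_placeholder_nodes`, and `PLACEHOLDER_LABEL` are hypothetical. The sketch does not cover what grawiki does when the flag is `False`.

```python
from dataclasses import dataclass

# Hypothetical label for injected nodes; the label grawiki uses is not documented here.
PLACEHOLDER_LABEL = "Entity"


@dataclass(frozen=True)
class NodeByName:
    name: str
    label: str


@dataclass(frozen=True)
class RelByName:
    source: str  # endpoint node name
    target: str
    label: str


def add_placeholder_nodes(
    nodes: list[NodeByName],
    relationships: list[RelByName],
    placeholder_label: str = PLACEHOLDER_LABEL,
) -> list[NodeByName]:
    """Return the nodes plus one placeholder per endpoint name that has no node."""
    known = {node.name for node in nodes}
    result = list(nodes)  # copy so the caller's list is not mutated
    for rel in relationships:
        for name in (rel.source, rel.target):
            if name not in known:
                result.append(NodeByName(name, placeholder_label))
                known.add(name)  # inject each missing name only once
    return result
```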
extract (async)
extract(text)
Extract a knowledge graph for one text input.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str` | Source text to analyze. | *required* |
Returns:
| Type | Description |
|---|---|
| `KnowledgeGraph` | Extracted graph whose entity nodes already carry embeddings. |
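Because `extract` is a coroutine, several chunks can be processed concurrently. The following is a minimal sketch using `asyncio.gather`. It bounds the number of in-flight calls with an `asyncio.Semaphore`, a common way to stay within provider rate limits. `LengthExtractor` and `extract_all` are hypothetical. The stand-in extractor lets the sketch run offline. This page does not say whether `GraphRAG` already schedules extraction this way.

```python
import asyncio


class LengthExtractor:
    """Hypothetical offline stand-in for KnowledgeGraphExtractor."""

    async def extract(self, text: str) -> int:
        await asyncio.sleep(0.01)  # simulate a model round-trip
        return len(text)


async def extract_all(extractor, chunks: list[str], max_concurrency: int = 4) -> list:
    """Run extractor.extract over chunks, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def extract_one(chunk: str):
        async with semaphore:  # wait for a free slot before calling the model
            return await extractor.extract(chunk)

    # gather returns results in the same order as the input chunks
    return await asyncio.gather(*(extract_one(chunk) for chunk in chunks))


results = asyncio.run(extract_all(LengthExtractor(), ["a", "bb", "ccc"]))
print(results)  # [1, 2, 3]
```

With a real `KnowledgeGraphExtractor`, pass the extractor instance in place of the stand-in. The result is then one `KnowledgeGraph` per chunk, in input order.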