Transforming and Enriching

Transformers are the bread and butter of an ingestion pipeline. They can transform the chunk, extract, modify, and add metadata, add vectors, and probably do a whole lot more that we haven’t thought of.

There are two ways to apply a transformer: per node or in batch.

The `Transformer` trait

The Transformer trait is very straightforward:

pub trait Transformer: Send + Sync {
    async fn transform_node(&self, node: IngestionNode) -> Result<IngestionNode>;

    fn concurrency(&self) -> Option<usize> {
        None
    }
}

Or in human language: “You get a node, do your thing, then return a result with the node”. That’s it.
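As a rough illustration of that contract, here is a minimal, dependency-free sketch of a custom transformer. Everything except the trait shape is an assumption made for the example: `IngestionNode` is a simplified stand-in with only `chunk` and `metadata` fields, `Result` is a local alias, `WordCount` is a hypothetical transformer, and `block_on` is a tiny helper that only exists so the sketch runs without an async runtime. In a real project you would import the crate's own types and run inside your executor of choice.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Local alias standing in for the crate's `Result` type (assumption).
type Result<T> = std::result::Result<T, Box<dyn std::error::Error + Send + Sync>>;

// Simplified stand-in for the real node type (assumption): a chunk plus metadata.
#[derive(Debug, Clone, Default)]
pub struct IngestionNode {
    pub chunk: String,
    pub metadata: HashMap<String, String>,
}

// Local copy of the trait shown above, so the sketch compiles on its own.
#[allow(async_fn_in_trait)]
pub trait Transformer: Send + Sync {
    async fn transform_node(&self, node: IngestionNode) -> Result<IngestionNode>;

    fn concurrency(&self) -> Option<usize> {
        None
    }
}

// Hypothetical transformer: records the chunk's word count as metadata.
pub struct WordCount;

impl Transformer for WordCount {
    async fn transform_node(&self, mut node: IngestionNode) -> Result<IngestionNode> {
        let count = node.chunk.split_whitespace().count();
        node.metadata
            .insert("word_count".to_string(), count.to_string());
        // "Do your thing, then return a result with the node."
        Ok(node)
    }
}

// Waker that does nothing; enough for futures that never actually suspend.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Minimal executor used only to drive the example (not part of any library).
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Calling `block_on(WordCount.transform_node(node))` returns the same node with a `word_count` entry added to its metadata. Because `concurrency` is not overridden, the sketch keeps the default of `None`.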

For batch processing, the BatchableTransformer trait is similar, except that it needs to return a stream. See docs.rs for more details.

Built-in transformers

Name	Description	Feature Flag
Embed	Generic embedding transformer, requires an LLM
MetadataKeywords	Uses an LLM to extract keywords and add as metadata
MetadataQACode	Uses an LLM to generate questions and answers for code
MetadataQAText	Uses an LLM to generate questions and answers for text
MetadataSummary	Uses an LLM to generate a summary
MetadataTitle	Uses an LLM to generate a title
HtmlToMarkdownTransformer	Converts HTML in a node to Markdown	scraping