
Storing the results

After processing nodes in the pipeline, you will usually want to store the results. A pipeline can have multiple storage steps, but it needs at least one. A storage is any type that implements the `Persist` trait.

The `Persist` trait

The trait is defined as follows:

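```rust
pub trait Persist: Debug + Send + Sync {
    // Prepare the backend (collections, tables, connections); runs when the pipeline starts.
    async fn setup(&self) -> Result<()>;
    // Persist a single node and return it so later steps can keep using it.
    async fn store(&self, node: Node) -> Result<Node>;
    // Persist a batch of nodes and return them as a stream.
    async fn batch_store(&self, nodes: Vec<Node>) -> IndexingStream;
    // Return a batch size to have the pipeline call `batch_store` instead of `store`.
    fn batch_size(&self) -> Option<usize> {
        None
    }
}
```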

`setup` runs asynchronously as soon as the pipeline starts. Use it to prepare whatever the storage needs, such as collections, tables, or connections. Because other steps can follow a storage step, both `store` and `batch_store` are expected to return the nodes they processed.
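The contract described above can be modelled without swiftide. The following is a synchronous, std-only sketch, not the library's API: `Node`, `SimplePersist`, `MemoryStore` and `run_storages` are hypothetical names, and the real trait is async and uses swiftide's own `Result` and `IndexingStream` types. It shows `setup` preparing state before any node arrives, and `store` returning each node so that a second storage step receives it.

```rust
use std::fmt::Debug;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

// Hypothetical stand-in for swiftide's `Node`: just a chunk of text here.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub chunk: String,
}

// Simplified, synchronous mirror of `Persist` (assumption: the real trait is
// async and returns swiftide's own `Result` type).
pub trait SimplePersist: Debug + Send + Sync {
    // Prepare the backend before any node arrives.
    fn setup(&self) -> Result<(), String>;
    // Persist one node and hand it back for any following steps.
    fn store(&self, node: Node) -> Result<Node, String>;
}

// Hypothetical in-memory storage; refuses to store until `setup` has run.
#[derive(Debug, Default)]
pub struct MemoryStore {
    ready: AtomicBool,
    nodes: Mutex<Vec<Node>>,
}

impl MemoryStore {
    // Number of nodes stored so far.
    pub fn len(&self) -> usize {
        self.nodes.lock().unwrap().len()
    }
}

impl SimplePersist for MemoryStore {
    fn setup(&self) -> Result<(), String> {
        self.ready.store(true, Ordering::SeqCst);
        Ok(())
    }

    fn store(&self, node: Node) -> Result<Node, String> {
        if !self.ready.load(Ordering::SeqCst) {
            return Err("setup has not run".to_string());
        }
        self.nodes.lock().unwrap().push(node.clone());
        // Return the node so the next step in the pipeline can use it.
        Ok(node)
    }
}

// Sets up every storage, then passes each node through them in order;
// each step receives the node the previous step returned.
pub fn run_storages(
    storages: &[&dyn SimplePersist],
    nodes: Vec<Node>,
) -> Result<Vec<Node>, String> {
    for storage in storages {
        storage.setup()?;
    }
    let mut stored = Vec::with_capacity(nodes.len());
    for node in nodes {
        let mut current = node;
        for storage in storages {
            current = storage.store(current)?;
        }
        stored.push(current);
    }
    Ok(stored)
}
```

Keeping state behind a `Mutex` and an `AtomicBool` mirrors the `Send + Sync` bound on the real trait: storage methods only receive `&self`, so any mutation needs interior mutability that is safe to share across threads.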

If a storage overrides `batch_size` to return `Some(size)`, the pipeline always prefers `batch_store` over `store`.
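A minimal sketch of that dispatch rule, again synchronous and with hypothetical names (`BatchPersist`, `Recorder`, `persist_all`). It illustrates the behaviour described above and is not swiftide's pipeline code. When `batch_size` returns `Some(n)`, nodes are grouped into chunks of at most `n` and passed to `batch_store`; otherwise each node goes through `store`. Treating `Some(0)` like `None` is an assumption of this sketch, made because `slice::chunks(0)` panics.

```rust
use std::fmt::Debug;
use std::sync::Mutex;

// Hypothetical stand-in for swiftide's `Node`.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub id: usize,
}

// Simplified, synchronous model of the batching part of `Persist` (assumption).
pub trait BatchPersist: Debug + Send + Sync {
    fn store(&self, node: Node) -> Node;
    fn batch_store(&self, nodes: Vec<Node>) -> Vec<Node>;
    fn batch_size(&self) -> Option<usize> {
        None
    }
}

// Records which method was called, so the dispatch can be inspected.
#[derive(Debug, Clone, PartialEq)]
pub enum Call {
    Single,
    Batch(usize),
}

// Hypothetical storage with a configurable batch size.
#[derive(Debug, Default)]
pub struct Recorder {
    pub size: Option<usize>,
    pub calls: Mutex<Vec<Call>>,
}

impl BatchPersist for Recorder {
    fn store(&self, node: Node) -> Node {
        self.calls.lock().unwrap().push(Call::Single);
        node
    }

    fn batch_store(&self, nodes: Vec<Node>) -> Vec<Node> {
        self.calls.lock().unwrap().push(Call::Batch(nodes.len()));
        nodes
    }

    fn batch_size(&self) -> Option<usize> {
        self.size
    }
}

// Prefers `batch_store` whenever a non-zero batch size is reported.
pub fn persist_all(storage: &dyn BatchPersist, nodes: Vec<Node>) -> Vec<Node> {
    match storage.batch_size() {
        // Group nodes into chunks of at most `size` and store each chunk at once.
        Some(size) if size > 0 => nodes
            .chunks(size)
            .flat_map(|chunk| storage.batch_store(chunk.to_vec()))
            .collect(),
        // No usable batch size: store nodes one at a time.
        _ => nodes.into_iter().map(|node| storage.store(node)).collect(),
    }
}
```

With five nodes and `Some(2)`, `batch_store` is called three times, with 2, 2 and 1 nodes; with the default `None`, `store` is called once per node.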

Built-in storage

| Name | Description | Feature flag |
|---|---|---|
| Redis | Persists nodes in Redis, as JSON by default | `redis` |
| Qdrant | Persists nodes in Qdrant | `qdrant` |
| MemoryStorage | Persists nodes in memory; great for debugging | |
| LanceDB | Persists and retrieves nodes in LanceDB | `lancedb` |
| PGVector | Persists and retrieves nodes in pgvector | `pgvector` |