Step-by-step Introduction
Swiftide provides a pipeline model. Throughout a pipeline, IngestionNodes are transformed and ultimately persisted. Every step in a pipeline consumes the pipeline and returns the same, owned pipeline, so steps are chained or their result reassigned.
A pipeline step-by-step
The pipeline starts with a loader:

    let pipeline = IngestionPipeline::from_loader(FileLoader::new("./"));

A loader implements the Loader trait, which yields IngestionNodes to a stream.
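To make the loader contract concrete, here is a minimal std-only sketch. It is not Swiftide's API: Node, Loader::into_stream and InMemoryLoader are hypothetical names, and a synchronous Iterator stands in for the async stream a real loader yields.

```rust
/// Hypothetical, simplified stand-in for an IngestionNode.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub path: String,
    pub chunk: String,
}

/// Hypothetical loader contract: turn a source into a stream of nodes.
/// A synchronous Iterator stands in for an async stream in this sketch.
pub trait Loader {
    fn into_stream(self) -> Box<dyn Iterator<Item = Node>>;
}

/// Hypothetical loader that serves (path, contents) pairs from memory
/// instead of reading files from disk.
pub struct InMemoryLoader {
    files: Vec<(String, String)>,
}

impl InMemoryLoader {
    pub fn new(files: Vec<(&str, &str)>) -> Self {
        let files = files
            .into_iter()
            .map(|(path, contents)| (path.to_string(), contents.to_string()))
            .collect();
        InMemoryLoader { files }
    }
}

impl Loader for InMemoryLoader {
    fn into_stream(self) -> Box<dyn Iterator<Item = Node>> {
        // Each file becomes one node; nodes are produced lazily as the stream is consumed.
        Box::new(
            self.files
                .into_iter()
                .map(|(path, chunk)| Node { path, chunk }),
        )
    }
}
```

Collecting the stream of an InMemoryLoader built from two pairs yields two nodes, in order.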
Nodes can then be transformed with an existing transformer:

    let pipeline = pipeline.then(MetadataQACode::new(openai_client.clone()));

Any transformer has to implement the Transformer trait, which takes an owned IngestionNode and outputs a Result<IngestionNode>. Closures also implement this trait!
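This means you can also pass a closure directly:

    let pipeline = pipeline.then(|mut node: IngestionNode| {
        // Append a marker line to the node's text
        node.chunk = format!("{}\n{}", &node.chunk, "awesome!");
        Ok(node)
    });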
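Closures qualify as transformers through a blanket implementation over a matching Fn signature. The sketch below shows that pattern with std only; Node, Transformer::transform_node and apply_all are hypothetical names, and a String error type stands in for a real error type.

```rust
/// Hypothetical, simplified stand-in for an IngestionNode.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub path: String,
    pub chunk: String,
}

/// Hypothetical transformer contract: take an owned node, return a node or an error.
pub trait Transformer {
    fn transform_node(&self, node: Node) -> Result<Node, String>;
}

/// Blanket impl: any closure with a matching signature is a transformer.
impl<F> Transformer for F
where
    F: Fn(Node) -> Result<Node, String>,
{
    fn transform_node(&self, node: Node) -> Result<Node, String> {
        self(node)
    }
}

/// Applies a transformer to every node, stopping at the first error.
pub fn apply_all<T: Transformer>(t: &T, nodes: Vec<Node>) -> Result<Vec<Node>, String> {
    nodes.into_iter().map(|n| t.transform_node(n)).collect()
}
```

Because the impl is generic over F, a struct-based transformer and a closure are interchangeable wherever a Transformer is expected.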
Batch transformations are also supported:

    let pipeline = pipeline.then_in_batch(10, Embed::new(FastEmbed::try_default()?));

Batchable transformers implement the BatchableTransformer trait, which takes a vector of IngestionNodes and outputs an IngestionStream.
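The batching step groups incoming nodes into vectors of at most the given size before handing them to the transformer. This std-only sketch models that; BatchableTransformer::batch_transform, ToyEmbed and run_in_batches are hypothetical, and ToyEmbed's one-number "embedding" (the chunk length) is only a placeholder for a real embedding model.

```rust
/// Hypothetical, simplified node carrying an optional embedding.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub chunk: String,
    pub vector: Option<Vec<f32>>,
}

/// Hypothetical batch contract: receive a whole batch of nodes at once,
/// e.g. so an embedding call can handle many chunks per request.
pub trait BatchableTransformer {
    fn batch_transform(&self, nodes: Vec<Node>) -> Vec<Node>;
}

/// Hypothetical toy embedder: uses the chunk length as a 1-dimensional vector.
pub struct ToyEmbed;

impl BatchableTransformer for ToyEmbed {
    fn batch_transform(&self, nodes: Vec<Node>) -> Vec<Node> {
        nodes
            .into_iter()
            .map(|mut n| {
                n.vector = Some(vec![n.chunk.len() as f32]);
                n
            })
            .collect()
    }
}

/// Splits the nodes into batches of at most `batch_size`, runs each batch
/// through the transformer, and returns the results plus the batch count.
pub fn run_in_batches<T: BatchableTransformer>(
    batch_size: usize,
    transformer: &T,
    nodes: Vec<Node>,
) -> (Vec<Node>, usize) {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut out = Vec::with_capacity(nodes.len());
    let mut batches = 0;
    // chunks() yields slices of at most batch_size elements; the last may be shorter.
    for batch in nodes.chunks(batch_size) {
        out.extend(transformer.batch_transform(batch.to_vec()));
        batches += 1;
    }
    (out, batches)
}
```

With a batch size of 10, 25 nodes are processed in three calls (10, 10 and 5 nodes).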
Nodes can be filtered using a NodeCache at any stage, based on a cache key that the node cache defines. By default, the Redis cache builds that key from a prefix and a hash of the IngestionNode's path and text:

    let pipeline = pipeline.filter_cached(Redis::try_from_url(redis_url, "swiftide-examples")?);

Node caches implement the NodeCache trait, which defines a get and a set method, both taking an IngestionNode as input.
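The caching filter can be modeled with an in-memory set in place of Redis. In this sketch, which is an assumption about the behavior rather than Swiftide's code, a node whose key is already present is dropped, and any other node is recorded with set and passed on. MemoryCache and filter_cached here are hypothetical; the key follows the default described above (prefix plus a hash of path and text).

```rust
use std::cell::RefCell;
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified stand-in for an IngestionNode.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub path: String,
    pub chunk: String,
}

/// Hypothetical cache contract with a get and a set method.
pub trait NodeCache {
    fn get(&self, node: &Node) -> bool;
    fn set(&self, node: &Node);
}

/// Hypothetical in-memory cache standing in for Redis.
pub struct MemoryCache {
    prefix: String,
    seen: RefCell<HashSet<String>>,
}

impl MemoryCache {
    pub fn new(prefix: &str) -> Self {
        MemoryCache { prefix: prefix.to_string(), seen: RefCell::new(HashSet::new()) }
    }

    /// Key = prefix + hash of (path, text).
    fn key(&self, node: &Node) -> String {
        let mut hasher = DefaultHasher::new();
        node.path.hash(&mut hasher);
        node.chunk.hash(&mut hasher);
        format!("{}.{}", self.prefix, hasher.finish())
    }
}

impl NodeCache for MemoryCache {
    fn get(&self, node: &Node) -> bool {
        self.seen.borrow().contains(&self.key(node))
    }

    fn set(&self, node: &Node) {
        self.seen.borrow_mut().insert(self.key(node));
    }
}

/// Drops nodes already in the cache; records the rest and passes them on.
pub fn filter_cached<C: NodeCache>(cache: &C, nodes: Vec<Node>) -> Vec<Node> {
    nodes
        .into_iter()
        .filter(|n| {
            if cache.get(n) {
                false
            } else {
                cache.set(n);
                true
            }
        })
        .collect()
}
```

Running the same nodes through twice passes them the first time and drops them the second; changing a node's text changes its key, so the edited node passes again.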
At any point in the pipeline, nodes can be chunked into smaller parts:

    let pipeline = pipeline.then_chunk(ChunkCode::try_for_language_and_chunk_size("rust", 10..2048)?);

Chunkers implement the ChunkerTransformer trait, which takes an IngestionNode and returns an IngestionStream. By default, metadata is copied over to each resulting node.
Nodes can also be persisted (multiple times!) to storage:

    let pipeline = pipeline.then_store_with(
        Qdrant::try_from_url(qdrant_url)?
            .batch_size(50)
            .vector_size(1536)
            .collection_name("swiftide-examples".to_string())
            .build()?,
    );

Storages implement the Storage trait, which defines the setup, store, batch_store and batch_size methods. They also provide a way to convert an IngestionNode into something that can be stored.
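The four storage methods fit together roughly as follows: set up once, then write either node by node or in batches, depending on whether a batch size is reported. This std-only sketch is an assumption about that flow; MemoryStorage, to_record, persist and the writes counter are hypothetical, and the synchronous signatures are simplified.

```rust
/// Hypothetical, simplified stand-in for an IngestionNode.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub path: String,
    pub chunk: String,
}

/// Hypothetical storage contract mirroring the four methods named above.
pub trait Storage {
    fn setup(&mut self);
    fn store(&mut self, node: Node);
    fn batch_store(&mut self, nodes: Vec<Node>);
    fn batch_size(&self) -> Option<usize>;
}

/// Hypothetical in-memory storage that persists (path, chunk) records.
pub struct MemoryStorage {
    batch: Option<usize>,
    ready: bool,
    pub records: Vec<(String, String)>,
    /// Number of write calls (single or batched), to make batching visible.
    pub writes: usize,
}

impl MemoryStorage {
    pub fn new(batch: Option<usize>) -> Self {
        MemoryStorage { batch, ready: false, records: Vec::new(), writes: 0 }
    }

    /// Converts a node into the record shape this store understands.
    fn to_record(node: Node) -> (String, String) {
        (node.path, node.chunk)
    }
}

impl Storage for MemoryStorage {
    fn setup(&mut self) {
        // A real store might create a collection or table here.
        self.ready = true;
    }

    fn store(&mut self, node: Node) {
        assert!(self.ready, "setup must run before store");
        self.records.push(Self::to_record(node));
        self.writes += 1;
    }

    fn batch_store(&mut self, nodes: Vec<Node>) {
        assert!(self.ready, "setup must run before batch_store");
        self.records.extend(nodes.into_iter().map(Self::to_record));
        self.writes += 1;
    }

    fn batch_size(&self) -> Option<usize> {
        self.batch
    }
}

/// Calls setup once, then writes in batches if the storage reports a batch size.
pub fn persist<S: Storage>(storage: &mut S, nodes: Vec<Node>) {
    storage.setup();
    match storage.batch_size() {
        Some(size) if size > 0 => {
            for batch in nodes.chunks(size) {
                storage.batch_store(batch.to_vec());
            }
        }
        _ => {
            for node in nodes {
                storage.store(node);
            }
        }
    }
}
```

With a batch size of 50, as in the Qdrant example, 120 nodes take three writes instead of 120.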
Finally, the pipeline can be run as follows. Running is asynchronous, so the returned future must be awaited:

    pipeline.run().await?;
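The overall shape, with owned steps chained together and no work done until the stream is drained, can be sketched with a toy pipeline. Everything here is hypothetical: Pipeline, from_nodes and the closure-based steps are not Swiftide's API, the stream is a synchronous Iterator, and run collects into a Vec rather than being async and writing to a Storage.

```rust
/// Hypothetical, simplified stand-in for an IngestionNode.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub path: String,
    pub chunk: String,
}

/// Hypothetical toy pipeline: each step consumes `self` and returns a new,
/// owned pipeline; nodes are only processed when `run` drains the stream.
pub struct Pipeline {
    stream: Box<dyn Iterator<Item = Node>>,
}

impl Pipeline {
    pub fn from_nodes(nodes: Vec<Node>) -> Self {
        Pipeline { stream: Box::new(nodes.into_iter()) }
    }

    /// One node in, one node out.
    pub fn then<F>(self, f: F) -> Self
    where
        F: Fn(Node) -> Node + 'static,
    {
        Pipeline { stream: Box::new(self.stream.map(f)) }
    }

    /// Keeps only nodes for which `keep` returns true.
    pub fn filter<P>(self, keep: P) -> Self
    where
        P: Fn(&Node) -> bool + 'static,
    {
        Pipeline { stream: Box::new(self.stream.filter(keep)) }
    }

    /// One node in, any number of nodes out.
    pub fn then_chunk<C>(self, split: C) -> Self
    where
        C: Fn(Node) -> Vec<Node> + 'static,
    {
        Pipeline { stream: Box::new(self.stream.flat_map(split)) }
    }

    /// Drains the stream into `sink` and returns how many nodes were stored.
    pub fn run(self, sink: &mut Vec<Node>) -> usize {
        let before = sink.len();
        sink.extend(self.stream);
        sink.len() - before
    }
}
```

Since every method takes self by value, the steps compose as one chain, matching the owned-pipeline behavior described at the top of this page.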