
Step-by-step Introduction

Swiftide provides a pipeline model. Throughout a pipeline, IngestionNodes are transformed and ultimately persisted. Every step in a pipeline returns the same, owned pipeline.
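When every step returns the same, owned pipeline, you are looking at the consuming-builder pattern. Each method takes `self` by value and hands it back, so you either chain the calls or rebind the result. The sketch below shows only that pattern. `Pipeline`, `Node`, `from_nodes` and the closure-based steps are hypothetical stand-ins, not Swiftide's types or API.

```rust
/// Hypothetical stand-in for an ingestion node: just a path and some text.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub path: String,
    pub chunk: String,
}

/// Hypothetical pipeline: the nodes plus the steps recorded so far.
pub struct Pipeline {
    nodes: Vec<Node>,
    steps: Vec<Box<dyn Fn(Node) -> Node>>,
}

impl Pipeline {
    /// Starts a pipeline from an initial set of nodes (standing in for a loader).
    pub fn from_nodes(nodes: Vec<Node>) -> Self {
        Pipeline { nodes, steps: Vec::new() }
    }

    /// Consumes the pipeline, records a step, and returns the same pipeline by value.
    pub fn then(mut self, step: impl Fn(Node) -> Node + 'static) -> Self {
        self.steps.push(Box::new(step));
        self
    }

    /// Consumes the pipeline and applies every recorded step to every node, in order.
    pub fn run(self) -> Vec<Node> {
        let steps = self.steps;
        self.nodes
            .into_iter()
            .map(|node| steps.iter().fold(node, |acc, step| step(acc)))
            .collect()
    }
}

fn main() {
    let pipeline = Pipeline::from_nodes(vec![Node {
        path: "a.rs".to_string(),
        chunk: "fn a() {}".to_string(),
    }]);
    // `then` consumes `pipeline` and returns it, so the result is rebound here.
    let pipeline = pipeline.then(|mut node| {
        node.chunk.push_str("\nawesome!");
        node
    });
    println!("{:?}", pipeline.run());
}
```

Because `self` is moved into each call, the compiler rejects any later use of the old binding. That is why the steps are either chained or rebound.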

A pipeline step-by-step

  1. The pipeline starts with a loader:

    let pipeline = IngestionPipeline::from_loader(FileLoader::new("./"));

    A loader implements the Loader trait, which yields IngestionNodes to a stream.

  2. Nodes can then be transformed with an existing transformer:

    let pipeline = pipeline.then(MetadataQACode::new(openai_client.clone()));

    Any transformer has to implement the Transformer trait, which takes an owned IngestionNode and outputs a Result<IngestionNode>. Closures also implement this trait!

  3. … so you can also do this:

    let pipeline = pipeline.then(|mut node: IngestionNode| {
        node.chunk = format!("{}\n{}", node.chunk, "awesome!");
        Ok(node)
    });

  4. Batch transformations are also supported:

    let pipeline = pipeline.then_in_batch(10, Embed::new(FastEmbed::try_default()?));

    Batchable transformers implement the BatchableTransformer trait, which takes a vector of IngestionNodes and outputs an IngestionStream.

  5. Nodes can be filtered at any stage using a NodeCache, based on a cache key that the node cache defines. By default, the Redis node cache uses a prefix plus a hash of the IngestionNode's path and text as the key.

    let pipeline = pipeline.filter_cached(Redis::try_from_url(
        redis_url,
        "swiftide-examples",
    )?);

    Node caches implement the NodeCache trait, which defines get and set methods that take an IngestionNode as input.

  6. At any point in the pipeline, nodes can be chunked into smaller parts:

    let pipeline = pipeline.then_chunk(ChunkCode::try_for_language_and_chunk_size(
        "rust",
        10..2048,
    )?);

    Chunkers implement the ChunkerTransformer trait, which takes an IngestionNode and returns an IngestionStream. By default, metadata is copied over to each resulting node.

  7. Also, nodes can be persisted (multiple times!) to storage:

    let pipeline = pipeline.then_store_with(
        Qdrant::try_from_url(qdrant_url)?
            .batch_size(50)
            .vector_size(1536)
            .collection_name("swiftide-examples".to_string())
            .build()?,
    );

    Storages implement the Storage trait, which defines the setup, store, batch_store and batch_size methods. They also provide ways to convert an IngestionNode into something that can be stored.

  8. Finally, the pipeline can be run as follows:

    pipeline.run().await?;
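Steps 1 to 3 rest on two traits. A loader produces nodes, and a transformer maps an owned node to a `Result` of a node. Closures can satisfy the transformer trait through a blanket implementation over `Fn`. The sketch below models that idea with simplified, hypothetical `Loader`, `Transformer`, `InMemoryLoader`, `TagExtension` and `run` definitions. A plain `Iterator` stands in for the stream, and none of the signatures are Swiftide's real ones.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified node.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Node {
    pub path: String,
    pub chunk: String,
    pub metadata: HashMap<String, String>,
}

/// Boxed error type used throughout this sketch.
pub type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

/// Hypothetical loader; an iterator stands in for a stream of nodes.
pub trait Loader {
    fn into_nodes(self) -> Box<dyn Iterator<Item = Node>>;
}

/// Loads nodes from an in-memory list of (path, text) pairs.
pub struct InMemoryLoader(pub Vec<(String, String)>);

impl Loader for InMemoryLoader {
    fn into_nodes(self) -> Box<dyn Iterator<Item = Node>> {
        Box::new(self.0.into_iter().map(|(path, chunk)| Node { path, chunk, ..Default::default() }))
    }
}

/// Hypothetical transformer: takes an owned node and returns a Result<Node>.
pub trait Transformer {
    fn transform_node(&self, node: Node) -> Result<Node>;
}

/// Blanket impl: any closure with the right signature is a transformer.
impl<F> Transformer for F
where
    F: Fn(Node) -> Result<Node>,
{
    fn transform_node(&self, node: Node) -> Result<Node> {
        self(node)
    }
}

/// A named transformer that records each node's file extension as metadata.
pub struct TagExtension;

impl Transformer for TagExtension {
    fn transform_node(&self, mut node: Node) -> Result<Node> {
        let ext = node.path.rsplit('.').next().unwrap_or("").to_string();
        node.metadata.insert("extension".to_string(), ext);
        Ok(node)
    }
}

/// Runs every loaded node through the transformers in order, stopping at the first error.
pub fn run(loader: impl Loader, steps: &[&dyn Transformer]) -> Result<Vec<Node>> {
    loader
        .into_nodes()
        .map(|node| steps.iter().try_fold(node, |acc, step| step.transform_node(acc)))
        .collect()
}

fn main() -> Result<()> {
    let loader = InMemoryLoader(vec![("src/lib.rs".to_string(), "pub fn hello() {}".to_string())]);
    // A closure used as a transformer, as in step 3.
    let append = |mut node: Node| -> Result<Node> {
        node.chunk = format!("{}\n{}", node.chunk, "awesome!");
        Ok(node)
    };
    let steps: [&dyn Transformer; 2] = [&TagExtension, &append];
    for node in run(loader, &steps)? {
        println!("{:?}", node);
    }
    Ok(())
}
```

The blanket `impl<F> Transformer for F` lets you pass a closure wherever a transformer is expected, alongside named types such as `TagExtension`.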
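`then_in_batch(10, ...)` hands nodes to a batchable transformer in vectors, here presumably of at most 10 nodes each. This lets an embedder process many nodes per call. The following is a minimal synchronous sketch. It assumes a hypothetical `BatchableTransformer` trait that maps a `Vec<Node>` to a `Vec<Node>`, whereas the trait described above returns an `IngestionStream`. It also uses a toy `LengthEmbed` that stores each text's length as its "vector". `run_in_batches` is likewise hypothetical.

```rust
use std::cell::Cell;

/// Hypothetical, simplified node with an optional embedding vector.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub chunk: String,
    pub vector: Option<Vec<f32>>,
}

/// Hypothetical batch transformer: receives a whole batch at once.
pub trait BatchableTransformer {
    fn batch_transform(&self, nodes: Vec<Node>) -> Vec<Node>;
}

/// Toy embedder that counts how many batches it receives.
/// A real embedder would send the whole batch to a model in one request.
pub struct LengthEmbed {
    pub calls: Cell<usize>,
}

impl BatchableTransformer for LengthEmbed {
    fn batch_transform(&self, nodes: Vec<Node>) -> Vec<Node> {
        self.calls.set(self.calls.get() + 1);
        nodes
            .into_iter()
            .map(|mut node| {
                node.vector = Some(vec![node.chunk.len() as f32]);
                node
            })
            .collect()
    }
}

/// Splits nodes into batches of at most `batch_size` and transforms each batch.
pub fn run_in_batches(
    nodes: Vec<Node>,
    batch_size: usize,
    transformer: &impl BatchableTransformer,
) -> Vec<Node> {
    // `slice::chunks` panics on a zero size, so reject it with a clearer message.
    assert!(batch_size > 0, "batch_size must be non-zero");
    nodes
        .chunks(batch_size)
        .flat_map(|batch| transformer.batch_transform(batch.to_vec()))
        .collect()
}

fn main() {
    let nodes: Vec<Node> = ["a", "bb", "ccc"]
        .iter()
        .map(|s| Node { chunk: s.to_string(), vector: None })
        .collect();
    let embed = LengthEmbed { calls: Cell::new(0) };
    let out = run_in_batches(nodes, 2, &embed);
    println!("{} nodes in {} batches", out.len(), embed.calls.get());
}
```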
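Step 5 describes the default Redis key as a prefix plus a hash of the node's path and text. You can model that with an in-memory set. In this sketch the following names are all hypothetical:

- the `NodeCache` trait, whose `get` and `set` methods are named after the description;
- `InMemoryCache`;
- the `prefix:hash` key format;
- the `filter_cached` free function.

`DefaultHasher` replaces whatever hash Swiftide's Redis cache actually uses. Its output is not guaranteed stable across Rust releases, so it would be a poor choice for a persistent cache.

```rust
use std::cell::RefCell;
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified node.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub path: String,
    pub chunk: String,
}

/// Hypothetical cache interface with get and set methods taking a node.
pub trait NodeCache {
    fn get(&self, node: &Node) -> bool;
    fn set(&self, node: &Node);
}

/// In-memory cache that remembers the keys of nodes it has seen.
pub struct InMemoryCache {
    prefix: String,
    seen: RefCell<HashSet<String>>,
}

impl InMemoryCache {
    pub fn new(prefix: &str) -> Self {
        InMemoryCache { prefix: prefix.to_string(), seen: RefCell::new(HashSet::new()) }
    }

    /// Builds the key from the prefix and a hash of the node's path and text.
    fn key(&self, node: &Node) -> String {
        let mut hasher = DefaultHasher::new();
        node.path.hash(&mut hasher);
        node.chunk.hash(&mut hasher);
        format!("{}:{:x}", self.prefix, hasher.finish())
    }
}

impl NodeCache for InMemoryCache {
    fn get(&self, node: &Node) -> bool {
        self.seen.borrow().contains(&self.key(node))
    }

    fn set(&self, node: &Node) {
        self.seen.borrow_mut().insert(self.key(node));
    }
}

/// Keeps only nodes the cache has not seen, marking each kept node as seen.
pub fn filter_cached(nodes: Vec<Node>, cache: &impl NodeCache) -> Vec<Node> {
    nodes
        .into_iter()
        .filter(|node| {
            if cache.get(node) {
                false
            } else {
                cache.set(node);
                true
            }
        })
        .collect()
}

fn main() {
    let cache = InMemoryCache::new("swiftide-examples");
    let node = Node { path: "a.rs".to_string(), chunk: "fn a() {}".to_string() };
    let first = filter_cached(vec![node.clone()], &cache);
    let second = filter_cached(vec![node], &cache);
    println!("first run kept {}, second run kept {}", first.len(), second.len());
}
```

Because the key covers both path and text, identical text at two different paths is still stored twice. An edited file produces a new key and passes the filter again.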
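Chunking turns one node into several while carrying metadata over to each piece. The hypothetical `chunk_node` below works as follows:

- it splits text on blank lines;
- it packs paragraphs into chunks shorter than `size.end` bytes;
- it drops chunks shorter than `size.start`.

This loosely mirrors the `10..2048` argument. It is a plain-text stand-in for illustration, not the language-aware splitting that `ChunkCode` performs. In this sketch a single paragraph longer than the maximum is kept whole.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Hypothetical, simplified node.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Node {
    pub path: String,
    pub chunk: String,
    pub metadata: HashMap<String, String>,
}

/// Splits an owned node into smaller nodes, copying path and metadata onto each one.
pub fn chunk_node(node: Node, size: Range<usize>) -> Vec<Node> {
    let mut chunks: Vec<String> = Vec::new();
    let mut current = String::new();
    for para in node.chunk.split("\n\n").map(str::trim).filter(|p| !p.is_empty()) {
        // The `+ 2` accounts for the "\n\n" separator re-inserted between paragraphs.
        if !current.is_empty() && current.len() + 2 + para.len() >= size.end {
            chunks.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push_str("\n\n");
        }
        current.push_str(para);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
        .into_iter()
        // Discard fragments below the minimum size.
        .filter(|c| c.len() >= size.start)
        .map(|chunk| Node {
            path: node.path.clone(),
            chunk,
            metadata: node.metadata.clone(),
        })
        .collect()
}

fn main() {
    let node = Node {
        path: "lib.rs".to_string(),
        chunk: "fn a() {}\n\nfn b() {}".to_string(),
        ..Default::default()
    };
    for piece in chunk_node(node, 5..12) {
        println!("{:?}", piece.chunk);
    }
}
```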
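The `Storage` trait described in step 7 combines setup, single and batched stores, and a batch size, presumably used to group nodes before storing them. The in-memory sketch below assumes synchronous signatures and a `String` error. `InMemoryStorage`, `to_row` and `store_all` are hypothetical and only follow the method names from the description.

```rust
use std::cell::{Cell, RefCell};

/// Hypothetical, simplified node.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub path: String,
    pub chunk: String,
}

/// Hypothetical storage interface named after the methods described above.
pub trait Storage {
    /// Prepares the backend, e.g. creating a collection if it does not exist.
    fn setup(&self) -> Result<(), String>;
    /// Persists a single node.
    fn store(&self, node: Node) -> Result<(), String>;
    /// Persists a batch of nodes in one call.
    fn batch_store(&self, nodes: Vec<Node>) -> Result<(), String>;
    /// `Some(n)` asks the caller to group nodes into batches of `n`.
    fn batch_size(&self) -> Option<usize> {
        None
    }
}

/// Converts a node into the row format this toy backend stores.
fn to_row(node: &Node) -> String {
    format!("{}|{}", node.path, node.chunk)
}

/// In-memory backend that records rows and counts batch calls.
pub struct InMemoryStorage {
    pub batch: Option<usize>,
    pub rows: RefCell<Vec<String>>,
    pub batch_calls: Cell<usize>,
}

impl InMemoryStorage {
    pub fn new(batch: Option<usize>) -> Self {
        InMemoryStorage { batch, rows: RefCell::new(Vec::new()), batch_calls: Cell::new(0) }
    }
}

impl Storage for InMemoryStorage {
    fn setup(&self) -> Result<(), String> {
        // Nothing to prepare in memory; a vector database would create its collection here.
        Ok(())
    }

    fn store(&self, node: Node) -> Result<(), String> {
        self.rows.borrow_mut().push(to_row(&node));
        Ok(())
    }

    fn batch_store(&self, nodes: Vec<Node>) -> Result<(), String> {
        self.batch_calls.set(self.batch_calls.get() + 1);
        self.rows.borrow_mut().extend(nodes.iter().map(to_row));
        Ok(())
    }

    fn batch_size(&self) -> Option<usize> {
        self.batch
    }
}

/// Persists nodes, batching them when the storage reports a non-zero batch size.
pub fn store_all(storage: &impl Storage, nodes: Vec<Node>) -> Result<(), String> {
    storage.setup()?;
    match storage.batch_size() {
        Some(n) if n > 0 => {
            for batch in nodes.chunks(n) {
                storage.batch_store(batch.to_vec())?;
            }
        }
        _ => {
            for node in nodes {
                storage.store(node)?;
            }
        }
    }
    Ok(())
}

fn main() {
    let storage = InMemoryStorage::new(Some(2));
    let nodes: Vec<Node> = (0..3)
        .map(|i| Node { path: format!("{}.rs", i), chunk: "x".to_string() })
        .collect();
    // Storing twice shows that the same nodes can be persisted more than once.
    store_all(&storage, nodes.clone()).unwrap();
    store_all(&storage, nodes).unwrap();
    println!("{} rows in {} batch calls", storage.rows.borrow().len(), storage.batch_calls.get());
}
```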

Read more

Reference documentation on docs.rs