Loading Data

A pipeline starts with data, and it is only as good as the data it ingests. Data enters a pipeline through a loader, which is any type that implements the Loader trait.

The Loader trait

The trait is defined as follows:

pub trait Loader {
    fn into_stream(self) -> IndexingStream;
}

In plain language: “I can be turned into a stream”. The underlying assumption is that a loader yields the data it loads as a stream of Nodes, where each node can represent a file, a message, a web page, and so on.
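To make the shape of the trait concrete, here is a minimal sketch of a custom loader. It is dependency-free, so `Node` and `IndexingStream` below are simplified stand-ins defined only for this example (a plain struct and a boxed synchronous iterator); the library's own types may differ, and its stream is likely asynchronous. `MessageLoader` is a hypothetical name, not one of the built-in loaders.

```rust
// Simplified stand-in for the library's Node type (assumption for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub chunk: String,
}

// Simplified stand-in for IndexingStream: a boxed, synchronous iterator of
// results. This is an assumption made to keep the example self-contained.
pub struct IndexingStream {
    inner: Box<dyn Iterator<Item = Result<Node, String>>>,
}

impl IndexingStream {
    pub fn new(iter: impl Iterator<Item = Result<Node, String>> + 'static) -> Self {
        IndexingStream { inner: Box::new(iter) }
    }
}

impl Iterator for IndexingStream {
    type Item = Result<Node, String>;

    fn next(&mut self) -> Option<Self::Item> {
        self.inner.next()
    }
}

// Same shape as the trait shown above.
pub trait Loader {
    fn into_stream(self) -> IndexingStream;
}

// Hypothetical loader that yields one node per in-memory message.
pub struct MessageLoader {
    pub messages: Vec<String>,
}

impl Loader for MessageLoader {
    fn into_stream(self) -> IndexingStream {
        // Consume the loader and wrap each message in a Node.
        IndexingStream::new(
            self.messages
                .into_iter()
                .map(|message| Ok(Node { chunk: message })),
        )
    }
}

fn main() {
    let loader = MessageLoader {
        messages: vec!["hello".to_string(), "world".to_string()],
    };

    // The loader is consumed; downstream steps only see the stream of nodes.
    for node in loader.into_stream() {
        println!("{:?}", node);
    }
}
```

Because `into_stream` takes `self` by value, the loader is consumed when the stream is created, so any state it holds (paths, connections, buffered messages) moves into the stream rather than being shared with it.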

Built-in loaders

| Name | Description | Feature Flag |
| --- | --- | --- |
| FileLoader | Loads files with an optional extension filter, respecting gitignore | |
| ScrapingLoader | Scrapes a website using the spider crate and HTML-to-Markdown transformations | scraping |
| fluvio | Loads data directly from Fluvio streams | fluvio |