Features
Main features
- Fast streaming indexing pipeline with async, parallel processing
- Experimental query pipeline
- Integrations with OpenAI, Groq, Redis, Qdrant, FastEmbed, Tree-sitter, and more
- A variety of loaders, transformers, embedders, and other common, generic tools
- Bring your own transformers by extending straightforward traits
- Jinja-like templating for prompts
- Evaluate pipelines with RAGAS
- Splitting and merging pipelines
- Store into multiple backends
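
To give a feel for what "bring your own transformers" means, here is a minimal, std-only sketch of a trait-based pipeline step. The `Node`, `Transformer`, `WordCount`, `TrimNonEmpty`, and `run_pipeline` names are hypothetical and only illustrate the pattern; they are not the library's actual trait definitions, which are async and richer than this.

```rust
/// Hypothetical stand-in for a unit of indexed data (an assumption for this sketch).
#[derive(Debug, Clone, PartialEq)]
struct Node {
    chunk: String,
    metadata: Vec<(String, String)>,
}

/// Hypothetical transformer trait: take a node, return it transformed or fail.
trait Transformer {
    fn transform(&self, node: Node) -> Result<Node, String>;
}

/// Custom step: trims whitespace and rejects chunks that end up empty.
struct TrimNonEmpty;

impl Transformer for TrimNonEmpty {
    fn transform(&self, mut node: Node) -> Result<Node, String> {
        let trimmed = node.chunk.trim();
        if trimmed.is_empty() {
            return Err("empty chunk".to_string());
        }
        node.chunk = trimmed.to_string();
        Ok(node)
    }
}

/// Custom step: records the chunk's word count as metadata.
struct WordCount;

impl Transformer for WordCount {
    fn transform(&self, mut node: Node) -> Result<Node, String> {
        let count = node.chunk.split_whitespace().count();
        node.metadata.push(("word_count".to_string(), count.to_string()));
        Ok(node)
    }
}

/// Applies each step in order and stops at the first error.
fn run_pipeline(steps: &[Box<dyn Transformer>], node: Node) -> Result<Node, String> {
    steps.iter().try_fold(node, |n, step| step.transform(n))
}

fn main() {
    let steps: Vec<Box<dyn Transformer>> = vec![Box::new(TrimNonEmpty), Box::new(WordCount)];
    let node = Node {
        chunk: "  hello indexing world ".to_string(),
        metadata: vec![],
    };
    let out = run_pipeline(&steps, node).expect("pipeline failed");
    println!("{out:?}");
}
```

The point of the design is that a step only has to implement one small trait, so custom and built-in steps can be chained interchangeably.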
Other cool things
- Logging and tracing support via tracing; see /examples and the tracing crate for more information
- Indexing pipeline supports closures as well
- Embed either all fields on a node, combine it into a single field or do both
- Store results in memory for debugging and experimentation
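
The closure, embedding-mode, and in-memory storage items above can be sketched with the standard library alone. Everything here (`Node`, `EmbedMode` and its variant names, `embeddables`, `run`, `MemoryStore`) is a hypothetical illustration of the ideas, not the library's actual API.

```rust
/// Hypothetical node with a main chunk and named metadata fields.
#[derive(Debug, Clone)]
struct Node {
    chunk: String,
    metadata: Vec<(String, String)>,
}

/// Hypothetical embedding modes for the three options listed above.
#[derive(Debug, Clone, Copy)]
enum EmbedMode {
    /// Combine metadata and chunk into a single text.
    Combined,
    /// Embed the chunk and each metadata field separately.
    PerField,
    /// Do both.
    Both,
}

/// Returns the (field name, text) pairs that would be handed to an embedder.
fn embeddables(node: &Node, mode: EmbedMode) -> Vec<(String, String)> {
    let combined = || {
        let mut text = String::new();
        for (key, value) in &node.metadata {
            text.push_str(&format!("{key}: {value}\n"));
        }
        text.push_str(&node.chunk);
        ("combined".to_string(), text)
    };
    let per_field = || {
        let mut fields = vec![("chunk".to_string(), node.chunk.clone())];
        fields.extend(node.metadata.iter().cloned());
        fields
    };
    match mode {
        EmbedMode::Combined => vec![combined()],
        EmbedMode::PerField => per_field(),
        EmbedMode::Both => {
            let mut all = vec![combined()];
            all.extend(per_field());
            all
        }
    }
}

/// Any closure from `Node` to `Node` can serve as a pipeline step.
fn run<F>(nodes: Vec<Node>, step: F) -> Vec<Node>
where
    F: Fn(Node) -> Node,
{
    nodes.into_iter().map(step).collect()
}

/// Keeps results in memory so they can be inspected while debugging.
#[derive(Debug, Default)]
struct MemoryStore {
    nodes: Vec<Node>,
}

impl MemoryStore {
    fn insert(&mut self, node: Node) {
        self.nodes.push(node);
    }
}

fn main() {
    let nodes = vec![Node {
        chunk: "fn main() {}".to_string(),
        metadata: vec![("path".to_string(), "src/main.rs".to_string())],
    }];
    // A closure used directly as a step: annotate each node with its length.
    let nodes = run(nodes, |mut n| {
        n.metadata.push(("len".to_string(), n.chunk.len().to_string()));
        n
    });
    let mut store = MemoryStore::default();
    for node in nodes {
        store.insert(node);
    }
    for node in &store.nodes {
        println!("{:?}", embeddables(node, EmbedMode::Both));
    }
}
```

Accepting plain closures keeps one-off steps cheap to write, while an in-memory store lets you inspect what a pipeline produced without standing up a real backend.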
LLMs & Embeddings
Name | Prompting | Embedding | Feature flag | Notes |
---|---|---|---|---|
OpenAI | | | openai | |
AWS Bedrock | | | aws-bedrock | Mistral and Titan models supported |
Groq | | | groq | All major models supported, uses async_openai with Groq’s OpenAI schema under the hood |
FastEmbed | | | fastembed | Uses fastembed.rs under the hood, dense and sparse embedding models supported |
Ollama | | | ollama | Ollama support |
Additional integrations
Name | Feature flag | Notes |
---|---|---|
Qdrant | qdrant | Named vectors also supported. |
Redis | redis | Supports caching and storage |
Spider & htmd | scraping | Scrape websites fast and convert the HTML to Markdown |
Tree-sitter | tree-sitter | Code splitting and various transformers to effectively index code |
Fluvio | fluvio | Load data from Fluvio streams |
LanceDB | lancedb | Store to and retrieve from LanceDB |