Build a RAG Pipeline with AI Embeddings
End-to-end recipe for retrieval-augmented generation.
The RAG flow
1. Chunk your documents into 200–500 token pieces.
2. Generate embeddings for each chunk.
3. Store embeddings in a vector database.
4. At query time, embed the query, find top-K similar chunks, send them as context to your LLM.
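The chunking step in the flow above might look something like the sketch below. It is an illustration only: it counts whitespace-separated words as a rough stand-in for model tokens, and the `chunk_tokens` name, the 300-word default, and the 30-word overlap are assumptions rather than anything this recipe prescribes.

```python
def chunk_tokens(text: str, max_tokens: int = 300, overlap: int = 30) -> list[str]:
    """Split text into chunks of at most max_tokens words.

    Words are used as an approximation of tokens (assumption); a real
    tokenizer will usually produce somewhat more tokens than words.
    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in one of them.
    """
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("need max_tokens > 0 and 0 <= overlap < max_tokens")

    words = text.split()
    step = max_tokens - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        # Stop once this chunk has reached the end of the document.
        if start + max_tokens >= len(words):
            break
    return chunks
```

If you need chunks that respect the 200–500 token range exactly, swap the `text.split()` call for the tokenizer that matches your embedding model.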
Step 1: Embed your corpus
Use our AI Embeddings API with sentence-transformers/all-MiniLM-L6-v2 (384 dimensions, fast) for most cases. Use all-mpnet-base-v2 (768 dimensions) when quality matters more than speed.
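A minimal sketch of embedding a corpus in batches follows. The request format of the AI Embeddings API is not shown in this recipe, so the sketch does not call it: `embed_batch` is a hypothetical callable that you would implement around whatever client you use. The `embed_corpus` name, the batch size of 64, and the dimension check are likewise assumptions.

```python
from typing import Callable, Sequence

Vector = list[float]
# Hypothetical signature: takes a batch of texts, returns one vector per text.
EmbedFn = Callable[[Sequence[str]], list[Vector]]


def embed_corpus(
    chunks: Sequence[str],
    embed_batch: EmbedFn,
    batch_size: int = 64,
    expected_dim: int = 384,  # 384 for all-MiniLM-L6-v2, 768 for all-mpnet-base-v2
) -> list[Vector]:
    """Embed every chunk, batch by batch, and sanity-check the results."""
    vectors: list[Vector] = []
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        out = embed_batch(batch)
        if len(out) != len(batch):
            raise ValueError("embedder returned a different number of vectors than texts")
        for vec in out:
            # Catch a model mix-up early, before anything is written to the index.
            if len(vec) != expected_dim:
                raise ValueError(f"expected {expected_dim} dimensions, got {len(vec)}")
        vectors.extend(out)
    return vectors
```

For local experiments, one option is to back `embed_batch` with the open-source sentence-transformers library, for example `lambda batch: SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").encode(list(batch)).tolist()` with the model loaded once outside the lambda.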
Step 2: Store in a vector DB
We don't host a vector DB. Pinecone, Weaviate, Qdrant, and pgvector all work well. Pick the one that fits your existing stack, for example pgvector if you already run Postgres.
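To make the storage step concrete without committing to one product, here is a tiny in-memory stand-in for a vector store. It is not the API of Pinecone, Weaviate, Qdrant, or pgvector; the `InMemoryVectorStore` class and its `upsert` and `query` methods are hypothetical names that only illustrate what any of those systems does for you: keep an ID, a vector, and the chunk text, then rank by similarity.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class InMemoryVectorStore:
    """Illustrative stand-in for a real vector DB (brute-force search)."""
    records: dict[str, tuple[list[float], str]] = field(default_factory=dict)

    def upsert(self, chunk_id: str, vector: list[float], text: str) -> None:
        # Re-using an ID overwrites the old record, as most vector DBs do.
        self.records[chunk_id] = (vector, text)

    def query(self, vector: list[float], top_k: int = 5) -> list[tuple[str, float, str]]:
        """Return (chunk_id, score, text) for the top_k most similar records."""
        scored = [
            (chunk_id, cosine(vector, stored), text)
            for chunk_id, (stored, text) in self.records.items()
        ]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:top_k]
```

A brute-force scan like this is fine for a few thousand chunks during prototyping. A real vector DB replaces it with an approximate nearest-neighbor index and persistent storage.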
Step 3: Query-time retrieval
Embed the user's question with the same model you used in Step 1, since vectors from different models are not comparable. Query your vector DB for the 5 nearest chunks, then pass them to the Multi-LLM Router as context.
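The query-time step can be sketched as below. Both `embed_query` and `search` are hypothetical callables standing in for your embedding client and your vector DB client, and the prompt wording is only one possible layout. The sketch stops at building the prompt string, because the Multi-LLM Router's request format is not covered in this recipe.

```python
from typing import Callable, Sequence


def build_rag_prompt(question: str, chunks: Sequence[str]) -> str:
    """Number the retrieved chunks and place them ahead of the question."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def retrieve_and_build_prompt(
    question: str,
    embed_query: Callable[[str], list[float]],            # must use the Step 1 model
    search: Callable[[list[float], int], Sequence[str]],  # returns chunk texts, best first
    top_k: int = 5,
) -> str:
    """Embed the question, fetch the top_k chunks, and assemble the prompt."""
    query_vector = embed_query(question)
    chunks = search(query_vector, top_k)
    return build_rag_prompt(question, chunks)
```

Numbering the chunks makes it possible to ask the LLM to cite which passage it relied on, which helps when debugging retrieval quality.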