AI Recipes · 15 min read

Build a RAG Pipeline with AI Embeddings

End-to-end recipe for retrieval-augmented generation.

The RAG flow

1. Chunk your documents into 200–500 token pieces.
2. Generate embeddings for each chunk.
3. Store embeddings in a vector database.
4. At query time, embed the query, find top-K similar chunks, send them as context to your LLM.
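The chunking step above can be sketched as follows. This is a minimal illustration, not part of the recipe: the function names are hypothetical, whitespace-separated words stand in for model tokens (a real tokenizer will count differently), and the optional overlap between neighbouring chunks is an added assumption.

```python
def chunk_tokens(tokens, max_tokens=400, overlap=0):
    """Split a token list into windows of at most max_tokens tokens.

    Consecutive windows share `overlap` tokens (0 means no overlap).
    """
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("need max_tokens > 0 and 0 <= overlap < max_tokens")
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        # Stop once a window reaches the end, so no redundant tail chunk is emitted.
        if start + max_tokens >= len(tokens):
            break
    return chunks


def chunk_text(text, max_tokens=400, overlap=0):
    """Chunk raw text, using whitespace words as a rough proxy for tokens (assumption)."""
    words = text.split()
    return [" ".join(c) for c in chunk_tokens(words, max_tokens, overlap)]
```

For example, chunk_text(document, max_tokens=400, overlap=50) yields pieces inside the 200–500 range suggested above, except that the final piece may be shorter.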

Step 1: Embed your corpus

Use our AI Embeddings API with sentence-transformers/all-MiniLM-L6-v2 (384 dimensions, fast) for most cases. Use sentence-transformers/all-mpnet-base-v2 (768 dimensions) when quality matters more than speed.
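A minimal sketch of embedding a corpus in batches is shown here. The request format of the AI Embeddings API is not described in this recipe, so the call is represented by a hypothetical embed_fn(texts, model) callable that you would implement against the actual API; embed_corpus, MODEL_DIMS, and the batch size of 64 are likewise illustrative assumptions. The dimension check uses the sizes quoted above.

```python
from typing import Callable, List, Sequence

# Vector sizes for the two models mentioned above.
MODEL_DIMS = {
    "sentence-transformers/all-MiniLM-L6-v2": 384,
    "sentence-transformers/all-mpnet-base-v2": 768,
}


def embed_corpus(
    chunks: Sequence[str],
    embed_fn: Callable[[List[str], str], List[List[float]]],  # hypothetical API wrapper
    model: str,
    batch_size: int = 64,  # assumed value; tune to the API's limits
) -> List[List[float]]:
    """Embed chunks in batches and verify each vector has the expected dimension."""
    expected = MODEL_DIMS[model]
    vectors: List[List[float]] = []
    for i in range(0, len(chunks), batch_size):
        batch = list(chunks[i:i + batch_size])
        out = embed_fn(batch, model)
        if len(out) != len(batch):
            raise ValueError("embedding count does not match batch size")
        for vec in out:
            if len(vec) != expected:
                raise ValueError(f"expected {expected} dimensions, got {len(vec)}")
        vectors.extend(out)
    return vectors
```

Validating the dimension at embedding time tends to surface a model mismatch early, before vectors reach the database.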

Step 2: Store in a vector DB

We don't host a vector DB. Pinecone, Weaviate, Qdrant, and pgvector all work well, so pick based on your stack. Create the index or column with the same dimension as your embedding model (384 or 768).
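Whichever database you choose, each chunk is typically stored as an ID, a vector, and some metadata. The sketch below prepares such records in a database-neutral form; VectorRecord, make_records, and the metadata field names are hypothetical, and the actual insert or upsert call depends on your database's client, which is not shown here.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence


@dataclass
class VectorRecord:
    id: str
    vector: List[float]
    metadata: Dict[str, Any]


def make_records(
    doc_id: str,
    chunks: Sequence[str],
    vectors: Sequence[Sequence[float]],
) -> List[VectorRecord]:
    """Pair each chunk with its embedding and a deterministic ID."""
    if len(chunks) != len(vectors):
        raise ValueError("chunks and vectors must have the same length")
    records = []
    for idx, (text, vec) in enumerate(zip(chunks, vectors)):
        # Deterministic ID: re-ingesting the same document overwrites rather than duplicates.
        rid = hashlib.sha1(f"{doc_id}:{idx}".encode("utf-8")).hexdigest()[:16]
        records.append(
            VectorRecord(
                id=rid,
                vector=list(vec),
                # Keep the chunk text so it can be returned as LLM context at query time.
                metadata={"doc_id": doc_id, "chunk_index": idx, "text": text},
            )
        )
    return records
```

Storing the chunk text alongside the vector is one common design choice; keeping it in a separate store keyed by ID works as well.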

Step 3: Query-time retrieval

Embed the user's question with the same model used in step 1. Query your vector DB for the 5 nearest chunks (K = 5). Pass them to the Multi-LLM Router as context.
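To make the retrieval step concrete, here is a small in-memory sketch of what the vector DB does for you, followed by one way to assemble the retrieved chunks into a prompt. It assumes cosine similarity as the distance metric and an illustrative prompt wording; top_k and build_prompt are hypothetical helpers, and the call to the Multi-LLM Router itself is omitted because its request format is not covered in this recipe.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(
    query_vec: Sequence[float],
    records: Sequence[Tuple[str, Sequence[float]]],  # (chunk text, chunk vector)
    k: int = 5,
) -> List[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(records, key=lambda r: cosine(query_vec, r[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, chunks: Sequence[str]) -> str:
    """Number the retrieved chunks and place them ahead of the question."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

In production the sorted scan is replaced by your vector DB's nearest-neighbour query; only the prompt assembly carries over unchanged.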