Build a Multi-Document RAG System with Embeddings + Claude

So you've played with basic RAG (Retrieval-Augmented Generation), but now you're ready to level up. In this tutorial, you’ll learn how to build a smarter RAG system that can:

  • 💾 Ingest multiple documents (PDF or TXT)
  • 🔍 Use embeddings for fast, relevant search
  • 🤖 Generate responses using Claude based on the most relevant content

This is perfect for anyone creating a smart knowledge assistant, internal wiki search, or custom Q&A bot.

🎯 What We’ll Use

Tool         Purpose
----         -------
Python       Core logic
Streamlit    User interface
Claude API   Answer generation
OpenAI API   Embeddings
FAISS        Efficient vector search
PyMuPDF      PDF text extraction
---

📦 Step 1: Install the Tools

pip install streamlit faiss-cpu "openai<1.0" python-dotenv PyMuPDF anthropic tiktoken

Note: the embedding code below uses the legacy openai.Embedding API, which was removed in openai 1.0, so pin a pre-1.0 release.

๐Ÿ” Create a .env file with:

OPENAI_API_KEY=your-openai-api-key
ANTHROPIC_API_KEY=your-claude-api-key
---

🧠 Step 2: Define Helper Functions

import os
import fitz  # PyMuPDF
import faiss
import numpy as np
import openai
import tiktoken
from dotenv import load_dotenv

load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

# Split text into overlapping word-based chunks
def split_text(text, chunk_size=500, overlap=100):
    tokens = text.split()
    chunks = []
    for i in range(0, len(tokens), chunk_size - overlap):
        chunk = " ".join(tokens[i:i + chunk_size])
        chunks.append(chunk)
    return chunks
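
# Note: split_text counts whitespace-separated words, which only
# approximates model tokens. Since tiktoken is already installed, a
# token-accurate variant could look like this (a sketch; the cl100k_base
# encoding is an assumption, not something this tutorial pins down).
def split_text_by_tokens(text, chunk_size=500, overlap=100):
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks = []
    for i in range(0, len(tokens), chunk_size - overlap):
        chunks.append(enc.decode(tokens[i:i + chunk_size]))
    return chunks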

# PDF extraction
def extract_text_from_pdf(pdf_file):
    doc = fitz.open(stream=pdf_file.read(), filetype="pdf")
    return "\n".join(page.get_text() for page in doc)

# Get an embedding for one text (uses the legacy openai<1.0 SDK)
def get_embedding(text):
    response = openai.Embedding.create(
        input=[text],
        model="text-embedding-ada-002"
    )
    return response["data"][0]["embedding"]
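
# Note: calling get_embedding per chunk means one HTTP round trip per
# chunk. The embeddings endpoint also accepts a list of inputs, so a
# batched helper speeds up indexing (a sketch; batch_size=50 is arbitrary).
def get_embeddings(texts, batch_size=50):
    vectors = []
    for i in range(0, len(texts), batch_size):
        response = openai.Embedding.create(
            input=texts[i:i + batch_size],
            model="text-embedding-ada-002"
        )
        vectors.extend(item["embedding"] for item in response["data"])
    return vectors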

# Build FAISS index
def build_index(chunks):
    dim = len(get_embedding("sample"))
    index = faiss.IndexFlatL2(dim)
    metadata = []
    vectors = []
    for chunk in chunks:
        embedding = get_embedding(chunk)
        vectors.append(embedding)
        metadata.append(chunk)
    index.add(np.array(vectors).astype("float32"))
    return index, metadata
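
Since indexing costs an API call per chunk, it's worth persisting the result. FAISS can write an index straight to disk; here is a minimal save/load sketch (the file paths are placeholders of my choosing):

# Persist and reload the index so repeat runs skip re-embedding (sketch)
import json

def save_index(index, metadata, path="index.faiss", meta_path="meta.json"):
    faiss.write_index(index, path)
    with open(meta_path, "w") as fh:
        json.dump(metadata, fh)

def load_index(path="index.faiss", meta_path="meta.json"):
    with open(meta_path) as fh:
        metadata = json.load(fh)
    return faiss.read_index(path), metadata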
---

🖼️ Step 3: Create the Streamlit App

import streamlit as st
import numpy as np
import requests

# Claude API call
def ask_claude(question, context):
    prompt = f"""
You are a helpful assistant. Use the following context to answer the question:

Context:
{context}

Question:
{question}
"""
    headers = {
        "x-api-key": os.getenv("ANTHROPIC_API_KEY"),
        "anthropic-version": "2023-06-01",
        "Content-Type": "application/json"
    }
    body = {
        "model": "claude-3-haiku-20240307",
        "temperature": 0.6,
        "max_tokens": 800,
        "messages": [{"role": "user", "content": prompt}]
    }
    res = requests.post("https://api.anthropic.com/v1/messages", headers=headers, json=body)
    res.raise_for_status()
    # The Messages API returns a list of content blocks; take the first text block
    return res.json()["content"][0]["text"]
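
# Alternative: the anthropic package installed in Step 1 wraps this same
# endpoint, so you can skip the hand-rolled HTTP call. A sketch using the
# official SDK, assuming the same model and environment setup:
import anthropic

def ask_claude_sdk(question, context):
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    message = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=800,
        temperature=0.6,
        messages=[{"role": "user",
                   "content": f"Context:\n{context}\n\nQuestion:\n{question}"}],
    )
    return message.content[0].text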

st.title("📂 Multi-Doc RAG System with Claude")

uploaded_files = st.file_uploader("Upload PDF or TXT files", type=["pdf", "txt"], accept_multiple_files=True)
query = st.text_input("What would you like to ask?")

if uploaded_files and query:
    all_chunks = []

    for f in uploaded_files:
        if f.name.endswith(".pdf"):
            text = extract_text_from_pdf(f)
        else:
            text = f.read().decode("utf-8")

        chunks = split_text(text)
        all_chunks.extend(chunks)

    index, metadata = build_index(all_chunks)
    query_embedding = np.array([get_embedding(query)]).astype("float32")
    scores, indices = index.search(query_embedding, k=3)

    relevant_chunks = [metadata[i] for i in indices[0]]
    context = "\n---\n".join(relevant_chunks)
    response = ask_claude(query, context)

    st.subheader("🤖 Claude's Answer:")
    st.markdown(response)
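
One caveat: Streamlit reruns the whole script on every interaction, so this version rebuilds the index (and re-embeds every chunk) each time the question changes. Caching the index in st.session_state, keyed on the uploaded filenames, avoids that. A minimal sketch; the keying scheme is my own choice:

# Cache the index across reruns so a new question doesn't re-embed (sketch)
file_key = tuple(sorted(f.name for f in uploaded_files))
if st.session_state.get("file_key") != file_key:
    st.session_state["file_key"] = file_key
    st.session_state["index_meta"] = build_index(all_chunks)
index, metadata = st.session_state["index_meta"]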
---

🚀 Step 4: Run the App

streamlit run app.py

🧪 Test It With:

  • Company policies, SOPs, or handbooks
  • Research papers or manuals
  • Multiple PDFs of notes or project docs
---

🔧 Next-Level Ideas

  • 🔗 Show source filenames or page numbers in the response (see the sketch after this list)
  • 📸 Add thumbnail previews of each document
  • 🧵 Save conversation history
  • 🧠 Add a "re-ask" button to refine questions
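
For the first idea, the trick is to carry the filename along with each chunk instead of storing bare strings. Here's a sketch of how the Step 3 ingestion loop might change (a hypothetical structure, not part of the code above):

# Keep (source, text) pairs so answers can cite their file (sketch)
all_chunks = []
for f in uploaded_files:
    text = extract_text_from_pdf(f) if f.name.endswith(".pdf") else f.read().decode("utf-8")
    for chunk in split_text(text):
        all_chunks.append({"source": f.name, "text": chunk})

# build_index would then embed chunk["text"] and keep the whole dict as
# metadata, letting the context show its origin:
# context = "\n---\n".join(f"[{c['source']}]\n{c['text']}" for c in relevant_chunks)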
---

✅ You Did It!

You now have a working multi-document RAG system using embeddings and Claude. This architecture is perfect for internal knowledge tools, smart wikis, chatbots, and assistants that “know your stuff.”

✨ Stay tuned for a follow-up tutorial: “How to Deploy Your Claude RAG Assistant to the Web”
