How to Build a RAG Pipeline with LangChain and Supabase pgvector in 2025
Prerequisites Checklist
Before writing a single line of Python, make sure your environment matches these requirements exactly. Mismatched versions are the #1 source of cryptic errors with this stack.
- [ ] Python 3.11+ — LangChain 0.2 uses Pydantic v2 internally; Python 3.10 works but 3.11 gives better async performance
- [ ] Node.js 20+ — Only needed if you use the Supabase CLI for local development; optional for a pure Python workflow
- [ ] Supabase account — Free tier is fine for development; sign up at supabase.com
- [ ] OpenAI API key —
text-embedding-3-smallortext-embedding-ada-002for embeddings;gpt-4ofor generation - [ ] LangChain 0.2+ — Specifically
langchain>=0.2.0,langchain-community>=0.2.0, andlangchain-openai>=0.1.0 - [ ] pip 23+ and a virtual environment (
python -m venv .venv)
Estimated time: 35–45 minutes for a working end-to-end pipeline.
Project folder structure
Keep your project organized from the start. Here's the layout we'll build toward:
rag-supabase/
├── .env # API keys — never commit this
├── .env.example # Safe template to commit
├── requirements.txt
├── ingest.py # Step 3: document ingestion
├── chain.py # Step 4: RAG chain definition
├── api.py # Step 5: FastAPI endpoint
├── docs/ # Source documents (PDFs, TXTs)
│ └── sample.pdf
└── supabase/
└── migrations/
└── 001_pgvector.sql # Step 1: database migration
Step 1: Set Up Supabase Project and Enable pgvector
Supabase ships with the pgvector extension pre-installed but not activated. You need to enable it and create a table with a vector column before LangChain can store or query anything. Getting the SQL schema right up front saves you from painful migration headaches later.
Create a new Supabase project
Go to supabase.com/dashboard, click New Project, choose a region close to your users, and set a strong database password. Copy the Project URL and service_role key from Settings → API — you'll need both shortly.
Enable the pgvector extension via SQL editor
Navigate to SQL Editor in your Supabase dashboard and run the full migration below. Save a copy as supabase/migrations/001_pgvector.sql:
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Documents table with 1536-dimension embeddings (text-embedding-ada-002 / text-embedding-3-small)
CREATE TABLE IF NOT EXISTS documents (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
content text NOT NULL,
embedding vector(1536),
metadata jsonb DEFAULT '{}'
);
-- Index for fast approximate nearest-neighbor search
CREATE INDEX IF NOT EXISTS documents_embedding_idx
ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- RPC function that LangChain's SupabaseVectorStore calls internally
CREATE OR REPLACE FUNCTION match_documents(
query_embedding vector(1536),
match_count int DEFAULT 5,
filter jsonb DEFAULT '{}'
)
RETURNS TABLE (
id uuid,
content text,
metadata jsonb,
similarity float
)
LANGUAGE plpgsql
AS $$
BEGIN
RETURN QUERY
SELECT
d.id,
d.content,
d.metadata,
1 - (d.embedding <=> query_embedding) AS similarity
FROM documents d
WHERE d.metadata @> filter
ORDER BY d.embedding <=> query_embedding
LIMIT match_count;
END;
$$;
Note: The
<=>operator computes cosine distance.1 - distanceconverts it to cosine similarity (higher = more similar). Theivfflatindex dramatically speeds up queries on large datasets, but requires at least a few hundred rows before it outperforms a sequential scan.
If you plan to use text-embedding-3-large (3072 dimensions) instead, change every vector(1536) to vector(3072) now — you cannot alter a vector column's dimension after data is inserted.
Step 2: Install and Configure LangChain with Supabase
LangChain 0.2 split its codebase into multiple packages to reduce dependency bloat. Installing the right combination matters — the old monolithic langchain package alone won't give you SupabaseVectorStore.
Install Python dependencies with pip
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install \
langchain>=0.2.0 \
langchain-community>=0.2.0 \
langchain-openai>=0.1.0 \
supabase>=2.4.0 \
python-dotenv>=1.0.0 \
pypdf>=4.0.0 \
fastapi>=0.111.0 \
uvicorn[standard]>=0.29.0 \
pydantic>=2.0.0
pip freeze > requirements.txt
Configure environment variables securely
Create a .env file at the project root. Never hardcode secrets in source files:
# .env
SUPABASE_URL=https://your-project-ref.supabase.co
SUPABASE_SERVICE_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9... # service_role key
OPENAI_API_KEY=sk-...
Initialize SupabaseVectorStore in LangChain
Create chain.py and add the initialization logic that every other module will import:
# chain.py
import os
from dotenv import load_dotenv
from supabase import create_client, Client
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import SupabaseVectorStore
load_dotenv()
def get_vector_store() -> SupabaseVectorStore:
supabase: Client = create_client(
os.environ["SUPABASE_URL"],
os.environ["SUPABASE_SERVICE_KEY"], # Must be service_role, not anon key
)
embeddings = OpenAIEmbeddings(
model="text-embedding-3-small", # 1536 dims, matches our SQL schema
openai_api_key=os.environ["OPENAI_API_KEY"],
)
vector_store = SupabaseVectorStore(
client=supabase,
embedding=embeddings,
table_name="documents",
query_name="match_documents", # Must match the SQL function name exactly
)
return vector_store
Note: Use the
service_rolekey here, not theanonkey. The anon key is subject to Row Level Security (RLS) policies. If RLS is enabled on yourdocumentstable without a permissive policy, every query will silently return zero results — one of the most confusing bugs in this stack.
Step 3: Ingest and Chunk Documents into the Vector Store
Raw documents are too large to embed as a whole — most embedding models cap at 8,192 tokens. Chunking splits documents into overlapping segments so context isn't lost at chunk boundaries, and each chunk gets its own embedding vector stored in Supabase.
Load and chunk documents
# ingest.py
import os
import sys
import time
from pathlib import Path
from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from chain import get_vector_store
load_dotenv()
def ingest_file(file_path: str) -> int:
path = Path(file_path)
if path.suffix.lower() == ".pdf":
loader = PyPDFLoader(str(path))
else:
loader = TextLoader(str(path), encoding="utf-8")
print(f"Loading {path.name}...")
documents = loader.load()
splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200,
length_function=len,
separators=["\n\n", "\n", ". ", " ", ""],
)
chunks = splitter.split_documents(documents)
print(f"Split into {len(chunks)} chunks")
# Attach source metadata for idempotent upsert tracking
for i, chunk in enumerate(chunks):
chunk.metadata["source"] = str(path)
chunk.metadata["chunk_index"] = i
# Batch upserts in groups of 50 to avoid OpenAI rate limits
vector_store = get_vector_store()
batch_size = 50
total_inserted = 0
for i in range(0, len(chunks), batch_size):
batch = chunks[i : i + batch_size]
vector_store.add_documents(batch)
total_inserted += len(batch)
print(f" Upserted {total_inserted}/{len(chunks)} chunks")
if i + batch_size < len(chunks):
time.sleep(0.5) # Respect OpenAI embeddings rate limit
return total_inserted
if __name__ == "__main__":
for file_path in sys.argv[1:]:
count = ingest_file(file_path)
print(f"Done: {count} chunks ingested from {file_path}")
Run it against a document:
python ingest.py docs/sample.pdf
Performance tip: The OpenAI
text-embedding-3-smallendpoint allows up to 1,000,000 tokens per minute on tier-2 accounts. Batching 50 chunks at a time with a 500ms pause keeps you comfortably within limits for most documents. For bulk ingestion of thousands of pages, increase batch size and add exponential backoff onRateLimitError.
Step 4: Build the Retrieval-Augmented Generation Chain
With chunks indexed in Supabase, you can now wire up the actual RAG chain. The retriever performs a vector similarity search to fetch the top-k relevant chunks, which are injected into the LLM's context window alongside the user's question.
# chain.py (continued — add below get_vector_store)
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
def build_rag_chain() -> ConversationalRetrievalChain:
vector_store = get_vector_store()
retriever = vector_store.as_retriever(
search_type="similarity",
search_kwargs={"k": 5}, # Retrieve top 5 most similar chunks
)
llm = ChatOpenAI(
model="gpt-4o",
temperature=0.2, # Lower = more deterministic, better for Q&A
streaming=True,
openai_api_key=os.environ["OPENAI_API_KEY"],
)
memory = ConversationBufferMemory(
memory_key="chat_history",
return_messages=True,
output_key="answer",
)
chain = ConversationalRetrievalChain.from_llm(
llm=llm,
retriever=retriever,
memory=memory,
return_source_documents=True, # Useful for citation UI
verbose=False,
)
return chain
Test the chain interactively before wiring it to an API:
# Quick smoke test — run with: python -c "from chain import build_rag_chain; ..."
import asyncio
from chain import build_rag_chain
chain = build_rag_chain()
result = chain.invoke({"question": "What is the main topic of the document?"})
print(result["answer"])
print("\nSources:")
for doc in result["source_documents"]:
print(f" - {doc.metadata.get('source', 'unknown')} (chunk {doc.metadata.get('chunk_index', '?')})")
Note:
ConversationBufferMemorystores the entire conversation in memory, which is fine for development. In production, swap it forConversationSummaryBufferMemoryor store chat history in Supabase itself (thepostgres_chat_message_historyintegration fromlangchain_communityworks perfectly here).
Step 5: Expose the RAG Chain via a FastAPI Endpoint
A working chain in a Python script is great for testing, but your frontend needs an HTTP interface. FastAPI gives you async support, automatic OpenAPI docs, and Pydantic validation with minimal boilerplate.
Scaffold the FastAPI app
# api.py
import json
import asyncio
from typing import List, Tuple
from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from chain import build_rag_chain
app = FastAPI(title="RAG API", version="1.0.0")
# Build chain once at startup to reuse the same memory/connection
rag_chain = build_rag_chain()
class QueryRequest(BaseModel):
question: str
chat_history: List[Tuple[str, str]] = [] # List of (human, ai) tuples
class SourceDoc(BaseModel):
content: str
source: str
async def generate_stream(question: str, chat_history: list):
"""Async generator that yields JSON chunks for StreamingResponse."""
try:
# ConversationalRetrievalChain is synchronous; run in thread pool
loop = asyncio.get_event_loop()
result = await loop.run_in_executor(
None,
lambda: rag_chain.invoke({
"question": question,
"chat_history": chat_history,
})
)
answer = result.get("answer", "")
sources = [
{"content": doc.page_content[:200], "source": doc.metadata.get("source", "")}
for doc in result.get("source_documents", [])
]
# Stream the answer word by word for a typing effect
words = answer.split(" ")
for i, word in enumerate(words):
chunk = {"type": "token", "data": word + (" " if i < len(words) - 1 else "")}
yield json.dumps(chunk) + "\n"
await asyncio.sleep(0.01)
# Send sources as final chunk
yield json.dumps({"type": "sources", "data": sources}) + "\n"
except Exception as e:
yield json.dumps({"type": "error", "data": str(e)}) + "\n"
@app.post("/query")
async def query_documents(request: QueryRequest):
if not request.question.strip():
raise HTTPException(status_code=400, detail="Question cannot be empty")
return StreamingResponse(
generate_stream(request.question, request.chat_history),
media_type="application/x-ndjson",
)
@app.get("/health")
async def health():
return {"status": "ok"}
Start the server and test with curl
uvicorn api:app --reload --port 8000
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{"question": "Summarize the key points of the document", "chat_history": []}' \
--no-buffer
You'll see newline-delimited JSON chunks streaming back. To verify the Supabase side, open your Supabase dashboard → Logs → Postgres — you'll see the match_documents RPC being called with timing information.
Common Issues and Fixes
| Error / Symptom | Root Cause | Fix |
|---|---|---|
| operator does not exist: vector = unknown | pgvector extension not enabled, or casting issue in SQL | Run CREATE EXTENSION IF NOT EXISTS vector; and ensure the function signature uses vector(1536) not plain vector |
| Embeddings dimension mismatch | You changed embedding models after data was inserted (e.g., ada-002→3-small both happen to be 1536, but ada-002→3-large is 1536→3072) | Drop and recreate the table with the correct dimension; update vector(N) everywhere in SQL |
| Queries return 0 results silently | Supabase RLS is blocking the match_documents function | Use SUPABASE_SERVICE_KEY (service_role), not the anon key, OR add ALTER TABLE documents ENABLE ROW LEVEL SECURITY with a permissive policy |
| LangChain deprecation warnings in 0.2+ | Import paths moved from langchain.vectorstores to langchain_community.vectorstores | Update all imports: from langchain_community.vectorstores import SupabaseVectorStore and from langchain_openai import ChatOpenAI, OpenAIEmbeddings |
Error: operator does not exist vector = unknown
This usually means the SQL function was created before the vector extension was enabled, or in a different schema. Fix:
-- Verify extension is active
SELECT * FROM pg_extension WHERE extname = 'vector';
-- If missing, enable it
CREATE EXTENSION IF NOT EXISTS vector SCHEMA public;
-- Then re-run the CREATE FUNCTION block from Step 1
Embeddings dimension mismatch after model change
If you switch from text-embedding-ada-002 to text-embedding-3-large, the stored vector(1536) columns become incompatible with 3072-dimensional outputs. There's no ALTER shortcut:
-- Nuclear option: drop and recreate
DROP TABLE IF EXISTS documents CASCADE;
-- Then re-run the full migration with vector(3072)
-- Then re-run ingest.py on all your source documents
Supabase RLS blocking vector queries
If your documents table has RLS enabled and you're using the anon key, every match_documents() call returns zero rows with no error. The fix is either using the service_role key (safest for server-side code) or explicitly granting execute on the function:
-- Option A: Grant to authenticated role
GRANT EXECUTE ON FUNCTION match_documents TO authenticated;
-- Option B: Disable RLS on documents (development only)
ALTER TABLE documents DISABLE ROW LEVEL SECURITY;
LangChain deprecation warnings in 0.2+
LangChain 0.2 moved community integrations to separate packages. Update every import:
# OLD (pre-0.2) — raises deprecation warning
from langchain.vectorstores import SupabaseVectorStore
from langchain.chat_models import ChatOpenAI
# NEW (0.2+) — correct
from langchain_community.vectorstores import SupabaseVectorStore
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
FAQ
Q: Can I use a local embedding model instead of OpenAI?
Yes — HuggingFaceEmbeddings from langchain_community is a drop-in replacement. Install pip install sentence-transformers and swap the embeddings object:
from langchain_community.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(
model_name="sentence-transformers/all-MiniLM-L6-v2", # 384 dims
)
Critically: all-MiniLM-L6-v2 produces 384-dimensional vectors, so you must change vector(1536) to vector(384) in your SQL schema and recreate the table before ingesting. Pick your model before you ingest any data.
Q: How do I keep embeddings in sync when source documents update?
The cleanest approach is an idempotent upsert strategy using a content hash stored in metadata. When re-ingesting a document, compute hashlib.md5(content.encode()).hexdigest() for each chunk and store it as metadata["content_hash"]. Before upserting, query Supabase for existing rows with that hash — if the hash matches, skip the chunk; if it differs (document updated), delete the old row and insert the new one. For deletions, filter by metadata->>'source' and delete all rows where the source file no longer exists on disk.
Q: Is this approach production-ready on Supabase free tier?
For low-traffic internal tools, yes. The free tier gives you a 500MB PostgreSQL database, which holds roughly 250,000 embeddings at 1536 dimensions (each vector is ~6KB on disk). The bigger constraint is the free tier's database pause after 1 week of inactivity and row limits on Edge Functions if you're using them for auth. For a production app with real users, upgrade to the Pro plan ($25/month) which removes the pause and gives you 8GB of database storage — comfortably supporting millions of embedded chunks.