How to Build a RAG Pipeline with LangChain and Supabase pgvector in 2025

How-To Guide·Jun 18, 2026·12 min read

Prerequisites Checklist

Before writing a single line of Python, make sure your environment matches these requirements exactly. Mismatched versions are the #1 source of cryptic errors with this stack.

[ ] Python 3.11+ — LangChain 0.2 uses Pydantic v2 internally; Python 3.10 works but 3.11 gives better async performance
[ ] Node.js 20+ — Only needed if you use the Supabase CLI for local development; optional for a pure Python workflow
[ ] Supabase account — Free tier is fine for development; sign up at supabase.com
[ ] OpenAI API key — text-embedding-3-small or text-embedding-ada-002 for embeddings; gpt-4o for generation
[ ] LangChain 0.2+ — Specifically langchain>=0.2.0, langchain-community>=0.2.0, and langchain-openai>=0.1.0
[ ] pip 23+ and a virtual environment (python -m venv .venv)

Estimated time: 35–45 minutes for a working end-to-end pipeline.

Project folder structure

Keep your project organized from the start. Here's the layout we'll build toward:

rag-supabase/
├── .env                    # API keys — never commit this
├── .env.example            # Safe template to commit
├── requirements.txt
├── ingest.py               # Step 3: document ingestion
├── chain.py                # Step 4: RAG chain definition
├── api.py                  # Step 5: FastAPI endpoint
├── docs/                   # Source documents (PDFs, TXTs)
│   └── sample.pdf
└── supabase/
    └── migrations/
        └── 001_pgvector.sql  # Step 1: database migration

Step 1: Set Up Supabase Project and Enable pgvector

Supabase ships with the pgvector extension pre-installed but not activated. You need to enable it and create a table with a vector column before LangChain can store or query anything. Getting the SQL schema right up front saves you from painful migration headaches later.

Create a new Supabase project

Go to supabase.com/dashboard, click New Project, choose a region close to your users, and set a strong database password. Copy the Project URL and service_role key from Settings → API — you'll need both shortly.

Enable the pgvector extension via SQL editor

Navigate to SQL Editor in your Supabase dashboard and run the full migration below. Save a copy as supabase/migrations/001_pgvector.sql:

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Documents table with 1536-dimension embeddings (text-embedding-ada-002 / text-embedding-3-small)
CREATE TABLE IF NOT EXISTS documents (
  id        uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  content   text NOT NULL,
  embedding vector(1536),
  metadata  jsonb DEFAULT '{}'
);

-- Index for fast approximate nearest-neighbor search
CREATE INDEX IF NOT EXISTS documents_embedding_idx
  ON documents
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

-- RPC function that LangChain's SupabaseVectorStore calls internally
CREATE OR REPLACE FUNCTION match_documents(
  query_embedding  vector(1536),
  match_count      int     DEFAULT 5,
  filter           jsonb   DEFAULT '{}'
)
RETURNS TABLE (
  id         uuid,
  content    text,
  metadata   jsonb,
  similarity float
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    d.id,
    d.content,
    d.metadata,
    1 - (d.embedding <=> query_embedding) AS similarity
  FROM documents d
  WHERE d.metadata @> filter
  ORDER BY d.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;

Note: The <=> operator computes cosine distance. 1 - distance converts it to cosine similarity (higher = more similar). The ivfflat index dramatically speeds up queries on large datasets, but requires at least a few hundred rows before it outperforms a sequential scan.

If you plan to use text-embedding-3-large (3072 dimensions) instead, change every vector(1536) to vector(3072) now — you cannot alter a vector column's dimension after data is inserted.

Step 2: Install and Configure LangChain with Supabase

LangChain 0.2 split its codebase into multiple packages to reduce dependency bloat. Installing the right combination matters — the old monolithic langchain package alone won't give you SupabaseVectorStore.

Install Python dependencies with pip

python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

pip install \
  langchain>=0.2.0 \
  langchain-community>=0.2.0 \
  langchain-openai>=0.1.0 \
  supabase>=2.4.0 \
  python-dotenv>=1.0.0 \
  pypdf>=4.0.0 \
  fastapi>=0.111.0 \
  uvicorn[standard]>=0.29.0 \
  pydantic>=2.0.0

pip freeze > requirements.txt

Configure environment variables securely

Create a .env file at the project root. Never hardcode secrets in source files:

# .env
SUPABASE_URL=https://your-project-ref.supabase.co
SUPABASE_SERVICE_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...  # service_role key
OPENAI_API_KEY=sk-...

Initialize SupabaseVectorStore in LangChain

Create chain.py and add the initialization logic that every other module will import:

# chain.py
import os
from dotenv import load_dotenv
from supabase import create_client, Client
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import SupabaseVectorStore

load_dotenv()

def get_vector_store() -> SupabaseVectorStore:
    supabase: Client = create_client(
        os.environ["SUPABASE_URL"],
        os.environ["SUPABASE_SERVICE_KEY"],  # Must be service_role, not anon key
    )

    embeddings = OpenAIEmbeddings(
        model="text-embedding-3-small",  # 1536 dims, matches our SQL schema
        openai_api_key=os.environ["OPENAI_API_KEY"],
    )

    vector_store = SupabaseVectorStore(
        client=supabase,
        embedding=embeddings,
        table_name="documents",
        query_name="match_documents",  # Must match the SQL function name exactly
    )
    return vector_store

Note: Use the service_role key here, not the anon key. The anon key is subject to Row Level Security (RLS) policies. If RLS is enabled on your documents table without a permissive policy, every query will silently return zero results — one of the most confusing bugs in this stack.

Step 3: Ingest and Chunk Documents into the Vector Store

Raw documents are too large to embed as a whole — most embedding models cap at 8,192 tokens. Chunking splits documents into overlapping segments so context isn't lost at chunk boundaries, and each chunk gets its own embedding vector stored in Supabase.

Load and chunk documents

# ingest.py
import os
import sys
import time
from pathlib import Path
from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from chain import get_vector_store

load_dotenv()

def ingest_file(file_path: str) -> int:
    path = Path(file_path)
    if path.suffix.lower() == ".pdf":
        loader = PyPDFLoader(str(path))
    else:
        loader = TextLoader(str(path), encoding="utf-8")

    print(f"Loading {path.name}...")
    documents = loader.load()

    splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len,
        separators=["\n\n", "\n", ". ", " ", ""],
    )
    chunks = splitter.split_documents(documents)
    print(f"Split into {len(chunks)} chunks")

    # Attach source metadata for idempotent upsert tracking
    for i, chunk in enumerate(chunks):
        chunk.metadata["source"] = str(path)
        chunk.metadata["chunk_index"] = i

    # Batch upserts in groups of 50 to avoid OpenAI rate limits
    vector_store = get_vector_store()
    batch_size = 50
    total_inserted = 0

    for i in range(0, len(chunks), batch_size):
        batch = chunks[i : i + batch_size]
        vector_store.add_documents(batch)
        total_inserted += len(batch)
        print(f"  Upserted {total_inserted}/{len(chunks)} chunks")
        if i + batch_size < len(chunks):
            time.sleep(0.5)  # Respect OpenAI embeddings rate limit

    return total_inserted

if __name__ == "__main__":
    for file_path in sys.argv[1:]:
        count = ingest_file(file_path)
        print(f"Done: {count} chunks ingested from {file_path}")

Run it against a document:

python ingest.py docs/sample.pdf

Performance tip: The OpenAI text-embedding-3-small endpoint allows up to 1,000,000 tokens per minute on tier-2 accounts. Batching 50 chunks at a time with a 500ms pause keeps you comfortably within limits for most documents. For bulk ingestion of thousands of pages, increase batch size and add exponential backoff on RateLimitError.

Step 4: Build the Retrieval-Augmented Generation Chain

With chunks indexed in Supabase, you can now wire up the actual RAG chain. The retriever performs a vector similarity search to fetch the top-k relevant chunks, which are injected into the LLM's context window alongside the user's question.

# chain.py (continued — add below get_vector_store)
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

def build_rag_chain() -> ConversationalRetrievalChain:
    vector_store = get_vector_store()

    retriever = vector_store.as_retriever(
        search_type="similarity",
        search_kwargs={"k": 5},  # Retrieve top 5 most similar chunks
    )

    llm = ChatOpenAI(
        model="gpt-4o",
        temperature=0.2,          # Lower = more deterministic, better for Q&A
        streaming=True,
        openai_api_key=os.environ["OPENAI_API_KEY"],
    )

    memory = ConversationBufferMemory(
        memory_key="chat_history",
        return_messages=True,
        output_key="answer",
    )

    chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        memory=memory,
        return_source_documents=True,  # Useful for citation UI
        verbose=False,
    )
    return chain

Test the chain interactively before wiring it to an API:

# Quick smoke test — run with: python -c "from chain import build_rag_chain; ..."
import asyncio
from chain import build_rag_chain

chain = build_rag_chain()
result = chain.invoke({"question": "What is the main topic of the document?"})
print(result["answer"])
print("\nSources:")
for doc in result["source_documents"]:
    print(f"  - {doc.metadata.get('source', 'unknown')} (chunk {doc.metadata.get('chunk_index', '?')})")

Note: ConversationBufferMemory stores the entire conversation in memory, which is fine for development. In production, swap it for ConversationSummaryBufferMemory or store chat history in Supabase itself (the postgres_chat_message_history integration from langchain_community works perfectly here).

Step 5: Expose the RAG Chain via a FastAPI Endpoint

A working chain in a Python script is great for testing, but your frontend needs an HTTP interface. FastAPI gives you async support, automatic OpenAPI docs, and Pydantic validation with minimal boilerplate.

Scaffold the FastAPI app

# api.py
import json
import asyncio
from typing import List, Tuple
from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from chain import build_rag_chain

app = FastAPI(title="RAG API", version="1.0.0")

# Build chain once at startup to reuse the same memory/connection
rag_chain = build_rag_chain()

class QueryRequest(BaseModel):
    question: str
    chat_history: List[Tuple[str, str]] = []  # List of (human, ai) tuples

class SourceDoc(BaseModel):
    content: str
    source: str

async def generate_stream(question: str, chat_history: list):
    """Async generator that yields JSON chunks for StreamingResponse."""
    try:
        # ConversationalRetrievalChain is synchronous; run in thread pool
        loop = asyncio.get_event_loop()
        result = await loop.run_in_executor(
            None,
            lambda: rag_chain.invoke({
                "question": question,
                "chat_history": chat_history,
            })
        )

        answer = result.get("answer", "")
        sources = [
            {"content": doc.page_content[:200], "source": doc.metadata.get("source", "")}
            for doc in result.get("source_documents", [])
        ]

        # Stream the answer word by word for a typing effect
        words = answer.split(" ")
        for i, word in enumerate(words):
            chunk = {"type": "token", "data": word + (" " if i < len(words) - 1 else "")}
            yield json.dumps(chunk) + "\n"
            await asyncio.sleep(0.01)

        # Send sources as final chunk
        yield json.dumps({"type": "sources", "data": sources}) + "\n"

    except Exception as e:
        yield json.dumps({"type": "error", "data": str(e)}) + "\n"

@app.post("/query")
async def query_documents(request: QueryRequest):
    if not request.question.strip():
        raise HTTPException(status_code=400, detail="Question cannot be empty")

    return StreamingResponse(
        generate_stream(request.question, request.chat_history),
        media_type="application/x-ndjson",
    )

@app.get("/health")
async def health():
    return {"status": "ok"}

Start the server and test with curl

uvicorn api:app --reload --port 8000

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "Summarize the key points of the document", "chat_history": []}' \
  --no-buffer

You'll see newline-delimited JSON chunks streaming back. To verify the Supabase side, open your Supabase dashboard → Logs → Postgres — you'll see the match_documents RPC being called with timing information.

Common Issues and Fixes

| Error / Symptom | Root Cause | Fix | |---|---|---| | operator does not exist: vector = unknown | pgvector extension not enabled, or casting issue in SQL | Run CREATE EXTENSION IF NOT EXISTS vector; and ensure the function signature uses vector(1536) not plain vector | | Embeddings dimension mismatch | You changed embedding models after data was inserted (e.g., ada-002→3-small both happen to be 1536, but ada-002→3-large is 1536→3072) | Drop and recreate the table with the correct dimension; update vector(N) everywhere in SQL | | Queries return 0 results silently | Supabase RLS is blocking the match_documents function | Use SUPABASE_SERVICE_KEY (service_role), not the anon key, OR add ALTER TABLE documents ENABLE ROW LEVEL SECURITY with a permissive policy | | LangChain deprecation warnings in 0.2+ | Import paths moved from langchain.vectorstores to langchain_community.vectorstores | Update all imports: from langchain_community.vectorstores import SupabaseVectorStore and from langchain_openai import ChatOpenAI, OpenAIEmbeddings |

Error: operator does not exist vector = unknown

This usually means the SQL function was created before the vector extension was enabled, or in a different schema. Fix:

-- Verify extension is active
SELECT * FROM pg_extension WHERE extname = 'vector';

-- If missing, enable it
CREATE EXTENSION IF NOT EXISTS vector SCHEMA public;

-- Then re-run the CREATE FUNCTION block from Step 1

Embeddings dimension mismatch after model change

If you switch from text-embedding-ada-002 to text-embedding-3-large, the stored vector(1536) columns become incompatible with 3072-dimensional outputs. There's no ALTER shortcut:

-- Nuclear option: drop and recreate
DROP TABLE IF EXISTS documents CASCADE;
-- Then re-run the full migration with vector(3072)
-- Then re-run ingest.py on all your source documents

Supabase RLS blocking vector queries

If your documents table has RLS enabled and you're using the anon key, every match_documents() call returns zero rows with no error. The fix is either using the service_role key (safest for server-side code) or explicitly granting execute on the function:

-- Option A: Grant to authenticated role
GRANT EXECUTE ON FUNCTION match_documents TO authenticated;

-- Option B: Disable RLS on documents (development only)
ALTER TABLE documents DISABLE ROW LEVEL SECURITY;

LangChain deprecation warnings in 0.2+

LangChain 0.2 moved community integrations to separate packages. Update every import:

# OLD (pre-0.2) — raises deprecation warning
from langchain.vectorstores import SupabaseVectorStore
from langchain.chat_models import ChatOpenAI

# NEW (0.2+) — correct
from langchain_community.vectorstores import SupabaseVectorStore
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings

FAQ

Q: Can I use a local embedding model instead of OpenAI?

Yes — HuggingFaceEmbeddings from langchain_community is a drop-in replacement. Install pip install sentence-transformers and swap the embeddings object:

from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",  # 384 dims
)

Critically: all-MiniLM-L6-v2 produces 384-dimensional vectors, so you must change vector(1536) to vector(384) in your SQL schema and recreate the table before ingesting. Pick your model before you ingest any data.

Q: How do I keep embeddings in sync when source documents update?

The cleanest approach is an idempotent upsert strategy using a content hash stored in metadata. When re-ingesting a document, compute hashlib.md5(content.encode()).hexdigest() for each chunk and store it as metadata["content_hash"]. Before upserting, query Supabase for existing rows with that hash — if the hash matches, skip the chunk; if it differs (document updated), delete the old row and insert the new one. For deletions, filter by metadata->>'source' and delete all rows where the source file no longer exists on disk.

Q: Is this approach production-ready on Supabase free tier?

For low-traffic internal tools, yes. The free tier gives you a 500MB PostgreSQL database, which holds roughly 250,000 embeddings at 1536 dimensions (each vector is ~6KB on disk). The bigger constraint is the free tier's database pause after 1 week of inactivity and row limits on Edge Functions if you're using them for auth. For a production app with real users, upgrade to the Pro plan ($25/month) which removes the pause and gives you 8GB of database storage — comfortably supporting millions of embedded chunks.