LangChain vs LlamaIndex for RAG Pipelines on AWS in 2025

Comparison·Jun 12, 2026·15 min read

Quick Summary: LangChain vs LlamaIndex for RAG on AWS

For pure RAG workloads on AWS — document ingestion, retrieval, and Q&A — LlamaIndex wins on ergonomics and retrieval depth. If you need agents, tool-calling, or complex multi-step workflows alongside RAG, LangChain's LCEL and agent ecosystem are hard to beat. Both frameworks now support Amazon Bedrock and OpenSearch Serverless, so your AWS stack doesn't force the decision.

Side-by-Side Feature Table

| Dimension | LangChain 0.3+ | LlamaIndex 0.10+ |
|---|---|---|
| Vector store integrations | 50+ (OpenSearch, pgvector, Pinecone…) | 30+ (OpenSearch, pgvector, Weaviate…) |
| Amazon Bedrock support | ✓ ChatBedrock + BedrockEmbeddings | ✓ Bedrock + BedrockEmbedding |
| Amazon OpenSearch Serverless | ✓ SigV4 via OpenSearchVectorSearch | ✓ SigV4 via OpensearchVectorClient |
| S3 document loading | ✓ S3FileLoader / S3DirectoryLoader | ✓ S3Reader via llama-hub |
| Chunking strategies | RecursiveCharacter, Token, Markdown | SentenceWindow, HierarchicalNode, Token |
| Query decomposition | MultiQueryRetriever, ContextualCompression | SubQuestionQueryEngine, RecursiveRetrieval |
| Reranking support | Cohere, cross-encoder via callbacks | Native postprocessors (Cohere, BGE, LLM) |
| Observability | LangSmith, LangFuse, CloudWatch callbacks | Arize Phoenix, TruLens, OpenTelemetry |
| Agent / tool-use ecosystem | ✓✓ Mature (LangGraph, ReAct, OpenAI tools) | ✓ Growing (query pipelines, agentic RAG) |
| Community / GitHub stars | ~90k stars | ~35k stars |

TL;DR Recommendation Matrix

| If you need… | Pick… |
|---|---|
| Enterprise document Q&A with minimal boilerplate | LlamaIndex |
| Agentic workflows + RAG in the same system | LangChain |
| Best-in-class retrieval tuning and reranking | LlamaIndex |
| LangSmith tracing out of the box | LangChain |
| Multi-document synthesis with structured data | LlamaIndex |
| Complex tool-calling over external APIs | LangChain |

Architecture Overview: How Each Framework Builds a RAG Pipeline

LangChain LCEL Chain Design

LangChain Expression Language (LCEL) treats every component (retrievers, prompts, LLMs, output parsers) as a Runnable. You compose pipelines with the | pipe operator, and every step supports async and streaming execution through the same interface. The mental model is a Unix pipeline: data flows left to right, and you can swap any stage without rewriting the rest.

In a standard RAG setup, ingestion runs loader → splitter → vector store, and the query-time chain is vectorstore.as_retriever() | prompt | llm | output_parser. The retriever and LLM are both Runnables, so you can add .with_retry() or .with_fallbacks(), or attach extra kwargs with .bind(), at any point without breaking the chain.
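To make the pipe-composition idea concrete, here is a minimal pure-Python sketch of how a | operator can chain stages and how a fallback can wrap one of them. It is a toy model, not LangChain's implementation: the Step class, its with_fallback method, and the stand-in stages are hypothetical names used only for illustration.

```python
from typing import Any, Callable


class Step:
    """Toy stand-in for a Runnable: wraps a function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a, then feeds its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))

    def with_fallback(self, backup: "Step") -> "Step":
        # Hypothetical simplification of the fallback idea: try self, then backup
        def run(value: Any) -> Any:
            try:
                return self.invoke(value)
            except Exception:
                return backup.invoke(value)

        return Step(run)


def _always_times_out(prompt: str) -> str:
    raise TimeoutError("primary model unavailable")


# Hypothetical stages standing in for a retriever, a prompt, and two LLMs
retrieve = Step(lambda q: {"question": q, "context": "AOSS uses SigV4."})
make_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
primary_llm = Step(_always_times_out)
backup_llm = Step(lambda p: f"[backup] answered a {len(p)}-char prompt")

toy_chain = retrieve | make_prompt | primary_llm.with_fallback(backup_llm)
toy_answer = toy_chain.invoke("How does AOSS authenticate?")
print(toy_answer)
```

Because each stage only needs an invoke method, swapping the retriever or the model leaves the rest of the chain untouched, which is the property the Unix-pipeline analogy points at.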

LlamaIndex Query Engine and Node Pipeline

LlamaIndex is index-centric. Documents become Node objects with metadata, embeddings, and relationships. An Index (VectorStoreIndex, SummaryIndex, KnowledgeGraphIndex) wraps those nodes. A QueryEngine sits on top of the index and orchestrates retrieval, optional postprocessing (reranking, filtering), and synthesis. The separation between indexing time and query time is explicit and first-class — you build your index once, persist it, then attach query engines without re-ingesting.

How Both Integrate with Amazon Bedrock

Both frameworks call Bedrock via boto3 under the hood and respect your AWS credential chain (IAM role, ~/.aws/credentials, environment variables). For LangChain, use langchain_aws.ChatBedrock; for LlamaIndex, use llama_index.llms.bedrock.Bedrock. Neither requires you to manage Bedrock API keys — your EC2 instance profile or ECS task role is enough.

Code Example 1 — Minimal LangChain LCEL RAG chain on AWS:

import boto3
from langchain_aws import ChatBedrock, BedrockEmbeddings
from langchain_community.document_loaders import S3DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import OpenSearchVectorSearch
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from opensearchpy import RequestsHttpConnection
from requests_aws4auth import AWS4Auth

# AWS auth for OpenSearch Serverless
region = "us-east-1"
service = "aoss"  # use 'es' for managed OpenSearch
credentials = boto3.Session().get_credentials()
aws_auth = AWS4Auth(
    credentials.access_key,
    credentials.secret_key,
    region,
    service,
    session_token=credentials.token,
)

# 1. Load documents from S3
loader = S3DirectoryLoader(bucket="my-rag-docs", prefix="knowledge-base/")
docs = loader.load()

# 2. Split into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=128)
chunks = splitter.split_documents(docs)

# 3. Embeddings via Bedrock Titan
embeddings = BedrockEmbeddings(
    model_id="amazon.titan-embed-text-v2:0",
    region_name=region,
)

# 4. Index into OpenSearch Serverless
vectorstore = OpenSearchVectorSearch.from_documents(
    chunks,
    embeddings,
    opensearch_url="https://<collection-id>.us-east-1.aoss.amazonaws.com",
    http_auth=aws_auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    index_name="rag-index",
)

# 5. Build LCEL chain
llm = ChatBedrock(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    region_name=region,
)

prompt = ChatPromptTemplate.from_template(
    """Answer using only the context below.\nContext: {context}\nQuestion: {question}"""
)

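retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

def format_docs(docs):
    # Join chunk text so the prompt receives plain text, not Document reprs
    return "\n\n".join(d.page_content for d in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)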

response = chain.invoke("What are the key compliance requirements?")
print(response)

Vector Store and Embedding Support on AWS

Amazon OpenSearch Serverless as a Vector Store

OpenSearch Serverless (AOSS) uses SigV4 request signing, not basic auth. Both frameworks support this, but the auth wiring differs slightly. In LangChain you pass http_auth=aws_auth with a requests_aws4auth.AWS4Auth object. In LlamaIndex you pass an opensearchpy.AWSV4SignerAuth object as http_auth to OpensearchVectorClient, along with RequestsHttpConnection as the connection class. The key gotcha: AOSS requires aoss as the SigV4 service name in either constructor, not es.
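Since using the wrong service name is an easy mistake to make, a small guard can derive it from the endpoint. The sketch below is a hypothetical helper (sigv4_service_for is not part of either framework) and assumes the standard commercial-region hostname suffixes; custom domains and other partitions would need their own handling.

```python
from urllib.parse import urlparse


def sigv4_service_for(endpoint: str) -> str:
    """Return the SigV4 service name implied by an OpenSearch endpoint."""
    # urlparse only fills hostname when a scheme is present, so fall back
    # to the raw string for bare hostnames
    host = urlparse(endpoint).hostname or endpoint
    if host.endswith(".aoss.amazonaws.com"):
        return "aoss"  # OpenSearch Serverless collection
    if host.endswith(".es.amazonaws.com"):
        return "es"  # managed OpenSearch Service domain
    raise ValueError(f"Cannot infer SigV4 service name from host: {host}")


print(sigv4_service_for("https://abc123.us-east-1.aoss.amazonaws.com"))
print(sigv4_service_for("search-demo-xyz.us-east-1.es.amazonaws.com"))
```

The returned string can be passed as the service argument to AWS4Auth or AWSV4SignerAuth instead of a hard-coded literal.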

Using Amazon Bedrock Embeddings (Titan)

Amazon Titan Embed Text v2 outputs 1024-dimensional vectors and supports 8192-token input windows. It's the most cost-effective embedding option on AWS at ~$0.00002 per 1K tokens. LangChain's BedrockEmbeddings and LlamaIndex's BedrockEmbedding both wrap this model identically. One practical difference: LlamaIndex's BedrockEmbedding exposes a dimensions parameter so you can request 256- or 512-dim vectors for lower storage cost.
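The trade-off between dimension count, storage, and embedding cost is easy to estimate with back-of-the-envelope arithmetic. The sketch below assumes float32 vectors (4 bytes per value), ignores k-NN graph and metadata overhead, and uses the per-token price quoted above; the corpus size and average chunk length are made-up inputs.

```python
def raw_vector_gb(num_vectors: int, dims: int, bytes_per_value: int = 4) -> float:
    """Raw vector payload in GB (10^9 bytes), excluding any index overhead."""
    return num_vectors * dims * bytes_per_value / 1e9


def embedding_cost_usd(total_tokens: int, usd_per_1k_tokens: float = 0.00002) -> float:
    """One-off cost of embedding total_tokens at a flat per-1K-token rate."""
    return total_tokens / 1000 * usd_per_1k_tokens


num_chunks = 1_000_000        # hypothetical corpus size
avg_tokens_per_chunk = 300    # hypothetical average chunk length

for dims in (1024, 512, 256):
    print(f"{dims:>4} dims: {raw_vector_gb(num_chunks, dims):.3f} GB raw vectors")

total_cost = embedding_cost_usd(num_chunks * avg_tokens_per_chunk)
print(f"Embedding {num_chunks:,} chunks: ${total_cost:.2f}")
```

Under these assumptions, one million 1024-dim vectors take about 4.1 GB of raw storage versus about 1.0 GB at 256 dims, while the embedding call itself costs a few dollars, so storage and search latency usually drive the dimension choice more than embedding price.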

pgvector on Amazon RDS vs OpenSearch: Which Fits Each Framework Better

For sub-10M document collections where you already run Postgres on RDS, pgvector cuts infrastructure complexity. LangChain's PGVector integration is more mature and supports metadata filtering with SQLAlchemy expressions. LlamaIndex's pgvector integration works but requires more manual connection management. For collections above 10M vectors or when you need hybrid BM25+dense search, AOSS wins for both frameworks — it scales storage and compute independently and requires no cluster sizing decisions.

Code Example 2 — LlamaIndex OpenSearchVectorStore with Bedrock Titan embeddings:

import boto3
import tempfile
import os
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.opensearch import OpensearchVectorStore, OpensearchVectorClient
from llama_index.embeddings.bedrock import BedrockEmbedding
from llama_index.core import SimpleDirectoryReader
from opensearchpy import RequestsHttpConnection, AWSV4SignerAuth

region = "us-east-1"
host = "<collection-id>.us-east-1.aoss.amazonaws.com"

# SigV4 auth for AOSS
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, region, "aoss")

# Build OpenSearch client
os_client = OpensearchVectorClient(
    endpoint=f"https://{host}",
    index="llamaindex-rag",
    dim=1024,
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)

vector_store = OpensearchVectorStore(os_client)

# Download PDFs from S3 to a temp directory
s3 = boto3.client("s3", region_name=region)
bucket = "my-rag-docs"
prefix = "knowledge-base/"

with tempfile.TemporaryDirectory() as tmp_dir:
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith(".pdf"):
                local_path = os.path.join(tmp_dir, os.path.basename(key))
                s3.download_file(bucket, key, local_path)

    documents = SimpleDirectoryReader(tmp_dir).load_data()

# Titan Embed v2 embeddings
embed_model = BedrockEmbedding(
    model_name="amazon.titan-embed-text-v2:0",
    region_name=region,
)

# Build and persist index
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    embed_model=embed_model,
    show_progress=True,
)
print(f"Indexed {len(documents)} documents into OpenSearch Serverless.")

Chunking, Indexing, and Document Ingestion

LangChain Text Splitters: RecursiveCharacterTextSplitter Deep Dive

RecursiveCharacterTextSplitter splits on a priority list of separators: ["\n\n", "\n", " ", ""]. It tries double-newlines first, falls back to single newlines, then spaces. This preserves paragraph structure better than a naive fixed-size splitter. Chunk metadata (source, page) flows through automatically from the loader, and you can add custom metadata fields via a metadatas parameter when calling create_documents. The key weakness: it doesn't understand sentence boundaries, so a 1024-character chunk might end mid-sentence.
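The separator fallback is easier to see in a stripped-down form. The following is a simplified sketch of the recursive idea, not LangChain's actual implementation: it has no chunk overlap, drops the separators it splits on, and the function name recursive_split is hypothetical.

```python
def recursive_split(text: str, chunk_size: int,
                    separators: tuple = ("\n\n", "\n", " ", "")) -> list:
    """Split text into chunks of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging pieces while they fit
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is still too long: retry with the next, finer separator
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


sample = ("Para one is short.\n\n"
          "Para two is a bit longer than the limit allows here.\n\n"
          "End.")
for chunk in recursive_split(sample, chunk_size=30):
    print(repr(chunk))
```

With a 30-character limit, the short paragraphs survive intact, but the long one is cut at a space after "than", which is exactly the mid-sentence break described above.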

LlamaIndex NodeParser and SentenceWindowNodeParser

LlamaIndex's SentenceWindowNodeParser is a meaningful upgrade for dense document corpora. It splits on sentence boundaries, then stores a configurable window of surrounding sentences in each node's metadata (under the window key). At query time, a MetadataReplacementPostProcessor swaps the retrieved sentence for its full window before synthesis, so the LLM sees the sentence plus up to window_size neighbors on each side (up to seven sentences at window_size=3) instead of one isolated sentence. This tends to improve answer quality on long-form documents without adding embeddings: each node still embeds a single sentence, although the window text does add to stored metadata.
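The mechanism can be sketched in plain Python. This is an illustration of the technique under simplifying assumptions (a naive regex sentence splitter, dict-based nodes), not LlamaIndex's code; build_sentence_windows and replace_with_window are hypothetical names.

```python
import re


def build_sentence_windows(text: str, window_size: int = 3) -> list:
    """One node per sentence, with neighboring sentences stored in metadata."""
    # Naive splitter: break after ., !, or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    nodes = []
    for i, sentence in enumerate(sentences):
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        nodes.append({
            "text": sentence,  # this is what gets embedded and matched
            "metadata": {
                "window": " ".join(sentences[start:end]),
                "original_text": sentence,
            },
        })
    return nodes


def replace_with_window(node: dict) -> str:
    """Query-time step: hand the LLM the window instead of the lone sentence."""
    return node["metadata"].get("window", node["text"])


doc_text = "A one. B two. C three. D four. E five."
window_nodes = build_sentence_windows(doc_text, window_size=1)
print(window_nodes[2]["text"])
print(replace_with_window(window_nodes[2]))
```

Retrieval matches against the single sentence, which keeps embeddings precise, while synthesis receives the wider window.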

Ingesting from S3 with Both Frameworks

Both frameworks ship S3 loaders (S3FileLoader/S3DirectoryLoader in LangChain, S3Reader in LlamaIndex), but most document parsers ultimately expect a local file. When you need control over which keys are fetched or what metadata is attached, do the download yourself: paginate S3, download to tempfile.TemporaryDirectory(), run your parser, then discard the temp files. LlamaIndex's SimpleDirectoryReader auto-detects file types (PDF via pypdf, DOCX via docx2txt, CSV, HTML), which is more ergonomic than LangChain, where you pick a loader per file type.
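One detail worth handling when downloading manually: flattening keys with os.path.basename, as the examples in this article do for brevity, lets two objects with the same filename under different prefixes overwrite each other. A minimal sketch of a path helper that mirrors the key structure instead is shown below; local_path_for_key is a hypothetical helper and the keys are made up.

```python
import os
import tempfile


def local_path_for_key(tmp_dir: str, key: str, prefix: str = "") -> str:
    """Map an S3 key to a unique local path under tmp_dir, creating parent dirs."""
    relative = key[len(prefix):] if prefix and key.startswith(prefix) else key
    relative = relative.lstrip("/")
    local_path = os.path.normpath(os.path.join(tmp_dir, *relative.split("/")))
    # Refuse keys such as "../x" that would resolve outside tmp_dir
    root = os.path.abspath(tmp_dir)
    if os.path.commonpath([root, os.path.abspath(local_path)]) != root:
        raise ValueError(f"Key escapes the download directory: {key}")
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    return local_path


with tempfile.TemporaryDirectory() as demo_dir:
    first = local_path_for_key(demo_dir, "reports/2023/summary.pdf", "reports/")
    second = local_path_for_key(demo_dir, "reports/2024/summary.pdf", "reports/")
    same_name_collides = first == second
    print(os.path.relpath(first, demo_dir), os.path.relpath(second, demo_dir))
```

In the download loop, the result would replace os.path.join(tmp_dir, os.path.basename(key)); SimpleDirectoryReader would then need recursive=True to pick up files in subdirectories.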

Code Example 3 — LlamaIndex SentenceWindowNodeParser with S3 ingestion:

import boto3
import tempfile
import os
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, StorageContext
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.postprocessor import MetadataReplacementPostProcessor
from llama_index.embeddings.bedrock import BedrockEmbedding

region = "us-east-1"
s3 = boto3.client("s3", region_name=region)
bucket = "my-rag-docs"
prefix = "reports/"

# Download from S3 into a temp dir (keys are flattened to their base filename)
with tempfile.TemporaryDirectory() as tmp_dir:
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not key.endswith("/"):
                local_path = os.path.join(tmp_dir, os.path.basename(key))
                s3.download_file(bucket, key, local_path)

    # file_metadata attaches the bucket, prefix, and filename to every document
    def s3_metadata(filepath: str) -> dict:
        filename = os.path.basename(filepath)
        return {"s3_bucket": bucket, "s3_prefix": prefix, "filename": filename}

    documents = SimpleDirectoryReader(
        tmp_dir,
        file_metadata=s3_metadata,
    ).load_data()

# SentenceWindowNodeParser: window_size=3 stores 3 sentences each side
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

nodes = node_parser.get_nodes_from_documents(documents)
print(f"Created {len(nodes)} nodes from {len(documents)} documents.")
print(f"Sample node metadata keys: {list(nodes[0].metadata.keys())}")
# Output: ['s3_bucket', 's3_prefix', 'filename', 'window', 'original_text', ...]

embed_model = BedrockEmbedding(
    model_name="amazon.titan-embed-text-v2:0",
    region_name=region,
)

index = VectorStoreIndex(
    nodes,
    embed_model=embed_model,
    show_progress=True,
)

Query Pipeline Flexibility and Advanced Retrieval

LangChain: Multi-Query Retriever and Contextual Compression

LangChain's MultiQueryRetriever sends your original question to an LLM, generates N paraphrased variants, runs each against the vector store, then deduplicates the results. This is a simple but effective way to recover documents that use different terminology than the user's query. Layer ContextualCompressionRetriever on top to extract only the relevant passage from each retrieved document, which reduces prompt token count by 40-60% on verbose documents.
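The retrieve-and-deduplicate step is simple enough to sketch without any framework. The example below is illustrative only: the query variants are hard-coded rather than LLM-generated, documents are plain dicts deduplicated by an id field, and multi_query_retrieve and fake_search are hypothetical names.

```python
from typing import Callable


def multi_query_retrieve(queries: list, search: Callable, k: int = 4) -> list:
    """Run every query variant and return the union of results, first hit wins."""
    seen, merged = set(), []
    for query in queries:
        for doc in search(query, k):
            if doc["id"] not in seen:  # skip documents already returned
                seen.add(doc["id"])
                merged.append(doc)
    return merged


# Stand-in for a vector store: each phrasing surfaces a different result set
fake_index = {
    "cost reduction Q3": [{"id": "d1", "text": "Q3 cost cuts"},
                          {"id": "d2", "text": "Vendor renegotiation"}],
    "savings third quarter": [{"id": "d2", "text": "Vendor renegotiation"},
                              {"id": "d3", "text": "Headcount freeze"}],
}


def fake_search(query: str, k: int) -> list:
    return fake_index.get(query, [])[:k]


merged_docs = multi_query_retrieve(list(fake_index), fake_search)
print([d["id"] for d in merged_docs])
```

The second phrasing contributes d3, which the original wording alone would have missed, while d2 is returned only once.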

LlamaIndex: Sub-Question Query Engine and Recursive Retrieval

LlamaIndex's SubQuestionQueryEngine decomposes a complex question into sub-questions, routes each to the most appropriate index or tool, then synthesizes the answers. It's more structured than MultiQueryRetriever — you define which indexes handle which domains, and the engine reasons about routing. RecursiveRetriever allows nodes to reference other nodes or external indexes, enabling true hierarchical document structures where a summary node links to its detail pages.

Hybrid Search with BM25 + Dense Vectors on OpenSearch

OpenSearch natively supports hybrid search combining BM25 keyword scores with k-NN vector scores via a hybrid query type and a normalization processor. LangChain requires you to construct the hybrid query JSON manually and pass it via custom_query. LlamaIndex's OpenSearch integration has more built-in hybrid search support through its query mode configuration, making hybrid retrieval slightly less boilerplate-heavy.
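For reference, the hybrid request has two parts: a search pipeline containing a normalization processor, and a hybrid query that lists the keyword and k-NN sub-queries. The sketch below only builds the two JSON bodies as Python dicts. The field names text and vector_field, the weights, and the helper name are assumptions to adapt to your index mapping, and hybrid query support should be confirmed for your OpenSearch version or collection type.

```python
import json


def build_hybrid_search(query_text: str, query_vector: list, k: int = 5,
                        keyword_weight: float = 0.3,
                        text_field: str = "text",
                        vector_field: str = "vector_field") -> tuple:
    """Return (search_pipeline_body, search_request_body) for a hybrid query."""
    if not 0.0 <= keyword_weight <= 1.0:
        raise ValueError("keyword_weight must be between 0 and 1")
    vector_weight = round(1.0 - keyword_weight, 6)

    # Registered once, e.g. PUT /_search/pipeline/<pipeline-name>
    pipeline = {
        "description": "Normalize and combine BM25 and k-NN scores",
        "phase_results_processors": [{
            "normalization-processor": {
                "normalization": {"technique": "min_max"},
                "combination": {
                    "technique": "arithmetic_mean",
                    # Weights follow the order of the sub-queries below
                    "parameters": {"weights": [keyword_weight, vector_weight]},
                },
            }
        }],
    }
    # Sent per search, with ?search_pipeline=<pipeline-name>
    request = {
        "size": k,
        "query": {"hybrid": {"queries": [
            {"match": {text_field: {"query": query_text}}},
            {"knn": {vector_field: {"vector": list(query_vector), "k": k}}},
        ]}},
    }
    return pipeline, request


pipeline_body, request_body = build_hybrid_search("Q3 cost reduction", [0.1, 0.2, 0.3])
print(json.dumps(request_body, indent=2))
```

Min-max normalization puts BM25 and vector scores on the same 0 to 1 scale before the weighted mean is taken, which is why the raw scores from the two sub-queries can be combined at all.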

Code Example 4 — LangChain MultiQueryRetriever with Bedrock Claude and OpenSearch:

import boto3
from langchain_aws import ChatBedrock, BedrockEmbeddings
from langchain_community.vectorstores import OpenSearchVectorSearch
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from opensearchpy import RequestsHttpConnection
from requests_aws4auth import AWS4Auth

region = "us-east-1"
credentials = boto3.Session().get_credentials()
aws_auth = AWS4Auth(
    credentials.access_key,
    credentials.secret_key,
    region,
    "aoss",
    session_token=credentials.token,
)

embeddings = BedrockEmbeddings(
    model_id="amazon.titan-embed-text-v2:0",
    region_name=region,
)

# Connect to existing OpenSearch index
vectorstore = OpenSearchVectorSearch(
    opensearch_url="https://<collection-id>.us-east-1.aoss.amazonaws.com",
    index_name="rag-index",
    embedding_function=embeddings,
    http_auth=aws_auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)

# Claude LLM for query expansion
llm = ChatBedrock(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    region_name=region,
    model_kwargs={"max_tokens": 512, "temperature": 0.0},
)

# MultiQueryRetriever generates 3 query variants automatically
multi_retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
    llm=llm,
    include_original=True,  # always include the original query
)

# RAG chain with multi-query expansion
prompt = ChatPromptTemplate.from_template(
    "Answer the question based only on the context.\nContext: {context}\nQuestion: {question}"
)

def format_docs(docs):
    return "\n\n".join(d.page_content for d in docs)

chain = (
    {"context": multi_retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Retrieval runs the original query plus the generated variants (about 4 queries in total)
response = chain.invoke("What cost reduction measures were implemented in Q3?")
print(response)

Observability, Tracing, and Cost Tracking on AWS

LangChain + AWS CloudWatch via LangSmith Callbacks

LangChain emits lifecycle events (on_chain_start, on_retriever_end, on_llm_end) through its callback system. The simplest path to CloudWatch is a custom BaseCallbackHandler that writes structured JSON to a CloudWatch log group. Set LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY for LangSmith, which gives you a full trace UI with latency breakdown, token counts, and error rates. LangSmith's RunTree also lets you capture Bedrock model IDs and token usage from the LLMResult response metadata.
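The part of such a handler that is easy to get wrong is the event format: CloudWatch Logs expects each event as a millisecond timestamp plus a string message. Below is a minimal sketch of a formatter a callback handler could call from on_llm_end or on_retriever_end. The helper name, payload fields, and values are hypothetical; the boto3 call is shown only as a comment so the snippet runs without AWS access.

```python
import json
import time
from typing import Optional


def to_log_event(event: str, payload: dict, now_ms: Optional[int] = None) -> dict:
    """Build one CloudWatch Logs event with a structured JSON message."""
    record = {"event": event, **payload}
    return {
        "timestamp": int(time.time() * 1000) if now_ms is None else now_ms,
        "message": json.dumps(record, sort_keys=True, default=str),
    }


sample_event = to_log_event(
    "on_llm_end",
    {"input_tokens": 812, "output_tokens": 96, "latency_ms": 1430},
    now_ms=1_700_000_000_000,
)
print(sample_event["message"])

# A handler would then ship a batch of events, for example:
# boto3.client("logs").put_log_events(
#     logGroupName="/rag/langchain",      # hypothetical log group
#     logStreamName="chain-traces",       # hypothetical log stream
#     logEvents=[sample_event],
# )
```

Keeping messages as flat JSON means CloudWatch Logs Insights can filter and aggregate on fields such as latency_ms without extra parsing.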

LlamaIndex Instrumentation with Arize Phoenix or TruLens

LlamaIndex 0.10 moved to an instrumentation module with OpenTelemetry-compatible span events. llama_index.core.instrumentation exports get_dispatcher() — attach any SpanHandler (Arize Phoenix, LangFuse, or a custom OTLP handler) and every query, retrieval, and synthesis step becomes a traced span. For CloudWatch, send OTLP spans to an AWS Distro for OpenTelemetry (ADOT) collector sidecar, which forwards to CloudWatch Application Signals.

Tracking Bedrock Token Costs in Both Frameworks

Bedrock returns token usage in ResponseMetadata → usage. Both frameworks surface this differently:

| Concern | LangChain | LlamaIndex |
|---|---|---|
| Token count access | LLMResult.llm_output["usage"] in callbacks | TokenCountingHandler or span metadata |
| Cost estimation | No built-in Bedrock equivalent of get_openai_callback | TokenCountingHandler with manual $/token rate |
| CloudWatch integration | Custom BaseCallbackHandler → boto3.client("logs").put_log_events | OTLP → ADOT → CloudWatch Application Signals |
| X-Ray tracing | Manual via aws_xray_sdk in callbacks | Via ADOT OTLP exporter |
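Whichever framework surfaces the counts, turning them into dollars is the same arithmetic. The sketch below assumes you have already extracted input and output token counts; the TokenUsage class is hypothetical, and the per-1K-token rates are placeholder values that should be replaced with the current figures from the Bedrock pricing page for your model and region.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    input_tokens: int
    output_tokens: int


def estimate_cost_usd(usage: TokenUsage,
                      usd_per_1k_input: float,
                      usd_per_1k_output: float) -> float:
    """Cost of one call given separate input and output token rates."""
    return (usage.input_tokens / 1000 * usd_per_1k_input
            + usage.output_tokens / 1000 * usd_per_1k_output)


# Placeholder rates for illustration only
RATE_IN, RATE_OUT = 0.003, 0.015

calls = [TokenUsage(12_000, 800), TokenUsage(4_500, 300)]
total_usd = sum(estimate_cost_usd(c, RATE_IN, RATE_OUT) for c in calls)
print(f"Estimated cost for {len(calls)} calls: ${total_usd:.4f}")
```

In RAG workloads the input side usually dominates because retrieved context is stuffed into the prompt, so trimming context tends to move this number more than shortening answers.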

IAM permissions required for observability:

- logs:CreateLogGroup, logs:CreateLogStream, logs:PutLogEvents for CloudWatch Logs
- xray:PutTraceSegments, xray:PutTelemetryRecords for X-Ray
- bedrock:InvokeModel scoped to your specific model ARNs
- aoss:APIAccessAll for OpenSearch Serverless data plane access
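Put together, these permissions can be sketched as a single IAM policy document. The account ID, collection ID, and log group prefix below are placeholders, and the helper name is hypothetical. Note that OpenSearch Serverless also requires a matching data access policy on the collection, which this IAM policy does not replace.

```python
import json


def observability_policy(region: str, account_id: str, model_ids: list,
                         collection_id: str, log_group_prefix: str = "/rag/") -> dict:
    """Build a least-privilege style IAM policy for the permissions listed above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CloudWatchLogs",
                "Effect": "Allow",
                "Action": ["logs:CreateLogGroup", "logs:CreateLogStream",
                           "logs:PutLogEvents"],
                "Resource": f"arn:aws:logs:{region}:{account_id}:log-group:{log_group_prefix}*",
            },
            {
                "Sid": "XRay",
                "Effect": "Allow",
                "Action": ["xray:PutTraceSegments", "xray:PutTelemetryRecords"],
                "Resource": "*",
            },
            {
                "Sid": "BedrockInvoke",
                "Effect": "Allow",
                "Action": "bedrock:InvokeModel",
                # Foundation model ARNs have an empty account field
                "Resource": [f"arn:aws:bedrock:{region}::foundation-model/{m}"
                             for m in model_ids],
            },
            {
                "Sid": "AossDataPlane",
                "Effect": "Allow",
                "Action": "aoss:APIAccessAll",
                "Resource": f"arn:aws:aoss:{region}:{account_id}:collection/{collection_id}",
            },
        ],
    }


policy = observability_policy(
    region="us-east-1",
    account_id="123456789012",                # placeholder account
    model_ids=["amazon.titan-embed-text-v2:0",
               "anthropic.claude-3-sonnet-20240229-v1:0"],
    collection_id="examplecollectionid",      # placeholder collection
)
print(json.dumps(policy, indent=2))
```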

When to Choose LangChain for Your AWS RAG Project

Multi-Step Agent Workflows and Tool Use

LangChain is the clear winner when RAG is one capability among many. Its agent ecosystem — LangGraph for stateful multi-agent graphs, ReAct agents, OpenAI Tools agents — is significantly more mature than LlamaIndex's agentic capabilities.

Choose LangChain if:

- Your system needs agents that call external APIs, run SQL queries, and do RAG in the same workflow
- You're building on top of LangGraph for stateful, cyclical agent workflows
- You need LangSmith monitoring with minimal setup; it's the most polished LLM observability UI available
- Your team already has LangChain expertise from non-AWS projects and wants to port it to Bedrock
- You need Contextual Compression Retriever to aggressively trim retrieved context before synthesis
- You're integrating with non-AWS vector stores (Pinecone, Weaviate, Chroma) alongside AWS services in a hybrid cloud setup
- Your pipeline requires complex branching or conditional routing between different LLM calls, which LCEL's RunnableBranch handles cleanly

When to Choose LlamaIndex for Your AWS RAG Project

Document-Heavy Knowledge Bases and Structured Data

LlamaIndex's design philosophy is retrieval-first. The indexing → storage → query separation means you can swap query strategies without re-ingesting, which matters when you have millions of documents. Its structured-data tooling, including SQL query engines such as NLSQLTableQueryEngine and SQLTableRetrieverQueryEngine, is a strong fit for mixed structured/unstructured workloads.

Choose LlamaIndex if:

- You're building an enterprise document Q&A system over large PDF/DOCX/HTML corpora
- You need SentenceWindowNodeParser or HierarchicalNodeParser for fine-grained retrieval quality control
- Your use case requires multi-document synthesis; LlamaIndex's RouterQueryEngine and SubQuestionQueryEngine handle this more elegantly
- You want native reranking with minimal code (Cohere, BGE, LLM-based) via postprocessors
- Your team prioritizes less boilerplate: a working RAG pipeline in LlamaIndex is ~40% fewer lines than the equivalent LangChain chain
- You're building structured retrieval over knowledge graphs or SQL databases alongside vector search

Code Example 5 — LlamaIndex RetrieverQueryEngine with CohereRerank and AOSS:

import boto3
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.postprocessor.cohere_rerank import CohereRerank
from llama_index.vector_stores.opensearch import OpensearchVectorStore, OpensearchVectorClient
from llama_index.embeddings.bedrock import BedrockEmbedding
from llama_index.llms.bedrock import Bedrock
from opensearchpy import RequestsHttpConnection, AWSV4SignerAuth

region = "us-east-1"
host = "<collection-id>.us-east-1.aoss.amazonaws.com"

# SigV4 auth for OpenSearch Serverless
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, region, "aoss")

os_client = OpensearchVectorClient(
    endpoint=f"https://{host}",
    index="llamaindex-rag",
    dim=1024,
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)

vector_store = OpensearchVectorStore(os_client)

# Reconnect to existing index (no re-ingestion)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
embed_model = BedrockEmbedding(
    model_name="amazon.titan-embed-text-v2:0",
    region_name=region,
)
index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store,
    embed_model=embed_model,
)

# Retrieve top-20 candidates, rerank to top-5
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=20,
)

# CohereRerank uses Cohere API — set COHERE_API_KEY env var
cohere_reranker = CohereRerank(
    top_n=5,
    model="rerank-english-v3.0",
)

llm = Bedrock(
    model="anthropic.claude-3-sonnet-20240229-v1:0",
    region_name=region,
    max_tokens=1024,
)

# from_args accepts llm and builds the response synthesizer from it
query_engine = RetrieverQueryEngine.from_args(
    retriever=retriever,
    node_postprocessors=[cohere_reranker],
    llm=llm,
)

response = query_engine.query(
    "Summarize the regulatory changes affecting our Q4 reporting obligations."
)
print(response)
print(f"\nSource nodes used: {len(response.source_nodes)}")
for node in response.source_nodes:
    print(f"  - {node.metadata.get('filename', 'unknown')} (score: {node.score:.3f})")

Verdict: Choosing the Right RAG Framework for AWS in 2025

Summary Recommendation Table

| Project Type | Recommended Framework | Key AWS Services |
|---|---|---|
| Enterprise document Q&A | LlamaIndex | Bedrock (Titan + Claude), AOSS, S3 |
| Agentic AI assistant | LangChain | Bedrock (Claude), OpenSearch or pgvector, Lambda |
| Multi-document research synthesis | LlamaIndex | Bedrock, AOSS, S3 |
| Chatbot with tool-calling + RAG | LangChain | Bedrock, DynamoDB (memory), AOSS |
| High-volume ingestion pipeline | LlamaIndex | S3, Bedrock, AOSS, SQS |
| Rapid prototype, minimal code | LlamaIndex | Bedrock, AOSS |
| Teams using LangGraph/LangSmith | LangChain | Bedrock, AOSS or pgvector on RDS |

Migration Considerations if You Switch Later

Switching frameworks is painful at the ingestion layer (different node/document schemas) but manageable at the query layer. Your OpenSearch index and Bedrock embeddings are framework-agnostic — the vector data itself is portable. The main migration cost is rewriting your chunking pipeline and query logic, not re-embedding. Budget 2-4 weeks for a medium-sized production system migration, mostly spent on validating retrieval quality after the switch.
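Validating retrieval quality after a switch can start with two cheap checks over a fixed set of test questions: does each pipeline still surface the known-relevant document, and how much do the two top-k lists agree? The sketch below assumes you can export ranked document IDs per question from both pipelines; the function names, questions, and IDs are made up for illustration.

```python
def hit_rate_at_k(retrieved: dict, relevant: dict, k: int = 5) -> float:
    """Fraction of questions whose known-relevant doc appears in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for question, doc_id in relevant.items()
               if doc_id in retrieved.get(question, [])[:k])
    return hits / len(relevant)


def overlap_at_k(ranked_a: list, ranked_b: list, k: int = 5) -> float:
    """Jaccard overlap between two top-k result lists."""
    top_a, top_b = set(ranked_a[:k]), set(ranked_b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 1.0


# Hypothetical labeled questions and ranked results from each pipeline
relevant = {"q1": "d1", "q2": "d7"}
old_results = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d7", "d9"]}
new_results = {"q1": ["d2", "d1", "d5"], "q2": ["d4", "d8", "d9"]}

print("old hit rate@3:", hit_rate_at_k(old_results, relevant, k=3))
print("new hit rate@3:", hit_rate_at_k(new_results, relevant, k=3))
print("q1 overlap@3:", overlap_at_k(old_results["q1"], new_results["q1"], k=3))
```

A drop in hit rate flags a regression directly, while low overlap with a stable hit rate usually just means the new chunking surfaces different but equally useful passages and deserves a manual look.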

Key Takeaways Checklist

Answer these before choosing:

- [ ] Data volume: >5M document chunks? Both work, but LlamaIndex's index/query separation makes re-querying cheaper.
- [ ] Agent needs: Do you need tool-calling, multi-step reasoning, or LangGraph state machines? → LangChain.
- [ ] Retrieval depth: Do you need reranking, SentenceWindow context, or sub-question decomposition? → LlamaIndex.
- [ ] Observability: Is LangSmith a hard requirement? → LangChain. Comfortable with Phoenix/TruLens? → Either.
- [ ] Team familiarity: Don't underestimate this. The best framework is the one your team debugs fluently.
- [ ] Boilerplate tolerance: LlamaIndex ships working RAG in fewer lines. LangChain rewards teams who need that granular control.

For the most common AWS RAG use case — a document knowledge base serving employee or customer Q&A over PDF/DOCX content stored in S3, with Bedrock LLMs and OpenSearch Serverless — LlamaIndex is the better default choice in 2025. It gets you production-quality retrieval faster, with less framework friction. Reach for LangChain the moment your system needs to do things beyond retrieval: calling APIs, executing code, or coordinating multiple AI agents.