Both LlamaIndex and LangChain can build a RAG pipeline. The question is which one you'll be less frustrated with six months after you choose it.
I've built production RAG systems with both. The answer depends entirely on what you're building, and the frameworks have diverged enough in 2025-2026 that the old "LlamaIndex is easier, LangChain is more flexible" take is outdated.
Let's build the same pipeline in both and compare the things that actually matter.
How both frameworks evolved
LangChain expanded aggressively. It started as a prompt orchestration library, added chains, added agents, added LangGraph for stateful workflows, added LangSmith for observability. It's now closer to a full-stack LLM framework. The complexity has grown with it — LCEL (LangChain Expression Language) introduced a pipe operator syntax that's elegant once you get it and confusing until you do.
LlamaIndex doubled down on RAG and data connectors. Version 0.10 split the library into llama-index-core plus separately installed integration packages (breaking many older tutorials), but the result is a cleaner API for retrieval-focused use cases. It has 100+ data loaders (Notion, Confluence, Google Drive, PDFs, databases), query transformation strategies, and sub-question decomposition built in. For RAG-specific work, it's more batteries-included than LangChain.
The same pipeline in both frameworks
Load documents → chunk → embed → store → query → generate. Here it is in both.
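For reference, the stages themselves are simple enough to sketch without either framework. The toy version below is only an illustration under loud assumptions: the documents are hypothetical, "embedding" is a bag-of-words count vector standing in for a real embedding model, the "vector store" is a Python list, and generation is stubbed out as prompt assembly instead of an LLM call.

```python
import math
import re
from collections import Counter

# "Load": hypothetical documents standing in for files on disk
documents = {
    "refunds.txt": "Annual plans get a full refund within 14 days. Monthly plans are not refundable.",
    "pricing.txt": "The Pro plan costs $20 per month. Annual billing saves 15 percent.",
}

def chunk(text):
    # "Chunk": naive sentence split; real splitters also cap size and add overlap
    return [s for s in re.split(r"(?<=\.)\s+", text) if s]

def embed(text):
    # "Embed": bag-of-words counts stand in for a real embedding model (assumption)
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# "Store": a plain list of (source, chunk, vector) rows plays the vector store
store = [(src, c, embed(c)) for src, text in documents.items() for c in chunk(text)]

def retrieve(question, k=2):
    # "Query": rank every stored chunk by cosine similarity to the question
    q = embed(question)
    return sorted(store, key=lambda row: cosine(q, row[2]), reverse=True)[:k]

def build_prompt(question, hits):
    # "Generate": a real pipeline would send this prompt to the LLM
    context = "\n".join(c for _, c, _ in hits)
    return f"Answer based on the context below.\nContext:\n{context}\nQuestion: {question}"

question = "What is the refund policy for annual plans?"
hits = retrieve(question)
prompt_text = build_prompt(question, hits)
print(prompt_text)
```

Both frameworks below do the same six steps; the difference is how much of this wiring they hide.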
LlamaIndex
```bash
pip install llama-index llama-index-llms-anthropic llama-index-embeddings-openai
```

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.anthropic import Anthropic
from llama_index.embeddings.openai import OpenAIEmbedding

# Configure models
Settings.llm = Anthropic(model="claude-sonnet-4-5")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.chunk_size = 512
Settings.chunk_overlap = 50

# Load documents from a directory
documents = SimpleDirectoryReader("./docs").load_data()

# Index (chunks, embeds, stores — all in one step)
index = VectorStoreIndex.from_documents(documents)

# Query
query_engine = index.as_query_engine(similarity_top_k=4)
response = query_engine.query("What is the refund policy for annual subscriptions?")

print(response.response)
print("\nSources:")
for node in response.source_nodes:
    print(f" - {node.metadata.get('file_name', 'unknown')} (score: {node.score:.3f})")
```
That's about 15 lines of meaningful code. LlamaIndex handles chunking, embedding, indexing, retrieval, and response synthesis. The behavior is opinionated (you don't control much), but the defaults are sensible.
LangChain
```bash
pip install langchain langchain-community langchain-anthropic langchain-openai langchain-chroma chromadb
```

```python
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_anthropic import ChatAnthropic
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

# Load
loader = DirectoryLoader("./docs", glob="**/*.txt", loader_cls=TextLoader)
documents = loader.load()

# Chunk
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=50)
chunks = splitter.split_documents(documents)

# Embed and store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# LLM
llm = ChatAnthropic(model="claude-sonnet-4-5")

# RAG chain
prompt = ChatPromptTemplate.from_template("""Answer based on the context below.
If you don't know, say so.
Context: {context}
Question: {input}""")
combine_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, combine_chain)

# Query
result = rag_chain.invoke({"input": "What is the refund policy for annual subscriptions?"})
print(result["answer"])
print("\nSources:")
for doc in result["context"]:
    print(f" - {doc.metadata.get('source', 'unknown')}")
```
That's 35 lines. LangChain requires you to explicitly construct each step. More verbose, but you see exactly what's happening.
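One detail the line counts hide: chunk_size=512 doesn't mean the same thing in the two snippets. LlamaIndex's default splitter measures tokens, while RecursiveCharacterTextSplitter measures characters unless you pass a different length_function, so the LangChain chunks come out several times smaller (roughly a quarter the size, going by the common rule of thumb of about four characters per English token). The sketch below is a plain fixed-window character splitter, not LangChain's recursive, separator-aware algorithm; it only shows what chunk_size and chunk_overlap mean when both are counted in characters. The filler text is made up.

```python
def split_chars(text, chunk_size=512, chunk_overlap=50):
    """Fixed-size character windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

text = "refund policy " * 200  # 2,800 characters of filler
chunks = split_chars(text)

# Crude size check in words; a real tokenizer's count would differ
print(len(chunks), [len(c) for c in chunks], len(chunks[0].split()))
```

A 512-character window here holds only 74 words, far less text than a 512-token chunk, so matching the numbers across frameworks doesn't give you matching chunks.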
Lines of code comparison
| Task | LlamaIndex | LangChain |
|---|---|---|
| Basic RAG | ~15 lines | ~35 lines |
| With custom prompt | +5 lines | Already explicit |
| Persistent vector store | +3 lines | +2 lines (swap Chroma) |
| Multiple file types | Built-in loaders | Requires loader per type |
LangChain's verbosity isn't always bad. When you need to customize the prompt, swap the retriever, or add post-processing to the output, LangChain's explicitness makes it easier to modify. With LlamaIndex, you sometimes have to dig into internals to override defaults.
Advanced retrieval: where LlamaIndex shines
LlamaIndex's killer feature for complex RAG is query transformation and decomposition.
Sub-question query engine — breaks complex queries into sub-questions, retrieves for each, then synthesizes:
```python
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# Create tools for different document collections.
# Assumes policy_index and product_index are VectorStoreIndex objects built
# the same way as in the pipeline above, one per document collection.
tools = [
    QueryEngineTool(
        query_engine=policy_index.as_query_engine(),
        metadata=ToolMetadata(
            name="refund_policy",
            description="Company refund and return policies",
        ),
    ),
    QueryEngineTool(
        query_engine=product_index.as_query_engine(),
        metadata=ToolMetadata(
            name="product_catalog",
            description="Product specifications and pricing",
        ),
    ),
]

sub_question_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=tools)

# The LLM might decompose this into "What's the refund window?" (refund_policy)
# and "What's the product price?" (product_catalog), then synthesize one answer
response = sub_question_engine.query(
    "Can I get a full refund on the Pro plan if I cancel within 14 days?"
)
print(response)
```
LangChain's equivalent — MultiQueryRetriever — generates multiple rephrased queries and merges results, but it doesn't do the semantic decomposition LlamaIndex does. For complex multi-part questions, LlamaIndex's approach is meaningfully better.
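The difference between the two strategies is easiest to see side by side. In this framework-free sketch, rephrase and decompose are hypothetical stand-ins for the LLM calls each framework makes, and the tools and fake_retriever are made-up stubs; none of this is either library's API. Multi-query sends several wordings of the same question to one retriever and merges the results; sub-question produces different questions and routes each one to the collection that can answer it.

```python
# Hypothetical planner outputs; in both frameworks an LLM generates these
def rephrase(question):
    # MultiQueryRetriever-style: alternative wordings of the SAME question
    return [question, "refund rules for annual Pro subscriptions", "cancelling the Pro plan early"]

def decompose(question):
    # Sub-question style: different questions, each tagged with the tool that should answer it
    return [
        ("What's the refund window?", "refund_policy"),
        ("What's the product price?", "product_catalog"),
    ]

# Hypothetical per-collection query engines that report which collection answered
tools = {
    "refund_policy": lambda q: f"[refund_policy] {q}",
    "product_catalog": lambda q: f"[product_catalog] {q}",
}

def multi_query(question, retriever):
    # Every variant hits the same retriever; results are merged without duplicates
    merged = []
    for variant in rephrase(question):
        for doc in retriever(variant):
            if doc not in merged:
                merged.append(doc)
    return merged

def sub_question(question):
    # Each sub-question goes to its own tool; a real engine then makes a final LLM synthesis call
    return [tools[name](sub_q) for sub_q, name in decompose(question)]

question = "Can I get a full refund on the Pro plan if I cancel within 14 days?"
fake_retriever = lambda q: ["refunds#1"] if "refund" in q.lower() else ["pricing#2"]
print(multi_query(question, fake_retriever))
print(sub_question(question))
```

Multi-query widens recall on a single index; sub-question changes which index answers each part, which is why it handles multi-part questions better.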
Advanced retrieval: where LangChain shines
ParentDocumentRetriever — embeds small chunks for precise matching, returns the full parent document for context:
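```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_chroma import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Small chunks for retrieval, large chunks for context.
# Overlap is set explicitly: the default (200) would equal the child chunk size.
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)

# Child chunks are embedded into the vector store; parent chunks live in the docstore.
# Reuses embeddings and documents from the pipeline above; a separate collection
# keeps these child chunks apart from the earlier index.
store = InMemoryStore()
retriever = ParentDocumentRetriever(
    vectorstore=Chroma(collection_name="split_parents", embedding_function=embeddings),
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)
retriever.add_documents(documents)

# Matches on small child chunks, returns their larger parents
parents = retriever.invoke("What is the refund policy for annual subscriptions?")
```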
This retrieval pattern (embed small, return large) often outperforms naive chunking on long-form documents: the small chunks match the query precisely, while the LLM still receives the surrounding context. LlamaIndex has a similar pattern (SentenceWindowNodeParser), but LangChain's implementation is more flexible.
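Stripped of the framework, "embed small, return large" is just an index from child chunks back to parent ids. This is a minimal sketch under stated assumptions: word-overlap scoring stands in for embedding similarity, the word-count splitter has no overlap, and the documents are hypothetical. The doc_id idea mirrors the metadata key ParentDocumentRetriever uses by default, but nothing here is its actual implementation.

```python
import re

def split_words(text, size):
    # Fixed-size word chunks, no overlap (kept simple for the sketch)
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def build_index(documents, parent_size=40, child_size=8):
    parents, children = {}, []
    for doc in documents:
        for parent in split_words(doc, parent_size):
            pid = len(parents)
            parents[pid] = parent
            # Each child records its parent's id, like a doc_id metadata field
            children.extend((child, pid) for child in split_words(parent, child_size))
    return parents, children

def retrieve_parents(query, parents, children, k=2):
    q = tokens(query)
    # Match on the small chunks (word overlap stands in for embedding similarity)...
    ranked = sorted(children, key=lambda c: len(q & tokens(c[0])), reverse=True)
    # ...but return each distinct parent once, in rank order
    seen, result = set(), []
    for _, pid in ranked[:k]:
        if pid not in seen:
            seen.add(pid)
            result.append(parents[pid])
    return result

docs = [
    "Annual plans can be refunded within 14 days. Refund requests go through the billing portal.",
    "The Pro plan includes priority support and unlimited projects for every seat.",
]
parents, children = build_index(docs)
print(retrieve_parents("How many days do I have to request a refund?", parents, children))
```

Here the two best-matching children both belong to the first document, so the retriever returns that one parent in full instead of two fragments.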
LCEL composability — if you're building an agent that combines RAG with tools and conditional logic, LCEL makes composition cleaner:
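```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

def format_docs(docs):
    # Join retrieved Documents into a single context string for the prompt
    return "\n\n".join(doc.page_content for doc in docs)

# Reuses retriever, prompt (expects {context} and {input}), and llm from above.
# Pipe syntax: retriever → format → prompt → llm → parse
rag_chain = (
    RunnableParallel(context=retriever | format_docs, input=RunnablePassthrough())
    | prompt
    | llm
    | StrOutputParser()
)
answer = rag_chain.invoke("What is the refund policy for annual subscriptions?")
```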
You can't build something this composable in LlamaIndex without more boilerplate.
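The pipe syntax feels less magical once you see how little it takes to build. The toy below is not LangChain's code: Step and parallel are hypothetical stand-ins that mimic how a Runnable overloads | and how RunnableParallel fans one input out to several branches, and the retriever, prompt, and LLM are fakes. Real LCEL also coerces plain functions and dicts into runnables, which this sketch imitates only for functions.

```python
class Step:
    """Toy stand-in for a LangChain Runnable: wraps a function and supports |."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # a | b returns a new Step that feeds a's output into b (plain functions are wrapped)
        nxt = other if isinstance(other, Step) else Step(other)
        return Step(lambda x: nxt.invoke(self.invoke(x)))

    def invoke(self, x):
        return self.fn(x)

def parallel(**branches):
    # Like RunnableParallel: run every branch on the same input and collect a dict
    steps = {k: b if isinstance(b, Step) else Step(b) for k, b in branches.items()}
    return Step(lambda x: {k: s.invoke(x) for k, s in steps.items()})

# Hypothetical components standing in for a retriever, prompt, and LLM
passthrough = Step(lambda x: x)
fake_retriever = Step(lambda q: ["Annual plans: full refund within 14 days."])
fake_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['input']}")
fake_llm = Step(lambda p: f"LLM received {len(p)} characters")

chain = (
    parallel(context=fake_retriever | (lambda docs: "\n\n".join(docs)), input=passthrough)
    | fake_prompt
    | fake_llm
)
print(chain.invoke("What is the refund policy?"))
```

Because every stage is just "input in, output out", inserting a reranker or output validator is one more | in the chain.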
Observability
LangSmith (for LangChain) is the strongest observability story in the ecosystem. Set LANGCHAIN_TRACING_V2=true and put your LangSmith API key in LANGCHAIN_API_KEY, and every chain invocation automatically logs traces, token usage, latency, and costs. The UI shows you the full chain execution tree (retrieval, LLM call, output) in one view. Free tier is generous.
LlamaIndex has callbacks and event handlers, but there's no first-party LangSmith equivalent. You integrate with third-party tools (Arize, Weights & Biases, Braintrust) or build your own logging. It's more work to get equivalent observability.
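If you go the "build your own logging" route, it usually starts as something like the sketch below: a stdlib-only decorator that records each step's name, duration, and error. This is an illustration, not LlamaIndex's callback or instrumentation API, and the retrieve and generate functions are hypothetical stubs where real retrieval and LLM calls would go.

```python
import functools
import time

TRACE = []  # collected spans: (step name, seconds, error or None)

def traced(name):
    """Decorator that records how long a pipeline step took and whether it failed."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            error = None
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                error = repr(exc)
                raise
            finally:
                TRACE.append((name, time.perf_counter() - start, error))
        return inner
    return wrap

@traced("retrieve")
def retrieve(question):
    return ["Annual plans: full refund within 14 days."]  # hypothetical retriever

@traced("generate")
def generate(question, docs):
    return f"Answer drawn from {len(docs)} chunk(s)"  # hypothetical LLM call

answer = generate("Refund policy?", retrieve("Refund policy?"))
for name, secs, err in TRACE:
    print(f"{name:<10} {secs * 1000:.2f} ms  error={err}")
```

This gets you latency per step; token counts, cost, and a browsable execution tree are the parts that take real work to match.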
If your team already uses LangSmith, that's a real point in LangChain's favor.
Performance
For basic retrieval, both frameworks have similar latency — they're wrappers around the same underlying operations (embedding API call, vector similarity search, LLM call). Framework overhead is negligible.
Where they differ: LlamaIndex's advanced retrieval strategies (sub-question, HyDE, query transformations) add latency because they make additional LLM calls. A sub-question engine might make 3-4 LLM calls for a single complex query. LangChain's MultiQueryRetriever adds latency the same way: one extra LLM call to generate the query variants, then a retrieval for each. Choose your retrieval strategy based on query complexity, not framework.
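The 3-4 figure follows from simple arithmetic. This sketch makes the assumptions explicit, and they are mine rather than either framework's documented behavior: each sub-question is answered with exactly one LLM call (no context overflow), the final answer takes one synthesis call, calls run one after another, and the 1.5-second per-call latency is a made-up placeholder, not a measurement.

```python
def sub_question_calls(num_sub_questions):
    # 1 call to generate sub-questions, 1 answer call per sub-question, 1 final synthesis
    return 1 + num_sub_questions + 1

def multi_query_calls():
    # 1 call to generate query variants, then 1 answer call (retrievals are not LLM calls)
    return 1 + 1

BASIC_RAG_CALLS = 1  # retrieve, then a single generation call

def sequential_latency(calls, seconds_per_call):
    # Worst case: calls run one after another
    return calls * seconds_per_call

SECONDS_PER_CALL = 1.5  # hypothetical average LLM latency, not a measurement
for label, calls in [("basic RAG", BASIC_RAG_CALLS),
                     ("multi-query", multi_query_calls()),
                     ("sub-question (2 parts)", sub_question_calls(2))]:
    print(f"{label:<24} {calls} LLM calls  ~{sequential_latency(calls, SECONDS_PER_CALL):.1f} s")
```

If sub-questions are answered concurrently, wall-clock time drops, but the number of calls (and the token bill) stays the same.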
When LlamaIndex wins
- RAG-first projects where retrieval quality is the core product
- Multiple data sources — connecting Notion, Confluence, Google Drive, PDFs in one index
- Complex retrieval patterns — sub-question decomposition, recursive retrieval, hybrid search
- Teams new to RAG who want sensible defaults and don't need to configure every step
- Research/prototyping — faster to get to a working demo
When LangChain wins
- Agents + RAG combined — if your agent uses RAG as one of several tools, LCEL makes composition cleaner
- Fine-grained retrieval control — ParentDocumentRetriever, MultiVectorRetriever, custom retrievers
- Existing LangSmith investment — observability is genuinely better out of the box
- Team already knows LangChain — the switching cost is real, especially with LCEL
- LCEL chains where RAG is one step in a larger composed pipeline
Honest take
For pure RAG — documents → retrieval → answer — LlamaIndex is cleaner and faster to build with. The API surface is smaller, the defaults are better, and the advanced retrieval tooling is more mature.
For anything that combines RAG with agents, workflows, or conditional logic, LangChain gives you more control. LangGraph's stateful agent framework (covered in the LangGraph guide) has no direct LlamaIndex equivalent.
If you're starting fresh on a RAG-only project, use LlamaIndex. If you're building an agent that does RAG as part of a larger workflow, use LangChain + LangGraph. If you're a team with existing LangChain code, the migration cost rarely justifies switching.
The deep dive on how RAG works covers the underlying concepts if you want to understand what either framework is doing under the hood. The vector store comparison helps with the storage layer decision once you've picked your framework.
And if you're evaluating whether RAG is even the right approach, the fine-tuning vs RAG guide lays out the tradeoffs clearly.



