What is prompt engineering?

Prompt engineering is the practice of crafting inputs to AI language models to produce accurate, useful, and reliable outputs. It involves choosing the right words, structure, context, and format to guide the AI toward the response you actually need — rather than a generic or off-target one.

Which AI models benefit most from better prompting?

All major large language models — including ChatGPT (GPT-4o), Claude, and Gemini — respond significantly to prompt quality. The same task can produce dramatically different results depending on how you structure your request. Better prompting improves output across every major model.

Do I need technical skills to do prompt engineering?

No. Prompt engineering is done in natural language — you write text instructions, not code. Basic prompting needs no technical background at all. Advanced techniques like prompt chaining or agentic workflows can benefit from light scripting knowledge, but the core skill is clear written communication.

Where can I learn more about prompt engineering?

MasterPrompting.net offers a structured curriculum from beginner to advanced, covering every major technique from basic clarity and context to chain-of-thought, meta-prompting, and agentic workflows. Start with the Beginner track to build a solid foundation.

LlamaIndex vs LangChain for RAG in 2026 — A Code-First Comparison

Both LlamaIndex and LangChain can build a RAG pipeline. The question is which one you'll be less frustrated with six months after you choose it.

I've built production RAG systems with both. The answer depends entirely on what you're building — and the frameworks have diverged significantly enough in 2025-2026 that the old "LlamaIndex is easier, LangChain is more flexible" take is outdated.

Let's build the same pipeline in both and compare the things that actually matter.

How both frameworks evolved

LangChain expanded aggressively. It started as a prompt orchestration library, added chains, added agents, added LangGraph for stateful workflows, added LangSmith for observability. It's now closer to a full-stack LLM framework. The complexity has grown with it — LCEL (LangChain Expression Language) introduced a pipe operator syntax that's elegant once you get it and confusing until you do.

LlamaIndex doubled down on RAG and data connectors. Version 0.10+ restructured the library significantly (breaking some older tutorials), but the result is a cleaner API for retrieval-focused use cases. It has 100+ data loaders (Notion, Confluence, Google Drive, PDFs, databases), query transformation strategies, and sub-question decomposition built in. For RAG-specific work, it's more batteries-included than LangChain.

The same pipeline in both frameworks

Load documents → chunk → embed → store → query → generate. Here it is in both.
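Before either framework hides these six stages, it helps to see them in a framework-free toy. This is a sketch under loud assumptions: a bag-of-words counter stands in for a real embedding model, a Python list stands in for the vector store, the sample documents are made up, and the "generate" step only builds the prompt an LLM would receive instead of calling one. None of the function names come from LlamaIndex or LangChain.

```python
import math
import re
from collections import Counter

def chunk(text: str) -> list[str]:
    # Chunk: one sentence per chunk (real splitters use a size plus an overlap).
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def embed(text: str) -> Counter:
    # Embed: bag-of-words counts stand in for a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_store(docs: dict[str, str]) -> list[tuple[str, str, Counter]]:
    # Store: a plain list of (source, chunk, vector) triples.
    return [(src, c, embed(c)) for src, text in docs.items() for c in chunk(text)]

def search(store, question: str, top_k: int = 2) -> list[tuple[str, str, float]]:
    # Query: score every chunk against the question and keep the best top_k.
    q = embed(question)
    scored = [(src, c, cosine(q, vec)) for src, c, vec in store]
    return sorted(scored, key=lambda hit: hit[2], reverse=True)[:top_k]

def generate_prompt(question: str, hits) -> str:
    # Generate: a real pipeline would send this prompt to the LLM.
    context = "\n".join(c for _, c, _ in hits)
    return f"Answer based on the context below.\n\nContext: {context}\n\nQuestion: {question}"

# Load: made-up sample documents instead of files on disk.
docs = {
    "billing.txt": "Annual subscriptions can be refunded within 30 days. Monthly plans are not refundable.",
    "support.txt": "Support is available by email. Response time is one business day.",
}
store = build_store(docs)
question = "What is the refund policy for annual subscriptions?"
hits = search(store, question)
print(generate_prompt(question, hits))
```

Everything the two frameworks add (file loaders, real embeddings, persistent stores, response synthesis) slots into one of these six functions.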

LlamaIndex

pip install llama-index llama-index-llms-anthropic llama-index-embeddings-openai

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.anthropic import Anthropic
from llama_index.embeddings.openai import OpenAIEmbedding

# Configure models (both clients read ANTHROPIC_API_KEY / OPENAI_API_KEY from the environment)
Settings.llm = Anthropic(model="claude-sonnet-4-5")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.chunk_size = 512
Settings.chunk_overlap = 50

# Load documents from a directory
documents = SimpleDirectoryReader("./docs").load_data()

# Index (chunks, embeds, stores — all in one step)
index = VectorStoreIndex.from_documents(documents)

# Query
query_engine = index.as_query_engine(similarity_top_k=4)
response = query_engine.query("What is the refund policy for annual subscriptions?")

print(response.response)
print("\nSources:")
for node in response.source_nodes:
    print(f"  - {node.metadata.get('file_name', 'unknown')} (score: {node.score:.3f})")

That's about 15 lines of meaningful code. LlamaIndex handles chunking, embedding, indexing, retrieval, and response synthesis for you. The behavior is opinionated and you don't control much of it, but the defaults are sensible.

LangChain

pip install langchain langchain-community langchain-anthropic langchain-openai langchain-chroma chromadb

from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_anthropic import ChatAnthropic
# On LangChain 1.x these legacy chain helpers moved to the langchain-classic
# package; import them from langchain_classic.chains there instead.
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

# Load (this glob only picks up .txt files; PDFs and other formats need a different loader_cls)
loader = DirectoryLoader("./docs", glob="**/*.txt", loader_cls=TextLoader)
documents = loader.load()

# Chunk
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=50)
chunks = splitter.split_documents(documents)

# Embed and store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# LLM
llm = ChatAnthropic(model="claude-sonnet-4-5")

# RAG chain
prompt = ChatPromptTemplate.from_template("""Answer based on the context below.
If you don't know, say so.

Context: {context}

Question: {input}""")

combine_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, combine_chain)

# Query
result = rag_chain.invoke({"input": "What is the refund policy for annual subscriptions?"})

print(result["answer"])
print("\nSources:")
for doc in result["context"]:
    print(f"  - {doc.metadata.get('source', 'unknown')}")

That's 35 lines. LangChain requires you to explicitly construct each step. More verbose, but you see exactly what's happening.
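Both pipelines set chunk_size=512 and chunk_overlap=50, but the numbers are not measured in the same units. LlamaIndex's default node parser (SentenceSplitter) counts tokens, while RecursiveCharacterTextSplitter counts characters unless you pass a different length_function, so the LangChain chunks above are considerably smaller. What "size plus overlap" means is easiest to see with a plain sliding window. The `sliding_window` helper below is hypothetical and simplified; both real splitters also try to break on sentence or separator boundaries.

```python
def sliding_window(items: list[str], size: int, overlap: int) -> list[list[str]]:
    """Group items into windows of `size`, each sharing `overlap` items with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap  # with size=512, overlap=50 the window advances 462 units
    windows = []
    for start in range(0, len(items), step):
        windows.append(items[start:start + size])
        if start + size >= len(items):
            break  # this window already reaches the end of the input
    return windows

words = [f"w{i}" for i in range(10)]
for window in sliding_window(words, size=4, overlap=1):
    print(window)
# ['w0', 'w1', 'w2', 'w3']
# ['w3', 'w4', 'w5', 'w6']
# ['w6', 'w7', 'w8', 'w9']
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing and embedding some text twice.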

Lines of code comparison

Task	LlamaIndex	LangChain
Basic RAG	~15 lines	~35 lines
With custom prompt	+5 lines	Already explicit
Persistent vector store	+3 lines	+2 lines (swap Chroma)
Multiple file types	Built-in loaders	Requires loader per type
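The "persistent vector store" row matters because, as written, both basic pipelines re-embed every document on each run: LlamaIndex keeps the index in memory until you call index.storage_context.persist(persist_dir=...) and reload it with load_index_from_storage, and Chroma.from_documents without a persist_directory uses an in-memory client. The idea underneath is just "save vectors once, reload instead of recomputing". A stdlib sketch, where `fake_build` is a hypothetical stand-in for the embedding step and the record is made up:

```python
import json
import os
import tempfile

def save_store(path: str, records: list[dict]) -> None:
    # Persist chunks together with their already-computed vectors.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)

def load_or_build(path: str, build) -> list[dict]:
    # Reuse the saved store if it exists; only call the expensive builder otherwise.
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    records = build()
    save_store(path, records)
    return records

calls = {"embed": 0}

def fake_build() -> list[dict]:
    # Hypothetical builder: a real one would call the embedding API here.
    calls["embed"] += 1
    return [{"source": "billing.txt", "text": "Refunds within 30 days.", "vector": [0.1, 0.7]}]

path = os.path.join(tempfile.mkdtemp(), "store.json")
first = load_or_build(path, fake_build)
second = load_or_build(path, fake_build)
print(calls["embed"], first == second)  # 1 True
```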

LangChain's verbosity isn't always bad. When you need to customize the prompt, swap the retriever, or add post-processing to the output, LangChain's explicitness makes it easier to modify. With LlamaIndex, you sometimes have to dig into internals to override defaults.
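Part of why the explicit version is easy to modify is that each step is small. What create_stuff_documents_chain does, for example, is "stuff" every retrieved chunk into one prompt. The sketch below writes that step by hand with a hypothetical custom template and made-up chunks; in LlamaIndex the corresponding knob is passing a PromptTemplate as text_qa_template to as_query_engine, using the {context_str} and {query_str} placeholders.

```python
# Hypothetical custom template; {context} and {question} are filled in below.
QA_TEMPLATE = (
    "Answer using only the context below. If the answer is not there, say so.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

def stuff_prompt(chunks: list[str], question: str, sep: str = "\n\n") -> str:
    """Concatenate ('stuff') every retrieved chunk into a single prompt string."""
    return QA_TEMPLATE.format(context=sep.join(chunks), question=question)

prompt_text = stuff_prompt(
    ["Annual plans: full refund within 30 days.", "Monthly plans: no refunds."],
    "Can I get a refund on an annual plan?",
)
print(prompt_text)
```

str.format only parses the template, so braces inside retrieved text are passed through safely. Because every chunk lands in one prompt, stuffing only works while the top-k chunks fit in the model's context window; keeping k small (4 in both pipelines above) keeps the prompt bounded.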

Advanced retrieval: where LlamaIndex shines

LlamaIndex's killer feature for complex RAG is query transformation and decomposition.

Sub-question query engine — breaks complex queries into sub-questions, retrieves for each, then synthesizes:

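from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# One index per collection (hypothetical directory names; models come from Settings above)
policy_index = VectorStoreIndex.from_documents(SimpleDirectoryReader("./docs/policies").load_data())
product_index = VectorStoreIndex.from_documents(SimpleDirectoryReader("./docs/products").load_data())

# Create tools for different document collections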
tools = [
    QueryEngineTool(
        query_engine=policy_index.as_query_engine(),
        metadata=ToolMetadata(
            name="refund_policy",
            description="Company refund and return policies"
        )
    ),
    QueryEngineTool(
        query_engine=product_index.as_query_engine(),
        metadata=ToolMetadata(
            name="product_catalog",
            description="Product specifications and pricing"
        )
    )
]

sub_question_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=tools)

# The LLM chooses the split; it might produce sub-questions such as
# "What's the refund window?" + "What's the product price?"
response = sub_question_engine.query(
    "Can I get a full refund on the Pro plan if I cancel within 14 days?"
)

LangChain's closest counterpart, MultiQueryRetriever, asks the LLM for several rephrasings of the same question and merges the results, but it doesn't decompose the question into distinct parts the way LlamaIndex does. For complex multi-part questions, LlamaIndex's approach is meaningfully better.
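The difference is easier to see with the LLM calls stubbed out. In this sketch, the hypothetical `rephrase` and `decompose` functions return hard-coded output where a real engine would ask the LLM, each collection is a tiny made-up list, and word overlap stands in for semantic search. Multi-query sends several phrasings of the whole question to one retriever (MultiQueryRetriever wraps a single retriever) and deduplicates the hits; sub-question routes distinct parts of the question to the collection that can answer each part.

```python
import re

# Made-up collections standing in for policy_index and product_index.
COLLECTIONS = {
    "refund_policy": ["Full refunds are available within 30 days of purchase."],
    "product_catalog": ["The Pro plan costs $20 per month, billed annually."],
}
QUESTION = "Can I get a full refund on the Pro plan if I cancel within 14 days?"

def words(text: str) -> set[str]:
    # Lowercased words of 4+ letters; a crude stand-in for semantic similarity.
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= 4}

def search(collection: str, query: str) -> list[str]:
    # Stand-in retriever: passages sharing at least one word with the query.
    return [p for p in COLLECTIONS[collection] if words(query) & words(p)]

def rephrase(question: str) -> list[str]:
    # Hypothetical stub for the LLM call that generates query variants.
    return [
        question,
        "Is a refund available if I cancel the Pro plan early?",
        "Pro plan cancellation refund rules",
    ]

def decompose(question: str) -> list[tuple[str, str]]:
    # Hypothetical stub for the LLM call that picks (tool, sub-question) pairs.
    return [
        ("refund_policy", "Within how many days are refunds available?"),
        ("product_catalog", "How is the Pro plan billed?"),
    ]

def multi_query(question: str, collection: str = "refund_policy") -> list[str]:
    # One retriever, several phrasings of the same question, merged without duplicates.
    hits: list[str] = []
    for q in rephrase(question):
        for passage in search(collection, q):
            if passage not in hits:
                hits.append(passage)
    return hits

def sub_question(question: str) -> dict[str, list[str]]:
    # Distinct parts of the question, each routed to the collection that can answer it.
    # A real engine would also answer each part with the LLM, then synthesize.
    return {sub_q: search(tool, sub_q) for tool, sub_q in decompose(question)}

print(multi_query(QUESTION))
print(sub_question(QUESTION))
```

Here the multi-query side never reaches the pricing passage, because every rephrasing still targets the whole question against one retriever.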

Advanced retrieval: where LangChain shines

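ParentDocumentRetriever: embeds small chunks for precise matching, then returns the larger parent chunk each match came from (or the whole source document if no parent splitter is set), so the LLM gets the surrounding context: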

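from langchain.retrievers import ParentDocumentRetriever  # langchain_classic.retrievers on LangChain 1.x
from langchain_core.stores import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma

# Reuses `embeddings` and `documents` from the basic pipeline above
# Small chunks for retrieval, large chunks for context
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)

store = InMemoryStore()  # holds the parent chunks, keyed by ID
retriever = ParentDocumentRetriever(
    # Separate collection (the name is arbitrary) so child chunks don't mix with the first index
    vectorstore=Chroma(collection_name="child_chunks", embedding_function=embeddings),
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)
retriever.add_documents(documents)

# Matches on the 200-character children, returns their 2000-character parents
parent_chunks = retriever.invoke("What is the refund policy for annual subscriptions?")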

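This retrieval pattern (embed small, return large) often outperforms naive chunking on long-form documents: the small chunks match precisely, and the LLM still sees the surrounding text. LlamaIndex has similar patterns (SentenceWindowNodeParser, or HierarchicalNodeParser with AutoMergingRetriever), but LangChain's implementation is more flexible.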
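Stripped of the library, "embed small, return large" is a lookup from each child chunk to its parent. A stdlib sketch, assuming word-overlap scoring in place of embeddings and fixed-size word splits in place of the two text splitters; the helper names and the sample document are hypothetical:

```python
import re

def split_words(text: str, size: int) -> list[str]:
    # Fixed-size word chunks; stands in for parent_splitter and child_splitter.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build(doc: str, parent_size: int, child_size: int):
    parents = split_words(doc, parent_size)
    # Each child keeps the index of its parent: the link the docstore provides.
    children = [(child, p_idx)
                for p_idx, parent in enumerate(parents)
                for child in split_words(parent, child_size)]
    return parents, children

def retrieve(query: str, parents: list[str], children, k: int = 2) -> list[str]:
    q = tokens(query)
    # Score the small chunks: short text gives a precise match.
    ranked = sorted(children, key=lambda c: len(q & tokens(c[0])), reverse=True)
    results: list[str] = []
    for _, p_idx in ranked[:k]:
        # Return the large parent instead of the child, without duplicates.
        if parents[p_idx] not in results:
            results.append(parents[p_idx])
    return results

DOC = ("Refunds are issued within 30 days of purchase for annual plans. "
       "Monthly plans renew automatically each month until cancelled by the owner.")
parents, children = build(DOC, parent_size=11, child_size=4)
print(retrieve("How long do I have to get refunds?", parents, children))
```

The child that matched ("Refunds are issued within") doesn't contain the answer, but its parent does ("...within 30 days of purchase..."), which is the whole point of the pattern.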

LCEL composability — if you're building an agent that combines RAG with tools and conditional logic, LCEL makes composition cleaner:

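from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

def format_docs(docs):
    # Collapse the retrieved Documents into one context string
    return "\n\n".join(doc.page_content for doc in docs)

# Pipe syntax: retriever → format → prompt → llm → parse
# Reuses `retriever`, `prompt`, and `llm` from above; the dict keys must match
# the prompt's {context} and {input} placeholders.
rag_chain = (
    RunnableParallel(context=retriever | format_docs, input=RunnablePassthrough())
    | prompt
    | llm
    | StrOutputParser()
)

answer = rag_chain.invoke("What is the refund policy for annual subscriptions?")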

You can't build something this composable in LlamaIndex without more boilerplate.
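If the pipe syntax is the confusing part, it helps to know it is ordinary Python operator overloading: LCEL's Runnable defines __or__ (and __ror__) so that `a | b` returns a new runnable that feeds a's output into b, and plain functions are coerced into RunnableLambda. The toy `Step` and `Parallel` classes below are hypothetical and far simpler than langchain_core's (no batching, streaming, or async); they reproduce just that mechanic, with a fake retriever and a fake LLM.

```python
from typing import Any, Callable

class Step:
    """Toy stand-in for an LCEL Runnable: wraps a function, composes with |."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step | Callable[[Any], Any]") -> "Step":
        # Coerce plain functions into steps, as LCEL does with RunnableLambda.
        nxt = other if isinstance(other, Step) else Step(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))

class Parallel(Step):
    """Toy RunnableParallel: runs every branch on the same input, returns a dict."""

    def __init__(self, **branches: Any):
        steps = {k: b if isinstance(b, Step) else Step(b) for k, b in branches.items()}
        super().__init__(lambda value: {k: s.invoke(value) for k, s in steps.items()})

fake_retriever = Step(lambda q: ["Refunds within 30 days."])  # hypothetical retriever
passthrough = Step(lambda q: q)
fake_llm = Step(lambda p: "ANSWER based on: " + p)  # hypothetical LLM

chain = (
    Parallel(context=fake_retriever | (lambda docs: "\n\n".join(docs)), input=passthrough)
    | (lambda d: f"Context: {d['context']}\n\nQuestion: {d['input']}")
    | fake_llm
)
print(chain.invoke("What is the refund policy?"))
```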

Observability

LangSmith (for LangChain) is the strongest observability story in the ecosystem. Set LANGSMITH_TRACING=true and LANGSMITH_API_KEY (older docs use LANGCHAIN_TRACING_V2 and LANGCHAIN_API_KEY), and every chain invocation is traced automatically, with token usage, latency, and cost. The UI shows the full execution tree (retrieval, LLM call, output) in one view. The free tier is generous.

LlamaIndex has callbacks and event handlers, but there's no first-party LangSmith equivalent. You integrate with third-party tools (Arize, Weights & Biases, Braintrust) or build your own logging. It's more work to get equivalent observability.
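"Build your own logging" can start very small. The `traced` decorator below is hypothetical (not a LlamaIndex or LangSmith API): it records each pipeline step's name, wall-clock latency, and any error into an in-memory list, which is the core of what a trace view shows. The retriever and LLM are stubs; a real setup would export these records to one of the tools above rather than keep them in a global list.

```python
import functools
import time

TRACE: list[dict] = []

def traced(step_name: str):
    """Record name, latency, and error status for each call of the wrapped step."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            error = None
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                error = repr(exc)
                raise
            finally:
                TRACE.append({
                    "step": step_name,
                    "ms": (time.perf_counter() - start) * 1000,
                    "error": error,
                })
        return wrapper
    return decorator

@traced("retrieve")
def retrieve(question: str) -> list[str]:
    return ["Refunds within 30 days."]  # hypothetical retriever

@traced("generate")
def generate(question: str, context: list[str]) -> str:
    return f"Based on {len(context)} chunk(s): refunds are allowed within 30 days."  # stubbed LLM

answer = generate("refund policy?", retrieve("refund policy?"))
for record in TRACE:
    print(f"{record['step']:<10} {record['ms']:.2f} ms")
```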

If your team already uses LangSmith, that's a real point in LangChain's favor.

Performance

For basic retrieval, both frameworks have similar latency — they're wrappers around the same underlying operations (embedding API call, vector similarity search, LLM call). Framework overhead is negligible.

Where they differ: LlamaIndex's advanced retrieval strategies (sub-question, HyDE, query transformations) add latency because they make additional LLM calls. A sub-question engine might make 3-4 LLM calls for a single complex query. LangChain's MultiQueryRetriever adds overhead too: one extra LLM call to generate the rephrased queries, plus a retrieval per variant. Choose your retrieval strategy based on query complexity, not framework.
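A rough way to reason about that latency is to count LLM round trips per user query. The function below encodes simplified assumptions, not measured figures: basic RAG makes one generation call; multi-query adds one call to produce the rephrasings (its extra retrievals are embedding and vector-search calls, not LLM calls); a sub-question engine makes one call to decompose, one answer call per sub-question, and one final synthesis call. Response modes that refine over many chunks would add more.

```python
def llm_calls(strategy: str, n_sub_questions: int = 0) -> int:
    """Approximate LLM round trips per user query under the simplified assumptions above."""
    if strategy == "basic":
        return 1  # generate
    if strategy == "multi_query":
        return 1 + 1  # rephrase + generate
    if strategy == "sub_question":
        return 1 + n_sub_questions + 1  # decompose + answer each part + synthesize
    raise ValueError(f"unknown strategy: {strategy}")

for strategy, n in [("basic", 0), ("multi_query", 0), ("sub_question", 2)]:
    print(strategy, llm_calls(strategy, n))
# basic 1
# multi_query 2
# sub_question 4
```

With one or two sub-questions this gives the 3-4 calls mentioned above. If the sub-questions run concurrently, wall-clock latency grows more slowly than the call count, but token cost does not.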

When LlamaIndex wins

RAG-first projects where retrieval quality is the core product
Multiple data sources — connecting Notion, Confluence, Google Drive, PDFs in one index
Complex retrieval patterns — sub-question decomposition, recursive retrieval, hybrid search
Teams new to RAG who want sensible defaults and don't need to configure every step
Research/prototyping — faster to get to a working demo

When LangChain wins

Agents + RAG combined — if your agent uses RAG as one of several tools, LCEL makes composition cleaner
Fine-grained retrieval control — ParentDocumentRetriever, MultiVectorRetriever, custom retrievers
Existing LangSmith investment — observability is genuinely better out of the box
Team already knows LangChain — the switching cost is real, especially with LCEL
LCEL chains where RAG is one step in a larger composed pipeline

Honest take

For pure RAG — documents → retrieval → answer — LlamaIndex is cleaner and faster to build with. The API surface is smaller, the defaults are better, and the advanced retrieval tooling is more mature.

For anything that combines RAG with agents, workflows, or conditional logic, LangChain gives you more control. LangGraph's stateful agent framework (covered in the LangGraph guide) has no direct LlamaIndex equivalent.

If you're starting fresh on a RAG-only project, use LlamaIndex. If you're building an agent that does RAG as part of a larger workflow, use LangChain + LangGraph. If you're a team with existing LangChain code, the migration cost rarely justifies switching.

The "How RAG works" deep dive covers the underlying concepts if you want to understand what either framework is doing under the hood. The vector store comparison helps with the storage-layer decision once you've picked your framework.

And if you're evaluating whether RAG is even the right approach, the fine-tuning vs RAG guide lays out the tradeoffs clearly.

Related articles

Async Python for LLM Apps — Patterns That Actually Work in Production

Build a Vector Store for RAG — FAISS vs Chroma vs Pinecone (With Code)

Claude API vs OpenAI API — Developer Comparison Guide (2026)
