By the end of this tutorial, you'll have a working document Q&A app that can answer questions about any PDF using RAG — Retrieval-Augmented Generation. You'll load a PDF, split it into chunks, embed them into a local vector store, and query them with Claude via AICredits.in.
Total API cost: approximately ₹5–15. Total time: 30 minutes.
Prerequisites:
- Python 3.10+
- Basic LangChain familiarity (if you're new to LangChain, this intro will help)
- An AICredits account topped up with ₹100 via UPI
This post focuses specifically on the RAG pipeline with ChromaDB and PDF loading. If you want an AI agent with web search and tool use, that's covered in the LangChain agent tutorial — different architecture, different use case.
Why AICredits.in instead of Anthropic directly?
LangChain's ChatOpenAI and OpenAIEmbeddings classes work with any OpenAI-compatible endpoint — they don't know or care whether they're talking to OpenAI, Anthropic, or a gateway. You just change the base URL and API key.
AICredits.in exposes a fully OpenAI-compatible endpoint at https://api.aicredits.in/v1. Every LangChain component that accepts openai_api_base and openai_api_key works with it — no custom classes, no monkey-patching, no code changes beyond those two parameters.
The reason to use it as an Indian developer: Anthropic's billing page requires an international credit card. AICredits accepts UPI, GPay, PhonePe, net banking, and every domestic Indian payment method via Razorpay.
Setup
Create your AICredits account
- Sign up at aicredits.in
- Add ₹100 via UPI
- Dashboard → API Keys → Create Key → set a ₹200 budget cap (enough for many hours of this tutorial)
- Copy your key (starts with sk-)
Install dependencies
pip install langchain langchain-openai chromadb pypdf python-dotenv
Create a .env file in your project root:
AICREDITS_API_KEY=sk-your-key-here
AICREDITS_BASE_URL=https://api.aicredits.in/v1
Set environment variables
import os
from dotenv import load_dotenv
load_dotenv()
AICREDITS_KEY = os.environ["AICREDITS_API_KEY"]
AICREDITS_URL = os.environ["AICREDITS_BASE_URL"]
If you'd rather not pass the key and URL in code, LangChain also falls back to the OPENAI_API_KEY and OPENAI_API_BASE environment variables:
export OPENAI_API_KEY="sk-your-aicredits-key"
export OPENAI_API_BASE="https://api.aicredits.in/v1"
Either approach works. The .env file approach is cleaner for projects you'll share.
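Before building anything, it's worth a quick sanity check that the key and base URL are wired correctly. A minimal sketch, sending one tiny prompt to the Sonnet model used throughout this tutorial (the cost is negligible):
from langchain_openai import ChatOpenAI

# One-off connectivity check against the AICredits endpoint
llm = ChatOpenAI(
    model="anthropic/claude-sonnet-4-6",
    openai_api_key=AICREDITS_KEY,
    openai_api_base=AICREDITS_URL,
)
print(llm.invoke("Reply with the single word: ok").content)
If this prints a response, billing and routing are working and everything below will too.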
Build the RAG pipeline
Step 1 — Load and chunk a PDF
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
def load_and_chunk_pdf(pdf_path: str) -> list:
"""Load a PDF and split it into overlapping chunks."""
# Load the PDF — each page becomes a Document
loader = PyPDFLoader(pdf_path)
pages = loader.load()
print(f"Loaded {len(pages)} pages from {pdf_path}")
# Split into chunks
# chunk_size=1000 characters ≈ 250 tokens
# chunk_overlap=200 preserves context across chunk boundaries
splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200,
length_function=len,
separators=["\n\n", "\n", " ", ""]
)
chunks = splitter.split_documents(pages)
print(f"Split into {len(chunks)} chunks")
return chunks
The RecursiveCharacterTextSplitter tries to split on paragraph breaks first, then newlines, then spaces — so chunks stay semantically coherent rather than cutting mid-sentence. The 200-character overlap ensures that information near chunk boundaries isn't lost.
For a 10-page PDF, you'll typically get 30-60 chunks depending on how dense the text is.
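If you want to see how the splitter behaves before pointing it at a real PDF, you can run it on a plain string. A toy example with made-up text and a deliberately small chunk size so the overlap is visible:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Tiny chunk size so the split and the overlap are easy to see
demo_splitter = RecursiveCharacterTextSplitter(
    chunk_size=80,
    chunk_overlap=20,
    separators=["\n\n", "\n", " ", ""]
)
sample = (
    "RAG retrieves the chunks most similar to a question and passes them to the model as context.\n\n"
    "The model then answers using only that retrieved context."
)
for i, piece in enumerate(demo_splitter.split_text(sample)):
    print(f"chunk {i}: {piece!r}")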
Step 2 — Create embeddings and vector store
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
def create_vector_store(chunks: list, persist_directory: str = "./chroma_db") -> Chroma:
"""Embed chunks and store them in a local ChromaDB instance."""
# OpenAIEmbeddings pointing to AICredits
# AICredits supports text-embedding-3-small and text-embedding-ada-002
embeddings = OpenAIEmbeddings(
openai_api_key=AICREDITS_KEY,
openai_api_base=AICREDITS_URL,
model="openai/text-embedding-3-small"
)
# Create ChromaDB vector store and persist locally
# This embeds all chunks — the main cost of the setup step
vectorstore = Chroma.from_documents(
documents=chunks,
embedding=embeddings,
persist_directory=persist_directory
)
print(f"Created vector store with {vectorstore._collection.count()} embeddings")
print(f"Persisted to {persist_directory}")
return vectorstore
def load_vector_store(persist_directory: str = "./chroma_db") -> Chroma:
"""Load an existing ChromaDB vector store (skip re-embedding)."""
embeddings = OpenAIEmbeddings(
openai_api_key=AICREDITS_KEY,
openai_api_base=AICREDITS_URL,
model="openai/text-embedding-3-small"
)
return Chroma(
persist_directory=persist_directory,
embedding_function=embeddings
)
ChromaDB stores the embeddings locally on disk. Once you've embedded a PDF, you can load the vector store without re-embedding — which saves API calls (and money) on subsequent runs.
The text-embedding-3-small model costs ~$0.02/M tokens, which is roughly ₹1.68/M via AICredits. A 10-page PDF with 30-60 chunks embeds for under ₹0.10 total.
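Before wiring up the full chain, you can sanity-check retrieval on its own. A quick sketch using the helpers above (the question is just an example):
# Load the persisted store and run a raw similarity search
vectorstore = load_vector_store("./chroma_db")
hits = vectorstore.similarity_search("What is the main topic of this document?", k=4)
for doc in hits:
    print(doc.metadata.get("page", "?"), "|", doc.page_content[:80])
If the printed snippets look relevant to the question, the embedding and storage steps are working.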
Step 3 — Set up the Claude LLM
from langchain_openai import ChatOpenAI
def create_llm(model: str = "anthropic/claude-sonnet-4-6", temperature: float = 0) -> ChatOpenAI:
"""Create a LangChain LLM client pointed at AICredits."""
return ChatOpenAI(
model=model,
openai_api_key=AICREDITS_KEY,
openai_api_base=AICREDITS_URL,
temperature=temperature,
# temperature=0 for factual Q&A — we want consistent, grounded answers
)
The temperature=0 setting is important for RAG. You want the model to answer based on what's in the retrieved chunks, not hallucinate plausible-sounding information. Higher temperature makes answers more creative but less faithful to the source material.
For Claude model options via AICredits:
# Most accurate, best for complex documents
model = "anthropic/claude-sonnet-4-6" # ₹252/M input
# Faster and cheaper, good for simple Q&A
model = "anthropic/claude-haiku-3-5-20241022" # ₹67/M input
# Maximum capability, worth it for dense technical docs
model = "anthropic/claude-opus-4-6" # ₹1,260/M input
Step 4 — Build the QA chain
from langchain.chains import RetrievalQA
from langchain_core.prompts import PromptTemplate
def create_qa_chain(vectorstore: Chroma, llm: ChatOpenAI) -> RetrievalQA:
"""Build the retrieval QA chain."""
# Custom prompt that instructs Claude to stay grounded in the retrieved context
prompt_template = """Use the following context to answer the question at the end.
If the answer is not in the context, say "I couldn't find that in the document" —
do not make up an answer.
Context:
{context}
Question: {question}
Answer:"""
PROMPT = PromptTemplate(
template=prompt_template,
input_variables=["context", "question"]
)
# Retriever: fetch top 4 most similar chunks for each query
retriever = vectorstore.as_retriever(
search_type="similarity",
search_kwargs={"k": 4}
)
qa_chain = RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff", # "stuff" = concatenate all chunks into one prompt
retriever=retriever,
return_source_documents=True,
chain_type_kwargs={"prompt": PROMPT}
)
return qa_chain
chain_type="stuff" works well when your chunks are small (under 1K characters each) and you're retrieving 3-5 chunks. With k=4 chunks at 1K characters each, you're sending ~4K characters (~1,000 tokens) of context per query. That's well within Claude's context window and keeps query costs low.
If you have large documents where you need more context, switch to chain_type="map_reduce" — it processes each chunk separately and combines the answers, at the cost of more API calls.
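A sketch of what that swap looks like. Note that map_reduce uses its own internal prompts, so the custom PROMPT above isn't passed in here, and the k=8 is just an illustration of retrieving more context:
# map_reduce: answer from each retrieved chunk separately, then combine the partial answers
qa_chain_mr = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="map_reduce",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 8}),
    return_source_documents=True
)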
Add a simple CLI interface
Here's the complete script with an argparse-based CLI:
#!/usr/bin/env python3
"""
Document Q&A using RAG with LangChain + Claude + AICredits.in
Usage:
python rag_qa.py --pdf path/to/document.pdf --question "What is the main topic?"
python rag_qa.py --pdf path/to/document.pdf # interactive mode
"""
import os
import argparse
from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain_core.prompts import PromptTemplate
load_dotenv()
AICREDITS_KEY = os.environ["AICREDITS_API_KEY"]
AICREDITS_URL = os.environ.get("AICREDITS_BASE_URL", "https://api.aicredits.in/v1")
PROMPT_TEMPLATE = """Use the following context to answer the question at the end.
If the answer is not in the context, say "I couldn't find that in the document."
Context:
{context}
Question: {question}
Answer:"""
def build_vectorstore(pdf_path: str, persist_dir: str) -> Chroma:
loader = PyPDFLoader(pdf_path)
pages = loader.load()
splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200
)
chunks = splitter.split_documents(pages)
print(f"Loaded {len(pages)} pages → {len(chunks)} chunks")
embeddings = OpenAIEmbeddings(
openai_api_key=AICREDITS_KEY,
openai_api_base=AICREDITS_URL,
model="openai/text-embedding-3-small"
)
vectorstore = Chroma.from_documents(
documents=chunks,
embedding=embeddings,
persist_directory=persist_dir
)
print(f"Embeddings stored in {persist_dir}")
return vectorstore
def load_vectorstore(persist_dir: str) -> Chroma:
embeddings = OpenAIEmbeddings(
openai_api_key=AICREDITS_KEY,
openai_api_base=AICREDITS_URL,
model="openai/text-embedding-3-small"
)
return Chroma(
persist_directory=persist_dir,
embedding_function=embeddings
)
def create_chain(vectorstore: Chroma, model: str = "anthropic/claude-sonnet-4-6") -> RetrievalQA:
llm = ChatOpenAI(
model=model,
openai_api_key=AICREDITS_KEY,
openai_api_base=AICREDITS_URL,
temperature=0
)
prompt = PromptTemplate(
template=PROMPT_TEMPLATE,
input_variables=["context", "question"]
)
retriever = vectorstore.as_retriever(
search_type="similarity",
search_kwargs={"k": 4}
)
return RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff",
retriever=retriever,
return_source_documents=True,
chain_type_kwargs={"prompt": prompt}
)
def ask(chain: RetrievalQA, question: str, show_sources: bool = False) -> str:
result = chain.invoke({"query": question})
answer = result["result"]
if show_sources:
sources = result.get("source_documents", [])
print(f"\n[Sources: {len(sources)} chunks retrieved]")
for i, doc in enumerate(sources):
page = doc.metadata.get("page", "?")
print(f" Chunk {i+1} (page {page}): {doc.page_content[:100]}...")
return answer
def main():
parser = argparse.ArgumentParser(description="RAG Q&A over a PDF")
parser.add_argument("--pdf", required=True, help="Path to PDF file")
parser.add_argument("--question", "-q", help="Single question (omit for interactive mode)")
parser.add_argument("--model", default="anthropic/claude-sonnet-4-6", help="Model to use")
parser.add_argument("--persist-dir", default="./chroma_db", help="ChromaDB storage directory")
parser.add_argument("--rebuild", action="store_true", help="Re-embed even if store exists")
parser.add_argument("--sources", action="store_true", help="Show source chunks")
args = parser.parse_args()
# Build or load vector store
import pathlib
store_path = pathlib.Path(args.persist_dir)
if args.rebuild or not store_path.exists():
print("Building vector store...")
vectorstore = build_vectorstore(args.pdf, args.persist_dir)
else:
print(f"Loading existing vector store from {args.persist_dir}")
vectorstore = load_vectorstore(args.persist_dir)
chain = create_chain(vectorstore, model=args.model)
if args.question:
# Single question mode
answer = ask(chain, args.question, show_sources=args.sources)
print(f"\nQ: {args.question}")
print(f"A: {answer}")
else:
# Interactive mode
print(f"\nDocument Q&A ready. Type 'quit' to exit.")
print(f"Model: {args.model}")
while True:
question = input("\nQuestion: ").strip()
if question.lower() in ("quit", "exit", "q"):
break
if not question:
continue
answer = ask(chain, question, show_sources=args.sources)
print(f"Answer: {answer}")
if __name__ == "__main__":
main()
Usage examples:
# First run — build the vector store and ask a question
python rag_qa.py --pdf report.pdf --question "What are the main findings?"
# Show which chunks were retrieved
python rag_qa.py --pdf report.pdf --question "What is the revenue?" --sources
# Interactive mode — ask multiple questions without re-embedding
python rag_qa.py --pdf report.pdf
# Use Haiku for cheaper queries (same vectorstore, different model)
python rag_qa.py --pdf report.pdf --model anthropic/claude-haiku-3-5-20241022
# Force rebuild if you update the PDF
python rag_qa.py --pdf updated_report.pdf --rebuild
Cost breakdown
Here's what a typical session costs:
| Operation | Tokens used | Cost (AICredits) |
|---|---|---|
| Embed 10-page PDF (~40 chunks) | ~10,000 | ₹0.05 |
| Embed a query (question) | ~15 tokens | < ₹0.01 |
| Claude query (4 chunks + question) | ~1,200 input | ₹0.30 |
| Claude answer generation | ~300 output | ₹0.38 |
| 10 Q&A queries total | — | ~₹7 |
You only pay for the embedding once per document. Every subsequent query costs ~₹0.70 on Claude Sonnet 4.6, or ~₹0.20 on Claude Haiku 3.5. A full session exploring a document with 10-15 questions comes in well under ₹10.
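The per-query figure falls straight out of the token counts and per-million prices above. A quick back-of-the-envelope check (the output price of roughly ₹1,260/M is inferred from the ₹0.38 for ~300 output tokens in the table):
# Rough per-query cost on Sonnet, using the figures from the table above
input_tokens, output_tokens = 1_200, 300
input_price, output_price = 252, 1_260   # ₹ per million tokens
cost = input_tokens / 1e6 * input_price + output_tokens / 1e6 * output_price
print(f"~₹{cost:.2f} per query")   # ≈ ₹0.68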
If you're processing hundreds of documents or running this for multiple users, use Haiku for routine queries and only route complex or ambiguous questions to Sonnet.
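One simple way to do that routing is a heuristic on the question itself. The keyword list and length threshold below are illustrative guesses, not tuned values:
def pick_model(question: str) -> str:
    """Naive router: send analytical or long questions to Sonnet, everything else to Haiku."""
    analytical = any(word in question.lower() for word in ("why", "compare", "explain", "analyse", "analyze"))
    if analytical or len(question.split()) > 25:
        return "anthropic/claude-sonnet-4-6"
    return "anthropic/claude-haiku-3-5-20241022"

# Same vector store, different LLM per question; building the chain makes no API calls by itself
chain = create_chain(vectorstore, model=pick_model(question))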
What to build next
Add a Streamlit UI: Replace the CLI with a simple web interface where users can upload PDFs and ask questions in a browser.
pip install streamlit
import tempfile
import streamlit as st

st.title("Document Q&A")
uploaded_file = st.file_uploader("Upload a PDF", type="pdf")
question = st.text_input("Ask a question about the document")

if uploaded_file and question:
    # Save the upload to a temp file so PyPDFLoader can read it, then reuse the script's helpers
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(uploaded_file.read())
        pdf_path = tmp.name
    with st.spinner("Thinking..."):
        vectorstore = build_vectorstore(pdf_path, "./chroma_db")   # cache this in a real app
        chain = create_chain(vectorstore)
        answer = ask(chain, question)
    st.write(answer)
Connect to Google Drive: Replace PyPDFLoader with Google Drive's API to load PDFs directly from a shared folder. Useful for team knowledge bases.
Swap the PDF for Indian government API data: The same RAG pattern works on any text — try loading data from data.gov.in, RBI press releases, or MCA company filings. Replace PyPDFLoader with a web scraper or API client, split the text, embed, and query.
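The loading step is the only part that changes. A rough sketch (the URL is a placeholder, and a real integration would parse the API response rather than index it verbatim):
import requests
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Placeholder URL; substitute a real data.gov.in resource endpoint
response = requests.get("https://data.gov.in/REPLACE_WITH_RESOURCE_PATH")
text = response.text

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = splitter.create_documents([text], metadatas=[{"source": "data.gov.in"}])
# `docs` can go straight into Chroma.from_documents, exactly as in Step 2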
Add metadata filtering: If you're indexing multiple documents, add source metadata to each chunk and filter by it at query time. Chroma supports metadata filters natively — you can ask "only look in Q3_report.pdf" without creating separate vector stores.
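A sketch of the retrieval side with Chroma (PyPDFLoader already stores the file path under the source metadata key, so filtering on it needs no extra indexing work):
# Restrict retrieval to chunks whose metadata matches a given source file
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={
        "k": 4,
        "filter": {"source": "Q3_report.pdf"}   # only chunks from this file
    }
)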
Try it now with AICredits.in
Access Claude, GPT-4o, Gemini, and 300+ models with UPI payment in ₹. No international card needed. Create free account →
Next steps
- RAG lesson — the conceptual foundations of retrieval-augmented generation, including chunking strategies and when RAG outperforms fine-tuning
- How RAG works — a deeper technical walkthrough of the embedding and retrieval pipeline
- LangChain introduction guide — if you want to understand the full LangChain ecosystem before building more complex pipelines
- LangChain agent tutorial — the companion to this post, focused on tool-using agents rather than RAG



