By the end of this tutorial, you'll have a working document Q&A app that can answer questions about any PDF using RAG — Retrieval-Augmented Generation. You'll load a PDF, split it into chunks, embed them into a local vector store, and query them with Claude via AICredits.in.
Total API cost: approximately ₹5–15. Total time: 30 minutes.
Prerequisites:
- Python 3.10+
- Basic LangChain familiarity (if you're new to LangChain, this intro will help)
- An AICredits account topped up with ₹100 via UPI
This post focuses specifically on the RAG pipeline with ChromaDB and PDF loading. If you want an AI agent with web search and tool use, that's covered in the LangChain agent tutorial — different architecture, different use case.
Why AICredits.in instead of Anthropic directly?
LangChain's ChatOpenAI and OpenAIEmbeddings classes work with any OpenAI-compatible endpoint — they don't know or care whether they're talking to OpenAI, Anthropic, or a gateway. You just change the base URL and API key.
AICredits.in exposes a fully OpenAI-compatible endpoint at https://api.aicredits.in/v1. Every LangChain component that accepts openai_api_base and openai_api_key works with it — no custom classes, no monkey-patching, no code changes beyond those two parameters.
The reason to use it as an Indian developer: Anthropic's billing page requires an international credit card. AICredits accepts UPI, GPay, PhonePe, net banking, and every domestic Indian payment method via Razorpay.
Setup
Create your AICredits account
- Sign up at aicredits.in
- Add ₹100 via UPI
- Dashboard → API Keys → Create Key → set a ₹200 budget cap (enough for many hours of this tutorial)
- Copy your key (starts with sk-)
Install dependencies
pip install langchain langchain-openai chromadb pypdf python-dotenv
Create a .env file in your project root:
AICREDITS_API_KEY=sk-your-key-here
AICREDITS_BASE_URL=https://api.aicredits.in/v1
Set environment variables
import os
from dotenv import load_dotenv
load_dotenv()
AICREDITS_KEY = os.environ["AICREDITS_API_KEY"]
AICREDITS_URL = os.environ["AICREDITS_BASE_URL"]
If you'd rather not pass the key and URL in code, LangChain also falls back to the OPENAI_API_KEY and OPENAI_API_BASE environment variables:
export OPENAI_API_KEY="sk-your-aicredits-key"
export OPENAI_API_BASE="https://api.aicredits.in/v1"
Either approach works. The .env file approach is cleaner for projects you'll share.
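Before building anything, it's worth a quick sanity check that the key and base URL are wired correctly. A minimal sketch, sending one tiny prompt to the Sonnet model used throughout this tutorial (the cost is negligible):
from langchain_openai import ChatOpenAI

# One-off connectivity check against the AICredits endpoint
llm = ChatOpenAI(
    model="anthropic/claude-sonnet-4-6",
    openai_api_key=AICREDITS_KEY,
    openai_api_base=AICREDITS_URL,
)
print(llm.invoke("Reply with the single word: ok").content)
If this prints a response, billing and routing are working and everything below will too.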
Build the RAG pipeline
Step 1 — Load and chunk a PDF
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
def load_and_chunk_pdf(pdf_path: str) -> list:
"""Load a PDF and split it into overlapping chunks."""
# Load the PDF — each page becomes a Document
loader = PyPDFLoader(pdf_path)
pages = loader.load()
print(f"Loaded {len(pages)} pages from {pdf_path}")
# Split into chunks
# chunk_size=1000 characters ≈ 250 tokens
# chunk_overlap=200 preserves context across chunk boundaries
splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200,
length_function=len,
separators=["\n\n", "\n", " ", ""]
)
chunks = splitter.split_documents(pages)
print(f"Split into {len(chunks)} chunks")
return chunks
The RecursiveCharacterTextSplitter tries to split on paragraph breaks first, then newlines, then spaces — so chunks stay semantically coherent rather than cutting mid-sentence. The 200-character overlap ensures that information near chunk boundaries isn't lost.
For a 10-page PDF, you'll typically get 30-60 chunks depending on how dense the text is.
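If you want to see how the splitter behaves before pointing it at a real PDF, you can run it on a plain string. A toy example with made-up text and a deliberately small chunk size so the overlap is visible:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Tiny chunk size so the split and the overlap are easy to see
demo_splitter = RecursiveCharacterTextSplitter(
    chunk_size=80,
    chunk_overlap=20,
    separators=["\n\n", "\n", " ", ""]
)
sample = (
    "RAG retrieves the chunks most similar to a question and passes them to the model as context.\n\n"
    "The model then answers using only that retrieved context."
)
for i, piece in enumerate(demo_splitter.split_text(sample)):
    print(f"chunk {i}: {piece!r}")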
Step 2 — Create embeddings and vector store
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
def create_vector_store(chunks: list, persist_directory: str = "./chroma_db") -> Chroma:
"""Embed chunks and store them in a local ChromaDB instance."""
# OpenAIEmbeddings pointing to AICredits
# AICredits supports text-embedding-3-small and text-embedding-ada-002
embeddings = OpenAIEmbeddings(
openai_api_key=AICREDITS_KEY,
openai_api_base=AICREDITS_URL,
model="openai/text-embedding-3-small"
)
# Create ChromaDB vector store and persist locally
# This embeds all chunks — the main cost of the setup step
vectorstore = Chroma.from_documents(
documents=chunks,
embedding=embeddings,
persist_directory=persist_directory
)
print(f"Created vector store with {vectorstore._collection.count()} embeddings")
print(f"Persisted to {persist_directory}")
return vectorstore
def load_vector_store(persist_directory: str = "./chroma_db") -> Chroma:
"""Load an existing ChromaDB vector store (skip re-embedding)."""
embeddings = OpenAIEmbeddings(
openai_api_key=AICREDITS_KEY,
openai_api_base=AICREDITS_URL,
model="openai/text-embedding-3-small"
)
return Chroma(
persist_directory=persist_directory,
embedding_function=embeddings
)
ChromaDB stores the embeddings locally on disk. Once you've embedded a PDF, you can load the vector store without re-embedding — which saves API calls (and money) on subsequent runs.
The text-embedding-3-small model costs ~$0.02/M tokens, which is roughly ₹1.68/M via AICredits. A 10-page PDF with 30-60 chunks embeds for under ₹0.10 total.
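Before wiring up the full chain, you can sanity-check retrieval on its own. A quick sketch using the helpers above (the question is just an example):
# Load the persisted store and run a raw similarity search
vectorstore = load_vector_store("./chroma_db")
hits = vectorstore.similarity_search("What is the main topic of this document?", k=4)
for doc in hits:
    print(doc.metadata.get("page", "?"), "|", doc.page_content[:80])
If the printed snippets look relevant to the question, the embedding and storage steps are working.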
Step 3 — Set up the Claude LLM
from langchain_openai import ChatOpenAI
def create_llm(model: str = "anthropic/claude-sonnet-4-6", temperature: float = 0) -> ChatOpenAI:
"""Create a LangChain LLM client pointed at AICredits."""
return ChatOpenAI(
model=model,
openai_api_key=AICREDITS_KEY,
openai_api_base=AICREDITS_URL,
temperature=temperature,
# temperature=0 for factual Q&A — we want consistent, grounded answers
)
The temperature=0 setting is important for RAG. You want the model to answer based on what's in the retrieved chunks, not hallucinate plausible-sounding information. Higher temperature makes answers more creative but less faithful to the source material.
For Claude model options via AICredits:
# Most accurate, best for complex documents
model = "anthropic/claude-sonnet-4-6" # ₹252/M input
# Faster and cheaper, good for simple Q&A
model = "anthropic/claude-haiku-3-5-20241022" # ₹67/M input
# Maximum capability, worth it for dense technical docs
model = "anthropic/claude-opus-4-6" # ₹1,260/M input
Step 4 — Build the QA chain
from langchain.chains import RetrievalQA
from langchain_core.prompts import PromptTemplate
def create_qa_chain(vectorstore: Chroma, llm: ChatOpenAI) -> RetrievalQA:
"""Build the retrieval QA chain."""
# Custom prompt that instructs Claude to stay grounded in the retrieved context
prompt_template = """Use the following context to answer the question at the end.
If the answer is not in the context, say "I couldn't find that in the document" —
do not make up an answer.
Context:
{context}
Question: {question}
Answer:"""
PROMPT = PromptTemplate(
template=prompt_template,
input_variables=["context", "question"]
)
# Retriever: fetch top 4 most similar chunks for each query
retriever = vectorstore.as_retriever(
search_type="similarity",
search_kwargs={"k": 4}
)
qa_chain = RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff", # "stuff" = concatenate all chunks into one prompt
retriever=retriever,
return_source_documents=True,
chain_type_kwargs={"prompt": PROMPT}
)
return qa_chain
chain_type="stuff" works well when your chunks are small (under 1K characters each) and you're retrieving 3-5 chunks. With k=4 chunks at 1K characters each, you're sending ~4K characters (~1,000 tokens) of context per query. That's well within Claude's context window and keeps query costs low.
If you have large documents where you need more context, switch to chain_type="map_reduce" — it processes each chunk separately and combines the answers, at the cost of more API calls.
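A sketch of what that swap looks like. Note that map_reduce uses its own internal prompts, so the custom PROMPT above isn't passed in here, and the k=8 is just an illustration of retrieving more context:
# map_reduce: answer from each retrieved chunk separately, then combine the partial answers
qa_chain_mr = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="map_reduce",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 8}),
    return_source_documents=True
)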
Add a simple CLI interface
Here's the complete script with an argparse-based CLI:
#!/usr/bin/env python3
"""
Document Q&A using RAG with LangChain + Claude + AICredits.in
Usage:
python rag_qa.py --pdf path/to/document.pdf --question "What is the main topic?"
python rag_qa.py --pdf path/to/document.pdf # interactive mode
"""
import os
import argparse
from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain_core.prompts import PromptTemplate
load_dotenv()
AICREDITS_KEY = os.environ["AICREDITS_API_KEY"]
AICREDITS_URL = os.environ.get("AICREDITS_BASE_URL", "https://api.aicredits.in/v1")
PROMPT_TEMPLATE = """Use the following context to answer the question at the end.
If the answer is not in the context, say "I couldn't find that in the document."
Context:
{context}
Question: {question}
Answer:"""
def build_vectorstore(pdf_path: str, persist_dir: str) -> Chroma:
loader = PyPDFLoader(pdf_path)
pages = loader.load()
splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200
)
chunks = splitter.split_documents(pages)
print(f"Loaded {len(pages)} pages → {len(chunks)} chunks")
embeddings = OpenAIEmbeddings(
openai_api_key=AICREDITS_KEY,
openai_api_base=AICREDITS_URL,
model="openai/text-embedding-3-small"
)
vectorstore = Chroma.from_documents(
documents=chunks,
embedding=embeddings,
persist_directory=persist_dir
)
print(f"Embeddings stored in {persist_dir}")
return vectorstore
def load_vectorstore(persist_dir: str) -> Chroma:
embeddings = OpenAIEmbeddings(
openai_api_key=AICREDITS_KEY,
openai_api_base=AICREDITS_URL,
model="openai/text-embedding-3-small"
)
return Chroma(
persist_directory=persist_dir,
embedding_function=embeddings
)
def create_chain(vectorstore: Chroma, model: str = "anthropic/claude-sonnet-4-6") -> RetrievalQA:
llm = ChatOpenAI(
model=model,
openai_api_key=AICREDITS_KEY,
openai_api_base=AICREDITS_URL,
temperature=0
)
prompt = PromptTemplate(
template=PROMPT_TEMPLATE,
input_variables=["context", "question"]
)
retriever = vectorstore.as_retriever(
search_type="similarity",
search_kwargs={"k": 4}
)
return RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff",
retriever=retriever,
return_source_documents=True,
chain_type_kwargs={"prompt": prompt}
)
def ask(chain: RetrievalQA, question: str, show_sources: bool = False) -> str:
result = chain.invoke({"query": question})
answer = result["result"]
if show_sources:
sources = result.get("source_documents", [])
print(f"\n[Sources: {len(sources)} chunks retrieved]")
for i, doc in enumerate(sources):
page = doc.metadata.get("page", "?")
print(f" Chunk {i+1} (page {page}): {doc.page_content[:100]}...")
return answer
def main():
parser = argparse.ArgumentParser(description="RAG Q&A over a PDF")
parser.add_argument("--pdf", required=True, help="Path to PDF file")
parser.add_argument("--question", "-q", help="Single question (omit for interactive mode)")
parser.add_argument("--model", default="anthropic/claude-sonnet-4-6", help="Model to use")
parser.add_argument("--persist-dir", default="./chroma_db", help="ChromaDB storage directory")
parser.add_argument("--rebuild", action="store_true", help="Re-embed even if store exists")
parser.add_argument("--sources", action="store_true", help="Show source chunks")
args = parser.parse_args()
# Build or load vector store
import pathlib
store_path = pathlib.Path(args.persist_dir)
if args.rebuild or not store_path.exists():
print("Building vector store...")
vectorstore = build_vectorstore(args.pdf, args.persist_dir)
else:
print(f"Loading existing vector store from {args.persist_dir}")
vectorstore = load_vectorstore(args.persist_dir)
chain = create_chain(vectorstore, model=args.model)
if args.question:
# Single question mode
answer = ask(chain, args.question, show_sources=args.sources)
print(f"\nQ: {args.question}")
print(f"A: {answer}")
else:
# Interactive mode
print(f"\nDocument Q&A ready. Type 'quit' to exit.")
print(f"Model: {args.model}")
while True:
question = input("\nQuestion: ").strip()
if question.lower() in ("quit", "exit", "q"):
break
if not question:
continue
answer = ask(chain, question, show_sources=args.sources)
print(f"Answer: {answer}")
if __name__ == "__main__":
main()
Usage examples:
# First run — build the vector store and ask a question
python rag_qa.py --pdf report.pdf --question "What are the main findings?"
# Show which chunks were retrieved
python rag_qa.py --pdf report.pdf --question "What is the revenue?" --sources
# Interactive mode — ask multiple questions without re-embedding
python rag_qa.py --pdf report.pdf
# Use Haiku for cheaper queries (same vectorstore, different model)
python rag_qa.py --pdf report.pdf --model anthropic/claude-haiku-3-5-20241022
# Force rebuild if you update the PDF
python rag_qa.py --pdf updated_report.pdf --rebuild
Cost breakdown
Here's what a typical session costs:
| Operation | Tokens used | Cost (AICredits) |
|---|---|---|
| Embed 10-page PDF (~40 chunks) | ~10,000 | ₹0.05 |
| Embed a query (question) | ~15 tokens | < ₹0.01 |
| Claude query (4 chunks + question) | ~1,200 input | ₹0.30 |
| Claude answer generation | ~300 output | ₹0.38 |
| 10 Q&A queries total | — | ~₹7 |
You only pay for the embedding once per document. Every subsequent query costs ~₹0.70 on Claude Sonnet 4.6, or ~₹0.20 on Claude Haiku 3.5. A full session exploring a document with 10-15 questions comes in well under ₹10.
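The per-query figure falls straight out of the token counts and per-million prices above. A quick back-of-the-envelope check (the output price of roughly ₹1,260/M is inferred from the ₹0.38 for ~300 output tokens in the table):
# Rough per-query cost on Sonnet, using the figures from the table above
input_tokens, output_tokens = 1_200, 300
input_price, output_price = 252, 1_260   # ₹ per million tokens
cost = input_tokens / 1e6 * input_price + output_tokens / 1e6 * output_price
print(f"~₹{cost:.2f} per query")   # ≈ ₹0.68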
If you're processing hundreds of documents or running this for multiple users, use Haiku for routine queries and only route complex or ambiguous questions to Sonnet.
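One simple way to do that routing is a heuristic on the question itself. The keyword list and length threshold below are illustrative guesses, not tuned values:
def pick_model(question: str) -> str:
    """Naive router: send analytical or long questions to Sonnet, everything else to Haiku."""
    analytical = any(word in question.lower() for word in ("why", "compare", "explain", "analyse", "analyze"))
    if analytical or len(question.split()) > 25:
        return "anthropic/claude-sonnet-4-6"
    return "anthropic/claude-haiku-3-5-20241022"

# Same vector store, different LLM per question; building the chain makes no API calls by itself
chain = create_chain(vectorstore, model=pick_model(question))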
What to build next
Add a Streamlit UI: Replace the CLI with a simple web interface where users can upload PDFs and ask questions in a browser.
pip install streamlit
import tempfile
import streamlit as st

st.title("Document Q&A")
uploaded_file = st.file_uploader("Upload a PDF", type="pdf")
question = st.text_input("Ask a question about the document")

if uploaded_file and question:
    # Save the upload to a temp file so PyPDFLoader can read it, then reuse the script's helpers
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(uploaded_file.read())
        pdf_path = tmp.name
    with st.spinner("Thinking..."):
        vectorstore = build_vectorstore(pdf_path, "./chroma_db")   # cache this in a real app
        chain = create_chain(vectorstore)
        answer = ask(chain, question)
    st.write(answer)
Connect to Google Drive: Replace PyPDFLoader with Google Drive's API to load PDFs directly from a shared folder. Useful for team knowledge bases.
Swap the PDF for Indian government API data: The same RAG pattern works on any text — try loading data from data.gov.in, RBI press releases, or MCA company filings. Replace PyPDFLoader with a web scraper or API client, split the text, embed, and query.
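The loading step is the only part that changes. A rough sketch (the URL is a placeholder, and a real integration would parse the API response rather than index it verbatim):
import requests
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Placeholder URL; substitute a real data.gov.in resource endpoint
response = requests.get("https://data.gov.in/REPLACE_WITH_RESOURCE_PATH")
text = response.text

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = splitter.create_documents([text], metadatas=[{"source": "data.gov.in"}])
# `docs` can go straight into Chroma.from_documents, exactly as in Step 2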
Add metadata filtering: If you're indexing multiple documents, add source metadata to each chunk and filter by it at query time. Chroma supports metadata filters natively — you can ask "only look in Q3_report.pdf" without creating separate vector stores.
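A sketch of the retrieval side with Chroma (PyPDFLoader already stores the file path under the source metadata key, so filtering on it needs no extra indexing work):
# Restrict retrieval to chunks whose metadata matches a given source file
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={
        "k": 4,
        "filter": {"source": "Q3_report.pdf"}   # only chunks from this file
    }
)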
Try it now with AICredits.in
Access Claude, GPT-4o, Gemini, and 300+ models with UPI payment in ₹. No international card needed. Create free account →
Next steps
- RAG lesson — the conceptual foundations of retrieval-augmented generation, including chunking strategies and when RAG outperforms fine-tuning
- How RAG works — a deeper technical walkthrough of the embedding and retrieval pipeline
- LangChain introduction guide — if you want to understand the full LangChain ecosystem before building more complex pipelines
- LangChain agent tutorial — the companion to this post, focused on tool-using agents rather than RAG



