Every LangChain tutorial on the internet starts with "set your OPENAI_API_KEY" and assumes you funded it with a dollar card. This one doesn't. We're using AICredits.in — INR billing, UPI payment, no international card needed.
By the end of this tutorial you'll have a working LangChain AI agent that can answer questions and search the web, running on GPT-4o-mini and switchable to Claude or Gemini with a one-line change.
Prerequisites
- Python 3.9+
- An AICredits account with ₹100 topped up via UPI
- Basic familiarity with Python
Install the dependencies:
pip install langchain langchain-openai langchain-community duckduckgo-search python-dotenv
Create a .env file:
AICREDITS_API_KEY=sk-your-aicredits-key
Setting up the LangChain client
LangChain's ChatOpenAI class accepts openai_api_base and openai_api_key directly. That's all you need to point it at AICredits:
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
load_dotenv()
# ChatOpenAI pointing to AICredits
llm = ChatOpenAI(
    model="openai/gpt-4o-mini",
    openai_api_key=os.environ["AICREDITS_API_KEY"],
    openai_api_base="https://api.aicredits.in/v1",
    temperature=0.7,
)
# Quick test
response = llm.invoke("What's 17 * 23? Show your working.")
print(response.content)
That's the foundation. Everything in LangChain that uses ChatOpenAI will now route through AICredits.
Example 1: Simple QA chain with GPT-4o-mini
GPT-4o-mini at ₹13.23/1M input tokens is the right default for most tasks. This example builds a simple question-answering chain with a custom system prompt:
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
load_dotenv()
llm = ChatOpenAI(
    model="openai/gpt-4o-mini",
    openai_api_key=os.environ["AICREDITS_API_KEY"],
    openai_api_base="https://api.aicredits.in/v1",
    temperature=0.3,
)
prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a technical assistant for Indian software developers.
Be specific and practical. Include code examples when relevant.
When citing prices, use INR unless the user asks for USD."""),
    ("human", "{question}"),
])
chain = prompt | llm | StrOutputParser()
questions = [
    "What's the cheapest way to host a Next.js app in India?",
    "How do I set up Razorpay webhooks in Python?",
    "What's the difference between SQS and RabbitMQ for a small startup?",
]
for q in questions:
    print(f"Q: {q}")
    print(f"A: {chain.invoke({'question': q})}")
    print("---")
At 500 input tokens and 400 output tokens per call, each question costs roughly:
- Input: 500 tokens × ₹13.23/1M = ₹0.0066
- Output: 400 tokens × ₹52.91/1M = ₹0.0212
- Total per call: ~₹0.028
Three questions: ₹0.084. That's how cheap GPT-4o-mini is for this kind of task.
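That arithmetic is easy to wrap in a small helper if you want to estimate a batch before running it. A minimal sketch, using the GPT-4o-mini rates quoted above (swap in other models' rates as needed; the function name is my own):

```python
# Per-call cost estimator using the GPT-4o-mini INR rates quoted above.
INPUT_RATE_INR_PER_M = 13.23   # ₹ per 1M input tokens
OUTPUT_RATE_INR_PER_M = 52.91  # ₹ per 1M output tokens

def call_cost_inr(input_tokens: int, output_tokens: int) -> float:
    """Approximate INR cost of a single chat completion."""
    return (input_tokens * INPUT_RATE_INR_PER_M
            + output_tokens * OUTPUT_RATE_INR_PER_M) / 1_000_000

print(round(call_cost_inr(500, 400), 4))  # ≈ 0.0278
```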
Example 2: Multi-model comparison
One of the genuinely useful things about having a unified gateway is comparing models on the same prompt without touching your billing setup. Here's a simple harness:
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
load_dotenv()
AICREDITS_KEY = os.environ["AICREDITS_API_KEY"]
BASE_URL = "https://api.aicredits.in/v1"
def make_chain(model: str, temperature: float = 0.5):
    llm = ChatOpenAI(
        model=model,
        openai_api_key=AICREDITS_KEY,
        openai_api_base=BASE_URL,
        temperature=temperature,
    )
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a concise technical writer. Answer in 3-4 sentences maximum."),
        ("human", "{question}"),
    ])
    return prompt | llm | StrOutputParser()
models = {
    "GPT-4o-mini (₹13/1M)": "openai/gpt-4o-mini",
    "Claude Haiku (₹96/1M)": "anthropic/claude-3-5-haiku-20241022",
    "Gemini Flash (₹8.84/1M)": "google/gemini-2.0-flash",
}
question = "Explain the CAP theorem and which option a startup should generally prioritize"
for label, model_id in models.items():
    chain = make_chain(model_id)
    response = chain.invoke({"question": question})
    print(f"\n=== {label} ===")
    print(response)
Running this across the three models costs less than ₹1. That's a cheap way to calibrate which model works best for your specific use case before committing to one for production.
In my experience: Gemini Flash is fastest and cheapest, good for simple extraction and classification. Claude Haiku is more consistent on instruction following and structured tasks. GPT-4o-mini is the safe default when you're not sure.
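If you want to encode that heuristic in code, a hypothetical router might look like this. The task labels and fallback choice are my own assumptions, not anything AICredits provides; the model IDs follow the "provider/model" convention used in this tutorial:

```python
# Hypothetical model router encoding the heuristics above.
def pick_model(task: str) -> str:
    if task in {"extraction", "classification"}:
        return "google/gemini-2.0-flash"               # fastest and cheapest
    if task in {"structured_output", "strict_instructions"}:
        return "anthropic/claude-3-5-haiku-20241022"   # consistent instruction following
    return "openai/gpt-4o-mini"                        # safe default

print(pick_model("classification"))  # → google/gemini-2.0-flash
```

Feed the result straight into the make_chain helper above and you can route per-request without touching billing.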
Example 3: ReAct agent with web search
This is where things get interesting. A ReAct agent loops between reasoning and tool use — it can search the web, process the results, and reason about whether it has enough information to answer.
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_community.tools import DuckDuckGoSearchRun
from langchain.agents import create_react_agent, AgentExecutor
from langchain_core.prompts import PromptTemplate
load_dotenv()
# Initialize the LLM — GPT-4o for the agent (better at tool use than mini)
llm = ChatOpenAI(
    model="openai/gpt-4o",
    openai_api_key=os.environ["AICREDITS_API_KEY"],
    openai_api_base="https://api.aicredits.in/v1",
    temperature=0,
)
# Tool: web search
search = DuckDuckGoSearchRun()
tools = [search]
# ReAct prompt template
react_prompt = PromptTemplate.from_template("""Answer the following question as best you can.
You have access to the following tools:
{tools}
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!
Question: {input}
Thought:{agent_scratchpad}""")
# Create the agent
agent = create_react_agent(llm, tools, react_prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,               # shows the reasoning steps
    max_iterations=5,           # prevent infinite loops
    handle_parsing_errors=True,
)
# Run it
result = agent_executor.invoke({
    "input": "What are the latest funding rounds for Indian AI startups in 2026? Summarize the top 3."
})
print("\n=== Final Answer ===")
print(result["output"])
The verbose=True flag shows you the full reasoning chain — each thought, each search query, each observation. It's worth watching the first few times to understand how the model decides when it has enough information.
For a production version, you'd turn off verbose and add error handling around the invoke call.
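A minimal sketch of that production shape, assuming the agent_executor built above. The run_agent wrapper and its fallback message are illustrative, not part of LangChain; narrow the except clause to the exceptions you actually observe:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

def run_agent(executor, question: str, fallback: str = "Please try again later.") -> str:
    """Invoke an AgentExecutor-style object and degrade gracefully on failure."""
    try:
        result = executor.invoke({"input": question})
        return result["output"]
    except Exception as exc:  # replace with the specific errors you see in practice
        log.error("Agent run failed: %s", exc)
        return fallback
```

With verbose=False on the executor, this wrapper becomes the one place where failures surface.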
Switching the agent to Claude Sonnet 4
Swap one line to use Claude instead:
llm = ChatOpenAI(
    model="anthropic/claude-sonnet-4-20250514",  # changed
    openai_api_key=os.environ["AICREDITS_API_KEY"],
    openai_api_base="https://api.aicredits.in/v1",
    temperature=0,
)
Everything else stays the same. The ReAct loop, the tools, the prompt — all identical. This is the value of the OpenAI-compatible interface: your agent architecture is model-agnostic.
Claude Sonnet 4 is noticeably better at multi-step reasoning and at deciding when it has enough information to stop searching. For research-heavy agents, it's worth the roughly 20x price premium over GPT-4o-mini (₹264 vs ₹13.23 per 1M input tokens).
Monitoring costs in the AICredits dashboard
After running these examples, open the AICredits dashboard → Usage Logs. You'll see a per-request breakdown showing:
- Timestamp
- Model used
- Input tokens
- Output tokens
- Cost in INR
This is more useful than it sounds. When an agent makes 5 search iterations instead of 2, you'll see exactly which calls drove the cost. It's how you catch runaway loops before they drain your wallet.
Setting a ₹500 budget cap
For any agent running unattended, set a budget cap on the API key. Dashboard → API Keys → Edit → Budget Limit.
With a ₹500 cap on a key running GPT-4o, you can make roughly:
- ~450 typical agent calls (5 iterations each, at ~1,000 input tokens per iteration)
- Or ~2,500 simple chat completion calls
At ₹500, you're not going to accidentally run up a ₹10,000 bill from a bug. Set the cap before you run anything in production.
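You can sanity-check those numbers yourself. A back-of-envelope helper using the GPT-4o input rate quoted in this tutorial (output tokens are ignored for simplicity, so treat the results as upper bounds):

```python
GPT4O_INPUT_INR_PER_M = 221.00  # ₹ per 1M input tokens, as quoted in this tutorial

def calls_within_budget(budget_inr: float, input_tokens_per_call: int) -> int:
    """How many calls fit in a budget, counting input tokens only."""
    cost_per_call = input_tokens_per_call * GPT4O_INPUT_INR_PER_M / 1_000_000
    return int(budget_inr / cost_per_call)

print(calls_within_budget(500, 5 * 1_000))  # agent calls, 5 iterations each → 452
print(calls_within_budget(500, 900))        # simple chat calls → roughly 2,500
```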
# In your agent setup, add a try/except for budget cap errors
from openai import RateLimitError

try:
    result = agent_executor.invoke({"input": user_question})
except RateLimitError as e:
    # Budget cap hit or rate limited
    print(f"API limit reached: {e}")
    result = {"output": "Service temporarily unavailable. Please try again later."}
What ₹100 gets you
Here's what the minimum top-up (₹100) actually buys across each model:
| Model | INR/1M input tokens | Calls at 500 tokens/call | Total possible input tokens |
|---|---|---|---|
| Gemini 2.0 Flash | ₹8.84 | ~22,600 | 11.3M tokens |
| GPT-4o-mini | ₹13.23 | ~15,100 | 7.6M tokens |
| DeepSeek-R1 | ₹48.59 | ~4,100 | 2.1M tokens |
| Claude 3.5 Haiku | ₹96.30 | ~2,070 | 1.04M tokens |
| o3-mini | ₹96.30 | ~2,070 | 1.04M tokens |
| GPT-4o | ₹221.00 | ~904 | 452K tokens |
| Claude Sonnet 4 | ₹264.00 | ~757 | 379K tokens |
For a prototype or learning project, ₹100 is more than enough. For a production app with real user traffic, estimate based on your expected token volumes and top up accordingly.
The function calling lesson covers the tool use patterns in more depth if you want to extend the agent with more sophisticated tools — database lookups, API calls, custom business logic. The ReAct prompting lesson explains why the reasoning loop works the way it does, which helps when you're debugging agents that get stuck or loop unnecessarily.
One more thing: n8n integration
If you prefer visual workflows over code, AICredits works with n8n. Add an OpenAI node, set the base URL to https://api.aicredits.in/v1, and use your AICredits key as the API key. All the same models are available through the dropdown (they show up because the endpoint is OpenAI-compatible).
This means you can build production AI workflows in INR without writing any API integration code — just connect nodes and use UPI to fund the runs.
Sign up at aicredits.in, top up ₹100, and you'll have the agent above running in under 20 minutes.



