OpenAI's billing page has a special talent for rejecting Indian cards. Domestic debit cards don't work at all. Credit cards work only if your bank has international online transactions explicitly enabled — and even then, some banks block it at the card network level. The workarounds people use: Wise card (requires opening a foreign currency account), a US-based friend's card (doesn't scale), or just giving up.
If you want to use GPT-4o, o3-mini, or DALL·E 3 via the API and you're in India, AICredits.in is the most straightforward path. It's an OpenAI-compatible gateway that routes your API calls and bills you in INR via Razorpay. UPI, net banking, domestic cards — all of them work.
Supported OpenAI models
As of March 2026, here are the OpenAI models available through AICredits:
| Model | Model ID | Input (INR/1M tokens) | Output (INR/1M tokens) |
|---|---|---|---|
| GPT-4o | openai/gpt-4o | ₹221.00 | ₹882.00 |
| GPT-4o-mini | openai/gpt-4o-mini | ₹13.23 | ₹52.91 |
| o3-mini | openai/o3-mini | ₹96.30 | ₹385.18 |
| GPT-4.1 | openai/gpt-4.1 | ₹176.80 | ₹707.20 |
GPT-4o-mini at ₹13.23/1M input tokens is worth calling out. For high-volume tasks — classification, extraction, summarization, routing — it's the right default. ₹13 buys you a million input tokens; at roughly 10,000 tokens per long-form article, that's about 100 articles processed. That's not a rounding error, that's genuinely cheap.
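Before committing to a model, the table converts into a few lines of budgeting arithmetic. A quick sketch, with the prices hardcoded from the table above (check the dashboard for current rates):

```python
# INR per 1M tokens, copied from the pricing table above
PRICES = {
    "openai/gpt-4o":      {"input": 221.00, "output": 882.00},
    "openai/gpt-4o-mini": {"input": 13.23,  "output": 52.91},
    "openai/o3-mini":     {"input": 96.30,  "output": 385.18},
    "openai/gpt-4.1":     {"input": 176.80, "output": 707.20},
}

def cost_inr(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated INR cost for a call or batch on the given model."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# 1,000 summarization calls on GPT-4o-mini, ~2,000 tokens in / ~300 out each
print(round(cost_inr("openai/gpt-4o-mini", 1_000 * 2_000, 1_000 * 300), 2))  # → 42.33
```

Swap the token counts for your own workload to see which model fits your budget before you run it.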
Migration from direct OpenAI: 2 line changes
If you're already using the OpenAI SDK, migration is:
```python
# Before
from openai import OpenAI

client = OpenAI(api_key="sk-your-openai-key")

# After
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-aicredits-key",
    base_url="https://api.aicredits.in/v1"
)
```
That's it. Every call you make with this client now routes through AICredits. The model names get a provider prefix (openai/gpt-4o instead of gpt-4o), but that's the only other change.
Before:
```python
response = client.chat.completions.create(
    model="gpt-4o",
    ...
)
```
After:
```python
response = client.chat.completions.create(
    model="openai/gpt-4o",  # prefix added
    ...
)
```
Chat completions
Full working example with system prompt, multi-turn conversation, and temperature:
```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AICREDITS_API_KEY"],
    base_url="https://api.aicredits.in/v1"
)

def chat(messages: list, model: str = "openai/gpt-4o-mini") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.7,
        max_tokens=1024
    )
    return response.choices[0].message.content

# Multi-turn conversation
history = [
    {"role": "system", "content": "You are a senior backend engineer. Be terse and specific."}
]

history.append({"role": "user", "content": "What's the best way to handle database connection pooling in Python?"})
reply = chat(history)
history.append({"role": "assistant", "content": reply})
print(reply)

history.append({"role": "user", "content": "How does this change if I'm using async SQLAlchemy?"})
reply = chat(history)
print(reply)
```
Streaming
```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AICREDITS_API_KEY"],
    base_url="https://api.aicredits.in/v1"
)

with client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {"role": "user", "content": "Write a FastAPI endpoint that accepts a JSON body, validates it with Pydantic, and writes to PostgreSQL using asyncpg"}
    ],
    stream=True,
    max_tokens=2048
) as stream:
    for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)
    print()
```
The streaming response format is identical to OpenAI's SSE format. Anything that consumes that format — Vercel AI SDK, LangChain streaming callbacks, your own SSE parser — works without modification.
Function calling (tool use)
Function calling works exactly as it does with direct OpenAI:
```python
import json
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AICREDITS_API_KEY"],
    base_url="https://api.aicredits.in/v1"
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Get the current stock price for an Indian company",
            "parameters": {
                "type": "object",
                "properties": {
                    "ticker": {
                        "type": "string",
                        "description": "NSE ticker symbol, e.g. RELIANCE, TCS, INFY"
                    }
                },
                "required": ["ticker"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "What's the current price of TCS stock?"}],
    tools=tools,
    tool_choice="auto"
)

message = response.choices[0].message
if message.tool_calls:
    tool_call = message.tool_calls[0]
    function_name = tool_call.function.name
    arguments = json.loads(tool_call.function.arguments)
    print(f"Model wants to call: {function_name}({arguments})")
    # → Model wants to call: get_stock_price({'ticker': 'TCS'})
```
This is the foundation of agentic prompting — letting the model decide when to call external tools rather than hardcoding the flow. Build this pattern right and you can swap GPT-4o for Claude or Gemini without touching the tool definitions.
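The example above stops at detecting the call. To close the loop, you execute the function yourself and send the result back as a role "tool" message so the model can phrase a final answer. A sketch, where fetch_nse_price is a hypothetical stand-in for your real data source:

```python
import json

def fetch_nse_price(ticker: str) -> float:
    """Hypothetical stand-in: wire up a real market-data source here."""
    return {"TCS": 4123.50, "RELIANCE": 2987.10}.get(ticker, 0.0)

def tool_result_message(tool_call) -> dict:
    """Build the role="tool" message OpenAI expects as the function's reply."""
    args = json.loads(tool_call.function.arguments)
    result = {"ticker": args["ticker"], "price_inr": fetch_nse_price(args["ticker"])}
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result),
    }

# Append the assistant turn (the `message` containing tool_calls) and the tool
# result to the conversation, then call the API once more for the final answer:
#   messages.append(message)
#   messages.append(tool_result_message(tool_call))
#   final = client.chat.completions.create(
#       model="openai/gpt-4o", messages=messages, tools=tools)
#   print(final.choices[0].message.content)
```

The tool_call_id is what ties the result back to the specific call the model requested; omitting it is a common source of 400 errors.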
o3-mini reasoning model
o3-mini is OpenAI's reasoning model — it spends more compute "thinking" before responding, which makes it better at math, logic, and multi-step problems. The API call adds a reasoning_effort parameter:
```python
response = client.chat.completions.create(
    model="openai/o3-mini",
    messages=[
        {
            "role": "user",
            "content": """A startup has ₹50,000 monthly budget. They need:
- 10,000 GPT-4o calls averaging 1,000 input tokens and 500 output tokens each
- 50,000 GPT-4o-mini calls averaging 500 input tokens and 200 output tokens each
Calculate total cost using AICredits pricing and whether they fit in budget."""
        }
    ],
    # reasoning_effort: "low" | "medium" | "high"
    extra_body={"reasoning_effort": "medium"}
)

print(response.choices[0].message.content)
```
At ₹96.30/1M input tokens, o3-mini sits between GPT-4o-mini and GPT-4o on price. For coding problems, SQL queries, and structured reasoning tasks, it's often better value than GPT-4o.
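Incidentally, the budget question in that prompt has a closed-form answer you can check the model against, using the INR prices from the table earlier:

```python
# INR per 1M tokens, from the pricing table above
GPT4O = {"in": 221.00, "out": 882.00}
MINI = {"in": 13.23, "out": 52.91}

def batch_cost(price: dict, calls: int, in_tok: int, out_tok: int) -> float:
    """INR cost for `calls` requests of in_tok input / out_tok output tokens each."""
    return (calls * in_tok * price["in"] + calls * out_tok * price["out"]) / 1_000_000

total = batch_cost(GPT4O, 10_000, 1_000, 500) + batch_cost(MINI, 50_000, 500, 200)
print(f"₹{total:,.2f}")  # → ₹7,479.85, comfortably inside the ₹50,000 budget
```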
Image generation with DALL·E 3
```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AICREDITS_API_KEY"],
    base_url="https://api.aicredits.in/v1"
)

response = client.images.generate(
    model="openai/dall-e-3",
    prompt="A Bangalore tech office at dusk, floor-to-ceiling glass windows, city lights below, warm interior lighting, cinematic photography style",
    size="1024x1024",
    quality="standard",
    n=1
)

image_url = response.data[0].url
print(image_url)
```
Check the AICredits dashboard for DALL·E 3 pricing in INR — image generation is billed per image rather than per token.
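One practical note: the url in the response is a temporary signed link, so download the image right away rather than storing the URL. A minimal sketch using only the standard library:

```python
import urllib.request

def save_image(url: str, path: str) -> None:
    """Download a generated image to disk before the signed URL expires."""
    with urllib.request.urlopen(url) as resp, open(path, "wb") as f:
        f.write(resp.read())

# save_image(image_url, "bangalore-office.png")
```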
Automatic failover
One underappreciated feature: if the underlying OpenAI endpoint is throttling or returning errors, AICredits can route to a backup. This isn't guaranteed instant recovery, but for production workloads running overnight batch jobs or handling user traffic, it reduces the blast radius of an OpenAI outage.
In practice this means your error rate stays lower than if you called OpenAI directly during a partial outage. For teams running customer-facing features on top of GPT-4o, this matters.
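Gateway-side failover still leaves room for transient errors reaching your client, so pair it with client-side retries. The OpenAI SDK retries some failures automatically (the max_retries client option), but a small explicit wrapper gives you control over backoff. A sketch:

```python
import random
import time

def with_backoff(fn, max_attempts: int = 4, base_delay: float = 1.0):
    """Call fn(); on failure, retry with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))

# reply = with_backoff(lambda: client.chat.completions.create(
#     model="openai/gpt-4o", messages=[{"role": "user", "content": "..."}]))
```

In production, narrow the except clause to the SDK's transient errors (e.g. openai.RateLimitError, openai.APITimeoutError) so genuine bugs fail fast instead of retrying.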
Common gotcha: model name format
The single most common error when switching to AICredits:
```
Error: model "gpt-4o" not found
```
Fix: prefix with the provider.
```python
# Wrong
model="gpt-4o"

# Right
model="openai/gpt-4o"
```
Set this in an environment variable or constant so you only have to remember it once:
```python
# config.py
GPT4O = "openai/gpt-4o"
GPT4O_MINI = "openai/gpt-4o-mini"
O3_MINI = "openai/o3-mini"
CLAUDE_HAIKU = "anthropic/claude-3-5-haiku-20241022"
GEMINI_FLASH = "google/gemini-2.0-flash"
```
Then use config.GPT4O_MINI in your calls. When you want to experiment with a different model, you change one line.
Per-key budget caps
Create separate API keys for separate projects or team members. Set a budget cap on each key in the AICredits dashboard. When the cap is hit, that key stops working — no runaway costs from a bug in production.
Recommended structure for a small team:
- Production key: Higher cap (₹5,000/month), tight access control
- Dev key: ₹500 cap, shared with developers
- Experiment key: ₹200 cap, used for testing new models or prompts
The dashboard shows per-key usage breakdowns so you can see exactly what each project is spending.
What you're getting for the 10% markup
The AICredits fee is transparent: a 5% forex buffer plus a 5% platform fee on top of live rates. Compare the alternatives: a Wise card carries exchange-rate spreads of 0.5–1.7% plus card issuance fees and the overhead of a foreign currency account; an international credit card adds a 2–3.5% foreign transaction fee, assuming your bank issues one that OpenAI's billing page accepts at all. On raw percentages the alternatives can come out ahead, but once you factor in the setup hassle and the real chance of your card simply not working, the 10% premium is often worth paying — and UPI/domestic card access is the whole point if you don't have that infrastructure already.
For the actual implementation patterns that consume most of your API budget, prompting for coding covers the prompts that give the highest quality-to-token ratio on code tasks. If you're building agents on top of GPT-4o, the Agents track covers the architecture decisions that matter.



