Google's Gemini Flash free tier is legitimately useful. You can build real applications on it — chatbots, document processors, RAG systems — without entering a credit card. That's rare in the AI API world, and it matters a lot if you're an Indian developer who doesn't have an international card or doesn't want to risk unexpected charges.
But the free tier has limits, and they'll bite you in production in ways that aren't obvious until they happen. This guide covers exactly what you get, what you can actually build, and when to move on.
## What's actually free in 2026
Google's free tier numbers as of April 2026:
| Model | Requests/minute | Tokens/day | Requests/day |
|---|---|---|---|
| Gemini 2.0 Flash | 15 | 1,000,000 (1M) | 1,500 |
| Gemini 2.0 Flash Lite | 30 | 1,000,000 (1M) | 1,500 |
| Gemini 1.5 Pro | 2 | 50,000 | 50 |
| Gemini 2.0 Pro | Not available free | — | — |
Gemini Flash is the sweet spot. 1M tokens/day is genuinely a lot — a typical chatbot interaction is 1,000-2,000 tokens, which means you can handle 500-1,000 conversations per day for free. The 15 req/min limit is where things get interesting (more on that below).
Gemini 1.5 Pro's free tier is almost useless — 2 requests per minute and 50,000 tokens/day is barely enough for manual testing. Don't plan production usage around it.
What you can realistically build on the free tier:
- Document summarisers for internal use
- Chatbots serving up to ~50 concurrent users (casual usage)
- RAG prototypes over a few hundred documents
- Batch processing pipelines (if you respect rate limits)
- Developer tools used by your small team
What you can't sustain on free:
- Consumer-facing products with any real traction
- Anything requiring guaranteed response times
- Use cases where 1M tokens/day isn't enough (long document workflows, large-scale summarisation)
## Setting up with Google AI Studio
No card needed for the free tier. The setup takes about 5 minutes.
1. Go to aistudio.google.com and sign in with your Google account
2. Click "Get API key" in the left sidebar
3. Create a new project or select an existing one
4. Copy the API key — it starts with `AIza`
That's it. No billing setup, no card, no verification.
Install the SDK:
```bash
pip install google-generativeai
```
Basic test to confirm it works:
```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemini-2.0-flash")
response = model.generate_content("Explain GST input tax credit in 2 sentences")
print(response.text)
```
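Hardcoding the key is fine for a one-off test, but for anything you'll commit or share, read it from an environment variable instead. A minimal sketch (the `GEMINI_API_KEY` variable name is just a convention, not something the SDK requires):

```python
import os

def load_gemini_key() -> str:
    """Read the API key from the environment, failing loudly if it's missing."""
    key = os.environ.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("Set GEMINI_API_KEY first (export GEMINI_API_KEY=AIza...)")
    return key

# Then configure the SDK with it:
# genai.configure(api_key=load_gemini_key())
```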
## Using the OpenAI-compatible endpoint
If you're already using LangChain or OpenAI SDK syntax, Gemini has an OpenAI-compatible endpoint. This makes it easy to swap models without rewriting integration code:
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gemini-2.0-flash",
    openai_api_key="YOUR_GEMINI_API_KEY",
    openai_api_base="https://generativelanguage.googleapis.com/v1beta/openai/",
)

response = llm.invoke("What is the RBI repo rate as of 2026?")
print(response.content)
```
## LangChain setup for Gemini Flash
```python
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    google_api_key="YOUR_API_KEY",
    temperature=0.3,
)

messages = [
    SystemMessage(content="You are a helpful assistant for Indian small business owners."),
    HumanMessage(content="What documents do I need to register for GST?"),
]

response = llm.invoke(messages)
print(response.content)
```
## What to build on the free tier
Here are 5 real project ideas for Indian developers, with estimated token usage so you know if the free tier can handle them.
### 1. WhatsApp business assistant
Use Twilio's WhatsApp API + Gemini Flash to give small businesses an automated WhatsApp bot. A typical conversation is 800-1,500 tokens. At 500 customers/day with 1 conversation each: ~750,000 tokens/day. That fits in the free tier.
Where it breaks: if customers have long conversations (troubleshooting, detailed queries), token usage spikes. Monitor your daily usage.
### 2. Resume screener for recruitment agencies
Many small Indian HR agencies still screen CVs manually. A resume screener using Gemini Flash can process PDFs and score them against a job description. One resume screening = ~3,000-5,000 tokens. Free tier handles ~200 resumes/day — plenty for a small agency.
### 3. Legal document summariser
Indian lawyers deal with court orders, contracts, and regulatory filings that are often 50-100 pages. Gemini's 1M token context window is great for this. One document summary = ~10,000-20,000 tokens. Free tier handles 50-100 documents/day — enough for an internal tool.
### 4. Product description generator for e-commerce
Sellers on Meesho, Flipkart, or their own Shopify store need product descriptions in Hindi and English. Generating 10 descriptions = ~5,000 tokens. Free tier handles 200 batch runs/day. Solid free-tier project.
### 5. Customer support FAQ bot
Build a RAG bot over your FAQ documents. A typical support interaction is 1,000-2,000 tokens. Free tier handles ~500 support queries/day — enough for a small SaaS product in early stage.
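Before committing to any of these, sanity-check the numbers: multiply tokens per task by daily volume and compare against both the 1M-token and 1,500-request daily caps. A rough helper (the default limits are the free-tier numbers from the table above; the per-task estimates are yours to supply):

```python
def fits_free_tier(tokens_per_task: int, tasks_per_day: int,
                   daily_tokens: int = 1_000_000, daily_requests: int = 1_500) -> bool:
    """True if the workload stays under both daily caps (assumes 1 request per task)."""
    return (tokens_per_task * tasks_per_day <= daily_tokens
            and tasks_per_day <= daily_requests)

# Resume screener: ~5,000 tokens each, 150 resumes/day -> 750k tokens, fits
print(fits_free_tier(5_000, 150))   # True
# Legal summariser: ~20,000 tokens each, 80 docs/day -> 1.6M tokens, doesn't fit
print(fits_free_tier(20_000, 80))   # False
```

Note that the request cap can bind before the token cap: 2,000 tiny classifications a day blows the 1,500-request limit even at a few hundred tokens each.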
## When the free tier starts hurting
### Hitting rate limits
15 requests/minute sounds fine until 20 users hit your app at the same time during a demo or launch. At that point you'll start seeing `429 RESOURCE_EXHAUSTED` errors.
The fix is rate limiting and exponential backoff on your side:
```python
import time
import random

from google.api_core.exceptions import ResourceExhausted

def call_with_backoff(model, prompt, max_retries=5):
    for attempt in range(max_retries):
        try:
            return model.generate_content(prompt)
        except ResourceExhausted:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter: 1-2s, 2-3s, 4-5s, ...
            wait = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limit hit. Waiting {wait:.1f}s before retry {attempt + 1}")
            time.sleep(wait)
```
For user-facing apps, add a request queue so users get a "processing" state rather than an error:
```python
import time
import threading
from queue import Queue

request_queue = Queue()
RATE_LIMIT = 14  # stay under 15/min with headroom

def process_queue():
    while True:
        prompt, callback = request_queue.get()
        try:
            # `model` is the GenerativeModel configured earlier
            response = model.generate_content(prompt)
            callback(response.text)
        except Exception as e:
            callback(f"Error: {e}")
        time.sleep(60 / RATE_LIMIT)  # ~4.3s between calls keeps you under the limit

# Start the worker; enqueue work with request_queue.put((prompt, callback))
threading.Thread(target=process_queue, daemon=True).start()
```
### Needing a better model
Gemini Flash is fast and cheap, but it's not the best model for every task. If you're doing:
- Complex reasoning (multi-step problem solving, legal analysis)
- Code generation where correctness matters
- Creative writing where quality matters
...you'll notice the quality gap versus Claude Sonnet or GPT-4o. This is especially true for structured output tasks, where Flash occasionally breaks JSON formatting on complex schemas.
When output quality starts mattering more than cost, that's when you want to add a premium model as a fallback or primary for high-stakes tasks.
### Production reliability
The free tier has no SLA. Google can and does rate-limit aggressively during peak hours, and free tier users are deprioritised. I've seen free tier requests queue for 10-30 seconds during high-traffic periods — unacceptable for a user-facing product.
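If you keep free-tier traffic in a user-facing path anyway, at least fail fast: put a client-side timeout around calls so a request stuck in Google's queue doesn't hang your UI. A sketch using only the standard library (the 10-second cutoff is an arbitrary choice, not an API feature):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)

def call_with_timeout(fn, prompt, timeout_s=10.0):
    """Run fn(prompt) in a worker thread; raise TimeoutError if it takes too long."""
    future = _pool.submit(fn, prompt)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        raise TimeoutError(f"LLM call did not return within {timeout_s}s")
```

One caveat: the underlying request keeps running in its thread after the timeout, so this protects your response time, not your rate-limit budget.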
For anything real users depend on, you need either:
- Paid Gemini API tier (requires international card)
- A multi-model setup where you have a fallback
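The multi-model option can be as simple as a try/except around the primary call. A minimal sketch, where `primary` and `fallback` stand in for whatever client callables you actually use:

```python
def invoke_with_fallback(primary, fallback, prompt: str) -> str:
    """Try the free model first; on any error, fall back to the paid one."""
    try:
        return primary(prompt)
    except Exception as exc:
        print(f"Primary model failed ({exc!r}); falling back")
        return fallback(prompt)
```

If you're on LangChain, its runnables also ship a `with_fallbacks` helper that does the same thing declaratively.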
## The upgrade path for Indian developers
Stay on Google: Upgrading to the paid Gemini API tier requires an international credit/debit card. Rates are reasonable — Gemini Flash is $0.075/1M input tokens ($0.15/1M output), which is roughly ₹6.3/1M input tokens — but the card requirement blocks many Indian developers.
Add Claude as a higher-quality option: If you want Claude Sonnet (significantly better at complex reasoning) and don't have an international card, AICredits.in provides API access to Claude, GPT-4o, and 300+ models with UPI billing. You pay in rupees, top up with PhonePe or Google Pay, no card needed.
Best of both worlds: Use Gemini Flash free for routine tasks (simple Q&A, summarisation of well-structured content, quick classifications), and route complex tasks to Claude Sonnet via AICredits.in. In code:
```python
import os

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_openai import ChatOpenAI

def get_llm(task_type: str):
    if task_type in ["simple_qa", "classification", "summary"]:
        # Free Gemini Flash for routine tasks
        return ChatGoogleGenerativeAI(
            model="gemini-2.0-flash",
            google_api_key=os.environ["GEMINI_API_KEY"],
        )
    else:
        # Claude Sonnet for complex reasoning via AICredits
        return ChatOpenAI(
            model="anthropic/claude-sonnet-4-6",
            openai_api_key=os.environ["AICREDITS_API_KEY"],
            openai_api_base="https://api.aicredits.in/v1",
        )
```
This approach gives you zero cost for ~80% of requests while maintaining quality for the 20% that matter.
💡 Want to add Claude to your stack? AICredits.in has UPI billing — no international card needed. Top up with ₹500 and you have access to Claude, GPT-4o, Gemini Pro, and 300+ models.
## Next steps
Once you've got Gemini Flash running, these are the logical next moves:
- Build a full RAG system: The RAG lesson covers embeddings, vector stores, and retrieval — you can use Gemini's embedding models for free too
- Add LangChain properly: LangChain introduction guide walks through the full chain/agent setup
- Compare models for your use case: DeepSeek vs Claude India 2026 covers the quality/cost tradeoffs relevant to Indian developers
- Understand API gateways: AICredits.in review covers the multi-model access options available in India