Google's Gemini Flash free tier is legitimately useful. You can build real applications on it — chatbots, document processors, RAG systems — without entering a credit card. That's rare in the AI API world, and it matters a lot if you're an Indian developer who doesn't have an international card or doesn't want to risk unexpected charges.
But the free tier has limits, and they'll bite you in production in ways that aren't obvious until they happen. This guide covers exactly what you get, what you can actually build, and when to move on.
## What's actually free in 2026
Google's free tier numbers as of April 2026:
| Model | Requests/minute | Tokens/day | Requests/day |
|---|---|---|---|
| Gemini 2.0 Flash | 15 | 1,000,000 (1M) | 1,500 |
| Gemini 2.0 Flash Lite | 30 | 1,000,000 (1M) | 1,500 |
| Gemini 1.5 Pro | 2 | 50,000 | 50 |
| Gemini 2.0 Pro | Not available free | — | — |
Gemini Flash is the sweet spot. 1M tokens/day is genuinely a lot — a typical chatbot interaction is 1,000-2,000 tokens, which means you can handle 500-1,000 conversations per day for free. The 15 req/min limit is where things get interesting (more on that below).
Gemini 1.5 Pro's free tier is almost useless — 2 requests per minute and 50,000 tokens/day is barely enough for manual testing. Don't plan production usage around it.
What you can realistically build on the free tier:
- Document summarisers for internal use
- Chatbots serving up to ~50 concurrent users (casual usage)
- RAG prototypes over a few hundred documents
- Batch processing pipelines (if you respect rate limits)
- Developer tools used by your small team
What you can't sustain on free:
- Consumer-facing products with any real traction
- Anything requiring guaranteed response times
- Use cases where 1M tokens/day isn't enough (long document workflows, large-scale summarisation)
## Setting up with Google AI Studio
No card needed for the free tier. The setup takes about 5 minutes.
1. Go to aistudio.google.com and sign in with your Google account
2. Click "Get API key" in the left sidebar
3. Create a new project or select an existing one
4. Copy the API key — it starts with `AIza`
That's it. No billing setup, no card, no verification.
Install the SDK:
```bash
pip install google-generativeai
```
Basic test to confirm it works:
```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemini-2.0-flash")
response = model.generate_content("Explain GST input tax credit in 2 sentences")
print(response.text)
```
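Hardcoding the key is fine for a one-off test, but for anything you'll commit or share, read it from an environment variable instead. A minimal sketch (the `GEMINI_API_KEY` variable name is just a convention, not something the SDK requires):

```python
import os

def load_gemini_key() -> str:
    """Read the API key from the environment, failing loudly if it's missing."""
    key = os.environ.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("Set GEMINI_API_KEY first (export GEMINI_API_KEY=AIza...)")
    return key

# Then configure the SDK with it:
# genai.configure(api_key=load_gemini_key())
```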
## Using the OpenAI-compatible endpoint
If you're already using LangChain or OpenAI SDK syntax, Gemini has an OpenAI-compatible endpoint. This makes it easy to swap models without rewriting integration code:
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gemini-2.0-flash",
    openai_api_key="YOUR_GEMINI_API_KEY",
    openai_api_base="https://generativelanguage.googleapis.com/v1beta/openai/",
)

response = llm.invoke("What is the RBI repo rate as of 2026?")
print(response.content)
```
## LangChain setup for Gemini Flash
```python
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    google_api_key="YOUR_API_KEY",
    temperature=0.3,
)

messages = [
    SystemMessage(content="You are a helpful assistant for Indian small business owners."),
    HumanMessage(content="What documents do I need to register for GST?"),
]

response = llm.invoke(messages)
print(response.content)
```
## What to build on the free tier
Here are 5 real project ideas for Indian developers, with estimated token usage so you know if the free tier can handle them.
### 1. WhatsApp business assistant
Use Twilio's WhatsApp API + Gemini Flash to give small businesses an automated WhatsApp bot. A typical conversation is 800-1,500 tokens. At 500 customers/day with 1 conversation each: ~750,000 tokens/day. That fits in the free tier.
Where it breaks: if customers have long conversations (troubleshooting, detailed queries), token usage spikes. Monitor your daily usage.
### 2. Resume screener for recruitment agencies
Many small Indian HR agencies still screen CVs manually. A resume screener using Gemini Flash can process PDFs and score them against a job description. One resume screening = ~3,000-5,000 tokens. Free tier handles ~200 resumes/day — plenty for a small agency.
### 3. Legal document summariser
Indian lawyers deal with court orders, contracts, and regulatory filings that are often 50-100 pages. Gemini's 1M token context window is great for this. One document summary = ~10,000-20,000 tokens. Free tier handles 50-100 documents/day — enough for an internal tool.
### 4. Product description generator for e-commerce
Sellers on Meesho, Flipkart, or their own Shopify store need product descriptions in Hindi and English. Generating 10 descriptions = ~5,000 tokens. Free tier handles 200 batch runs/day. Solid free-tier project.
### 5. Customer support FAQ bot
Build a RAG bot over your FAQ documents. A typical support interaction is 1,000-2,000 tokens. Free tier handles ~500 support queries/day — enough for a small SaaS product in early stage.
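Before committing to any of these, sanity-check the numbers: multiply tokens per task by daily volume and compare against both the 1M-token and 1,500-request daily caps. A rough helper (the default limits are the free-tier numbers from the table above; the per-task estimates are yours to supply):

```python
def fits_free_tier(tokens_per_task: int, tasks_per_day: int,
                   daily_tokens: int = 1_000_000, daily_requests: int = 1_500) -> bool:
    """True if the workload stays under both daily caps (assumes 1 request per task)."""
    return (tokens_per_task * tasks_per_day <= daily_tokens
            and tasks_per_day <= daily_requests)

# Resume screener: ~5,000 tokens each, 150 resumes/day -> 750k tokens, fits
print(fits_free_tier(5_000, 150))   # True
# Legal summariser: ~20,000 tokens each, 80 docs/day -> 1.6M tokens, doesn't fit
print(fits_free_tier(20_000, 80))   # False
```

Note that the request cap can bind before the token cap: 2,000 tiny classifications a day blows the 1,500-request limit even at a few hundred tokens each.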
## When the free tier starts hurting
### Hitting rate limits
15 requests/minute sounds fine until 20 users hit your app at the same time during a demo or launch. At that point you'll start seeing `429 RESOURCE_EXHAUSTED` errors.
The fix is rate limiting and exponential backoff on your side:
```python
import time
import random

from google.api_core.exceptions import ResourceExhausted

def call_with_backoff(model, prompt, max_retries=5):
    for attempt in range(max_retries):
        try:
            return model.generate_content(prompt)
        except ResourceExhausted:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter: 1-2s, 2-3s, 4-5s, ...
            wait = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limit hit. Waiting {wait:.1f}s before retry {attempt + 1}")
            time.sleep(wait)
```
For user-facing apps, add a request queue so users get a "processing" state rather than an error:
```python
import time
import threading
from queue import Queue

request_queue = Queue()
RATE_LIMIT = 14  # stay under 15/min with headroom

def process_queue():
    while True:
        prompt, callback = request_queue.get()
        try:
            # `model` is the GenerativeModel configured earlier
            response = model.generate_content(prompt)
            callback(response.text)
        except Exception as e:
            callback(f"Error: {e}")
        time.sleep(60 / RATE_LIMIT)  # ~4.3s between calls keeps you under the limit

# Start the worker; enqueue work with request_queue.put((prompt, callback))
threading.Thread(target=process_queue, daemon=True).start()
```
### Needing a better model
Gemini Flash is fast and cheap, but it's not the best model for every task. If you're doing:
- Complex reasoning (multi-step problem solving, legal analysis)
- Code generation where correctness matters
- Creative writing where quality matters
...you'll notice the quality gap versus Claude Sonnet or GPT-4o. This is especially true for structured output tasks, where Flash occasionally breaks JSON formatting on complex schemas.
When output quality starts mattering more than cost, that's when you want to add a premium model as a fallback or primary for high-stakes tasks.
### Production reliability
The free tier has no SLA. Google can and does rate-limit aggressively during peak hours, and free tier users are deprioritised. I've seen free tier requests queue for 10-30 seconds during high-traffic periods — unacceptable for a user-facing product.
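If you keep free-tier traffic in a user-facing path anyway, at least fail fast: put a client-side timeout around calls so a request stuck in Google's queue doesn't hang your UI. A sketch using only the standard library (the 10-second cutoff is an arbitrary choice, not an API feature):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)

def call_with_timeout(fn, prompt, timeout_s=10.0):
    """Run fn(prompt) in a worker thread; raise TimeoutError if it takes too long."""
    future = _pool.submit(fn, prompt)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        raise TimeoutError(f"LLM call did not return within {timeout_s}s")
```

One caveat: the underlying request keeps running in its thread after the timeout, so this protects your response time, not your rate-limit budget.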
For anything real users depend on, you need either:
- Paid Gemini API tier (requires international card)
- A multi-model setup where you have a fallback
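The multi-model option can be as simple as a try/except around the primary call. A minimal sketch, where `primary` and `fallback` stand in for whatever client callables you actually use:

```python
def invoke_with_fallback(primary, fallback, prompt: str) -> str:
    """Try the free model first; on any error, fall back to the paid one."""
    try:
        return primary(prompt)
    except Exception as exc:
        print(f"Primary model failed ({exc!r}); falling back")
        return fallback(prompt)
```

If you're on LangChain, its runnables also ship a `with_fallbacks` helper that does the same thing declaratively.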
## The upgrade path for Indian developers
Stay on Google: Upgrading to the paid Gemini API tier requires an international credit/debit card. Rates are reasonable — Gemini Flash is $0.075/1M input tokens ($0.15/1M output), which is roughly ₹6.3/1M input tokens — but the card requirement blocks many Indian developers.
Add Claude as a higher-quality option: If you want Claude Sonnet (significantly better at complex reasoning) and don't have an international card, AICredits.in provides API access to Claude, GPT-4o, and 300+ models with UPI billing. You pay in rupees, top up with PhonePe or Google Pay, no card needed.
Best of both worlds: Use Gemini Flash free for routine tasks (simple Q&A, summarisation of well-structured content, quick classifications), and route complex tasks to Claude Sonnet via AICredits.in. In code:
```python
import os

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_openai import ChatOpenAI

def get_llm(task_type: str):
    if task_type in ["simple_qa", "classification", "summary"]:
        # Free Gemini Flash for routine tasks
        return ChatGoogleGenerativeAI(
            model="gemini-2.0-flash",
            google_api_key=os.environ["GEMINI_API_KEY"],
        )
    else:
        # Claude Sonnet for complex reasoning via AICredits
        return ChatOpenAI(
            model="anthropic/claude-sonnet-4-6",
            openai_api_key=os.environ["AICREDITS_API_KEY"],
            openai_api_base="https://api.aicredits.in/v1",
        )
```
This approach gives you zero cost for ~80% of requests while maintaining quality for the 20% that matter.
💡 Want to add Claude to your stack? AICredits.in has UPI billing — no international card needed. Top up with ₹500 and you have access to Claude, GPT-4o, Gemini Pro, and 300+ models.
## Next steps
Once you've got Gemini Flash running, these are the logical next moves:
- Build a full RAG system: The RAG lesson covers embeddings, vector stores, and retrieval — you can use Gemini's embedding models for free too
- Add LangChain properly: LangChain introduction guide walks through the full chain/agent setup
- Compare models for your use case: DeepSeek vs Claude India 2026 covers the quality/cost tradeoffs relevant to Indian developers
- Understand API gateways: AICredits.in review covers the multi-model access options available in India