Data analytics is India's second-largest AI-adjacent role after software engineering. The problem with most AI prompt guides for data analysts: they use generic "sales data" examples with no India-specific context. The result is prompts that technically work but don't match the actual situations Indian analysts face — BSE/NSE stock data structured in specific ways, RBI quarterly reports with their own quirks, FMCG Nielsen data with retailer-level granularity, e-commerce metrics from Flipkart and Myntra that behave very differently from Amazon US equivalents.
These 35 prompts are built around those real situations. Run them through Claude, GPT-4o, or Gemini — the templates work across models.
Exploratory Data Analysis (EDA) prompts
1. EDA starting point
Given this output from df.info() and df.describe(), generate an EDA plan. Specifically: what to check first, which distributions are worth examining and why, which relationships between variables are worth exploring, and what red flags to investigate immediately.
df.info() output: [OUTPUT]
df.describe() output: [OUTPUT]
Dataset context: [WHAT THIS DATA IS — e.g., "Monthly transactions from a Tier 2 Indian FMCG distributor, 2021-2025"]
This beats asking "what should I do with this dataset?" because it forces the model to work from actual data characteristics rather than generating a generic EDA checklist.
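To produce the two [OUTPUT] blocks as pasteable text, you can capture them into strings first; a minimal sketch, assuming your data loads from a CSV (the filename is a placeholder):

```python
import io

import pandas as pd

df = pd.read_csv("transactions.csv")  # placeholder: load your actual data

# df.info() prints to stdout by default; redirect it into a buffer instead
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()

# df.describe() returns a dataframe; to_string() preserves column alignment
describe_text = df.describe(include="all").to_string()

print(info_text)
print(describe_text)
```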
2. Outlier investigation
For each of these outlier rows from my dataset, determine whether it's likely: (a) a data entry error, (b) a genuine edge case, or (c) a system/pipeline bug. Explain your reasoning.
Consider that in Indian e-commerce data: COD orders in tier-3 cities often have unusual patterns, orders during festive sales can have atypical values, and return-to-origin (RTO) rates vary significantly by PIN code cluster.
Outlier rows:
[DATA]
Context about the data pipeline: [DESCRIPTION]
3. Correlation explainer for business stakeholders
Translate this correlation matrix into plain English. Avoid all technical jargon. Focus on which relationships are actionable — that is, which correlations suggest a lever a business team could pull. For each notable relationship: what it means, why it might exist, and what question it raises for the business.
[CORRELATION MATRIX]
Business context: [WHAT THE BUSINESS DOES, WHO WILL READ THIS]
4. Distribution analysis
Analyze this distribution for [VARIABLE NAME]. Tell me: what does its shape suggest about the data generation process, what transformations should I consider if I'm using this in a model, are there signs of data quality issues (truncation, heaping at round numbers, bimodal patterns that suggest mixed populations), and what does this tell me about the underlying business reality?
Distribution description / histogram values: [DESCRIBE OR PASTE DATA]
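If you have the raw column rather than a ready histogram, np.histogram produces pasteable bin counts; a sketch assuming a numeric column named order_value (file and column names are placeholders):

```python
import numpy as np
import pandas as pd

df = pd.read_csv("orders.csv")  # placeholder file and column names

# 20 bins of counts is compact enough to paste straight into the prompt
counts, edges = np.histogram(df["order_value"].dropna(), bins=20)
for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:>10,.0f} to {hi:>10,.0f}: {n}")
```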
5. Missing data strategy
My dataset has these missing value patterns. Recommend imputation strategies with reasoning for each column type.
Note: in Indian datasets, watch for these specific missingness causes — phone numbers entered with/without country codes causing pattern-based missingness, PIN codes left blank for digital-only customers, GST numbers absent for consumers vs businesses, gender fields often missing or non-binary.
Missing value summary:
[COLUMN, % MISSING, DTYPE, LIKELY USE IN ANALYSIS]
Dataset context: [WHAT THIS WILL BE USED FOR]
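The missing value summary itself is a few lines of pandas; a minimal sketch (input filename is a placeholder):

```python
import pandas as pd

df = pd.read_csv("customers.csv")  # placeholder input

# One row per column: % missing plus dtype, worst columns first
summary = pd.DataFrame({
    "pct_missing": df.isna().mean().mul(100).round(1),
    "dtype": df.dtypes.astype(str),
}).sort_values("pct_missing", ascending=False)

print(summary.to_string())
```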
6. Feature engineering for Indian e-commerce returns
Suggest 10 engineered features that might predict return rates for a fashion D2C brand selling in India. For each feature: the formula or derivation, why it might be predictive, and what data is required to compute it.
Consider: festive season effects (Diwali, Navratri, End of Reason Sale), COD vs prepaid order patterns (COD has significantly higher return rates), tier classification of PIN codes, GST category of products, size/fit information availability, and influencer traffic sources.
Current available columns: [LIST]
7. Customer segment discovery
What customer segments appear to exist in this behavioral data from an Indian app? How would I statistically confirm each segment's existence? What India-specific segments should I look for — for example: feature phone users with low-data browsing patterns, EMI buyers who consistently choose 3/6-month options, voice-first users, tier-2/3 users with distinctive timing patterns.
Data description: [BEHAVIORAL METRICS AVAILABLE]
Sample statistics:
[SUMMARY STATS]
8. Indian retail seasonality analysis
Identify and interpret seasonal patterns in this monthly data. Specifically flag: Diwali and Dhanteras effects (October/November), Navratri patterns, Eid purchasing patterns in relevant categories, end-of-financial-year effect (March), monsoon impact (June-September) by category, and Big Billion Day / Great Indian Festival timing.
Monthly data:
[DATA: MONTH, VALUE]
Category: [PRODUCT CATEGORY]
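You can also decompose the series before prompting and paste the seasonal factors alongside the raw data; a sketch using statsmodels, assuming monthly data in a value column (names are placeholders):

```python
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Placeholder: monthly series indexed by month
df = pd.read_csv("monthly_sales.csv", parse_dates=["month"], index_col="month")

# period=12 for monthly data; multiplicative suits series where festive
# spikes (Diwali, Big Billion Day) scale with the overall level
result = seasonal_decompose(df["value"], model="multiplicative", period=12)

# One seasonal factor per calendar month; values well above 1 mark peak months
print(result.seasonal.groupby(result.seasonal.index.month).first().round(2))
```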
SQL and data extraction prompts
9. SQL from plain English (Indian business context)
Write a PostgreSQL query for this request: "Give me all customers who placed more than 3 orders during Diwali week (Oct 20 - Nov 3) in any of the last 3 years, where order value was above ₹2,000, and they haven't returned any item. Group by home state and rank by order count descending."
Table schemas:
[CREATE TABLE STATEMENTS]
Replace the English request with your actual requirement. Being specific about Indian business terminology (Diwali week, COD, RTO, etc.) gets significantly better query output than generic phrasing.
10. Query optimiser
This PostgreSQL query takes [X] seconds on our [Y] row orders table. Review and suggest optimisations. For each suggestion: what's wrong, the specific fix (index DDL, query rewrite, or config change), and the estimated improvement.
Current query:
[QUERY]
EXPLAIN ANALYZE output:
[EXPLAIN OUTPUT]
Table stats (row counts, current indexes):
[SCHEMA + \di OUTPUT]
11. Window function explainer
Explain step by step what this SQL window function is doing. Use a small example — 5-6 rows of sample data — to illustrate each step of the calculation. Then tell me: what are the common mistakes people make with this pattern, and what should I test to verify it's correct?
[THE WINDOW FUNCTION QUERY]
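A useful companion check is to replicate the window logic yourself on the same 5-6 rows in pandas and compare; for example, for the common ROW_NUMBER()-per-partition pattern (the sample data below is invented):

```python
import pandas as pd

# Six sample rows for the classic "latest order per customer" pattern
orders = pd.DataFrame({
    "customer": ["A", "A", "A", "B", "B", "C"],
    "order_date": pd.to_datetime([
        "2025-01-05", "2025-02-10", "2025-03-01",
        "2025-01-20", "2025-02-15", "2025-03-12",
    ]),
    "amount": [500, 1200, 800, 300, 950, 2000],
})

# Equivalent of ROW_NUMBER() OVER (PARTITION BY customer ORDER BY order_date DESC)
orders["rn"] = (orders.sort_values("order_date", ascending=False)
                      .groupby("customer").cumcount() + 1)

print(orders[orders["rn"] == 1])  # most recent order per customer
```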
12. Data quality check generator
Generate PostgreSQL queries to check these data quality dimensions for my table: (1) null rates by column, (2) format validity — PAN format (5 uppercase letters + 4 digits + 1 uppercase letter), 15-character GSTIN format, Indian mobile number format with/without +91, 6-digit PIN code, (3) referential integrity against the tables I specify, (4) duplicate detection by business key, (5) value range validation for numeric columns.
Table:
[CREATE TABLE]
Business key: [COLUMN(S)]
Reference tables: [IF ANY]
13. Schema documenter
Given this CREATE TABLE DDL, generate a human-readable data dictionary. For each column: column name, data type, business meaning (inferred from the name + context), valid values or constraints, and an example value.
[CREATE TABLE STATEMENT]
System context: [WHAT THIS TABLE IS FROM — e.g., "Razorpay webhook data for a B2C e-commerce platform"]
14. ETL pipeline explainer
Explain what this stored procedure / dbt model / transformation script does in plain English. What business process does it implement? What are the assumptions it makes about the data? What could go wrong in production?
[PROCEDURE / QUERY]
15. Index strategy recommender
Given this slow query and table structure, recommend exactly which indexes to create. For each index: the CREATE INDEX statement, why it helps this query specifically, and any trade-offs (write overhead, storage).
Slow query:
[QUERY]
Table structure:
[SHOW CREATE TABLE]
Query frequency: [TIMES/DAY] | Table write rate: [INSERTS/DAY]
16. NSE/BSE stock data query
Write a PostgreSQL query to calculate these technical indicators for all Nifty 500 stocks from a daily OHLCV table: (1) 52-week high and low, (2) 200-day simple moving average, (3) 14-day RSI using Wilder's smoothing method.
Return one row per stock with the most recent values. Handle stocks that don't have 200+ days of history gracefully (return NULL rather than error).
Table schema:
CREATE TABLE daily_prices (
    symbol     VARCHAR(20),
    trade_date DATE,
    open       DECIMAL(10,2),
    high       DECIMAL(10,2),
    low        DECIMAL(10,2),
    close      DECIMAL(10,2),
    volume     BIGINT
);
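Whatever SQL comes back, it's worth recomputing two of the indicators in pandas on a sample of symbols and diffing the results; a sketch of that cross-check (not the query itself; the CSV extract is assumed):

```python
import pandas as pd

# Placeholder extract of daily_prices, one row per symbol per trading day
df = pd.read_csv("daily_prices.csv", parse_dates=["trade_date"])
df = df.sort_values(["symbol", "trade_date"])

# 52 weeks is roughly 252 trading days; min_periods=1 uses whatever history exists
df["high_52w"] = df.groupby("symbol")["high"].transform(
    lambda s: s.rolling(252, min_periods=1).max())

# Strict 200-day SMA: stays NaN (SQL NULL) until 200 observations exist
df["sma_200"] = df.groupby("symbol")["close"].transform(
    lambda s: s.rolling(200).mean())

latest = df.groupby("symbol").tail(1)
print(latest[["symbol", "trade_date", "high_52w", "sma_200"]].head())
```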
Data storytelling and communication prompts
17. Insight narrative
Convert these 5 raw analysis findings into a compelling executive summary. Lead with the most business-important insight, not the most statistically interesting one. Use concrete numbers throughout. For each finding: a clear one-sentence insight, the supporting evidence, and one recommended action.
Raw findings: [LIST YOUR FINDINGS]
Audience: [WHO THIS IS FOR — e.g., "Growth VP at a B2C fintech, operational focus"]
18. Dashboard KPI selection
Help me choose the 5 most important metrics for a dashboard for a [Tier 2 India D2C brand / regional NBFC / EdTech startup targeting Bharat]. For each metric: what it is, how it's calculated, what action it should trigger when it moves, and what's a reasonable threshold to alert on.
Business model: [DESCRIPTION] Available data: [WHAT YOU HAVE]
The "what action it should trigger" framing prevents dashboards from becoming passive displays that no one acts on.
19. CFO presentation structure
Structure these analysis findings into a 10-slide deck for a CFO. For each slide: title, key message (one sentence), supporting data to include, and a call to action if applicable. Slides must include: the question being answered, methodology (simplified), top 3 findings, financial implications, and recommended next steps.
Analysis findings: [BULLET POINTS OF YOUR FINDINGS]
20. Negative finding communicator
I found that [PRODUCT FEATURE / CAMPAIGN / PARTNERSHIP] didn't work as expected. I need to communicate this constructively to a stakeholder who championed the initiative. Draft the message — email or Slack — that: states the finding clearly without burying it, provides context that's fair (not excuse-making), suggests what to learn from this, and proposes a next step.
The finding: [WHAT YOU FOUND]
The stakeholder's role: [DESCRIPTION]
Their emotional investment: [HOW MUCH THEY CARE]
21. A/B test result explainer
Explain these A/B test results to a non-technical product manager. Specifically: what the numbers mean in plain English, whether the result is statistically significant and what that actually means, what the practical effect size means for the business, and what we should do next.
Results:
- Control: [N users, X% conversion]
- Treatment: [N users, X% conversion]
- P-value: [VALUE]
- Test duration: [DAYS]
- Primary metric: [METRIC NAME AND BUSINESS MEANING]
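If you want to verify the p-value before asking for the plain-English version, a two-proportion z-test is a few lines with statsmodels (the counts below are invented):

```python
from statsmodels.stats.proportion import proportions_ztest

# Invented example: [control, treatment] conversions and user counts
conversions = [480, 552]
users = [10_000, 10_000]

z_stat, p_value = proportions_ztest(count=conversions, nobs=users)
print(f"control {conversions[0] / users[0]:.2%} vs treatment {conversions[1] / users[1]:.2%}")
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```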
22. Metric definition writer
Write clear business definitions for these metrics. For each: precise calculation formula (what's in the numerator and denominator, what's excluded and why), the edge cases that people argue about, how it should be interpreted (what's good, what's bad, what drives it), and what it should NOT be confused with.
Metrics to define: [LIST — e.g., "D30 retention", "NMV", "Take rate", "CAC:LTV ratio"]
Business context: [COMPANY TYPE]
23. Hypothesis generator
Generate 10 hypotheses for why [METRIC] dropped [X%] in [PERIOD]. For each hypothesis: plausibility rating (high/medium/low) given what's typical in Indian [market/industry], what data would confirm or deny it, and a rough effort estimate to investigate.
Rank them by: most likely × fastest to verify.
Context:
- Metric: [NAME AND DEFINITION]
- Drop: [MAGNITUDE, DURATION]
- What changed recently: [CODE DEPLOYS, CAMPAIGNS, PRICING, EXTERNAL EVENTS]
- What didn't change: [RELEVANT STABLE FACTORS]
24. Recommendation writer
Convert these data findings into 3 concrete business recommendations. For each recommendation: what to do (specific, actionable), the data evidence behind it, expected impact (quantified where possible), risks and what could go wrong, and who needs to approve or act.
Findings: [YOUR ANALYSIS OUTPUT]
Business constraints: [BUDGET, TIMELINE, TEAM CAPACITY]
25. Cohort retention explainer
Explain what this cohort retention chart tells us about our users. What's the shape of the curve? Is it healthy? What does the Week 1 drop-off rate tell us compared to Week 4? Are there any anomalies in specific cohorts? What would a healthy retention curve look like for a [PRODUCT TYPE] in India?
Cohort table:
[PASTE COHORT DATA — Week 0, Week 1, Week 2...]
Product type: [APP TYPE — e.g., "B2C fintech app targeting tier 2 India"]
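If you only have raw event logs rather than a ready cohort table, a sketch of building one first (user_id, signup_date, and event_date are assumed column names):

```python
import pandas as pd

events = pd.read_csv("events.csv", parse_dates=["signup_date", "event_date"])

# Cohort = signup week; age = whole weeks elapsed since signup
events["cohort"] = events["signup_date"].dt.to_period("W")
events["week"] = (events["event_date"] - events["signup_date"]).dt.days // 7

# Unique active users per cohort-week, expressed as % of Week 0 size
active = (events.groupby(["cohort", "week"])["user_id"]
                .nunique().unstack(fill_value=0))
retention = active.div(active[0], axis=0).mul(100).round(1)

print(retention.to_string())
```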
26. NASSCOM/Nielsen benchmark framer
We have these internal metrics that compare unfavorably/favorably to industry benchmarks. Help me frame this comparison fairly in a stakeholder presentation — when our numbers look worse than benchmarks but there's a legitimate reason, and where we genuinely underperform and should say so.
Our metrics: [VALUES]
Industry benchmarks (source): [BENCHMARKS — NASSCOM report / Nielsen data / competitor disclosures]
Relevant differences in our business model or customer mix: [DESCRIPTION]
Python and pandas prompts
27. Pandas code generator
Write pandas code to [SPECIFIC TRANSFORMATION]. Optimise for readability first, then performance. Include comments explaining non-obvious steps. Add a brief validation at the end to confirm the output looks correct.
Input dataframe structure:
# [describe columns, dtypes, shape]
Desired output: [DESCRIBE]
Example: "Pivot this long transaction table to wide format where each row is a customer and columns are monthly spend by category, filling missing month-category combinations with 0"
28. Pandas performance optimiser
This pandas operation on [X] rows takes [Y] seconds. Suggest faster alternatives. For each suggestion: the code, why it's faster, and whether it changes the output in any edge cases.
Current code:
[CODE]
Priority: [SPEED vs MEMORY — e.g., "speed, we have 32GB RAM available"]
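To set expectations for what the model should return, the most common class of fix looks like this: replacing a row-wise apply with one vectorized expression (columns are invented):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "mrp": np.random.randint(100, 5_000, 1_000_000),
    "discount_pct": np.random.randint(0, 50, 1_000_000),
})

# Slow: a Python function call per row
# df["price"] = df.apply(lambda r: r["mrp"] * (1 - r["discount_pct"] / 100), axis=1)

# Fast: one vectorized expression over whole columns; identical output
df["price"] = df["mrp"] * (1 - df["discount_pct"] / 100)
```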
29. Indian data regex patterns
Write Python regex patterns to extract and validate these data types from free text or user input:
- PAN numbers (format: ABCDE1234F — 5 uppercase letters, 4 digits, 1 uppercase letter)
- GST numbers (15-character GSTIN format)
- Indian mobile numbers (with/without +91 or 0 prefix, with/without spaces or dashes)
- 6-digit PIN codes (validate against range 110001-855117)
- IFSC codes (4 letters + 0 + 6 alphanumeric)
For each: the regex pattern, a validation function, and test cases covering valid + common invalid formats.
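As a reference point for checking whatever the model produces, a hand-written sketch of the first pattern (PAN); the other four follow the same extract/validate/test shape:

```python
import re

# PAN: 5 uppercase letters + 4 digits + 1 uppercase letter, e.g. ABCDE1234F
PAN_RE = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")

def extract_pans(text: str) -> list[str]:
    """Find all PAN-shaped tokens in free text."""
    return PAN_RE.findall(text)

def is_valid_pan(value: str) -> bool:
    """True only if the entire string is exactly one PAN."""
    return re.fullmatch(r"[A-Z]{5}[0-9]{4}[A-Z]", value) is not None

assert is_valid_pan("ABCDE1234F")
assert not is_valid_pan("abcde1234f")  # lowercase, invalid
assert not is_valid_pan("ABCDE12345")  # ends in a digit, invalid
assert extract_pans("PAN: ABCDE1234F, GSTIN pending") == ["ABCDE1234F"]
```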
30. Indian fiscal year handling
Write Python functions to handle Indian fiscal year (April-March) calculations:
- get_indian_fy(date) — returns FY string, e.g. "FY2025-26"
- get_fy_quarter(date) — returns Q1/Q2/Q3/Q4 of the Indian FY (Q1 = Apr-Jun)
- get_advance_tax_dates(fy) — returns the 4 advance tax due dates for a given FY
- days_remaining_in_fy(date) — days until March 31 of the current FY
- fy_date_range(fy_string) — returns (start_date, end_date) for a given FY string
Handle edge cases: dates exactly on April 1 and March 31, FY strings in both "FY2025-26" and "2025-26" formats.
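A hand-rolled sketch of the first two functions, useful for spot-checking the model's version against the April 1 / March 31 edge cases:

```python
from datetime import date

def get_indian_fy(d: date) -> str:
    """Indian FY runs April 1 to March 31, so Jan-Mar belong to the prior FY."""
    start = d.year if d.month >= 4 else d.year - 1
    return f"FY{start}-{(start + 1) % 100:02d}"

def get_fy_quarter(d: date) -> str:
    """Q1 = Apr-Jun, Q2 = Jul-Sep, Q3 = Oct-Dec, Q4 = Jan-Mar."""
    return f"Q{(d.month - 4) % 12 // 3 + 1}"

assert get_indian_fy(date(2025, 4, 1)) == "FY2025-26"   # first day of the FY
assert get_indian_fy(date(2026, 3, 31)) == "FY2025-26"  # last day of the FY
assert get_fy_quarter(date(2025, 6, 30)) == "Q1"
assert get_fy_quarter(date(2026, 1, 5)) == "Q4"
```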
31. Indian number formatter for charts
Write a matplotlib formatter and a standalone Python function that display numbers in Indian format:
- Values ≥ 1 crore: display as "X.XX Cr"
- Values ≥ 1 lakh: display as "X.XX L"
- Values < 1 lakh: display as "XX,XXX" (Indian comma format)
Include:
- A standalone format_indian(n, decimals=2) function
- A matplotlib.ticker.FuncFormatter-compatible function for axis labels
- A usage example applying it to a bar chart Y-axis
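For comparison, a minimal hand-written version of the standalone function; the matplotlib hookup is then just matplotlib.ticker.FuncFormatter(lambda x, pos: format_indian(x)):

```python
def format_indian(n: float, decimals: int = 2) -> str:
    """Crores / lakhs above the thresholds, Indian comma grouping below."""
    sign = "-" if n < 0 else ""
    n = abs(n)
    if n >= 1e7:   # 1 crore = 1,00,00,000
        return f"{sign}{n / 1e7:.{decimals}f} Cr"
    if n >= 1e5:   # 1 lakh = 1,00,000
        return f"{sign}{n / 1e5:.{decimals}f} L"
    # Indian grouping: last 3 digits, then groups of 2
    s = str(int(round(n)))
    if len(s) > 3:
        head, tail = s[:-3], s[-3:]
        groups = []
        while len(head) > 2:
            groups.insert(0, head[-2:])
            head = head[:-2]
        s = ",".join([head] + groups + [tail])
    return sign + s

assert format_indian(2_35_00_000) == "2.35 Cr"
assert format_indian(1_50_000) == "1.50 L"
assert format_indian(99_999) == "99,999"
```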
32. Streamlit dashboard from analysis
Given this analysis (describe the findings and the data shape), generate a Streamlit app with: a sidebar with filters relevant to the analysis, main KPI cards showing the top-level numbers, at least one time-series or comparison chart using plotly, and a data table with download-to-CSV button.
Analysis description: [WHAT THE ANALYSIS SHOWS]
Data shape: [COLUMNS, TYPES, SIZE]
Filters that make sense: [DATE RANGE, CATEGORY, REGION, ETC.]
33. Analytics unit test writer
Write pytest tests for this data transformation function. Cover: the happy path, empty dataframe input, missing required columns, wrong data types for key columns, and relevant boundary values for the business logic.
For each test: a clear test name that describes what scenario is being tested, the arrange/act/assert structure, and a brief comment explaining why this case matters.
Function:
[FUNCTION CODE]
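To anchor the structure the prompt asks for, a sketch of two such tests against a made-up add_fy_column transformation (the function and its columns are hypothetical):

```python
import pandas as pd
import pytest

def add_fy_column(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical function under test: derive Indian FY from order_date."""
    if "order_date" not in df.columns:
        raise KeyError("order_date column is required")
    out = df.copy()
    fy_start = out["order_date"].dt.year.where(
        out["order_date"].dt.month >= 4, out["order_date"].dt.year - 1)
    out["fy"] = "FY" + fy_start.astype(str)
    return out

def test_march_date_falls_in_previous_fy():
    # Arrange: a March order belongs to the FY that started the previous April
    df = pd.DataFrame({"order_date": pd.to_datetime(["2026-03-31"])})
    # Act
    result = add_fy_column(df)
    # Assert
    assert result.loc[0, "fy"] == "FY2025"

def test_missing_required_column_raises():
    # A loud error beats silently producing wrong numbers downstream
    with pytest.raises(KeyError):
        add_fy_column(pd.DataFrame({"amount": [100]}))
```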
34. Slow pandas profiler
Help me identify the bottleneck in this pandas pipeline. Suggest where to add profiling instrumentation — line_profiler, memory_profiler, or a simpler %%time / %%timeit approach. Also: based on the code, where do you suspect the bottleneck is and why?
[PIPELINE CODE]
Observed behavior: [WHAT'S SLOW — "the whole thing" / "it's fine until X step" / etc.]
35. Jupyter notebook documenter
Add markdown cells to this analysis notebook. For each code section: a markdown cell above explaining the business question that section answers, the methodology choice and why (briefly), and what the output means. For any charts or tables: a markdown cell below interpreting the result, not just describing it.
Notebook cells:
[PASTE NOTEBOOK CONTENT OR DESCRIBE CELL BY CELL]
Target reader: [WHO WILL READ THIS — data team peer / stakeholder / future self]
💡 Want to go deeper? For running these prompts via Claude API with UPI billing in India — ₹100 minimum, no international card needed — visit AICredits.in.
Next steps
These 35 prompts cover the core data analyst workflow. The next layer is building automated pipelines that run these analyses on a schedule:
- How RAG works — for querying your own datasets with natural language
- Prompting for data analysis — general data analysis techniques
- AI research workflows — systematic approaches to data-driven research
- More profession-specific prompt guides
Try it now with AICredits.in
Access Claude, GPT-4o, Gemini, and 300+ models with UPI payment in ₹. No international card needed. Create free account →



