
Analyze CSV Data

Get a Python script to load, explore, and summarize a CSV dataset.

Beginner · Works with any model · Data
Prompt
Generate a Python script to load and analyze a CSV dataset.

**CSV structure:**
[CSV_STRUCTURE]
(Describe the column names and their types — e.g., "customer_id (int), name (string), signup_date (YYYY-MM-DD), total_spent (float), country (string)" — and the approximate number of rows)

**Analysis goal:**
[ANALYSIS_GOAL]
(What do you want to find out? e.g., "find outliers in total_spent", "show the distribution of signups by month", "calculate correlations between numeric columns", "identify rows with missing values")

**File path variable:** assume the CSV's path is stored in a variable called `FILE_PATH` at the top of the script.

Please generate a Python script using pandas that:

1. **Loads the file:** reads the CSV into a DataFrame with appropriate `dtype` hints based on the column descriptions, and parses date columns with `parse_dates`.

2. **Describes the data:** prints the shape (rows, columns), the column dtypes, the count of null values per column, and basic descriptive statistics (mean, min, max, std) for numeric columns.

3. **Performs the requested analysis:** [ANALYSIS_GOAL], with clear print statements labeling each output section.

4. **Prints a summary table:** a final, formatted table of the key findings.

Use clear variable names, add a comment above each logical block, and make the script runnable from the command line with `python script.py`.

How to Use

Describe your CSV columns by name and type in [CSV_STRUCTURE]: copy the header row from your file and annotate each column with its type. Describe what you want to learn in [ANALYSIS_GOAL] as a plain-English question. The generated script is meant to run as-is once you set the `FILE_PATH` variable at the top to your actual file path.
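As a rough illustration, here is a minimal sketch of the kind of script this prompt might produce. It assumes the example columns given in the [CSV_STRUCTURE] hint (customer_id, name, signup_date, total_spent, country) and the example goal "find outliers in total_spent". The placeholder path `customers.csv`, the helper names (`load_data`, `describe_data`, `find_outliers`, `print_summary`, `main`) and the 1.5 * IQR outlier rule are illustrative choices, not the output of any particular model.

```python
import os

import pandas as pd

# Placeholder path (hypothetical); set this to your actual file before running.
FILE_PATH = "customers.csv"

# dtype hints for the example columns; date columns go to parse_dates instead.
DTYPES = {
    "customer_id": "int64",
    "name": "string",
    "total_spent": "float64",
    "country": "category",
}
DATE_COLUMNS = ["signup_date"]


def load_data(path):
    """Step 1: read the CSV into a DataFrame using the dtype hints."""
    return pd.read_csv(path, dtype=DTYPES, parse_dates=DATE_COLUMNS)


def describe_data(df):
    """Step 2: print shape, dtypes, null counts, and numeric statistics."""
    print("=== Shape (rows, columns) ===")
    print(df.shape)
    print("\n=== Column dtypes ===")
    print(df.dtypes)
    print("\n=== Null values per column ===")
    print(df.isna().sum())
    print("\n=== Numeric statistics ===")
    print(df.select_dtypes(include="number").agg(["mean", "min", "max", "std"]))


def find_outliers(df, column="total_spent"):
    """Step 3 (example goal): return rows outside the 1.5 * IQR fences."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    is_outlier = (df[column] < lower) | (df[column] > upper)
    return df[is_outlier], lower, upper


def print_summary(df, outliers, lower, upper):
    """Step 4: print a small table of the key findings."""
    summary = pd.DataFrame(
        {
            "finding": ["rows", "columns", "null cells", "normal range", "outliers"],
            "value": [
                len(df),
                df.shape[1],
                int(df.isna().sum().sum()),
                f"{lower:.2f} to {upper:.2f}",
                len(outliers),
            ],
        }
    )
    print("\n=== Summary ===")
    print(summary.to_string(index=False))


def main(path):
    df = load_data(path)
    describe_data(df)
    print("\n=== Outliers in total_spent (1.5 * IQR rule) ===")
    outliers, lower, upper = find_outliers(df)
    print(outliers if not outliers.empty else "No outliers found.")
    print_summary(df, outliers, lower, upper)


if __name__ == "__main__":
    # Print a reminder instead of a traceback if FILE_PATH was not updated.
    if os.path.exists(FILE_PATH):
        main(FILE_PATH)
    else:
        print(f"File not found: {FILE_PATH!r}. Set FILE_PATH at the top of the script.")
```

The date column is handled through `parse_dates` rather than `dtype`, since `read_csv` expects date columns to be parsed that way. Run the sketch with `python script.py` after editing `FILE_PATH`.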

Variables

| Variable | Description |
|---|---|
| [CSV_STRUCTURE] | Column names, data types, and approximate row count of your CSV file |
| [ANALYSIS_GOAL] | What you want to find out, described in plain English |

Tips

  • If you do not know all column types, describe what you know and write "types unknown" for the rest. pandas infers types on load, and the script's dtype report then shows what it chose.
  • For very large files (over about 1M rows), include "large file, optimize memory usage" in [ANALYSIS_GOAL] so the model adds chunked reading (`chunksize`) or memory-saving dtypes.
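For the large-file tip, here is a minimal sketch of what a memory-conscious version might look like. It assumes the example `country` and `total_spent` columns and a hypothetical analysis (total spend per country); the function name `total_spent_by_country`, the `CHUNK_SIZE` of 100,000 rows and the `float32` choice are illustrative assumptions to tune, not recommendations from the prompt.

```python
import os

import pandas as pd

# Hypothetical settings for illustration; tune CHUNK_SIZE to your memory budget.
FILE_PATH = "customers.csv"
CHUNK_SIZE = 100_000

# Smaller dtypes: float32 uses half the memory of float64 (less precision);
# category stores repeated strings such as country names as integer codes.
DTYPES = {"total_spent": "float32", "country": "category"}


def total_spent_by_country(path, chunksize=CHUNK_SIZE):
    """Sum total_spent per country without loading the whole file at once."""
    partial_sums = []
    # usecols limits parsing to the two columns this analysis needs.
    with pd.read_csv(
        path, usecols=["country", "total_spent"], dtype=DTYPES, chunksize=chunksize
    ) as reader:
        for chunk in reader:
            partial_sums.append(
                chunk.groupby("country", observed=True)["total_spent"].sum()
            )
    if not partial_sums:
        return pd.Series(dtype="float32")
    # A country can appear in several chunks, so combine the partial sums.
    return pd.concat(partial_sums).groupby(level=0, observed=True).sum()


if __name__ == "__main__":
    if os.path.exists(FILE_PATH):
        print(total_spent_by_country(FILE_PATH))
    else:
        print(f"File not found: {FILE_PATH!r}")
```

Each chunk is reduced to a small per-country series before the next one is read, so peak memory is governed by the chunk size rather than the full file.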