Data Prompts
Analyze CSV Data
Get a Python script to load, explore, and summarize a CSV dataset.
Prompt
Generate a Python script to load and analyze a CSV dataset.

**CSV structure:** [CSV_STRUCTURE]
(Describe the column names and their types, e.g., "customer_id (int), name (string), signup_date (YYYY-MM-DD), total_spent (float), country (string)", and the approximate number of rows.)

**Analysis goal:** [ANALYSIS_GOAL]
(What do you want to find out? e.g., "find outliers in total_spent", "show the distribution of signups by month", "calculate correlations between numeric columns", "identify rows with missing values")

**File path variable:** assume the path to the CSV is stored in a variable called `FILE_PATH` at the top of the script.

Please generate a Python script using pandas that:

1. **Loads the file:** reads the CSV into a DataFrame with appropriate dtype hints based on the column descriptions, parsing date columns as datetimes.
2. **Describes the data:** prints the shape (rows, columns), column dtypes, the count of null values per column, and basic descriptive statistics (mean, min, max, std) for numeric columns.
3. **Performs the requested analysis:** [ANALYSIS_GOAL], with clear print statements labeling each output section.
4. **Prints a summary table:** a final formatted summary of the key findings.

Use clear variable names, add a comment above each logical block, and make the script runnable from the command line with `python script.py`.
How to Use
Describe your CSV columns by name and type in [CSV_STRUCTURE]: copy the header row from your file and annotate each column with its type. In [ANALYSIS_GOAL], state what you want to learn as a plain-English question. The generated script is meant to run as-is with `python script.py`; before running it, set the `FILE_PATH` variable at the top to the path of your CSV file.
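For orientation, here is a minimal sketch of the kind of script the prompt asks for. It assumes the example columns from [CSV_STRUCTURE] and the example goal "find outliers in total_spent". The file name `customers.csv`, the helper function names, and the use of the 1.5 * IQR rule for outliers are hypothetical choices made for illustration. A model's actual output will differ.

```python
"""Illustrative CSV analysis script (hypothetical example, not model output).

Assumes the example schema customer_id, name, signup_date, total_spent, country
and the example goal "find outliers in total_spent".
"""
import os

import pandas as pd

# Path to the CSV file; "customers.csv" is a placeholder, set your own path.
FILE_PATH = "customers.csv"

# Dtype hints taken from the column descriptions (assumed example schema).
DTYPES = {
    "customer_id": "int64",
    "name": "string",
    "total_spent": "float64",
    "country": "category",
}
DATE_COLUMNS = ["signup_date"]


def load_data(path):
    """Load the CSV with explicit dtypes and parse the date columns."""
    return pd.read_csv(path, dtype=DTYPES, parse_dates=DATE_COLUMNS)


def describe_data(df):
    """Print shape, dtypes, null counts, and stats for numeric columns."""
    print("=== Shape (rows, columns) ===")
    print(df.shape)
    print("\n=== Column dtypes ===")
    print(df.dtypes)
    print("\n=== Null values per column ===")
    print(df.isna().sum())
    print("\n=== Numeric statistics ===")
    print(df.select_dtypes("number").agg(["mean", "min", "max", "std"]))


def find_outliers_iqr(df, column):
    """Return rows whose value falls outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # NaN compares as False on both sides, so missing values are never flagged.
    return df[(df[column] < lower) | (df[column] > upper)]


def summarize(df, outliers, column):
    """Print and return a small table of key findings."""
    summary = pd.DataFrame(
        {
            "metric": ["rows", "rows with any null", f"{column} outliers (IQR)"],
            "value": [len(df), int(df.isna().any(axis=1).sum()), len(outliers)],
        }
    )
    print("\n=== Summary ===")
    print(summary.to_string(index=False))
    return summary


def main():
    # Stop with a readable message if FILE_PATH does not point to a real file.
    if not os.path.exists(FILE_PATH):
        print(f"File not found: {FILE_PATH} (set FILE_PATH at the top of the script)")
        return
    df = load_data(FILE_PATH)
    describe_data(df)
    print("\n=== Outliers in total_spent (1.5 * IQR rule) ===")
    outliers = find_outliers_iqr(df, "total_spent")
    print(outliers)
    summarize(df, outliers, "total_spent")


if __name__ == "__main__":
    main()
```

Splitting the steps into small functions is a stylistic choice. It keeps each of the four requested blocks separately readable and testable, and `main()` still runs them in order when the file is invoked from the command line.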
Variables
| Variable | Description |
|---|---|
| [CSV_STRUCTURE] | Column names, data types, and approximate row count of your CSV file |
| [ANALYSIS_GOAL] | What you want to find out — describe the analysis in plain English |
Tips
- If you do not know every column's type, describe what you know and write "types unknown" for the rest. pandas infers those types on load, and the script's dtype report shows what it chose.
- For very large files (over 1M rows), ask the model to add `chunksize` or `dtype` optimizations by including "large file, optimize memory usage" in [ANALYSIS_GOAL].
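A memory-conscious version of the analysis might combine `read_csv`'s `usecols`, `dtype`, and `chunksize` parameters. The sketch below reuses the example columns and computes total spend per country as a stand-in analysis. The file name `customers_large.csv` and the `spend_by_country` helper are hypothetical. The script aggregates each chunk separately and then combines the small partial results, so only one chunk of raw rows is held in memory at a time.

```python
"""Chunked, memory-conscious aggregation of a large CSV (illustrative sketch)."""
import os

import pandas as pd

# Placeholder path; point this at your own large file.
FILE_PATH = "customers_large.csv"

# Read only the needed columns; "category" stores each distinct country once.
USECOLS = ["country", "total_spent"]
DTYPES = {"country": "category", "total_spent": "float64"}


def spend_by_country(path, chunksize=100_000):
    """Return (row count, per-country sum/count/mean of total_spent)."""
    row_count = 0
    partials = []
    # With chunksize set, read_csv returns a reader that yields DataFrames.
    with pd.read_csv(path, usecols=USECOLS, dtype=DTYPES, chunksize=chunksize) as reader:
        for chunk in reader:
            row_count += len(chunk)
            part = chunk.groupby("country", observed=True)["total_spent"].agg(["sum", "count"])
            # Chunks can infer different category sets, so key on plain strings.
            part.index = part.index.astype(str)
            partials.append(part)
    if not partials:
        return row_count, pd.DataFrame(columns=["sum", "count", "mean"])
    # Combine per-chunk partial aggregates into one row per country.
    totals = pd.concat(partials).groupby(level=0).sum()
    totals["mean"] = totals["sum"] / totals["count"]
    return row_count, totals


if __name__ == "__main__":
    if os.path.exists(FILE_PATH):
        rows, by_country = spend_by_country(FILE_PATH)
        print(f"Rows processed: {rows}")
        print(by_country.sort_values("sum", ascending=False))
    else:
        print(f"File not found: {FILE_PATH} (set FILE_PATH first)")
```

The script stores sums and counts rather than means for each chunk, because averages of chunk averages are wrong when chunks contain different numbers of rows for a country.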