
Feature Engineering Ideas

Generate candidate features for a machine learning model given a dataset description and prediction target.

Advanced · Works with any model · Data
Prompt
Generate feature engineering ideas for a machine learning model.

**Prediction target:** [TARGET]
(What are you trying to predict? e.g., "customer churn in the next 30 days", "house sale price", "loan default probability")

**Available raw columns:**
[RAW_COLUMNS]
(List column names and types — e.g., "signup_date (datetime), last_login (datetime), total_orders (int), avg_order_value (float), country (string), plan_tier (categorical: free/pro/enterprise)")

**Model type:** [MODEL_TYPE]
(e.g., "gradient boosting (XGBoost/LightGBM)", "logistic regression", "neural network", "random forest")

For each suggested feature:
1. **Feature name** — snake_case
2. **Construction** — exact formula or pandas/SQL code snippet to create it
3. **Rationale** — why this feature might predict [TARGET]
4. **Potential issue** — leakage risk, cardinality problem, or distribution concern

Organize features into categories:
- **Temporal features** (from datetime columns)
- **Aggregation features** (counts, means, recency, frequency)
- **Interaction features** (products or ratios of existing columns)
- **Encoding features** (how to handle categoricals for [MODEL_TYPE])
- **Domain-specific features** (business logic derived features)

Flag any features at high risk of data leakage.

How to Use

Describe the prediction target precisely in [TARGET] — the more specific (including the time horizon), the better the feature ideas. List all raw columns with their types in [RAW_COLUMNS]. Include [MODEL_TYPE] because the best encoding strategy (one-hot, target encoding, embeddings) differs by model family.
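To make the point about model families concrete, here is a minimal sketch of two encodings for the example `plan_tier` column, using pandas. The toy data and the tier ordering are assumptions for illustration, and the linear-versus-tree split is a common rule of thumb rather than a fixed rule.

```python
import pandas as pd

# Toy data using the example plan_tier column from the prompt.
df = pd.DataFrame({"plan_tier": ["free", "pro", "enterprise", "pro"]})

# Linear models (e.g. logistic regression): one-hot encode so that
# no numeric order between categories is implied.
one_hot = pd.get_dummies(df["plan_tier"], prefix="plan_tier", dtype=int)

# Tree-based models (e.g. gradient boosting): integer codes are often enough,
# because trees split on thresholds instead of fitting one coefficient.
tier_order = ["free", "pro", "enterprise"]  # assumed ordering, lowest to highest
plan_tier_code = pd.Categorical(
    df["plan_tier"], categories=tier_order, ordered=True
).codes

print(one_hot)
print(plan_tier_code)
```

Target encoding and learned embeddings are left out here because both need a training target or a model, and target encoding in particular has to be fitted on training folds only to avoid leakage.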

Variables

| Variable | Description |
| --- | --- |
| [TARGET] | What you are predicting; be specific about the definition and time horizon |
| [RAW_COLUMNS] | All available columns with their data types |
| [MODEL_TYPE] | The type of model, as this affects encoding and interaction feature strategies |
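As a rough illustration of how the three variables slot into the prompt, the sketch below fills an abbreviated copy of the template using plain string replacement. The `fill_prompt` helper and the sample values are hypothetical and exist only for this example.

```python
# Abbreviated copy of the prompt; the full text is in the Prompt section.
PROMPT_TEMPLATE = """Generate feature engineering ideas for a machine learning model.

**Prediction target:** [TARGET]

**Available raw columns:**
[RAW_COLUMNS]

**Model type:** [MODEL_TYPE]
"""

REQUIRED = ("TARGET", "RAW_COLUMNS", "MODEL_TYPE")


def fill_prompt(template: str, values: dict[str, str]) -> str:
    """Replace each [PLACEHOLDER] with its value; raise if one is left unfilled."""
    filled = template
    for name, value in values.items():
        filled = filled.replace(f"[{name}]", value)
    # Any placeholder still present means a variable was not supplied.
    for name in REQUIRED:
        if f"[{name}]" in filled:
            raise ValueError(f"missing value for [{name}]")
    return filled


prompt = fill_prompt(
    PROMPT_TEMPLATE,
    {
        "TARGET": "customer churn in the next 30 days",
        "RAW_COLUMNS": "signup_date (datetime), last_login (datetime), total_orders (int)",
        "MODEL_TYPE": "gradient boosting (XGBoost/LightGBM)",
    },
)
print(prompt)
```

Raising on an unfilled placeholder is a deliberate choice: a prompt sent with a literal `[TARGET]` still produces output, just not useful output.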

Tips

  • After generating ideas, ask: "Which 5 of these features are most likely to have high predictive power and why?" to prioritize implementation.
  • For temporal data, always ask the AI to check for leakage: "Verify that none of these features could contain information from after the prediction cutoff date."
  • If the response includes SQL snippets, run them against your warehouse to confirm that each feature can actually be computed before you implement it in your Python pipeline.
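
The leakage tip can also be checked in code. Below is a minimal pandas sketch, assuming a hypothetical order log and a hypothetical prediction cutoff of 2024-03-01: aggregates are built only from rows strictly before the cutoff, and an assertion confirms that nothing later slipped in. The table, column names, and feature names are all made up for illustration.

```python
import pandas as pd

# Hypothetical event log: one row per order.
orders = pd.DataFrame(
    {
        "customer_id": [1, 1, 1, 2, 2],
        "order_date": pd.to_datetime(
            ["2024-01-05", "2024-02-10", "2024-03-20", "2024-01-15", "2024-03-02"]
        ),
        "order_value": [20.0, 35.0, 50.0, 10.0, 15.0],
    }
)

cutoff = pd.Timestamp("2024-03-01")  # assumed prediction cutoff date

# Keep only rows strictly before the cutoff so later events cannot leak in.
history = orders[orders["order_date"] < cutoff]

# Aggregation features computed from pre-cutoff history only.
features = history.groupby("customer_id").agg(
    orders_before_cutoff=("order_value", "count"),
    avg_order_value_before_cutoff=("order_value", "mean"),
    last_order_date=("order_date", "max"),
)
features["days_since_last_order"] = (cutoff - features["last_order_date"]).dt.days

# Sanity check: every feature row is based on data from before the cutoff.
assert (features["last_order_date"] < cutoff).all()
print(features)
```

Note that a customer with no orders before the cutoff drops out of `features` entirely; in a real pipeline those rows would need to be reindexed and filled explicitly.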