Claude's computer use capability — where the model can see your screen, move a cursor, type, and click — is one of those features that sounds like magic until you actually try to use it. Then you run into walls fast: the model clicks the wrong button, gets confused by a dropdown it can't read, or loops endlessly trying to find an element that moved.
The computer use API is genuinely useful, but it requires a different prompting mindset than ordinary chat. This guide covers what actually works.
What computer use is (and what it isn't)
Claude's computer use beta gives the model a computer tool that can take screenshots (to see the current screen state), move the mouse, click, type, and press keys, plus optional text editor and bash tools for editing files and running shell commands. You define a task; Claude takes a screenshot to understand the current state, decides what action to take, acts, screenshots again, and repeats until it's done or stuck.
This is agentic prompting: each step changes the environment, which changes what the model sees next. And unlike a chat reply you can simply regenerate, computer use actions are real. If Claude clicks "Delete" on a file, that file is gone.
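That observe-decide-act loop can be sketched in a few lines. This is a minimal skeleton, not the real API: `take_screenshot`, `decide_action`, and `perform` are stand-ins for the actual API round trips, and the step budget is an assumption you'd tune for your own tasks.

```python
# Minimal sketch of the computer-use agent loop: observe, decide, act,
# repeat until the model reports done or a step budget runs out.
# take_screenshot / decide_action / perform are stand-ins for real API calls.
def run_agent(task, decide_action, take_screenshot, perform, max_steps=20):
    history = []
    for step in range(max_steps):
        screen = take_screenshot()                      # observe current state
        action = decide_action(task, screen, history)   # model picks next action
        if action["type"] == "done":
            return {"status": "done", "steps": step}
        perform(action)                                 # act on the environment
        history.append(action)
    return {"status": "step_budget_exhausted", "steps": max_steps}
```

The step budget matters: without it, a confused agent will loop forever, which is exactly the failure mode discussed under error recovery below.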
The beta API works best for:
- Repetitive desktop tasks that follow consistent UI patterns
- Form-filling workflows across apps that don't have APIs
- Browser automation where the page structure changes and CSS selectors break
- Testing UI flows exactly as a user would see them
It struggles with:
- Captchas and login flows with 2FA
- Rapidly changing UIs (animations, hover states)
- Tasks requiring precise pixel-level targeting
- Any workflow that requires real-time response (it's not fast)
The core prompting pattern
The most reliable pattern is: be specific about the goal, name the application, describe the starting state, and define what "done" looks like.
Vague:
Fill out the expense report.
Better:
Open the Concur expense report app (already open in Chrome at concur.company.com).
I need to submit a new expense report for the March 4 team dinner.
The receipt is at ~/Downloads/dinner-receipt.pdf.
Amount: $342.50, Category: Team Meals, Project code: PROJ-2024-Q1.
The report is done when you see the confirmation screen with a submission number.
Specificity matters because Claude can't pause mid-task to ask clarifying questions the way it can in chat. It has to make decisions on its own, and when it's uncertain it guesses; guesses in UI automation are often wrong.
Grounding with screenshots
Don't assume Claude knows what state the screen is in. Start every session with an explicit screenshot request:
First, take a screenshot so you can see the current state of the screen.
Then proceed to [task].
Better still, describe what Claude should see before it starts:
Chrome is open with Gmail. The inbox is visible. I want you to:
1. Search for emails from sarah@company.com in the last 7 days
2. Star each one that mentions "Q1 budget"
3. Report back how many you starred
When Claude knows what it should be looking at, it can immediately confirm or flag if something is wrong.
Task decomposition for complex workflows
Long workflows fail more than short ones. A task with 15 steps has more ways to go wrong than one with 3 steps. Break complex automation into checkpointed chunks.
Instead of:
Download the sales data from our CRM, clean it in Excel, and upload it to the Google Sheet.
Run three separate calls:
# Call 1
Open Salesforce at [URL]. Export the Q1 opportunities report as CSV to ~/Downloads/.
Done when the file appears in ~/Downloads/.
# Call 2 (verify file exists first)
Open ~/Downloads/opportunities-q1.csv in Excel.
Remove rows where Stage is "Prospecting".
Save as ~/Downloads/opportunities-q1-cleaned.csv.
# Call 3
Open Google Sheets at [URL].
Import ~/Downloads/opportunities-q1-cleaned.csv into the "Q1 Data" sheet, replacing existing content.
Each call is independently verifiable. If step 2 fails, you haven't wasted the work from step 1.
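The checkpointing above can be scripted. In this sketch, `run_checkpointed` is a hypothetical driver, `run_call` is a stand-in for whatever sends a prompt to the model, and "an expected file exists" is the assumed checkpoint between calls:

```python
from pathlib import Path

# Sketch: run each sub-task as its own call, and verify a checkpoint
# (here, that an expected file exists) before moving to the next one.
# run_call is a stand-in for sending the prompt to the model.
def run_checkpointed(steps, run_call):
    for prompt, checkpoint_file in steps:
        run_call(prompt)
        if not Path(checkpoint_file).exists():
            # Stop at the first failed checkpoint instead of compounding errors
            return {"ok": False, "failed_at": prompt}
    return {"ok": True, "failed_at": None}
```

Because each checkpoint is a plain filesystem check, a failed step leaves you with an exact resume point rather than a half-finished, ambiguous state.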
Handling UI ambiguity
Claude sees the screen as an image. It uses visual understanding to identify buttons, fields, and text — but it can get confused by:
Similar-looking elements: "Click the Save button" when there's both a "Save" and "Save as Draft" button. Be explicit: "Click the blue 'Save' button in the top-right corner, not 'Save as Draft'."
Dynamic content: Dropdowns, modals, and tooltips that appear on hover. Tell Claude to hover first and wait: "Hover over the 'Actions' menu in the top toolbar. Wait for the dropdown to appear, then click 'Export'."
Scroll position: Claude might not know content is below the fold. "Scroll down on the left sidebar until you see 'Advanced Settings', then click it."
State changes: After clicking, the UI might take a moment to update. Tell Claude to wait and re-screenshot: "Click 'Generate Report'. Wait for the progress bar to disappear (it may take 10-15 seconds), then screenshot to confirm the report appeared."
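The "wait, then re-screenshot" advice generalizes to a small polling helper. This is a sketch under assumptions: `take_screenshot` stands in for the real screenshot call, and `condition` is whatever check you run against the captured screen (for example, "the progress bar is gone").

```python
import time

# Sketch: poll the screen until a condition holds or a timeout expires,
# mirroring "wait for the progress bar to disappear, then screenshot".
# condition receives the latest screenshot and returns True when satisfied.
def wait_for(condition, take_screenshot, timeout=15.0, interval=0.5):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition(take_screenshot()):
            return True
        time.sleep(interval)
    return False  # caller decides whether a timeout is an error
```

Returning `False` on timeout, rather than raising, keeps the decision about what a timeout means (retry, abort, report) with the caller.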
Error recovery prompting
Agentic tasks fail. Build recovery into your prompts.
Your goal is to export the monthly report from the dashboard.
If you encounter a permissions error, stop and describe exactly what you saw.
If the Export button is grayed out or disabled, take a screenshot and report what you see — do not try to proceed.
If the page takes more than 30 seconds to load, stop and report.
Do not attempt more than 3 times if an action doesn't produce the expected result.
Without explicit stopping conditions, Claude will sometimes retry actions in a loop, clicking the same broken button repeatedly or trying different approaches that all fail for the same underlying reason.
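The "do not attempt more than 3 times" rule can also be enforced outside the prompt, in the harness around the model. A minimal sketch, assuming `attempt` performs the action and `check` verifies the expected result appeared:

```python
# Sketch: cap retries on a single action and surface a structured failure
# instead of looping. attempt() performs the action (e.g. a click);
# check() verifies the expected result appeared on screen.
def act_with_cap(attempt, check, max_attempts=3):
    for n in range(1, max_attempts + 1):
        attempt()
        if check():
            return {"ok": True, "attempts": n}
    return {"ok": False, "attempts": max_attempts,
            "reason": "no expected result after retries"}
```

Belt-and-suspenders works best here: state the cap in the prompt so the model stops gracefully, and enforce it in code so a runaway loop can't burn tokens even if the model ignores the instruction.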
The system prompt for reliability
For production computer use, a strong system prompt matters more than the individual task prompts. Here's a pattern that works:
You are an automation assistant with access to a desktop computer.
Core behaviors:
- Take a screenshot before starting any task to confirm the current screen state
- After each significant action, take a screenshot to verify the result
- If something looks unexpected, stop and describe what you see before continuing
- Never click "Delete", "Remove", "Unsubscribe", or any destructive action unless explicitly instructed
- If you encounter a login screen, stop and report — do not attempt to enter credentials
- Prefer keyboard shortcuts over mouse clicks when both are available
- When typing in a field, click the field first to confirm focus before typing
Uncertainty protocol:
- If you're 90%+ confident in an action, proceed
- If you're less certain, describe what you see and what you're considering doing, then ask
- If you're stuck after 2 attempts at the same step, stop and report what happened
Output format:
- Before each action: briefly describe what you're about to do
- After completing the task: summarize what you did and confirm the end state
Practical example: web form automation
Here's a full example that works reliably for filling a web form:
System: [Use the system prompt above]
User: Fill out the vendor onboarding form at [URL] with this information:
- Company name: Acme Corp
- Contact email: billing@acme.com
- Primary use case: Software licensing
- Monthly volume: $5,000-$10,000
- Tax ID: 12-3456789
Before submitting, take a screenshot of the filled form so I can review it.
Do NOT click Submit until I confirm.
The "show me before submitting" instruction is important. It adds a human checkpoint before irreversible action — a pattern you should use for anything that emails someone, charges money, or deletes data.
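That human checkpoint can live in the harness too, so irreversible actions require an explicit yes even if the prompt instruction gets lost. A minimal sketch; `confirm_gate` and its `ask` parameter are hypothetical names, with `ask` defaulting to console input:

```python
# Sketch: a human-confirmation gate in front of irreversible actions
# (submitting, emailing, charging, deleting). Anything other than an
# explicit "y" is treated as a refusal.
def confirm_gate(description, ask=input):
    answer = ask(f"About to: {description}. Proceed? [y/N] ")
    return answer.strip().lower() == "y"
```

Defaulting to "no" is the important design choice: an empty or mistyped answer blocks the action rather than letting it through.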
Bash vs computer for automation
When available, prefer bash over mouse clicks for tasks that can be done programmatically. Bash is faster, more reliable, and not dependent on UI state.
Computer use is ideal for tasks that genuinely require the visual interface — forms that aren't accessible via API, applications without CLI access, tasks where you need to visually confirm what you're doing.
But if the task can be done with a curl request, a Python script, or a CLI command, prompt Claude to use bash instead. The output is deterministic and doesn't depend on what's rendering on screen.
Option A (computer): Navigate to the downloads page, find the CSV, click download...
Option B (bash): curl -H "Authorization: Bearer $TOKEN" https://api.service.com/export/csv -o data.csv
Option B will succeed or fail clearly. Option A might work, or might click the wrong link, or might get blocked by a cookie banner.
What to build now
Computer use is in beta for a reason — it's capable but not production-ready for high-stakes autonomous workflows. Where it shines today:
- Internal tooling where you control the environment and can test thoroughly
- Research workflows that are tedious but not critical (scraping structured data from pages, filling out forms you'd otherwise do manually)
- Testing your own applications from a user's perspective
For anything customer-facing or involving sensitive data, keep a human in the loop. Use it as a powerful assistant, not a fully autonomous agent.
The capability is evolving fast. The prompting patterns that work now — specific goals, checkpointed tasks, explicit stopping conditions, human confirmation before irreversible actions — will keep working as the model improves.
If you're building agentic workflows, the Agents track covers the underlying patterns. And for other automation approaches that don't require a visual interface, the function calling lesson is worth reading alongside this.