What is prompt engineering?

Prompt engineering is the practice of crafting inputs to AI language models to produce accurate, useful, and reliable outputs. It involves choosing the right words, structure, context, and format to guide the AI toward the response you actually need — rather than a generic or off-target one.

Which AI models benefit most from better prompting?

All major large language models — including ChatGPT (GPT-4o), Claude, and Gemini — respond significantly to prompt quality. The same task can produce dramatically different results depending on how you structure your request. Better prompting improves output across every major model.

Do I need technical skills to do prompt engineering?

No. Prompt engineering is done in natural language — you write text instructions, not code. Basic prompting needs no technical background at all. Advanced techniques like prompt chaining or agentic workflows can benefit from light scripting knowledge, but the core skill is clear written communication.

Where can I learn more about prompt engineering?

MasterPrompting.net offers a structured curriculum from beginner to advanced, covering every major technique from basic clarity and context to chain-of-thought, meta-prompting, and agentic workflows. Start with the Beginner track to build a solid foundation.

Claude computer use in 2026 — practical workflows you can actually build today

Computer use has improved a lot since the 2024 launch. The OCR is better, screenshot analysis is faster, and element targeting is more reliable. It's still not the right tool for most tasks — if there's an API, use the API. But for specific use cases, it's surprisingly capable.

This post covers five workflows that actually work reliably in 2026, along with honest numbers on cost, reliability, and when to pick Stagehand or the direct API instead.

What computer use is (and isn't)

The architecture: a Docker container runs a virtual desktop → your agent loop takes a screenshot and sends it to the API → Claude replies with mouse/keyboard actions → the loop executes them on the screen and sends back a fresh screenshot.

This means:

High latency: 2–5 seconds per action (screenshot upload + API call + action execution)
High cost: ~$0.10–0.30 per screenshot analysis. A 20-step workflow costs $2–6.
No integration needed: Claude sees what a human sees and interacts the same way, so the target system needs no API of its own
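The per-action figures above make it easy to budget a workflow before building it. A minimal sketch of that arithmetic, using the post's rough estimates of 2–5 seconds and $0.10–0.30 per action as defaults (these are the estimates quoted here, not published prices):

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    min_cost_usd: float
    max_cost_usd: float
    min_seconds: float
    max_seconds: float


def estimate_workflow(steps: int,
                      cost_per_step: tuple = (0.10, 0.30),
                      seconds_per_step: tuple = (2.0, 5.0)) -> Estimate:
    """Scale the per-action cost and latency ranges by the number of steps."""
    return Estimate(
        min_cost_usd=round(steps * cost_per_step[0], 2),
        max_cost_usd=round(steps * cost_per_step[1], 2),
        min_seconds=steps * seconds_per_step[0],
        max_seconds=steps * seconds_per_step[1],
    )


est = estimate_workflow(20)
print(f"${est.min_cost_usd:.2f}-${est.max_cost_usd:.2f}, "
      f"{est.min_seconds:.0f}-{est.max_seconds:.0f} s")  # → $2.00-$6.00, 40-100 s
```

A 20-step workflow therefore also means 40–100 seconds of wall-clock time, which matters as much as the dollar figure for anything interactive.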

When it's the right choice: legacy systems with no API, government portals, vendor software you can't integrate with, enterprise systems that predate modern APIs.

When it's the wrong choice: anything with an API. Web scraping that doesn't require interaction. Forms on modern sites (use Stagehand instead — it's cheaper and more reliable).
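The rule of thumb above fits in a few lines. A sketch that encodes it as a function; the return labels are illustrative, and "http_scraping" stands in for plain fetch-and-parse scraping, which the text rules out for computer use without naming a specific alternative:

```python
def pick_automation(has_api: bool, needs_interaction: bool, modern_web_ui: bool) -> str:
    """Choose an approach using the post's rule of thumb: API first, computer use last."""
    if has_api:
        return "api"
    if not needs_interaction:
        return "http_scraping"  # read-only pages: fetch and parse, no clicking needed
    if modern_web_ui:
        return "stagehand"  # modern sites: cheaper and more reliable than computer use
    return "computer_use"  # legacy systems, government portals, closed vendor software
```

The order of the checks is the point: computer use is only reached once every cheaper option has been ruled out.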

Setup

# Pull the official computer use Docker image
docker pull ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest

# Run with VNC access (port 5900 = VNC, port 8080 = web interface)
docker run -it \
  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  -v $HOME/.anthropic:/home/computeruse/.anthropic \
  -p 5900:5900 \
  -p 8080:8080 \
  ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest

Open http://localhost:8080 to see the virtual desktop and interact with it.

For production, run this on a VPS (you need at least 2GB RAM for the container):

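# On Hostinger VPS or any cloud server; binding to 127.0.0.1 keeps VNC and the
# web interface off the public internet (reach them through an SSH tunnel)
docker run -d \
  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  -p 127.0.0.1:5900:5900 \
  -p 127.0.0.1:8080:8080 \
  ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest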

The computer use API

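import base64
import subprocess
import time

import anthropic

client = anthropic.Anthropic()

def take_screenshot() -> str:
    """Capture the current desktop as a base64-encoded PNG.
    Assumes scrot is installed and DISPLAY points at the virtual display."""
    subprocess.run(["scrot", "-o", "/tmp/screenshot.png"], check=True)
    with open("/tmp/screenshot.png", "rb") as f:
        return base64.standard_b64encode(f.read()).decode()

def run_computer_use_agent(task: str, max_steps: int = 25) -> str:
    # The conversation starts with the task alone; Claude usually asks for a
    # screenshot first, and every later screenshot travels back as a tool_result
    messages = [{"role": "user", "content": task}]

    for _ in range(max_steps):
        # Computer use is a beta tool: call it through client.beta with the
        # beta flag that matches the tool version
        response = client.beta.messages.create(
            model="claude-opus-4-7",  # Opus chosen for reliability; it is not required
            max_tokens=1024,
            tools=[
                {
                    "type": "computer_20250124",
                    "name": "computer",
                    # Must match the virtual display's actual resolution
                    "display_width_px": 1920,
                    "display_height_px": 1080,
                    "display_number": 1,
                }
            ],
            betas=["computer-use-2025-01-24"],
            messages=messages,
        )
        messages.append({"role": "assistant", "content": response.content})

        # No tool call means Claude considers the task done; return its text
        if response.stop_reason != "tool_use":
            text = "\n".join(b.text for b in response.content if b.type == "text")
            return text or "Task complete"

        # Run each requested action, then answer every tool_use block with a
        # tool_result holding a fresh screenshot (the API rejects the next
        # request if any tool_use is left unanswered)
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                execute_computer_action(block.input)
                time.sleep(1)  # let the UI settle before capturing
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": [{
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": "image/png",
                            "data": take_screenshot(),
                        },
                    }],
                })
        messages.append({"role": "user", "content": tool_results})

    return "Reached step limit"

def execute_computer_action(action: dict):
    """Execute one computer-tool action on the virtual desktop.
    Handles the common actions only; anything else is ignored."""
    import pyautogui

    action_type = action.get("action")
    coord = action.get("coordinate")  # [x, y] in screen pixels, when present
    x, y = coord if coord else (None, None)

    if action_type == "screenshot":
        pass  # the agent loop returns a fresh screenshot after every action
    elif action_type == "mouse_move":
        pyautogui.moveTo(x, y)
    elif action_type in ("left_click", "right_click", "double_click"):
        button = "right" if action_type == "right_click" else "left"
        clicks = 2 if action_type == "double_click" else 1
        pyautogui.click(x=x, y=y, clicks=clicks, button=button)
    elif action_type == "type":
        pyautogui.typewrite(action["text"], interval=0.05)
    elif action_type == "key":
        # The key or combo arrives in "text" using xdotool names, e.g. "Return" or
        # "ctrl+s"; most map to pyautogui names once lowercased, a few need a lookup
        pyautogui.hotkey(*action["text"].lower().split("+"))
    elif action_type == "scroll":
        amount = action.get("scroll_amount", 3)
        direction = action.get("scroll_direction", "down")
        if direction in ("up", "down"):
            pyautogui.scroll(amount if direction == "up" else -amount, x=x, y=y)
        else:
            pyautogui.hscroll(amount if direction == "right" else -amount, x=x, y=y)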
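The detail that trips up most hand-written loops is the reply to each tool_use block: in the Messages API, the user turn that follows must contain a tool_result block with the matching tool_use_id for every tool_use Claude emitted, or the next request is rejected. A small helper that builds that turn from base64 screenshots (the function name is hypothetical):

```python
def tool_results_turn(results: list) -> dict:
    """Build the user turn answering an assistant turn's tool_use blocks.

    `results` holds one (tool_use_id, base64_png) pair per tool_use block,
    in the order Claude requested them.
    """
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": tool_use_id,
                "content": [{
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": png_base64,
                    },
                }],
            }
            for tool_use_id, png_base64 in results
        ],
    }
```

Append the returned dict to `messages` directly after the assistant turn that requested the actions; all results for one assistant turn go into a single user turn.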

Workflow 1: Automated form filling for government portals

The use case that no API covers: MCA21 filings, GST portal operations, state government portals, tender submission portals. All have UIs, none have public APIs.

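import os

# Credentials come from the environment (hypothetical variable names); anything
# placed in the task text, passwords included, is sent to the API with the prompt
MCA_USERNAME = os.environ["MCA_USERNAME"]
MCA_PASSWORD = os.environ["MCA_PASSWORD"]

# Prepare the data you want to submit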
form_data = {
    "company_name": "MasterPrompting Technologies Pvt Ltd",
    "cin": "U72900KA2024PTC123456",
    "director_din": "12345678",
    "filing_type": "MGT-7",
    "financial_year": "2025-26",
}

task = f"""
Open Firefox and navigate to mca.gov.in/mcafoportal/login.do

Log in with credentials:
- Username: {MCA_USERNAME}
- Password: {MCA_PASSWORD}

After login, navigate to e-Filing → Annual Return (MGT-7).

Fill in the form with this data:
{form_data}

Take a screenshot before submitting. 
DO NOT click the final Submit button — stop at the review page and describe what you see.
"""

result = run_computer_use_agent(task)
print(result)  # Description of the review page for human to approve

Reliability: ~85% on simple single-page forms, ~60% on multi-page wizard forms with dynamic validation. Always stop before final submission and require human approval.
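The approval step can be made explicit in code rather than left to habit. A minimal sketch, assuming the reviewer answers at a terminal; `approve_submission` is a hypothetical helper, and its `ask` parameter exists only so the prompt can be swapped out in tests:

```python
from typing import Callable


def approve_submission(review_description: str,
                       ask: Callable[[str], str] = input) -> bool:
    """Show the agent's description of the review page; proceed only on an explicit 'yes'."""
    print("Agent's description of the review page:\n")
    print(review_description)
    answer = ask("\nSubmit this filing? Type 'yes' to continue: ")
    return answer.strip().lower() == "yes"
```

Anything other than "yes", including an empty answer, counts as a refusal, so the safe outcome is the default. The final Submit click can then stay with the human in the VNC session.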

Workflow 2: UI regression testing

Claude as a visual QA checker — not for unit tests, but for "does this look right?" validation:

checklist = """
Navigate to https://staging.yourapp.com/login

Check and report on each item:
1. Is the logo visible in the top-left corner?
2. Are there two input fields labeled "Email" and "Password"?
3. Is the "Sign In" button visible and enabled?
4. Is there a "Forgot password?" link?
5. Does the page have a dark mode toggle?
6. Are there any visible error messages or broken images?
7. Does clicking "Sign In" with empty fields show validation messages?

For each check: PASS, FAIL, or PARTIAL. Describe any failures specifically.
"""

result = run_computer_use_agent(checklist)
print(result)
# → "1. PASS - Logo visible top-left
#    2. PASS - Two fields present
#    3. PASS
#    4. PASS
#    5. FAIL - No dark mode toggle visible on this page
#    6. PASS - No errors
#    7. PASS - 'Email is required' validation shown"

For visual checks this is less brittle than Selenium because it doesn't depend on element IDs or CSS classes; it looks at the rendered page. The trade-off is that the verdicts are probabilistic (about 90% reliable, per the cost table below).
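To gate a CI job on the checklist, the free-text report needs to become structured data. A sketch that parses lines in the format the prompt requests ("N. STATUS - detail"); it assumes Claude follows that format, so a check missing from the report is treated as not passed rather than silently ignored:

```python
import re

# Matches lines such as "5. FAIL - No dark mode toggle visible on this page"
CHECK_LINE = re.compile(
    r"^[ \t]*(\d+)\.[ \t]*(PASS|FAIL|PARTIAL)\b(?:[ \t]*-[ \t]*(.*))?", re.MULTILINE
)


def parse_checklist(report: str) -> dict:
    """Map check number -> (status, detail) from Claude's PASS/FAIL/PARTIAL report."""
    return {
        int(num): (status, (detail or "").strip())
        for num, status, detail in CHECK_LINE.findall(report)
    }


def failed_checks(report: str, expected: int) -> list:
    """Check numbers 1..expected that FAILed, were PARTIAL, or are missing."""
    parsed = parse_checklist(report)
    return [n for n in range(1, expected + 1)
            if parsed.get(n, ("MISSING", ""))[0] != "PASS"]
```

With the sample report above, `failed_checks(result, 7)` flags only check 5; exiting non-zero when the list is non-empty turns the checklist into a pipeline gate.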

Workflow 3: Legacy ERP data extraction

An old ERP with no export API, only a web UI. Extract 20 purchase orders by navigating the interface:

import json
import os
import re

# Hypothetical environment variable names for the ERP login
ERP_USERNAME = os.environ["ERP_USERNAME"]
ERP_PASSWORD = os.environ["ERP_PASSWORD"]

task = f"""
Open Chrome and navigate to http://erp.internal.company.com/login
Log in with username {ERP_USERNAME} and password {ERP_PASSWORD}.

Navigate to: Procurement → Purchase Orders → All POs

For each purchase order in the list (up to 20):
1. Click on the PO number to open it
2. Extract: PO number, vendor name, total amount, status, date created
3. Go back to the list
4. Move to the next PO

Format the extracted data as a JSON array.
Stop when you've processed 20 POs or reach the end of the list.
"""

# Roughly 3 actions per PO plus login and navigation, so 50 steps is too few for 20 POs
result = run_computer_use_agent(task, max_steps=100)

# Pull the JSON array out of Claude's final message (empty list if none was found)
json_match = re.search(r"\[.*\]", result, re.DOTALL)
po_data = json.loads(json_match.group()) if json_match else []

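Each PO takes roughly three actions (open it, take a screenshot to read it, go back), so at 2–5 seconds and $0.10–0.30 per action, 20 POs come to about 60 actions: 2–5 minutes and $6–18, in line with the cost table below. Slow, but it works on systems you can't otherwise touch.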
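The regex-plus-`json.loads` step falls back to nothing when Claude's answer is malformed, and it does not catch records with missing fields. A slightly stricter sketch, assuming you pin the key names in the prompt; the snake_case names below are hypothetical, not something the ERP or Claude defines:

```python
import json
import re

# Hypothetical key names; pin them in the prompt so Claude uses exactly these
REQUIRED_FIELDS = frozenset({"po_number", "vendor_name", "total_amount", "status", "date_created"})


def extract_po_records(result: str, required: frozenset = REQUIRED_FIELDS) -> tuple:
    """Split the JSON array in Claude's answer into (complete, incomplete) records."""
    match = re.search(r"\[.*\]", result, re.DOTALL)
    if not match:
        return [], []
    try:
        records = json.loads(match.group())
    except json.JSONDecodeError:
        return [], []
    complete, incomplete = [], []
    for record in records:
        if isinstance(record, dict) and required.issubset(record):
            complete.append(record)
        else:
            incomplete.append(record)
    return complete, incomplete
```

Incomplete records are better re-checked by hand, or re-extracted in a second, smaller run, than loaded as-is.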

Workflow 4: Cross-browser visual QA

import subprocess
import time

# Browser executables to test; each must be installed inside the container
browsers = ["google-chrome", "firefox"]
results = {}

for browser in browsers:
    proc = subprocess.Popen([browser, "https://yourapp.com"])
    time.sleep(3)  # give the page time to load before the first screenshot

    task = f"""
    The {browser} browser should now be open with https://yourapp.com loaded.

    Take a screenshot and describe:
    1. Does the header render correctly (logo, navigation, dark mode toggle)?
    2. Are the fonts rendering (no missing glyphs)?
    3. Is the layout centered or broken?
    4. Rate the visual appearance: Good/Degraded/Broken

    Be specific about any issues you see.
    """

    results[browser] = run_computer_use_agent(task, max_steps=5)
    proc.terminate()  # close this browser so the next one is alone on screen

# Compare results
for browser, result in results.items():
    print(f"\n{browser}:")
    print(result)
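Comparing free-text reports by eye doesn't scale past a couple of browsers. A sketch that pulls out the rating, assuming you append an instruction such as "End with a line RATING: Good, Degraded, or Broken" to the task (a hypothetical output format, not part of the prompt above):

```python
import re
from typing import Optional

# Hypothetical output format: a final line such as "RATING: Degraded"
RATING_LINE = re.compile(
    r"^\s*RATING:\s*(Good|Degraded|Broken)\s*$", re.IGNORECASE | re.MULTILINE
)


def extract_rating(report: str) -> Optional[str]:
    """Return the last RATING line's value, normalized to Good/Degraded/Broken."""
    matches = RATING_LINE.findall(report)
    return matches[-1].capitalize() if matches else None


def browsers_needing_attention(results: dict) -> dict:
    """Map each browser whose rating is not Good to its rating ('Unknown' if missing)."""
    flagged = {}
    for browser, report in results.items():
        rating = extract_rating(report) or "Unknown"
        if rating != "Good":
            flagged[browser] = rating
    return flagged
```

A report with no rating line is flagged as "Unknown" instead of passing silently, since a missing rating usually means the run went off-script.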

Workflow 5: Bulk data labeling via a UI labeling tool

When your labeling tool (Scale AI, Label Studio) doesn't expose the specific workflow you need via API:

task = """
Label Studio is open in the browser with the Image Classification project.

For each unlabeled image shown:
1. Look at the image
2. Identify the main subject: cat, dog, bird, other
3. Click the appropriate label button
4. Click Next to move to the following image

Continue until you've labeled 50 images or the project shows "Complete".
Count: keep track of how many of each label you've applied.
"""

result = run_computer_use_agent(task, max_steps=200)
print(result)  # Summary: "Labeled 50 images: 23 cat, 15 dog, 8 bird, 4 other"
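The closing summary is Claude's own running tally, and a model can miscount over 50 images. A cheap sanity check parses a summary in the format shown above and confirms the counts add up; the format is an assumption, and Label Studio's own task counts remain the authoritative cross-check:

```python
import re

ALLOWED_LABELS = frozenset({"cat", "dog", "bird", "other"})
SUMMARY = re.compile(r"Labeled\s+(\d+)\s+images?:\s*(.+)", re.IGNORECASE)
COUNT = re.compile(r"(\d+)\s+(\w+)")


def parse_label_summary(summary: str) -> tuple:
    """Parse 'Labeled N images: a cat, b dog, ...' into (N, {label: count})."""
    match = SUMMARY.search(summary)
    if not match:
        raise ValueError(f"Unrecognized summary: {summary!r}")
    counts = {label.lower(): int(n) for n, label in COUNT.findall(match.group(2))}
    return int(match.group(1)), counts


def summary_is_consistent(summary: str, allowed: frozenset = ALLOWED_LABELS) -> bool:
    """True if only allowed labels appear and the per-label counts add up to the total."""
    total, counts = parse_label_summary(summary)
    return set(counts) <= allowed and sum(counts.values()) == total
```

The example summary passes (23 + 15 + 8 + 4 = 50); a mismatch is a signal to spot-check the project before trusting the labels.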

Cost reality check

Workflow	Steps	Cost per run	Reliability
Simple form fill	10–15	$1–3	~85%
UI regression test	5–8	$0.50–1.50	~90%
Legacy data extraction	50–100	$5–20	~70%
Cross-browser QA	5–10	$0.50–2	~90%
Bulk labeling (50 items)	100–150	$10–30	~75%
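Reliability changes the real price. If a failed run is simply retried until it succeeds, the number of attempts is geometric, so the expected spend per success is the per-run cost divided by the success rate. A sketch using the table's figures; it assumes independent runs (a portal that fails once often fails again the same way) and that a failed run costs as much as a full one, so treat the result as a rough upper estimate:

```python
def expected_cost_per_success(cost_per_run: float, success_rate: float) -> float:
    """Expected spend until the first success when failed runs are retried (geometric trials)."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_run / success_rate


TABLE = {
    # workflow: ((low, high) cost per run in USD, success rate)
    "Simple form fill": ((1.0, 3.0), 0.85),
    "UI regression test": ((0.5, 1.5), 0.90),
    "Legacy data extraction": ((5.0, 20.0), 0.70),
    "Cross-browser QA": ((0.5, 2.0), 0.90),
    "Bulk labeling (50 items)": ((10.0, 30.0), 0.75),
}

for name, ((low, high), rate) in TABLE.items():
    print(f"{name}: ${expected_cost_per_success(low, rate):.2f}-"
          f"${expected_cost_per_success(high, rate):.2f} per successful run")
```

The lower the reliability, the bigger the markup: legacy data extraction at ~70% works out to roughly $7–29 per successful run rather than $5–20.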

For anything that can use an API or Stagehand, those options are significantly cheaper. Computer use earns its cost where there is no API to call, or where the check is inherently visual, as in the QA workflows above.

Use claude-opus-4-7 for computer use tasks — the smaller models are noticeably less reliable at spatial reasoning and element targeting.
