Claude 4.6 scored 72%+ on OSWorld — the standard benchmark for AI computer use. That's a meaningful jump from prior generations and puts it in the territory of genuine production use for browser automation. If you're still writing Playwright or Selenium scripts for repetitive web tasks, there's now an alternative that requires no selectors and handles dynamic UIs without constant maintenance.
The tradeoff is real: computer use is slower and more expensive per action than Playwright. But for workflows where no API exists, where the UI changes frequently, or where you're dealing with legacy government portals (if you're building in India, you know what I mean), it's often the only practical option.
What computer use actually is
The fundamental loop is a screenshot-action cycle: Claude receives a screenshot of the current screen state, decides what action to take next — click, type, scroll, press a key — your code executes that action, takes a new screenshot, and sends it back. Claude never sees the DOM or HTML source. It sees exactly what a human would see. That's the whole thing.
This is why it handles dynamic JavaScript, custom UI frameworks, and non-standard layouts better than selector-based automation. Playwright breaks when a CSS class changes. Computer use doesn't care — it's looking at pixels, not the DOM tree.
There are three computer use tools:
- computer: screenshots, mouse clicks, keyboard input, scrolling
- text_editor: file reading and writing (for multi-file tasks)
- bash: shell command execution
Most browser automation workflows only need computer. The other two matter for agent workflows that also read/write files or run shell commands.
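If a workflow does need all three, the declarations can sit in one list. The text_editor and bash type strings and names below are assumptions based on the beta's naming convention; check the current tool documentation for the versions that match your model:

```python
# Hypothetical full tool set for agent workflows that also edit files and run
# shell commands. Verify the type strings against the current Anthropic docs.
all_tools = [
    {
        "type": "computer_20250124",
        "name": "computer",
        "display_width_px": 1366,
        "display_height_px": 768,
        "display_number": 1,
    },
    {"type": "text_editor_20250124", "name": "str_replace_editor"},
    {"type": "bash_20250124", "name": "bash"},
]
```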
Setting up the computer use tools
import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "type": "computer_20250124",
        "name": "computer",
        "display_width_px": 1366,
        "display_height_px": 768,
        "display_number": 1
    }
]
The display dimensions matter. Claude calibrates click coordinates to the display size you declare. If you capture screenshots at 1920x1080 but tell Claude the display is 1366x768, every click will land in the wrong place.
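If your capture resolution must differ from the declared display, you can rescale the coordinates yourself before executing each click. A minimal sketch; the helper name is ours, not part of the API:

```python
def scale_coordinate(x: int, y: int,
                     capture_size: tuple, declared_size: tuple) -> tuple:
    """Map a click from the declared display space (what Claude sees)
    to the actual capture resolution (what xdotool needs)."""
    cw, ch = capture_size
    dw, dh = declared_size
    return round(x * cw / dw), round(y * ch / dh)

# A click Claude places at (683, 384) on a declared 1366x768 display
# lands at (960, 540) on a 1920x1080 capture.
print(scale_coordinate(683, 384, (1920, 1080), (1366, 768)))  # (960, 540)
```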
For headless operation on a server, you need a virtual display. On Linux, xvfb-run handles this:
# Install dependencies
sudo apt-get install xvfb xdotool scrot
# Run your script with a virtual display
xvfb-run -s "-screen 0 1366x768x24" python your_agent.py
Alternatively, you can use Playwright's Chromium as the rendering layer and take screenshots via Playwright's screenshot API. This gives you better cross-platform support without the Xvfb setup on macOS.
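A sketch of that approach, assuming Playwright is installed (pip install playwright, then playwright install chromium); the helper names are illustrative:

```python
import base64

def to_b64_png(png_bytes: bytes) -> str:
    """Base64-encode raw PNG bytes for the Anthropic image block."""
    return base64.standard_b64encode(png_bytes).decode()

def playwright_screenshot(url: str, width: int = 1366, height: int = 768) -> str:
    """Render a page in headless Chromium and return a base64 PNG."""
    from playwright.sync_api import sync_playwright  # imported lazily
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page(viewport={"width": width, "height": height})
        page.goto(url)
        shot = page.screenshot()
        browser.close()
    return to_b64_png(shot)
```

Clicks and keystrokes then go through page.mouse and page.keyboard instead of xdotool, so the xvfb/xdotool/scrot stack isn't needed at all.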
The screenshot-action loop (Python implementation)
Here's a working implementation that handles the core loop:
import anthropic
import base64
import subprocess

client = anthropic.Anthropic()

tools = [
    {
        "type": "computer_20250124",
        "name": "computer",
        "display_width_px": 1366,
        "display_height_px": 768,
        "display_number": 1
    }
]
def take_screenshot() -> str:
    """Takes a screenshot and returns a base64-encoded PNG."""
    # --overwrite: recent scrot versions refuse to clobber an existing file
    subprocess.run(["scrot", "--overwrite", "/tmp/screenshot.png"], check=True)
    with open("/tmp/screenshot.png", "rb") as f:
        return base64.standard_b64encode(f.read()).decode()
def execute_action(action: dict) -> None:
    """Executes a computer action returned by Claude."""
    action_type = action.get("action") or action.get("type")
    if action_type == "screenshot":
        pass  # the loop returns a fresh screenshot after every action anyway
    elif action_type == "left_click":
        x, y = action["coordinate"]
        subprocess.run(["xdotool", "mousemove", str(x), str(y), "click", "1"])
    elif action_type == "right_click":
        x, y = action["coordinate"]
        subprocess.run(["xdotool", "mousemove", str(x), str(y), "click", "3"])
    elif action_type == "double_click":
        x, y = action["coordinate"]
        subprocess.run(["xdotool", "mousemove", str(x), str(y),
                        "click", "--repeat", "2", "1"])
    elif action_type == "type":
        subprocess.run(["xdotool", "type", "--clearmodifiers", action["text"]])
    elif action_type == "key":
        # newer payloads put the key combo in "text"; older ones used "key"
        subprocess.run(["xdotool", "key", action.get("text") or action.get("key")])
    elif action_type == "scroll":
        x, y = action["coordinate"]
        direction = action.get("scroll_direction") or action.get("direction", "down")
        amount = action.get("scroll_amount") or action.get("amount", 3)
        btn = "5" if direction == "down" else "4"
        subprocess.run(["xdotool", "mousemove", str(x), str(y)])  # scroll at the target
        for _ in range(amount):
            subprocess.run(["xdotool", "click", btn])
def run_computer_agent(task: str, max_steps: int = 20) -> str:
    """Runs a computer use agent for a given task."""
    # The task goes in the first user message; every later user message is a
    # tool_result answering the tool_use blocks Claude just emitted.
    messages = [{"role": "user", "content": [{"type": "text", "text": task}]}]
    for step in range(max_steps):
        response = client.beta.messages.create(
            model="claude-opus-4-6",  # Opus for best computer use performance
            max_tokens=4096,
            tools=tools,
            betas=["computer-use-2025-01-24"],  # computer use is gated behind this beta
            messages=messages,
            thinking={"type": "adaptive"},
            effort="medium"
        )
        messages.append({"role": "assistant", "content": response.content})

        # Task complete: Claude stopped without requesting another action
        if response.stop_reason == "end_turn":
            for block in response.content:
                if hasattr(block, "text"):
                    return block.text
            return "Task completed."

        # Execute each requested action, then answer its tool_use block with
        # a tool_result carrying a fresh screenshot
        tool_results = []
        for block in response.content:
            if block.type == "tool_use" and block.name == "computer":
                execute_action(block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": [{
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": "image/png",
                            "data": take_screenshot()
                        }
                    }]
                })
        messages.append({"role": "user", "content": tool_results})
    return f"Reached max steps ({max_steps}) without completing."
The max_steps guard is important. Computer use workflows can loop if Claude gets stuck, and each step costs real money.
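One cheap guard against loops is to abort when the agent repeats the identical action several times in a row. A sketch; the repeat threshold and the JSON-based equality check are our choices, not anything the API provides:

```python
import json

def make_loop_detector(max_repeats: int = 3):
    """Returns a closure that is True once the same action dict has been
    seen max_repeats times consecutively."""
    last, count = None, 0
    def seen(action: dict) -> bool:
        nonlocal last, count
        key = json.dumps(action, sort_keys=True)
        count = count + 1 if key == last else 1
        last = key
        return count >= max_repeats
    return seen

looping = make_loop_detector()
click = {"action": "left_click", "coordinate": [10, 20]}
print(looping(click))  # False
print(looping(click))  # False
print(looping(click))  # True -> break out of the step loop early
```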
Real examples
Form filling
The task prompt handles the whole thing — no selectors, no field IDs:
result = run_computer_agent(
task="""
Open Firefox and navigate to https://example.com/contact.
Fill in the contact form with:
- Name: Priya Sharma
- Email: priya@example.com
- Phone: +91 98765 43210
- Message: I'd like to enquire about your enterprise plan.
Click Submit. Confirm the success message appears and tell me what it says.
"""
)
print(result)
Data extraction
result = run_computer_agent(
task="""
Open https://example.com/products in Firefox.
Scroll through the entire product listing (scroll down until no new products appear).
Extract all product names and their prices into a list.
Return the list as: Product Name | Price (INR)
"""
)
MCA21 portal automation (India-specific)
This is one of the most common Indian developer use cases. MCA21 — the Ministry of Corporate Affairs portal — has no API, a non-standard UI, and breaks Playwright regularly after updates. Computer use handles it:
result = run_computer_agent(
task="""
Open the MCA21 portal at https://efiling.mca.gov.in.
Log in with username [USERNAME] and password [PASSWORD].
Navigate to the company filing section.
Search for company with CIN: [CIN_NUMBER].
Download the most recent annual return filing (MGT-7 or MGT-7A).
Save it to /tmp/annual_return.pdf.
Confirm the download completed.
"""
)
The same pattern works for GSTN portal, Income Tax e-filing, EPFO employer portal, and SEBI SCORES — all portals that have frustrated Indian developers for years. Computer use doesn't care that they're built on decade-old frameworks.
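A practical note on those [USERNAME]/[PASSWORD] placeholders: avoid committing credentials inside task strings. A small helper can substitute them from environment variables at run time (the variable names here are placeholders of our own). Keep in mind that the filled-in prompt, and any screenshot of the login page, still travels to the API:

```python
import os

def fill_credentials(task_template: str) -> str:
    """Replace credential placeholders from the environment so secrets
    never live in source control."""
    return (task_template
            .replace("[USERNAME]", os.environ["MCA_USERNAME"])
            .replace("[PASSWORD]", os.environ["MCA_PASSWORD"]))

# Usage: export MCA_USERNAME / MCA_PASSWORD in the shell, then
# run_computer_agent(fill_credentials(task))
```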
Cost management — computer use gets expensive fast
Computer use typically consumes 1,000–5,000 tokens per screenshot-action pair. A 20-step workflow: 20K–100K tokens. At Opus 4.6 pricing ($5/$25 per MTok input/output), a complex 20-step task can cost $0.50–$2.50 (roughly ₹42–₹210).
That's expensive if you're running hundreds of automations. A few patterns to control costs:
Use Sonnet 4.6 for simpler workflows. Sonnet 4.6 costs $3/$15 per MTok vs. Opus's $5/$25 — 40% cheaper on input. For straightforward form filling and data extraction, Sonnet performs comparably to Opus on computer use.
Cap steps aggressively. The max_steps parameter in the loop above is your budget ceiling. Set it based on expected task complexity, not "just in case."
Step counter pattern for budget awareness:
def run_computer_agent_with_budget(task: str, max_steps: int = 15,
                                   token_budget: int = 50000) -> str:
    messages = []
    total_tokens = 0
    for step in range(max_steps):
        # ... (same loop as before) ...
        response = client.messages.create(...)
        total_tokens += response.usage.input_tokens + response.usage.output_tokens
        print(f"Step {step + 1}: {response.usage.input_tokens} in, "
              f"{response.usage.output_tokens} out, "
              f"total: {total_tokens}")
        if total_tokens > token_budget:
            return f"Budget exceeded at step {step + 1}. Partial result: ..."
        if response.stop_reason == "end_turn":
            return response.content[-1].text
        for block in response.content:
            if block.type == "tool_use" and block.name == "computer":
                execute_action(block.input)
        messages.append({"role": "assistant", "content": response.content})
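To sanity-check a budget before running anything, the pricing arithmetic above fits in a few lines. The per-step token counts below are assumptions within the 1,000–5,000 range quoted earlier:

```python
def estimate_cost_usd(steps: int,
                      input_tok_per_step: int,
                      output_tok_per_step: int,
                      usd_per_mtok_in: float = 5.0,    # Opus input pricing
                      usd_per_mtok_out: float = 25.0   # Opus output pricing
                      ) -> float:
    """Rough cost estimate for a fixed-length computer use workflow."""
    total_in = steps * input_tok_per_step
    total_out = steps * output_tok_per_step
    return total_in / 1e6 * usd_per_mtok_in + total_out / 1e6 * usd_per_mtok_out

# 20 steps at ~3,000 input and ~500 output tokens each:
print(round(estimate_cost_usd(20, 3000, 500), 2))  # 0.55
```

Swapping in Sonnet pricing (3.0 and 15.0) gives the same task at roughly a third of the output cost.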
Try it now with AICredits.in
Access Claude Opus 4.6 for computer use tasks in India with UPI payment. Set per-key spending limits to control automation costs — no international card needed. Create free account →
Limitations worth knowing before you commit
CAPTCHAs: Claude can attempt most visual CAPTCHAs — image recognition ones, simple text CAPTCHAs, checkbox "I'm not a robot." But reCAPTCHA v3 (invisible, behaviour-based) and hCaptcha's harder challenges will fail. Don't build production workflows that depend on CAPTCHA solving.
Multi-tab workflows: Current computer use is single-display. If your workflow requires switching between multiple browser tabs, you need to give explicit instructions for each tab switch ("press Ctrl+Tab to switch to the next tab"). It works — it's just not automatic.
Video content: Claude sees static screenshots, not live video or animated elements. Dropdown menus, loading spinners, and animations are only captured at the moment of the screenshot. Add small delays before screenshots that follow click actions that trigger animations.
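A small settle delay covers most of this. The sketch below pauses only after action types that tend to trigger animations; the half-second default is a guess you should tune per UI:

```python
import time

ANIMATING = {"left_click", "double_click", "scroll"}

def settled_screenshot(shoot, action_type: str, delay: float = 0.5):
    """Call shoot() after a short pause when the preceding action tends to
    trigger animations. shoot is any screenshot function, e.g. the
    take_screenshot() defined earlier."""
    if action_type in ANIMATING:
        time.sleep(delay)
    return shoot()
```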
HTTPS certificate errors: Claude will flag these and ask for confirmation rather than bypass them. Build your test environments with valid certificates, or handle the certificate error prompt explicitly in your task instructions.
Latency: Each screenshot-action pair involves a full API round trip. A 10-step workflow takes 30–90 seconds depending on network and model speed. This isn't a replacement for Playwright on latency-sensitive tasks — it's for workflows where reliability matters more than speed.