Claude 4.6 scored 72%+ on OSWorld — the standard benchmark for AI computer use. That's a meaningful jump from prior generations and puts it in the territory of genuine production use for browser automation. If you're still writing Playwright or Selenium scripts for repetitive web tasks, there's now an alternative that requires no selectors and handles dynamic UIs without constant maintenance.
The tradeoff is real: computer use is slower and more expensive per action than Playwright. But for workflows where no API exists, where the UI changes frequently, or where you're dealing with legacy government portals (if you're building in India, you know what I mean), it's often the only practical option.
What computer use actually is
The fundamental loop is a screenshot-action cycle: Claude receives a screenshot of the current screen state, decides what action to take next — click, type, scroll, press a key — your code executes that action, takes a new screenshot, and sends it back. Claude never sees the DOM or HTML source. It sees exactly what a human would see. That's the whole thing.
This is why it handles dynamic JavaScript, custom UI frameworks, and non-standard layouts better than selector-based automation. Playwright breaks when a CSS class changes. Computer use doesn't care — it's looking at pixels, not the DOM tree.
There are three computer use tools:
- computer: screenshots, mouse clicks, keyboard input, scrolling
- text_editor: file reading and writing (for multi-file tasks)
- bash: shell command execution
Most browser automation workflows only need computer. The other two matter for agent workflows that also read/write files or run shell commands.
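If a workflow does need all three, the declarations can sit in one list. The text_editor and bash type strings and names below are assumptions based on the beta's naming convention; check the current tool documentation for the versions that match your model:

```python
# Hypothetical full tool set for agent workflows that also edit files and run
# shell commands. Verify the type strings against the current Anthropic docs.
all_tools = [
    {
        "type": "computer_20250124",
        "name": "computer",
        "display_width_px": 1366,
        "display_height_px": 768,
        "display_number": 1,
    },
    {"type": "text_editor_20250124", "name": "str_replace_editor"},
    {"type": "bash_20250124", "name": "bash"},
]
```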
Setting up the computer use tools
import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "type": "computer_20250124",
        "name": "computer",
        "display_width_px": 1366,
        "display_height_px": 768,
        "display_number": 1
    }
]
The display dimensions matter. Claude calibrates click coordinates to the display size you declare. If you capture screenshots at 1920x1080 but tell Claude the display is 1366x768, every click will land in the wrong place.
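If your capture resolution must differ from the declared display, you can rescale the coordinates yourself before executing each click. A minimal sketch; the helper name is ours, not part of the API:

```python
def scale_coordinate(x: int, y: int,
                     capture_size: tuple, declared_size: tuple) -> tuple:
    """Map a click from the declared display space (what Claude sees)
    to the actual capture resolution (what xdotool needs)."""
    cw, ch = capture_size
    dw, dh = declared_size
    return round(x * cw / dw), round(y * ch / dh)

# A click Claude places at (683, 384) on a declared 1366x768 display
# lands at (960, 540) on a 1920x1080 capture.
print(scale_coordinate(683, 384, (1920, 1080), (1366, 768)))  # (960, 540)
```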
For headless operation on a server, you need a virtual display. On Linux, xvfb-run handles this:
# Install dependencies
sudo apt-get install xvfb xdotool scrot
# Run your script with a virtual display
xvfb-run -s "-screen 0 1366x768x24" python your_agent.py
Alternatively, you can use Playwright's Chromium as the rendering layer and take screenshots via Playwright's screenshot API. This gives you better cross-platform support without the Xvfb setup on macOS.
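A sketch of that approach, assuming Playwright is installed (pip install playwright, then playwright install chromium); the helper names are illustrative:

```python
import base64

def to_b64_png(png_bytes: bytes) -> str:
    """Base64-encode raw PNG bytes for the Anthropic image block."""
    return base64.standard_b64encode(png_bytes).decode()

def playwright_screenshot(url: str, width: int = 1366, height: int = 768) -> str:
    """Render a page in headless Chromium and return a base64 PNG."""
    from playwright.sync_api import sync_playwright  # imported lazily
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page(viewport={"width": width, "height": height})
        page.goto(url)
        shot = page.screenshot()
        browser.close()
    return to_b64_png(shot)
```

Clicks and keystrokes then go through page.mouse and page.keyboard instead of xdotool, so the xvfb/xdotool/scrot stack isn't needed at all.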
The screenshot-action loop (Python implementation)
Here's a working implementation that handles the core loop:
import anthropic
import base64
import subprocess

client = anthropic.Anthropic()

tools = [
    {
        "type": "computer_20250124",
        "name": "computer",
        "display_width_px": 1366,
        "display_height_px": 768,
        "display_number": 1
    }
]
def take_screenshot() -> str:
    """Takes a screenshot and returns a base64-encoded PNG."""
    # --overwrite: recent scrot versions refuse to clobber an existing file
    subprocess.run(["scrot", "--overwrite", "/tmp/screenshot.png"], check=True)
    with open("/tmp/screenshot.png", "rb") as f:
        return base64.standard_b64encode(f.read()).decode()
def execute_action(action: dict) -> None:
    """Executes a computer action returned by Claude."""
    action_type = action.get("action") or action.get("type")
    if action_type == "screenshot":
        pass  # the loop returns a fresh screenshot after every action anyway
    elif action_type == "left_click":
        x, y = action["coordinate"]
        subprocess.run(["xdotool", "mousemove", str(x), str(y), "click", "1"])
    elif action_type == "right_click":
        x, y = action["coordinate"]
        subprocess.run(["xdotool", "mousemove", str(x), str(y), "click", "3"])
    elif action_type == "double_click":
        x, y = action["coordinate"]
        subprocess.run(["xdotool", "mousemove", str(x), str(y),
                        "click", "--repeat", "2", "1"])
    elif action_type == "type":
        subprocess.run(["xdotool", "type", "--clearmodifiers", action["text"]])
    elif action_type == "key":
        # newer payloads put the key combo in "text"; older ones used "key"
        subprocess.run(["xdotool", "key", action.get("text") or action.get("key")])
    elif action_type == "scroll":
        x, y = action["coordinate"]
        direction = action.get("scroll_direction") or action.get("direction", "down")
        amount = action.get("scroll_amount") or action.get("amount", 3)
        btn = "5" if direction == "down" else "4"
        subprocess.run(["xdotool", "mousemove", str(x), str(y)])  # scroll at the target
        for _ in range(amount):
            subprocess.run(["xdotool", "click", btn])
def run_computer_agent(task: str, max_steps: int = 20) -> str:
    """Runs a computer use agent for a given task."""
    # The task goes in the first user message; every later user message is a
    # tool_result answering the tool_use blocks Claude just emitted.
    messages = [{"role": "user", "content": [{"type": "text", "text": task}]}]
    for step in range(max_steps):
        response = client.beta.messages.create(
            model="claude-opus-4-6",  # Opus for best computer use performance
            max_tokens=4096,
            tools=tools,
            betas=["computer-use-2025-01-24"],  # computer use is gated behind this beta
            messages=messages,
            thinking={"type": "adaptive"},
            effort="medium"
        )
        messages.append({"role": "assistant", "content": response.content})

        # Task complete: Claude stopped without requesting another action
        if response.stop_reason == "end_turn":
            for block in response.content:
                if hasattr(block, "text"):
                    return block.text
            return "Task completed."

        # Execute each requested action, then answer its tool_use block with
        # a tool_result carrying a fresh screenshot
        tool_results = []
        for block in response.content:
            if block.type == "tool_use" and block.name == "computer":
                execute_action(block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": [{
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": "image/png",
                            "data": take_screenshot()
                        }
                    }]
                })
        messages.append({"role": "user", "content": tool_results})
    return f"Reached max steps ({max_steps}) without completing."
The max_steps guard is important. Computer use workflows can loop if Claude gets stuck, and each step costs real money.
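One cheap guard against loops is to abort when the agent repeats the identical action several times in a row. A sketch; the repeat threshold and the JSON-based equality check are our choices, not anything the API provides:

```python
import json

def make_loop_detector(max_repeats: int = 3):
    """Returns a closure that is True once the same action dict has been
    seen max_repeats times consecutively."""
    last, count = None, 0
    def seen(action: dict) -> bool:
        nonlocal last, count
        key = json.dumps(action, sort_keys=True)
        count = count + 1 if key == last else 1
        last = key
        return count >= max_repeats
    return seen

looping = make_loop_detector()
click = {"action": "left_click", "coordinate": [10, 20]}
print(looping(click))  # False
print(looping(click))  # False
print(looping(click))  # True -> break out of the step loop early
```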
Real examples
Form filling
The task prompt handles the whole thing — no selectors, no field IDs:
result = run_computer_agent(
task="""
Open Firefox and navigate to https://example.com/contact.
Fill in the contact form with:
- Name: Priya Sharma
- Email: priya@example.com
- Phone: +91 98765 43210
- Message: I'd like to enquire about your enterprise plan.
Click Submit. Confirm the success message appears and tell me what it says.
"""
)
print(result)
Data extraction
result = run_computer_agent(
task="""
Open https://example.com/products in Firefox.
Scroll through the entire product listing (scroll down until no new products appear).
Extract all product names and their prices into a list.
Return the list as: Product Name | Price (INR)
"""
)
MCA21 portal automation (India-specific)
This is one of the most common Indian developer use cases. MCA21 — the Ministry of Corporate Affairs portal — has no API, a non-standard UI, and breaks Playwright regularly after updates. Computer use handles it:
result = run_computer_agent(
task="""
Open the MCA21 portal at https://efiling.mca.gov.in.
Log in with username [USERNAME] and password [PASSWORD].
Navigate to the company filing section.
Search for company with CIN: [CIN_NUMBER].
Download the most recent annual return filing (MGT-7 or MGT-7A).
Save it to /tmp/annual_return.pdf.
Confirm the download completed.
"""
)
The same pattern works for GSTN portal, Income Tax e-filing, EPFO employer portal, and SEBI SCORES — all portals that have frustrated Indian developers for years. Computer use doesn't care that they're built on decade-old frameworks.
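A practical note on those [USERNAME]/[PASSWORD] placeholders: avoid committing credentials inside task strings. A small helper can substitute them from environment variables at run time (the variable names here are placeholders of our own). Keep in mind that the filled-in prompt, and any screenshot of the login page, still travels to the API:

```python
import os

def fill_credentials(task_template: str) -> str:
    """Replace credential placeholders from the environment so secrets
    never live in source control."""
    return (task_template
            .replace("[USERNAME]", os.environ["MCA_USERNAME"])
            .replace("[PASSWORD]", os.environ["MCA_PASSWORD"]))

# Usage: export MCA_USERNAME / MCA_PASSWORD in the shell, then
# run_computer_agent(fill_credentials(task))
```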
Cost management — computer use gets expensive fast
Computer use typically consumes 1,000–5,000 tokens per screenshot-action pair. A 20-step workflow: 20K–100K tokens. At Opus 4.6 pricing ($5/$25 per MTok input/output), a complex 20-step task can cost $0.50–$2.50 (roughly ₹42–₹210).
That's expensive if you're running hundreds of automations. A few patterns to control costs:
Use Sonnet 4.6 for simpler workflows. Sonnet 4.6 costs $3/$15 per MTok vs. Opus's $5/$25 — 40% cheaper on input. For straightforward form filling and data extraction, Sonnet performs comparably to Opus on computer use.
Cap steps aggressively. The max_steps parameter in the loop above is your budget ceiling. Set it based on expected task complexity, not "just in case."
Step counter pattern for budget awareness:
def run_computer_agent_with_budget(task: str, max_steps: int = 15,
                                   token_budget: int = 50000) -> str:
    messages = []
    total_tokens = 0
    for step in range(max_steps):
        # ... (same loop as before) ...
        response = client.messages.create(...)
        total_tokens += response.usage.input_tokens + response.usage.output_tokens
        print(f"Step {step + 1}: {response.usage.input_tokens} in, "
              f"{response.usage.output_tokens} out, "
              f"total: {total_tokens}")
        if total_tokens > token_budget:
            return f"Budget exceeded at step {step + 1}. Partial result: ..."
        if response.stop_reason == "end_turn":
            return response.content[-1].text
        for block in response.content:
            if block.type == "tool_use" and block.name == "computer":
                execute_action(block.input)
        messages.append({"role": "assistant", "content": response.content})
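To sanity-check a budget before running anything, the pricing arithmetic above fits in a few lines. The per-step token counts below are assumptions within the 1,000–5,000 range quoted earlier:

```python
def estimate_cost_usd(steps: int,
                      input_tok_per_step: int,
                      output_tok_per_step: int,
                      usd_per_mtok_in: float = 5.0,    # Opus input pricing
                      usd_per_mtok_out: float = 25.0   # Opus output pricing
                      ) -> float:
    """Rough cost estimate for a fixed-length computer use workflow."""
    total_in = steps * input_tok_per_step
    total_out = steps * output_tok_per_step
    return total_in / 1e6 * usd_per_mtok_in + total_out / 1e6 * usd_per_mtok_out

# 20 steps at ~3,000 input and ~500 output tokens each:
print(round(estimate_cost_usd(20, 3000, 500), 2))  # 0.55
```

Swapping in Sonnet pricing (3.0 and 15.0) gives the same task at roughly a third of the output cost.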
Try it now with AICredits.in
Access Claude Opus 4.6 for computer use tasks in India with UPI payment. Set per-key spending limits to control automation costs — no international card needed. Create free account →
Limitations worth knowing before you commit
CAPTCHAs: Claude can attempt most visual CAPTCHAs — image recognition ones, simple text CAPTCHAs, checkbox "I'm not a robot." But reCAPTCHA v3 (invisible, behaviour-based) and hCaptcha's harder challenges will fail. Don't build production workflows that depend on CAPTCHA solving.
Multi-tab workflows: Current computer use is single-display. If your workflow requires switching between multiple browser tabs, you need to give explicit instructions for each tab switch ("press Ctrl+Tab to switch to the next tab"). It works — it's just not automatic.
Video content: Claude sees static screenshots, not live video or animated elements. Dropdown menus, loading spinners, and animations are only captured at the moment of the screenshot. Add small delays before screenshots that follow click actions that trigger animations.
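A small settle delay covers most of this. The sketch below pauses only after action types that tend to trigger animations; the half-second default is a guess you should tune per UI:

```python
import time

ANIMATING = {"left_click", "double_click", "scroll"}

def settled_screenshot(shoot, action_type: str, delay: float = 0.5):
    """Call shoot() after a short pause when the preceding action tends to
    trigger animations. shoot is any screenshot function, e.g. the
    take_screenshot() defined earlier."""
    if action_type in ANIMATING:
        time.sleep(delay)
    return shoot()
```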
HTTPS certificate errors: Claude will flag these and ask for confirmation rather than bypass them. Build your test environments with valid certificates, or handle the certificate error prompt explicitly in your task instructions.
Latency: Each screenshot-action pair involves a full API round trip. A 10-step workflow takes 30–90 seconds depending on network and model speed. This isn't a replacement for Playwright on latency-sensitive tasks — it's for workflows where reliability matters more than speed.