Browser Automation

Control browsers programmatically with OpenClaw. Automate logins, extract data, fill forms, and handle complex web workflows with the built-in browser tool.

OpenClaw includes a browser automation tool that lets your agents interact with websites much as a human would: clicking buttons, filling forms, extracting data, and navigating complex workflows, all without leaving your chat interface.

This guide covers everything from basic browser control to advanced authentication workflows and troubleshooting common issues on Linux and WSL2.

What You Can Do

The browser tool covers the most common web automation tasks:

  • Data extraction: Scrape information from websites that do not offer APIs
  • Form automation: Fill and submit forms automatically
  • Login workflows: Automate authentication to access protected content
  • Testing: Verify web applications work correctly
  • Monitoring: Check website status and content changes
  • File downloads: Download files that require interaction to access

Basic Browser Control

OpenClaw uses Playwright under the hood for reliable browser automation. The browser tool wraps it in a simplified command interface while still accepting ARIA roles and CSS selectors when you need precise control.

Opening a Page

Start by navigating to a URL. The browser tool launches a headless browser and loads the page.

// Navigate to a website
browser open https://example.com

// The agent sees the page content
// and can interact with elements

Clicking Elements

Click buttons, links, or any clickable element using natural language descriptions or selectors.

// Click by natural-language description
browser click "Sign in button"

// Click by ARIA role
browser click role=button name="Submit"

// Click by CSS selector
browser click selector="#login-button"

Filling Forms

Enter text into input fields before submitting forms.

// Fill input fields
browser type "Email address" with "user@example.com"
browser type "Password" with "your-password"

// Submit the form
browser click "Sign in button"

Extracting Data

Pull information from the page for further processing.

// Get page content
browser snapshot

// Extract specific elements
browser extract text from "h1"
browser extract all links

Authentication Workflows

Many useful automations require logging in first. Here are common patterns for handling authentication.

Basic Login Flow

The simplest approach: navigate to login, fill credentials, submit.

// Navigate to login page
browser open https://app.example.com/login

// Fill credentials
browser type "Email" with "user@example.com"
browser type "Password" with "secure-password"

// Submit login
browser click "Sign in"

// Wait for redirect
browser wait for navigation

// Verify login succeeded
browser snapshot

Session Persistence

To avoid logging in every time, persist the authenticated session.

// Login once
browser open https://app.example.com/login
browser type "Email" with "user@example.com"
browser type "Password" with "password"
browser click "Sign in"
browser wait for navigation

// Save session state
browser save session to /workspace/sessions/example-session.json

// Later, restore the session
browser open https://app.example.com/
browser load session from /workspace/sessions/example-session.json
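
Saved sessions eventually expire, so it can help to check a session file before relying on it. The sketch below is only an illustration: it assumes the saved file follows Playwright's storage-state layout (a top-level "cookies" list whose entries carry an "expires" Unix timestamp, with -1 for session cookies). The format OpenClaw actually writes is not documented in this guide, so treat the field names and the `session_is_fresh` helper as hypothetical.

```python
import json
import time
from pathlib import Path


def session_is_fresh(path, now=None, min_remaining=300):
    """Return True if the session file exists, holds cookies, and no
    persistent cookie expires within `min_remaining` seconds.

    Assumption: Playwright-style storage state,
    {"cookies": [{"expires": <unix timestamp or -1>, ...}, ...]}.
    """
    file = Path(path)
    if not file.is_file():
        return False
    try:
        state = json.loads(file.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return False

    cookies = state.get("cookies", []) if isinstance(state, dict) else []
    if not cookies:
        return False  # nothing to restore

    now = time.time() if now is None else now
    # Cookies with expires <= 0 are session cookies and have no expiry to check.
    expiries = [c["expires"] for c in cookies if c.get("expires", -1) > 0]
    return all(ts - now > min_remaining for ts in expiries)
```

If the check returns False, run the login flow again and save a new session instead of loading the stale file.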

Handling Two-Factor Authentication

For sites with 2FA, you have several options:

  • Manual completion: Log in once manually, then persist the session for reuse
  • Backup codes: Use one-time backup codes if the site provides them
  • App passwords: Some services offer app-specific passwords that bypass 2FA
  • TOTP integration: Generate TOTP codes if you control the 2FA setup
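
For the TOTP option, the code an authenticator app shows can be computed from the shared secret with the standard library alone. This is a minimal sketch of RFC 6238 with HMAC-SHA1 (the default most sites use); the `totp` function name is made up for illustration, and the demo secret is the published RFC test value, not a real credential.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret_b32, at=None, step=30, digits=6):
    """Return the RFC 6238 TOTP code (HMAC-SHA1) for a base32 secret."""
    # Normalise the secret: drop spaces, upper-case, restore stripped padding.
    cleaned = secret_b32.replace(" ", "").upper()
    cleaned += "=" * (-len(cleaned) % 8)
    key = base64.b32decode(cleaned)

    # The moving factor is the number of time steps since the Unix epoch.
    now = time.time() if at is None else at
    counter = int(now // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()

    # Dynamic truncation (RFC 4226): 31 bits starting at the offset
    # given by the low nibble of the last byte.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


# RFC 6238 test secret (ASCII "12345678901234567890"), base32-encoded.
demo_secret = base64.b32encode(b"12345678901234567890").decode("ascii")
print(totp(demo_secret, at=59, digits=8))  # RFC 6238 test vector: 94287082
```

Store the real secret the same way you store passwords (for example in an environment variable), since anyone holding it can generate valid codes.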

Advanced Techniques

Waiting for Dynamic Content

Modern web apps load content dynamically. Use waits to ensure elements are ready.

// Wait for an element to disappear
browser wait for "Loading spinner" to disappear

// Wait for specific text
browser wait for text "Dashboard"

// Wait for network idle
browser wait for networkidle
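
Conceptually, every wait is a poll-until-true loop with a deadline. The sketch below shows that idea in plain Python; it is not how OpenClaw or Playwright implement waits, and `wait_until` is a hypothetical helper you might use around your own checks.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.25):
    """Poll condition() until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

A bounded timeout matters more than the polling interval: without it, a page that never finishes loading stalls the whole automation.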

File Uploads

Automate file uploads by specifying the file path.

// Click upload button to open dialog
browser click "Upload file"

// Provide file path
browser upload "/workspace/files/document.pdf"

// Confirm upload
browser click "Confirm upload"

Handling Popups and Dialogs

Browsers often show confirmation dialogs. Handle them programmatically.

// Accept confirmation dialog
browser dialog accept

// Dismiss dialog
browser dialog dismiss

// Fill prompt dialog
browser dialog type "Input value"
browser dialog accept

Screenshots

Capture visual state for debugging or documentation.

// Full page screenshot
browser screenshot fullpage

// Element screenshot
browser screenshot element "Chart container"

// Viewport screenshot
browser screenshot

Linux and WSL2 Setup

Browser automation requires additional setup on Linux and Windows WSL2 environments.

Ubuntu/Debian Dependencies

Install the system libraries Chrome depends on:

# Update package list
sudo apt-get update

# Install Chrome dependencies
sudo apt-get install -y \
  libnss3 \
  libatk-bridge2.0-0 \
  libxss1 \
  libgtk-3-0 \
  libgbm-dev

WSL2 Display Setup

WSL2 needs a display server (X server) whenever the browser runs with a visible window:

# Install VcXsrv on Windows (X Server)
# Download from: https://sourceforge.net/projects/vcxsrv/

# In WSL2, set display
export DISPLAY=$(awk '/^nameserver/ {print $2; exit}' /etc/resolv.conf):0

# Or use WSLg (Windows 11)
export DISPLAY=:0
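
If you would rather compute the DISPLAY value from a script than from the shell, the same lookup is a few lines of Python. This is a sketch under the assumption that the first nameserver in /etc/resolv.conf is the Windows host, which is the same assumption the shell command makes; `display_from_resolv_conf` is a hypothetical helper name.

```python
def display_from_resolv_conf(text, screen=0):
    """Return "<first nameserver>:<screen>" from resolv.conf content, or None."""
    for line in text.splitlines():
        parts = line.split()
        # Only the first nameserver entry is used; comments and other keys are skipped.
        if len(parts) >= 2 and parts[0] == "nameserver":
            return f"{parts[1]}:{screen}"
    return None


if __name__ == "__main__":
    try:
        with open("/etc/resolv.conf", encoding="utf-8") as handle:
            print(display_from_resolv_conf(handle.read()))
    except OSError:
        print("could not read /etc/resolv.conf")
```

Stopping at the first match keeps DISPLAY on a single line even when several nameservers are listed.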

Common WSL2 Issues

Issue: Browser launches but shows blank page

Fix: Ensure DISPLAY is set correctly and firewall allows connections.

# Test display connection (xclock is part of the x11-apps package)
xclock

# If xclock does not appear, fix DISPLAY
export DISPLAY=$(ip route | grep default | awk '{print $3}'):0

Issue: Permission denied errors

Fix: Check file permissions and run with appropriate user.

# Fix Chrome sandbox permissions
sudo chown root:root /path/to/chrome-sandbox
sudo chmod 4755 /path/to/chrome-sandbox

# Or disable sandbox (less secure)
browser open https://example.com --no-sandbox

Issue: Chrome crashes immediately

Fix: Install missing dependencies or use Docker.

# Run browser in Docker
# Use playwright's docker image
docker run -it --rm \
  -v "$(pwd)":/workspace \
  -e DISPLAY \
  mcr.microsoft.com/playwright:v1.40.0 \
  npx playwright open https://example.com

Best Practices

Use Selectors Wisely

Prefer stable selectors over dynamic ones:

  • Good: ARIA roles, data-testid attributes, semantic HTML
  • Avoid: Auto-generated class names, XPath with indices, fragile CSS selectors
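
One way to apply this advice is to lint selectors before using them. The sketch below flags a few fragile patterns with regular expressions. The patterns are illustrative heuristics only (for example, the css- and sc- class prefixes that some CSS-in-JS libraries emit), and `fragility_warnings` is a hypothetical helper, not part of OpenClaw.

```python
import re

# Each entry pairs a pattern with the reason a match is considered fragile.
FRAGILE_PATTERNS = [
    (re.compile(r"\.(css|sc)-[A-Za-z0-9]+"), "auto-generated class name"),
    (re.compile(r"\[\d+\]"), "XPath positional index"),
    (re.compile(r":nth-child\(\d+\)"), "positional CSS selector"),
]


def fragility_warnings(selector):
    """Return the reasons (possibly none) why a selector looks fragile."""
    return [reason for pattern, reason in FRAGILE_PATTERNS if pattern.search(selector)]


print(fragility_warnings('[data-testid="login"]'))  # []
print(fragility_warnings("//div[3]/span[1]"))       # ['XPath positional index']
```

An empty result does not prove a selector is stable; it only means none of the listed patterns matched.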

Handle Failures Gracefully

Websites change. Build resilient automations:

// Check if element exists before clicking
if browser has "Accept cookies" then
  browser click "Accept cookies"
end

// Try multiple selectors
browser click "Submit" or "Save" or "Continue"
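
When an action fails for a transient reason (a slow page, a late-rendering button), retrying with an increasing delay is often enough. Here is a minimal sketch of retry with exponential backoff. `retry` is a hypothetical helper, and `action` stands for any callable that performs one browser step and raises on failure.

```python
import time


def retry(action, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call action() up to `attempts` times, doubling the delay after each failure.

    Returns the first successful result; re-raises the last error if all attempts fail.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return action()
        except Exception as error:
            last_error = error
            # No point sleeping after the final attempt.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

The `sleep` parameter exists so the delay can be replaced in tests; in normal use leave it at its default.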

Respect Rate Limits

Do not overwhelm websites with requests:

  • Add delays between actions
  • Cache session data when possible
  • Use APIs instead of scraping when available
  • Follow robots.txt guidelines
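
Checking robots.txt can be automated with Python's standard `urllib.robotparser` module. The sketch below parses robots.txt content you have already fetched and reports whether a URL is allowed, plus any Crawl-delay the site requests. `make_robots_checker` and the sample rules are hypothetical.

```python
from urllib import robotparser


def make_robots_checker(robots_txt, user_agent="*"):
    """Parse robots.txt text; return (allowed(url) function, crawl delay or None)."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def allowed(url):
        return parser.can_fetch(user_agent, url)

    return allowed, parser.crawl_delay(user_agent)


sample_rules = "User-agent: *\nDisallow: /private/\nCrawl-delay: 5\n"
allowed, delay = make_robots_checker(sample_rules)
print(allowed("https://example.com/products"))      # True
print(allowed("https://example.com/private/data"))  # False
print(delay)                                        # 5
```

When a crawl delay is present, it is a reasonable minimum pause to insert between page loads on that site.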

Secure Credential Handling

Never hardcode credentials in automations:

// Use environment variables
browser type "Password" with env.LOGIN_PASSWORD

// Or prompt user
ask "Enter your password:" → password
browser type "Password" with password
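
If you prepare credentials in a script before handing them to an automation, fail early when a variable is missing rather than typing an empty password into a form. A minimal sketch, where `require_secret` is a hypothetical helper and LOGIN_PASSWORD is the variable name used in the example above:

```python
import os


def require_secret(name, environ=os.environ):
    """Return the value of environment variable `name`, or raise if unset or empty."""
    value = environ.get(name, "")
    if not value:
        # The error names the variable but never echoes a secret value.
        raise RuntimeError(f"environment variable {name} is not set")
    return value
```

Call it once at startup, for example `password = require_secret("LOGIN_PASSWORD")`, so a misconfigured environment is reported before any browser session opens.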

Integration with Other Tools

Combining with Subagents

Offload browser tasks to subagents for parallel execution:

// Main agent delegates browser task
spawn subagent "Check competitor pricing" with task:
  - Open competitor-website.com
  - Search for "product name"
  - Extract price from results
  - Return price data
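
The pattern behind parallel delegation is fan-out and fan-in: start several independent tasks, then collect each result or failure. The sketch below illustrates it with threads as a stand-in for subagents. It does not spawn real OpenClaw subagents, and `run_in_parallel` and the task callable are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(task, inputs, max_workers=4):
    """Run task(item) concurrently for each input.

    Returns {item: result}, with the raised exception stored in place of
    the result for any item whose task failed.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Fan out: submit every task before waiting on any of them.
        futures = {item: pool.submit(task, item) for item in inputs}
        # Fan in: collect results, keeping failures isolated per item.
        for item, future in futures.items():
            try:
                results[item] = future.result()
            except Exception as error:
                results[item] = error
    return results
```

Keeping failures per item means one unreachable site does not discard the prices already gathered from the others.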

Browser + MCP Servers

Combine browser automation with MCP for powerful workflows:

// Use MCP to query database
mcp query "SELECT * FROM products WHERE stock < 10"

// Use browser to check supplier website
for each product in low-stock:
  browser open supplier.com/search
  browser type "Search" with product.name
  browser click "Search"
  extract price and availability

FAQ

Can OpenClaw automate any website?

OpenClaw can automate most websites, but some have anti-automation measures. Sites with CAPTCHA, advanced bot detection, or strict rate limiting may require additional handling or be unsuitable for automation.

Is browser automation secure?

Yes, when used responsibly. OpenClaw runs browsers in isolated contexts. Never automate banking or sensitive sites without understanding the risks. Use dedicated browser profiles and avoid storing credentials in plain text.

Why does browser automation fail on WSL2?

Older WSL2 setups have no display server by default. Install VcXsrv or use WSLg (Windows 11), set the DISPLAY environment variable, and make sure the Chrome dependencies are installed. The Linux and WSL2 Setup section above covers specific fixes.

Can I use my existing browser profile?

Yes, but with caution. Using your main profile risks data corruption. Create a dedicated automation profile instead. The browser tool supports specifying custom profile directories.

How do I handle two-factor authentication?

For 2FA, either complete authentication manually once and persist the session, use backup codes, or implement TOTP generation if you control the 2FA setup. Some sites allow app-specific passwords that bypass 2FA.

Summary

Browser automation extends OpenClaw beyond APIs and files into the full web. Master the basics of navigation, clicking, and form filling. Handle authentication with session persistence. Set up properly on Linux and WSL2. Follow best practices for resilient, respectful automation.

Combine browser automation with subagents for parallel processing, MCP servers for data integration, and memory files for persistent configuration to build sophisticated web workflows.
