Automate anything in the browser using AI 🤯

Nikhil Adiga


browser-use + Chromium + Playwright = mind blown!

“A popular joke in the programming world asks: why spend 5 minutes doing something when you can spend 5 days automating it? Now you can cut that from 5 days to 5 minutes 👀! Imagine the possibilities.”

In this article I show you how to get started with browser-use through a simple example: automating the process of sending an email from your Gmail account 🚀

What is browser-use? 🤔

browser-use describes itself as the easiest way to connect your AI agents with the browser. But what does that really mean? In essence, browser-use gives your AI agents a streamlined interface for interacting with web pages the way a human user would. This removes the need for complex, low-level browser automation scripts. We all know the pain of writing web scraping scripts that depend on very specific class names and IDs to fetch data. And if the UI of the website changes, your scraping script stops working. Why go through all that difficulty when you can have an AI do it for you?
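To make that pain concrete, here is a minimal sketch of a scraper that depends on one exact class name. It uses only Python's standard library; the HTML snippets and the "price" class are made up for illustration and do not come from any real site.

```python
from html.parser import HTMLParser


class TextByClass(HTMLParser):
    """Collects the text of every element whose class attribute matches exactly."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0  # greater than 0 while we are inside a matching element
        self.found = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1  # nested tag inside a match
        elif dict(attrs).get("class") == self.class_name:
            self.depth = 1
            self.found.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.found[-1] += data


def scrape(html, class_name="price"):
    """Return the text of all elements with the given class."""
    parser = TextByClass(class_name)
    parser.feed(html)
    return parser.found


# Hypothetical page before and after a redesign renamed the class
old_page = '<div><span class="price">$499</span></div>'
new_page = '<div><span class="product-price">$499</span></div>'

print(scrape(old_page))  # the selector matches
print(scrape(new_page))  # the same selector now finds nothing
```

The page still shows the same price to a human, but the script silently returns an empty list once the class is renamed. An agent that reads the page like a user does not depend on that one selector.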

If this sounds exciting to you, let’s get started! 🏃‍♂️

Installing browser-use locally 🧑‍💻

The browser-use documentation recommends installing a tool called uv, an extremely fast package manager for Python written in Rust. It also makes it easier to set up a virtual environment, which is recommended when working on Python projects.

To install it on macOS or Linux, run the following command, taken directly from the uv docs:

curl -LsSf https://astral.sh/uv/install.sh | sh

This will install uv on your machine and add the binary to your PATH. (You may have to restart your current terminal session.)

💡 browser-use requires Python 3.11 or above

The next step is to create a virtual environment using the uv command:

uv venv --python 3.11

This will create a folder called .venv that contains the necessary files to isolate your project’s Python dependencies.

Now that we have created a virtual environment, we need to activate it. On macOS or Linux, we can do this by running this command:

source .venv/bin/activate

This command essentially sets up your shell environment to use the Python interpreter and packages within the specified virtual environment, isolating your project’s dependencies from the system’s Python installation.
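If you want to double-check that the activation worked, a small sanity check like the one below can help. It is a sketch of my own, not something from the browser-use docs: it compares sys.prefix with sys.base_prefix (they differ inside a virtual environment) and checks the 3.11 minimum mentioned above. The function names are just illustrative.

```python
import sys


def meets_minimum(version, minimum=(3, 11)):
    """Return True when (major, minor) of `version` is at least `minimum`."""
    return tuple(version[:2]) >= tuple(minimum)


def in_virtualenv(prefix, base_prefix):
    """Inside a virtual environment, sys.prefix differs from sys.base_prefix."""
    return prefix != base_prefix


print("Python >= 3.11:", meets_minimum(sys.version_info))
print("Inside a virtual environment:", in_virtualenv(sys.prefix, sys.base_prefix))
```

Save it to any file and run it with python after activating the environment. Both lines should report True before you continue.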

Time to install browser-use. The official docs suggest using this command to install the package directly to your virtual env.

uv pip install browser-use

And the final piece of the puzzle is Playwright. Playwright is an open source library for browser testing and automation developed by Microsoft, and it is an alternative to Puppeteer. Like Puppeteer, it can drive Chromium (it also supports Firefox and WebKit), and it supports headless mode. Use this command to download the browsers Playwright needs.

playwright install

That’s it! We can start writing code now 🤓 In this example, I demonstrate how we can use browser-use to automate sending an email using Gmail. Create a file called app.py and copy this code.

🚨 Only run this example if you’re okay with passing your account credentials to OpenAI in the prompt. You have been warned ⚠️

import asyncio

from browser_use import Agent, Browser
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

# Load OPENAI_API_KEY from the .env file before the LLM client is created
load_dotenv()

task2="""
### Prompt for sending an email to my manager asking him to approve my leave

**Objective:**
Open gmail, compose a new email to manager@gmail.com with the subject "Approve my leave" and the body "I'm getting married, approve my leave now!".
Use these credentials to log in:
- Email: youremail@gmail.com
- Password: yourpassword
- Choose the Tap yes on Device option.

**Important:**
- The email should be sent successfully.
- After filling the email address of the recipient, simulate an enter button click.
- The email should have the subject "Approve my leave" and the body "I'm getting married, approve my leave now!".
---

**Important:** Ensure efficiency and accuracy throughout the process."""

browser = Browser()

agent = Agent(
    task=task2,
    llm=ChatOpenAI(model="gpt-4o"),
    browser=browser,
)

async def main():
    await agent.run()
    # Keep the browser window open until you have had a look at the result
    input("Press Enter to close the browser...")
    await browser.close()

if __name__ == '__main__':
    asyncio.run(main())
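If you would rather not hardcode your login details in app.py, one option is to read them from environment variables and build the prompt at runtime. The sketch below is my own variation, not part of browser-use: the variable names GMAIL_ADDRESS and GMAIL_PASSWORD and the build_task helper are hypothetical, and the template is a shortened version of the prompt above.

```python
import os

# Hypothetical variable names for this sketch; pick whatever names you like.
EMAIL_VAR = "GMAIL_ADDRESS"
PASSWORD_VAR = "GMAIL_PASSWORD"

TASK_TEMPLATE = """
**Objective:**
Open gmail, compose a new email to {recipient} with the subject "{subject}" and the body "{body}".
Use these credentials to log in:
- Email: {email}
- Password: {password}
- Choose the Tap yes on Device option.

**Important:**
- The email should be sent successfully.
- After filling the email address of the recipient, simulate an enter button click.
"""


def build_task(recipient, subject, body, env=None):
    """Fill the prompt template, reading the login details from `env`."""
    env = os.environ if env is None else env
    email = env.get(EMAIL_VAR)
    password = env.get(PASSWORD_VAR)
    if not email or not password:
        raise RuntimeError(f"Set {EMAIL_VAR} and {PASSWORD_VAR} before running")
    return TASK_TEMPLATE.format(
        recipient=recipient,
        subject=subject,
        body=body,
        email=email,
        password=password,
    )
```

In app.py you would then pass task=build_task("manager@gmail.com", "Approve my leave", "I'm getting married, approve my leave now!") to the Agent. This only keeps the password out of your source file. It still ends up in the prompt that is sent to OpenAI, so the warning above applies unchanged.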

You also need to create a .env file, or set an environment variable on your system, for OPENAI_API_KEY. A sample .env file looks like this.

OPENAI_API_KEY=yoursecretopenaiapikey
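A missing key usually only shows up as an error once the agent makes its first request, so it can be handy to fail early instead. Here is a tiny sketch of such a check. The missing_env_vars helper is a name I made up, and in app.py you would call it after load_dotenv() so that values from the .env file are already loaded.

```python
import os


def missing_env_vars(required=("OPENAI_API_KEY",), env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set")
```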

The app.py script is everything you need to automate sending the email. One manual step remains: approving the sign-in on my device, because my test Google account had 2FA enabled. You can run the program with this command and watch the AI in action. Seeing it work really makes you appreciate the complexity of the code behind it.

python app.py

This was a very simple example of what this library can achieve, but the possibilities are limitless! 💡 You can ask it to do things like

  1. Prepare a grocery list and order items from popular websites 🛒
  2. Write an essay in a Google Doc ✍️
  3. Fetch data about your favorite celebrities and what they have been up to recently 🕵
  4. Fetch specifications of a product from a very specific website (I tried fetching technical specifications of smartphones by model from the GSMArena website and the results were amazing) 🤩

The GitHub repository of browser-use contains a lot of other amazing examples. Do check it out. There is also an example where they bypass a CAPTCHA using this!

Conclusion

While this feels like web scraping on steroids 💪, the library is still relatively new and has yet to prove itself. Tools like Puppeteer and traditional web scraping have stood the test of time, and browser-use needs to do the same. The way things are going, I’m sure it will!
