A Step-by-Step Guide to Building a Complete AI Web Agent Across Multiple Domains Using Gemini and Notte


Imagine having a tireless assistant that can surf the web for you. This assistant could research products, compare prices, gather the latest news, find job openings, and even keep an eye on your competitors, all without your direct supervision. This isn't science fiction; it's what's possible with AI web agents. This guide will walk you through building your very own, using a powerful duo of tools: Notte and Gemini.

We will break down every piece of the puzzle, explaining the code in a straightforward way. You do not need to be a programming guru to follow along. By the end, you will have a clear picture of how to assemble an intelligent agent that can navigate and interact with the digital world, performing complex tasks on your behalf.

### Your Agent's Toolkit

Before we start building, let's get to know our main tools. Every great project starts with the right equipment, and ours is no different. We will be using three key components to bring our AI agent to life.

#### Notte: The Hands and Feet

Think of Notte as the physical body of our agent. It provides the means for our agent to interact with the web. Notte is a framework that specializes in browser automation, meaning it can control a web browser just like a person would. It can open websites, type text into search bars, click links, and read the content on a page. It's the component that handles all the active “doing” on the internet.

#### Gemini: The Brains of the Operation

If Notte is the body, Gemini is the brain. Gemini is a powerful AI model from Google that gives our agent its intelligence. When Notte shows our agent a webpage, Gemini is what analyzes the content and makes decisions. It understands natural language, so we can give it instructions like, “Find the price of this laptop,” and it can figure out the steps needed to accomplish that task. It provides the reasoning and decision-making that separates a smart agent from a simple script.

#### Pydantic: The Organizer

When our agent gathers information, we want it to be neat and tidy. We do not want a messy pile of text. Pydantic is a tool that helps us create strict, organized structures for our data. It's like giving our agent a set of filing folders labeled “Product Name,” “Price,” and “Availability.” When the agent finds a piece of information, it places it in the correct folder, ensuring the final output is clean, predictable, and easy for us to use.

### Setting Up Your Workshop

Now it is time to get our hands dirty. The first phase involves setting up our programming environment. This means installing the necessary libraries and configuring our access to the Gemini AI.

#### Installing the Libraries

Our first action is to install all the required Python packages. These are collections of pre-written code that will save us a massive amount of time. We do this with two terminal commands: the first fetches the Python packages, and the second downloads the browser.

```bash
pip install notte python-dotenv pydantic google-generativeai requests beautifulsoup4
patchright install --with-deps chromium
```

The first command installs Notte for the agent framework, Pydantic for data structuring, and the Google library for accessing Gemini. The second installs the Chromium web browser along with its system dependencies; Notte will control this browser behind the scenes, without showing a window.
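Before moving on, it can help to confirm that the packages are importable. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the module list assumes the usual import names for the packages installed above (for example, `beautifulsoup4` is imported as `bs4`).

```python
import importlib.util

# Import names for the packages installed above. The import name can differ
# from the pip name (for example, beautifulsoup4 is imported as bs4).
REQUIRED_MODULES = ["notte", "dotenv", "pydantic", "google.generativeai", "requests", "bs4"]


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported here."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises if a parent package (e.g. "google") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If the script reports missing modules, rerunning the `pip install` command in the same environment is usually the first thing to try.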

#### Configuring the Gemini API Key

To use Gemini's brainpower, our program needs permission. We get this with an API key, which is a unique secret code that identifies our project to Google's services. You will need to obtain your own free key from Google's AI platform.

Once you have your key, we write a small script to load it into our environment so the program can use it.

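```python
import os
import google.generativeai as genai
from dotenv import load_dotenv

# Load variables from a local .env file, if one exists.
load_dotenv()

# Prefer a key from the environment; otherwise paste your own key below.
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY") or "USE YOUR OWN API KEY HERE"
os.environ['GEMINI_API_KEY'] = GEMINI_API_KEY
genai.configure(api_key=GEMINI_API_KEY)
```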

This snippet imports the necessary tools and calls `load_dotenv()` so that any variables in a local `.env` file are loaded. It then stores your API key in a variable, exports it as the `GEMINI_API_KEY` environment variable, and configures the Google Generative AI library to use that key for all future requests. This step is like giving our agent its official credentials to access its intelligence source.
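A forgotten or placeholder key tends to surface later as a confusing authentication error. As a minimal sketch, a small guard can fail early with a readable message instead. The helper name `require_api_key` is hypothetical and not part of any library; it only inspects the environment.

```python
import os

PLACEHOLDER = "USE YOUR OWN API KEY HERE"  # the placeholder string used above


def require_api_key(env=None, name="GEMINI_API_KEY"):
    """Return the API key stored under `name`, or raise a readable error."""
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            f"{name} is not set. Add it to your environment or .env file "
            "before starting the agent."
        )
    return key
```

Calling `require_api_key()` once at startup, before any agent is created, keeps the failure close to its cause.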

### Designing the Agent's Blueprints

With the setup complete, we can start designing our agent. A good agent, like a good assistant, needs to be organized. We will start by defining the structure of the information it will collect using Pydantic.

#### Data Models for Clarity

We will create several "models" that act as templates for our data. Each model defines a specific type of information we want to collect, making sure that every piece of data has a proper place. This upfront organization is what makes our agent's findings reliable and useful.

```python
from typing import List, Optional
from pydantic import BaseModel

class ProductInfo(BaseModel):
    name: str
    price: str
    rating: Optional[float] = None  # not every product page shows a rating
    availability: str
    description: str

class NewsArticle(BaseModel):
    title: str
    summary: str
    url: str
    date: str
    source: str

class SocialMediaPost(BaseModel):
    content: str
    author: str
    likes: int
    timestamp: str
    platform: str

class SearchResult(BaseModel):
    query: str
    results: List[dict]
    total_found: int
```

Each class here is a blueprint. The `ProductInfo` class tells the agent that for every product it researches, it must find a name, price, availability, and description, plus a rating when one is available. The `NewsArticle` model ensures every article has a title, summary, URL, date, and source. This level of structure is what elevates our project from a simple scraper to an intelligent data-gathering tool.
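To see what this structure buys us, here is a small sketch of validation in action. It repeats `ProductInfo` so the block runs on its own, with `rating` defaulting to `None` (an assumption made here so that records without a rating still validate). The `raw` dictionary is hypothetical illustration data, not real product information.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class ProductInfo(BaseModel):
    name: str
    price: str
    rating: Optional[float] = None  # may be missing on some product pages
    availability: str
    description: str


# Hypothetical raw data, shaped like what an extraction step might return.
raw = {
    "name": "Example Earbuds",
    "price": "$49.99",
    "availability": "In stock",
    "description": "Illustrative record, not real product data.",
}

product = ProductInfo(**raw)
print(product.name, product.price, product.rating)

# Leaving out a required field is rejected instead of silently accepted.
try:
    ProductInfo(name="Incomplete record")
    rejected = False
except ValidationError:
    rejected = True
print("Incomplete record rejected:", rejected)
```

The complete record validates with `rating` left as `None`, while the incomplete one raises a `ValidationError`, so malformed data is caught at the boundary rather than deep inside later code.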

### Assembling the Core Agent

Now we will build the central controller for our agent. We will create a Python class that will manage the browser, connect to the AI, and house all the specific skills we will teach it. Think of this class as the main chassis to which we will attach all the tools and capabilities.

#### The AdvancedNotteAgent Class

This class will serve as the foundation for all our agent's operations. It will handle the technical startup and shutdown procedures for the web browser session, making it easy for us to deploy our agent for any given task.

```python
import time  # used by later skills to pause between requests

import notte

class AdvancedNotteAgent:
    def __init__(self, headless=True, max_steps=20):
        self.headless = headless
        self.max_steps = max_steps
        self.session = None
        self.agent = None

    def __enter__(self):
        # Start the browser session, then attach a Gemini-backed agent to it.
        self.session = notte.Session(headless=self.headless)
        self.session.__enter__()
        self.agent = notte.Agent(
            session=self.session,
            reasoning_model='gemini/gemini-2.5-flash',
            max_steps=self.max_steps
        )
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Close the browser session, even if a task raised an exception.
        if self.session:
            self.session.__exit__(exc_type, exc_val, exc_tb)
```

The `__init__` function sets up the initial configuration. The `headless=True` parameter means the browser will run invisibly in the background. The `__enter__` and `__exit__` functions are special methods that handle the setup and cleanup of the Notte session, allowing us to manage our agent cleanly. Inside `__enter__`, we start the browser session and initialize the Notte Agent, telling it to use the Gemini model for its reasoning.
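The value of the enter/exit pair is that cleanup runs even when a task fails. The sketch below illustrates this with a hypothetical `FakeSession` that only records whether it is open, so it runs without a browser. `ManagedAgent` is likewise a stand-in that mirrors the structure of the class above, not the class itself.

```python
class FakeSession:
    """Hypothetical stand-in for a browser session; it only records state."""

    def __init__(self):
        self.is_open = False

    def __enter__(self):
        self.is_open = True
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.is_open = False
        return False  # do not swallow exceptions


class ManagedAgent:
    """Mirrors the enter/exit structure of AdvancedNotteAgent with a fake session."""

    def __init__(self):
        self.session = None

    def __enter__(self):
        self.session = FakeSession()
        self.session.__enter__()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.session:
            self.session.__exit__(exc_type, exc_val, exc_tb)
        return False


def run_failing_task():
    """Raise inside the with block and report whether cleanup still ran."""
    agent = ManagedAgent()
    try:
        with agent:
            assert agent.session.is_open
            raise ValueError("simulated task failure")
    except ValueError:
        pass
    return agent.session.is_open


print("Session still open after failure:", run_failing_task())
```

The final line reports `False`: the session was closed even though the task raised, which is the behavior we want from a real browser session.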

### Teaching the Agent New Skills

With the agent's body assembled, it is time to teach it how to perform specific tasks. We will add functions, called methods, to our AdvancedNotteAgent class. Each method will represent a distinct skill, from researching products to scanning for jobs.

#### Skill 1: Product Research

This skill allows the agent to visit an e-commerce website, search for a product, and extract its details according to our ProductInfo model.

```python
def research_product(self, product_name: str, website: str = "amazon.com"):
    """Research a product and extract structured information"""
    task = f"Go to {website}, search for '{product_name}', click on the first result, and extract the product information."
    response = self.agent.run(
        task=task,
        response_format=ProductInfo,
        url=f"https://{website}"
    )
    return response.answer
```

Here, we define a clear, natural language task for the agent. When we call self.agent.run(), we pass this instruction along with the ProductInfo model as the desired response_format. Notte and Gemini work together to navigate to the website, perform the search, and return the data neatly structured.

#### Skill 2: News Aggregation

Next, we teach the agent to act as a news reporter. This skill will have it scan Google News for recent articles on a topic of our choice.

```python
def news_aggregator(self, topic: str, num_articles: int = 3):
    """Aggregate news articles on a specific topic"""
    task = f"Search for recent news about '{topic}', find {num_articles} relevant articles, and extract their information."
    response = self.agent.run(
        task=task,
        url="https://news.google.com",
        response_format=List[NewsArticle]
    )
    return response.answer
```

The pattern is similar. We provide a clear task and a starting URL. This time, the response_format is List[NewsArticle], which tells the agent we expect to receive a list containing multiple news articles, each structured according to our NewsArticle blueprint.

#### Skill 3: Social Media Monitoring

This skill allows our agent to monitor social platforms for conversations around a specific hashtag.

```python
def social_media_monitor(self, hashtag: str, platform: str = "twitter"):
    """Monitor social media for specific hashtags"""
    if platform.lower() == "twitter":
        url = "https://twitter.com"
    elif platform.lower() == "reddit":
        url = "https://reddit.com"
    else:
        url = f"https://{platform}.com"

    task = f"Go to {platform}, search for posts with hashtag '{hashtag}', and extract information from the top posts."
    response = self.agent.run(
        task=task,
        url=url,
        response_format=List[SocialMediaPost]
    )
    return response.answer
```

This method adds a bit of logic to select the correct URL based on the chosen platform. It then dispatches the agent with instructions to find posts and format them using our `SocialMediaPost` model.
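As more platforms are added, the `if`/`elif` chain can grow unwieldy. One possible alternative, sketched here under the assumption that a simple lookup table is enough, keeps the known platforms in a dictionary and falls back to the same `<name>.com` guess. The names `PLATFORM_URLS` and `platform_url` are hypothetical.

```python
# Known platforms and their home pages, taken from the method above.
PLATFORM_URLS = {
    "twitter": "https://twitter.com",
    "reddit": "https://reddit.com",
}


def platform_url(platform: str) -> str:
    """Return the start URL for a platform name, falling back to <name>.com."""
    key = platform.strip().lower()
    if key in PLATFORM_URLS:
        return PLATFORM_URLS[key]
    # Avoid producing "example.com.com" when a full domain is passed in.
    domain = key if "." in key else f"{key}.com"
    return f"https://{domain}"


print(platform_url("Reddit"))
print(platform_url("linkedin"))
print(platform_url("news.ycombinator.com"))
```

Supporting a new platform then means adding one dictionary entry rather than another branch.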

#### Skill 4: Competitive Analysis

Here, we give our agent a business-oriented task: to investigate competitors. The agent will visit a list of websites and attempt to find pricing and feature information.

```python
def competitive_analysis(self, company: str, competitors: List[str]):
    """Perform competitive analysis by gathering pricing and feature information"""
    # Requires `import time` at the top of the module.
    results = {}
    for competitor in [company] + competitors:
        # Accept either a bare name ("acme") or a full domain ("acme.com").
        domain = competitor if "." in competitor else f"{competitor}.com"
        task = f"Go to {competitor}'s website, find their pricing page or main product features, and summarize the findings."
        try:
            response = self.agent.run(
                task=task,
                url=f"https://{domain}"
            )
            results[competitor] = response.answer
            time.sleep(2)  # brief pause between sites
        except Exception as e:
            results[competitor] = f"Error: {str(e)}"
    return results
```

This skill introduces a loop. The agent visits the company's own site and then each competitor's site, one by one, pausing briefly between requests. It also includes error handling with a `try`/`except` block, so if one website fails to load or presents a problem, the error is recorded for that site and the rest of the process continues.
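The same loop-and-catch pattern can be factored into a reusable helper. The sketch below adds one optional retry per item, which is an extension and not part of the method above. `collect_results` and `fake_lookup` are hypothetical names, and the lookup function simulates failures rather than visiting real sites.

```python
import time


def collect_results(items, func, retries=1, delay=0.0):
    """Call func(item) for each item, retrying failures, and never raise."""
    results = {}
    for item in items:
        for attempt in range(retries + 1):
            try:
                results[item] = func(item)
                break
            except Exception as exc:
                # Keep the last error so the caller can see what went wrong.
                results[item] = f"Error: {exc}"
                if attempt < retries:
                    time.sleep(delay)
    return results


# Hypothetical task: fails on the first call for "flaky", always for "down".
calls = {}


def fake_lookup(site):
    calls[site] = calls.get(site, 0) + 1
    if site == "down" or (site == "flaky" and calls[site] == 1):
        raise ConnectionError(f"could not load {site}")
    return f"summary for {site}"


print(collect_results(["ok", "flaky", "down"], fake_lookup, retries=1))
```

In this run the flaky item succeeds on its second attempt, while the item that always fails ends up with an error string, and the loop itself never aborts.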

#### Skill 5: Job Market Scanning

Let's teach our agent to help with a job search. This method will scan a job board like Indeed for positions matching a specific title and location.

```python
def job_market_scanner(self, job_title: str, location: str = "remote"):
    """Scan job market for opportunities"""
    task = f"Search for '{job_title}' jobs in '{location}', extract job titles, companies, and locations for the top results."
    response = self.agent.run(
        task=task,
        url="https://indeed.com"
    )
    return response.answer
```

Following the established pattern, we define a task and a starting URL, then let the agent handle the search and extraction.

#### Skill 6: Price Comparison

This skill is a variation of the product research task. Instead of getting all details for one product, it focuses on finding just the price of a product across several different websites.

```python
def price_comparison(self, product: str, websites: List[str]):
    """Compare prices across multiple websites"""
    # Requires `import time` at the top of the module.
    price_data = {}
    for site in websites:
        task = f"Search for '{product}' on this website and find the price."
        try:
            response = self.agent.run(
                task=task,
                url=f"https://{site}"
            )
            price_data[site] = response.answer
            time.sleep(1)  # brief pause between sites
        except Exception as e:
            price_data[site] = f"Error: {str(e)}"
    return price_data
```

This is another great example of using a loop to automate a repetitive task. The agent visits each website in the list, performs the same focused search for a price, and stores the results.
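The returned dictionary maps each site to whatever text the agent produced, so comparing prices still requires turning that text into numbers. The following is a rough sketch that assumes the answers are strings, that the first number in each answer is the price, and that prices use US-style formatting such as `$1,299.00`. The function names and sample data are hypothetical.

```python
import re
from decimal import Decimal
from typing import Dict, Optional, Tuple


def parse_price(text: str) -> Optional[Decimal]:
    """Extract the first number from text such as '$1,299.00', or None."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return Decimal(match.group(0).replace(",", ""))


def cheapest(price_data: Dict[str, str]) -> Optional[Tuple[str, Decimal]]:
    """Return (site, price) for the lowest parseable price, or None."""
    parsed = {}
    for site, answer in price_data.items():
        if answer.startswith("Error:"):
            continue  # skip sites where the lookup failed
        price = parse_price(answer)
        if price is not None:
            parsed[site] = price
    if not parsed:
        return None
    site = min(parsed, key=parsed.get)
    return site, parsed[site]


# Hypothetical answers, for illustration only.
sample = {
    "site-a.example": "The price is $59.99",
    "site-b.example": "$1,049.00",
    "site-c.example": "Error: page did not load",
}
print(cheapest(sample))
```

Using `Decimal` rather than `float` avoids rounding surprises when comparing currency amounts. Answers that mention several numbers or use other currency formats would need more careful parsing than this sketch provides.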

#### Skill 7: Content Research

Finally, we will teach the agent to be a content strategist. It can research trending topics on platforms like Medium for blog posts or YouTube for videos.

```python
def content_research(self, topic: str, content_type: str = "blog"):
    """Research content ideas and trending topics"""
    if content_type == "blog":
        url = "https://medium.com"
        task = f"Search for '{topic}' articles, analyze trending content, and identify key themes."
    elif content_type == "video":
        url = "https://youtube.com"
        task = f"Search for '{topic}' videos, analyze view counts, titles, and identify trending formats."
    else:
        url = "https://google.com"
        task = f"Search for '{topic}' content across the web and analyze trends."

    response = self.agent.run(task=task, url=url)
    return {"topic": topic, "insights": response.answer, "platform": content_type}
```

This method uses conditional logic to change the agent's task and destination based on the type of content we are interested in. It is a versatile skill for anyone involved in content creation or marketing.

### Demonstrating the Agent's Abilities

We have built an agent and equipped it with a wide range of skills. Now it is time to see it in action. The code includes several demonstration functions, each designed to showcase a specific capability.

#### E-commerce and Price Comparison Demo

This demo puts the agent to work as a personal shopper. It first does in-depth research on a product and then performs a price comparison for that same product across multiple retailers.

```python
def demo_ecommerce_research():
    """Demo: E-commerce product research and comparison"""
    print("E-commerce Research Demo")
    with AdvancedNotteAgent(headless=True) as agent:
        product = agent.research_product("wireless earbuds", "amazon.com")
        print(f"Product Research Results: {product}")

        websites = ["amazon.com", "ebay.com", "walmart.com"]
        prices = agent.price_comparison("wireless earbuds", websites)
        print(f"Price Comparison: {prices}")
```

#### News Intelligence Demo

Here, the agent acts as an intelligence analyst, quickly gathering and summarizing the latest news on a subject.

```python
def demo_news_intelligence():
    """Demo: News aggregation and analysis"""
    print("News Intelligence Demo")
    with AdvancedNotteAgent() as agent:
        articles = agent.news_aggregator("artificial intelligence", 3)
        for i, article in enumerate(articles, 1):
            print(f"Article {i}: {article.title} - {article.source}")
```

#### Social Listening Demo

This demonstration shows the agent monitoring the social pulse of the internet, finding relevant conversations on Reddit.

```python
def demo_social_listening():
    """Demo: Social media monitoring"""
    print("Social Media Listening Demo")
    with AdvancedNotteAgent() as agent:
        posts = agent.social_media_monitor("#AI", "reddit")
        for i, post in enumerate(posts, 1):
            print(f"Post {i}: {post.author} said '{post.content[:100]}...'")
```

These demos provide a practical look at what our agent can do. Run them with `headless=False` and you can watch as the agent autonomously opens a browser, navigates pages, and extracts information, turning abstract code into a tangible, working assistant.

### Advanced Orchestration: The Workflow Manager

Running single tasks is useful, but the true power of automation is unlocked when we can chain tasks together into a complete workflow. To do this, we introduce a WorkflowManager class, which acts as a conductor for our agent.

#### Creating a Coordinated Mission

The WorkflowManager is a simple but powerful concept. It holds a list of tasks and executes them in sequence. This allows us to combine our agent's individual skills to perform much more complex, multi-step operations without needing to manually run each part.

```python
class WorkflowManager:
    def __init__(self):
        self.agents = []   # queued tasks, in the order they were added
        self.results = {}  # task name -> result or error message

    def add_agent_task(self, name: str, task_func, *args, **kwargs):
        """Add an agent task to the workflow"""
        self.agents.append({
            'name': name,
            'func': task_func,
            'args': args,
            'kwargs': kwargs
        })

    def execute_workflow(self, parallel=False):
        """Execute all agent tasks in the workflow"""
        # Note: `parallel` is accepted but not used; tasks always run in sequence.
        for agent_task in self.agents:
            name = agent_task['name']
            func = agent_task['func']
            try:
                result = func(*agent_task['args'], **agent_task['kwargs'])
                self.results[name] = result
            except Exception as e:
                self.results[name] = f"Error: {str(e)}"
        return self.results
```
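The `execute_workflow` method accepts a `parallel` flag but runs everything in sequence. As a sketch of how such a flag could be honored, the standalone function below uses the standard library's `ThreadPoolExecutor`. The name `run_tasks` and the stand-in tasks are hypothetical. Tasks that share one browser session are presumably not safe to run concurrently, so a real parallel workflow would likely need a separate agent per task.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tasks(tasks, parallel=False, max_workers=4):
    """Run task dicts ({'name', 'func', 'args', 'kwargs'}) and collect results."""

    def run_one(task):
        try:
            return task["func"](*task["args"], **task["kwargs"])
        except Exception as exc:
            return f"Error: {exc}"

    if not parallel:
        return {task["name"]: run_one(task) for task in tasks}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so names and results line up.
        outcomes = list(pool.map(run_one, tasks))
    return {task["name"]: outcome for task, outcome in zip(tasks, outcomes)}


# Hypothetical stand-in tasks that do not touch a browser.
tasks = [
    {"name": "square", "func": pow, "args": (7, 2), "kwargs": {}},
    {"name": "upper", "func": str.upper, "args": ("notte",), "kwargs": {}},
    {"name": "broken", "func": int, "args": ("not a number",), "kwargs": {}},
]
print(run_tasks(tasks, parallel=True))
```

Both modes return the same dictionary shape, with failures captured as error strings, so callers do not need to care which mode was used.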

#### A Full Market Research Workflow

With the `WorkflowManager`, we can construct a complete market research pipeline. This example function combines content research on the product category, competitor analysis, and social media monitoring into a single, orchestrated workflow.

```python
def market_research_workflow(company_name: str, product_category: str):
    """Complete market research workflow"""
    workflow = WorkflowManager()

    with AdvancedNotteAgent(headless=True) as agent:
        workflow.add_agent_task(
            "Product Research",
            agent.content_research,
            product_category,
            "blog"
        )
        workflow.add_agent_task(
            "Competitive Analysis",
            agent.competitive_analysis,
            company_name,
            # Placeholder names; competitive_analysis builds the URLs from them.
            ["competitorA", "competitorB"]
        )
        workflow.add_agent_task(
            "Social Sentiment",
            agent.social_media_monitor,
            f"#{company_name}",
            "twitter"
        )

        # Run the tasks while the browser session is still open.
        return workflow.execute_workflow()
```

This demonstrates the scalability of our agent-based approach. We can easily build, modify, and combine tasks to create sophisticated automation pipelines for nearly any domain, from business intelligence to personal productivity.

### Your AI Agent Awaits

Through this guide, we have progressed from an empty file to a fully functional, multi-domain AI web agent. We established our environment, designed structured data models, built a core agent class, and taught it a variety of valuable skills. We then watched it perform these tasks through demos and learned how to orchestrate them into complex workflows.

The beauty of this system is its flexibility. The combination of Notte's browser automation and Gemini's reasoning creates a powerful platform for building your own custom solutions. You can easily modify the existing skills or add new ones to suit your specific needs. What you have learned here is not just how to build one agent, but a methodology for building any agent to tackle the challenges you face in the digital world.
