This repository contains the code and data for “WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models.” WebVoyager is a web agent powered by Large Multimodal Models (LMMs) that autonomously completes user instructions by interacting with real-world websites. It uses Selenium to drive a live web browsing environment, so tasks are performed end-to-end.
The repository includes a dataset of 643 task queries across 15 websites, with each website containing over 40 queries. This dataset is stored in `data/WebVoyager_data.jsonl`. Additionally, 90 web browsing tasks from the GAIA dataset (validation set) are included, accessible in `data/GAIA_web.jsonl`. These tasks form a comprehensive task pool, and users are encouraged to expand the dataset using GPT-4 by modifying provided prompts.
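As a rough illustration of working with the task pool, the sketch below reads a JSONL file and counts tasks per website. It assumes each line is a JSON object with a field naming the website; the field name `web_name` and the helper `count_tasks_per_site` are assumptions made for this example, not confirmed parts of the repository, so adjust them to match the actual file.

```python
import json
from collections import Counter
from pathlib import Path


def count_tasks_per_site(lines, site_key="web_name"):
    """Count tasks per website from an iterable of JSONL lines.

    `site_key` is an assumed field name; change it if the data
    uses a different key for the website.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[record.get(site_key, "<unknown>")] += 1
    return counts


if __name__ == "__main__":
    # Path taken from the description above; only read it if present.
    path = Path("data/WebVoyager_data.jsonl")
    if path.exists():
        with path.open(encoding="utf-8") as fh:
            for site, n in sorted(count_tasks_per_site(fh).items()):
                print(f"{site}: {n}")
```

The same function can be pointed at `data/GAIA_web.jsonl` by passing a different `site_key`, if that file groups tasks under another field.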
To run WebVoyager, users must set up the environment and execute the provided `run.sh` script. The system’s performance heavily depends on prompt optimization, and the repository includes a system prompt in `prompts.py` that has been iteratively refined. Users can customize the prompt or modify the action format and execution logic in `run.py` to suit specific needs.
Results from WebVoyager are saved in an output directory, containing interaction messages and screenshots for each task. These outputs are evaluated using GPT-4V to determine task completion success. An auto-evaluation tool is provided in the `evaluation` directory, which requires updating the API key and process directory before execution.
The repository notes that WebVoyager is not an officially supported product and disclaims responsibility for the accuracy of the model’s outputs. Results may vary because of OpenAI API non-determinism, prompt changes, or changes to the target websites. Users are asked to cite the associated paper if they find the work helpful.
It is an experimental autonomous agent called ReactAgent that uses the GPT-4 language model to generate and compose React components from user stories.
It is a dynamic Artificial Intelligence Automation Platform designed to manage AI instruction and execute tasks efficiently across multiple AI providers.
It is a terminal-based platform designed for experimenting with AI-driven software engineering, specifically focusing on code generation and improvement.
It is a cloud-hosted browser platform designed to enable AI agents to perform web-based tasks securely and autonomously, mimicking human-like interactions.
It is a framework and suite of applications designed for developing and deploying large language model (LLM) applications based on Qwen (version 2.0 or higher).
It is a platform designed to create, run, and scale web automations using advanced AI technologies such as Vision-Language Models (VLMs), Large Language Models (LLMs), and AI agents.
It is an autonomous framework designed for data labeling and processing tasks, enabling the creation of intelligent agents that can independently learn and apply skills through iterative processes.
It is an experimental open-source project called Multi-GPT, designed to make GPT-4 fully autonomous by enabling multiple specialized AI agents, referred to as "expertGPTs," to collaborate on tasks.
It is a fully autonomous, general-purpose AI agent designed to function as a standalone artificial intelligence assistant, similar to JARVIS, using a Large Language Model (LLM) as its core processor.
It is a GenAI evaluation and observability platform designed to simulate, evaluate, and observe AI agents, enabling users to develop, test, and deploy AI applications with enhanced quality, speed, and reliability.
It is an all-in-one AI assistant platform designed to provide secure, customizable, and open-source solutions tailored to meet the unique needs of businesses.
It is a decentralized protocol that enables individuals and organizations to co-own and participate in autonomous AI agent economies by incentivizing and coordinating the creation, operation, and interaction of AI agents.
It is the world's first text-to-website builder that creates fully functional, multipage websites from a single prompt, eliminating the need for hiring expensive designers, copywriters, web developers, or SEO agencies.
It is a conversational AI platform designed to enhance customer experience by resolving over 50% of customer calls and delivering consistent, high-quality brand interactions.
It is an open platform called OpenAgents designed to enable the use and hosting of language agents in real-world applications, providing both general users and developers with tools to interact with and deploy language agents.