Announcing BigAction

BigAction, an open-source initiative to collect datasets to train and evaluate large action models

Daniel Huynh

TL;DR

  • We are launching BigAction, an open-source initiative to collect datasets to train and evaluate large action models (LAMs). Inspired by BigScience and BigCode, this initiative intends to be an open haven for researchers, students, hobbyists, and professionals alike to advance the field of LAMs.
  • The first dataset to be released and continuously upgraded is TheWave, a dataset of web interactions. It takes a user query (like “Click on login”) and the HTML of the current page as input, and outputs the code to perform the action, for instance, in Selenium.
  • We will soon release our open-source tool to collect data easily for TheWave. This tool leverages Gradio to provide a GUI that nontechnical people can use. We will also provide evaluation metrics to measure how well a model performs in web interaction.

Context

Large Language Models (LLMs) have opened many new possibilities thought unreachable for years. Among their many potentialities, one of their main strengths lies in their ability to produce consistent and structured text, such as code, which could be used to trigger actions.

Therefore, Large Action Models (LAMs) have emerged as a subsegment of LLMs that generate actions on behalf of users. These actions range from filling in forms and calling APIs to pulling information from government websites.

While promising, developing reliable LAMs is a real challenge. The LLMs at the root of LAMs often hallucinate, producing either incorrect code that crashes or, even worse, executable code that performs the wrong action!

Even though this poses a big problem, it is manageable. “Hallucinations” are largely the consequence of applying LLMs to out-of-distribution data, that is, evaluating the model on tasks it was not exposed to during training.

Therefore, curating a dataset of high-quality (query, action) input pairs should greatly improve performance or, even better, solve the problem of mapping queries to actions altogether!

But to achieve such an ambitious goal, we need a diverse, high-quality dataset of (query, action) pairs that represent what real AI systems will encounter in practice. This dataset should be as representative as possible, so that it captures the complexity of websites and their associated interactions and allows an AI to reproduce these capabilities.

While academic benchmarks have been collected, they are often:

  • Limited in their diversity
  • Small
  • Not necessarily containing actionable outputs 

Collecting such a dataset is challenging because websites are extremely diverse, both in the frameworks used to build them and in their content, which ranges from blogs sharing cat images to complex SaaS apps like Salesforce.

Collecting such data would take too much work for most individuals or organizations, especially startups and academics. This might mean that only large entities could create AI to browse the Internet effectively.

This would be quite deplorable, and many voices, from Hugging Face’s CEO Clément Delangue to Meta’s VP of AI, Yann LeCun, have highlighted the risks of such a future.

That’s why we need an initiative to ensure that high-quality and diverse data is available to researchers and smaller companies: hence, our initiative BigAction!

Introducing BigAction

BigAction is an initiative we are launching to allow researchers, small companies, individuals, and anyone else who wishes to contribute to collect open-source datasets for evaluating and training LAMs.

Our models for this initiative are BigScience and BigCode, two earlier initiatives that fostered open science in AI, the first around large multilingual language models and the second around code models.

Those decentralized projects have gathered researchers and industry to produce high-quality deliverables, ranging from BLOOM, one of the first open large-scale LLMs with 176B parameters, to the StarCoder series of models.

BigAction intends to create the same environment and energy to spur the AI community to push the state of the art of AI that performs actions on users’ behalf. Whether you are a researcher, student, professional, or hobbyist, we welcome contributors from all backgrounds! You can join our Discord to start chatting with us, and we will present the different projects soon.

While LaVague launched this initiative, it is meant to be decentralized, and we merely want to act as a catalyst in the early phases. We believe open science will be critical both for model performance and for a fair sharing of the outcomes.

Before experimenting with complex architectures, it is critical to have the right measures to check that a system works properly. This means collecting the right dataset and finding the right metric.

TheWave

That is why the first deliverable of BigAction is TheWave, a dataset containing pairs of (query, HTML) -> (Selenium code) to evaluate and train LAMs for browser action.

In this scenario, users provide queries (like “Click on login”), and code is generated to perform that action by analyzing the DOM and the query.

You can find our first dataset, TheWave, on our Hugging Face organization. TheWave contains examples of website pages, queries and the appropriate code, to help evaluate models in their ability to generate the right action.

You can see below samples of TheWave:

TheWave contains the following fields:

  • Query: The query to be performed
  • URL: URL of the website where the action is to be performed
  • HTML: HTML of the page
  • Selenium_ground_truth: Selenium code to find the element to be interacted with
  • Ground_truth_outer_html: Outer HTML of the element to be interacted with
  • Ground_truth_highlighted_screenshot: Base64 encoding of a screenshot of the page with the element highlighted
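
To make the schema concrete, here is a minimal Python sketch of how one record could be represented and how its screenshot could be decoded. The class name `WaveRecord`, the lowercase attribute names, and the example values are assumptions made for illustration; they are not taken from the dataset itself.

```python
import base64
from dataclasses import dataclass


@dataclass
class WaveRecord:
    """One TheWave example; attributes mirror the fields listed above."""

    query: str
    url: str
    html: str
    selenium_ground_truth: str
    ground_truth_outer_html: str
    ground_truth_highlighted_screenshot: str  # Base64-encoded image

    def screenshot_bytes(self) -> bytes:
        """Decode the Base64 screenshot back into raw image bytes."""
        return base64.b64decode(self.ground_truth_highlighted_screenshot)


# Hypothetical record, invented for illustration (not a real dataset row).
example = WaveRecord(
    query="Click on login",
    url="https://example.com",
    html='<html><body><a id="login" href="/login">Login</a></body></html>',
    # The Selenium code is stored as text, so no browser is needed to read it.
    selenium_ground_truth='driver.find_element(By.ID, "login")',
    ground_truth_outer_html='<a id="login" href="/login">Login</a>',
    # Placeholder bytes stand in for a real screenshot.
    ground_truth_highlighted_screenshot=base64.b64encode(b"fake image bytes").decode("ascii"),
)

image_bytes = example.screenshot_bytes()
```

With a real record, the decoded bytes could be written to a file and opened as an image.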

Here is an example of such a screenshot on Wikipedia:

This is the first version. We will keep expanding its size, quality, and diversity so that it captures the full range of actions humans perform on the internet and can be used to teach an AI system to automate them.

Data collection process

To collect this dataset, we used LaVague to automatically generate the Selenium code to target the element to be interacted with:

If the code works, the user is asked to manually validate that the identified component is the right one. Although a bit tedious, this is necessary to ensure the correctness of the generation, as an instruction can match several elements (imagine saying ‘Click on Datasets’ when there are two links, ‘Enterprise Datasets’ and ‘Community Datasets’).

While this requires some work, it ensures high-quality data and a benchmark we can trust. In the long run, we will use more autonomous pipelines, in which an AI browses the internet and generates queries, and a trained model classifies whether the highlighted element is the right one!
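
The ambiguity in the ‘Click on Datasets’ example can be illustrated with a short Python sketch that counts how many links on a page could match an instruction. This is not the LaVague pipeline; the helper names and the sample page are hypothetical, and matching on a substring of the link text is a simplifying assumption.

```python
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collect the visible text of every <a> element on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # text chunks of the link being parsed, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current = []

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append("".join(self._current).strip())
            self._current = None


def candidate_links(html, target):
    """Return the text of every link whose label mentions the target word."""
    parser = LinkTextCollector()
    parser.feed(html)
    parser.close()
    return [text for text in parser.links if target.lower() in text.lower()]


# Hypothetical page reproducing the ambiguous case described above.
page = """
<nav>
  <a href="/enterprise">Enterprise Datasets</a>
  <a href="/community">Community Datasets</a>
  <a href="/pricing">Pricing</a>
</nav>
"""

matches = candidate_links(page, "Datasets")
# More than one candidate means a human has to confirm the intended element.
needs_human_review = len(matches) > 1
```

Here two links mention "Datasets", so generated code that runs without error could still target the wrong one, which is why a human check is kept in the loop.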

We will soon open-source the tool we used and open a process for others to contribute data so we can create a high-quality benchmark for the community to evaluate their Large Action Models!
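
The evaluation metrics themselves are not described in this post. As a hedged illustration only, one plausible metric is the fraction of examples where the element targeted by the generated code has the same outer HTML as the ground truth. The function names below are hypothetical, and the sketch assumes the predicted outer HTML has already been obtained by running the generated code in a browser (with `None` when the code failed).

```python
from typing import Optional, Sequence


def normalize(outer_html: str) -> str:
    """Collapse whitespace so formatting differences do not count as errors."""
    return " ".join(outer_html.split())


def element_match_accuracy(
    predicted: Sequence[Optional[str]],
    ground_truth: Sequence[str],
) -> float:
    """Share of examples whose predicted element matches the ground truth.

    A prediction of None (generated code crashed or found nothing) is a miss.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not ground_truth:
        return 0.0
    hits = sum(
        p is not None and normalize(p) == normalize(t)
        for p, t in zip(predicted, ground_truth)
    )
    return hits / len(ground_truth)


# Hypothetical results for three examples: one hit, one crash, one wrong element.
predicted = [
    '<a id="login"  href="/login">Login</a>',
    None,
    "<button>Cancel</button>",
]
truth = [
    '<a id="login" href="/login">Login</a>',
    '<input name="q">',
    "<button>Submit</button>",
]

score = element_match_accuracy(predicted, truth)
```

Exact matching on outer HTML is strict by design: code that executes but lands on the wrong element, the failure mode described earlier, counts as a miss just like code that crashes.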

Conclusion

We are excited to launch the BigAction initiative, which fosters an open-science approach to developing Large Action Models that can interact with the Internet for us. LAMs provide a great opportunity to develop autonomous agents acting on our behalf.

However, building a truly performant AI system will require tremendous effort. Therefore, having a vibrant open-source community to tackle it seems to be the fastest and most sustainable way to build such systems and also to share their value.

We hope this initiative will spur open science, enabling the development of better datasets, metrics, and systems, from retriever to LLM generation, to build truly performant large action models for web interaction!

If you are interested in contributing, whether it is to build better datasets, test a new architecture for Selenium code generation, or collaborate on a paper, do not hesitate to drop us a message in the space we have created for this on LaVague’s Discord server. We intend to have a separate Discord server for BigAction in the future, but for now, as LaVague carries the initial effort, discussions will take place on ours.