# Mind2Web API

## Mind2Web Description

Mind2Web is a dataset for developing and evaluating generalist web agents that follow natural-language instructions to complete complex tasks on websites. The Mind2Web paper is available on [arXiv](https://arxiv.org/abs/2306.06070).

Note that this API serves only the Mind2Web training data. The test data is deliberately excluded so that web crawlers collecting LLM training data cannot pick it up and contaminate future evaluations.

## Mind2Web API Documentation

### Introduction

This API serves the Mind2Web engine, which provides endpoints for loading the light (partial) and full training datasets and for retrieving per-annotation data.

* **Version**: 0.0.9 (Experimental)
* **Status**: Development
* **Rate Limit**: 500 requests per minute.

**Root Endpoint**: [`https://api.junglegym.ai/`](https://api.junglegym.ai/)
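The documented limit is 500 requests per minute. Below is a minimal Python sketch of client-side pacing that stays under it by spacing request starts at least 0.12 s apart. `MinuteThrottle` is a hypothetical helper, not part of the API. This page does not say whether the limit applies per endpoint or per client, so sharing one instance across all endpoints is the conservative assumption.

```python
import time


class MinuteThrottle:
    """Client-side pacing: keeps request starts at least 60/per_minute seconds apart."""

    def __init__(self, per_minute=500, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute  # 0.12 s at the documented 500/minute
        self.clock = clock  # injectable so the logic can be tested without waiting
        self.sleep = sleep
        self.next_allowed = None  # no request has been sent yet

    def wait(self):
        """Blocks until the next request may start, then reserves the following slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

Call `wait()` immediately before each request. Even spacing is simpler than a sliding-window counter and never bursts, at the cost of not sending requests back to back after an idle period.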

### Endpoints

#### 1. Root Test Endpoint

* **URL**: [`https://api.junglegym.ai/`](https://api.junglegym.ai/)
* **Method**: `GET`
* **Rate Limit**: 500/minute.
* **Response**: A welcome message.

```json
{
    "message": "Hello World from the JungleGym dataset server!"
}
```
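To confirm the server is reachable before issuing heavier requests, a small check against the root endpoint could look like the following sketch. It assumes only that the response is a JSON object with a `message` field, as in the example above. `server_is_up` and its injectable `fetch` parameter are hypothetical names chosen so the logic can be exercised without network access.

```python
import json
import urllib.request

ROOT_URL = "https://api.junglegym.ai/"


def _fetch_json(url):
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def server_is_up(fetch=_fetch_json):
    """True if the root endpoint returns a JSON object with a `message` field."""
    try:
        body = fetch(ROOT_URL)
    except (OSError, ValueError):
        # OSError covers urllib's URLError/HTTPError and timeouts; ValueError covers invalid JSON.
        return False
    return isinstance(body, dict) and "message" in body
```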

#### 2. Load Light Train Dataset

* **URL**: `/load_light_train_dataset`
* **Method**: `GET`
* **Rate Limit**: 500/minute.
* **Response**: Returns the light training dataset.

```json
{
    "data": [...]
}
```

#### 3. Load Full Train Dataset

* **URL**: `/load_full_train_dataset`
* **Method**: `GET`
* **Rate Limit**: 500/minute.
* **Response**: Returns the full training dataset.

```json
{
    "data": [...]
}
```
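Both dataset endpoints return their records under a `data` key, so one hypothetical helper can serve either. This Python sketch assumes the endpoint paths are appended to the root URL `https://api.junglegym.ai` and that no authentication header is required; neither detail is spelled out on this page. The `fetch` parameter exists only so the function can be tested with a stub.

```python
import json
import urllib.request

BASE_URL = "https://api.junglegym.ai"


def _fetch_json(url):
    # The full training set may be large, so allow a generous timeout (assumed value).
    with urllib.request.urlopen(url, timeout=600) as resp:
        return json.load(resp)


def load_train_dataset(full=False, fetch=_fetch_json):
    """Returns the `data` list from the light (default) or full training-set endpoint."""
    path = "load_full_train_dataset" if full else "load_light_train_dataset"
    return fetch(f"{BASE_URL}/{path}")["data"]
```

Starting with `load_train_dataset()` (the light set) keeps iteration fast; pass `full=True` once the pipeline works end to end.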

#### 4. Get List of Actions

* **URL**: `/get_list_of_actions?annotation_id=<annotation_id>`
* **Method**: `GET`
* **Rate Limit**: 500/minute.
* **Response**: Returns actions and their representations for a given annotation ID.

```json
{
    "actions": [...],
    "action_reprs": [...]
}
```
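The actions endpoint returns two lists, `actions` and `action_reprs`. The sketch below pairs them step by step, under the assumption (not stated in this documentation) that the lists are parallel, with entry *i* of each describing the same step. `get_action_steps` is a hypothetical helper; the annotation ID is URL-encoded with `urllib.parse.urlencode`, and the URL is assumed to be the root URL plus the documented path.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.junglegym.ai"


def _fetch_json(url):
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)


def get_action_steps(annotation_id, fetch=_fetch_json):
    """Returns a list of (action, action_repr) pairs for one annotation ID."""
    query = urllib.parse.urlencode({"annotation_id": annotation_id})
    body = fetch(f"{BASE_URL}/get_list_of_actions?{query}")
    actions, reprs = body["actions"], body["action_reprs"]
    # Assumption: the two lists are parallel, so a length mismatch signals bad data.
    if len(actions) != len(reprs):
        raise ValueError(f"{len(actions)} actions but {len(reprs)} action_reprs")
    return list(zip(actions, reprs))
```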

#### 5. Get Raw JSON Screenshots

* **URL**: `/get_raw_json_screenshots?annotation_id=<annotation_id>`
* **Method**: `GET`
* **Rate Limit**: 500/minute.
* **Response**: Returns raw JSON screenshots for a given annotation ID.

```json
{
    "data": {...}
}
```

#### 6. Get Raw DOM Content

* **URL**: `/get_raw_dom_content?annotation_id=<annotation_id>`
* **Method**: `GET`
* **Rate Limit**: 500/minute.
* **Response**: Returns raw DOM content for a given annotation ID.

```json
{
    "data": {...}
}
```

#### 7. Get Storage

* **URL**: `/get_storage?annotation_id=<annotation_id>`
* **Method**: `GET`
* **Rate Limit**: 500/minute.
* **Response**: Returns storage data for a given annotation ID.

```json
{
    "data": {...}
}
```
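The screenshots, DOM, and storage endpoints share the same request shape (an `annotation_id` query parameter) and the same response shape (a `data` object), so a single hypothetical helper can dispatch among them. The sketch assumes paths are relative to `https://api.junglegym.ai` and that no credentials are needed; the `kind` labels are illustrative names, not API values.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.junglegym.ai"

# Endpoint paths for the three per-annotation resources documented above.
RAW_ENDPOINTS = {
    "screenshots": "get_raw_json_screenshots",
    "dom": "get_raw_dom_content",
    "storage": "get_storage",
}


def _fetch_json(url):
    with urllib.request.urlopen(url, timeout=120) as resp:
        return json.load(resp)


def get_raw_annotation_data(kind, annotation_id, fetch=_fetch_json):
    """Returns the `data` object of the screenshots, DOM, or storage endpoint."""
    if kind not in RAW_ENDPOINTS:
        raise ValueError(f"unknown kind {kind!r}; expected one of {sorted(RAW_ENDPOINTS)}")
    query = urllib.parse.urlencode({"annotation_id": annotation_id})
    body = fetch(f"{BASE_URL}/{RAW_ENDPOINTS[kind]}?{query}")
    return body["data"]
```

For example, `get_raw_annotation_data("dom", "<annotation_id>")` would return the DOM `data` object for one annotation.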

#### 8. Get Raw Trace Zip

* **URL**: `/get_raw_trace_zip?annotation_id=<annotation_id>`
* **Method**: `GET`
* **Rate Limit**: 500/minute.
* **Response**: Returns the `trace.zip` file for the given annotation ID as a binary download rather than a JSON body.
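Unlike the other endpoints, the trace endpoint returns a zip archive rather than JSON. The following sketch downloads it, checks with the standard `zipfile` module that the payload is a valid archive, and writes it to a caller-supplied binary file. `download_trace` is a hypothetical name, and the URL is assumed to be the root URL `https://api.junglegym.ai` plus the documented path.

```python
import io
import urllib.parse
import urllib.request
import zipfile

BASE_URL = "https://api.junglegym.ai"


def _fetch_bytes(url):
    with urllib.request.urlopen(url, timeout=300) as resp:
        return resp.read()


def download_trace(annotation_id, out_file, fetch=_fetch_bytes):
    """Writes the trace archive for one annotation to `out_file` (opened in binary
    mode) and returns the archive's member names."""
    query = urllib.parse.urlencode({"annotation_id": annotation_id})
    payload = fetch(f"{BASE_URL}/get_raw_trace_zip?{query}")
    # Check that the body really is a zip before writing; raises zipfile.BadZipFile otherwise.
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        names = archive.namelist()
    out_file.write(payload)
    return names
```

For example, `with open("trace.zip", "wb") as f: download_trace("<annotation_id>", f)`. Validating first avoids saving an unexpected non-zip body under a `.zip` name.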

### Errors

The API returns the following HTTP status codes on error:

* `401 Unauthorized`: Access is forbidden or unauthorized.
* `404 Not Found`: The requested resource or data was not found.
* `500 Internal Server Error`: An internal server error occurred.

Check the `detail` field of the response body for the specific error message.
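Since error responses carry a `detail` field, a client can surface that message instead of a bare status code. The Python sketch below does this with the standard library's `urllib.error.HTTPError`. `JungleGymAPIError`, `detail_from_body`, and `get_json_or_raise` are hypothetical names. The fallback to the raw body text covers error responses that are not JSON, which this documentation does not rule out.

```python
import json
import urllib.error
import urllib.request


class JungleGymAPIError(Exception):
    """Raised for HTTP error responses; carries the status code and `detail` message."""

    def __init__(self, status, detail):
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status
        self.detail = detail


def detail_from_body(raw):
    """Extracts the `detail` field from an error body, falling back to the raw text."""
    text = raw.decode("utf-8", errors="replace")
    try:
        body = json.loads(text)
    except ValueError:  # body is not JSON
        return text
    if isinstance(body, dict) and "detail" in body:
        return body["detail"]
    return text


def get_json_or_raise(url, timeout=60):
    """GETs a URL and converts HTTP errors into JungleGymAPIError."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        raise JungleGymAPIError(err.code, detail_from_body(err.read())) from err
```

A caller can then branch on `err.status`, for example retrying on `500` but not on `404`.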

