🛒 WebShop

🚀 Setup

(how I actually set it up on coffee)

Everything is done directly on the coffee host, not inside a docker container. This is because 1) it is using conda, and 2) it is easier to expose a port so that tea can access it via coffee.cs.columbia.edu.

If you somehow lose everything, here is how I set it all up:

First, download the repo:

> git clone https://github.com/princeton-nlp/webshop.git webshop

Create a virtual environment using Anaconda and activate it

> conda create -n webshop python=3.8.13
> conda activate webshop

Install requirements into the webshop virtual environment via the setup.sh script

> ./setup.sh -d all

Note that this will fail at the gdown commands. gdown stores its cookies in .cache/gdown, but .cache does NOT exist on the coffee host. So the gdown downloads have to be run manually in communication, which does have .cache. After that, the rest of the setup works properly on coffee.

The setup script performs several actions in the following order:

Installs Python dependencies listed in requirements.txt
Downloads product and instruction data for populating WebShop
Downloads spaCy en_core_web_lg model
Constructs the search engine index from the product and instruction data
Downloads 50 randomly chosen trajectories generated by MTurk workers

The -d flag argument allows you to specify whether you would like to pull the entire product + instruction data set (-d all) or a subset of 1000 random products (-d small).
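For the gdown failure noted above, one alternative that might avoid the detour is to create the missing cache directory on coffee before running setup.sh. This is only a sketch and I have not confirmed it on coffee: it assumes the home directory is writable and that the missing .cache directory is the only reason gdown fails. The helper name `ensure_gdown_cache` is made up for illustration.

```python
from pathlib import Path
from typing import Optional


def ensure_gdown_cache(home: Optional[Path] = None) -> Path:
    """Create <home>/.cache/gdown if it is missing and return its path.

    Assumption: gdown only needs this directory to exist and be writable.
    """
    base = Path.home() if home is None else Path(home)
    cache_dir = base / ".cache" / "gdown"
    # parents=True also creates .cache; exist_ok=True makes reruns harmless.
    cache_dir.mkdir(parents=True, exist_ok=True)
    return cache_dir
```

Calling `ensure_gdown_cache()` with no argument targets the current user's home directory. If the downloads still fail afterwards, fall back to running them in communication as described above.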

By default the WebShop only loads 1,000 products for a faster environment preview. To load all products, change web_agent_site/utils.py:

# DEFAULT_ATTR_PATH = join(BASE_DIR, '../data/items_ins_v2_1000.json')
# DEFAULT_FILE_PATH = join(BASE_DIR, '../data/items_shuffle_1000.json')
DEFAULT_ATTR_PATH = join(BASE_DIR, '../data/items_ins_v2.json')
DEFAULT_FILE_PATH = join(BASE_DIR, '../data/items_shuffle.json')
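Before switching to the full data set, it can help to check that the full files were actually downloaded. The sketch below rebuilds the four paths shown above and reports which ones are missing. The helper names `data_paths` and `missing_files` are hypothetical, and `base_dir` is assumed to be the directory that contains utils.py.

```python
from os.path import exists, join


def data_paths(base_dir: str, full: bool = True) -> dict:
    """Return the attribute and product file paths for the full or 1000-item set."""
    suffix = "" if full else "_1000"
    return {
        "DEFAULT_ATTR_PATH": join(base_dir, f"../data/items_ins_v2{suffix}.json"),
        "DEFAULT_FILE_PATH": join(base_dir, f"../data/items_shuffle{suffix}.json"),
    }


def missing_files(paths: dict) -> list:
    """Return the names of the entries whose file does not exist on disk."""
    return [name for name, path in paths.items() if not exists(path)]
```

For example, `missing_files(data_paths("web_agent_site"))` returning an empty list suggests the full files are in place, in which case editing utils.py should be safe.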

(Optional) Download ResNet image feature files here and put into data/ for running models that require image features.
(Optional) Human demonstration data can be downloaded here.

If you just want to restart the service, do the following on the coffee host (not in any container).

cd into this project and activate the environment:
```
> conda activate webshop
```
then run the website:
```
> ./run_dev.sh
```

🛠️ Usage

The WebShop environment can be rendered in two modes - html and simple - each of which offers a different observation space. The simple mode strips away the extraneous metadata that the html mode includes to make model training and evaluation easier.

Webpage Environment (`html` mode)

Launch the WebShop webpage:

> ./run_dev.sh

The site should then be viewable in the browser. Go to http://localhost:3000/ABC, where you should land on the search home page with a random instruction.

Navigating the website will automatically generate a corresponding trajectory file in the user_session_logs/mturk folder. Each file corresponds to a single instruction/web session, and each step of the file corresponds to a single action (i.e. search[...], click[...]).
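To inspect these trajectory files, a small reader along the following lines may be enough. It assumes the logs are .jsonl files with one JSON record per line and makes no assumption about the fields inside each record. The function names `load_trajectory` and `summarize_logs` are made up for this sketch.

```python
import json
from pathlib import Path


def load_trajectory(path):
    """Return the list of JSON records stored in one .jsonl trajectory file."""
    steps = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                steps.append(json.loads(line))
    return steps


def summarize_logs(log_dir="user_session_logs/mturk"):
    """Map each trajectory file name to the number of records it contains."""
    files = sorted(Path(log_dir).glob("*.jsonl"))
    return {p.name: len(load_trajectory(p)) for p in files}
```

Running `summarize_logs()` from the project root should then give one entry per web session, with the count reflecting how many records were logged for it.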

The current WebShop build comes with two flags:

--log: Include this flag to create a trajectory .jsonl log file of actions on WebShop
--attrs: Include this flag to display an Attributes tab on the item_page of WebShop

Text Environment (`simple` mode)

The simple mode of the WebShop environment is packaged and readily available as an OpenAI gym environment. The OpenAI gym definitions of the text environment can be found in the web_agent_site/envs folder.

To start using the gym and building agents that interact with the WebShop environment, include the following statements in your Python file:

import gym
from web_agent_site.envs import WebAgentTextEnv

env = gym.make('WebAgentTextEnv-v0', observation_mode='text', num_products=...)

Now, you can write your own agent that interacts with the environment via the standard OpenAI gym interface.
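As a rough illustration of such an agent, here is a minimal episode loop. It is a sketch under assumptions, not the repo's own policy. It assumes the environment exposes a `get_available_actions()` method returning a dict with `has_search_bar` and `clickables` keys, that `step` returns the classic gym 4-tuple `(observation, reward, done, info)`, and that actions are strings of the form search[...] or click[...]. The query "shoes" is just a placeholder, and `choose_action` and `run_episode` are hypothetical names.

```python
import random


def choose_action(available_actions, rng=random):
    """Pick an action string in the search[...] / click[...] format."""
    clickables = available_actions.get("clickables", [])
    if available_actions.get("has_search_bar"):
        return "search[shoes]"  # placeholder query, replace with your own
    if clickables:
        return f"click[{rng.choice(clickables)}]"
    raise ValueError("no available actions")


def run_episode(env, max_steps=20):
    """Run one episode and return the summed reward.

    Assumes env.step returns (observation, reward, done, info).
    """
    env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = choose_action(env.get_available_actions())
        observation, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

With the `env` created by `gym.make` above, `run_episode(env)` would play one session. Swapping `choose_action` for a learned policy is the natural next step.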

Examples of a RandomPolicy agent interacting with the WebShop environment in both html and simple mode can be found in the run_envs folder. To run these examples locally, run the run_web_agent_text_env.sh or run_web_agent_site_env.sh script:

> ./run_web_agent_text_env.sh
Products loaded.
Keys Cleaned.
Attributes Loaded.
100%|██████████████████| 1000/1000
Loaded 6910 goals.
Amazon Shopping Game [SEP] Instruction: [SEP] Find me slim f...
Available actions: {'has_search_bar': True, 'clickables': ['search']}
Taking action "search[shoes]" -> Reward = 0.0
...

In order to run the run_web_agent_site_env.sh script, you must download a version of ChromeDriver compatible with your Chrome browser version. Once you have downloaded and unzipped the executable, rename it chromedriver and place it in the webshop/envs folder.

Baseline Models

To run baseline models (rule, IL, RL, IL+RL) from the paper, please refer to the README.md in the baseline_models folder.

Sim-to-real Transfer

To read more about how the sim-to-real transfer of agents trained on WebShop to other environments works, please refer to the README.md in the transfer folder.

💫 Contributions

We would love to hear from the broader NLP and Machine Learning community, and we welcome any contributions, pull requests, or issues! To do so, please either file a new pull request or issue and fill in the corresponding templates accordingly. We'll be sure to follow up shortly!

🪪 License

Check LICENSE.md
