DOM-Q-NET: Grounded RL on Structured Language
"DOM-Q-NET: Grounded RL on Structured Language" International Conference on Learning Representations (2019). Sheng Jia, Jamie Kiros, Jimmy Ba. [arxiv] [openreview]
Demo
Trained multitask agent: https://www.youtube.com/watch?v=eGzTDIvX4IY
Facebook login: https://www.youtube.com/watch?v=IQytRUKmWhs&t=2s
Requirement
Selenium and ChromeDriver (the Chrome driver for Selenium) must be installed.
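A quick way to check both requirements before launching anything is sketched below. It is not part of this repo: `missing_requirements` is a hypothetical helper, and it assumes the driver executable is named `chromedriver` and is expected on `PATH`.

```python
import importlib.util
import shutil


def missing_requirements(driver_name="chromedriver"):
    """Return human-readable names of requirements that appear to be missing."""
    missing = []
    # find_spec returns None when the package cannot be imported.
    if importlib.util.find_spec("selenium") is None:
        missing.append("selenium (Python package)")
    # shutil.which returns None when no such executable is on PATH.
    if shutil.which(driver_name) is None:
        missing.append(f"{driver_name} (executable on PATH)")
    return missing


if __name__ == "__main__":
    problems = missing_requirements()
    if problems:
        print("Missing: " + ", ".join(problems))
    else:
        print("All requirements found.")
```

An empty list means both checks passed; it does not verify that the driver version matches the installed Chrome.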
Installation
- Clone this repo
- Download the MiniWoB++ environment from the original repo https://github.com/stanfordnlp/miniwob-plusplus and copy the miniwob-plusplus/html folder to miniwob/html in this repo
- In fact, this html folder can be stored anywhere, as long as you perform one of the following actions:
  - Set the environment variable "WOB_PATH" to file://"your-path-to-miniwob-plusplus"/html/miniwob
    E.g. "your-path-to-miniwob-plusplus" is "/h/sheng/DOM-Q-NET/miniwob"
  - Directly modify base_url on line 33 of instance.py to "your-path-to-miniwob-plusplus"/html/miniwob
    In my case, base_url='file:///h/sheng/DOM-Q-NET/miniwob/html/miniwob/'
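The two options above amount to choosing where the `file://` URL comes from. The sketch below shows one way to express that choice. It is an illustration, not the code in instance.py: `wob_url_from_root` and `resolve_base_url` are hypothetical names, POSIX-style absolute paths are assumed, and the trailing slash follows the author's base_url example.

```python
import os
from pathlib import PurePosixPath


def wob_url_from_root(miniwob_root):
    """Build the file:// URL for <miniwob_root>/html/miniwob.

    The root must be an absolute POSIX path; as_uri() raises ValueError otherwise.
    """
    path = PurePosixPath(miniwob_root) / "html" / "miniwob"
    return path.as_uri() + "/"


def resolve_base_url(default_root, environ=None):
    """Prefer WOB_PATH when it is set; otherwise derive the URL from default_root."""
    environ = os.environ if environ is None else environ
    url = environ.get("WOB_PATH")
    if url:
        # Normalize to a single trailing slash.
        return url if url.endswith("/") else url + "/"
    return wob_url_from_root(default_root)


if __name__ == "__main__":
    print(resolve_base_url("/h/sheng/DOM-Q-NET/miniwob"))
```

With `WOB_PATH` unset, the author's example root yields the same URL as the base_url shown above.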
Run experiment
Experiment launch files are stored under runs
For example,
cd runs/hard2medium9tasks/
sh run1.sh
will launch a multitask experiment on 11 tasks (social-media, search-engine, login-user, enter-password, click-checkboxes, click-option, enter-dynamic-text, enter-text, email-inbox-delete, click-tab-2, navigation-tree).
Multitask Assumptions
State & Action restrictions
Item | Maximum number of items |
---|---|
DOM tree leaves (action space) | 160 |
DOM tree | 200 |
Instruction tokens | 16 |
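Fixed maximum sizes like these are commonly enforced by padding short inputs and masking the padding. The sketch below illustrates that idea with the limits from the table. The constant names and `pad_or_truncate` are hypothetical, and truncating oversized inputs is an assumption: this README does not say how the repo treats pages that exceed the limits.

```python
# Limits taken from the table above; the names are illustrative.
MAX_DOM_LEAVES = 160
MAX_DOM_NODES = 200
MAX_INSTRUCTION_TOKENS = 16


def pad_or_truncate(items, max_len, pad_value=None):
    """Return (fixed_length_list, mask); mask is 1 for real items, 0 for padding."""
    kept = list(items)[:max_len]  # assumption: extra items are dropped
    n_pad = max_len - len(kept)
    return kept + [pad_value] * n_pad, [1] * len(kept) + [0] * n_pad


if __name__ == "__main__":
    tokens, mask = pad_or_truncate(
        ["click", "the", "button"], MAX_INSTRUCTION_TOKENS, "<pad>"
    )
    print(len(tokens), sum(mask))
```

The mask lets later computations ignore padded positions, for example when selecting among DOM leaves as actions.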
Attribute embeddings & vocabulary
Attribute | Maximum vocabulary size | Embedding dimension |
---|---|---|
Tag | 100 | 16 |
Text (shared with instructions) | 600 | 48 |
Class | 100 | 16 |
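To make the vocabulary caps concrete, here is a minimal sketch of a size-limited vocabulary. It is not the repo's implementation: `CappedVocab`, `ATTRIBUTE_SPECS`, and the `<unk>` token are hypothetical, and both the reserved id 0 and the choice to count it toward the cap are assumptions.

```python
# (max vocabulary size, embedding dimension) per attribute, from the table above.
ATTRIBUTE_SPECS = {"tag": (100, 16), "text": (600, 48), "class": (100, 16)}


class CappedVocab:
    """Assign ids to tokens until max_size is reached; later new tokens map to UNK."""

    UNK = "<unk>"  # hypothetical reserved token, id 0

    def __init__(self, max_size):
        self.max_size = max_size
        self.token_to_id = {self.UNK: 0}

    def lookup(self, token):
        if token in self.token_to_id:
            return self.token_to_id[token]
        if len(self.token_to_id) < self.max_size:
            # Room left: register the token under the next free id.
            self.token_to_id[token] = len(self.token_to_id)
            return self.token_to_id[token]
        return self.token_to_id[self.UNK]  # vocabulary is full


if __name__ == "__main__":
    vocabs = {name: CappedVocab(size) for name, (size, _dim) in ATTRIBUTE_SPECS.items()}
    print(vocabs["tag"].lookup("button"))
```

Each embedding table would then have one row per id and the width listed in the table, for example 600 rows of 48 dimensions for text.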
- Unknown (UNK) tokens
These are each assigned a random vector, so that the cosine similarity between an unknown instruction token and a matching text attribute can still reach 1.0 under direct alignment.
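One reading of this, taking the cosine measure as cosine similarity, is that each distinct out-of-vocabulary token gets its own random vector, so the same token appearing in the instruction and in a DOM text attribute compares as identical. The sketch below illustrates that reading only. `UnkVectors` is a hypothetical class, and the 48-dimensional Gaussian vectors are an assumption based on the text embedding size above.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


class UnkVectors:
    """Give every distinct unknown token a fixed random vector."""

    def __init__(self, dim=48, seed=0):
        self.dim = dim
        self._rng = random.Random(seed)
        self._vectors = {}

    def get(self, token):
        # Draw the vector on first use and reuse it afterwards.
        if token not in self._vectors:
            self._vectors[token] = [self._rng.gauss(0.0, 1.0) for _ in range(self.dim)]
        return self._vectors[token]


if __name__ == "__main__":
    unk = UnkVectors()
    same = cosine_similarity(unk.get("xqzt"), unk.get("xqzt"))
    other = cosine_similarity(unk.get("xqzt"), unk.get("blorp"))
    print(round(same, 6), other < same)
```

Identical unknown tokens score 1.0 up to floating-point rounding, while two different unknown tokens get independent vectors and score well below that.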
Acknowledgement
Credit to Dopamine for the implementation of prioritized replay used in dstructs/dopamine_segtree.py