NExTplusplus / TAT-QA

Licence: MIT license
TAT-QA (Tabular And Textual dataset for Question Answering) contains 16,552 questions associated with 2,757 hybrid contexts from real-world financial reports.

TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance

You can download our TAT-QA dataset via TAT-QA dataset.

For more information, please refer to our TAT-QA website or read our ACL 2021 paper (PDF).

TagOp Model

Requirements

Create an environment with Miniconda and activate it:

conda create -n tat-qa python==3.7
conda activate tat-qa
pip install -r requirement.txt
pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.7.0+${CUDA}.html
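
The ${CUDA} placeholder in the last command above has to be replaced with a tag that matches the local PyTorch build. The following is a minimal sketch of building the wheel index URL. The helper name torch_scatter_index_url is hypothetical, and the list of accepted tags (cpu, cu92, cu101, cu102, cu110) is an assumption about the wheels published for PyTorch 1.7.0, so please check the torch-scatter installation notes for your setup.

```python
# Assumed wheel tags for PyTorch 1.7.0; verify against the torch-scatter docs.
KNOWN_TAGS = ("cpu", "cu92", "cu101", "cu102", "cu110")


def torch_scatter_index_url(cuda_tag: str, torch_version: str = "1.7.0") -> str:
    """Fill the ${CUDA} placeholder of the wheel index URL used above."""
    if cuda_tag not in KNOWN_TAGS:
        raise ValueError(f"unexpected tag {cuda_tag!r}; expected one of {KNOWN_TAGS}")
    return f"https://pytorch-geometric.com/whl/torch-{torch_version}+{cuda_tag}.html"


if __name__ == "__main__":
    # Print the pip command for a CUDA 11.0 build (pick the tag for your machine).
    print("pip install torch-scatter -f " + torch_scatter_index_url("cu110"))
```

For a CPU-only machine the same helper would be called with "cpu".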

We adopt RoBERTa as the encoder for TagOp. Use the following commands to prepare the RoBERTa model:

cd dataset_tagop
mkdir roberta.large && cd roberta.large
wget -O pytorch_model.bin https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-pytorch_model.bin
wget -O config.json https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json
wget -O vocab.json https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-vocab.json
wget -O merges.txt https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-merges.txt
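
Because wget -O creates the output file even when a download fails, it may be worth checking the four files before training. The sketch below lists files that are missing or empty. The function name missing_roberta_files is hypothetical, and the directory path simply mirrors the commands above.

```python
from pathlib import Path

# File names taken from the wget commands above.
EXPECTED_FILES = ("pytorch_model.bin", "config.json", "vocab.json", "merges.txt")


def missing_roberta_files(model_dir):
    """Return the expected files that are absent or zero bytes long."""
    root = Path(model_dir)
    return [
        name
        for name in EXPECTED_FILES
        if not (root / name).is_file() or (root / name).stat().st_size == 0
    ]


if __name__ == "__main__":
    missing = missing_roberta_files("dataset_tagop/roberta.large")
    print("missing or empty:", missing if missing else "none")
```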

Training & Testing

Preprocessing dataset

We heuristically generate the "facts" and "mapping" fields from the raw dataset; the resulting files are stored in the dataset_tagop folder.
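
To confirm that a file in dataset_tagop carries these generated fields, a small check like the one below can help. It is only a sketch: it makes no assumption about the exact JSON layout and simply walks the whole structure, counting objects that contain both "facts" and "mapping". The function name count_annotated is hypothetical, and file paths are passed on the command line.

```python
import json
import sys


def count_annotated(node) -> int:
    """Recursively count JSON objects that have both "facts" and "mapping" keys."""
    if isinstance(node, dict):
        own = 1 if "facts" in node and "mapping" in node else 0
        return own + sum(count_annotated(value) for value in node.values())
    if isinstance(node, list):
        return sum(count_annotated(item) for item in node)
    return 0


if __name__ == "__main__":
    # Usage: python check_fields.py <json file> [<json file> ...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            print(path, count_annotated(json.load(fh)))
```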

Prepare dataset

PYTHONPATH=$PYTHONPATH:$(pwd):$(pwd)/tag_op python tag_op/prepare_dataset.py --mode [train/dev/test]

Note: By default, the result is written to the folder ./tag_op/cache.
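
Since the command has to be run once per mode, the three runs can be scripted. The following is a minimal sketch that mirrors the command above for train, dev and test. The helper names (prepare_command, prepare_env, prepare_all) are hypothetical, and it assumes the script is launched from the repository root.

```python
import os
import subprocess
import sys

MODES = ("train", "dev", "test")


def prepare_command(mode):
    """Build the argv for one run of tag_op/prepare_dataset.py."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    return [sys.executable, "tag_op/prepare_dataset.py", "--mode", mode]


def prepare_env(repo_root):
    """Copy the environment and append the repo root and tag_op to PYTHONPATH."""
    env = dict(os.environ)
    extra = [repo_root, os.path.join(repo_root, "tag_op")]
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = os.pathsep.join(([existing] if existing else []) + extra)
    return env


def prepare_all(repo_root="."):
    """Run the preparation step for every mode, stopping at the first failure."""
    root = os.path.abspath(repo_root)
    for mode in MODES:
        subprocess.run(prepare_command(mode), cwd=root, env=prepare_env(root), check=True)
```

Calling prepare_all() from the repository root would then run all three modes in sequence.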

Train & Evaluation

CUDA_VISIBLE_DEVICES=2 PYTHONPATH=$PYTHONPATH:$(pwd) python tag_op/trainer.py --data_dir tag_op/cache/ \
--save_dir ./checkpoint --batch_size 48 --eval_batch_size 8 --max_epoch 50 --warmup 0.06 --optimizer adam --learning_rate 5e-4 \
--weight_decay 5e-5 --seed 123 --gradient_accumulation_steps 4 --bert_learning_rate 1.5e-5 --bert_weight_decay 0.01 \
--log_per_updates 50 --eps 1e-6 --encoder roberta
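
As a rough guide to what --warmup 0.06 implies, the sketch below estimates the number of warmup updates. It assumes that --warmup is a fraction of the total optimizer updates, as is common in BERT-style schedules; the trainer's actual schedule is not confirmed here. The function names and the example count of 10,000 training examples are hypothetical, and examples_per_update stands for however many examples one optimizer update consumes in your configuration.

```python
import math


def total_updates(num_examples: int, examples_per_update: int, epochs: int) -> int:
    """Optimizer updates over the whole run (the last partial batch still counts)."""
    return math.ceil(num_examples / examples_per_update) * epochs


def warmup_updates(total: int, warmup_ratio: float) -> int:
    """Updates spent in warmup, assuming the ratio is a fraction of all updates."""
    return round(total * warmup_ratio)


if __name__ == "__main__":
    # Hypothetical example count; 48, 50 and 0.06 come from the command above.
    total = total_updates(num_examples=10_000, examples_per_update=48, epochs=50)
    print("total updates:", total)
    print("warmup updates:", warmup_updates(total, 0.06))
```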

Testing

CUDA_VISIBLE_DEVICES=2 PYTHONPATH=$PYTHONPATH:$(pwd) python tag_op/predictor.py --data_dir tag_op/cache/ --test_data_dir tag_op/cache/ \
--save_dir tag_op/ --eval_batch_size 32 --model_path ./checkpoint --encoder roberta

Note: Training may take around 2 days on a single 32GB V100 GPU.

Citation

Please cite our work if you use our dataset or code. Thank you.

@inproceedings{zhu-etal-2021-tat,
    title = "{TAT}-{QA}: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance",
    author = "Zhu, Fengbin  and
      Lei, Wenqiang  and
      Huang, Youcheng  and
      Wang, Chao  and
      Zhang, Shuo  and
      Lv, Jiancheng  and
      Feng, Fuli  and
      Chua, Tat-Seng",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.254",
    doi = "10.18653/v1/2021.acl-long.254",
    pages = "3277--3287"
}

Any Questions?

For any issues, please create an issue here or email us at: Youcheng Huang [email protected] or Fengbin Zhu [email protected]. Thank you.