huggingface / Hmtl

License: MIT
🌊 HMTL: Hierarchical Multi-Task Learning - A State-of-the-Art neural network model for several NLP tasks based on PyTorch and AllenNLP

Programming Languages

Python
139,335 projects - #7 most used programming language

Projects that are alternatives to, or similar to, Hmtl

Pattern
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
Stars: ✭ 8,112 (+648.34%)
Mutual labels:  natural-language-processing
Notes
The notes for Math, Machine Learning, Deep Learning and Research papers.
Stars: ✭ 53 (-95.11%)
Mutual labels:  natural-language-processing
Emotion Detector
A python code to detect emotions from text
Stars: ✭ 54 (-95.02%)
Mutual labels:  natural-language-processing
Spark Nkp
Natural Korean Processor for Apache Spark
Stars: ✭ 50 (-95.39%)
Mutual labels:  natural-language-processing
Python Tutorial Notebooks
Python tutorials as Jupyter Notebooks for NLP, ML, AI
Stars: ✭ 52 (-95.2%)
Mutual labels:  natural-language-processing
Nltk Book Resource
Notes and solutions to complement the official NLTK book
Stars: ✭ 54 (-95.02%)
Mutual labels:  natural-language-processing
Cs224n Solutions
Solutions for CS224n course from Stanford University: Natural Language Processing with Deep Learning
Stars: ✭ 48 (-95.57%)
Mutual labels:  natural-language-processing
Demos
Some JavaScript works published as demos, mostly ML or DS
Stars: ✭ 55 (-94.93%)
Mutual labels:  natural-language-processing
Fasttext multilingual
Multilingual word vectors in 78 languages
Stars: ✭ 1,067 (-1.57%)
Mutual labels:  natural-language-processing
Scdv
Text classification with Sparse Composite Document Vectors.
Stars: ✭ 54 (-95.02%)
Mutual labels:  natural-language-processing
Lingua Franca
Mycroft's multilingual text parsing and formatting library
Stars: ✭ 51 (-95.3%)
Mutual labels:  natural-language-processing
Nlp Various Tutorials
μžμ—°μ–΄ μ²˜λ¦¬μ™€ κ΄€λ ¨ν•œ μ—¬λŸ¬ νŠœν† λ¦¬μ–Ό μ €μž₯μ†Œ
Stars: ✭ 52 (-95.2%)
Mutual labels:  natural-language-processing
Market Reporter
Automatic Generation of Brief Summaries of Time-Series Data
Stars: ✭ 54 (-95.02%)
Mutual labels:  natural-language-processing
Corenlp
Stanford CoreNLP: A Java suite of core NLP tools.
Stars: ✭ 8,248 (+660.89%)
Mutual labels:  natural-language-processing
Vietnamese Electra
Electra pre-trained model using Vietnamese corpus
Stars: ✭ 55 (-94.93%)
Mutual labels:  natural-language-processing
Convai Baseline
ConvAI baseline solution
Stars: ✭ 49 (-95.48%)
Mutual labels:  natural-language-processing
Thot
Thot toolkit for statistical machine translation
Stars: ✭ 53 (-95.11%)
Mutual labels:  natural-language-processing
Research papers
Record some papers I have read and paper notes I have taken, also including some awesome papers reading lists and academic blog posts.
Stars: ✭ 55 (-94.93%)
Mutual labels:  natural-language-processing
Coarij
Corpus of Annual Reports in Japan
Stars: ✭ 55 (-94.93%)
Mutual labels:  natural-language-processing
Jieba Php
"硐巴"δΈ­ζ–‡εˆ†θ©žοΌšεšζœ€ε₯½ηš„ PHP δΈ­ζ–‡εˆ†θ©žγ€δΈ­ζ–‡ζ–·θ©žη΅„δ»Άγ€‚ / "Jieba" (Chinese for "to stutter") Chinese text segmentation: built to be the best PHP Chinese word segmentation module.
Stars: ✭ 1,073 (-1.01%)
Mutual labels:  natural-language-processing

HMTL (Hierarchical Multi-Task Learning model)

***** New November 20th, 2018: Online web demo is available *****

We released an online demo (along with pre-trained weights) so that you can try out the model yourself. The code for the web interface is also available in the demo folder.

To download the pre-trained models, please install Git LFS and run a git lfs pull. The model weights will be saved in the model_dumps folder.
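For reference, the two commands are (assuming git-lfs is already installed on your system and you run them from the repository root):

git lfs install
git lfs pull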

A Hierarchical Multi-Task Approach for Learning Embeddings from Semantic Tasks
Victor Sanh, Thomas Wolf, Sebastian Ruder
Accepted at AAAI 2019

HMTL Architecture

About

HMTL is a Hierarchical Multi-Task Learning model that combines a set of four carefully selected semantic tasks (namely Named Entity Recognition, Entity Mention Detection, Relation Extraction and Coreference Resolution). The model achieves state-of-the-art results on Named Entity Recognition, Entity Mention Detection and Relation Extraction. Using SentEval, we show that as we move from the bottom to the top layers of the model, the model tends to learn more complex semantic representations.

For further details on the results, please refer to our paper.

We released the code for training, fine-tuning and evaluating HMTL. We hope that this code will be useful for building your own Multi-Task models (hierarchical or not). The code is written in Python and powered by PyTorch.

Dependencies and installation

The main dependencies are PyTorch and AllenNLP.

The code works with Python 3.6. A stable version of the dependencies is listed in requirements.txt.

You can quickly set up a working environment by calling the script ./script/machine_setup.sh. It installs Python 3.6, creates a clean virtual environment, and installs all the required dependencies (listed in requirements.txt). Please adapt the script to your needs.

Example usage

We based our implementation on the AllenNLP library. For an introduction to this library, you should check out these tutorials.

An experiment is defined in a JSON configuration file (see configs/*.json for examples). The configuration file mainly describes the datasets to load and the model to create, along with all the hyper-parameters of the model.

Once you have set up your configuration file (and defined custom classes, such as DatasetReaders, if needed), you can simply launch a training with the following command and arguments:

python train.py --config_file_path configs/hmtl_coref_conll.json --serialization_dir my_first_training
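If your data requires a custom reader, the sketch below shows the general shape of a registered AllenNLP DatasetReader, assuming the AllenNLP 0.x API this project builds on; the reader name my_task_reader, the tab-separated file format, and the chosen fields are all hypothetical illustrations, not part of HMTL itself.

from typing import Iterator
from allennlp.data import Instance
from allennlp.data.dataset_readers import DatasetReader
from allennlp.data.fields import TextField, LabelField
from allennlp.data.token_indexers import SingleIdTokenIndexer
from allennlp.data.tokenizers import Token

@DatasetReader.register("my_task_reader")  # hypothetical name, referenced from the JSON config
class MyTaskReader(DatasetReader):
    def __init__(self, lazy: bool = False) -> None:
        super().__init__(lazy)
        self._token_indexers = {"tokens": SingleIdTokenIndexer()}

    def text_to_instance(self, tokens, label=None) -> Instance:
        # Wrap raw strings into AllenNLP fields.
        fields = {"tokens": TextField([Token(t) for t in tokens], self._token_indexers)}
        if label is not None:
            fields["label"] = LabelField(label)
        return Instance(fields)

    def _read(self, file_path: str) -> Iterator[Instance]:
        # Hypothetical format: one "token token ...<TAB>label" example per line.
        with open(file_path) as f:
            for line in f:
                text, label = line.rstrip("\n").split("\t")
                yield self.text_to_instance(text.split(), label)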

Once the training has started, you can simply follow it in the terminal or open TensorBoard (please make sure you have installed TensorBoard and its TensorFlow dependency beforehand):

tensorboard --logdir my_first_training/log

Evaluating the embeddings with SentEval

We used SentEval to assess the linguistic properties learned by the model. hmtl_senteval.py gives an example of how we can create an interface between SentEval and HMTL. It evaluates the linguistic properties learned by every layer of the hierarchy (shared base word embeddings and encoders).
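For orientation, here is a minimal sketch of such an interface, assuming the standard facebookresearch/SentEval API (prepare/batcher callbacks). The embedding dimension, the random-embedding placeholder, and the path/to/senteval/data path are stand-ins; a real interface would return activations from one HMTL layer instead.

import numpy as np
import senteval

EMBED_DIM = 300  # assumption: dimension of the probed HMTL layer

def prepare(params, samples):
    # One-time, task-specific setup (e.g., building a vocabulary); nothing needed here.
    return

def batcher(params, batch):
    # batch is a list of tokenized sentences; must return a (batch_size, dim) array.
    # Placeholder: replace with activations extracted from one layer of HMTL.
    return np.random.randn(len(batch), EMBED_DIM)

params = {"task_path": "path/to/senteval/data", "usepytorch": True, "kfold": 5}
se = senteval.engine.SE(params, batcher, prepare)
results = se.eval(["Length", "WordContent", "BigramShift"])  # SentEval probing tasks
print(results)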

Data

To download the pre-trained embeddings we used in HMTL, you can simply launch the script ./script/data_setup.sh.

For licensing reasons, we did not attach the datasets used to train HMTL, but we invite you to collect them yourself: OntoNotes 5.0, CoNLL2003, and ACE2005. The configuration files expect the datasets to be placed in the data/ folder.

References

Please consider citing the following paper if you find this repository useful.

@article{sanh2018hmtl,
  title={A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks},
  author={Sanh, Victor and Wolf, Thomas and Ruder, Sebastian},
  journal={arXiv preprint arXiv:1811.06031},
  year={2018}
}