recognai / Get_started_with_deep_learning_for_text_with_allennlp

Getting started with AllenNLP and PyTorch by training a tweet classifier

Introduction

This repository contains code and experiments using PyTorch, AllenNLP and spaCy. It is intended as a learning resource for getting started with these libraries and with deep learning for NLP.

In particular, it contains:

  1. Custom modules for defining a SequenceClassifier and its Predictor.
  2. A basic custom DataReader for reading CSV files.
  3. An experiments folder containing several experiment JSON files to show how to define a baseline and refine it with more sophisticated approaches.
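As a rough illustration of item 2, the sketch below reads labeled tweets from CSV text using only the Python standard library. It is not the repository's DataReader and does not use the AllenNLP API; the `TweetExample` type, the column names `text` and `label`, and the sample rows and labels are all assumptions made for the example.

```python
import csv
import io
from typing import Iterator, NamedTuple


class TweetExample(NamedTuple):
    """One labeled tweet (hypothetical structure, for illustration only)."""
    text: str
    label: str


def read_tweets(csv_file, text_column="text", label_column="label") -> Iterator[TweetExample]:
    """Yield one TweetExample per CSV row, skipping rows with empty text."""
    reader = csv.DictReader(csv_file)
    for row in reader:
        text = (row.get(text_column) or "").strip()
        if not text:
            continue
        label = (row.get(label_column) or "").strip()
        yield TweetExample(text=text, label=label)


# Made-up sample data standing in for a real CSV file.
sample = io.StringIO(
    "text,label\n"
    "\"Hoy hay debate electoral, no te lo pierdas\",campaign\n"
    ",empty\n"
    "Nueva propuesta sobre educación,policy\n"
)
examples = list(read_tweets(sample))
```

A real reader would additionally tokenize each tweet and wrap it in whatever instance type the training framework expects.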

The overall goal is to classify Spanish tweets from the COSET challenge dataset, a collection of tweets about a recent Spanish election. The winning approach of the challenge is described in the following paper: http://ceur-ws.org/Vol-1881/COSET_paper_7.pdf.

Setup

Use a virtual environment, Conda for example:

conda create -n allennlp_spacy
source activate allennlp_spacy

Install PyTorch for your platform (the wheel below targets Python 3.6 on macOS; pick the one that matches your system):

pip install http://download.pytorch.org/whl/torch-0.2.0.post3-cp36-cp36m-macosx_10_7_x86_64.whl

Install the spaCy Spanish model:

python -m spacy download es

Install AllenNLP and other dependencies:

pip install -r requirements.txt

Install the custom module for running AllenNLP commands with custom models:

python setup.py develop

Install TensorBoard:

pip install tensorboard

Download and prepare the pre-trained word vectors from the fastText project:

download_prepare_fasttext.sh

Goals

  1. Understand the basic components of AllenNLP and PyTorch.

  2. Understand how to configure AllenNLP to use spaCy models in different languages, in this case the Spanish model.

  3. Understand how to create and connect custom models using AllenNLP by extending its command-line interface.

  4. Design and compare several experiments on a simple tweet classification task in Spanish. Start by defining a simple baseline and progressively use more complex models.

  5. Use TensorBoard for monitoring the experiments.

  6. Compare your results with existing literature (i.e., results of the COSET tweet classification challenge).

  7. Learn how to prepare and use external pre-trained word embeddings, in this case fastText's Wikipedia-based word vectors.

Exercises

Inspecting Seq2VecEncoders and understanding the basic building blocks of AllenNLP:

Check the basic structure of these modules in AllenNLP.

Defining and running our baseline:

In the folder experiments/definitions/ you can find the definition of our baseline, using a BagOfEmbeddingsEncoder.
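To make the baseline concrete: a bag-of-embeddings encoder collapses a sequence of token vectors into a single fixed-size vector by summing or averaging them, ignoring word order. The following is a framework-free sketch of that idea in plain Python, not AllenNLP's implementation; the function name `bag_of_embeddings` and the toy vectors are invented for illustration.

```python
from typing import List, Sequence


def bag_of_embeddings(token_vectors: Sequence[Sequence[float]], averaged: bool = True) -> List[float]:
    """Combine per-token vectors into one fixed-size vector (order-insensitive)."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    summed = [0.0] * dim
    for vector in token_vectors:
        if len(vector) != dim:
            raise ValueError("all token vectors must have the same dimension")
        for i, value in enumerate(vector):
            summed[i] += value
    if averaged:
        return [value / len(token_vectors) for value in summed]
    return summed


# Three toy 2-dimensional token embeddings (made-up numbers).
tweet = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
encoded = bag_of_embeddings(tweet)
```

Because the result does not depend on token order, this kind of encoder is cheap and makes a reasonable reference point before trying order-aware encoders.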

Run the experiment using:

python -m recognai.run train experiments/definitions/baseline_boe_classifier.json -s experiments/output/baseline

Monitor your experiments using TensorBoard:

You can monitor your experiments by running TensorBoard and pointing it to the experiments output folder:

tensorboard --logdir=experiments/output

Defining and running a CNN classifier:

In the folder experiments/definitions/ you can find the definition of a CNN classifier. As you can see, the only change with respect to the baseline is the encoder, which is now configured as a CNN.
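The idea behind a CNN encoder for text can be sketched as follows: each filter slides over windows of consecutive token vectors and produces one activation per window, and max-pooling over positions keeps the strongest response, so the output has one value per filter regardless of tweet length. The plain-Python sketch below assumes a ReLU activation and no bias term; it is not the AllenNLP CNN encoder, and the function name and all numbers are invented.

```python
from typing import List, Sequence

Vector = Sequence[float]


def conv_max_pool(tokens: Sequence[Vector], filters: Sequence[Sequence[Vector]]) -> List[float]:
    """Apply each n-gram filter over the token sequence and max-pool over positions.

    Each filter is a list of `width` weight vectors, one per token in the window.
    """
    outputs = []
    for kernel in filters:
        width = len(kernel)
        if len(tokens) < width:
            raise ValueError("sequence shorter than filter width")
        activations = []
        for start in range(len(tokens) - width + 1):
            window = tokens[start:start + width]
            # Dot product between the window and the filter weights.
            score = sum(
                w * x
                for weights, token in zip(kernel, window)
                for w, x in zip(weights, token)
            )
            activations.append(max(0.0, score))  # ReLU (assumed activation)
        outputs.append(max(activations))  # max-over-time pooling
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [
    [[1.0, 0.0], [0.0, 1.0]],      # bigram filter: responds to "dim 0, then dim 1"
    [[-1.0, -1.0], [-1.0, -1.0]],  # negative on this input, so ReLU clips it to 0
]
encoded = conv_max_pool(tokens, filters)
```

Unlike a bag of embeddings, each filter here is sensitive to the order of tokens within its window.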

Run the experiment using:

python -m recognai.run train experiments/definitions/cnn_classifier.json -s experiments/output/cnn

Using pre-trained word embeddings:

Facebook's fastText team has made pre-trained word embeddings available for 294 languages (see https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md). Using the download_prepare_fasttext.sh script, you can download the Spanish vectors and use them as pre-trained weights in either of the models.
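For orientation, fastText's text (`.vec`) format starts with a header line giving the vocabulary size and the vector dimension, followed by one line per word with its space-separated values. The sketch below parses that layout from any iterable of lines. It is independent of whatever download_prepare_fasttext.sh actually does; the `load_vec` helper is hypothetical and the two sample vectors are invented.

```python
from typing import Dict, Iterable, List, Optional, Set


def load_vec(lines: Iterable[str], vocabulary: Optional[Set[str]] = None) -> Dict[str, List[float]]:
    """Parse fastText .vec text, optionally keeping only words in `vocabulary`."""
    iterator = iter(lines)
    header = next(iterator).split()  # "<number of words> <dimension>"
    dim = int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line in iterator:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            continue  # skip malformed lines
        if vocabulary is not None and word not in vocabulary:
            continue
        vectors[word] = [float(v) for v in values]
    return vectors


# Invented sample in .vec layout: 2 words, 3 dimensions.
sample = [
    "2 3\n",
    "elecciones 0.1 -0.2 0.3\n",
    "debate 0.5 0.0 -0.4\n",
]
vectors = load_vec(sample, vocabulary={"debate"})
```

Restricting the load to the words that occur in the training data keeps memory use manageable, since the full vector files are large.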

To use pre-trained embeddings and keep their weights fixed during training, run the experiment using:

python -m recognai.run train experiments/definitions/cnn_classifier_fasttext_embeddings_fixed.json -s experiments/output/cnn_embeddings_fixed

Or use pre-trained embeddings and let the network tune their weights, using:

python -m recognai.run train experiments/definitions/cnn_classifier_fasttext_embeddings_tunable.json -s experiments/output/cnn_embeddings_tuned
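To run the four experiments above back to back and compare them in TensorBoard, the commands can be generated from a small table. The sketch below only builds the argument lists from the definition and output names given in this README; the `build_command` helper is hypothetical, and the actual launch is left commented out.

```python
from typing import Dict, List

# Definition file name (without .json) -> output folder name, as used above.
EXPERIMENTS: Dict[str, str] = {
    "baseline_boe_classifier": "baseline",
    "cnn_classifier": "cnn",
    "cnn_classifier_fasttext_embeddings_fixed": "cnn_embeddings_fixed",
    "cnn_classifier_fasttext_embeddings_tunable": "cnn_embeddings_tuned",
}


def build_command(definition: str, output: str) -> List[str]:
    """Return the training command for one experiment as an argument list."""
    return [
        "python", "-m", "recognai.run", "train",
        f"experiments/definitions/{definition}.json",
        "-s", f"experiments/output/{output}",
    ]


commands = [build_command(d, o) for d, o in EXPERIMENTS.items()]
for command in commands:
    print(" ".join(command))
    # To actually train, uncomment:
    # import subprocess; subprocess.run(command, check=True)
```

Since every run writes to its own folder under experiments/output, pointing TensorBoard at that parent folder shows all four runs side by side.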

Extra:
