
outcastofmusic / Quick Nlp

License: MIT
PyTorch NLP library based on fastai

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to Quick NLP

classy
classy is a simple-to-use library for building high-performance Machine Learning models in NLP.
Stars: ✭ 61 (-78.14%)
Mutual labels:  seq2seq, nlp-library
Transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Stars: ✭ 55,742 (+19879.21%)
Mutual labels:  nlp-library, seq2seq
Nuts
A testing ground for solutions to common NLP tasks (mainly text classification, sequence labeling, question answering, etc.)
Stars: ✭ 21 (-92.47%)
Mutual labels:  seq2seq, nlp-library
DLCV2018SPRING
Deep Learning for Computer Vision (CommE 5052) in NTU
Stars: ✭ 38 (-86.38%)
Mutual labels:  seq2seq
neural-chat
An AI chatbot using seq2seq
Stars: ✭ 30 (-89.25%)
Mutual labels:  seq2seq
TaLKConvolutions
Official PyTorch implementation of Time-aware Large Kernel (TaLK) Convolutions (ICML 2020)
Stars: ✭ 26 (-90.68%)
Mutual labels:  seq2seq
Seq2seq chatbot links
Links to the implementations of neural conversational models for different frameworks
Stars: ✭ 270 (-3.23%)
Mutual labels:  seq2seq
dts
A Keras library for multi-step time-series forecasting.
Stars: ✭ 130 (-53.41%)
Mutual labels:  seq2seq
Nagisa
A Japanese tokenizer based on recurrent neural networks
Stars: ✭ 260 (-6.81%)
Mutual labels:  nlp-library
NeuralTextSimplification
Exploring Neural Text Simplification
Stars: ✭ 64 (-77.06%)
Mutual labels:  seq2seq
NLP-tools
Useful python NLP tools (evaluation, GUI interface, tokenization)
Stars: ✭ 39 (-86.02%)
Mutual labels:  nlp-library
chatbot
🤖️ A task-oriented chatbot built on PyTorch (supports private and Docker deployments)
Stars: ✭ 77 (-72.4%)
Mutual labels:  seq2seq
2D-LSTM-Seq2Seq
PyTorch implementation of a 2D-LSTM Seq2Seq Model for NMT.
Stars: ✭ 25 (-91.04%)
Mutual labels:  seq2seq
Deepqa
My tensorflow implementation of "A neural conversational model", a Deep learning based chatbot
Stars: ✭ 2,811 (+907.53%)
Mutual labels:  seq2seq
clj-duckling
Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings. (a duckling clojure fork)
Stars: ✭ 15 (-94.62%)
Mutual labels:  nlp-library
Keras Text Summarization
Text summarization using seq2seq in Keras
Stars: ✭ 260 (-6.81%)
Mutual labels:  seq2seq
Giveme5W
Extraction of the five journalistic W-questions (5W) from news articles
Stars: ✭ 16 (-94.27%)
Mutual labels:  nlp-library
torch-asg
Auto Segmentation Criterion (ASG) implemented in pytorch
Stars: ✭ 42 (-84.95%)
Mutual labels:  seq2seq
keras seq2seq word level
Implementation of seq2seq word-level model using keras
Stars: ✭ 12 (-95.7%)
Mutual labels:  seq2seq
Chatbot ner
chatbot_ner: Named Entity Recognition for chatbots.
Stars: ✭ 273 (-2.15%)
Mutual labels:  nlp-library

Quick NLP
=========


Quick NLP is a deep learning NLP library inspired by the `fast.ai library <https://github.com/fastai/fastai>`_.

It follows the same API as fastai and extends it, allowing for quick and easy training of NLP models.

Features
--------

Installation
------------

Installation of the fast.ai library is required. Please install it using the instructions `here <https://github.com/fastai/fastai>`_. It is important to use the latest version of fast.ai from the repository, not the pip version, which is not up to date.

After setting up an environment using the fast.ai instructions, please clone the quick-nlp repo and use pip to install the package as follows:

.. code-block:: bash

    git clone https://github.com/outcastofmusic/quick-nlp
    cd quick-nlp
    pip install .
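
To confirm the install, a quick sanity check is that the package imports cleanly (a minimal sketch; ``quicknlp`` is the import name used in the examples below):

.. code-block:: python

    # The package should import without errors after installation.
    import quicknlp
    print(quicknlp.__name__)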

Docker Image
------------


A Docker image with the latest master is available. To use it, please run:

.. code-block:: bash

    docker run --runtime nvidia -it -p 8888:8888 --mount type=bind,source="$(pwd)",target=/workspace agispof/quicknlp:latest

This will mount your current directory to ``/workspace`` and start a Jupyter Lab session in that directory.

Usage Example
-------------

The main goal of quick-nlp is to provide the easy interface of the fast.ai library for seq2seq models.

For example, let's assume we have a ``dataset_path`` directory with folders for training and validation files.
Each file is a tsv file where each row contains two sentences separated by a tab. For example, a file inside the train folder could be an ``eng_to_fr.tsv`` file with the following first few lines::

    Go.	Va !
    Run!	Cours !
    Run!	Courez !
    Wow!	Ça alors !
    Fire!	Au feu !
    Help!	À l'aide !
    Jump.	Saute.
    Stop!	Ça suffit !
    Stop!	Stop !
    Stop!	Arrête-toi !
    Wait!	Attends !
    Wait!	Attendez !
    I see.	Je comprends.
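
As a quick format check, each row should split into exactly two fields on a tab. This is a minimal sketch; the file path is illustrative, matching the example above:

.. code-block:: python

    # Peek at the first rows and confirm the two-column, tab-separated layout.
    # The path is illustrative (the eng_to_fr.tsv example from above).
    with open("dataset_path/train/eng_to_fr.tsv", encoding="utf-8") as f:
        for _, line in zip(range(3), f):
            source, target = line.rstrip("\n").split("\t")
            print(source, "->", target)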


Loading the data from the directory is as simple as:

.. code-block:: python

    from torchtext.data import Field
    from fastai.core import SGD_Momentum
    from fastai.lm_rnn import seq2seq_reg
    from quicknlp import SpacyTokenizer, S2SModelData

    INIT_TOKEN = "<sos>"
    EOS_TOKEN = "<eos>"
    DATAPATH = "dataset_path"
    # One field per tsv column: spacy tokenization, lower-casing, and
    # explicit start/end-of-sentence tokens for both languages.
    fields = [
        ("english", Field(init_token=INIT_TOKEN, eos_token=EOS_TOKEN, tokenize=SpacyTokenizer('en'), lower=True)),
        ("french", Field(init_token=INIT_TOKEN, eos_token=EOS_TOKEN, tokenize=SpacyTokenizer('fr'), lower=True))
    ]
    batch_size = 64
    # Build the seq2seq ModelData object from the train/validation folders.
    data = S2SModelData.from_text_files(path=DATAPATH, fields=fields,
                                        train="train",
                                        validation="validation",
                                        source_names=["english", "french"],
                                        target_names=["french"],
                                        bs=batch_size
                                        )
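
Since ``S2SModelData`` follows fastai's ``ModelData`` conventions, the loaded datasets can be inspected before training. This is a minimal sketch; the ``trn_ds``/``val_ds`` attribute names are an assumption based on the fastai 0.7 API:

.. code-block:: python

    # Assumption: S2SModelData exposes fastai's usual ModelData attributes
    # (trn_ds / val_ds). Print dataset sizes as a quick sanity check.
    print(len(data.trn_ds), len(data.val_ds))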


Finally, to train a seq2seq model with the data, we only need to do:

.. code-block:: python

    emb_size = 300  # embedding size
    nh = 1024       # number of hidden units
    nl = 3          # number of layers
    learner = data.get_model(opt_fn=SGD_Momentum(0.7), emb_sz=emb_size,
                             nhid=nh,
                             nlayers=nl,
                             bidir=True,
                             )
    clip = 0.3
    learner.reg_fn = seq2seq_reg  # regularization function imported above
    learner.clip = clip           # clip gradients to stabilize training
    learner.fit(2.0, wds=1e-6)
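
Before settling on a learning rate, fastai's learning-rate finder can help choose one. This is a sketch assuming the standard fastai 0.7 ``Learner`` API, which quick-nlp learners build on (an assumption here, not something the example above relies on):

.. code-block:: python

    # Assumption: the learner inherits fastai 0.7's lr_find/sched API.
    learner.lr_find()     # sweep learning rates over the training data
    learner.sched.plot()  # plot loss vs. learning rate to pick a value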
