chengchingwen / Transformers.jl

License: MIT
Julia Implementation of Transformer models

Programming Languages

julia

Projects that are alternatives of or similar to Transformers.jl

Nlp Tutorial
Natural Language Processing Tutorial for Deep Learning Researchers
Stars: ✭ 9,895 (+5619.65%)
Mutual labels:  natural-language-processing, attention, transformer
Gpt2
PyTorch Implementation of OpenAI GPT-2
Stars: ✭ 64 (-63.01%)
Mutual labels:  natural-language-processing, transformer
Multimodal Sentiment Analysis
Attention-based multimodal fusion for sentiment analysis
Stars: ✭ 172 (-0.58%)
Mutual labels:  natural-language-processing, attention
Multimodal Toolkit
Multimodal model for text and tabular data with HuggingFace transformers as building block for text data
Stars: ✭ 78 (-54.91%)
Mutual labels:  natural-language-processing, transformer
Cell Detr
Official and maintained implementation of the paper Attention-Based Transformers for Instance Segmentation of Cells in Microstructures [BIBM 2020].
Stars: ✭ 26 (-84.97%)
Mutual labels:  attention, transformer
Vietnamese Electra
Electra pre-trained model using Vietnamese corpus
Stars: ✭ 55 (-68.21%)
Mutual labels:  natural-language-processing, transformer
Absa Pytorch
Aspect Based Sentiment Analysis, PyTorch implementations.
Stars: ✭ 1,181 (+582.66%)
Mutual labels:  natural-language-processing, attention
Speech Transformer
A PyTorch implementation of Speech Transformer, an End-to-End ASR with Transformer network on Mandarin Chinese.
Stars: ✭ 565 (+226.59%)
Mutual labels:  attention, transformer
Multiturndialogzoo
Multi-turn dialogue baselines written in PyTorch
Stars: ✭ 106 (-38.73%)
Mutual labels:  attention, transformer
Transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Stars: ✭ 55,742 (+32120.81%)
Mutual labels:  natural-language-processing, transformer
Bertqa Attention On Steroids
BertQA - Attention on Steroids
Stars: ✭ 112 (-35.26%)
Mutual labels:  attention, transformer
Awesome Fast Attention
list of efficient attention modules
Stars: ✭ 627 (+262.43%)
Mutual labels:  attention, transformer
Attention Is All You Need Pytorch
A PyTorch implementation of the Transformer model in "Attention is All You Need".
Stars: ✭ 6,070 (+3408.67%)
Mutual labels:  natural-language-processing, attention
Deeplearning Nlp Models
A small, interpretable codebase containing the re-implementation of a few "deep" NLP models in PyTorch. Colab notebooks to run with GPUs. Models: word2vec, CNNs, transformer, gpt.
Stars: ✭ 64 (-63.01%)
Mutual labels:  attention, transformer
Awesome Bert Nlp
A curated list of NLP resources focused on BERT, attention mechanism, Transformer networks, and transfer learning.
Stars: ✭ 567 (+227.75%)
Mutual labels:  natural-language-processing, transformer
Multihead Siamese Nets
Implementation of Siamese Neural Networks built upon multihead attention mechanism for text semantic similarity task.
Stars: ✭ 144 (-16.76%)
Mutual labels:  natural-language-processing, attention
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (+135.84%)
Mutual labels:  attention, transformer
Pytorch Original Transformer
My implementation of the original transformer model (Vaswani et al.). I've additionally included the playground.py file for visualizing otherwise seemingly hard concepts. Currently includes IWSLT pretrained models.
Stars: ✭ 411 (+137.57%)
Mutual labels:  attention, transformer
Njunmt Tf
An open-source neural machine translation system developed by Natural Language Processing Group, Nanjing University.
Stars: ✭ 97 (-43.93%)
Mutual labels:  attention, transformer
Sightseq
Computer vision tools for fairseq, containing PyTorch implementation of text recognition and object detection
Stars: ✭ 116 (-32.95%)
Mutual labels:  attention, transformer
Transformers.jl

Julia implementation of transformer-based models, with Flux.jl.

Installation

In the Julia REPL:

]add Transformers

To use the GPU, install and build CUDA:

]add CUDA

]build 

julia> using CUDA

julia> using Transformers

# run the model below
...
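As a quick, hedged illustration of what "run the model below" involves on the GPU (not part of the original instructions): the sketch builds a toy Transformer layer from Transformers.Basic and moves it to the GPU with Flux's gpu helper. The constructor arguments (model size, number of heads, per-head size, feed-forward size) follow the v0.1 Basic API and may differ between versions.

using CUDA
using Flux
using Transformers
using Transformers.Basic

# toy encoder block: model size 512, 8 heads, 64 per head, 2048 feed-forward
layer = Transformer(512, 8, 64, 2048) |> gpu

# a random (hidden size × sequence length) input, also moved to the GPU
x = gpu(randn(Float32, 512, 10))

h = layer(x)   # runs on the GPU when CUDA is functional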

Example

Using pretrained Bert with Transformers.jl.

using Transformers
using Transformers.Basic
using Transformers.Pretrain

ENV["DATADEPS_ALWAYS_ACCEPT"] = true

bert_model, wordpiece, tokenizer = pretrain"bert-uncased_L-12_H-768_A-12"
vocab = Vocabulary(wordpiece)

text1 = "Peter Piper picked a peck of pickled peppers" |> tokenizer |> wordpiece
text2 = "Fuzzy Wuzzy was a bear" |> tokenizer |> wordpiece

text = ["[CLS]"; text1; "[SEP]"; text2; "[SEP]"]
@assert text == [
    "[CLS]", "peter", "piper", "picked", "a", "peck", "of", "pick", "##led", "peppers", "[SEP]", 
    "fuzzy", "wu", "##zzy",  "was", "a", "bear", "[SEP]"
]

token_indices = vocab(text)
segment_indices = [fill(1, length(text1)+2); fill(2, length(text2)+1)]

sample = (tok = token_indices, segment = segment_indices)

bert_embedding = sample |> bert_model.embed
feature_tensors = bert_embedding |> bert_model.transformers
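As a hedged follow-up (not part of the original example): for this single sample the result is a matrix with one hidden vector per token, so the first column, corresponding to "[CLS]", can serve as a sentence-level feature.

size(feature_tensors)              # (768, 18) for the 18 tokens above, assuming hidden size 768
cls_vector = feature_tensors[:, 1] # hidden vector of the "[CLS]" token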

See the example folder for the complete example.

Huggingface

We have some support for the models from huggingface/transformers.

using Transformers.HuggingFace

# loading a model from huggingface model hub
julia> model = hgf"bert-base-cased:forquestionanswering";
┌ Warning: Transformers.HuggingFace.HGFBertForQuestionAnswering doesn't have field cls.
└ @ Transformers.HuggingFace ~/peter/repo/gsoc2020/src/huggingface/models/models.jl:46
┌ Warning: Some fields of Transformers.HuggingFace.HGFBertForQuestionAnswering aren't initialized with loaded state: qa_outputs
└ @ Transformers.HuggingFace ~/peter/repo/gsoc2020/src/huggingface/models/models.jl:52

Currently we only support a few models, and the tokenizer part is not finished yet.
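As a hedged sketch (not guaranteed for every release), the same hgf"model-name:item" string macro is used to load other items from the model hub; which items and task heads are available depends on the package version.

using Transformers.HuggingFace

# assumed items: :config for the model configuration, :formaskedlm for the MLM head
cfg = hgf"bert-base-cased:config"
mlm = hgf"bert-base-cased:formaskedlm"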

For more information

If you want to know more about this package, see the documentation and the series of blog posts I wrote for JSoC and GSoC. You can also tag me (@chengchingwen) on Julia's Slack or Discourse if you have any questions, or just create a new issue on GitHub.

Roadmap

What we have before v0.2

  • Transformer and TransformerDecoder support for both 2d & 3d data (see the sketch after this list).
  • PositionEmbedding implementation.
  • Positionwise for handling 2d & 3d input.
  • docstrings for most of the functions.
  • runnable examples (see the example folder)
  • Transformers.HuggingFace for handling pretrained models from huggingface/transformers
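A minimal, hedged sketch of how these pieces compose. The constructor arguments and the 1000-word toy vocabulary are illustrative assumptions, not code from this repository, and follow the v0.1 Transformers.Basic API.

using Flux
using Transformers
using Transformers.Basic

vocab_size = 1000                         # toy vocabulary size for the sketch

embed  = Embed(512, vocab_size)           # token embedding
pe     = PositionEmbedding(512)           # position embedding
enc    = Transformer(512, 8, 64, 2048)    # one encoder block
dec    = TransformerDecoder(512, 8, 64, 2048)              # one decoder block
output = Positionwise(Dense(512, vocab_size), logsoftmax)  # applied position-wise over 2d/3d input

# encode a matrix (or 3d batch) of token indices
function encode(src)
    e = embed(src)
    enc(e .+ pe(e))
end

# decode target indices against the encoder memory
function decode(tgt, memory)
    e = embed(tgt)
    output(dec(e .+ pe(e), memory))
end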

What we will have in v0.2.0

  • Complete tokenizer APIs
  • tutorials
  • benchmarks
  • more examples