bnosac / golgotha

Licence: MPL-2.0
Contextualised Embeddings and Language Modelling using BERT and Friends in R

Programming Languages

R, Python

Projects that are alternatives of or similar to golgotha

COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (-38.46%)
Mutual labels:  transformers, transformer, bert
anonymisation
Anonymization of legal cases (Fr) based on Flair embeddings
Stars: ✭ 85 (+117.95%)
Mutual labels:  transformers, bert
text2keywords
Trained T5 and T5-large model for creating keywords from text
Stars: ✭ 53 (+35.9%)
Mutual labels:  transformers, transformer
bert-squeeze
🛠️ Tools for Transformers compression using PyTorch Lightning ⚡
Stars: ✭ 56 (+43.59%)
Mutual labels:  transformers, bert
NLP-paper
🎨 NLP natural language processing tutorial: https://dataxujing.github.io/NLP-paper/
Stars: ✭ 23 (-41.03%)
Mutual labels:  transformer, bert
Kevinpro-NLP-demo
All the NLP you need here. Personal implementations of some fun NLP demos, currently containing PyTorch implementations of 13 NLP applications.
Stars: ✭ 117 (+200%)
Mutual labels:  transformer, bert
label-studio-transformers
Label data using HuggingFace's transformers and automatically get a prediction service
Stars: ✭ 117 (+200%)
Mutual labels:  transformers, bert
tensorflow-ml-nlp-tf2
Hands-on materials for Natural Language Processing with TensorFlow 2 and Machine Learning (from logistic regression to BERT and GPT-3).
Stars: ✭ 245 (+528.21%)
Mutual labels:  transformer, bert
PDN
The official PyTorch implementation of "Pathfinder Discovery Networks for Neural Message Passing" (WebConf '21)
Stars: ✭ 44 (+12.82%)
Mutual labels:  transformer, bert
Xpersona
XPersona: Evaluating Multilingual Personalized Chatbot
Stars: ✭ 54 (+38.46%)
Mutual labels:  transformer, bert
ParsBigBird
Persian Bert For Long-Range Sequences
Stars: ✭ 58 (+48.72%)
Mutual labels:  transformers, bert
wechsel
Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.
Stars: ✭ 39 (+0%)
Mutual labels:  transformers, bert
sticker2
Further developed as SyntaxDot: https://github.com/tensordot/syntaxdot
Stars: ✭ 14 (-64.1%)
Mutual labels:  transformer, bert
Transformers-Tutorials
This repository contains demos I made with the Transformers library by HuggingFace.
Stars: ✭ 2,828 (+7151.28%)
Mutual labels:  transformers, bert
KitanaQA
KitanaQA: Adversarial training and data augmentation for neural question-answering models
Stars: ✭ 58 (+48.72%)
Mutual labels:  transformer, bert
Text-Summarization
Abstractive and Extractive Text summarization using Transformers.
Stars: ✭ 38 (-2.56%)
Mutual labels:  transformers, bert
backprop
Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.
Stars: ✭ 229 (+487.18%)
Mutual labels:  transformers, bert
FasterTransformer
Transformer related optimization, including BERT, GPT
Stars: ✭ 1,571 (+3928.21%)
Mutual labels:  transformer, bert
GoEmotions-pytorch
Pytorch Implementation of GoEmotions 😍😢😱
Stars: ✭ 95 (+143.59%)
Mutual labels:  transformers, bert
Pytorch-NLU
Pytorch-NLU, a Chinese text classification and sequence annotation toolkit. It supports multi-class and multi-label classification of Chinese long and short texts, and sequence annotation tasks such as Chinese named entity recognition, part-of-speech tagging and word segmentation.
Stars: ✭ 151 (+287.18%)
Mutual labels:  transformers, bert

golgotha - Contextualised Embeddings and Language Modelling using BERT and Friends in R

  • This R package wraps the transformers Python module using reticulate
  • The objective of the package is to easily obtain sentence embeddings from a BERT-like model in R, for use in downstream modelling (e.g. Support Vector Machines / Sentiment Labelling / Classification / Regression / POS tagging / Lemmatisation / Text Similarities)
  • Golgotha: hope for lonely AI pilgrims on their way to losing CPU power: http://costes.org/cdbm20.mp3

Installation

  • To install the development version of this package:
    • Execute in R: devtools::install_github("bnosac/golgotha", INSTALL_opts = "--no-multiarch")
    • Look at the documentation of the functions: help(package = "golgotha")

Example with BERT model architecture

  • Download a model (e.g. BERT multilingual uncased)
library(golgotha)
transformer_download_model("bert-base-multilingual-uncased")
  • Load the model and get the embedding of sentences / subword tokens or just tokenise
model <- transformer("bert-base-multilingual-uncased")
x <- data.frame(doc_id = c("doc_1", "doc_2"),
                text = c("give me back my money or i'll call the police.",
                         "talk to the hand because the face don't want to hear it any more."),
                stringsAsFactors = FALSE)
embedding <- predict(model, x, type = "embed-sentence")
embedding <- predict(model, x, type = "embed-token")
tokens    <- predict(model, x, type = "tokenise")
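  • To get a feel for what these calls return, inspect the objects (illustrative only; the exact structure depends on the golgotha version you installed)
## inspect what came back
str(embedding)
str(tokens)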
  • The same example, but now on Dutch / French text
text <- c("vlieg met me mee naar de horizon want ik hou alleen van jou",
          "l'amour n'est qu'un enfant de pute, il agite le bonheur mais il laisse le malheur",
          "http://costes.org/cdso01.mp3", 
          "http://costes.org/mp3.htm")
text <- setNames(text, c("doc_nl", "doc_fr", "le petit boudin", "thebible"))
embedding <- predict(model, text, type = "embed-sentence")
embedding <- predict(model, text, type = "embed-token")
tokens    <- predict(model, text, type = "tokenise")
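  • Sentence embeddings plug straight into the downstream tasks mentioned above, such as text similarities. A minimal sketch, assuming embed-sentence returns a numeric matrix with one row per document:
## re-extract the sentence embeddings and compare two documents with cosine similarity
## (assumes embedding is a numeric matrix with one row per document)
embedding <- predict(model, text, type = "embed-sentence")
cosine    <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
cosine(embedding[1, ], embedding[2, ])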

Example with DistilBERT model architecture

For any model architecture other than BERT, you have to provide the architecture argument, which must be one of the 10 supported model architectures.

  • Download a model (e.g. DistilBERT multilingual cased). By default it will be stored in the system.file(package = "golgotha", "models") folder. If you want to change this, use the path argument of transformer_download_model
transformer_download_model("distilbert-base-multilingual-cased", architecture = "DistilBERT")
  • Once downloaded, you can just load the model and start embedding your text
model <- transformer("distilbert-base-multilingual-cased", architecture = "DistilBERT")
x <- data.frame(doc_id = c("doc_1", "doc_2"),
                text = c("give me back my money or i'll call the police.",
                         "talk to the hand because the face don't want to hear it any more."),
                stringsAsFactors = FALSE)
embedding <- predict(model, x, type = "embed-sentence")
embedding <- predict(model, x, type = "embed-token")
tokens    <- predict(model, x, type = "tokenise")

Some other models available

The list is not exhaustive. Look at the transformers documentation for an up-to-date model list. Which models are available will also depend on the version of the transformers module you have installed.

model <- transformer("bert-base-multilingual-uncased")
model <- transformer("bert-base-multilingual-cased")
model <- transformer("bert-base-dutch-cased")
model <- transformer("bert-base-uncased")
model <- transformer("bert-base-cased")
model <- transformer("bert-base-chinese")
model <- transformer("distilbert-base-cased", architecture = "DistilBERT")
model <- transformer("distilbert-base-uncased-distilled-squad", architecture = "DistilBERT")
model <- transformer("distilbert-base-german-cased", architecture = "DistilBERT")
model <- transformer("distilbert-base-multilingual-cased", architecture = "DistilBERT")
model <- transformer("distilroberta-base", architecture = "DistilBERT")
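As noted above, which models you can load depends on the version of the transformers module that reticulate uses; a quick way to check that version (a minimal sketch relying only on reticulate's import):

## print the version of the Python transformers module seen by reticulate
transformers <- reticulate::import("transformers")
transformers$`__version__`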

Issues

  • This package requires the Python modules transformers and torch to be installed. Normally the R package reticulate automagically gets this done for you.
  • If your installation gets stuck somehow, you can normally install these requirements as follows.
library(reticulate)
## set up a miniconda environment for reticulate (only needed once)
install_miniconda()
## install torch and a pinned version of transformers into the r-reticulate environment
conda_install(envname = 'r-reticulate', c('torch', 'transformers==2.4.1'), pip = TRUE)
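
Afterwards you can check whether reticulate finds the Python modules (a quick sanity check):
## TRUE if reticulate can locate the installed Python modules
reticulate::py_module_available("torch")
reticulate::py_module_available("transformers")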

Continuous Integration

[Build Status badge]
