bnosac / golgotha

Licence: MPL-2.0
Contextualised Embeddings and Language Modelling using BERT and Friends in R

Programming Languages

R, Python

Projects that are alternatives of or similar to golgotha

COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (-38.46%)
Mutual labels:  transformers, transformer, bert
anonymisation
Anonymization of legal cases (Fr) based on Flair embeddings
Stars: ✭ 85 (+117.95%)
Mutual labels:  transformers, bert
text2keywords
Trained T5 and T5-large model for creating keywords from text
Stars: ✭ 53 (+35.9%)
Mutual labels:  transformers, transformer
bert-squeeze
🛠️ Tools for Transformers compression using PyTorch Lightning ⚡
Stars: ✭ 56 (+43.59%)
Mutual labels:  transformers, bert
NLP-paper
🎨 NLP natural language processing tutorial: https://dataxujing.github.io/NLP-paper/
Stars: ✭ 23 (-41.03%)
Mutual labels:  transformer, bert
Kevinpro-NLP-demo
All the NLP you need here. Personal implementations of some fun NLP demos, currently containing PyTorch implementations of 13 NLP applications.
Stars: ✭ 117 (+200%)
Mutual labels:  transformer, bert
label-studio-transformers
Label data using HuggingFace's transformers and automatically get a prediction service
Stars: ✭ 117 (+200%)
Mutual labels:  transformers, bert
tensorflow-ml-nlp-tf2
Hands-on materials for Natural Language Processing with TensorFlow 2 and Machine Learning (from logistic regression to BERT and GPT-3).
Stars: ✭ 245 (+528.21%)
Mutual labels:  transformer, bert
PDN
The official PyTorch implementation of "Pathfinder Discovery Networks for Neural Message Passing" (WebConf '21)
Stars: ✭ 44 (+12.82%)
Mutual labels:  transformer, bert
Xpersona
XPersona: Evaluating Multilingual Personalized Chatbot
Stars: ✭ 54 (+38.46%)
Mutual labels:  transformer, bert
ParsBigBird
Persian Bert For Long-Range Sequences
Stars: ✭ 58 (+48.72%)
Mutual labels:  transformers, bert
wechsel
Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.
Stars: ✭ 39 (+0%)
Mutual labels:  transformers, bert
sticker2
Further developed as SyntaxDot: https://github.com/tensordot/syntaxdot
Stars: ✭ 14 (-64.1%)
Mutual labels:  transformer, bert
Transformers-Tutorials
This repository contains demos I made with the Transformers library by HuggingFace.
Stars: ✭ 2,828 (+7151.28%)
Mutual labels:  transformers, bert
KitanaQA
KitanaQA: Adversarial training and data augmentation for neural question-answering models
Stars: ✭ 58 (+48.72%)
Mutual labels:  transformer, bert
Text-Summarization
Abstractive and Extractive Text summarization using Transformers.
Stars: ✭ 38 (-2.56%)
Mutual labels:  transformers, bert
backprop
Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.
Stars: ✭ 229 (+487.18%)
Mutual labels:  transformers, bert
FasterTransformer
Transformer related optimization, including BERT, GPT
Stars: ✭ 1,571 (+3928.21%)
Mutual labels:  transformer, bert
GoEmotions-pytorch
Pytorch Implementation of GoEmotions 😍😢😱
Stars: ✭ 95 (+143.59%)
Mutual labels:  transformers, bert
Pytorch-NLU
Pytorch-NLU, a Chinese text classification and sequence annotation toolkit. It supports multi-class and multi-label classification of Chinese long and short texts, and sequence annotation tasks such as Chinese named entity recognition, part-of-speech tagging and word segmentation.
Stars: ✭ 151 (+287.18%)
Mutual labels:  transformers, bert

golgotha - Contextualised Embeddings and Language Modelling using BERT and Friends in R

  • This R package wraps the transformers Python module using reticulate
  • The objective of the package is to easily obtain sentence embeddings from a BERT-like model in R, for use in downstream modelling (e.g. Support Vector Machines / Sentiment Labelling / Classification / Regression / POS tagging / Lemmatisation / Text Similarities)
  • Golgotha: hope for lonely AI pilgrims on their way to losing CPU power: http://costes.org/cdbm20.mp3

Installation

  • To install the development version of this package:
    • Execute in R: devtools::install_github("bnosac/golgotha", INSTALL_opts = "--no-multiarch")
    • Look at the documentation of the functions: help(package = "golgotha")

Example with BERT model architecture

  • Download a model (e.g. BERT multilingual uncased)
library(golgotha)
transformer_download_model("bert-base-multilingual-uncased")
  • Load the model and get the embedding of sentences / subword tokens or just tokenise
model <- transformer("bert-base-multilingual-uncased")
x <- data.frame(doc_id = c("doc_1", "doc_2"),
                text = c("give me back my money or i'll call the police.",
                         "talk to the hand because the face don't want to hear it any more."),
                stringsAsFactors = FALSE)
embedding <- predict(model, x, type = "embed-sentence")
embedding <- predict(model, x, type = "embed-token")
tokens    <- predict(model, x, type = "tokenise")
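  • To get a feel for what these calls return, inspect the objects (illustrative only; the exact structure depends on the golgotha version you installed)
## inspect what came back
str(embedding)
str(tokens)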
  • The same example, but now on Dutch / French text
text <- c("vlieg met me mee naar de horizon want ik hou alleen van jou",
          "l'amour n'est qu'un enfant de pute, il agite le bonheur mais il laisse le malheur",
          "http://costes.org/cdso01.mp3", 
          "http://costes.org/mp3.htm")
text <- setNames(text, c("doc_nl", "doc_fr", "le petit boudin", "thebible"))
embedding <- predict(model, text, type = "embed-sentence")
embedding <- predict(model, text, type = "embed-token")
tokens    <- predict(model, text, type = "tokenise")
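  • Sentence embeddings plug straight into the downstream tasks mentioned above, such as text similarities. A minimal sketch, assuming embed-sentence returns a numeric matrix with one row per document:
## re-extract the sentence embeddings and compare two documents with cosine similarity
## (assumes embedding is a numeric matrix with one row per document)
embedding <- predict(model, text, type = "embed-sentence")
cosine    <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
cosine(embedding[1, ], embedding[2, ])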

Example with DistilBERT model architecture

For any model architecture other than BERT, you have to provide the architecture argument, which must be one of the 10 supported model architectures.

  • Download a model (e.g. DistilBERT multilingual cased). By default it will be stored in the system.file(package = "golgotha", "models") folder. If you want to change this, use the path argument of transformer_download_model
transformer_download_model("distilbert-base-multilingual-cased", architecture = "DistilBERT")
  • Once downloaded, you can just load the model and start embedding your text
model <- transformer("distilbert-base-multilingual-cased", architecture = "DistilBERT")
x <- data.frame(doc_id = c("doc_1", "doc_2"),
                text = c("give me back my money or i'll call the police.",
                         "talk to the hand because the face don't want to hear it any more."),
                stringsAsFactors = FALSE)
embedding <- predict(model, x, type = "embed-sentence")
embedding <- predict(model, x, type = "embed-token")
tokens    <- predict(model, x, type = "tokenise")

Some other models available

The list is not exhaustive. Look at the transformers documentation for an up-to-date model list. Which models are available will also depend on the version of the transformers module you have installed.

model <- transformer("bert-base-multilingual-uncased")
model <- transformer("bert-base-multilingual-cased")
model <- transformer("bert-base-dutch-cased")
model <- transformer("bert-base-uncased")
model <- transformer("bert-base-cased")
model <- transformer("bert-base-chinese")
model <- transformer("distilbert-base-cased", architecture = "DistilBERT")
model <- transformer("distilbert-base-uncased-distilled-squad", architecture = "DistilBERT")
model <- transformer("distilbert-base-german-cased", architecture = "DistilBERT")
model <- transformer("distilbert-base-multilingual-cased", architecture = "DistilBERT")
model <- transformer("distilroberta-base", architecture = "DistilBERT")
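As noted above, which models you can load depends on the version of the transformers module that reticulate uses; a quick way to check that version (a minimal sketch relying only on reticulate's import):

## print the version of the Python transformers module seen by reticulate
transformers <- reticulate::import("transformers")
transformers$`__version__`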

Issues

  • This package requires the Python modules transformers and torch to be installed. Normally the R package reticulate automagically gets this done for you.
  • If your installation gets stuck somehow, you can normally install these requirements as follows.
library(reticulate)
## set up a miniconda environment for reticulate (only needed once)
install_miniconda()
## install torch and a pinned version of transformers into the r-reticulate environment
conda_install(envname = 'r-reticulate', c('torch', 'transformers==2.4.1'), pip = TRUE)
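
Afterwards you can check whether reticulate finds the Python modules (a quick sanity check):
## TRUE if reticulate can locate the installed Python modules
reticulate::py_module_available("torch")
reticulate::py_module_available("transformers")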

Continuous Integration

[Build Status badge]
