
OscarKjell / text

Licence: other
Using Transformers from HuggingFace in R

Programming Languages

r
7636 projects
python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to text

Dalle Pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
Stars: ✭ 3,661 (+5446.97%)
Mutual labels:  transformers
Transformer-Implementations
Library - Vanilla, ViT, DeiT, BERT, GPT
Stars: ✭ 34 (-48.48%)
Mutual labels:  transformers
naru
Neural Relation Understanding: neural cardinality estimators for tabular data
Stars: ✭ 76 (+15.15%)
Mutual labels:  transformers
Pytorch Sentiment Analysis
Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
Stars: ✭ 3,209 (+4762.12%)
Mutual labels:  transformers
KoBERT-Transformers
KoBERT on 🤗 Huggingface Transformers 🤗 (with Bug Fixed)
Stars: ✭ 162 (+145.45%)
Mutual labels:  transformers
SnowflakeNet
(TPAMI 2022) Snowflake Point Deconvolution for Point Cloud Completion and Generation with Skip-Transformer
Stars: ✭ 74 (+12.12%)
Mutual labels:  transformers
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+3715.15%)
Mutual labels:  transformers
jax-models
Unofficial JAX implementations of deep learning research papers
Stars: ✭ 108 (+63.64%)
Mutual labels:  transformers
Fengshenbang-LM
Fengshenbang-LM (封神榜大模型) is an open-source ecosystem of large models led by the Cognitive Computing and Natural Language Research Center at the IDEA Institute, serving as infrastructure for Chinese AIGC and cognitive intelligence.
Stars: ✭ 1,813 (+2646.97%)
Mutual labels:  transformers
KB-ALBERT
A Korean ALBERT model specialized for the economics/finance domain, provided by KB Kookmin Bank
Stars: ✭ 215 (+225.76%)
Mutual labels:  transformers
Nlp Architect
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks
Stars: ✭ 2,768 (+4093.94%)
Mutual labels:  transformers
COCO-LM
[NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
Stars: ✭ 109 (+65.15%)
Mutual labels:  transformers
thermostat
Collection of NLP model explanations and accompanying analysis tools
Stars: ✭ 126 (+90.91%)
Mutual labels:  transformers
Nn
🧑‍🏫 50! Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
Stars: ✭ 5,720 (+8566.67%)
Mutual labels:  transformers
STAM-pytorch
Implementation of STAM (Space Time Attention Model), a pure and simple attention model that reaches SOTA for video classification
Stars: ✭ 109 (+65.15%)
Mutual labels:  transformers
Simpletransformers
Transformers for Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI
Stars: ✭ 2,881 (+4265.15%)
Mutual labels:  transformers
gpl
Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577
Stars: ✭ 216 (+227.27%)
Mutual labels:  transformers
Transformer-in-PyTorch
Transformer/Transformer-XL/R-Transformer examples and explanations
Stars: ✭ 21 (-68.18%)
Mutual labels:  transformers
TransQuest
Transformer based translation quality estimation
Stars: ✭ 85 (+28.79%)
Mutual labels:  transformers
LIT
[AAAI 2022] This is the official PyTorch implementation of "Less is More: Pay Less Attention in Vision Transformers"
Stars: ✭ 79 (+19.7%)
Mutual labels:  transformers

text

[Badges: CRAN status · GitHub build status · Project status: active (stable, usable, actively developed) · Lifecycle: maturing · CRAN downloads · codecov]

An R-package for analyzing natural language with transformers from HuggingFace using Natural Language Processing and Machine Learning.

The text-package has two main objectives:

  • First, to serve R-users as a point solution for transforming text to state-of-the-art word embeddings that are ready to be used for downstream tasks. The package provides a user-friendly link to language models based on transformers from Hugging Face.

  • Second, to serve as an end-to-end solution that provides state-of-the-art AI techniques tailored for social and behavioral scientists.

Modular and End-to-End Solution

Text is created through a collaboration between psychology and computer science to address research needs and to ensure state-of-the-art techniques. It provides powerful functions tailored to testing research hypotheses in the social and behavioral sciences, for both relatively small and large datasets. Text is continuously tested on Ubuntu, macOS, and Windows using the latest stable R version.

Tutorial preprint paper

Short installation guide

Most users simply need to run the installation code below. If you experience problems or want more alternatives, please see the Extended Installation Guide.

For the text-package to work, you first have to install it in R and then set it up with the Python packages it requires.

  1. Install the text-package (at the moment, step 2 only works with the development version of text from GitHub).

GitHub development version:

# install.packages("devtools")
devtools::install_github("oscarkjell/text")

CRAN version:

install.packages("text")
  2. Install and initialize the Python packages that text requires:
library(text)

# Install text required python packages in a conda environment (with defaults).
textrpp_install()

# Initialize the installed conda environment.
# save_profile = TRUE saves the settings so that you don't have to run textrpp_initialize() after restarting R. 
textrpp_initialize(save_profile = TRUE)
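
To verify the setup, you can inspect the Python configuration that R is bound to and run a small end-to-end test. A minimal sketch, assuming reticulate is available (text uses it under the hood):

library(text)
library(reticulate)

# Show which Python interpreter and environment the R session is using.
reticulate::py_config()

# A quick end-to-end check: embedding a short text should succeed
# if the required Python packages were installed and initialized correctly.
check_embedding <- textEmbed("installation check")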

Point solution for transforming text to embeddings

Recent significant advances in NLP research have resulted in improved representations of human language (i.e., language models), which have produced big performance gains in tasks related to understanding human language. Text makes these state-of-the-art models easily accessible through an interface to HuggingFace in Python.

Text provides many contemporary state-of-the-art language models that are based on deep learning and model word order and context. Multilingual language models can represent several languages at once; multilingual BERT, for example, covers 104 languages.

Table 1. Some of the available language models

Models                           References             Layers  Dimensions  Language
‘bert-base-uncased’              Devlin et al., 2019    12      768         English
‘roberta-base’                   Liu et al., 2019       12      768         English
‘distilbert-base-cased’          Sanh et al., 2019      6       768         English
‘bert-base-multilingual-cased’   Devlin et al., 2019    12      768         104 top languages on Wikipedia
‘xlm-roberta-large’              Conneau et al., 2020   24      1024        100 languages

See HuggingFace for a more comprehensive list of models.
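
Any of these models can be selected when embedding text. A minimal sketch, assuming the model name from the table is passed via textEmbed()'s model argument (the Swedish example sentence is illustrative):

library(text)

# Embed a Swedish sentence ("I feel great!") with multilingual BERT.
embeddings_multilingual <- textEmbed(
  c("Jag mår bra!"),
  model = "bert-base-multilingual-cased"
)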

The textEmbed() function is the main embedding function in text; it can output contextualized embeddings for tokens (i.e., an embedding for each word instance in each text) and for texts (i.e., a single embedding per text, obtained by aggregating all of its token embeddings).

library(text)
# Transform the text data to BERT word embeddings

# Example text
texts <- c("I feel great!")

# Defaults
embeddings <- textEmbed(texts)
embeddings
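
The returned object is a list; a minimal sketch of inspecting it, assuming the default output structure with tokens and texts components:

# Token-level embeddings: one embedding per word instance in each text.
embeddings$tokens

# Text-level embeddings: one aggregated embedding per input text.
embeddings$texts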

See Get Started for more information.

Language Analysis Tasks

It is also possible to access many language analysis tasks such as textClassify(), textGeneration(), and textTranslate().

library(text)

# Generate text from the prompt "I am happy to"
generated_text <- textGeneration("I am happy to",
                                 model = "gpt2")
generated_text
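
Other tasks follow the same pattern. For example, a minimal sketch of sentiment classification with textClassify(), assuming it accepts a text and a HuggingFace model name via a model argument, as textGeneration() does:

library(text)

# Classify the sentiment of a short text with a common sentiment model
# (the model name below is an illustrative choice, not a package default).
classification <- textClassify(
  "I am happy to see you!",
  model = "distilbert-base-uncased-finetuned-sst-2-english"
)
classification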

For a full list of the language analysis tasks supported in text, see the References.

An end-to-end package

Text also provides functions to analyze word embeddings with well-tested machine learning algorithms and statistics. The focus is on analyzing and visualizing texts and their relations to other texts or numerical variables. For example, the textTrain() function examines how well the word embeddings from a text predict a numeric or categorical variable (see the sketch below). Other functions plot statistically significant words in the word embedding space.
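
A minimal sketch of textTrain(), assuming the example data included in the package (Language_based_assessment_data_8, with the text column harmonytexts and the numeric Harmony in Life Scale score hilstotal):

library(text)

# Embed the text responses from the package's example data.
embeddings <- textEmbed(Language_based_assessment_data_8$harmonytexts)

# Train a cross-validated model predicting the numeric hilstotal score
# from the text-level embeddings (in some versions these may be nested
# under a named element such as embeddings$texts$harmonytexts).
hils_model <- textTrain(
  x = embeddings$texts,
  y = Language_based_assessment_data_8$hilstotal
)

# Examine predictive performance, e.g., the correlation between predicted
# and observed scores; element names may differ between versions.
hils_model$results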

library(text) 
# Use data that have been pre-processed with the textProjection() function.
# The pre-processed test data included in the package is called DP_projections_HILS_SWLS_100.
plot_projection <- textProjectionPlot(
  word_data = DP_projections_HILS_SWLS_100,
  y_axes = TRUE,
  title_top = " Supervised Bicentroid Projection of Harmony in life words",
  x_axes_label = "Low vs. High HILS score",
  y_axes_label = "Low vs. High SWLS score",
  position_jitter_hight = 0.5,
  position_jitter_width = 0.8
)
plot_projection$final_plot


Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].