
systats / textlearnR

Licence: other
A simple collection of well-working NLP models (Keras, H2O, StarSpace) tuned and benchmarked on a variety of datasets.


Projects that are alternatives to or similar to textlearnR

Artificial Adversary
🗣️ Tool to generate adversarial text examples and test machine learning models against them
Stars: ✭ 348 (+2075%)
Mutual labels:  text-mining, classification
ruimtehol
R package to Embed All the Things! using StarSpace
Stars: ✭ 95 (+493.75%)
Mutual labels:  text-mining, classification
Awesome Text Classification
Awesome-Text-Classification: projects, papers, tutorials.
Stars: ✭ 158 (+887.5%)
Mutual labels:  text-mining, classification
support-tickets-classification
This case study shows how to create a model for text analysis and classification and deploy it as a web service in Azure cloud in order to automatically classify support tickets. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava http://endava.com/en
Stars: ✭ 142 (+787.5%)
Mutual labels:  text-mining, classification
Rmdl
RMDL: Random Multimodel Deep Learning for Classification
Stars: ✭ 375 (+2243.75%)
Mutual labels:  text-mining, classification
Applied Text Mining In Python
Repo for Applied Text Mining in Python (coursera) by University of Michigan
Stars: ✭ 59 (+268.75%)
Mutual labels:  text-mining, classification
Fake news detection
Fake News Detection in Python
Stars: ✭ 194 (+1112.5%)
Mutual labels:  text-mining, classification
time-series-classification
Classifying time series using feature extraction
Stars: ✭ 75 (+368.75%)
Mutual labels:  classification
Metric Learning Adversarial Robustness
Code for NeurIPS 2019 Paper
Stars: ✭ 44 (+175%)
Mutual labels:  classification
R-Machine-Learning
D-Lab's 6 hour introduction to machine learning in R. Learn the fundamentals of machine learning, regression, and classification, using tidymodels in R.
Stars: ✭ 27 (+68.75%)
Mutual labels:  classification
classy
Super simple text classifier using Naive Bayes. Plug-and-play, no dependencies
Stars: ✭ 12 (-25%)
Mutual labels:  classification
TNCR Dataset
Deep learning, Convolutional neural networks, Image processing, Document processing, Table detection, Page object detection, Table classification. https://www.sciencedirect.com/science/article/pii/S0925231221018142
Stars: ✭ 37 (+131.25%)
Mutual labels:  classification
odinson
Odinson is a powerful and highly optimized open-source framework for rule-based information extraction. Odinson couples a simple, yet powerful pattern language that can operate over multiple representations of text, with a runtime system that operates in near real time.
Stars: ✭ 59 (+268.75%)
Mutual labels:  text-mining
MoeFlow
Repository for anime characters recognition website, powered by TensorFlow
Stars: ✭ 113 (+606.25%)
Mutual labels:  classification
JoSH
[KDD 2020] Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding
Stars: ✭ 55 (+243.75%)
Mutual labels:  text-mining
HateALERT-EVALITA
Code for replicating results of team 'hateminers' at EVALITA-2018 for AMI task
Stars: ✭ 13 (-18.75%)
Mutual labels:  classification
newt
Natural World Tasks
Stars: ✭ 24 (+50%)
Mutual labels:  classification
Skin-Cancer-Segmentation
Classification and Segmentation with Mask-RCNN of Skin Cancer using ISIC dataset
Stars: ✭ 61 (+281.25%)
Mutual labels:  classification
SGDLibrary
MATLAB/Octave library for stochastic optimization algorithms: Version 1.0.20
Stars: ✭ 165 (+931.25%)
Mutual labels:  classification
flexinfer
A flexible Python front-end inference SDK based on TensorRT
Stars: ✭ 83 (+418.75%)
Mutual labels:  classification

textlearnR

A simple collection of well-working NLP models (Keras) in R, tuned and benchmarked on a variety of datasets. This is a work in progress; the first version supports classification tasks only.

What can this package do for you? (in the future)

Training neural networks can be tedious and time-consuming due to the sheer number of hyperparameters. Hyperparameters are values that are defined in advance and provided as additional model input. Tuning them requires either deeper knowledge of the model's behavior or the computational resources for random searches or optimization over the parameter space. textlearnR provides a lightweight framework to train and compare ML models from Keras, H2O, StarSpace, and text2vec (coming soon). Furthermore, it lets you define parameters for text processing (e.g. the maximal number of words and the maximal text length), which are also treated as priors.
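As a minimal sketch of this idea (the names `params`, `max_words`, etc. are hypothetical, not the package's actual API), text-processing priors and model hyperparameters can live together in one plain named list:

```r
# Hypothetical example: text-processing settings treated as priors,
# stored alongside model hyperparameters in a plain named list.
params <- list(
  max_words  = 10000,  # vocabulary size kept after tokenization
  seq_len    = 50,     # maximal text length (tokens) per document
  embed_dim  = 128,    # dimension of the embedding layer
  batch_size = 32,
  epochs     = 10
)

# Such a list can be handed to any model constructor via do.call(),
# so one prior definition works across different backends.
str(params)
```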

Besides language models, textlearnR also integrates third-party packages for automatically tuning hyperparameters. The following strategies will be available:

Searching

  • Grid search
  • Random search
  • Sobol sequence: quasi-random numbers designed to cover the parameter space more evenly than uniform random numbers. Computationally expensive but parallelizable.
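The difference between grid and random search can be sketched in base R alone (the parameter names below are illustrative, not textlearnR's):

```r
# Grid search enumerates every combination of the candidate values ...
grid <- expand.grid(
  embed_dim = c(64, 128, 256),
  dropout   = c(0.2, 0.5),
  lr        = c(0.001, 0.003, 0.01)
)
nrow(grid)  # 18 candidate configurations (3 * 2 * 3)

# ... while random search evaluates only a sample of them.
set.seed(42)
random_subset <- grid[sample(nrow(grid), 5), ]
```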

Optimization

  • GA: genetic algorithms for stochastic optimization (real-valued parameters only).
  • mlrMBO: Bayesian and model-based optimization.
  • Others:
    • Nelder–Mead simplex (gradient-free)
    • Particle swarm (gradient-free)
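For the gradient-free strategies, base R's `optim()` already ships a Nelder–Mead simplex. A toy sketch, minimising a stand-in loss over two continuous hyperparameters (the loss function here is invented for illustration):

```r
# Stand-in for a validation loss over two continuous hyperparameters;
# its minimum sits at c(-3, 0.5).
loss <- function(x) (x[1] + 3)^2 + (x[2] - 0.5)^2

# Nelder-Mead needs no gradients, only loss evaluations.
fit <- optim(par = c(0, 0), fn = loss, method = "Nelder-Mead")
fit$par  # close to the optimum c(-3, 0.5)
```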

For constructing new parameter objects the tidy way, the package dials is used. Each optimized model is saved to an SQLite database in data/model_dump.db, of course committed to tidy principles. Contributions are highly welcome!
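The SQLite dump could look roughly like this sketch (it assumes the DBI and RSQLite packages; the table name `model_dump` and the `results` columns are made up for illustration):

```r
library(DBI)

# In-memory database for the example; the package itself would use
# dbConnect(RSQLite::SQLite(), "data/model_dump.db") on disk.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical tuning results, one row per optimized model.
results <- data.frame(
  model    = c("simple_mlp", "pooled_gru"),
  accuracy = c(0.87, 0.91)
)
dbWriteTable(con, "model_dump", results)

stored <- dbReadTable(con, "model_dump")
dbDisconnect(con)
```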

Supervised Models

model overview

keras_model <- list(
  simple_mlp = textlearnR::keras_simple_mlp,
  deep_mlp = textlearnR::keras_deep_mlp,
  simple_lstm = textlearnR::keras_simple_lstm,
  #deep_lstm = textlearnR::keras_deep_lstm,
  pooled_gru = textlearnR::keras_pooled_gru,
  cnn_lstm = textlearnR::keras_cnn_lstm,
  cnn_gru = textlearnR::keras_cnn_gru,
  gru_cnn = textlearnR::keras_gru_cnn,
  multi_cnn = textlearnR::keras_multi_cnn
)

Datasets

Understand one model

textlearnR::keras_simple_mlp(
    input_dim = 10000, 
    embed_dim = 128, 
    seq_len = 50, 
    output_dim = 1
  ) %>% 
  summary
## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## embedding_1 (Embedding)          (None, 50, 128)               1280000     
## ___________________________________________________________________________
## flatten_1 (Flatten)              (None, 6400)                  0           
## ___________________________________________________________________________
## dense_1 (Dense)                  (None, 128)                   819328      
## ___________________________________________________________________________
## dropout_1 (Dropout)              (None, 128)                   0           
## ___________________________________________________________________________
## dense_2 (Dense)                  (None, 1)                     129         
## ===========================================================================
## Total params: 2,099,457
## Trainable params: 2,099,457
## Non-trainable params: 0
## ___________________________________________________________________________
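The parameter counts in the summary follow directly from the layer shapes; a quick base-R sanity check:

```r
# Embedding: one 128-dim vector per vocabulary entry (no bias).
embedding <- 10000 * 128            # 1,280,000
# dense_1: flattened input (50 * 128 = 6400) times 128 units, plus biases.
dense_1   <- (50 * 128) * 128 + 128 # 819,328
# dense_2: 128 inputs to 1 output unit, plus one bias.
dense_2   <- 128 * 1 + 1            # 129

total <- embedding + dense_1 + dense_2
total  # 2099457, matching "Total params" in the summary
```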
  • TODO: visualize the model architecture, either as a flowchart or with ggalluvial
