arbox / Nlp With Ruby

Licence: cc0-1.0
Curated List: Practical Natural Language Processing done in Ruby

[RubyML | RubyDataScience | RubyInterop]

Awesome NLP with Ruby

Useful resources for text processing in Ruby

This curated list comprises awesome resources, libraries, and information sources on the computational processing of texts in human languages with the Ruby programming language. The field is often referred to as NLP, Computational Linguistics, or HLT (Human Language Technology), and it is closely related to Artificial Intelligence, Machine Learning, Information Retrieval, Text Mining, Knowledge Extraction, and other disciplines.

This list comes from our day-to-day work on Language Models and NLP Tools. Read why this list is awesome. Our FAQ explains the important decisions behind the list and answers common questions.

✨ Every contribution is welcome! Add links through pull requests or create an issue to start a discussion.

Follow us on Twitter and please spread the word using the #RubyNLP hashtag!

Contents

✨ Tutorials

Please help us to fill out this section! 😃

NLP Pipeline Subtasks

An NLP Pipeline starts with plain text.

Pipeline Generation

  • composable_operations - Definition framework for operation pipelines.
  • ruby-spark - Spark bindings with an easy to understand DSL.
  • phobos - Simplified Ruby Client for Apache Kafka.
  • parallel - Supervisor for parallel execution on multiple CPUs or in many threads.
  • pwrake - Rake extensions to run local and remote tasks in parallel.

Multipurpose Engines

On-line APIs

Language Identification

Language Identification is one of the first crucial steps in every NLP Pipeline.

  • scylla - Language Categorization and Identification.

Segmentation

Tools for Tokenization, Word and Sentence Boundary Detection and Disambiguation.

  • tokenizer - Simple multilingual tokenizer. [tutorial]
  • pragmatic_tokenizer - Multilingual tokenizer to split a string into tokens.
  • nlp-pure - Natural language processing algorithms implemented in pure Ruby with minimal dependencies.
  • textoken - Simple and customizable text tokenization library.
  • pragmatic_segmenter - Rule-based Sentence Boundary Detection for many languages.
  • punkt-segmenter - Pure Ruby implementation of the Punkt Segmenter.
  • tactful_tokenizer - RegExp based tokenizer for different languages.
  • scalpel - Sentence Boundary Disambiguation tool.
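
To make the two segmentation tasks concrete, here is a minimal pure-Ruby sketch of regular-expression-based sentence splitting and tokenization. It does not use the API of any gem listed above; the method names `split_sentences` and `tokenize` are made up for illustration, and the rules are deliberately naive (abbreviations such as "Dr." will be split incorrectly).

```ruby
# Naive sentence splitter: break after ., ! or ? when whitespace follows.
# Illustrative only; real segmenters handle abbreviations, quotes, etc.
def split_sentences(text)
  text.strip.split(/(?<=[.!?])\s+/)
end

# Naive tokenizer: words (with an optional apostrophe part), numbers,
# and single punctuation marks each become one token.
def tokenize(sentence)
  sentence.scan(/[[:alpha:]]+(?:'[[:alpha:]]+)?|\d+(?:\.\d+)?|[^\s[:alnum:]]/)
end

text = "Ruby is fun. Isn't tokenization easy? It costs 3.5 dollars!"

split_sentences(text).each do |sentence|
  puts tokenize(sentence).inspect
end
```

The decimal point in "3.5" is not treated as a sentence boundary because no whitespace follows it. Cases like this are why dedicated segmenters exist.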

Lexical Processing

Stemming

Stemming is the term used in information retrieval for the process of reducing word forms to some base representation. Stemming should be distinguished from Lemmatization, since stems do not necessarily have a linguistic motivation.

  • ruby-stemmer - Ruby-Stemmer exposes the Snowball API to Ruby.
  • uea-stemmer - Conservative stemmer for search and indexing.
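
The difference between a stem and a lemma is easiest to see with a toy example. The sketch below is a naive suffix stripper in pure Ruby. It is not the Snowball or UEA algorithm, and the suffix list and the `naive_stem` name are assumptions chosen for illustration.

```ruby
# Hypothetical suffix list, checked in order (longer suffixes first).
SUFFIXES = %w[ingly edly ing ed ly es s].freeze

# Strip the first matching suffix, but keep at least three characters.
def naive_stem(word)
  w = word.downcase
  suffix = SUFFIXES.find { |s| w.end_with?(s) && w.length - s.length >= 3 }
  suffix ? w.delete_suffix(suffix) : w
end

%w[running studies cats quickly was].each do |word|
  puts "#{word} -> #{naive_stem(word)}"
end
```

With these rules "running" becomes "runn" and "studies" becomes "studi": both are usable as index keys, but neither is a dictionary word. A lemmatizer would return "run" and "study" instead.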

Lemmatization

Lemmatization is the process of finding the base (dictionary) form of a word. Lemmas are often collected in dictionaries.

  • lemmatizer - WordNet based Lemmatizer for English texts.

Lexical Statistics: Counting Types and Tokens

  • wc - Facilities to count word occurrences in a text.
  • word_count - Word counter for String and Hash objects.
  • words_counted - Pure Ruby library counting word statistics with different custom options.
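
As a rough illustration of the type/token distinction, the following pure-Ruby sketch counts tokens (running words) and types (distinct words) and derives a type-token ratio. It does not use any of the gems above; `lexical_stats` is a hypothetical helper, and lowercasing plus a letters-only pattern is just one possible definition of a "word". `Enumerable#tally` requires Ruby 2.7 or later.

```ruby
# Count tokens, types, and per-type frequencies for a piece of text.
def lexical_stats(text)
  tokens = text.downcase.scan(/[[:alpha:]]+/)
  frequencies = tokens.tally # { "word" => occurrences }
  ratio = tokens.empty? ? 0.0 : frequencies.size.to_f / tokens.size
  {
    tokens: tokens.size,        # number of running words
    types: frequencies.size,    # number of distinct words
    type_token_ratio: ratio,
    frequencies: frequencies
  }
end

stats = lexical_stats("The cat sat on the mat. The mat was flat.")
puts "tokens: #{stats[:tokens]}, types: #{stats[:types]}"
puts "type-token ratio: #{stats[:type_token_ratio]}"
```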

Filtering Stop Words

  • stopwords-filter - Stop word filter and lexicon based on the Snowball stop word lists.
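
Stop word filtering itself is a simple set lookup. The sketch below uses only the Ruby standard library; the tiny `STOP_WORDS` list and the `remove_stop_words` helper are assumptions made for illustration and are not the Snowball lists or the API of the gem above.

```ruby
require "set"

# Hypothetical, deliberately tiny English stop word list.
STOP_WORDS = Set.new(%w[a an and the of on in is was to]).freeze

# Drop every token whose lowercased form is in the stop word set.
def remove_stop_words(tokens, stop_words = STOP_WORDS)
  tokens.reject { |token| stop_words.include?(token.downcase) }
end

puts remove_stop_words(%w[The cat sat on the mat]).inspect
```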

Phrasal Level Processing

  • n_gram - N-Gram generator.
  • ruby-ngram - Break words and phrases into ngrams.
  • raingrams - Flexible and general-purpose ngrams library written in pure Ruby.
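
For reference, n-gram generation needs nothing beyond `Enumerable#each_cons` from the Ruby core library. The helper names below are hypothetical, and the gems above offer richer features such as counting and probabilities.

```ruby
# Word-level n-grams: every run of n consecutive items.
def ngrams(items, n)
  items.each_cons(n).to_a
end

# Character-level n-grams of a single word.
def char_ngrams(word, n)
  word.chars.each_cons(n).map(&:join)
end

puts ngrams(%w[to be or not to be], 2).inspect
puts char_ngrams("ruby", 3).inspect
```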

Syntactic Processing

Constituency Parsing

Semantic Analysis

  • amatch - Set of five distance types between strings (including Levenshtein, Sellers, Jaro-Winkler, 'pair distance').
  • damerau-levenshtein - Calculates edit distance using the Damerau-Levenshtein algorithm.
  • hotwater - Fast Ruby FFI string edit distance algorithms.
  • levenshtein-ffi - Fast string edit distance computation, using the Damerau-Levenshtein algorithm.
  • tf_idf - Term Frequency / Inverse Document Frequency in pure Ruby.
  • tf-idf-similarity - Calculate the similarity between texts using TF/IDF.
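
The two ideas behind the libraries above, edit distance and TF-IDF weighting, can be sketched in pure Ruby as follows. This is an unoptimized illustration and not the implementation of any listed gem. The method names are hypothetical, and the TF-IDF variant shown (relative term frequency times the natural log of N divided by document frequency) is only one of several common formulations. It requires Ruby 2.7 or later.

```ruby
# Levenshtein distance via dynamic programming, keeping two rows.
def levenshtein(a, b)
  a_chars = a.chars
  b_chars = b.chars
  # previous[j] is the distance between the processed prefix of a and b[0, j].
  previous = (0..b_chars.size).to_a
  a_chars.each_with_index do |ca, i|
    current = [i + 1]
    b_chars.each_with_index do |cb, j|
      cost = ca == cb ? 0 : 1
      current << [previous[j + 1] + 1, # deletion
                  current[j] + 1,      # insertion
                  previous[j] + cost].min # substitution or match
    end
    previous = current
  end
  previous.last
end

# TF-IDF scores for tokenized documents (an array of token arrays).
# Returns one { term => score } hash per document.
def tf_idf(documents)
  n = documents.size.to_f
  document_frequency = Hash.new(0)
  documents.each { |doc| doc.uniq.each { |term| document_frequency[term] += 1 } }
  documents.map do |doc|
    doc.tally.to_h do |term, count|
      tf = count.to_f / doc.size
      idf = Math.log(n / document_frequency[term])
      [term, tf * idf]
    end
  end
end

puts levenshtein("kitten", "sitting")
puts tf_idf([%w[ruby nlp ruby], %w[ruby web]]).inspect
```

A term that occurs in every document, such as "ruby" here, gets a score of zero under this formulation, which is why TF-IDF is often used to surface terms that are distinctive for a document.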

Pragmatic Analysis

High Level Tasks

Spelling and Error Correction

Text Alignment

  • alignment - Alignment routines for bilingual texts (Gale-Church implementation).

Machine Translation

  • google-api-client - Google API Ruby Client.
  • microsoft_translator - Ruby client for the Microsoft Translator API.
  • termit - Google Translate with speech synthesis in your terminal.
  • zipf - Implementation of BLEU and other base algorithms.

Sentiment Analysis

Numbers, Dates, and Time Parsing

  • chronic - Pure Ruby natural language date parser.
  • chronic_between - Simple Ruby natural language parser for date and time ranges.
  • chronic_duration - Pure Ruby parser for elapsed time.
  • kronic - Methods for parsing and formatting human readable dates.
  • nickel - Extracts date, time, and message information from naturally worded text.
  • tickle - Parser for recurring and repeating events.
  • numerizer - Ruby parser for English number expressions.

Named Entity Recognition

  • ruby-ner - Named Entity Recognition with Stanford NER and Ruby.
  • ruby-nlp - Ruby Binding for the Stanford POS Tagger and Named Entity Recognizer.

Text-to-Speech-to-Text

  • espeak-ruby - Small Ruby API for utilizing 'espeak' and 'lame' to create text-to-speech mp3 files.
  • tts - Text-to-Speech conversion using the Google translate service.
  • att_speech - Ruby wrapper over the AT&T Speech API for speech to text.
  • pocketsphinx-ruby - Pocketsphinx bindings.

Dialog Agents, Assistants, and Chatbots

  • chatterbot - Straightforward Ruby-based Twitter Bot Framework, using OAuth to authenticate.
  • lita - Highly extensible chat operations bot framework with persistent storage in Redis.

Linguistic Resources

Machine Learning Libraries

Machine Learning Algorithms in pure Ruby or written in other programming languages with appropriate bindings for Ruby.

For a more up-to-date list, please see the Awesome ML with Ruby list.

  • rb-libsvm - Support Vector Machines with Ruby.
  • weka - JRuby bindings for Weka, different ML algorithms implemented through Weka.
  • decisiontree - Decision Tree ID3 Algorithm in pure Ruby [post].
  • rtimbl - Memory based learners from the Timbl framework.
  • classifier-reborn - General classifier module to allow Bayesian and other types of classifications.
  • lda-ruby - Ruby implementation of the LDA (Latent Dirichlet Allocation) for automatic Topic Modelling and Document Clustering.
  • liblinear-ruby-swig - Ruby interface to LIBLINEAR (much more efficient than LIBSVM for text classification).
  • linnaeus - Redis-backed Bayesian classifier.
  • maxent_string_classifier - JRuby maximum entropy classifier for string data, based on the OpenNLP Maxent framework.
  • naive_bayes - Simple Naive Bayes classifier.
  • nbayes - Full-featured, Ruby implementation of Naive Bayes.
  • omnicat - Generalized rack framework for text classifications.
  • omnicat-bayes - Naive Bayes text classification implementation as an OmniCat classifier strategy.
  • ruby-fann - Ruby bindings to the Fast Artificial Neural Network Library (FANN).
  • rblearn - Feature Extraction and Crossvalidation library.

Data Visualization

Please refer to the Data Visualization section on the Data Science with Ruby list.

Optical Character Recognition

Text Extraction

  • yomu - Library for extracting text and metadata from files and documents using the Apache Tika content analysis toolkit.

Full Text Search, Information Retrieval, Indexing

Language Aware String Manipulation

Libraries for language aware string manipulation, i.e. search, pattern matching, case conversion, transcoding, regular expressions which need information about the underlying language.

  • fuzzy_match - Fuzzy string comparison with Distance measures and Regular Expression.
  • fuzzy-string-match - Fuzzy string matching library for Ruby.
  • active_support - The Ruby on Rails ActiveSupport gem provides various string extensions, including case conversion.
  • fuzzy_tools - Toolset for fuzzy searches in Ruby tuned for accuracy.
  • u - U extends Ruby’s Unicode support.
  • unicode - Unicode normalization library.
  • CommonRegexRuby - Find many kinds of common information in a string.
  • regexp-examples - Generate strings that match a given regular expression.
  • verbal_expressions - Make difficult regular expressions easy.
  • translit_kit - Transliterate Hebrew & Yiddish text into Latin characters.
  • re2 - High-speed Regular Expression library for Text Mining and Text Extraction.
  • regex_sample - Sample string generation from a given Regular Expression.

Articles, Posts, Talks, and Presentations

Projects and Code Examples

Books

  • Miller, Rob. Text Processing with Ruby: Extract Value from the Data That Surrounds You. Pragmatic Programmers, 2015. [link]
  • Watson, Mark. Scripting Intelligence: Web 3.0 Information Gathering and Processing. Apress, 2010. [link]
  • Watson, Mark. Practical Semantic Web and Linked Data Applications. Lulu, 2010. [link]

Community

Needs your Help!

All projects in this section are really important for the community but need more attention. If you have spare time and dedication, please spend some hours on the code here.

Related Resources

License

Awesome NLP with Ruby by Andrei Beliankou and Contributors is released under Creative Commons Zero 1.0.

To the extent possible under law, the person who associated CC0 with Awesome NLP with Ruby has waived all copyright and related or neighboring rights to Awesome NLP with Ruby.

You should have received a copy of the CC0 legalcode along with this work. If not, see https://creativecommons.org/publicdomain/zero/1.0/.