Projects that are alternatives of or similar to Monkeylearn Ruby

Spacy
💫 Industrial-strength Natural Language Processing (NLP) in Python
Stars: ✭ 21,978 (+28818.42%)
Mutual labels:  natural-language-processing, text-classification
Wikipedia2vec
A tool for learning vector representations of words and entities from Wikipedia
Stars: ✭ 655 (+761.84%)
Mutual labels:  natural-language-processing, text-classification
Hanlp
Chinese NLP: word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyphrase extraction, automatic summarization, text classification and clustering, and Pinyin and Simplified/Traditional Chinese conversion
Stars: ✭ 24,626 (+32302.63%)
Mutual labels:  natural-language-processing, text-classification
Textfooler
A Model for Natural Language Attack on Text Classification and Inference
Stars: ✭ 298 (+292.11%)
Mutual labels:  natural-language-processing, text-classification
Scdv
Text classification with Sparse Composite Document Vectors.
Stars: ✭ 54 (-28.95%)
Mutual labels:  natural-language-processing, text-classification
Text mining resources
Resources for learning about Text Mining and Natural Language Processing
Stars: ✭ 358 (+371.05%)
Mutual labels:  natural-language-processing, text-classification
Nlp Recipes
Natural Language Processing Best Practices & Examples
Stars: ✭ 5,783 (+7509.21%)
Mutual labels:  natural-language-processing, text-classification
Pyss3
A Python package implementing a new machine learning model for text classification with visualization tools for Explainable AI
Stars: ✭ 191 (+151.32%)
Mutual labels:  natural-language-processing, text-classification
Ml Classify Text Js
Machine learning based text classification in JavaScript using n-grams and cosine similarity
Stars: ✭ 38 (-50%)
Mutual labels:  natural-language-processing, text-classification
Easy Deep Learning With Allennlp
🔮Deep Learning for text made easy with AllenNLP
Stars: ✭ 32 (-57.89%)
Mutual labels:  natural-language-processing, text-classification
Pytorch Transformers Classification
Based on the Pytorch-Transformers library by HuggingFace. To be used as a starting point for employing Transformer models in text classification tasks. Contains code to easily train BERT, XLNet, RoBERTa, and XLM models for text classification.
Stars: ✭ 229 (+201.32%)
Mutual labels:  natural-language-processing, text-classification
Text Analytics With Python
Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer.
Stars: ✭ 1,132 (+1389.47%)
Mutual labels:  natural-language-processing, text-classification
Catalyst
Accelerated deep learning R&D
Stars: ✭ 2,804 (+3589.47%)
Mutual labels:  natural-language-processing, text-classification
Spacy Streamlit
👑 spaCy building blocks and visualizers for Streamlit apps
Stars: ✭ 360 (+373.68%)
Mutual labels:  natural-language-processing, text-classification
Bert4doc Classification
Code and source for paper ``How to Fine-Tune BERT for Text Classification?``
Stars: ✭ 220 (+189.47%)
Mutual labels:  natural-language-processing, text-classification
Pythoncode Tutorials
The Python Code Tutorials
Stars: ✭ 544 (+615.79%)
Mutual labels:  natural-language-processing, text-classification
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+3213.16%)
Mutual labels:  natural-language-processing, text-classification
Fastnlp
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
Stars: ✭ 2,441 (+3111.84%)
Mutual labels:  natural-language-processing, text-classification
Nlp In Practice
Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.
Stars: ✭ 790 (+939.47%)
Mutual labels:  natural-language-processing, text-classification
Textblob Ar
Arabic support for textblob
Stars: ✭ 60 (-21.05%)
Mutual labels:  natural-language-processing, text-classification

monkeylearn-ruby

Official Ruby client for the MonkeyLearn API. Build and consume machine learning models for language processing from your Ruby apps.

Installation

Install with RubyGems:

$ gem install monkeylearn

Or add this line to your Gemfile:

gem "monkeylearn", "~> 3"

Usage

Before making requests to the API, require the gem and set your account API key:

require 'monkeylearn'

# Basic configuration
Monkeylearn.configure do |c|
  c.token = 'INSERT_YOUR_API_TOKEN_HERE'
end
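If you prefer not to hardcode the token, you can read it from an environment variable before configuring the client. This is a minimal sketch: the variable name MONKEYLEARN_TOKEN and the helper below are assumptions for illustration, not part of the gem.

```ruby
# Hypothetical helper (not part of the gem): fetch the API token from the
# environment so it never gets committed with the source.
def monkeylearn_token(env = ENV)
  env.fetch('MONKEYLEARN_TOKEN') do
    raise KeyError, 'Set MONKEYLEARN_TOKEN before configuring the client'
  end
end
```

You would then pass c.token = monkeylearn_token inside the configure block.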

Requests

From the Monkeylearn module, you can call any endpoint (check the available endpoints below). For example, you can classify a list of texts using the public Sentiment analysis classifier:

classifier_model_id = 'cl_Jx8qzYJh'
data = [
  'Great hotel with excellent location',
  'This is the worst hotel ever.'
]

response = Monkeylearn.classifiers.classify(classifier_model_id, data)

Responses

The response object returned by every endpoint call is a MonkeylearnResponse object. The body attribute has the parsed response from the API:

puts response.body
# =>  [
# =>      {
# =>          "text" => "Great hotel with excellent location",
# =>          "external_id" => nil,
# =>          "error" => false,
# =>          "classifications" => [
# =>              {
# =>                  "tag_name" => "Positive",
# =>                  "tag_id" => 1994,
# =>                  "confidence" => 0.922
# =>              }
# =>          ]
# =>      },
# =>      {
# =>          "text" => "This is the worst hotel ever.",
# =>          "external_id" => nil,
# =>          "error" => false,
# =>          "classifications" => [
# =>              {
# =>                  "tag_name" => "Negative",
# =>                  "tag_id" => 1941,
# =>                  "confidence" => 0.911
# =>              }
# =>          ]
# =>      }
# =>  ]

You can also access other attributes in the response object to get information about the queries used or available:

puts response.plan_queries_allowed
# =>  300

puts response.plan_queries_remaining
# =>  240

puts response.request_queries_used
# =>  2

Errors

Endpoint calls may raise exceptions. Here is an example of how to handle them:

begin
  response = Monkeylearn.classifiers.classify("[MODEL_ID]", ["My text"])
rescue PlanQueryLimitError => d
  puts "#{d.error_code}: #{d.detail}"
end

Available exceptions:

Class Description
MonkeylearnError Base class for each exception below.
RequestParamsError An invalid parameter was sent. Check the exception message or response object for more information.
AuthenticationError Authentication failed, usually because an invalid token was provided. Check the exception message. More about Authentication.
ForbiddenError You don't have permissions to perform the action on the given resource.
ModelLimitError You have reached the custom model limit for your plan.
ModelNotFound The model does not exist. Check the model_id.
TagNotFound The tag does not exist. Check the tag_id parameter.
PlanQueryLimitError You have reached the monthly query limit for your plan. Consider upgrading your plan. More about Plan query limits.
PlanRateLimitError You have sent too many requests in the last minute. Check the exception details. More about Plan rate limit.
ConcurrencyRateLimitError You have sent too many requests in the last second. Check the exception details. More about Concurrency rate limit.
ModuleStateError The state of the module is invalid. Check the exception details.
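Because the exceptions above share MonkeylearnError as a base class, you can rescue specific subclasses first and fall back to the base class last. The stub classes below merely mirror the documented names for illustration; the real classes ship with the monkeylearn gem.

```ruby
# Stub hierarchy mirroring the documented exception names (illustration
# only -- the real classes are defined by the monkeylearn gem).
class MonkeylearnError < StandardError; end
class PlanQueryLimitError < MonkeylearnError; end
class AuthenticationError < MonkeylearnError; end

# case/when matches the most specific class first, so order subclasses
# before the base class.
def describe_failure(error)
  case error
  when PlanQueryLimitError then 'over the monthly query limit'
  when AuthenticationError then 'bad or missing API token'
  when MonkeylearnError    then 'other MonkeyLearn API error'
  end
end
```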

Handling batching and throttled responses manually

The Classify and Extract endpoints may require more than one request to the MonkeyLearn API in order to process every text in the data parameter. If the auto_batch config option is true (the default), you don't have to keep the data length below the maximum allowed value (200); you can pass the full list and the library will handle the batching, making multiple requests if necessary.

If you want to handle this yourself you can set auto_batch to false and slice the data yourself:

require 'monkeylearn'

Monkeylearn.configure do |c|
  c.token = 'INSERT_YOUR_API_TOKEN_HERE'
  c.auto_batch = false
end

data = ['Text to classify'] * 300
batch_size = 200
model_id = '[MODEL_ID]'

responses = (0...data.length).step(batch_size).collect do |start_idx|
  sliced_data = data[start_idx, batch_size]
  Monkeylearn.classifiers.classify(model_id, sliced_data, batch_size: batch_size)
end

multi_response = Monkeylearn::MultiResponse.new(responses)

puts multi_response.body
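The slicing in the snippet above can also be written with each_slice. This standalone sketch (no API calls) shows how 300 texts split into batches of at most 200:

```ruby
# Standalone illustration of the batching math: each_slice yields chunks
# of at most batch_size elements, so 300 texts become 200 + 100.
data = ['Text to classify'] * 300
batch_size = 200

batches = data.each_slice(batch_size).to_a
```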

Also, any API call might be throttled (see Rate limiting). If the retry_if_throttled config option is true (the default), any throttled request will be retried after sleeping for the required time.

You can control this manually if you need to:

require 'monkeylearn'

Monkeylearn.configure do |c|
  c.token = 'INSERT_YOUR_API_TOKEN_HERE'
  c.auto_batch = false
  c.retry_if_throttled = false
end

data = ['Text to classify'] * 300
batch_size = 200
model_id = '[MODEL_ID]'

responses = (0...data.length).step(batch_size).collect do |start_idx|
  sliced_data = data[start_idx, batch_size]
  throttled = true
  while throttled
    begin
      response = Monkeylearn.classifiers.classify(model_id, sliced_data, batch_size: batch_size)
      throttled = false
    rescue ConcurrencyRateLimitError
      sleep 2
    rescue PlanRateLimitError => e
      sleep e.seconds_to_wait
    end
  end
  response
end

multi_response = Monkeylearn::MultiResponse.new(responses)

puts multi_response.body

This way you'll be able to control every request that is sent to the MonkeyLearn API.
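The retry pattern above can be sketched in isolation with a stub error class, so the control flow can be followed without hitting the API. Here seconds_to_wait is fixed at zero for illustration; the real PlanRateLimitError reports the actual wait time.

```ruby
# Stub standing in for the gem's exception (illustration only).
class PlanRateLimitError < StandardError
  def seconds_to_wait
    0
  end
end

# Retry a block a bounded number of times, sleeping for the wait time the
# error reports between attempts, then re-raise if the budget runs out.
def with_retries(max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue PlanRateLimitError => e
    sleep e.seconds_to_wait
    retry if attempts < max_attempts
    raise
  end
end
```

A bounded attempt count avoids the unbounded while loop in the example above, which spins forever if the API keeps throttling.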

Available endpoints

The following are all the endpoints of the API. For more information about each endpoint, check out the API documentation.

Classifiers

Classify

Monkeylearn.classifiers.classify(model_id, data, options = {})

Parameters:

Parameter Type Description
model_id String Classifier ID. It always starts with 'cl', for example 'cl_oJNMkt2V'.
data Array[String or Hash] A list of up to 200 data elements to classify. Each element must be a String with the text or a Hash with the required text key and the text as the value. You can provide an optional external_id key with a string that will be included in the response.
options Hash Optional parameters, see below. The hash always expects symbols as keys.

Optional parameters:

Parameter Type Default Description
production_model Boolean False Indicates if the classifications are performed by the production model. Only use this parameter with custom models (not with the public ones). Note that you first need to deploy your model to production either from the UI model settings or by using the Classifier deploy endpoint.
batch_size Integer 200 Max amount of texts each request will send to Monkeylearn. A number from 1 to 200.

Example:

data = ["First text", {text: "Second text", external_id: "2"}]
response = Monkeylearn.classifiers.classify("[MODEL_ID]", data)
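Since data may mix plain Strings and Hashes, a small normalizer can bring everything into the hash form before sending. normalize_data is a hypothetical helper for illustration, not part of the gem.

```ruby
# Hypothetical helper: normalize mixed input (Strings or Hashes) into the
# hash form with a :text key that the classify endpoint accepts.
def normalize_data(data)
  data.map { |item| item.is_a?(String) ? { text: item } : item }
end
```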

Classifier detail

Monkeylearn.classifiers.detail(model_id)

Parameters:

Parameter Type Description
model_id String Classifier ID. It always starts with 'cl', for example, 'cl_oJNMkt2V'.

Example:

response = Monkeylearn.classifiers.detail("[MODEL_ID]")

Create Classifier

Monkeylearn.classifiers.create(name, options = {})

Parameters:

Parameter Type Description
name String The name of the model.
options Hash Optional parameters, see below. The hash always expects symbols as keys.

Optional parameters:

Parameter Type Default Description
description String '' The description of the model.
algorithm String 'nb' The algorithm used when training the model. It can either be "nb" or "svm".
language String 'en' The language of the model. Full list of supported languages.
max_features Integer 10000 The maximum number of features used when training the model. Between 10 and 100000.
ngram_range Array [1,1] Indicates which n-gram range is used when training the model. It's a list of two numbers between 1 and 3. They indicate the minimum and the maximum n for the n-grams used, respectively.
use_stemming Boolean true Indicates whether stemming is used when training the model.
preprocess_numbers Boolean true Indicates whether number preprocessing is done when training the model.
preprocess_social_media Boolean false Indicates whether preprocessing of social media is done when training the model.
normalize_weights Boolean true Indicates whether weights will be normalized when training the model.
stopwords Boolean or Array true The list of stopwords used when training the model. Use false for no stopwords, true for the default stopwords, or an array of strings for custom stopwords.
whitelist Array [] The whitelist of words used when training the model.

Example:

response = Monkeylearn.classifiers.create("New classifier name", algorithm: "svm", ngram_range: [1, 2])

Edit Classifier

Monkeylearn.classifiers.edit(model_id, options = {})

Parameters:

Parameter Type Description
model_id String Classifier ID. It always starts with 'cl', for example, 'cl_oJNMkt2V'.
options Hash Optional parameters, see below. The hash always expects symbols as keys.

Optional parameters:

Parameter Type Description
name String The name of the model.
description String The description of the model.
algorithm String The algorithm used when training the model. It can either be "nb" or "svm".
language String The language of the model. Full list of supported languages.
max_features Integer The maximum number of features used when training the model. Between 10 and 100000.
ngram_range Array Indicates which n-gram range used when training the model. A list of two numbers between 1 and 3. They indicate the minimum and the maximum n for the n-grams used, respectively.
use_stemming Boolean Indicates whether stemming is used when training the model.
preprocess_numbers Boolean Indicates whether number preprocessing is done when training the model.
preprocess_social_media Boolean Indicates whether preprocessing of social media is done when training the model.
normalize_weights Boolean Indicates whether weights will be normalized when training the model.
stopwords Boolean or Array The list of stopwords used when training the model. Use false for no stopwords, true for the default stopwords, or an array of strings for custom stopwords.
whitelist Array The whitelist of words used when training the model.

Example:

response = Monkeylearn.classifiers.edit("[MODEL_ID]", name: "New classifier name", algorithm: "nb")

Delete classifier

Monkeylearn.classifiers.delete(model_id)

Parameters:

Parameter Type Description
model_id String Classifier ID. It always starts with 'cl', for example, 'cl_oJNMkt2V'.

Example:

Monkeylearn.classifiers.delete('[MODEL_ID]')

List Classifiers

Monkeylearn.classifiers.list(page: 1, per_page: 20, order_by: '-created')

Optional parameters:

Parameter Type Default Description
page Integer 1 Specifies which page to get.
per_page Integer 20 Specifies how many items per page will be returned.
order_by String or Array '-created' Specifies the ordering criteria. It can either be a String for single criteria ordering or an array of Strings for more than one. Each String must be a valid field name; if you want inverse/descending order of the field prepend a - (dash) character. Some valid examples are: 'is_public', '-name' or ['-is_public', 'name'].

Example:

response = Monkeylearn.classifiers.list(page: 2, per_page: 5, order_by: ['-is_public', 'name'])
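The order_by convention (a leading dash marks a field as descending) can be made concrete with a tiny parser. parse_order_by is a hypothetical helper for illustration, not part of the gem.

```ruby
# Illustration of the order_by syntax: a leading '-' means descending,
# anything else ascending. Accepts a single String or an Array of them.
def parse_order_by(criteria)
  Array(criteria).map do |field|
    field.start_with?('-') ? [field[1..-1], :desc] : [field, :asc]
  end
end
```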

Train

Monkeylearn.classifiers.train(model_id)

Parameters:

Parameter Type Description
model_id String Classifier ID. It always starts with 'cl', for example, 'cl_oJNMkt2V'.

Example:

Monkeylearn.classifiers.train('[MODEL_ID]')

Deploy

Monkeylearn.classifiers.deploy(model_id)

Parameters:

Parameter Type Description
model_id String Classifier ID. It always starts with 'cl', for example, 'cl_oJNMkt2V'.

Example:

Monkeylearn.classifiers.deploy('[MODEL_ID]')

Tag detail

Monkeylearn.classifiers.tags.detail(model_id, tag_id)

Parameters:

Parameter Type Description
model_id String Classifier ID. It always starts with 'cl', for example, 'cl_oJNMkt2V'.
tag_id Integer Tag ID.

Example:

response = Monkeylearn.classifiers.tags.detail("[MODEL_ID]", TAG_ID)

Create tag

Monkeylearn.classifiers.tags.create(model_id, name, options = {})

Parameters:

Parameter Type Description
model_id String Classifier ID. It always starts with 'cl', for example, 'cl_oJNMkt2V'.
name String The name of the new tag.
options Hash Optional parameters, see below. The hash always expects symbols as keys.

Example:

response = Monkeylearn.classifiers.tags.create("[MODEL_ID]", "Positive")

Edit tag

Monkeylearn.classifiers.tags.edit(model_id, tag_id, options = {})

Parameters:

Parameter Type Description
model_id String Classifier ID. It always starts with 'cl', for example, 'cl_oJNMkt2V'.
tag_id Integer Tag ID.
options Hash Optional parameters, see below. The hash always expects symbols as keys.

Optional parameters:

Parameter Type Description
name String The new name of the tag.

Example:

response = Monkeylearn.classifiers.tags.edit("[MODEL_ID]", TAG_ID, name: "New name")

Delete tag

Monkeylearn.classifiers.tags.delete(model_id, tag_id, options = {})

Parameters:

Parameter Type Description
model_id String Classifier ID. It always starts with 'cl', for example, 'cl_oJNMkt2V'.
tag_id Integer Tag ID.
options Hash Optional parameters, see below. The hash always expects symbols as keys.

Optional parameters:

Parameter Type Default Description
move_data_to Integer nil An optional tag ID. If provided, data associated with the tag to be deleted will be moved to the specified tag before deletion.

Example:

Monkeylearn.classifiers.tags.delete("[MODEL_ID]", TAG_ID)

Upload data

Monkeylearn.classifiers.upload_data(model_id, data)

Parameters:

Parameter Type Description
model_id String Classifier ID. It always starts with 'cl', for example, 'cl_oJNMkt2V'.
data Array[Hash] A list of hashes with the keys described below.

data hash keys:

Key Description
text A String of the text to upload.
tags An optional Array of tags, each referred to by its numeric ID or its name. When the text is created, it will be tagged with each tag in the list; if the text already exists on the model, its tags will be replaced with the new ones. New tags will be created if they don't already exist.
markers An optional Array of String. Each one represents a marker that will be associated with the text. New markers will be created if they don't already exist.

Example:

response = Monkeylearn.classifiers.upload_data(
  "[MODEL_ID]",
  [{text: "text 1", tags: [TAG_ID_1, "[tag_name]"]},
   {text: "text 2", tags: [TAG_ID_1, TAG_ID_2]}]
)
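A client-side sanity check over the hash keys described above can catch malformed items before uploading. valid_upload_item? is a hypothetical helper for illustration, not part of the gem.

```ruby
# Hypothetical client-side check: each upload item needs a :text String;
# :tags and :markers, when present, must be Arrays.
def valid_upload_item?(item)
  return false unless item.is_a?(Hash) && item[:text].is_a?(String)
  [:tags, :markers].all? { |k| item[k].nil? || item[k].is_a?(Array) }
end
```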

Extractors

Extract

Monkeylearn.extractors.extract(model_id, data, options = {})

Parameters:

Parameter Type Description
model_id String Extractor ID. It always starts with 'ex', for example, 'ex_oJNMkt2V'.
data Array[String or Hash] A list of up to 200 data elements to extract from. Each element must be a String with the text or a Hash with the required text key and the text as the value. You can also provide an optional external_id key with a string that will be included in the response.
options Hash Optional parameters, see below. The hash always expects symbols as keys.

Optional parameters:

Parameter Type Default Description
production_model Boolean False Indicates if the extractions are performed by the production model. Only use this parameter with custom models (not with the public ones). Note that you first need to deploy the model to production either from the UI model settings or by using the Classifier deploy endpoint.
batch_size Integer 200 Max number of texts each request will send to MonkeyLearn. A number from 1 to 200.

Example:

data = ["First text", {text: "Second text", external_id: "2"}]
response = Monkeylearn.extractors.extract("[MODEL_ID]", data)

Extractor detail

Monkeylearn.extractors.detail(model_id)

Parameters:

Parameter Type Description
model_id String Extractor ID. It always starts with 'ex', for example, 'ex_oJNMkt2V'.

Example:

response = Monkeylearn.extractors.detail("[MODEL_ID]")

List extractors

Monkeylearn.extractors.list(page: 1, per_page: 20, order_by: '-created')

Optional parameters:

Parameter Type Default Description
page Integer 1 Specifies which page to get.
per_page Integer 20 Specifies how many items per page will be returned.
order_by String or Array '-created' Specifies the ordering criteria. It can either be a String for single criteria ordering or an array of Strings for more than one. Each String must be a valid field name; if you want inverse/descending order of the field prepend a - (dash) character. Some valid examples are: 'is_public', '-name' or ['-is_public', 'name'].

Example:

response = Monkeylearn.extractors.list(page: 2, per_page: 5, order_by: ['-is_public', 'name'])
response = Monkeylearn.extractors.list(page: 2, per_page: 5, order_by: ['-is_public', 'name'])