
greyblake / Whatlang Rs

License: MIT
Natural language detection library for Rust. Try demo online: https://www.greyblake.com/whatlang/




Whatlang

Natural language detection for Rust with focus on simplicity and performance.



Features

  • Supports 87 languages
  • 100% written in Rust
  • Lightweight, fast and simple
  • Recognizes not only the language but also the script (Latin, Cyrillic, etc.)
  • Provides reliability information

Get started

Add to your Cargo.toml:

[dependencies]
whatlang = "0.11.1"

Example:

extern crate whatlang;

use whatlang::{detect, Lang, Script};

fn main() {
    // Sample text in Esperanto.
    let text = "Ĉu vi ne volas eklerni Esperanton? Bonvolu! Estas unu de la plej bonaj aferoj!";

    // `detect` returns `None` when no language can be determined.
    let info = detect(text).unwrap();
    assert_eq!(info.lang(), Lang::Epo);
    assert_eq!(info.script(), Script::Latin);
    // Reliability information for the detection result.
    assert_eq!(info.confidence(), 1.0);
    assert!(info.is_reliable());
}

For more details (e.g. how to blacklist some languages), see the documentation.

Feature toggles

Feature   Description
enum-map  Lang and Script implement the Enum trait from enum-map

How does it work?

How does the language recognition work?

The algorithm is based on trigram language models, which are a particular case of n-grams. To understand the idea, see the original paper by Cavnar and Trenkle (1994), "N-Gram-Based Text Categorization".
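The general idea from the Cavnar and Trenkle paper can be sketched with the standard library alone. This is an illustrative sketch, not Whatlang's actual implementation: the function names `trigram_profile` and `out_of_place_distance`, the tiny "language profiles" built from single sentences, and the profile size of 300 are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Count overlapping character trigrams and return them ordered from most to
/// least frequent (ties are broken alphabetically to keep the result stable).
/// Hypothetical helper, not part of the whatlang API.
fn trigram_profile(text: &str, max_len: usize) -> Vec<String> {
    // Lowercase and pad with spaces so that word boundaries end up in trigrams.
    let normalized: Vec<char> = format!(" {} ", text.to_lowercase()).chars().collect();

    let mut counts: HashMap<String, usize> = HashMap::new();
    for window in normalized.windows(3) {
        *counts.entry(window.iter().collect::<String>()).or_insert(0) += 1;
    }

    let mut ranked: Vec<(String, usize)> = counts.into_iter().collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked.into_iter().take(max_len).map(|(t, _)| t).collect()
}

/// "Out-of-place" distance: for each trigram of the text profile, add the gap
/// between its rank there and its rank in the language profile. A trigram
/// missing from the language profile costs the maximum penalty.
fn out_of_place_distance(text_profile: &[String], lang_profile: &[String]) -> usize {
    let max_penalty = lang_profile.len();
    text_profile
        .iter()
        .enumerate()
        .map(|(rank, trigram)| match lang_profile.iter().position(|t| t == trigram) {
            Some(lang_rank) => rank.abs_diff(lang_rank),
            None => max_penalty,
        })
        .sum()
}

fn main() {
    // Toy profiles; a real detector would be trained on large corpora.
    let english = trigram_profile("the quick brown fox jumps over the lazy dog and the cat", 300);
    let german = trigram_profile(
        "der schnelle braune fuchs springt über den faulen hund und die katze",
        300,
    );

    let sample = trigram_profile("the dog and the fox", 300);
    let d_en = out_of_place_distance(&sample, &english);
    let d_de = out_of_place_distance(&sample, &german);

    // The smaller distance marks the closer language profile.
    assert!(d_en < d_de);
    println!("distance to English: {}, distance to German: {}", d_en, d_de);
}
```

The language whose profile gives the smallest distance is the best guess; the gap to the runner-up is what a reliability measure can build on.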

How is is_reliable calculated?

It is based on the following factors:

  • How many unique trigrams are in the given text?
  • How big is the difference between the first and the second (not returned) detected languages? This metric is called rate in the code base.

Together, these two factors form a 2D space that a threshold function splits into "Reliable" and "Not reliable" areas. The threshold function is a hyperbola, as shown in the following plot:

[Figure: reliability threshold plot for Whatlang language recognition]

For more details, see the blog article "Introduction to Rust Whatlang Library and Natural Language Identification Algorithms".
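The two factors and the hyperbolic threshold described above could be combined roughly as follows. This is a sketch under stated assumptions, not the library's code: the constants `CURVE_SCALE` and `CURVE_OFFSET`, the definition of `rate` as a relative score gap (with higher scores meaning a better match), and the function names `rate`, `threshold` and `is_reliable_sketch` are all hypothetical.

```rust
// Hypothetical tuning constants, chosen only for illustration.
const CURVE_SCALE: f64 = 3.0;
const CURVE_OFFSET: f64 = 0.015;

/// Relative gap between the best and the second-best language scores.
/// Assumes higher scores mean a better match and `second_score > 0`.
fn rate(best_score: f64, second_score: f64) -> f64 {
    (best_score - second_score) / second_score
}

/// Hyperbolic threshold: the fewer unique trigrams the text has,
/// the larger the gap between the top two languages must be.
fn threshold(unique_trigrams: usize) -> f64 {
    CURVE_SCALE / unique_trigrams as f64 + CURVE_OFFSET
}

/// A point (unique_trigrams, rate) is "Reliable" when it lies above the curve.
fn is_reliable_sketch(unique_trigrams: usize, rate: f64) -> bool {
    unique_trigrams > 0 && rate > threshold(unique_trigrams)
}

fn main() {
    // Short text: even a sizeable gap is not enough.
    assert!(!is_reliable_sketch(5, 0.2));
    // Long text: the same gap is well above the curve.
    assert!(is_reliable_sketch(100, 0.2));
    // Long text, but the two best languages are almost tied.
    assert!(!is_reliable_sketch(100, 0.01));

    let r = rate(120.0, 100.0);
    println!("rate = {:.2}, reliable for 100 trigrams: {}", r, is_reliable_sketch(100, r));
}
```

A hyperbola fits the purpose because it demands a large gap for very short texts and flattens out towards a small constant as the number of unique trigrams grows.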

Running benchmarks

This is mostly useful to test performance optimizations.

cargo bench

Comparison with alternatives

                         Whatlang  CLD2       CLD3
Implementation language  Rust      C++        C++
Languages                87        83         107
Algorithm                trigrams  quadgrams  neural network
Supported encoding       UTF-8     UTF-8      ?
HTML support             no        yes        ?

Ports and clones

Derivation

Whatlang is a derivative work of Franc (JavaScript, MIT) by Titus Wormer.

License

MIT © Sergey Potapov

Contributors
