

abadojack / Whatlanggo

Licence: mit

Natural language detection library for Go


Labels

nlp text-processing

Projects that are alternatives of or similar to Whatlanggo

A sharp cut(1) clone.

Stars: ✭ 542 (+13.15%)

Mutual labels: text-processing

🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure.

Stars: ✭ 75 (-84.34%)

Mutual labels: text-processing

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

Stars: ✭ 426 (-11.06%)

Mutual labels: text-processing

🍟 [Library] dA aNn0Y1Ng t3Xt g3NeRa7or

Stars: ✭ 22 (-95.41%)

Mutual labels: text-processing

Useful python NLP tools (evaluation, GUI interface, tokenization)

Stars: ✭ 39 (-91.86%)

Mutual labels: text-processing

Textpipe: clean and extract metadata from text

Stars: ✭ 284 (-40.71%)

Mutual labels: text-processing

TextDatasetCleaner

🔬 Cleaning datasets of junk (normalization, preprocessing)

Stars: ✭ 27 (-94.36%)

Mutual labels: text-processing

Diff Match Patch

Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.

Stars: ✭ 4,910 (+925.05%)

Mutual labels: text-processing

Explaining textual analysis tools in Python. Including Preprocessing, Skip Gram (word2vec), and Topic Modelling.

Stars: ✭ 48 (-89.98%)

Mutual labels: text-processing

A fast implementation of Aho-Corasick in Rust.

Stars: ✭ 424 (-11.48%)

Mutual labels: text-processing

Qiniu Text Processing Libraries for Go

Stars: ✭ 25 (-94.78%)

Mutual labels: text-processing

support-tickets-classification

This case study shows how to create a model for text analysis and classification and deploy it as a web service in Azure cloud in order to automatically classify support tickets. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava http://endava.com/en

Stars: ✭ 142 (-70.35%)

Mutual labels: text-processing

Artificial Adversary

🗣️ Tool to generate adversarial text examples and test machine learning models against them

Stars: ✭ 348 (-27.35%)

Mutual labels: text-processing

advanced-text-mining

Covers natural language processing and text analysis methodology using the TEANAPS library.

Stars: ✭ 15 (-96.87%)

Mutual labels: text-processing

Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from 2 big corpora (english Wikipedia, twitter - 330mil english tweets).

Stars: ✭ 433 (-9.6%)

Mutual labels: text-processing

Drop-in replacements for base R string functions powered by stringi

Stars: ✭ 14 (-97.08%)

Mutual labels: text-processing

ArabicProcessingCog

A Python package that does stemming, tokenization, sentence breaking, segmentation, normalization, and POS tagging for the Arabic language.

Stars: ✭ 19 (-96.03%)

Mutual labels: text-processing

Python Nameparser

A simple Python module for parsing human names into their individual components

Stars: ✭ 462 (-3.55%)

Mutual labels: text-processing

Open Korean Text

Open Korean Text Processor - An Open-source Korean Text Processor

Stars: ✭ 438 (-8.56%)

Mutual labels: text-processing

Simple SQL-like syntax on top of Perl text processing.

Stars: ✭ 414 (-13.57%)

Mutual labels: text-processing

View All Similar Projects ➔

Whatlanggo

Natural language detection for Go.

Features

Supports 84 languages
100% written in Go
No external dependencies
Fast
Recognizes not only the language but also the script (Latin, Cyrillic, etc.)
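
To make the script-recognition feature more concrete, here is a minimal sketch of one way a dominant script could be picked using only Go's standard `unicode` range tables. It is not whatlanggo's implementation: the `candidateScripts` list and the `dominantScript` helper are hypothetical names introduced for illustration, and only four scripts are checked.

```go
package main

import (
	"fmt"
	"unicode"
)

// candidateScripts is a small illustrative subset of scripts
// (an assumption for this sketch, not the library's list).
var candidateScripts = []struct {
	name  string
	table *unicode.RangeTable
}{
	{"Latin", unicode.Latin},
	{"Cyrillic", unicode.Cyrillic},
	{"Hebrew", unicode.Hebrew},
	{"Arabic", unicode.Arabic},
}

// dominantScript returns the name of the candidate script that covers
// the most characters in text, or "" if no character matches any of them.
func dominantScript(text string) string {
	counts := make([]int, len(candidateScripts))
	for _, r := range text {
		for i, s := range candidateScripts {
			if unicode.Is(s.table, r) {
				counts[i]++
				break
			}
		}
	}
	best, bestCount := "", 0
	for i, c := range counts {
		if c > bestCount {
			best, bestCount = candidateScripts[i].name, c
		}
	}
	return best
}

func main() {
	fmt.Println(dominantScript("Foje funkcias kaj foje ne funkcias")) // Latin
	fmt.Println(dominantScript("Привет, мир"))                        // Cyrillic
	fmt.Println(dominantScript("האקדמיה ללשון העברית"))               // Hebrew
}
```

Knowing the script first narrows the set of candidate languages before any trigram comparison is needed.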

Getting started

Installation:

    go get -u github.com/abadojack/whatlanggo

Simple usage example:

package main

import (
	"fmt"

	"github.com/abadojack/whatlanggo"
)

func main() {
	info := whatlanggo.Detect("Foje funkcias kaj foje ne funkcias")
	// Println already separates its arguments with a space.
	fmt.Println("Language:", info.Lang.String(), "Script:", whatlanggo.Scripts[info.Script], "Confidence:", info.Confidence)
}

Blacklisting and whitelisting

package main

import (
	"fmt"

	"github.com/abadojack/whatlanggo"
)

func main() {
	// Blacklist: exclude Yiddish from the candidate languages.
	options := whatlanggo.Options{
		Blacklist: map[whatlanggo.Lang]bool{
			whatlanggo.Ydd: true,
		},
	}

	info := whatlanggo.DetectWithOptions("האקדמיה ללשון העברית", options)

	fmt.Println("Language:", info.Lang.String(), "Script:", whatlanggo.Scripts[info.Script])

	// Whitelist: consider only Esperanto and Ukrainian.
	options1 := whatlanggo.Options{
		Whitelist: map[whatlanggo.Lang]bool{
			whatlanggo.Epo: true,
			whatlanggo.Ukr: true,
		},
	}

	info = whatlanggo.DetectWithOptions("Mi ne scias", options1)
	fmt.Println("Language:", info.Lang.String(), "Script:", whatlanggo.Scripts[info.Script])
}
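
The following sketch shows one way a whitelist and blacklist could prune the candidate languages before scoring. It uses stand-in types only: `Lang`, `Options`, and `filterCandidates` here are hypothetical and defined locally, not imported from whatlanggo. It also assumes that a non-empty whitelist takes precedence over the blacklist, which the text above does not confirm.

```go
package main

import "fmt"

// Lang and Options are local stand-ins for illustration only.
type Lang string

type Options struct {
	Whitelist map[Lang]bool
	Blacklist map[Lang]bool
}

// filterCandidates returns the languages that remain eligible under opts.
// Assumption: if the whitelist is non-empty, only whitelisted languages are
// kept and the blacklist is ignored; otherwise blacklisted ones are dropped.
func filterCandidates(all []Lang, opts Options) []Lang {
	var kept []Lang
	for _, l := range all {
		if len(opts.Whitelist) > 0 {
			if opts.Whitelist[l] {
				kept = append(kept, l)
			}
			continue
		}
		if opts.Blacklist[l] {
			continue
		}
		kept = append(kept, l)
	}
	return kept
}

func main() {
	all := []Lang{"epo", "ukr", "ydd", "heb"}

	blacklist := Options{Blacklist: map[Lang]bool{"ydd": true}}
	fmt.Println(filterCandidates(all, blacklist)) // [epo ukr heb]

	whitelist := Options{Whitelist: map[Lang]bool{"epo": true, "ukr": true}}
	fmt.Println(filterCandidates(all, whitelist)) // [epo ukr]
}
```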

For more details, please check the documentation.

Requirements

Go 1.8 or higher

How does it work?

How does the language recognition work?

The algorithm is based on trigram language models, a particular case of n-gram models. To understand the idea, see the original paper by Cavnar and Trenkle (1994), "N-Gram-Based Text Categorization".
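
As a rough illustration of the trigram approach, the sketch below builds a frequency-ranked trigram profile and compares profiles with the "out-of-place" distance described by Cavnar and Trenkle. It is a toy under stated assumptions, not whatlanggo's code: the names `trigramRanks` and `outOfPlace`, the space padding, the `maxPenalty` value, and the one-sentence "profiles" are all invented for this example. Real profiles are built from large corpora.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// maxPenalty is the distance charged for a trigram missing from a
// language profile (an assumed value for this sketch).
const maxPenalty = 300

// trigramRanks maps each trigram in text to its rank, where rank 0 is
// the most frequent trigram. Ties are broken alphabetically so the
// result is deterministic.
func trigramRanks(text string) map[string]int {
	runes := []rune(" " + strings.ToLower(text) + " ")
	counts := map[string]int{}
	for i := 0; i+3 <= len(runes); i++ {
		counts[string(runes[i:i+3])]++
	}
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(a, b int) bool {
		if counts[keys[a]] != counts[keys[b]] {
			return counts[keys[a]] > counts[keys[b]]
		}
		return keys[a] < keys[b]
	})
	ranks := make(map[string]int, len(keys))
	for i, k := range keys {
		ranks[k] = i
	}
	return ranks
}

// outOfPlace sums, over the trigrams of the text, how far each trigram's
// rank is from its rank in the language profile.
func outOfPlace(textRanks, profile map[string]int) int {
	total := 0
	for tri, r := range textRanks {
		pr, ok := profile[tri]
		if !ok {
			total += maxPenalty
			continue
		}
		d := r - pr
		if d < 0 {
			d = -d
		}
		if d > maxPenalty {
			d = maxPenalty
		}
		total += d
	}
	return total
}

func main() {
	// Toy profiles built from single sentences.
	profiles := map[string]map[string]int{
		"eng": trigramRanks("the quick brown fox jumps over the lazy dog and the cat"),
		"epo": trigramRanks("la rapida bruna vulpo saltas super la maldiligenta hundo"),
	}
	input := trigramRanks("the dog and the fox")

	best, bestDist := "", -1
	for lang, p := range profiles {
		if d := outOfPlace(input, p); bestDist < 0 || d < bestDist {
			best, bestDist = lang, d
		}
	}
	fmt.Println("closest profile:", best)
}
```

The language whose profile has the smallest total distance is the detected one; the runner-up's distance is what makes a reliability estimate possible.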

How is IsReliable calculated?

It is based on the following factors:

The number of unique trigrams in the given text.
The size of the gap between the first and the second (not returned) detected languages. This metric is called rate in the code base.

Therefore, it can be represented as a 2D space with a threshold function that splits it into "Reliable" and "Not reliable" areas. This function is a hyperbola and looks like the following:

[Figure: language recognition reliability threshold, from whatlang (Rust)]
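
The two factors and the hyperbolic threshold can be sketched in a few lines. Everything specific here is an assumption made for illustration: the constants `hyperbolaScale` and `hyperbolaOffset`, the way `rate` is normalized, and the function names are not taken from the text above and may differ from the library's code.

```go
package main

import "fmt"

// Illustrative constants for the hyperbola (assumed values).
const (
	hyperbolaScale  = 3.0
	hyperbolaOffset = 0.015
)

// threshold returns the minimum rate required for a text with the given
// number of unique trigrams. It falls off as a hyperbola: short texts
// need a much larger gap between the top two languages.
func threshold(uniqueTrigrams int) float64 {
	return hyperbolaScale/float64(uniqueTrigrams) + hyperbolaOffset
}

// rate is the relative gap between the best and second-best distances
// (an assumed normalization for this sketch).
func rate(best, second int) float64 {
	if second == 0 {
		return 0
	}
	return float64(second-best) / float64(second)
}

// isReliable reports whether the point (uniqueTrigrams, rate) lies above
// the threshold curve.
func isReliable(uniqueTrigrams, best, second int) bool {
	if uniqueTrigrams <= 0 {
		return false
	}
	return rate(best, second) > threshold(uniqueTrigrams)
}

func main() {
	// Same 10% gap between the top two languages, different text lengths.
	fmt.Println(isReliable(5, 900, 1000))   // false: threshold is 0.615
	fmt.Println(isReliable(100, 900, 1000)) // true: threshold is 0.045
}
```

The same gap between the top two candidates counts as reliable for a long text and unreliable for a very short one, which is the behaviour the plot describes.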

For more details, see the blog article "Introduction to Rust Whatlang Library and Natural Language Identification Algorithms".

License

MIT

Derivation

whatlanggo is a derivative of Franc (JavaScript, MIT) by Titus Wormer.

Acknowledgements

Thanks to greyblake (Potapov Sergey) for creating whatlang-rs, from which I took the idea and algorithms.


Stars: ✭ 479
