
hiredscorelabs / seqtolang

Multi-Language Identification


seqtolang


seqtolang is a Python library for identifying the languages in multi-language documents.

See this post for implementation details.

Getting Started

Install from source:

$ git clone https://github.com/hiredscorelabs/seqtolang
$ cd seqtolang
$ python setup.py install

or from PyPI:

$ pip install seqtolang

Basic usage:

from seqtolang import Detector

detector = Detector()
text = "In Chinese, the French phrase 'Je rentre chez moi Je rentre chez moi' will be '我正在回家'"
languages = detector.detect(text)
print(languages)

# Output: [('fr', 0.499), ('en', 0.437), ('zh', 0.062)]


tokens = detector.detect(text, aggregated=False)
print(tokens)

# Output: ['eng', 'eng', 'eng', 'eng', 'eng', 'fra', 'fra', 'fra', 'fra', 'fra', 'fra', 'fra', 'fra', 'eng', 'eng', 'zho']
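
The aggregated scores in the example are close to the share of tokens assigned to each language: 8 of the 16 tokens are labelled 'fra', 7 'eng', and 1 'zho'. The sketch below computes those shares from the per-token labels. It is an illustration only and not necessarily how seqtolang aggregates internally; `language_shares` is a hypothetical helper, not part of the library.

```python
from collections import Counter


def language_shares(labels):
    """Return (language, share) pairs, most frequent language first."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(lang, count / total) for lang, count in counts.most_common()]


# Per-token labels copied from the example output above.
tokens = ['eng', 'eng', 'eng', 'eng', 'eng',
          'fra', 'fra', 'fra', 'fra', 'fra', 'fra', 'fra', 'fra',
          'eng', 'eng', 'zho']

print(language_shares(tokens))
# [('fra', 0.5), ('eng', 0.4375), ('zho', 0.0625)]
```

Note that the per-token labels use three-letter codes, while the aggregated output shown above uses two-letter codes; this sketch keeps the three-letter codes unchanged.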

seqtolang supports 36 languages:

['afr', 'eus', 'bel', 'ben', 'bul', 'cat', 'zho', 'ces', 'dan', 'nld', 'eng', 'est', 'fin', 'fra', 
'glg', 'deu', 'ell', 'hin', 'hun', 'isl', 'ind', 'gle', 'ita', 'jpn', 'kor', 'lat', 'lit', 'pol', 
'por', 'ron', 'rus', 'slk', 'spa', 'swe', 'ukr', 'vie']
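
If you only care about some languages, it can help to check your codes against this list before filtering results. The sketch below copies the list into a set; `SUPPORTED_LANGUAGES` and `unsupported` are hypothetical names introduced here, not part of seqtolang, and the codes appear to be ISO 639-3 style three-letter codes.

```python
# Language codes copied from the list above (assumed to be complete).
SUPPORTED_LANGUAGES = frozenset([
    'afr', 'eus', 'bel', 'ben', 'bul', 'cat', 'zho', 'ces', 'dan', 'nld',
    'eng', 'est', 'fin', 'fra', 'glg', 'deu', 'ell', 'hin', 'hun', 'isl',
    'ind', 'gle', 'ita', 'jpn', 'kor', 'lat', 'lit', 'pol', 'por', 'ron',
    'rus', 'slk', 'spa', 'swe', 'ukr', 'vie',
])


def unsupported(codes):
    """Return the sorted codes that are not in the supported list."""
    return sorted(set(codes) - SUPPORTED_LANGUAGES)


print(len(SUPPORTED_LANGUAGES))            # 36
print(unsupported(['eng', 'fra', 'tur']))  # ['tur']
```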

Docker Example

To make the library easier to try, a runnable Docker image is also provided. To build and run it:

$ docker build . -t seqtolang
$ docker run -e SEQTOLANG_TEXT="Good boy in chinese is 好孩子" seqtolang
['Good', 'boy', 'in', 'chinese', 'is', '好孩子']
['eng', 'eng', 'eng', 'eng', 'eng', 'zho']
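
The two lists printed by the container line up one-to-one: the first holds the tokens, the second the language label for each token. A minimal sketch of how they could be combined into contiguous language runs follows; `language_runs` is a hypothetical helper written for this example, not part of seqtolang.

```python
from itertools import groupby


def language_runs(words, labels):
    """Group consecutive words that share a label into (language, text) runs."""
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    runs = []
    # groupby only merges adjacent items, so word order is preserved.
    for lang, group in groupby(zip(words, labels), key=lambda pair: pair[1]):
        runs.append((lang, " ".join(word for word, _ in group)))
    return runs


# The two lists printed by the Docker example above.
words = ['Good', 'boy', 'in', 'chinese', 'is', '好孩子']
labels = ['eng', 'eng', 'eng', 'eng', 'eng', 'zho']

print(language_runs(words, labels))
# [('eng', 'Good boy in chinese is'), ('zho', '好孩子')]
```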

Support

Getting Help

You can ask questions and join the development discussion on GitHub Issues.

License

Apache License 2.0
