
GregBowyer / cld2-cffi

Licence: Apache-2.0 License
Python bindings to the Compact Language Detector

Programming Languages

C++

Projects that are alternatives to or similar to cld2-cffi

Hms Ml Demo
HMS ML Demo provides an example of integrating Huawei ML Kit services into applications. This example demonstrates how to integrate services provided by ML Kit, such as face detection, text recognition, image segmentation, ASR, and TTS.
Stars: ✭ 187 (+484.38%)
Mutual labels:  language-detection
SwiftUIMLKitTranslator
SwiftUI MLKit Language Identification & Translator
Stars: ✭ 23 (-28.12%)
Mutual labels:  language-detection
UniLang
Translate text from one language to another using Google Translate
Stars: ✭ 33 (+3.13%)
Mutual labels:  language-detection
cnn-ld-tf
Convolutional Neural Network for Language Detection in Tensorflow
Stars: ✭ 12 (-62.5%)
Mutual labels:  language-detection
detectlanguage-python
Detect Language API Python Client
Stars: ✭ 49 (+53.13%)
Mutual labels:  language-detection
lingua-go
👄 The most accurate natural language detection library for Go, suitable for long and short text alike
Stars: ✭ 684 (+2037.5%)
Mutual labels:  language-detection
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+7768.75%)
Mutual labels:  language-detection
Command-line-translator
Command-line access to Google Translate and some other features
Stars: ✭ 26 (-18.75%)
Mutual labels:  language-detection
detectlanguage-java
Detect Language API Java Client
Stars: ✭ 23 (-28.12%)
Mutual labels:  language-detection
nlpserver
NLP Web Service
Stars: ✭ 76 (+137.5%)
Mutual labels:  language-detection
pycld3
Python3 bindings for the Compact Language Detector v3 (CLD3)
Stars: ✭ 122 (+281.25%)
Mutual labels:  language-detection
php-google-translate-for-free
Library for using Google Translate for free, with connection retries on failure and array support.
Stars: ✭ 124 (+287.5%)
Mutual labels:  language-detection
tongue
Elixir port of Nakatani Shuyo's natural language detector
Stars: ✭ 17 (-46.87%)
Mutual labels:  language-detection
Malaya
Natural Language Toolkit for bahasa Malaysia, https://malaya.readthedocs.io/
Stars: ✭ 239 (+646.88%)
Mutual labels:  language-detection
pastey
A lightweight, self-hosted paste platform
Stars: ✭ 65 (+103.13%)
Mutual labels:  language-detection
L10n Swift
Localization of applications, with the ability to change language "on the fly" and support for plural forms in any language.
Stars: ✭ 177 (+453.13%)
Mutual labels:  language-detection
jstarcraft-nlp
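Focused on solving several core problems in natural language processing: lexical analysis, syntactic analysis, semantic analysis, language detection, information extraction, text clustering, and text classification. Provides complete general-purpose designs and reference implementations for researchers and developers in related fields. Covers a variety of natural language processing algorithms and adapts to multiple natural language processing frameworks. Compatible with Lucene/Solr/ElasticSearch plugins.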
Stars: ✭ 92 (+187.5%)
Mutual labels:  language-detection
get-user-locale
A function that returns the user's locale as an IETF language tag, based on all available sources.
Stars: ✭ 44 (+37.5%)
Mutual labels:  language-detection
language-identification-template
Detect the languages from short pieces of text
Stars: ✭ 20 (-37.5%)
Mutual labels:  language-detection
spacy-fastlang
Language detection using Spacy and Fasttext
Stars: ✭ 34 (+6.25%)
Mutual labels:  language-detection

CLD2-CFFI - Python (CFFI) Bindings for Compact Language Detector 2

CFFI bindings for CLD2




This package contains the CLD (Compact Language Detection) library as maintained by Dick Sites (https://code.google.com/p/cld2/). The first fork was taken at revision r161. It also contains Python bindings originally created by Mike McCandless. The bindings have since passed through several maintainers; the most recent changes reworked them to use CFFI.

These bindings have the same API as the original cld2 bindings, so they can be used as a drop-in replacement.

The license is the same as Chromium's and is included in the LICENSE file for reference.

Installing

Installation should be as simple as:

$ pip install cld2-cffi

Development Version

The latest development version can be installed directly from GitHub:

$ pip install --upgrade 'git+https://github.com/GregBowyer/cld2-cffi.git'

Usage

import cld2

isReliable, textBytesFound, details = cld2.detect("This is my sample text")
print('  reliable: %s' % (isReliable != 0))
print('  textBytes: %s' % textBytesFound)
print('  details: %s' % str(details))

# The output looks like so:
#  reliable: True
#  textBytes: 24
#  details: (('ENGLISH', 'en', 95, 1736.0), ('Unknown', 'un', 0, 0.0), ('Unknown', 'un', 0, 0.0))
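As the sample output shows, `details` is padded to three entries, with unused slots filled by `('Unknown', 'un', 0, 0.0)`. A minimal sketch for keeping only the real matches, assuming the `(name, code, percent, score)` entry layout seen in that sample; `real_matches` is a hypothetical helper, not part of the cld2 module:

```python
from typing import List, Sequence, Tuple

# Assumed layout of one entry, taken from the sample output:
# (language_name, language_code, percent, normalized_score)
Detail = Tuple[str, str, int, float]


def real_matches(details: Sequence[Detail]) -> List[Tuple[str, int]]:
    """Return (language_code, percent) pairs, skipping 'un' padding slots.

    Hypothetical helper for illustration; not part of cld2.
    """
    return [(code, percent)
            for _name, code, percent, _score in details
            if code != "un"]


# The details value from the sample output, copied verbatim.
sample_details = (('ENGLISH', 'en', 95, 1736.0),
                  ('Unknown', 'un', 0, 0.0),
                  ('Unknown', 'un', 0, 0.0))
print(real_matches(sample_details))  # [('en', 95)]
```

With the real module this would be called as `real_matches(details)` on the third value returned by `cld2.detect`.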

Documentation

First, you must get your content (plain text or HTML) encoded into UTF-8 bytes. Then, detect like this:

isReliable, textBytesFound, details = cld2.detect(utf8Bytes)
isReliable
is True if the top language is much better than the second-best language.
textBytesFound
tells you how many bytes CLD actually analyzed (after removing HTML tags, collapsing runs of excess whitespace, etc.).
details
has one entry for each of the top three matching languages; each entry includes the language name, the language code, the percent of the text attributed to that language, and a separate normalized score.
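Content that arrives as a Python `str`, or as bytes in some other encoding, has to become UTF-8 bytes before detection. A minimal standard-library sketch; `to_utf8` and its `source_encoding` parameter are hypothetical names, and the caller is assumed to know the source encoding:

```python
from typing import Union


def to_utf8(content: Union[str, bytes], source_encoding: str = "utf-8") -> bytes:
    """Return content as UTF-8 bytes, ready to hand to the detector.

    Hypothetical helper: a str is encoded directly; bytes are decoded
    from source_encoding (assumed known by the caller) and re-encoded.
    """
    if isinstance(content, str):
        return content.encode("utf-8")
    # Strict decoding (the default) raises UnicodeDecodeError on bad input
    # instead of silently passing corrupted bytes to the detector.
    return content.decode(source_encoding).encode("utf-8")


latin1_page = "Voilà un exemple".encode("latin-1")
print(to_utf8(latin1_page, "latin-1"))  # b'Voil\xc3\xa0 un exemple'
```

The result would then be passed on as `cld2.detect(to_utf8(page, "latin-1"))`.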

The module exports these global constants:

cld2.ENCODINGS
list of the encoding names CLD recognizes (if you provide hintEncoding, it must be one of these names).
cld2.LANGUAGES
list of languages and their codes (if you provide hintLanguageCode, it must be one of these codes).
cld2.EXTERNAL_LANGUAGES
list of external languages and their codes. Note that external languages cannot be hinted, but may be matched if you pass includeExtendedLanguages=True (the default).
cld2.DETECTED_LANGUAGES
list of all detectable languages, as best I can determine (this was reverse-engineered from a unit test, i.e., it contains a language X if that language was tested and passed for at least one example text).
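Because a `hintLanguageCode` must come from `cld2.LANGUAGES`, a caller may want to check a hint before passing it in. A minimal sketch, assuming each entry of `cld2.LANGUAGES` is a `(name, code)` pair (check this against your installed version); `valid_hint` is a hypothetical helper, and the sample list is a made-up stand-in, not the module's real contents:

```python
from typing import Iterable, Optional, Tuple


def valid_hint(code: Optional[str],
               languages: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return code if it appears among the language codes, else None.

    Assumes languages is an iterable of (name, code) pairs; this shape
    of cld2.LANGUAGES is an assumption, not documented above.
    """
    if code is None:
        return None
    known_codes = {lang_code for _name, lang_code in languages}
    return code if code in known_codes else None


# Illustrative stand-in for cld2.LANGUAGES; the real list is much longer.
sample_languages = [("ENGLISH", "en"), ("FRENCH", "fr"), ("GERMAN", "de")]
print(valid_hint("fr", sample_languages))  # fr
print(valid_hint("xx", sample_languages))  # None
```

Returning None for an unknown code lets the caller simply omit the hint rather than pass an invalid one.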

Authors

Please see AUTHORS.

Reporting bugs

Please see BUG_REPORTS.

Contribute

Please see CONTRIBUTING.

Licence

Please see LICENSE.
