nickdavidhaynes / Spacy Cld

Licence: mit
Language detection extension for spaCy 2.0+

spaCy-CLD: Bringing simple language detection to spaCy

This package is a spaCy 2.0 extension that adds language detection to spaCy's text processing pipeline. It was inspired by a discussion here.

Installation

pip install spacy_cld

Usage

Adding the spaCy-CLD component to the processing pipeline is relatively simple:

import spacy
from spacy_cld import LanguageDetector

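# Load an English spaCy model
nlp = spacy.load('en')
# Create the detector and append it to the end of the pipeline
language_detector = LanguageDetector()
nlp.add_pipe(language_detector)
# Processing text now also runs language detection
doc = nlp('This is some English text.')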

doc._.languages  # ['en']
doc._.language_scores['en']  # 0.96

spaCy-CLD operates on spaCy Doc and Span objects. Once the component has run, each Doc or Span exposes two attributes through ._ (as in doc._.languages above): languages (a list of up to 3 language codes) and language_scores (a dictionary mapping language codes to confidence scores between 0 and 1).
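
Because languages is a list and language_scores is a dictionary, downstream filtering needs nothing from spaCy itself. The following is a minimal sketch, assuming a hypothetical helper named confident_languages and an arbitrary threshold of 0.5; neither is part of spaCy-CLD, and the sample values are made up for illustration:

```python
def confident_languages(languages, language_scores, threshold=0.5):
    """Return the detected language codes whose score is at least `threshold`.

    `languages` and `language_scores` mirror doc._.languages and
    doc._.language_scores. This helper is hypothetical, not part of spaCy-CLD.
    """
    return [
        code for code in languages
        if language_scores.get(code, 0.0) >= threshold
    ]


# Illustrative values only, shaped like the attributes described above.
languages = ['en', 'fr']
language_scores = {'en': 0.72, 'fr': 0.27}

print(confident_languages(languages, language_scores))  # ['en']
```

A threshold like this is a design choice for the caller: a stricter value trades recall for fewer false positives on short or mixed-language text.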

Under the hood

spacy-cld is a small extension that wraps the PYCLD2 Python library, which in turn wraps the Compact Language Detector 2 (CLD2) C library originally built at Google for the Chromium project. CLD2 uses character n-grams as features and a Naive Bayes classifier to identify 80+ languages from Unicode text strings (or XML/HTML). It can detect up to 3 different languages in a given document, and reports a confidence score with each language.
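
To make the idea of character n-grams plus Naive Bayes concrete, here is a toy sketch in pure Python. It is not CLD2's implementation: the class name ToyNgramNaiveBayes, the trigram size, the add-one smoothing, the uniform prior, and the two tiny training sentences are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of `text`, padded with spaces."""
    padded = ' ' + text.lower() + ' '
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class ToyNgramNaiveBayes:
    """Toy multinomial Naive Bayes over character n-grams (not CLD2 itself)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}    # language -> Counter of n-gram counts
        self.totals = {}    # language -> total number of n-grams seen
        self.vocab = set()  # every n-gram seen in any language

    def train(self, language, text):
        grams = char_ngrams(text, self.n)
        self.counts.setdefault(language, Counter()).update(grams)
        self.totals[language] = self.totals.get(language, 0) + len(grams)
        self.vocab.update(grams)

    def scores(self, text):
        """Return per-language probabilities summing to 1 (uniform prior)."""
        grams = char_ngrams(text, self.n)
        vocab_size = len(self.vocab)
        log_probs = {}
        for lang, counter in self.counts.items():
            total = self.totals[lang]
            # Add-one smoothing so unseen n-grams do not zero out a language.
            log_probs[lang] = sum(
                math.log((counter[g] + 1) / (total + vocab_size)) for g in grams
            )
        # Shift by the maximum before exponentiating for numerical stability.
        top = max(log_probs.values())
        unnormalized = {lang: math.exp(lp - top) for lang, lp in log_probs.items()}
        norm = sum(unnormalized.values())
        return {lang: value / norm for lang, value in unnormalized.items()}


clf = ToyNgramNaiveBayes()
clf.train('en', 'this is some english text and the quick brown fox')
clf.train('fr', 'ceci est un texte en français et le renard brun rapide')

result = clf.scores('this is the text')
print(max(result, key=result.get))  # en
```

The normalized values play the same role as the confidence scores described above, although a real detector is trained on far more data and uses more careful scoring.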

For additional details, see the linked project pages for PYCLD2 and CLD2.
