PersianDataAnnotations is ASP.NET Core MVC & ASP.NET MVC Custom Localization DataAnnotations (Localized MVC Errors) for Persian(Farsi) language - فارسی سازی خطاهای اعتبارسنجی توکار ام.وی.سی. و کور.ام.وی.سی. برای نمایش اعتبار سنجی سمت کلاینت

Stars: ✭ 38 (+35.71%)

Mutual labels: farsi

startup-sozlugu

Startup dünyasında sık kullan kelimeler ve terimler

Stars: ✭ 21 (-25%)

Mutual labels: turkish

View All Similar Projects ➔

language-detector

language-detector detects the language of text

Installation

pip install language-detector

Python Version

Works with both Python 2 and 3

Use

from language_detector import detect_language
text = "I arrived in that city on January 4, 1937"
language = detect_language(text)
# prints English

Features

Languages Supported
Arabic
English
Farsi
French
German
Khmer
Kurmanci (Kurdish)
Mandarin
Russian
Sorani (Kurdish)
Spanish
Turkish

Testing

To test the package run

python -m unittest language_detector.tests.test

Comparison

Test is a comparison of how well language-detector and langid identify languages in the data sources.

package	language-detector	langid
test-duration (in seconds)	0.10	3.83
accuracy	96.77%	67.74%

Excluding Languages

If you don't want language-detector to look for certain languages, you can monkey-patch the code. For example, in order to exclude English:

import language_detector
language_detector.char_language = [cl for cl in char_language if cl[1] != "English"]

# proceed as normal

Datasets

The following is a list of datasets used for each language:

Language	Datasets
Arabic	UN Corpora
English	UN Corpora
Farsi	BBC News Persian
French	UN Corpora
German	Deutsche Welle
Khmer	Cambodia Daily
Kurmanci (Kurdish)	Rudaw
Mandarin	UN Corpora
Russian	UN Corpora
Sorani (Kurdish)	Rudaw
Spanish	UN Corpora
Turkish	BBC News Türkçe

How Does It Work?

When training the model, we scan all the data sources and compute the frequency of how often a character appears in each specific language. We also compute the frequency of how often a characters appears in all of the data sources for all the languages. For each language, we then calculate a score for each character as frequency_in_language / frequency_in_all_languages. We then save the top ten highest scoring characters for each language.
When detecting a language, we simply iterate through the saved characters (ten for each language), and add their score as a weighted-vote for each language. Whichever, language has the highest score is selected as the winner.

Contributing

If you'd like to contribute a new language, please consult CONTRIBUTING.md

Support

Contact the package author, Daniel J. Dufour, at [email protected]

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].

Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

DanielJDufour / language-detector

Programming Languages

Labels

Projects that are alternatives of or similar to language-detector