All Projects → pear → Text_LanguageDetect

pear / Text_LanguageDetect

Licence: other
PHP library to identify human languages from text samples.

Programming Languages

PHP
23972 projects - #3 most used programming language

Projects that are alternatives of or similar to Text LanguageDetect

GoogleTranslateBundle
A Symfony bundle to deals with Google Translate API
Stars: ✭ 44 (+18.92%)
Mutual labels:  detect-language
detectlanguage-java
Detect Language API Java Client
Stars: ✭ 23 (-37.84%)
Mutual labels:  detect-language
php-google-translate-for-free
Library for free use Google Translator. With attempts connecting on failure and array support.
Stars: ✭ 124 (+235.14%)
Mutual labels:  detect-language
googletrans
G文⚡️: Concurrency-safe, Free and Unlimited google translate api for Golang. 🔥免费、无限、并发安全的谷歌翻译包
Stars: ✭ 94 (+154.05%)
Mutual labels:  detect-language

Text_LanguageDetect

PHP library to identify human languages from text samples. Returns confidence scores for each.

Installation

PEAR

$ pear install Text_LanguageDetect

Composer

$ composer require pear/text_languagedetect

Usage

Also see the examples in the docs/ directory and the official documentation.

Language detection

Simple language detection:

<?php
require_once 'Text/LanguageDetect.php';

$text = 'Was wäre, wenn ich Ihnen das jetzt sagen würde?';

$ld = new Text_LanguageDetect();
$language = $ld->detectSimple($text);

echo $language;
//output: german

Show the three most probable languages with their confidence score:

<?php
require_once 'Text/LanguageDetect.php';

$text = 'Was wäre, wenn ich Ihnen das jetzt sagen würde?';

$ld = new Text_LanguageDetect();
//3 most probable languages
$results = $ld->detect($text, 3);

foreach ($results as $language => $confidence) {
    echo $language . ': ' . number_format($confidence, 2) . "\n";
}

//output:
//german: 0.35
//dutch: 0.25
//swedish: 0.20
?>

Language code

Instead of returning the full language name, ISO 639-2 two and three letter codes can be returned:

<?php
require_once 'Text/LanguageDetect.php';
$ld = new Text_LanguageDetect();

//will output the ISO 639-1 two-letter language code
// "de"
$ld->setNameMode(2);
echo $ld->detectSimple('Das ist ein kleiner Text') . "\n";

//will output the ISO 639-2 three-letter language code
// "deu"
$ld->setNameMode(3);
echo $ld->detectSimple('Das ist ein kleiner Text') . "\n";
?>

Supported languages

  • albanian
  • arabic
  • azeri
  • bengali
  • bulgarian
  • cebuano
  • croatian
  • czech
  • danish
  • dutch
  • english
  • estonian
  • farsi
  • finnish
  • french
  • german
  • hausa
  • hawaiian
  • hindi
  • hungarian
  • icelandic
  • indonesian
  • italian
  • kazakh
  • kyrgyz
  • latin
  • latvian
  • lithuanian
  • macedonian
  • mongolian
  • nepali
  • norwegian
  • pashto
  • pidgin
  • polish
  • portuguese
  • romanian
  • russian
  • serbian
  • slovak
  • slovene
  • somali
  • spanish
  • swahili
  • swedish
  • tagalog
  • turkish
  • ukrainian
  • urdu
  • uzbek
  • vietnamese
  • welsh

Links

Homepage
http://pear.php.net/package/Text_LanguageDetect
Bug tracker
http://pear.php.net/bugs/search.php?cmd=display&package_name[]=Text_LanguageDetect
Documentation
http://pear.php.net/package/Text_LanguageDetect/docs
Unit test status

https://travis-ci.org/pear/Text_LanguageDetect

https://travis-ci.org/pear/Text_LanguageDetect.svg?branch=master

Notes

Where are the data from?

I don't recall where I got the original data set. It's just the frequencies of 3-letter combinations in each supported language. It could be generated from a few random wikipedia pages from each language.
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].