ropensci / Cld2

R Wrapper for Google's Compact Language Detector 2

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

CLD2 probabilistically detects over 80 languages in Unicode UTF-8 text, either plain text or HTML/XML. For mixed-language input, CLD2 returns the top three languages found and their approximate percentages of the total text bytes (e.g. 80% English and 20% French out of 1000 bytes).

Installation

This package includes a bundled version of libcld2, so no separate system library is needed. Install the development version from GitHub:

devtools::install_github("ropensci/cld2")

Guess a Language

The function detect_language() returns the best guess, or NA if the language could not reliably be determined.

cld2::detect_language("To be or not to be")
# [1] "ENGLISH"

cld2::detect_language("Ce n'est pas grave.")
# [1] "FRENCH"

cld2::detect_language("Nou breekt mijn klomp!")
# [1] "DUTCH"
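Because detect_language() can return NA, downstream code usually needs a fallback. The sketch below is one way to handle that, not part of the package: label_or_fallback() and the "UNKNOWN" label are hypothetical names chosen for this example. Each string is passed to detect_language() separately, and the call is skipped if cld2 is not installed.

```r
# Hypothetical helper (not part of cld2): replace NA guesses with a label.
label_or_fallback <- function(guesses, fallback = "UNKNOWN") {
  guesses <- as.character(guesses)
  guesses[is.na(guesses)] <- fallback
  guesses
}

texts <- c(
  "To be or not to be",
  "Ce n'est pas grave.",
  "Nou breekt mijn klomp!"
)

if (requireNamespace("cld2", quietly = TRUE)) {
  # Call detect_language() once per string and collect the guesses.
  guesses <- vapply(
    texts,
    function(x) as.character(cld2::detect_language(x)),
    character(1),
    USE.NAMES = FALSE
  )
  print(data.frame(text = texts, language = label_or_fallback(guesses)))
}
```

Keeping the NA handling in a small separate function makes it easy to swap the fallback label or to drop undetermined rows instead.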

Set plain_text = FALSE if your input contains HTML:

cld2::detect_language(url('http://www.un.org/ar/universal-declaration-human-rights/'), plain_text = FALSE)
# [1] "ARABIC"

cld2::detect_language(url('http://www.un.org/zh/universal-declaration-human-rights/'), plain_text = FALSE)
# [1] "CHINESE"
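The plain_text argument can also be tried without network access. The sketch below passes a small inline HTML string, made up for this example, with both settings so the results can be compared. No expected output is shown because the result depends on the installed cld2 version, and the calls are skipped if cld2 is not available.

```r
# An inline HTML fragment (made up for this example) instead of a live URL.
html <- paste0(
  "<html><body><h1>Déclaration</h1>",
  "<p>Tous les êtres humains naissent libres et égaux ",
  "en dignité et en droits.</p>",
  "</body></html>"
)

if (requireNamespace("cld2", quietly = TRUE)) {
  # plain_text = FALSE tells CLD2 that the input contains HTML.
  print(cld2::detect_language(html, plain_text = FALSE))
  # For comparison: the same string treated as plain text.
  print(cld2::detect_language(html, plain_text = TRUE))
}
```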

Use detect_language_multi() to get detailed classification output.

cld2::detect_language_multi(url('http://www.un.org/fr/universal-declaration-human-rights/'), plain_text = FALSE)
# $classification
#   language code latin proportion
# 1   FRENCH   fr  TRUE       0.96
# 2  ENGLISH   en  TRUE       0.03
# 3   ARABIC   ar FALSE       0.00
# 
# $bytes
# [1] 17008
# 
# $reliable
# [1] TRUE

This shows the top three language guesses and the proportion of text that was classified as each language. The bytes element gives the total number of text bytes that were classified, and reliable is the result of a more complex calculation of whether the top language is some amount more probable than the second-best language.
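To make the proportion and bytes values concrete, the sketch below rebuilds the result printed above by hand and converts each proportion into an approximate byte count. It does not call cld2. The names approx_bytes, is_reliable and top_language are illustrative, and since the printed proportions are rounded, the byte counts are only approximate.

```r
# Hand-built copy of the classification table printed above
# (no call to cld2 is made here).
classification <- data.frame(
  language   = c("FRENCH", "ENGLISH", "ARABIC"),
  code       = c("fr", "en", "ar"),
  latin      = c(TRUE, TRUE, FALSE),
  proportion = c(0.96, 0.03, 0.00),
  stringsAsFactors = FALSE
)
bytes <- 17008        # total classified text bytes, from the output above
is_reliable <- TRUE   # reliability flag, from the output above

# Approximate number of bytes attributed to each language.
classification$approx_bytes <- round(classification$proportion * bytes)

# Accept the top guess only when the result is flagged as reliable.
top_language <- if (is_reliable) classification$language[1] else NA_character_

print(classification)
print(top_language)
```

Checking the reliability flag before using the top guess mirrors the behaviour of detect_language(), which returns NA when no reliable guess is available.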
