cld2

R Wrapper for Google's Compact Language Detector 2

CLD2 probabilistically detects over 80 languages in Unicode UTF-8 text, either plain text or HTML/XML. For mixed-language input, CLD2 returns the top three languages found and their approximate percentages of the total text bytes (e.g. 80% English and 20% French out of 1000 bytes).

Installation

This package includes a bundled version of libcld2:

devtools::install_github("ropensci/cld2")

Guess a Language

The function detect_language() returns the best guess, or NA if the language could not reliably be determined.

cld2::detect_language("To be or not to be")
# [1] "ENGLISH"

cld2::detect_language("Ce n'est pas grave.")
# [1] "FRENCH"

cld2::detect_language("Nou breekt mijn klomp!")
# [1] "DUTCH"
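
When classifying many strings, it can help to make the NA case explicit. The following is a minimal sketch and not part of cld2: label_languages() and its fallback argument are hypothetical names, and the detector is passed in as a function so the helper works with cld2::detect_language or any stand-in.

```r
# Hypothetical helper (not part of cld2): classify each string and replace
# NA guesses with a fallback label.
label_languages <- function(texts, detect, fallback = "UNKNOWN") {
  # Call the detector once per string so it need not be vectorised.
  guesses <- vapply(
    texts,
    function(x) as.character(detect(x))[1],
    character(1),
    USE.NAMES = FALSE
  )
  ifelse(is.na(guesses), fallback, guesses)
}

# Usage with cld2, only if the package is installed.
if (requireNamespace("cld2", quietly = TRUE)) {
  print(label_languages(
    c("To be or not to be", "Ce n'est pas grave."),
    cld2::detect_language
  ))
}
```

Very short or ambiguous inputs are the ones most likely to end up with the fallback label, since they are where detect_language() tends to return NA.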

Set plain_text = FALSE if your input contains HTML:

cld2::detect_language(url('http://www.un.org/ar/universal-declaration-human-rights/'), plain_text = FALSE)
# [1] "ARABIC"

cld2::detect_language(url('http://www.un.org/zh/universal-declaration-human-rights/'), plain_text = FALSE)
# [1] "CHINESE"

Use detect_language_multi() to get detailed classification output.

cld2::detect_language_multi(url('http://www.un.org/fr/universal-declaration-human-rights/'), plain_text = FALSE)
# $classification
#   language code latin proportion
# 1   FRENCH   fr  TRUE       0.96
# 2  ENGLISH   en  TRUE       0.03
# 3   ARABIC   ar FALSE       0.00
# 
# $bytes
# [1] 17008
# 
# $reliabale
# [1] TRUE

This shows the top three language guesses and the proportion of the text that was classified as each language. The bytes element gives the total number of text bytes that were classified, and the reliability flag is the result of a more complex calculation of whether the top language is sufficiently more probable than the second-best language.
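
As an illustration of working with this output, the sketch below filters the classification table by proportion. The result object is rebuilt by hand from the printed output above rather than fetched from the network, and significant_languages() with its min_share threshold is a hypothetical helper, not a cld2 function.

```r
# Example result, reconstructed by hand from the printed output above.
result <- list(
  classification = data.frame(
    language = c("FRENCH", "ENGLISH", "ARABIC"),
    code = c("fr", "en", "ar"),
    latin = c(TRUE, TRUE, FALSE),
    proportion = c(0.96, 0.03, 0.00),
    stringsAsFactors = FALSE
  ),
  bytes = 17008
)

# Hypothetical helper: keep languages that cover at least `min_share`
# of the classified text.
significant_languages <- function(res, min_share = 0.05) {
  cls <- res$classification
  cls$language[!is.na(cls$proportion) & cls$proportion >= min_share]
}

significant_languages(result)        # "FRENCH"
significant_languages(result, 0.01)  # "FRENCH" "ENGLISH"
```

The threshold is an arbitrary choice for this example. A page with a few stray words in another language, such as navigation links, would otherwise be reported as multilingual.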
