google / Language Resources

Licence: apache-2.0

Datasets and tools for basic natural language processing.

Labels

natural-language

Projects that are alternatives of or similar to Language Resources

aprenda-python

Learning, tips, and projects about Python

Stars: ✭ 22 (-91.44%)

Mutual labels: natural-language

retext-spell

plugin to check spelling

Stars: ✭ 53 (-79.38%)

Mutual labels: natural-language

lancaster-stemmer

Lancaster stemming algorithm

Stars: ✭ 22 (-91.44%)

Mutual labels: natural-language

NLP-Natural-Language-Processing

Projects and useful articles / links

Stars: ✭ 149 (-42.02%)

Mutual labels: natural-language

gdpr-fingerprint-pii

Use Watson Natural Language Understanding and Watson Knowledge Studio to fingerprint personal data from unstructured documents

Stars: ✭ 49 (-80.93%)

Mutual labels: natural-language

react-taggy

A simple zero-dependency React component for tagging user-defined entities within a block of text.

Stars: ✭ 29 (-88.72%)

Mutual labels: natural-language

hedges

List of (possible) English hedge words

Stars: ✭ 39 (-84.82%)

Mutual labels: natural-language

genie-toolkit

A Generator of Natural Language Parsers for Compositional Virtual Assistants

Stars: ✭ 115 (-55.25%)

Mutual labels: natural-language

fillers

List of (possible) English filler words

Stars: ✭ 36 (-85.99%)

Mutual labels: natural-language

pixiedust-facebook-analysis

A Jupyter notebook that uses the Watson Visual Recognition and Natural Language Understanding services to enrich Facebook Analytics and uses Cognos Dashboard Embedded to explore and visualize the results in Watson Studio

Stars: ✭ 42 (-83.66%)

Mutual labels: natural-language

remark-retext

plugin to transform from remark (Markdown) to retext (natural language)

Stars: ✭ 18 (-93%)

Mutual labels: natural-language

nl4dv

A python toolkit to create Visualizations (Vis) using natural language (NL) or add an NL interface to existing Vis.

Stars: ✭ 63 (-75.49%)

Mutual labels: natural-language

apertium-html-tools

Web application providing a fully localised interface for text/website/document translation, analysis and generation powered by Apertium.

Stars: ✭ 36 (-85.99%)

Mutual labels: natural-language

n2words

Convert numerical numbers to written numbers, in 25+ languages.

Stars: ✭ 44 (-82.88%)

Mutual labels: natural-language

retext-profanities

plugin to check for profane and vulgar wording

Stars: ✭ 34 (-86.77%)

Mutual labels: natural-language

buzzwords

List of (possible) English buzzwords

Stars: ✭ 51 (-80.16%)

Mutual labels: natural-language

rita

Website, documentation and examples for RiTa

Stars: ✭ 42 (-83.66%)

Mutual labels: natural-language

Rasa nlu gq

Turn natural language into structured data (supports Chinese; N custom models covering different scenarios and tasks)

Stars: ✭ 256 (-0.39%)

Mutual labels: natural-language

array-to-sentence

Join all elements of an array and create a human-readable string

Stars: ✭ 32 (-87.55%)

Mutual labels: natural-language

nli-go

Natural Language Interface in GO, a semantic parser and execution engine.

Stars: ✭ 20 (-92.22%)

Mutual labels: natural-language

View All Similar Projects ➔

Language Resources and Tools

Datasets and scripts for basic natural language and speech processing.

This is not an official Google product.

Natural Languages

Directory	Language
af	Afrikaans
bn	Bengali / Bangla
hi_ur	Hindi & Urdu
is	Icelandic
jv	Javanese
km	Khmer
lo	Lao
my	Burmese / Myanmar
ne	Nepali
si	Sinhala
su	Sundanese
xh	Xhosa
zu	Zulu

Tools

We include a few tools for working with the natural language datasets. These tools are written in C++ and Python and are built with Bazel. To compile and use them, install a recent version of Bazel (release 0.4.5 or later is required).
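The version requirement above can be checked before attempting a build. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and it assumes that `bazel` is on the PATH and that the output of `bazel version` contains a dotted release number such as `0.4.5`.

```python
import re
import subprocess

# Minimum Bazel release stated in the Tools section.
MIN_BAZEL = (0, 4, 5)


def parse_version(text):
    """Return the first dotted version found in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def bazel_is_recent_enough(version_output, minimum=MIN_BAZEL):
    """Compare the parsed version against the minimum, component by component."""
    version = parse_version(version_output)
    return version is not None and version >= minimum


def installed_bazel_output():
    """Run `bazel version` and return its stdout (assumes bazel is on PATH)."""
    result = subprocess.run(
        ["bazel", "version"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    try:
        output = installed_bazel_output()
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"Could not run bazel: {exc}")
    else:
        if bazel_is_recent_enough(output):
            print("Bazel version is sufficient")
        else:
            print("Bazel is older than 0.4.5 or its version could not be parsed")
```

Comparing tuples of integers rather than raw strings avoids mistakes such as treating "0.10.0" as older than "0.4.5".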

Opensourced Audio Data

Resource	Link
Sinhala TTS recordings (~3K)	https://www.openslr.org/30/
TTS recordings for four South African languages (af, st, tn, xh)	https://www.openslr.org/32/
Large Javanese ASR training data set (~185K)	https://www.openslr.org/35/
Large Sundanese ASR training data set (~220K)	https://www.openslr.org/36/
High quality TTS data for Bengali languages	https://www.openslr.org/37/
High quality TTS data for Javanese	https://www.openslr.org/41/
High quality TTS data for Khmer	https://www.openslr.org/42/
High quality TTS data for Nepali	https://www.openslr.org/43/
High quality TTS data for Sundanese	https://www.openslr.org/44/
Large Sinhala ASR training data set	https://www.openslr.org/52/
Large Bengali ASR training data set	https://www.openslr.org/53/
Large Nepali ASR training data set	https://www.openslr.org/54/
Crowdsourced high-quality Argentinian Spanish speech data set	https://www.openslr.org/61/
Crowdsourced high-quality Malayalam multi-speaker speech data set	https://www.openslr.org/63/
Crowdsourced high-quality Marathi multi-speaker speech data set	https://www.openslr.org/64/
Crowdsourced high-quality Tamil multi-speaker speech data set	https://www.openslr.org/65/
Crowdsourced high-quality Telugu multi-speaker speech data set	https://www.openslr.org/66/
Data set which contains recordings of Catalan	https://www.openslr.org/69
Crowdsourced high-quality Nigerian English speech data set	https://www.openslr.org/70
Crowdsourced high-quality Chilean Spanish speech data set	https://www.openslr.org/71
Crowdsourced high-quality Colombian Spanish speech data set	https://www.openslr.org/72
Crowdsourced high-quality Peruvian Spanish speech data set	https://www.openslr.org/73
Crowdsourced high-quality Puerto Rico Spanish speech data set	https://www.openslr.org/74
Crowdsourced high-quality Venezuelan Spanish speech data set	https://www.openslr.org/75
Crowdsourced high-quality Basque speech data set	https://www.openslr.org/76
Crowdsourced high-quality Galician speech data set	https://www.openslr.org/77
Crowdsourced high-quality Gujarati multi-speaker speech data set	https://www.openslr.org/78
Crowdsourced high-quality Kannada multi-speaker speech data set	https://www.openslr.org/79
Crowdsourced high-quality Burmese speech data set	https://www.openslr.org/80
Data set which contains male and female recordings of English from various dialects of the UK and Ireland.	https://www.openslr.org/83
Crowdsourced high-quality Yoruba speech data set	https://www.openslr.org/86
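
Each link in the table ends in a numeric OpenSLR resource number, written with or without a trailing slash. As an illustrative sketch only (the function names are hypothetical, and the rows are assumed to be tab-separated as listed here), the numbers could be extracted like this:

```python
from urllib.parse import urlparse


def openslr_id(url):
    """Return the numeric resource ID from an OpenSLR URL, with or without a trailing slash."""
    path = urlparse(url).path.strip("/")
    if not path.isdigit():
        raise ValueError(f"not an OpenSLR resource URL: {url}")
    return int(path)


def parse_row(line):
    """Split one tab-separated table row into (description, resource ID)."""
    description, url = line.rsplit("\t", 1)
    return description.strip(), openslr_id(url.strip())


if __name__ == "__main__":
    row = "Sinhala TTS recordings (~3K)\thttps://www.openslr.org/30/"
    print(parse_row(row))
```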

Publications

License

Unless otherwise noted, all original files are licensed under an Apache License, Version 2.0.

Where specifically noted, some datasets are licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

The directory third_party/ contains third-party works, which we are including under the respective licenses of the upstream projects. See third_party/README.md for further details.

Stars: ✭ 257
