common-voice / Common Voice

Licence: mpl-2.0
Common Voice is part of Mozilla's initiative to help teach machines how real people speak.

Programming Languages

TypeScript, JavaScript, CSS, HTML, Java, Shell, Dockerfile

Common Voice

This is the web app for Mozilla Common Voice, a platform for collecting speech donations in order to create public domain datasets for training voice recognition-related tools.

Upcoming releases

Type                       Expected date   More info
Platform code & sentences  Dec 15, 2021    Release notes
Dataset                    Jan 2022        Dataset metadata

How to contribute

🎉 First off, thanks for taking the time to contribute! This project would not be possible without people like you. 🎉

There are many ways to get involved with Common Voice - you don't have to know how to code to contribute!

  • To add or correct a translation of the web interface, please use Pontoon, the Mozilla localization platform. Please note that we do not accept direct pull requests that change localization content.
  • For information on how to add or edit sentences in Common Voice, see SENTENCES.md
  • For instructions on setting up a local development environment, see DEVELOPMENT.md
  • For information on how to add a new language to Common Voice, see LANGUAGE.md
  • For information on how to get in contact with existing language communities, see COMMUNITIES.md

For more general guidance on building your own language community using Mozilla voice tools, please refer to the Mozilla Voice Community Playbook.

Discussion

For general discussion (feedback, ideas, random musings), head to our Discourse Category.

For bug reports or specific feature requests, please use the GitHub issue tracker.

For live chat, join us on Matrix.

Licensing and content source

This repository is released under MPL (Mozilla Public License) 2.0.

Most of the sentence text in /server/data comes either directly from user submissions to our Sentence Collector or from Wikipedia, scraped with our extractor tool. This text is released under a CC0 public domain Creative Commons license.

Any files that follow the pattern europarl-VERSION-LANG.txt (such as europarl-v7-de.txt) were extracted, with our thanks, from the Europarl Corpus, which features transcripts of proceedings in the European Parliament.
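
If you need to tell Europarl-derived files apart from the CC0 sentence files programmatically, the naming pattern above can be checked with a regular expression. The following is only a minimal sketch and is not part of the Common Voice codebase: the function and class names are hypothetical, and it assumes that VERSION and LANG contain no hyphens or dots, as in europarl-v7-de.txt.

```python
import re
from typing import NamedTuple, Optional

# Assumption: VERSION and LANG contain no hyphens or dots,
# as in the documented example "europarl-v7-de.txt".
EUROPARL_PATTERN = re.compile(
    r"europarl-(?P<version>[^-.]+)-(?P<lang>[^-.]+)\.txt"
)


class EuroparlFile(NamedTuple):
    """Version and language parsed from a Europarl-derived file name."""
    version: str
    lang: str


def parse_europarl_filename(name: str) -> Optional[EuroparlFile]:
    """Return the parsed parts if `name` follows the pattern, else None."""
    match = EUROPARL_PATTERN.fullmatch(name)
    if match is None:
        return None
    return EuroparlFile(match.group("version"), match.group("lang"))


if __name__ == "__main__":
    # The second file name is a made-up example of a non-matching name.
    for candidate in ("europarl-v7-de.txt", "some-other-file.txt"):
        print(candidate, "->", parse_europarl_filename(candidate))
```

A language code that itself contains a hyphen would not match under this assumption, so loosen the LANG group if your files use such codes.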

Citation

If you use the data in a published academic work, we would appreciate it if you cited the following article:

  • Ardila, R., Branson, M., Davis, K., Henretty, M., Kohler, M., Meyer, J., Morais, R., Saunders, L., Tyers, F. M. and Weber, G. (2020) "Common Voice: A Massively-Multilingual Speech Corpus". Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020). pp. 4211–4215

The BibTeX entry is:

@inproceedings{commonvoice:2020,
  author = {Ardila, R. and Branson, M. and Davis, K. and Henretty, M. and Kohler, M. and Meyer, J. and Morais, R. and Saunders, L. and Tyers, F. M. and Weber, G.},
  title = {Common Voice: A Massively-Multilingual Speech Corpus},
  booktitle = {Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020)},
  pages = {4211--4215},
  year = 2020
}
