kdelwat / Onset

Licence: MIT license

A language evolution simulator, using realistic phonetic changes.

Programming Languages: Python, Vue, JavaScript

Projects that are alternatives to, or similar to, Onset

wikipron

Massively multilingual pronunciation mining

Stars: ✭ 167 (+456.67%)

Mutual labels: linguistics, phonology, phonetics

dev

PHOIBLE data and development.

Stars: ✭ 90 (+200%)

Mutual labels: linguistics, phonology

Ipa Dict

Monolingual wordlists with pronunciation information in IPA

Stars: ✭ 139 (+363.33%)

Mutual labels: linguistics

nyt-first-said

Tweets when words are published for the first time in the NYT

Stars: ✭ 222 (+640%)

Mutual labels: linguistics

Awesome Linguistics

A curated list of anything remotely related to linguistics

Stars: ✭ 207 (+590%)

Mutual labels: linguistics

Hangulize

Hangulize transcribes non-Korean words into Hangul

Stars: ✭ 152 (+406.67%)

Mutual labels: linguistics

pfootprint

Political Discourse Analysis Using Pre-Trained Word Vectors.

Stars: ✭ 20 (-33.33%)

Mutual labels: linguistics

Ichiran

Linguistic tools for texts in Japanese language

Stars: ✭ 120 (+300%)

Mutual labels: linguistics

event-embedding-multitask

*SEM 2018: Learning Distributed Event Representations with a Multi-Task Approach

Stars: ✭ 22 (-26.67%)

Mutual labels: linguistics

Opencorpora

A web-based engine for creating and annotating textual corpora

Stars: ✭ 204 (+580%)

Mutual labels: linguistics

feminizator.github.io

Word feminizer (Феминизатор слов)

Stars: ✭ 29 (-3.33%)

Mutual labels: linguistics

Hangulize

Korean Alphabet Transcription

Stars: ✭ 184 (+513.33%)

Mutual labels: linguistics

Tossi

Chooses correct Korean particle morphs for arbitrary words.

Stars: ✭ 160 (+433.33%)

Mutual labels: linguistics

poesy

Poetic processing, for Python.

Stars: ✭ 28 (-6.67%)

Mutual labels: linguistics

Pycantonese

Cantonese Linguistics and NLP in Python

Stars: ✭ 147 (+390%)

Mutual labels: linguistics

proiel-treebank

Official releases of the PROIEL treebank of ancient Indo-European languages

Stars: ✭ 30 (+0%)

Mutual labels: linguistics

Corpuscrawler

Crawler for linguistic corpora

Stars: ✭ 127 (+323.33%)

Mutual labels: linguistics

Rime Cantonese

Rime Cantonese input schema | 粵語拼音輸入方案

Stars: ✭ 173 (+476.67%)

Mutual labels: linguistics

WonderfulPolishLanguage

A list of resources for learning and exploring the wonderful Polish language.

Stars: ✭ 31 (+3.33%)

Mutual labels: linguistics

lambda-notebook

Lambda Notebook: Formal Semantics in Jupyter

Stars: ✭ 16 (-46.67%)

Mutual labels: linguistics


Onset

Onset is a language evolution simulator, which evolves a list of words in IPA form according to realistic phonological rules.
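As a rough illustration of the general idea, evolving IPA words means applying phonological rules one after another. The sketch below is hedged and is not Onset's engine or rule format. It hard-codes two hypothetical rules as plain Python functions over IPA strings: intervocalic voicing of p, t, k, and apocope (loss of a word-final vowel). It then applies them in order. The vowel set, the rule functions and the `evolve` helper are all assumptions made for this example.

```python
import re
from typing import Callable

# Hypothetical small vowel inventory; a real engine would use phonological
# features rather than a hard-coded set.
VOWELS = "aeiouɛɔə"
V = f"[{VOWELS}]"

# Each rule maps one IPA word to its evolved form.
Rule = Callable[[str], str]

VOICED = {"p": "b", "t": "d", "k": "g"}


def intervocalic_voicing(word: str) -> str:
    # p, t, k -> b, d, g when a vowel precedes and follows (V _ V).
    return re.sub(f"(?<={V})[ptk](?={V})", lambda m: VOICED[m.group(0)], word)


def apocope(word: str) -> str:
    # Drop a word-final vowel, but only if the word keeps at least one vowel.
    if len(re.findall(V, word)) > 1:
        return re.sub(f"{V}$", "", word)
    return word


def evolve(words: list[str], rules: list[Rule]) -> list[str]:
    # Rules apply in order, each one to the output of the previous rule.
    out = []
    for word in words:
        for rule in rules:
            word = rule(word)
        out.append(word)
    return out


words = ["pata", "akot", "tiko"]
print(evolve(words, [intervocalic_voicing, apocope]))  # ['pad', 'agot', 'tig']
print(evolve(words, [apocope, intervocalic_voicing]))  # ['pat', 'agot', 'tik']
```

Swapping the two rules shows why ordering matters. If apocope applies first, it removes the vowel that followed the stop, so intervocalic voicing no longer applies (a bleeding order).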

The frontend is built with Vue and the CSS framework Bulma. It communicates with the backend using simple REST endpoints.

The backend is built with Python and Flask.

Directory Structure

src/ contains the source code for the Vue frontend.
app/ contains the source code for the Python API, written in Flask.
engine/ contains the source code for the evolution engine, which is called by Flask.
app/templates/ contains the Webpack-generated index file, served by Flask.
app/static/ contains static assets generated by Webpack.
config/ contains Webpack configuration files, generated using vue-cli.

Build Setup

# install Python dependencies
pip install -r requirements.txt

# install JavaScript dependencies
npm install

# build frontend
npm run build

# run using Flask's development server
python run.py

# or use PyPy for a speed boost
pypy3 run.py

To install development requirements, which will allow testing, validation, and script usage:

pip install -r requirements-dev.txt

To validate the YAML data:

pykwalify -d engine/data/rules.yaml -s engine/data/rules.schema.yaml
pykwalify -d engine/data/diacritics.yaml -s engine/data/diacritics.schema.yaml

To run the tests:

py.test

Sources

A variety of sources were used for the information needed to build this app. Please see the LICENCE.md file in the engine/data directory for specific data sources.

The following papers were used when implementing the algorithms:

Bauer, H. R. (1988). The ethologic model of phonetic development: I. Phonetic contrast estimators. Clinical Linguistics & Phonetics, 2(4), 347-380. DOI: 10.3109/02699208808985265
Stoel-Gammon, C. (2010). The Word Complexity Measure: Description and application to developmental phonology and disorders. Clinical Linguistics & Phonetics, 24(4-5), 271-282. DOI: 10.3109/02699200903581059
Carterette, E. and Jones, M. (1974). Informal Speech: Alphabetic and Phonetic Texts with Statistical Analyses and Tables. Berkeley: University of California Press.
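The Word Complexity Measure (Stoel-Gammon, 2010) scores a word by adding points for several features. As we understand the paper, these are:

- more than two syllables;
- stress on a non-initial syllable;
- a word-final consonant;
- each consonant cluster;
- each velar;
- each liquid or rhotic vowel;
- each fricative or affricate;
- an extra point for each voiced fricative or affricate.

The sketch below is simplified and is not Onset's implementation. It assumes the word arrives already split into syllables of IPA segments, with the stressed syllable's index given. The segment class sets are a small hand-picked subset of IPA. Counting every run of two or more adjacent consonants as one cluster is also a simplification.

```python
# Illustrative segment classes (assumption: a small hand-picked IPA subset).
VOWELS = {"a", "e", "i", "o", "u", "æ", "ʌ", "ə", "ɛ", "ɔ", "ɪ", "ʊ", "ɚ", "ɝ"}
VELARS = {"k", "g", "ŋ"}
LIQUIDS = {"l", "r", "ɹ", "ɚ", "ɝ"}  # liquids plus rhotic vowels
VOICELESS_FRIC_AFFR = {"f", "θ", "s", "ʃ", "tʃ"}
VOICED_FRIC_AFFR = {"v", "ð", "z", "ʒ", "dʒ"}


def is_consonant(seg: str) -> bool:
    return seg not in VOWELS


def wcm(syllables: list[list[str]], stressed: int) -> int:
    """Simplified Word Complexity Measure score.

    syllables: the word split into syllables, each a list of IPA segments.
    stressed: index of the stressed syllable (0 = first).
    """
    segments = [seg for syl in syllables for seg in syl]
    score = 0
    # Word patterns
    if len(syllables) > 2:
        score += 1
    if stressed != 0:
        score += 1
    # Syllable structures
    if segments and is_consonant(segments[-1]):
        score += 1
    # Simplification: each maximal run of 2+ adjacent consonants is one cluster.
    run = 0
    for seg in segments + ["a"]:  # trailing vowel sentinel closes the last run
        if is_consonant(seg):
            run += 1
        else:
            if run >= 2:
                score += 1
            run = 0
    # Sound classes: one point per matching segment
    for seg in segments:
        if seg in VELARS:
            score += 1
        if seg in LIQUIDS:
            score += 1
        if seg in VOICELESS_FRIC_AFFR or seg in VOICED_FRIC_AFFR:
            score += 1
        if seg in VOICED_FRIC_AFFR:
            score += 1  # extra point for voicing, on top of the one above
    return score


print(wcm([["dʒ", "ə"], ["ɹ", "æ", "f"]], stressed=1))  # giraffe: 6
print(wcm([["s", "k", "ʌ", "ŋ", "k"]], stressed=0))     # skunk: 7
```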

General information on the linguistics underpinning the app is from:

Trask's Historical Linguistics by Larry Trask.
Introductory Phonology by Bruce Hayes.
Wikipedia

A lot of technical inspiration was taken from the source code of panphon. In particular, the deparsing algorithm and YAML data files were inspired by panphon's approach. Please check it out!
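The idea of deparsing can be illustrated with a toy example, assuming "deparsing" here means mapping a feature bundle back to an IPA segment. A sound change is written as a feature update, and the changed bundle is looked up to find its closest segment. This only sketches that assumed idea. The three-feature table and the `deparse` and `change` helpers are hypothetical and do not reflect panphon's real feature set, data files or API.

```python
# Toy feature table (assumption): three features per segment.
FEATURES = {
    "p": {"voice": "-", "place": "labial", "manner": "stop"},
    "b": {"voice": "+", "place": "labial", "manner": "stop"},
    "t": {"voice": "-", "place": "coronal", "manner": "stop"},
    "d": {"voice": "+", "place": "coronal", "manner": "stop"},
    "f": {"voice": "-", "place": "labial", "manner": "fricative"},
    "v": {"voice": "+", "place": "labial", "manner": "fricative"},
    "s": {"voice": "-", "place": "coronal", "manner": "fricative"},
    "z": {"voice": "+", "place": "coronal", "manner": "fricative"},
}


def deparse(features: dict[str, str]) -> str:
    """Return the segment whose features best match the given bundle."""
    def matches(seg: str) -> int:
        return sum(FEATURES[seg][k] == v for k, v in features.items())
    # max() returns the first maximal item, so table order breaks ties.
    return max(FEATURES, key=matches)


def change(segment: str, **updates: str) -> str:
    """Apply a feature change to one segment and map the result back to IPA."""
    bundle = {**FEATURES[segment], **updates}
    return deparse(bundle)


print(change("t", voice="+"))                  # d (voicing)
print(change("p", manner="fricative"))         # f (spirantization)
print(change("s", voice="+", place="labial"))  # v
```

Because `deparse` picks the best match rather than requiring an exact one, a bundle with no exact entry in the table still resolves to a real segment instead of failing. A usable version would need a far larger segment inventory and feature set.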

