kdelwat / Onset

Licence: MIT license
A language evolution simulator, using realistic phonetic changes.

Onset

Onset is a language evolution simulator. It takes a list of words in IPA form and evolves them according to realistic phonological rules.
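
To make the idea of evolving a word list concrete, here is a minimal sketch using only the Python standard library. It is not Onset's engine: the rule list, the vowel set, and the `apply_rule` and `evolve` helpers are hypothetical names invented for illustration, and the rules are plain regular-expression substitutions rather than anything read from the project's YAML data.

```python
import re

# Assumption: a tiny vowel inventory, for illustration only.
VOWELS = "aeiou"

# Hypothetical sound-change rules as (description, pattern, replacement).
# These are NOT taken from Onset's rules.yaml.
RULES = [
    (
        "voice p between vowels",
        re.compile(rf"(?<=[{VOWELS}])p(?=[{VOWELS}])"),
        "b",
    ),
    (
        "devoice word-final d",
        re.compile(r"d$"),
        "t",
    ),
]


def apply_rule(words, rule):
    """Return a new word list with one rule applied to every word."""
    _, pattern, replacement = rule
    return [pattern.sub(replacement, word) for word in words]


def evolve(words, rules, generations):
    """Apply one rule per generation, cycling through the rules in order.

    Returns the full history, starting with the original word list.
    """
    history = [list(words)]
    for generation in range(generations):
        rule = rules[generation % len(rules)]
        words = apply_rule(words, rule)
        history.append(words)
    return history


if __name__ == "__main__":
    for step in evolve(["apa", "bad", "tapid"], RULES, 2):
        print(step)
```

In this sketch, each generation applies a single rule to the whole lexicon, so the history shows how every word changes step by step. A real engine would likely choose rules differently and operate on phonetic features rather than on raw characters.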

The frontend is built with Vue and the CSS framework Bulma. It communicates with the backend using simple REST endpoints.

The backend is built with Python and Flask.

Directory Structure

  • src contains the source code for the Vue frontend.
  • app contains the source code for the Python API, written in Flask.
  • engine contains the source code for the evolution engine, which is called by Flask.
  • app/templates/ contains the Webpack-generated index file, served by Flask.
  • app/static/ contains static assets generated by Webpack.
  • config contains Webpack configuration files, generated using vue-cli.

Build Setup

# install Python dependencies
pip install -r requirements.txt

# install JavaScript dependencies
npm install

# build frontend
npm run build

# run using Flask's development server
python run.py

# or use PyPy for a speed boost
pypy3 run.py

To install the development requirements, which are needed for testing, validation, and script usage:

pip install -r requirements-dev.txt

To validate the YAML data:

pykwalify -d engine/data/rules.yaml -s engine/data/rules.schema.yaml
pykwalify -d engine/data/diacritics.yaml -s engine/data/diacritics.schema.yaml

To run the tests:

py.test

Sources

A variety of sources were used for the information needed to build this app. Please see the LICENCE.md file in the engine/data directory for specific data sources.

The following papers were used when implementing the algorithms:

  • Harold R. Bauer (1988) The ethologic model of phonetic development: I. Phonetic contrast estimators, Clinical Linguistics & Phonetics, 2:4, 347-380, DOI: 10.3109/02699208808985265
  • Carol Stoel-Gammon (2010) The Word Complexity Measure: Description and application to developmental phonology and disorders, Clinical Linguistics & Phonetics, 24:4-5, 271-282, DOI: 10.3109/02699200903581059
  • Carterette, E. and Jones, M. (1974) Informal Speech: Alphabetic and Phonetic Texts with Statistical Analyses and Tables (Berkeley: University of California Press).

General information on the linguistics underpinning the app is from:

Much of the technical inspiration came from the source code of panphon. In particular, the deparsing algorithm and the YAML data files were inspired by panphon's approach. Please check it out!
