Top 67 linguistics open source projects

Awesome Linguistics
A curated list of anything remotely related to linguistics
Opencorpora
A web-based engine for creating and annotating textual corpora
Hangulize
Korean Alphabet Transcription
Rime Cantonese
Rime Cantonese input schema
Prosodic
Prosodic: a metrical-phonological parser written in Python for English and Finnish, with flexible language support.
Tossi
Chooses correct Korean particle morphs for arbitrary words.
Hangulize
Hangulize transcribes non-Korean words into Hangul
Ipa Dict
Monolingual wordlists with pronunciation information in IPA
Corpuscrawler
Crawler for linguistic corpora
Ichiran
Linguistic tools for texts in Japanese language
Colibri Core
Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e., patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
Pyconll
A minimal, pure Python library to interface with CoNLL-U format files.
Elpis
🙊 WIP software for creating speech recognition models.
Wikipron
Massively multilingual pronunciation mining
Flat
FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat allows users to view annotated FoLiA documents and enrich these documents with new annotations; a wide variety of linguistic annotation types is supported through the FoLiA paradigm.
Textannotationgraphs
A modular annotation system that supports complex, interactive annotation graphs embedded on top of sequences of text.
Beta
An open source reimplementation of Benny Brodda's BETA in Python
Python Datamuse
Python 3 wrapper for the Datamuse API
Psychopy
For running psychology and neuroscience experiments
Phonemes
Jason Riggle's chart of phonological features in JSON format + extras
Awesome Sentiment Analysis
😀😄😂😭 A curated list of Sentiment Analysis methods, implementations and misc. 😥😟😱😤
Pynlpl
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
rsyntaxtree
Syntax tree generator made with Ruby and RMagick
treebender
An HPSG-inspired symbolic natural language parser written in Rust
concepticon-data
The curation repository for the data behind Concepticon.
TextGridTools
Read, write, and manipulate Praat TextGrid files with Python
mystem
CGo bindings to Yandex.Mystem
duree
Durée: the longest book ever written.
folia
FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for proces…
neural-net-linguistics
Papers about NN and linguistics
linguisticsdown
Easy Linguistics Document Writing with R Markdown
lameta
The Metadata Editor for Transparent Archiving of language documentation materials
lingvo--Ner-ru
Named entity recognition (NER) in Russian texts
eliza-rs
A Rust implementation of ELIZA, a natural language processing program developed by Joseph Weizenbaum in 1966.
LangPad
A word processor/dictionary/generally useful tool for linguistics.
mlconjug3
A Python library to conjugate verbs in French, English, Spanish, Italian, Portuguese and Romanian (more soon) using Machine Learning techniques.
libpalaso
Palaso Library: A set of .NET libraries useful for developers of language software.
KoParadigm
KoParadigm: Korean Inflectional Paradigm Generator
ngramr
R package to query the Google Ngram Viewer
dev
PHOIBLE data and development.
Onset
A language evolution simulator, using realistic phonetic changes.