
Top 26 computational-linguistics open source projects

ArabicProcessingCog
A Python package that performs stemming, tokenization, sentence breaking, segmentation, normalization, and POS tagging for Arabic.
python-arpa
🐍 Python library for n-gram models in ARPA format
foliapy
An extensive Python library for working with documents in FoLiA (Format for Linguistic Annotation), a rich XML-based format for linguistic annotation used in Natural Language Processing (NLP). This library was formerly part of PyNLPl.
sembei
🍘 Word embeddings that do not go through word segmentation 🍘
folia
FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for proces…
kaldi helpers
🙊 A set of scripts to use in preparing a corpus for speech-to-text processing with the Kaldi Automatic Speech Recognition Library.
citation-function
Measuring the Evolution of a Scientific Field through Citation Frames
frog
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.
ucto
Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation and splits sentences. It also offers several other basic preprocessing steps, such as changing case, all of which you can use to prepare your text for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules …
lxa5
Linguistica 5: Unsupervised Learning of Linguistic Structure
nytwit
New York Times Word Innovation Types dataset
bllip-parser
BLLIP reranking parser (also known as the Charniak-Johnson parser, Charniak parser, or Brown reranking parser). See http://pypi.python.org/pypi/bllipparser/ for the Python module.