Top 26 computational-linguistics open source projects

A Python package that does stemming, tokenization, sentence breaking, segmentation, normalization, and POS tagging for the Arabic language.

✭ 19

python nlp natural-language-processing tokenizer segmentation computational-linguistics text-processing arabic-language

python-arpa

🐍 Python library for n-gram models in ARPA format

✭ 35

python shell nlp library lm computational-linguistics language-model arpa

foliapy

An extensive Python library for dealing with FoLiA (Format for Linguistic Annotation) documents, a rich XML-based format for linguistic annotation finding application in Natural Language Processing (NLP). This library was formerly part of PyNLPl.

✭ 13

python shell nlp xml computational-linguistics folia clarin clariah pynlpl

CISTEM

Stemmer for German

✭ 33

c C++ javascript python swift perl nlp natural-language-processing german segmentation stemmer computational-linguistics deutsch stemmers stemming german-language stemming-algorithm

sembei

🍘 Word embeddings without going through word segmentation 🍘

✭ 14

Jupyter Notebook python nlp japanese word-embeddings software computational-linguistics

wikipron

Massively multilingual pronunciation mining

✭ 167

python shell nlp speech pronunciation linguistics phonology python-api scraped-data phonetics computational-linguistics g2p

folia

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for proces…

✭ 56

python shell nlp library xml corpus linguistics file-format computational-linguistics folia linguistic-annotation-framework

SentimentAnalysis

Sentiment Analysis: Deep Bi-LSTM+attention model

✭ 32

python nlp deep-neural-networks twitter deep-learning sentiment-analysis neural-network word-embeddings pytorch embeddings lstm deeplearning computational-linguistics semeval attention-mechanism nlp-machine-learning twitter-messages sentiment-classification semeval-sentiment

mystem-scala

Morphological analyzer `mystem` (Russian language) wrapper for JVM languages

✭ 21

scala natural-language-processing yandex tokenizer russian-specific computational-linguistics lemmatizer russian-morphology mystem

sentiment-analysis-of-tweets-in-russian

Sentiment analysis of tweets in Russian using Convolutional Neural Networks (CNN) with Word2Vec embeddings.

✭ 51

Jupyter Notebook nlp machine-learning tweets sentiment-analysis word2vec word-embeddings keras jupyter-notebook cnn embeddings machinelearning computational-linguistics convolutional-neural-network nlp-machine-learning word2vec-ru

word2vec-tsne

Google News and Leo Tolstoy: Visualizing Word2Vec Word Embeddings using t-SNE.

✭ 59

Jupyter Notebook visualization nlp machine-learning word2vec word-embeddings embeddings machinelearning computational-linguistics tsne nlp-machine-learning google-news leo-tolstoy

datastories-semeval2017-task6

Deep-learning model presented in "DataStories at SemEval-2017 Task 6: Siamese LSTM with Attention for Humorous Text Comparison".

kaldi helpers

🙊 A set of scripts to use in preparing a corpus for speech-to-text processing with the Kaldi Automatic Speech Recognition Library.

✭ 13

python shell docker speech automatic-speech-recognition speech-to-text computational-linguistics kaldi transcription kaldi-helpers

citation-function

Measuring the Evolution of a Scientific Field through Citation Frames

✭ 40

Jupyter Notebook python shell computational-linguistics scientometrics citation-network citation-analysis

datalinguist

Stanford CoreNLP in idiomatic Clojure.

✭ 93

clojure nlp graphviz natural-language-processing stanford stanford-corenlp computational-linguistics dependency-parser pos-tagging part-of-speech-tagger dependency-parsing pos-tagger corenlp rebl datafy

frog

Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.

✭ 70

C++ M4 nlp syntax natural-language-processing morphology named-entity-recognition computational-linguistics text-processing dutch dependency-parser pos-tagger folia lemmatiser morphological-analyser

ucto

Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation and splits sentences. It also offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules …

✭ 58

C++ Coq M4 shell python Verilog nlp natural-language-processing computational-linguistics punctuation folia tokeniser

lxa5

Linguistica 5: Unsupervised Learning of Linguistic Structure

✭ 27

python shell natural-language-processing computational-linguistics unsupervised-learning linguistica

nytwit

New York Times Word Innovation Types dataset

✭ 21

nlp news corpus dataset computational-linguistics

embedding evaluation

Evaluate your word embeddings

✭ 32

python shell perl semantic benchmark evaluation embeddings computational-linguistics evaluation-methods feature-norms

linguistics problems

Natural language processing in examples and games

✭ 23

Jupyter Notebook python nlp machine-learning natural-language-processing language-modeling linguistics computational-linguistics ontologies natural-language-understanding

bllip-parser

BLLIP reranking parser (also known as the Charniak-Johnson parser, Charniak parser, or Brown reranking parser). See http://pypi.python.org/pypi/bllipparser/ for the Python module.

✭ 217

nlp machine-learning natural-language-processing ai parsing artificial-intelligence computational-linguistics nlp-library

perke

A keyphrase extractor for Persian

✭ 60

python nlp natural-language-processing information-retrieval text-mining data-mining keyword persian persian-language computational-linguistics text-processing data-processing keyword-extraction keyphrase-extraction keyword-extractor keyphrase keyphrase-extractor perisan-nlp

yap

Yet Another (natural language) Parser

✭ 40

go python nlp natural-language-processing disambiguation computational-linguistics dependency-parser nlp-dependency-parsing nlp-parsing hebrew transition-systems universal-dependencies morphological-analysis hebrew-analytical-lexicon morphological-disambiguator

pylangacq

Language Acquisition Research Tools

✭ 33

python nlp natural-language-processing linguistics computational-linguistics childes language-development pylangacq language-acquisition child-development talkbank

esapp

An unsupervised Chinese word segmentation tool.

✭ 13

C++ python CMake nlp chinese-nlp computational-linguistics chinese-text-segmentation unsupervised-learning word-segmentation
