Syntok
Text tokenization and sentence segmentation (segtok v2)
Stars: ✭ 123 (+547.37%)
Mutual labels: tokenizer, segmentation
Kagome
Self-contained Japanese Morphological Analyzer written in pure Go
Stars: ✭ 554 (+2815.79%)
Mutual labels: tokenizer, segmentation
Ekphrasis
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, the latter comprising 330 million English tweets).
Stars: ✭ 433 (+2178.95%)
Mutual labels: tokenizer, text-processing
mystem-scala
Wrapper around the Russian-language morphological analyzer `mystem` for JVM languages
Stars: ✭ 21 (+10.53%)
Mutual labels: tokenizer, computational-linguistics
python-mecab
MeCab bindings for Python 3.5+ that use neither SWIG nor pybind. (No longer maintained.)
Stars: ✭ 27 (+42.11%)
Mutual labels: tokenizer, text-processing
perke
A keyphrase extractor for Persian
Stars: ✭ 60 (+215.79%)
Mutual labels: computational-linguistics, text-processing
Open Korean Text
Open Korean Text Processor: an open-source Korean text processor
Stars: ✭ 438 (+2205.26%)
Mutual labels: tokenizer, text-processing
Text-Classification-LSTMs-PyTorch
This repository presents a baseline model for text classification, implemented as an LSTM-based model in PyTorch. To make the model easier to understand, it is applied to a tweets dataset provided by Kaggle.
Stars: ✭ 45 (+136.84%)
Mutual labels: tokenizer, text-processing
frog
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.
Stars: ✭ 70 (+268.42%)
Mutual labels: computational-linguistics, text-processing
CISTEM
Stemmer for German
Stars: ✭ 33 (+73.68%)
Mutual labels: segmentation, computational-linguistics
sembei
🍘 Word embeddings that do not go through word segmentation 🍘
Stars: ✭ 14 (-26.32%)
Mutual labels: computational-linguistics
support-tickets-classification
This case study shows how to create a model for text analysis and classification and deploy it as a web service in the Azure cloud to classify support tickets automatically. The project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava http://endava.com/en
Stars: ✭ 142 (+647.37%)
Mutual labels: text-processing
HyperDenseNet pytorch
PyTorch version of the HyperDenseNet deep neural network for multi-modal image segmentation
Stars: ✭ 58 (+205.26%)
Mutual labels: segmentation
daachorse
🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure.
Stars: ✭ 75 (+294.74%)
Mutual labels: text-processing
Baysor
Bayesian Segmentation of Spatial Transcriptomics Data
Stars: ✭ 53 (+178.95%)
Mutual labels: segmentation
foliapy
An extensive Python library for working with FoLiA (Format for Linguistic Annotation) documents. FoLiA is a rich XML-based format for linguistic annotation that is used in Natural Language Processing (NLP). This library was formerly part of PyNLPl.
Stars: ✭ 13 (-31.58%)
Mutual labels: computational-linguistics
deepflash2
A deep-learning pipeline for segmentation of ambiguous microscopic images.
Stars: ✭ 34 (+78.95%)
Mutual labels: segmentation
text
Qiniu Text Processing Libraries for Go
Stars: ✭ 25 (+31.58%)
Mutual labels: text-processing
dilation-keras
Multi-Scale Context Aggregation by Dilated Convolutions in Keras.
Stars: ✭ 72 (+278.95%)
Mutual labels: segmentation