Open Korean Text
Open Korean Text Processor - An Open-source Korean Text Processor
Stars: ✭ 438 (+1.15%)
Fastnlp
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
Stars: ✭ 2,441 (+463.74%)
Jumanpp
Juman++ (a Morphological Analyzer Toolkit)
Stars: ✭ 254 (-41.34%)
Toiro
A comparison tool for Japanese tokenizers
Stars: ✭ 95 (-78.06%)
Nagisa
A Japanese tokenizer based on recurrent neural networks
Stars: ✭ 260 (-39.95%)
Pythainlp
Thai Natural Language Processing in Python.
Stars: ✭ 582 (+34.41%)
Text-Classification-LSTMs-PyTorch
This repository shows a baseline model for text classification, implemented as an LSTM-based model in PyTorch. To make the model easier to understand, it is applied to a Tweets dataset provided by Kaggle.
Stars: ✭ 45 (-89.61%)
Pynlpl
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
Stars: ✭ 426 (-1.62%)
ArabicProcessingCog
A Python package that does stemming, tokenization, sentence breaking, segmentation, normalization, and POS tagging for the Arabic language.
Stars: ✭ 19 (-95.61%)
NLP-tools
Useful Python NLP tools (evaluation, GUI interface, tokenization)
Stars: ✭ 39 (-90.99%)
Kagome
Self-contained Japanese Morphological Analyzer written in pure Go
Stars: ✭ 554 (+27.94%)
python-mecab
A repository to bind MeCab for Python 3.5+, using neither SWIG nor pybind. (No longer maintained.)
Stars: ✭ 27 (-93.76%)
Sacremoses
Python port of Moses tokenizer, truecaser and normalizer
Stars: ✭ 293 (-32.33%)
Hebrew-Tokenizer
A very simple Python tokenizer for Hebrew text.
Stars: ✭ 16 (-96.3%)
youtokentome-ruby
High-performance unsupervised text tokenization for Ruby
Stars: ✭ 17 (-96.07%)
Vncorenlp
A Vietnamese natural language processing toolkit (NAACL 2018)
Stars: ✭ 354 (-18.24%)
Quick Nlp
PyTorch NLP library based on FastAI
Stars: ✭ 279 (-35.57%)
UETsegmenter
A toolkit for Vietnamese word segmentation
Stars: ✭ 60 (-86.14%)
PaddleTokenizer
A Chinese word segmentation engine based on deep neural networks, implemented with PaddlePaddle
Stars: ✭ 14 (-96.77%)
text2text
Text2Text: Cross-lingual natural language processing and generation toolkit
Stars: ✭ 188 (-56.58%)
Giveme5W
Extraction of the five journalistic W-questions (5W) from news articles
Stars: ✭ 16 (-96.3%)
Bsed
Simple SQL-like syntax on top of Perl text processing.
Stars: ✭ 414 (-4.39%)
Lingua
👄 The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike
Stars: ✭ 341 (-21.25%)
pascal-interpreter
A simple interpreter for a large subset of the Pascal language, written for educational purposes
Stars: ✭ 21 (-95.15%)
bredon
A modern CSS value compiler in JavaScript
Stars: ✭ 39 (-90.99%)
Sentences
A multilingual command-line sentence tokenizer in Golang
Stars: ✭ 293 (-32.33%)
text
Qiniu Text Processing Libraries for Go
Stars: ✭ 25 (-94.23%)
cang-jie
Chinese tokenizer for tantivy, based on jieba-rs
Stars: ✭ 48 (-88.91%)
Textpipe
Textpipe: clean and extract metadata from text
Stars: ✭ 284 (-34.41%)
typ3r.js
🍟 [Library] dA aNn0Y1Ng t3Xt g3NeRa7or
Stars: ✭ 22 (-94.92%)
Symspellpy
Python port of SymSpell
Stars: ✭ 420 (-3%)
classy
classy is a simple-to-use library for building high-performance Machine Learning models in NLP.
Stars: ✭ 61 (-85.91%)
Chatbot ner
chatbot_ner: Named Entity Recognition for chatbots.
Stars: ✭ 273 (-36.95%)
Artificial Adversary
🗣️ Tool to generate adversarial text examples and test machine learning models against them
Stars: ✭ 348 (-19.63%)
SymSpellCppPy
Fast SymSpell written in C++ and exposed to Python via pybind11
Stars: ✭ 28 (-93.53%)
customized-symspell
Java port of SymSpell: 1 million times faster through Symmetric Delete spelling correction algorithm
Stars: ✭ 51 (-88.22%)
hck
A sharp cut(1) clone.
Stars: ✭ 542 (+25.17%)
simplemma
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Stars: ✭ 32 (-92.61%)
hashformers
Hashformers is a framework for hashtag segmentation with transformers.
Stars: ✭ 18 (-95.84%)
Lexmachine
Lex machinery for Go.
Stars: ✭ 335 (-22.63%)
stringx
Drop-in replacements for base R string functions powered by stringi
Stars: ✭ 14 (-96.77%)
TextDatasetCleaner
🔬 Cleaning datasets of junk (normalization, preprocessing)
Stars: ✭ 27 (-93.76%)
mystem-scala
Morphological analyzer `mystem` (Russian language) wrapper for JVM languages
Stars: ✭ 21 (-95.15%)
hanzi-tools
Converts from Chinese characters to pinyin, between simplified and traditional, and does word segmentation.
Stars: ✭ 69 (-84.06%)
NLP Toolkit
Library of state-of-the-art models (PyTorch) for NLP tasks
Stars: ✭ 92 (-78.75%)
Php Parser
🌿 NodeJS PHP Parser - extract AST or tokens (PHP5 and PHP7)
Stars: ✭ 400 (-7.62%)
Contextualized Topic Models
A Python package to run contextualized topic modeling. CTMs combine BERT with topic models to get coherent topics. Also supports multilingual tasks. Cross-lingual Zero-shot model published at EACL 2021.
Stars: ✭ 318 (-26.56%)
daachorse
🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure.
Stars: ✭ 75 (-82.68%)
andaluh-js
Transliterate Spanish (español) spelling to Andalusian (andaluz) spelling proposals using JavaScript
Stars: ✭ 22 (-94.92%)
tokenizer
Tokenize CSS according to the CSS Syntax
Stars: ✭ 52 (-87.99%)
Text-Analysis
Explains textual analysis tools in Python, including preprocessing, skip-gram (word2vec), and topic modelling.
Stars: ✭ 48 (-88.91%)
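Nuts
A testbed of solutions for common natural language processing tasks (mainly text classification, sequence labeling, and automatic question answering)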
Stars: ✭ 21 (-95.15%)
Giveme5w1h
Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?
Stars: ✭ 316 (-27.02%)
pwsh-prelude
PowerShell “standard” library for supercharging your productivity. Provides a powerful cross-platform scripting environment enabling efficient analysis and sustainable science in myriad contexts.
Stars: ✭ 26 (-94%)