The aim of this repository is to show a baseline model for text classification by implementing an LSTM-based model in PyTorch. To give a better understanding of the model, a Tweets dataset provided by Kaggle is used.

Stars: ✭ 45 (-89.61%)

Mutual labels: tokenizer, text-processing

Pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

Stars: ✭ 426 (-1.62%)

Mutual labels: nlp-library, text-processing

ArabicProcessingCog

A Python package that does stemming, tokenization, sentence breaking, segmentation, normalization, and POS tagging for the Arabic language.

Stars: ✭ 19 (-95.61%)

Mutual labels: tokenizer, text-processing

NLP-tools

Useful Python NLP tools (evaluation, GUI, tokenization)

Stars: ✭ 39 (-90.99%)

Mutual labels: text-processing, nlp-library

Kagome

Self-contained Japanese Morphological Analyzer written in pure Go

Stars: ✭ 554 (+27.94%)

Mutual labels: tokenizer, nlp-library

python-mecab

Python 3.5+ bindings for MeCab, built without SWIG or pybind. (No longer maintained.)

Stars: ✭ 27 (-93.76%)

Mutual labels: tokenizer, text-processing

gnu-linux-shell-scripting

A foundation for GNU/Linux shell scripting

Stars: ✭ 23 (-94.69%)

Mutual labels: text-processing

Sacremoses

Python port of Moses tokenizer, truecaser and normalizer

Stars: ✭ 293 (-32.33%)

Mutual labels: tokenizer

Hebrew-Tokenizer

A very simple Python tokenizer for Hebrew text.

Stars: ✭ 16 (-96.3%)

Mutual labels: tokenizer

youtokentome-ruby

High performance unsupervised text tokenization for Ruby

Stars: ✭ 17 (-96.07%)

Mutual labels: word-segmentation

Vncorenlp

A Vietnamese natural language processing toolkit (NAACL 2018)

Stars: ✭ 354 (-18.24%)

Mutual labels: word-segmentation

Quick Nlp

PyTorch NLP library based on FastAI

Stars: ✭ 279 (-35.57%)

Mutual labels: nlp-library

UETsegmenter

A toolkit for Vietnamese word segmentation

Stars: ✭ 60 (-86.14%)

Mutual labels: word-segmentation

PaddleTokenizer

A DNN-based Chinese tokenizer (word segmentation engine) implemented with PaddlePaddle

Stars: ✭ 14 (-96.77%)

Mutual labels: tokenizer

text2text

Text2Text: Cross-lingual natural language processing and generation toolkit

Stars: ✭ 188 (-56.58%)

Mutual labels: tokenizer

Giveme5W

Extraction of the five journalistic W-questions (5W) from news articles

Stars: ✭ 16 (-96.3%)

Mutual labels: nlp-library

advanced-text-mining

Covers natural language processing and text analysis methodology using the TEANAPS library.

Stars: ✭ 15 (-96.54%)

Mutual labels: text-processing

Bsed

Simple SQL-like syntax on top of Perl text processing.

Stars: ✭ 414 (-4.39%)

Mutual labels: text-processing

Lingua

👄 The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike

Stars: ✭ 341 (-21.25%)

Mutual labels: nlp-library

pascal-interpreter

A simple interpreter for a large subset of the Pascal language, written for educational purposes

Stars: ✭ 21 (-95.15%)

Mutual labels: tokenizer

bredon

A modern CSS value compiler in JavaScript

Stars: ✭ 39 (-90.99%)

Mutual labels: tokenizer

rakutenma-python

Rakuten MA (Python version)

Stars: ✭ 15 (-96.54%)

Mutual labels: word-segmentation

Sentences

A multilingual command line sentence tokenizer in Golang

Stars: ✭ 293 (-32.33%)

Mutual labels: tokenizer

text

Qiniu Text Processing Libraries for Go

Stars: ✭ 25 (-94.23%)

Mutual labels: text-processing

Bert Multitask Learning

BERT for Multitask Learning

Stars: ✭ 380 (-12.24%)

Mutual labels: word-segmentation

cang-jie

Chinese tokenizer for tantivy, based on jieba-rs

Stars: ✭ 48 (-88.91%)

Mutual labels: tokenizer

Textpipe

Textpipe: clean and extract metadata from text

Stars: ✭ 284 (-34.41%)

Mutual labels: text-processing

typ3r.js

🍟 [Library] dA aNn0Y1Ng t3Xt g3NeRa7or

Stars: ✭ 22 (-94.92%)

Mutual labels: text-processing

Symspellpy

Python port of SymSpell

Stars: ✭ 420 (-3%)

Mutual labels: word-segmentation

classy

classy is a simple-to-use library for building high-performance Machine Learning models in NLP.

Stars: ✭ 61 (-85.91%)

Mutual labels: nlp-library

Chatbot ner

chatbot_ner: Named Entity Recognition for chatbots.

Stars: ✭ 273 (-36.95%)

Mutual labels: nlp-library

Artificial Adversary

🗣️ Tool to generate adversarial text examples and test machine learning models against them

Stars: ✭ 348 (-19.63%)

Mutual labels: text-processing

SymSpellCppPy

Fast SymSpell written in C++ and exposed to Python via pybind11

Stars: ✭ 28 (-93.53%)

Mutual labels: word-segmentation

customized-symspell

Java port of SymSpell: 1 million times faster through Symmetric Delete spelling correction algorithm

Stars: ✭ 51 (-88.22%)

Mutual labels: word-segmentation

hck

A sharp cut(1) clone.

Stars: ✭ 542 (+25.17%)

Mutual labels: text-processing

simplemma

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

Stars: ✭ 32 (-92.61%)

Mutual labels: tokenizer

hashformers

Hashformers is a framework for hashtag segmentation with transformers.

Stars: ✭ 18 (-95.84%)

Mutual labels: word-segmentation

Lexmachine

Lex machinery for Go.

Stars: ✭ 335 (-22.63%)

Mutual labels: tokenizer

stringx

Drop-in replacements for base R string functions powered by stringi

Stars: ✭ 14 (-96.77%)

Mutual labels: text-processing

TextDatasetCleaner

🔬 Cleaning junk from datasets (normalization, preprocessing)

Stars: ✭ 27 (-93.76%)

Mutual labels: text-processing

mystem-scala

Morphological analyzer `mystem` (Russian language) wrapper for JVM languages

Stars: ✭ 21 (-95.15%)

Mutual labels: tokenizer

cws-tensorflow

A TensorFlow-based Chinese word segmentation model

Stars: ✭ 25 (-94.23%)

Mutual labels: word-segmentation

hanzi-tools

Converts from Chinese characters to pinyin, between simplified and traditional, and does word segmentation.

Stars: ✭ 69 (-84.06%)

Mutual labels: word-segmentation

NLP Toolkit

Library of state-of-the-art models (PyTorch) for NLP tasks

Stars: ✭ 92 (-78.75%)

Mutual labels: nlp-library

Php Parser

🌿 NodeJS PHP Parser - extract AST or tokens (PHP5 and PHP7)

Stars: ✭ 400 (-7.62%)

Mutual labels: tokenizer

Contextualized Topic Models

A Python package to run contextualized topic modeling. CTMs combine BERT with topic models to get coherent topics. Also supports multilingual tasks. Cross-lingual Zero-shot model published at EACL 2021.

Stars: ✭ 318 (-26.56%)

Mutual labels: nlp-library

daachorse

🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure.

Stars: ✭ 75 (-82.68%)

Mutual labels: text-processing

andaluh-js

Transliterate español (Spanish) spelling to andaluz (Andalusian) spelling proposals using JavaScript

Stars: ✭ 22 (-94.92%)

Mutual labels: text-processing

tokenizer

Tokenize CSS according to the CSS Syntax

Stars: ✭ 52 (-87.99%)

Mutual labels: tokenizer

Text-Analysis

Explains textual analysis tools in Python, including preprocessing, skip-gram (word2vec), and topic modelling.

Stars: ✭ 48 (-88.91%)

Mutual labels: text-processing

Nuts

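A testbed for solutions to common natural language processing tasks (mainly text classification, sequence labeling, automatic question answering, etc.)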

Stars: ✭ 21 (-95.15%)

Mutual labels: nlp-library

Giveme5w1h

Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?

Stars: ✭ 316 (-27.02%)

Mutual labels: nlp-library

pwsh-prelude

PowerShell “standard” library for supercharging your productivity. Provides a powerful cross-platform scripting environment enabling efficient analysis and sustainable science in myriad contexts.

Stars: ✭ 26 (-94%)

Mutual labels: text-processing
