The aim of this repository is to show a baseline model for text classification by implementing an LSTM-based model in PyTorch. To make the model easier to understand, it is applied to a Tweets dataset provided by Kaggle.

Stars: ✭ 45 (-37.5%)

Mutual labels: tokenizer

ark-pixel-font

Open source Pan-CJK pixel font

Stars: ✭ 1,767 (+2354.17%)

Mutual labels: chinese

Somajo

A tokenizer and sentence splitter for German and English web and social media texts.

Stars: ✭ 85 (+18.06%)

Mutual labels: tokenizer

Cols Agent Tasks

Colin's ALM Corner Custom Build Tasks

Stars: ✭ 70 (-2.78%)

Mutual labels: tokenizer

AiSpace

AiSpace: Better practices for deep learning model development and deployment for TensorFlow 2.0

Stars: ✭ 28 (-61.11%)

Mutual labels: chinese

lexertk

C++ Lexer Toolkit Library (LexerTk) https://www.partow.net/programming/lexertk/index.html

Stars: ✭ 26 (-63.89%)

Mutual labels: tokenizer

Js Tokens

Tiny JavaScript tokenizer.

Stars: ✭ 166 (+130.56%)

Mutual labels: tokenizer

eslint-config-mingelz

A shared ESLint configuration with complete Chinese comments.

Stars: ✭ 15 (-79.17%)

Mutual labels: chinese

Lex

Replaced by foonathan/lexy

Stars: ✭ 137 (+90.28%)

Mutual labels: tokenizer

exhentai-tags-chinese-translation

Chinese translation of all E-Hentai/ExHentai tags

Stars: ✭ 273 (+279.17%)

Mutual labels: chinese

Chevrotain

Parser Building Toolkit for JavaScript

Stars: ✭ 1,795 (+2393.06%)

Mutual labels: tokenizer

ModernSecurityProtectionGuide

Modern Security Protection Guide

Stars: ✭ 72 (+0%)

Mutual labels: chinese

Kadot

Kadot, the unsupervised natural language processing library.

Stars: ✭ 108 (+50%)

Mutual labels: tokenizer

say-it

TTS in command line -- Pronounce the Chinese and English words you typed in.

Stars: ✭ 19 (-73.61%)

Mutual labels: chinese

Hippo

PHP standards checker.

Stars: ✭ 82 (+13.89%)

Mutual labels: tokenizer

next-qrcode

React hooks for generating QR codes in your next React apps.

Stars: ✭ 87 (+20.83%)

Mutual labels: chinese

rime-wugniu zaonhe

上海吳語拼音輸入方案 · 上海吴语拼音输入方案 · Rime input schemas for Shanghai Dialects

Stars: ✭ 20 (-72.22%)

Mutual labels: chinese

String Calc

PHP calculator library for mathematical terms (expressions) passed as strings

Stars: ✭ 60 (-16.67%)

Mutual labels: tokenizer

Greynir

The greynir.is natural language processing website for Icelandic

Stars: ✭ 47 (-34.72%)

Mutual labels: tokenizer

MixPoet

Source code for MixPoet: Diverse Poetry Generation via Learning Controllable Mixed Latent Space (AAAI 2020)

Stars: ✭ 141 (+95.83%)

Mutual labels: chinese

Email-newsletter-RSS

Email 📧 newsletter RSS roundup News

Stars: ✭ 1,225 (+1601.39%)

Mutual labels: chinese

SIMCSE unsup

PyTorch implementation of unsupervised SimCSE for Chinese

Stars: ✭ 113 (+56.94%)

Mutual labels: chinese

embedding study

A study of character embeddings generated by Chinese pretrained models, testing how BERT and ELMo perform on Chinese

Stars: ✭ 94 (+30.56%)

Mutual labels: chinese

Bitextor

Bitextor generates translation memories from multilingual websites.

Stars: ✭ 168 (+133.33%)

Mutual labels: tokenizer

grasp

Essential NLP & ML, short & fast pure Python code

Stars: ✭ 58 (-19.44%)

Mutual labels: tokenizer

Query Translator

Query Translator is a search query translator with AST representation

Stars: ✭ 165 (+129.17%)

Mutual labels: tokenizer

vocascan-frontend

A highly configurable vocabulary trainer

Stars: ✭ 26 (-63.89%)

Mutual labels: words

Udpipe

R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit

Stars: ✭ 160 (+122.22%)

Mutual labels: tokenizer

tensorflow-chatbot-chinese

Web chatbot | TensorFlow implementation of a seq2seq model with Bahdanau attention and Word2Vec pretrained embeddings

Stars: ✭ 50 (-30.56%)

Mutual labels: chinese

Tokenizer

Fast and customizable text tokenization library with BPE and SentencePiece support

Stars: ✭ 132 (+83.33%)

Mutual labels: tokenizer

chinese-learner

A desktop web application for learning Mandarin Chinese and its character stroke order.

Stars: ✭ 22 (-69.44%)

Mutual labels: chinese

Fugashi

A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.

Stars: ✭ 125 (+73.61%)

Mutual labels: tokenizer

discussion

Tracks issues and content related to Fanhuaji (繁化姬)

Stars: ✭ 33 (-54.17%)

Mutual labels: chinese

Syntok

Text tokenization and sentence segmentation (segtok v2)

Stars: ✭ 123 (+70.83%)

Mutual labels: tokenizer

dialectID siam

Dialect identification using Siamese network

Stars: ✭ 15 (-79.17%)

Mutual labels: words

Tokenizer

Source code tokenizer

Stars: ✭ 119 (+65.28%)

Mutual labels: tokenizer

anki-maobi

máobĭ (毛笔) is an Anki add-on to create cards with writing quizzes for Hanzi (Chinese characters)

Stars: ✭ 42 (-41.67%)

Mutual labels: chinese

Megamark

😻 Markdown with easy tokenization, a fast highlighter, and a lean HTML sanitizer

Stars: ✭ 100 (+38.89%)

Mutual labels: tokenizer

NLPDataAugmentation

Chinese NLP Data Augmentation, BERT Contextual Augmentation

Stars: ✭ 94 (+30.56%)

Mutual labels: chinese

Djurl

Simple yet helpful library for writing Django URLs in an easy, short, and intuitive way.

Stars: ✭ 85 (+18.06%)

Mutual labels: tokenizer

Roy VnTokenizer

Vietnamese tokenizer (Maximum Matching and CRF)

Stars: ✭ 49 (-31.94%)

Mutual labels: tokenizer

Sentence Splitter

Text to sentence splitter using a heuristic algorithm by Philipp Koehn and Josh Schroeder.

Stars: ✭ 82 (+13.89%)

Mutual labels: tokenizer

suika

Suika 🍉 is a Japanese morphological analyzer written in pure Ruby

Stars: ✭ 31 (-56.94%)

Mutual labels: tokenizer

Wirb

Ruby Object Inspection for IRB

Stars: ✭ 69 (-4.17%)

Mutual labels: tokenizer

sinling

A collection of NLP tools for Sinhalese (සිංහල).

Stars: ✭ 38 (-47.22%)

Mutual labels: tokenizer

Thot

Thot toolkit for statistical machine translation

Stars: ✭ 53 (-26.39%)

Mutual labels: tokenizer

Tokenizer

A tokenizer for Icelandic text

Stars: ✭ 27 (-62.5%)

Mutual labels: tokenizer

Py Nltools

A collection of basic Python modules for spoken natural language processing

Stars: ✭ 46 (-36.11%)

Mutual labels: tokenizer

Robot Arm Write Chinese

Writing Chinese brush calligraphy with a uArm Swift Pro robotic arm

Stars: ✭ 57 (-20.83%)

Mutual labels: chinese

Vanhiupun.github.io

🏖️ Vanhiupun's Awesome Site ==> another theme for elegant writers with modern flat style and beautiful night/dark mode.

Stars: ✭ 57 (-20.83%)

Mutual labels: chinese

ime.vim

A Vim input method engine

Stars: ✭ 74 (+2.78%)

Mutual labels: chinese

word2vec-movies

Bag of words meets bags of popcorn in Python 3 (Chinese tutorial)

Stars: ✭ 54 (-25%)

Mutual labels: chinese

Sublime-Fanhuaji

Sublime Text plugin for Fanhuaji (繁化姬)

Stars: ✭ 48 (-33.33%)

Mutual labels: chinese

1-60 of 437 similar projects