118 open source projects that are alternatives to or similar to suika

Kagome
Self-contained Japanese Morphological Analyzer written in pure Go
Stars: ✭ 554 (+1687.1%)
simplemma
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Stars: ✭ 32 (+3.23%)
Jumanpp
Juman++ (a Morphological Analyzer Toolkit)
Stars: ✭ 254 (+719.35%)
Nlp Js Tools French
POS tagger, lemmatizer and stemmer for the French language in JavaScript
Stars: ✭ 32 (+3.23%)
Mutual labels:  tokenizer
Greynir
The greynir.is natural language processing website for Icelandic
Stars: ✭ 47 (+51.61%)
Mutual labels:  tokenizer
Works For Me
Collection of developer toolkits
Stars: ✭ 131 (+322.58%)
Mutual labels:  tokenizer
greeb
Greeb is a simple Unicode-aware regexp-based tokenizer.
Stars: ✭ 16 (-48.39%)
Mutual labels:  tokenizer
React Input Tags
React component for tagging inputs.
Stars: ✭ 10 (-67.74%)
Mutual labels:  tokenizer
Japanesetokenizers
Aims to make JapaneseTokenizer as easy to use as possible
Stars: ✭ 120 (+287.1%)
Mutual labels:  tokenizer
Mustard
🌭 Mustard is a Swift library for tokenizing strings when splitting by whitespace doesn't cut it.
Stars: ✭ 689 (+2122.58%)
Mutual labels:  tokenizer
Open Korean Text
Open Korean Text Processor - An Open-source Korean Text Processor
Stars: ✭ 438 (+1312.9%)
Mutual labels:  tokenizer
String Calc
PHP calculator library for mathematical terms (expressions) passed as strings
Stars: ✭ 60 (+93.55%)
Mutual labels:  tokenizer
Lex
Replaced by foonathan/lexy
Stars: ✭ 137 (+341.94%)
Mutual labels:  tokenizer
Talismane
NLP framework: sentence detector, tokeniser, pos-tagger and dependency parser
Stars: ✭ 38 (+22.58%)
Mutual labels:  tokenizer
Roy VnTokenizer
Vietnamese tokenizer (Maximum Matching and CRF)
Stars: ✭ 49 (+58.06%)
Mutual labels:  tokenizer
Lfuzzer
Fuzzing Parsers with Tokens
Stars: ✭ 28 (-9.68%)
Mutual labels:  tokenizer
Chevrotain
Parser Building Toolkit for JavaScript
Stars: ✭ 1,795 (+5690.32%)
Mutual labels:  tokenizer
Snl Compiler
SNL (Small Nested Language) Compiler. Maven jUnit Tokenizer Lexer Syntax Parser. Compiler principles, lexical analysis, syntax analysis
Stars: ✭ 19 (-38.71%)
Mutual labels:  tokenizer
yap
Yet Another (natural language) Parser
Stars: ✭ 40 (+29.03%)
Mutual labels:  morphological-analysis
Soynlp
A Python library for Korean natural language processing. It provides word extraction, tokenization, part-of-speech tagging, and preprocessing.
Stars: ✭ 613 (+1877.42%)
Mutual labels:  tokenizer
Kadot
Kadot, the unsupervised natural language processing library.
Stars: ✭ 108 (+248.39%)
Mutual labels:  tokenizer
Js Tokens
Tiny JavaScript tokenizer.
Stars: ✭ 166 (+435.48%)
Mutual labels:  tokenizer
Ekphrasis
Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from 2 big corpora (English Wikipedia and Twitter, the latter comprising 330 million English tweets).
Stars: ✭ 433 (+1296.77%)
Mutual labels:  tokenizer
Somajo
A tokenizer and sentence splitter for German and English web and social media texts.
Stars: ✭ 85 (+174.19%)
Mutual labels:  tokenizer
Php Parser
🌿 NodeJS PHP Parser - extract AST or tokens (PHP5 and PHP7)
Stars: ✭ 400 (+1190.32%)
Mutual labels:  tokenizer
Lexmachine
Lex machinery for Go.
Stars: ✭ 335 (+980.65%)
Mutual labels:  tokenizer
Wirb
Ruby Object Inspection for IRB
Stars: ✭ 69 (+122.58%)
Mutual labels:  tokenizer
Udpipe
R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
Stars: ✭ 160 (+416.13%)
Mutual labels:  tokenizer
Thot
Thot toolkit for statistical machine translation
Stars: ✭ 53 (+70.97%)
Mutual labels:  tokenizer
Neural-Morphological-Disambiguation-for-Turkish-DEPRECATED
Neural morphological disambiguation for Turkish. Implemented in DyNet
Stars: ✭ 11 (-64.52%)
Mutual labels:  morphological-analysis
Py Nltools
A collection of basic python modules for spoken natural language processing
Stars: ✭ 46 (+48.39%)
Mutual labels:  tokenizer
Tokenizer
Fast and customizable text tokenization library with BPE and SentencePiece support
Stars: ✭ 132 (+325.81%)
Mutual labels:  tokenizer
Sharpmath
A small .NET math library.
Stars: ✭ 36 (+16.13%)
Mutual labels:  tokenizer
Tokenizer
A tokenizer for Icelandic text
Stars: ✭ 27 (-12.9%)
Mutual labels:  tokenizer
Omnicat Bayes
Naive Bayes text classification implementation as an OmniCat classifier strategy. (#ruby #naivebayes)
Stars: ✭ 30 (-3.23%)
Mutual labels:  tokenizer
Fugashi
A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.
Stars: ✭ 125 (+303.23%)
Mutual labels:  tokenizer
Laravel Token
Laravel token management
Stars: ✭ 10 (-67.74%)
Mutual labels:  tokenizer
sinling
A collection of NLP tools for Sinhalese (සිංහල).
Stars: ✭ 38 (+22.58%)
Mutual labels:  tokenizer
Lisp Esque Language
💠The Lel programming language
Stars: ✭ 24 (-22.58%)
Mutual labels:  tokenizer
Syntok
Text tokenization and sentence segmentation (segtok v2)
Stars: ✭ 123 (+296.77%)
Mutual labels:  tokenizer
Natasha
Solves basic Russian NLP tasks, API for lower level Natasha projects
Stars: ✭ 788 (+2441.94%)
Mutual labels:  tokenizer
Quantitative-Big-Imaging-2018
(Latest semester at https://github.com/kmader/Quantitative-Big-Imaging-2019) The material for the Quantitative Big Imaging course at ETHZ for the Spring Semester 2018
Stars: ✭ 50 (+61.29%)
Mutual labels:  morphological-analysis
Tokenizer
Source code tokenizer
Stars: ✭ 119 (+283.87%)
Mutual labels:  tokenizer
Hippo
PHP standards checker.
Stars: ✭ 82 (+164.52%)
Mutual labels:  tokenizer
Sentences
A multilingual command line sentence tokenizer in Golang
Stars: ✭ 293 (+845.16%)
Mutual labels:  tokenizer
Tokenizer
A small library for converting tokenized PHP source code into XML (and potentially other formats)
Stars: ✭ 4,770 (+15287.1%)
Mutual labels:  tokenizer
Bitextor
Bitextor generates translation memories from multilingual websites.
Stars: ✭ 168 (+441.94%)
Mutual labels:  tokenizer
Smoothnlp
Focused on explainable NLP techniques. An NLP Toolset With A Focus on Explainable Inference
Stars: ✭ 435 (+1303.23%)
Mutual labels:  tokenizer
Megamark
😻 Markdown with easy tokenization, a fast highlighter, and a lean HTML sanitizer
Stars: ✭ 100 (+222.58%)
Mutual labels:  tokenizer
Moo
Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.
Stars: ✭ 434 (+1300%)
Mutual labels:  tokenizer
lexertk
C++ Lexer Toolkit Library (LexerTk) https://www.partow.net/programming/lexertk/index.html
Stars: ✭ 26 (-16.13%)
Mutual labels:  tokenizer
Jflex
The fast scanner generator for Java™ with full Unicode support
Stars: ✭ 380 (+1125.81%)
Mutual labels:  tokenizer
Djurl
Simple yet helpful library for writing Django urls in an easy, short and intuitive way.
Stars: ✭ 85 (+174.19%)
Mutual labels:  tokenizer
Friso
High performance Chinese tokenizer with both GBK and UTF-8 charset support, based on the MMSEG algorithm and developed in ANSI C. Completely based on modular implementation and can be easily embedded in other programs, like: MySQL, PostgreSQL, PHP, etc.
Stars: ✭ 313 (+909.68%)
Mutual labels:  tokenizer
Query Translator
Query Translator is a search query translator with AST representation
Stars: ✭ 165 (+432.26%)
Mutual labels:  tokenizer
Sacremoses
Python port of Moses tokenizer, truecaser and normalizer
Stars: ✭ 293 (+845.16%)
Mutual labels:  tokenizer
Sentence Splitter
Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder.
Stars: ✭ 82 (+164.52%)
Mutual labels:  tokenizer
zeyrek
Python morphological analyzer for the Turkish language. Partial port of ZemberekNLP.
Stars: ✭ 36 (+16.13%)
Mutual labels:  morphological-analysis
Text-Classification-LSTMs-PyTorch
The aim of this repository is to show a baseline model for text classification by implementing an LSTM-based model coded in PyTorch. To provide a better understanding of the model, it uses a Tweets dataset provided by Kaggle.
Stars: ✭ 45 (+45.16%)
Mutual labels:  tokenizer
grasp
Essential NLP & ML, short & fast pure Python code
Stars: ✭ 58 (+87.1%)
Mutual labels:  tokenizer