FrisoHigh-performance Chinese tokenizer with both GBK and UTF-8 charset support, based on the MMSEG algorithm and written in ANSI C. Fully modular and easily embedded in other programs such as MySQL, PostgreSQL, and PHP.
Stars: ✭ 313 (+552.08%)
rustfstRust re-implementation of OpenFST - library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). A Python binding is also available.
Stars: ✭ 104 (+116.67%)
Library-SpringA library web application for borrowing books, built with Spring MVC and Hibernate.
Stars: ✭ 73 (+52.08%)
python-mecabMeCab bindings for Python 3.5+ that use neither SWIG nor pybind. (No longer maintained.)
Stars: ✭ 27 (-43.75%)
TokenizerA tokenizer for Icelandic text
Stars: ✭ 27 (-43.75%)
tokenizerTokenize CSS according to the CSS Syntax
Stars: ✭ 52 (+8.33%)
lexLex is an implementation of the lex tool in Ruby.
Stars: ✭ 49 (+2.08%)
greebGreeb is a simple Unicode-aware regexp-based tokenizer.
Stars: ✭ 16 (-66.67%)
TokenizersFast, Consistent Tokenization of Natural Language Text
Stars: ✭ 161 (+235.42%)
lunr-moduleFull-text search with pre-built indexes for Nuxt.js using lunr.js
Stars: ✭ 45 (-6.25%)
berserkerBerserker - BERt chineSE woRd toKenizER
Stars: ✭ 17 (-64.58%)
snapdragon-lexerConverts a string into an array of tokens, with useful methods for looking ahead and behind, capturing, matching, et cetera.
Stars: ✭ 19 (-60.42%)
poyongaPython Groonga Client
Stars: ✭ 19 (-60.42%)
suikaSuika 🍉 is a Japanese morphological analyzer written in pure Ruby
Stars: ✭ 31 (-35.42%)
farasapyA Python implementation of Farasa toolkit
Stars: ✭ 69 (+43.75%)
lexertkC++ Lexer Toolkit Library (LexerTk) https://www.partow.net/programming/lexertk/index.html
Stars: ✭ 26 (-45.83%)
simplemmaSimple multilingual lemmatizer for Python, especially useful for speed and efficiency
Stars: ✭ 32 (-33.33%)
sinlingA collection of NLP tools for Sinhalese (සිංහල).
Stars: ✭ 38 (-20.83%)
psr2r-snifferA PSR-2-R code sniffer and code-style auto-correction-tool - including many useful additions
Stars: ✭ 32 (-33.33%)
Js TokensTiny JavaScript tokenizer.
Stars: ✭ 166 (+245.83%)
vscode-blockmanVSCode extension to highlight nested code blocks
Stars: ✭ 233 (+385.42%)
hunspellHigh-Performance Stemmer, Tokenizer, and Spell Checker for R
Stars: ✭ 101 (+110.42%)
LexReplaced by foonathan/lexy
Stars: ✭ 137 (+185.42%)
Works For MeCollection of developer toolkits
Stars: ✭ 131 (+172.92%)
SwiLexA universal lexer library in Swift.
Stars: ✭ 29 (-39.58%)
wink-tokenizerMultilingual tokenizer that automatically tags each token with its type
Stars: ✭ 51 (+6.25%)
gd-tokenizerA small godot project with a tokenizer written in GDScript.
Stars: ✭ 34 (-29.17%)
gatsby-plugin-lunrGatsby plugin for full text search implementation based on lunr client-side index. Supports multilanguage search.
Stars: ✭ 69 (+43.75%)
xontrib-output-searchGet identifiers, paths, URLs and words from the previous command output and use them for the next command in xonsh shell.
Stars: ✭ 26 (-45.83%)
jargonTokenizers and lemmatizers for Go
Stars: ✭ 98 (+104.17%)
djangoqueriesThe code of "Making queries" in docs.djangoproject.com that I used in my article "Full-Text Search in Django with PostgreSQL".
Stars: ✭ 39 (-18.75%)
bredonA modern CSS value compiler in JavaScript
Stars: ✭ 39 (-18.75%)
understand-full-text-search📖 Supporting examples for learning full-text search with PostgreSQL. Ready to run.
Stars: ✭ 98 (+104.17%)
neural tokenizerTokenize English sentences using neural networks.
Stars: ✭ 64 (+33.33%)
Text-Classification-LSTMs-PyTorchA baseline model for text classification, implemented as an LSTM-based model in PyTorch. To make the model easier to understand, it is demonstrated on a Tweets dataset provided by Kaggle.
Stars: ✭ 45 (-6.25%)
paperless-ngA supercharged version of paperless: scan, index and archive all your physical documents
Stars: ✭ 4,840 (+9983.33%)
bulksearchLightweight and read-write optimized full text search library.
Stars: ✭ 108 (+125%)
graspEssential NLP & ML, short & fast pure Python code
Stars: ✭ 58 (+20.83%)
text2textText2Text: Cross-lingual natural language processing and generation toolkit
Stars: ✭ 188 (+291.67%)
Roy VnTokenizerVietnamese tokenizer (Maximum Matching and CRF)
Stars: ✭ 49 (+2.08%)
mxusearch🔍 A Laravel full-text search service built as a wrapper around Xunsearch (讯搜).
Stars: ✭ 40 (-16.67%)
lnx⚡ Insanely fast, 🌟 Feature-rich searching. lnx is the adaptable, typo-tolerant deployment of the tantivy search engine. Standing on the shoulders of giants.
Stars: ✭ 844 (+1658.33%)
ilmultiTooling to play around with multilingual machine translation for Indian Languages.
Stars: ✭ 19 (-60.42%)
BitextorBitextor generates translation memories from multilingual websites.
Stars: ✭ 168 (+250%)
CodeIndexA code index search tool based on Lucene.Net
Stars: ✭ 28 (-41.67%)
Query TranslatorQuery Translator is a search query translator with AST representation
Stars: ✭ 165 (+243.75%)
mystem-scalaMorphological analyzer `mystem` (Russian language) wrapper for JVM languages
Stars: ✭ 21 (-56.25%)
UdpipeR package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
Stars: ✭ 160 (+233.33%)
fts🔍 Postgres full-text search (fts)
Stars: ✭ 28 (-41.67%)
TokenizerFast and customizable text tokenization library with BPE and SentencePiece support
Stars: ✭ 132 (+175%)
FugashiA Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.
Stars: ✭ 125 (+160.42%)
tokenizerA simple tokenizer in Ruby for NLP tasks.
Stars: ✭ 44 (-8.33%)
PaddleTokenizerA deep-neural-network-based Chinese word segmentation engine (tokenizer) implemented with PaddlePaddle
Stars: ✭ 14 (-70.83%)
lucillaFast, efficient, in-memory Full Text Search for Kotlin
Stars: ✭ 102 (+112.5%)
bukeFull-text search for manpages
Stars: ✭ 27 (-43.75%)
liblexC library for Lexical Analysis
Stars: ✭ 25 (-47.92%)