Friso: High-performance Chinese tokenizer with GBK and UTF-8 charset support, based on the MMSEG algorithm and written in ANSI C. Fully modular and easily embedded in other programs such as MySQL, PostgreSQL, and PHP.
Stars: ✭ 313 (+552.08%)
Mutual labels: tokenizer, full-text-search
simplemma: Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Stars: ✭ 32 (-33.33%)
Mutual labels: tokenizer
elasticsearch-plugins: Some native scoring script plugins for Elasticsearch
Stars: ✭ 30 (-37.5%)
Mutual labels: tokenizer
tokenizer: Tokenize CSS according to the CSS Syntax
Stars: ✭ 52 (+8.33%)
Mutual labels: tokenizer
berserker: Berserker - BERt chineSE woRd toKenizER
Stars: ✭ 17 (-64.58%)
Mutual labels: tokenizer
poyonga: Python Groonga Client
Stars: ✭ 19 (-60.42%)
Mutual labels: full-text-search
farasapy: A Python implementation of the Farasa toolkit
Stars: ✭ 69 (+43.75%)
Mutual labels: tokenizer
text2text: Cross-lingual natural language processing and generation toolkit
Stars: ✭ 188 (+291.67%)
Mutual labels: tokenizer
mystem-scala: Wrapper for the `mystem` morphological analyzer (Russian language) for JVM languages
Stars: ✭ 21 (-56.25%)
Mutual labels: tokenizer
ilmulti: Tooling to play around with multilingual machine translation for Indian languages.
Stars: ✭ 19 (-60.42%)
Mutual labels: tokenizer
vscode-blockman: VSCode extension to highlight nested code blocks
Stars: ✭ 233 (+385.42%)
Mutual labels: tokenizer
wink-tokenizer: Multilingual tokenizer that automatically tags each token with its type
Stars: ✭ 51 (+6.25%)
Mutual labels: tokenizer
gatsby-plugin-lunr: Gatsby plugin for full-text search based on a lunr client-side index. Supports multilanguage search.
Stars: ✭ 69 (+43.75%)
Mutual labels: full-text-search
jargon: Tokenizers and lemmatizers for Go
Stars: ✭ 98 (+104.17%)
Mutual labels: tokenizer
bredon: A modern CSS value compiler in JavaScript
Stars: ✭ 39 (-18.75%)
Mutual labels: tokenizer
neural tokenizer: Tokenize English sentences using neural networks.
Stars: ✭ 64 (+33.33%)
Mutual labels: tokenizer
wink-bm25-text-search: Fast full-text search based on BM25
Stars: ✭ 44 (-8.33%)
Mutual labels: full-text-search
paperless-ng: A supercharged version of paperless. Scan, index and archive all your physical documents
Stars: ✭ 4,840 (+9983.33%)
Mutual labels: full-text-search
PaddleTokenizer: A Chinese word segmentation engine based on deep neural networks, implemented with PaddlePaddle
Stars: ✭ 14 (-70.83%)
Mutual labels: tokenizer
lucilla: Fast, efficient, in-memory full-text search for Kotlin
Stars: ✭ 102 (+112.5%)
Mutual labels: full-text-search