132 open source projects that are alternatives to or similar to cang-jie

Friso
High-performance Chinese tokenizer with GBK and UTF-8 charset support, based on the MMSEG algorithm and written in ANSI C. Fully modular and easy to embed in other programs such as MySQL, PostgreSQL, and PHP.
Stars: ✭ 313 (+552.08%)
Mutual labels:  tokenizer, full-text-search
rustfst
Rust re-implementation of OpenFST - library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). A Python binding is also available.
Stars: ✭ 104 (+116.67%)
Mutual labels:  tokenizer
chinese-tokenizer
Tokenizes Chinese texts into words.
Stars: ✭ 72 (+50%)
Mutual labels:  tokenizer
Library-Spring
A library web application where you can borrow books. It is a Spring MVC and Hibernate project.
Stars: ✭ 73 (+52.08%)
Mutual labels:  full-text-search
python-mecab
A repository that binds MeCab for Python 3.5+. Uses neither SWIG nor pybind. (No longer maintained)
Stars: ✭ 27 (-43.75%)
Mutual labels:  tokenizer
elasticsearch-plugins
Some native scoring script plugins for elasticsearch
Stars: ✭ 30 (-37.5%)
Mutual labels:  tokenizer
Tokenizer
A tokenizer for Icelandic text
Stars: ✭ 27 (-43.75%)
Mutual labels:  tokenizer
tokenizer
Tokenize CSS according to the CSS Syntax
Stars: ✭ 52 (+8.33%)
Mutual labels:  tokenizer
lex
Lex is an implementation of the lex tool in Ruby.
Stars: ✭ 49 (+2.08%)
Mutual labels:  tokenizer
greeb
Greeb is a simple Unicode-aware regexp-based tokenizer.
Stars: ✭ 16 (-66.67%)
Mutual labels:  tokenizer
Tokenizers
Fast, Consistent Tokenization of Natural Language Text
Stars: ✭ 161 (+235.42%)
Mutual labels:  tokenizer
lunr-module
Full-text search with pre-built indexes for Nuxt.js using lunr.js
Stars: ✭ 45 (-6.25%)
Mutual labels:  full-text-search
berserker
Berserker - BERt chineSE woRd toKenizER
Stars: ✭ 17 (-64.58%)
Mutual labels:  tokenizer
snapdragon-lexer
Converts a string into an array of tokens, with useful methods for looking ahead and behind, capturing, matching, et cetera.
Stars: ✭ 19 (-60.42%)
Mutual labels:  tokenizer
poyonga
Python Groonga Client
Stars: ✭ 19 (-60.42%)
Mutual labels:  full-text-search
suika
Suika 🍉 is a Japanese morphological analyzer written in pure Ruby
Stars: ✭ 31 (-35.42%)
Mutual labels:  tokenizer
farasapy
A Python implementation of the Farasa toolkit
Stars: ✭ 69 (+43.75%)
Mutual labels:  tokenizer
lexertk
C++ Lexer Toolkit Library (LexerTk) https://www.partow.net/programming/lexertk/index.html
Stars: ✭ 26 (-45.83%)
Mutual labels:  tokenizer
simplemma
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Stars: ✭ 32 (-33.33%)
Mutual labels:  tokenizer
sinling
A collection of NLP tools for Sinhalese (සිංහල).
Stars: ✭ 38 (-20.83%)
Mutual labels:  tokenizer
psr2r-sniffer
A PSR-2-R code sniffer and code-style auto-correction-tool - including many useful additions
Stars: ✭ 32 (-33.33%)
Mutual labels:  tokenizer
Js Tokens
Tiny JavaScript tokenizer.
Stars: ✭ 166 (+245.83%)
Mutual labels:  tokenizer
vscode-blockman
VSCode extension to highlight nested code blocks
Stars: ✭ 233 (+385.42%)
Mutual labels:  tokenizer
hunspell
High-Performance Stemmer, Tokenizer, and Spell Checker for R
Stars: ✭ 101 (+110.42%)
Mutual labels:  tokenizer
Lex
Replaced by foonathan/lexy
Stars: ✭ 137 (+185.42%)
Mutual labels:  tokenizer
Works For Me
Collection of developer toolkits
Stars: ✭ 131 (+172.92%)
Mutual labels:  tokenizer
SwiLex
A universal lexer library in Swift.
Stars: ✭ 29 (-39.58%)
Mutual labels:  tokenizer
wink-tokenizer
Multilingual tokenizer that automatically tags each token with its type
Stars: ✭ 51 (+6.25%)
Mutual labels:  tokenizer
gd-tokenizer
A small Godot project with a tokenizer written in GDScript.
Stars: ✭ 34 (-29.17%)
Mutual labels:  tokenizer
gatsby-plugin-lunr
Gatsby plugin for full text search implementation based on lunr client-side index. Supports multilanguage search.
Stars: ✭ 69 (+43.75%)
Mutual labels:  full-text-search
xontrib-output-search
Get identifiers, paths, URLs and words from the previous command output and use them for the next command in xonsh shell.
Stars: ✭ 26 (-45.83%)
Mutual labels:  tokenizer
jargon
Tokenizers and lemmatizers for Go
Stars: ✭ 98 (+104.17%)
Mutual labels:  tokenizer
djangoqueries
The code of "Making queries" in docs.djangoproject.com that I used in my article "Full-Text Search in Django with PostgreSQL".
Stars: ✭ 39 (-18.75%)
Mutual labels:  full-text-search
bredon
A modern CSS value compiler in JavaScript
Stars: ✭ 39 (-18.75%)
Mutual labels:  tokenizer
understand-full-text-search
📖 Supporting examples for learning full-text search with PostgreSQL. Ready to run.
Stars: ✭ 98 (+104.17%)
Mutual labels:  full-text-search
neural tokenizer
Tokenize English sentences using neural networks.
Stars: ✭ 64 (+33.33%)
Mutual labels:  tokenizer
Text-Classification-LSTMs-PyTorch
This repository shows a baseline model for text classification, implemented as an LSTM-based model in PyTorch. To make the model easier to understand, it uses a Tweets dataset provided by Kaggle.
Stars: ✭ 45 (-6.25%)
Mutual labels:  tokenizer
paperless-ng
A supercharged version of paperless: scan, index and archive all your physical documents
Stars: ✭ 4,840 (+9983.33%)
Mutual labels:  full-text-search
bulksearch
Lightweight and read-write optimized full text search library.
Stars: ✭ 108 (+125%)
Mutual labels:  full-text-search
pg-search-sequelize
Postgres full-text search in Node.js and Sequelize.
Stars: ✭ 31 (-35.42%)
Mutual labels:  full-text-search
grasp
Essential NLP & ML, short & fast pure Python code
Stars: ✭ 58 (+20.83%)
Mutual labels:  tokenizer
text2text
Text2Text: Cross-lingual natural language processing and generation toolkit
Stars: ✭ 188 (+291.67%)
Mutual labels:  tokenizer
Roy VnTokenizer
Vietnamese tokenizer (Maximum Matching and CRF)
Stars: ✭ 49 (+2.08%)
Mutual labels:  tokenizer
mxusearch
🔍 A Laravel full-text search service built as a wrapper around Xunsearch (讯搜).
Stars: ✭ 40 (-16.67%)
Mutual labels:  full-text-search
lnx
⚡ Insanely fast, 🌟 feature-rich searching. lnx is the adaptable, typo-tolerant deployment of the tantivy search engine. Standing on the shoulders of giants.
Stars: ✭ 844 (+1658.33%)
Mutual labels:  tantivy
ilmulti
Tooling to play around with multilingual machine translation for Indian Languages.
Stars: ✭ 19 (-60.42%)
Mutual labels:  tokenizer
Bitextor
Bitextor generates translation memories from multilingual websites.
Stars: ✭ 168 (+250%)
Mutual labels:  tokenizer
CodeIndex
A code index search tool based on Lucene.Net
Stars: ✭ 28 (-41.67%)
Mutual labels:  full-text-search
Query Translator
Query Translator is a search query translator with AST representation
Stars: ✭ 165 (+243.75%)
Mutual labels:  tokenizer
mystem-scala
Morphological analyzer `mystem` (Russian language) wrapper for JVM languages
Stars: ✭ 21 (-56.25%)
Mutual labels:  tokenizer
Udpipe
R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
Stars: ✭ 160 (+233.33%)
Mutual labels:  tokenizer
fts
🔍 Postgres full-text search (fts)
Stars: ✭ 28 (-41.67%)
Mutual labels:  full-text-search
Tokenizer
Fast and customizable text tokenization library with BPE and SentencePiece support
Stars: ✭ 132 (+175%)
Mutual labels:  tokenizer
wink-bm25-text-search
Fast Full Text Search based on BM25
Stars: ✭ 44 (-8.33%)
Mutual labels:  full-text-search
Fugashi
A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.
Stars: ✭ 125 (+160.42%)
Mutual labels:  tokenizer
tokenizer
A simple tokenizer in Ruby for NLP tasks.
Stars: ✭ 44 (-8.33%)
Mutual labels:  tokenizer
PaddleTokenizer
A Chinese word segmentation engine based on deep neural networks, implemented with PaddlePaddle
Stars: ✭ 14 (-70.83%)
Mutual labels:  tokenizer
lucilla
Fast, efficient, in-memory Full Text Search for Kotlin
Stars: ✭ 102 (+112.5%)
Mutual labels:  full-text-search
buke
Full-text search for man pages
Stars: ✭ 27 (-43.75%)
Mutual labels:  full-text-search
liblex
C library for Lexical Analysis
Stars: ✭ 25 (-47.92%)
Mutual labels:  tokenizer