132 open source projects that are alternatives to or similar to cang-jie

High-performance Chinese tokenizer with both GBK and UTF-8 charset support, based on the MMSEG algorithm and written in ANSI C. Fully modular, so it can be easily embedded in other programs such as MySQL, PostgreSQL, and PHP.

Stars: ✭ 313 (+552.08%)

Mutual labels: tokenizer, full-text-search

rustfst

Rust re-implementation of OpenFST - library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). A Python binding is also available.

Stars: ✭ 104 (+116.67%)

Mutual labels: tokenizer

chinese-tokenizer

Tokenizes Chinese texts into words.

Stars: ✭ 72 (+50%)

Mutual labels: tokenizer

Library-Spring

A library web application where you can borrow books. It's a Spring MVC and Hibernate project.

Stars: ✭ 73 (+52.08%)

Mutual labels: full-text-search

python-mecab

A repository to bind mecab for Python 3.5+. Uses neither SWIG nor pybind. (No longer maintained.)

Stars: ✭ 27 (-43.75%)

Mutual labels: tokenizer

elasticsearch-plugins

Some native scoring script plugins for elasticsearch

Stars: ✭ 30 (-37.5%)

Mutual labels: tokenizer

Tokenizer

A tokenizer for Icelandic text

Stars: ✭ 27 (-43.75%)

Mutual labels: tokenizer

tokenizer

Tokenize CSS according to the CSS Syntax

Stars: ✭ 52 (+8.33%)

Mutual labels: tokenizer

lex

Lex is an implementation of the lex tool in Ruby.

Stars: ✭ 49 (+2.08%)

Mutual labels: tokenizer

greeb

Greeb is a simple Unicode-aware regexp-based tokenizer.

Stars: ✭ 16 (-66.67%)

Mutual labels: tokenizer

Tokenizers

Fast, Consistent Tokenization of Natural Language Text

Stars: ✭ 161 (+235.42%)

Mutual labels: tokenizer

lunr-module

Full-text search with pre-built indexes for Nuxt.js using lunr.js

Stars: ✭ 45 (-6.25%)

Mutual labels: full-text-search

berserker

Berserker - BERt chineSE woRd toKenizER

Stars: ✭ 17 (-64.58%)

Mutual labels: tokenizer

snapdragon-lexer

Converts a string into an array of tokens, with useful methods for looking ahead and behind, capturing, matching, et cetera.

Stars: ✭ 19 (-60.42%)

Mutual labels: tokenizer

poyonga

Python Groonga Client

Stars: ✭ 19 (-60.42%)

Mutual labels: full-text-search

suika

Suika 🍉 is a Japanese morphological analyzer written in pure Ruby

Stars: ✭ 31 (-35.42%)

Mutual labels: tokenizer

farasapy

A Python implementation of the Farasa toolkit

Stars: ✭ 69 (+43.75%)

Mutual labels: tokenizer

lexertk

C++ Lexer Toolkit Library (LexerTk) https://www.partow.net/programming/lexertk/index.html

Stars: ✭ 26 (-45.83%)

Mutual labels: tokenizer

simplemma

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

Stars: ✭ 32 (-33.33%)

Mutual labels: tokenizer

sinling

A collection of NLP tools for Sinhalese (සිංහල).

Stars: ✭ 38 (-20.83%)

Mutual labels: tokenizer

psr2r-sniffer

A PSR-2-R code sniffer and code-style auto-correction tool, including many useful additions

Stars: ✭ 32 (-33.33%)

Mutual labels: tokenizer

Js Tokens

Tiny JavaScript tokenizer.

Stars: ✭ 166 (+245.83%)

Mutual labels: tokenizer

vscode-blockman

VSCode extension to highlight nested code blocks

Stars: ✭ 233 (+385.42%)

Mutual labels: tokenizer

hunspell

High-Performance Stemmer, Tokenizer, and Spell Checker for R

Stars: ✭ 101 (+110.42%)

Mutual labels: tokenizer

Lex

Replaced by foonathan/lexy

Stars: ✭ 137 (+185.42%)

Mutual labels: tokenizer

Works For Me

Collection of developer toolkits

Stars: ✭ 131 (+172.92%)

Mutual labels: tokenizer

SwiLex

A universal lexer library in Swift.

Stars: ✭ 29 (-39.58%)

Mutual labels: tokenizer

wink-tokenizer

Multilingual tokenizer that automatically tags each token with its type

Stars: ✭ 51 (+6.25%)

Mutual labels: tokenizer

gd-tokenizer

A small Godot project with a tokenizer written in GDScript.

Stars: ✭ 34 (-29.17%)

Mutual labels: tokenizer

gatsby-plugin-lunr

Gatsby plugin for full text search implementation based on lunr client-side index. Supports multilanguage search.

Stars: ✭ 69 (+43.75%)

Mutual labels: full-text-search

xontrib-output-search

Get identifiers, paths, URLs and words from the previous command output and use them for the next command in xonsh shell.

Stars: ✭ 26 (-45.83%)

Mutual labels: tokenizer

jargon

Tokenizers and lemmatizers for Go

Stars: ✭ 98 (+104.17%)

Mutual labels: tokenizer

djangoqueries

The code of "Making queries" in docs.djangoproject.com that I used in my article "Full-Text Search in Django with PostgreSQL".

Stars: ✭ 39 (-18.75%)

Mutual labels: full-text-search

bredon

A modern CSS value compiler in JavaScript

Stars: ✭ 39 (-18.75%)

Mutual labels: tokenizer

understand-full-text-search

📖 Supporting examples for learning full-text search with PostgreSQL. Ready to run.

Stars: ✭ 98 (+104.17%)

Mutual labels: full-text-search

neural tokenizer

Tokenize English sentences using neural networks.

Stars: ✭ 64 (+33.33%)

Mutual labels: tokenizer

Text-Classification-LSTMs-PyTorch

The aim of this repository is to show a baseline model for text classification by implementing an LSTM-based model in PyTorch. To give a better understanding of the model, it uses a Tweets dataset provided by Kaggle.

Stars: ✭ 45 (-6.25%)

Mutual labels: tokenizer

paperless-ng

A supercharged version of paperless: scan, index and archive all your physical documents

Stars: ✭ 4,840 (+9983.33%)

Mutual labels: full-text-search

bulksearch

Lightweight and read-write optimized full text search library.

Stars: ✭ 108 (+125%)

Mutual labels: full-text-search

pg-search-sequelize

Postgres full-text search in Node.js and Sequelize.

Stars: ✭ 31 (-35.42%)

Mutual labels: full-text-search

grasp

Essential NLP & ML, short & fast pure Python code

Stars: ✭ 58 (+20.83%)

Mutual labels: tokenizer

text2text

Text2Text: Cross-lingual natural language processing and generation toolkit

Stars: ✭ 188 (+291.67%)

Mutual labels: tokenizer

Roy VnTokenizer

Vietnamese tokenizer (Maximum Matching and CRF)

Stars: ✭ 49 (+2.08%)

Mutual labels: tokenizer

mxusearch

🔍 A Laravel full-text search service built as a wrapper around Xunsearch (讯搜).

Stars: ✭ 40 (-16.67%)

Mutual labels: full-text-search

lnx

⚡ Insanely fast, 🌟 Feature-rich searching. lnx is the adaptable, typo-tolerant deployment of the tantivy search engine. Standing on the shoulders of giants.

Stars: ✭ 844 (+1658.33%)

Mutual labels: tantivy

ilmulti

Tooling to play around with multilingual machine translation for Indian Languages.

Stars: ✭ 19 (-60.42%)

Mutual labels: tokenizer

Bitextor

Bitextor generates translation memories from multilingual websites.

Stars: ✭ 168 (+250%)

Mutual labels: tokenizer

CodeIndex

A code index searching tool based on Lucene.Net