Rust re-implementation of OpenFST - library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). A Python binding is also available.

Stars: ✭ 104 (-66.77%)

Mutual labels: tokenizer

ex_elasticlunr

Elasticlunr is a small, full-text search library for use in the Elixir environment. It indexes JSON documents and provides a friendly search interface to retrieve documents.

Stars: ✭ 125 (-60.06%)

Mutual labels: full-text-search

CodeIndex

A code index searching tool based on Lucene.Net

Stars: ✭ 28 (-91.05%)

Mutual labels: full-text-search

poyonga

Python Groonga Client

Stars: ✭ 19 (-93.93%)

Mutual labels: full-text-search

tokenizer

A simple tokenizer in Ruby for NLP tasks.

Stars: ✭ 44 (-85.94%)

Mutual labels: tokenizer

pascal-interpreter

A simple interpreter for a large subset of the Pascal language, written for educational purposes

Stars: ✭ 21 (-93.29%)

Mutual labels: tokenizer

gd-tokenizer

A small Godot project with a tokenizer written in GDScript.

Stars: ✭ 34 (-89.14%)

Mutual labels: tokenizer

vscode-blockman

VSCode extension to highlight nested code blocks

Stars: ✭ 233 (-25.56%)

Mutual labels: tokenizer

djangoqueries

The code of "Making queries" in docs.djangoproject.com that I used in my article "Full-Text Search in Django with PostgreSQL".

Stars: ✭ 39 (-87.54%)

Mutual labels: full-text-search

PaddleTokenizer

A Chinese word segmentation engine based on deep neural networks, implemented with PaddlePaddle

Stars: ✭ 14 (-95.53%)

Mutual labels: tokenizer

Text-Classification-LSTMs-PyTorch

The aim of this repository is to show a baseline model for text classification by implementing an LSTM-based model in PyTorch. To make the model easier to understand, it is applied to a Tweets dataset provided by Kaggle.

Stars: ✭ 45 (-85.62%)

Mutual labels: tokenizer

wink-tokenizer

Multilingual tokenizer that automatically tags each token with its type

Stars: ✭ 51 (-83.71%)

Mutual labels: tokenizer

bulksearch

Lightweight and read-write optimized full text search library.

Stars: ✭ 108 (-65.5%)

Mutual labels: full-text-search

Cross-Domain-CWS

Code for IJCAI 2018 paper "Neural Networks Incorporating Unlabeled and Partially-labeled Data for Cross-domain Chinese Word Segmentation"

Stars: ✭ 14 (-95.53%)

Mutual labels: chinese-word-segmentation

bredon

A modern CSS value compiler in JavaScript

Stars: ✭ 39 (-87.54%)

Mutual labels: tokenizer

neural tokenizer

Tokenize English sentences using neural networks.

Stars: ✭ 64 (-79.55%)

Mutual labels: tokenizer

rgpipe

lesspipe for ripgrep for common new filetypes using few dependencies

Stars: ✭ 21 (-93.29%)

Mutual labels: full-text-search

pg-search-sequelize

Postgres full-text search in Node.js and Sequelize.

Stars: ✭ 31 (-90.1%)

Mutual labels: full-text-search

mystem-scala

Morphological analyzer `mystem` (Russian language) wrapper for JVM languages

Stars: ✭ 21 (-93.29%)

Mutual labels: tokenizer

mxusearch

🔍 A Laravel full-text search service built on Xunsearch.

Stars: ✭ 40 (-87.22%)

Mutual labels: full-text-search

Jumanpp

Juman++ (a Morphological Analyzer Toolkit)

Stars: ✭ 254 (-18.85%)

Mutual labels: tokenizer

psr2r-sniffer

A PSR-2-R code sniffer and code-style auto-correction tool, including many useful additions

Stars: ✭ 32 (-89.78%)

Mutual labels: tokenizer

gatsby-plugin-lunr

Gatsby plugin for full text search implementation based on lunr client-side index. Supports multilanguage search.

Stars: ✭ 69 (-77.96%)

Mutual labels: full-text-search

lex

Lex is an implementation of the lex tool in Ruby.

Stars: ✭ 49 (-84.35%)

Mutual labels: tokenizer

search-for-kirby

Kirby 3 plugin for adding a search index (sqlite or Algolia).

Stars: ✭ 42 (-86.58%)

Mutual labels: full-text-search

hunspell

High-Performance Stemmer, Tokenizer, and Spell Checker for R

Stars: ✭ 101 (-67.73%)

Mutual labels: tokenizer

paperless-ng

A supercharged version of paperless: scan, index and archive all your physical documents

Stars: ✭ 4,840 (+1446.33%)

Mutual labels: full-text-search

lindera

A morphological analysis library.

Stars: ✭ 226 (-27.8%)

Mutual labels: tokenizer

Sacremoses

Python port of Moses tokenizer, truecaser and normalizer

Stars: ✭ 293 (-6.39%)

Mutual labels: tokenizer

lunr-module

Full-text search with pre-built indexes for Nuxt.js using lunr.js

Stars: ✭ 45 (-85.62%)

Mutual labels: full-text-search

ilmulti

Tooling to play around with multilingual machine translation for Indian Languages.

Stars: ✭ 19 (-93.93%)

Mutual labels: tokenizer

python-mecab

A repository to bind MeCab for Python 3.5+. Uses neither SWIG nor pybind. (No longer maintained.)

Stars: ✭ 27 (-91.37%)

Mutual labels: tokenizer

snapdragon-lexer

Converts a string into an array of tokens, with useful methods for looking ahead and behind, capturing, matching, et cetera.

Stars: ✭ 19 (-93.93%)

Mutual labels: tokenizer

nlpir-analysis-cn-ictclas

Lucene/Solr analyzer plugin. Supports macOS, Linux x86/64, and Windows x86/64. It is a Maven project, which lets you change the Lucene/Solr version to stay compatible with the release you use.

Stars: ✭ 71 (-77.32%)

Mutual labels: chinese-word-segmentation

chinese-tokenizer

Tokenizes Chinese texts into words.

Stars: ✭ 72 (-77%)

Mutual labels: tokenizer

ArabicProcessingCog

A Python package that does stemming, tokenization, sentence breaking, segmentation, normalization, and POS tagging for the Arabic language.

Stars: ✭ 19 (-93.93%)

Mutual labels: tokenizer

suika

Suika 🍉 is a Japanese morphological analyzer written in pure Ruby