
94 open-source projects that are alternatives to or similar to tokenizer

Thot toolkit for statistical machine translation

Stars: ✭ 53 (+20.45%)

Mutual labels: tokenizer

Laravel token management

Stars: ✭ 10 (-77.27%)

Mutual labels: tokenizer

Fast and customizable text tokenization library with BPE and SentencePiece support

Stars: ✭ 132 (+200%)

Mutual labels: tokenizer

Sentence Splitter

Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder.

Stars: ✭ 82 (+86.36%)

Mutual labels: tokenizer

A small library for converting tokenized PHP source code into XML (and potentially other formats)

Stars: ✭ 4,770 (+10740.91%)

Mutual labels: tokenizer

Query Translator

Query Translator is a search query translator with AST representation

Stars: ✭ 165 (+275%)

Mutual labels: tokenizer

A small .NET math library.

Stars: ✭ 36 (-18.18%)

Mutual labels: tokenizer

A tokenizer for Icelandic text

Stars: ✭ 27 (-38.64%)

Mutual labels: tokenizer

Solves basic Russian NLP tasks; provides an API for lower-level Natasha projects

Stars: ✭ 788 (+1690.91%)

Mutual labels: tokenizer

Text tokenization and sentence segmentation (segtok v2)

Stars: ✭ 123 (+179.55%)

Mutual labels: tokenizer

A simple yet helpful library for writing Django URLs in an easy, short, and intuitive way.

Stars: ✭ 85 (+93.18%)

Mutual labels: tokenizer

Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.

Stars: ✭ 434 (+886.36%)

Mutual labels: tokenizer

Bitextor generates translation memories from multilingual websites.

Stars: ✭ 168 (+281.82%)

Mutual labels: tokenizer

Ruby Object Inspection for IRB

Stars: ✭ 69 (+56.82%)

Mutual labels: tokenizer

Suika 🍉 is a Japanese morphological analyzer written in pure Ruby

Stars: ✭ 31 (-29.55%)

Mutual labels: tokenizer

A collection of basic Python modules for spoken natural language processing

Stars: ✭ 46 (+4.55%)

Mutual labels: tokenizer

R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit

Stars: ✭ 160 (+263.64%)

Mutual labels: tokenizer

Naive Bayes text classification implementation as an OmniCat classifier strategy. (#ruby #naivebayes)

Stars: ✭ 30 (-31.82%)

Mutual labels: tokenizer

A repository providing MeCab bindings for Python 3.5+, using neither SWIG nor pybind. (No longer maintained.)

Stars: ✭ 27 (-38.64%)

Mutual labels: tokenizer

Lisp Esque Language

💠 The Lel programming language

Stars: ✭ 24 (-45.45%)

Mutual labels: tokenizer

A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.

Stars: ✭ 125 (+184.09%)

Mutual labels: tokenizer

A Python library for Korean natural language processing. It provides word extraction, tokenization, part-of-speech tagging, and preprocessing.

Stars: ✭ 613 (+1293.18%)

Mutual labels: tokenizer

Essential NLP & ML, short & fast pure Python code

Stars: ✭ 58 (+31.82%)

Mutual labels: tokenizer

An NLP toolset with a focus on explainable inference

Stars: ✭ 435 (+888.64%)

Mutual labels: tokenizer

Source code tokenizer

Stars: ✭ 119 (+170.45%)

Mutual labels: tokenizer

A tokenizer and sentence splitter for German and English web and social media texts.

Stars: ✭ 85 (+93.18%)

Mutual labels: tokenizer

🌿 NodeJS PHP Parser - extract AST or tokens (PHP5 and PHP7)

Stars: ✭ 400 (+809.09%)

Mutual labels: tokenizer

Greeb is a simple Unicode-aware regexp-based tokenizer.

Stars: ✭ 16 (-63.64%)

Mutual labels: tokenizer

PHP standards checker.

Stars: ✭ 82 (+86.36%)

Mutual labels: tokenizer

chinese-tokenizer

Tokenizes Chinese texts into words.

Stars: ✭ 72 (+63.64%)

Mutual labels: tokenizer

Cols Agent Tasks

Colin's ALM Corner Custom Build Tasks

Stars: ✭ 70 (+59.09%)

Mutual labels: tokenizer

Tiny JavaScript tokenizer.

Stars: ✭ 166 (+277.27%)

Mutual labels: tokenizer

PHP calculator library for mathematical terms (expressions) passed as strings

Stars: ✭ 60 (+36.36%)

Mutual labels: tokenizer

A small Godot project with a tokenizer written in GDScript.

Stars: ✭ 34 (-22.73%)

Mutual labels: tokenizer

The greynir.is natural language processing website for Icelandic

Stars: ✭ 47 (+6.82%)

Mutual labels: tokenizer

Fast, Consistent Tokenization of Natural Language Text

Stars: ✭ 161 (+265.91%)

Mutual labels: tokenizer

NLP framework: sentence detector, tokeniser, POS tagger, and dependency parser

Stars: ✭ 38 (-13.64%)

Mutual labels: tokenizer

Text-Classification-LSTMs-PyTorch

The aim of this repository is to show a baseline model for text classification by implementing an LSTM-based model in PyTorch. To provide a better understanding of the model, a dataset of tweets provided by Kaggle is used.

Stars: ✭ 45 (+2.27%)

Mutual labels: tokenizer

Nlp Js Tools French

POS tagger, lemmatizer, and stemmer for French in JavaScript

Stars: ✭ 32 (-27.27%)

Mutual labels: tokenizer

Replaced by foonathan/lexy

Stars: ✭ 137 (+211.36%)

Mutual labels: tokenizer

Fuzzing Parsers with Tokens

Stars: ✭ 28 (-36.36%)

Mutual labels: tokenizer

A morphological analysis library.

Stars: ✭ 226 (+413.64%)

Mutual labels: tokenizer

React Input Tags

React component for tagging inputs.

Stars: ✭ 10 (-77.27%)

Mutual labels: tokenizer

Collection of developer toolkits

Stars: ✭ 131 (+197.73%)

Mutual labels: tokenizer

SNL (Small Nested Language) compiler with a tokenizer/lexer and syntax parser, built with Maven and JUnit. Covers lexical analysis and syntax analysis from compiler theory.

Stars: ✭ 19 (-56.82%)

Mutual labels: tokenizer

C++ Lexer Toolkit Library (LexerTk) https://www.partow.net/programming/lexertk/index.html

Stars: ✭ 26 (-40.91%)

Mutual labels: tokenizer

🌭 Mustard is a Swift library for tokenizing strings when splitting by whitespace doesn't cut it.

Stars: ✭ 689 (+1465.91%)

Mutual labels: tokenizer

Parser Building Toolkit for JavaScript

Stars: ✭ 1,795 (+3979.55%)

Mutual labels: tokenizer

Self-contained Japanese Morphological Analyzer written in pure Go

Stars: ✭ 554 (+1159.09%)

Mutual labels: tokenizer

xontrib-output-search

Get identifiers, paths, URLs and words from the previous command output and use them for the next command in xonsh shell.

Stars: ✭ 26 (-40.91%)

Mutual labels: tokenizer

Open Korean Text

Open Korean Text Processor - An Open-source Korean Text Processor

Stars: ✭ 438 (+895.45%)

Mutual labels: tokenizer

Japanesetokenizers

Aims to make JapaneseTokenizer as easy to use as possible

Stars: ✭ 120 (+172.73%)

Mutual labels: tokenizer

Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, with 330 million English tweets).

Stars: ✭ 433 (+884.09%)

Mutual labels: tokenizer

Roy VnTokenizer

Vietnamese tokenizer (Maximum Matching and CRF)

Stars: ✭ 49 (+11.36%)

Mutual labels: tokenizer

Kadot, the unsupervised natural language processing library.

Stars: ✭ 108 (+145.45%)

Mutual labels: tokenizer

Ruby toolkit for Amazon Alexa service

Stars: ✭ 17 (-61.36%)

Mutual labels: rubynlp

A universal lexer library in Swift.

Stars: ✭ 29 (-34.09%)

Mutual labels: tokenizer

snapdragon-lexer

Converts a string into an array of tokens, with useful methods for looking ahead and behind, capturing, matching, et cetera.

Stars: ✭ 19 (-56.82%)

Mutual labels: tokenizer

A collection of NLP tools for Sinhalese (සිංහල).

Stars: ✭ 38 (-13.64%)

Mutual labels: tokenizer

😻 Markdown with easy tokenization, a fast highlighter, and a lean HTML sanitizer

Stars: ✭ 100 (+127.27%)

Mutual labels: tokenizer
