94 open source projects that are alternatives to or similar to tokenizer

Thot
Thot toolkit for statistical machine translation
Stars: ✭ 53 (+20.45%)
Mutual labels:  tokenizer
Laravel Token
Laravel token management
Stars: ✭ 10 (-77.27%)
Mutual labels:  tokenizer
Tokenizer
Fast and customizable text tokenization library with BPE and SentencePiece support
Stars: ✭ 132 (+200%)
Mutual labels:  tokenizer
Sentence Splitter
Text to sentence splitter using a heuristic algorithm by Philipp Koehn and Josh Schroeder.
Stars: ✭ 82 (+86.36%)
Mutual labels:  tokenizer
Tokenizer
A small library for converting tokenized PHP source code into XML (and potentially other formats)
Stars: ✭ 4,770 (+10740.91%)
Mutual labels:  tokenizer
Query Translator
Query Translator is a search query translator with AST representation
Stars: ✭ 165 (+275%)
Mutual labels:  tokenizer
Sharpmath
A small .NET math library.
Stars: ✭ 36 (-18.18%)
Mutual labels:  tokenizer
Tokenizer
A tokenizer for Icelandic text
Stars: ✭ 27 (-38.64%)
Mutual labels:  tokenizer
Natasha
Solves basic Russian NLP tasks, API for lower level Natasha projects
Stars: ✭ 788 (+1690.91%)
Mutual labels:  tokenizer
Syntok
Text tokenization and sentence segmentation (segtok v2)
Stars: ✭ 123 (+179.55%)
Mutual labels:  tokenizer
Djurl
Simple yet helpful library for writing Django URLs in an easy, short and intuitive way.
Stars: ✭ 85 (+93.18%)
Mutual labels:  tokenizer
Moo
Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.
Stars: ✭ 434 (+886.36%)
Mutual labels:  tokenizer
Bitextor
Bitextor generates translation memories from multilingual websites.
Stars: ✭ 168 (+281.82%)
Mutual labels:  tokenizer
Wirb
Ruby Object Inspection for IRB
Stars: ✭ 69 (+56.82%)
Mutual labels:  tokenizer
suika
Suika 🍉 is a Japanese morphological analyzer written in pure Ruby
Stars: ✭ 31 (-29.55%)
Mutual labels:  tokenizer
Py Nltools
A collection of basic python modules for spoken natural language processing
Stars: ✭ 46 (+4.55%)
Mutual labels:  tokenizer
Udpipe
R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
Stars: ✭ 160 (+263.64%)
Mutual labels:  tokenizer
Omnicat Bayes
Naive Bayes text classification implementation as an OmniCat classifier strategy. (#ruby #naivebayes)
Stars: ✭ 30 (-31.82%)
Mutual labels:  tokenizer
python-mecab
A repository to bind mecab for Python 3.5+. Not using swig nor pybind. (Not Maintained Now)
Stars: ✭ 27 (-38.64%)
Mutual labels:  tokenizer
Lisp Esque Language
💠The Lel programming language
Stars: ✭ 24 (-45.45%)
Mutual labels:  tokenizer
Fugashi
A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.
Stars: ✭ 125 (+184.09%)
Mutual labels:  tokenizer
Soynlp
A Python library for Korean natural language processing. Provides word extraction, tokenization, part-of-speech tagging, and preprocessing.
Stars: ✭ 613 (+1293.18%)
Mutual labels:  tokenizer
grasp
Essential NLP & ML, short & fast pure Python code
Stars: ✭ 58 (+31.82%)
Mutual labels:  tokenizer
Smoothnlp
Focused on explainable NLP techniques. An NLP Toolset With A Focus on Explainable Inference
Stars: ✭ 435 (+888.64%)
Mutual labels:  tokenizer
Tokenizer
Source code tokenizer
Stars: ✭ 119 (+170.45%)
Mutual labels:  tokenizer
Somajo
A tokenizer and sentence splitter for German and English web and social media texts.
Stars: ✭ 85 (+93.18%)
Mutual labels:  tokenizer
Php Parser
🌿 NodeJS PHP Parser - extract AST or tokens (PHP5 and PHP7)
Stars: ✭ 400 (+809.09%)
Mutual labels:  tokenizer
greeb
Greeb is a simple Unicode-aware regexp-based tokenizer.
Stars: ✭ 16 (-63.64%)
Mutual labels:  tokenizer
Hippo
PHP standards checker.
Stars: ✭ 82 (+86.36%)
Mutual labels:  tokenizer
chinese-tokenizer
Tokenizes Chinese texts into words.
Stars: ✭ 72 (+63.64%)
Mutual labels:  tokenizer
Cols Agent Tasks
Colin's ALM Corner Custom Build Tasks
Stars: ✭ 70 (+59.09%)
Mutual labels:  tokenizer
Js Tokens
Tiny JavaScript tokenizer.
Stars: ✭ 166 (+277.27%)
Mutual labels:  tokenizer
String Calc
PHP calculator library for mathematical terms (expressions) passed as strings
Stars: ✭ 60 (+36.36%)
Mutual labels:  tokenizer
gd-tokenizer
A small godot project with a tokenizer written in GDScript.
Stars: ✭ 34 (-22.73%)
Mutual labels:  tokenizer
Greynir
The greynir.is natural language processing website for Icelandic
Stars: ✭ 47 (+6.82%)
Mutual labels:  tokenizer
Tokenizers
Fast, Consistent Tokenization of Natural Language Text
Stars: ✭ 161 (+265.91%)
Mutual labels:  tokenizer
Talismane
NLP framework: sentence detector, tokeniser, pos-tagger and dependency parser
Stars: ✭ 38 (-13.64%)
Mutual labels:  tokenizer
Text-Classification-LSTMs-PyTorch
The aim of this repository is to show a baseline model for text classification by implementing an LSTM-based model in PyTorch. To give a better understanding of the model, it uses a Tweets dataset provided by Kaggle.
Stars: ✭ 45 (+2.27%)
Mutual labels:  tokenizer
Nlp Js Tools French
POS tagger, lemmatizer and stemmer for the French language in JavaScript
Stars: ✭ 32 (-27.27%)
Mutual labels:  tokenizer
Lex
Replaced by foonathan/lexy
Stars: ✭ 137 (+211.36%)
Mutual labels:  tokenizer
Lfuzzer
Fuzzing Parsers with Tokens
Stars: ✭ 28 (-36.36%)
Mutual labels:  tokenizer
lindera
A morphological analysis library.
Stars: ✭ 226 (+413.64%)
Mutual labels:  tokenizer
React Input Tags
React component for tagging inputs.
Stars: ✭ 10 (-77.27%)
Mutual labels:  tokenizer
Works For Me
Collection of developer toolkits
Stars: ✭ 131 (+197.73%)
Mutual labels:  tokenizer
Snl Compiler
SNL (Small Nested Language) Compiler. Maven jUnit Tokenizer Lexer Syntax Parser. Compiler principles, lexical analysis, syntax analysis
Stars: ✭ 19 (-56.82%)
Mutual labels:  tokenizer
lexertk
C++ Lexer Toolkit Library (LexerTk) https://www.partow.net/programming/lexertk/index.html
Stars: ✭ 26 (-40.91%)
Mutual labels:  tokenizer
Mustard
🌭 Mustard is a Swift library for tokenizing strings when splitting by whitespace doesn't cut it.
Stars: ✭ 689 (+1465.91%)
Mutual labels:  tokenizer
Chevrotain
Parser Building Toolkit for JavaScript
Stars: ✭ 1,795 (+3979.55%)
Mutual labels:  tokenizer
Kagome
Self-contained Japanese Morphological Analyzer written in pure Go
Stars: ✭ 554 (+1159.09%)
Mutual labels:  tokenizer
xontrib-output-search
Get identifiers, paths, URLs and words from the previous command output and use them for the next command in xonsh shell.
Stars: ✭ 26 (-40.91%)
Mutual labels:  tokenizer
Open Korean Text
Open Korean Text Processor - An Open-source Korean Text Processor
Stars: ✭ 438 (+895.45%)
Mutual labels:  tokenizer
Japanesetokenizers
Aims to make JapaneseTokenizer as easy to use as possible
Stars: ✭ 120 (+172.73%)
Mutual labels:  tokenizer
Ekphrasis
Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from 2 big corpora (English Wikipedia, Twitter - 330mil English tweets).
Stars: ✭ 433 (+884.09%)
Mutual labels:  tokenizer
Roy VnTokenizer
Vietnamese tokenizer (Maximum Matching and CRF)
Stars: ✭ 49 (+11.36%)
Mutual labels:  tokenizer
Kadot
Kadot, the unsupervised natural language processing library.
Stars: ✭ 108 (+145.45%)
Mutual labels:  tokenizer
alexa-ruby
Ruby toolkit for Amazon Alexa service
Stars: ✭ 17 (-61.36%)
Mutual labels:  rubynlp
SwiLex
A universal lexer library in Swift.
Stars: ✭ 29 (-34.09%)
Mutual labels:  tokenizer
snapdragon-lexer
Converts a string into an array of tokens, with useful methods for looking ahead and behind, capturing, matching, et cetera.
Stars: ✭ 19 (-56.82%)
Mutual labels:  tokenizer
sinling
A collection of NLP tools for Sinhalese (සිංහල).
Stars: ✭ 38 (-13.64%)
Mutual labels:  tokenizer
Megamark
😻 Markdown with easy tokenization, a fast highlighter, and a lean HTML sanitizer
Stars: ✭ 100 (+127.27%)
Mutual labels:  tokenizer
1-60 of 94 similar projects