Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, 330 million English tweets).

Stars: ✭ 433 (+1296.77%)

Mutual labels: tokenizer

Somajo

A tokenizer and sentence splitter for German and English web and social media texts.

Stars: ✭ 85 (+174.19%)

Mutual labels: tokenizer

Php Parser

🌿 NodeJS PHP Parser - extract AST or tokens (PHP5 and PHP7)

Stars: ✭ 400 (+1190.32%)

Mutual labels: tokenizer

Lexmachine

Lex machinery for Go.

Stars: ✭ 335 (+980.65%)

Mutual labels: tokenizer

Wirb

Ruby Object Inspection for IRB

Stars: ✭ 69 (+122.58%)

Mutual labels: tokenizer

Udpipe

R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit

Stars: ✭ 160 (+416.13%)

Mutual labels: tokenizer

Thot

Thot toolkit for statistical machine translation

Stars: ✭ 53 (+70.97%)

Mutual labels: tokenizer

Neural-Morphological-Disambiguation-for-Turkish-DEPRECATED

Neural morphological disambiguation for Turkish. Implemented in DyNet

Stars: ✭ 11 (-64.52%)

Mutual labels: morphological-analysis

Py Nltools

A collection of basic python modules for spoken natural language processing

Stars: ✭ 46 (+48.39%)

Mutual labels: tokenizer

Tokenizer

Fast and customizable text tokenization library with BPE and SentencePiece support

Stars: ✭ 132 (+325.81%)

Mutual labels: tokenizer

Sharpmath

A small .NET math library.

Stars: ✭ 36 (+16.13%)

Mutual labels: tokenizer

Tokenizer

A tokenizer for Icelandic text

Stars: ✭ 27 (-12.9%)

Mutual labels: tokenizer

Omnicat Bayes

Naive Bayes text classification implementation as an OmniCat classifier strategy. (#ruby #naivebayes)

Stars: ✭ 30 (-3.23%)

Mutual labels: tokenizer

Fugashi

A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.

Stars: ✭ 125 (+303.23%)

Mutual labels: tokenizer

Laravel Token

Laravel token management

Stars: ✭ 10 (-67.74%)

Mutual labels: tokenizer

sinling

A collection of NLP tools for Sinhalese (සිංහල).

Stars: ✭ 38 (+22.58%)

Mutual labels: tokenizer

Lisp Esque Language

💠The Lel programming language

Stars: ✭ 24 (-22.58%)

Mutual labels: tokenizer

Syntok

Text tokenization and sentence segmentation (segtok v2)

Stars: ✭ 123 (+296.77%)

Mutual labels: tokenizer

Natasha

Solves basic Russian NLP tasks; an API for lower-level Natasha projects

Stars: ✭ 788 (+2441.94%)

Mutual labels: tokenizer

Quantitative-Big-Imaging-2018

(Latest semester at https://github.com/kmader/Quantitative-Big-Imaging-2019) The material for the Quantitative Big Imaging course at ETHZ for the Spring Semester 2018

Stars: ✭ 50 (+61.29%)

Mutual labels: morphological-analysis

Tokenizer

Source code tokenizer

Stars: ✭ 119 (+283.87%)

Mutual labels: tokenizer

Hippo

PHP standards checker.

Stars: ✭ 82 (+164.52%)

Mutual labels: tokenizer

Sentences

A multilingual command line sentence tokenizer in Golang

Stars: ✭ 293 (+845.16%)

Mutual labels: tokenizer

Tokenizer

A small library for converting tokenized PHP source code into XML (and potentially other formats)

Stars: ✭ 4,770 (+15287.1%)

Mutual labels: tokenizer

Bitextor

Bitextor generates translation memories from multilingual websites.

Stars: ✭ 168 (+441.94%)

Mutual labels: tokenizer

Smoothnlp

An NLP toolset with a focus on explainable NLP techniques and explainable inference

Stars: ✭ 435 (+1303.23%)

Mutual labels: tokenizer

Megamark

😻 Markdown with easy tokenization, a fast highlighter, and a lean HTML sanitizer

Stars: ✭ 100 (+222.58%)

Mutual labels: tokenizer

Moo

Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.

Stars: ✭ 434 (+1300%)

Mutual labels: tokenizer

lexertk

C++ Lexer Toolkit Library (LexerTk) https://www.partow.net/programming/lexertk/index.html

Stars: ✭ 26 (-16.13%)

Mutual labels: tokenizer

Jflex

The fast scanner generator for Java™ with full Unicode support

Stars: ✭ 380 (+1125.81%)

Mutual labels: tokenizer

Djurl

Simple yet helpful library for writing Django URLs in an easy, short, and intuitive way.

Stars: ✭ 85 (+174.19%)

Mutual labels: tokenizer

Friso

High-performance Chinese tokenizer with both GBK and UTF-8 charset support, based on the MMSEG algorithm and written in ANSI C. It is fully modular and can be easily embedded in other programs such as MySQL, PostgreSQL, and PHP.

Stars: ✭ 313 (+909.68%)

Mutual labels: tokenizer

Query Translator

Query Translator is a search query translator with AST representation

Stars: ✭ 165 (+432.26%)

Mutual labels: tokenizer

Sacremoses

Python port of Moses tokenizer, truecaser and normalizer

Stars: ✭ 293 (+845.16%)

Mutual labels: tokenizer

Sentence Splitter

Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder.

Stars: ✭ 82 (+164.52%)

Mutual labels: tokenizer

zeyrek

Python morphological analyzer for Turkish language. Partial port of ZemberekNLP.

Stars: ✭ 36 (+16.13%)

Mutual labels: morphological-analysis

Text-Classification-LSTMs-PyTorch

This repository shows a baseline model for text classification by implementing an LSTM-based model in PyTorch. To give a better understanding of the model, it uses a Tweets dataset provided by Kaggle.

Stars: ✭ 45 (+45.16%)

Mutual labels: tokenizer

grasp

Essential NLP & ML, short & fast pure Python code

Stars: ✭ 58 (+87.1%)

Mutual labels: tokenizer

1-60 of 118 similar projects
