1057 open source projects that are alternatives to or similar to Tokenizer

Thot
Thot toolkit for statistical machine translation
Stars: ✭ 53 (-59.85%)
Stringi
THE String Processing Package for R (with ICU)
Stars: ✭ 204 (+54.55%)
Open Korean Text
Open Korean Text Processor - An Open-source Korean Text Processor
Stars: ✭ 438 (+231.82%)
Hardware Aware Transformers
[ACL 2020] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
Stars: ✭ 206 (+56.06%)
Nonautoreggenprogress
Tracking the progress in non-autoregressive generation (translation, transcription, etc.)
Stars: ✭ 118 (-10.61%)
Texar
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Stars: ✭ 2,236 (+1593.94%)
Sacremoses
Python port of Moses tokenizer, truecaser and normalizer
Stars: ✭ 293 (+121.97%)
Mutual labels:  tokenizer, machine-translation
Mtbook
Machine Translation: Foundations and Models, by Xiao Tong and Zhu Jingbo
Stars: ✭ 2,307 (+1647.73%)
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+1807.58%)
Zhihu
This repo contains the source code from my personal column (https://zhuanlan.zhihu.com/zhaoyeyu), implemented in Python 3.6. It includes Natural Language Processing and Computer Vision projects, such as text generation, machine translation, deep convolutional GAN, and other hands-on code.
Stars: ✭ 3,307 (+2405.3%)
Nlg Eval
Evaluation code for various unsupervised automated metrics for Natural Language Generation.
Stars: ✭ 822 (+522.73%)
Icu
The new home of the ICU project source code.
Stars: ✭ 1,011 (+665.91%)
Mutual labels:  unicode, icu
Greynir
The greynir.is natural language processing website for Icelandic
Stars: ✭ 47 (-64.39%)
String To Tree Nmt
Source code and data for the paper "Towards String-to-Tree Neural Machine Translation"
Stars: ✭ 16 (-87.88%)
ilmulti
Tooling to play around with multilingual machine translation for Indian Languages.
Stars: ✭ 19 (-85.61%)
Mutual labels:  tokenizer, machine-translation
ICU4N
International Components for Unicode for .NET
Stars: ✭ 18 (-86.36%)
Mutual labels:  unicode, icu
Mtnt
Code for the collection and analysis of the MTNT dataset
Stars: ✭ 48 (-63.64%)
Opennmt Tf
Neural machine translation and sequence learning using TensorFlow
Stars: ✭ 1,223 (+826.52%)
Attention Mechanisms
Implementations for a family of attention mechanisms, suitable for all kinds of natural language processing tasks and compatible with TensorFlow 2.0 and Keras.
Stars: ✭ 203 (+53.79%)
Nlp Progress
Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
Stars: ✭ 19,518 (+14686.36%)
icu-dotnet
C# wrapper for ICU4C
Stars: ✭ 48 (-63.64%)
Mutual labels:  unicode, icu
Icu4x
Solving i18n for client-side and resource-constrained environments.
Stars: ✭ 275 (+108.33%)
Mutual labels:  unicode, icu
Texar Pytorch
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
Stars: ✭ 636 (+381.82%)
Bytenet Tensorflow
ByteNet for character-level language modelling
Stars: ✭ 319 (+141.67%)
Fasttext multilingual
Multilingual word vectors in 78 languages
Stars: ✭ 1,067 (+708.33%)
Comet
A Neural Framework for MT Evaluation
Stars: ✭ 58 (-56.06%)
Deep Learning Drizzle
Drench yourself in Deep Learning, Reinforcement Learning, Machine Learning, Computer Vision, and NLP by learning from these exciting lectures!!
Stars: ✭ 9,717 (+7261.36%)
icu-swift
Swift APIs for ICU
Stars: ✭ 23 (-82.58%)
Mutual labels:  unicode, icu
Udpipe
R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
Stars: ✭ 160 (+21.21%)
Opus Mt
Open neural machine translation models and web services
Stars: ✭ 111 (-15.91%)
stringx
Drop-in replacements for base R string functions powered by stringi
Stars: ✭ 14 (-89.39%)
Mutual labels:  unicode, icu
greeb
Greeb is a simple Unicode-aware regexp-based tokenizer.
Stars: ✭ 16 (-87.88%)
Mutual labels:  unicode, tokenizer
Py Nltools
A collection of basic python modules for spoken natural language processing
Stars: ✭ 46 (-65.15%)
Kadot
Kadot, the unsupervised natural language processing library.
Stars: ✭ 108 (-18.18%)
Awesome Ai Services
An overview of the AI-as-a-service landscape
Stars: ✭ 133 (+0.76%)
Syntok
Text tokenization and sentence segmentation (segtok v2)
Stars: ✭ 123 (-6.82%)
Mutual labels:  tokenizer
Rasa Chatbot Templates
RASA chatbot use case boilerplate
Stars: ✭ 127 (-3.79%)
Files2rouge
Calculating ROUGE score between two files (line-by-line)
Stars: ✭ 120 (-9.09%)
Nlp Pretrained Model
A collection of Natural language processing pre-trained models.
Stars: ✭ 122 (-7.58%)
Prenlp
Preprocessing Library for Natural Language Processing
Stars: ✭ 130 (-1.52%)
Deep Lyrics
Lyrics Generator aka Character-level Language Modeling with Multi-layer LSTM Recurrent Neural Network
Stars: ✭ 127 (-3.79%)
Turkish Morphology
A two-level morphological analyzer for Turkish.
Stars: ✭ 121 (-8.33%)
Cs230 Code Examples
Code examples in PyTorch and TensorFlow for CS230
Stars: ✭ 1,701 (+1188.64%)
Neuraldialog Larl
PyTorch implementation of latent space reinforcement learning for E2E dialog published at NAACL 2019. It is released by Tiancheng Zhao (Tony) from Dialog Research Center, LTI, CMU
Stars: ✭ 127 (-3.79%)
Dialoglue
DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue
Stars: ✭ 120 (-9.09%)
Ratel
RAT-el is an open source penetration test tool that allows you to take control of a windows machine. It works on the client-server model, the server sends commands and the client executes the commands and sends the result back to the server. The client is completely undetectable by anti-virus software.
Stars: ✭ 121 (-8.33%)
Mutual labels:  unicode
Persian Stopwords
Persian (Farsi) Stop Words List
Stars: ✭ 131 (-0.76%)
Confusable homoglyphs
ϲοnfuѕаblе_һοmоɡlyphs
Stars: ✭ 130 (-1.52%)
Mutual labels:  unicode
Neuro
🔮 Neuro.js is machine learning library for building AI assistants and chat-bots (WIP).
Stars: ✭ 126 (-4.55%)
Japanesetokenizers
Aims to make JapaneseTokenizer as easy to use as possible
Stars: ✭ 120 (-9.09%)
Mutual labels:  tokenizer
Nlpcc Wordseg Weibo
NLPCC 2016 Weibo word segmentation evaluation project
Stars: ✭ 120 (-9.09%)
Cluedatasetsearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included
Stars: ✭ 2,112 (+1500%)
Mutual labels:  machine-translation
Scattertext
Beautiful visualizations of how language differs among document types.
Stars: ✭ 1,722 (+1204.55%)
Discobert
Code for paper "Discourse-Aware Neural Extractive Text Summarization" (ACL20)
Stars: ✭ 120 (-9.09%)
Textacy
NLP, before and after spaCy
Stars: ✭ 1,849 (+1300.76%)
Fugashi
A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.
Stars: ✭ 125 (-5.3%)
Mutual labels:  tokenizer
Js Codepage
💱 Codepages for JS
Stars: ✭ 119 (-9.85%)
Mutual labels:  unicode
Tokenizer
Source code tokenizer
Stars: ✭ 119 (-9.85%)
Mutual labels:  tokenizer
100 Days Of Nlp
Stars: ✭ 125 (-5.3%)
Pymetamap
Python wrapper for MetaMap
Stars: ✭ 119 (-9.85%)