Indian ParallelCorpus: Curated list of publicly available parallel corpora for Indian languages
Stars: ✭ 23 (+21.05%)
Thot: Thot toolkit for statistical machine translation
Stars: ✭ 53 (+178.95%)
Sacremoses: Python port of Moses tokenizer, truecaser and normalizer
Stars: ✭ 293 (+1442.11%)
Tokenizer: Fast and customizable text tokenization library with BPE and SentencePiece support
Stars: ✭ 132 (+594.74%)
farasapy: A Python implementation of the Farasa toolkit
Stars: ✭ 69 (+263.16%)
rtg: Reader Translator Generator, an NMT toolkit based on PyTorch
Stars: ✭ 26 (+36.84%)
skt: Sanskrit compound segmentation using a seq2seq model
Stars: ✭ 21 (+10.53%)
snapdragon-lexer: Converts a string into an array of tokens, with useful methods for looking ahead and behind, capturing, matching, et cetera.
Stars: ✭ 19 (+0%)
berserker: Berserker (BERt chineSE woRd toKenizER)
Stars: ✭ 17 (-10.53%)
psr2r-sniffer: A PSR-2-R code sniffer and code-style auto-correction tool, including many useful additions
Stars: ✭ 32 (+68.42%)
SwiLex: A universal lexer library in Swift.
Stars: ✭ 29 (+52.63%)
dynmt-py: Neural machine translation implementation using DyNet's Python bindings
Stars: ✭ 17 (-10.53%)
gd-tokenizer: A small Godot project with a tokenizer written in GDScript.
Stars: ✭ 34 (+78.95%)
NiuTrans.NMT: A fast neural machine translation system. It is developed in C++ and relies on NiuTensor for fast tensor APIs.
Stars: ✭ 112 (+489.47%)
packetevents: PacketEvents is a powerful packet library. Our packet wrappers are efficient and easy to use. We support many protocol versions. (1.8+)
Stars: ✭ 235 (+1136.84%)
omegat-tencent-plugin: This is a plugin to allow OmegaT to source machine translations from Tencent Cloud.
Stars: ✭ 31 (+63.16%)
parallel-corpora-tools: Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.
Stars: ✭ 35 (+84.21%)
suika: Suika 🍉 is a Japanese morphological analyzer written in pure Ruby
Stars: ✭ 31 (+63.16%)
Text-Classification-LSTMs-PyTorch: The aim of this repository is to show a baseline model for text classification by implementing an LSTM-based model in PyTorch. To give a better understanding of the model, it uses a Tweets dataset provided by Kaggle.
Stars: ✭ 45 (+136.84%)
lexertk: C++ Lexer Toolkit Library (LexerTk) https://www.partow.net/programming/lexertk/index.html
Stars: ✭ 26 (+36.84%)
jargon: Tokenizers and lemmatizers for Go
Stars: ✭ 98 (+415.79%)
hunspell: High-Performance Stemmer, Tokenizer, and Spell Checker for R
Stars: ✭ 101 (+431.58%)
grasp: Essential NLP & ML, short & fast pure Python code
Stars: ✭ 58 (+205.26%)
lindera: A morphological analysis library.
Stars: ✭ 226 (+1089.47%)
SequenceToSequence: A seq2seq with attention dialogue/MT model implemented in TensorFlow.
Stars: ✭ 11 (-42.11%)
MetricMT: The official code repository for MetricMT, a reward optimization method for NMT with learned metrics
Stars: ✭ 23 (+21.05%)
neural tokenizer: Tokenize English sentences using neural networks.
Stars: ✭ 64 (+236.84%)
python-mecab: A repository to bind mecab for Python 3.5+. Uses neither swig nor pybind. (Not maintained now)
Stars: ✭ 27 (+42.11%)
mtdata: A tool that locates, downloads, and extracts machine translation corpora
Stars: ✭ 95 (+400%)
rustfst: Rust re-implementation of OpenFST, a library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). A Python binding is also available.
Stars: ✭ 104 (+447.37%)
xontrib-output-search: Get identifiers, paths, URLs and words from the previous command output and use them for the next command in xonsh shell.
Stars: ✭ 26 (+36.84%)
wink-tokenizer: Multilingual tokenizer that automatically tags each token with its type
Stars: ✭ 51 (+168.42%)
Distill-BERT-Textgen: Research code for ACL 2020 paper: "Distilling Knowledge Learned in BERT for Text Generation".
Stars: ✭ 121 (+536.84%)
masakhane-web: Masakhane Web is a translation web application solely for African languages.
Stars: ✭ 27 (+42.11%)
OPUS-MT-train: Training open neural machine translation models
Stars: ✭ 166 (+773.68%)
vscode-blockman: VSCode extension to highlight nested code blocks
Stars: ✭ 233 (+1126.32%)
tvsub: TVsub: DCU-Tencent Chinese-English Dialogue Corpus
Stars: ✭ 40 (+110.53%)
lex: Lex is an implementation of the lex tool in Ruby.
Stars: ✭ 49 (+157.89%)
bergamot-translator: Cross-platform C++ library focusing on optimized machine translation on consumer-grade devices.
Stars: ✭ 181 (+852.63%)
deepl-rb: A simple Ruby gem for the DeepL API
Stars: ✭ 38 (+100%)
Tokenizer: A tokenizer for Icelandic text
Stars: ✭ 27 (+42.11%)
Machine-Translation-Hindi-to-english-: Machine translation is the task of converting text from one language to another. Unlike the traditional phrase-based translation system, which consists of many small sub-components that are tuned separately, neural machine translation attempts to build and train a single, large neural network that reads a sentence and outputs a correct translation.
Stars: ✭ 19 (+0%)
osdg-tool: OSDG is an open-source tool that maps and connects activities to the UN Sustainable Development Goals (SDGs) by identifying SDG-relevant content in any text. The tool is available online at www.osdg.ai. API access is available for research purposes.
Stars: ✭ 22 (+15.79%)
ReductionWrappers: R wrappers to connect Python dimensional reduction tools and single cell data objects (Seurat, SingleCellExperiment, etc.)
Stars: ✭ 31 (+63.16%)
BSD: The Business Scene Dialogue corpus
Stars: ✭ 51 (+168.42%)
OpenISS: A unified multimodal motion data delivery framework.
Stars: ✭ 22 (+15.79%)
Roy VnTokenizer: Vietnamese tokenizer (Maximum Matching and CRF)
Stars: ✭ 49 (+157.89%)
sb-nmt: Code for Synchronous Bidirectional Neural Machine Translation (SB-NMT)
Stars: ✭ 66 (+247.37%)
urbans: A tool for translating text from a source grammar to a target (context-free) grammar with a corresponding dictionary.
Stars: ✭ 19 (+0%)
sinling: A collection of NLP tools for Sinhalese (සිංහල).
Stars: ✭ 38 (+100%)
apertium-apy: 📦 Apertium HTTP Server in Python
Stars: ✭ 29 (+52.63%)
tokenizer: A simple tokenizer in Ruby for NLP tasks.
Stars: ✭ 44 (+131.58%)