
185 open-source projects that are alternatives to or similar to Sacremoses

ilmulti
Tooling to play around with multilingual machine translation for Indian Languages.
Stars: ✭ 19 (-93.52%)
Mutual labels:  tokenizer, machine-translation
Thot
Thot toolkit for statistical machine translation
Stars: ✭ 53 (-81.91%)
Mutual labels:  tokenizer, machine-translation
Tokenizer
Fast and customizable text tokenization library with BPE and SentencePiece support
Stars: ✭ 132 (-54.95%)
Mutual labels:  tokenizer, machine-translation
masakhane-web
Masakhane Web is a translation web application exclusively for African languages.
Stars: ✭ 27 (-90.78%)
Mutual labels:  machine-translation
dynmt-py
Neural machine translation implementation using DyNet's Python bindings
Stars: ✭ 17 (-94.2%)
Mutual labels:  machine-translation
NLP Toolkit
Library of state-of-the-art models (PyTorch) for NLP tasks
Stars: ✭ 92 (-68.6%)
Mutual labels:  machine-translation
cang-jie
Chinese tokenizer for tantivy, based on jieba-rs
Stars: ✭ 48 (-83.62%)
Mutual labels:  tokenizer
Machine-Translation-Hindi-to-english-
Machine translation is the task of converting text from one language to another. Unlike a traditional phrase-based translation system, which consists of many small sub-components that are tuned separately, neural machine translation attempts to build and train a single, large neural network that reads a sentence and outputs a correct translation.
Stars: ✭ 19 (-93.52%)
Mutual labels:  machine-translation
vscode-blockman
VSCode extension to highlight nested code blocks
Stars: ✭ 233 (-20.48%)
Mutual labels:  tokenizer
tokenizer
A simple tokenizer in Ruby for NLP tasks.
Stars: ✭ 44 (-84.98%)
Mutual labels:  tokenizer
SequenceToSequence
A seq2seq-with-attention dialogue/MT model implemented in TensorFlow.
Stars: ✭ 11 (-96.25%)
Mutual labels:  machine-translation
Machine-Translation-v2
English-to-Chinese machine text translation
Stars: ✭ 48 (-83.62%)
Mutual labels:  machine-translation
simplemma
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Stars: ✭ 32 (-89.08%)
Mutual labels:  tokenizer
farasapy
A Python implementation of Farasa toolkit
Stars: ✭ 69 (-76.45%)
Mutual labels:  tokenizer
Hebrew-Tokenizer
A very simple python tokenizer for Hebrew text.
Stars: ✭ 16 (-94.54%)
Mutual labels:  tokenizer
lex
Lex is an implementation of the lex tool in Ruby.
Stars: ✭ 49 (-83.28%)
Mutual labels:  tokenizer
parallel-corpora-tools
Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.
Stars: ✭ 35 (-88.05%)
Mutual labels:  machine-translation
BSD
The Business Scene Dialogue corpus
Stars: ✭ 51 (-82.59%)
Mutual labels:  machine-translation
ArabicProcessingCog
A Python package that does stemming, tokenization, sentence breaking, segmentation, normalization, and POS tagging for the Arabic language.
Stars: ✭ 19 (-93.52%)
Mutual labels:  tokenizer
lindera
A morphological analysis library.
Stars: ✭ 226 (-22.87%)
Mutual labels:  tokenizer
omegat-tencent-plugin
This is a plugin to allow OmegaT to source machine translations from Tencent Cloud.
Stars: ✭ 31 (-89.42%)
Mutual labels:  machine-translation
text2text
Text2Text: Cross-lingual natural language processing and generation toolkit
Stars: ✭ 188 (-35.84%)
Mutual labels:  tokenizer
MetricMT
The official code repository for MetricMT - a reward optimization method for NMT with learned metrics
Stars: ✭ 23 (-92.15%)
Mutual labels:  machine-translation
Deep-NLP-Resources
A curated list of NLP resources
Stars: ✭ 65 (-77.82%)
Mutual labels:  machine-translation
python-mecab
Python 3.5+ bindings for mecab, using neither SWIG nor pybind. (No longer maintained)
Stars: ✭ 27 (-90.78%)
Mutual labels:  tokenizer
tai5-uan5 gian5-gi2 kang1-ku7
Taiwanese language tools
Stars: ✭ 79 (-73.04%)
Mutual labels:  machine-translation
jargon
Tokenizers and lemmatizers for Go
Stars: ✭ 98 (-66.55%)
Mutual labels:  tokenizer
bredon
A modern CSS value compiler in JavaScript
Stars: ✭ 39 (-86.69%)
Mutual labels:  tokenizer
elasticsearch-plugins
Native scoring-script plugins for Elasticsearch
Stars: ✭ 30 (-89.76%)
Mutual labels:  tokenizer
Attention-Visualization
Visualization for simple attention and Google's multi-head attention.
Stars: ✭ 54 (-81.57%)
Mutual labels:  machine-translation
neural tokenizer
Tokenize English sentences using neural networks.
Stars: ✭ 64 (-78.16%)
Mutual labels:  tokenizer
mystem-scala
Morphological analyzer `mystem` (Russian language) wrapper for JVM languages
Stars: ✭ 21 (-92.83%)
Mutual labels:  tokenizer
rustfst
Rust re-implementation of OpenFST - library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). A Python binding is also available.
Stars: ✭ 104 (-64.51%)
Mutual labels:  tokenizer
SpeechTransProgress
Tracking the progress in end-to-end speech translation
Stars: ✭ 139 (-52.56%)
Mutual labels:  machine-translation
psr2r-sniffer
A PSR-2-R code sniffer and code-style auto-correction tool, including many useful additions
Stars: ✭ 32 (-89.08%)
Mutual labels:  tokenizer
tokenizer
Tokenize CSS according to the CSS Syntax
Stars: ✭ 52 (-82.25%)
Mutual labels:  tokenizer
transformer-pytorch
A PyTorch implementation of Transformer in "Attention is All You Need"
Stars: ✭ 77 (-73.72%)
Mutual labels:  machine-translation
wink-tokenizer
Multilingual tokenizer that automatically tags each token with its type
Stars: ✭ 51 (-82.59%)
Mutual labels:  tokenizer
snapdragon-lexer
Converts a string into an array of tokens, with useful methods for looking ahead and behind, capturing, matching, et cetera.
Stars: ✭ 19 (-93.52%)
Mutual labels:  tokenizer
hunspell
High-Performance Stemmer, Tokenizer, and Spell Checker for R
Stars: ✭ 101 (-65.53%)
Mutual labels:  tokenizer
Video-guided-Machine-Translation
Starter code for the VMT task and challenge
Stars: ✭ 45 (-84.64%)
Mutual labels:  machine-translation
urbans
A tool for translating text from source grammar to target grammar (context-free) with corresponding dictionary.
Stars: ✭ 19 (-93.52%)
Mutual labels:  machine-translation
Jumanpp
Juman++ (a Morphological Analyzer Toolkit)
Stars: ✭ 254 (-13.31%)
Mutual labels:  tokenizer
inmt
Interactive Neural Machine Translation tool
Stars: ✭ 44 (-84.98%)
Mutual labels:  machine-translation
mtdata
A tool that locates, downloads, and extracts machine translation corpora
Stars: ✭ 95 (-67.58%)
Mutual labels:  machine-translation
SwiLex
A universal lexer library in Swift.
Stars: ✭ 29 (-90.1%)
Mutual labels:  tokenizer
PaddleTokenizer
A Chinese word-segmentation engine based on deep neural networks, implemented with PaddlePaddle | A DNN Chinese Tokenizer by Using PaddlePaddle
Stars: ✭ 14 (-95.22%)
Mutual labels:  tokenizer
rtg
Reader Translator Generator - NMT toolkit based on pytorch
Stars: ✭ 26 (-91.13%)
Mutual labels:  machine-translation
liblex
C library for Lexical Analysis
Stars: ✭ 25 (-91.47%)
Mutual labels:  tokenizer
gd-tokenizer
A small godot project with a tokenizer written in GDScript.
Stars: ✭ 34 (-88.4%)
Mutual labels:  tokenizer
banglanmt
This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation" published in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 16 - November 20, 2020.
Stars: ✭ 91 (-68.94%)
Mutual labels:  machine-translation
skt
Sanskrit compound segmentation using seq2seq model
Stars: ✭ 21 (-92.83%)
Mutual labels:  machine-translation
NiuTrans.NMT
A fast neural machine translation system. It is written in C++ and relies on NiuTensor for fast tensor APIs.
Stars: ✭ 112 (-61.77%)
Mutual labels:  machine-translation
xontrib-output-search
Get identifiers, paths, URLs and words from the previous command output and use them for the next command in xonsh shell.
Stars: ✭ 26 (-91.13%)
Mutual labels:  tokenizer
Natural-Language-Processing
Contains various architectures and novel paper implementations for Natural Language Processing tasks like Sequence Modelling and Neural Machine Translation.
Stars: ✭ 48 (-83.62%)
Mutual labels:  machine-translation
Distill-BERT-Textgen
Research code for ACL 2020 paper: "Distilling Knowledge Learned in BERT for Text Generation".
Stars: ✭ 121 (-58.7%)
Mutual labels:  machine-translation
berserker
Berserker - BERt chineSE woRd toKenizER
Stars: ✭ 17 (-94.2%)
Mutual labels:  tokenizer
Transformer
A PyTorch implementation of "Attention is All You Need" and "Weighted Transformer Network for Machine Translation"
Stars: ✭ 271 (-7.51%)
Mutual labels:  machine-translation
pascal-interpreter
A simple interpreter for a large subset of Pascal language written for educational purposes
Stars: ✭ 21 (-92.83%)
Mutual labels:  tokenizer
nepali-translator
Neural Machine Translation on the Nepali-English language pair
Stars: ✭ 29 (-90.1%)
Mutual labels:  machine-translation