banglanmt: This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation", published in the Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 16 - November 20, 2020.
Stars: ✭ 91 (+78.43%)
Indian ParallelCorpus: Curated list of publicly available parallel corpora for Indian languages
Stars: ✭ 23 (-54.9%)
TALPCo: TUFS Asian Language Parallel Corpus
Stars: ✭ 32 (-37.25%)
tvsub: DCU-Tencent Chinese-English Dialogue Corpus
Stars: ✭ 40 (-21.57%)
FCH-TTS: A fast Text-to-Speech (TTS) model. Works well for English, Mandarin/Chinese, Japanese, Korean, Russian, and Tibetan (tested so far).
Stars: ✭ 154 (+201.96%)
KWDLC: Kyoto University Web Document Leads Corpus
Stars: ✭ 64 (+25.49%)
jiten: Japanese Android/CLI/web dictionary based on JMdict/KANJIDIC; Japanese dictionary, Japanese-English dictionary, kanji-English dictionary, Japanese-German dictionary, Japanese-Dutch dictionary
Stars: ✭ 64 (+25.49%)
kanji-frequency: Kanji usage frequency data collected from various sources
Stars: ✭ 92 (+80.39%)
nepali-translator: Neural Machine Translation on the Nepali-English language pair
Stars: ✭ 29 (-43.14%)
Mouse Dictionary: 📘 A super fast dictionary for Chrome/Firefox
Stars: ✭ 670 (+1213.73%)
Memorize: 🚀 Japanese-English-Mongolian dictionary. It lets you find words, kanji, and more quickly and easily
Stars: ✭ 72 (+41.18%)
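Google Ime Dictionary: Add-on IME dictionary for Japanese-to-English conversion and English abbreviation expansion 📙 An IME extension dictionary that enables Japanese-to-English conversion and expansion of English abbreviations in Google Japanese Input, ATOK, and similar input methods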
Stars: ✭ 30 (-41.18%)
Gse: Efficient multilingual NLP and text segmentation in Go; supports English, Chinese, Japanese, and other languages
Stars: ✭ 1,695 (+3223.53%)
Distill-BERT-Textgen: Research code for ACL 2020 paper: "Distilling Knowledge Learned in BERT for Text Generation".
Stars: ✭ 121 (+137.25%)
MetricMT: The official code repository for MetricMT, a reward optimization method for NMT with learned metrics
Stars: ✭ 23 (-54.9%)
OPUS-MT-train: Training open neural machine translation models
Stars: ✭ 166 (+225.49%)
nytwit: New York Times Word Innovation Types dataset
Stars: ✭ 21 (-58.82%)
OpenConvert: Text conversion tool (from e.g. Word, HTML, txt) to the corpus formats TEI or FoLiA
Stars: ✭ 20 (-60.78%)
jmdict-kindle: Japanese-English dictionary for Kindle based on the JMdict / EDICT database
Stars: ✭ 151 (+196.08%)
trafilatura: Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Stars: ✭ 711 (+1294.12%)
kanjigrid: Fork of the Kanji Grid addon for Anki
Stars: ✭ 21 (-58.82%)
gum: Repository for the Georgetown University Multilayer Corpus (GUM)
Stars: ✭ 71 (+39.22%)
folket: Swedish–English dictionary for macOS (December 20, 2020)
Stars: ✭ 31 (-39.22%)
docker-introduction: Reproducible Computational Environments using Containers
Stars: ✭ 34 (-33.33%)
malay-dataset: Text corpus for Bahasa Malaysia, https://malaya.readthedocs.io/en/latest/Dataset.html
Stars: ✭ 189 (+270.59%)
introcsharpbook: "Fundamentals of Computer Programming with C#" Book
Stars: ✭ 12 (-76.47%)
almanca: German grammar notes / Lesson notes taken while learning the German language, starting from level A1.
Stars: ✭ 15 (-70.59%)
Cross-Language-Dataset: A multilingual, multi-style and multi-granularity dataset for cross-language textual similarity detection
Stars: ✭ 60 (+17.65%)
shell-genomics: Introduction to the Command Line for Genomics
Stars: ✭ 54 (+5.88%)
opensource-voice-tools: A repo listing known open source voice tools, ordered by where they sit in the voice stack
Stars: ✭ 21 (-58.82%)
rhymes: Give me an English word and I’ll give you a list of rhymes
Stars: ✭ 34 (-33.33%)
skt: Sanskrit compound segmentation using a seq2seq model
Stars: ✭ 21 (-58.82%)
urbans: A tool for translating text from a source grammar to a target grammar (context-free) with a corresponding dictionary.
Stars: ✭ 19 (-62.75%)
ocr2text: Convert a PDF via OCR to a TXT file in UTF-8 encoding
Stars: ✭ 90 (+76.47%)
frostpunk mod: Frostpunk / Mod Tools / Unofficial Japanese localization mod tool
Stars: ✭ 17 (-66.67%)
sample-ui-vue-pages: Bootstrap + Vue.js [ Scss / Babel ] (Multi-Page/SSR Model)
Stars: ✭ 20 (-60.78%)
SequenceToSequence: A seq2seq dialogue/MT model with attention, implemented in TensorFlow.
Stars: ✭ 11 (-78.43%)
bergamot-translator: Cross-platform C++ library focused on optimized machine translation on consumer-grade devices.
Stars: ✭ 181 (+254.9%)
nextword: Predict next English words.
Stars: ✭ 65 (+27.45%)
inmt: Interactive Neural Machine Translation tool
Stars: ✭ 44 (-13.73%)
rtg: Reader Translator Generator, an NMT toolkit based on PyTorch
Stars: ✭ 26 (-49.02%)
Kawazu: A C# library for converting Japanese sentences to Hiragana, Katakana, or Romaji, with furigana and okurigana modes supported. Inspired by the Kuroshiro project.
Stars: ✭ 33 (-35.29%)