banglanmt: This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation", published in the Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 16-20, 2020.
Stars: ✭ 91 (+295.65%)
BSD: The Business Scene Dialogue corpus
Stars: ✭ 51 (+121.74%)
ilmulti: Tooling to play around with multilingual machine translation for Indian languages.
Stars: ✭ 19 (-17.39%)
Filipino-Text-Benchmarks: Open-source benchmark datasets and pretrained transformer models in the Filipino language.
Stars: ✭ 22 (-4.35%)
Code Docstring Corpus: Preprocessed Python functions and docstrings for automated code documentation (code2doc) and automated code generation (doc2code) tasks.
Stars: ✭ 137 (+495.65%)
folia: FoLiA (Format for Linguistic Annotation) is a rich XML-based annotation format for representing language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for proces…
Stars: ✭ 56 (+143.48%)
jrte-corpus: Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020)
Stars: ✭ 66 (+186.96%)
zero: Zero, a neural machine translation system
Stars: ✭ 121 (+426.09%)
LanguageCodes: We present a list of languages with their codes, families, regions, etc. We also present a list of multilingual corpora (with URLs).
Stars: ✭ 70 (+204.35%)
PoetryCorpus: A poetry corpus of the Russian language
Stars: ✭ 40 (+73.91%)
ABD-NMT: Code for "Asynchronous bidirectional decoding for neural machine translation" (AAAI 2018)
Stars: ✭ 32 (+39.13%)
DeepSentiPers: Repository for the experiments described in the paper "DeepSentiPers: Novel Deep Learning Models Trained Over Proposed Augmented Persian Sentiment Corpus"
Stars: ✭ 17 (-26.09%)
SSAN: How Does Selective Mechanism Improve Self-attention Networks?
Stars: ✭ 18 (-21.74%)
dynmt-py: Neural machine translation implementation using DyNet's Python bindings
Stars: ✭ 17 (-26.09%)
open-discourse: Open Discourse is the first fully comprehensive corpus of the plenary proceedings of the German Federal Parliament (Bundestag).
Stars: ✭ 47 (+104.35%)
sentencepiece-jni: Java JNI wrapper for SentencePiece, an unsupervised text tokenizer for neural network-based text generation.
Stars: ✭ 26 (+13.04%)
KWDLC: Kyoto University Web Document Leads Corpus
Stars: ✭ 64 (+178.26%)
thaigov-corpus: A project to collect news and information from Thai government websites
Stars: ✭ 19 (-17.39%)
SpiCE-Corpus: An open-access corpus of conversational bilingual speech in Cantonese and English
Stars: ✭ 33 (+43.48%)
RNNSearch: An implementation of attention-based neural machine translation using PyTorch
Stars: ✭ 43 (+86.96%)
thesis: My thesis on "Open Source Code and Low Resource Languages" for an MSc in Language Science and Technology at Saarland University
Stars: ✭ 20 (-13.04%)
OpenDialog: An Open-Source Package for Chinese Open-domain Conversational Chatbot (a Chinese chit-chat dialogue system, with one-click deployment of a WeChat chit-chat bot)
Stars: ✭ 94 (+308.7%)
pdf-corpus: Python script to quickly create hand-crafted PDF files
Stars: ✭ 17 (-26.09%)
pytorch basic nmt: A simple yet strong implementation of neural machine translation in PyTorch
Stars: ✭ 66 (+186.96%)
CBLUE: A Chinese medical information processing benchmark (CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark)
Stars: ✭ 379 (+1547.83%)
fastmorph: Fast corpus search engine, originally made for the Corpus of Written Tatar language
Stars: ✭ 14 (-39.13%)
minimal-nmt: A minimal NMT example to serve as a seq2seq+attention reference.
Stars: ✭ 36 (+56.52%)
thai-language: Computer tools for the Thai language
Stars: ✭ 20 (-13.04%)
NiuTrans.NMT: A fast neural machine translation system. It is developed in C++ and relies on NiuTensor for fast tensor APIs.
Stars: ✭ 112 (+386.96%)
dialogue-datasets: Collects open dialogue corpora and some useful data-processing utilities.
Stars: ✭ 24 (+4.35%)
TV4Dialog: No description or website provided.
Stars: ✭ 33 (+43.48%)
transformer: Neutron, a PyTorch-based implementation of the Transformer and its variants.
Stars: ✭ 60 (+160.87%)
When-in-Rome: A meta-corpus of functional harmonic analysis.
Stars: ✭ 35 (+52.17%)
malay-dataset: Text corpus for Bahasa Malaysia, https://malaya.readthedocs.io/en/latest/Dataset.html
Stars: ✭ 189 (+721.74%)
kanji-frequency: Kanji usage frequency data collected from various sources
Stars: ✭ 92 (+300%)
cljs-corpus: A greppable archive of ClojureScript code
Stars: ✭ 37 (+60.87%)
MT-Preparation: Machine Translation (MT) Preparation Scripts
Stars: ✭ 15 (-34.78%)
parallel-corpora-tools: Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.
Stars: ✭ 35 (+52.17%)
bible-corpus: A multilingual parallel corpus created from translations of the Bible.
Stars: ✭ 115 (+400%)
2018-dlsl: UPC Deep Learning for Speech and Language 2018
Stars: ✭ 18 (-21.74%)
nepali-translator: Neural Machine Translation on the Nepali-English language pair
Stars: ✭ 29 (+26.09%)
transformer-slt: Sign Language Translation with Transformers (COLING 2020, ECCV 2020 SLRTP Workshop)
Stars: ✭ 92 (+300%)
textbox: Text collections made available by the CLiGS group.
Stars: ✭ 19 (-17.39%)
Attention-Visualization: Visualization for simple attention and Google's multi-head attention.
Stars: ✭ 54 (+134.78%)
Word-Level-Eng-Mar-NMT: Translating English sentences to Marathi using Neural Machine Translation
Stars: ✭ 37 (+60.87%)
xl-sum: This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages", published in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.
Stars: ✭ 160 (+595.65%)
wordfish-python: Extract relationships between standardized terms from a corpus of interest with deep learning 🐟
Stars: ✭ 19 (-17.39%)