banglanmt: This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation", published in the Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 16-20, 2020.
xl-sum: This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages", published in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.
thesis: My thesis on "Open Source Code and Low Resource Languages" for an MSc in Language Science and Technology at Saarland University.
CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07 million words, and 8.1 million cognates.