xl-sum: This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.
Stars: ✭ 160 (+515.38%)
ungoliant: 🕷️ The pipeline for the OSCAR corpus
Stars: ✭ 69 (+165.38%)
wn: A modern, interlingual wordnet interface for Python
Stars: ✭ 119 (+357.69%)
ws4j: WordNet Similarity for Java provides an API for several Semantic Relatedness/Similarity algorithms
Stars: ✭ 41 (+57.69%)
wordnet: Stand-alone WordNet API
Stars: ✭ 39 (+50%)
NLIDB: Natural Language Interface to DataBases
Stars: ✭ 100 (+284.62%)
wordhoard: This Python module can be used to obtain antonyms, synonyms, hypernyms, hyponyms, homophones and definitions.
Stars: ✭ 78 (+200%)
Pattern: Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
Stars: ✭ 8,112 (+31100%)
LMMS: Language Modelling Makes Sense - WSD (and more) with Contextual Embeddings
Stars: ✭ 79 (+203.85%)
m3gm: Max-Margin Markov Graph Models for WordNet (EMNLP 2018)
Stars: ✭ 40 (+53.85%)
NatLang: NatLang is an English parser with an extensible grammar
Stars: ✭ 20 (-23.08%)
textaugment: TextAugment, a text augmentation library
Stars: ✭ 280 (+976.92%)
Wordbook: Wordbook is a dictionary application built for GNOME.
Stars: ✭ 56 (+115.38%)
kontext: An advanced, extensible web front-end for the Manatee-open corpus search engine
Stars: ✭ 50 (+92.31%)
goclassy: An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline.
Stars: ✭ 81 (+211.54%)
kanji-frequency: Kanji usage frequency data collected from various sources
Stars: ✭ 92 (+253.85%)
nerus: Large silver-standard Russian corpus with NER, morphology, and syntax markup
Stars: ✭ 47 (+80.77%)
Mecab Ipadic Neologd: Neologism dictionary based on the language resources on the Web for mecab-ipadic
Stars: ✭ 2,408 (+9161.54%)
unitex-lingua: Unitex/GramLab Language Resources
Stars: ✭ 17 (-34.62%)
Indian ParallelCorpus: Curated list of publicly available parallel corpora for Indian languages
Stars: ✭ 23 (-11.54%)
banglanmt: This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation" published in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 16 - November 20, 2020.
Stars: ✭ 91 (+250%)
Filipino-Text-Benchmarks: Open-source benchmark datasets and pretrained transformer models in the Filipino language.
Stars: ✭ 22 (-15.38%)
thesis: My thesis on "Open Source Code and Low Resource Languages" for an MSc in Language Science and Technology at Saarland University
Stars: ✭ 20 (-23.08%)