186 open source projects that are alternatives to or similar to Indian_ParallelCorpus

This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation" published in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 16 - November 20, 2020.

Stars: ✭ 91 (+295.65%)

Mutual labels: neural-machine-translation, parallel-corpus, parallel-corpora, low-resource-languages, low-resource-machine-translation

BSD

The Business Scene Dialogue corpus

Stars: ✭ 51 (+121.74%)

Mutual labels: corpus, parallel-corpus, parallel-corpora

ilmulti

Tooling to play around with multilingual machine translation for Indian Languages.

Stars: ✭ 19 (-17.39%)

Mutual labels: indian-languages, multilingual-translation

Filipino-Text-Benchmarks

Open-source benchmark datasets and pretrained transformer models in the Filipino language.

Stars: ✭ 22 (-4.35%)

Mutual labels: corpus, low-resource-languages

Code Docstring Corpus

Preprocessed Python functions and docstrings for automated code documentation (code2doc) and automated code generation (doc2code) tasks.

Stars: ✭ 137 (+495.65%)

Mutual labels: corpus, neural-machine-translation

indic nlp resources

Resources to go with the Indic NLP Library

Stars: ✭ 55 (+139.13%)

Mutual labels: indian-languages

folia

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for proces…

Stars: ✭ 56 (+143.48%)

Mutual labels: corpus

jrte-corpus

Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020)

Stars: ✭ 66 (+186.96%)

Mutual labels: corpus

zero

Zero -- A neural machine translation system

Stars: ✭ 121 (+426.09%)

Mutual labels: neural-machine-translation

LanguageCodes

We present a list of languages with their codes, families, regions, etc. We also present a list of multilingual corpora (with URLs).

Stars: ✭ 70 (+204.35%)

Mutual labels: corpus

named-entity-recognition-template

Build a deep learning model for predicting the named entities from text.

Stars: ✭ 51 (+121.74%)

Mutual labels: corpus

text-classification-cn

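Chinese text classification in practice, based on the Sogou news corpus, using traditional machine learning methods as well as pretrained models.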

Stars: ✭ 81 (+252.17%)

Mutual labels: corpus

PoetryCorpus

Poetic corpus of the Russian language

Stars: ✭ 40 (+73.91%)

Mutual labels: corpus

OneStopEnglishCorpus

No description or website provided.

Stars: ✭ 38 (+65.22%)

Mutual labels: corpus

ABD-NMT

Code for "Asynchronous bidirectional decoding for neural machine translation" (AAAI, 2018)

Stars: ✭ 32 (+39.13%)

Mutual labels: neural-machine-translation

DeepSentiPers

Repository for the experiments described in the paper named "DeepSentiPers: Novel Deep Learning Models Trained Over Proposed Augmented Persian Sentiment Corpus"

Stars: ✭ 17 (-26.09%)

Mutual labels: corpus

EntityTargetedActiveLearning

No description or website provided.

Stars: ✭ 17 (-26.09%)

Mutual labels: low-resource-languages

SSAN

How Does Selective Mechanism Improve Self-attention Networks?

Stars: ✭ 18 (-21.74%)

Mutual labels: neural-machine-translation

dynmt-py

Neural machine translation implementation using DyNet's Python bindings

Stars: ✭ 17 (-26.09%)

Mutual labels: neural-machine-translation

open-discourse

Open Discourse is the first fully comprehensive corpus of the plenary proceedings of the federal German Parliament (Bundestag).

Stars: ✭ 47 (+104.35%)

Mutual labels: corpus

sentencepiece-jni

Java JNI wrapper for SentencePiece: unsupervised text tokenizer for Neural Network-based text generation.

Stars: ✭ 26 (+13.04%)

Mutual labels: neural-machine-translation

KWDLC

Kyoto University Web Document Leads Corpus

Stars: ✭ 64 (+178.26%)

Mutual labels: corpus

thaigov-corpus

A project that collects news from Thai government websites

Stars: ✭ 19 (-17.39%)

Mutual labels: corpus

mev-corpus

MEV Data Corpus

Stars: ✭ 77 (+234.78%)

Mutual labels: corpus

SpiCE-Corpus

An open-access corpus of conversational bilingual speech in Cantonese and English

Stars: ✭ 33 (+43.48%)

Mutual labels: corpus

RNNSearch

An implementation of attention-based neural machine translation using PyTorch

Stars: ✭ 43 (+86.96%)

Mutual labels: neural-machine-translation

thesis

My thesis on "Open Source Code and Low Resource Languages" for an MSc in Language Science and Technology at Saarland University

Stars: ✭ 20 (-13.04%)

Mutual labels: low-resource-languages

OpenDialog

An Open-Source Package for Chinese Open-domain Conversational Chatbot (a Chinese chit-chat dialogue system with one-click deployment of a WeChat chit-chat bot)

Stars: ✭ 94 (+308.7%)

Mutual labels: corpus

pdf-corpus

Python script to quickly create hand-crafted PDF files

Stars: ✭ 17 (-26.09%)

Mutual labels: corpus

pytorch basic nmt

A simple yet strong implementation of neural machine translation in PyTorch

Stars: ✭ 66 (+186.96%)

Mutual labels: neural-machine-translation

CBLUE

CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark (a benchmark for Chinese medical information processing)

Stars: ✭ 379 (+1547.83%)

Mutual labels: corpus

PubMed-PICO-Detection

PubMed PICO Element Detection Dataset

Stars: ✭ 37 (+60.87%)

Mutual labels: corpus

egret-wenda-corpus

A Public Corpus for Machine Learning

Stars: ✭ 41 (+78.26%)

Mutual labels: corpus

fastmorph

Fast corpus search engine originally made for the Corpus of Written Tatar language

Stars: ✭ 14 (-39.13%)

Mutual labels: corpus

minimal-nmt

A minimal NMT example to serve as a seq2seq+attention reference.

Stars: ✭ 36 (+56.52%)

Mutual labels: neural-machine-translation

thai-language

Computer tools for the Thai language

Stars: ✭ 20 (-13.04%)

Mutual labels: corpus

NiuTrans.NMT

A Fast Neural Machine Translation System. It is developed in C++ and resorts to NiuTensor for fast tensor APIs.

Stars: ✭ 112 (+386.96%)

Mutual labels: neural-machine-translation

dialogue-datasets

A collection of open dialogue corpora and some useful data-processing utilities.

Stars: ✭ 24 (+4.35%)

Mutual labels: corpus

TV4Dialog

No description or website provided.

Stars: ✭ 33 (+43.48%)

Mutual labels: corpus

transformer

Neutron: A PyTorch-based implementation of Transformer and its variants.

Stars: ✭ 60 (+160.87%)

Mutual labels: neural-machine-translation

When-in-Rome

A meta-corpus of functional harmonic analysis.

Stars: ✭ 35 (+52.17%)

Mutual labels: corpus

CLUEmotionAnalysis2020

CLUE Emotion Analysis Dataset (a fine-grained sentiment analysis dataset)

Stars: ✭ 3 (-86.96%)

Mutual labels: corpus

malay-dataset

Text corpus for Bahasa Malaysia, https://malaya.readthedocs.io/en/latest/Dataset.html

Stars: ✭ 189 (+721.74%)

Mutual labels: corpus

kanji-frequency

Kanji usage frequency data collected from various sources

Stars: ✭ 92 (+300%)

Mutual labels: corpus

cljs-corpus

A greppable archive of ClojureScript code

Stars: ✭ 37 (+60.87%)

Mutual labels: corpus

MT-Preparation

Machine Translation (MT) Preparation Scripts

Stars: ✭ 15 (-34.78%)

Mutual labels: neural-machine-translation

fuzzing-corpus

My fuzzing corpus

Stars: ✭ 120 (+421.74%)

Mutual labels: corpus

parallel-corpora-tools

Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.

Stars: ✭ 35 (+52.17%)

Mutual labels: neural-machine-translation

bible-corpus

A multilingual parallel corpus created from translations of the Bible.

Stars: ✭ 115 (+400%)

Mutual labels: corpus

2018-dlsl

UPC Deep Learning for Speech and Language 2018

Stars: ✭ 18 (-21.74%)

Mutual labels: neural-machine-translation

nepali-translator

Neural Machine Translation on the Nepali-English language pair

Stars: ✭ 29 (+26.09%)

Mutual labels: parallel-corpus

transformer-slt

Sign Language Translation with Transformers (COLING'2020, ECCV'20 SLRTP Workshop)

Stars: ✭ 92 (+300%)

Mutual labels: neural-machine-translation

textbox

Text collections made available by the CLiGS group.

Stars: ✭ 19 (-17.39%)

Mutual labels: corpus

Attention-Visualization

Visualization for simple attention and Google's multi-head attention.

Stars: ✭ 54 (+134.78%)

Mutual labels: neural-machine-translation

Word-Level-Eng-Mar-NMT

Translating English sentences to Marathi using Neural Machine Translation

Stars: ✭ 37 (+60.87%)

Mutual labels: neural-machine-translation

xl-sum

This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.

Stars: ✭ 160 (+595.65%)

Mutual labels: low-resource-languages

EdgarAllanPoetry

Computer-generated poetry

Stars: ✭ 22 (-4.35%)

Mutual labels: corpus

wordfish-python

Extract relationships from standardized terms in a corpus of interest with deep learning 🐟

Stars: ✭ 19 (-17.39%)

Mutual labels: corpus

Species-Names-Corpus

Species names corpus: plant names and animal names.