
chaitanyamalaviya / Lang Reps

Code accompanying our EMNLP paper Learning Language Representations for Typology Prediction



Learning Language Representations for Typology Prediction

Code accompanying the paper Learning Language Representations for Typology Prediction (To Appear at EMNLP 2017)

Abstract

One central mystery of neural NLP is what neural models "know" about their subject matter. When a neural machine translation system learns to translate from one language to another, does it learn the syntax or semantics of the languages? Can this knowledge be extracted from the system to fill holes in human scientific knowledge? Existing typological databases contain relatively full feature specifications for only a few hundred languages. Exploiting the existence of parallel texts in more than a thousand languages, we build a massive many-to-one NMT system from 1017 languages into English, and use this to predict information missing from typological databases. Experiments show that the proposed method is able to infer not only syntactic, but also phonological and phonetic inventory features, and improves over a baseline that has access to information about the languages' geographic and phylogenetic neighbors.

The URIEL database is available at http://www.cs.cmu.edu/~dmortens/uriel.html

Learned Vectors: https://drive.google.com/open?id=0B47fwl2TZnQaa0s5bDJESno0OTQ

After downloading and unzipping the above file, you may access the learned vectors as below:

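import numpy as np

# Each .npy file stores a pickled Python dict (hence .item()), so
# NumPy >= 1.16.3 needs allow_pickle=True to load it.
# If the files were written under Python 2, encoding="latin1" may also be needed.

# Language vectors: keys are 'optsrc' + three-letter language code
vecs = np.load("lang_vecs.npy", allow_pickle=True).item()
vecs['optsrc' + 'fra']  # French
vecs['optsrc' + 'ita']  # Italian

# Language cell states: keys are three-letter language codes
cell_states = np.load("lang_cell_states.npy", allow_pickle=True).item()
cell_states['fra'][0]  # French
cell_states['ita'][0]  # Italian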
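
Once loaded, the vectors can be compared like any other embeddings. The sketch below is an illustration only and is not part of the released code. It assumes the files hold pickled dictionaries keyed as shown above, so that recent NumPy releases need `allow_pickle=True`. To stay runnable without the download, it writes a small stand-in file with made-up 4-dimensional vectors; the helper names `load_vector_dict` and `cosine`, the file name `toy_lang_vecs.npy`, and the `jpn` entry are hypothetical.

```python
import os
import tempfile

import numpy as np


def load_vector_dict(path):
    """Load a .npy file that stores a pickled dict and return the dict."""
    return np.load(path, allow_pickle=True).item()


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Toy stand-in for lang_vecs.npy: the values are invented for illustration
# and say nothing about the real learned vectors or their dimensionality.
toy = {
    "optsrcfra": np.array([1.0, 0.0, 1.0, 0.0]),
    "optsrcita": np.array([1.0, 0.0, 0.5, 0.0]),
    "optsrcjpn": np.array([0.0, 1.0, 0.0, 1.0]),
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_lang_vecs.npy")
    np.save(path, toy)  # a dict is saved as a 0-d object array (pickled)
    vecs = load_vector_dict(path)

sim_fra_ita = cosine(vecs["optsrc" + "fra"], vecs["optsrc" + "ita"])
sim_fra_jpn = cosine(vecs["optsrc" + "fra"], vecs["optsrc" + "jpn"])
print(sim_fra_ita, sim_fra_jpn)
```

To run this against the real data, point `load_vector_dict` at the unzipped `lang_vecs.npy` and use language codes that are actually present in the dictionary.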

Bibtex:

@inproceedings{malaviya17emnlp,
  title = {Learning Language Representations for Typology Prediction},
  author = {Chaitanya Malaviya and Graham Neubig and Patrick Littell},
  booktitle = {Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  address = {Copenhagen, Denmark},
  month = {September},
  year = {2017}
}
