SWEM (Simple Word-Embedding-based Models)

This repository contains the source code necessary to reproduce the results presented in the following ACL 2018 paper: "Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms"

This project is maintained by Dinghan Shen. Feel free to contact [email protected] for any relevant issues.

Prerequisites:

  • CUDA, cuDNN
  • Python 2.7
  • TensorFlow (version > 1.0); we used TensorFlow 1.5
  • Run pip install -r requirements.txt to install the remaining requirements

Data:

  • For convenience, we provide pre-processed versions of the following datasets: DBpedia, SNLI, and Yahoo. The data are prepared in pickle format, and each .p file contains the same fields in the same order:

    • train_text, val_text, test_text, train_label, val_label, test_label, dictionary (wordtoix), reverse dictionary (ixtoword)
  • These .p files can be downloaded from the links below. After downloading, put them into a data folder:

Run

  • Run: python eval_dbpedia_emb.py for ontology classification on the DBpedia dataset

  • Run: python eval_snli_emb.py for natural language inference on the SNLI dataset

  • Run: python eval_yahoo_emb.py for topic categorization on the Yahoo! Answer dataset

  • Options: options can be set by changing the option class in any of the above three files:

  • opt.emb_size: dimension of the word embeddings.
  • opt.drop_rate: the keep probability of the dropout layer.
  • opt.lr: learning rate.
  • opt.batch_size: batch size.
  • opt.H_dis: dimension of the last hidden layer.
  • On a K80 GPU, training takes roughly 3 minutes per epoch and about 5 epochs to converge for DBpedia, 50 seconds per epoch and about 20 epochs for SNLI, and 4 minutes per epoch and about 5 epochs for the Yahoo dataset.

Subspace Training & Intrinsic Dimension

To measure the intrinsic dimension of word-embedding-based text classification tasks, we compare SWEM and CNNs via subspace training in Section 5.1 of the paper.

Please follow the instructions in folder intrinsic_dimension to reproduce the results.

Citation

Please cite our ACL paper in your publications if it helps your research:

@inproceedings{Shen2018Baseline, 
title={Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms}, 
author={Shen, Dinghan and Wang, Guoyin and Wang, Wenlin and Renqiang Min, Martin and Su, Qinliang and Zhang, Yizhe and Li, Chunyuan and Henao, Ricardo and Carin, Lawrence}, 
booktitle={ACL}, 
year={2018} 
}