
jinfengr / neural-tweet-search

License: Apache-2.0
Multi-Perspective Relevance Matching with Hierarchical ConvNets for Social Media Search (Rao et al. AAAI'19)

Programming Languages

python
shell

Projects that are alternatives of or similar to neural-tweet-search

felfele
Decentralized social application that respects your privacy
Stars: ✭ 30 (+15.38%)
Mutual labels:  social-media
meta-coronavirus-dataset
MetaCOVID: META-Coronavirus dataset repository
Stars: ✭ 37 (+42.31%)
Mutual labels:  social-media
laterpost
Simple Twitter Status update or social media post scheduling app built using Laravel and Vue.js
Stars: ✭ 41 (+57.69%)
Mutual labels:  social-media
LinkedIn Scraper
🙋 A Selenium-based automated program that scrapes profile data, stores it in CSV, follows profiles, and saves them as PDFs.
Stars: ✭ 25 (-3.85%)
Mutual labels:  social-media
Twitter
[READ ONLY] Subtree split of the SocialiteProviders/Twitter Provider (see SocialiteProviders/Providers)
Stars: ✭ 21 (-19.23%)
Mutual labels:  social-media
TwitterNER
Twitter named entity extraction for WNUT 2016 http://noisy-text.github.io/2016/ner-shared-task.html
Stars: ✭ 134 (+415.38%)
Mutual labels:  social-media
awesome-alternatives
A list of alternative websites/software to popular proprietary services.
Stars: ✭ 123 (+373.08%)
Mutual labels:  social-media
SimpleSocial
A simple social network web application using ASP.NET Core 3.1
Stars: ✭ 16 (-38.46%)
Mutual labels:  social-media
mllp
The code of AAAI 2020 paper "Transparent Classification with Multilayer Logical Perceptrons and Random Binarization".
Stars: ✭ 15 (-42.31%)
Mutual labels:  aaai
subsocial-node
NOTE: Development continues in https://github.com/dappforce/subsocial-parachain repo. Subsocial full node with Substrate/Polkadot pallets for decentralized communities: blogs, posts, comments, likes, reputation.
Stars: ✭ 73 (+180.77%)
Mutual labels:  social-media
zapread.com
Website for zapread.com
Stars: ✭ 19 (-26.92%)
Mutual labels:  social-media
awesome-search-engine-optimization
A curated list of backlink, social signal opportunities, and link building strategies and tactics to help improve search engine results and ranking.
Stars: ✭ 82 (+215.38%)
Mutual labels:  social-media
socialx react native
The SocialX ecosystem takes the social media experience to the next level.
Stars: ✭ 20 (-23.08%)
Mutual labels:  social-media
social-media-hacker-list
Growing list of apps and tools for enhancing social media experiences.
Stars: ✭ 198 (+661.54%)
Mutual labels:  social-media
FISR
Official repository of FISR (AAAI 2020).
Stars: ✭ 72 (+176.92%)
Mutual labels:  aaai
big-data-upf
RECSM-UPF Summer School: Social Media and Big Data Research
Stars: ✭ 21 (-19.23%)
Mutual labels:  social-media
Nallagram
Nallagram is an open-source social networking platform where users share their views on various topics and interact with others by creating, sharing, and exchanging information and ideas in virtual communities and networks.
Stars: ✭ 30 (+15.38%)
Mutual labels:  social-media
SocialApp-React-Native
Social Networking mobile app similar to Instagram in React Native.
Stars: ✭ 79 (+203.85%)
Mutual labels:  social-media
Instagram
[READ ONLY] Subtree split of the SocialiteProviders/Instagram Provider (see SocialiteProviders/Providers)
Stars: ✭ 34 (+30.77%)
Mutual labels:  social-media
Meower-Vanilla
Official source code for the Scratch-based Meower client.
Stars: ✭ 24 (-7.69%)
Mutual labels:  social-media

Multi-Perspective Relevance Matching with Hierarchical ConvNets for Social Media Search

This repo contains code and data for our neural tweet search paper published in AAAI'19.

Given a query, we aim to return the most relevant documents (tweets) by ranking their relevance. Social media search differs from standard ad-hoc retrieval: documents are shorter, the language is less formal, and there are multiple sources of relevance signals (e.g., URL, hashtag). We propose a hierarchical convolutional model that approaches the heterogeneous relevance signals (tweet, URL, hashtag) from multiple perspectives, including character-, word-, phrase- and sentence-level modeling. Our model demonstrates significant gains over state-of-the-art neural ranking models on multiple Twitter datasets. More details can be found in our paper.
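
To make the multi-perspective idea concrete, below is a minimal Keras sketch of level-wise query-document matching with stacked convolutions. It is an illustrative approximation only, not the implementation in this repo: the vocabulary size, sequence length, number of levels, and filter sizes are placeholder assumptions, and the actual MP-HCNN additionally models URL and hashtag signals at the character level before the final ranking layer.

# Illustrative sketch of level-wise query-document matching with stacked convolutions.
# NOT the repo's actual MP-HCNN code; all sizes below are placeholder assumptions.
from keras.layers import Input, Embedding, Conv1D, GlobalMaxPooling1D, Dot, Concatenate, Dense
from keras.models import Model

VOCAB_SIZE, EMBED_DIM, MAX_LEN, N_FILTERS = 50000, 300, 50, 256   # assumed values

query_in = Input(shape=(MAX_LEN,), name="query_words")
tweet_in = Input(shape=(MAX_LEN,), name="tweet_words")

embed = Embedding(VOCAB_SIZE, EMBED_DIM)              # shared word embeddings
q, d = embed(query_in), embed(tweet_in)

level_sims = []
for level in range(3):                                # word -> phrase -> sentence views
    conv = Conv1D(N_FILTERS, 3, padding="same", activation="relu")
    q, d = conv(q), conv(d)                           # shared convolution per level
    q_vec, d_vec = GlobalMaxPooling1D()(q), GlobalMaxPooling1D()(d)
    level_sims.append(Dot(axes=1, normalize=True)([q_vec, d_vec]))  # cosine similarity

score = Dense(1, activation="sigmoid")(Concatenate()(level_sims))
model = Model(inputs=[query_in, tweet_in], outputs=score)
model.compile(optimizer="sgd", loss="binary_crossentropy")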

Requirements

  • Python 2.7
  • TensorFlow or Theano (tested on TF 1.4.1)
  • Keras (tested on 2.0.5)

Install

  • Clone our repo:
git clone https://github.com/Jeffyrao/neural-tweet-search.git
cd neural-tweet-search
  • Install gdrive
  • Download the required data and word2vec embeddings:
$ chmod +x download.sh; ./download.sh
  • Install the TensorFlow and Keras dependencies:
$ pip install -r requirements.txt

Run

  • Train and test on GPU:
CUDA_VISIBLE_DEVICES=0 python -u train.py -t trec-2013

The paths of the best model and the output predictions will be shown in the log. The default parameters should work reasonably well.

  • Note: you might need around 40 GB of memory to create the dataset (because of the large size of the IDF weights). Please file an issue if you have any problems creating the dataset.

  • Parameter sweep to find the best parameter set:

chmod +x param_sweep.sh; ./param_sweep.sh trec-2013 &

This command will save all outputs under the tune-logs folder.

Evaluate with trec_eval

$ ./trec_eval.8.1/trec_eval data/twitter-v0/qrels.microblog2011-2014.txt \
                            best_run/mphcnn_trec_2013_pred.txt

This should return the exact MP-HCNN scores on the TREC 2013 dataset (MAP: 0.2818, P30: 0.5222) that we reported in our paper.
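
If you want to extract these metrics programmatically, the small Python helper below runs trec_eval and picks out the MAP and P30 lines. It is only a convenience sketch: it assumes trec_eval's standard summary output (metric name, query id, value per line), and the exact metric labels (P30 vs. P_30) vary across trec_eval versions.

# Convenience sketch: run trec_eval and report MAP / P30.
# Assumes the standard "metric  query  value" summary format; metric labels
# may differ slightly across trec_eval versions (e.g., P30 vs. P_30).
import subprocess

output = subprocess.check_output(
    ["./trec_eval.8.1/trec_eval",
     "data/twitter-v0/qrels.microblog2011-2014.txt",
     "best_run/mphcnn_trec_2013_pred.txt"])
for line in output.decode("utf-8").splitlines():
    fields = line.split()
    if len(fields) == 3 and fields[0] in ("map", "P30", "P_30"):
        print("%s = %s" % (fields[0], fields[2]))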

Command line parameters

option          | input format                                 | default   | description
----------------|----------------------------------------------|-----------|------------
-t              | [trec-2011, trec-2012, trec-2013, trec-2014] | trec-2011 | test set
-l              | [true, false]                                | false     | whether to load the pre-created dataset (set to true once the data is ready)
--load_model    | [true, false]                                | false     | whether to load a pre-trained model
-b              | [1, n)                                       | 64        | batch size
-n              | [1, n)                                       | 256       | number of convolutional filters
-d              | [0, 1]                                       | 0.1       | dropout rate
-o              | [sgd, adam, rmsprop]                         | sgd       | optimization method
--lr            | [0, 1]                                       | 0.05      | learning rate
--epochs        | [1, n)                                       | 15        | number of training epochs
--trainable     | [true, false]                                | true      | whether to train word embeddings
--val_split     | (0, 1)                                       | 0.15      | fraction of the training set sampled for validation
-v              | [0, 1, 2]                                    | 1         | logging verbosity: 0 for silent, 1 for interactive, 2 for per-epoch logging
--conv_option   | [normal, ResNet]                             | normal    | convolutional block type, normal or ResNet
--model_option  | [complete, word-url]                         | complete  | input sources to use: complete for full MP-HCNN, word-url to model only query-tweet (word) and query-url (char)
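
For example, several of these flags can be combined in a single invocation (the specific values below are only an illustration, not recommended settings):

CUDA_VISIBLE_DEVICES=0 python -u train.py -t trec-2014 -l true -b 128 -n 256 -d 0.2 -o adam --lr 0.01 --epochs 20 --model_option complete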

Reference

If you use this code or dataset, please cite the paper below:

@article{rao2019multi,
  title={Multi-Perspective Relevance Matching with Hierarchical ConvNets for Social Media Search},
  author={Rao, Jinfeng and Yang, Wei and Zhang, Yuhao and Ture, Ferhan and Lin, Jimmy},
  journal={Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI)},
  year={2019}
}