
sunyilgdx / SIFRank

Licence: other
The code of our paper "SIFRank: A New Baseline for Unsupervised Keyphrase Extraction Based on Pre-trained Language Model"



SIFRank

The code of our paper SIFRank: A New Baseline for Unsupervised Keyphrase Extraction Based on Pre-trained Language Model

Version Notes

  • 2020/02/21: Initial version. Provides the most basic functions.
  • 2020/02/28: Second version. Adds two new algorithms, DS (document segmentation) and EA (embeddings alignment), to speed up SIFRank and SIFRank+.
  • 2020/03/02: Third version. A small change to SIFRank+ in ./model/method.py that applies a simple normalization to position_score.

Environment

Python 3.6
nltk 3.4.3
StanfordCoreNLP 3.9.1.1
torch 1.1.0
allennlp 0.8.4

Download

  • ELMo: download elmo_2x4096_512_2048cnn_2xhighway_options.json and elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5 from https://allennlp.org/elmo and save them to the auxiliary_data/ directory.
  • StanfordCoreNLP: download stanford-corenlp-full-2018-02-27 from https://stanfordnlp.github.io/CoreNLP/ and save it anywhere.
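
Before running the usage script, it can help to confirm that the downloaded files are where the code expects them. The following is a minimal sketch and not part of the repository: the helper name `missing_resources` is hypothetical, and it only checks that the paths exist, not that the files are complete or valid.

```python
from pathlib import Path

# File names taken from the Download list above.
ELMO_FILES = (
    "elmo_2x4096_512_2048cnn_2xhighway_options.json",
    "elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5",
)


def missing_resources(aux_dir, corenlp_dir):
    """Return a list of human-readable problems; empty if all paths exist.

    Hypothetical helper for illustration: existence checks only.
    """
    problems = []
    for name in ELMO_FILES:
        if not (Path(aux_dir) / name).is_file():
            problems.append("missing ELMo file: " + name)
    if not Path(corenlp_dir).is_dir():
        problems.append("CoreNLP directory not found: " + str(corenlp_dir))
    return problems


if __name__ == "__main__":
    # Example paths only; adjust them to your own layout.
    for problem in missing_resources("auxiliary_data", "stanford-corenlp-full-2018-02-27"):
        print(problem)
```

An empty list means both ELMo files and the CoreNLP directory were found at the given locations.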

Usage

import nltk
from embeddings import sent_emb_sif, word_emb_elmo
from model.method import SIFRank, SIFRank_plus
from stanfordcorenlp import StanfordCoreNLP
import time

# ELMo files, downloaded from https://allennlp.org/elmo
options_file = "../auxiliary_data/elmo_2x4096_512_2048cnn_2xhighway_options.json"
weight_file = "../auxiliary_data/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5"

porter = nltk.PorterStemmer()
ELMO = word_emb_elmo.WordEmbeddings(options_file, weight_file, cuda_device=0)
SIF = sent_emb_sif.SentEmbeddings(ELMO, lamda=1.0)
# Path to the unpacked CoreNLP directory, downloaded from https://stanfordnlp.github.io/CoreNLP/
en_model = StanfordCoreNLP(r'E:\Python_Files\stanford-corenlp-full-2018-02-27', quiet=True)
# One weight per ELMo layer; here only the middle layer contributes
elmo_layers_weight = [0.0, 1.0, 0.0]

text = "Discrete output feedback sliding mode control of second order systems - a moving switching line approach The sliding mode control systems (SMCS) for which the switching variable is designed independent of the initial conditions are known to be sensitive to parameter variations and extraneous disturbances during the reaching phase. For second order systems this drawback is eliminated by using the moving switching line technique where the switching line is initially designed to pass the initial conditions and is subsequently moved towards a predetermined switching line. In this paper, we make use of the above idea of moving switching line together with the reaching law approach to design a discrete output feedback sliding mode control. The main contributions of this work are such that we do not require to use system states as it makes use of only the output samples for designing the controller. and by using the moving switching line a low sensitivity system is obtained through shortening the reaching phase. Simulation results show that the fast output sampling feedback guarantees sliding motion similar to that obtained using state feedback"
# Extract the top N=15 keyphrases with SIFRank and with SIFRank+
keyphrases = SIFRank(text, SIF, en_model, N=15, elmo_layers_weight=elmo_layers_weight)
keyphrases_ = SIFRank_plus(text, SIF, en_model, N=15, elmo_layers_weight=elmo_layers_weight)
print(keyphrases)
print(keyphrases_)
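
The printed results can be easier to read as a ranked list. The sketch below assumes that `SIFRank` and `SIFRank_plus` return a list of `(phrase, score)` pairs ordered from most to least relevant; this README does not state the return format, so check it against your own output first. The helper `format_keyphrases` is hypothetical and not part of the repository.

```python
def format_keyphrases(keyphrases, digits=4):
    """Render (phrase, score) pairs as numbered lines.

    Assumption: `keyphrases` is an iterable of (str, float) pairs that is
    already sorted by descending score.
    """
    lines = []
    for rank, (phrase, score) in enumerate(keyphrases, start=1):
        lines.append("{:>2}. {}  ({:.{d}f})".format(rank, phrase, score, d=digits))
    return "\n".join(lines)


# Made-up values for illustration only, not real SIFRank output.
example = [("sliding mode control", 0.91), ("moving switching line", 0.87)]
print(format_keyphrases(example))
```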

Evaluate the model

Use eval/sifrank_eval.py to evaluate SIFRank on the Inspec, SemEval2017 and DUC2001 datasets. We also have evaluation code for other baseline models, which we will organize and upload later. The table below reports F1 scores when the number of extracted keyphrases N is set to 5.

Models          Inspec   SemEval2017   DUC2001
TFIDF           11.28    12.70          9.21
YAKE            15.73    11.84         10.61
TextRank        24.39    16.43         13.94
SingleRank      24.69    18.23         21.56
TopicRank       22.76    17.10         20.37
PositionRank    25.19    18.23         24.95
Multipartite    23.05    17.39         21.86
RVA             21.91    19.59         20.32
EmbedRank d2v   27.20    20.21         21.74
SIFRank         29.11    22.59         24.27
SIFRank+        28.49    21.53         30.88
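
For readers unfamiliar with the metric, F1 at N compares the top N extracted keyphrases with the gold keyphrases of a document. The function below is a minimal per-document sketch, not the code in eval/sifrank_eval.py: the name `f1_at_n` is hypothetical, and phrases are matched after lowercasing and whitespace normalization only. How the repository's script normalizes phrases (for example by stemming) and aggregates over documents is not described here, so scores from this sketch may not match the table.

```python
def f1_at_n(predicted, gold, n=5):
    """Return (precision, recall, F1) of the top-n predictions against gold.

    Simplification: exact match after lowercasing and collapsing whitespace.
    """
    def norm(phrase):
        return " ".join(phrase.lower().split())

    # Keep the first n distinct predictions, preserving rank order.
    top = []
    for phrase in predicted:
        p = norm(phrase)
        if p not in top:
            top.append(p)
        if len(top) == n:
            break

    gold_set = {norm(g) for g in gold}
    if not top or not gold_set:
        return 0.0, 0.0, 0.0

    correct = sum(1 for p in top if p in gold_set)
    precision = correct / len(top)
    recall = correct / len(gold_set)
    if correct == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

As a usage example, with three predictions of which one matches one of two gold phrases, precision is 1/3, recall is 1/2, and F1 is 0.4.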

Cite

If you use this code, please cite this paper:

@article{DBLP:journals/access/SunQZWZ20,
  author    = {Yi Sun and
               Hangping Qiu and
               Yu Zheng and
               Zhongwei Wang and
               Chaoran Zhang},
  title     = {SIFRank: {A} New Baseline for Unsupervised Keyphrase Extraction Based
               on Pre-Trained Language Model},
  journal   = {{IEEE} Access},
  volume    = {8},
  pages     = {10896--10906},
  year      = {2020},
  url       = {https://doi.org/10.1109/ACCESS.2020.2965087},
  doi       = {10.1109/ACCESS.2020.2965087},
  timestamp = {Fri, 07 Feb 2020 12:04:22 +0100},
  biburl    = {https://dblp.org/rec/journals/access/SunQZWZ20.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}