ymym3412 / position-rank

Licence: MIT license
PositionRank: An Unsupervised Approach to Keyphrase Extraction from Scholarly Documents

PositionRank

PositionRank is a keyphrase extraction method described in the ACL 2017 paper "PositionRank: An Unsupervised Approach to Keyphrase Extraction from Scholarly Documents".
The method finds keyphrases with a graph-based algorithm: a PageRank over the word co-occurrence graph that is biased by the positions at which each word occurs in the document.
It is not limited to English scholarly documents. You can apply it to documents in any other language if you provide a tokenizer for that language.
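
The biased PageRank idea can be sketched in a few lines of plain Python. This is a minimal illustration of the approach as described in the paper, not the code in this repository: the function name `position_biased_pagerank`, the fixed iteration count, and the exact window handling are assumptions made for the example, and the default `alpha` and `window_size` simply echo the values used in the custom-tokenizer call later in this README.

```python
from collections import defaultdict


def position_biased_pagerank(tokens, window_size=6, alpha=0.85, iterations=100):
    """Score words with a PageRank biased toward early, frequent positions.

    Hypothetical helper for illustration; not part of the position_rank module.
    """
    if not tokens:
        return {}

    # Co-occurrence graph: the undirected edge weight counts how often two
    # distinct words appear within `window_size` consecutive tokens.
    weights = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window_size]:
            if other != word:
                weights[word][other] += 1.0
                weights[other][word] += 1.0

    # Position bias: sum of inverse 1-based positions, normalized to sum to 1.
    bias = defaultdict(float)
    for position, word in enumerate(tokens, start=1):
        bias[word] += 1.0 / position
    total = sum(bias.values())
    bias = {word: value / total for word, value in bias.items()}

    # Power iteration: score = alpha * (graph walk) + (1 - alpha) * bias.
    scores = {word: 1.0 / len(bias) for word in bias}
    for _ in range(iterations):
        updated = {}
        for word in bias:
            walk = sum(
                scores[neighbor] * weight / sum(weights[neighbor].values())
                for neighbor, weight in weights[word].items()
            )
            updated[word] = alpha * walk + (1.0 - alpha) * bias[word]
        scores = updated
    return scores
```

Calling `position_biased_pagerank(["keyphrase", "extraction", "model", "keyphrase", "extraction", "data"], window_size=2)` returns one score per distinct word. Words that occur early and often receive a larger bias term, which is what separates this scheme from an unbiased PageRank over the same graph.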

>>> from position_rank import position_rank
>>> from tokenizer import StanfordCoreNlpTokenizer

>>> title = "PositionRank: An Unsupervised Approach to Keyphrase Extraction from Scholarly Documents."
>>> abstract = """The large and growing amounts of online
... scholarly data present both challenges and
... opportunities to enhance knowledge discovery.
... One such challenge is to automatically
... extract a small set of keyphrases
... from a document that can accurately describe
... the document’s content and can facilitate
... fast information processing. In
... this paper, we propose PositionRank, an
... unsupervised model for keyphrase extraction
... from scholarly documents that incorporates
... information from all positions of a
... word’s occurrences into a biased PageRank.
... Our model obtains remarkable improvements
... in performance over PageRank
... models that do not take into account
... word positions as well as over strong baselines
... for this task. Specifically, on several
... datasets of research papers, PositionRank
... achieves improvements as high as 29.09%."""

>>> tokenizer = StanfordCoreNlpTokenizer("http://localhost", port=9000)
>>> position_rank(title + abstract, tokenizer)
['Keyphrase_Extraction', 'PositionRank', 'Unsupervised_Approach', 'online_scholarly_data', 'scholarly_documents', 'Scholarly_Documents', 'PageRank_models', 'fast_information_processing', 'unsupervised_model', 'account_word_positions']
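
The returned keyphrases join their words with underscores, and the same phrase can appear twice with different capitalization (for example 'scholarly_documents' and 'Scholarly_Documents' above). If you want display-ready phrases, a small post-processing step such as the following sketch may help. `tidy_keyphrases` is a hypothetical helper written for this example and is not part of the library.

```python
def tidy_keyphrases(keyphrases):
    """Replace underscores with spaces and drop case-insensitive duplicates.

    Hypothetical helper for illustration; the first spelling seen is kept.
    """
    seen = set()
    tidy = []
    for phrase in keyphrases:
        readable = phrase.replace("_", " ")
        key = readable.lower()
        if key not in seen:
            seen.add(key)
            tidy.append(readable)
    return tidy
```

The helper preserves the original ranking order, so the first element is still the top-ranked keyphrase.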

SETUP

Prerequisites

For English

Java 1.8+ (for Stanford CoreNLP) (Download)
Stanford CoreNLP 3.7.0 (Download)

For Japanese

Mecab (Installation)

Install Python libraries

$ pip install -r requirements.txt

USAGE (English document)

Start up Stanford CoreNLP Server

First, start the Stanford CoreNLP server.

$ cd /path/to/stanford_corenlp/
$ java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
[main] INFO CoreNLP - --- StanfordCoreNLPServer#main() called ---
[main] INFO CoreNLP - setting default constituency parser
[main] INFO CoreNLP - warning: cannot find edu/stanford/nlp/models/srparser/englishSR.ser.gz
[main] INFO CoreNLP - using: edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz instead
[main] INFO CoreNLP - to use shift reduce parser download English models jar from:
[main] INFO CoreNLP - http://stanfordnlp.github.io/CoreNLP/download.html
[main] INFO CoreNLP -     Threads: 4
[main] INFO CoreNLP - Starting server...
[main] INFO CoreNLP - StanfordCoreNLPServer listening at /0:0:0:0:0:0:0:0:9000
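
Before running PositionRank, it can be useful to confirm that the server answers requests. The sketch below talks to the CoreNLP server's HTTP interface directly using only the standard library: it posts raw text and passes the annotator settings in a `properties` query parameter. The helper names `build_corenlp_url` and `pos_tag` are made up for this example, and whether `StanfordCoreNlpTokenizer` issues the same request internally is not something this README states.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_corenlp_url(host="http://localhost", port=9000,
                      annotators="tokenize,ssplit,pos"):
    """Build the request URL with the annotator settings as a JSON property."""
    properties = {"annotators": annotators, "outputFormat": "json"}
    query = urlencode({"properties": json.dumps(properties)})
    return "{}:{}/?{}".format(host, port, query)


def pos_tag(text, url=None, timeout=15):
    """POST text to a running CoreNLP server and return (word, POS) pairs."""
    request = Request(url or build_corenlp_url(), data=text.encode("utf-8"))
    with urlopen(request, timeout=timeout) as response:
        annotation = json.loads(response.read().decode("utf-8"))
    return [
        (token["word"], token["pos"])
        for sentence in annotation["sentences"]
        for token in sentence["tokens"]
    ]
```

With the server from the previous step running, `pos_tag("PositionRank extracts keyphrases.")` should return a list of word and tag pairs. A connection error at this point usually means the server is not listening on the host and port you passed.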

Simple example

from position_rank import position_rank
from tokenizer import StanfordCoreNlpTokenizer

title = "PositionRank: An Unsupervised Approach to Keyphrase Extraction from Scholarly Documents."
abstract = """The large and growing amounts of online
scholarly data present both challenges and
opportunities to enhance knowledge discovery.
..."""

tokenizer = StanfordCoreNlpTokenizer("http://localhost", port = 9000)
position_rank(title + abstract, tokenizer)
["keyphrase1", "keyphrase2", ..., "keyphrase10"]

Change the number of output keyphrases.

position_rank(title + abstract, tokenizer, num_keyphrase=5)
["keyphrase1", "keyphrase2", "keyphrase3", "keyphrase4", "keyphrase5"]

Change other algorithm parameters.

position_rank(title + abstract, tokenizer, alpha=0.6, window_size=4, num_keyphrase=10, lang="en")
["keyphrase1", "keyphrase2", ..., "keyphrase10"]

USAGE (Japanese document)

Simple example

from position_rank import position_rank
from tokenizer import MecabTokenizer

title = "{日本語論文のタイトル}"
abstract = "{日本語論文の概要}"

tokenizer = MecabTokenizer()
position_rank(title + abstract, tokenizer, lang="ja")
["keyphrase1", "keyphrase2", ..., "keyphrase10"]

To use a custom dictionary with MeCab, pass MeCab's option string to MecabTokenizer.

from position_rank import position_rank
from tokenizer import MecabTokenizer

title = "{日本語論文のタイトル}"
abstract = "{日本語論文の概要}"

tokenizer = MecabTokenizer("-d /usr/local/lib/mecab/dic/mecab-ipadic-neologd")
position_rank(title + abstract, tokenizer, lang="ja")
["keyphrase1", "keyphrase2", ..., "keyphrase10"]

Change other algorithm parameters.

position_rank(title + abstract, tokenizer, alpha=0.6, window_size=4, num_keyphrase=10, lang="ja")
["keyphrase1", "keyphrase2", ..., "keyphrase10"]

CUSTOMIZE

You can use PositionRank for other languages if you create your own tokenizer.
A custom tokenizer must provide a tokenize() method that returns two lists: a token list and a phrase list.
A phrase is a sequence of consecutive tokens whose POS (part-of-speech) tags match the pattern (adjective)*(noun)+ and whose length is at most 3 tokens.
The following is a sample custom tokenizer.

from position_rank import position_rank

class CustomTokenizer(object):

    def __init__(self):
        # Initialize your tokenizer here.
        pass

    def tokenize(self, sentence):
        # Tokenize the sentence and build the phrase list, then return both.
        # Tokens must be filtered so that only adjectives and nouns in your language remain.
        token_list, phrase_list = [], []  # placeholders: fill these with your tokenizer's output
        return token_list, phrase_list

title = "{other language's title}"
abstract = "{other language's abstract}"

tokenizer = CustomTokenizer()
position_rank(title + abstract, tokenizer, alpha=0.85, window_size=6, num_keyphrase=10, lang="custom")
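
The phrase pattern (adjective)*(noun)+ described above can be matched with a regular expression once each token is reduced to a one-letter POS code. The sketch below is one way to build the phrase list inside a custom tokenizer. It makes several assumptions that you should adapt to your language and tagger: tags starting with "JJ" are adjectives and tags starting with "NN" are nouns (Penn Treebank style), phrases are joined with underscores as in the sample output, and matches longer than three tokens are dropped rather than split. `extract_candidate_phrases` is a hypothetical name.

```python
import re


def extract_candidate_phrases(tagged_tokens, max_length=3):
    """Return phrases matching (adjective)*(noun)+ of at most `max_length` tokens.

    `tagged_tokens` is a list of (word, pos_tag) pairs. Hypothetical helper.
    """
    def code(tag):
        # Assumed Penn Treebank style tags: JJ* = adjective, NN* = noun.
        if tag.startswith("JJ"):
            return "A"
        if tag.startswith("NN"):
            return "N"
        return "O"

    # One character per token, so regex offsets equal token indices.
    encoded = "".join(code(tag) for _, tag in tagged_tokens)

    phrases = []
    for match in re.finditer(r"A*N+", encoded):
        if match.end() - match.start() <= max_length:
            words = [word for word, _ in tagged_tokens[match.start():match.end()]]
            phrases.append("_".join(words))
    return phrases
```

Encoding tags as single characters keeps the pattern readable and lets the regex engine find each maximal adjective and noun run in one pass over the sentence.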