AlirezaTheH / perke

License: MIT
A keyphrase extractor for Persian

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to perke

Cogcomp Nlpy
CogComp's light-weight Python NLP annotators
Stars: ✭ 115 (+91.67%)
Mutual labels:  text-mining, data-mining, text-processing
PersianStemmer-Python
PersianStemmer-Python
Stars: ✭ 43 (-28.33%)
Mutual labels:  information-retrieval, persian, persian-language
ake-datasets
Large, curated set of benchmark datasets for evaluating automatic keyphrase extraction algorithms.
Stars: ✭ 125 (+108.33%)
Mutual labels:  information-retrieval, keyword-extraction, keyphrase-extraction
corpusexplorer2.0
Corpus linguistics has never been so easy...
Stars: ✭ 16 (-73.33%)
Mutual labels:  text-mining, data-mining, text-processing
advanced-text-mining
Covers natural language processing and text analysis methodology using the TEANAPS library.
Stars: ✭ 15 (-75%)
Mutual labels:  text-mining, data-mining, text-processing
teanaps
An open-source Python library for natural language processing and text analysis.
Stars: ✭ 91 (+51.67%)
Mutual labels:  text-mining, data-mining, text-processing
Rmdl
RMDL: Random Multimodel Deep Learning for Classification
Stars: ✭ 375 (+525%)
Mutual labels:  information-retrieval, text-mining, data-mining
Artificial Adversary
🗣️ Tool to generate adversarial text examples and test machine learning models against them
Stars: ✭ 348 (+480%)
Mutual labels:  text-mining, data-mining, text-processing
Xioc
Extract indicators of compromise from text, including "escaped" ones.
Stars: ✭ 148 (+146.67%)
Mutual labels:  text-mining, data-mining, text-processing
bookworm
📚 social networks from novels
Stars: ✭ 72 (+20%)
Mutual labels:  information-retrieval, data-mining
learning2hash.github.io
Website for "A survey of learning to hash for Computer Vision" https://learning2hash.github.io
Stars: ✭ 14 (-76.67%)
Mutual labels:  information-retrieval, text-mining
evildork
Evildork targeting your fiancee👁️
Stars: ✭ 46 (-23.33%)
Mutual labels:  information-retrieval, keyword
ml-nlp-services
Machine learning, deep learning, natural language processing
Stars: ✭ 23 (-61.67%)
Mutual labels:  information-retrieval, data-mining
AILA-Artificial-Intelligence-for-Legal-Assistance
Python implementations of the various methods used in FIRE 2019 conference.
Stars: ✭ 39 (-35%)
Mutual labels:  information-retrieval, data-mining
kex
Kex is a python library for unsupervised keyword extraction from a document, providing an easy interface and benchmarks on 15 public datasets.
Stars: ✭ 46 (-23.33%)
Mutual labels:  information-retrieval, keyword-extraction
Gwu data mining
Materials for GWU DNSC 6279 and DNSC 6290.
Stars: ✭ 217 (+261.67%)
Mutual labels:  text-mining, data-mining
Fxt
A large scale feature extraction tool for text-based machine learning
Stars: ✭ 25 (-58.33%)
Mutual labels:  information-retrieval, text-processing
Wordtokenizers.jl
High performance tokenizers for natural language processing and other related tasks
Stars: ✭ 63 (+5%)
Mutual labels:  information-retrieval, data-mining
Dan Jurafsky Chris Manning Nlp
My solution to the Natural Language Processing course made by Dan Jurafsky, Chris Manning in Winter 2012.
Stars: ✭ 124 (+106.67%)
Mutual labels:  information-retrieval, text-processing
Awesome Hungarian Nlp
A curated list of NLP resources for Hungarian
Stars: ✭ 121 (+101.67%)
Mutual labels:  information-retrieval, text-mining

Perke


Perke is a Python keyphrase extraction package for the Persian language. It provides an end-to-end keyphrase extraction pipeline in which each component can be easily modified or extended to develop new models.

Installation

  • The easiest way to install is from PyPI:
    pip install perke
    Alternatively, you can install directly from GitHub:
    pip install git+https://github.com/alirezatheh/perke.git
  • Perke also requires a trained POS tagger model. We use hazm's tagger model. You can download the latest hazm resources (tagger and parser models) using the following command:
    python -m perke download
    Alternatively, you can use another model with the same tag names and structure, and put it in the resources directory.

Simple Example

Perke provides a standardized API for extracting keyphrases from a text. The four steps below show how to use the TextRank keyphrase extractor.

from perke.unsupervised.graph_based import TextRank

# Define the set of part-of-speech tags that are allowed in the model.
valid_pos_tags = {'N', 'Ne', 'AJ', 'AJe'}

# 1. Create a TextRank extractor.
extractor = TextRank(valid_pos_tags=valid_pos_tags)

# 2. Load the text.
extractor.load_text(input='text or path/to/input_file',
                    word_normalization_method=None)

# 3. Build the graph representation of the text and weight the
#    words. Keyphrase candidates are composed from the 33 percent
#    highest weighted words.
extractor.weight_candidates(window_size=2, top_t_percent=0.33)

# 4. Get the 10 highest weighted candidates as keyphrases.
keyphrases = extractor.get_n_best(n=10)
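
To make step 3 more concrete, here is a minimal, self-contained sketch of the general TextRank idea: link words that co-occur within a window, score them with PageRank, keep the top fraction, and merge adjacent kept words into candidates. This is not Perke's implementation. The function names (build_graph, rank_words, top_words, compose_candidates), the damping factor of 0.85 and the fixed 50 iterations are assumptions made for illustration, and the input is assumed to be a list of tokens already filtered by POS tag.

```python
from collections import defaultdict


def build_graph(words, window_size=2):
    """Connect each word to the words that follow it within the window.

    With window_size=2 only directly adjacent words are linked.
    """
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + window_size]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def rank_words(graph, damping=0.85, iterations=50):
    """Run a fixed number of PageRank iterations on the undirected graph."""
    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        scores = {
            word: (1 - damping) + damping * sum(
                scores[neighbor] / len(graph[neighbor])
                for neighbor in graph[word]
            )
            for word in graph
        }
    return scores


def top_words(scores, top_t_percent=0.33):
    """Keep the highest scoring fraction of words (at least one)."""
    keep = max(1, round(len(scores) * top_t_percent))
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return set(ranked[:keep])


def compose_candidates(words, kept):
    """Merge runs of adjacent kept words into candidate phrases."""
    candidates, current = [], []
    for word in words + [None]:  # None is a sentinel that flushes the last run
        if word in kept:
            current.append(word)
        elif current:
            candidates.append(' '.join(current))
            current = []
    return candidates


# Hypothetical, already filtered tokens (placeholders, not real Persian text).
tokens = ['a', 'b', 'a', 'c', 'a', 'd']
scores = rank_words(build_graph(tokens, window_size=2))
kept = top_words(scores, top_t_percent=0.33)
candidates = compose_candidates(tokens, kept)
```

The parameter names window_size and top_t_percent mirror the call in step 3 only for readability; how Perke interprets them internally may differ from this sketch.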

For other models, see the examples directory.

Documentation

Documentation and references are available at Read the Docs.

Implemented Models

Perke currently implements the following keyphrase extraction models:

  • Unsupervised models
    • Graph-based models
      • TextRank: article by Mihalcea and Tarau, 2004
      • SingleRank: article by Wan and Xiao, 2008
      • TopicRank: article by Bougouin, Boudin and Daille, 2013
      • PositionRank: article by Florescu and Caragea, 2017
      • MultipartiteRank: article by Boudin, 2018

Acknowledgements

Perke is inspired by pke.
