csurfer / Rake Nltk

Licence: MIT
Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.

rake-nltk

RAKE, short for Rapid Automatic Keyword Extraction, is a domain-independent keyword extraction algorithm that tries to determine key phrases in a body of text by analyzing how frequently each word appears and how often it co-occurs with other words in the text.
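To make the frequency and co-occurrence idea concrete, here is a minimal pure-Python sketch of the scoring described in the RAKE paper. Each word is scored as degree divided by frequency, and each phrase as the sum of its word scores. It is an illustration only, not the code this package runs. The function names, the tiny stopword set and the regex-based splitting are all assumptions made for the example. rake-nltk itself relies on NLTK for stopwords and tokenization.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; rake-nltk uses NLTK's corpus).
STOPWORDS = {"a", "an", "and", "are", "by", "in", "is", "of", "the", "to", "with"}


def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text into candidate phrases at stopwords and punctuation."""
    phrases = []
    # Punctuation always ends a phrase, so split on it first.
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stopwords:
                # A stopword closes the phrase collected so far.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def word_scores(phrases):
    """Score each word as degree / frequency."""
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts every word in the phrase, including the word itself.
            degree[word] += len(phrase)
    return {word: degree[word] / freq[word] for word in freq}


def rank_phrases(text):
    """Return (score, phrase) pairs sorted from highest to lowest score."""
    phrases = candidate_phrases(text)
    scores = word_scores(phrases)
    ranked = {" ".join(p): sum(scores[w] for w in p) for p in set(phrases)}
    return sorted(((s, p) for p, s in ranked.items()), reverse=True)


if __name__ == "__main__":
    sample = "Keyword extraction is fun. Keyword ranking is the goal of keyword extraction."
    for score, phrase in rank_phrases(sample):
        print(score, phrase)
```

Words that tend to appear inside longer phrases get a higher degree relative to their frequency, which is why multi-word phrases usually rank above isolated words.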

Setup

Using pip

pip install rake-nltk

Directly from the repository

git clone https://github.com/csurfer/rake-nltk.git
cd rake-nltk
python setup.py install

Quick start

from rake_nltk import Rake

# Uses the English stopwords from NLTK and all punctuation characters by
# default.
r = Rake()

# Extract keywords from a single string of text.
text = "RAKE is a domain-independent keyword extraction algorithm."
r.extract_keywords_from_text(text)

# Alternatively, extract keywords from a list of strings, where each string
# is a sentence.
sentences = ["RAKE extracts key phrases.", "It uses stopwords from NLTK."]
r.extract_keywords_from_sentences(sentences)

# Get the keyword phrases ranked from highest to lowest.
r.get_ranked_phrases()

# Get the keyword phrases ranked from highest to lowest, with their scores.
r.get_ranked_phrases_with_scores()
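The scored results come back as a list of (score, phrase) tuples, so they are easy to post-process. The sketch below shows one way to keep only the strongest phrases. The helper name `top_phrases`, its parameters and the hard-coded scores are hypothetical, chosen so the example runs without NLTK data installed.

```python
def top_phrases(scored_phrases, limit=5, min_score=0.0):
    """Return up to `limit` phrases whose score is at least `min_score`.

    `scored_phrases` is a list of (score, phrase) tuples, the shape returned
    by get_ranked_phrases_with_scores().
    """
    kept = [(score, phrase) for score, phrase in scored_phrases if score >= min_score]
    # Sort defensively so the helper also works on unsorted input.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [phrase for _, phrase in kept[:limit]]


# Hypothetical scores, hard-coded so the example runs without NLTK data.
example = [
    (9.0, "rapid automatic keyword extraction"),
    (4.0, "key phrases"),
    (1.0, "text"),
]
print(top_phrases(example, limit=2, min_score=2.0))
# prints ['rapid automatic keyword extraction', 'key phrases']
```

In real use, the `example` list would be replaced by the return value of `r.get_ranked_phrases_with_scores()`.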

Debugging Setup

If you see a stopwords error, the NLTK stopwords corpus has not been downloaded yet. You can download it with the command below.

python -c "import nltk; nltk.download('stopwords')"

References

This is a Python implementation of the algorithm described in the paper "Automatic keyword extraction from individual documents" by Stuart Rose, Dave Engel, Nick Cramer and Wendy Cowley.

Why did I choose to implement it myself?

  • It is extremely fun to implement algorithms by reading papers. It is the digital equivalent of DIY kits.
  • There are some rather popular implementations out there, in Python (aneesha/RAKE) and Node (waseem18/node-rake), but neither seemed to use the power of NLTK. By making NLTK an integral part of the implementation, I get the flexibility and power to extend it in other creative ways later, if I see fit, without having to implement everything myself.
  • I plan to use it in my other pet projects and wanted it to be modular and tunable. This way I have complete control.

Contributing

Bug Reports and Feature Requests

Please use the issue tracker to report bugs or request features.

Development

Pull requests are most welcome.

Buy the developer a cup of coffee!

If you found the utility helpful you can buy me a cup of coffee using

Donate
