csurfer / Rake Nltk

License: MIT

Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.

Programming language: Python

Labels: algorithm, text-mining, nltk

rake-nltk

RAKE, short for Rapid Automatic Keyword Extraction, is a domain-independent keyword extraction algorithm that tries to determine key phrases in a body of text by analyzing how frequently each word appears and how often it co-occurs with other words in the text.
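The idea can be sketched in a few lines of plain Python. This is a minimal illustration and not the code of this package: it assumes the scoring described in the RAKE paper (word score = degree / frequency, phrase score = sum of its word scores), and the names `STOPWORDS`, `candidate_phrases` and `rank_phrases` are hypothetical. The tiny stopword set is also an assumption made for the example; rake-nltk takes its stopwords from NLTK.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption for this sketch only).
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "over", "the", "to"}


def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text into candidate phrases at stopwords and punctuation."""
    phrases = []
    # Punctuation always ends a phrase, so split on it first.
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stopwords:
                # A stopword closes the phrase collected so far.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rank_phrases(text, stopwords=STOPWORDS):
    """Return (score, phrase) pairs sorted from highest to lowest score."""
    phrases = candidate_phrases(text, stopwords)
    frequency = defaultdict(int)  # how often each word appears
    degree = defaultdict(int)     # co-occurrence count, including the word itself
    for phrase in phrases:
        for word in phrase:
            frequency[word] += 1
            degree[word] += len(phrase)
    # Score each distinct phrase as the sum of degree/frequency over its words.
    scored = {
        " ".join(phrase): sum(degree[w] / frequency[w] for w in phrase)
        for phrase in set(phrases)
    }
    return sorted(((score, phrase) for phrase, score in scored.items()), reverse=True)


ranked = rank_phrases("Keyword extraction is fun. Rapid keyword extraction of text is rapid.")
print(ranked[:2])
```

Words that tend to appear inside longer phrases get a higher degree, so multi-word phrases built from them rise to the top of the ranking.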

Setup

Using pip

pip install rake-nltk

Directly from the repository

git clone https://github.com/csurfer/rake-nltk.git
python rake-nltk/setup.py install

Quick start

from rake_nltk import Rake

# Uses English stopwords from NLTK and all punctuation characters by
# default.
r = Rake()

# Example inputs; substitute your own text or sentences.
text = "Compatibility of systems of linear constraints over the set of natural numbers."
sentences = ["Keyword extraction is useful.", "RAKE ranks candidate phrases."]

# Extraction given the text.
r.extract_keywords_from_text(text)

# Extraction given a list of strings where each string is a sentence.
r.extract_keywords_from_sentences(sentences)

# To get keyword phrases ranked highest to lowest.
r.get_ranked_phrases()

# To get keyword phrases ranked highest to lowest with scores.
r.get_ranked_phrases_with_scores()
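
Once the ranked phrases are available, a small helper can trim them down. The sketch below is hedged: it assumes the scored result is a list of `(score, phrase)` tuples already sorted from highest to lowest, and `top_phrases` is a hypothetical name that is not part of rake-nltk. The `sample` list is hand-written for illustration and is not real library output.

```python
def top_phrases(scored_phrases, limit=3, min_score=0.0):
    """Keep at most `limit` phrases whose score is at least `min_score`.

    Assumes `scored_phrases` is already sorted from highest to lowest score.
    """
    kept = [phrase for score, phrase in scored_phrases if score >= min_score]
    return kept[:limit]


# Hand-written sample in the assumed (score, phrase) shape.
sample = [
    (9.0, "linear diophantine equations"),
    (4.0, "natural numbers"),
    (1.0, "systems"),
]
print(top_phrases(sample, limit=2, min_score=2.0))
```

Filtering by a minimum score before truncating keeps low-scoring single words out of the result even when fewer than `limit` strong phrases exist.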

Debugging Setup

If you see a stopwords error, the NLTK stopwords corpus has not been downloaded yet. You can download it with the command below.

python -c "import nltk; nltk.download('stopwords')"

References

This is a Python implementation of the algorithm described in the paper "Automatic keyword extraction from individual documents" by Stuart Rose, Dave Engel, Nick Cramer and Wendy Cowley.

Why did I choose to implement it myself?

It is extremely fun to implement algorithms by reading papers. It is the digital equivalent of DIY kits.
There are some rather popular implementations out there, in Python (aneesha/RAKE) and Node (waseem18/node-rake), but neither seemed to use the power of NLTK. By making NLTK an integral part of the implementation, I get the flexibility and power to extend it in other creative ways later, if I see fit, without having to implement everything myself.
I plan to use it in my other pet projects to come, and I wanted it to be modular and tunable; this way I have complete control.

Contributing

Bug Reports and Feature Requests

Please use the issue tracker to report bugs or request features.

Development

Pull requests are most welcome.

Buy the developer a cup of coffee!

If you found the utility helpful you can buy me a cup of coffee using
