BALaka-18 / rake_new2

Licence: MIT license
A Python library for keyword extraction from any text using the RAKE (Rapid Automatic Keyword Extraction) algorithm.

Projects that are alternatives of or similar to rake new2

tagify
Tagify produces a set of tags from a given source. Source can be either an HTML page, a Markdown document or a plain text. Supports English, Russian, Chinese, Hindi, Spanish, Arabic, Japanese, German, Hebrew, French and Korean languages.
Stars: ✭ 24 (+4.35%)
Mutual labels:  keywords, keyword-extraction
kwx
BERT, LDA, and TFIDF based keyword extraction in Python
Stars: ✭ 33 (+43.48%)
Mutual labels:  keywords, keyword-extraction
keywordsextract
keywords-extract - Command-line tool to extract keywords from any web page.
Stars: ✭ 50 (+117.39%)
Mutual labels:  keywords, keyword-extraction
html-comment-regex
Regular expression for matching HTML comments
Stars: ✭ 15 (-34.78%)
Mutual labels:  text
text-generator
Golang text generator for generating SEO texts
Stars: ✭ 18 (-21.74%)
Mutual labels:  text
react-native-styled-text
Styled Text for React Native
Stars: ✭ 57 (+147.83%)
Mutual labels:  text
instagram-text-editor
An Instagram like text editor Flutter widget that helps you to change your text style.
Stars: ✭ 66 (+186.96%)
Mutual labels:  text
super rich text
The easiest way to style custom text snippets in flutter
Stars: ✭ 14 (-39.13%)
Mutual labels:  text
aframe-bmfont-text-component
A-Frame component for rendering bitmap fonts.
Stars: ✭ 62 (+169.57%)
Mutual labels:  text
probabilistic nlg
Tensorflow Implementation of Stochastic Wasserstein Autoencoder for Probabilistic Sentence Generation (NAACL 2019).
Stars: ✭ 28 (+21.74%)
Mutual labels:  text
ogrep-rs
Outline grep — search in indentation-structured texts (Rust version)
Stars: ✭ 32 (+39.13%)
Mutual labels:  text
Keyword-Extracter
Problem statement: given a PDF or text document, how can keywords be extracted and ordered by weight using Python?
Stars: ✭ 17 (-26.09%)
Mutual labels:  keyword-extraction
dobbi
An open-source NLP library: fast text cleaning and preprocessing
Stars: ✭ 21 (-8.7%)
Mutual labels:  text
Text2Image
The most useful & easy2use PHP library for converting any text into image
Stars: ✭ 29 (+26.09%)
Mutual labels:  text
link text
Easy to use text widget for Flutter apps, which converts inlined urls into working, clickable links
Stars: ✭ 20 (-13.04%)
Mutual labels:  text
textics
📉 JavaScript Text Statistics that counts lines, words, chars, and spaces.
Stars: ✭ 36 (+56.52%)
Mutual labels:  text
Thirukkural-Tamil-Dataset
திருக்குறள் by திருவள்ளுவர்.
Stars: ✭ 44 (+91.3%)
Mutual labels:  text
regXwild
⏱ Superfast ^Advanced wildcards++? | Unique algorithms that was implemented on native unmanaged C++ but easily accessible in .NET via Conari (with caching of 0x29 opcodes +optimizations) etc.
Stars: ✭ 20 (-13.04%)
Mutual labels:  text
awesome-search-engine-optimization
A curated list of backlink, social signal opportunities, and link building strategies and tactics to help improve search engine results and ranking.
Stars: ✭ 82 (+256.52%)
Mutual labels:  keywords
text-classification-baseline
Pipeline for fast building text classification TF-IDF + LogReg baselines.
Stars: ✭ 55 (+139.13%)
Mutual labels:  text

ABOUT THIS PROJECT

rake_new2

rake_new2 is a Python library for simple and fast keyword extraction from any text. It helps beginners, and anyone unsure which keywords to pick, see which keywords matter most.

HOW IS THIS DIFFERENT FROM OTHER ALGORITHMS? This library returns a weight (score) along with each keyword or key-phrase. The scores help you pick the right key-phrases: choose the ones with the highest weights.
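As a rough illustration of choosing the highest-weighted key-phrases, the sketch below sorts (score, key-phrase) pairs and keeps the top few. The sample pairs are copied from the Quick Start comments in this README; `top_keywords` is a hypothetical helper written for this example and is not part of rake_new2.

```python
# Sample (score, key-phrase) pairs, copied from the Quick Start comments.
scored = {(1.0, "taste"), (1.0, "good"), (4.0, "red apples")}


def top_keywords(pairs, n=1):
    """Return the n highest-scoring key-phrases (hypothetical helper)."""
    # Sort by score, highest first; break ties alphabetically so the
    # result does not depend on set iteration order.
    ranked = sorted(pairs, key=lambda pair: (-pair[0], pair[1]))
    return [phrase for _, phrase in ranked[:n]]


print(top_keywords(scored, n=2))  # ['red apples', 'good']
```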

New in version 1.0.5

  1. Handles repeated keywords/key-phrases.

  2. Handles consecutive punctuation marks.

  3. Handles HTML tags in text: the user can choose whether to keep HTML tags as keywords.
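The README does not describe how these three cases are handled internally. The sketch below shows one plausible way to do this kind of preprocessing with the standard library only. All three function names are hypothetical and are not taken from rake_new2.

```python
import re


def strip_html_tags(text):
    """Replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def split_on_punctuation(text):
    """Split text into chunks, treating a run of punctuation as one boundary."""
    chunks = re.split(r"[^\w\s]+", text)
    # Drop the empty or whitespace-only pieces left around the boundaries.
    return [chunk.strip() for chunk in chunks if chunk.strip()]


def dedupe(phrases):
    """Drop repeated phrases while keeping first-seen order."""
    return list(dict.fromkeys(phrases))


print(split_on_punctuation(strip_html_tags("<h1> Hello world !</h1>")))
print(dedupe(split_on_punctuation("Wait... what?! Wait")))
```

Because a whole run of punctuation such as `...` or `?!` is matched at once, it produces a single split point instead of several empty chunks.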

Installation

Use the package manager pip to install rake_new2.

pip install rake_new2

Quick Start

from rake_new2 import Rake

text = "Red apples are good in taste."
text2 = "<h1> Hello world !</h1>"
rk, rk_new1, rk_new2 = Rake(), Rake(keep_html_tags=True), Rake(keep_html_tags=False)

# Case 1 : Extracting keywords from plain text
# Feed the raw text to the extractor first, then query the results
rk.get_keywords_from_raw_text(text)
kw_s = rk.get_keywords_with_scores()
# Returns keywords with degree scores : {(1.0, 'taste'), (1.0, 'good'), (4.0, 'red apples')}
kw = rk.get_ranked_keywords()
# Returns keywords only : ['red apples', 'taste', 'good']
f = rk.get_word_freq()
# Returns word frequencies as a Counter object : {'red': 1, 'apples': 1, 'good': 1, 'taste': 1}
deg = rk.get_kw_degree()
# Returns word degrees as defaultdict object : {'red': 2.0, 'apples': 2.0, 'good': 1.0, 'taste': 1.0}

# Case 2 : Sample case for testing the 'keep_html_tags' parameter. Default = False
print("\nORIGINAL TEXT : {}".format(text2))
# Sub Case 1 : Keeping the HTML tags
rk_new1.get_keywords_from_raw_text(text2)
kw_s1 = rk_new1.get_keywords_with_scores()
kw1 = rk_new1.get_ranked_keywords()
print("Keeping the tags : ",kw1)

# Sub Case 2 : Eliminating the HTML tags
rk_new2.get_keywords_from_raw_text(text2)
kw_s2 = rk_new2.get_keywords_with_scores()
kw2 = rk_new2.get_ranked_keywords()
print("Eliminating the tags : ",kw2)

'''OUTPUT >>
ORIGINAL TEXT : <h1> Hello world !</h1>
Keeping the tags :  {'h1', 'hello'}
Eliminating the tags :  {'hello world'}
'''
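The numbers in Case 1 are consistent with the usual RAKE degree scoring: text is split into candidate phrases at stopwords and punctuation, each word's degree is the total length of the phrases it appears in, and a phrase's score is the sum of its word degrees. The sketch below reproduces those numbers without using the library. It is an illustration under stated assumptions, not rake_new2's actual code; the tiny `STOPWORDS` set and both function names are stand-ins invented for this example.

```python
import re
from collections import Counter, defaultdict

# Tiny stand-in stopword list (assumption). The Debugging section indicates
# that the library itself relies on the NLTK stopwords corpus.
STOPWORDS = {"a", "an", "are", "in", "is", "of", "the"}


def candidate_phrases(text):
    """Split text into word lists, breaking at stopwords and punctuation."""
    phrases, current = [], []
    for token in re.findall(r"[a-z0-9]+|[^a-z0-9\s]", text.lower()):
        if token in STOPWORDS or not token.isalnum():
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(token)
    if current:
        phrases.append(current)
    return phrases


def score_phrases(text):
    """Return (word frequencies, word degrees, phrase scores)."""
    phrases = candidate_phrases(text)
    freq = Counter(word for phrase in phrases for word in phrase)
    degree = defaultdict(float)
    for phrase in phrases:
        for word in phrase:
            # Each word co-occurs with every word in its phrase, itself included.
            degree[word] += len(phrase)
    scores = {" ".join(p): sum(degree[w] for w in p) for p in phrases}
    return freq, dict(degree), scores


freq, degree, scores = score_phrases("Red apples are good in taste.")
print(scores)  # {'red apples': 4.0, 'good': 1.0, 'taste': 1.0}
```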

Debugging

You might come across a stopwords error.

It means the stopwords corpus from NLTK has not been downloaded yet.

To download it, run the command below.

python -c "import nltk; nltk.download('stopwords')"
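The same check can also be done from inside a script, so the corpus is downloaded only when it is missing. This is a minimal sketch; `ensure_stopwords` is a hypothetical helper name, while `nltk.data.find` and `nltk.download` are standard NLTK calls.

```python
def ensure_stopwords():
    """Download the NLTK stopwords corpus only if it is not already present."""
    # Imported inside the function so this snippet can be loaded even
    # before NLTK is installed.
    import nltk

    try:
        nltk.data.find("corpora/stopwords")
    except LookupError:
        nltk.download("stopwords")
```

Calling `ensure_stopwords()` once before creating a `Rake` object should avoid the error on a fresh machine, provided a network connection is available for the first download.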

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT

Contributors

Student Name             GitHub ID      Merged PR No.  Open source programme name  If DWOC, level of PR
Sabarish Rajamohan       sabarish98     #16            Hacktoberfest               --
Soham Kar                2bit-hack      #20            Hacktoberfest               --
Jawen Voon               jawsvk         #26            Hacktoberfest               --
Ananthakrishnan Nair RS  akrish4        #47            DWOC                        Level-1
Tushar Nankani           tusharnankani  #43            DWOC                        Level-3