All Projects → dpressel → textrank-js

dpressel / textrank-js

Licence: other
TextRank algorithm implementation in Javascript

Programming Languages

javascript
184084 projects - #8 most used programming language

Projects that are alternatives of or similar to textrank-js

KeywordExtraction
Implementation of algorithm in keyword extraction,including TextRank,TF-IDF and the combination of both
Stars: ✭ 95 (+171.43%)
Mutual labels:  textrank
Textrank4zh
🌳从中文文本中自动提取关键词和摘要
Stars: ✭ 2,518 (+7094.29%)
Mutual labels:  textrank
Pytextrank
Python implementation of TextRank for phrase extraction and summarization of text documents
Stars: ✭ 1,675 (+4685.71%)
Mutual labels:  textrank
TextRank-node
No description or website provided.
Stars: ✭ 21 (-40%)
Mutual labels:  textrank
FG
基于Nonebot的QQ群机器人🤖️,特色功能是利用机器学习算法,基于每日聊天记录生成每日总结。可在酷Q/Mirai平台上运行
Stars: ✭ 74 (+111.43%)
Mutual labels:  textrank
TextSummarizer
TextRank implementation for C#
Stars: ✭ 29 (-17.14%)
Mutual labels:  textrank
TextRankPlus
基于深度学习的中文NLP工具
Stars: ✭ 36 (+2.86%)
Mutual labels:  textrank
weibo-summary
微博自动摘要系统 Chinese Microblog Automatic Summary System
Stars: ✭ 28 (-20%)
Mutual labels:  textrank
Keywords-Abstract-TFIDF-TextRank4ZH
使用tf-idf, TextRank4ZH等不同方式从中文文本中提取关键字,从中文文本中提取摘要和关键词
Stars: ✭ 26 (-25.71%)
Mutual labels:  textrank
NLP-paper
🎨 🎨NLP 自然语言处理教程 🎨🎨 https://dataxujing.github.io/NLP-paper/
Stars: ✭ 23 (-34.29%)
Mutual labels:  textrank

textrank-js

TextRank is an algorithm for Text Summarization, by Rada Mihalcea & Paul Tarau. The code here is based on their paper "TextRank: Bringing Order into Texts". I've noticed that there are many implementations out there, but this one is intended to demonstrate the algorithm without any additional baggage. Also, unlike many other implementations I have seen, this has no algorithm dependencies and could also work in the browser. I wanted to show how elegant, simple and clean the algorithm is, so I kept it short -- about 130 lines of Javascript (ES5). It currently depends only on lodash ('_'), a standard JS library used in many (most?) projects, for a few choice zingey one-liners.

The algorithm itself can extend to any type of graph, as they note in their paper, but I have provided the two types of graphs explored in the paper: keyword extraction an undirected graph derived from collocation, and a sentence extraction graph using the similarity weighting (as described in the paper) on the edges. There is a function for building a graph of each type, and once the graph has been built, the textRank function performs the algorithm on the generated graph.

Note this code only implements the TextRank algorithm itself, the sentences must be properly formatted upfront. I have provided example tokenization for both tasks in the tests directory, both derived from tokenizing the Wikipedia entry for "Automatic summarization", both minimally processed using a custom (very minimal) tokenizer, and OpenNLP's default models for sentence splitting and POS, and converted to JSON. As long as you get the format right that this is expecting, you should be able to use whatever library you want to preprocess. The keyword extraction builder needs the format to include POS tags since it filters the content while it is building its adjacencies. The sentence extraction builder does not require POS, but requires pre-split sentences.

The "tests" are not currently testing anything, but serve as demonstration code for how to run the software. Note that textRank() has a default number of iterations -- it doesnt try and test for convergence. This is just to keep it simple, it would be simple to modify to test this instead, but for now you can pass in any number you want if that default isnt suitable (see test examples).

Build using Grunt:

$ npm install
$ grunt

Here's what the output looks like for the first 5 extracted sentences when performing the sentence extraction task:

automatic summarization is the process of reducing a text document with a computer program in order to create a summary that retains the most important points of the original document

two particular types of summarization often addressed in the literature are keyphrase extraction where the goal is to select individual words or phrases to tag a document and document summarization where the goal is to select whole sentences to create a short paragraph summary

in general abstraction can condense a text more strongly than extraction but the programs that can do this are harder to develop as they require the use of natural language generation technology which itself is a growing field

while some work has been done in abstractive summarization creating an abstract synopsis like that of a human the majority of summarization systems are extractive selecting a subset of sentences to place in a summary

apart from fully automated summarizers fas there are systems that aid users with the task of summarization mahs = machine aided human summarization for example by highlighting candidate passages to be included in the summary and there are systems that depend on post-processing by a human hams = human aided machine summarization

You are welcome to use this code for whatever nefarious purposes, but please attribute it to this implementation if you do.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].