Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer.

Stars: ✭ 1,132 (+2878.95%)

Mutual labels: text-summarization, gensim

Kr Wordrank

A library that automatically extracts words/keywords from Korean text using unsupervised learning.

Stars: ✭ 182 (+378.95%)

Mutual labels: text-summarization

TextSumma

Reimplementation of "Neural Summarization by Extracting Sentences and Words"

Stars: ✭ 16 (-57.89%)

Mutual labels: text-summarization

Rouge 2.0

ROUGE automatic summarization evaluation toolkit. Support for ROUGE-[N, L, S, SU], stemming and stopwords in different languages, unicode text evaluation, CSV output.

Stars: ✭ 167 (+339.47%)

Mutual labels: text-summarization

Text Summarizer Pytorch

Pytorch implementation of "A Deep Reinforced Model for Abstractive Summarization" paper and pointer generator network

Stars: ✭ 203 (+434.21%)

Mutual labels: text-summarization

Financial-News-Analysis

China Merchants Bank FinTech competition, semifinal round: financial news analysis

Stars: ✭ 17 (-55.26%)

Mutual labels: gensim

Nlg Yongzhuo

Chinese text generation (NLG) toolkit for text summarization, with corpus data. Extractive text summarization with Lead3, keyword, TextRank, TextTeaser, word significance, LDA, LSI, and NMF (graph, feature, topic model, summarization tool or toolkit).

Stars: ✭ 175 (+360.53%)

Mutual labels: text-summarization

NLP-Extractive-NEWS-summarization-using-MMR

A simple python implementation of the Maximal Marginal Relevance (MMR) baseline system for text summarization.

Stars: ✭ 59 (+55.26%)

Mutual labels: text-summarization

Pythonrouge

Python wrapper for evaluating summarization quality by ROUGE package

Stars: ✭ 155 (+307.89%)

Mutual labels: text-summarization

Word2VecAndTsne

Scripts demo-ing how to train a Word2Vec model and reduce its vector space

Stars: ✭ 45 (+18.42%)

Mutual labels: gensim

Textsum

Preparing a dataset for TensorFlow text summarization (TextSum) model.

Stars: ✭ 140 (+268.42%)

Mutual labels: text-summarization

word-embeddings-from-scratch

Creating word embeddings from scratch and visualizing them on TensorBoard. Using trained embeddings in Keras.

Stars: ✭ 22 (-42.11%)

Mutual labels: gensim


For more details, refer to the paper at the following link. If you find this repository helpful, please cite the paper.

Persian-Summarization

Statistical and semantic text summarizer for the Persian language

This project summarizes text in the Persian language. It uses the text summarization module of the Gensim Python library, which implements the TextRank algorithm. TextRank treats each sentence as a node in a graph and returns the nodes most strongly related to the other nodes (sentences). In other words, it picks the most important sentences through statistical calculation alone and ignores the semantics of the sentences. For instance, if the same meaning is expressed with different words, the algorithm does not recognize the match and treats the sentences as different, when in reality they are not.

To solve this problem and bring semantics into the result, I trained a doc2vec model with doc2vec.py in Gensim, using the Hamshahri corpus as the training set. The doc2vec model is included in the repository (my_model_sents_from_res2.doc2vec). I use this model to compute the similarity of two sentences and weight the graph edges with it (instead of the tf-idf-based weighting used in Gensim), then return the result of the TextRank algorithm.
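The idea of ranking sentences over a similarity-weighted graph can be sketched as follows. This is a minimal illustration, not the code in this repository: the function names (`cosine`, `textrank`, `top_sentences`), the damping factor, and the iteration count are assumptions, and the small hand-written vectors stand in for the vectors a doc2vec model would produce for each sentence.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def textrank(vectors, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank over a sentence-similarity graph."""
    n = len(vectors)
    if n == 0:
        return []
    # Edge weight = similarity of the two sentence vectors (negatives clipped to 0).
    weights = [
        [max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each node receives a share of its neighbours' scores, proportional to edge weight.
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def top_sentences(sentences, vectors, k):
    """Return the k highest-scoring sentences, kept in their original order."""
    scores = textrank(vectors)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

With real data, each vector would come from the trained doc2vec model rather than being written by hand. A sentence that is semantically close to many others accumulates a high score even when it shares few exact words with them.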

Some modifications were made to the Gensim library to make it compatible with the Persian language. I used the Hazm library for text normalization, sentence tokenization, and POS tagging.

Python package versions you need to install

pip install six==1.11.0

pip install gensim==3.1.0

pip install numpy==1.11.3

pip install scipy==1.0.0

pip install hazm==0.5.2

How to start

Copy the summarization file from this repository and use it to replace the one in your installed Gensim library. play.py shows an example of text summarization using the call below:

summarize(text, ratio, word_count)

By default, ratio is 0.2 and word_count is None. ratio is the fraction of the input text to keep in the summary, and word_count specifies the minimum number of words you want in the resulting summary.
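The effect of the two parameters can be illustrated with a small sketch. `select_sentences` is a hypothetical helper written for this explanation, not Gensim's implementation. It assumes the sentences are already ranked from most to least important, that word_count acts as a lower bound as described above, and that ratio is ignored whenever word_count is given.

```python
def select_sentences(ranked, ratio=0.2, word_count=None):
    """Pick sentences from a list ordered from most to least important.

    Hypothetical illustration of the two parameters, not Gensim's own code.
    """
    if word_count is None:
        # Keep the top fraction of the ranked sentences.
        return ranked[: int(len(ranked) * ratio)]
    chosen, total = [], 0
    for sentence in ranked:
        # Stop once the summary holds at least word_count words.
        if total >= word_count:
            break
        chosen.append(sentence)
        total += len(sentence.split())
    return chosen
```

For ten ranked sentences, the default ratio of 0.2 keeps the top two. Passing word_count=7 with three-word sentences keeps the top three, since two sentences would fall short of seven words.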

You can train your own doc2vec model and load it in your project instead of the file included here. The same applies to the POS tagger model in the resource folder. The stopwords in the STOPWORD file were obtained from persian-stopwords.
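Training and loading a replacement model might look like the sketch below. It is not the repository's doc2vec.py. The helper names, the hyperparameter values, and the choice of one integer tag per sentence are all assumptions, and the keyword names `size` and `iter` follow the pinned Gensim 3.1.0 (Gensim 4 renamed them to `vector_size` and `epochs`). The input is assumed to be a list of already tokenized sentences, for example produced with Hazm.

```python
def build_tagged_corpus(sentences):
    """Pair each tokenized sentence with a unique integer tag."""
    return [(tokens, [index]) for index, tokens in enumerate(sentences)]


def train_doc2vec(sentences, model_path):
    """Train a doc2vec model on tokenized sentences and save it to model_path."""
    # Imported here so build_tagged_corpus works even without Gensim installed.
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    documents = [
        TaggedDocument(words=tokens, tags=tags)
        for tokens, tags in build_tagged_corpus(sentences)
    ]
    # Illustrative hyperparameters; `size` and `iter` are the Gensim 3.1.0 names.
    model = Doc2Vec(documents, size=100, window=5, min_count=2, workers=4, iter=20)
    model.save(model_path)
    return model


def load_doc2vec(model_path):
    """Load a previously saved doc2vec model."""
    from gensim.models.doc2vec import Doc2Vec

    return Doc2Vec.load(model_path)
```

A loaded model can then produce a vector for a new tokenized sentence with `model.infer_vector(tokens)`, which is the kind of vector the sentence-similarity weighting relies on.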

Thanks

I developed this project at Irsapardaz Pasargad. Thanks to Mr. Amin Mozhgani for his selfless help during this project.

Contact

[email protected]


minasmz / Persian-Summarization
