
minasmz / Persian-Summarization

Licence: other
Statistical and Semantical Text Summarizer in Persian Language

Programming Languages

python

Projects that are alternatives of or similar to Persian-Summarization

TextRank-node
No description or website provided.
Stars: ✭ 21 (-44.74%)
Mutual labels:  text-summarization, textrank-algorithm
Scripts-for-extractive-summarization
Scripts for an upcoming blog "Extractive vs. Abstractive Summarization" for RaRe Technologies.
Stars: ✭ 12 (-68.42%)
Mutual labels:  text-summarization, gensim
PersianNER
Named-Entity Recognition in Persian Language
Stars: ✭ 48 (+26.32%)
Mutual labels:  persian-language, persian-nlp
perstem
Persian stemmer and morphological analyzer
Stars: ✭ 18 (-52.63%)
Mutual labels:  persian-language, persian-nlp
Product-Categorization-NLP
Multi-Class Text Classification for products based on their description with Machine Learning algorithms and Neural Networks (MLP, CNN, Distilbert).
Stars: ✭ 30 (-21.05%)
Mutual labels:  gensim, doc2vec-model
PersianStemmer-Python
PersianStemmer-Python
Stars: ✭ 43 (+13.16%)
Mutual labels:  persian-language, persian-nlp
PersianQA
Persian (Farsi) Question Answering Dataset (+ Models)
Stars: ✭ 114 (+200%)
Mutual labels:  persian-language, persian-nlp
TextSummarizer
TextRank implementation for C#
Stars: ✭ 29 (-23.68%)
Mutual labels:  text-summarization, textrank-algorithm
Text Analytics With Python
Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer.
Stars: ✭ 1,132 (+2878.95%)
Mutual labels:  text-summarization, gensim
Kr Wordrank
A library that automatically extracts words/keywords from Korean text using unsupervised learning.
Stars: ✭ 182 (+378.95%)
Mutual labels:  text-summarization
TextSumma
reimplementing Neural Summarization by Extracting Sentences and Words
Stars: ✭ 16 (-57.89%)
Mutual labels:  text-summarization
Rouge 2.0
ROUGE automatic summarization evaluation toolkit. Support for ROUGE-[N, L, S, SU], stemming and stopwords in different languages, unicode text evaluation, CSV output.
Stars: ✭ 167 (+339.47%)
Mutual labels:  text-summarization
Text Summarizer Pytorch
Pytorch implementation of "A Deep Reinforced Model for Abstractive Summarization" paper and pointer generator network
Stars: ✭ 203 (+434.21%)
Mutual labels:  text-summarization
Financial-News-Analysis
China Merchants Bank FinTech competition, semifinal round: financial news analysis
Stars: ✭ 17 (-55.26%)
Mutual labels:  gensim
Nlg Yongzhuo
Text summarization toolkit for Chinese natural language generation (NLG), with corpus data. Extractive summarization via Lead3, keyword, TextRank, TextTeaser, word significance, LDA, LSI, and NMF (graph, feature, topic model; summarization tool or toolkit).
Stars: ✭ 175 (+360.53%)
Mutual labels:  text-summarization
NLP-Extractive-NEWS-summarization-using-MMR
A simple python implementation of the Maximal Marginal Relevance (MMR) baseline system for text summarization.
Stars: ✭ 59 (+55.26%)
Mutual labels:  text-summarization
Pythonrouge
Python wrapper for evaluating summarization quality by ROUGE package
Stars: ✭ 155 (+307.89%)
Mutual labels:  text-summarization
Word2VecAndTsne
Scripts demo-ing how to train a Word2Vec model and reduce its vector space
Stars: ✭ 45 (+18.42%)
Mutual labels:  gensim
Textsum
Preparing a dataset for TensorFlow text summarization (TextSum) model.
Stars: ✭ 140 (+268.42%)
Mutual labels:  text-summarization
word-embeddings-from-scratch
Creating word embeddings from scratch and visualize them on TensorBoard. Using trained embeddings in Keras.
Stars: ✭ 22 (-42.11%)
Mutual labels:  gensim

For more details, you can refer to the paper in the following link. If you find this repository helpful, please cite the paper.

Persian-Summarization

Statistical and semantical text summarizer in Persian language

This project summarizes text in the Persian language. It builds on the summarization module of the Gensim Python library, which implements the TextRank algorithm. TextRank treats each sentence as a node in a graph and returns the nodes that are most strongly related to the other nodes (sentences). In other words, it picks the most important sentences through purely statistical calculation and ignores their semantics. For instance, if two sentences express the same meaning with different words, the algorithm does not recognize this and treats them as unrelated.

To solve this problem and bring semantics into the result, I trained a doc2vec model with Gensim's doc2vec.py, using the Hamshahri corpus as the training set. The model is included in the repository (my_model_sents_from_res2.doc2vec). I use it to compute the similarity of two sentences and weight the graph edges with that similarity (instead of the tf-idf-based weighting that Gensim uses), and then return the result of the TextRank algorithm.
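The idea of ranking sentences over a similarity-weighted graph can be sketched in a few lines. This is only an illustration, not the code in this repository: the function names, the damping factor, the iteration count, and the toy 2-D vectors (standing in for doc2vec sentence vectors) are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def textrank(vectors, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank over a similarity graph.

    Hypothetical helper: edge weights are cosine similarities between
    sentence vectors, clipped at zero so that weights stay non-negative.
    """
    n = len(vectors)
    weights = [[max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight.
            incoming = sum(weights[j][i] / out_sum[j] * scores[j]
                           for j in range(n) if out_sum[j] > 0)
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores


# Toy vectors (assumed): sentences 0 and 1 are close in meaning, 2 is not.
toy_vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0]]
scores = textrank(toy_vectors)
ranking = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
print(ranking)
```

With semantic vectors, two sentences that share no words can still be strongly connected, which is the property the tf-idf-based weighting lacks.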

Some modifications were made to the Gensim library to make it compatible with the Persian language. I used the Hazm library for text normalization, sentence tokenization, and POS tagging.

Python package versions you need to install on your device

pip install six==1.11.0

pip install gensim==3.1.0

pip install numpy==1.11.3

pip install scipy==1.0.0

pip install hazm==0.5.2

How to start

Copy the summarization file from this repository over the one in the Gensim library. play.py shows an example of text summarization using the call below:

summarize(text, ratio, word_count)

By default, ratio is 0.2 and word_count is None. ratio is the fraction of the input text to keep in the summary, and word_count sets the approximate number of words you want in the resulting summary. In stock Gensim, ratio is ignored when word_count is given.
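How the two parameters might pick sentences from a ranked list can be sketched as follows. This is a simplified model of how the stock Gensim 3.x summarizer appears to behave, not the code in this repository. The helper name select_sentences and the toy data are hypothetical, and the modified summarization file may differ.

```python
def select_sentences(ranked, ratio=0.2, word_count=None):
    """Pick summary sentences from a ranked list.

    ranked: list of (position, sentence) pairs, most important first.
    Hypothetical helper illustrating the ratio / word_count semantics.
    """
    if word_count is None:
        # Keep the top fraction of sentences.
        chosen = ranked[:int(len(ranked) * ratio)]
    else:
        chosen, length = [], 0
        for position, sentence in ranked:
            words = len(sentence.split())
            # Stop once another sentence would move us further from the target.
            if abs(word_count - length - words) > abs(word_count - length):
                break
            chosen.append((position, sentence))
            length += words
    # Return the chosen sentences in their original document order.
    return [sentence for _, sentence in sorted(chosen)]


# Toy ranked list (assumed data): (position in text, sentence).
ranked = [(2, "c c c"), (0, "a a"), (4, "e e e e"), (1, "b"), (3, "d d")]
print(select_sentences(ranked, ratio=0.4))
print(select_sentences(ranked, word_count=3))
```

Under this model word_count acts as a target rather than a strict minimum, so the summary can end up slightly shorter or longer than requested.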

You can train your own doc2vec model and load it in your project instead of the file included in the project. The same applies to the POS tagger model in the resource folder. The stopwords in the STOPWORD file were obtained from persian-stopwords.

Thanks

I developed this project at Irsapardaz Pasargad. Thanks to Mr. Amin Mozhgani for his selfless help during this project.

Contact

[email protected]
