
yongzhuo / Nlg Yongzhuo

License: MIT

A Chinese text generation (NLG) toolkit for text summarization, with corpus data. It provides extractive summarization via Lead3, keyword, TextRank, TextTeaser, word significance, LDA, LSI, and NMF (graph-based, feature-based, and topic-model methods; a summarization tool or toolkit).

Programming Languages

python


Labels

text-summarization nlg lda


nlg-yongzhuo

Install

pip install nlg-yongzhuo

API (combined call that integrates several algorithms)

from nlg_yongzhuo import *

doc = """PageRank算法简介。" \
              "是上世纪90年代末提出的一种计算网页权重的算法! " \
              "当时，互联网技术突飞猛进，各种网页网站爆炸式增长。 " \
              "业界急需一种相对比较准确的网页重要性计算方法。 " \
              "是人们能够从海量互联网世界中找出自己需要的信息。 " \
              "百度百科如是介绍他的思想:PageRank通过网络浩瀚的超链接关系来确定一个页面的等级。 " \
              "Google把从A页面到B页面的链接解释为A页面给B页面投票。 " \
              "Google根据投票来源甚至来源的来源，即链接到A页面的页面。 " \
              "和投票目标的等级来决定新的等级。简单的说， " \
              "一个高等级的页面可以使其他低等级页面的等级提升。 " \
              "具体说来就是，PageRank有两个基本思想，也可以说是假设。 " \
              "即数量假设：一个网页被越多的其他页面链接，就越重）。 " \
              "质量假设：一个网页越是被高质量的网页链接，就越重要。 " \
              "总的来说就是一句话，从全局角度考虑，获取重要的信。 """.replace(" ", "").replace('"', '')

# fs accepts one or more of: text_pronouns, text_teaser, mmr, text_rank, lead3, lda, lsi, nmf
res_score = text_summarize(doc, fs=[text_pronouns, text_teaser, mmr, text_rank, lead3, lda, lsi, nmf])
for rs in res_score:
    print(rs)
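
How text_summarize merges the results of the listed algorithms is not documented here. As a rough illustration only, the sketch below shows one plausible way to combine several rankings: min-max normalize each algorithm's scores, then sum them per sentence. The function name combine_scores and the assumption that each algorithm yields (score, sentence) pairs are hypothetical and are not taken from the package.

```python
from collections import defaultdict


def combine_scores(results_per_algorithm):
    """Merge several lists of (score, sentence) pairs into one ranking.

    Hypothetical helper: each algorithm's scores are min-max normalized
    to [0, 1] so that no algorithm dominates because of its score scale.
    """
    totals = defaultdict(float)
    for results in results_per_algorithm:
        if not results:
            continue
        scores = [score for score, _ in results]
        low, high = min(scores), max(scores)
        span = high - low
        for score, sentence in results:
            # If every score is equal, give each sentence the same weight.
            normalized = (score - low) / span if span else 1.0
            totals[sentence] += normalized
    # Highest combined score first; ties are broken by sentence text.
    return sorted(((total, s) for s, total in totals.items()),
                  key=lambda pair: (-pair[0], pair[1]))


# Made-up outputs from two algorithms over the same three sentences.
ranking_a = [(0.9, "s1"), (0.5, "s2"), (0.1, "s3")]
ranking_b = [(10.0, "s2"), (4.0, "s1"), (2.0, "s3")]
combined = combine_scores([ranking_a, ranking_b])
for total, sentence in combined:
    print(round(total, 3), sentence)
```

Normalizing before summing matters because topic-model scores and graph scores usually live on different scales; other merge rules (rank voting, weighted sums) would be equally reasonable.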

Usage (see the /test/ directory for details)

# feature_base
from nlg_yongzhuo import word_significance
from nlg_yongzhuo import text_pronouns
from nlg_yongzhuo import text_teaser
from nlg_yongzhuo import mmr
# graph_base
from nlg_yongzhuo import text_rank
# topic_base
from nlg_yongzhuo import lda
from nlg_yongzhuo import lsi
from nlg_yongzhuo import nmf
# nous_base
from nlg_yongzhuo import lead3


docs = "和投票目标的等级来决定新的等级.简单的说。" \
          "是上世纪90年代末提出的一种计算网页权重的算法! " \
          "当时，互联网技术突飞猛进，各种网页网站爆炸式增长。" \
          "业界急需一种相对比较准确的网页重要性计算方法。" \
          "是人们能够从海量互联网世界中找出自己需要的信息。" \
          "百度百科如是介绍他的思想:PageRank通过网络浩瀚的超链接关系来确定一个页面的等级。" \
          "Google把从A页面到B页面的链接解释为A页面给B页面投票。" \
          "Google根据投票来源甚至来源的来源，即链接到A页面的页面。" \
          "一个高等级的页面可以使其他低等级页面的等级提升。" \
          "具体说来就是，PageRank有两个基本思想，也可以说是假设。" \
          "即数量假设：一个网页被越多的其他页面链接，就越重）。" \
          "质量假设：一个网页越是被高质量的网页链接，就越重要。" \
          "总的来说就是一句话，从全局角度考虑，获取重要的信。"
# 1. word_significance
sums_word_significance = word_significance.summarize(docs, num=6)
print("word_significance:")
for sum_ in sums_word_significance:
    print(sum_)

# 2. text_pronouns
sums_text_pronouns = text_pronouns.summarize(docs, num=6)
print("text_pronouns:")
for sum_ in sums_text_pronouns:
    print(sum_)

# 3. text_teaser
sums_text_teaser = text_teaser.summarize(docs, num=6)
print("text_teaser:")
for sum_ in sums_text_teaser:
    print(sum_)
# 4. mmr
sums_mmr = mmr.summarize(docs, num=6)
print("mmr:")
for sum_ in sums_mmr:
    print(sum_)
# 5. text_rank
sums_text_rank = text_rank.summarize(docs, num=6)
print("text_rank:")
for sum_ in sums_text_rank:
    print(sum_)
# 6. lda
sums_lda = lda.summarize(docs, num=6)
print("lda:")
for sum_ in sums_lda:
    print(sum_)
# 7. lsi
sums_lsi = lsi.summarize(docs, num=6)
print("lsi:")
for sum_ in sums_lsi:
    print(sum_)
# 8. nmf
sums_nmf = nmf.summarize(docs, num=6)
print("nmf:")
for sum_ in sums_nmf:
    print(sum_)
# 9. lead3
sums_lead3 = lead3.summarize(docs, num=6)
print("lead3:")
for sum_ in sums_lead3:
    print(sum_)
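
Every module above exposes the same summarize(docs, num=...) call. To make that shape concrete, here is a minimal, standard-library sketch of the simplest method in the list, a lead-N baseline that returns the first sentences of the document. The names split_sentences and lead_n, and the choice of sentence-ending punctuation, are assumptions for illustration; this is not the package's own lead3 code.

```python
import re

# A sentence is a run of non-terminator characters followed by any
# terminators (full-width Chinese or ASCII "!" and "?").
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]*")


def split_sentences(text):
    """Split text into sentences, keeping the ending punctuation."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def lead_n(text, num=3):
    """Return the first `num` sentences of `text` in document order."""
    return split_sentences(text)[:num]


sample = ("PageRank算法简介。"
          "当时，互联网技术突飞猛进！"
          "业界急需一种计算方法。"
          "这是第四句。")
summary = lead_n(sample, num=3)
for sentence in summary:
    print(sentence)
```

The ASCII period is deliberately not treated as a terminator here so that decimals and abbreviations are not split; a real splitter would need a more careful rule.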

nlg_yongzhuo

- text_summary
- text_augnment(todo)
- text_generation(todo)
- text_translation(todo)

Run (using text_teaser as an example)

- 1. Go into the module's directory and run the file directly, for example: nlg_yongzhuo/text_summary/feature_base/
- 2. Run: python text_teaser.py

nlg_yongzhuo/data

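LCSTS: Sina Weibo short-text summarization dataset from Harbin Institute of Technology
chinese_abstractive_corpus: automatic summarization corpus of education news
NLPCC 2017 Task 3: Single Document Summarization
"Shence Cup" (神策杯) 2018 University Algorithm Masters Competition: entertainment news and other data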

Models and papers

pagerank: The PageRank Citation Ranking: Bringing Order to the Web. 1999
textrank: TextRank: Bringing Order into Texts
textteaser: Automatic Text Summarization for Indonesian Language Using TextTeaser
significance: The Automatic Creation of Literature Abstracts
LSI: Text Summarization Using Latent Semantic Analysis
LDA: Latent Dirichlet Allocation
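
To connect the PageRank and TextRank papers listed above to extractive summarization, here is a small standard-library sketch: sentences are graph nodes, edge weights come from the overlap similarity described in the TextRank paper, and scores are obtained by PageRank power iteration. It assumes single characters as tokens instead of proper Chinese word segmentation, and the names tokens, similarity, and text_rank are hypothetical; it is not the package's text_rank module.

```python
import math
import re


def split_sentences(text):
    """Split after Chinese or ASCII sentence-ending punctuation."""
    return [s.strip() for s in re.findall(r"[^。！？!?]+[。！？!?]*", text)
            if s.strip()]


def tokens(sentence):
    """Assumption: each word character is one token (no segmentation)."""
    return set(re.findall(r"\w", sentence))


def similarity(a, b):
    """TextRank overlap: shared tokens / (log|a| + log|b|)."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(a & b) / denom if denom > 0 else 0.0


def text_rank(text, num=3, damping=0.85, iterations=50):
    """Return the `num` highest-scoring (score, sentence) pairs."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n == 0:
        return []
    sets = [tokens(s) for s in sentences]
    # Symmetric weight matrix with no self-loops.
    weights = [[similarity(sets[i], sets[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives rank from its neighbours in proportion
        # to edge weight; isolated sentences keep only the base term.
        scores = [
            (1 - damping) / n + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n) if out_sums[j] > 0)
            for i in range(n)
        ]
    ranked = sorted(zip(scores, sentences), key=lambda pair: -pair[0])
    return ranked[:num]


text = ("网页链接决定页面等级。页面等级影响网页。"
        "链接来自网页页面。今天天气很好。")
top = text_rank(text, num=3)
for score, sentence in top:
    print(round(score, 4), sentence)
```

The last sentence shares no characters with the others, so it receives only the base (1 - damping) / n score and falls out of the top three; with real word segmentation the edge weights would be less noisy.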

References / Acknowledgements

Text summarization survey: https://github.com/icoxfog417/awesome-text-summarization
textteaser: https://github.com/IndigoResearch/textteaser
NaiveSumm: https://github.com/amsqr/NaiveSumm
ML topic models: https://github.com/ljpzzz/machinelearning

Hope this helps!


Stars: ✭ 175
