
mattzheng / Ltpextraction

A simple review opinion extraction module based on LTP


LtpExtraction

A simple review opinion extraction module based on LTP

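Most unsupervised information extraction work uses LTP from Harbin Institute of Technology (HIT) as its underlying framework. Many people have built on LTP, and I personally group their attempts into two kinds:

Event extraction (triples)
Opinion extraction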

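The "Language Cloud" (语言云) is built on the Language Technology Platform (LTP) developed by the HIT Research Center for Social Computing and Information Retrieval, and offers users an efficient and accurate cloud service for Chinese natural language processing. pyltp is the Python wrapper for LTP and provides word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, and semantic role labeling.

Technical documentation: http://pyltp.readthedocs.io/zh_CN/latest/api.html#id15
Introduction: https://www.ltp-cloud.com/intro/#introduction
Introduction: http://ltp.readthedocs.io/zh_CN/latest/appendix.html#id5

You first need to load the pretrained models they provide (download link).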

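When initializing pyltp, watch memory usage carefully: initializing any submodule (Postagger(), NamedEntityRecognizer(), and so on) takes memory, and if the models are not released promptly the process will run out of memory.

A good earlier attempt is the small project liuhuanyong/EventTriplesExtraction, an experiment in triple extraction. The same author's liuhuanyong/CausalityEventExtraction project for causal event extraction is also very good and rests on a large set of painstakingly hand-written rules; causal inference will be described briefly later.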
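The memory warning above can be handled with a small context manager that loads a model once and always releases it. This is only a sketch: `loaded_model` and `FakeSegmentor` are hypothetical names introduced here, the model path is a placeholder, and the sketch assumes the pyltp classes (`Segmentor`, `Postagger`, and so on) expose `load(path)` and `release()` as described in the pyltp documentation. A stand-in class is used so the example runs without pyltp or its model files.

```python
from contextlib import contextmanager


@contextmanager
def loaded_model(model_cls, model_path):
    """Load a model, hand it to the caller, and always release it afterwards."""
    model = model_cls()
    model.load(model_path)
    try:
        yield model
    finally:
        model.release()  # runs even if the caller's code raises


class FakeSegmentor:
    """Hypothetical stand-in with a load/segment/release shape (not pyltp itself)."""

    live_instances = 0  # counts models currently held in memory

    def load(self, path):
        FakeSegmentor.live_instances += 1

    def segment(self, text):
        return text.split()

    def release(self):
        FakeSegmentor.live_instances -= 1


# Placeholder path; with pyltp you would pass the real class and model file.
with loaded_model(FakeSegmentor, "ltp_data/cws.model") as segmentor:
    words = list(segmentor.segment("load once reuse many times"))

print(words)
print(FakeSegmentor.live_instances)  # 0: the model was released on exit
```

The point of the design is that the model is created once outside any per-sentence loop and reused inside the `with` block, instead of calling the constructor for every sentence.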

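I also wrote an extraction module myself, though it is only a simple review opinion extraction module. Interested readers can build many extensions on top of it: collocation mining, synonym mining, and new word discovery.

My blog post: ltp︱基于ltp的无监督信息抽取模块（事件抽取/评论观点抽取） (LTP-based unsupervised information extraction module: event extraction / review opinion extraction)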

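1 Information extraction - collocation extraction

1.1 Logic overview

The whole logic is based mainly on dependency parsing. I mainly used the following relation types:

After working through and organizing them, I arrived at four extraction types:

Collocation lookup (SBV, ATT, ADV)
Coordinate word lookup (COO)
Core opinion extraction (HED + subject-verb-object logic)
Entity noun collocations (POS tag n)

I also added stop words, which can be used to filter the results.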
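The four extraction types above can be sketched roughly as follows. This is not the module's actual code: the `Token` structure, the `extract` function, and the way each rule is interpreted are assumptions made for illustration, and the parse is hand-written rather than real LTP output. Heads are 1-based with 0 meaning the root.

```python
from collections import namedtuple

# head is the 1-based index of the governing word; 0 means the root.
Token = namedtuple("Token", "word pos head relation")

# Hand-written illustrative parse of "艇仔粥料很足，很贴心" (not LTP output).
tokens = [
    Token("艇仔粥", "n", 2, "ATT"),
    Token("料", "n", 4, "SBV"),
    Token("很", "d", 4, "ADV"),
    Token("足", "a", 0, "HED"),
    Token("，", "wp", 4, "WP"),
    Token("很", "d", 7, "ADV"),
    Token("贴心", "a", 4, "COO"),
]


def extract(tokens, stop_words=frozenset()):
    """Group (word, head word) pairs into the four extraction types."""

    def head_of(tok):
        return tokens[tok.head - 1].word if tok.head > 0 else None

    def keep(pair):
        return not any(w in stop_words for w in pair)

    result = {"collocations": [], "coordinates": [], "core": [], "noun_pairs": []}
    for i, tok in enumerate(tokens, start=1):
        pair = (tok.word, head_of(tok))
        if tok.relation in {"SBV", "ATT", "ADV"} and keep(pair):
            result["collocations"].append(pair)
        if tok.relation == "COO" and keep(pair):
            result["coordinates"].append(pair)
        if tok.pos.startswith("n") and tok.head > 0 and keep(pair):
            result["noun_pairs"].append(pair)
        if tok.relation == "HED":
            # Core opinion: subject + head word + object, when present.
            subj = [t.word for t in tokens if t.head == i and t.relation == "SBV"]
            obj = [t.word for t in tokens if t.head == i and t.relation == "VOB"]
            result["core"].append("".join(subj[:1] + [tok.word] + obj[:1]))
    return result


result = extract(tokens, stop_words=frozenset({"很"}))
for name, items in result.items():
    print(name, items)
```

With "很" as a stop word, the two adverb pairs are dropped from the collocations while the noun and subject pairs are kept, which is the kind of filtering the stop-word step is meant to provide.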

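1.2 Rough walkthrough of the code

The details will be published on GitHub. The code is split into three main parts: the LTP startup module, dependency parse interpretation, and result filtering.

LTP module: be sure to release the models, and do not repeatedly call Postagger() / Segmentor() / NamedEntityRecognizer() / SementicRoleLabeller(), because each call keeps loading into memory until it blows up.
Dependency parsing module: I mainly organize the results into a dataframe, which makes later structured interpretation and extraction easier, for example:
Result filtering module: joins words according to the relations listed above.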

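Example sentence: 艇仔粥料很足，香葱自己添加，很贴心。

How to read the table:

word column: the main word segmentation result for the sentence
relation column / pos column: the word's dependency relation and part of speech
match_word column / match_word_n column: the term matched through the relation
tuples_words column: the two joined together

If some word pairings turn out to be invalid and should be removed, you can also add extra invalid words, so the module is fairly flexible.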
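A minimal sketch of how such a table could be assembled from a parse is shown below. The function `build_rows` is hypothetical, the segmentation, tags, and arcs for the example sentence are hand-written for illustration rather than real LTP output, and `match_word_n` is assumed here to be the index of the matched word, which the source does not confirm. The rows are plain dicts, so `pandas.DataFrame(rows)` would turn them into a dataframe.

```python
def build_rows(words, postags, arcs, invalid_words=frozenset()):
    """Build one table row per word; arcs are (head, relation), head 0 = root."""
    rows = []
    for word, pos, (head, relation) in zip(words, postags, arcs):
        match_word = words[head - 1] if head > 0 else "Root"
        # Drop pairings that involve a user-supplied invalid word.
        if word in invalid_words or match_word in invalid_words:
            continue
        rows.append({
            "word": word,
            "relation": relation,
            "pos": pos,
            "match_word": match_word,
            "match_word_n": head,  # assumption: index of the matched word
            "tuples_words": word + match_word,
        })
    return rows


# Hand-written illustrative analysis of the example sentence (not LTP output).
words = ["艇仔粥", "料", "很", "足", "，", "香葱", "自己", "添加", "，", "很", "贴心", "。"]
postags = ["n", "n", "d", "a", "wp", "n", "r", "v", "wp", "d", "a", "wp"]
arcs = [(2, "ATT"), (4, "SBV"), (4, "ADV"), (0, "HED"), (4, "WP"), (8, "FOB"),
        (8, "SBV"), (4, "COO"), (8, "WP"), (11, "ADV"), (4, "COO"), (4, "WP")]

rows = build_rows(words, postags, arcs, invalid_words=frozenset({"，", "。"}))
for row in rows:
    print(row)
```

Passing punctuation as invalid words removes the three punctuation rows and leaves nine word rows.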

1.3 Results

Sentence 1:

Sentence 2:

Sentence 3:

2 LTP Semantic Role Labeling (SRL)

Updated 2018-11-13

This module uses the SRL module in LTP for its analysis.

print(SRLparsing(labeller,words,postags,ToAfter = ['TMP','A1','DIS']))

----- 语义角色 -----

([['ADV', ('最后', '打')], ['ADV', (['平均', '下来'], '便宜')], ['ADV', ('才', '便宜')], ['A0', ('40', '便宜')]], (True, ['40', '便宜', []]))

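As in the syntax module, a set of combination rules is used to extract information. The rules center on A0, the agent of the action, that is, whoever or whatever performs it.

A0 can be read as the core subject. The module then looks for modifiers of that subject: TMP (time), A1 (the thing affected by the action), DIS (discourse marker), and PRP (purpose).

See SRLparsing.py for details.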
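The A0-centered rule can be sketched as below. This is a guess at the combination logic and not the contents of SRLparsing.py: the function `combine_a0` is hypothetical, and it only reproduces the second element of the output shown above. It takes role entries of the form `[label, (argument, predicate)]` and collects, for the predicate of the first A0, the arguments whose labels are listed in `to_after`.

```python
def combine_a0(roles, to_after=("TMP", "A1", "DIS")):
    """Return (found, [A0 argument, predicate, modifiers]) for the first A0."""
    for label, (argument, predicate) in roles:
        if label == "A0":
            modifiers = [
                arg
                for lab, (arg, pred) in roles
                if lab in to_after and pred == predicate
            ]
            return True, [argument, predicate, modifiers]
    return False, []


# Role list copied from the output shown above.
roles = [
    ["ADV", ("最后", "打")],
    ["ADV", (["平均", "下来"], "便宜")],
    ["ADV", ("才", "便宜")],
    ["A0", ("40", "便宜")],
]
combined = combine_a0(roles)
print(combined)  # (True, ['40', '便宜', []])

# Hypothetical role list with TMP and A1 attached to the same predicate.
other_roles = [["TMP", ("昨天", "买")], ["A0", ("我", "买")], ["A1", ("粥", "买")]]
other_combined = combine_a0(other_roles)
print(other_combined)  # (True, ['我', '买', ['昨天', '粥']])
```

In the first case the modifier list is empty because the sentence has no TMP, A1, or DIS role on the predicate of A0, which matches the output shown above.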

In practice, however, this frequently fails with:

RuntimeError: CPU memory allocation failed

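Running SRL through LTP is both slow and memory-hungry, so it is at most something to experiment with and is not well suited to large batch jobs.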
