
tim5go / Zhopenie

Chinese Open Information Extraction (Tree-based Triple Relation Extraction Module)

Programming Languages

python

Projects that are alternatives to or similar to Zhopenie

Segmentit
A Chinese word segmentation package usable in any JS environment, forked from leizongmin/node-segment
Stars: ✭ 139 (+41.84%)
Mutual labels:  chinese, chinese-nlp
chinese-nlp-ner
A BLSTM-CRF solution for Chinese named entity recognition
Stars: ✭ 14 (-85.71%)
Mutual labels:  chinese, chinese-nlp
Pytorch multi head selection re
BERT + reproduce "Joint entity recognition and relation extraction as a multi-head selection problem" for Chinese and English IE
Stars: ✭ 105 (+7.14%)
Mutual labels:  chinese, relation-extraction
Information Extraction Chinese
Chinese Named Entity Recognition with IDCNN/biLSTM+CRF, and Relation Extraction with biGRU+2ATT
Stars: ✭ 1,888 (+1826.53%)
Mutual labels:  relation-extraction, chinese-nlp
Nlp chinese corpus
Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+6691.84%)
Mutual labels:  chinese, chinese-nlp
Zhparser
zhparser is a PostgreSQL extension for full-text search of Chinese language
Stars: ✭ 418 (+326.53%)
Mutual labels:  chinese, chinese-nlp
Nlp4han
A Chinese NLP toolkit (sentence splitting / word segmentation / POS tagging / chunking / syntactic parsing / semantic analysis / NER / n-gram language models / HMM / pronoun resolution / sentiment analysis / spell checking)
Stars: ✭ 206 (+110.2%)
Mutual labels:  chinese, chinese-nlp
Cnn Question Classification Keras
Chinese Question Classifier (Keras Implementation) on BQuLD
Stars: ✭ 28 (-71.43%)
Mutual labels:  chinese, chinese-nlp
Deepke
An open-source Chinese relation extraction framework based on deep learning
Stars: ✭ 525 (+435.71%)
Mutual labels:  chinese, relation-extraction
Chinesenre
Chinese entity relation extraction with PyTorch, BiLSTM + attention
Stars: ✭ 463 (+372.45%)
Mutual labels:  chinese, relation-extraction
Lightnlp
A deep learning framework for natural language processing based on PyTorch and torchtext.
Stars: ✭ 739 (+654.08%)
Mutual labels:  chinese, relation-extraction
Chinese Xinhua
📙 A database of the Chinese Xinhua dictionary, covering xiehouyu (two-part allegorical sayings), idioms, words, and Chinese characters.
Stars: ✭ 8,705 (+8782.65%)
Mutual labels:  chinese, chinese-nlp
Cws
Source code for an ACL 2016 paper on Chinese word segmentation
Stars: ✭ 81 (-17.35%)
Mutual labels:  chinese
Chinese Copywriting Guidelines
Chinese copywriting guidelines for better written communication
Stars: ✭ 10,648 (+10765.31%)
Mutual labels:  chinese
Py3aiml chinese
The official py3AIML is English-only; this project adds Chinese support and translates the code comments into Chinese. Tested to correctly parse AIML files with Chinese patterns and templates.
Stars: ✭ 80 (-18.37%)
Mutual labels:  chinese
Awesome Telegram Cn
A curated collection of Telegram development and bot resources
Stars: ✭ 78 (-20.41%)
Mutual labels:  chinese
Limes
Link Discovery Framework for Metric Spaces.
Stars: ✭ 94 (-4.08%)
Mutual labels:  semantic-web
Uer Py
Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
Stars: ✭ 1,295 (+1221.43%)
Mutual labels:  chinese
Chinesenlp
Datasets and SOTA results for every field of Chinese NLP
Stars: ✭ 1,206 (+1130.61%)
Mutual labels:  chinese-nlp
Distre
[ACL 19] Fine-tuning Pre-Trained Transformer Language Models to Distantly Supervised Relation Extraction
Stars: ✭ 75 (-23.47%)
Mutual labels:  relation-extraction

Chinese Open Information Extraction (Zhopenie)

Installation

This module makes heavy use of pyltp

  1. Install pyltp
    pip install pyltp
    
  2. Download the NLP models from Baidu Cloud (百度雲); a quick sanity check follows below
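
To verify the setup, the minimal sketch below loads the segmentation model and segments one sentence. The ltp_data/cws.model path is an assumption based on the standard LTP 3.x model layout; point it at wherever you unpacked the download.

    # Sanity check: load the LTP segmentation model and segment one sentence.
    # NOTE: "ltp_data/cws.model" is an assumed path; adjust to your download.
    from pyltp import Segmentor

    segmentor = Segmentor()
    segmentor.load("ltp_data/cws.model")
    print(list(segmentor.segment(u"星展集团是亚洲最大的金融服务集团之一")))
    segmentor.release()  # free the underlying model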

Why use LTP?

LTP has an excellent semantic parsing module, illustrated below:

(figure: example semantic parse produced by LTP)

Also, LTP generally performs better than other open-source Chinese NLP libraries such as Jieba. Here is a word tokenization comparison on the SIGHAN Bakeoff 2005 PKU dataset (510 KB):

(figure: word tokenization comparison on SIGHAN Bakeoff 2005 PKU)

Usage

The extractor module breaks a Chinese sentence down into triple relations (e1, e2, r) that a computer can work with.
For example, the sentence 星展集团是亚洲最大的金融服务集团之一, 拥有约3千5百亿美元资产和超过280间分行, 业务遍及18个市场。
("DBS Group is one of Asia's largest financial services groups, with about US$350 billion in assets and more than 280 branches, and business spanning 18 markets.")
is parsed as follows (a sketch of the underlying technique appears after the examples):

e1:星展集团, e2:亚洲最大的金融服务集团之一, r:是
e1:星展集团, e2:约3千5百亿美元资产, r:拥有
e1:业务, e2:18个市场, r:遍及
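
Under the hood, tree-based extraction reads triples off a dependency parse: a verb that governs both a subject (SBV arc) and an object (VOB arc) yields a candidate (e1, e2, r). The sketch below illustrates this general technique with pyltp; it is not Zhopenie's actual implementation, and the model paths are assumptions based on the standard LTP 3.x layout.

    # Minimal sketch of dependency-tree triple extraction with pyltp.
    # Illustrative only; not Zhopenie's actual code. Model paths assume
    # the standard LTP 3.x models unpacked under ltp_data/.
    from pyltp import Parser, Postagger, Segmentor

    segmentor = Segmentor()
    segmentor.load("ltp_data/cws.model")
    postagger = Postagger()
    postagger.load("ltp_data/pos.model")
    parser = Parser()
    parser.load("ltp_data/parser.model")

    def extract_triples(sentence):
        words = list(segmentor.segment(sentence))
        postags = list(postagger.postag(words))
        arcs = parser.parse(words, postags)  # arc.head is 1-based; 0 = root
        triples = []
        for i, postag in enumerate(postags):
            if not postag.startswith("v"):  # anchor relations on verbs
                continue
            # Collect the verb's subject (SBV) and object (VOB) dependents.
            subjects = [words[j] for j, a in enumerate(arcs)
                        if a.head == i + 1 and a.relation == "SBV"]
            objects = [words[j] for j, a in enumerate(arcs)
                       if a.head == i + 1 and a.relation == "VOB"]
            if subjects and objects:
                triples.append((subjects[0], objects[0], words[i]))
        return triples

    print(extract_triples(u"星展集团拥有约3500亿美元资产。"))

Note that this naive version returns only the head word of each argument, whereas the examples above show full entity spans such as 亚洲最大的金融服务集团之一; a real extractor must also fold modifier (e.g. ATT) subtrees into e1 and e2.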

However, the extractor is currently only about 70% accurate and is still being improved. Feel free to comment and open pull requests.

Credits

LTP, the Language Technology Platform developed by the Research Center for Social Computing and Information Retrieval (HIT-SCIR) at Harbin Institute of Technology
