
sakuranew / Bert Attributeextraction

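Using BERT for attribute extraction in a knowledge graph, via fine-tuning and feature extraction: BERT-based fine-tuning and feature-extraction methods are used to extract attributes from Baidu Baike (Baidu Encyclopedia) person entries.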

Programming Languages

python

Projects that are alternatives to or similar to Bert Attributeextraction

Intra Bag And Inter Bag Attentions
Code for NAACL 2019 paper: Distant Supervision Relation Extraction with Intra-Bag and Inter-Bag Attentions
Stars: ✭ 98 (-56.25%)
Mutual labels:  deeplearning, relation-extraction
Best ai paper 2020
A curated list of the latest breakthroughs in AI by release date with a clear video explanation, link to a more in-depth article, and code
Stars: ✭ 2,140 (+855.36%)
Mutual labels:  ai, deeplearning
Djl Demo
Demo applications showcasing DJL
Stars: ✭ 126 (-43.75%)
Mutual labels:  ai, deeplearning
Pycm
Multi-class confusion matrix library in Python
Stars: ✭ 1,076 (+380.36%)
Mutual labels:  ai, deeplearning
Clearml
ClearML - Auto-Magical CI/CD to streamline your ML workflow. Experiment Manager, MLOps and Data-Management
Stars: ✭ 2,868 (+1180.36%)
Mutual labels:  ai, deeplearning
Micromlp
A micro neural network multilayer perceptron for MicroPython (used on ESP32 and Pycom modules)
Stars: ✭ 92 (-58.93%)
Mutual labels:  ai, deeplearning
All4nlp
All For NLP, especially Chinese.
Stars: ✭ 141 (-37.05%)
Mutual labels:  ai, deeplearning
Knowledge Graphs
A collection of research on knowledge graphs
Stars: ✭ 845 (+277.23%)
Mutual labels:  knowledge-graph, relation-extraction
Clearml Server
ClearML - Auto-Magical Suite of tools to streamline your ML workflow. Experiment Manager, ML-Ops and Data-Management
Stars: ✭ 186 (-16.96%)
Mutual labels:  ai, deeplearning
Fixy
Our goal is to build an open-source writing assistant/checker that tackles many different problems in the Turkish NLP literature at once, proposes novel approaches, and addresses the shortcomings of existing work. It corrects spelling errors in users' texts with a deep learning approach while also performing semantic analysis, so that errors arising in that context can be detected and corrected as well.
Stars: ✭ 165 (-26.34%)
Mutual labels:  ai, deeplearning
Gbrain
GPU Javascript Library for Machine Learning
Stars: ✭ 48 (-78.57%)
Mutual labels:  ai, deeplearning
Learnopencv
Learn OpenCV : C++ and Python Examples
Stars: ✭ 15,385 (+6768.3%)
Mutual labels:  ai, deeplearning
Bbw
Semantic annotator: Matching CSV to a Wikibase instance (e.g., Wikidata) via Meta-lookup
Stars: ✭ 42 (-81.25%)
Mutual labels:  knowledge-graph, relation-extraction
Blurr
Data transformations for the ML era
Stars: ✭ 96 (-57.14%)
Mutual labels:  ai, feature-extraction
Autodl
Automated Deep Learning without ANY human intervention. 1'st Solution for AutoDL [email protected]
Stars: ✭ 854 (+281.25%)
Mutual labels:  ai, deeplearning
Xlearning
AI on Hadoop
Stars: ✭ 1,709 (+662.95%)
Mutual labels:  ai, deeplearning
Ffdl
Fabric for Deep Learning (FfDL, pronounced fiddle) is a Deep Learning Platform offering TensorFlow, Caffe, PyTorch etc. as a Service on Kubernetes
Stars: ✭ 640 (+185.71%)
Mutual labels:  ai, deeplearning
Basic reinforcement learning
An introductory series to Reinforcement Learning (RL) with comprehensive step-by-step tutorials.
Stars: ✭ 826 (+268.75%)
Mutual labels:  ai, deeplearning
Airsim
Open source simulator for autonomous vehicles built on Unreal Engine / Unity, from Microsoft AI & Research
Stars: ✭ 12,528 (+5492.86%)
Mutual labels:  ai, deeplearning
Halite Ii
Season 2 of @twosigma's artificial intelligence programming challenge
Stars: ✭ 201 (-10.27%)
Mutual labels:  ai, deeplearning

BERT-Attribute-Extraction

BERT-based attribute extraction for knowledge graphs

Using BERT for attribute extraction in a knowledge graph with two methods: fine-tuning and feature extraction.

Attribute extraction from Baidu Baike person entries for a knowledge graph, with experiments using BERT-based fine-tuning and feature extraction.

Prerequisites

TensorFlow >= 1.10
scikit-learn

Pre-trained models

BERT-Base, Chinese: Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters

Installing

None

Dataset

The dataset is built from Baidu Encyclopedia (Baidu Baike) person entries. Sentences that do not contain both an entity and an attribute are filtered out.

Entities and attributes are obtained with named entity recognition.

Labels are derived from the Baidu Encyclopedia infobox, and most of them were assigned manually, so some labels are of lower quality.
For example:

黄维#1904年#1#黄维(1904年-1989年),字悟我,出生于江西贵溪一农户家庭。        
陈昂#山东省滕州市#1#邀请担任诗词嘉宾。1992年1月26日,陈昂出生于山东省滕州市一个普通的知识分子家庭,其祖父、父亲都
陈伟庆#肇庆市鼎湖区#0#长。任免信息2016年10月21日下午,肇庆市鼎湖区八届人大一次会议胜利闭幕。陈伟庆当选区人民政府副区长。
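Each example line above appears to hold four '#'-separated fields: entity, attribute value, label, and sentence. The sketch below parses that layout. The field names and the reading of the label (1 = the sentence expresses the attribute for that entity, 0 = it does not) are assumptions inferred from the examples, not taken from the repository's code; parse_line and load_examples are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class Example:
    entity: str     # person name, e.g. 黄维
    attribute: str  # candidate attribute value, e.g. a place or a year
    label: int      # assumed: 1 = sentence expresses the attribute, 0 = it does not
    sentence: str   # raw sentence taken from the Baidu Baike entry


def parse_line(line: str) -> Example:
    """Split one line on the first three '#' separators.

    maxsplit=3 keeps any '#' that occurs inside the sentence text intact.
    """
    entity, attribute, label, sentence = line.rstrip("\n").split("#", 3)
    return Example(entity, attribute, int(label), sentence.strip())


def load_examples(path: str) -> list[Example]:
    """Read a UTF-8 file with one example per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [parse_line(line) for line in f if line.strip()]
```

Splitting with a fixed maxsplit rather than a plain split("#") is a deliberate choice: the free-text sentence is the last field, so it is the only one that can safely absorb stray separators.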

Getting Started

  • Run strip.py to get the stripped data.
  • Run data_process.py to convert the data into NumPy input files.
  • The parameters file holds the parameters the model needs to run.

Running the tests

For example, with the birthplace dataset:

  • fine-tuning

    • Run run_classifier.py to get the predicted probabilities:
    python run_classifier.py \
            --task_name=my \
            --do_train=true \
            --do_predict=true \
            --data_dir=a \
            --vocab_file=/home/tiny/zhaomeng/bertmodel/vocab.txt \
            --bert_config_file=/home/tiny/zhaomeng/bertmodel/bert_config.json \
            --init_checkpoint=/home/tiny/zhaomeng/bertmodel/bert_model.ckpt \
            --max_seq_length=80 \
            --train_batch_size=32 \
            --learning_rate=2e-5 \
            --num_train_epochs=1.0 \
            --output_dir=./output
    
    • Then run proba2metrics.py to get the final results, including the misclassified examples.
  • feature-extraction

    • Run extract_features.py to get vector representations of the training and test data as JSON Lines files:
    python extract_features.py \
            --input_file=../data/birth_place_train.txt \
            --output_file=../data/birth_place_train.jsonl \
            --vocab_file=/home/tiny/zhaomeng/bertmodel/vocab.txt \
            --bert_config_file=/home/tiny/zhaomeng/bertmodel/bert_config.json \
            --init_checkpoint=/home/tiny/zhaomeng/bertmodel/bert_model.ckpt \
            --layers=-1 \
            --max_seq_length=80 \
            --batch_size=16
    
    • Then run json2vector.py to convert the JSON output into vectors.
    • Finally, run run_classifier.py to classify the vectors with machine-learning methods; an MLP usually performs best.
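A minimal sketch of the json2vector.py step is shown below. It assumes the JSON Lines layout written by Google's BERT extract_features.py (one object per input line with a "features" list of tokens, each carrying "layers" entries with "index" and "values") and takes the [CLS] vector of layer -1, matching --layers=-1 above. The repository's json2vector.py may pool tokens differently, so mean pooling is included as a hypothetical alternative; all function names are illustrative.

```python
import json


def cls_vector(record: dict, layer_index: int = -1) -> list[float]:
    """Return the [CLS] token's vector for the requested layer.

    Assumes the extract_features.py record layout:
    {"linex_index": ..., "features": [{"token": ..., "layers": [{"index": ..., "values": [...]}]}]}
    """
    first = record["features"][0]  # BERT puts [CLS] first in every sequence
    if first["token"] != "[CLS]":
        raise ValueError("expected [CLS] as the first token")
    for layer in first["layers"]:
        if layer["index"] == layer_index:
            return layer["values"]
    raise KeyError(f"layer {layer_index} not found")


def mean_vector(record: dict, layer_index: int = -1) -> list[float]:
    """Hypothetical alternative: average the requested layer over all tokens."""
    rows = [
        layer["values"]
        for feat in record["features"]
        for layer in feat["layers"]
        if layer["index"] == layer_index
    ]
    # zip(*rows) walks the hidden dimensions column by column
    return [sum(col) / len(rows) for col in zip(*rows)]


def load_vectors(path: str, pooling=cls_vector) -> list[list[float]]:
    """Read a .jsonl file (one JSON object per line) into one vector per line."""
    with open(path, encoding="utf-8") as f:
        return [pooling(json.loads(line)) for line in f if line.strip()]
```

The resulting vectors, paired with the labels from the dataset file, can then be passed to a classifier such as scikit-learn's MLPClassifier through fit(X, y) and predict(X).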

Result

The predicted results and the misclassified examples are saved in the result directory.

  • For example, on the birthplace dataset the fine-tuning method gives:

                precision    recall  f1-score   support
    
         0      0.963     0.967     0.965       573
         1      0.951     0.946     0.948       389
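proba2metrics.py itself is not shown in this README; the sketch below illustrates one way to get from predicted probabilities to a table like the one above. It assumes the probabilities are stored one example per line as tab-separated class scores (the test_results.tsv layout of Google's run_classifier.py) and that gold labels are available in the same order. All function names are hypothetical. F1 is the harmonic mean of precision and recall; for class 1 above, 2 × 0.951 × 0.946 / (0.951 + 0.946) ≈ 0.948.

```python
def read_probabilities(path: str) -> list[list[float]]:
    """Read tab-separated class probabilities, one example per line."""
    with open(path, encoding="utf-8") as f:
        return [[float(p) for p in line.split("\t")] for line in f if line.strip()]


def argmax_labels(prob_rows: list[list[float]]) -> list[int]:
    """Predict the class with the highest probability for each example."""
    return [max(range(len(row)), key=row.__getitem__) for row in prob_rows]


def per_class_report(gold: list[int], pred: list[int], labels=(0, 1)) -> dict:
    """Per-class precision, recall, F1 and support (number of gold examples)."""
    report = {}
    for c in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}
    return report


def misclassified(gold: list[int], pred: list[int]) -> list[int]:
    """Indices of examples whose predicted label differs from the gold label."""
    return [i for i, (g, p) in enumerate(zip(gold, pred)) if g != p]
```

scikit-learn's classification_report(gold, pred, digits=3) prints the same precision, recall, f1-score and support columns, which is a convenient cross-check for a hand-rolled version like this one.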
    

Authors

  • zhao meng

License

This project is licensed under the MIT License

Acknowledgments

  • etc