malllabiisc / Cesi

Licence: apache-2.0
WWW 2018: CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information

CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information

Source code and dataset for The WebConf 2018 (WWW 2018) paper: CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information.

Overview of CESI. In the first step, CESI acquires side information for the noun phrases (NPs) and relation phrases of the Open KB triples. In the second step, it learns embeddings of these NPs and relation phrases while utilizing the side information obtained in the first step. In the third step, CESI clusters the learned embeddings to canonicalize the NPs and relation phrases. Please refer to the paper for more details.
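As a rough illustration of the third step only, the sketch below groups phrases whose embeddings lie close together. The overview above does not say which clustering algorithm is used, so a simple single-linkage threshold cut over cosine distance stands in for it here. The phrases, vectors, threshold, and function names are all invented for the example and are not taken from the CESI code.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def threshold_clusters(embeddings, threshold):
    """Merge phrases whose cosine distance is below `threshold`.

    This is a single-linkage cut implemented with union-find; it is a
    stand-in for illustration, not the clustering used by CESI.
    """
    phrases = list(embeddings)
    parent = {p: p for p in phrases}

    def find(p):
        # Follow parent links to the cluster representative (path halving).
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for i, p in enumerate(phrases):
        for q in phrases[i + 1:]:
            if cosine_distance(embeddings[p], embeddings[q]) < threshold:
                parent[find(p)] = find(q)

    groups = {}
    for p in phrases:
        groups.setdefault(find(p), set()).add(p)
    return sorted(sorted(g) for g in groups.values())


# Toy 3-dimensional "embeddings" (hypothetical values).
toy_embeddings = {
    "barack obama": [0.9, 0.1, 0.0],
    "obama": [0.8, 0.2, 0.0],
    "alessandria": [0.0, 0.1, 0.9],
}

clusters = threshold_clusters(toy_embeddings, threshold=0.1)
print(clusters)
```

With these toy vectors the two "obama" phrases end up in one cluster and "alessandria" stays on its own. In CESI the embeddings come from the second step, which also uses the side information.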

Dependencies

  • Compatible with both Python 2.7 and Python 3.x.
  • Dependencies can be installed from requirements.txt (for example, pip install -r requirements.txt).

Datasets

  • The ReVerb45k, Base, and Ambiguous datasets are included with the repository.
  • The input to CESI is a KG given as a list of triples. Each triple is stored as a JSON object on its own line. An example entry (pretty-printed here for readability) is shown below:
{
    "_id": 36952,
    "triple": [
        "Frederick",
        "had reached",
        "Alessandria"
    ],
    "triple_norm": [
        "frederick",
        "have reach",
        "alessandria"
    ],
    "true_link": {
        "subject": "/m/09w_9",
        "object": "/m/02bb_4"
    },
    "src_sentences": [
        "Frederick had reached Alessandria",
        "By late October, Frederick had reached Alessandria."
    ],
    "entity_linking": {
        "subject": "Frederick,_Maryland",
        "object": "Alessandria"
    },
    "kbp_info": []
}
  • _id is the unique id of each triple in the Knowledge Graph.
  • triple is the actual triple in the Knowledge Graph.
  • triple_norm is the normalized form of the triple (after lemmatization, lower-casing, etc.).
  • true_link is the gold canonicalization of the subject and object. Gold linking is not available for relations.
  • src_sentences is the list of sentences from which the triple was extracted by Open IE algorithms.
  • entity_linking is the Entity Linking side information utilized by CESI.
  • kbp_info is the Knowledge-Base Propagation side information used by CESI.
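The format described above can be read with the standard json module. The following is a minimal sketch, not code from the repository: the helper names read_triples and gold_subject_clusters are hypothetical, and the sample record is the example entry compacted onto one line. It parses one JSON object per line and groups the normalized subject phrases by their gold entity id from true_link.

```python
import json
from collections import defaultdict


def read_triples(lines):
    """Parse an iterable of JSON lines into a list of triple records."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        records.append(json.loads(line))
    return records


def gold_subject_clusters(records):
    """Group normalized subject phrases by gold entity id (true_link.subject)."""
    clusters = defaultdict(set)
    for rec in records:
        gold_id = rec.get("true_link", {}).get("subject")
        if gold_id is not None:
            # triple_norm is [subject, relation, object]; index 0 is the subject.
            clusters[gold_id].add(rec["triple_norm"][0])
    return dict(clusters)


# One record per line, mirroring the example entry (src_sentences shortened).
sample_lines = [
    '{"_id": 36952, "triple": ["Frederick", "had reached", "Alessandria"], '
    '"triple_norm": ["frederick", "have reach", "alessandria"], '
    '"true_link": {"subject": "/m/09w_9", "object": "/m/02bb_4"}, '
    '"src_sentences": ["Frederick had reached Alessandria"], '
    '"entity_linking": {"subject": "Frederick,_Maryland", "object": "Alessandria"}, '
    '"kbp_info": []}',
]

records = read_triples(sample_lines)
clusters = gold_subject_clusters(records)
print(clusters)
```

To read a dataset from disk, an open file handle can be passed in place of sample_lines, since iterating over a text file yields one line at a time. The path of the dataset file depends on the repository layout and is not assumed here.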

Usage:

Setup Environment:
  • After installing the Python dependencies, run sh setup.sh to complete the remaining setup.
  • The Pattern library is required to run the code. Please install the version that matches your interpreter (Python 2.x or Python 3.x).
Start PPDB server:
  • The PPDB server must be running before the main code is executed.
  • To start the server, execute: python ppdb/ppdb_server.py -port 9997 (let the server run in a separate terminal).
Run the main code:
  • python src/cesi_main.py -name reverb45_test_run
  • After the above command is executed, all output is written to the output/reverb45_test_run directory.
  • -name is an arbitrary name assigned to the run.
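For scripted runs, the two commands above can be wrapped in a small Python helper. This is a sketch under stated assumptions, not part of the repository: it assumes the script is launched from the repository root, that the PPDB server accepts TCP connections on localhost at the given port, and that a 60-second startup wait is sufficient. The helper names are hypothetical; only the script paths and the -port and -name flags come from the instructions above.

```python
import socket
import subprocess
import sys
import time


def ppdb_server_command(port=9997):
    """Command line for the PPDB server, as given in the usage steps."""
    return [sys.executable, "ppdb/ppdb_server.py", "-port", str(port)]


def cesi_command(run_name):
    """Command line for the main CESI script with an arbitrary run name."""
    return [sys.executable, "src/cesi_main.py", "-name", run_name]


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_cesi(run_name, port=9997, wait_seconds=60):
    """Start the PPDB server, wait for it, run CESI, then stop the server.

    Assumption: the server listens on localhost over TCP.
    """
    server = subprocess.Popen(ppdb_server_command(port))
    try:
        deadline = time.monotonic() + wait_seconds
        while not port_is_open("localhost", port):
            if time.monotonic() > deadline:
                raise RuntimeError("PPDB server did not start in time")
            time.sleep(1)
        subprocess.run(cesi_command(run_name), check=True)
    finally:
        server.terminate()


# Show the commands without launching anything.
print(" ".join(ppdb_server_command()))
print(" ".join(cesi_command("reverb45_test_run")))
```

Calling run_cesi("reverb45_test_run") would then perform both steps in one process. Running the server in a separate terminal, as described above, remains the simpler option for interactive use.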

Citing:

Please cite the following paper if you use this code in your work.

@inproceedings{cesi2018,
	author = {Vashishth, Shikhar and Jain, Prince and Talukdar, Partha},
	title = {{CESI}: Canonicalizing Open Knowledge Bases Using Embeddings and Side Information},
	booktitle = {Proceedings of the 2018 World Wide Web Conference},
	series = {WWW '18},
	year = {2018},
	isbn = {978-1-4503-5639-8},
	location = {Lyon, France},
	pages = {1317--1327},
	numpages = {11},
	url = {https://doi.org/10.1145/3178876.3186030},
	doi = {10.1145/3178876.3186030},
	acmid = {3186030},
	publisher = {International World Wide Web Conferences Steering Committee},
	address = {Republic and Canton of Geneva, Switzerland},
	keywords = {canonicalization, knowledge graph embeddings, knowledge graphs, open knowledge bases},
}