gkiril / Oie Resources

A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.

Open Information Extraction (OIE) Resources

A curated list of Open Information Extraction (OIE) resources: research papers, code, data, applications, etc. The list is not limited to Open Information Extraction systems exclusively. It also includes work highly related to OIE, such as taxonomizing open relations and using OIE in downstream applications.

Table of contents

Introduction to OIE

Open Information Extraction (OIE) systems aim to extract unseen relations and their arguments from unstructured text in an unsupervised manner. In its simplest form, given a natural language sentence, an OIE system extracts information as a triple consisting of a subject (S), a relation (R) and an object (O).

Suppose we have the following input sentence:

AMD, which is based in U.S., is a technology company.

An OIE system aims to make the following extractions:

("AMD"; "is based in"; "U.S.")
("AMD"; "is"; "technology company")
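
The triple format above can be modelled with a small data structure. The following is a minimal sketch, not part of any OIE system listed here: the names `Triple` and `parse_triple` are hypothetical, and the parser assumes that no slot contains a semicolon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One OIE extraction: (subject; relation; object)."""
    subject: str
    relation: str
    obj: str

    def render(self) -> str:
        # Render in the ("S"; "R"; "O") notation used in this README.
        return f'("{self.subject}"; "{self.relation}"; "{self.obj}")'


def parse_triple(text: str) -> Triple:
    """Parse the ("S"; "R"; "O") notation back into a Triple.

    Assumption: slots never contain ';', so a plain split is enough.
    """
    inner = text.strip()
    if not (inner.startswith("(") and inner.endswith(")")):
        raise ValueError(f"not a parenthesised triple: {text!r}")
    parts = [part.strip().strip('"') for part in inner[1:-1].split(";")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 slots, got {len(parts)}: {text!r}")
    return Triple(*parts)


# The two extractions from the example sentence above.
extractions = [
    Triple("AMD", "is based in", "U.S."),
    Triple("AMD", "is", "technology company"),
]

for triple in extractions:
    print(triple.render())
```

Keeping the three slots as plain strings mirrors the simplest form of OIE output; real systems often attach extra fields such as confidence scores or provenance.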

Papers sorted in chronological order

2006

2007

2008

2009

2010

2011

2012

2013

2014

2015

2016

2017

2018

2019

2020

Papers grouped by category

Surveys

Evaluation

OIE for downstream applications

OIE output has been shown to be a useful input for many downstream tasks. This section lists several tasks that have benefited from it.

Question Answering

Slot Filling

Event Schema Induction

Text Summarization

Knowledge Base Population

Relating Entities

OIE in Different Languages

Most OIE systems focus on extracting from text written in English. However, some OIE systems either target a language other than English or are multilingual. This section lists such systems.

Multilingual OIE Systems

OIE Systems for German Language

OIE Systems for Portuguese Language

OIE Systems for Spanish Language

OIE Systems for Chinese Language

OIE Systems for Persian Language

OIE Systems for Italian Language

OIE Systems for Indonesian Language

Supervised OIE

Canonicalization of OIE

Slides

Talks

Code

  • MinIE: Open Information Extraction System
    • MinIE: originally written in Java
    • Python wrapper for MinIE
    • MinScIE - an Open Information Extraction system which provides structured knowledge enriched with semantic information about citations (based on MinIE).
    • SalIE - Salient Open Information Extraction (based on MinIE)
  • ClausIE: Clause-based OIE
  • OpenIE at IIT Delhi:
  • OpenIE at UW:
  • Stanford's OpenIE:
  • Graphene: OpenIE system containing coreference resolution, simplification and open relation extraction pipeline
  • EXEMPLAR
  • DefIE: Open information extraction from textual definitions
  • ReMine: Integrating Local and Global Cohesiveness for Open Information Extraction
  • OIE systems for languages other than English or cross-lingual systems:
    • Zhopenie - Chinese OIE: OIE system for Chinese language written in Python.
    • Open Relation Extraction for Chinese: Knowledge triples extraction (entities and relations extraction) and knowledge base construction based on dependency syntax for open domain text (for Chinese)
    • Baaz: Open information extraction from Persian web (Python)
    • MT/IE: Cross-lingual Open IE. Attention-based sequence-to-sequence model for cross-lingual open IE. Written in Python
    • Relation Extraction on German Websites: This repository holds a collection of three Open Information Extraction approaches for the German language
    • DptOIE: A Portuguese Open Information Extraction system based on Dependency Analysis
    • PragmaticOIE: a rule-based approach to extract facts in Portuguese in a first pragmatic level
  • CORE: Context-Aware Open Relation Extraction with Factorization Machines
  • CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information
  • IMPLIE: IMPLIE (IMPLicit relation Information Extraction) is a program that extracts binary relations from English sentences where the relationship between the two entities is not explicitly stated in the text.
  • Ranking: Iterative Rank-Aware Open IE (confidence score).

Data

OIE output serves as input to many downstream tasks, such as question answering, event schema induction, and the generation of inference rules. It can also be used as "fuel" for deriving further resources. The data here is organized into two major categories: 1) OIE corpora; 2) resources derived from OIE output.

OIE corpora

  • OPIEC: An Open Information Extraction Corpus: the largest OIE corpus to date, containing more than 341M triples extracted from the entire English Wikipedia. Each triple in the corpus comes with rich metadata: each token of the subj / obj / rel along with NLP annotations (POS tag, NER tag, ...), the provenance sentence along with its dependency parse, the original (golden) links from Wikipedia, sentence order, space / time, etc.
  • [.gz] ReVerb extractions: 15 million high-precision OIE extractions (826MB compressed) from the OIE system ReVerb. The extractions were made from the ClueWeb09 corpus. The data contains (subject, relation, object) triples, each accompanied by a confidence score (an estimate of the likelihood that the triple was extracted correctly) and provenance information (the URL of the web page from which the triple was extracted).
  • ReVerb extractions (linked): 3 million triples with a linked argument (a subset of the 15M high-precision ReVerb extractions). The links (to Freebase) are provided by an entity linker. The data fields are: argument 1, relation phrase, argument 2, Freebase ID for the argument 1 link, corresponding Freebase entity name, link score, link ambiguity score.
  • PATTY: PATTY is a system that takes open relations between two arguments, structures them into relational synsets and then organizes the synsets into a taxonomy. This resource contains over 15M triples with disambiguated arguments (links to Wikipedia articles) and the relation synset ID between them. Additionally, the resource contains: 1) relation pattern synsets with type signatures; 2) relation pattern subsumptions; 3) relation paraphrases; 4) evaluation data.
  • WiseNet (1.0 and 2.0): similar to PATTY, WiseNet 1.0/2.0 is a resource containing OIE triples in which the arguments are disambiguated and the open relations are organized into relation synsets and then taxonomized. One of the main differences between PATTY and WiseNet is that WiseNet contains "golden links" for the arguments (annotated by humans), obtained by keeping the original links from the Wikipedia articles.
  • KB-Unify: KB-Unify takes several OIE corpora as input and unifies them into a single disambiguated OIE repository. The open relations are organized into relational synsets and the arguments are disambiguated with Babelfy.
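
As an illustration of the seven data fields listed for the linked ReVerb extractions, the sketch below reads one record into a typed structure. It is a hedged example only: the tab separator, the field order within a line, the names `LinkedExtraction` and `parse_linked_line`, and the sample line (whose values are made up) are all assumptions, so check the actual file format before relying on it.

```python
from typing import NamedTuple


class LinkedExtraction(NamedTuple):
    """One linked ReVerb record, following the field list in this README."""
    arg1: str
    relation: str
    arg2: str
    freebase_id: str      # Freebase ID for the argument 1 link
    freebase_name: str    # corresponding Freebase entity name
    link_score: float
    link_ambiguity: float


def parse_linked_line(line: str, sep: str = "\t") -> LinkedExtraction:
    """Split one line into the seven fields (separator is an assumption)."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    arg1, relation, arg2, fb_id, fb_name, score, ambiguity = fields
    return LinkedExtraction(
        arg1, relation, arg2, fb_id, fb_name, float(score), float(ambiguity)
    )


# Made-up sample line for illustration; not taken from the real data.
SAMPLE = "AMD\tis based in\tU.S.\tFB_ID_PLACEHOLDER\tAMD\t0.9\t0.1\n"

record = parse_linked_line(SAMPLE)
print(record.arg1, "|", record.relation, "|", record.arg2)
```

A typed record makes it straightforward to filter on the link score or the ambiguity score before using the triples downstream.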

Resources derived from OIE output

PhD theses

Demos
