uma-pi1 / minie

License: GPL-3.0
An open information extraction system that provides compact extractions

MinIE: Open Information Extraction system


Open Information Extraction - brief introduction

Open Information Extraction (OIE) systems aim to extract unseen relations and their arguments from unstructured text in an unsupervised manner. In its simplest form, given a natural language sentence, an OIE system extracts information in the form of a triple consisting of a subject (S), a relation (R) and an object (O).

Suppose we have the following input sentence:

AMD, which is based in U.S., is a technology company.

An OIE system aims to make the following extractions:

("AMD"; "is based in"; "U.S.")
("AMD"; "is"; "technology company")
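
To make the triple format concrete, here is a minimal sketch of how such extractions could be held in code. The Triple class is a hypothetical illustration and is not part of MinIE's API; it only reproduces the ("S"; "R"; "O") notation used above.

```java
import java.util.List;

// Hypothetical value class for illustration only; not part of MinIE's API.
final class Triple {
    final String subject;
    final String relation;
    final String object;

    Triple(String subject, String relation, String object) {
        this.subject = subject;
        this.relation = relation;
        this.object = object;
    }

    // Renders the triple in the ("S"; "R"; "O") notation used in this README.
    @Override
    public String toString() {
        return "(\"" + subject + "\"; \"" + relation + "\"; \"" + object + "\")";
    }

    public static void main(String[] args) {
        // The two extractions listed above for the AMD sentence.
        List<Triple> extractions = List.of(
                new Triple("AMD", "is based in", "U.S."),
                new Triple("AMD", "is", "technology company"));
        for (Triple t : extractions) {
            System.out.println(t);
        }
    }
}
```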

MinIE - Open Information Extraction system

An Open Information Extraction system, providing useful extractions:

  • represents contextual information with semantic annotations
  • identifies and removes words that are considered overly specific
  • high precision/recall
  • shorter, semantically enriched extractions

Version

This is the latest version of MinIE, which may give you different (improved!) results than the original EMNLP-2017 version. The EMNLP-2017 version can be found here.

Demo

In general, the code for running MinIE is almost the same in all of its modes. The only exception is MinIE-D, which requires additional input (a list of multi-word dictionaries). You can still use MinIE-D without providing multi-word dictionaries, but MinIE-D then assumes an empty dictionary and therefore drops every word that is a candidate for dropping.

The following code demo is for MinIE-S. You can use the same code for the other modes; you just need to change MinIE.Mode accordingly:

import de.uni_mannheim.minie.MinIE;
import de.uni_mannheim.minie.annotation.AnnotatedProposition;
import de.uni_mannheim.utils.coreNLP.CoreNLPUtils;

import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class Demo {
    public static void main(String[] args) {
        // Dependency parsing pipeline initialization
        StanfordCoreNLP parser = CoreNLPUtils.StanfordDepNNParser();
        
        // Input sentence
        String sentence = "The Joker believes that the hero Batman was not actually born in "
                        + "foggy Gotham City.";
        
        // Generate the extractions (With SAFE mode)
        MinIE minie = new MinIE(sentence, parser, MinIE.Mode.SAFE);
        
        // Print the extractions
        System.out.println("\nInput sentence: " + sentence);
        System.out.println("=============================");
        System.out.println("Extractions:");
        for (AnnotatedProposition ap: minie.getPropositions()) {
            System.out.println("\tTriple: " + ap.getTripleAsString());
            System.out.print("\tFactuality: " + ap.getFactualityAsString());
            if (ap.getAttribution().getAttributionPhrase() != null) 
                System.out.print("\tAttribution: " + ap.getAttribution().toStringCompact());
            else
                System.out.print("\tAttribution: NONE");
            System.out.println("\n\t----------");
        }
        
        System.out.println("\n\nDONE!");
    }
}

If you want to use MinIE-D, then the only difference would be the way MinIE is called:

import de.uni_mannheim.utils.Dictionary;
. . .

// Initialize dictionaries
String [] filenames = new String [] {"/minie-resources/wiki-freq-args-mw.txt", 
                                     "/minie-resources/wiki-freq-rels-mw.txt"};
Dictionary collocationsDict = new Dictionary(filenames);

// Use MinIE
MinIE minie = new MinIE(sentence, parser, MinIE.Mode.DICTIONARY, collocationsDict);

In resources/minie-resources/ you can find multi-word dictionaries constructed from WordNet (wn-mwe.txt) and from Wiktionary (wiktionary-mw-titles.txt). These give you basic MinIE-D functionality. The multi-word dictionaries constructed with MinIE-S (as explained in the paper) are not included because of their size. If you want to use them, please refer to the download link in the "Resources" section.
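
The effect of the dictionary on dropping can be pictured with a toy sketch. This is not MinIE's implementation: the minimize helper, the index-based "droppable" set and the dictionary entry are all assumptions made up for illustration. It only shows the idea that, with an empty dictionary, every candidate word is dropped, while a word that belongs to a multi-word expression listed in the dictionary is kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy illustration only; MinIE's real minimization works on dependency parses.
final class DictionaryModeSketch {

    // Drops every droppable word unless it lies inside a multi-word span
    // (two or more words) that appears in the dictionary.
    static List<String> minimize(List<String> words, Set<Integer> droppable, Set<String> dictionary) {
        boolean[] protectedWord = new boolean[words.size()];
        for (int start = 0; start < words.size(); start++) {
            for (int end = start + 2; end <= words.size(); end++) {
                String span = String.join(" ", words.subList(start, end)).toLowerCase();
                if (dictionary.contains(span)) {
                    for (int k = start; k < end; k++) {
                        protectedWord[k] = true;
                    }
                }
            }
        }
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            if (!droppable.contains(i) || protectedWord[i]) {
                kept.add(words.get(i));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> phrase = List.of("foggy", "Gotham", "City");
        Set<Integer> droppable = Set.of(0); // assume only "foggy" is a candidate for dropping

        // Empty dictionary: the candidate is dropped.
        System.out.println(minimize(phrase, droppable, Set.of()));
        // Made-up dictionary entry covering the whole phrase: the candidate is kept.
        System.out.println(minimize(phrase, droppable, Set.of("foggy gotham city")));
    }
}
```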

MinIE Service

Code for exposing MinIE as a service (developed by Pasquale Minervini).

Start with:

$ mvn clean compile exec:java
[..]

[INFO] --- exec-maven-plugin:1.6.0:java (default-cli) @ minie-service ---
MinIE Service
Mar 06, 2018 8:43:13 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [localhost:8080]
Mar 06, 2018 8:43:13 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer] Started.
Application started.
Stop the application using CTRL+C

Use the service with:

$ curl 'http://localhost:8080/minie/query' -X POST -d 'Obama visited the white house.' | jq .
{
  "facts": [
    {
      "subject": "Obama",
      "predicate": "visited",
      "object": "white house"
    }
  ]
}
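
The same query can be sent from Java. The following is a minimal sketch using the JDK's built-in java.net.http client (Java 11+); the MinieServiceClient class and its buildRequest helper are hypothetical names, and it assumes the service is running locally as shown above. The Content-Type header mirrors what curl -d sends by default; whether the service requires it is an assumption.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client class; only the endpoint comes from the curl example.
final class MinieServiceClient {
    static final String ENDPOINT = "http://localhost:8080/minie/query";

    // Builds a POST request whose body is the raw sentence, like `curl -X POST -d '...'`.
    static HttpRequest buildRequest(String sentence) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                // curl -d sends this content type by default (assumed acceptable to the service).
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(sentence))
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                buildRequest("Obama visited the white house."),
                HttpResponse.BodyHandlers.ofString());
        // Prints the raw JSON body returned by the service.
        System.out.println(response.body());
    }
}
```

The response body is printed as raw JSON; parsing the "facts" array is left to whichever JSON library your project already uses.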

Python wrapper

You can find a Python wrapper for MinIE here. If you want to use MinIE with Python, please follow the guidelines in that repo's README.

Resources

  • Documentation: for more thorough documentation of the code, please visit MinIE's project page.
  • Paper: "MinIE: Minimizing Facts in Open Information Extraction", which appeared at EMNLP 2017 [pdf]
  • Dictionary: Wikipedia: frequent relations and arguments [zip]
  • Experiments datasets: datasets from the paper

MinIE in other downstream applications

  • Fact Salience: MinIE is used for the task of "Fact Salience". Details can be found in the paper "Facts that Matter" by Marco Ponza, Luciano Del Corro, Gerhard Weikum, published at EMNLP 2018. As a result, the fact salience system SalIE was published.
  • Large-Scale OIE: MinIE was used to create OPIEC, the largest OIE corpus to date. The corpus contains more than 341M triples. Details can be found in the paper "OPIEC: An Open Information Extraction Corpus" by Kiril Gashteovski, Sebastian Wanner, Sven Hertling, Samuel Broscheit, Rainer Gemulla, published at AKBC 2019.
  • OIE from Scientific Publications: MinScIE, an extension of MinIE, provides structured knowledge enriched with semantic information about citations. Details can be found in the paper "MinScIE: Citation-centered Open Information Extraction" by Anne Lauscher, Yide Song and Kiril Gashteovski, published at JCDL 2019.
  • Entity Aspect Linking: MinIE was used to create EAL, a toolkit and dataset for entity-aspect linking. Details can be found in the paper "EAL: A Toolkit and Dataset for Entity-Aspect Linking" by Federico Nanni, Jingyi Zhang, Ferdinand Betz, Kiril Gashteovski, published at JCDL 2019.

Citing

If you use MinIE in your work, please cite our paper:

@inproceedings{gashteovski2017minie,
  title={MinIE: Minimizing Facts in Open Information Extraction},
  author={Gashteovski, Kiril and Gemulla, Rainer and Del Corro, Luciano},
  booktitle={Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},
  pages={2630--2640},
  year={2017}
}