davidsbatista / Snowball

Licence: gpl-3.0
An implementation, with some extensions, of the paper "Snowball: Extracting Relations from Large Plain-Text Collections" (Agichtein and Gravano, 2000)

Programming Languages

python

Snowball: Extracting Relations from Large Plain-Text Collections

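This is my own implementation of the Snowball system to bootstrap relationship instances. You can find more details about the original system in the paper "Snowball: Extracting Relations from Large Plain-Text Collections" (Agichtein and Gravano, 2000).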

For more details about this particular implementation please refer to:

A sample file in which the named entities are already tagged can be downloaded. It contains 1 million sentences taken from the New York Times articles that are part of the English Gigaword Collection.

NOTE: see the description of BREDS to understand how to provide a tagged document collection and seeds to set up the bootstrapping of relationship instances with Snowball. Both systems have a similar setup.
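To make the idea of a tagged collection plus seeds more concrete, here is a minimal sketch of the first bootstrapping step: finding sentences in which a seed pair co-occurs. It is not the code from this repository. The `<TYPE>text</TYPE>` tag format, the function names `entity_pairs` and `find_seed_matches`, and the sample sentences are all assumptions made for illustration; check the BREDS description for the actual input format.

```python
import re

# Assumed (hypothetical) tag format: each entity is wrapped as <TYPE>text</TYPE>.
ENTITY = re.compile(r"<(?P<type>[A-Z]+)>(?P<text>[^<]+)</(?P=type)>")


def entity_pairs(sentence):
    """Yield (e1, e1_type, between, e2, e2_type) for each pair of
    consecutive tagged entities in the sentence."""
    matches = list(ENTITY.finditer(sentence))
    for left, right in zip(matches, matches[1:]):
        # Text between the two entity mentions, used as the context.
        between = sentence[left.end():right.start()].strip()
        yield (
            left.group("text"),
            left.group("type"),
            between,
            right.group("text"),
            right.group("type"),
        )


def find_seed_matches(sentences, seeds, e1_type, e2_type):
    """Return (e1, between, e2) for every entity pair that has the
    requested types and appears in the seed set."""
    found = []
    for sentence in sentences:
        for e1, t1, between, e2, t2 in entity_pairs(sentence):
            if (t1, t2) == (e1_type, e2_type) and (e1, e2) in seeds:
                found.append((e1, between, e2))
    return found


# Illustrative data only; not taken from the sample file.
sentences = [
    "<ORG>Nokia</ORG> is based in <LOC>Espoo</LOC> .",
    "<ORG>Acme</ORG> opened an office in <LOC>Lisbon</LOC> .",
]
seeds = {("Nokia", "Espoo")}

matches = find_seed_matches(sentences, seeds, "ORG", "LOC")
print(matches)
```

In a bootstrapping system, contexts collected this way would then be generalised into extraction patterns and used to find new entity pairs. That later stage is omitted here.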
