kevalmorabia97 / CoVA-Web-Object-Detection

Licence: Apache-2.0 License
A Context-aware Visual Attention-based training pipeline for Object Detection from a Webpage screenshot!

Programming Languages

python

Projects that are alternatives of or similar to CoVA-Web-Object-Detection

how attentive are gats
Code for the paper "How Attentive are Graph Attention Networks?" (ICLR'2022)
Stars: ✭ 200 (+1011.11%)
Mutual labels:  attention, graph-attention-networks
mtad-gat-pytorch
PyTorch implementation of MTAD-GAT (Multivariate Time-Series Anomaly Detection via Graph Attention Networks) by Zhao et al. (2020, https://arxiv.org/abs/2009.02040).
Stars: ✭ 85 (+372.22%)
Mutual labels:  attention, graph-attention-networks
Awesome Graph Classification
A collection of important graph embedding, classification and representation learning papers with implementations.
Stars: ✭ 4,309 (+23838.89%)
Mutual labels:  graph-convolutional-networks, graph-attention-networks
SelfGNN
A PyTorch implementation of "SelfGNN: Self-supervised Graph Neural Networks without explicit negative sampling" paper, which appeared in The International Workshop on Self-Supervised Learning for the Web (SSL'21) @ the Web Conference 2021 (WWW'21).
Stars: ✭ 24 (+33.33%)
Mutual labels:  graph-convolutional-networks, graph-attention-networks
evildork
Evildork targeting your fiancee👁️
Stars: ✭ 46 (+155.56%)
Mutual labels:  information-extraction
STRA-Net
STRAL-Net
Stars: ✭ 21 (+16.67%)
Mutual labels:  visual-attention
kGCN
A graph-based deep learning framework for life science
Stars: ✭ 91 (+405.56%)
Mutual labels:  graph-convolutional-networks
Image-Captioning
Image Captioning with Keras
Stars: ✭ 60 (+233.33%)
Mutual labels:  attention
PLE
Label Noise Reduction in Entity Typing (KDD'16)
Stars: ✭ 53 (+194.44%)
Mutual labels:  information-extraction
PyTorch-GNNs
An implementation of GNNs based on PyTorch
Stars: ✭ 121 (+572.22%)
Mutual labels:  graph-convolutional-networks
multimodal-vae-public
A PyTorch implementation of "Multimodal Generative Models for Scalable Weakly-Supervised Learning" (https://arxiv.org/abs/1802.05335)
Stars: ✭ 98 (+444.44%)
Mutual labels:  multimodal-learning
QA4IE
Original implementation of QA4IE
Stars: ✭ 24 (+33.33%)
Mutual labels:  information-extraction
graph-nvp
GraphNVP: An Invertible Flow Model for Generating Molecular Graphs
Stars: ✭ 69 (+283.33%)
Mutual labels:  graph-convolutional-networks
ntua-slp-semeval2018
Deep-learning models of NTUA-SLP team submitted in SemEval 2018 tasks 1, 2 and 3.
Stars: ✭ 79 (+338.89%)
Mutual labels:  attention
TAGCN
Tensorflow Implementation of the paper "Topology Adaptive Graph Convolutional Networks" (Du et al., 2017)
Stars: ✭ 17 (-5.56%)
Mutual labels:  graph-convolutional-networks
automatic-personality-prediction
[AAAI 2020] Modeling Personality with Attentive Networks and Contextual Embeddings
Stars: ✭ 43 (+138.89%)
Mutual labels:  attention
Literatures-on-GNN-Acceleration
A reading list for deep graph learning acceleration.
Stars: ✭ 50 (+177.78%)
Mutual labels:  graph-convolutional-networks
Base-On-Relation-Method-Extract-News-DA-RNN-Model-For-Stock-Prediction--Pytorch
A dual-stage attention mechanism model based on a relational news extraction method, used for stock prediction
Stars: ✭ 33 (+83.33%)
Mutual labels:  attention
NTUA-slp-nlp
💻Speech and Natural Language Processing (SLP & NLP) Lab Assignments for ECE NTUA
Stars: ✭ 19 (+5.56%)
Mutual labels:  attention
RNNSearch
An implementation of attention-based neural machine translation using Pytorch
Stars: ✭ 43 (+138.89%)
Mutual labels:  attention

CoVA: Context-aware Visual Attention for Webpage Information Extraction

Abstract

Webpage information extraction (WIE) is an important step to create knowledge bases. For this, classical WIE methods leverage the Document Object Model (DOM) tree of a website. However, use of the DOM tree poses significant challenges as context and appearance are encoded in an abstract manner. To address this challenge we propose to reformulate WIE as a context-aware Webpage Object Detection task. Specifically, we develop a Context-aware Visual Attention-based (CoVA) detection pipeline which combines appearance features with syntactical structure from the DOM tree. To study the approach we collect a new large-scale dataset of e-commerce websites for which we manually annotate every web element with four labels: product price, product title, product image and background. On this dataset we show that the proposed CoVA approach is a new challenging baseline which improves upon prior state-of-the-art methods.

CoVA Dataset

We labeled 7,740 webpages spanning 408 domains (Amazon, Walmart, Target, etc.). Each of these webpages contains exactly one labeled price, title, and image. All other web elements are labeled as background. On average, there are 90 web elements in a webpage.
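The labeling constraint above (exactly one price, title, and image per page, everything else background) can be checked with a few lines of Python. This is only a sketch: the label strings and the list-of-labels layout are assumptions made for illustration, not the dataset's actual file format.

```python
from collections import Counter

# Label names mirror the four classes described above. Representing a page as
# a plain list of label strings is an assumption, not the real storage format.
TARGET_LABELS = ("price", "title", "image")


def is_valid_page(labels):
    """Return True if a page has exactly one price, one title, and one image."""
    counts = Counter(labels)
    known = set(TARGET_LABELS) | {"background"}
    # Reject unknown label names, then require each target label exactly once.
    return set(counts) <= known and all(counts[name] == 1 for name in TARGET_LABELS)
```

A check like this is cheap to run over all pages before training, so annotation mistakes surface early.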

Webpage screenshots and bounding boxes can be obtained here

Train-Val-Test split

We create a cross-domain split which ensures that each of the train, val and test sets contains webpages from different domains. Specifically, we construct a 3 : 1 : 1 split based on the number of distinct domains. We observed that the top-5 domains (based on number of samples) were Amazon, eBay, Walmart, Etsy, and Target. So, we created 5 different splits for 5-Fold Cross Validation such that each of the major domains is present in one of the 5 splits for test data. These splits can be accessed here.
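A minimal sketch of a domain-level 3 : 1 : 1 split is shown below. It assigns whole domains to a set, so no domain appears in more than one set. The function name, the `page_domains` mapping, and the seeded shuffle are assumptions for illustration. It does not reproduce the released splits, and it does not implement the rule that places each major domain in the test set of a different fold.

```python
import random


def cross_domain_split(page_domains, ratios=(3, 1, 1), seed=0):
    """Assign whole domains to train/val/test so that no domain is shared.

    page_domains maps a page id to its domain name (hypothetical layout).
    """
    domains = sorted(set(page_domains.values()))
    random.Random(seed).shuffle(domains)

    # Split by number of distinct domains, not by number of pages.
    total = sum(ratios)
    n_train = len(domains) * ratios[0] // total
    n_val = len(domains) * ratios[1] // total
    groups = {
        "train": set(domains[:n_train]),
        "val": set(domains[n_train:n_train + n_val]),
        "test": set(domains[n_train + n_val:]),
    }
    return {
        name: sorted(p for p, d in page_domains.items() if d in members)
        for name, members in groups.items()
    }
```

Because the ratio applies to domains, the number of pages per set can deviate from 3 : 1 : 1 when domains differ in size.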

CoVA End-to-end Training Pipeline

Our Context-Aware Visual Attention-based end-to-end pipeline for Webpage Object Detection (CoVA) aims to learn a function f to predict labels y = [y1, y2, ..., yN] for a webpage containing N elements. The input to CoVA consists of:

  1. a screenshot of a webpage,
  2. a list of bounding boxes [x, y, w, h] of the web elements, and
  3. neighborhood information for each element obtained from the DOM tree.

This information is processed in four stages:

  1. the graph representation extraction for the webpage,
  2. the Representation Network (RN),
  3. the Graph Attention Network (GAT), and
  4. a fully connected (FC) layer.

The graph representation extraction computes for every web element i its set of K neighboring web elements Ni. The RN consists of a Convolutional Neural Net (CNN) and a positional encoder aimed to learn a visual representation vi for each web element i ∈ {1, ..., N}. The GAT combines the visual representation vi of the web element i to be classified and those of its neighbors, i.e., vk ∀k ∈ Ni to compute the contextual representation ci for web element i. Finally, the visual and contextual representations of the web element are concatenated and passed through the FC layer to obtain the classification output.
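The attention and concatenation steps described above can be sketched in plain Python. This is a simplified, single-head, GAT-style scoring function and not the repository's implementation: it omits the learned linear transform that a standard GAT layer applies to each representation, and the names `attention_weights`, `contextual_representation`, `classifier_input`, and the attention vector `a` are hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_weights(v_i, neighbors, a):
    """Softmax over GAT-style scores e_ik = LeakyReLU(a . [v_i || v_k])."""
    scores = [
        leaky_relu(sum(w * x for w, x in zip(a, v_i + v_k)))
        for v_k in neighbors
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def contextual_representation(v_i, neighbors, a):
    """Attention-weighted sum of the neighbor representations (c_i)."""
    alphas = attention_weights(v_i, neighbors, a)
    return [
        sum(alpha * v_k[d] for alpha, v_k in zip(alphas, neighbors))
        for d in range(len(v_i))
    ]


def classifier_input(v_i, neighbors, a):
    """Concatenate the visual and contextual representations for the FC layer."""
    return v_i + contextual_representation(v_i, neighbors, a)
```

Here `v_i` and each entry of `neighbors` are lists of floats of equal length d, and `a` has length 2d. The resulting weights are the kind of per-context score that an attention visualization would display.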

Pipeline

Experimental Results

Table: Comparison of cross-domain accuracy (mean ± standard deviation) under 5-fold cross-validation.

NOTE: Cross Domain means we train the model on some web domains and test it on completely different domains to evaluate the generalizability of the models to unseen web templates.
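For reference, a "mean ± standard deviation" entry can be computed from per-fold accuracies as sketched below. The helper name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev


def summarize_folds(fold_accuracies):
    """Format per-fold accuracies as 'mean ± std'.

    Uses the population standard deviation (an assumption); swap in
    statistics.stdev for the sample estimate.
    """
    return f"{mean(fold_accuracies):.2f} ± {pstdev(fold_accuracies):.2f}"
```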

Attention Visualizations!

Attention visualizations, where the red border denotes the web element to be classified and its contexts have a green shade whose intensity denotes the attention score. In (a), the price receives a much higher score than the other contexts. In (b), the title and image are scored higher than the other contexts for the price.

Cite

If you find this useful in your research, please cite our arXiv pre-print:

@misc{kumar2021cova,
      title={CoVA: Context-aware Visual Attention for Webpage Information Extraction}, 
      author={Anurendra Kumar and Keval Morabia and Jingjin Wang and Kevin Chen-Chuan Chang and Alexander Schwing},
      year={2021},
      eprint={2110.12320},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}