YingtongDou / CARE-GNN

Licence: Apache-2.0 license
Code for CIKM 2020 paper Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters

CARE-GNN

A PyTorch implementation for the CIKM 2020 paper below:
Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters.
Yingtong Dou, Zhiwei Liu, Li Sun, Yutong Deng, Hao Peng, Philip S. Yu.
[Paper][Toolbox][DGL Example][Benchmark]

Bug Fixes and Update (06/2021)

Similarity score

The feature and label similarity scores presented in Table 2 of the paper are incorrect. The updated equations for calculating the two similarity scores are shown below:



The code for calculating the similarity scores is in simi_comp.py.

The updated similarity scores for the two datasets are shown below. Note that we only compute the similarity scores for positive nodes to demonstrate the camouflage of fraudsters (positive nodes).

| YelpChi | rur | rtr | rsr | homo |
|---|---|---|---|---|
| Avg. Feature Similarity | 0.991 | 0.988 | 0.988 | 0.988 |
| Avg. Label Similarity | 0.909 | 0.176 | 0.186 | 0.184 |

| Amazon | upu | usu | uvu | homo |
|---|---|---|---|---|
| Avg. Feature Similarity | 0.711 | 0.687 | 0.697 | 0.687 |
| Avg. Label Similarity | 0.167 | 0.056 | 0.053 | 0.072 |

Relation weight in Figure 3

According to this issue, the weighted aggregation of CARE-Weight (a variant of CARE-GNN) has an error. After fixing it, the relation weight will not converge to the same value. Thus, the relation weight subfigure in Figure 3 and its associated conclusion are wrong.

Extended version of CARE-GNN

Please check out RioGNN, a GNN model that extends CARE-GNN with additional reinforcement learning modules. We are actively developing an efficient multi-layer version of CARE-GNN. Stay tuned.

Overview



CAmouflage-REsistant Graph Neural Network (CARE-GNN) is a GNN-based fraud detector that operates on a multi-relation graph and is equipped with three modules that enhance its performance against camouflaged fraudsters.

The three enhancement modules are:

  • A label-aware similarity measure that computes similarity scores between a center node and its neighboring nodes;
  • A similarity-aware neighbor selector that uses top-p sampling and reinforcement learning to select the optimal number of neighbors under each relation;
  • A relation-aware neighbor aggregator that directly aggregates information from different relations, using the optimal neighbor selection thresholds as weights.
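
How the three modules fit together can be illustrated with a minimal pure-Python sketch. It is not the code in layers.py: the function names (`l1_distance`, `top_p_neighbors`, `aggregate`), the use of L1 distance between predicted label scores, and the plain weighted sum without learned parameters or a nonlinearity are all assumptions made for illustration.

```python
from math import ceil


def l1_distance(a, b):
    """L1 distance between two equally sized score vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def top_p_neighbors(center_score, neighbor_scores, p):
    """Keep the ceil(p * n) neighbors whose label scores are closest to the center.

    neighbor_scores maps a neighbor id to its (hypothetical) predicted label scores.
    """
    ranked = sorted(
        neighbor_scores,
        key=lambda nid: l1_distance(center_score, neighbor_scores[nid]),
    )
    return ranked[: ceil(p * len(ranked))]


def mean_vectors(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def aggregate(center_feat, selected_by_relation, features, thresholds):
    """Add each relation's mean neighbor feature, weighted by that relation's threshold."""
    out = list(center_feat)
    for rel, nodes in selected_by_relation.items():
        if not nodes:
            continue  # a relation with no selected neighbors contributes nothing
        rel_mean = mean_vectors([features[n] for n in nodes])
        out = [o + thresholds[rel] * m for o, m in zip(out, rel_mean)]
    return out


# Toy example: one relation "r1" with four neighbors of the center node.
center_score = (0.9, 0.1)
neighbor_scores = {1: (0.8, 0.2), 2: (0.1, 0.9), 3: (0.7, 0.3), 4: (0.95, 0.05)}
thresholds = {"r1": 0.5}

selected = top_p_neighbors(center_score, neighbor_scores, thresholds["r1"])

features = {1: [2.0, 2.0], 4: [4.0, 0.0]}
embedding = aggregate([1.0, 0.0], {"r1": selected}, features, thresholds)
```

With a threshold of 0.5, only the two neighbors most similar to the center are kept, and the same threshold then scales that relation's contribution in the aggregation step.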

CARE-GNN has the following advantages:

  • Adaptability. CARE-GNN adaptively selects the best neighbors for aggregation given an arbitrary multi-relation graph;
  • High efficiency. CARE-GNN is computationally efficient because it uses neither attention nor deep reinforcement learning;
  • Flexibility. Many other neural modules and external knowledge can be plugged into CARE-GNN.
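
To give a sense of how neighbor selection can be learned without deep reinforcement learning, the sketch below adjusts a relation's threshold with a simple reward signal. It is an assumption-laden illustration, not the repository's implementation: the function name `update_threshold`, the step size of 0.02, and the rule "reward +1 when the average distance of selected neighbors does not increase" are all hypothetical choices.

```python
def update_threshold(p, prev_avg_dist, curr_avg_dist, step=0.02, low=0.0, high=1.0):
    """Move the threshold p up or down by a fixed step based on a +1/-1 reward.

    The reward is +1 when the selected neighbors are, on average, at least as
    close to their center nodes as in the previous epoch, and -1 otherwise.
    """
    reward = 1 if curr_avg_dist <= prev_avg_dist else -1
    new_p = min(high, max(low, p + reward * step))  # keep p inside [low, high]
    return new_p, reward


# Toy run: average neighbor distance observed at three consecutive epochs.
avg_distances = [0.40, 0.35, 0.37]
p = 0.5
rewards = []
for prev, curr in zip(avg_distances, avg_distances[1:]):
    p, r = update_threshold(p, prev, curr)
    rewards.append(r)
```

Each update costs a single comparison and addition per relation, which is the kind of lightweight rule that avoids training a separate policy network.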

We have integrated more than eight GNN-based fraud detectors as a TensorFlow toolbox.

Setup

You can download the project and install the required packages using the following commands:

git clone https://github.com/YingtongDou/CARE-GNN.git
cd CARE-GNN
pip3 install -r requirements.txt

To run the code, you need Python 3.6 or later.

Running

  1. In the CARE-GNN directory, run unzip data/Amazon.zip and unzip data/YelpChi.zip to unzip the datasets;
  2. Run python data_process.py to generate adjacency lists used by CARE-GNN;
  3. Run python train.py to run CARE-GNN with default settings.

For other datasets and parameter settings, please refer to the arg parser in train.py. Our model supports both CPU and GPU modes.

Running on your datasets

To run CARE-GNN on your datasets, you need to prepare the following data:

  • Multiple single-relation graphs over the same set of nodes, each stored in scipy.sparse matrix format. You can use sparse_to_adjlist() in utils.py to convert each sparse matrix into the adjacency lists used by CARE-GNN;
  • A numpy array with node labels. Currently, CARE-GNN only supports binary classification;
  • A node feature matrix stored in scipy.sparse matrix format.
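
As a rough illustration of the three inputs listed above, the sketch below builds a toy dataset with two relations over four nodes. The helper `to_adjacency_lists` is a hypothetical stand-in written for this example, not the repository's sparse_to_adjlist(); it assumes edges are undirected and does not write anything to disk. All variable names and values are made up.

```python
from collections import defaultdict

import numpy as np
import scipy.sparse as sp


def to_adjacency_lists(adj):
    """Convert a square scipy.sparse adjacency matrix into {node: set(neighbors)}.

    Assumption: edges are treated as undirected, so each nonzero entry (u, v)
    adds v to u's neighbors and u to v's neighbors.
    """
    adj_lists = defaultdict(set)
    rows, cols = adj.nonzero()
    for u, v in zip(rows.tolist(), cols.tolist()):
        adj_lists[u].add(v)
        adj_lists[v].add(u)
    return adj_lists


num_nodes = 4

# Two single-relation graphs that share the same four nodes.
rel_a = sp.csr_matrix(([1, 1], ([0, 1], [1, 2])), shape=(num_nodes, num_nodes))
rel_b = sp.csr_matrix(([1], ([0], [3])), shape=(num_nodes, num_nodes))

# Binary node labels (1 = positive/fraud, 0 = negative/benign).
labels = np.array([1, 0, 0, 1])

# Node feature matrix in scipy.sparse format, one row per node.
features = sp.csr_matrix(
    np.array([[0.1, 0.9], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8]])
)

adj_a = to_adjacency_lists(rel_a)
adj_b = to_adjacency_lists(rel_b)
```

The key constraint is that every relation matrix, the label array, and the feature matrix all index the same nodes in the same order.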

Repo Structure

The repository is organized as follows:

  • data/: dataset files;
  • data_process.py: convert sparse matrices to adjacency lists;
  • graphsage.py: model code for the vanilla GraphSAGE model;
  • layers.py: CARE-GNN layer implementations;
  • model.py: CARE-GNN model implementations;
  • train.py: training and testing all models;
  • utils.py: utility functions for data i/o and model evaluation.

Citation

If you use our code, please cite the paper below:

@inproceedings{dou2020enhancing,
  title={Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters},
  author={Dou, Yingtong and Liu, Zhiwei and Sun, Li and Deng, Yutong and Peng, Hao and Yu, Philip S},
  booktitle={Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM'20)},
  year={2020}
}