
dachosen1 / Feature-Engineering-for-Fraud-Detection

Licence: other
Implementation of the feature engineering from the paper Feature engineering strategies for credit card fraud detection

Programming language: Jupyter Notebook

Projects that are alternatives of or similar to Feature-Engineering-for-Fraud-Detection

benfordslaw
benfordslaw is about the frequency distribution of leading digits.
Stars: ✭ 29 (-6.45%)
Mutual labels:  fraud-detection, anomaly-detection
Remixautoml
R package for automation of machine learning, forecasting, feature engineering, model evaluation, model interpretation, data generation, and recommenders.
Stars: ✭ 159 (+412.9%)
Mutual labels:  feature-engineering, anomaly-detection
MemStream
MemStream: Memory-Based Streaming Anomaly Detection
Stars: ✭ 58 (+87.1%)
Mutual labels:  fraud-detection, anomaly-detection
A-Detector
⭐ An anomaly-based intrusion detection system.
Stars: ✭ 69 (+122.58%)
Mutual labels:  anomaly-detection, isolation-forest
deepAD
Detection of Accounting Anomalies in the Latent Space using Adversarial Autoencoder Neural Networks - A lab we prepared for the KDD'19 Workshop on Anomaly Detection in Finance that will walk you through the detection of interpretable accounting anomalies using adversarial autoencoder neural networks. The majority of the lab content is based on J…
Stars: ✭ 65 (+109.68%)
Mutual labels:  fraud-detection, anomaly-detection
Machine Learning Workflow With Python
A comprehensive walkthrough of ML techniques with Python: define the problem, specify inputs and outputs, data collection, exploratory data analysis, data preprocessing, model design, training, and evaluation.
Stars: ✭ 157 (+406.45%)
Mutual labels:  kmeans, feature-engineering
Pyod
A Python Toolbox for Scalable Outlier Detection (Anomaly Detection)
Stars: ✭ 5,083 (+16296.77%)
Mutual labels:  fraud-detection, anomaly-detection
gouda
Golang Utilities for Data Analysis
Stars: ✭ 18 (-41.94%)
Mutual labels:  kmeans, dbscan
DGFraud-TF2
A Deep Graph-based Toolbox for Fraud Detection in TensorFlow 2.X
Stars: ✭ 84 (+170.97%)
Mutual labels:  fraud-detection, anomaly-detection
text clustering
Text clustering (Kmeans, DBSCAN, LDA, Single-pass)
Stars: ✭ 230 (+641.94%)
Mutual labels:  kmeans, dbscan
msda
Library for multi-dimensional, multi-sensor, uni/multivariate time series data analysis, unsupervised feature selection, unsupervised deep anomaly detection, and prototype of explainable AI for anomaly detector
Stars: ✭ 80 (+158.06%)
Mutual labels:  feature-engineering, anomaly-detection
kmeans-dbscan-tutorial
A clustering tutorial with scikit-learn for beginners.
Stars: ✭ 20 (-35.48%)
Mutual labels:  kmeans, dbscan
Rough-Sketch-Simplification-Using-FCNN
This is a PyTorch implementation of the paper by Simo-Sera et al. on cleaning rough sketches using fully convolutional neural networks.
Stars: ✭ 31 (+0%)
Mutual labels:  research-paper
Quora-Paraphrase-Question-Identification
Paraphrase question identification using Feature Fusion Network (FFN).
Stars: ✭ 19 (-38.71%)
Mutual labels:  feature-engineering
KMeans elbow
Code for determining optimal number of clusters for K-means algorithm using the 'elbow criterion'
Stars: ✭ 35 (+12.9%)
Mutual labels:  kmeans
data-science-popular-algorithms
Data Science algorithms and topics that you must know. (Newly Designed) Recommender Systems, Decision Trees, K-Means, LDA, RFM-Segmentation, XGBoost in Python, R, and Scala.
Stars: ✭ 65 (+109.68%)
Mutual labels:  kmeans
favorite-research-papers
Listing my favorite research papers 📝 from different fields as I read them.
Stars: ✭ 12 (-61.29%)
Mutual labels:  research-paper
sioyek
Sioyek is a PDF viewer designed for reading research papers and technical books.
Stars: ✭ 3,890 (+12448.39%)
Mutual labels:  research-paper
NVTabular
NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.
Stars: ✭ 797 (+2470.97%)
Mutual labels:  feature-engineering
A-Hierarchical-Transformation-Discriminating-Generative-Model-for-Few-Shot-Anomaly-Detection
Official pytorch implementation of the paper: "A Hierarchical Transformation-Discriminating Generative Model for Few Shot Anomaly Detection"
Stars: ✭ 42 (+35.48%)
Mutual labels:  anomaly-detection

Summary

Anomaly detection helps catch critical outliers in a system early. Depending on the context, these outliers can be harmful, costing resources and time through errors, fraud, stock manipulation, and other malicious activity. Outliers can also be beneficial, for example in investing and arbitrage. Business decisions that rely on anomaly detection once required intensive human effort; they can now be made quickly through versatile models and automation. In this project I implemented the findings of the Feature Engineering for Credit Card Fraud research paper to create both supervised and unsupervised models for fraud detection.

Feature Engineering for Credit Card Fraud Paper Summary

In the feature engineering for credit card fraud paper, the authors examine a new approach to developing features for machine learning algorithms. They address cost sensitivity and preprocess the features to achieve improved fraud detection and savings. Typical models use only raw transactional features, such as the time, amount, and place of the transaction. However, these approaches do not take into account the spending behavior of the customer, which is expected to help discover fraud patterns.

The paper, Feature engineering strategies for credit card fraud detection, provided an essential framework for creating features to analyze credit card transaction data.

  1. A more comprehensive way to create features is to derive them using a transaction aggregation strategy.

  2. Deriving the aggregation features consists of grouping the transactions made during the last given number of hours, first by card or account number, then by transaction type, merchant group, country, or another attribute, followed by calculating the number of transactions or the total amount spent on those transactions.

  3. When aggregating customer transactions, an essential question is how far back to accumulate, since the marginal value of new information may diminish as time passes. Information loses value over time because customer spending patterns are not expected to remain constant over the years. In particular, Whitrow et al. define a fixed time frame of 24, 60, or 168 hours.
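The aggregation strategy in the list above can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's or the paper's implementation: the function name `add_aggregation_features`, the field names (`card`, `time`, `amount`, `country`, `agg_count`, `agg_total`), and the sample transactions are all hypothetical. It also assumes the current transaction is excluded from its own aggregate and that only transactions strictly newer than the window length are counted.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def add_aggregation_features(transactions, window_hours=24, group_by=None):
    """Return time-ordered copies of the transactions with two extra fields:
    the count and total amount of earlier transactions in the same group
    that fall inside the look-back window."""
    window = timedelta(hours=window_hours)
    history = defaultdict(list)  # group key -> [(time, amount), ...]
    enriched = []
    for tx in sorted(transactions, key=lambda t: t["time"]):
        # Group first by card, then optionally by one more attribute.
        key = (tx["card"],) if group_by is None else (tx["card"], tx[group_by])
        # Assumption: the current transaction is excluded, and an earlier
        # transaction counts only if it is strictly less than window_hours old.
        recent = [amt for when, amt in history[key] if tx["time"] - when < window]
        enriched.append({**tx, "agg_count": len(recent), "agg_total": sum(recent)})
        history[key].append((tx["time"], tx["amount"]))
    return enriched


# Made-up sample data for illustration only.
transactions = [
    {"card": "A", "time": datetime(2024, 1, 1, 8), "amount": 20.0, "country": "US"},
    {"card": "A", "time": datetime(2024, 1, 1, 20), "amount": 35.0, "country": "US"},
    {"card": "A", "time": datetime(2024, 1, 2, 7), "amount": 50.0, "country": "FR"},
    {"card": "B", "time": datetime(2024, 1, 1, 9), "amount": 10.0, "country": "US"},
    {"card": "A", "time": datetime(2024, 1, 3, 12), "amount": 5.0, "country": "US"},
]

by_card = add_aggregation_features(transactions, window_hours=24)
by_card_country = add_aggregation_features(
    transactions, window_hours=24, group_by="country"
)

for row in by_card:
    print(row["card"], row["time"], row["agg_count"], row["agg_total"])
```

Calling the function once per window length (for example 24, 60, and 168 hours) and once per grouping attribute would yield one pair of count and total features for each combination.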

Even with the aggregated features, some information is still not completely captured. One issue concerns the time of the transaction: when analyzing a feature such as the mean transaction time, it is easy to make the mistake of using the arithmetic mean, which does not take into account the periodic behavior of the time feature. For the experiment, they evaluated several classification algorithms, namely a decision tree, logistic regression, and random forest, together with a Bayes minimum risk model and a cost-sensitive decision tree algorithm, and measured the results.
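The pitfall with the arithmetic mean of transaction times can be seen in a small example. The sketch below compares it with a circular mean, one common way to average values on a 24-hour clock; it is an illustration under that assumption, and the periodic features in the paper itself may be constructed differently. The function names and the sample hours are hypothetical.

```python
import math


def arithmetic_mean_hour(hours):
    """Plain average, which ignores that hour 23 and hour 0 are adjacent."""
    return sum(hours) / len(hours)


def circular_mean_hour(hours):
    """Average hours of the day by treating each one as an angle on a clock.

    Note: the result is not meaningful when the hours are spread evenly
    around the clock, because the averaged vector is then close to zero.
    """
    angles = [2 * math.pi * h / 24 for h in hours]
    mean_sin = sum(math.sin(a) for a in angles) / len(angles)
    mean_cos = sum(math.cos(a) for a in angles) / len(angles)
    mean_angle = math.atan2(mean_sin, mean_cos)
    # Convert the angle back to hours and wrap into the range [0, 24).
    return (mean_angle * 24 / (2 * math.pi)) % 24


# A cardholder who usually buys around midnight (made-up values).
late_night_hours = [23, 0, 2]

print("arithmetic mean:", arithmetic_mean_hour(late_night_hours))
print("circular mean:  ", circular_mean_hour(late_night_hours))
```

For these sample hours the arithmetic mean lands in the morning, far from any actual purchase, while the circular mean stays within an hour of midnight.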

The paper shows the importance of using features that analyze the consumer behavior of individual cardholders when constructing a credit card fraud detection model. The authors show that by preprocessing the data to include recent consumer behavior, performance increases by more than 200% compared to using only the raw transaction information.

Modeling Approaches:

In this repository, I experimented with different anomaly detection methods, both supervised and unsupervised.
