sharmaroshan / Fraud-Detection-in-Online-Transactions

Licence: GPL-3.0 license
Detecting fraud in online transactions using anomaly detection techniques such as oversampling and undersampling. Because the ratio of frauds is less than 0.00005, simply applying a classification algorithm may result in overfitting.
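
The two resampling ideas named above can be sketched in a few lines. This is a minimal illustration using only the standard library, not the code from the repository's notebook: the function names and the toy data are hypothetical, and it assumes fraud is labelled 1 and is the minority class.

```python
import random
from collections import Counter


def split_indices(labels):
    """Return (minority, majority) index lists, assuming label 1 is the rare class."""
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    return minority, majority


def random_undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have the same size."""
    rng = random.Random(seed)
    minority, majority = split_indices(labels)
    kept = rng.sample(majority, k=len(minority)) + minority
    rng.shuffle(kept)
    return [rows[i] for i in kept], [labels[i] for i in kept]


def random_oversample(rows, labels, seed=0):
    """Repeat minority-class rows (with replacement) until both classes have the same size."""
    rng = random.Random(seed)
    minority, majority = split_indices(labels)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    kept = majority + minority + extra
    rng.shuffle(kept)
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data (made-up numbers): 1,000 legitimate rows and 5 fraud rows.
labels = [0] * 1000 + [1] * 5
rows = list(range(len(labels)))

_, y_under = random_undersample(rows, labels)
_, y_over = random_oversample(rows, labels)

print(Counter(y_under))  # 5 rows of each class
print(Counter(y_over))   # 1000 rows of each class
```

Resampling is normally applied to the training split only, so that the evaluation data keeps the original class ratio.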

Description

Context

There is a lack of publicly available datasets on financial services, especially in the emerging domain of mobile money transactions. Financial datasets are important to many researchers, and in particular to those of us doing research on fraud detection. Part of the problem is the intrinsically private nature of financial transactions, which means that no such datasets are published.

We present a synthetic dataset generated with a simulator called PaySim as one approach to this problem. PaySim uses aggregated data from a private dataset to generate a synthetic dataset that resembles the normal operation of transactions, and it injects malicious behaviour so that the performance of fraud detection methods can be evaluated later.

Content

PaySim simulates mobile money transactions based on a sample of real transactions extracted from one month of financial logs from a mobile money service implemented in an African country. The original logs were provided by a multinational company that operates the mobile financial service, which currently runs in more than 14 countries around the world.

This synthetic dataset is scaled down to 1/4 of the original dataset and was created just for Kaggle.

Columns

Below is a sample row, followed by an explanation of each column:

1,PAYMENT,1060.31,C429214117,1089.0,28.69,M1591654462,0.0,0.0,0,0

step - maps to a unit of time in the real world. In this case 1 step is 1 hour. There are 744 steps in total (a 31-day simulation).

type - CASH-IN, CASH-OUT, DEBIT, PAYMENT and TRANSFER.

amount - amount of the transaction in local currency.

nameOrig - customer who started the transaction

oldbalanceOrg - initial balance before the transaction

newbalanceOrig - new balance after the transaction

nameDest - customer who is the recipient of the transaction

oldbalanceDest - initial balance of the recipient before the transaction. Note that there is no information for recipients whose names start with M (merchants).

newbalanceDest - new balance of the recipient after the transaction. Note that there is no information for recipients whose names start with M (merchants).

isFraud - marks the transactions made by the fraudulent agents inside the simulation. In this specific dataset, the fraudulent agents aim to profit by taking control of customers' accounts and trying to empty them by transferring the funds to another account and then cashing out of the system.

isFlaggedFraud - the business model aims to control massive transfers from one account to another and flags illegal attempts. An illegal attempt in this dataset is an attempt to transfer more than 200,000 in a single transaction.
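
The column list above can be turned into a small typed record. The sketch below is illustrative only: it assumes the fields of the sample row appear in the same order as the explanations, and the names `Transaction`, `parse_row`, `is_merchant` and `exceeds_flag_threshold` are hypothetical helpers, not part of the dataset or of PaySim. The threshold check is a literal reading of the isFlaggedFraud description and may not reproduce the column in the actual file.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    step: int               # hour of the simulation
    type: str               # transaction type, e.g. PAYMENT or TRANSFER
    amount: float           # amount in local currency
    nameOrig: str           # customer who started the transaction
    oldbalanceOrg: float    # origin balance before the transaction
    newbalanceOrig: float   # origin balance after the transaction
    nameDest: str           # recipient of the transaction
    oldbalanceDest: float   # recipient balance before the transaction
    newbalanceDest: float   # recipient balance after the transaction
    isFraud: int            # 1 if made by a fraudulent agent
    isFlaggedFraud: int     # 1 if flagged by the business rule


def parse_row(line: str) -> Transaction:
    """Parse one comma-separated row, assuming the column order listed above."""
    f = line.strip().split(",")
    if len(f) != 11:
        raise ValueError(f"expected 11 fields, got {len(f)}")
    return Transaction(
        int(f[0]), f[1], float(f[2]), f[3], float(f[4]), float(f[5]),
        f[6], float(f[7]), float(f[8]), int(f[9]), int(f[10]),
    )


def is_merchant(name: str) -> bool:
    """Names starting with M are merchants, for which no balance information exists."""
    return name.startswith("M")


FLAG_THRESHOLD = 200_000  # from the isFlaggedFraud description


def exceeds_flag_threshold(tx: Transaction) -> bool:
    """Literal reading of the rule: a single transfer of more than 200,000."""
    return tx.type == "TRANSFER" and tx.amount > FLAG_THRESHOLD


tx = parse_row("1,PAYMENT,1060.31,C429214117,1089.0,28.69,M1591654462,0.0,0.0,0,0")
print(tx.type, is_merchant(tx.nameDest), exceeds_flag_threshold(tx))

# The origin balances in the sample row are consistent with the amount.
print(abs(tx.oldbalanceOrg - tx.amount - tx.newbalanceOrig) < 1e-6)
```

In the sample row the recipient is a merchant, which is why both destination balances are 0.0.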

Past Research

There are 5 similar files that contain the runs of 5 different scenarios. These files are explained in more detail in chapter 7 of my PhD thesis (available at http://urn.kb.se/resolve?urn=urn:nbn:se:bth-12932).

We ran PaySim several times using random seeds for 744 steps, representing each hour of one month of real time, which matches the original logs. Each run took around 45 minutes on an Intel i7 processor with 16 GB of RAM. The final result of a run contains approximately 24 million financial records divided into 5 transaction types: CASH-IN, CASH-OUT, DEBIT, PAYMENT and TRANSFER.
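
The figures above imply some simple arithmetic, sketched here. Two assumptions are made that the text does not state explicitly: steps are 1-based (the sample row has step 1), and the Kaggle file is one quarter of a single run of about 24 million records. The helper `step_to_day_hour` is hypothetical.

```python
STEPS = 744          # hourly steps per run
HOURS_PER_DAY = 24

days = STEPS / HOURS_PER_DAY
print(days)  # 31.0


def step_to_day_hour(step: int) -> tuple:
    """Map a 1-based hourly step to (day, hour of day), with step 1 as hour 0 of day 1."""
    return (step - 1) // HOURS_PER_DAY + 1, (step - 1) % HOURS_PER_DAY


print(step_to_day_hour(1))    # (1, 0)
print(step_to_day_hour(744))  # (31, 23)

records_per_run = 24_000_000           # approximate size of one full run
scaled_records = records_per_run // 4  # the 1/4 scale-down
print(scaled_records)  # 6000000
```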

Acknowledgements

This work is part of the research project "Scalable resource-efficient systems for big data analytics" funded by the Knowledge Foundation (grant: 20140032) in Sweden.

Please refer to this dataset using the following citations:

PaySim first paper of the simulator:

E. A. Lopez-Rojas, A. Elmir, and S. Axelsson. "PaySim: A financial mobile money simulator for fraud detection". In: The 28th European Modeling and Simulation Symposium (EMSS), Larnaca, Cyprus. 2016.
