
smaddikonda / Bankruptcy-Prediction

Licence: MIT license
Mining the Polish Bankruptcy Data

Programming Languages

Jupyter Notebook

Projects that are alternatives to, or similar to, Bankruptcy-Prediction

candis
🎀 A data mining suite for gene expression data.
Stars: ✭ 28 (+33.33%)
Mutual labels:  data-mining
Network-Embedding-Resources
Network Embedding Survey and Resources
Stars: ✭ 43 (+104.76%)
Mutual labels:  data-mining
hpipe
Workflow engine for various computing systems.
Stars: ✭ 26 (+23.81%)
Mutual labels:  data-mining
csmath-2021
This mathematics course is taught to first-year Ph.D. students in computer science and related areas @zju
Stars: ✭ 30 (+42.86%)
Mutual labels:  data-mining
machine learning in python
Demo of basic machine learning models in Python with Jupyter Notebook
Stars: ✭ 16 (-23.81%)
Mutual labels:  data-mining
kmeans
A simple implementation of K-means (and Bisecting K-means) clustering algorithm in Python
Stars: ✭ 18 (-14.29%)
Mutual labels:  data-mining
XCloud
Official Code for Paper <XCloud: Design and Implementation of AI Cloud Platform with RESTful API Service> (arXiv1912.10344)
Stars: ✭ 58 (+176.19%)
Mutual labels:  data-mining
CS259D Notes HW cn
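These notes expand on the papers and lecture notes covered in the course CS 259D; reading the original papers and lecture notes is recommended.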
Stars: ✭ 63 (+200%)
Mutual labels:  data-mining
BLUELAY
Searches online paste sites for certain search terms which can indicate a possible data breach.
Stars: ✭ 24 (+14.29%)
Mutual labels:  data-mining
dayder
Search lots of data sets for spurious correlations
Stars: ✭ 44 (+109.52%)
Mutual labels:  data-mining
sql-cookbook
Common SQL recipes and best practices
Stars: ✭ 68 (+223.81%)
Mutual labels:  data-mining
DataCon
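🏆 DataCon Big Data Security Analysis Competition: source code of the champion solution for Track 2 (malicious code detection) in 2019 and of the third-place solution for Track 5 (malicious code analysis) in 2020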
Stars: ✭ 69 (+228.57%)
Mutual labels:  data-mining
4chanMarkovText
Text Generation using Markov Chains fed by 4chan APIs
Stars: ✭ 28 (+33.33%)
Mutual labels:  data-mining
emperor-os
(v2.5 LTS, newly released 2022-06-25) Focused on developing an all-in-one operating system for programming, design and data science. Emperor-OS has over 500 apps and important tools
Stars: ✭ 32 (+52.38%)
Mutual labels:  data-mining
taller SparkR
SparkR workshop for the R Users Conference (Jornadas de Usuarios de R)
Stars: ✭ 12 (-42.86%)
Mutual labels:  data-mining
chainRec
Mengting Wan, Julian McAuley, "Item Recommendation on Monotonic Behavior Chains", in Proc. of 2018 ACM Conference on Recommender Systems (RecSys'18), Vancouver, Canada, Oct. 2018.
Stars: ✭ 52 (+147.62%)
Mutual labels:  data-mining
pathpy
pathpy is an OpenSource python package for the modeling and analysis of pathways and temporal networks using higher-order and multi-order graphical models
Stars: ✭ 124 (+490.48%)
Mutual labels:  data-mining
Tencent2017 Final Rank28 code
Rank 28 code for the first Tencent Social Ads University Algorithm Competition (2017)
Stars: ✭ 85 (+304.76%)
Mutual labels:  data-mining
Instagram-Comments-Scraper
Instagram comment scraper using python and selenium. Save the comments into excel.
Stars: ✭ 73 (+247.62%)
Mutual labels:  data-mining
Awesome-DataScience-Cheatsheets
Collection of cheatsheets for data science, machine learning and deep learning :).
Stars: ✭ 48 (+128.57%)
Mutual labels:  data-mining

Bankruptcy-Prediction

Mining the Polish Bankruptcy Data

Tags: Data Mining, Machine Learning, Data Visualization.

Co-created by Sree Keerthi Matta

Links:

Project presentation: slideshow
Dataset: Polish Bankruptcy Dataset

Summary:

Bankruptcy prediction is the task of predicting bankruptcy and various measures of financial distress in firms. It matters to creditors and investors, who need to evaluate the likelihood that a firm may go bankrupt.

The aim of predicting financial distress is to develop a predictive model that combines various econometric parameters to forecast the financial condition of a firm. In this project we document our observations as we explore, build, and compare some widely used classification models:

  1. Gaussian Naïve Bayes
  2. Logistic Regression
  3. Decision Trees
  4. Random Forests
  5. Extreme Gradient Boosting
  6. Balanced Bagging
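
Of the models listed above, logistic regression most directly "combines" financial ratios into a single estimate of distress. The following is a minimal pure-Python sketch, not the project's notebook code: it fits weights by batch gradient descent on the mean log-loss. The two features and their values are hypothetical stand-ins for ratios from the dataset, and `fit_logistic_regression` and `predict_proba` are illustrative names.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Difference between the predicted probability and the 0/1 label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n_samples for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n_samples
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that firm x belongs to class 1 (bankrupt)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy data: [profitability ratio, leverage ratio]; 1 = bankrupt.
X = [[0.30, 0.10], [0.25, 0.20], [0.40, 0.15],
     [-0.20, 0.90], [-0.15, 0.80], [-0.30, 0.95]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic_regression(X, y, lr=0.5, epochs=5000)
preds = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x in X]
```

On the real data one would standardize the 64 ratios first, since gradient descent converges poorly when features differ widely in scale.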

We chose the Polish companies’ bankruptcy dataset, in which synthetic features were used to reflect higher-order statistics.

We begin with data preprocessing and exploratory analysis, imputing missing values with several popular techniques: mean imputation, k-Nearest Neighbors (k-NN), Expectation-Maximization (EM), and Multivariate Imputation by Chained Equations (MICE).
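
Mean and k-NN imputation, the two simplest techniques named above, can be sketched in pure Python as follows. This is an illustrative sketch, not the project's code: missing entries are assumed to be `None`, the function names are hypothetical, and the k-NN distance is computed only over features observed in both rows. EM and MICE are not shown; in scikit-learn, `KNNImputer` and the experimental `IterativeImputer` (a MICE-style imputer) are library alternatives.

```python
import math
from statistics import mean


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed))
    return [[col_means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def knn_impute(rows, k=2):
    """Fill each None with the mean of that feature over the k nearest donor rows.

    Distance is Euclidean over the features observed in both rows (a
    simplification; scikit-learn's KNNImputer rescales for missing coordinates).
    Assumes at least one donor row has the missing feature observed.
    """
    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, v in enumerate(row):
            if v is not None:
                continue
            # Donors are other rows in which feature j is observed.
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    candidates.append((math.sqrt(sum(shared)), other[j]))
            candidates.sort(key=lambda t: t[0])
            new_row[j] = mean(val for _, val in candidates[:k])
        filled.append(new_row)
    return filled
```

Mean imputation ignores relationships between ratios, while k-NN borrows values from similar firms; comparing both on the same data shows how sensitive the downstream models are to the choice.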

To address the data imbalance, we apply the Synthetic Minority Oversampling Technique (SMOTE) to oversample the minority (bankrupt) class.
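
SMOTE creates new minority samples by interpolating between a minority sample and one of its nearest minority-class neighbors, rather than duplicating existing rows. The sketch below is a simplified variant, assumed for illustration: it picks base samples at random and uses plain Euclidean distance, and the function name and parameters are hypothetical. In practice, imbalanced-learn's `SMOTE` class with its `fit_resample` method provides a maintained implementation.

```python
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Generate synthetic minority samples by SMOTE-style interpolation.

    Each new sample lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbors.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Precompute the k nearest minority neighbors of every minority sample.
    neighbors = []
    for i, a in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: sq_dist(a, minority[j]))
        neighbors.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbors[i])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic
```

Because every synthetic point lies between two real minority points, SMOTE fills in the minority region instead of repeating identical rows, which tends to reduce overfitting compared with plain duplication.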

Next, we fit each of the models listed above on the imputed and resampled datasets, using k-fold cross-validation.
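
Because bankrupt firms are rare, a plain random k-fold split can leave some validation folds with very few positives. The sketch below assumes a stratified variant, which may differ from the split used in the project: each class's indices are shuffled and dealt round-robin across the folds. `stratified_k_fold` is a hypothetical name; scikit-learn's `StratifiedKFold` serves the same purpose.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs that keep class proportions per fold.

    Indices of each class are shuffled and dealt round-robin across the k
    folds, so a rare class (e.g. bankrupt firms) appears in every validation
    fold whenever it has at least k members.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for f in range(k):
        val_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, val_idx
```

A common precaution is to fit the imputer and SMOTE on each training fold only and leave the validation fold untouched, so that values derived from validation rows do not inflate the scores.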

Finally, we analyze and evaluate the models' performance on the validation datasets using metrics such as accuracy, precision, and recall, and rank the models accordingly.
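
On imbalanced data, accuracy alone can be misleading: a model that never predicts bankruptcy still scores well. The sketch below computes accuracy, precision, recall and F1 from confusion-matrix counts and ranks models by one chosen metric. The function names, model names and label vectors are toy values for illustration, not results from this project.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1, treating `positive` as the target class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


def rank_models(predictions, y_true, metric="recall"):
    """Return model names sorted by the chosen metric, best first."""
    scores = {name: binary_metrics(y_true, y_pred)[metric]
              for name, y_pred in predictions.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy example: 8 solvent firms (0) and 2 bankrupt firms (1).
y_true = [0] * 8 + [1] * 2
predictions = {
    "always_solvent": [0] * 10,         # never flags bankruptcy
    "model_b": [0] * 7 + [1, 1, 0],     # one false alarm, one hit, one miss
}
ranking = rank_models(predictions, y_true, metric="recall")
```

Both toy models reach the same accuracy, but only recall separates them; ranking by recall or F1 is often preferred when missing a bankrupt firm is costlier than a false alarm.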
