
yanx27 / Scorecard-Modeling

Licence: other
Use machine learning to build a scorecard model



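Scorecard Model Building Workflow

Data Import and Setup

  • Read the data: import the dataset application.csv
  • Select an appropriate modeling sample
  • Split the dataset into a training set and a test set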
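
The split step can be sketched as follows. This is a minimal illustration in plain Python, not the repository's code: the label column name `y` (1 = bad customer), the 30% test share, and the choice to stratify by label are all assumptions, and a toy list of records stands in for the rows of application.csv.

```python
import random
from collections import defaultdict

def stratified_split(rows, label_key, test_ratio=0.3, seed=42):
    """Split rows into train/test so both parts keep a similar bad rate."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    train, test = [], []
    for group in by_label.values():
        group = group[:]              # copy so the caller's list is not shuffled
        rng.shuffle(group)
        n_test = round(len(group) * test_ratio)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

# Toy stand-in for application.csv: 100 applications, 20 of them bad (y = 1).
rows = [{"id": i, "y": 1 if i < 20 else 0} for i in range(100)]
train, test = stratified_split(rows, label_key="y", test_ratio=0.3)
print(len(train), len(test))
```

Stratifying by label is one reasonable choice for credit data, where bad customers are usually a small minority; a plain random split would also satisfy the step as described.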

Step 1: Data Preprocessing

  • Data cleaning: date/time fields, categorical features, etc.
  • Format conversion
  • Missing-value imputation
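
As an illustration of the imputation step, the sketch below shows two common options: filling with the median, or filling with a sentinel such as -1 so that missing values can later be binned separately. The column name `income` and the function name are hypothetical; the source does not say which strategy it uses.

```python
from statistics import median

def fill_missing(rows, column, strategy="median", special_value=-1):
    """Return new rows in which None in `column` is replaced.

    strategy="median" uses the median of the observed values;
    any other strategy uses `special_value` as a sentinel.
    """
    present = [r[column] for r in rows if r[column] is not None]
    fill = median(present) if strategy == "median" else special_value
    return [dict(r, **{column: fill if r[column] is None else r[column]})
            for r in rows]

rows = [{"income": 50.0}, {"income": None}, {"income": 70.0}, {"income": 90.0}]
filled_median = fill_missing(rows, "income")
filled_flag = fill_missing(rows, "income", strategy="special")
print(filled_median[1], filled_flag[1])
```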

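Step 2: Feature Derivation

  • Consider the ratio of the requested loan amount to income
  • Consider the span from earliest_cr_line to the application date, measured in months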
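
The two derived variables could be computed roughly as below. Only `earliest_cr_line` is named in the text; the field names `loan_amnt`, `annual_inc` and `app_date`, and the output names, are assumptions made for this sketch.

```python
from datetime import date

def months_between(start, end):
    """Whole calendar months from `start` to `end` (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def derive_features(row):
    """Add the two derived variables to a copy of one application record."""
    out = dict(row)
    income = row["annual_inc"]
    # Requested amount relative to income; None when income is missing or zero.
    out["amt_to_income"] = row["loan_amnt"] / income if income else None
    # Length of credit history at application time, in months.
    out["cr_line_months"] = months_between(row["earliest_cr_line"], row["app_date"])
    return out

row = {"loan_amnt": 12000.0, "annual_inc": 60000.0,
       "earliest_cr_line": date(2005, 8, 1), "app_date": date(2015, 12, 1)}
features = derive_features(row)
print(features["amt_to_income"], features["cr_line_months"])
```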

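Step 3: Binning

  • Use ChiMerge; after binning, the result must satisfy:
    (1) no more than 5 bins
    (2) the bad rate is monotonic across bins
    (3) every bin contains both good and bad samples
    (4) special values such as -1 get a bin of their own

  • Continuous variables can be binned directly

  • Categorical variables:
    (a) with many distinct values, first encode each category by its bad rate, then bin it the same way as a continuous variable
    (b) with few distinct values:

    (b1) if every category contains both good and bad samples, no binning is needed
    (b2) if any category contains only good or only bad samples, it must be merged with another category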
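
The binning rules above can be sketched for a continuous variable as follows. This is a simplified ChiMerge-style procedure under stated assumptions, not the repository's implementation: labels are 0 (good) and 1 (bad), every distinct value starts as its own bin, and the order in which the three constraints are enforced (bin count, then mixed classes, then monotonic bad rate) is a choice made for this sketch. Special values are simply left out so the caller can give them a separate bin.

```python
import random

def chi2_pair(a, b):
    """Chi-square statistic for two adjacent bins, each [good_count, bad_count]."""
    total = a[0] + a[1] + b[0] + b[1]
    chi2 = 0.0
    for col in (0, 1):
        col_total = a[col] + b[col]
        for row in (a, b):
            expected = (row[0] + row[1]) * col_total / total
            if expected > 0:
                chi2 += (row[col] - expected) ** 2 / expected
    return chi2

def bad_rates(bins):
    return [bad / (good + bad) for good, bad in bins]

def is_monotone(rates):
    pairs = list(zip(rates, rates[1:]))
    return all(x <= y for x, y in pairs) or all(x >= y for x, y in pairs)

def chimerge(values, labels, max_bins=5, special_values=(-1,)):
    """Return (upper_edges, bins); bin k covers (upper_edges[k-1], upper_edges[k]]."""
    counts = {}
    for v, y in zip(values, labels):
        if v not in special_values:          # special values get their own bin elsewhere
            counts.setdefault(v, [0, 0])[y] += 1
    uppers = sorted(counts)
    bins = [counts[v] for v in uppers]

    while len(bins) > 1:
        one_class = [i for i, (good, bad) in enumerate(bins) if good == 0 or bad == 0]
        if len(bins) > max_bins:
            candidates = range(len(bins) - 1)            # any adjacent pair
        elif one_class:
            i = one_class[0]                             # merge it into a neighbour
            candidates = [j for j in (i - 1, i) if 0 <= j < len(bins) - 1]
        elif not is_monotone(bad_rates(bins)):
            candidates = range(len(bins) - 1)
        else:
            break
        # Merge the most similar adjacent pair (lowest chi-square).
        j = min(candidates, key=lambda k: chi2_pair(bins[k], bins[k + 1]))
        bins[j] = [bins[j][0] + bins[j + 1][0], bins[j][1] + bins[j + 1][1]]
        del bins[j + 1]
        del uppers[j]
    return uppers, bins

# Synthetic data: larger values are less likely to be bad; -1 marks a special value.
rng = random.Random(0)
values = [rng.randint(0, 50) for _ in range(500)] + [-1] * 20
labels = [1 if rng.random() < 0.6 - v / 100 else 0 for v in values]
uppers, bins = chimerge(values, labels)
print(uppers, bins)
```

A categorical variable with many levels could reuse the same function by first replacing each category with its bad rate, as described in (a).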

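Step 4: WOE Encoding and IV Calculation

  • The WOE formula for bin i:

    WOE_i = \ln\left( \frac{G_i / G}{B_i / B} \right)

    where G_i and B_i are the numbers of good and bad customers in bin i, and G and B are the totals over all bins.

  • The higher the WOE, the lower the risk that customers in that bin are bad.
  • IV measures how well a variable separates good customers from bad ones. The IV formula is:

    IV = \sum_i \left( \frac{G_i}{G} - \frac{B_i}{B} \right) WOE_i

  • For more detail on WOE and IV, see: A Detailed Explanation of IV and WOE in Data Mining Models (数据挖掘模型中的IV和WOE详解)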
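
A small worked example may help. It assumes WOE is the log of a bin's share of all good customers over its share of all bad customers, which is the direction implied by the statement that a higher WOE means lower risk, and that IV sums the share difference times WOE over the bins. The bin counts are made up.

```python
import math

def woe_iv(bins):
    """bins: list of (good_count, bad_count) per bin. Returns (woe_list, iv).

    Assumes every bin has at least one good and one bad sample,
    which the binning step is meant to guarantee.
    """
    good_total = sum(g for g, _ in bins)
    bad_total = sum(b for _, b in bins)
    woes, iv = [], 0.0
    for g, b in bins:
        good_share = g / good_total
        bad_share = b / bad_total
        woe = math.log(good_share / bad_share)
        woes.append(woe)
        iv += (good_share - bad_share) * woe
    return woes, iv

# Three bins with falling bad rates: 33%, 13%, 6%.
bins = [(100, 50), (200, 30), (300, 20)]
woes, iv = woe_iv(bins)
print([round(w, 4) for w in woes], round(iv, 4))
```

The bin with the highest bad rate gets a negative WOE and the safest bin a positive one, and each term of the IV sum is non-negative because the share difference and the WOE always have the same sign.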

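Step 5: Univariate and Multivariate Analysis (both on the WOE-encoded values)

  • Keep variables whose IV is above 0.02
  • Compare pairwise linear correlations; if the absolute value of the correlation coefficient exceeds the threshold, drop the variable with the lower IV
  • Machine-learning feature selection methods (RF, Xgboost) can also be used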
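
The first two selection rules can be sketched as a greedy pass over the variables in descending IV order. The correlation threshold of 0.7 is an assumption (the text only says "the threshold"), as are the function and variable names; the columns are assumed to be WOE-encoded and non-constant.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_variables(woe_columns, iv, iv_min=0.02, corr_max=0.7):
    """Keep variables with IV > iv_min, dropping the lower-IV one of any correlated pair."""
    # Visiting variables from highest to lowest IV means that whenever two
    # are too correlated, the one already kept is the one with the higher IV.
    candidates = sorted((v for v in woe_columns if iv[v] > iv_min),
                        key=lambda v: iv[v], reverse=True)
    kept = []
    for v in candidates:
        if all(abs(pearson(woe_columns[v], woe_columns[k])) <= corr_max for k in kept):
            kept.append(v)
    return kept

woe_columns = {
    "a": [0.1, 0.2, 0.3, 0.4],
    "b": [0.2, 0.4, 0.6, 0.8],     # perfectly correlated with "a"
    "c": [0.3, -0.3, -0.3, 0.3],   # uncorrelated with "a"
    "d": [0.5, 0.1, 0.4, 0.2],
}
iv = {"a": 0.30, "b": 0.10, "c": 0.05, "d": 0.01}
selected = select_variables(woe_columns, iv)
print(selected)
```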

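Step 6: Logistic Regression Model (or Another Machine Learning Algorithm)

  • Requirements:
    (1) variables are statistically significant
    (2) coefficients are negative (consistent with a higher WOE meaning lower bad risk)
  • For the theory behind logistic regression, see Logistic Regression: Theory (逻辑回归 - 理论篇)
  • In each iteration, drop the least significant variable, until either
    (1) all remaining variables are significant, or
    (2) no features are left to select
  • L1 or L2 regularization can also be tried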
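
The elimination loop described above might look like the sketch below. The model fit is deliberately left as a pluggable callable: `fit` is a hypothetical function that takes the currently selected features and returns a coefficient and p-value for each, and the 0.05 significance level is an assumption. In practice `fit` could wrap a real logistic regression, for example one fitted with statsmodels, reading its `params` and `pvalues`.

```python
def backward_eliminate(features, fit, alpha=0.05):
    """Drop the least significant feature until all remaining ones are significant.

    fit(selected) must return {feature: (coefficient, p_value)}.
    Returns (selected_features, stats_of_final_fit).
    """
    selected = list(features)
    while selected:
        stats = fit(selected)
        worst = max(selected, key=lambda f: stats[f][1])   # largest p-value
        if stats[worst][1] <= alpha:
            return selected, stats                         # everything is significant
        selected.remove(worst)
    return selected, {}                                    # no features left

def fake_fit(selected):
    """Hypothetical stand-in for a real fit; values are made up for the demo."""
    table = {"x1": (-0.9, 0.001), "x2": (-0.4, 0.03),
             "x3": (-0.1, 0.40), "x4": (-0.2, 0.20)}
    return {f: table[f] for f in selected}

selected, stats = backward_eliminate(["x1", "x2", "x3", "x4"], fake_fit)
wrong_sign = [f for f in selected if stats[f][0] >= 0]     # requirement (2)
print(selected, wrong_sign)
```

With a real model the p-values change after every refit, which is why the loop refits on each iteration instead of ranking the variables once.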

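Step 7: Evaluation

  • Use evaluation metrics such as KS and AUC (a confusion matrix can also be used)

  • The larger the KS value, the better the model separates positive and negative customers.

  • As a rule of thumb, KS > 0.2 indicates that the model has reasonably good predictive accuracy.

  • The KS plot is similar to the ROC curve in that both require computing TPR and FPR. In the KS plot, however, TPR and FPR are both plotted on the vertical axis, while the horizontal axis is the threshold, taken from the points that divide the range into equal parts.

  • Steps:
    (1) sort the samples by the classifier's predicted probability in descending order
    (2) divide the interval 0-1 into N equal parts, use the division points as thresholds, and compute TPR and FPR at each
    (3) plot the TPR and FPR points

  • The KS value is Max(TPR-FPR)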
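
The KS steps can be sketched directly. The sketch assumes label 1 marks the positive class and that a sample is predicted positive when its probability is at or above the threshold; the function name and the default of 10 thresholds are illustrative. Sorting is not needed to compute the numbers, only to draw the curve.

```python
def ks_statistic(probs, labels, n_thresholds=10):
    """Return (ks, curve), where curve is a list of (threshold, tpr, fpr)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    curve = []
    for k in range(n_thresholds + 1):
        threshold = k / n_thresholds          # equally spaced points on [0, 1]
        tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
        curve.append((threshold, tp / positives, fp / negatives))
    ks = max(tpr - fpr for _, tpr, fpr in curve)
    return ks, curve

# Perfectly separated toy scores: every positive scores above every negative.
probs = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
ks, curve = ks_statistic(probs, labels)
print(ks)
```

Plotting the second and third element of each `curve` entry against the first gives the two lines of the KS chart; the KS value is the largest vertical gap between them.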
