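ML / LoR: How to develop a general-purpose credit risk scorecard model with logistic regression (LoR) on a credit card dataset, a full walkthrough with the toad framework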
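Related articles
ML / LoR: How to develop a general-purpose credit risk scorecard model with logistic regression (LoR) on a credit card dataset, a full walkthrough with the toad framework
ML / LoR: How to develop a general-purpose credit risk scorecard model with logistic regression (LoR) on a credit card dataset, a full walkthrough with the toad framework (code implementation)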
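# 1. Define the dataset
# 1.1. Load the German credit dataset
Credit data in which each debtor, described by a set of attributes, is classified as a good or bad credit risk. https://archive.ics.uci.edu/ml/datasets/Statlog+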
| | status.of.existing.checking.account | duration.in.month | credit.history | purpose | credit.amount | savings.account.and.bonds | present.employment.since | installment.rate.in.percentage.of.disposable.income | personal.status.and.sex | other.debtors.or.guarantors | present.residence.since | property | age.in.years | other.installment.plans | housing | number.of.existing.credits.at.this.bank | job | number.of.people.being.liable.to.provide.maintenance.for | telephone | foreign.worker | creditability |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ... < 0 DM | 6 | critical account/ other credits existing (not at this bank) | radio/television | 1169 | unknown/ no savings account | ... >= 7 years | 4 | male : divorced/separated | none | 4 | real estate | 67 | none | own | 2 | skilled employee / official | 1 | yes, registered under the customers name | yes | good |
| 1 | 0 <= ... < 200 DM | 48 | existing credits paid back duly till now | radio/television | 5951 | ... < 100 DM | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 2 | real estate | 22 | none | own | 1 | skilled employee / official | 1 | none | yes | bad |
| 2 | no checking account | 12 | critical account/ other credits existing (not at this bank) | education | 2096 | ... < 100 DM | 4 <= ... < 7 years | 2 | male : divorced/separated | none | 3 | real estate | 49 | none | own | 1 | unskilled - resident | 2 | none | yes | good |
| 3 | ... < 0 DM | 42 | existing credits paid back duly till now | furniture/equipment | 7882 | ... < 100 DM | 4 <= ... < 7 years | 2 | male : divorced/separated | guarantor | 4 | building society savings agreement/ life insurance | 45 | none | for free | 1 | skilled employee / official | 2 | none | yes | good |
| 4 | ... < 0 DM | 24 | delay in paying off in the past | car (new) | 4870 | ... < 100 DM | 1 <= ... < 4 years | 3 | male : divorced/separated | none | 4 | unknown / no property | 53 | none | for free | 2 | skilled employee / official | 2 | none | yes | bad |
| 5 | no checking account | 36 | existing credits paid back duly till now | education | 9055 | unknown/ no savings account | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 4 | unknown / no property | 35 | none | for free | 1 | unskilled - resident | 2 | yes, registered under the customers name | yes | good |
| 6 | no checking account | 24 | existing credits paid back duly till now | furniture/equipment | 2835 | 500 <= ... < 1000 DM | ... >= 7 years | 3 | male : divorced/separated | none | 4 | building society savings agreement/ life insurance | 53 | none | own | 1 | skilled employee / official | 1 | none | yes | good |
| 7 | 0 <= ... < 200 DM | 36 | existing credits paid back duly till now | car (used) | 6948 | ... < 100 DM | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 35 | none | rent | 1 | management/ self-employed/ highly qualified employee/ officer | 1 | yes, registered under the customers name | yes | good |
| 8 | no checking account | 12 | existing credits paid back duly till now | radio/television | 3059 | ... >= 1000 DM | 4 <= ... < 7 years | 2 | male : divorced/separated | none | 4 | real estate | 61 | none | own | 1 | unskilled - resident | 1 | none | yes | good |
| 9 | 0 <= ... < 200 DM | 30 | critical account/ other credits existing (not at this bank) | car (new) | 5234 | ... < 100 DM | unemployed | 4 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 28 | none | own | 2 | management/ self-employed/ highly qualified employee/ officer | 1 | none | yes | bad |
| 10 | 0 <= ... < 200 DM | 12 | existing credits paid back duly till now | car (new) | 1295 | ... < 100 DM | ... < 1 year | 3 | male : divorced/separated | none | 1 | car or other, not in attribute Savings account/bonds | 25 | none | rent | 1 | skilled employee / official | 1 | none | yes | bad |
| 11 | ... < 0 DM | 48 | existing credits paid back duly till now | business | 4308 | ... < 100 DM | ... < 1 year | 3 | male : divorced/separated | none | 4 | building society savings agreement/ life insurance | 24 | none | rent | 1 | skilled employee / official | 1 | none | yes | bad |
| 12 | 0 <= ... < 200 DM | 12 | existing credits paid back duly till now | radio/television | 1567 | ... < 100 DM | 1 <= ... < 4 years | 1 | male : divorced/separated | none | 1 | car or other, not in attribute Savings account/bonds | 22 | none | own | 1 | skilled employee / official | 1 | yes, registered under the customers name | yes | good |
| 13 | ... < 0 DM | 24 | critical account/ other credits existing (not at this bank) | car (new) | 1199 | ... < 100 DM | ... >= 7 years | 4 | male : divorced/separated | none | 4 | car or other, not in attribute Savings account/bonds | 60 | none | own | 2 | unskilled - resident | 1 | none | yes | bad |
| 14 | ... < 0 DM | 15 | existing credits paid back duly till now | car (new) | 1403 | ... < 100 DM | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 4 | car or other, not in attribute Savings account/bonds | 28 | none | rent | 1 | skilled employee / official | 1 | none | yes | good |
| 15 | ... < 0 DM | 24 | existing credits paid back duly till now | radio/television | 1282 | 100 <= ... < 500 DM | 1 <= ... < 4 years | 4 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 32 | none | own | 1 | unskilled - resident | 1 | none | yes | bad |
| 16 | no checking account | 24 | critical account/ other credits existing (not at this bank) | radio/television | 2424 | unknown/ no savings account | ... >= 7 years | 4 | male : divorced/separated | none | 4 | building society savings agreement/ life insurance | 53 | none | own | 2 | skilled employee / official | 1 | none | yes | good |
| 17 | ... < 0 DM | 30 | no credits taken/ all credits paid back duly | business | 8072 | unknown/ no savings account | ... < 1 year | 2 | male : divorced/separated | none | 3 | car or other, not in attribute Savings account/bonds | 25 | bank | own | 3 | skilled employee / official | 1 | none | yes | good |
| 18 | 0 <= ... < 200 DM | 24 | existing credits paid back duly till now | car (used) | 12579 | ... < 100 DM | ... >= 7 years | 4 | male : divorced/separated | none | 2 | unknown / no property | 44 | none | for free | 1 | management/ self-employed/ highly qualified employee/ officer | 1 | yes, registered under the customers name | yes | bad |
| 19 | no checking account | 24 | existing credits paid back duly till now | radio/television | 3430 | 500 <= ... < 1000 DM | ... >= 7 years | 3 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 31 | none | own | 1 | skilled employee / official | 2 | yes, registered under the customers name | yes | good |
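# 1.2. EDA on each variable
Numeric variables: data type, missing rate, number of unique values, mean, standard deviation, quantiles, and so on.
Categorical variables: data type, missing rate, number of unique values, top1 (the most frequent category), and so on.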
Numeric variables (columns mean_or_top1 through max_or_bottom1 hold the mean, std, min, and quantiles):

| variable | type | size | missing | unique | mean | std | min | 1% | 10% | 50% | 75% | 90% | 99% | max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| duration.in.month | int64 | 1000 | 0.00% | 33 | 20.903 | 12.05881445 | 4 | 6 | 9 | 18 | 24 | 36 | 60 | 72 |
| credit.amount | int64 | 1000 | 0.00% | 921 | 3271.258 | 2822.736876 | 250 | 425.83 | 932 | 2319.5 | 3972.25 | 7179.4 | 14180.39 | 18424 |
| installment.rate.in.percentage.of.disposable.income | int64 | 1000 | 0.00% | 4 | 2.973 | 1.118714674 | 1 | 1 | 1 | 3 | 4 | 4 | 4 | 4 |
| present.residence.since | int64 | 1000 | 0.00% | 4 | 2.845 | 1.103717896 | 1 | 1 | 1 | 3 | 4 | 4 | 4 | 4 |
| age.in.years | int64 | 1000 | 0.00% | 53 | 35.546 | 11.37546857 | 19 | 20 | 23 | 33 | 42 | 52 | 67.01 | 75 |
| number.of.existing.credits.at.this.bank | int64 | 1000 | 0.00% | 4 | 1.407 | 0.577654468 | 1 | 1 | 1 | 1 | 2 | 2 | 3 | 4 |
| number.of.people.being.liable.to.provide.maintenance.for | int64 | 1000 | 0.00% | 2 | 1.155 | 0.362085772 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 |

Categorical variables (the same columns hold the category shares, top1 first). For every variable with five or fewer categories the output lists the same categories under both the top and the bottom columns, so each list appears once here:

| variable | type | size | missing | unique | category shares (descending) |
|---|---|---|---|---|---|
| status.of.existing.checking.account | category | 1000 | 0.00% | 4 | no checking account: 39.40%; ... < 0 DM: 27.40%; 0 <= ... < 200 DM: 26.90%; ... >= 200 DM / salary assignments for at least 1 year: 6.30% |
| credit.history | category | 1000 | 0.00% | 5 | existing credits paid back duly till now: 53.00%; critical account/ other credits existing (not at this bank): 29.30%; delay in paying off in the past: 8.80%; all credits at this bank paid back duly: 4.90%; no credits taken/ all credits paid back duly: 4.00% |
| purpose | object | 1000 | 0.00% | 10 | radio/television: 28.00%; car (new): 23.40%; furniture/equipment: 18.10%; car (used): 10.30%; business: 9.70%; education: 5.00%; repairs: 2.20%; domestic appliances: 1.20%; others: 1.20%; retraining: 0.90% |
| savings.account.and.bonds | category | 1000 | 0.00% | 5 | ... < 100 DM: 60.30%; unknown/ no savings account: 18.30%; 100 <= ... < 500 DM: 10.30%; 500 <= ... < 1000 DM: 6.30%; ... >= 1000 DM: 4.80% |
| present.employment.since | category | 1000 | 0.00% | 5 | 1 <= ... < 4 years: 33.90%; ... >= 7 years: 25.30%; 4 <= ... < 7 years: 17.40%; ... < 1 year: 17.20%; unemployed: 6.20% |
| personal.status.and.sex | category | 1000 | 0.00% | 4 | male : single: 54.80%; female : divorced/separated/married: 31.00%; male : married/widowed: 9.20%; male : divorced/separated: 5.00%; female : single: 0.00% |
| other.debtors.or.guarantors | category | 1000 | 0.00% | 3 | none: 90.70%; guarantor: 5.20%; co-applicant: 4.10% |
| property | category | 1000 | 0.00% | 4 | car or other, not in attribute Savings account/bonds: 33.20%; real estate: 28.20%; building society savings agreement/ life insurance: 23.20%; unknown / no property: 15.40% |
| other.installment.plans | category | 1000 | 0.00% | 3 | none: 81.40%; bank: 13.90%; stores: 4.70% |
| housing | category | 1000 | 0.00% | 3 | own: 71.30%; rent: 17.90%; for free: 10.80% |
| job | category | 1000 | 0.00% | 4 | skilled employee / official: 63.00%; unskilled - resident: 20.00%; management/ self-employed/ highly qualified employee/ officer: 14.80%; unemployed/ unskilled - non-resident: 2.20% |
| telephone | category | 1000 | 0.00% | 2 | none: 59.60%; yes, registered under the customers name: 40.40% |
| foreign.worker | category | 1000 | 0.00% | 2 | yes: 96.30%; no: 3.70% |
| creditability | object | 1000 | 0.00% | 2 | good: 70.00%; bad: 30.00% |
# 1.3. Output the mean, std, min, three quantiles, and max of the continuous variables
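| | duration.in.month | credit.amount | installment.rate.in.percentage.of.disposable.income | present.residence.since | age.in.years | number.of.existing.credits.at.this.bank | number.of.people.being.liable.to.provide.maintenance.for |
|---|---|---|---|---|---|---|---|
| mean | 20.903 | 3271.258 | 2.973 | 2.845 | 35.546 | 1.407 | 1.155 |
| std | 12.05881445 | 2822.736876 | 1.118714674 | 1.103717896 | 11.37546857 | 0.577654468 | 0.362085772 |
| min | 4 | 250 | 1 | 1 | 19 | 1 | 1 |
| 25% | 12 | 1365.5 | 2 | 2 | 27 | 1 | 1 |
| 50% | 18 | 2319.5 | 3 | 3 | 33 | 1 | 1 |
| 75% | 24 | 3972.25 | 4 | 4 | 42 | 2 | 1 |
| max | 72 | 18424 | 4 | 4 | 75 | 4 | 2 |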
# 2. Data preprocessing
# 2.1. Map the categorical target variable to a numeric variable
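The mapping in this step can be sketched in a few lines of plain Python. The direction of the mapping (good to 0, bad to 1) is read off the creditability_map column in the WOE output of section 2.6; the names `TARGET_MAP` and `map_target` are hypothetical and are not part of toad.

```python
# Assumed mapping, inferred from the creditability_map column: good -> 0, bad -> 1.
TARGET_MAP = {"good": 0, "bad": 1}


def map_target(labels):
    """Map text labels to 0/1 and fail loudly on any unexpected label."""
    unknown = sorted(set(labels) - set(TARGET_MAP))
    if unknown:
        raise ValueError(f"unexpected target labels: {unknown}")
    return [TARGET_MAP[label] for label in labels]


# Labels of rows 0-4 in the data preview of section 1.1.
creditability = ["good", "bad", "good", "good", "bad"]
creditability_map = map_target(creditability)
print(creditability_map)
```

Raising on unknown labels keeps a typo or a missing value from silently turning into a wrong class.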
# 2.2. Analyze the IV, Gini index (gini), entropy, and unique count of each feature
| feature | iv | gini | entropy | unique |
|---|---|---|---|---|
| creditability | 12.22649152 | 0 | 0 | 2 |
| status.of.existing.checking.account | 0.666011503 | 0.368037204 | 0.545196341 | 4 |
| duration.in.month | 0.354783574 | 0.406755043 | 0.609659161 | 33 |
| credit.amount | 0.351454966 | 0.408679834 | 0.610864302 | 921 |
| credit.history | 0.293233547 | 0.394089613 | 0.580630747 | 5 |
| age.in.years | 0.21119662 | 0.41433928 | 0.610863206 | 53 |
| savings.account.and.bonds | 0.196009557 | 0.40483845 | 0.591376694 | 5 |
| purpose | 0.169195066 | 0.405990292 | 0.593609415 | 10 |
| property | 0.112638262 | 0.410037788 | 0.599091068 | 4 |
| present.employment.since | 0.086433631 | 0.412285325 | 0.601782464 | 5 |
| housing | 0.083293434 | 0.412356067 | 0.602024467 | 3 |
| other.installment.plans | 0.057614542 | 0.414607541 | 0.604712572 | 3 |
| foreign.worker | 0.043877412 | 0.417170441 | 0.606828112 | 2 |
| other.debtors.or.guarantors | 0.032019322 | 0.417208946 | 0.607539261 | 3 |
| installment.rate.in.percentage.of.disposable.income | 0.02632209 | 0.417699747 | 0.60811103 | 4 |
| number.of.existing.credits.at.this.bank | 0.013266524 | 0.418878097 | 0.609493027 | 4 |
| personal.status.and.sex | 0.008839919 | 0.419238171 | 0.609944287 | 4 |
| job | 0.008762766 | 0.419208234 | 0.609937317 | 4 |
| telephone | 0.006377605 | 0.419441491 | 0.610196344 | 2 |
| present.residence.since | 0.003588773 | 0.419685295 | 0.610488269 | 4 |
| number.of.people.being.liable.to.provide.maintenance.for | 4.34E-05 | 0.419996182 | 0.61085975 | 2 |
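To make the iv column concrete, here is a hand-rolled sketch of the usual IV formula for one categorical feature, IV = sum over categories of (bad share - good share) * ln(bad share / good share). It is not toad's implementation. The per-category (bad, good) counts are an assumption: they are the commonly reported counts for this dataset and are not printed in the article, although they agree with the category shares in section 1.2.

```python
import math


def woe_and_iv(counts):
    """counts: {category: (n_bad, n_good)} -> ({category: woe}, iv)."""
    total_bad = sum(bad for bad, _ in counts.values())
    total_good = sum(good for _, good in counts.values())
    woe = {}
    iv = 0.0
    for category, (bad, good) in counts.items():
        bad_share = bad / total_bad      # share of all bad samples in this category
        good_share = good / total_good   # share of all good samples in this category
        woe[category] = math.log(bad_share / good_share)
        iv += (bad_share - good_share) * woe[category]
    return woe, iv


# Assumed (bad, good) counts for status.of.existing.checking.account.
checking_account = {
    "... < 0 DM": (135, 139),
    "0 <= ... < 200 DM": (105, 164),
    "... >= 200 DM / salary assignments for at least 1 year": (14, 49),
    "no checking account": (46, 348),
}

woe, iv = woe_and_iv(checking_account)
print(f"IV = {iv:.6f}")
```

Under these counts the result lands on the 0.666 reported for this feature, which suggests the table uses the same bad-over-good convention. A category with zero bad or zero good samples would need smoothing before taking the logarithm; the sketch leaves that out.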
# 2.3. Feature screening: by IV, empty (missing rate), and corr (correlation)
drop_cols:
{'empty': array([], dtype=float64), 'iv': array(['personal.status.and.sex', 'present.residence.since',
'number.of.existing.credits.at.this.bank', 'job',
'number.of.people.being.liable.to.provide.maintenance.for',
'telephone'], dtype=object), 'corr': array([], dtype=object)}
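The IV part of this screening can be reproduced by hand from the IV values in section 2.2. The threshold of 0.02 below is an assumption: the article does not print the value it used, but any cut-off between 0.0133 and 0.0263 yields the same six dropped columns. The helper name `low_iv_features` is hypothetical.

```python
IV_THRESHOLD = 0.02  # assumed cut-off, not stated in the article

# IV per feature, copied from the table in section 2.2 (target row omitted).
iv_by_feature = {
    "status.of.existing.checking.account": 0.666011503,
    "duration.in.month": 0.354783574,
    "credit.amount": 0.351454966,
    "credit.history": 0.293233547,
    "age.in.years": 0.21119662,
    "savings.account.and.bonds": 0.196009557,
    "purpose": 0.169195066,
    "property": 0.112638262,
    "present.employment.since": 0.086433631,
    "housing": 0.083293434,
    "other.installment.plans": 0.057614542,
    "foreign.worker": 0.043877412,
    "other.debtors.or.guarantors": 0.032019322,
    "installment.rate.in.percentage.of.disposable.income": 0.02632209,
    "number.of.existing.credits.at.this.bank": 0.013266524,
    "personal.status.and.sex": 0.008839919,
    "job": 0.008762766,
    "telephone": 0.006377605,
    "present.residence.since": 0.003588773,
    "number.of.people.being.liable.to.provide.maintenance.for": 4.34e-05,
}


def low_iv_features(iv_table, threshold):
    """Return the features whose IV falls below the threshold."""
    return sorted(name for name, iv in iv_table.items() if iv < threshold)


dropped = low_iv_features(iv_by_feature, IV_THRESHOLD)
kept = [name for name in iv_by_feature if name not in dropped]
print(dropped)
```

The empty 'empty' and 'corr' arrays in the output mean that no column was removed for missing values or for high correlation with another column.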
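# 2.4. Binning
Bin both the numeric and the categorical variables. The supported binning methods are chi-square (chi), decision tree, percentile, equal-frequency, and equal-width binning.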
data_df_s2bins_dict:
{'status.of.existing.checking.account': [['no checking account'], ['... >= 200 DM / salary assignments for at least 1 year'], ['0 <= ... < 200 DM'], ['... < 0 DM']], 'duration.in.month': [9, 12, 13, 16, 36, 45], 'credit.history': [['critical account/ other credits existing (not at this bank)'], ['delay in paying off in the past', 'existing credits paid back duly till now'], ['all credits at this bank paid back duly', 'no credits taken/ all credits paid back duly']], 'purpose': [['retraining', 'car (used)'], ['radio/television'], ['furniture/equipment'], ['domestic appliances', 'business', 'repairs'], ['car (new)'], ['others', 'education']], 'credit.amount': [3556], 'savings.account.and.bonds': [['... >= 1000 DM', '500 <= ... < 1000 DM', 'unknown/ no savings account'], ['100 <= ... < 500 DM'], ['... < 100 DM']], 'present.employment.since': [['4 <= ... < 7 years'], ['... >= 7 years'], ['1 <= ... < 4 years'], ['unemployed'], ['... < 1 year']], 'installment.rate.in.percentage.of.disposable.income': [2, 3, 4], 'other.debtors.or.guarantors': [['guarantor', 'none', 'co-applicant']], 'property': [['real estate'], ['building society savings agreement/ life insurance'], ['car or other, not in attribute Savings account/bonds'], ['unknown / no property']], 'age.in.years': [26, 35, 37, 49], 'other.installment.plans': [['none'], ['stores', 'bank']], 'housing': [['own'], ['rent'], ['for free']], 'foreign.worker': [['no', 'yes']], 'creditability': [['good'], ['bad']]}
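For a numeric variable, the list in this dictionary is a set of cut points. A minimal sketch of how such cut points turn a raw value into a bin index, assuming left-closed bins (a value equal to a cut point falls into the bin on its right); the helper `bin_index` is hypothetical and only illustrates the idea:

```python
from bisect import bisect_right

AGE_SPLITS = [26, 35, 37, 49]  # cut points for age.in.years from the output above


def bin_index(value, splits):
    """Return the 0-based bin for value, assuming bins [-inf, 26), [26, 35), ..., [49, +inf)."""
    return bisect_right(splits, value)


ages = [67, 22, 49, 45, 53, 35]  # ages of rows 0-5 in the data preview
age_bins = [bin_index(age, AGE_SPLITS) for age in ages]
print(age_bins)
```

Four cut points give five bins, numbered 0 to 4. For a categorical variable the dictionary instead lists which categories share a bin, so the lookup is a plain membership test.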
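# 2.5. Further adjust the bins using bad-rate plots
# 2.5.1. Example of manually adjusting bins
# 2.5.2. Plot each bin's share as a bar chart, with the corresponding bad-sample rate as a line
# 2.5.3. Adjust the bins so that bad_rate follows a monotonic trend overall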
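The adjustment in 2.5.3 amounts to merging neighbouring bins until the bad rate no longer zigzags. A minimal sketch with hypothetical (bad, total) counts per bin; the numbers and helper names are made up for illustration and do not come from the article:

```python
def bad_rates(bins):
    """bins: list of (n_bad, n_total) per bin, in bin order."""
    return [bad / total for bad, total in bins]


def is_monotonic(values):
    """True if the sequence never changes direction."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


def merge_with_next(bins, i):
    """Merge bin i with bin i + 1 and return a new list."""
    bad = bins[i][0] + bins[i + 1][0]
    total = bins[i][1] + bins[i + 1][1]
    return bins[:i] + [(bad, total)] + bins[i + 2:]


# Hypothetical bins: the bad rate dips at bin 2 and again at bin 4.
bins = [(30, 200), (50, 250), (40, 220), (60, 180), (47, 150)]
print(is_monotonic(bad_rates(bins)))

adjusted = merge_with_next(bins, 1)      # merge bins 1 and 2
adjusted = merge_with_next(adjusted, 2)  # merge the last two bins
print(is_monotonic(bad_rates(adjusted)))
```

Merging trades granularity for stability: fewer bins, but a bad rate that moves in one direction and is easier to explain on a scorecard.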
# 2.6. Apply the WOE transformation to the binned data
| | status.of.existing.checking.account | duration.in.month | credit.history | purpose | credit.amount | savings.account.and.bonds | present.employment.since | installment.rate.in.percentage.of.disposable.income | other.debtors.or.guarantors | property | age.in.years | other.installment.plans | housing | foreign.worker | creditability | creditability_map |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.818098706 | -1.280933845 | -0.733740578 | -0.410062817 | -0.153492135 | -0.762140052 | -0.235566071 | 0.157300289 | 0 | -0.461034959 | -0.194156014 | -0.121178625 | -0.194156014 | 0 | -5.703782475 | 0 |
| 1 | 0.401391783 | 1.134979933 | 0.087868755 | -0.410062817 | 0.31563815 | 0.271357844 | 0.032103245 | -0.155466469 | 0 | -0.461034959 | 0.48083491 | -0.121178625 | -0.194156014 | 0 | 6.551080335 | 1 |
| 2 | -1.176263223 | -0.128416292 | -0.733740578 | 0.587786665 | -0.153492135 | 0.271357844 | -0.394415272 | -0.155466469 | 0 | -0.461034959 | -0.266352306 | -0.121178625 | -0.194156014 | 0 | -5.703782475 | 0 |
| 3 | 0.818098706 | 0.524524468 | 0.087868755 | 0.095556516 | 0.31563815 | 0.271357844 | -0.394415272 | -0.155466469 | 0 | 0.028573372 | -0.266352306 | -0.121178625 | 0.472604411 | 0 | -5.703782475 | 0 |
| 4 | 0.818098706 | 0.108688306 | 0.087868755 | 0.359200488 | 0.31563815 | 0.271357844 | 0.032103245 | -0.064538521 | 0 | 0.586082361 | -0.266352306 | -0.121178625 | 0.472604411 | 0 | 6.551080335 | 1 |
| 5 | -1.176263223 | 0.524524468 | 0.087868755 | 0.587786665 | 0.31563815 | -0.762140052 | 0.032103245 | -0.155466469 | 0 | 0.586082361 | -0.044353168 | -0.121178625 | 0.472604411 | 0 | -5.703782475 | 0 |
| 6 | -1.176263223 | 0.108688306 | 0.087868755 | 0.095556516 | -0.153492135 | -0.762140052 | -0.235566071 | -0.064538521 | 0 | 0.028573372 | -0.266352306 | -0.121178625 | -0.194156014 | 0 | -5.703782475 | 0 |
| 7 | 0.401391783 | 0.524524468 | 0.087868755 | -0.805625164 | 0.31563815 | 0.271357844 | 0.032103245 | -0.155466469 | 0 | 0.034191365 | -0.044353168 | -0.121178625 | 0.40444522 | 0 | -5.703782475 | 0 |
| 8 | -1.176263223 | -0.128416292 | 0.087868755 | -0.410062817 | -0.153492135 | -0.762140052 | -0.394415272 | -0.155466469 | 0 | -0.461034959 | -0.266352306 | -0.121178625 | -0.194156014 | 0 | -5.703782475 | 0 |
| 9 | 0.401391783 | 0.108688306 | -0.733740578 | 0.359200488 | 0.31563815 | 0.271357844 | 0.31923043 | 0.157300289 | 0 | 0.034191365 | -0.044353168 | -0.121178625 | -0.194156014 | 0 | 6.551080335 | 1 |
| 10 | 0.401391783 | -0.128416292 | 0.087868755 | 0.359200488 | -0.153492135 | 0.271357844 | 0.470820289 | -0.064538521 | 0 | 0.034191365 | -0.044353168 | -0.121178625 | 0.40444522 | 0 | 6.551080335 | 1 |
| 11 | 0.818098706 | 1.134979933 | 0.087868755 | 0.233288 | 0.31563815 | 0.271357844 | 0.470820289 | -0.064538521 | 0 | 0.028573372 | 0.48083491 | -0.121178625 | 0.40444522 | 0 | 6.551080335 | 1 |
| 12 | 0.401391783 | -0.128416292 | 0.087868755 | -0.410062817 | -0.153492135 | 0.271357844 | 0.032103245 | -0.251314428 | 0 | 0.034191365 | 0.48083491 | -0.121178625 | -0.194156014 | 0 | -5.703782475 | 0 |
| 13 | 0.818098706 | 0.108688306 | -0.733740578 | 0.359200488 | -0.153492135 | 0.271357844 | -0.235566071 | 0.157300289 | 0 | 0.034191365 | -0.266352306 | -0.121178625 | -0.194156014 | 0 | 6.551080335 | 1 |
| 14 | 0.818098706 | -0.665290226 | 0.087868755 | 0.359200488 | -0.153492135 | 0.271357844 | 0.032103245 | -0.155466469 | 0 | 0.034191365 | -0.044353168 | -0.121178625 | 0.40444522 | 0 | -5.703782475 | 0 |
| 15 | 0.818098706 | 0.108688306 | 0.087868755 | -0.410062817 | -0.153492135 | 0.13955188 | 0.032103245 | 0.157300289 | 0 | 0.034191365 | -0.044353168 | -0.121178625 | -0.194156014 | 0 | 6.551080335 | 1 |
| 16 | -1.176263223 | 0.108688306 | -0.733740578 | -0.410062817 | -0.153492135 | -0.762140052 | -0.235566071 | 0.157300289 | 0 | 0.028573372 | -0.266352306 | -0.121178625 | -0.194156014 | 0 | -5.703782475 | 0 |
| 17 | 0.818098706 | 0.108688306 | 1.234070835 | 0.233288 | 0.31563815 | -0.762140052 | 0.470820289 | -0.155466469 | 0 | 0.034191365 | -0.044353168 | 0.477550835 | -0.194156014 | 0 | -5.703782475 | 0 |
| 18 | 0.401391783 | 0.108688306 | 0.087868755 | -0.805625164 | 0.31563815 | 0.271357844 | -0.235566071 | 0.157300289 | 0 | 0.586082361 | -0.044353168 | -0.121178625 | 0.472604411 | 0 | 6.551080335 | 1 |
| 19 | -1.176263223 | 0.108688306 | 0.087868755 | -0.410062817 | -0.153492135 | -0.762140052 | -0.235566071 | -0.064538521 | 0 | 0.034191365 | -0.044353168 | -0.121178625 | -0.194156014 | 0 | -5.703782475 | 0 |
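Once the WOE of each bin is known, the transformation is a two-step lookup: raw value to bin, then bin to WOE. The sketch below does this by hand for installment.rate.in.percentage.of.disposable.income, using the cut points from section 2.4 and the four WOE values that appear in that column of the output above. The left-closed bin convention is an assumption, though it is the one that reproduces these rows; `to_woe` is a hypothetical helper, not a toad function.

```python
from bisect import bisect_right

RATE_SPLITS = [2, 3, 4]  # cut points from the binning output
# WOE per bin, read off the output above (rates 1, 2, 3, 4 in that order).
RATE_WOE = [-0.251314428, -0.155466469, -0.064538521, 0.157300289]


def to_woe(value, splits, woe_per_bin):
    """Map a raw value to its bin (left-closed bins assumed), then to that bin's WOE."""
    return woe_per_bin[bisect_right(splits, value)]


rates = [4, 2, 2, 2, 3]  # installment rate of rows 0-4 in the data preview
rate_woe = [to_woe(rate, RATE_SPLITS, RATE_WOE) for rate in rates]
print(rate_woe)
```

A column whose values all share one bin, such as other.debtors.or.guarantors here, maps to a constant WOE of 0 and carries no information into the model.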
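# 2.7. Feature selection
Select features by forward, backward, or bidirectional stepwise selection, using aic, bic, ks, or auc as the selection criterion.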
final_data:
(1000, 3)
final_data:
Index(['status.of.existing.checking.account', 'creditability',
'creditability_map'],
dtype='object')
# 3. Model building, training, and evaluation
# 3.1. Split into training and test sets
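A minimal sketch of the split using only the standard library. The 25% test share is an inference: the bucket table in section 4.2 sums to 750 training rows out of 1000. The seed and the function name are hypothetical, and the sketch shuffles without stratifying by the target.

```python
import random


def train_test_split_indices(n_rows, test_size=0.25, seed=42):
    """Shuffle row indices reproducibly and split them into (train, test)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # local RNG, leaves the global random state alone
    n_test = round(n_rows * test_size)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(1000)
print(len(train_idx), len(test_idx))
```

With a 70/30 class balance, a stratified split would keep the bad rate equal in both parts; that refinement is left out here.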
# 3.2. Model training
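To show what logistic regression does with a WOE-encoded input, here is a bare-bones gradient-descent fit on a single feature. It is a teaching sketch, not the solver a library would use, and the article does not show its own training call. The per-category (bad, good) counts for status.of.existing.checking.account are an assumption (the commonly reported counts for this dataset); the learning rate and epoch count are arbitrary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(groups, lr=0.5, epochs=5000):
    """groups: list of (x, n_bad, n_good). Returns (weight, intercept) for P(bad)."""
    n = sum(bad + good for _, bad, good in groups)
    w = b = 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, bad, good in groups:
            p = sigmoid(w * x + b)
            # Sum of (p - y) over all rows in the group: predicted bads minus actual bads.
            residual = (bad + good) * p - bad
            grad_w += residual * x
            grad_b += residual
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Assumed (bad, good) counts per checking-account category.
counts = [(135, 139), (105, 164), (14, 49), (46, 348)]
total_bad = sum(bad for bad, _ in counts)
total_good = sum(good for _, good in counts)

# WOE-encode each category, then fit on the encoded value.
groups = [(math.log((bad / total_bad) / (good / total_good)), bad, good) for bad, good in counts]
weight, intercept = fit_logistic(groups)
print(round(weight, 4), round(intercept, 4))
```

With one WOE-encoded feature the fit settles at a weight of about 1 and an intercept of about ln(total_bad / total_good), because WOE is already the log-odds of each bin relative to the overall log-odds. That is the reason scorecards pair WOE encoding with logistic regression.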
# 3.3. Model evaluation: F1, KS, AUC
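Two of these metrics are short enough to write out by hand. The sketch below computes F1 from hard predictions and AUC as the share of (bad, good) pairs that the score ranks correctly; the labels, scores, and the 0.5 threshold are made-up toy values, not results from the article.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Probability that a random positive scores above a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]            # toy labels, 1 = bad
scores = [0.1, 0.4, 0.35, 0.8]   # toy predicted probabilities of bad
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f1_score(y_true, y_pred), auc(y_true, scores))
```

The pairwise AUC is quadratic in the sample size, which is fine for 1000 rows; larger datasets usually call for a rank-based formula.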
# 4. Pre-deployment evaluation and credit score calculation
# 4.1. Evaluate variable stability with PSI: compare the training and test sets
cal PSI 0.012897491574571578
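PSI compares how one variable is spread across bins in two samples: PSI = sum over bins of (actual share - expected share) * ln(actual share / expected share). A hand-written sketch follows; the bin shares are hypothetical, and the small `eps` floor for empty bins is an assumption of this sketch.

```python
import math


def psi(expected, actual, eps=1e-6):
    """Population stability index between two lists of bin shares."""
    total = 0.0
    for e, a in zip(expected, actual):
        e = max(e, eps)  # avoid log(0) and division by zero for empty bins
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


# Hypothetical share of samples per bin in each set.
train_shares = [0.40, 0.35, 0.25]
test_shares = [0.38, 0.36, 0.26]

print(psi(train_shares, test_shares))
```

A commonly quoted rule of thumb treats a PSI below 0.1 as stable, so the value of about 0.013 printed above would count as stable under that convention.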
# 4.2. Bin the training set by equal frequency and compare the groups
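| | min | max | bads | goods | total | bad_rate | good_rate | odds | bad_prop | good_prop | total_prop | cum_bad_rate | cum_bad_rate_rev | cum_bads_prop | cum_bads_prop_rev | cum_goods_prop | cum_goods_prop_rev | cum_total_prop | cum_total_prop_rev | ks | lift |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.000194976 | 0.000204106 | 0 | 292 | 292 | 0 | 1 | 0 | 0 | 0.5583174 | 0.389333333 | 0 | 0.302666667 | 0 | 1 | 0.5583174 | 1 | 0.389333333 | 1 | 0.5583174 | 1 |
| 1 | 0.000214122 | 0.000214122 | 0 | 125 | 125 | 0 | 1 | 0 | 0 | 0.239005736 | 0.166666667 | 0 | 0.495633188 | 0 | 1 | 0.797323136 | 0.4416826 | 0.556 | 0.610666667 | 0.797323136 | 1.637554585 |
| 2 | 0.000219486 | 0.000219486 | 0 | 106 | 106 | 0 | 1 | 0 | 0 | 0.202676864 | 0.141333333 | 0 | 0.681681682 | 0 | 1 | 1 | 0.202676864 | 0.697333333 | 0.444 | 1 | 2.252252252 |
| 3 | 0.999484936 | 0.99950797 | 48 | 0 | 48 | 1 | 0 | inf | 0.211453744 | 0 | 0.064 | 0.084063047 | 1 | 0.211453744 | 1 | 1 | 0 | 0.761333333 | 0.302666667 | 0.788546256 | 3.303964758 |
| 4 | 0.99953098 | 0.99953098 | 78 | 0 | 78 | 1 | 0 | inf | 0.343612335 | 0 | 0.104 | 0.194144838 | 1 | 0.555066079 | 0.788546256 | 1 | 0 | 0.865333333 | 0.238666667 | 0.444933921 | 3.303964758 |
| 5 | 0.999542439 | 0.999542439 | 101 | 0 | 101 | 1 | 0 | inf | 0.444933921 | 0 | 0.134666667 | 0.302666667 | 1 | 1 | 0.444933921 | 1 | 0 | 1 | 0.134666667 | 0 | 3.303964758 |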
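The ks column can be recomputed from the bads and goods counts alone: in each bucket it is the absolute gap between the cumulative share of bad samples and the cumulative share of good samples. A small sketch using the six buckets above (the function name is hypothetical):

```python
# Per-bucket counts, copied from the bads and goods columns above.
bads = [0, 0, 0, 48, 78, 101]
goods = [292, 125, 106, 0, 0, 0]


def ks_by_bucket(bads, goods):
    """Absolute gap between cumulative bad share and cumulative good share per bucket."""
    total_bad, total_good = sum(bads), sum(goods)
    cum_bad = cum_good = 0
    gaps = []
    for bad, good in zip(bads, goods):
        cum_bad += bad
        cum_good += good
        gaps.append(abs(cum_bad / total_bad - cum_good / total_good))
    return gaps


ks_values = ks_by_bucket(bads, goods)
print(max(ks_values))
```

The model's KS is the largest of these gaps. Here it reaches 1.0, meaning the buckets separate good and bad samples perfectly; that is what one would expect if the label-derived creditability column is still among the model inputs, as the scorecard table in section 4.3 suggests.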
# 4.3. Scorecard score transformation
| | name | value | score |
|---|---|---|---|
| 0 | status.of.existing.checking.account | no checking account | 261.94 |
| 1 | status.of.existing.checking.account | ... >= 200 DM / salary assignments for at least 1 year | 258.87 |
| 2 | status.of.existing.checking.account | 0 <= ... < 200 DM | 255.66 |
| 3 | status.of.existing.checking.account | ... < 0 DM | 254.01 |
| 4 | creditability | good | 744.2 |
| 5 | creditability | bad | -302.01 |
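Scores like these come from a linear rescaling of the model's log-odds. The sketch below shows the textbook form, score = offset + factor * ln(good odds), where factor = PDO / ln 2 and the offset pins a chosen base score to chosen base odds. The parameter values (PDO 20, base score 600, base odds 50:1) are hypothetical; the article does not print the ones it used, and this is not presented as toad's exact formula.

```python
import math


def scaling_constants(pdo, base_score, base_odds):
    """Return (factor, offset) so that base_odds maps to base_score and doubling the odds adds pdo."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset


def score_from_bad_probability(p_bad, factor, offset):
    """Convert a predicted bad probability into a score; higher means lower risk."""
    good_odds = (1 - p_bad) / p_bad
    return offset + factor * math.log(good_odds)


# Hypothetical scaling parameters, chosen only for illustration.
factor, offset = scaling_constants(pdo=20, base_score=600, base_odds=50)

print(score_from_bad_probability(1 / 51, factor, offset))   # good odds of 50:1
print(score_from_bad_probability(1 / 101, factor, offset))  # good odds of 100:1
```

Because the log-odds are a sum of per-feature terms, the total score splits into per-bin points, which is how a table with one score per feature value arises.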
Copyright belongs to the original author, 一个处女座的程序猿. If this content infringes your rights, please contact us and we will remove it.