



# 1. Define the dataset

# 1.1 Load the German Credit dataset

Credit data that classifies debtors, each described by a set of attributes, as good or bad credit risks. https://archive.ics.uci.edu/ml/datasets/Statlog+

| | status.of.existing.checking.account | duration.in.month | credit.history | purpose | credit.amount | savings.account.and.bonds | present.employment.since | installment.rate.in.percentage.of.disposable.income | personal.status.and.sex | other.debtors.or.guarantors | present.residence.since | property | age.in.years | other.installment.plans | housing | number.of.existing.credits.at.this.bank | job | number.of.people.being.liable.to.provide.maintenance.for | telephone | foreign.worker | creditability |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ... < 0 DM | 6 | critical account/ other credits existing (not at this bank) | radio/television | 1169 | unknown/ no savings account | ... >= 7 years | 4 | male : divorced/separated | none | 4 | real estate | 67 | none | own | 2 | skilled employee / official | 1 | yes, registered under the customers name | yes | good |
| 1 | 0 <= ... < 200 DM | 48 | existing credits paid back duly till now | radio/television | 5951 | ... < 100 DM | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 2 | real estate | 22 | none | own | 1 | skilled employee / official | 1 | none | yes | bad |
| 2 | no checking account | 12 | critical account/ other credits existing (not at this bank) | education | 2096 | ... < 100 DM | 4 <= ... < 7 years | 2 | male : divorced/separated | none | 3 | real estate | 49 | none | own | 1 | unskilled - resident | 2 | none | yes | good |
| 3 | ... < 0 DM | 42 | existing credits paid back duly till now | furniture/equipment | 7882 | ... < 100 DM | 4 <= ... < 7 years | 2 | male : divorced/separated | guarantor | 4 | building society savings agreement/ life insurance | 45 | none | for free | 1 | skilled employee / official | 2 | none | yes | good |
| 4 | ... < 0 DM | 24 | delay in paying off in the past | car (new) | 4870 | ... < 100 DM | 1 <= ... < 4 years | 3 | male : divorced/separated | none | 4 | unknown / no property | 53 | none | for free | 2 | skilled employee / official | 2 | none | yes | bad |
| 5 | no checking account | 36 | existing credits paid back duly till now | education | 9055 | unknown/ no savings account | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 4 | unknown / no property | 35 | none | for free | 1 | unskilled - resident | 2 | yes, registered under the customers name | yes | good |
| 6 | no checking account | 24 | existing credits paid back duly till now | furniture/equipment | 2835 | 500 <= ... < 1000 DM | ... >= 7 years | 3 | male : divorced/separated | none | 4 | building society savings agreement/ life insurance | 53 | none | own | 1 | skilled employee / official | 1 | none | yes | good |
| 7 | 0 <= ... < 200 DM | 36 | existing credits paid back duly till now | car (used) | 6948 | ... < 100 DM | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 35 | none | rent | 1 | management/ self-employed/ highly qualified employee/ officer | 1 | yes, registered under the customers name | yes | good |
| 8 | no checking account | 12 | existing credits paid back duly till now | radio/television | 3059 | ... >= 1000 DM | 4 <= ... < 7 years | 2 | male : divorced/separated | none | 4 | real estate | 61 | none | own | 1 | unskilled - resident | 1 | none | yes | good |
| 9 | 0 <= ... < 200 DM | 30 | critical account/ other credits existing (not at this bank) | car (new) | 5234 | ... < 100 DM | unemployed | 4 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 28 | none | own | 2 | management/ self-employed/ highly qualified employee/ officer | 1 | none | yes | bad |
| 10 | 0 <= ... < 200 DM | 12 | existing credits paid back duly till now | car (new) | 1295 | ... < 100 DM | ... < 1 year | 3 | male : divorced/separated | none | 1 | car or other, not in attribute Savings account/bonds | 25 | none | rent | 1 | skilled employee / official | 1 | none | yes | bad |
| 11 | ... < 0 DM | 48 | existing credits paid back duly till now | business | 4308 | ... < 100 DM | ... < 1 year | 3 | male : divorced/separated | none | 4 | building society savings agreement/ life insurance | 24 | none | rent | 1 | skilled employee / official | 1 | none | yes | bad |
| 12 | 0 <= ... < 200 DM | 12 | existing credits paid back duly till now | radio/television | 1567 | ... < 100 DM | 1 <= ... < 4 years | 1 | male : divorced/separated | none | 1 | car or other, not in attribute Savings account/bonds | 22 | none | own | 1 | skilled employee / official | 1 | yes, registered under the customers name | yes | good |
| 13 | ... < 0 DM | 24 | critical account/ other credits existing (not at this bank) | car (new) | 1199 | ... < 100 DM | ... >= 7 years | 4 | male : divorced/separated | none | 4 | car or other, not in attribute Savings account/bonds | 60 | none | own | 2 | unskilled - resident | 1 | none | yes | bad |
| 14 | ... < 0 DM | 15 | existing credits paid back duly till now | car (new) | 1403 | ... < 100 DM | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 4 | car or other, not in attribute Savings account/bonds | 28 | none | rent | 1 | skilled employee / official | 1 | none | yes | good |
| 15 | ... < 0 DM | 24 | existing credits paid back duly till now | radio/television | 1282 | 100 <= ... < 500 DM | 1 <= ... < 4 years | 4 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 32 | none | own | 1 | unskilled - resident | 1 | none | yes | bad |
| 16 | no checking account | 24 | critical account/ other credits existing (not at this bank) | radio/television | 2424 | unknown/ no savings account | ... >= 7 years | 4 | male : divorced/separated | none | 4 | building society savings agreement/ life insurance | 53 | none | own | 2 | skilled employee / official | 1 | none | yes | good |
| 17 | ... < 0 DM | 30 | no credits taken/ all credits paid back duly | business | 8072 | unknown/ no savings account | ... < 1 year | 2 | male : divorced/separated | none | 3 | car or other, not in attribute Savings account/bonds | 25 | bank | own | 3 | skilled employee / official | 1 | none | yes | good |
| 18 | 0 <= ... < 200 DM | 24 | existing credits paid back duly till now | car (used) | 12579 | ... < 100 DM | ... >= 7 years | 4 | male : divorced/separated | none | 2 | unknown / no property | 44 | none | for free | 1 | management/ self-employed/ highly qualified employee/ officer | 1 | yes, registered under the customers name | yes | bad |
| 19 | no checking account | 24 | existing credits paid back duly till now | radio/television | 3430 | 500 <= ... < 1000 DM | ... >= 7 years | 3 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 31 | none | own | 1 | skilled employee / official | 2 | yes, registered under the customers name | yes | good |




| | type | size | missing | unique | mean_or_top1 | std_or_top2 | min_or_top3 | 1%_or_top4 | 10%_or_top5 | 50%_or_bottom5 | 75%_or_bottom4 | 90%_or_bottom3 | 99%_or_bottom2 | max_or_bottom1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| status.of.existing.checking.account | category | 1000 | 0.00% | 4 | no checking account:39.40% | ... < 0 DM:27.40% | 0 <= ... < 200 DM:26.90% | ... >= 200 DM / salary assignments for at least 1 year:6.30% | | | no checking account:39.40% | ... < 0 DM:27.40% | 0 <= ... < 200 DM:26.90% | ... >= 200 DM / salary assignments for at least 1 year:6.30% |
| duration.in.month | int64 | 1000 | 0.00% | 33 | 20.903 | 12.05881445 | 4 | 6 | 9 | 18 | 24 | 36 | 60 | 72 |
| credit.history | category | 1000 | 0.00% | 5 | existing credits paid back duly till now:53.00% | critical account/ other credits existing (not at this bank):29.30% | delay in paying off in the past:8.80% | all credits at this bank paid back duly:4.90% | no credits taken/ all credits paid back duly:4.00% | existing credits paid back duly till now:53.00% | critical account/ other credits existing (not at this bank):29.30% | delay in paying off in the past:8.80% | all credits at this bank paid back duly:4.90% | no credits taken/ all credits paid back duly:4.00% |
| purpose | object | 1000 | 0.00% | 10 | radio/television:28.00% | car (new):23.40% | furniture/equipment:18.10% | car (used):10.30% | business:9.70% | education:5.00% | repairs:2.20% | domestic appliances:1.20% | others:1.20% | retraining:0.90% |
| credit.amount | int64 | 1000 | 0.00% | 921 | 3271.258 | 2822.736876 | 250 | 425.83 | 932 | 2319.5 | 3972.25 | 7179.4 | 14180.39 | 18424 |
| savings.account.and.bonds | category | 1000 | 0.00% | 5 | ... < 100 DM:60.30% | unknown/ no savings account:18.30% | 100 <= ... < 500 DM:10.30% | 500 <= ... < 1000 DM:6.30% | ... >= 1000 DM:4.80% | ... < 100 DM:60.30% | unknown/ no savings account:18.30% | 100 <= ... < 500 DM:10.30% | 500 <= ... < 1000 DM:6.30% | ... >= 1000 DM:4.80% |
| present.employment.since | category | 1000 | 0.00% | 5 | 1 <= ... < 4 years:33.90% | ... >= 7 years:25.30% | 4 <= ... < 7 years:17.40% | ... < 1 year:17.20% | unemployed:6.20% | 1 <= ... < 4 years:33.90% | ... >= 7 years:25.30% | 4 <= ... < 7 years:17.40% | ... < 1 year:17.20% | unemployed:6.20% |
| installment.rate.in.percentage.of.disposable.income | int64 | 1000 | 0.00% | 4 | 2.973 | 1.118714674 | 1 | 1 | 1 | 3 | 4 | 4 | 4 | 4 |
| personal.status.and.sex | category | 1000 | 0.00% | 4 | male : single:54.80% | female : divorced/separated/married:31.00% | male : married/widowed:9.20% | male : divorced/separated:5.00% | female : single:0.00% | male : single:54.80% | female : divorced/separated/married:31.00% | male : married/widowed:9.20% | male : divorced/separated:5.00% | female : single:0.00% |
| other.debtors.or.guarantors | category | 1000 | 0.00% | 3 | none:90.70% | guarantor:5.20% | co-applicant:4.10% | | | | | none:90.70% | guarantor:5.20% | co-applicant:4.10% |
| present.residence.since | int64 | 1000 | 0.00% | 4 | 2.845 | 1.103717896 | 1 | 1 | 1 | 3 | 4 | 4 | 4 | 4 |
| property | category | 1000 | 0.00% | 4 | car or other, not in attribute Savings account/bonds:33.20% | real estate:28.20% | building society savings agreement/ life insurance:23.20% | unknown / no property:15.40% | | | car or other, not in attribute Savings account/bonds:33.20% | real estate:28.20% | building society savings agreement/ life insurance:23.20% | unknown / no property:15.40% |
| age.in.years | int64 | 1000 | 0.00% | 53 | 35.546 | 11.37546857 | 19 | 20 | 23 | 33 | 42 | 52 | 67.01 | 75 |
| other.installment.plans | category | 1000 | 0.00% | 3 | none:81.40% | bank:13.90% | stores:4.70% | | | | | none:81.40% | bank:13.90% | stores:4.70% |
| housing | category | 1000 | 0.00% | 3 | own:71.30% | rent:17.90% | for free:10.80% | | | | | own:71.30% | rent:17.90% | for free:10.80% |
| number.of.existing.credits.at.this.bank | int64 | 1000 | 0.00% | 4 | 1.407 | 0.577654468 | 1 | 1 | 1 | 1 | 2 | 2 | 3 | 4 |
| job | category | 1000 | 0.00% | 4 | skilled employee / official:63.00% | unskilled - resident:20.00% | management/ self-employed/ highly qualified employee/ officer:14.80% | unemployed/ unskilled - non-resident:2.20% | | | skilled employee / official:63.00% | unskilled - resident:20.00% | management/ self-employed/ highly qualified employee/ officer:14.80% | unemployed/ unskilled - non-resident:2.20% |
| number.of.people.being.liable.to.provide.maintenance.for | int64 | 1000 | 0.00% | 2 | 1.155 | 0.362085772 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 |
| telephone | category | 1000 | 0.00% | 2 | none:59.60% | yes, registered under the customers name:40.40% | | | | | | | none:59.60% | yes, registered under the customers name:40.40% |
| foreign.worker | category | 1000 | 0.00% | 2 | yes:96.30% | no:3.70% | | | | | | | yes:96.30% | no:3.70% |
| creditability | object | 1000 | 0.00% | 2 | good:70.00% | bad:30.00% | | | | | | | good:70.00% | bad:30.00% |
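
The summary table above gives the following for each column: its type, size, missing ratio and number of unique values. It then gives either numeric statistics or the most and least frequent categories with their shares. The code that produced it is not shown in the article. The following standard-library sketch is only an approximation of the categorical part. It assumes missing values are stored as `None` and uses the `housing` values of the 20 sample rows.

```python
from collections import Counter


def categorical_summary(values, k=5):
    """Missing ratio, unique count and the k most/least frequent categories with their shares."""
    n = len(values)
    missing = sum(v is None for v in values)
    # most_common() sorts by count and keeps first-seen order for ties
    counts = Counter(v for v in values if v is not None).most_common()

    def fmt(item):
        value, count = item
        return f"{value}:{count / n:.2%}"

    return {
        "size": n,
        "missing": f"{missing / n:.2%}",
        "unique": len(counts),
        "top": [fmt(c) for c in counts[:k]],
        "bottom": [fmt(c) for c in counts[::-1][:k]],  # least frequent first
    }


# housing column of the 20 sample rows
housing = ["own", "own", "own", "for free", "for free", "for free", "own", "rent",
           "own", "own", "rent", "rent", "own", "own", "rent", "own", "own", "own",
           "for free", "own"]
summary = categorical_summary(housing)
print(summary["top"])  # ['own:60.00%', 'for free:20.00%', 'rent:20.00%']
```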

# 1.3 Output the mean, std, min, three quantiles and max of the continuous variables
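
Here is a minimal standard-library sketch of this step. Three points are assumptions:

- the "three quantiles" are the quartiles 25%, 50% and 75%, as in pandas `describe()`;
- the standard deviation is the sample standard deviation;
- quantiles are interpolated linearly between neighbouring ranks.

The input is `duration.in.month` for the first ten sample rows.

```python
import statistics


def quantile(sorted_values, q):
    """Quantile with linear interpolation between the two nearest ranks."""
    pos = (len(sorted_values) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def describe(values):
    """Mean, std, min, quartiles and max of one continuous variable."""
    s = sorted(values)
    return {
        "mean": statistics.mean(s),
        "std": statistics.stdev(s),  # sample standard deviation (n - 1 in the denominator)
        "min": s[0],
        "25%": quantile(s, 0.25),
        "50%": quantile(s, 0.50),
        "75%": quantile(s, 0.75),
        "max": s[-1],
    }


# duration.in.month of the first ten sample rows
durations = [6, 48, 12, 42, 24, 36, 24, 36, 12, 30]
print(describe(durations))  # quartiles: 15.0, 27.0, 36.0
```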


# 2. Data preprocessing

# 2.1 Map the categorical target variable to a numeric variable
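
The mapping itself is a small dictionary lookup. Below is a minimal sketch with three assumptions: `bad`, the event the scorecard predicts, is coded as 1; `good` is coded as 0; any other label is treated as a data error.

```python
TARGET_MAP = {"good": 0, "bad": 1}  # assumption: bad = 1 is the positive class


def encode_target(labels):
    """Map creditability labels to 0/1, failing loudly on unexpected values."""
    unknown = set(labels) - set(TARGET_MAP)
    if unknown:
        raise ValueError(f"unexpected target labels: {sorted(unknown)}")
    return [TARGET_MAP[v] for v in labels]


# creditability of the first five sample rows
y = encode_target(["good", "bad", "good", "good", "bad"])
print(y)  # [0, 1, 0, 0, 1]
```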

# 2.2 Analyze each feature's IV, Gini, entropy, number of unique values, etc.
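
All four statistics can be computed from a good/bad count table per category. Below is a minimal sketch built on these assumptions:

- `y` is 1 for bad and 0 for good.
- IV uses natural-log WOE of the form ln(bad share / good share).
- A category with no goods or no bads gets 0.5 added to both counts. This is one common smoothing choice, not necessarily the one in the original code.
- Gini and entropy are the impurity of the target within each category, weighted by category size, so lower values mean a more informative feature.

The example uses the checking-account status and target of the 20 sample rows, with the status values abbreviated.

```python
import math
from collections import defaultdict


def _good_bad_counts(x, y):
    """category -> [good count, bad count]; y must be 0 (good) or 1 (bad)."""
    counts = defaultdict(lambda: [0, 0])
    for xi, yi in zip(x, y):
        counts[xi][yi] += 1
    return counts


def information_value(x, y, smooth=0.5):
    counts = _good_bad_counts(x, y)
    total_good = sum(g for g, _ in counts.values())
    total_bad = sum(b for _, b in counts.values())
    iv = 0.0
    for good, bad in counts.values():
        if good == 0 or bad == 0:  # avoid log(0) for pure categories
            good, bad = good + smooth, bad + smooth
        p_bad, p_good = bad / total_bad, good / total_good
        iv += (p_bad - p_good) * math.log(p_bad / p_good)
    return iv


def conditional_gini(x, y):
    """Gini impurity of y inside each category, weighted by category size."""
    n = len(y)
    total = 0.0
    for good, bad in _good_bad_counts(x, y).values():
        m = good + bad
        total += m / n * (1 - (good / m) ** 2 - (bad / m) ** 2)
    return total


def conditional_entropy(x, y):
    """Shannon entropy (natural log) of y inside each category, weighted by category size."""
    n = len(y)
    total = 0.0
    for good, bad in _good_bad_counts(x, y).values():
        m = good + bad
        total += m / n * -sum(c / m * math.log(c / m) for c in (good, bad) if c)
    return total


def feature_quality(x, y):
    return {
        "iv": information_value(x, y),
        "gini": conditional_gini(x, y),
        "entropy": conditional_entropy(x, y),
        "unique": len(set(x)),
    }


# status of the 20 sample rows: "<0" = "... < 0 DM", "0-200" = "0 <= ... < 200 DM",
# "none" = "no checking account"
status = ["<0", "0-200", "none", "<0", "<0", "none", "none", "0-200", "none", "0-200",
          "0-200", "<0", "0-200", "<0", "<0", "<0", "none", "<0", "0-200", "none"]
y = [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
print(feature_quality(status, y))
```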


# 2.3 Filter features using the IV, empty-ratio and correlation criteria

{'empty': array([], dtype=float64), 'iv': array(['personal.status.and.sex', 'present.residence.since',
'number.of.existing.credits.at.this.bank', 'job',
'telephone'], dtype=object), 'corr': array([], dtype=object)}
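
The dictionary above appears to list the features removed by each criterion. No features were removed by the missing-value (`empty`) or correlation (`corr`) checks, and five were removed for low information value (`iv`). Below is a minimal sketch of such a three-stage filter, under these assumptions:

- Thresholds are illustrative: a missing ratio above 0.9, IV below 0.02, or absolute Pearson correlation above 0.7.
- Columns are numeric, with `None` marking missing values.
- In each highly correlated pair, the feature with the lower IV is dropped.

The example columns and IVs are made up.

```python
import math


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va and vb else 0.0


def select_features(columns, ivs, empty_limit=0.9, iv_limit=0.02, corr_limit=0.7):
    """Return (kept names, {"empty": [...], "iv": [...], "corr": [...]} of dropped names)."""
    dropped = {"empty": [], "iv": [], "corr": []}
    remaining = []
    for name, values in columns.items():
        if sum(v is None for v in values) / len(values) > empty_limit:
            dropped["empty"].append(name)
        elif ivs[name] < iv_limit:
            dropped["iv"].append(name)
        else:
            remaining.append(name)
    kept = []
    # Visit features from highest to lowest IV so the weaker one of a correlated pair is dropped
    for name in sorted(remaining, key=lambda c: ivs[c], reverse=True):
        if any(abs(pearson(columns[name], columns[k])) > corr_limit for k in kept):
            dropped["corr"].append(name)
        else:
            kept.append(name)
    return kept, dropped


# Hypothetical numeric columns and IVs, for illustration only
columns = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],  # perfectly correlated with "a"
    "c": [None] * 5,        # entirely missing
    "d": [5, 3, 4, 1, 2],   # weak predictor
    "e": [2, 5, 1, 4, 3],
}
ivs = {"a": 0.30, "b": 0.10, "c": 0.05, "d": 0.01, "e": 0.20}
kept, dropped = select_features(columns, ivs)
print(kept, dropped)  # ['a', 'e'] {'empty': ['c'], 'iv': ['d'], 'corr': ['b']}
```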

# 2.4 Binning


 {'status.of.existing.checking.account': [['no checking account'], ['... >= 200 DM / salary assignments for at least 1 year'], ['0 <= ... < 200 DM'], ['... < 0 DM']], 'duration.in.month': [9, 12, 13, 16, 36, 45], 'credit.history': [['critical account/ other credits existing (not at this bank)'], ['delay in paying off in the past', 'existing credits paid back duly till now'], ['all credits at this bank paid back duly', 'no credits taken/ all credits paid back duly']], 'purpose': [['retraining', 'car (used)'], ['radio/television'], ['furniture/equipment'], ['domestic appliances', 'business', 'repairs'], ['car (new)'], ['others', 'education']], 'credit.amount': [3556], 'savings.account.and.bonds': [['... >= 1000 DM', '500 <= ... < 1000 DM', 'unknown/ no savings account'], ['100 <= ... < 500 DM'], ['... < 100 DM']], 'present.employment.since': [['4 <= ... < 7 years'], ['... >= 7 years'], ['1 <= ... < 4 years'], ['unemployed'], ['... < 1 year']], 'installment.rate.in.percentage.of.disposable.income': [2, 3, 4], 'other.debtors.or.guarantors': [['guarantor', 'none', 'co-applicant']], 'property': [['real estate'], ['building society savings agreement/ life insurance'], ['car or other, not in attribute Savings account/bonds'], ['unknown / no property']], 'age.in.years': [26, 35, 37, 49], 'other.installment.plans': [['none'], ['stores', 'bank']], 'housing': [['own'], ['rent'], ['for free']], 'foreign.worker': [['no', 'yes']], 'creditability': [['good'], ['bad']]}
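
The exported rules above use two formats. Numeric features get a sorted list of cut points, and categorical features get a list of category groups. Below is a minimal sketch of how such rules can map a raw value to a bin index. It assumes left-closed numeric intervals, so a value equal to a cut point falls into the upper bin.

```python
from bisect import bisect_right


def apply_rule(value, rule):
    """Return the bin index of `value` under a numeric or categorical binning rule."""
    if rule and isinstance(rule[0], list):  # categorical: list of category groups
        for i, group in enumerate(rule):
            if value in group:
                return i
        raise KeyError(f"category {value!r} is not covered by the rule")
    # numeric: cut points s_0 < s_1 < ... define (-inf, s_0), [s_0, s_1), ..., [s_k, inf)
    return bisect_right(rule, value)


# Two rules copied from the exported dictionary above
rules = {
    "duration.in.month": [9, 12, 13, 16, 36, 45],
    "housing": [["own"], ["rent"], ["for free"]],
}
print(apply_rule(6, rules["duration.in.month"]))   # 0, since 6 < 9
print(apply_rule(12, rules["duration.in.month"]))  # 2, since 12 is in [12, 13)
print(apply_rule(48, rules["duration.in.month"]))  # 6, since 48 >= 45
print(apply_rule("rent", rules["housing"]))        # 1
```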

# 2.5 Refine the bins using bad-rate plots

# 2.5.1 Example of manually adjusting the bins
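
Adjusting bins by hand usually means merging neighbouring bins in the rule dictionary. Below is a minimal sketch of a helper for the two rule formats shown in 2.4. The function name `merge_bins` is hypothetical, and merging is the only adjustment it covers.

```python
def merge_bins(rule, i):
    """Return a copy of `rule` in which bin i and bin i + 1 are merged.

    Numeric rule (cut points): drop the cut point that separates the two bins.
    Categorical rule (groups): concatenate the two neighbouring groups.
    """
    if rule and isinstance(rule[0], list):
        if not 0 <= i < len(rule) - 1:
            raise IndexError("bin i has no right-hand neighbour")
        return rule[:i] + [rule[i] + rule[i + 1]] + rule[i + 2:]
    if not 0 <= i < len(rule):
        raise IndexError("bin i has no right-hand neighbour")
    return rule[:i] + rule[i + 1:]


age = [26, 35, 37, 49]  # age.in.years rule from 2.4
print(merge_bins(age, 1))  # [26, 37, 49]: [26, 35) and [35, 37) become [26, 37)

employment = [["4 <= ... < 7 years"], ["... >= 7 years"], ["1 <= ... < 4 years"],
              ["unemployed"], ["... < 1 year"]]
print(merge_bins(employment, 0)[0])  # ['4 <= ... < 7 years', '... >= 7 years']
```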

# 2.5.2 Plot each bin's share of samples as a bar chart and its bad rate as a line chart
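
Both charts are drawn from two numbers per bin: the bin's share of all samples and its bad rate. The original plotting code is not shown. Below is a minimal sketch that computes these numbers and prints a rough text version of the chart, leaving out plotting libraries to stay dependency-free. The example uses the `housing` bins (0 = own, 1 = rent, 2 = for free) and targets (1 = bad) of the 20 sample rows.

```python
from collections import Counter


def bin_stats(bins, y):
    """Per-bin sample share and bad rate, ordered by bin index (y == 1 means bad)."""
    n = len(y)
    total = Counter(bins)
    bad = Counter(b for b, t in zip(bins, y) if t == 1)
    return [
        {"bin": b, "share": total[b] / n, "bad_rate": bad[b] / total[b]}
        for b in sorted(total)
    ]


bins = [0, 0, 0, 2, 2, 2, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 2, 0]
y = [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
for row in bin_stats(bins, y):
    bar = "#" * round(row["share"] * 40)  # bar length is proportional to the share
    print(f"bin {row['bin']}: {bar:<24} share={row['share']:.0%} bad_rate={row['bad_rate']:.0%}")
```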

# 2.5.3 Adjust the bins so that bad_rate follows a broadly monotonic trend
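
One common way to get a monotonic bad rate is the pool-adjacent-violators approach: keep merging any adjacent pair of bins that breaks the trend until none is left. Below is a minimal sketch, under these assumptions:

- bins are already in their natural order;
- each bin is given as `(good, bad)` counts;
- the direction is inferred from the first and last bins unless passed explicitly.

The example counts are made up.

```python
def bad_rate(entry):
    _, good, bad = entry
    return bad / (good + bad)


def make_monotonic(counts, increasing=None):
    """Merge adjacent bins until the bad rate is monotonic.

    counts: list of (good, bad) per ordered bin.
    Returns a list of (original bin indices, good, bad) for the merged bins.
    """
    bins = [([i], g, b) for i, (g, b) in enumerate(counts)]
    if increasing is None:
        increasing = bad_rate(bins[-1]) >= bad_rate(bins[0])
    i = 0
    while i < len(bins) - 1:
        left, right = bad_rate(bins[i]), bad_rate(bins[i + 1])
        ok = right >= left if increasing else right <= left
        if ok:
            i += 1
            continue
        (idx_l, g_l, b_l), (idx_r, g_r, b_r) = bins[i], bins[i + 1]
        bins[i:i + 2] = [(idx_l + idx_r, g_l + g_r, b_l + b_r)]
        i = max(i - 1, 0)  # the merged bin may now clash with its left neighbour
    return bins


# Hypothetical (good, bad) counts for four ordered bins
merged = make_monotonic([(50, 5), (40, 8), (30, 3), (20, 10)])
for idx, good, bad in merged:
    print(idx, round(bad / (good + bad), 3))
```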

# 2.6 Apply the WOE transformation to the binned data
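
WOE replaces each bin with the log ratio of its share of bad samples to its share of good samples:

WOE_i = ln((bad_i / bad_total) / (good_i / good_total))

Sign conventions differ between sources; some use good over bad. The sketch below assumes the following:

- it uses bad over good;
- bad is coded as 1;
- a bin with no goods or no bads gets 0.5 added to both counts.

The example uses the `housing` bins (0 = own, 1 = rent, 2 = for free) and targets of the 20 sample rows.

```python
import math
from collections import defaultdict


def fit_woe(bins, y, smooth=0.5):
    """Return {bin: WOE} with WOE = ln((bad_i / B) / (good_i / G)); y == 1 means bad."""
    good, bad = defaultdict(int), defaultdict(int)
    for b, t in zip(bins, y):
        if t == 1:
            bad[b] += 1
        else:
            good[b] += 1
    total_good, total_bad = sum(good.values()), sum(bad.values())
    woe = {}
    for b in set(good) | set(bad):
        g, d = good[b], bad[b]
        if g == 0 or d == 0:  # avoid log(0) and division by zero
            g, d = g + smooth, d + smooth
        woe[b] = math.log((d / total_bad) / (g / total_good))
    return woe


def transform_woe(bins, woe):
    """Replace each bin index by its fitted WOE value."""
    return [woe[b] for b in bins]


bins = [0, 0, 0, 2, 2, 2, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 2, 0]
y = [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
woe = fit_woe(bins, y)
print({b: round(w, 4) for b, w in sorted(woe.items())})  # {0: -0.2877, 1: 0.4055, 2: 0.4055}
x_woe = transform_woe(bins, woe)
```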


# 2.7 Feature selection

Features are selected by forward, backward or bidirectional (stepwise) selection, using AIC, BIC, KS or AUC as the selection criterion.

 (1000, 3)
 Index(['status.of.existing.checking.account', 'creditability',
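
Forward, backward and bidirectional selection differ only in which moves they try at each step. Below is a minimal sketch of greedy stepwise selection that minimizes a caller-supplied `score(features)` function. That function could be the AIC or BIC of a model refitted on those features; to maximize KS or AUC, pass their negative. To stay self-contained, the example replaces model fitting with a made-up AIC-like score: a baseline deviance of 100, minus a hypothetical reduction per feature, plus a penalty of 2 per feature. These numbers are not taken from the article.

```python
def stepwise(candidates, score, direction="both"):
    """Greedy stepwise selection that minimizes score(features).

    "forward" starts empty and adds, "backward" starts full and removes,
    "both" starts empty and may add or remove at every step.
    """
    selected = list(candidates) if direction == "backward" else []
    best = score(selected)
    while True:
        moves = []
        if direction in ("forward", "both"):
            moves += [selected + [c] for c in candidates if c not in selected]
        if direction in ("backward", "both"):
            moves += [[s for s in selected if s != r] for r in selected]
        if not moves:
            break
        trial = min(moves, key=score)
        trial_score = score(trial)
        if trial_score >= best:  # stop when no move improves the criterion
            break
        selected, best = trial, trial_score
    return selected, best


# Hypothetical deviance reductions, for illustration only
gains = {"status.of.existing.checking.account": 30.0, "duration.in.month": 10.0, "housing": 1.5}


def aic_like(features):
    return 100.0 - sum(gains[f] for f in features) + 2 * len(features)


print(stepwise(list(gains), aic_like, "forward"))
# (['status.of.existing.checking.account', 'duration.in.month'], 64.0)
```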

# 3. Model building, training and evaluation

# 3.1 Split into training and test sets
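
Below is a minimal sketch of a reproducible split. It assumes a 70/30 train/test ratio and a fixed seed; the article states neither. It stratifies by the target so that both parts keep the dataset's mix of 70% good and 30% bad.

```python
import random


def stratified_split(y, test_size=0.3, seed=42):
    """Return sorted (train_idx, test_idx), sampling the same fraction of each class."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)
        test += idx[:n_test]
        train += idx[n_test:]
    return sorted(train), sorted(test)


y = [0] * 700 + [1] * 300  # 1000 rows with the 70% good / 30% bad mix of the dataset
train_idx, test_idx = stratified_split(y)
print(len(train_idx), len(test_idx))                # 700 300
print(sum(y[i] for i in test_idx) / len(test_idx))  # 0.3
```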

# 3.2 Model training
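
Scorecards of this kind are typically built on a logistic regression over WOE-encoded features. The article does not show which model it trains, so logistic regression is an assumption here. Below is a minimal, dependency-free sketch that fits one by batch gradient descent on the mean log-loss. The learning rate and number of epochs are arbitrary choices, and a library solver is preferable in practice. The example fits a single feature: the `housing` WOE of the 20 sample rows, computed as ln(bad share / good share). That gives ln(0.75) for own and ln(1.5) for rent and for free.

```python
import math


def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Batch gradient descent on the mean log-loss; returns (intercept, weights)."""
    n, k = len(X), len(X[0])
    b, w = 0.0, [0.0] * k
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(k):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


def predict_proba(model, X):
    b, w = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


woe = {"own": math.log(0.75), "rent": math.log(1.5), "for free": math.log(1.5)}
housing = ["own", "own", "own", "for free", "for free", "for free", "own", "rent",
           "own", "own", "rent", "rent", "own", "own", "rent", "own", "own", "own",
           "for free", "own"]
y = [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
X = [[woe[h]] for h in housing]
model = fit_logistic(X, y)
print(model)
```

The single feature takes only two values and its WOE is unsmoothed, so the fitted model can reproduce each group's bad rate exactly. The coefficient should therefore come out close to 1, and the intercept close to ln(8/12), the log of the overall bad:good odds.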

# 3.3 Model evaluation: F1, KS, AUC
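
All three metrics can be computed directly from the true labels and the predicted bad probabilities. Below is a minimal sketch with 1 = bad:

- **AUC** is the probability that a random bad sample scores higher than a random good one, with ties counting half.
- **KS** is the largest gap between the cumulative score distributions of bads and goods.
- **F1** is computed at an assumed 0.5 cut-off.

The O(n²) AUC loop is fine for a few thousand rows. The four-row example is made up.

```python
def auc(y_true, scores):
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def ks(y_true, scores):
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = 0.0
    for thr in sorted(set(scores)):
        tpr = sum(1 for s, t in zip(scores, y_true) if t == 1 and s >= thr) / n_pos
        fpr = sum(1 for s, t in zip(scores, y_true) if t == 0 and s >= thr) / n_neg
        best = max(best, abs(tpr - fpr))
    return best


def f1(y_true, scores, threshold=0.5):
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, y_true))
    fp = sum(p == 1 and t == 0 for p, t in zip(pred, y_true))
    fn = sum(p == 0 and t == 1 for p, t in zip(pred, y_true))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(y_true, scores), ks(y_true, scores), round(f1(y_true, scores), 4))
# 0.75 0.5 0.6667
```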

# 4. Evaluate the model for deployment and compute credit scores

# 4.1 Evaluate variable stability with PSI: compare the training and test sets

cal PSI 0.012897491574571578
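
PSI compares how a variable (or the model score) is distributed over the same bins in two samples:

PSI = Σ (actual_i - expected_i) · ln(actual_i / expected_i)

Here expected_i and actual_i are the shares of bin i in the reference sample (training) and the comparison sample (test). The commonly quoted rule of thumb reads PSI as follows:

- below 0.1: stable;
- 0.1 to 0.25: worth investigating;
- above 0.25: a significant shift.

By that rule, the value of about 0.013 reported above indicates a stable variable. Below is a minimal sketch that floors empty bins at a small share to avoid log(0). The example bin counts are made up.

```python
import math
from collections import Counter


def psi(expected_bins, actual_bins, eps=1e-6):
    """Population Stability Index between two samples of bin labels."""
    e_cnt, a_cnt = Counter(expected_bins), Counter(actual_bins)
    n_e, n_a = len(expected_bins), len(actual_bins)
    total = 0.0
    for b in set(e_cnt) | set(a_cnt):
        e = max(e_cnt[b] / n_e, eps)  # floor empty bins to avoid log(0)
        a = max(a_cnt[b] / n_a, eps)
        total += (a - e) * math.log(a / e)
    return total


train_bins = [0] * 50 + [1] * 30 + [2] * 20
same_mix = [0] * 25 + [1] * 15 + [2] * 10
shifted = [0] * 20 + [1] * 15 + [2] * 15
print(psi(train_bins, same_mix))            # 0.0
print(round(psi(train_bins, shifted), 4))   # 0.0629
```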

# 4.2 Split the training set into equal-frequency bins and compare the groups
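
Equal-frequency binning sorts the samples by a value, typically the predicted probability or score, and cuts them into groups of equal size so that the groups' bad rates can be compared. A model that ranks well shows bad rates rising steadily across the groups. Below is a minimal sketch that assigns groups by rank and reports the count, bad count and bad rate per group. Because groups are assigned by rank, tied values may be split across neighbouring groups. The eight scores are made up.

```python
def equal_frequency_groups(values, k=10):
    """Assign each value a group 0..k-1 by rank so that groups have (nearly) equal size."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, i in enumerate(order):
        groups[i] = rank * k // len(values)
    return groups


def group_report(groups, y):
    """{group: (count, bad count, bad rate)} with y == 1 meaning bad."""
    stats = {}
    for g, t in zip(groups, y):
        count, bad = stats.get(g, (0, 0))
        stats[g] = (count + 1, bad + t)
    return {g: (c, b, b / c) for g, (c, b) in sorted(stats.items())}


scores = [0.05, 0.9, 0.2, 0.6, 0.3, 0.8, 0.1, 0.4]
y = [0, 1, 0, 1, 0, 1, 0, 0]
groups = equal_frequency_groups(scores, k=4)
print(group_report(groups, y))
# {0: (2, 0, 0.0), 1: (2, 0, 0.0), 2: (2, 1, 0.5), 3: (2, 2, 1.0)}
```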


# 4.3 Scorecard score transformation

| | name | value | score |
|---|---|---|---|
| 0 | status.of.existing.checking.account | no checking account | 261.94 |
| 1 | status.of.existing.checking.account | ... >= 200 DM / salary assignments for at least 1 year | 258.87 |
| 2 | status.of.existing.checking.account | 0 <= ... < 200 DM | 255.66 |
| 3 | status.of.existing.checking.account | ... < 0 DM | 254.01 |
| 4 | creditability | good | 744.2 |
| 5 | creditability | bad | -302.01 |
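
Scorecard points like those above are typically obtained by rescaling the logistic model's log-odds. The usual scheme fixes a base score at given good:bad odds and a PDO (points to double the odds):

- factor = PDO / ln 2
- offset = base_score - factor · ln(base_odds)
- score = offset + factor · ln(good:bad odds)

For a logistic regression on WOE features with 1 = bad, ln(good:bad odds) = -(b0 + Σ w_j · WOE_j). Each bin therefore earns -factor · w_j · WOE_j points, and the intercept contributes offset - factor · b0.

Below is a minimal sketch. It uses base score 600 at odds 50:1 with PDO 20; these are assumed textbook values, not the article's settings. Base points are kept separate rather than spread over the features.

The original table also includes `creditability`, the target itself, with large points for good and bad. This suggests the raw target column was left among the model inputs; it should be dropped before fitting a real scorecard.

```python
import math


def scaling(base_score=600.0, base_odds=50.0, pdo=20.0):
    """Return (offset, factor) such that score = offset + factor * ln(good:bad odds)."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return offset, factor


def scorecard(intercept, weights, woe_tables, **scale):
    """Points per (feature, bin) for a logistic model on WOE inputs, where y == 1 means bad.

    Returns (base points from the intercept, {feature: {bin: points}}).
    """
    offset, factor = scaling(**scale)
    base_points = offset - factor * intercept
    points = {
        f: {b: -factor * weights[f] * w for b, w in table.items()}
        for f, table in woe_tables.items()
    }
    return base_points, points


def total_score(row, base_points, points):
    """Score of one applicant given as {feature: bin}."""
    return base_points + sum(points[f][b] for f, b in row.items())


# Hypothetical model: intercept ln(1/50) means bad:good odds of 1:50 when every WOE is 0
woe_tables = {"housing": {"own": -math.log(2), "rent": 0.0}}
base, points = scorecard(math.log(1 / 50), {"housing": 1.0}, woe_tables)
print(round(base, 2))                                           # 600.0
print(round(total_score({"housing": "own"}, base, points), 2))  # 620.0
```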

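This article is reposted from: https://blog.csdn.net/qq_41185868/article/details/125418213
Copyright belongs to the original author, 一个处女座的程序猿. If this repost infringes any rights, please contact us and it will be removed.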

