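Problem description
While working through the book 《机器学习实战》 (Machine Learning in Action), I ran into the following problem in the naive Bayes section.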
from numpy import zeros, log

def train_native_bayes(train_matrix, train_category):
    num_train_docs = len(train_matrix)
    num_words = len(train_matrix[0])
    # Prior probability of class 1
    p = sum(train_category) / float(num_train_docs)
    # Per-word counts for each class
    p_0_num = zeros(num_words)
    p_1_num = zeros(num_words)
    # Total word counts for each class
    p_0_denom = 0.0
    p_1_denom = 0.0
    for i in range(num_train_docs):
        if train_category[i] == 1:
            p_1_num += train_matrix[i]
            p_1_denom += sum(train_matrix[i])
        else:
            p_0_num += train_matrix[i]
            p_0_denom += sum(train_matrix[i])
    # Log conditional probabilities; a word with count 0 gives log(0)
    p_1_vector = log(p_1_num / p_1_denom)
    p_0_vector = log(p_0_num / p_0_denom)
    return p_0_vector, p_1_vector, p
Running it produced the following warnings:
F:\PycharmProject\bayes_practice_1.py:74: RuntimeWarning: divide by zero encountered in log
  p_1_vector = log(p_1_num / p_1_denom)
F:\PycharmProject\bayes_practice_1.py:75: RuntimeWarning: divide by zero encountered in log
  p_0_vector = log(p_0_num / p_0_denom)
F:\PycharmProject\bayes_practice_1.py:84: RuntimeWarning: invalid value encountered in multiply
  p_1 = sum(need_to_classify_vector * p_1_vector) + log(p_class)  # element-wise mult
F:\PycharmProject\bayes_practice_1.py:85: RuntimeWarning: invalid value encountered in multiply
  p_0 = sum(need_to_classify_vector * p_0_vector) + log(1.0 - p_class)
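In my case the final result was not affected, but the warnings are unpleasant to look at. Note, though, that `invalid value encountered in multiply` means 0 * -inf was evaluated, which gives nan, so the scores themselves can end up as nan.
Tracing the cause: some words never occur in the documents of a given class, so their count, and hence their estimated probability, is exactly 0. log(0) evaluates to -inf, and further arithmetic on -inf gives -inf again (or nan when it is multiplied by 0).
For example, printing one of the result vectors: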
train_mat = []
for i in dataset:
    train_mat.append(set_of_words_vector(my_vacab_set, i))
p_0_vector, p_1_vector, p = train_native_bayes(train_mat, class_vector)
print(p_0_vector)
The output is:
[-3.17805383 -3.17805383 -3.17805383 -inf -3.17805383 -2.48490665 -3.17805383 -3.17805383 -inf -3.17805383 -3.17805383 -3.17805383 -inf -inf -inf -inf -3.17805383 -inf
 -3.17805383 -3.17805383 -inf -inf -3.17805383 -2.07944154 -3.17805383 -3.17805383 -inf -3.17805383 -3.17805383 -inf
 -3.17805383 -3.17805383]
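To find which vocabulary positions are affected, one option is `numpy.isneginf`. The sketch below uses a short made-up vector (`log_probs` is a hypothetical stand-in for `p_0_vector`, not the real output above).

```python
import numpy as np

# Hypothetical log-probability vector standing in for p_0_vector.
log_probs = np.array([-3.178, -np.inf, -2.485, -np.inf])

# Boolean mask of the entries equal to -inf, then their indices.
mask = np.isneginf(log_probs)
zero_count_positions = np.flatnonzero(mask)
print(zero_count_positions)
```

The indices can then be mapped back to the vocabulary list to see which words never appeared in that class.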
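Investigating the cause
As a probability approaches 0, its logarithm tends to negative infinity; for a probability of exactly 0, NumPy's log returns -inf and emits the `divide by zero encountered in log` warning.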
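The behaviour can be reproduced in isolation. This is a minimal sketch with made-up counts (`counts` and `doc` are illustrative, not taken from the data set above); `np.errstate` is used only to keep the demo output free of warnings.

```python
import numpy as np

counts = np.array([2.0, 0.0, 1.0])  # made-up word counts for one class
total = counts.sum()

# Silence the two warnings locally so only the values are shown.
with np.errstate(divide="ignore", invalid="ignore"):
    log_probs = np.log(counts / total)   # log(0) evaluates to -inf
    doc = np.array([1, 0, 1])            # the zero-count word is absent here
    score = np.sum(doc * log_probs)      # 0 * -inf is nan, and nan poisons the sum

print(log_probs)
print(score)
```

Because any comparison with nan is False, a nan score can silently change which class is chosen, so it is worth removing the cause rather than only hiding the warning.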
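Solution
Add a small constant, 1e-5, to each probability before taking the logarithm, so that the argument of log is never exactly 0: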
p_1_vector = log(p_1_num / p_1_denom + 1e-5)
p_0_vector = log(p_0_num / p_0_denom + 1e-5)
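With this change the warnings no longer appear and the result contains no -inf. Words with a zero count now get log(1e-5) ≈ -11.51 instead: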
[-3.17781386 -3.17781386 -3.17781386 -3.17781386 -3.17781386 -3.17781386 -3.17781386 -3.17781386 -2.07936154 -3.17781386 -11.51292546 -3.17781386 -11.51292546 -11.51292546 -3.17781386 -3.17781386 -11.51292546 -3.17781386 -3.17781386 -11.51292546 -11.51292546 -11.51292546 -11.51292546 -11.51292546 -3.17781386 -3.17781386 -11.51292546 -2.48478666 -3.17781386 -3.17781386 -11.51292546 -3.17781386]
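A common alternative to adding a fixed epsilon is additive (Laplace-style) smoothing, where every word count starts at 1 so that no probability can be 0. The sketch below is not the code from this post: `train_smoothed` is a hypothetical variant, and starting the denominators at 2.0 is an assumption (a convention seen in some textbook treatments; a strict Laplace estimate would add the vocabulary size instead).

```python
import numpy as np

def train_smoothed(train_matrix, train_category):
    """Hypothetical variant of the training function with add-one smoothing."""
    train_matrix = np.asarray(train_matrix, dtype=float)
    train_category = np.asarray(train_category)
    num_words = train_matrix.shape[1]
    p = train_category.mean()  # fraction of documents in class 1

    rows_0 = train_matrix[train_category == 0]
    rows_1 = train_matrix[train_category == 1]

    # Every count starts at 1, so no word ends up with probability 0.
    p_0_num = np.ones(num_words) + rows_0.sum(axis=0)
    p_1_num = np.ones(num_words) + rows_1.sum(axis=0)
    # Assumption: denominators start at 2.0 (see the note above).
    p_0_denom = 2.0 + rows_0.sum()
    p_1_denom = 2.0 + rows_1.sum()

    return np.log(p_0_num / p_0_denom), np.log(p_1_num / p_1_denom), p

# Toy data, made up for illustration.
toy_matrix = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
toy_labels = [0, 1, 0]
p_0_vec, p_1_vec, p_class = train_smoothed(toy_matrix, toy_labels)
print(np.isfinite(p_0_vec).all(), np.isfinite(p_1_vec).all())
```

Compared with the epsilon approach, smoothing changes the counts themselves rather than patching the value passed to log, so unseen words get a small probability that depends on the class totals and not on an arbitrary constant.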
Copyright belongs to the original author, 旅途中的宽~.