



1. Face Detection

1.1 Overview of Face Detection


1.2 Challenges in Face Detection


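  • Similarity: structurally, faces differ very little between individuals, and even the structure of the individual facial features is very similar. This similarity is a great help when the goal is to locate faces, but it makes telling individuals apart difficult.
  • Variability: if we ignore structure and look only at appearance, faces are highly variable. Expressions change, and the image of a face looks very different from different viewing angles. Face recognition is also affected by lighting (day vs. night, indoors vs. outdoors), by occlusions (masks, sunglasses, hair, beards, and so on), by age, and by many other factors.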

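In face recognition, the first kind of variation should be amplified and used as the criterion for distinguishing individuals, while the second kind should be suppressed, because images that differ only in this way can still show the same individual. The first kind is usually called inter-class difference and the second intra-class difference. For faces, intra-class variation is often larger than inter-class variation, which makes it extremely hard to distinguish individuals by their inter-class differences while intra-class variation interferes.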
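To make the inter-/intra-class distinction concrete, here is a small sketch that measures both on feature vectors. The 2-D "features" are made up for illustration and chosen so that a change of condition (for example pose) moves one person's feature further than the gap between two different people, which is exactly the situation described above.

```python
import numpy as np

def mean_distances(features, labels):
    """Return (mean intra-class distance, mean inter-class distance)
    over all pairs of feature vectors."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances, shape (n, n)
    diff = features[:, None, :] - features[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)  # ignore distance to itself
    intra = dist[same & off_diag].mean()
    inter = dist[~same].mean()
    return intra, inter

# Hypothetical toy data: two people, each seen under two conditions.
feats = [[0.0, 0.0], [3.0, 0.0],   # person A: condition 1, condition 2
         [1.0, 0.5], [4.0, 0.5]]   # person B: condition 1, condition 2
labels = [0, 0, 1, 1]
intra, inter = mean_distances(feats, labels)
print(f"intra-class: {intra:.2f}, inter-class: {inter:.2f}")
```

Here the intra-class distance (3.0) exceeds the average inter-class distance (about 2.08), so a naive nearest-neighbour rule would often match a face to the wrong person.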

1.3 Application Scenarios of Face Detection


2. MTCNN

2.1 MTCNN Overview

MTCNN stands for Multi-task Convolutional Neural Network. It performs face region detection and facial landmark detection together.


2.2 MTCNN Network Structure




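  1. P-Net generates predicted bounding boxes from the original image.
  2. The original image and the bounding boxes produced by P-Net are passed through R-Net, which outputs refined bounding boxes.
  3. The original image and the bounding boxes produced by R-Net are passed through O-Net, which outputs the final refined bounding boxes together with the facial landmark points.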
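The three steps form a cascade: each stage only sees the boxes that survived the previous one. Here is a minimal sketch of that control flow. It assumes each stage is a hypothetical callable `stage(image, boxes) -> boxes`, and the dummy stages in the usage lines stand in for the real networks. `detectFace` in section 3 follows the same pattern, including returning early when no boxes remain.

```python
from typing import Callable, List, Sequence

import numpy as np

Box = List[float]  # [x1, y1, x2, y2, score, ...]
Stage = Callable[[np.ndarray, List[Box]], List[Box]]

def run_cascade(image: np.ndarray, stages: Sequence[Stage]) -> List[Box]:
    """Run each stage on the image and the previous stage's boxes.

    The first stage receives an empty list (P-Net scans the whole image);
    later stages refine the boxes they receive. Stops early as soon as a
    stage rejects every box, since later stages would have nothing to do.
    """
    boxes: List[Box] = []
    for stage in stages:
        boxes = stage(image, boxes)
        if not boxes:
            break
    return boxes

# Usage with dummy stages (hypothetical stand-ins for P-Net and R-Net):
img = np.zeros((100, 100, 3))
propose = lambda im, _: [[10, 10, 40, 40, 0.7], [50, 50, 60, 60, 0.4]]
refine = lambda im, bs: [b for b in bs if b[4] > 0.5]
print(run_cascade(img, [propose, refine]))
```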







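  1. The first stage, P-Net, applies convolution and pooling and outputs a classification result (whether a face is present at the corresponding position) and a regression result (the bounding box).
  2. The second stage applies non-maximum suppression (NMS) to the first stage's output to remove heavily overlapping candidate boxes. It then feeds the remaining candidates into R-Net for finer processing. R-Net rejects a large number of false boxes and calibrates the regressed boxes, and NMS is applied again to remove overlaps. Its output branches are again classification and regression.
  3. Finally, the candidate boxes that R-Net judges to be faces are fed into O-Net for another round of refinement that rejects false boxes. At this stage the output has three branches:
     a. face classification: 2 outputs;
     b. bounding-box regression: the x, y coordinates of the box's starting point (or center) plus its width and height, 4 outputs (in the Keras code in section 3, the 4 values are offsets of the top-left and bottom-right corners, scaled by the box width and height);
     c. facial landmark localization: the x, y coordinates of 5 landmarks, 10 outputs.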
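NMS, used between the stages above, removes the highly overlapping candidates. The sketch below shows greedy NMS based on intersection-over-union (IoU), using continuous coordinates. The `utils.NMS` function in section 3 does the same thing, with two differences: it adds 1 to widths and heights (a pixel-inclusive convention), and it returns the boxes themselves rather than their indices. The function names here are illustrative.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an (n, 4) array of boxes, all as [x1, y1, x2, y2]."""
    xx1 = np.maximum(box[0], boxes[:, 0])
    yy1 = np.maximum(box[1], boxes[:, 1])
    xx2 = np.minimum(box[2], boxes[:, 2])
    yy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold):
    """Greedy NMS: keep the highest-scoring box, drop every remaining box
    whose IoU with it exceeds iou_threshold, repeat. Returns kept indices."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]  # indices sorted by score, best first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep

# Two near-duplicate boxes (IoU = 81/119, about 0.68) and one separate box.
print(nms([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], [0.9, 0.8, 0.7], 0.5))
```

A lower threshold suppresses more aggressively. This is why the code in section 3 uses 0.7 when merging boxes from all pyramid levels but 0.3 inside each stage.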


2.3 Image Pyramid








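  • The first stage rescales the original image several times to build an image pyramid. The goal is to bring the faces in the rescaled images close to the image scale P-Net was trained on ($12\,\mathrm{px}\times 12\,\mathrm{px}$).
  • Optional optimization: first resize the image to a fixed size, then shrink that size repeatedly by the scale factor (factor). This reduces the amount of computation.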



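Drawbacks:
  • Building the image pyramid is relatively slow.
  • Every scale has to be fed into the model, which amounts to running the whole inference process several times.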
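The pyramid can be sketched as follows. This is a hypothetical variant, not the `utils.calculateScales` function from section 3. Instead of pre-resizing the image to about 500 px, it assumes a minimum face size (`min_face_size`, 20 px here) and picks the first scale so that such a face shrinks to 12 px. Each further level multiplies the scale by `factor` = 0.709, stopping before the short side drops below 12 px. Since 0.709² ≈ 0.5, each level has roughly half the area of the previous one.

```python
def pyramid_scales(height, width, min_face_size=20, factor=0.709, net_size=12):
    """Scale factors for an image pyramid (hypothetical min-face-size variant).

    The first scale maps a face of min_face_size pixels to net_size pixels
    (P-Net's 12x12 input); each later level shrinks by `factor` until the
    image's short side would fall below net_size.
    """
    m = net_size / min_face_size          # first scale: min face -> 12 px
    min_side = min(height, width) * m     # short side at the current level
    scales = []
    count = 0
    while min_side >= net_size:
        scales.append(m * factor ** count)
        min_side *= factor
        count += 1
    return scales

scales = pyramid_scales(500, 500)
print(len(scales), [round(s, 3) for s in scales[:3]])
```

A smaller `min_face_size` finds smaller faces but starts from a larger first level, which makes the slow-pyramid drawback above worse.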

2.4 P-Net

P-Net (Proposal Network) architecture




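In P-Net the input passes through three convolutions and one pooling layer (MP: max pooling). Each 3×3 "valid" convolution shrinks the side length by 2, and the 2×2 pooling halves it. So for a $70\times 70$ input, the fully convolutional P-Net produces an output with side length $\frac{70-2}{2}-2-2=30$, i.e. a $30\times 30$ feature map with 5 values per position (1 face confidence and 4 box offsets). This means that a single pass of P-Net over the image acts as a sliding-window operation and yields $30\times 30=900$ proposal boxes, each with 1 confidence score and 4 offsets. Proposals whose confidence exceeds the set threshold (0.6 here) are kept. Their box offsets are then applied through bounding-box regression to obtain coordinates in the original image, and NMS removes heavily overlapping boxes. The surviving P-Net proposals are then passed on to R-Net.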
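The arithmetic above can be checked with a short sketch. P-Net behaves like a 12×12 window slid with stride 2, so the output side is also $(70-12)/2+1=30$. The helper names are hypothetical. The mapping assumes an exact stride of 2 and a 12-pixel window and ignores the regression offsets. `utils.detect_face_12net` in section 3 differs slightly: it uses an approximate stride and an 11-pixel offset.

```python
def pnet_output_side(side):
    """Output side length of P-Net for a square input of the given side."""
    side = side - 2    # conv1: 3x3, 'valid' padding
    side = side // 2   # max pool: 2x2, stride 2
    side = side - 2    # conv2: 3x3, 'valid'
    side = side - 2    # conv3: 3x3, 'valid'
    return side

def cell_to_window(row, col, scale, cell_size=12, stride=2):
    """Map output cell (row, col) of a pyramid level (resized by `scale`)
    back to the 12x12 window it sees, in original-image coordinates
    [x1, y1, x2, y2]."""
    x1 = col * stride / scale
    y1 = row * stride / scale
    x2 = (col * stride + cell_size) / scale
    y2 = (row * stride + cell_size) / scale
    return [x1, y1, x2, y2]

print(pnet_output_side(70))          # side of the output map for a 70x70 input
print(cell_to_window(1, 2, 0.5))     # window of cell (1, 2) on a half-size level
```

A 12×12 input yields a 1×1 output, which is why P-Net can be trained on 12×12 crops and then run on images of any size.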

2.5 R-Net

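R-Net (Refine Network): as the network diagram shows, its structure differs from P-Net mainly by an additional fully connected layer. Every candidate region must be resized to 24x24x3 before it is fed into R-Net. R-Net has the same outputs as P-Net, and its purpose is to remove the large number of non-face boxes.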
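The crop-and-resize step can be sketched in plain NumPy. `crop_and_resize` and `resize_nearest` are hypothetical helper names. The sketch uses nearest-neighbour resizing to stay dependency-free, whereas the Keras code in section 3 uses `cv2.resize` (bilinear by default). It also returns the indices of the boxes it kept, so that R-Net's outputs can be matched back to their boxes. Calling it with `size=48` would prepare O-Net's input in the same way.

```python
import numpy as np

def resize_nearest(img, size):
    """Nearest-neighbour resize of an H x W x C array to size x size x C."""
    h, w = img.shape[:2]
    rows = (np.arange(size) * h / size).astype(int)
    cols = (np.arange(size) * w / size).astype(int)
    return img[rows[:, None], cols[None, :]]

def crop_and_resize(img, boxes, size=24):
    """Crop every [x1, y1, x2, y2, ...] box from img (clipped to the image)
    and resize each crop to size x size.

    Returns (batch, kept): batch has shape (n, size, size, C), and kept
    lists the indices of the boxes that produced a non-empty crop.
    """
    h, w = img.shape[:2]
    crops, kept = [], []
    for i, b in enumerate(boxes):
        x1, y1 = max(0, int(b[0])), max(0, int(b[1]))
        x2, y2 = min(w, int(b[2])), min(h, int(b[3]))
        if x2 <= x1 or y2 <= y1:
            continue  # box lies entirely outside the image
        crops.append(resize_nearest(img[y1:y2, x1:x2], size))
        kept.append(i)
    batch = np.array(crops, dtype=img.dtype).reshape(-1, size, size, img.shape[2])
    return batch, kept

img = np.zeros((100, 100, 3), dtype=np.uint8)
batch, kept = crop_and_resize(img, [[10, 10, 50, 60, 0.9], [200, 200, 220, 220, 0.5]])
print(batch.shape, kept)
```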


2.6 O-Net

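O-Net (Output Network): it has one more convolutional layer than R-Net, so its results are more refined. Its input size is 48x48x3, and its output consists of the bounding-box coordinates, scores, and landmark positions for N boxes.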
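The feature-map sizes behind the shape comments in the Keras code (for example `# 23,23,32 -> 10,10,64`) can be checked with a small calculator. The layer lists below are transcribed by hand from `create_Rnet` and `create_Onet` in section 3. The padding rules are the standard Keras ones: `'same'` gives ⌈input / stride⌉ and `'valid'` gives ⌊(input − kernel) / stride⌋ + 1. Both networks end in a 3×3 map, which is where the 576 (3·3·64) and 1152 (3·3·128) inputs of the dense layers come from.

```python
import math

# Each layer: (kind, kernel size, stride, padding), transcribed from the Keras code.
RNET = [("conv", 3, 1, "valid"), ("pool", 3, 2, "same"),
        ("conv", 3, 1, "valid"), ("pool", 3, 2, "valid"),
        ("conv", 2, 1, "valid")]
ONET = [("conv", 3, 1, "valid"), ("pool", 3, 2, "same"),
        ("conv", 3, 1, "valid"), ("pool", 3, 2, "valid"),
        ("conv", 3, 1, "valid"), ("pool", 2, 2, "valid"),
        ("conv", 2, 1, "valid")]

def out_side(side, layer):
    """Spatial output size of one conv/pool layer under Keras padding rules."""
    _kind, k, s, padding = layer
    if padding == "same":
        return math.ceil(side / s)   # 'same': ceil(input / stride)
    return (side - k) // s + 1       # 'valid': no padding

def trace(side, layers):
    """Spatial sizes from the input through every layer."""
    sides = [side]
    for layer in layers:
        sides.append(out_side(sides[-1], layer))
    return sides

print(trace(24, RNET))  # R-Net: 24x24 input
print(trace(48, ONET))  # O-Net: 48x48 input
```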



3. Implementation (based on Keras)




mtcnn.py (the three networks and the detection pipeline):

```python
from keras.layers import Conv2D, Input, MaxPool2D, Flatten, Dense, Permute, PReLU
from keras.models import Model
import numpy as np
import cv2

import utils


#-----------------------------#
#   Stage 1: coarse face boxes
#   Outputs bbox offsets and whether a face is present
#-----------------------------#
def create_Pnet(weight_path):
    input_layer = Input(shape=[None, None, 3])
    x = Conv2D(10, (3, 3), strides=1, padding='valid', name='conv1')(input_layer)
    x = PReLU(shared_axes=[1, 2], name='PReLU1')(x)
    x = MaxPool2D(pool_size=2)(x)
    x = Conv2D(16, (3, 3), strides=1, padding='valid', name='conv2')(x)
    x = PReLU(shared_axes=[1, 2], name='PReLU2')(x)
    x = Conv2D(32, (3, 3), strides=1, padding='valid', name='conv3')(x)
    x = PReLU(shared_axes=[1, 2], name='PReLU3')(x)
    # Softmax over [no face, face] at every position
    classifier = Conv2D(2, (1, 1), activation='softmax', name='conv4-1')(x)
    # No activation function (linear): 4 box offsets at every position
    bbox_regress = Conv2D(4, (1, 1), name='conv4-2')(x)
    model = Model([input_layer], [classifier, bbox_regress])
    model.load_weights(weight_path, by_name=True)
    return model


#-----------------------------#
#   Stage 2 of MTCNN: refine the boxes
#-----------------------------#
def create_Rnet(weight_path):
    input_layer = Input(shape=[24, 24, 3])
    # 24,24,3 -> 11,11,28
    x = Conv2D(28, (3, 3), strides=1, padding='valid', name='conv1')(input_layer)
    x = PReLU(shared_axes=[1, 2], name='prelu1')(x)
    x = MaxPool2D(pool_size=3, strides=2, padding='same')(x)
    # 11,11,28 -> 4,4,48
    x = Conv2D(48, (3, 3), strides=1, padding='valid', name='conv2')(x)
    x = PReLU(shared_axes=[1, 2], name='prelu2')(x)
    x = MaxPool2D(pool_size=3, strides=2)(x)
    # 4,4,48 -> 3,3,64
    x = Conv2D(64, (2, 2), strides=1, padding='valid', name='conv3')(x)
    x = PReLU(shared_axes=[1, 2], name='prelu3')(x)
    # 3,3,64 -> 64,3,3 (reorder axes before flattening to match the pretrained weights)
    x = Permute((3, 2, 1))(x)
    # 576 -> 128
    x = Flatten()(x)
    x = Dense(128, name='conv4')(x)
    x = PReLU(name='prelu4')(x)
    # 128 -> 2 (face / no face) and 128 -> 4 (box offsets)
    classifier = Dense(2, activation='softmax', name='conv5-1')(x)
    bbox_regress = Dense(4, name='conv5-2')(x)
    model = Model([input_layer], [classifier, bbox_regress])
    model.load_weights(weight_path, by_name=True)
    return model


#-----------------------------#
#   Stage 3 of MTCNN: refine the boxes and get five landmarks
#-----------------------------#
def create_Onet(weight_path):
    input_layer = Input(shape=[48, 48, 3])
    # 48,48,3 -> 23,23,32
    x = Conv2D(32, (3, 3), strides=1, padding='valid', name='conv1')(input_layer)
    x = PReLU(shared_axes=[1, 2], name='prelu1')(x)
    x = MaxPool2D(pool_size=3, strides=2, padding='same')(x)
    # 23,23,32 -> 10,10,64
    x = Conv2D(64, (3, 3), strides=1, padding='valid', name='conv2')(x)
    x = PReLU(shared_axes=[1, 2], name='prelu2')(x)
    x = MaxPool2D(pool_size=3, strides=2)(x)
    # 10,10,64 -> 8,8,64 -> 4,4,64
    x = Conv2D(64, (3, 3), strides=1, padding='valid', name='conv3')(x)
    x = PReLU(shared_axes=[1, 2], name='prelu3')(x)
    x = MaxPool2D(pool_size=2)(x)
    # 4,4,64 -> 3,3,128
    x = Conv2D(128, (2, 2), strides=1, padding='valid', name='conv4')(x)
    x = PReLU(shared_axes=[1, 2], name='prelu4')(x)
    # 3,3,128 -> 128,3,3
    x = Permute((3, 2, 1))(x)
    # 1152 -> 256
    x = Flatten()(x)
    x = Dense(256, name='conv5')(x)
    x = PReLU(name='prelu5')(x)
    # Heads: 256 -> 2 (face / no face), 256 -> 4 (box), 256 -> 10 (landmarks)
    classifier = Dense(2, activation='softmax', name='conv6-1')(x)
    bbox_regress = Dense(4, name='conv6-2')(x)
    landmark_regress = Dense(10, name='conv6-3')(x)
    model = Model([input_layer], [classifier, bbox_regress, landmark_regress])
    model.load_weights(weight_path, by_name=True)
    return model


class mtcnn():
    def __init__(self):
        self.Pnet = create_Pnet('model_data/pnet.h5')
        self.Rnet = create_Rnet('model_data/rnet.h5')
        self.Onet = create_Onet('model_data/onet.h5')

    def detectFace(self, img, threshold):
        #-----------------------------#
        #   Normalize: map [0, 255] to [-1, 1]
        #   (normalization speeds up convergence in training)
        #-----------------------------#
        copy_img = (img.copy() - 127.5) / 127.5
        origin_h, origin_w, _ = copy_img.shape
        #-----------------------------#
        #   Scale factor of each pyramid level
        #-----------------------------#
        scales = utils.calculateScales(img)

        out = []
        #-----------------------------#
        #   P-Net: coarse face boxes
        #-----------------------------#
        for scale in scales:
            hs = int(origin_h * scale)
            ws = int(origin_w * scale)
            scale_img = cv2.resize(copy_img, (ws, hs))
            inputs = scale_img.reshape(1, *scale_img.shape)
            # Feed every image of the pyramid to P-Net separately
            output = self.Pnet.predict(inputs)
            out.append(output)

        image_num = len(scales)
        rectangles = []
        for i in range(image_num):
            # Face probability at every output position
            cls_prob = out[i][0][0][:, :, 1]
            # The corresponding box offsets
            roi = out[i][1][0]
            # Height and width of this level's output map
            out_h, out_w = cls_prob.shape
            out_side = max(out_h, out_w)
            # Decode into boxes in original-image coordinates
            rectangle = utils.detect_face_12net(cls_prob, roi, out_side, 1 / scales[i],
                                                origin_w, origin_h, threshold[0])
            rectangles.extend(rectangle)

        # Non-maximum suppression across all pyramid levels
        rectangles = utils.NMS(rectangles, 0.7)
        if len(rectangles) == 0:
            return rectangles

        #-----------------------------#
        #   R-Net: more accurate face boxes
        #-----------------------------#
        predict_24_batch = []
        for rectangle in rectangles:
            crop_img = copy_img[int(rectangle[1]):int(rectangle[3]), int(rectangle[0]):int(rectangle[2])]
            scale_img = cv2.resize(crop_img, (24, 24))
            predict_24_batch.append(scale_img)
        predict_24_batch = np.array(predict_24_batch)
        out = self.Rnet.predict(predict_24_batch)
        cls_prob = np.array(out[0])
        roi_prob = np.array(out[1])
        rectangles = utils.filter_face_24net(cls_prob, roi_prob, rectangles,
                                             origin_w, origin_h, threshold[1])
        if len(rectangles) == 0:
            return rectangles

        #-----------------------------#
        #   O-Net: final face boxes and landmarks
        #-----------------------------#
        predict_batch = []
        for rectangle in rectangles:
            crop_img = copy_img[int(rectangle[1]):int(rectangle[3]), int(rectangle[0]):int(rectangle[2])]
            scale_img = cv2.resize(crop_img, (48, 48))
            predict_batch.append(scale_img)
        predict_batch = np.array(predict_batch)
        output = self.Onet.predict(predict_batch)
        cls_prob = output[0]
        roi_prob = output[1]
        pts_prob = output[2]
        rectangles = utils.filter_face_48net(cls_prob, roi_prob, pts_prob, rectangles,
                                             origin_w, origin_h, threshold[2])
        return rectangles
```


Test script (runs detection on one image and draws the boxes and landmarks):

```python
import cv2
import numpy as np
from mtcnn import mtcnn

img = cv2.imread('img/test1.jpg')
model = mtcnn()
threshold = [0.5, 0.6, 0.7]  # each of the three networks has its own confidence threshold
rectangles = model.detectFace(img, threshold)
draw = img.copy()

for rectangle in rectangles:
    if rectangle is not None:
        W = -int(rectangle[0]) + int(rectangle[2])
        H = -int(rectangle[1]) + int(rectangle[3])
        paddingH = 0.01 * W
        paddingW = 0.02 * H
        crop_img = img[int(rectangle[1] + paddingH):int(rectangle[3] - paddingH),
                       int(rectangle[0] - paddingW):int(rectangle[2] + paddingW)]
        # Skip boxes whose crop is empty
        if crop_img.shape[0] == 0 or crop_img.shape[1] == 0:
            continue
        # Face box
        cv2.rectangle(draw, (int(rectangle[0]), int(rectangle[1])),
                      (int(rectangle[2]), int(rectangle[3])), (255, 0, 0), 1)
        # Five landmarks, stored as (x, y) pairs at indices 5..14
        for i in range(5, 15, 2):
            cv2.circle(draw, (int(rectangle[i + 0]), int(rectangle[i + 1])), 2, (0, 255, 0))

cv2.imwrite("img/out.jpg", draw)
cv2.imshow("test", draw)
c = cv2.waitKey(0)
```


utils.py (pyramid scales, box decoding, and NMS):

```python
import numpy as np


#-----------------------------#
#   Scale factor of each image-pyramid level
#-----------------------------#
def calculateScales(img):
    pr_scale = 1.0
    h, w, _ = img.shape
    # Optimization: first resize to about 500 px,
    # i.e. resize(h*500/min(h,w), w*500/min(h,w))
    if min(w, h) > 500:
        pr_scale = 500.0 / min(h, w)
        w = int(w * pr_scale)
        h = int(h * pr_scale)
    elif max(w, h) < 500:
        pr_scale = 500.0 / max(h, w)
        w = int(w * pr_scale)
        h = int(h * pr_scale)

    scales = []
    factor = 0.709
    factor_count = 0
    minl = min(h, w)
    while minl >= 12:
        scales.append(pr_scale * pow(factor, factor_count))
        minl *= factor
        factor_count += 1
    return scales


#-------------------------------------#
#   Post-process the P-Net output
#-------------------------------------#
def detect_face_12net(cls_prob, roi, out_side, scale, width, height, threshold):
    cls_prob = np.swapaxes(cls_prob, 0, 1)
    roi = np.swapaxes(roi, 0, 2)

    # Effective stride of P-Net, approximately 2
    stride = 0
    if out_side != 1:
        stride = float(2 * out_side - 1) / (out_side - 1)
    (x, y) = np.where(cls_prob >= threshold)

    boundingbox = np.array([x, y]).T
    # Map output positions back to the original image
    bb1 = np.fix((stride * boundingbox + 0) * scale)
    bb2 = np.fix((stride * boundingbox + 11) * scale)
    boundingbox = np.concatenate((bb1, bb2), axis=1)

    dx1 = roi[0][x, y]
    dx2 = roi[1][x, y]
    dx3 = roi[2][x, y]
    dx4 = roi[3][x, y]
    score = np.array([cls_prob[x, y]]).T
    offset = np.array([dx1, dx2, dx3, dx4]).T

    # Apply the regression offsets (relative to the 12x12 window)
    boundingbox = boundingbox + offset * 12.0 * scale

    rectangles = np.concatenate((boundingbox, score), axis=1)
    rectangles = rect2square(rectangles)
    pick = []
    for i in range(len(rectangles)):
        x1 = int(max(0, rectangles[i][0]))
        y1 = int(max(0, rectangles[i][1]))
        x2 = int(min(width, rectangles[i][2]))
        y2 = int(min(height, rectangles[i][3]))
        sc = rectangles[i][4]
        if x2 > x1 and y2 > y1:
            pick.append([x1, y1, x2, y2, sc])
    return NMS(pick, 0.3)


#-----------------------------#
#   Turn rectangles into squares (same center, side = longer edge)
#-----------------------------#
def rect2square(rectangles):
    w = rectangles[:, 2] - rectangles[:, 0]
    h = rectangles[:, 3] - rectangles[:, 1]
    l = np.maximum(w, h).T
    rectangles[:, 0] = rectangles[:, 0] + w * 0.5 - l * 0.5
    rectangles[:, 1] = rectangles[:, 1] + h * 0.5 - l * 0.5
    rectangles[:, 2:4] = rectangles[:, 0:2] + np.repeat([l], 2, axis=0).T
    return rectangles


#-------------------------------------#
#   Non-maximum suppression
#-------------------------------------#
def NMS(rectangles, threshold):
    if len(rectangles) == 0:
        return rectangles
    boxes = np.array(rectangles)
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]
    s = boxes[:, 4]
    area = np.multiply(x2 - x1 + 1, y2 - y1 + 1)
    I = np.array(s.argsort())
    pick = []
    while len(I) > 0:
        # I[-1] has the highest score, I[0:-1] are the others
        xx1 = np.maximum(x1[I[-1]], x1[I[0:-1]])
        yy1 = np.maximum(y1[I[-1]], y1[I[0:-1]])
        xx2 = np.minimum(x2[I[-1]], x2[I[0:-1]])
        yy2 = np.minimum(y2[I[-1]], y2[I[0:-1]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        o = inter / (area[I[-1]] + area[I[0:-1]] - inter)
        pick.append(I[-1])
        # Keep only the boxes whose IoU with the picked box is <= threshold
        I = I[np.where(o <= threshold)[0]]
    result_rectangle = boxes[pick].tolist()
    return result_rectangle


#-------------------------------------#
#   Post-process the R-Net output
#-------------------------------------#
def filter_face_24net(cls_prob, roi, rectangles, width, height, threshold):
    prob = cls_prob[:, 1]
    pick = np.where(prob >= threshold)
    rectangles = np.array(rectangles)

    x1 = rectangles[pick, 0]
    y1 = rectangles[pick, 1]
    x2 = rectangles[pick, 2]
    y2 = rectangles[pick, 3]
    sc = np.array([prob[pick]]).T

    dx1 = roi[pick, 0]
    dx2 = roi[pick, 1]
    dx3 = roi[pick, 2]
    dx4 = roi[pick, 3]

    w = x2 - x1
    h = y2 - y1

    # Apply the corner offsets, scaled by box width/height
    x1 = np.array([(x1 + dx1 * w)[0]]).T
    y1 = np.array([(y1 + dx2 * h)[0]]).T
    x2 = np.array([(x2 + dx3 * w)[0]]).T
    y2 = np.array([(y2 + dx4 * h)[0]]).T

    rectangles = np.concatenate((x1, y1, x2, y2, sc), axis=1)
    rectangles = rect2square(rectangles)
    pick = []
    for i in range(len(rectangles)):
        x1 = int(max(0, rectangles[i][0]))
        y1 = int(max(0, rectangles[i][1]))
        x2 = int(min(width, rectangles[i][2]))
        y2 = int(min(height, rectangles[i][3]))
        sc = rectangles[i][4]
        if x2 > x1 and y2 > y1:
            pick.append([x1, y1, x2, y2, sc])
    return NMS(pick, 0.3)


#-------------------------------------#
#   Post-process the O-Net output
#-------------------------------------#
def filter_face_48net(cls_prob, roi, pts, rectangles, width, height, threshold):
    prob = cls_prob[:, 1]
    pick = np.where(prob >= threshold)
    rectangles = np.array(rectangles)

    x1 = rectangles[pick, 0]
    y1 = rectangles[pick, 1]
    x2 = rectangles[pick, 2]
    y2 = rectangles[pick, 3]
    sc = np.array([prob[pick]]).T

    dx1 = roi[pick, 0]
    dx2 = roi[pick, 1]
    dx3 = roi[pick, 2]
    dx4 = roi[pick, 3]

    w = x2 - x1
    h = y2 - y1

    # Landmarks: pts[:, 0:5] are x values and pts[:, 5:10] are y values,
    # relative to the box; output them as interleaved (x, y) pairs
    pts0 = np.array([(w * pts[pick, 0] + x1)[0]]).T
    pts1 = np.array([(h * pts[pick, 5] + y1)[0]]).T
    pts2 = np.array([(w * pts[pick, 1] + x1)[0]]).T
    pts3 = np.array([(h * pts[pick, 6] + y1)[0]]).T
    pts4 = np.array([(w * pts[pick, 2] + x1)[0]]).T
    pts5 = np.array([(h * pts[pick, 7] + y1)[0]]).T
    pts6 = np.array([(w * pts[pick, 3] + x1)[0]]).T
    pts7 = np.array([(h * pts[pick, 8] + y1)[0]]).T
    pts8 = np.array([(w * pts[pick, 4] + x1)[0]]).T
    pts9 = np.array([(h * pts[pick, 9] + y1)[0]]).T

    x1 = np.array([(x1 + dx1 * w)[0]]).T
    y1 = np.array([(y1 + dx2 * h)[0]]).T
    x2 = np.array([(x2 + dx3 * w)[0]]).T
    y2 = np.array([(y2 + dx4 * h)[0]]).T

    rectangles = np.concatenate((x1, y1, x2, y2, sc, pts0, pts1, pts2, pts3, pts4,
                                 pts5, pts6, pts7, pts8, pts9), axis=1)

    pick = []
    for i in range(len(rectangles)):
        x1 = int(max(0, rectangles[i][0]))
        y1 = int(max(0, rectangles[i][1]))
        x2 = int(min(width, rectangles[i][2]))
        y2 = int(min(height, rectangles[i][3]))
        if x2 > x1 and y2 > y1:
            # Box, score, then the 10 landmark coordinates
            pick.append([x1, y1, x2, y2, *rectangles[i][4:15]])
    return NMS(pick, 0.3)
```





