YOLOv7 Study Notes (1): Model Structure

Preface

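These are personal study notes. The project code follows Bubbliiiing's yolov7-pytorch-master.
References:
1. Building a YOLOv7 object detection platform with PyTorch (source code)
2. Final edition: a complete analysis of the YOLOv1-v7 series
3. A 30,000-character in-depth explanation: yolov1, yolov2, yolov3, yolov4, yolov5, yolov7
4. The Neck module in the YOLO series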

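[Figure]
As the figure shows, YOLO-series models consist mainly of a feature-extraction backbone (Backbone), a feature-enhancement structure (Neck), and a detection head (Head). The versions differ in the tricks they use, their loss functions, their anchor matching strategies, and so on.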

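YOLOv7 structure

[Figure]
Building on YOLOv5, the YOLOv7 Backbone introduces the Multi_Concat_Block and Transition_Block structures.
The YOLOv7 Neck mainly consists of the SPPCSPC module and an optimized PAN module.
The YOLOv7 Head uses the same loss function as YOLOv5, restructures the head network in the RepVGG style, and trains with an auxiliary head together with a matching positive/negative sample assignment strategy.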

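[Figure]

Multi_Concat_Block is built from a chain of convolution + BN + SiLU units.
Quoting the explanation from reference 1:
Stacking this many layers corresponds to a denser residual structure. Residual networks are easy to optimize and can gain accuracy from considerable added depth. Their residual blocks use skip connections, which alleviate the vanishing-gradient problem that comes with increasing depth in deep neural networks.

[Figure]
Quoting the explanation from reference 1:
YOLOv7 downsamples with a new transition module, Transition_Block. In convolutional networks, the usual downsampling transition is either a 3x3 convolution with stride 2x2 or a max pooling with stride 2x2. In YOLOv7 the authors combine the two: a transition module has two branches, as shown in the figure. The left branch is a stride-2x2 max pooling followed by a 1x1 convolution; the right branch is a 1x1 convolution followed by a 3x3 convolution with stride 2x2. The outputs of the two branches are concatenated.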

Backbone

First, the small modules used in the Backbone.

Conv2D_BN_SiLU

In the forward pass the input goes through, in order: convolution, Batch Normalization, and the SiLU activation.

    # SiLU is listed below; autopad() is in the backbone code at the end of this post.
    class Conv(nn.Module):
        def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=SiLU()):  # ch_in, ch_out, kernel, stride, padding, groups
            super(Conv, self).__init__()
            self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
            self.bn = nn.BatchNorm2d(c2, eps=0.001, momentum=0.03)
            self.act = nn.LeakyReLU(0.1, inplace=True) if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

        def forward(self, x):
            return self.act(self.bn(self.conv(x)))

        def fuseforward(self, x):
            # Used once BN has been fused into the convolution weights
            return self.act(self.conv(x))

Activation function: SiLU
[Figure]

$$f(x) = x \cdot \sigma(x)$$

[Figure]

$$f'(x) = f(x) + \sigma(x)\,(1 - f(x))$$

    class SiLU(nn.Module):
        @staticmethod
        def forward(x):
            return x * torch.sigmoid(x)
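The SiLU formula and its derivative can be checked numerically without PyTorch. The sketch below uses only the standard library; the helper names (`sigmoid`, `silu`, `silu_grad`, `numeric_grad`) are chosen for this illustration and are not part of the project code.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def silu(x: float) -> float:
    # f(x) = x * sigma(x)
    return x * sigmoid(x)


def silu_grad(x: float) -> float:
    # f'(x) = f(x) + sigma(x) * (1 - f(x))
    return silu(x) + sigmoid(x) * (1.0 - silu(x))


def numeric_grad(f, x: float, h: float = 1e-6) -> float:
    # Central finite difference, used only to cross-check the closed form.
    return (f(x + h) - f(x - h)) / (2.0 * h)


for x in (-2.0, 0.0, 1.5):
    print(x, silu(x), silu_grad(x), numeric_grad(silu, x))
```

The closed-form derivative and the finite-difference estimate should agree to several decimal places, and at x = 0 the derivative is exactly 0.5.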

Multi_Concat_Block

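The Multi_Concat_Block structure is shown in the figure below:
[Figure]
In the code, the input passes through a series of Conv2D_BN_SiLU layers whose outputs are all collected in a list; the indices in ids then select which of those outputs are concatenated with torch.cat.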

    # From Backbone.__init__: which intermediate outputs get concatenated
    ids = {
        'l': [-1, -3, -5, -6],
        'x': [-1, -3, -5, -7, -8],
    }[phi]

    class Multi_Concat_Block(nn.Module):
        def __init__(self, c1, c2, c3, n=4, e=1, ids=[0]):
            super(Multi_Concat_Block, self).__init__()
            c_ = int(c2 * e)

            self.ids = ids
            self.cv1 = Conv(c1, c_, 1, 1)
            self.cv2 = Conv(c1, c_, 1, 1)
            self.cv3 = nn.ModuleList(
                [Conv(c_ if i == 0 else c2, c2, 3, 1) for i in range(n)]
            )
            self.cv4 = Conv(c_ * 2 + c2 * (len(ids) - 2), c3, 1, 1)

        def forward(self, x):
            x_1 = self.cv1(x)
            x_2 = self.cv2(x)

            x_all = [x_1, x_2]
            # [-1, -3, -5, -6] => [5, 3, 1, 0]
            for i in range(len(self.cv3)):
                x_2 = self.cv3[i](x_2)
                x_all.append(x_2)

            out = self.cv4(torch.cat([x_all[idx] for idx in self.ids], 1))
            return out
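To see why `cv4` is built with `c_ * 2 + c2 * (len(ids) - 2)` input channels, it helps to track channel counts alone. The following is a minimal sketch with no tensors involved; `multi_concat_channels` is a hypothetical helper written for this note, and the two calls use the backbone setting (`block_channels * 2 = 64`, `e = 1`) and the neck setting (`panet_channels * 4 = 128`, `e = 2`) of the 'l' model from the code in this post.

```python
def multi_concat_channels(c2, n, e, ids):
    """Channel bookkeeping for Multi_Concat_Block (no tensors involved)."""
    c_ = int(c2 * e)
    # Channel count of every entry appended to x_all in forward():
    # x_1 and x_2 have c_ channels, each of the n cv3 outputs has c2.
    x_all = [c_, c_] + [c2] * n
    # Negative ids index from the end of the list, exactly like x_all[id].
    positions = [i % len(x_all) for i in ids]
    concat = sum(x_all[i] for i in ids)
    # Input width that cv4 is constructed with in __init__.
    cv4_in = c_ * 2 + c2 * (len(ids) - 2)
    return positions, concat, cv4_in


# Backbone, 'l' model
print(multi_concat_channels(c2=64, n=4, e=1, ids=[-1, -3, -5, -6]))
# Neck, 'l' model
print(multi_concat_channels(c2=128, n=4, e=2, ids=[-1, -2, -3, -4, -5, -6]))
```

In both cases the concatenated width equals the width `cv4` expects. The formula only holds when exactly two of the selected entries are the `c_`-wide outputs `x_1` and `x_2`, which is true for every `ids` list used in this post.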

Transition_Block

The flow is shown in the figure:
[Figure]
The code is as follows:

    class Transition_Block(nn.Module):
        def __init__(self, c1, c2):
            super(Transition_Block, self).__init__()
            self.cv1 = Conv(c1, c2, 1, 1)
            self.cv2 = Conv(c1, c2, 1, 1)
            self.cv3 = Conv(c2, c2, 3, 2)

            self.mp = MP()

        def forward(self, x):
            # Max-pooling branch
            # 160, 160, 256 => 80, 80, 256 => 80, 80, 128
            x_1 = self.mp(x)
            x_1 = self.cv1(x_1)

            # Strided-convolution branch
            # 160, 160, 256 => 160, 160, 128 => 80, 80, 128
            x_2 = self.cv2(x)
            x_2 = self.cv3(x_2)

            # 80, 80, 128 cat 80, 80, 128 => 80, 80, 256
            return torch.cat([x_2, x_1], 1)
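The two branches can only be concatenated if they produce the same spatial size. The sketch below applies the standard output-size formula, floor((size + 2p - k) / s) + 1, to both branches; `conv_out` and `transition_block_shape` are illustrative names, not project functions, and the padding of 1 for the 3x3 convolution is the value `autopad` yields for k = 3.

```python
def conv_out(size, k, s, p):
    """Output length of a conv / pooling layer along one axis (floor mode)."""
    return (size + 2 * p - k) // s + 1


def transition_block_shape(h, w, c2):
    # Left branch: 2x2 max pooling with stride 2, then a 1x1 conv to c2 channels.
    left = (conv_out(h, 2, 2, 0), conv_out(w, 2, 2, 0))
    # Right branch: 1x1 conv to c2 channels, then a 3x3 stride-2 conv (padding 1).
    right = (conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1))
    if left != right:
        raise ValueError(f"branches disagree: {left} vs {right}")
    # Concatenating along the channel axis doubles c2.
    return left[0], left[1], 2 * c2


print(transition_block_shape(160, 160, 128))
```

For an even input such as 160 both branches give 80. For an odd input such as 161 the pooling branch gives 80 and the convolution branch gives 81, so the concatenation would fail; this is one reason to keep the input size a multiple of 32.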

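Backbone structure

In the project code, the Backbone stores each stage step by step with nn.Sequential.
nn.Sequential is an ordered container: the modules are executed in the order in which they were passed to the constructor. An ordered dict of modules can also be passed in.
[Figure]
The Backbone returns three feature layers, which are later concatenated (torch.cat) with features in the feature-enhancement network. This fusion resembles the shortcut in a residual network: it helps preserve useful information and eases vanishing or exploding gradients. Shallow layers keep higher spatial resolution and fine detail, while deep layers have larger receptive fields and stronger semantic features, so combining them makes detection more effective.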

    class Backbone(nn.Module):
        def __init__(self, transition_channels, block_channels, n, phi, pretrained=False):
            super().__init__()
            # The input image is 640, 640, 3
            ids = {
                'l': [-1, -3, -5, -6],
                'x': [-1, -3, -5, -7, -8],
            }[phi]
            # 640, 640, 3 => 640, 640, 32 => 320, 320, 64
            self.stem = nn.Sequential(
                Conv(3, transition_channels, 3, 1),
                Conv(transition_channels, transition_channels * 2, 3, 2),
                Conv(transition_channels * 2, transition_channels * 2, 3, 1),
            )
            # 320, 320, 64 => 160, 160, 128 => 160, 160, 256
            self.dark2 = nn.Sequential(
                Conv(transition_channels * 2, transition_channels * 4, 3, 2),
                Multi_Concat_Block(transition_channels * 4, block_channels * 2, transition_channels * 8, n=n, ids=ids),
            )
            # 160, 160, 256 => 80, 80, 256 => 80, 80, 512
            self.dark3 = nn.Sequential(
                Transition_Block(transition_channels * 8, transition_channels * 4),
                Multi_Concat_Block(transition_channels * 8, block_channels * 4, transition_channels * 16, n=n, ids=ids),
            )
            # 80, 80, 512 => 40, 40, 512 => 40, 40, 1024
            self.dark4 = nn.Sequential(
                Transition_Block(transition_channels * 16, transition_channels * 8),
                Multi_Concat_Block(transition_channels * 16, block_channels * 8, transition_channels * 32, n=n, ids=ids),
            )
            # 40, 40, 1024 => 20, 20, 1024 => 20, 20, 1024
            self.dark5 = nn.Sequential(
                Transition_Block(transition_channels * 32, transition_channels * 16),
                Multi_Concat_Block(transition_channels * 32, block_channels * 8, transition_channels * 32, n=n, ids=ids),
            )

        def forward(self, x):
            x = self.stem(x)
            x = self.dark2(x)
            # dark3 outputs 80, 80, 512: an effective feature layer
            x = self.dark3(x)
            feat1 = x
            # dark4 outputs 40, 40, 1024: an effective feature layer
            x = self.dark4(x)
            feat2 = x
            # dark5 outputs 20, 20, 1024: an effective feature layer
            x = self.dark5(x)
            feat3 = x
            return feat1, feat2, feat3
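The shape comments in the Backbone follow from two rules: every stage halves the spatial size once, and the channel widths are fixed multiples of `transition_channels`. The sketch below reproduces the three output shapes from those rules alone; `backbone_feature_shapes` is a hypothetical helper, and it assumes a square input whose side is a multiple of 32.

```python
def backbone_feature_shapes(size=640, transition_channels=32):
    """Return (H, W, C) of feat1, feat2, feat3 for a square input."""
    if size % 32 != 0:
        raise ValueError("this sketch assumes the input size is a multiple of 32")
    tc = transition_channels
    s = size
    s //= 2  # stem: one stride-2 conv             -> size / 2,  tc * 2 channels
    s //= 2  # dark2: stride-2 conv + block        -> size / 4,  tc * 8 channels
    s //= 2  # dark3: Transition_Block + block     -> size / 8,  tc * 16 channels
    feat1 = (s, s, tc * 16)
    s //= 2  # dark4: Transition_Block + block     -> size / 16, tc * 32 channels
    feat2 = (s, s, tc * 32)
    s //= 2  # dark5: Transition_Block + block     -> size / 32, tc * 32 channels
    feat3 = (s, s, tc * 32)
    return feat1, feat2, feat3


print(backbone_feature_shapes(640, 32))  # 'l' model
print(backbone_feature_shapes(640, 40))  # 'x' model
```

With `transition_channels = 32` this gives the 80, 80, 512 / 40, 40, 1024 / 20, 20, 1024 shapes quoted in the comments; the 'x' model widens every layer by the factor 40 / 32.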

SPPCSPC

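After backbone feature extraction, the output goes through SPPCSPC before entering the feature-enhancement network. Specifically, the backbone output passes through three Conv2D_BN_SiLU layers and is then max-pooled three times in parallel, with kernel sizes 5, 9, and 13:
[Figure]
The pooled results are concatenated with torch.cat (together with the un-pooled tensor) and passed through two more Conv2D_BN_SiLU layers. The result is then concatenated with a second branch, in which the backbone output goes through only a single Conv2D_BN_SiLU, and a final 1x1 convolution produces the output.

[Figure]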

    class SPPCSPC(nn.Module):
        # CSP https://github.com/WongKinYiu/CrossStagePartialNetworks
        def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5, k=(5, 9, 13)):
            super(SPPCSPC, self).__init__()
            c_ = int(2 * c2 * e)  # hidden channels
            self.cv1 = Conv(c1, c_, 1, 1)
            self.cv2 = Conv(c1, c_, 1, 1)
            self.cv3 = Conv(c_, c_, 3, 1)
            self.cv4 = Conv(c_, c_, 1, 1)
            self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])
            self.cv5 = Conv(4 * c_, c_, 1, 1)
            self.cv6 = Conv(c_, c_, 3, 1)
            # The output has c2 channels
            self.cv7 = Conv(2 * c_, c2, 1, 1)

        def forward(self, x):
            x1 = self.cv4(self.cv3(self.cv1(x)))
            y1 = self.cv6(self.cv5(torch.cat([x1] + [m(x1) for m in self.m], 1)))
            y2 = self.cv2(x)
            return self.cv7(torch.cat((y1, y2), dim=1))
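The three pooling layers use stride 1 and padding k // 2, so they keep the spatial size and differ only in how far each output position looks. Here is a pure-Python sketch of that behaviour on a tiny 2D list; `max_pool_same` is an illustrative stand-in for `nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)`, not the library implementation.

```python
def max_pool_same(img, k):
    """Stride-1 max pooling with padding k // 2 on a 2D list (odd k).

    Clipping the window at the border is equivalent to padding with
    negative infinity, which is how max pooling treats padded positions.
    """
    h, w, r = len(img), len(img[0]), k // 2
    return [
        [
            max(
                img[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            )
            for j in range(w)
        ]
        for i in range(h)
    ]


# A 6x6 map with a single peak in the top-left corner.
img = [[0] * 6 for _ in range(6)]
img[0][0] = 9

pooled = {k: max_pool_same(img, k) for k in (5, 9, 13)}
for k, out in pooled.items():
    # kernel size, output height, output width, value seen at position (3, 3)
    print(k, len(out), len(out[0]), out[3][3])
```

All three outputs stay 6x6. Position (3, 3) does not see the corner peak with k = 5 but does with k = 9 and k = 13, which is the multi-scale receptive field the module is after. Because the sizes match, the un-pooled tensor and the three pooled tensors can be concatenated, giving the `4 * c_` input channels of `cv5`.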

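Neck (feature-enhancement structure)

As the figure below shows, the Neck consists of an FPN and a PAN (the diagram is by 江大白 on Zhihu). This design has been used since YOLOv4.
[Figure]

The SPPCSPC output goes through one convolution and one upsampling and is then fused with the corresponding feature layer returned by the backbone; the fused result goes through the same steps once more. That completes the FPN part. The PAN part downsamples with Transition_Block: as the structure diagram shows, it downsamples, fuses, and convolves, then downsamples, fuses, and convolves again. The result is three feature maps, one for each detection head.

[Figure]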

    # Fragment of YoloBody.forward
    # backbone
    feat1, feat2, feat3 = self.backbone.forward(x)

    #------------------------ feature-enhancement network ------------------------#
    # 20, 20, 1024 => 20, 20, 512
    P5 = self.sppcspc(feat3)
    # 20, 20, 512 => 20, 20, 256
    P5_conv = self.conv_for_P5(P5)
    # 20, 20, 256 => 40, 40, 256
    P5_upsample = self.upsample(P5_conv)
    # 40, 40, 256 cat 40, 40, 256 => 40, 40, 512
    P4 = torch.cat([self.conv_for_feat2(feat2), P5_upsample], 1)
    # 40, 40, 512 => 40, 40, 256
    P4 = self.conv3_for_upsample1(P4)

    # 40, 40, 256 => 40, 40, 128
    P4_conv = self.conv_for_P4(P4)
    # 40, 40, 128 => 80, 80, 128
    P4_upsample = self.upsample(P4_conv)
    # 80, 80, 128 cat 80, 80, 128 => 80, 80, 256
    P3 = torch.cat([self.conv_for_feat1(feat1), P4_upsample], 1)
    # 80, 80, 256 => 80, 80, 128
    P3 = self.conv3_for_upsample2(P3)

    # 80, 80, 128 => 40, 40, 256
    P3_downsample = self.down_sample1(P3)
    # 40, 40, 256 cat 40, 40, 256 => 40, 40, 512
    P4 = torch.cat([P3_downsample, P4], 1)
    # 40, 40, 512 => 40, 40, 256
    P4 = self.conv3_for_downsample1(P4)

    # 40, 40, 256 => 20, 20, 512
    P4_downsample = self.down_sample2(P4)
    # 20, 20, 512 cat 20, 20, 512 => 20, 20, 1024
    P5 = torch.cat([P4_downsample, P5], 1)
    # 20, 20, 1024 => 20, 20, 512
    P5 = self.conv3_for_downsample2(P5)
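The FPN and PAN paths can be traced with plain (size, channels) pairs, which makes it easy to confirm that every torch.cat joins tensors of the same spatial size. This is a rough sketch: `cat` and `neck_shapes` are hypothetical helpers, and the channel widths are read off the layer definitions in YoloBody.

```python
def cat(a, b):
    """Concatenate two (size, channels) descriptions along the channel axis."""
    if a[0] != b[0]:
        raise ValueError(f"spatial sizes differ: {a[0]} vs {b[0]}")
    return (a[0], a[1] + b[1])


def neck_shapes(size=640, tc=32):
    feat1 = (size // 8, tc * 16)
    feat2 = (size // 16, tc * 32)
    feat3 = (size // 32, tc * 32)

    # FPN: top-down path
    p5 = (feat3[0], tc * 16)             # sppcspc
    p5_up = (p5[0] * 2, tc * 8)          # conv_for_P5, then upsample
    p4 = cat((feat2[0], tc * 8), p5_up)  # conv_for_feat2(feat2) cat P5_upsample
    p4 = (p4[0], tc * 8)                 # conv3_for_upsample1
    p4_up = (p4[0] * 2, tc * 4)          # conv_for_P4, then upsample
    p3 = cat((feat1[0], tc * 4), p4_up)  # conv_for_feat1(feat1) cat P4_upsample
    p3 = (p3[0], tc * 4)                 # conv3_for_upsample2

    # PAN: bottom-up path (a Transition_Block outputs 2 * c2 channels)
    p3_down = (p3[0] // 2, tc * 8)       # down_sample1
    p4 = cat(p3_down, p4)
    p4 = (p4[0], tc * 8)                 # conv3_for_downsample1
    p4_down = (p4[0] // 2, tc * 16)      # down_sample2
    p5 = cat(p4_down, p5)
    p5 = (p5[0], tc * 16)                # conv3_for_downsample2
    return p3, p4, p5


print(neck_shapes(640, 32))
```

For the 'l' model this yields P3 = (80, 128), P4 = (40, 256), and P5 = (20, 512), matching the shape comments in the forward pass.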

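Head (detection head)

The feature layers produced by the feature-enhancement network each pass through a RepConv and are then converted to the YOLO output format, with len(anchors_mask[i]) * (5 + num_classes) channels per scale.
RepVGG-style blocks use multiple branches during training to improve performance; at inference time the branches are merged by structural re-parameterization, which speeds up inference.
YOLOv6 was the first to bring Rep-PAN into the PAN module: RepBlock replaced the CSP-Block used in YOLOv5, and the operators across the Neck were adjusted, with the goal of efficient inference on hardware while keeping good multi-scale feature fusion.
YOLOv6:
[Figure]
[Figure]
The RepBlock structure is shown in the figure:
[Figure]
[Figure]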

    # In YoloBody.__init__: one 1x1 convolution per scale
    # 80, 80, 256 => 80, 80, 3 * 25 (4 + 1 + 20) & 85 (4 + 1 + 80)
    self.yolo_head_P3 = nn.Conv2d(transition_channels * 8, len(anchors_mask[2]) * (5 + num_classes), 1)
    # 40, 40, 512 => 40, 40, 3 * 25 & 85
    self.yolo_head_P4 = nn.Conv2d(transition_channels * 16, len(anchors_mask[1]) * (5 + num_classes), 1)
    # 20, 20, 1024 => 20, 20, 3 * 25 & 85
    self.yolo_head_P5 = nn.Conv2d(transition_channels * 32, len(anchors_mask[0]) * (5 + num_classes), 1)

    # In YoloBody.forward
    P3 = self.rep_conv_1(P3)
    P4 = self.rep_conv_2(P4)
    P5 = self.rep_conv_3(P5)
    # Third feature layer: y3 = (batch_size, 75, 80, 80)
    out2 = self.yolo_head_P3(P3)
    # Second feature layer: y2 = (batch_size, 75, 40, 40)
    out1 = self.yolo_head_P4(P4)
    # First feature layer: y1 = (batch_size, 75, 20, 20)
    out0 = self.yolo_head_P5(P5)

    return [out0, out1, out2]

The outputs are then returned for loss computation, followed by gradient descent and parameter updates.
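The 75 in the shape comments comes from len(anchors_mask[i]) * (5 + num_classes) with 3 anchors and 20 classes. The sketch below computes that width and splits one grid cell's channel vector back into per-anchor parts. Both helpers are illustrative, and the anchor-major layout (x, y, w, h, objectness, then class scores for each anchor) is an assumption based on the usual YOLO convention; this post does not show the decoding code.

```python
def head_channels(anchors_mask, num_classes):
    """Output channels of each yolo_head conv: anchors * (4 box + 1 obj + classes)."""
    return [len(mask) * (5 + num_classes) for mask in anchors_mask]


def split_prediction(vector, num_anchors, num_classes):
    """Split one grid cell's channel vector into (box, objectness, class scores) per anchor.

    Assumes an anchor-major layout; this is a convention, not taken from the post.
    """
    step = 5 + num_classes
    if len(vector) != num_anchors * step:
        raise ValueError("vector length does not match anchors * (5 + num_classes)")
    parts = []
    for a in range(num_anchors):
        chunk = vector[a * step:(a + 1) * step]
        parts.append((chunk[:4], chunk[4], chunk[5:]))
    return parts


anchors_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
print(head_channels(anchors_mask, 20))  # VOC-style, 20 classes
print(head_channels(anchors_mask, 80))  # COCO-style, 80 classes

cell = list(range(75))
box, obj, classes = split_prediction(cell, 3, 20)[1]
print(box, obj, len(classes))
```

With 20 classes every scale has 75 channels, and with 80 classes it has 255.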

The RepConv code is as follows:

    # Requires: import numpy as np, torch, torch.nn as nn,
    # plus SiLU and autopad from the backbone code.
    class RepConv(nn.Module):
        # Represented convolution
        # https://arxiv.org/abs/2101.03697
        def __init__(self, c1, c2, k=3, s=1, p=None, g=1, act=SiLU(), deploy=False):
            super(RepConv, self).__init__()
            self.deploy = deploy
            self.groups = g
            self.in_channels = c1
            self.out_channels = c2

            assert k == 3
            assert autopad(k, p) == 1

            padding_11 = autopad(k, p) - k // 2
            self.act = nn.LeakyReLU(0.1, inplace=True) if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

            if deploy:
                # Inference form: a single 3x3 convolution with bias
                self.rbr_reparam = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=True)
            else:
                # Training form: identity (BN only), 3x3 conv + BN, 1x1 conv + BN
                self.rbr_identity = (nn.BatchNorm2d(num_features=c1, eps=0.001, momentum=0.03) if c2 == c1 and s == 1 else None)
                self.rbr_dense = nn.Sequential(
                    nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False),
                    nn.BatchNorm2d(num_features=c2, eps=0.001, momentum=0.03),
                )
                self.rbr_1x1 = nn.Sequential(
                    nn.Conv2d(c1, c2, 1, s, padding_11, groups=g, bias=False),
                    nn.BatchNorm2d(num_features=c2, eps=0.001, momentum=0.03),
                )

        def forward(self, inputs):
            if hasattr(self, "rbr_reparam"):
                return self.act(self.rbr_reparam(inputs))
            if self.rbr_identity is None:
                id_out = 0
            else:
                id_out = self.rbr_identity(inputs)
            return self.act(self.rbr_dense(inputs) + self.rbr_1x1(inputs) + id_out)

        def get_equivalent_kernel_bias(self):
            kernel3x3, bias3x3 = self._fuse_bn_tensor(self.rbr_dense)
            kernel1x1, bias1x1 = self._fuse_bn_tensor(self.rbr_1x1)
            kernelid, biasid = self._fuse_bn_tensor(self.rbr_identity)
            return (
                kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1) + kernelid,
                bias3x3 + bias1x1 + biasid,
            )

        def _pad_1x1_to_3x3_tensor(self, kernel1x1):
            if kernel1x1 is None:
                return 0
            else:
                return nn.functional.pad(kernel1x1, [1, 1, 1, 1])

        def _fuse_bn_tensor(self, branch):
            if branch is None:
                return 0, 0
            if isinstance(branch, nn.Sequential):
                kernel = branch[0].weight
                running_mean = branch[1].running_mean
                running_var = branch[1].running_var
                gamma = branch[1].weight
                beta = branch[1].bias
                eps = branch[1].eps
            else:
                assert isinstance(branch, nn.BatchNorm2d)
                if not hasattr(self, "id_tensor"):
                    # Express the identity branch as a 3x3 kernel with a 1 in the centre
                    input_dim = self.in_channels // self.groups
                    kernel_value = np.zeros((self.in_channels, input_dim, 3, 3), dtype=np.float32)
                    for i in range(self.in_channels):
                        kernel_value[i, i % input_dim, 1, 1] = 1
                    self.id_tensor = torch.from_numpy(kernel_value).to(branch.weight.device)
                kernel = self.id_tensor
                running_mean = branch.running_mean
                running_var = branch.running_var
                gamma = branch.weight
                beta = branch.bias
                eps = branch.eps
            std = (running_var + eps).sqrt()
            t = (gamma / std).reshape(-1, 1, 1, 1)
            return kernel * t, beta - running_mean * gamma / std

        def repvgg_convert(self):
            kernel, bias = self.get_equivalent_kernel_bias()
            return (
                kernel.detach().cpu().numpy(),
                bias.detach().cpu().numpy(),
            )

        def fuse_conv_bn(self, conv, bn):
            std = (bn.running_var + bn.eps).sqrt()
            bias = bn.bias - bn.running_mean * bn.weight / std

            t = (bn.weight / std).reshape(-1, 1, 1, 1)
            weights = conv.weight * t

            bn = nn.Identity()
            conv = nn.Conv2d(in_channels=conv.in_channels,
                             out_channels=conv.out_channels,
                             kernel_size=conv.kernel_size,
                             stride=conv.stride,
                             padding=conv.padding,
                             dilation=conv.dilation,
                             groups=conv.groups,
                             bias=True,
                             padding_mode=conv.padding_mode)

            conv.weight = torch.nn.Parameter(weights)
            conv.bias = torch.nn.Parameter(bias)
            return conv

        def fuse_repvgg_block(self):
            if self.deploy:
                return
            print("RepConv.fuse_repvgg_block")
            self.rbr_dense = self.fuse_conv_bn(self.rbr_dense[0], self.rbr_dense[1])

            self.rbr_1x1 = self.fuse_conv_bn(self.rbr_1x1[0], self.rbr_1x1[1])
            rbr_1x1_bias = self.rbr_1x1.bias
            weight_1x1_expanded = torch.nn.functional.pad(self.rbr_1x1.weight, [1, 1, 1, 1])

            # Fuse self.rbr_identity
            if isinstance(self.rbr_identity, nn.BatchNorm2d) or isinstance(self.rbr_identity, nn.modules.batchnorm.SyncBatchNorm):
                identity_conv_1x1 = nn.Conv2d(
                    in_channels=self.in_channels,
                    out_channels=self.out_channels,
                    kernel_size=1,
                    stride=1,
                    padding=0,
                    groups=self.groups,
                    bias=False)
                identity_conv_1x1.weight.data = identity_conv_1x1.weight.data.to(self.rbr_1x1.weight.data.device)
                identity_conv_1x1.weight.data = identity_conv_1x1.weight.data.squeeze().squeeze()
                identity_conv_1x1.weight.data.fill_(0.0)
                identity_conv_1x1.weight.data.fill_diagonal_(1.0)
                identity_conv_1x1.weight.data = identity_conv_1x1.weight.data.unsqueeze(2).unsqueeze(3)

                identity_conv_1x1 = self.fuse_conv_bn(identity_conv_1x1, self.rbr_identity)
                bias_identity_expanded = identity_conv_1x1.bias
                weight_identity_expanded = torch.nn.functional.pad(identity_conv_1x1.weight, [1, 1, 1, 1])
            else:
                bias_identity_expanded = torch.nn.Parameter(torch.zeros_like(rbr_1x1_bias))
                weight_identity_expanded = torch.nn.Parameter(torch.zeros_like(weight_1x1_expanded))

            self.rbr_dense.weight = torch.nn.Parameter(self.rbr_dense.weight + weight_1x1_expanded + weight_identity_expanded)
            self.rbr_dense.bias = torch.nn.Parameter(self.rbr_dense.bias + rbr_1x1_bias + bias_identity_expanded)

            self.rbr_reparam = self.rbr_dense
            self.deploy = True

            if self.rbr_identity is not None:
                del self.rbr_identity
                self.rbr_identity = None

            if self.rbr_1x1 is not None:
                del self.rbr_1x1
                self.rbr_1x1 = None

            if self.rbr_dense is not None:
                del self.rbr_dense
                self.rbr_dense = None
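The idea behind RepConv's fusion can be checked on a single-channel toy example: fold each branch's BN into its kernel, pad the 1x1 and identity kernels to 3x3, then add kernels and biases. This is a minimal pure-Python sketch, not the project implementation; all function names and the numeric values are made up for the demonstration, and the activation is left out because it is applied after the sum in both forms.

```python
import math


def conv3x3(img, kernel, bias=0.0):
    """Single-channel 3x3 cross-correlation, stride 1, zero padding 1."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = bias
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di + 1][dj + 1] * img[y][x]
            row.append(acc)
        out.append(row)
    return out


def batch_norm(fmap, gamma, beta, mean, var, eps=1e-3):
    """Inference-mode BN with fixed running statistics."""
    std = math.sqrt(var + eps)
    return [[gamma * (v - mean) / std + beta for v in row] for row in fmap]


def fuse_bn(kernel, gamma, beta, mean, var, eps=1e-3):
    """Fold BN into the preceding bias-free conv; returns (kernel, bias)."""
    t = gamma / math.sqrt(var + eps)
    return [[k * t for k in row] for row in kernel], beta - mean * t


def pad_1x1(value):
    """Embed a 1x1 kernel in the centre of a 3x3 kernel."""
    return [[0.0, 0.0, 0.0], [0.0, value, 0.0], [0.0, 0.0, 0.0]]


img = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0], [2.0, 1.0, -2.0]]
k3 = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.1, -0.3]]
k1 = 0.7
# (gamma, beta, running_mean, running_var) per branch; arbitrary demo values.
bn3, bn1, bnid = (1.2, 0.1, 0.3, 0.8), (0.9, -0.2, 0.1, 1.5), (1.1, 0.05, 0.4, 0.6)

# Training-time form: three branches, each with its own BN, then summed.
branches = [
    batch_norm(conv3x3(img, k3), *bn3),
    batch_norm(conv3x3(img, pad_1x1(k1)), *bn1),
    batch_norm(img, *bnid),
]
multi_branch = [[sum(b[i][j] for b in branches) for j in range(3)] for i in range(3)]

# Deploy-time form: fold each BN, pad to 3x3, add kernels and biases.
fused = [fuse_bn(k3, *bn3), fuse_bn(pad_1x1(k1), *bn1), fuse_bn(pad_1x1(1.0), *bnid)]
kernel = [[sum(f[0][i][j] for f in fused) for j in range(3)] for i in range(3)]
bias = sum(f[1] for f in fused)
single_conv = conv3x3(img, kernel, bias)

max_diff = max(abs(a - b) for ra, rb in zip(multi_branch, single_conv) for a, b in zip(ra, rb))
print(max_diff)
```

The printed difference should be at floating-point noise level, which is why the three training branches can be replaced by one 3x3 convolution at inference time. The equivalence relies on BN using fixed running statistics, so it holds in evaluation mode only.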

Full code:
YOLO model code

    import numpy as np
    import torch
    import torch.nn as nn

    from nets.backbone import Backbone, Multi_Concat_Block, Conv, SiLU, Transition_Block, autopad

The SPPCSPC and RepConv classes come next in this file. They are identical to the listings in the SPPCSPC and Head sections above, so they are not repeated here.

    def fuse_conv_and_bn(conv, bn):
        # Build a conv with bias that reproduces conv followed by bn
        fusedconv = nn.Conv2d(conv.in_channels,
                              conv.out_channels,
                              kernel_size=conv.kernel_size,
                              stride=conv.stride,
                              padding=conv.padding,
                              groups=conv.groups,
                              bias=True).requires_grad_(False).to(conv.weight.device)

        # Fused weights: scale every output channel by gamma / sqrt(var + eps)
        w_conv = conv.weight.clone().view(conv.out_channels, -1)
        w_bn = torch.diag(bn.weight.div(torch.sqrt(bn.eps + bn.running_var)))
        fusedconv.weight.copy_(torch.mm(w_bn, w_conv).view(fusedconv.weight.shape).detach())

        # Fused bias: scaled conv bias plus the BN shift
        b_conv = torch.zeros(conv.weight.size(0), device=conv.weight.device) if conv.bias is None else conv.bias
        b_bn = bn.bias - bn.weight.mul(bn.running_mean).div(torch.sqrt(bn.running_var + bn.eps))
        fusedconv.bias.copy_((torch.mm(w_bn, b_conv.reshape(-1, 1)).reshape(-1) + b_bn).detach())
        return fusedconv
    #---------------------------------------------------#
    #   yolo_body
    #---------------------------------------------------#
    class YoloBody(nn.Module):
        def __init__(self, anchors_mask, num_classes, phi, pretrained=False):
            super(YoloBody, self).__init__()
            # Parameters for the different YOLOv7 versions
            transition_channels = {'l': 32, 'x': 40}[phi]
            block_channels = 32
            panet_channels = {'l': 32, 'x': 64}[phi]
            e = {'l': 2, 'x': 1}[phi]
            n = {'l': 4, 'x': 6}[phi]
            ids = {'l': [-1, -2, -3, -4, -5, -6], 'x': [-1, -3, -5, -7, -8]}[phi]
            conv = {'l': RepConv, 'x': Conv}[phi]

            # The input image is 640, 640, 3.
            # Build the backbone, which yields three effective feature layers:
            #   80, 80, 512
            #   40, 40, 1024
            #   20, 20, 1024
            self.backbone = Backbone(transition_channels, block_channels, n, phi, pretrained=pretrained)

            #------------------------ feature-enhancement network ------------------------#
            self.upsample = nn.Upsample(scale_factor=2, mode="nearest")

            # 20, 20, 1024 => 20, 20, 512
            self.sppcspc = SPPCSPC(transition_channels * 32, transition_channels * 16)
            # 20, 20, 512 => 20, 20, 256 => 40, 40, 256
            self.conv_for_P5 = Conv(transition_channels * 16, transition_channels * 8)
            # 40, 40, 1024 => 40, 40, 256
            self.conv_for_feat2 = Conv(transition_channels * 32, transition_channels * 8)
            # 40, 40, 512 => 40, 40, 256
            self.conv3_for_upsample1 = Multi_Concat_Block(transition_channels * 16, panet_channels * 4, transition_channels * 8, e=e, n=n, ids=ids)

            # 40, 40, 256 => 40, 40, 128 => 80, 80, 128
            self.conv_for_P4 = Conv(transition_channels * 8, transition_channels * 4)
            # 80, 80, 512 => 80, 80, 128
            self.conv_for_feat1 = Conv(transition_channels * 16, transition_channels * 4)
            # 80, 80, 256 => 80, 80, 128
            self.conv3_for_upsample2 = Multi_Concat_Block(transition_channels * 8, panet_channels * 2, transition_channels * 4, e=e, n=n, ids=ids)

            # 80, 80, 128 => 40, 40, 256
            self.down_sample1 = Transition_Block(transition_channels * 4, transition_channels * 4)
            # 40, 40, 512 => 40, 40, 256
            self.conv3_for_downsample1 = Multi_Concat_Block(transition_channels * 16, panet_channels * 4, transition_channels * 8, e=e, n=n, ids=ids)

            # 40, 40, 256 => 20, 20, 512
            self.down_sample2 = Transition_Block(transition_channels * 8, transition_channels * 8)
            # 20, 20, 1024 => 20, 20, 512
            self.conv3_for_downsample2 = Multi_Concat_Block(transition_channels * 32, panet_channels * 8, transition_channels * 16, e=e, n=n, ids=ids)
            #------------------------ feature-enhancement network ------------------------#

            # 80, 80, 128 => 80, 80, 256
            self.rep_conv_1 = conv(transition_channels * 4, transition_channels * 8, 3, 1)
            # 40, 40, 256 => 40, 40, 512
            self.rep_conv_2 = conv(transition_channels * 8, transition_channels * 16, 3, 1)
            # 20, 20, 512 => 20, 20, 1024
            self.rep_conv_3 = conv(transition_channels * 16, transition_channels * 32, 3, 1)

            # 4 + 1 + num_classes
            # 80, 80, 256 => 80, 80, 3 * 25 (4 + 1 + 20) & 85 (4 + 1 + 80)
            self.yolo_head_P3 = nn.Conv2d(transition_channels * 8, len(anchors_mask[2]) * (5 + num_classes), 1)
            # 40, 40, 512 => 40, 40, 3 * 25 & 85
            self.yolo_head_P4 = nn.Conv2d(transition_channels * 16, len(anchors_mask[1]) * (5 + num_classes), 1)
            # 20, 20, 1024 => 20, 20, 3 * 25 & 85
            self.yolo_head_P5 = nn.Conv2d(transition_channels * 32, len(anchors_mask[0]) * (5 + num_classes), 1)

            self.reviseleakyrulu_list = []

        def fuse(self):
            print('Fusing layers... ')
            for m in self.modules():
                if isinstance(m, RepConv):
                    m.fuse_repvgg_block()
                elif type(m) is Conv and hasattr(m, 'bn'):
                    m.conv = fuse_conv_and_bn(m.conv, m.bn)
                    delattr(m, 'bn')
                    m.forward = m.fuseforward
            return self

        def forward(self, x):
            # backbone
            feat1, feat2, feat3 = self.backbone.forward(x)

            #------------------------ feature-enhancement network ------------------------#
            # 20, 20, 1024 => 20, 20, 512
            P5 = self.sppcspc(feat3)
            # 20, 20, 512 => 20, 20, 256
            P5_conv = self.conv_for_P5(P5)
            # 20, 20, 256 => 40, 40, 256
            P5_upsample = self.upsample(P5_conv)
            # 40, 40, 256 cat 40, 40, 256 => 40, 40, 512
            P4 = torch.cat([self.conv_for_feat2(feat2), P5_upsample], 1)
            # 40, 40, 512 => 40, 40, 256
            P4 = self.conv3_for_upsample1(P4)

            # 40, 40, 256 => 40, 40, 128
            P4_conv = self.conv_for_P4(P4)
            # 40, 40, 128 => 80, 80, 128
            P4_upsample = self.upsample(P4_conv)
            # 80, 80, 128 cat 80, 80, 128 => 80, 80, 256
            P3 = torch.cat([self.conv_for_feat1(feat1), P4_upsample], 1)
            # 80, 80, 256 => 80, 80, 128
            P3 = self.conv3_for_upsample2(P3)

            # 80, 80, 128 => 40, 40, 256
            P3_downsample = self.down_sample1(P3)
            # 40, 40, 256 cat 40, 40, 256 => 40, 40, 512
            P4 = torch.cat([P3_downsample, P4], 1)
            # 40, 40, 512 => 40, 40, 256
            P4 = self.conv3_for_downsample1(P4)

            # 40, 40, 256 => 20, 20, 512
            P4_downsample = self.down_sample2(P4)
            # 20, 20, 512 cat 20, 20, 512 => 20, 20, 1024
            P5 = torch.cat([P4_downsample, P5], 1)
            # 20, 20, 1024 => 20, 20, 512
            P5 = self.conv3_for_downsample2(P5)
            #------------------------ feature-enhancement network ------------------------#
            # P3 80, 80, 128
            # P4 40, 40, 256
            # P5 20, 20, 512

            P3 = self.rep_conv_1(P3)
            P4 = self.rep_conv_2(P4)
            P5 = self.rep_conv_3(P5)
            # Third feature layer: y3 = (batch_size, 75, 80, 80)
            out2 = self.yolo_head_P3(P3)
            # Second feature layer: y2 = (batch_size, 75, 40, 40)
            out1 = self.yolo_head_P4(P4)
            # First feature layer: y1 = (batch_size, 75, 20, 20)
            out0 = self.yolo_head_P5(P5)

            return [out0, out1, out2]

    if __name__ == "__main__":
        anchors_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
        net = YoloBody(anchors_mask, 20, 'l')
        print(net)
        x = torch.randn(2, 3, 640, 640)
        out0, out1, out2 = net(x)
        # Print the shape of each of the three outputs
        for out in (out0, out1, out2):
            print(out.shape)

Backbone code:

    import torch
    import torch.nn as nn

    def autopad(k, p=None):
        # Default to "same" padding: half the kernel size
        if p is None:
            p = k // 2 if isinstance(k, int) else [x // 2 for x in k]
        return p

The SiLU, Conv and Multi_Concat_Block classes follow in this file. They are identical to the listings in the Backbone sections above.

    class MP(nn.Module):
        def __init__(self, k=2):
            super(MP, self).__init__()
            self.m = nn.MaxPool2d(kernel_size=k, stride=k)

        def forward(self, x):
            return self.m(x)

Transition_Block and Backbone complete the file; they are also identical to the listings above.

Reposted from: https://blog.csdn.net/weixin_55224780/article/details/129959418
Copyright belongs to the original author, Deen... In case of infringement, please contact us for removal.
