[YOLOv5/v7 Improvement Series] Adding the EVC Module from the Centralized Feature Pyramid

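I. Introduction

Existing feature pyramid methods focus heavily on inter-layer feature interaction and neglect intra-layer feature regulation. Some methods try to learn compact intra-layer representations with attention mechanisms or vision transformers, but they tend to overlook corner regions, which matter a great deal for dense prediction tasks.

To address this, the authors propose the Centralized Feature Pyramid (CFP). It first applies an explicit visual center (EVC) scheme to the deepest feature map, then uses that information to regulate the shallower feature maps. In this way CFP captures global long-range dependencies and also obtains comprehensive, discriminative feature representations efficiently.

With its explicit visual center scheme and globally centralized regulation (GCR), CFP improves the quality of the feature pyramid while keeping computational complexity low, which leads to better object detection performance.

This article uses only the EVC module for the modification.

EVC aims to capture global long-range dependencies while preserving information about local key regions of the input image. The module is described in detail below.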

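EVC module structure

The EVC module consists of two blocks connected in parallel:

  1. Lightweight MLP: captures global long-range dependencies (global information).
  2. Learnable visual center mechanism: preserves information about local key regions of the input image (local information).

Lightweight MLP

The lightweight MLP is a multilayer perceptron that captures global information. Compared with a standard transformer encoder built on multi-head attention, it is structurally simpler, smaller, and computationally cheaper. It takes the place of the multi-head self-attention module of the standard transformer encoder.

Learnable visual center mechanism

The learnable visual center (LVC) mechanism is designed specifically to preserve information from local corner regions of the image. It runs in parallel with the lightweight MLP, so that together they capture global and local features.

Output fusion

The EVC output is the concatenation of the two blocks' results along the channel dimension: the output feature maps of the lightweight MLP and of the learnable visual center mechanism are concatenated channel-wise.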

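Implementation steps

  1. Input feature map: the input to EVC is X4, the top-level feature map of the feature pyramid.
  2. Feature smoothing: a Stem block sits between the input feature map X4 and EVC. It consists of a 7x7 convolution with 256 output channels, followed by a batch normalization layer and an activation layer.
  3. Lightweight MLP: captures global information.
  4. Learnable visual center mechanism: preserves information about local key regions.
  5. Feature fusion: the outputs of the lightweight MLP and the learnable visual center mechanism are combined by channel concatenation to form the EVC output.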
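
The steps above can be traced as a simple shape calculation. The sketch below is only an illustration, not the module itself. It assumes the layer settings used in the evc.py listing of this article (a 7x7 stride-1 stem with padding k // 2, then a 3x3 stride-1 max-pool with padding 1, and a 1x1 convolution after the concatenation) and a hypothetical 512-channel, 20x20 top-level feature map. The helper names out_size and evc_shapes are made up for this example.

```python
def out_size(n: int, k: int, s: int, p: int) -> int:
    """Spatial size after a convolution or pooling layer."""
    return (n + 2 * p - k) // s + 1


def evc_shapes(c: int, h: int, w: int) -> dict:
    """Trace (channels, height, width) through the EVC stages."""
    # Stem: 7x7 conv, stride 1, padding 7 // 2 = 3 keeps the spatial size.
    h1, w1 = out_size(h, 7, 1, 3), out_size(w, 7, 1, 3)
    # 3x3 max-pool, stride 1, padding 1 also keeps the spatial size.
    h2, w2 = out_size(h1, 3, 1, 1), out_size(w1, 3, 1, 1)
    return {
        "stem": (c, h2, w2),
        "lightweight_mlp": (c, h2, w2),
        "visual_center": (c, h2, w2),
        "concat": (2 * c, h2, w2),  # channel-wise concatenation of both branches
        "output": (c, h2, w2),      # 1x1 conv maps 2c channels back to c
    }


shapes = evc_shapes(512, 20, 20)
print(shapes["concat"], shapes["output"])
```

Because every stage keeps the spatial size, the block can be dropped between two existing layers without changing the resolution seen by the layers that follow.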
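
Role of EVC

By combining global and local feature information, the EVC module supplies rich visual center information to the subsequent globally centralized regulation (GCR). This information helps regulate the shallow features, so the whole feature pyramid both captures global long-range dependencies and obtains comprehensive, discriminative feature representations.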

II. Preparation

First, create a new file evc.py in the models folder of YOLOv5/v7 and add the following code:

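    from functools import partial

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    # Newer timm releases also expose these two helpers under timm.layers.
    from timm.models.layers import DropPath, trunc_normal_

    from models.common import Conv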


    # LVC
    class Encoding(nn.Module):
        def __init__(self, in_channels, num_codes):
            super(Encoding, self).__init__()
            # init codewords and smoothing factor
            self.in_channels, self.num_codes = in_channels, num_codes
            std = 1. / ((num_codes * in_channels) ** 0.5)
            # [num_codes, channels]
            self.codewords = nn.Parameter(
                torch.empty(num_codes, in_channels, dtype=torch.float).uniform_(-std, std), requires_grad=True)
            # [num_codes]
            self.scale = nn.Parameter(torch.empty(num_codes, dtype=torch.float).uniform_(-1, 0), requires_grad=True)

        @staticmethod
        def scaled_l2(x, codewords, scale):
            num_codes, in_channels = codewords.size()
            b = x.size(0)
            expanded_x = x.unsqueeze(2).expand((b, x.size(1), num_codes, in_channels))
            reshaped_codewords = codewords.view((1, 1, num_codes, in_channels))
            reshaped_scale = scale.view((1, 1, num_codes))  # [1, 1, num_codes]
            scaled_l2_norm = reshaped_scale * (expanded_x - reshaped_codewords).pow(2).sum(dim=3)
            return scaled_l2_norm

        @staticmethod
        def aggregate(assignment_weights, x, codewords):
            num_codes, in_channels = codewords.size()
            reshaped_codewords = codewords.view((1, 1, num_codes, in_channels))
            b = x.size(0)
            expanded_x = x.unsqueeze(2).expand((b, x.size(1), num_codes, in_channels))
            assignment_weights = assignment_weights.unsqueeze(3)  # [b, N, num_codes, 1]
            # sum over the N spatial positions -> [b, num_codes, channels]
            encoded_feat = (assignment_weights * (expanded_x - reshaped_codewords)).sum(1)
            return encoded_feat

        def forward(self, x):
            assert x.dim() == 4 and x.size(1) == self.in_channels
            b, in_channels, h, w = x.size()
            # [batch_size, height x width, channels]
            x = x.view(b, self.in_channels, -1).transpose(1, 2).contiguous()
            # assignment_weights: [batch_size, height x width, num_codes]
            assignment_weights = torch.softmax(self.scaled_l2(x, self.codewords, self.scale), dim=2)
            # aggregate
            encoded_feat = self.aggregate(assignment_weights, x, self.codewords)
            return encoded_feat


    class Mlp(nn.Module):
        """
        Implementation of MLP with 1*1 convolutions. Input: tensor with shape [B, C, H, W]
        """

        def __init__(self, in_features, hidden_features=None,
                     out_features=None, act_layer=nn.GELU, drop=0.):
            super().__init__()
            out_features = out_features or in_features
            hidden_features = hidden_features or in_features
            self.fc1 = nn.Conv2d(in_features, hidden_features, 1)
            self.act = act_layer()
            self.fc2 = nn.Conv2d(hidden_features, out_features, 1)
            self.drop = nn.Dropout(drop)
            self.apply(self._init_weights)

        def _init_weights(self, m):
            if isinstance(m, nn.Conv2d):
                trunc_normal_(m.weight, std=.02)
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)

        def forward(self, x):
            x = self.fc1(x)
            x = self.act(x)
            x = self.drop(x)
            x = self.fc2(x)
            x = self.drop(x)
            return x


    # 1*1 3*3 1*1
    class ConvBlock(nn.Module):
        def __init__(self, in_channels, out_channels, stride=1, res_conv=False, act_layer=nn.ReLU, groups=1,
                     norm_layer=partial(nn.BatchNorm2d, eps=1e-6)):
            super(ConvBlock, self).__init__()
            self.in_channels = in_channels
            expansion = 4
            c = out_channels // expansion
            self.conv1 = Conv(in_channels, c, act=nn.ReLU())
            self.conv2 = Conv(c, c, k=3, s=stride, g=groups, act=nn.ReLU())
            self.conv3 = Conv(c, out_channels, 1, act=False)
            self.act3 = act_layer(inplace=True)
            if res_conv:
                self.residual_conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0, bias=False)
                self.residual_bn = norm_layer(out_channels)
            self.res_conv = res_conv

        def zero_init_last_bn(self):
            # the last BatchNorm lives inside the Conv wrapper as .bn
            nn.init.zeros_(self.conv3.bn.weight)

        def forward(self, x, return_x_2=True):
            residual = x
            x = self.conv1(x)
            x2 = self.conv2(x)  # if x_t_r is None else self.conv2(x + x_t_r)
            x = self.conv3(x2)
            if self.res_conv:
                residual = self.residual_conv(residual)
                residual = self.residual_bn(residual)
            x += residual
            x = self.act3(x)
            if return_x_2:
                return x, x2
            else:
                return x


    class Mean(nn.Module):
        def __init__(self, dim, keep_dim=False):
            super(Mean, self).__init__()
            self.dim = dim
            self.keep_dim = keep_dim

        def forward(self, input):
            return input.mean(self.dim, self.keep_dim)


    class LVCBlock(nn.Module):
        def __init__(self, in_channels, out_channels, num_codes, channel_ratio=0.25, base_channel=64):
            super(LVCBlock, self).__init__()
            self.out_channels = out_channels
            self.num_codes = num_codes
            self.conv_1 = ConvBlock(in_channels=in_channels, out_channels=in_channels, res_conv=True, stride=1)
            self.LVC = nn.Sequential(
                Conv(in_channels, in_channels, 1, act=nn.ReLU()),
                Encoding(in_channels=in_channels, num_codes=num_codes),  # -> [b, num_codes, channels]
                nn.BatchNorm1d(num_codes),
                nn.ReLU(inplace=True),
                Mean(dim=1))  # average over codewords -> [b, channels]
            self.fc = nn.Sequential(nn.Linear(in_channels, in_channels), nn.Sigmoid())

        def forward(self, x):
            x = self.conv_1(x, return_x_2=False)
            en = self.LVC(x)
            gam = self.fc(en)  # per-channel gate in (0, 1)
            b, in_channels, _, _ = x.size()
            y = gam.view(b, in_channels, 1, 1)
            x = F.relu_(x + x * y)
            return x


    class GroupNorm(nn.GroupNorm):
        """
        Group Normalization with 1 group.
        Input: tensor in shape [B, C, H, W]
        """

        def __init__(self, num_channels, **kwargs):
            super().__init__(1, num_channels, **kwargs)


    class DWConv_LMLP(nn.Module):
        """Depthwise Conv + Conv"""

        def __init__(self, in_channels, out_channels, ksize, stride=1, act="silu"):
            super().__init__()
            self.dconv = Conv(
                in_channels,
                in_channels,
                k=ksize,
                s=stride,
                g=in_channels,
            )
            self.pconv = Conv(
                in_channels, out_channels, k=1, s=1, g=1
            )

        def forward(self, x):
            x = self.dconv(x)
            return self.pconv(x)


    # LightMLPBlock
    class LightMLPBlock(nn.Module):
        def __init__(self, in_channels, out_channels, ksize=1, stride=1, act="silu",
                     mlp_ratio=4., drop=0., act_layer=nn.GELU,
                     use_layer_scale=True, layer_scale_init_value=1e-5, drop_path=0.,
                     norm_layer=GroupNorm):  # act_layer=nn.GELU,
            super().__init__()
            self.dw = DWConv_LMLP(in_channels, out_channels, ksize=1, stride=1, act="silu")
            self.linear = nn.Linear(out_channels, out_channels)  # learnable position embedding
            self.out_channels = out_channels
            self.norm1 = norm_layer(in_channels)
            self.norm2 = norm_layer(in_channels)
            mlp_hidden_dim = int(in_channels * mlp_ratio)
            self.mlp = Mlp(in_features=in_channels, hidden_features=mlp_hidden_dim, act_layer=nn.GELU,
                           drop=drop)
            self.drop_path = DropPath(drop_path) if drop_path > 0. \
                else nn.Identity()
            self.use_layer_scale = use_layer_scale
            if use_layer_scale:
                self.layer_scale_1 = nn.Parameter(
                    layer_scale_init_value * torch.ones((out_channels)), requires_grad=True)
                self.layer_scale_2 = nn.Parameter(
                    layer_scale_init_value * torch.ones((out_channels)), requires_grad=True)

        def forward(self, x):
            if self.use_layer_scale:
                x = x + self.drop_path(self.layer_scale_1.unsqueeze(-1).unsqueeze(-1) * self.dw(self.norm1(x)))
                x = x + self.drop_path(self.layer_scale_2.unsqueeze(-1).unsqueeze(-1) * self.mlp(self.norm2(x)))
            else:
                x = x + self.drop_path(self.dw(self.norm1(x)))
                x = x + self.drop_path(self.mlp(self.norm2(x)))
            return x


    # EVCBlock
    class EVCBlock(nn.Module):
        def __init__(self, in_channels, out_channels, channel_ratio=4, base_channel=16):
            super().__init__()
            expansion = 2
            ch = out_channels * expansion
            # stem: 7x7 conv for feature smoothing
            self.conv1 = Conv(in_channels, in_channels, k=7, act=nn.ReLU())
            self.maxpool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)  # stride 1 keeps the spatial size
            # LVC
            self.lvc = LVCBlock(in_channels=in_channels, out_channels=out_channels, num_codes=64)  # c1 value not yet decided
            # LightMLPBlock
            self.l_MLP = LightMLPBlock(in_channels, out_channels, ksize=1, stride=1, act="silu", act_layer=nn.GELU,
                                       mlp_ratio=4., drop=0.,
                                       use_layer_scale=True, layer_scale_init_value=1e-5, drop_path=0.,
                                       norm_layer=GroupNorm)
            # 1x1 conv that fuses the concatenated branches back to out_channels
            self.cnv1 = nn.Conv2d(ch, out_channels, kernel_size=1, stride=1, padding=0)

        def forward(self, x):
            x1 = self.maxpool((self.conv1(x)))
            # LVCBlock
            x_lvc = self.lvc(x1)
            # LightMLPBlock
            x_lmlp = self.l_MLP(x1)
            # concat
            x = torch.cat((x_lvc, x_lmlp), dim=1)
            x = self.cnv1(x)
            return x
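
To make the Encoding layer in the listing above easier to follow, here is a dependency-free toy version of the same computation on plain Python lists. It is a sketch for illustration only: the function name encode and the two-point example data are made up, and it handles a single image rather than a batch. Each feature vector is softly assigned to every codeword by a softmax over scaled squared L2 distances, and the weighted residuals are summed over all positions.

```python
import math


def encode(features, codewords, scales):
    """Soft-assign each feature vector to the codewords and aggregate residuals.

    features:  list of N vectors (one per spatial position), each of length C
    codewords: list of K vectors of length C
    scales:    list of K smoothing factors (negative, so closer means larger weight)
    Returns K aggregated residual vectors of length C.
    """
    k, c = len(codewords), len(codewords[0])
    encoded = [[0.0] * c for _ in range(k)]
    for x in features:
        # Scaled squared L2 distance from x to every codeword.
        logits = [
            s * sum((xi - ci) ** 2 for xi, ci in zip(x, cw))
            for cw, s in zip(codewords, scales)
        ]
        # Softmax over the codewords gives the assignment weights.
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Accumulate the weighted residuals x - codeword.
        for j, (cw, wgt) in enumerate(zip(codewords, weights)):
            for d in range(c):
                encoded[j][d] += wgt * (x[d] - cw[d])
    return encoded


feats = [[0.0, 0.0], [1.0, 1.0]]
codes = [[0.0, 0.0], [1.0, 1.0]]
result = encode(feats, codes, scales=[-1.0, -1.0])
print(result)
```

In this toy case each point sits exactly on one codeword, so its residual to that codeword is zero and only the small weight given to the far codeword contributes to the result.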

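Next, in models/yolo.py of the YOLOv5/v7 project, add the following import at the top of the file:

    from models.evc import EVCBlock

Then search for def parse_model(d, ch)

and add the following branch to the existing if/elif chain that dispatches on the module type m: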

    elif m is EVCBlock:
        c2 = ch[f]
        args = [c2, c2]
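
One consequence of this branch is that the channel numbers written after EVCBlock in the YAML files are ignored: the block always takes the width of the layer it reads from, after width_multiple has been applied to that layer. The sketch below imitates this with plain Python. make_divisible follows the rounding helper used by YOLOv5/v7, while evc_args is a hypothetical stand-in for the branch above, not a function in either repository.

```python
import math


def make_divisible(x: float, divisor: int) -> int:
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def evc_args(yaml_args: list, prev_channels: int) -> list:
    """Mimic the EVCBlock branch: ignore yaml_args, reuse the input width."""
    c2 = prev_channels
    return [c2, c2]


resolved = {}
for name, width_multiple in [("yolov5s", 0.50), ("yolov5n", 0.25)]:
    # The preceding C3 layer is nominally 1024 channels wide before scaling.
    prev = make_divisible(1024 * width_multiple, 8)
    resolved[name] = evc_args([1024, 1024], prev)
    print(name, resolved[name])
```

This is why the model summaries later in the article list EVCBlock with [512, 512] for YOLOv5s and [256, 256] for YOLOv5n even though both YAML files say [1024, 1024].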

III. YOLOv7-tiny modification

After completing section II, create a new file yolov7-tiny-evc.yaml in the models folder of the YOLOv7 project and add the following code.

    # parameters
    nc: 80  # number of classes
    depth_multiple: 1.0  # model depth multiple
    width_multiple: 1.0  # layer channel multiple

    # anchors
    anchors:
      - [10,13, 16,30, 33,23]  # P3/8
      - [30,61, 62,45, 59,119]  # P4/16
      - [116,90, 156,198, 373,326]  # P5/32

    # yolov7-tiny backbone
    backbone:
      # [from, number, module, args] c2, k=1, s=1, p=None, g=1, act=True
      [[-1, 1, Conv, [32, 3, 2, None, 1, nn.LeakyReLU(0.1)]],  # 0-P1/2
       [-1, 1, Conv, [64, 3, 2, None, 1, nn.LeakyReLU(0.1)]],  # 1-P2/4
       [-1, 1, Conv, [32, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-2, 1, Conv, [32, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-1, 1, Conv, [32, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-1, 1, Conv, [32, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
       [[-1, -2, -3, -4], 1, Concat, [1]],
       [-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 7
       [-1, 1, MP, []],  # 8-P3/8
       [-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-2, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-1, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-1, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
       [[-1, -2, -3, -4], 1, Concat, [1]],
       [-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 14
       [-1, 1, MP, []],  # 15-P4/16
       [-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-2, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-1, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-1, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
       [[-1, -2, -3, -4], 1, Concat, [1]],
       [-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 21
       [-1, 1, MP, []],  # 22-P5/32
       [-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-2, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-1, 1, Conv, [256, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-1, 1, Conv, [256, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
       [[-1, -2, -3, -4], 1, Concat, [1]],
       [-1, 1, Conv, [512, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 28
       [-1, 1, EVCBlock, [512, 512]],  # 29-a
      ]

    # yolov7-tiny head
    head:
      [[-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-2, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-1, 1, SP, [5]],
       [-2, 1, SP, [9]],
       [-3, 1, SP, [13]],
       [[-1, -2, -3, -4], 1, Concat, [1]],
       [-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
       [[-1, -7], 1, Concat, [1]],
       [-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 38
       [-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-1, 1, nn.Upsample, [None, 2, 'nearest']],
       [21, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # route backbone P4
       [[-1, -2], 1, Concat, [1]],
       [-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-2, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-1, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-1, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
       [[-1, -2, -3, -4], 1, Concat, [1]],
       [-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 48
       [-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-1, 1, nn.Upsample, [None, 2, 'nearest']],
       [14, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # route backbone P3
       [[-1, -2], 1, Concat, [1]],
       [-1, 1, Conv, [32, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-2, 1, Conv, [32, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-1, 1, Conv, [32, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-1, 1, Conv, [32, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
       [[-1, -2, -3, -4], 1, Concat, [1]],
       [-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 58
       [-1, 1, Conv, [128, 3, 2, None, 1, nn.LeakyReLU(0.1)]],
       [[-1, 48], 1, Concat, [1]],
       [-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-2, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-1, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-1, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
       [[-1, -2, -3, -4], 1, Concat, [1]],
       [-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 66
       [-1, 1, Conv, [256, 3, 2, None, 1, nn.LeakyReLU(0.1)]],
       [[-1, 38], 1, Concat, [1]],
       [-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-2, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-1, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
       [-1, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
       [[-1, -2, -3, -4], 1, Concat, [1]],
       [-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],  # 74
       [58, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
       [66, 1, Conv, [256, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
       [74, 1, Conv, [512, 3, 1, None, 1, nn.LeakyReLU(0.1)]],
       [[75,76,77], 1, IDetect, [nc, anchors]],  # Detect(P3, P4, P5)
      ]

    idx from n params module arguments
    0 -1 1 928 models.common.Conv [3, 32, 3, 2, None, 1, LeakyReLU(negative_slope=0.1)]
    1 -1 1 18560 models.common.Conv [32, 64, 3, 2, None, 1, LeakyReLU(negative_slope=0.1)]
    2 -1 1 2112 models.common.Conv [64, 32, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    3 -2 1 2112 models.common.Conv [64, 32, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    4 -1 1 9280 models.common.Conv [32, 32, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    5 -1 1 9280 models.common.Conv [32, 32, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    6 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
    7 -1 1 8320 models.common.Conv [128, 64, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    8 -1 1 0 models.common.MP []
    9 -1 1 4224 models.common.Conv [64, 64, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    10 -2 1 4224 models.common.Conv [64, 64, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    11 -1 1 36992 models.common.Conv [64, 64, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    12 -1 1 36992 models.common.Conv [64, 64, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    13 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
    14 -1 1 33024 models.common.Conv [256, 128, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    15 -1 1 0 models.common.MP []
    16 -1 1 16640 models.common.Conv [128, 128, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    17 -2 1 16640 models.common.Conv [128, 128, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    18 -1 1 147712 models.common.Conv [128, 128, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    19 -1 1 147712 models.common.Conv [128, 128, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    20 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
    21 -1 1 131584 models.common.Conv [512, 256, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    22 -1 1 0 models.common.MP []
    23 -1 1 66048 models.common.Conv [256, 256, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    24 -2 1 66048 models.common.Conv [256, 256, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    25 -1 1 590336 models.common.Conv [256, 256, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    26 -1 1 590336 models.common.Conv [256, 256, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    27 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
    28 -1 1 525312 models.common.Conv [1024, 512, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    29 -1 1 17103040 models.evc.EVCBlock [512, 512]
    30 -1 1 131584 models.common.Conv [512, 256, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    31 -2 1 131584 models.common.Conv [512, 256, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    32 -1 1 0 models.common.SP [5]
    33 -2 1 0 models.common.SP [9]
    34 -3 1 0 models.common.SP [13]
    35 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
    36 -1 1 262656 models.common.Conv [1024, 256, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    37 [-1, -7] 1 0 models.common.Concat [1]
    38 -1 1 131584 models.common.Conv [512, 256, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    39 -1 1 33024 models.common.Conv [256, 128, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    40 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
    41 21 1 33024 models.common.Conv [256, 128, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    42 [-1, -2] 1 0 models.common.Concat [1]
    43 -1 1 16512 models.common.Conv [256, 64, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    44 -2 1 16512 models.common.Conv [256, 64, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    45 -1 1 36992 models.common.Conv [64, 64, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    46 -1 1 36992 models.common.Conv [64, 64, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    47 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
    48 -1 1 33024 models.common.Conv [256, 128, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    49 -1 1 8320 models.common.Conv [128, 64, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    50 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
    51 14 1 8320 models.common.Conv [128, 64, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    52 [-1, -2] 1 0 models.common.Concat [1]
    53 -1 1 4160 models.common.Conv [128, 32, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    54 -2 1 4160 models.common.Conv [128, 32, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    55 -1 1 9280 models.common.Conv [32, 32, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    56 -1 1 9280 models.common.Conv [32, 32, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    57 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
    58 -1 1 8320 models.common.Conv [128, 64, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    59 -1 1 73984 models.common.Conv [64, 128, 3, 2, None, 1, LeakyReLU(negative_slope=0.1)]
    60 [-1, 48] 1 0 models.common.Concat [1]
    61 -1 1 16512 models.common.Conv [256, 64, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    62 -2 1 16512 models.common.Conv [256, 64, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    63 -1 1 36992 models.common.Conv [64, 64, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    64 -1 1 36992 models.common.Conv [64, 64, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    65 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
    66 -1 1 33024 models.common.Conv [256, 128, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    67 -1 1 295424 models.common.Conv [128, 256, 3, 2, None, 1, LeakyReLU(negative_slope=0.1)]
    68 [-1, 38] 1 0 models.common.Concat [1]
    69 -1 1 65792 models.common.Conv [512, 128, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    70 -2 1 65792 models.common.Conv [512, 128, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    71 -1 1 147712 models.common.Conv [128, 128, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    72 -1 1 147712 models.common.Conv [128, 128, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    73 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
    74 -1 1 131584 models.common.Conv [512, 256, 1, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    75 58 1 73984 models.common.Conv [64, 128, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    76 66 1 295424 models.common.Conv [128, 256, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    77 74 1 1180672 models.common.Conv [256, 512, 3, 1, None, 1, LeakyReLU(negative_slope=0.1)]
    78 [75, 76, 77] 1 17132 models.yolo.IDetect [1, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
    Model Summary: 318 layers, 23118028 parameters, 23118028 gradients, 26.7 GFLOPS

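If output like the above is printed when the model is built, the modification succeeded. Note that this log was produced with nc set to 1 (see the IDetect arguments in the last row), whereas the YAML above sets nc: 80, so the parameter count of the detection layer will differ if you keep 80 classes.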
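
A quick way to sanity-check such a summary is to recompute a few rows by hand. The sketch below is a rough check under stated assumptions: that models.common.Conv is a bias-free convolution followed by BatchNorm2d, and that IDetect consists of one 1x1 output convolution with bias per scale plus the implicit add and multiply vectors. The function names are hypothetical helpers, not part of YOLOv7.

```python
def conv_params(c1: int, c2: int, k: int = 1, g: int = 1) -> int:
    """Parameters of a bias-free Conv2d followed by BatchNorm2d (weight + bias)."""
    return c1 * c2 * k * k // g + 2 * c2


def idetect_params(nc: int, channels: list, na: int = 3) -> int:
    """1x1 output convs with bias, plus the implicit add/multiply vectors."""
    no = na * (nc + 5)  # outputs per scale: anchors x (classes + box + objectness)
    convs = sum(c * no + no for c in channels)
    implicit = sum(c + no for c in channels)
    return convs + implicit


print(conv_params(3, 32, k=3))             # compare with row 0
print(conv_params(1024, 512))              # compare with row 28
print(idetect_params(1, [128, 256, 512]))  # compare with row 78
```

Under these assumptions the three values agree with the params column of rows 0, 28 and 78 (928, 525312 and 17132), which also confirms that the log corresponds to a single class.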

IV. YOLOv5s modification

After completing section II, create a new file yolov5s-evc.yaml in the models folder of the YOLOv5 project and add the following code.

    # Parameters
    nc: 1  # number of classes
    depth_multiple: 0.33  # model depth multiple
    width_multiple: 0.50  # layer channel multiple
    anchors:
      - [10,13, 16,30, 33,23]  # P3/8
      - [30,61, 62,45, 59,119]  # P4/16
      - [116,90, 156,198, 373,326]  # P5/32

    # YOLOv5 v6.0 backbone
    backbone:
      # [from, number, module, args]
      [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
       [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
       [-1, 3, C3, [128]],
       [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
       [-1, 6, C3, [256]],
       [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
       [-1, 9, C3, [512]],
       [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
       [-1, 3, C3, [1024]],
       [-1, 1, EVCBlock, [1024, 1024]],  # 9-a
       [-1, 1, SPPF, [1024, 5]],  # 10
      ]

    # YOLOv5 v6.0 head
    head:
      [[-1, 1, Conv, [512, 1, 1]],
       [-1, 1, nn.Upsample, [None, 2, 'nearest']],
       [[-1, 6], 1, Concat, [1]],  # cat backbone P4
       [-1, 3, C3, [512, False]],  # 14

       [-1, 1, Conv, [256, 1, 1]],
       [-1, 1, nn.Upsample, [None, 2, 'nearest']],
       [[-1, 4], 1, Concat, [1]],  # cat backbone P3
       [-1, 3, C3, [256, False]],  # 18 (P3/8-small)

       [-1, 1, Conv, [256, 3, 2]],
       [[-1, 15], 1, Concat, [1]],  # cat head P4
       [-1, 3, C3, [512, False]],  # 21 (P4/16-medium)

       [-1, 1, Conv, [512, 3, 2]],
       [[-1, 11], 1, Concat, [1]],  # cat head P5
       [-1, 3, C3, [1024, False]],  # 24 (P5/32-large)

       [[18, 21, 24], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
      ]

    idx from n params module arguments
    0 -1 1 3520 models.common.Conv [3, 32, 6, 2, 2]
    1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
    2 -1 1 18816 models.common.C3 [64, 64, 1]
    3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
    4 -1 2 115712 models.common.C3 [128, 128, 2]
    5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
    6 -1 3 625152 models.common.C3 [256, 256, 3]
    7 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
    8 -1 1 1182720 models.common.C3 [512, 512, 1]
    9 -1 1 17103040 models.evc.EVCBlock [512, 512]
    10 -1 1 656896 models.common.SPPF [512, 512, 5]
    11 -1 1 131584 models.common.Conv [512, 256, 1, 1]
    12 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
    13 [-1, 6] 1 0 models.common.Concat [1]
    14 -1 1 361984 models.common.C3 [512, 256, 1, False]
    15 -1 1 33024 models.common.Conv [256, 128, 1, 1]
    16 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
    17 [-1, 4] 1 0 models.common.Concat [1]
    18 -1 1 90880 models.common.C3 [256, 128, 1, False]
    19 -1 1 147712 models.common.Conv [128, 128, 3, 2]
    20 [-1, 15] 1 0 models.common.Concat [1]
    21 -1 1 296448 models.common.C3 [256, 256, 1, False]
    22 -1 1 590336 models.common.Conv [256, 256, 3, 2]
    23 [-1, 11] 1 0 models.common.Concat [1]
    24 -1 1 1182720 models.common.C3 [512, 512, 1, False]
    25 [18, 21, 24] 1 16182 models.yolo.Detect [1, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
    Model Summary: 325 layers, 24125366 parameters, 24125366 gradients, 29.5 GFLOPs

If output like the above is printed when the model is built, the modification succeeded.
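
Inserting EVCBlock as layer 9 pushes every later layer down by one, so the absolute indices in the head (the Concat sources and the Detect inputs) have to be incremented compared with the stock yolov5s.yaml. The sketch below shows that bookkeeping. It is a simplified illustration: shift_from is a hypothetical helper, and it assumes relative (negative) references do not span the insertion point, which holds for this head.

```python
def shift_from(ref, insert_at: int):
    """Shift absolute layer references at or after insert_at by one.

    Negative (relative) references are left unchanged.
    """
    if isinstance(ref, list):
        return [shift_from(r, insert_at) for r in ref]
    return ref + 1 if ref >= insert_at else ref


# 'from' fields of the stock yolov5s.yaml (v6.0) head that use absolute indices.
stock = [[-1, 6], [-1, 4], [-1, 14], [-1, 10], [17, 20, 23]]
shifted = [shift_from(ref, insert_at=9) for ref in stock]
print(shifted)
```

The result matches the Concat and Detect entries in the YAML above. If you insert the block at a different position, the same rule tells you which indices to update.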

V. YOLOv5n modification

After completing section II, create a new file yolov5n-evc.yaml in the models folder of the YOLOv5 project and add the following code.

    # Parameters
    nc: 1  # number of classes
    depth_multiple: 0.33  # model depth multiple
    width_multiple: 0.25  # layer channel multiple
    anchors:
      - [10,13, 16,30, 33,23]  # P3/8
      - [30,61, 62,45, 59,119]  # P4/16
      - [116,90, 156,198, 373,326]  # P5/32

    # YOLOv5 v6.0 backbone
    backbone:
      # [from, number, module, args]
      [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
       [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
       [-1, 3, C3, [128]],
       [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
       [-1, 6, C3, [256]],
       [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
       [-1, 9, C3, [512]],
       [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
       [-1, 3, C3, [1024]],
       [-1, 1, EVCBlock, [1024, 1024]],  # 9-a
       [-1, 1, SPPF, [1024, 5]],  # 10
      ]

    # YOLOv5 v6.0 head
    head:
      [[-1, 1, Conv, [512, 1, 1]],
       [-1, 1, nn.Upsample, [None, 2, 'nearest']],
       [[-1, 6], 1, Concat, [1]],  # cat backbone P4
       [-1, 3, C3, [512, False]],  # 14

       [-1, 1, Conv, [256, 1, 1]],
       [-1, 1, nn.Upsample, [None, 2, 'nearest']],
       [[-1, 4], 1, Concat, [1]],  # cat backbone P3
       [-1, 3, C3, [256, False]],  # 18 (P3/8-small)

       [-1, 1, Conv, [256, 3, 2]],
       [[-1, 15], 1, Concat, [1]],  # cat head P4
       [-1, 3, C3, [512, False]],  # 21 (P4/16-medium)

       [-1, 1, Conv, [512, 3, 2]],
       [[-1, 11], 1, Concat, [1]],  # cat head P5
       [-1, 3, C3, [1024, False]],  # 24 (P5/32-large)

       [[18, 21, 24], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
      ]

    idx from n params module arguments
    0 -1 1 1760 models.common.Conv [3, 16, 6, 2, 2]
    1 -1 1 4672 models.common.Conv [16, 32, 3, 2]
    2 -1 1 4800 models.common.C3 [32, 32, 1]
    3 -1 1 18560 models.common.Conv [32, 64, 3, 2]
    4 -1 2 29184 models.common.C3 [64, 64, 2]
    5 -1 1 73984 models.common.Conv [64, 128, 3, 2]
    6 -1 3 156928 models.common.C3 [128, 128, 3]
    7 -1 1 295424 models.common.Conv [128, 256, 3, 2]
    8 -1 1 296448 models.common.C3 [256, 256, 1]
    9 -1 1 4287680 models.evc.EVCBlock [256, 256]
    10 -1 1 164608 models.common.SPPF [256, 256, 5]
    11 -1 1 33024 models.common.Conv [256, 128, 1, 1]
    12 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
    13 [-1, 6] 1 0 models.common.Concat [1]
    14 -1 1 90880 models.common.C3 [256, 128, 1, False]
    15 -1 1 8320 models.common.Conv [128, 64, 1, 1]
    16 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
    17 [-1, 4] 1 0 models.common.Concat [1]
    18 -1 1 22912 models.common.C3 [128, 64, 1, False]
    19 -1 1 36992 models.common.Conv [64, 64, 3, 2]
    20 [-1, 15] 1 0 models.common.Concat [1]
    21 -1 1 74496 models.common.C3 [128, 128, 1, False]
    22 -1 1 147712 models.common.Conv [128, 128, 3, 2]
    23 [-1, 11] 1 0 models.common.Concat [1]
    24 -1 1 296448 models.common.C3 [256, 256, 1, False]
    25 [18, 21, 24] 1 8118 models.yolo.Detect [1, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [64, 128, 256]]
    Model Summary: 325 layers, 6052950 parameters, 6052950 gradients, 7.6 GFLOPs
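
If output like the above is printed when the model is built, the modification succeeded.

VI. Notes

This article is only an example modification. Placing the EVC module at this position makes the parameter count fairly large. In practice you do not have to follow the YAML examples here: you can add the module at the positions used in the official implementation, or at the backbone's first output scale, which helps keep the parameter count under control. If resources allow, it is still worth running several experiments to find the insertion point that suits your own task.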
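
To see where the parameters of the block come from, the per-part counts can be derived by hand from the evc.py listing. The sketch below is a back-of-the-envelope calculation under the assumption that models.common.Conv is a bias-free convolution plus BatchNorm2d. The function names are hypothetical, and the breakdown only applies to the listing in this article with equal input and output channels.

```python
def conv_bn(c1: int, c2: int, k: int = 1, g: int = 1) -> int:
    """Bias-free Conv2d plus BatchNorm2d weight and bias."""
    return c1 * c2 * k * k // g + 2 * c2


def evc_param_breakdown(c: int, num_codes: int = 64) -> dict:
    """Parameter count of each EVCBlock part for c input/output channels."""
    q = c // 4  # bottleneck width inside ConvBlock
    stem = conv_bn(c, c, k=7)
    # ConvBlock: 1x1, 3x3, 1x1 convs plus the residual 1x1 conv and its BatchNorm.
    conv_block = conv_bn(c, q) + conv_bn(q, q, k=3) + conv_bn(q, c) + c * c + 2 * c
    # 1x1 conv, codewords + scales, BatchNorm1d over the codewords.
    encoding = conv_bn(c, c) + (num_codes * c + num_codes) + 2 * num_codes
    gate = c * c + c  # Linear(c, c) before the sigmoid
    dw = conv_bn(c, c, k=1, g=c) + conv_bn(c, c)
    mlp = (4 * c * c + 4 * c) + (4 * c * c + c)  # two 1x1 convs with bias
    # dw conv + Linear + two GroupNorms + MLP + two layer-scale vectors.
    light_mlp = dw + (c * c + c) + 4 * c + mlp + 2 * c
    fuse = 2 * c * c + c  # 1x1 conv with bias after the concatenation
    return {
        "stem": stem,
        "lvc": conv_block + encoding + gate,
        "light_mlp": light_mlp,
        "fuse": fuse,
    }


for c in (512, 256):
    parts = evc_param_breakdown(c)
    total = sum(parts.values())
    print(c, total, parts, f"stem share: {parts['stem'] / total:.1%}")
```

The totals for 512 and 256 channels agree with the EVCBlock rows in the summaries above (17103040 and 4287680). About three quarters of that comes from the 7x7 stem convolution, which is why applying the block to a narrower feature map reduces the cost so sharply.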

More articles are on the way, with a focus on being concise and accurate. Feel free to follow along and join the discussion!


Reposted from: https://blog.csdn.net/2401_84870184/article/details/140723542
Copyright belongs to the original author, 拿下Nahida. In case of infringement, please contact us for removal.
