Spatial Pyramid Pooling Improvements: SPP / SPPF / ASPP / RFB / SPPCSPC

Changelog: as of 9:33 AM on August 16, 2022, receptive-field annotations have been added to the figures 🍀

1 Principles

1.1 SPP (Spatial Pyramid Pooling)

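The SPP module was proposed by Kaiming He et al. in the 2015 paper "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition".

SPP stands for Spatial Pyramid Pooling. It was designed mainly to solve two problems:

It avoids the image distortion caused by cropping or scaling image regions to a fixed size;
It removes the repeated feature extraction that a convolutional network would otherwise perform on the same image, which greatly speeds up candidate box generation and saves computation.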
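To make the first point concrete, here is a minimal sketch, assuming PyTorch is installed. It pools a feature map onto a few fixed grids and flattens the result, so inputs of different sizes give vectors of the same length with no cropping or warping. The function name `spp_fixed_length` and the pyramid levels (1, 2, 4) are my own illustrative choices, not code from the paper.

```python
import torch
import torch.nn.functional as F


def spp_fixed_length(feature_map, levels=(1, 2, 4)):
    """Pool a (N, C, H, W) feature map onto fixed grids and flatten.

    The output length is C * sum(level * level for level in levels),
    independent of H and W. The levels are illustrative assumptions.
    """
    n = feature_map.shape[0]
    pooled = [
        F.adaptive_max_pool2d(feature_map, output_size=level).reshape(n, -1)
        for level in levels
    ]
    return torch.cat(pooled, dim=1)


small = spp_fixed_length(torch.randn(1, 8, 13, 17))
large = spp_fixed_length(torch.randn(1, 8, 40, 25))
print(small.shape, large.shape)  # both have 8 * (1 + 4 + 16) = 168 features
```

Note that the YOLO-style SPP shown in this section is a variant: it keeps the spatial size (stride 1 with padding) and concatenates along the channel dimension instead of flattening.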

import warnings

import torch
import torch.nn as nn

# Conv (Conv2d + BatchNorm2d + activation) is defined in YOLOv5's models/common.py


class SPP(nn.Module):
    # Spatial Pyramid Pooling (SPP) layer https://arxiv.org/abs/1406.4729
    def __init__(self, c1, c2, k=(5, 9, 13)):
        super().__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * (len(k) + 1), c2, 1, 1)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])

    def forward(self, x):
        x = self.cv1(x)
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')  # suppress torch 1.9.0 max_pool2d() warning
            return self.cv2(torch.cat([x] + [m(x) for m in self.m], 1))

1.2 SPPF (Spatial Pyramid Pooling - Fast)

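This module was proposed by Glenn Jocher, the author of YOLOv5, as a variant of SPP. It produces the same result as SPP but runs considerably faster, hence the name SPP-Fast: instead of three parallel max pools with kernels 5, 9 and 13, it applies a single 5×5 max pool three times in sequence.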
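The equivalence with SPP can be checked numerically. The sketch below, assuming PyTorch is installed, compares three parallel stride-1 max pools (kernels 5, 9, 13) with one 5×5 pool applied one, two and three times. The helper name `pool` and the tensor size are my own choices for illustration.

```python
import torch
import torch.nn.functional as F


def pool(x, k):
    # Stride-1 max pooling with "same" padding, as in the SPP/SPPF code.
    return F.max_pool2d(x, kernel_size=k, stride=1, padding=k // 2)


x = torch.randn(2, 4, 32, 32)

# SPP: three parallel pools with kernels 5, 9 and 13.
spp_branches = [pool(x, 5), pool(x, 9), pool(x, 13)]

# SPPF: one 5x5 pool applied one, two and three times in sequence.
y1 = pool(x, 5)
y2 = pool(y1, 5)
y3 = pool(y2, 5)

same = all(torch.equal(a, b) for a, b in zip(spp_branches, (y1, y2, y3)))
print(same)
```

Each extra 5×5 pass widens the window by 4 pixels (5, then 9, then 13), and the small kernel reuses the previous result instead of scanning a 9×9 or 13×13 window from scratch, which is where the speedup comes from.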

class SPPF(nn.Module):
    # Spatial Pyramid Pooling - Fast (SPPF) layer for YOLOv5 by Glenn Jocher
    def __init__(self, c1, c2, k=5):  # equivalent to SPP(k=(5, 9, 13))
        super().__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * 4, c2, 1, 1)
        self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')  # suppress torch 1.9.0 max_pool2d() warning
            y1 = self.m(x)
            y2 = self.m(y1)
            return self.cv2(torch.cat((x, y1, y2, self.m(y2)), 1))

1.3 ASPP (Atrous Spatial Pyramid Pooling)

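Inspired by SPP, the semantic segmentation model DeepLabv2 introduced the ASPP (Atrous Spatial Pyramid Pooling) module, which applies several parallel atrous (dilated) convolution layers with different sampling rates. The features extracted at each sampling rate are processed further in separate branches and then fused to produce the final result. By using different dilation rates, the module builds convolution kernels with different receptive fields and so captures objects at multiple scales. The structure is fairly simple, as the figure below shows:

[Figure: structure of the ASPP module]

ASPP was first proposed in DeepLab and refined in later DeepLab versions, for example by adding BN layers and depthwise separable convolutions, but the basic idea remains the same.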
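How much a dilation rate enlarges the receptive field is simple arithmetic. The sketch below is plain Python with helper names of my own choosing; it computes the span of a dilated 3×3 kernel for the rates 6, 12 and 18 used in the ASPP code of this section, and uses the standard convolution output-size formula to confirm that setting padding equal to the dilation keeps the feature map size unchanged.

```python
def effective_kernel(k, dilation):
    # A dilated k x k kernel spans k + (k - 1) * (dilation - 1) pixels per side.
    return k + (k - 1) * (dilation - 1)


def conv_out_size(size, k, stride=1, padding=0, dilation=1):
    # Standard convolution output-size formula.
    return (size + 2 * padding - dilation * (k - 1) - 1) // stride + 1


for rate in (6, 12, 18):
    span = effective_kernel(3, rate)
    out = conv_out_size(64, k=3, padding=rate, dilation=rate)
    print(f"dilation={rate}: kernel spans {span}x{span}, 64 -> {out}")
```

The three branches therefore cover 13×13, 25×25 and 37×37 windows with only nine weights per channel pair each, which is why the branches can be concatenated directly without resizing.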

import torch
import torch.nn as nn
import torch.nn.functional as F


# without BN version
class ASPP(nn.Module):
    def __init__(self, in_channel=512, out_channel=256):
        super(ASPP, self).__init__()
        self.mean = nn.AdaptiveAvgPool2d((1, 1))  # (1, 1) is the output size
        self.conv = nn.Conv2d(in_channel, out_channel, 1, 1)
        self.atrous_block1 = nn.Conv2d(in_channel, out_channel, 1, 1)
        self.atrous_block6 = nn.Conv2d(in_channel, out_channel, 3, 1, padding=6, dilation=6)
        self.atrous_block12 = nn.Conv2d(in_channel, out_channel, 3, 1, padding=12, dilation=12)
        self.atrous_block18 = nn.Conv2d(in_channel, out_channel, 3, 1, padding=18, dilation=18)
        self.conv_1x1_output = nn.Conv2d(out_channel * 5, out_channel, 1, 1)

    def forward(self, x):
        size = x.shape[2:]

        # Image-level branch: global average pool, 1x1 conv, upsample back to the input size
        image_features = self.mean(x)
        image_features = self.conv(image_features)
        # F.upsample is deprecated; F.interpolate with align_corners=False keeps its default behavior
        image_features = F.interpolate(image_features, size=size, mode='bilinear', align_corners=False)

        atrous_block1 = self.atrous_block1(x)
        atrous_block6 = self.atrous_block6(x)
        atrous_block12 = self.atrous_block12(x)
        atrous_block18 = self.atrous_block18(x)

        net = self.conv_1x1_output(torch.cat([image_features, atrous_block1, atrous_block6,
                                              atrous_block12, atrous_block18], dim=1))
        return net

1.4 RFB (Receptive Field Block)

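The RFB module was proposed in the ECCV 2018 paper "Receptive Field Block Net for Accurate and Fast Object Detection". The paper's starting point is to mimic the receptive fields of the human visual system in order to strengthen the network's feature extraction. Structurally, RFB borrows the idea of Inception: it adds dilated convolutions on top of the Inception design, which effectively enlarges the receptive field.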
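
[Figure] Architectures of RFB and RFB-s. RFB-s is used to mimic the smaller pRFs (population receptive fields) in shallow human retinotopic maps, using more branches with smaller kernels.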

class BasicConv(nn.Module):
    def __init__(self, in_planes, out_planes, kernel_size, stride=1, padding=0, dilation=1, groups=1, relu=True, bn=True):
        super(BasicConv, self).__init__()
        self.out_channels = out_planes
        if bn:
            self.conv = nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=padding, dilation=dilation, groups=groups, bias=False)
            self.bn = nn.BatchNorm2d(out_planes, eps=1e-5, momentum=0.01, affine=True)
            self.relu = nn.ReLU(inplace=True) if relu else None
        else:
            self.conv = nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=padding, dilation=dilation, groups=groups, bias=True)
            self.bn = None
            self.relu = nn.ReLU(inplace=True) if relu else None

    def forward(self, x):
        x = self.conv(x)
        if self.bn is not None:
            x = self.bn(x)
        if self.relu is not None:
            x = self.relu(x)
        return x

class BasicRFB(nn.Module):
    def __init__(self, in_planes, out_planes, stride=1, scale=0.1, map_reduce=8, vision=1, groups=1):
        super(BasicRFB, self).__init__()
        self.scale = scale
        self.out_channels = out_planes
        inter_planes = in_planes // map_reduce

        self.branch0 = nn.Sequential(
            BasicConv(in_planes, inter_planes, kernel_size=1, stride=1, groups=groups, relu=False),
            BasicConv(inter_planes, 2 * inter_planes, kernel_size=(3, 3), stride=stride, padding=(1, 1), groups=groups),
            BasicConv(2 * inter_planes, 2 * inter_planes, kernel_size=3, stride=1, padding=vision + 1, dilation=vision + 1, relu=False, groups=groups))
        self.branch1 = nn.Sequential(
            BasicConv(in_planes, inter_planes, kernel_size=1, stride=1, groups=groups, relu=False),
            BasicConv(inter_planes, 2 * inter_planes, kernel_size=(3, 3), stride=stride, padding=(1, 1), groups=groups),
            BasicConv(2 * inter_planes, 2 * inter_planes, kernel_size=3, stride=1, padding=vision + 2, dilation=vision + 2, relu=False, groups=groups))
        self.branch2 = nn.Sequential(
            BasicConv(in_planes, inter_planes, kernel_size=1, stride=1, groups=groups, relu=False),
            BasicConv(inter_planes, (inter_planes // 2) * 3, kernel_size=3, stride=1, padding=1, groups=groups),
            BasicConv((inter_planes // 2) * 3, 2 * inter_planes, kernel_size=3, stride=stride, padding=1, groups=groups),
            BasicConv(2 * inter_planes, 2 * inter_planes, kernel_size=3, stride=1, padding=vision + 4, dilation=vision + 4, relu=False, groups=groups))

        self.ConvLinear = BasicConv(6 * inter_planes, out_planes, kernel_size=1, stride=1, relu=False)
        self.shortcut = BasicConv(in_planes, out_planes, kernel_size=1, stride=stride, relu=False)
        self.relu = nn.ReLU(inplace=False)

    def forward(self, x):
        x0 = self.branch0(x)
        x1 = self.branch1(x)
        x2 = self.branch2(x)

        out = torch.cat((x0, x1, x2), 1)
        out = self.ConvLinear(out)
        short = self.shortcut(x)
        out = out * self.scale + short
        out = self.relu(out)
        return out

1.5 SPPCSPC

This module is the SPP structure used in YOLOv7. On the COCO dataset it performs better than SPPF (this does not necessarily hold on other datasets).

class SPPCSPC(nn.Module):
    # CSP https://github.com/WongKinYiu/CrossStagePartialNetworks
    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5, k=(5, 9, 13)):
        super(SPPCSPC, self).__init__()
        c_ = int(2 * c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(c_, c_, 3, 1)
        self.cv4 = Conv(c_, c_, 1, 1)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])
        self.cv5 = Conv(4 * c_, c_, 1, 1)
        self.cv6 = Conv(c_, c_, 3, 1)
        self.cv7 = Conv(2 * c_, c2, 1, 1)

    def forward(self, x):
        x1 = self.cv4(self.cv3(self.cv1(x)))
        y1 = self.cv6(self.cv5(torch.cat([x1] + [m(x1) for m in self.m], 1)))
        y2 = self.cv2(x)
        return self.cv7(torch.cat((y1, y2), dim=1))

# Grouped SPPCSPC: with grouped convolutions the parameter count and FLOPs
# stay close to the original model; its effect on accuracy has not been tested.
class SPPCSPC_group(nn.Module):
    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5, k=(5, 9, 13)):
        super(SPPCSPC_group, self).__init__()
        c_ = int(2 * c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1, g=4)
        self.cv2 = Conv(c1, c_, 1, 1, g=4)
        self.cv3 = Conv(c_, c_, 3, 1, g=4)
        self.cv4 = Conv(c_, c_, 1, 1, g=4)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])
        self.cv5 = Conv(4 * c_, c_, 1, 1, g=4)
        self.cv6 = Conv(c_, c_, 3, 1, g=4)
        self.cv7 = Conv(2 * c_, c2, 1, 1, g=4)

    def forward(self, x):
        x1 = self.cv4(self.cv3(self.cv1(x)))
        y1 = self.cv6(self.cv5(torch.cat([x1] + [m(x1) for m in self.m], 1)))
        y2 = self.cv2(x)
        return self.cv7(torch.cat((y1, y2), dim=1))

2 Parameter Count Comparison

Here I replaced the SPP module in yolov5s.yaml with each of the modules above in turn:

| Model | Parameters | GFLOPs |
| --- | --- | --- |
| SPP | 7225885 | 16.5 |
| SPPF | 7235389 | 16.5 |
| ASPP | 15485725 | 23.1 |
| BasicRFB | 7895421 | 17.1 |
| SPPCSPC | 13663549 | 21.7 |
| Grouped SPPCSPC | 8355133 | 17.4 |
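The gap between the SPPF row and the two SPPCSPC rows can be reproduced by hand. The sketch below is plain Python and rests on two assumptions of mine: that YOLOv5's `Conv` is a bias-free convolution followed by BatchNorm (two learnable values per output channel), and that the block sees 512 channels in yolov5s (1024 scaled by a width multiple of 0.5). The helper names are hypothetical.

```python
def conv_bn_params(c_in, c_out, k=1, groups=1):
    # Bias-free convolution weights plus BatchNorm weight and bias.
    return (c_in // groups) * c_out * k * k + 2 * c_out


def sppf_params(c1, c2):
    c_ = c1 // 2
    return conv_bn_params(c1, c_) + conv_bn_params(4 * c_, c2)


def sppcspc_params(c1, c2, e=0.5, groups=1):
    c_ = int(2 * c2 * e)
    layers = [
        (c1, c_, 1),      # cv1
        (c1, c_, 1),      # cv2
        (c_, c_, 3),      # cv3
        (c_, c_, 1),      # cv4
        (4 * c_, c_, 1),  # cv5
        (c_, c_, 3),      # cv6
        (2 * c_, c2, 1),  # cv7
    ]
    return sum(conv_bn_params(a, b, k, groups) for a, b, k in layers)


c = 512  # assumed: 1024 channels scaled by the yolov5s width multiple 0.5
sppf = sppf_params(c, c)
full = sppcspc_params(c, c)
grouped = sppcspc_params(c, c, groups=4)
print(sppf, full - sppf, grouped - sppf)
```

Under these assumptions the extra cost over SPPF is 6428160 parameters for SPPCSPC and 1119744 for the grouped version, which matches the differences between the corresponding rows above (13663549 − 7235389 and 8355133 − 7235389). Most of the cost sits in the two 3×3 convolutions, and grouping with g=4 divides every convolution weight count by four.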

3 How to Apply the Changes

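Step 1: add the code for each module to common.py
Step 2: add the new class names in yolo.py
Step 3: modify the configuration file

The YOLOv5 configuration file looks like this: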

# YOLOv5 🚀 by Ultralytics, GPL-3.0 license

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
   # [-1, 1, ASPP, [1024]],  # 9
   # [-1, 1, SPP, [1024]],
   # [-1, 1, BasicRFB, [1024]],
   # [-1, 1, SPPCSPC, [1024]],
  ]
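Why step 2 is needed: the model builder has to know that a module's first argument is an output channel count, so that it can scale it and prepend the input channels. The sketch below is a simplified illustration of that idea in plain Python, not the actual code in yolo.py. The names `CHANNEL_MODULES` and `resolve_args` are hypothetical, and the rounding helper is my own minimal version.

```python
import math

# Hypothetical registry: modules whose first YAML arg is the output channel count.
CHANNEL_MODULES = {"Conv", "C3", "SPP", "SPPF", "ASPP", "BasicRFB", "SPPCSPC"}


def make_divisible(x, divisor=8):
    # Round up to the nearest multiple of `divisor`.
    return math.ceil(x / divisor) * divisor


def resolve_args(module_name, args, c_in, width_multiple):
    """Turn a YAML row's args into constructor args [c1, c2, *rest]."""
    if module_name not in CHANNEL_MODULES:
        # Unregistered modules receive their args unchanged (no c1, no scaling).
        return list(args)
    c_out = make_divisible(args[0] * width_multiple)
    return [c_in, c_out, *args[1:]]


print(resolve_args("SPPF", [1024, 5], c_in=512, width_multiple=0.5))
print(resolve_args("SPPCSPC", [1024], c_in=512, width_multiple=0.5))
print(resolve_args("MyUnregistered", [1024], c_in=512, width_multiple=0.5))
```

In this sketch a module missing from the registry would be built with the raw value 1024 as its only argument, which is the kind of channel mismatch that registering the class name avoids.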
