Experiment 6: Object Detection Algorithms
I. Objectives and Requirements
1. Understand the principle and structure of a two-stage object detection model such as R-CNN or Faster R-CNN.
2. Learn to solve object detection problems with an R-CNN or Faster R-CNN model.
II. Experimental Content
Commonly used deep learning frameworks include PyTorch and PaddlePaddle. Choose one of them to complete the experiments below.
2.1 Overview of R-CNN Models
The region-based convolutional neural network (R-CNN) series are two-stage object detectors. They generate candidate regions from an image, extract features, classify those features, and refine the positions of the candidate boxes. The series currently has two representative models: Faster R-CNN and Mask R-CNN.
The Faster R-CNN network consists of four main components (a runnable end-to-end sketch follows this list):
1. Backbone convolutional layers. As a CNN-based detection method, Faster R-CNN first extracts feature maps from the image with a set of base convolutional layers. These feature maps are shared by the subsequent region proposal network (RPN) and the fully connected layers. This example uses ResNet-50 as the backbone.
2. Region proposal network (RPN). The RPN generates candidate regions (proposals). It derives a set of anchors from a fixed set of scales and aspect ratios, classifies each anchor as foreground or background with softmax, and then refines the anchors with bounding-box regression to obtain accurate proposals.
3. RoI Align. This layer takes the feature maps and the proposals, maps each proposal onto the feature maps, and pools it into a region feature map of fixed size, which is then fed to fully connected layers for classification. Either RoIPool or RoIAlign may be used here, selected through roi_func in config.py.
4. Detection head. It predicts the category of each proposal from its region feature map and applies bounding-box regression once more to obtain the final, precise position of each detection box.
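All four components above can be exercised end to end without writing any model code. Below is a minimal inference sketch using torchvision's built-in implementation (an assumption for illustration; it is not the PaddlePaddle reference code of this experiment):

import torch
import torchvision

# Backbone (ResNet-50 + FPN), RPN, RoI pooling and the detection head are all
# bundled inside this single torchvision model.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights='DEFAULT')
model.eval()

image = torch.rand(3, 480, 640)  # dummy 3-channel image with values in [0, 1]

with torch.no_grad():
    outputs = model([image])     # returns one dict per input image

# Each dict holds 'boxes' (N, 4), 'labels' (N,) and 'scores' (N,).
print(outputs[0]['boxes'].shape, outputs[0]['scores'][:5])

Note that weights='DEFAULT' requires torchvision >= 0.13; older versions use pretrained=True instead.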
2.2 Datasets
An insect dataset and a screw-and-nut detection dataset are provided for this experiment. You may also use other real-world datasets related to your own research.

III. Experimental Resources
1. Reference code for the Faster R-CNN model.
- Paddle reference implementation:
The "Faster RCNN 目标检测" project on the PaddlePaddle AI Studio community (飞桨AI Studio星河社区)
2. The screw-and-nut dataset and the insect dataset.
IV. Experimental Requirements
1. Read the relevant object detection papers and understand the main ideas of two-stage detection.
2. Based on the Faster R-CNN paper (https://arxiv.org/abs/1506.01497), complete the reference code (or implement the model in a framework of your choice) to obtain a working Faster R-CNN detector.
3. Validate the model on the insect dataset, the screw-and-nut dataset, or a dataset of your choice, and compare and analyze the results.
4. Write an experiment report covering the above requirements.
V. Experimental Environment
The environment used in this experiment is listed in the table below.
Operating system: Ubuntu (Linux)
Programming language: Python (3.11.4)
Third-party dependencies: numpy, matplotlib, pytorch, openmmlab, etc.
VI. Algorithm Workflow
A two-stage object detection model typically consists of two stages: region proposal, and object classification and localization.
[1] Region proposal
(1) Steps:
- Input image: feed the image to be detected into the model.
- Feature extraction: extract features from the input image with a pretrained convolutional neural network (CNN) such as VGG or ResNet.
- Proposal generation: generate candidate boxes on the extracted feature maps with sliding windows (anchors) or other methods; these boxes are regions that may contain objects.
(2) Output: the generated candidate boxes. (A small anchor-generation sketch follows.)
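To make the fixed set of scales and aspect ratios concrete, here is a small self-contained sketch of anchor generation (a hypothetical helper written for this report, not mmdetection's AnchorGenerator):

import itertools

def make_anchors(base_size, scales, ratios):
    """Return (x1, y1, x2, y2) anchors centered at the origin.

    ratio is interpreted as h / w, as in the Faster R-CNN paper.
    """
    anchors = []
    for scale, ratio in itertools.product(scales, ratios):
        area = (base_size * scale) ** 2
        w = (area / ratio) ** 0.5
        h = w * ratio
        anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors

# 3 scales x 3 ratios -> the 9 anchors per location used in the original paper
for a in make_anchors(base_size=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    print([round(v, 1) for v in a])

Shifting this set to every feature-map location produces the dense candidate grid that the RPN then scores and refines.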
[2] Object classification and localization
(1) Steps:
- Proposal cropping: for each candidate box, extract the corresponding region from the image (or, in practice, from the shared feature map).
- Feature extraction and resizing: feed each candidate region through the CNN to extract features, resized as needed for the classification and localization heads.
- Object classification: classify the extracted features with a softmax classifier to decide whether the box contains an object and of which class.
- Bounding-box regression: for boxes classified as objects, further refine their positions with a regressor to localize the object boundary more accurately.
(2) Output: the class label and the bounding-box adjustment parameters of each box. (A sketch of this parameterization follows.)
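In Faster R-CNN the adjustment parameters are (dx, dy, dw, dh) deltas between a proposal and its target box. A minimal sketch of the encoding and decoding, with boxes given as (center x, center y, width, height); the normalizing target_means/target_stds used by mmdetection's DeltaXYWHBBoxCoder (see the configuration file in Section IX) are omitted for clarity:

import math

def encode(proposal, gt):
    """Turn a proposal/ground-truth pair into regression targets."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))

def decode(proposal, deltas):
    """Apply predicted deltas to a proposal to recover a box."""
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    return (px + dx * pw, py + dy * ph, pw * math.exp(dw), ph * math.exp(dh))

proposal = (50.0, 50.0, 100.0, 80.0)
gt = (55.0, 48.0, 120.0, 90.0)
deltas = encode(proposal, gt)
assert all(abs(a - b) < 1e-6 for a, b in zip(decode(proposal, deltas), gt))
print(deltas)

The detection head learns to predict these deltas; at test time, decode() turns the predictions back into image-coordinate boxes.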
VII. Experimental Results
The dataset used in this experiment comes from the URP project "Machine-Vision-Based Intelligent Seed Selection". It contains microscope images of six rice varieties widely planted in China, roughly 30,000 rice images in total.
[1] Faster R-CNN
The training parameters were set as follows: batch_size 16, backbone resnet, neck fpn, optimizer SGD with a learning rate of 0.02, momentum of 0.9 and weight decay of 0.0001, trained for 12 epochs; smooth L1 loss was used for localization and cross-entropy loss for classification. (A short sketch of smooth L1 follows.)
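Smooth L1 behaves quadratically for small errors and linearly for large ones, which keeps box-regression gradients stable while staying robust to outliers. A minimal sketch (beta = 1.0, the common default):

import torch

def smooth_l1(pred, target, beta=1.0):
    # quadratic below beta, linear above it
    diff = (pred - target).abs()
    return torch.where(diff < beta,
                       0.5 * diff ** 2 / beta,
                       diff - 0.5 * beta).mean()

pred = torch.tensor([0.2, 1.5, -3.0])
target = torch.zeros(3)
print(smooth_l1(pred, target))                           # ~1.1733
print(torch.nn.functional.smooth_l1_loss(pred, target))  # built-in agrees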
Training for the 12 epochs took about 6 hours. The validation-set mAP during training is shown below.
(1) Mean average precision (mAP), averaged over IoU thresholds 0.50-0.95 (coco/bbox_mAP)

(2) Mean average precision at IoU threshold 0.5 (coco/bbox_mAP_50)

(3) Mean average precision at IoU threshold 0.75 (coco/bbox_mAP_75)

(4) Mean average precision for large objects (coco/bbox_mAP_l)

The mAP values plotted in figures (1) to (4) are summarized in the figure below.

The test results on the OOD (out-of-distribution) dataset are shown below.

A zoomed-in view of the results is shown below.

[2] TridentNet
The training parameters were set similarly to those of Faster R-CNN.
The validation-set mAP during training is shown below.
(1) Mean average precision (mAP), averaged over IoU thresholds 0.50-0.95 (coco/bbox_mAP)

(2) Mean average precision at IoU threshold 0.5 (coco/bbox_mAP_50)

(3) Mean average precision at IoU threshold 0.75 (coco/bbox_mAP_75)

(4) Mean average precision for large objects (coco/bbox_mAP_l)

The mAP values plotted in figures (1) to (4) are summarized in the figure below.

The test results on the OOD (out-of-distribution) dataset are shown below.

A zoomed-in view of the results is shown below.

VIII. Conclusions and Reflections
1. Two-stage object detection models have the following main characteristics:
(1) Region proposal stage: generating candidate boxes on the image first reduces the number of regions that must be processed, improving efficiency.
(2) Classification and localization stage: the extracted proposals are classified and their positions refined, determining each object's category and localizing its bounding box accurately.
(3) End-to-end training: the whole model is usually trained end to end, learning its parameters by jointly optimizing the region proposal and classification/localization tasks.
2. Representative two-stage detectors include R-CNN, Faster R-CNN and Mask R-CNN.
3. Object detection models are evaluated with metrics such as precision, recall and mean average precision (mAP).
4. TridentNet (Scale-Aware Trident Networks), proposed by researchers from TuSimple and the Chinese Academy of Sciences, is a two-stage object detector designed to improve detection under large variations in object scale.
5. The main features of TridentNet are:
(1) Multi-scale detection: three parallel detection branches are built on the backbone, each specialized for objects of a different scale range, so that small and large objects are both handled effectively.
(2) Dilated convolutions: the branches are structurally identical and differ only in the dilation rate of their convolutions, which gives each branch a different receptive field.
(3) Weight sharing with scale-aware training: the three branches share their weights, and during training each branch only sees objects within its designated scale range, improving accuracy without multiplying the parameter count. (A minimal sketch of this shared-weight trident block follows.)
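The shared-weight idea can be sketched in a few lines (a hypothetical module written for this report, not the official TridentNet implementation): a single 3x3 weight tensor is applied with dilation rates 1, 2 and 3, so the branches share parameters but see different receptive fields.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TridentConv(nn.Module):
    """Trident-style block: one shared 3x3 weight, three dilation rates."""

    def __init__(self, channels, dilations=(1, 2, 3)):
        super().__init__()
        self.dilations = dilations
        self.weight = nn.Parameter(torch.empty(channels, channels, 3, 3))
        nn.init.kaiming_normal_(self.weight)

    def forward(self, x):
        # padding=d keeps the spatial size identical across branches
        return [F.conv2d(x, self.weight, padding=d, dilation=d)
                for d in self.dilations]

outs = TridentConv(16)(torch.rand(1, 16, 32, 32))
print([tuple(o.shape) for o in outs])  # three tensors of shape (1, 16, 32, 32)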
6. mAP stands for mean average precision: the mean of the per-class AP (average precision) values. By default it is computed over objects of all sizes.
7. In coco/bbox_mAP_s, coco/bbox_mAP_m and coco/bbox_mAP_l, the suffixes s, m and l stand for small, medium and large objects respectively. These metrics measure the model's detection performance on objects in different size ranges.
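All of these COCO-style numbers are produced by pycocotools' COCOeval, which mmdetection's CocoMetric wraps. A minimal evaluation sketch (the file paths are placeholders; a ground-truth annotation file and a detection-result file in COCO JSON format are assumed):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('annotations/instances_val.json')  # ground truth (placeholder)
coco_dt = coco_gt.loadRes('detections.json')      # detections (placeholder)

# summarize() prints AP over IoU 0.50:0.95, AP50, AP75 and the
# small/medium/large breakdown discussed in points 6 and 7 above.
coco_eval = COCOeval(coco_gt, coco_dt, iouType='bbox')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()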
IX. Main Code
Plotting
# Open the training log file
root = '/home/ubuntu/mmdetection-main/work_dirs/faster_rcnn_r50_fpn_1x_coco/20240421_074656/20240421_074656.log'
with open(root, 'r') as file:
    lines = file.readlines()

# Lists holding the extracted metric values
map1 = []  # coco/bbox_mAP
map2 = []  # coco/bbox_mAP_50
map3 = []  # coco/bbox_mAP_75
map4 = []  # coco/bbox_mAP_l

# Walk through every line of the log and extract the values
for line in lines:
    if 'coco/bbox_mAP' in line:
        values = line.split('coco/bbox_mAP: ')  # split the string and take the value part
        if len(values) > 1:
            value_str = values[1].strip().split()[0]
            map1.append(float(value_str))  # convert to float and store
        values = line.split('coco/bbox_mAP_50: ')
        if len(values) > 1:
            value_str = values[1].strip().split()[0]
            map2.append(float(value_str))
        values = line.split('coco/bbox_mAP_75: ')
        if len(values) > 1:
            value_str = values[1].strip().split()[0]
            map3.append(float(value_str))
        values = line.split('coco/bbox_mAP_l: ')
        if len(values) > 1:
            value_str = values[1].strip().split()[0]
            map4.append(float(value_str))

print(map1)
print(map2)
print(map3)
print(map4)

import matplotlib.pyplot as plt

# Number of evaluation records (assumed: one record per epoch)
iterations = range(1, len(map1) + 1)

# Draw and save one curve per metric
plt.plot(iterations, map1, marker='o')
plt.xlabel('Iterations')
plt.ylabel('mAP')
plt.title('coco/bbox_mAP on VALID over Iterations')
plt.grid(True)
plt.savefig('map1.png')
plt.close()

plt.plot(iterations, map2, marker='o')
plt.xlabel('Iterations')
plt.ylabel('mAP')
plt.title('coco/bbox_mAP_50 on VALID over Iterations')
plt.grid(True)
plt.savefig('map2.png')
plt.close()

plt.plot(iterations, map3, marker='o')
plt.xlabel('Iterations')
plt.ylabel('mAP')
plt.title('coco/bbox_mAP_75 on VALID over Iterations')
plt.grid(True)
plt.savefig('map3.png')
plt.close()

plt.plot(iterations, map4, marker='o')
plt.xlabel('Iterations')
plt.ylabel('mAP')
plt.title('coco/bbox_mAP_l on VALID over Iterations')
plt.grid(True)
plt.savefig('map4.png')
plt.close()
"""
map1 = [] # coco/bbox_mAP
map2 = [] # coco/bbox_mAP_50
map3 = [] # coco/bbox_mAP_75
map4 = [] # coco/bbox_mAP_l
"""
Backbone: ResNet
# Copyright (c) OpenMMLab. All rights reserved.
import warnings

import torch.nn as nn
import torch.utils.checkpoint as cp
from mmcv.cnn import build_conv_layer, build_norm_layer, build_plugin_layer
from mmengine.model import BaseModule
from torch.nn.modules.batchnorm import _BatchNorm

from mmdet.registry import MODELS
from ..layers import ResLayer


class BasicBlock(BaseModule):
    expansion = 1

    def __init__(self,
                 inplanes,
                 planes,
                 stride=1,
                 dilation=1,
                 downsample=None,
                 style='pytorch',
                 with_cp=False,
                 conv_cfg=None,
                 norm_cfg=dict(type='BN'),
                 dcn=None,
                 plugins=None,
                 init_cfg=None):
        super(BasicBlock, self).__init__(init_cfg)
        assert dcn is None, 'Not implemented yet.'
        assert plugins is None, 'Not implemented yet.'

        self.norm1_name, norm1 = build_norm_layer(norm_cfg, planes, postfix=1)
        self.norm2_name, norm2 = build_norm_layer(norm_cfg, planes, postfix=2)

        self.conv1 = build_conv_layer(
            conv_cfg,
            inplanes,
            planes,
            3,
            stride=stride,
            padding=dilation,
            dilation=dilation,
            bias=False)
        self.add_module(self.norm1_name, norm1)
        self.conv2 = build_conv_layer(
            conv_cfg, planes, planes, 3, padding=1, bias=False)
        self.add_module(self.norm2_name, norm2)

        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride
        self.dilation = dilation
        self.with_cp = with_cp

    @property
    def norm1(self):
        """nn.Module: normalization layer after the first convolution layer"""
        return getattr(self, self.norm1_name)

    @property
    def norm2(self):
        """nn.Module: normalization layer after the second convolution layer"""
        return getattr(self, self.norm2_name)

    def forward(self, x):
        """Forward function."""

        def _inner_forward(x):
            identity = x

            out = self.conv1(x)
            out = self.norm1(out)
            out = self.relu(out)

            out = self.conv2(out)
            out = self.norm2(out)

            if self.downsample is not None:
                identity = self.downsample(x)

            out += identity

            return out

        if self.with_cp and x.requires_grad:
            out = cp.checkpoint(_inner_forward, x)
        else:
            out = _inner_forward(x)

        out = self.relu(out)

        return out
class Bottleneck(BaseModule):
    expansion = 4

    def __init__(self,
                 inplanes,
                 planes,
                 stride=1,
                 dilation=1,
                 downsample=None,
                 style='pytorch',
                 with_cp=False,
                 conv_cfg=None,
                 norm_cfg=dict(type='BN'),
                 dcn=None,
                 plugins=None,
                 init_cfg=None):
        """Bottleneck block for ResNet.

        If style is "pytorch", the stride-two layer is the 3x3 conv layer, if
        it is "caffe", the stride-two layer is the first 1x1 conv layer.
        """
        super(Bottleneck, self).__init__(init_cfg)
        assert style in ['pytorch', 'caffe']
        assert dcn is None or isinstance(dcn, dict)
        assert plugins is None or isinstance(plugins, list)
        if plugins is not None:
            allowed_position = ['after_conv1', 'after_conv2', 'after_conv3']
            assert all(p['position'] in allowed_position for p in plugins)

        self.inplanes = inplanes
        self.planes = planes
        self.stride = stride
        self.dilation = dilation
        self.style = style
        self.with_cp = with_cp
        self.conv_cfg = conv_cfg
        self.norm_cfg = norm_cfg
        self.dcn = dcn
        self.with_dcn = dcn is not None
        self.plugins = plugins
        self.with_plugins = plugins is not None

        if self.with_plugins:
            # collect plugins for conv1/conv2/conv3
            self.after_conv1_plugins = [
                plugin['cfg'] for plugin in plugins
                if plugin['position'] == 'after_conv1'
            ]
            self.after_conv2_plugins = [
                plugin['cfg'] for plugin in plugins
                if plugin['position'] == 'after_conv2'
            ]
            self.after_conv3_plugins = [
                plugin['cfg'] for plugin in plugins
                if plugin['position'] == 'after_conv3'
            ]

        if self.style == 'pytorch':
            self.conv1_stride = 1
            self.conv2_stride = stride
        else:
            self.conv1_stride = stride
            self.conv2_stride = 1

        self.norm1_name, norm1 = build_norm_layer(norm_cfg, planes, postfix=1)
        self.norm2_name, norm2 = build_norm_layer(norm_cfg, planes, postfix=2)
        self.norm3_name, norm3 = build_norm_layer(
            norm_cfg, planes * self.expansion, postfix=3)

        self.conv1 = build_conv_layer(
            conv_cfg,
            inplanes,
            planes,
            kernel_size=1,
            stride=self.conv1_stride,
            bias=False)
        self.add_module(self.norm1_name, norm1)
        fallback_on_stride = False
        if self.with_dcn:
            fallback_on_stride = dcn.pop('fallback_on_stride', False)
        if not self.with_dcn or fallback_on_stride:
            self.conv2 = build_conv_layer(
                conv_cfg,
                planes,
                planes,
                kernel_size=3,
                stride=self.conv2_stride,
                padding=dilation,
                dilation=dilation,
                bias=False)
        else:
            assert self.conv_cfg is None, 'conv_cfg must be None for DCN'
            self.conv2 = build_conv_layer(
                dcn,
                planes,
                planes,
                kernel_size=3,
                stride=self.conv2_stride,
                padding=dilation,
                dilation=dilation,
                bias=False)
        self.add_module(self.norm2_name, norm2)
        self.conv3 = build_conv_layer(
            conv_cfg,
            planes,
            planes * self.expansion,
            kernel_size=1,
            bias=False)
        self.add_module(self.norm3_name, norm3)

        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

        if self.with_plugins:
            self.after_conv1_plugin_names = self.make_block_plugins(
                planes, self.after_conv1_plugins)
            self.after_conv2_plugin_names = self.make_block_plugins(
                planes, self.after_conv2_plugins)
            self.after_conv3_plugin_names = self.make_block_plugins(
                planes * self.expansion, self.after_conv3_plugins)

    def make_block_plugins(self, in_channels, plugins):
        """make plugins for block.

        Args:
            in_channels (int): Input channels of plugin.
            plugins (list[dict]): List of plugins cfg to build.

        Returns:
            list[str]: List of the names of plugin.
        """
        assert isinstance(plugins, list)
        plugin_names = []
        for plugin in plugins:
            plugin = plugin.copy()
            name, layer = build_plugin_layer(
                plugin,
                in_channels=in_channels,
                postfix=plugin.pop('postfix', ''))
            assert not hasattr(self, name), f'duplicate plugin {name}'
            self.add_module(name, layer)
            plugin_names.append(name)
        return plugin_names

    def forward_plugin(self, x, plugin_names):
        out = x
        for name in plugin_names:
            out = getattr(self, name)(out)
        return out

    @property
    def norm1(self):
        """nn.Module: normalization layer after the first convolution layer"""
        return getattr(self, self.norm1_name)

    @property
    def norm2(self):
        """nn.Module: normalization layer after the second convolution layer"""
        return getattr(self, self.norm2_name)

    @property
    def norm3(self):
        """nn.Module: normalization layer after the third convolution layer"""
        return getattr(self, self.norm3_name)

    def forward(self, x):
        """Forward function."""

        def _inner_forward(x):
            identity = x
            out = self.conv1(x)
            out = self.norm1(out)
            out = self.relu(out)

            if self.with_plugins:
                out = self.forward_plugin(out, self.after_conv1_plugin_names)

            out = self.conv2(out)
            out = self.norm2(out)
            out = self.relu(out)

            if self.with_plugins:
                out = self.forward_plugin(out, self.after_conv2_plugin_names)

            out = self.conv3(out)
            out = self.norm3(out)

            if self.with_plugins:
                out = self.forward_plugin(out, self.after_conv3_plugin_names)

            if self.downsample is not None:
                identity = self.downsample(x)

            out += identity

            return out

        if self.with_cp and x.requires_grad:
            out = cp.checkpoint(_inner_forward, x)
        else:
            out = _inner_forward(x)

        out = self.relu(out)

        return out
@MODELS.register_module()
class ResNet(BaseModule):
    """ResNet backbone.

    Args:
        depth (int): Depth of resnet, from {18, 34, 50, 101, 152}.
        stem_channels (int | None): Number of stem channels. If not specified,
            it will be the same as `base_channels`. Default: None.
        base_channels (int): Number of base channels of res layer. Default: 64.
        in_channels (int): Number of input image channels. Default: 3.
        num_stages (int): Resnet stages. Default: 4.
        strides (Sequence[int]): Strides of the first block of each stage.
        dilations (Sequence[int]): Dilation of each stage.
        out_indices (Sequence[int]): Output from which stages.
        style (str): `pytorch` or `caffe`. If set to "pytorch", the stride-two
            layer is the 3x3 conv layer, otherwise the stride-two layer is
            the first 1x1 conv layer.
        deep_stem (bool): Replace 7x7 conv in input stem with 3 3x3 conv
        avg_down (bool): Use AvgPool instead of stride conv when
            downsampling in the bottleneck.
        frozen_stages (int): Stages to be frozen (stop grad and set eval mode).
            -1 means not freezing any parameters.
        norm_cfg (dict): Dictionary to construct and config norm layer.
        norm_eval (bool): Whether to set norm layers to eval mode, namely,
            freeze running stats (mean and var). Note: Effect on Batch Norm
            and its variants only.
        plugins (list[dict]): List of plugins for stages, each dict contains:

            - cfg (dict, required): Cfg dict to build plugin.
            - position (str, required): Position inside block to insert
              plugin, options are 'after_conv1', 'after_conv2', 'after_conv3'.
            - stages (tuple[bool], optional): Stages to apply plugin, length
              should be same as 'num_stages'.
        with_cp (bool): Use checkpoint or not. Using checkpoint will save some
            memory while slowing down the training speed.
        zero_init_residual (bool): Whether to use zero init for last norm layer
            in resblocks to let them behave as identity.
        pretrained (str, optional): model pretrained path. Default: None
        init_cfg (dict or list[dict], optional): Initialization config dict.
            Default: None

    Example:
        >>> from mmdet.models import ResNet
        >>> import torch
        >>> self = ResNet(depth=18)
        >>> self.eval()
        >>> inputs = torch.rand(1, 3, 32, 32)
        >>> level_outputs = self.forward(inputs)
        >>> for level_out in level_outputs:
        ...     print(tuple(level_out.shape))
        (1, 64, 8, 8)
        (1, 128, 4, 4)
        (1, 256, 2, 2)
        (1, 512, 1, 1)
    """

    arch_settings = {
        18: (BasicBlock, (2, 2, 2, 2)),
        34: (BasicBlock, (3, 4, 6, 3)),
        50: (Bottleneck, (3, 4, 6, 3)),
        101: (Bottleneck, (3, 4, 23, 3)),
        152: (Bottleneck, (3, 8, 36, 3))
    }

    def __init__(self,
                 depth,
                 in_channels=3,
                 stem_channels=None,
                 base_channels=64,
                 num_stages=4,
                 strides=(1, 2, 2, 2),
                 dilations=(1, 1, 1, 1),
                 out_indices=(0, 1, 2, 3),
                 style='pytorch',
                 deep_stem=False,
                 avg_down=False,
                 frozen_stages=-1,
                 conv_cfg=None,
                 norm_cfg=dict(type='BN', requires_grad=True),
                 norm_eval=True,
                 dcn=None,
                 stage_with_dcn=(False, False, False, False),
                 plugins=None,
                 with_cp=False,
                 zero_init_residual=True,
                 pretrained=None,
                 init_cfg=None):
        super(ResNet, self).__init__(init_cfg)
        self.zero_init_residual = zero_init_residual
        if depth not in self.arch_settings:
            raise KeyError(f'invalid depth {depth} for resnet')

        block_init_cfg = None
        assert not (init_cfg and pretrained), \
            'init_cfg and pretrained cannot be specified at the same time'
        if isinstance(pretrained, str):
            warnings.warn('DeprecationWarning: pretrained is deprecated, '
                          'please use "init_cfg" instead')
            self.init_cfg = dict(type='Pretrained', checkpoint=pretrained)
        elif pretrained is None:
            if init_cfg is None:
                self.init_cfg = [
                    dict(type='Kaiming', layer='Conv2d'),
                    dict(
                        type='Constant',
                        val=1,
                        layer=['_BatchNorm', 'GroupNorm'])
                ]
                block = self.arch_settings[depth][0]
                if self.zero_init_residual:
                    if block is BasicBlock:
                        block_init_cfg = dict(
                            type='Constant',
                            val=0,
                            override=dict(name='norm2'))
                    elif block is Bottleneck:
                        block_init_cfg = dict(
                            type='Constant',
                            val=0,
                            override=dict(name='norm3'))
        else:
            raise TypeError('pretrained must be a str or None')

        self.depth = depth
        if stem_channels is None:
            stem_channels = base_channels
        self.stem_channels = stem_channels
        self.base_channels = base_channels
        self.num_stages = num_stages
        assert num_stages >= 1 and num_stages <= 4
        self.strides = strides
        self.dilations = dilations
        assert len(strides) == len(dilations) == num_stages
        self.out_indices = out_indices
        assert max(out_indices) < num_stages
        self.style = style
        self.deep_stem = deep_stem
        self.avg_down = avg_down
        self.frozen_stages = frozen_stages
        self.conv_cfg = conv_cfg
        self.norm_cfg = norm_cfg
        self.with_cp = with_cp
        self.norm_eval = norm_eval
        self.dcn = dcn
        self.stage_with_dcn = stage_with_dcn
        if dcn is not None:
            assert len(stage_with_dcn) == num_stages
        self.plugins = plugins
        self.block, stage_blocks = self.arch_settings[depth]
        self.stage_blocks = stage_blocks[:num_stages]
        self.inplanes = stem_channels

        self._make_stem_layer(in_channels, stem_channels)

        self.res_layers = []
        for i, num_blocks in enumerate(self.stage_blocks):
            stride = strides[i]
            dilation = dilations[i]
            dcn = self.dcn if self.stage_with_dcn[i] else None
            if plugins is not None:
                stage_plugins = self.make_stage_plugins(plugins, i)
            else:
                stage_plugins = None
            planes = base_channels * 2**i
            res_layer = self.make_res_layer(
                block=self.block,
                inplanes=self.inplanes,
                planes=planes,
                num_blocks=num_blocks,
                stride=stride,
                dilation=dilation,
                style=self.style,
                avg_down=self.avg_down,
                with_cp=with_cp,
                conv_cfg=conv_cfg,
                norm_cfg=norm_cfg,
                dcn=dcn,
                plugins=stage_plugins,
                init_cfg=block_init_cfg)
            self.inplanes = planes * self.block.expansion
            layer_name = f'layer{i + 1}'
            self.add_module(layer_name, res_layer)
            self.res_layers.append(layer_name)

        self._freeze_stages()

        self.feat_dim = self.block.expansion * base_channels * 2**(
            len(self.stage_blocks) - 1)
    def make_stage_plugins(self, plugins, stage_idx):
        """Make plugins for ResNet ``stage_idx`` th stage.

        Currently we support to insert ``context_block``,
        ``empirical_attention_block``, ``nonlocal_block`` into the backbone
        like ResNet/ResNeXt. They could be inserted after conv1/conv2/conv3 of
        Bottleneck.

        An example of plugins format could be:

        Examples:
            >>> plugins=[
            ...     dict(cfg=dict(type='xxx', arg1='xxx'),
            ...          stages=(False, True, True, True),
            ...          position='after_conv2'),
            ...     dict(cfg=dict(type='yyy'),
            ...          stages=(True, True, True, True),
            ...          position='after_conv3'),
            ...     dict(cfg=dict(type='zzz', postfix='1'),
            ...          stages=(True, True, True, True),
            ...          position='after_conv3'),
            ...     dict(cfg=dict(type='zzz', postfix='2'),
            ...          stages=(True, True, True, True),
            ...          position='after_conv3')
            ... ]
            >>> self = ResNet(depth=18)
            >>> stage_plugins = self.make_stage_plugins(plugins, 0)
            >>> assert len(stage_plugins) == 3

        Suppose ``stage_idx=0``, the structure of blocks in the stage would be:

        .. code-block:: none

            conv1 -> conv2 -> conv3 -> yyy -> zzz1 -> zzz2

        Suppose ``stage_idx=1``, the structure of blocks in the stage would be:

        .. code-block:: none

            conv1 -> conv2 -> xxx -> conv3 -> yyy -> zzz1 -> zzz2

        If stages is missing, the plugin would be applied to all stages.

        Args:
            plugins (list[dict]): List of plugins cfg to build. The postfix is
                required if multiple same type plugins are inserted.
            stage_idx (int): Index of stage to build

        Returns:
            list[dict]: Plugins for current stage
        """
        stage_plugins = []
        for plugin in plugins:
            plugin = plugin.copy()
            stages = plugin.pop('stages', None)
            assert stages is None or len(stages) == self.num_stages
            # whether to insert plugin into current stage
            if stages is None or stages[stage_idx]:
                stage_plugins.append(plugin)

        return stage_plugins

    def make_res_layer(self, **kwargs):
        """Pack all blocks in a stage into a ``ResLayer``."""
        return ResLayer(**kwargs)

    @property
    def norm1(self):
        """nn.Module: the normalization layer named "norm1" """
        return getattr(self, self.norm1_name)

    def _make_stem_layer(self, in_channels, stem_channels):
        if self.deep_stem:
            self.stem = nn.Sequential(
                build_conv_layer(
                    self.conv_cfg,
                    in_channels,
                    stem_channels // 2,
                    kernel_size=3,
                    stride=2,
                    padding=1,
                    bias=False),
                build_norm_layer(self.norm_cfg, stem_channels // 2)[1],
                nn.ReLU(inplace=True),
                build_conv_layer(
                    self.conv_cfg,
                    stem_channels // 2,
                    stem_channels // 2,
                    kernel_size=3,
                    stride=1,
                    padding=1,
                    bias=False),
                build_norm_layer(self.norm_cfg, stem_channels // 2)[1],
                nn.ReLU(inplace=True),
                build_conv_layer(
                    self.conv_cfg,
                    stem_channels // 2,
                    stem_channels,
                    kernel_size=3,
                    stride=1,
                    padding=1,
                    bias=False),
                build_norm_layer(self.norm_cfg, stem_channels)[1],
                nn.ReLU(inplace=True))
        else:
            self.conv1 = build_conv_layer(
                self.conv_cfg,
                in_channels,
                stem_channels,
                kernel_size=7,
                stride=2,
                padding=3,
                bias=False)
            self.norm1_name, norm1 = build_norm_layer(
                self.norm_cfg, stem_channels, postfix=1)
            self.add_module(self.norm1_name, norm1)
            self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

    def _freeze_stages(self):
        if self.frozen_stages >= 0:
            if self.deep_stem:
                self.stem.eval()
                for param in self.stem.parameters():
                    param.requires_grad = False
            else:
                self.norm1.eval()
                for m in [self.conv1, self.norm1]:
                    for param in m.parameters():
                        param.requires_grad = False

        for i in range(1, self.frozen_stages + 1):
            m = getattr(self, f'layer{i}')
            m.eval()
            for param in m.parameters():
                param.requires_grad = False

    def forward(self, x):
        """Forward function."""
        if self.deep_stem:
            x = self.stem(x)
        else:
            x = self.conv1(x)
            x = self.norm1(x)
            x = self.relu(x)
        x = self.maxpool(x)
        outs = []
        for i, layer_name in enumerate(self.res_layers):
            res_layer = getattr(self, layer_name)
            x = res_layer(x)
            if i in self.out_indices:
                outs.append(x)
        return tuple(outs)

    def train(self, mode=True):
        """Convert the model into training mode while keep normalization layer
        freezed."""
        super(ResNet, self).train(mode)
        self._freeze_stages()
        if mode and self.norm_eval:
            for m in self.modules():
                # trick: eval have effect on BatchNorm only
                if isinstance(m, _BatchNorm):
                    m.eval()


@MODELS.register_module()
class ResNetV1d(ResNet):
    r"""ResNetV1d variant described in `Bag of Tricks
    <https://arxiv.org/pdf/1812.01187.pdf>`_.

    Compared with default ResNet(ResNetV1b), ResNetV1d replaces the 7x7 conv in
    the input stem with three 3x3 convs. And in the downsampling block, a 2x2
    avg_pool with stride 2 is added before conv, whose stride is changed to 1.
    """

    def __init__(self, **kwargs):
        super(ResNetV1d, self).__init__(
            deep_stem=True, avg_down=True, **kwargs)
Neck: FPN
# Copyright (c) OpenMMLab. All rights reserved.
from typing import List, Tuple, Union

import torch.nn as nn
import torch.nn.functional as F
from mmcv.cnn import ConvModule
from mmengine.model import BaseModule
from torch import Tensor

from mmdet.registry import MODELS
from mmdet.utils import ConfigType, MultiConfig, OptConfigType


@MODELS.register_module()
class FPN(BaseModule):
    r"""Feature Pyramid Network.

    This is an implementation of paper `Feature Pyramid Networks for Object
    Detection <https://arxiv.org/abs/1612.03144>`_.

    Args:
        in_channels (list[int]): Number of input channels per scale.
        out_channels (int): Number of output channels (used at each scale).
        num_outs (int): Number of output scales.
        start_level (int): Index of the start input backbone level used to
            build the feature pyramid. Defaults to 0.
        end_level (int): Index of the end input backbone level (exclusive) to
            build the feature pyramid. Defaults to -1, which means the
            last level.
        add_extra_convs (bool | str): If bool, it decides whether to add conv
            layers on top of the original feature maps. Defaults to False.
            If True, it is equivalent to `add_extra_convs='on_input'`.
            If str, it specifies the source feature map of the extra convs.
            Only the following options are allowed

            - 'on_input': Last feat map of neck inputs (i.e. backbone feature).
            - 'on_lateral': Last feature map after lateral convs.
            - 'on_output': The last output feature map after fpn convs.
        relu_before_extra_convs (bool): Whether to apply relu before the extra
            conv. Defaults to False.
        no_norm_on_lateral (bool): Whether to apply norm on lateral.
            Defaults to False.
        conv_cfg (:obj:`ConfigDict` or dict, optional): Config dict for
            convolution layer. Defaults to None.
        norm_cfg (:obj:`ConfigDict` or dict, optional): Config dict for
            normalization layer. Defaults to None.
        act_cfg (:obj:`ConfigDict` or dict, optional): Config dict for
            activation layer in ConvModule. Defaults to None.
        upsample_cfg (:obj:`ConfigDict` or dict, optional): Config dict
            for interpolate layer. Defaults to dict(mode='nearest').
        init_cfg (:obj:`ConfigDict` or dict or list[:obj:`ConfigDict` or \
            dict]): Initialization config dict.

    Example:
        >>> import torch
        >>> in_channels = [2, 3, 5, 7]
        >>> scales = [340, 170, 84, 43]
        >>> inputs = [torch.rand(1, c, s, s)
        ...           for c, s in zip(in_channels, scales)]
        >>> self = FPN(in_channels, 11, len(in_channels)).eval()
        >>> outputs = self.forward(inputs)
        >>> for i in range(len(outputs)):
        ...     print(f'outputs[{i}].shape = {outputs[i].shape}')
        outputs[0].shape = torch.Size([1, 11, 340, 340])
        outputs[1].shape = torch.Size([1, 11, 170, 170])
        outputs[2].shape = torch.Size([1, 11, 84, 84])
        outputs[3].shape = torch.Size([1, 11, 43, 43])
    """

    def __init__(
        self,
        in_channels: List[int],
        out_channels: int,
        num_outs: int,
        start_level: int = 0,
        end_level: int = -1,
        add_extra_convs: Union[bool, str] = False,
        relu_before_extra_convs: bool = False,
        no_norm_on_lateral: bool = False,
        conv_cfg: OptConfigType = None,
        norm_cfg: OptConfigType = None,
        act_cfg: OptConfigType = None,
        upsample_cfg: ConfigType = dict(mode='nearest'),
        init_cfg: MultiConfig = dict(
            type='Xavier', layer='Conv2d', distribution='uniform')
    ) -> None:
        super().__init__(init_cfg=init_cfg)
        assert isinstance(in_channels, list)
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.num_ins = len(in_channels)
        self.num_outs = num_outs
        self.relu_before_extra_convs = relu_before_extra_convs
        self.no_norm_on_lateral = no_norm_on_lateral
        self.fp16_enabled = False
        self.upsample_cfg = upsample_cfg.copy()

        if end_level == -1 or end_level == self.num_ins - 1:
            self.backbone_end_level = self.num_ins
            assert num_outs >= self.num_ins - start_level
        else:
            # if end_level is not the last level, no extra level is allowed
            self.backbone_end_level = end_level + 1
            assert end_level < self.num_ins
            assert num_outs == end_level - start_level + 1
        self.start_level = start_level
        self.end_level = end_level
        self.add_extra_convs = add_extra_convs
        assert isinstance(add_extra_convs, (str, bool))
        if isinstance(add_extra_convs, str):
            # Extra_convs_source choices: 'on_input', 'on_lateral', 'on_output'
            assert add_extra_convs in ('on_input', 'on_lateral', 'on_output')
        elif add_extra_convs:  # True
            self.add_extra_convs = 'on_input'

        self.lateral_convs = nn.ModuleList()
        self.fpn_convs = nn.ModuleList()

        for i in range(self.start_level, self.backbone_end_level):
            l_conv = ConvModule(
                in_channels[i],
                out_channels,
                1,
                conv_cfg=conv_cfg,
                norm_cfg=norm_cfg if not self.no_norm_on_lateral else None,
                act_cfg=act_cfg,
                inplace=False)
            fpn_conv = ConvModule(
                out_channels,
                out_channels,
                3,
                padding=1,
                conv_cfg=conv_cfg,
                norm_cfg=norm_cfg,
                act_cfg=act_cfg,
                inplace=False)

            self.lateral_convs.append(l_conv)
            self.fpn_convs.append(fpn_conv)

        # add extra conv layers (e.g., RetinaNet)
        extra_levels = num_outs - self.backbone_end_level + self.start_level
        if self.add_extra_convs and extra_levels >= 1:
            for i in range(extra_levels):
                if i == 0 and self.add_extra_convs == 'on_input':
                    in_channels = self.in_channels[self.backbone_end_level - 1]
                else:
                    in_channels = out_channels
                extra_fpn_conv = ConvModule(
                    in_channels,
                    out_channels,
                    3,
                    stride=2,
                    padding=1,
                    conv_cfg=conv_cfg,
                    norm_cfg=norm_cfg,
                    act_cfg=act_cfg,
                    inplace=False)
                self.fpn_convs.append(extra_fpn_conv)

    def forward(self, inputs: Tuple[Tensor]) -> tuple:
        """Forward function.

        Args:
            inputs (tuple[Tensor]): Features from the upstream network, each
                is a 4D-tensor.

        Returns:
            tuple: Feature maps, each is a 4D-tensor.
        """
        assert len(inputs) == len(self.in_channels)

        # build laterals
        laterals = [
            lateral_conv(inputs[i + self.start_level])
            for i, lateral_conv in enumerate(self.lateral_convs)
        ]

        # build top-down path
        used_backbone_levels = len(laterals)
        for i in range(used_backbone_levels - 1, 0, -1):
            # In some cases, fixing `scale factor` (e.g. 2) is preferred, but
            # it cannot co-exist with `size` in `F.interpolate`.
            if 'scale_factor' in self.upsample_cfg:
                # fix runtime error of "+=" inplace operation in PyTorch 1.10
                laterals[i - 1] = laterals[i - 1] + F.interpolate(
                    laterals[i], **self.upsample_cfg)
            else:
                prev_shape = laterals[i - 1].shape[2:]
                laterals[i - 1] = laterals[i - 1] + F.interpolate(
                    laterals[i], size=prev_shape, **self.upsample_cfg)

        # build outputs
        # part 1: from original levels
        outs = [
            self.fpn_convs[i](laterals[i]) for i in range(used_backbone_levels)
        ]
        # part 2: add extra levels
        if self.num_outs > len(outs):
            # use max pool to get more levels on top of outputs
            # (e.g., Faster R-CNN, Mask R-CNN)
            if not self.add_extra_convs:
                for i in range(self.num_outs - used_backbone_levels):
                    outs.append(F.max_pool2d(outs[-1], 1, stride=2))
            # add conv layers on top of original feature maps (RetinaNet)
            else:
                if self.add_extra_convs == 'on_input':
                    extra_source = inputs[self.backbone_end_level - 1]
                elif self.add_extra_convs == 'on_lateral':
                    extra_source = laterals[-1]
                elif self.add_extra_convs == 'on_output':
                    extra_source = outs[-1]
                else:
                    raise NotImplementedError
                outs.append(self.fpn_convs[used_backbone_levels](extra_source))
                for i in range(used_backbone_levels + 1, self.num_outs):
                    if self.relu_before_extra_convs:
                        outs.append(self.fpn_convs[i](F.relu(outs[-1])))
                    else:
                        outs.append(self.fpn_convs[i](outs[-1]))
        return tuple(outs)
Configuration file
auto_scale_lr = dict(base_batch_size=16, enable=False)
backend_args = None
data_root = 'data/coco/'
dataset_type = 'mmdet.datasets.CocoDataset'
default_hooks = dict(
checkpoint=dict(interval=1, type='mmengine.hooks.CheckpointHook'),
logger=dict(interval=50, type='mmengine.hooks.LoggerHook'),
param_scheduler=dict(type='mmengine.hooks.ParamSchedulerHook'),
sampler_seed=dict(type='mmengine.hooks.DistSamplerSeedHook'),
timer=dict(type='mmengine.hooks.IterTimerHook'),
visualization=dict(type='mmdet.engine.hooks.DetVisualizationHook'))
default_scope = None
env_cfg = dict(
cudnn_benchmark=False,
dist_cfg=dict(backend='nccl'),
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
launcher = 'none'
load_from = None
log_level = 'INFO'
log_processor = dict(
by_epoch=True, type='mmengine.runner.LogProcessor', window_size=50)
model = dict(
backbone=dict(
depth=50,
frozen_stages=1,
init_cfg=dict(checkpoint='torchvision://resnet50', type='Pretrained'),
norm_cfg=dict(requires_grad=True, type='torch.nn.BatchNorm2d'),
norm_eval=True,
num_stages=4,
out_indices=(
0,
1,
2,
3,
),
style='pytorch',
type='mmdet.models.backbones.resnet.ResNet'),
data_preprocessor=dict(
bgr_to_rgb=True,
mean=[
123.675,
116.28,
103.53,
],
pad_size_divisor=32,
std=[
58.395,
57.12,
57.375,
],
type=
'mmdet.models.data_preprocessors.data_preprocessor.DetDataPreprocessor'
),
neck=dict(
in_channels=[
256,
512,
1024,
2048,
],
num_outs=5,
out_channels=256,
type='mmdet.models.necks.fpn.FPN'),
roi_head=dict(
bbox_head=dict(
bbox_coder=dict(
target_means=[
0.0,
0.0,
0.0,
0.0,
],
target_stds=[
0.1,
0.1,
0.2,
0.2,
],
type=
'mmdet.models.task_modules.coders.delta_xywh_bbox_coder.DeltaXYWHBBoxCoder'
),
fc_out_channels=1024,
in_channels=256,
loss_bbox=dict(
loss_weight=1.0,
type='mmdet.models.losses.smooth_l1_loss.L1Loss'),
loss_cls=dict(
loss_weight=1.0,
type='mmdet.models.losses.cross_entropy_loss.CrossEntropyLoss',
use_sigmoid=False),
num_classes=80,
reg_class_agnostic=False,
roi_feat_size=7,
type=
'mmdet.models.roi_heads.bbox_heads.convfc_bbox_head.Shared2FCBBoxHead'
),
bbox_roi_extractor=dict(
featmap_strides=[
4,
8,
16,
32,
],
out_channels=256,
roi_layer=dict(
output_size=7, sampling_ratio=0, type='mmcv.ops.RoIAlign'),
type=
'mmdet.models.roi_heads.roi_extractors.single_level_roi_extractor.SingleRoIExtractor'
),
type='mmdet.models.roi_heads.standard_roi_head.StandardRoIHead'),
rpn_head=dict(
anchor_generator=dict(
ratios=[
0.5,
1.0,
2.0,
],
scales=[
8,
],
strides=[
4,
8,
16,
32,
64,
],
type=
'mmdet.models.task_modules.prior_generators.anchor_generator.AnchorGenerator'
),
bbox_coder=dict(
target_means=[
0.0,
0.0,
0.0,
0.0,
],
target_stds=[
1.0,
1.0,
1.0,
1.0,
],
type=
'mmdet.models.task_modules.coders.delta_xywh_bbox_coder.DeltaXYWHBBoxCoder'
),
feat_channels=256,
in_channels=256,
loss_bbox=dict(
loss_weight=1.0, type='mmdet.models.losses.smooth_l1_loss.L1Loss'),
loss_cls=dict(
loss_weight=1.0,
type='mmdet.models.losses.cross_entropy_loss.CrossEntropyLoss',
use_sigmoid=True),
type='mmdet.models.dense_heads.rpn_head.RPNHead'),
test_cfg=dict(
rcnn=dict(
max_per_img=100,
nms=dict(iou_threshold=0.5, type='mmcv.ops.nms'),
score_thr=0.05),
rpn=dict(
max_per_img=1000,
min_bbox_size=0,
nms=dict(iou_threshold=0.7, type='mmcv.ops.nms'),
nms_pre=1000)),
train_cfg=dict(
rcnn=dict(
assigner=dict(
ignore_iof_thr=-1,
match_low_quality=False,
min_pos_iou=0.5,
neg_iou_thr=0.5,
pos_iou_thr=0.5,
type=
'mmdet.models.task_modules.assigners.max_iou_assigner.MaxIoUAssigner'
),
debug=False,
pos_weight=-1,
sampler=dict(
add_gt_as_proposals=True,
neg_pos_ub=-1,
num=512,
pos_fraction=0.25,
type=
'mmdet.models.task_modules.samplers.random_sampler.RandomSampler'
)),
rpn=dict(
allowed_border=-1,
assigner=dict(
ignore_iof_thr=-1,
match_low_quality=True,
min_pos_iou=0.3,
neg_iou_thr=0.3,
pos_iou_thr=0.7,
type=
'mmdet.models.task_modules.assigners.max_iou_assigner.MaxIoUAssigner'
),
debug=False,
pos_weight=-1,
sampler=dict(
add_gt_as_proposals=False,
neg_pos_ub=-1,
num=256,
pos_fraction=0.5,
type=
'mmdet.models.task_modules.samplers.random_sampler.RandomSampler'
)),
rpn_proposal=dict(
max_per_img=1000,
min_bbox_size=0,
nms=dict(iou_threshold=0.7, type='mmcv.ops.nms'),
nms_pre=2000)),
type='mmdet.models.detectors.faster_rcnn.FasterRCNN')
optim_wrapper = dict(
optimizer=dict(
lr=0.02, momentum=0.9, type='torch.optim.sgd.SGD',
weight_decay=0.0001),
type='mmengine.optim.optimizer.optimizer_wrapper.OptimWrapper')
param_scheduler = [
dict(
begin=0,
by_epoch=False,
end=500,
start_factor=0.001,
type='mmengine.optim.scheduler.lr_scheduler.LinearLR'),
dict(
begin=0,
by_epoch=True,
end=12,
gamma=0.1,
milestones=[
8,
11,
],
type='mmengine.optim.scheduler.lr_scheduler.MultiStepLR'),
]
resume = False
test_cfg = dict(type='mmengine.runner.loops.TestLoop')
test_dataloader = dict(
batch_size=1,
dataset=dict(
ann_file='annotations/instances_val2017.json',
backend_args=None,
data_prefix=dict(img='val2017/'),
data_root='data/coco/',
pipeline=[
dict(backend_args=None, type='mmcv.transforms.LoadImageFromFile'),
dict(
keep_ratio=True,
scale=(
1333,
800,
),
type='mmdet.datasets.transforms.Resize'),
dict(
type='mmdet.datasets.transforms.LoadAnnotations',
with_bbox=True),
dict(
meta_keys=(
'img_id',
'img_path',
'ori_shape',
'img_shape',
'scale_factor',
),
type='mmdet.datasets.transforms.PackDetInputs'),
],
test_mode=True,
type='mmdet.datasets.CocoDataset'),
drop_last=False,
num_workers=2,
persistent_workers=True,
sampler=dict(
shuffle=False, type='mmengine.dataset.sampler.DefaultSampler'))
test_evaluator = dict(
ann_file='data/coco/annotations/instances_val2017.json',
backend_args=None,
format_only=False,
metric='bbox',
type='mmdet.evaluation.CocoMetric')
test_pipeline = [
dict(backend_args=None, type='mmcv.transforms.LoadImageFromFile'),
dict(
keep_ratio=True,
scale=(
1333,
800,
),
type='mmdet.datasets.transforms.Resize'),
dict(type='mmdet.datasets.transforms.LoadAnnotations', with_bbox=True),
dict(
meta_keys=(
'img_id',
'img_path',
'ori_shape',
'img_shape',
'scale_factor',
),
type='mmdet.datasets.transforms.PackDetInputs'),
]
train_cfg = dict(
max_epochs=12,
type='mmengine.runner.loops.EpochBasedTrainLoop',
val_interval=1)
train_dataloader = dict(
batch_sampler=dict(type='mmdet.datasets.AspectRatioBatchSampler'),
batch_size=2,
dataset=dict(
ann_file='annotations/instances_train2017.json',
backend_args=None,
data_prefix=dict(img='train2017/'),
data_root='data/coco/',
filter_cfg=dict(filter_empty_gt=True, min_size=32),
pipeline=[
dict(backend_args=None, type='mmcv.transforms.LoadImageFromFile'),
dict(
type='mmdet.datasets.transforms.LoadAnnotations',
with_bbox=True),
dict(
keep_ratio=True,
scale=(
1333,
800,
),
type='mmdet.datasets.transforms.Resize'),
dict(prob=0.5, type='mmdet.datasets.transforms.RandomFlip'),
dict(type='mmdet.datasets.transforms.PackDetInputs'),
],
type='mmdet.datasets.CocoDataset'),
num_workers=2,
persistent_workers=True,
sampler=dict(shuffle=True, type='mmengine.dataset.sampler.DefaultSampler'))
train_pipeline = [
dict(backend_args=None, type='mmcv.transforms.LoadImageFromFile'),
dict(type='mmdet.datasets.transforms.LoadAnnotations', with_bbox=True),
dict(
keep_ratio=True,
scale=(
1333,
800,
),
type='mmdet.datasets.transforms.Resize'),
dict(prob=0.5, type='mmdet.datasets.transforms.RandomFlip'),
dict(type='mmdet.datasets.transforms.PackDetInputs'),
]
val_cfg = dict(type='mmengine.runner.loops.ValLoop')
val_dataloader = dict(
batch_size=1,
dataset=dict(
ann_file='annotations/instances_val2017.json',
backend_args=None,
data_prefix=dict(img='val2017/'),
data_root='data/coco/',
pipeline=[
dict(backend_args=None, type='mmcv.transforms.LoadImageFromFile'),
dict(
keep_ratio=True,
scale=(
1333,
800,
),
type='mmdet.datasets.transforms.Resize'),
dict(
type='mmdet.datasets.transforms.LoadAnnotations',
with_bbox=True),
dict(
meta_keys=(
'img_id',
'img_path',
'ori_shape',
'img_shape',
'scale_factor',
),
type='mmdet.datasets.transforms.PackDetInputs'),
],
test_mode=True,
type='mmdet.datasets.CocoDataset'),
drop_last=False,
num_workers=2,
persistent_workers=True,
sampler=dict(
shuffle=False, type='mmengine.dataset.sampler.DefaultSampler'))
val_evaluator = dict(
ann_file='data/coco/annotations/instances_val2017.json',
backend_args=None,
format_only=False,
metric='bbox',
type='mmdet.evaluation.CocoMetric')
vis_backends = [
dict(type='mmengine.visualization.LocalVisBackend'),
]
visualizer = dict(
name='visualizer',
type='mmdet.visualization.DetLocalVisualizer',
vis_backends=[
dict(type='mmengine.visualization.LocalVisBackend'),
])
work_dir = './work_dirs/faster_rcnn_r50_fpn_1x_coco'