

DeepLab v3+ Source Code Walkthrough

Training the model:

**Download the VOC dataset and pass in the required arguments to start training.**

Argument configuration:

  1. """
  2. 训练:
  3. --model deeplabv3plus_mobilenet
  4. --gpu_id 0
  5. --year 2012_aug
  6. --crop_val
  7. --lr 0.01
  8. --crop_size 513
  9. --batch_size 4
  10. --output_stride 16
  11. 测试:
  12. --model deeplabv3plus_mobilenet
  13. --gpu_id 0 --year 2012_aug
  14. --crop_val
  15. --lr 0.01
  16. --crop_size 513
  17. --batch_size 16
  18. --output_stride 16
  19. --ckpt checkpoints/best_deeplabv3plus_mobilenet_voc_os16.pth
  20. --test_only
  21. --save_val_results
  22. """

1. Data preprocessing

DeepLab v3+ uses the VOC and Cityscapes datasets by default. The preprocessing stage simply reads each image and its corresponding label, then applies common augmentations to the image such as random flipping and random cropping.
```python
def get_dataset(opts):
    """ Dataset And Augmentation
    """
    if opts.dataset == 'voc':
        train_transform = et.ExtCompose([
            # et.ExtResize(size=opts.crop_size),
            et.ExtRandomScale((0.5, 2.0)),
            et.ExtRandomCrop(size=(opts.crop_size, opts.crop_size), pad_if_needed=True),
            et.ExtRandomHorizontalFlip(),  # horizontally flip the image with a given probability
            et.ExtToTensor(),
            et.ExtNormalize(mean=[0.485, 0.456, 0.406],
                            std=[0.229, 0.224, 0.225]),
        ])
        if opts.crop_val:
            val_transform = et.ExtCompose([
                et.ExtResize(opts.crop_size),
                et.ExtCenterCrop(opts.crop_size),
                et.ExtToTensor(),
                et.ExtNormalize(mean=[0.485, 0.456, 0.406],
                                std=[0.229, 0.224, 0.225]),
            ])
        else:
            val_transform = et.ExtCompose([
                et.ExtToTensor(),
                et.ExtNormalize(mean=[0.485, 0.456, 0.406],
                                std=[0.229, 0.224, 0.225]),
            ])
        # load the data
        train_dst = VOCSegmentation(root=opts.data_root, year=opts.year,
                                    image_set='train', download=opts.download,
                                    transform=train_transform)
        val_dst = VOCSegmentation(root=opts.data_root, year=opts.year,
                                  image_set='val', download=False,
                                  transform=val_transform)
    if opts.dataset == 'cityscapes':
        train_transform = et.ExtCompose([
            # et.ExtResize(512),
            et.ExtRandomCrop(size=(opts.crop_size, opts.crop_size)),
            et.ExtColorJitter(brightness=0.5, contrast=0.5, saturation=0.5),
            et.ExtRandomHorizontalFlip(),
            et.ExtToTensor(),
            et.ExtNormalize(mean=[0.485, 0.456, 0.406],
                            std=[0.229, 0.224, 0.225]),
        ])
        val_transform = et.ExtCompose([
            # et.ExtResize(512),
            et.ExtToTensor(),
            et.ExtNormalize(mean=[0.485, 0.456, 0.406],
                            std=[0.229, 0.224, 0.225]),
        ])
        train_dst = Cityscapes(root=opts.data_root,
                               split='train', transform=train_transform)
        val_dst = Cityscapes(root=opts.data_root,
                             split='val', transform=val_transform)
    return train_dst, val_dst
```
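The two datasets returned here are plain PyTorch datasets, so they can be wrapped in standard DataLoaders. A minimal usage sketch, assuming `opts` is the parsed argument object configured above:

```python
from torch.utils.data import DataLoader

# Hypothetical usage of get_dataset; `opts` comes from the argument configuration above.
train_dst, val_dst = get_dataset(opts)
train_loader = DataLoader(train_dst, batch_size=opts.batch_size,
                          shuffle=True, num_workers=2, drop_last=True)
val_loader = DataLoader(val_dst, batch_size=1, shuffle=False)

images, labels = next(iter(train_loader))
print(images.shape, labels.shape)  # e.g. [4, 3, 513, 513] and [4, 513, 513]
```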

2. Network architecture:

Encoder

ResNet serves as the network's encoder. As an image-classification backbone, ResNet downsamples the image by a factor of 32, which loses a great deal of feature-map information; for segmentation in particular, fine detail cannot be recovered from a 32x-downsampled feature map. Therefore, in ResNet's last three stages, downsampling is replaced by dilated convolution as needed to reach the desired feature-map size. The encoder outputs the result of passing layer4 through the ASPP module (downsampled 16x or 8x), along with the output of layer1.

**The ASPP module:**

As shown in the figure, ASPP consists of dilated (atrous) convolutions with different dilation rates, so that feature information from different receptive fields can be fused. One further detail: each dilated convolution uses padding equal to its dilation rate, which keeps the input and output feature maps the same size; a quick check of this rule is sketched below, followed by the ASPP implementation.
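To see why this works: at stride 1, the output width of a convolution is in + 2*padding - dilation*(kernel_size - 1), so with a 3x3 kernel and padding equal to the dilation rate, the padding exactly cancels the dilated kernel's extent. A minimal check (the channel count and rates here are illustrative, not from the source):

```python
import torch
import torch.nn as nn

# out = in + 2*pad - dilation*(k-1) at stride 1; with k=3 and pad=dilation, out = in
x = torch.randn(1, 8, 33, 33)
for d in (1, 6, 12, 18):
    conv = nn.Conv2d(8, 8, kernel_size=3, padding=d, dilation=d, bias=False)
    print(d, conv(x).shape)  # torch.Size([1, 8, 33, 33]) for every rate
```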

```python
class ASPPConv(nn.Sequential):
    def __init__(self, in_channels, out_channels, dilation):
        modules = [
            nn.Conv2d(in_channels, out_channels, 3, padding=dilation, dilation=dilation, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        ]
        super(ASPPConv, self).__init__(*modules)


class ASPPPooling(nn.Sequential):
    def __init__(self, in_channels, out_channels):
        super(ASPPPooling, self).__init__(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_channels, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True))

    def forward(self, x):
        size = x.shape[-2:]
        x = super(ASPPPooling, self).forward(x)
        # the pooled 1x1 feature is upsampled back to the input's spatial size
        return F.interpolate(x, size=size, mode='bilinear', align_corners=False)


class ASPP(nn.Module):
    def __init__(self, in_channels, atrous_rates):
        super(ASPP, self).__init__()
        out_channels = 256
        modules = []
        # branch 1: a plain 1x1 convolution
        modules.append(nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)))
        # branches 2-4: 3x3 convolutions with the three atrous rates
        rate1, rate2, rate3 = tuple(atrous_rates)
        modules.append(ASPPConv(in_channels, out_channels, rate1))
        modules.append(ASPPConv(in_channels, out_channels, rate2))
        modules.append(ASPPConv(in_channels, out_channels, rate3))
        # branch 5: image-level (global average) pooling
        modules.append(ASPPPooling(in_channels, out_channels))
        self.convs = nn.ModuleList(modules)
        # fuse the five concatenated branches back down to out_channels
        self.project = nn.Sequential(
            nn.Conv2d(5 * out_channels, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Dropout(0.1))

    def forward(self, x):
        res = []
        for conv in self.convs:
            res.append(conv(x))
        res = torch.cat(res, dim=1)
        return self.project(res)
```
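All five branches (the 1x1 convolution, the three dilated convolutions, and the image-level pooling) emit 256 channels each, which is why the projection takes 5 * out_channels inputs. A minimal shape sketch, assuming a 2048-channel layer4 feature map such as ResNet-50 produces:

```python
import torch

# Illustrative shapes: 2048 input channels matches a ResNet-50 layer4 output.
aspp = ASPP(in_channels=2048, atrous_rates=[12, 24, 36])
y = aspp(torch.randn(1, 2048, 33, 33))
print(y.shape)  # torch.Size([1, 256, 33, 33])
```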

Decoder

The output of layer4 first passes through the ASPP module, where a 1x1 convolution projects the channel count to 256; the result is then upsampled to the size of layer1's output. layer1's output has its channel count reduced to 48 by a 1x1 convolution, and the two results are concatenated. A 3x3 convolution follows, and finally a 1x1 convolution produces the prediction.

The code is as follows:

```python
class DeepLabHeadV3Plus(nn.Module):
    def __init__(self, in_channels, low_level_channels, num_classes, aspp_dilate=[12, 24, 36]):
        super(DeepLabHeadV3Plus, self).__init__()
        # reduce the low-level (layer1) feature to 48 channels
        self.project = nn.Sequential(
            nn.Conv2d(low_level_channels, 48, 1, bias=False),
            nn.BatchNorm2d(48),
            nn.ReLU(inplace=True),
        )
        self.aspp = ASPP(in_channels, aspp_dilate)
        # 304 = 256 (ASPP output) + 48 (projected low-level feature)
        self.classifier = nn.Sequential(
            nn.Conv2d(304, 256, 3, padding=1, bias=False),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, num_classes, 1)
        )
        self._init_weight()

    def forward(self, feature):
        # return_layers = {'layer4': 'out', 'layer1': 'low_level'}
        low_level_feature = self.project(feature['low_level'])
        output_feature = self.aspp(feature['out'])
        # upsample the ASPP output to the low-level feature's size (stride 4)
        output_feature = F.interpolate(output_feature, size=low_level_feature.shape[2:],
                                       mode='bilinear', align_corners=False)
        return self.classifier(torch.cat([low_level_feature, output_feature], dim=1))

    def _init_weight(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
```
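A minimal shape sketch of the head (the channel counts below assume a ResNet-50 backbone with output_stride=16: layer1 gives 256 channels at stride 4, layer4 gives 2048 channels at stride 16):

```python
import torch

head = DeepLabHeadV3Plus(in_channels=2048, low_level_channels=256, num_classes=21)
features = {'low_level': torch.randn(1, 256, 128, 128),  # layer1, stride 4
            'out': torch.randn(1, 2048, 32, 32)}         # layer4, stride 16
print(head(features).shape)  # torch.Size([1, 21, 128, 128]) -- still at stride 4
```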
The decoder's output feature map is still at 1/4 of the input resolution, so the final output must be bilinearly interpolated back up to the original image size. The final wrapper is as follows:
```python
class _SimpleSegmentationModel(nn.Module):
    def __init__(self, backbone, classifier):
        super(_SimpleSegmentationModel, self).__init__()
        self.backbone = backbone
        self.classifier = classifier

    def forward(self, x):
        input_shape = x.shape[-2:]
        features = self.backbone(x)
        x = self.classifier(features)
        # upsample the stride-4 prediction back to the input resolution
        x = F.interpolate(x, size=input_shape, mode='bilinear', align_corners=False)
        return x
```
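Putting the pieces together, a minimal assembly sketch; it assumes torchvision's ResNet-50 and its (private) IntermediateLayerGetter, which the repository uses to expose layer1 and layer4 under the names seen in the forward pass above:

```python
import torch
from torchvision.models import resnet50
from torchvision.models._utils import IntermediateLayerGetter

# output_stride=16: dilate only layer4 (the three flags map to layer2/3/4)
backbone = resnet50(replace_stride_with_dilation=[False, False, True])
backbone = IntermediateLayerGetter(
    backbone, return_layers={'layer4': 'out', 'layer1': 'low_level'})
classifier = DeepLabHeadV3Plus(in_channels=2048, low_level_channels=256, num_classes=21)
model = _SimpleSegmentationModel(backbone, classifier)

out = model(torch.randn(1, 3, 513, 513))
print(out.shape)  # torch.Size([1, 21, 513, 513]) -- back at input resolution
```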

Reposted from: https://blog.csdn.net/qq_52053775/article/details/127086195 (original author: 樱花的浪漫).
