

Semantic Segmentation Series 15: UPerNet (PyTorch Implementation)

UPerNet: "Unified Perceptual Parsing for Scene Understanding"

Published at ECCV 2018.


Introduction

Humans typically recognize an object by observing it from multiple angles and at multiple levels: its shape, its texture, the scene it sits in, what it contains, and so on. Take a window: it is made of glass, it sits in a wall, and it is rectangular; combining these observations we conclude, ah, this is a window.

In computer vision there is work on scene parsing, material recognition, object detection, semantic segmentation, and so on, but very little work integrates these tasks into a single model, i.e., multi-task learning.

Moreover, multi-task learning datasets are scarce and hard to build, because the labels for different tasks are heterogeneous. For example, in the scene-parsing dataset ADE20K every annotation is a pixel-level object mask, whereas in the texture dataset DTD (Describable Textures Dataset) annotations are image-level. This heterogeneity is the bottleneck in constructing such datasets.

Highlights

Dataset Construction

To address the lack of a multi-task dataset, the authors use the Broadly and Densely Labeled Dataset (Broden) to unify ADE20K, Pascal-Context, Pascal-Part, OpenSurfaces, and the Describable Textures Dataset (DTD). Together these datasets cover a wide range of scenes, objects, object parts, and materials. The authors then address class imbalance: classes that appear in fewer than 50 images are removed, as are classes covering fewer than 50,000 pixels. The result is a very large multi-task dataset of 62,262 images.
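As a concrete illustration of the filtering rule, here is a minimal sketch (my own, not the paper's pipeline; the count dictionaries are assumed to be precomputed from the merged annotations):

# Hypothetical illustration of the class-balancing step described above:
# drop classes that appear in fewer than 50 images or cover fewer than
# 50,000 pixels in total across the merged dataset.
def filter_rare_classes(class_image_count, class_pixel_count,
                        min_images=50, min_pixels=50_000):
    # class_image_count / class_pixel_count: dicts mapping class name -> count
    return {
        c for c in class_image_count
        if class_image_count[c] >= min_images
        and class_pixel_count.get(c, 0) >= min_pixels
    }

# Example:
# filter_rare_classes({"window": 1200, "obelisk": 12},
#                     {"window": 9_800_000, "obelisk": 30_000})
# -> {"window"}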


Figure 1: The multi-task dataset


Figure 2: Sample images from the dataset

Model Design

The overall design of UPerNet is built on the FPN (Feature Pyramid Network) and the PPM (Pyramid Pooling Module), as shown below.


Figure 3: UPerNet architecture

The authors design a separate head for each task (illustrative sketches of two of these heads follow the list):

  • For the scene-parsing task, scene labels are image-level, so no upsampling is needed: a convolution, a global pooling layer, and a linear classifier are attached directly to the output of the PPM head.
  • For object and object-part segmentation, i.e. the semantic-segmentation tasks, UPerNet fuses the features from all FPN levels and feeds the fused map into two structurally identical heads, one for objects and one for object parts.
  • For the material task, i.e. material recognition, prediction is made on the final FPN output, because context is highly informative for materials. Take a glass cup: our prior says glass cups usually sit on tables, so a model that can exploit the image context ("the glass cup is on the table") recognizes the material better than one without such contextual information.
  • The texture task gets a specially designed head, because stacking in information from other levels and fusing with the other tasks actually hurts texture recognition. The head therefore takes only the first FPN level as input and appends 4 extra convolution layers, each with 128 channels; gradients are not propagated back through this branch, so it does not interfere with the other tasks. The design has two motivations: first, texture is the lowest-level cue, recognizable at a glance, and needs no high-level semantics; second, training the other tasks already teaches the shared network about texture implicitly, since objects of the same class tend to have homogeneous textures.
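For concreteness, here is a minimal PyTorch sketch of two of these heads: the scene-classification head and the texture head. The module names, channel widths, and class counts (e.g. 365 scene classes, 47 DTD texture classes) are my own illustrative assumptions, not the authors' code; only the overall structure (conv + global pooling + linear classifier; 4 conv layers of 128 channels on a gradient-blocked feature) follows the description above.

import torch
import torch.nn as nn

class SceneHead(nn.Module):
    """Image-level scene classification on top of the PPM output (sketch)."""
    def __init__(self, in_channels=256, num_scenes=365):  # counts are placeholders
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)           # global pooling -> 1x1
        self.fc = nn.Linear(in_channels, num_scenes)  # linear classifier

    def forward(self, ppm_out):
        x = self.pool(self.conv(ppm_out)).flatten(1)
        return self.fc(x)

class TextureHead(nn.Module):
    """Texture head (sketch): 4 conv layers of 128 channels; the input is
    detached so no gradient flows back into the shared network."""
    def __init__(self, in_channels=256, num_textures=47):
        super().__init__()
        layers = []
        for i in range(4):
            layers += [nn.Conv2d(in_channels if i == 0 else 128, 128,
                                 kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
        self.convs = nn.Sequential(*layers)
        self.classifier = nn.Conv2d(128, num_textures, kernel_size=1)

    def forward(self, feat):
        return self.classifier(self.convs(feat.detach()))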

Semantic Segmentation Only

Since this series is about semantic segmentation, and we use UPerNet only for that task, we can prune the other branches: drop all the other heads and keep only the semantic-segmentation head, as shown below.

Figure 4: The semantic-segmentation part of UPerNet

Results from the Paper

Segmentation and classification results under multi-task learning:

Visualization of the relations among the contents of a scene:

Summary

UPerNet demonstrates a multi-task learning setup, contributes a multi-task dataset, and pairs a well-designed shared backbone with per-task heads for the different prediction problems.

Model Implementation

Backbone: ResNet-50

import torch
import torch.nn as nn
import torch.nn.functional as F


class BasicBlock(nn.Module):
    expansion: int = 1  # fixed: BasicBlock has no channel expansion (the original had 4)

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None):
        super(BasicBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if groups != 1 or base_width != 64:
            raise ValueError("BasicBlock only supports groups=1 and base_width=64")
        if dilation > 1:
            raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
        # Both self.conv1 and self.downsample layers downsample the input when stride != 1
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=3, stride=stride,
                               padding=dilation, groups=groups, bias=False, dilation=dilation)
        self.bn1 = norm_layer(planes)
        self.relu = nn.ReLU(inplace=True)
        # fixed: the second conv never strides
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1,
                               padding=dilation, groups=groups, bias=False, dilation=dilation)
        self.bn2 = norm_layer(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity
        out = self.relu(out)
        return out


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None,
                 groups=1, base_width=64, dilation=1, norm_layer=None):
        super(Bottleneck, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        width = int(planes * (base_width / 64.0)) * groups
        # Both self.conv2 and self.downsample layers downsample the input when stride != 1
        self.conv1 = nn.Conv2d(inplanes, width, kernel_size=1, stride=1, bias=False)
        self.bn1 = norm_layer(width)
        self.conv2 = nn.Conv2d(width, width, kernel_size=3, stride=stride, bias=False,
                               padding=dilation, dilation=dilation)
        self.bn2 = norm_layer(width)
        self.conv3 = nn.Conv2d(width, planes * self.expansion, kernel_size=1, stride=1, bias=False)
        self.bn3 = norm_layer(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity
        out = self.relu(out)
        return out


class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000, zero_init_residual=False, groups=1,
                 width_per_group=64, replace_stride_with_dilation=None, norm_layer=None):
        super(ResNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self._norm_layer = norm_layer
        self.inplanes = 64
        self.dilation = 1
        if replace_stride_with_dilation is None:
            # each element in the tuple indicates if we should replace
            # the 2x2 stride with a dilated convolution instead
            replace_stride_with_dilation = [False, False, False]
        if len(replace_stride_with_dilation) != 3:
            raise ValueError(
                "replace_stride_with_dilation should be None "
                f"or a 3-element tuple, got {replace_stride_with_dilation}"
            )
        self.groups = groups
        self.base_width = width_per_group
        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = norm_layer(self.inplanes)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2, dilate=replace_stride_with_dilation[0])
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2, dilate=replace_stride_with_dilation[1])
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2, dilate=replace_stride_with_dilation[2])
        # avgpool/fc are unused by the segmentation forward pass but kept so that
        # ImageNet classification checkpoints load cleanly with strict=False
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, Bottleneck):
                    nn.init.constant_(m.bn3.weight, 0)
                elif isinstance(m, BasicBlock):
                    nn.init.constant_(m.bn2.weight, 0)

    def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1  # fixed: replacing the stride with dilation means the stride becomes 1
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion, kernel_size=1, stride=stride, bias=False),
                norm_layer(planes * block.expansion))
        layers = []
        layers.append(
            block(self.inplanes, planes, stride, downsample, self.groups,
                  self.base_width, previous_dilation, norm_layer))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(
                block(self.inplanes, planes, groups=self.groups, base_width=self.base_width,
                      dilation=self.dilation, norm_layer=norm_layer))
        return nn.Sequential(*layers)

    def _forward_impl(self, x):
        # return the feature maps of all four stages so the FPN decoder can use them
        out = []
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        out.append(x)  # stride 4
        x = self.layer2(x)
        out.append(x)  # stride 8
        x = self.layer3(x)
        out.append(x)  # stride 16
        x = self.layer4(x)
        out.append(x)  # stride 32
        return out

    def forward(self, x):
        return self._forward_impl(x)


def _resnet(block, layers, pretrained_path=None, **kwargs):
    model = ResNet(block, layers, **kwargs)
    if pretrained_path is not None:
        model.load_state_dict(torch.load(pretrained_path), strict=False)
    return model


def resnet50(pretrained_path=None, **kwargs):
    # fixed: _resnet is a module-level function, not a ResNet method
    return _resnet(Bottleneck, [3, 4, 6, 3], pretrained_path, **kwargs)


def resnet101(pretrained_path=None, **kwargs):
    return _resnet(Bottleneck, [3, 4, 23, 3], pretrained_path, **kwargs)
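A quick sanity check for the backbone (my own addition, assuming the definitions above): a dummy batch should yield four feature maps at strides 4, 8, 16, and 32.

# Shape check for the backbone.
if __name__ == "__main__":
    net = resnet50()
    feats = net(torch.randn(1, 3, 224, 224))
    for i, f in enumerate(feats, start=1):
        print(f"layer{i}: {tuple(f.shape)}")
    # Expected:
    # layer1: (1, 256, 56, 56)
    # layer2: (1, 512, 28, 28)
    # layer3: (1, 1024, 14, 14)
    # layer4: (1, 2048, 7, 7)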

Decoder = FPN+PPM

class PPM(nn.ModuleList):
    def __init__(self, pool_sizes, in_channels, out_channels):
        super(PPM, self).__init__()
        self.pool_sizes = pool_sizes
        self.in_channels = in_channels
        self.out_channels = out_channels
        for pool_size in pool_sizes:
            self.append(
                nn.Sequential(
                    # note: the PPM in PSPNet uses adaptive *average* pooling;
                    # this implementation uses max pooling instead
                    nn.AdaptiveMaxPool2d(pool_size),
                    nn.Conv2d(self.in_channels, self.out_channels, kernel_size=1),
                )
            )

    def forward(self, x):
        outputs = []
        for ppm in self:
            # pool to a fixed grid, then upsample back to the input resolution
            ppm_out = nn.functional.interpolate(ppm(x), size=(x.size(2), x.size(3)),
                                                mode='bilinear', align_corners=True)
            outputs.append(ppm_out)
        return outputs


class PPMHEAD(nn.Module):
    def __init__(self, in_channels, out_channels, pool_sizes=[1, 2, 3, 6]):
        super(PPMHEAD, self).__init__()
        self.pool_sizes = pool_sizes
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.psp_modules = PPM(self.pool_sizes, self.in_channels, self.out_channels)
        self.final = nn.Sequential(
            nn.Conv2d(self.in_channels + len(self.pool_sizes) * self.out_channels,
                      self.out_channels, kernel_size=1),
            nn.BatchNorm2d(self.out_channels),
            nn.ReLU(),
        )

    def forward(self, x):
        out = self.psp_modules(x)
        out.append(x)  # concatenate the input itself with the pooled branches
        out = torch.cat(out, 1)
        out = self.final(out)
        return out


class FPNHEAD(nn.Module):
    def __init__(self, channels=2048, out_channels=256):
        super(FPNHEAD, self).__init__()
        self.PPMHead = PPMHEAD(in_channels=channels, out_channels=out_channels)
        # 1x1 lateral convolutions for the three lower pyramid levels
        self.Conv_fuse1 = nn.Sequential(
            nn.Conv2d(channels // 2, out_channels, 1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU()
        )
        self.Conv_fuse1_ = nn.Sequential(
            nn.Conv2d(out_channels, out_channels, 1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU()
        )
        self.Conv_fuse2 = nn.Sequential(
            nn.Conv2d(channels // 4, out_channels, 1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU()
        )
        self.Conv_fuse2_ = nn.Sequential(
            nn.Conv2d(out_channels, out_channels, 1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU()
        )
        self.Conv_fuse3 = nn.Sequential(
            nn.Conv2d(channels // 8, out_channels, 1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU()
        )
        self.Conv_fuse3_ = nn.Sequential(
            nn.Conv2d(out_channels, out_channels, 1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU()
        )
        self.fuse_all = nn.Sequential(
            nn.Conv2d(out_channels * 4, out_channels, 1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU()
        )
        self.conv_x1 = nn.Conv2d(out_channels, out_channels, 1)

    def forward(self, input_fpn):
        # input_fpn[-1]: (b, 2048, h/32, w/32) from the backbone's last stage
        x1 = self.PPMHead(input_fpn[-1])
        # top-down pathway: upsample by 2 and add the lateral feature
        x = nn.functional.interpolate(x1, size=(x1.size(2) * 2, x1.size(3) * 2),
                                      mode='bilinear', align_corners=True)
        x = self.conv_x1(x) + self.Conv_fuse1(input_fpn[-2])
        x2 = self.Conv_fuse1_(x)
        x = nn.functional.interpolate(x2, size=(x2.size(2) * 2, x2.size(3) * 2),
                                      mode='bilinear', align_corners=True)
        x = x + self.Conv_fuse2(input_fpn[-3])
        x3 = self.Conv_fuse2_(x)
        x = nn.functional.interpolate(x3, size=(x3.size(2) * 2, x3.size(3) * 2),
                                      mode='bilinear', align_corners=True)
        x = x + self.Conv_fuse3(input_fpn[-4])
        x4 = self.Conv_fuse3_(x)
        # resize every level to the finest resolution and fuse them all
        x1 = F.interpolate(x1, x4.size()[-2:], mode='bilinear', align_corners=True)
        x2 = F.interpolate(x2, x4.size()[-2:], mode='bilinear', align_corners=True)
        x3 = F.interpolate(x3, x4.size()[-2:], mode='bilinear', align_corners=True)
        x = self.fuse_all(torch.cat([x1, x2, x3, x4], 1))
        return x
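Again a quick check (my own sketch, using the backbone defined earlier): the decoder should fuse the four backbone features into a single 256-channel map at 1/4 of the input resolution.

# Sanity check for the decoder.
backbone = resnet50()
decoder = FPNHEAD()
feats = backbone(torch.randn(1, 3, 224, 224))
fused = decoder(feats)
print(tuple(fused.shape))  # expected: (1, 256, 56, 56)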

Model

class UPerNet(nn.Module):
    def __init__(self, num_classes):
        super(UPerNet, self).__init__()
        self.num_classes = num_classes
        # standard strided ResNet-50: the FPN decoder above expects
        # features at strides 4/8/16/32
        self.backbone = resnet50()
        self.in_channels = 2048
        self.channels = 256
        self.decoder = FPNHEAD()
        self.cls_seg = nn.Sequential(
            nn.Conv2d(self.channels, self.num_classes, kernel_size=3, padding=1),
        )

    def forward(self, x):
        x = self.backbone(x)
        x = self.decoder(x)  # (b, 256, h/4, w/4)
        # upsample back to the input resolution before the per-pixel classifier
        x = nn.functional.interpolate(x, size=(x.size(2) * 4, x.size(3) * 4),
                                      mode='bilinear', align_corners=True)
        x = self.cls_seg(x)
        return x
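An end-to-end smoke test (my own addition): the logits should come out at the full input resolution, with one channel per class.

# Forward-pass check for the full model.
net = UPerNet(num_classes=33)
out = net(torch.randn(2, 3, 224, 224))
print(tuple(out.shape))  # expected: (2, 33, 224, 224)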

Dataset: CamVid

# Imports
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim
from torch.utils.data import Dataset, DataLoader, random_split
from tqdm import tqdm
import warnings
warnings.filterwarnings("ignore")
import os.path as osp
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
import albumentations as A
from albumentations.pytorch.transforms import ToTensorV2

torch.manual_seed(17)


# Custom dataset: CamVidDataset
class CamVidDataset(torch.utils.data.Dataset):
    """CamVid Dataset. Reads images, applies augmentation and preprocessing
    transformations.

    Args:
        images_dir (str): path to the images folder
        masks_dir (str): path to the segmentation masks folder
    """

    def __init__(self, images_dir, masks_dir):
        self.transform = A.Compose([
            A.Resize(224, 224),
            A.HorizontalFlip(),
            A.VerticalFlip(),
            A.Normalize(),
            ToTensorV2(),
        ])
        self.ids = os.listdir(images_dir)
        self.images_fps = [os.path.join(images_dir, image_id) for image_id in self.ids]
        self.masks_fps = [os.path.join(masks_dir, image_id) for image_id in self.ids]

    def __getitem__(self, i):
        # read data
        image = np.array(Image.open(self.images_fps[i]).convert('RGB'))
        mask = np.array(Image.open(self.masks_fps[i]).convert('RGB'))
        augmented = self.transform(image=image, mask=mask)
        # the class id is stored in the first channel of the mask
        return augmented['image'], augmented['mask'][:, :, 0]

    def __len__(self):
        return len(self.ids)


# Dataset paths -- adjust to your own layout
DATA_DIR = r'dataset\camvid'
x_train_dir = os.path.join(DATA_DIR, 'train_images')
y_train_dir = os.path.join(DATA_DIR, 'train_labels')
x_valid_dir = os.path.join(DATA_DIR, 'valid_images')
y_valid_dir = os.path.join(DATA_DIR, 'valid_labels')

train_dataset = CamVidDataset(x_train_dir, y_train_dir)
val_dataset = CamVidDataset(x_valid_dir, y_valid_dir)

train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True, drop_last=True)
val_loader = DataLoader(val_dataset, batch_size=8, shuffle=True, drop_last=True)

model = UPerNet(num_classes=33).cuda()
# model.load_state_dict(torch.load(r"checkpoints/resnet101-5d3b4d8f.pth"), strict=False)
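To verify the data pipeline before training (my own check, assuming the paths above exist): one batch should yield normalized image tensors and integer-valued masks.

# Inspect one batch: images (8, 3, 224, 224) float tensors,
# masks (8, 224, 224) with class ids taken from the R channel.
images, masks = next(iter(train_loader))
print(images.shape, images.dtype)  # torch.Size([8, 3, 224, 224]) torch.float32
print(masks.shape, masks.unique()[:10])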

Train

from d2l import torch as d2l
from tqdm import tqdm
import pandas as pd

# Loss: multi-class cross-entropy
lossf = nn.CrossEntropyLoss(ignore_index=255)
# Optimizer: SGD with a step-decay schedule
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5, last_epoch=-1)
# Train for 100 epochs
epochs_num = 100


def train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs, scheduler,
               devices=d2l.try_all_gpus()):
    timer, num_batches = d2l.Timer(), len(train_iter)
    animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0, 1],
                            legend=['train loss', 'train acc', 'test acc'])
    net = nn.DataParallel(net, device_ids=devices).to(devices[0])
    loss_list = []
    train_acc_list = []
    test_acc_list = []
    epochs_list = []
    time_list = []
    for epoch in range(num_epochs):
        # Sum of training loss, sum of training accuracy, no. of examples,
        # no. of predictions
        metric = d2l.Accumulator(4)
        for i, (features, labels) in enumerate(train_iter):
            timer.start()
            l, acc = d2l.train_batch_ch13(
                net, features, labels.long(), loss, trainer, devices)
            metric.add(l, acc, labels.shape[0], labels.numel())
            timer.stop()
            if (i + 1) % (num_batches // 5) == 0 or i == num_batches - 1:
                animator.add(epoch + (i + 1) / num_batches,
                             (metric[0] / metric[2], metric[1] / metric[3], None))
        test_acc = d2l.evaluate_accuracy_gpu(net, test_iter)
        animator.add(epoch + 1, (None, None, test_acc))
        scheduler.step()
        print(f"epoch {epoch+1} --- loss {metric[0] / metric[2]:.3f} --- "
              f"train acc {metric[1] / metric[3]:.3f} --- test acc {test_acc:.3f} --- "
              f"cost time {timer.sum()}")

        # --------- save the training statistics ---------------
        df = pd.DataFrame()
        loss_list.append(metric[0] / metric[2])
        train_acc_list.append(metric[1] / metric[3])
        test_acc_list.append(test_acc)
        epochs_list.append(epoch + 1)
        time_list.append(timer.sum())
        df['epoch'] = epochs_list
        df['loss'] = loss_list
        df['train_acc'] = train_acc_list
        df['test_acc'] = test_acc_list
        df['time'] = time_list
        df.to_excel("savefile/UPerNet_camvid.xlsx")

        # ---------------- save the model -------------------
        if np.mod(epoch + 1, 5) == 0:
            torch.save(model.state_dict(), f'checkpoints/UPerNet_{epoch+1}.pth')


train_ch13(model, train_loader, val_loader, lossf, optimizer, epochs_num, scheduler)
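If you prefer not to depend on d2l, a minimal plain-PyTorch training step, equivalent in spirit to d2l.train_batch_ch13, could look like this (a sketch of my own, not the original post's code):

# Minimal plain-PyTorch alternative to the d2l training loop.
def train_one_epoch(net, loader, loss_fn, optimizer, device="cuda"):
    net.train()
    running = 0.0
    for images, masks in loader:
        images, masks = images.to(device), masks.long().to(device)
        optimizer.zero_grad()
        logits = net(images)           # (B, C, H, W)
        loss = loss_fn(logits, masks)  # cross-entropy over all pixels
        loss.backward()
        optimizer.step()
        running += loss.item() * images.size(0)
    return running / len(loader.dataset)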

Training curve


Reposted from: https://blog.csdn.net/yumaomi/article/details/125376320 (original author: yumaomi).
