

Hands-On Learning: ResNet50

The ResNet paper

《Deep Residual Learning for Image Recognition》
Paper: https://arxiv.org/abs/1512.03385

Residual Networks (ResNet)

This post gives a brief account of ResNet50 in two parts: takeaways from studying ResNet, and a reproduction of ResNet50.

Part I. Takeaways from studying ResNet

  1. ResNet addresses the difficulty of training very deep CNNs. The paper argues that, in principle, a deeper model should be able to reach a solution at least as good as a shallower one, for plain nets and residual nets alike. In practice, however, plain nets converge more and more slowly as depth grows, so a deeper plain net ends up performing worse than a shallower one. Residual nets avoid this degradation, and they do so through residual learning.
  2. The idealized formulation of residual learning is $F(x) = H(x) - x$ (1) and $H(x) = F(x) + x$ (2): let $H(x)$ be the desired underlying mapping; the stacked nonlinear layers are trained to fit $F(x) = H(x) - x$, so that $H(x) = F(x) + x$. The form actually used in the network is $y = F(x, \{W_i\}) + x$ (3) or $y = F(x, \{W_i\}) + W_s x$ (4), where $x$, $y$ and $F$ are the input, the output and the mapping of the stacked nonlinear layers. An identity shortcut adds the input $x$ to $F$ to produce the output $y$. $W_s$ is a linear projection: Eq. (3) is used when $x$ and $F$ have the same dimensions (and spatial size), Eq. (4) when they differ. (Fig. 2 of the paper shows the structure of a residual learning block; a minimal code sketch of both shortcut types follows this list.)
  3. Depth studies for plain and residual nets. 1) For plain nets, the degradation that comes with more depth is not caused by vanishing or exploding gradients: the 20-layer and 56-layer plain networks both remain trainable, yet their training error and test error increase with depth (Fig. 1). The extra depth simply slows down convergence. [Fig. 1: training/test error of 20-layer vs 56-layer plain networks] 2) For residual nets, more depth does not always help either. The paper reports that the 1202-layer ResNet has an error about 1.5% higher than the 110-layer ResNet on CIFAR-10, with all other conditions equal (Table 6). He et al. conjecture that the dataset is too small for a network of that size, i.e. the model is over-sized for the data, which hurts performance. [Table 6: CIFAR-10 classification error for very deep ResNets]
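As a quick illustration of Eqs. (3) and (4), here is a minimal PyTorch sketch of a residual connection: an identity shortcut when input and output shapes match, and a 1x1-convolution projection (the $W_s$ of Eq. (4)) when they do not. The channel sizes are made up for the example and do not correspond to any particular ResNet stage.

    import torch
    from torch import nn

    class ToyResidual(nn.Module):
        """Minimal residual unit: y = F(x) + x, or y = F(x) + W_s x when shapes differ."""
        def __init__(self, in_channels, out_channels, stride=1):
            super().__init__()
            # F(x): the stacked nonlinear layers
            self.body = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_channels))
            if stride != 1 or in_channels != out_channels:
                # Eq. (4): projection shortcut W_s x
                self.shortcut = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride)
            else:
                # Eq. (3): identity shortcut
                self.shortcut = nn.Identity()
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.relu(self.body(x) + self.shortcut(x))

    # Same shape -> identity shortcut; different shape -> projection shortcut
    x = torch.randn(1, 64, 56, 56)
    print(ToyResidual(64, 64)(x).shape)             # torch.Size([1, 64, 56, 56])
    print(ToyResidual(64, 128, stride=2)(x).shape)  # torch.Size([1, 128, 28, 28])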

Part II. Reproducing ResNet50

The reproduction of ResNet50 is organized into nine parts: data augmentation, hyperparameter settings, dataset construction, network construction, model loading, model training, loss plotting, model evaluation, and model prediction.

1. Data augmentation

  1. Data augmentation consists of scaling, cropping, rotation, and brightness, contrast, and hue changes, all implemented with torchvision.transforms.
  2. Note that these transforms expect PIL images in RGB order, not OpenCV images in BGR order; a conversion sketch follows this list.
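If images are read with OpenCV instead of PIL, they have to be converted to RGB PIL images before the torchvision transforms are applied. A minimal sketch, assuming the same example file used below:

    import cv2
    from PIL import Image
    from torchvision import transforms as tfs

    bgr = cv2.imread('gift/bald3.jpg')          # OpenCV loads images as BGR numpy arrays
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)  # reorder the channels to RGB
    pil_img = Image.fromarray(rgb)              # wrap as a PIL image for torchvision
    resized = tfs.Resize(256)(pil_img)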

Opening an image

    import matplotlib.pyplot as plt
    from PIL import Image
    from torchvision import transforms as tfs

    img = Image.open('gift/bald3.jpg')
    plt.imshow(img)
    plt.show()
    # print(img.size)  ---> (301, 301)

Scaling

    # Resize
    # print('before scale, shape:{}'.format(img.size))  --> (301, 301)
    # With a tuple argument (h, w), the image is resized to exactly that size.
    # With a single integer `size`, the image is scaled proportionally so that its shorter side equals `size`.
    new_img = tfs.Resize(256)(img)
    # print('after scale, shape:{}'.format(new_img.size))  --> (256, 256)
    plt.imshow(new_img)
    plt.show()

Random crop

    # Random crop
    random_img = tfs.RandomCrop(224)(new_img)
    plt.imshow(random_img)
    plt.show()

Random rotation

    # Random rotation by an angle drawn from [-180, 180] degrees
    rot_img = tfs.RandomRotation(180)(new_img)
    plt.imshow(rot_img)
    plt.show()


    # Flip transforms, given here for reference; try out the effect yourself
    # Random horizontal flip
    h_flip = tfs.RandomHorizontalFlip()(new_img)
    plt.imshow(h_flip)
    plt.show()
    # Random vertical flip
    v_flip = tfs.RandomVerticalFlip()(new_img)
    plt.imshow(v_flip)
    plt.show()

Brightness

    # Brightness
    bright_img = tfs.ColorJitter(brightness=1)(new_img)  # brightness factor drawn from [0, 2]; 1 keeps the original image
    plt.imshow(bright_img)
    plt.show()

Contrast

    # Contrast
    contrast_img = tfs.ColorJitter(contrast=1)(new_img)  # contrast factor drawn from [0, 2]; 1 keeps the original image
    plt.imshow(contrast_img)
    plt.show()

Hue

    # Hue
    color_img = tfs.ColorJitter(hue=0.5)(new_img)  # hue shift drawn from [-0.5, 0.5]
    plt.imshow(color_img)
    plt.show()

Combined augmentation

    img_aug = tfs.Compose([
        tfs.Resize(256),
        tfs.RandomRotation(180),
        tfs.RandomCrop(224),
        tfs.ColorJitter(brightness=0.5, contrast=0.5, hue=0.5)])

    # Show nine independently augmented samples in a 3 x 3 grid
    nrows = 3
    ncols = 3
    figsize = (32, 32)
    _, figs = plt.subplots(nrows, ncols, figsize=figsize)
    for i in range(nrows):
        for j in range(ncols):
            figs[i][j].imshow(img_aug(img))
            figs[i][j].axes.get_xaxis().set_visible(False)
            figs[i][j].axes.get_yaxis().set_visible(False)
    plt.show()


2. Hyperparameter settings

    batch_size = 32   # number of samples fed to the network per batch
    lr = 0.01         # learning rate
    step_size = 1     # update the learning rate every n epochs
    epoch_num = 10    # total number of epochs
    num_print = 280   # print (and log) the loss every n batches
    num_check = 1     # validate the model every n epochs and save it if it improves
    enhance = False   # whether to apply data augmentation
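With step_size = 1 and the gamma = 0.5 used later in the StepLR scheduler, the learning rate halves every epoch. A small sketch of the resulting schedule (the SGD/StepLR settings mirror those in the model-loading section below; the single dummy parameter exists only to instantiate the optimizer):

    import torch
    from torch import nn, optim

    params = [nn.Parameter(torch.zeros(1))]  # dummy parameter for the demo
    optimizer = optim.SGD(params, lr=0.01, momentum=0.8, weight_decay=0.001)
    schedule = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)

    for epoch in range(10):
        print(epoch, optimizer.param_groups[0]['lr'])  # 0.01, 0.005, 0.0025, ...
        optimizer.step()   # a real training step would go here
        schedule.step()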

3. Dataset construction

Preprocessing applies per-channel standardization; see the data preprocessing in 动手学习VGG16 for details. A sketch of how the per-channel mean and standard deviation can be estimated is given below, before the dataset code.
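The normalization constants used in the transforms below are specific to the author's dataset. As a rough sketch (not the author's code), the per-channel mean and standard deviation can be estimated from a dataset that applies only Resize and ToTensor, for example:

    import torch
    from torch.utils.data import DataLoader

    def channel_stats(dataset, batch_size=64):
        # dataset is assumed to yield (image_tensor, label) pairs without Normalize applied
        loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
        mean = torch.zeros(3)
        sq_mean = torch.zeros(3)
        n = 0
        for imgs, _ in loader:
            b = imgs.size(0)
            flat = imgs.view(b, 3, -1)                    # flatten the spatial dimensions
            mean += flat.mean(dim=2).sum(dim=0)
            sq_mean += (flat ** 2).mean(dim=2).sum(dim=0)
            n += b
        mean /= n
        std = (sq_mean / n - mean ** 2).sqrt()
        return mean, std

    # mean, std = channel_stats(train_data_raw)   # train_data_raw: hypothetical un-normalized dataset
    # transforms.Normalize(tuple(mean.tolist()), tuple(std.tolist()))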

  1. """
  2. train_path、verification_path、test_path 同为字典(类别kind,列表list),其中列表内存放着图片的绝对路径。
  3. labels 也是一个字典(类别kind,序号number),序号为1~n的数字
  4. """import torch
  5. import random
  6. from PIL import Image
  7. from torch.autograd import Variable
  8. from torchvision import transforms
  9. from torch.utils.data import Dataset, DataLoader
  10. en_transform = transforms.Compose([
  11. transforms.Resize(256),
  12. transforms.RandomRotation(180),
  13. transforms.RandomCrop(224),
  14. transforms.ColorJitter(brightness=0.5, contrast=0.5, hue=0.5),
  15. transforms.ToTensor(),
  16. transforms.Normalize((0.51865010496974,0.49877608954906466,0.5143190141916275),(7.181780141103533,8.053991771959863,8.290017965464534))])
  17. no_transform = transforms.Compose([
  18. transforms.Resize(224),
  19. transforms.ToTensor(),
  20. transforms.Normalize((0.51865010496974,0.49877608954906466,0.5143190141916275),(7.181780141103533,8.053991771959863,8.290017965464534))])# -----------------ready the dataset--------------------------defdefault_loader(path):
  21. img = Image.open(path)return img
  22. classMyDataset(Dataset):# 构造函数def__init__(self, path, transform=None, target_transform=None, loader=default_loader, enhance=False):
  23. imgs =[]for classification in path:for i inrange(len(path[classification])):
  24. img_path = path[classification][i]
  25. img_label = labels[classification]
  26. imgs.append((img_path,int(img_label)))#imgs中包含有图像路径和标签
  27. self.path = path
  28. self.imgs = imgs
  29. self.transform = transform
  30. self.target_transform = target_transform
  31. self.loader = loader
  32. # hash_map建立def__getitem__(self, index):
  33. img_path, img_label = self.imgs[index]# 调用 opencv 打开图片
  34. img = self.loader(img_path)if self.transform isnotNone:
  35. img = self.transform(img)
  36. img_label -=1return img, img_label
  37. def__len__(self):returnlen(self.imgs)
  38. train_data = MyDataset(train_path, transform=no_transform,target_transform=en_transform,enhance=True)
  39. verification_data = MyDataset(verification_path, transform=no_transform,target_transform=en_transform,enhance=False)
  40. test_data = MyDataset(test_path, transform=no_transform,target_transform=en_transform,enhance=False)#train_data verification_datatest_data包含多有的训练、验证与测试数据,调用DataLoader批量加载
  41. train_loader = DataLoader(dataset=train_data, batch_size=batch_size, shuffle=True)
  42. verification_loader = DataLoader(dataset=verification_data, batch_size=batch_size, shuffle=False)
  43. test_loader = DataLoader(dataset=test_data, batch_size=batch_size, shuffle=False)

4. Network construction

The ResNet is built from a stack of residual blocks plus a few plain convolution layers, and the structure of the residual block is what defines the ResNet variant. Two block designs from Kaiming He are reproduced here, called original and proposed: original is the main design studied in the paper, while proposed (the pre-activation ordering) is mentioned in the appendix/follow-up work. (Their structure and performance are shown in Fig. 8.) Reference: the post 你必须要知道CNN模型:ResNet.
[Fig. 8: structure and performance of the original vs proposed residual blocks]

    import torch
    from torch import optim
    import torchvision
    import matplotlib.pyplot as plt
    import numpy as np
    from torchvision.utils import make_grid
    import time
    from torch import nn
    from torchsummary import summary

Residual block (original)

    # Residual block (original ordering: conv -> BN -> ReLU, activation after the addition)
    class Residual(nn.Module):
        def __init__(self, input_channels, temp_channels, num_channels,
                     use_1x1conv=False, strides=1):
            super(Residual, self).__init__()
            self.conv1 = nn.Conv2d(input_channels, temp_channels,
                                   kernel_size=1, stride=strides)
            self.conv2 = nn.Conv2d(temp_channels, temp_channels,
                                   kernel_size=3, padding=1)
            self.conv3 = nn.Conv2d(temp_channels, num_channels,
                                   kernel_size=1)
            if use_1x1conv:
                self.conv4 = nn.Conv2d(input_channels, num_channels,
                                       kernel_size=1, stride=strides)
            else:
                self.conv4 = None
            self.bn1 = nn.BatchNorm2d(temp_channels)
            self.bn2 = nn.BatchNorm2d(temp_channels)
            self.bn3 = nn.BatchNorm2d(num_channels)
            self.rl1 = nn.ReLU(inplace=True)
            self.rl2 = nn.ReLU(inplace=True)
            self.rl3 = nn.ReLU(inplace=True)

        def forward(self, x):
            y = self.rl1(self.bn1(self.conv1(x)))
            y = self.rl2(self.bn2(self.conv2(y)))
            y = self.bn3(self.conv3(y))
            if self.conv4:
                x = self.conv4(x)
            return self.rl3(y + x)

Residual block (proposed, pre-activation)

    # Residual block (proposed ordering: BN -> ReLU -> conv, no activation after the addition)
    class Residual(nn.Module):
        def __init__(self, input_channels, temp_channels, num_channels,
                     use_1x1conv=False, strides=1):
            super(Residual, self).__init__()
            self.conv1 = nn.Conv2d(input_channels, temp_channels,
                                   kernel_size=1, stride=strides)
            self.conv2 = nn.Conv2d(temp_channels, temp_channels,
                                   kernel_size=3, padding=1)
            self.conv3 = nn.Conv2d(temp_channels, num_channels,
                                   kernel_size=1)
            if use_1x1conv:
                self.conv4 = nn.Conv2d(input_channels, num_channels,
                                       kernel_size=1, stride=strides)
            else:
                self.conv4 = None
            self.bn1 = nn.BatchNorm2d(input_channels)
            self.bn2 = nn.BatchNorm2d(temp_channels)
            self.bn3 = nn.BatchNorm2d(temp_channels)
            self.rl1 = nn.ReLU(inplace=True)
            self.rl2 = nn.ReLU(inplace=True)
            self.rl3 = nn.ReLU(inplace=True)

        def forward(self, x):
            y = self.conv1(self.rl1(self.bn1(x)))
            y = self.conv2(self.rl2(self.bn2(y)))
            y = self.conv3(self.rl3(self.bn3(y)))
            if self.conv4:
                x = self.conv4(x)
            return y + x

The ResNet50 network

    # Residual network
    # Wrap one stage of Residual blocks
    def resent_block(channels, num_residuals, first_block=False):
        blk = []
        input_channels, temp_channels, num_channels = channels
        for i in range(num_residuals):
            if i == 0 and first_block:
                blk.append(Residual(channels[0], channels[1], channels[2],
                                    use_1x1conv=True))
            elif i == 0 and not first_block:
                blk.append(Residual(channels[0], channels[1], channels[2],
                                    use_1x1conv=True, strides=2))
            else:
                blk.append(Residual(channels[2], channels[1], channels[2]))
        return blk

    # ResNet50 definition
    class ResNet50(nn.Module):
        def __init__(self):
            super(ResNet50, self).__init__()
            # Stage 1: one convolution layer and one max-pooling layer
            self.layer1 = nn.Sequential(
                nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
                nn.BatchNorm2d(64),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
            # Stage 2: 3 blocks of 3 convolution layers
            self.layer2 = nn.Sequential(*resent_block((64, 64, 256), 3, first_block=True))
            # Stage 3: 4 blocks of 3 convolution layers
            self.layer3 = nn.Sequential(*resent_block((256, 128, 512), 4))
            # Stage 4: 6 blocks of 3 convolution layers
            self.layer4 = nn.Sequential(*resent_block((512, 256, 1024), 6))
            # Stage 5: 3 blocks of 3 convolution layers
            self.layer5 = nn.Sequential(*resent_block((1024, 512, 2048), 3))
            self.conv_layer = nn.Sequential(
                self.layer1,
                self.layer2,
                self.layer3,
                self.layer4,
                self.layer5
            )
            self.fc = nn.Sequential(
                nn.Linear(2048, 1000),
                nn.ReLU(inplace=True),
                nn.Linear(1000, 29))

        def forward(self, x):
            x = self.conv_layer(x)
            # Global average pooling
            x = nn.functional.adaptive_avg_pool2d(x, (1, 1))
            x = x.view(x.size(0), -1)
            x = self.fc(x)
            return x

    # Sanity check: make sure the network runs
    if __name__ == "__main__":
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        resent_model = ResNet50().to(device)
        summary(resent_model, (3, 224, 224))  # print the network structure

Both the original and proposed versions of ResNet50 have the same parameter count (about 1.8 x 10^8 according to the author's summary output); they differ only in the ordering of the convolution, batch-normalization and ReLU layers inside each block. A quick way to double-check the count is sketched below.
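As a minimal check, assuming the ResNet50 class above (with either Residual definition) is in scope, the parameter count can also be computed directly:

    # Assumes the ResNet50 class defined above has already been executed
    model = ResNet50()
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print("total parameters: %s, trainable: %s" % (total, trainable))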

5. Model loading

    # ResNet50
    PATH = None
    if not PATH:
        device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        model = ResNet50().to(device)
    else:
        model = ResNet50()
        model.load_state_dict(torch.load(PATH), strict=False)
        model.eval()
        device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        model.to(device)
    summary(model, (3, 224, 224))

    # Training setup
    # Cross-entropy loss
    criterion = nn.CrossEntropyLoss()
    # Optimizer
    optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.8, weight_decay=0.001)
    # Learning-rate scheduler
    schedule = optim.lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=0.5, last_epoch=-1)

6. Model training

    # Training loop
    # Loss history for plotting
    loss_list = []
    correct_optimal = 0.0
    for epoch in range(epoch_num):
        model.train()
        running_loss = 0.0
        start = time.time()
        print(1 + epoch)
        for i, (inputs, labels) in enumerate(train_loader, 0):
            # Fetch one batch (batch_size samples) from train_loader
            inputs, labels = inputs.to(device), labels.to(device)
            # Zero the gradients
            optimizer.zero_grad()
            # Forward pass
            outputs = model(inputs)
            # print(outputs.shape)
            # Backward pass
            loss = criterion(outputs, labels).to(device)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
            if (i + 1) % num_print == 0:
                print('[%d epoch, %d] loss:%.6f' % (epoch + 1, i + 1, running_loss / num_print))
                loss_list.append(running_loss / num_print)
                running_loss = 0.0
        # Print the learning rate to confirm the scheduler updated it
        lr_1 = optimizer.param_groups[0]['lr']
        print("learn_rate: %.15f" % lr_1)
        schedule.step()

        # Validation: no gradient updates needed
        if (epoch + 1) % num_check == 0:
            model.eval()
            correct = 0.0
            total = 0
            with torch.no_grad():
                print("=======================check=======================")
                for inputs, labels in verification_loader:
                    # Fetch batch_size samples from verification_loader
                    inputs, labels = inputs.to(device), labels.to(device)
                    outputs = model(inputs)
                    pred = outputs.argmax(dim=1)  # index of the largest value in each row
                    total = total + inputs.size(0)
                    correct = correct + torch.eq(pred, labels).sum().item()
            correct = 100 * correct / total
            print("Accuracy of the network on the 19797 verification images:%.2f %%" % correct)
            print("===================================================")
            # Save the model whenever validation accuracy improves
            if correct > correct_optimal:
                torch.save(model.state_dict(), 'ResNet_enhance/ResNet50_%03d-correct%.3f.pth' % (epoch + 1, correct))
                correct_optimal = correct
        end = time.time()
        print("time:{}".format(end - start))

7. Plotting the loss curve

The loss curves below are for the original and proposed ResNet50 variants; six x-axis units correspond to one epoch (the exact number is determined by the dataset size and the num_print parameter, as worked out below).
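The number of logged loss values per epoch is the number of batches per epoch divided by num_print. A small sketch of that arithmetic; the training-set size used here is a made-up figure chosen only to reproduce the author's roughly six points per epoch, since the post does not state it:

    num_train = 55_000   # assumed training-set size (not given in the post)
    batch_size = 32
    num_print = 280

    batches_per_epoch = num_train // batch_size        # about 1718 batches
    points_per_epoch = batches_per_epoch // num_print  # about 6 logged loss values per epoch
    print(batches_per_epoch, points_per_epoch)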

    import matplotlib.pyplot as plt

    x = [i + 1 for i in range(len(loss_list))]
    # Plot the loss curve
    plt.plot(x, loss_list)
    # show() renders the figure; without it the plot is drawn but never displayed
    plt.show()

[Figure: loss curve of the original-block ResNet50]
[Figure: loss curve of the proposed-block ResNet50]

8. Model evaluation

    # Evaluation mode: no gradient updates needed
    model.eval()
    correct = 0.0
    total = 0
    with torch.no_grad():
        print("=======================check=======================")
        for inputs, labels in test_loader:
            # Fetch batch_size samples from test_loader
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            pred = outputs.argmax(dim=1)  # index of the largest value in each row
            total = total + inputs.size(0)
            correct = correct + torch.eq(pred, labels).sum().item()
    correct = 100 * correct / total
    print("Accuracy of the network on the 20094 test images:%.2f %%" % correct)
    print("===================================================")

9. Model prediction

Dataset construction

  1. """
  2. test为一个列表,存放图片的路径
  3. 其路径样式为asl_alphabet_test\\K_test.jpg
  4. pre_test_dict为一个字典,如(种类:标号)
  5. """import torch
  6. from PIL import Image
  7. from torch.autograd import Variable
  8. from torchvision import transforms
  9. from torch.utils.data import Dataset, DataLoader
  10. no_transform = transforms.Compose([
  11. transforms.Resize(224),
  12. transforms.ToTensor(),
  13. transforms.Normalize((0.51865010496974,0.49877608954906466,0.5143190141916275),(7.181780141103533,8.053991771959863,8.290017965464534))])# -----------------ready the dataset--------------------------defdefault_loader(path):
  14. img = Image.open(path)return img
  15. classMyDataset(Dataset):# 构造函数def__init__(self, path, transform=None, target_transform=None, loader=default_loader):
  16. imgs =[]for img_path in path:
  17. temp = img_path.split("\\")
  18. label = temp[4][:-9]
  19. img_label = pre_test_dict[label]
  20. imgs.append((img_path,int(img_label),label))#imgs中包含有图像路径和标签
  21. self.path = path
  22. self.imgs = imgs
  23. self.transform = transform
  24. self.target_transform = target_transform
  25. self.loader = loader
  26. # hash_map建立def__getitem__(self, index):
  27. img_path, img_label, label = self.imgs[index]# 调用 opencv 打开图片
  28. img = self.loader(img_path)if self.transform isnotNone:
  29. img = self.transform(img)
  30. img_label -=1return img, img_path, label
  31. def__len__(self):returnlen(self.imgs)
  32. test_data = MyDataset(test, transform=no_transform)#test_data测试数据,调用DataLoader批量加载
  33. test_loader = DataLoader(dataset=test_data, batch_size=batch_size, shuffle=False)

Prediction

  1. """
  2. test_dict为一个字典,如(标号:种类)
  3. """# 预测模式,不需要梯度更新
  4. model.eval()with torch.no_grad():print("=======================forecast=======================")for inputs, img_paths, kind_labels in test_loader:# train_loader中取出batch_size个数据
  5. inputs = inputs.to(device)# 模型检验
  6. outputs = model(inputs)
  7. pred = outputs.argmax(dim=1).tolist()#返回每一行中最大值的索引for i inrange(len(pred)):
  8. predict = test_dict[pred[i]+1]
  9. path = img_paths[i]
  10. real = kind_labels[i]print("path: %s, predict: %s, real: %s"%(path, predict, real))print("===================================================")

Reposted from: https://blog.csdn.net/weixin_49529683/article/details/123313115
Copyright belongs to the original author 费费川. If there is any infringement, please contact us and we will remove it.
