动手学习ResNet50

ResNet 论文

《Deep Residual Learning for Image Recognition》
论文地址：https://arxiv.org/abs/1512.03385

残差网络(ResNet)

以学习ResNet的收获、ResNet50的复现二大部分，简述ResNet50网络。

一、学习ResNet的收获

ResNet网络解决了深度CNN模型难训练的问题，并指出CNN模型随深度的加深可以得到更优的解，对于 plain/residual nets 均如此。随着 nets的加深，plain nets 的收敛速度变得缓慢，导致了CNN模型的深层网络不如较浅层网络，但是 residual nets 解决了这个问题。至于 residual nets 如何解决深层网络收敛速度变得缓慢的问题，得益于 residual learning 。
Residual learning 的理想公式为 F ( x ) = H ( x ) − x . ( 1 ) F(x) = H(x)-x.(1) F(x)=H(x)−x.(1) H ( x ) = F ( x ) + x . ( 2 ) H(x)=F(x)+x.(2) H(x)=F(x)+x.(2) 设所需求解的底层映射为H(x)，通过训练堆叠的非线性层映射F(x)=H(x)-x，因此H(x)=F(x)+x。但是 Residual learning 的实际使用公式为 y = F ( x , W i ) + x . ( 3 ) y=F(x,\mathop{W}{i})+x.(3) y=F(x,Wi)+x.(3) 或 y = F ( x , W i ) + W s x . ( 4 ) 或y=F(x,\mathop{W}{i})+\mathop{W}_{s}x.(4) 或y=F(x,Wi)+Wsx.(4) 设x、y、F分别为输入、输出、堆叠的非线性层映射。采用标识性快捷连接(Identity Mapping by Shortcuts)，把输入x与堆叠的非线性层映射F相加得到输出y。其中Wi,Ws均为线性投影，对于x和F的维数（及尺寸），相同时使用公式(3)，不相同时使用公式(4)。（Fig 2.为 Residual learning 的结构图）
对于 plain/residual nets 的深度探究与验证。1). plain nets 随着网络深度的加深，plain nets 的性能降低并非由梯度消失或梯度爆炸引起。对于 20-layer and 56-layer “plain” networks，其 training error 和 test error 随网络深度的加深而提高，但并非无法训练(如Fig 1.)。因此网络深度的加深，导致了网络收敛速度的降低。2). residual nets 随着网络深度的加深，residual nets 的性能提升并非一成不变。《Deep Residual Learning for Image Recognition》指出 1202-layers ResNet 的 error 比 110-layers ResNet 的 error 高出 1.5% 左右，在CIFAR-10数据集及其他条件相同下测得(见 Table 6.)。何博士猜测数据集的规模未能匹配网络的规模，造成 residual nets 性能的降低（通俗：数据集的规模太小而网络的深度太大，造成 residual nets 性能的降低）。

二、ResNet50的复现

通过九大部分复现ResNet50，例如数据增强、参数控制、数据集构建、网络构建、模型加载、模型训练、绘制损失图、模型验证、模型预测。

1. 数据增强

    数据增强由缩放、裁剪、旋转、亮度、对比度、颜色的变化组成。调用pytorch的     transformer完成。
    注意transformer匹配PIL的RGB格式，而非Opencv的BGR格式。

打开图片

import matplotlib.pyplot as plt
from PIL import Image
from torchvision import transforms as tfs
img = Image.open('gift/bald3.jpg')
plt.imshow(img)
plt.show()# print(img.size)--->(301, 301)

在这里插入图片描述
比例缩放

# 比例缩放# rint('before scale, shape:{}'.format(img.size))-->(before scale, shape:(301, 301))# 参数为元组(row,col)时,图片尺寸硬性放缩为(row,col)(宽,长)# 参数为数字size时，图片按比例缩放，其中min(row,col)=size
new_img = tfs.Resize(256)(img)# print('after scale, shape:{}'.format(new_img.size))-->(after scale, shape:(256, 256))
plt.imshow(new_img)
plt.show()

在这里插入图片描述
随机裁剪

# 随机裁剪
random_img = tfs.RandomCrop(224)(new_img)
plt.imshow(random_img)
plt.show()

在这里插入图片描述
随机旋转

# 随机角度旋转for i inrange(1):
    rot_img = tfs.RandomRotation(180)(new_img)
    plt.imshow(rot_img)
    plt.show()

在这里插入图片描述

# 在这里给出图片翻转的代码，效果自行实验# 随机水平翻转
h_filp = tfs.RandomHorizontalFlip()(new_img)
plt.imshow(h_filp)
plt.show()# 随机竖直翻转
v_filp = tfs.RandomVerticalFlip()(new_img)
plt.imshow(v_filp)
plt.show()

亮度变化

# 亮度
bright_img = tfs.ColorJitter(brightness=1)(new_img)# 随机从 0---2 之间亮度变化，1 表示原图
plt.imshow(bright_img)
plt.show()

在这里插入图片描述
对比度变化

# 对比度
contrast_img = tfs.ColorJitter(contrast=1)(new_img)# 随机从 0---2 之间对比度变化，1 表示原图
plt.imshow(contrast_img)
plt.show()

在这里插入图片描述
颜色变化

# 颜色
color_img = tfs.ColorJitter(hue=0.5)(new_img)# 随机从 -0.5---0.5 之间对颜色变化
plt.imshow(color_img)
plt.show()

在这里插入图片描述
数据增强总实验

img_aug = tfs.Compose([
    tfs.Resize(256),
    tfs.RandomRotation(180),
    tfs.RandomCrop(224),
    tfs.ColorJitter(brightness=0.5, contrast=0.5, hue=0.5)])# 以九宫格的形式展示
nrows =3
ncols =3
figsize =(32,32)
_, figs = plt.subplots(nrows, ncols, figsize=figsize)for i inrange(nrows):for j inrange(ncols):
        figs[i][j].imshow(img_aug(img))
        figs[i][j].axes.get_xaxis().set_visible(False)
        figs[i][j].axes.get_yaxis().set_visible(False)
plt.show()

在这里插入图片描述

2. 参数控制

batch_size =32# 每次喂入的数据量
lr         =0.01# 学习率
step_size  =1# 每n个epoch更新一次学习率
epoch_num  =10# 总迭代次数
num_print  =280# 每n次batch打印一次
num_check  =1# 每n个epoch验证一次模型，若效果更优则保存模型
enhance    =False# 是否数据增强

3. 数据集构建

数据预处理，采用标准化处理。详细见动手学习VGG16的数据预处理。

"""
train_path、verification_path、test_path 同为字典（类别kind，列表list），其中列表内存放着图片的绝对路径。
labels 也是一个字典（类别kind，序号number），序号为1~n的数字
"""import torch
import random
from PIL import Image
from torch.autograd import Variable
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
en_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomRotation(180),
    transforms.RandomCrop(224),
    transforms.ColorJitter(brightness=0.5, contrast=0.5, hue=0.5),
    transforms.ToTensor(),
    transforms.Normalize((0.51865010496974,0.49877608954906466,0.5143190141916275),(7.181780141103533,8.053991771959863,8.290017965464534))])
no_transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize((0.51865010496974,0.49877608954906466,0.5143190141916275),(7.181780141103533,8.053991771959863,8.290017965464534))])# -----------------ready the dataset--------------------------defdefault_loader(path):
    img = Image.open(path)return img
classMyDataset(Dataset):# 构造函数def__init__(self, path, transform=None, target_transform=None, loader=default_loader, enhance=False):
        imgs =[]for classification in path:for i inrange(len(path[classification])):
                img_path  = path[classification][i]
                img_label = labels[classification]
                imgs.append((img_path,int(img_label)))#imgs中包含有图像路径和标签
        self.path             = path
        self.imgs             = imgs
        self.transform        = transform
        self.target_transform = target_transform
        self.loader           = loader
        
    # hash_map建立def__getitem__(self, index):
        img_path, img_label = self.imgs[index]# 调用 opencv 打开图片
        img = self.loader(img_path)if self.transform isnotNone:
            img = self.transform(img)
        img_label -=1return img, img_label
    
    def__len__(self):returnlen(self.imgs)
train_data        = MyDataset(train_path, transform=no_transform,target_transform=en_transform,enhance=True)
verification_data = MyDataset(verification_path, transform=no_transform,target_transform=en_transform,enhance=False)
test_data         = MyDataset(test_path, transform=no_transform,target_transform=en_transform,enhance=False)#train_data 、verification_data和test_data包含多有的训练、验证与测试数据，调用DataLoader批量加载
train_loader        = DataLoader(dataset=train_data, batch_size=batch_size, shuffle=True)
verification_loader = DataLoader(dataset=verification_data, batch_size=batch_size, shuffle=False)
test_loader         = DataLoader(dataset=test_data, batch_size=batch_size, shuffle=False)

4. 网络构建

ResNet网络由数个残差块及卷积层构成，残差块的结构决定了ResNets。在这里给出何博士提到的2个结构分别为original、proposed,其中original为论文的主要实验对象，proposed为何博士于附录中提及。(其结构及性能见图8）参考于你必须要知道CNN模型：ResNet。
在这里插入图片描述

import torch
from torch import optim
import torchvision
import matplotlib.pyplot as plt
import numpy as np
from torchvision.utils import make_grid
import time
from torch import nn
from torchsummary import summary

残差块-original

# 残差块classResidual(nn.Module):def__init__(self,input_channels, temp_channels, num_channels,
                  use_1x1conv=False, strides=1):super(Residual, self).__init__()
        
        self.conv1 = nn.Conv2d(input_channels, temp_channels,
                                   kernel_size=1, stride=strides)
        self.conv2 = nn.Conv2d(temp_channels, temp_channels,
                                   kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(temp_channels, num_channels,
                                   kernel_size=1)if use_1x1conv:
            self.conv4 = nn.Conv2d(input_channels,num_channels,
                                   kernel_size=1,stride=strides)else:
            self.conv4 =None
                
        self.bn1 = nn.BatchNorm2d(temp_channels)
        self.bn2 = nn.BatchNorm2d(temp_channels)
        self.bn3 = nn.BatchNorm2d(num_channels)
        
        self.rl1 = nn.ReLU(inplace=True)
        self.rl2 = nn.ReLU(inplace=True)
        self.rl3 = nn.ReLU(inplace=True)defforward(self, x):
        y = self.rl1(self.bn1(self.conv1(x)))
        y = self.rl2(self.bn2(self.conv2(y)))
        y = self.bn3(self.conv3(y))if self.conv4:
            x = self.conv4(x)return self.rl3(y + x)

残差块-proposed

# 残差块classResidual(nn.Module):def__init__(self,input_channels, temp_channels, num_channels,
                  use_1x1conv=False, strides=1):super(Residual, self).__init__()
        
        self.conv1 = nn.Conv2d(input_channels, temp_channels,
                                   kernel_size=1, stride=strides)
        self.conv2 = nn.Conv2d(temp_channels, temp_channels,
                                   kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(temp_channels, num_channels,
                                   kernel_size=1)if use_1x1conv:
            self.conv4 = nn.Conv2d(input_channels,num_channels,
                                   kernel_size=1,stride=strides)else:
            self.conv4 =None
                
        self.bn1 = nn.BatchNorm2d(input_channels)
        self.bn2 = nn.BatchNorm2d(temp_channels)
        self.bn3 = nn.BatchNorm2d(temp_channels)
        
        self.rl1 = nn.ReLU(inplace=True)
        self.rl2 = nn.ReLU(inplace=True)
        self.rl3 = nn.ReLU(inplace=True)defforward(self, x):
        y = self.conv1(self.rl1(self.bn1(x)))
        y = self.conv2(self.rl2(self.bn2(y)))
        y = self.conv3(self.rl3(self.bn3(y)))if self.conv4:
            x = self.conv4(x)return y + x

ResNet50网络

# 残差网络# Residual封装defresent_block(channels, num_residuals, first_block=False):
    blk =[]
    input_channels, temp_channels, num_channels = channels
    for i inrange(num_residuals):if i ==0and first_block:
            blk.append(Residual(channels[0], channels[1], channels[2],
                                use_1x1conv=True))elif i ==0andnot first_block:
            blk.append(Residual(channels[0], channels[1], channels[2],
                                use_1x1conv=True, strides=2))else:
            blk.append(Residual(channels[2], channels[1], channels[2]))return blk
# ResNet50定义classResNet50(nn.Module):def__init__(self):super(ResNet50, self).__init__()# 第一层，1个卷积层和1个最大池化层
        self.layer1 = nn.Sequential(
            nn.Conv2d(3,64, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1))# 第二层，3 * 3个卷积层
        self.layer2 = nn.Sequential(*resent_block((64,64,256),3, first_block=True))# 第三层，4 * 3个卷积层
        self.layer3 = nn.Sequential(*resent_block((256,128,512),4))# 第四层，6 * 3个卷积层
        self.layer4 = nn.Sequential(*resent_block((512,256,1024),6))# 第五层，3 * 3个卷积层
        self.layer5 = nn.Sequential(*resent_block((1024,512,2048),3))
        
        self.conv_layer = nn.Sequential(
            self.layer1,
            self.layer2,
            self.layer3,
            self.layer4,
            self.layer5
        )
        
        self.fc = nn.Sequential(
            nn.Linear(2048,1000),
            nn.ReLU(inplace=True),
            
            nn.Linear(1000,29))defforward(self, x):
        x = self.conv_layer(x)# 全局平均池化层
        x = nn.functional.adaptive_avg_pool2d(x,(1,1))
        x = x.view(x.size(0),-1)
        x = self.fc(x)return x
# 验证网络是否可运行if __name__ =="__main__":
    device       = torch.device('cuda'if torch.cuda.is_available()else'cpu')
    resent_model = ResNet50().to(device)
    summary(resent_model,(3,224,224))#打印网络结构

其中original、proposed结构的ResNet50的参数均为18kw，区别于卷积层、归一化层、激活函数ReLU层的顺序不同。

5. 模型加载

# ResNet50
PATH   =Noneifnot PATH:
    device      = torch.device("cuda:0"if torch.cuda.is_available()else"cpu")
    model       = ResNet50().to(device)else:
    model  = ResNet50()
    model.load_state_dict(torch.load(PATH), strict=False)
    model.eval()
    device = torch.device("cuda:0"if torch.cuda.is_available()else"cpu")
    model.to(device)
    summary(model,(3,224,224))# 调参# 交叉熵
criterion = nn.CrossEntropyLoss()# 迭代器
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.8, weight_decay=0.001)# 更新学习率
schedule  = optim.lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=0.5, last_epoch=-1)

6. 模型训练

# 训练# 损失图
loss_list         =[]
correct_optimal   =0.0for epoch inrange(epoch_num):
    model.train()
    running_loss =0.0
    start        = time.time()print(1+ epoch)for i,(inputs, labels)inenumerate(train_loader,0):# 从train_loader中取出64个数据
        inputs, labels = inputs.to(device), labels.to(device)# 梯度清零
        optimizer.zero_grad()# 模型训练
        outputs = model(inputs)#print(outputs.shape)# 反向传播
        loss = criterion(outputs,labels).to(device)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()if(i+1)% num_print ==0:print('[%d epoch, %d]  loss:%.6f'%(epoch+1, i+1, running_loss/num_print))
            loss_list.append(running_loss/num_print)
            running_loss =0.0# 打印学习率及确认学习率是否进行更新
    lr_1 = optimizer.param_groups[0]['lr']print("learn_rate: %.15f"%lr_1)
    schedule.step()# 验证模式if(epoch+1)% num_check ==0:# 不需要梯度更新
        model.eval()
        correct =0.0
        total   =0with torch.no_grad():print("=======================check=======================")for inputs, labels in verification_loader:# 从train_loader中取出batch_size个数据
                inputs, labels = inputs.to(device), labels.to(device)# 模型验证
                outputs = model(inputs)
                pred    = outputs.argmax(dim=1)#返回每一行中最大值的索引
                total   = total + inputs.size(0)
                correct = correct + torch.eq(pred, labels).sum().item()
            
        
        correct =100* correct/total
        print("Accuracy of the network on the 19797 verification images:%.2f %%"%correct )print("===================================================")# 模型保存if correct > correct_optimal:
            torch.save(model.state_dict(),'ResNet_enhance/ResNet50_%03d-correct%.3f.pth'%(epoch +1, correct))
            correct_optimal = correct
        
    end=time.time()print("time:{}".format(end-start))

7. 绘制损失图

以下损失图分别为original、proposed结构的ResNet50的损失图，其6个单位(由数据集的数量及自定义参数num_print决定)为一次epoch。

import matplotlib.pyplot as plt
x =[ i+1for i inrange(len(loss_list))]# plot函数作图
plt.plot(x, loss_list)# show函数展示出这个图，如果没有这行代码，则程序完成绘图，但看不到
plt.show()

original
在这里插入图片描述
proposed

在这里插入图片描述

8. 模型验证

# 检验模式，不需要梯度更新
model.eval()
correct =0.0
total   =0with torch.no_grad():print("=======================check=======================")for inputs, labels in test_loader:# 从train_loader中取出batch_size个数据
        inputs, labels = inputs.to(device), labels.to(device)# 模型检验
        outputs = model(inputs)
        pred    = outputs.argmax(dim=1)#返回每一行中最大值的索引
        total   = total + inputs.size(0)
        correct = correct + torch.eq(pred, labels).sum().item()
            
        
    correct =100* correct/total
    print("Accuracy of the network on the 20094 test images:%.2f %%"%correct )print("===================================================")

9. 模型预测

数据集构建

"""
test为一个列表，存放图片的路径
其路径样式为asl_alphabet_test\\K_test.jpg
pre_test_dict为一个字典，如(种类:标号)
"""import torch
from PIL import Image
from torch.autograd import Variable
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
no_transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize((0.51865010496974,0.49877608954906466,0.5143190141916275),(7.181780141103533,8.053991771959863,8.290017965464534))])# -----------------ready the dataset--------------------------defdefault_loader(path):
    img = Image.open(path)return img
classMyDataset(Dataset):# 构造函数def__init__(self, path, transform=None, target_transform=None, loader=default_loader):
        imgs =[]for img_path in path:
            temp      = img_path.split("\\")
            label     = temp[4][:-9]
            img_label = pre_test_dict[label]
            imgs.append((img_path,int(img_label),label))#imgs中包含有图像路径和标签
        self.path             = path
        self.imgs             = imgs
        self.transform        = transform
        self.target_transform = target_transform
        self.loader           = loader
        
    # hash_map建立def__getitem__(self, index):
        img_path, img_label, label = self.imgs[index]# 调用 opencv 打开图片
        img = self.loader(img_path)if self.transform isnotNone:
            img = self.transform(img)
        img_label -=1return img, img_path, label
    
    def__len__(self):returnlen(self.imgs)
test_data         = MyDataset(test, transform=no_transform)#test_data测试数据，调用DataLoader批量加载
test_loader         = DataLoader(dataset=test_data, batch_size=batch_size, shuffle=False)

模型预测

"""
test_dict为一个字典，如(标号:种类)
"""# 预测模式，不需要梯度更新
model.eval()with torch.no_grad():print("=======================forecast=======================")for inputs, img_paths, kind_labels in test_loader:# 从train_loader中取出batch_size个数据
        inputs = inputs.to(device)# 模型检验
        outputs      = model(inputs)
        pred         = outputs.argmax(dim=1).tolist()#返回每一行中最大值的索引for i inrange(len(pred)):
            predict = test_dict[pred[i]+1]
            path    = img_paths[i]
            real    = kind_labels[i]print("path: %s, predict: %s, real: %s"%(path, predict, real))print("===================================================")

标签：深度学习计算机视觉

本文转载自: https://blog.csdn.net/weixin_49529683/article/details/123313115
版权归原作者 费费川 所有，如有侵权，请联系我们删除。

动手学习ResNet50

ResNet 论文

残差网络(ResNet)

一、学习ResNet的收获

二、ResNet50的复现

1. 数据增强

2. 参数控制

3. 数据集构建

4. 网络构建

5. 模型加载

6. 模型训练

7. 绘制损失图

8. 模型验证

9. 模型预测

发表评论

“动手学习ResNet50”的评论:

关于作者

overfit同步小助手

相关阅读

文章导航