PyTorch Study Notes (6): The Sequential Class, Parameter Management, and GPUs

1. torch.nn.Sequential

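Sequential is itself a module (that is, a Module), and by PyTorch convention a module can contain other modules. This means we can add other modules to a Sequential (including other Sequential objects). Once they are added, Sequential chains them into a pipeline: the input passes through each module in turn to produce the output, as the figure below shows.

[Figure: modules chained one after another inside a Sequential]

The corresponding code is: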

from torch import nn

myseq = nn.Sequential(
    # Module 1
    # Module 2
    # ...
    # Module n
)

Since nn.Linear and nn.ReLU are also modules, we can combine them inside myseq to build a simple neural network.

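Take a network with a single hidden layer as an example. Suppose the input, hidden, and output layers have 20, 10, and 5 neurons respectively, and the hidden layer uses the ReLU activation. The network can then be written as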

net = nn.Sequential(
    nn.Linear(20,10),
    nn.ReLU(),
    nn.Linear(10,5))

During training, we can feed a batch of samples to the net defined above. Assuming a batch size of 3, net returns a batch of outputs:

import torch

torch.manual_seed(42)
X = torch.randn(3, 20)
net(X)
# tensor([[ 0.0092, -0.3154, -0.1202, -0.2654,  0.1336],
#         [-0.0042, -0.2338, -0.1788, -0.5513, -0.6258],
#         [ 0.0731, -0.4427, -0.3108,  0.1791,  0.1614]],
#        grad_fn=<AddmmBackward0>)

1.1 Basic operations on Sequential

Printing a Sequential object shows its structure:

print(net)
# Sequential(
#   (0): Linear(in_features=20, out_features=10, bias=True)
#   (1): ReLU()
#   (2): Linear(in_features=10, out_features=5, bias=True)
# )

Just as with a Python list, we can index into a Sequential to get its submodules and use len to see how many it holds:

print(net[0])
# Linear(in_features=20, out_features=10, bias=True)
print(net[1])
# ReLU()
print(len(net))
# 3

We can also replace, delete, and append submodules:

net[1] = nn.Sigmoid()
print(net)
# Sequential(
#   (0): Linear(in_features=20, out_features=10, bias=True)
#   (1): Sigmoid()
#   (2): Linear(in_features=10, out_features=5, bias=True)
# )
del net[2]
print(net)
# Sequential(
#   (0): Linear(in_features=20, out_features=10, bias=True)
#   (1): Sigmoid()
# )

net.append(nn.Linear(10, 2))  # append always adds to the end
print(net)
# Sequential(
#   (0): Linear(in_features=20, out_features=10, bias=True)
#   (1): Sigmoid()
#   (2): Linear(in_features=10, out_features=2, bias=True)
# )

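At present (version 1.11.0), deleting a submodule that is not the last one with del can cause problems: the remaining indices may no longer be contiguous, and appending further submodules may not work as expected.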
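To see how an installed version behaves, a quick check along the following lines may help. It is only a sketch: it reads the private _modules dictionary (an implementation detail of nn.Module) purely for inspection, and the printed keys depend on the PyTorch version, so no expected output is shown.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(20, 10), nn.ReLU(), nn.Linear(10, 5))

# Delete a submodule that is NOT the last one.
del net[1]

# _modules is a private attribute; it is used here only for inspection.
keys_after_del = list(net._modules.keys())
print(torch.__version__, keys_after_del)

# Append another submodule and look at the keys again. If the keys above
# were not contiguous, the new module may not land where expected.
net.append(nn.Linear(5, 2))
keys_after_append = list(net._modules.keys())
print(keys_after_append)
```

If the keys printed after the deletion are contiguous, the installed version renumbers the submodules and the problem described above should not appear.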

A Sequential object is also iterable, so we can print all of its submodules with a for loop:

net = nn.Sequential(
    nn.Linear(20, 10),
    nn.ReLU(),
    nn.Linear(10, 5))

for sub_module in net:
    print(sub_module)
# Linear(in_features=20, out_features=10, bias=True)
# ReLU()
# Linear(in_features=10, out_features=5, bias=True)

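1.2 Implementing Sequential by hand

To deepen our understanding, let us implement Sequential from scratch (it will not match the official implementation; the goal is only to make the idea clear).

We start with the most basic functionality: after the modules are passed to Sequential, it should assemble them and support forward propagation: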

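class MySeq(nn.Module):
    def __init__(self, *args):
        super().__init__()
        # Store each module under its position; nn.Module treats every
        # entry of self._modules as a submodule.
        for idx, module in enumerate(args):
            self._modules[str(idx)] = module

    def forward(self, inputs):
        # Feed the output of each module into the next one.
        for module in self._modules.values():
            inputs = module(inputs)
        return inputs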

Try a forward pass:

torch.manual_seed(42)
myseq = MySeq(nn.Linear(20, 10), nn.ReLU(), nn.Linear(10, 5))
X = torch.rand(3, 20)
myseq(X)
# tensor([[ 0.2056, -0.5307, -0.0023, -0.0309,  0.1289],
#         [ 0.0681, -0.4473,  0.2085, -0.1179,  0.1157],
#         [ 0.1187, -0.5331,  0.0530, -0.0466,  0.0874]],
#        grad_fn=<AddmmBackward0>)

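As we can see, our MySeq produces the expected output. Clearly, though, the current MySeq does too little; it still needs indexing, assignment, deletion, and appending: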

class MySeq(nn.Module):
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            self._modules[str(idx)] = module

    def __getitem__(self, idx):
        return self._modules[str(idx)]

    def __setitem__(self, idx, module):
        assert idx < len(self)
        self._modules[str(idx)] = module

    def __delitem__(self, idx):
        # Shift the later modules forward so the keys stay contiguous,
        # then drop the now-duplicated last entry.
        for i in range(idx, len(self) - 1):
            self._modules[str(i)] = self._modules[str(i + 1)]
        del self._modules[str(len(self) - 1)]

    def __len__(self):
        return len(self._modules)

    def append(self, module):
        # Keys are always 0..len-1, so the next key is len(self);
        # this also works when the container is empty.
        self._modules[str(len(self))] = module

    def forward(self, inputs):
        for module in self._modules.values():
            inputs = module(inputs)
        return inputs

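At this point our MySeq is complete, and deleting a submodule with del does not trigger the bug mentioned earlier, because __delitem__ shifts the later modules forward so that the keys stay contiguous.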

1.3 Nesting Sequential

Sequential is itself a module, and modules can contain other modules, so a Sequential can contain other Sequential objects.

For example, nest two Sequential objects inside one Sequential:

seq_1 = nn.Sequential(nn.Linear(15, 10), nn.ReLU(), nn.Linear(10, 5))
seq_2 = nn.Sequential(nn.Linear(25, 15), nn.Sigmoid(), nn.Linear(15, 10))
seq_3 = nn.Sequential(seq_1, seq_2)
print(seq_3)
# Sequential(
#   (0): Sequential(
#     (0): Linear(in_features=15, out_features=10, bias=True)
#     (1): ReLU()
#     (2): Linear(in_features=10, out_features=5, bias=True)
#   )
#   (1): Sequential(
#     (0): Linear(in_features=25, out_features=15, bias=True)
#     (1): Sigmoid()
#     (2): Linear(in_features=15, out_features=10, bias=True)
#   )
# )

We can still use multi-level indexing, just as with nested lists:

print(seq_3[1])
# Sequential(
#   (0): Linear(in_features=25, out_features=15, bias=True)
#   (1): Sigmoid()
#   (2): Linear(in_features=15, out_features=10, bias=True)
# )
print(seq_3[0][1])
# ReLU()

We can also traverse it with a nested loop:

for seq in seq_3:
    for module in seq:
        print(module)
# Linear(in_features=15, out_features=10, bias=True)
# ReLU()
# Linear(in_features=10, out_features=5, bias=True)
# Linear(in_features=25, out_features=15, bias=True)
# Sigmoid()
# Linear(in_features=15, out_features=10, bias=True)

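Readers may wonder how a given input inputs is passed through seq_3.

The answer is straightforward: inputs first enters seq_1 and passes through its modules to produce an output. That output becomes the input to seq_2, and after passing through the modules of seq_2 it yields another output, which is the final output.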

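Note that the example in this section cannot actually turn an input into an output, because the shapes do not match (seq_1 outputs 5 features while seq_2 expects 25). It needs to be changed to something like the following: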

seq_1 = nn.Sequential(nn.Linear(30,25), nn.ReLU(), nn.Linear(25,20))
seq_2 = nn.Sequential(nn.Linear(20,15), nn.Sigmoid(), nn.Linear(15,10))
seq_3 = nn.Sequential(seq_1, seq_2)
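The data flow described above can be checked with a short sketch: passing a batch through seq_3 should give the same result as applying seq_1 and then seq_2 by hand. The batch size of 4 and the seed are arbitrary choices made for this example.

```python
import torch
from torch import nn

torch.manual_seed(0)
seq_1 = nn.Sequential(nn.Linear(30, 25), nn.ReLU(), nn.Linear(25, 20))
seq_2 = nn.Sequential(nn.Linear(20, 15), nn.Sigmoid(), nn.Linear(15, 10))
seq_3 = nn.Sequential(seq_1, seq_2)

X = torch.randn(4, 30)  # a batch of 4 samples with 30 features (arbitrary)

out_nested = seq_3(X)         # forward pass through the nested container
out_manual = seq_2(seq_1(X))  # the same pipeline, applied by hand

print(out_nested.shape)                     # torch.Size([4, 10])
print(torch.equal(out_nested, out_manual))  # True
```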

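1.4 Custom layers

The modules inside a Sequential are also called layers. We are not limited to the layers provided in torch.nn: by subclassing nn.Module we can define our own layers and add them to a Sequential.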

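1.4.1 Layers without parameters

Define a centering layer that subtracts the mean of the input from the input and returns the result:

class CenteredLayer(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, X):
        return X - X.mean()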

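We can check whether the layer really does its job:

torch.manual_seed(42)
net = nn.Sequential(nn.Linear(64, 30), CenteredLayer())
X = torch.randn(3, 64)
print(net(X).mean())
# tensor(-5.2982e-09, grad_fn=<MeanBackward0>)

The result is small enough to be treated as 0 (what remains is floating-point rounding error), which shows that the custom layer works.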

1.4.2 Layers with parameters

Take the single-hidden-layer network again. Most of the time we want to choose the number of neurons in each layer ourselves, so the custom layer needs to accept the corresponding arguments in its constructor.

class Net(nn.Module):
    def __init__(self, input_nodes, hidden_nodes, output_nodes):
        super().__init__()
        self.inodes = input_nodes
        self.hnodes = hidden_nodes
        self.onodes = output_nodes
        self.model = nn.Sequential(
            nn.Linear(self.inodes, self.hnodes),
            nn.ReLU(),
            nn.Linear(self.hnodes, self.onodes))

    def forward(self, inputs):
        return self.model(inputs)

Set the numbers of input, hidden, and output nodes to 784, 256, and 8 respectively:

torch.manual_seed(42)
net = Net(784, 256, 8)
X = torch.randn(5, 784)
print(net(X))
# tensor([[ 0.2291, -0.3913, -0.1745, -0.2685, -0.2684,  0.0760,  0.0071, -0.0337],
#         [ 0.2084,  0.1235, -0.1054, -0.0508,  0.0194, -0.0429, -0.3269,  0.1890],
#         [-0.0756, -0.4335, -0.1643, -0.1817, -0.2376, -0.1399,  0.2710, -0.3719],
#         [ 0.4110, -0.2428, -0.1021, -0.1019, -0.0550, -0.0890,  0.1430,  0.0881],
#         [ 0.0626, -0.4117,  0.0130,  0.1339, -0.2529, -0.1106, -0.2586,  0.2205]],
#        grad_fn=<AddmmBackward0>)

2. Parameter management

2.1 nn.Parameter

nn.Parameter is a subclass of Tensor and can be thought of as a special kind of tensor that serves as a module parameter. It is used as follows:

nn.Parameter(data, requires_grad=True)

Here data is the Tensor to wrap, and requires_grad defaults to True.

In fact, the parameters of the modules provided in torch.nn are all instances of nn.Parameter. For example:

module = nn.Linear(3, 3)
type(module.weight)
# torch.nn.parameter.Parameter
type(module.bias)
# torch.nn.parameter.Parameter

In a custom module, a tensor assigned as an attribute counts as a parameter of the module only if it is wrapped in nn.Parameter; only then does the parameters() method return it. Compare the following two snippets:

""" Snippet 1 """
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.randn(3, 3)
        self.bias = torch.randn(3)

    def forward(self, inputs):
        pass

net = Net()
print(list(net.parameters()))
# []

""" Snippet 2 """
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(3, 3))
        self.bias = nn.Parameter(torch.randn(3))

    def forward(self, inputs):
        pass

net = Net()
print(list(net.parameters()))
# [Parameter containing:
# tensor([[-0.4584,  0.3815, -0.4522],
#         [ 2.1236,  0.7928, -0.7095],
#         [-1.4921, -0.5689, -0.2342]], requires_grad=True), Parameter containing:
# tensor([-0.6971, -0.7651,  0.7897], requires_grad=True)]

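These results show that if a custom module has parameters that must be built by hand rather than obtained from an existing module, they should be created with nn.Parameter. Then inspecting the module's parameters, or handing them to an optimizer for updating, only requires calling parameters().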
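As a minimal sketch of that workflow, the following hand-written linear layer registers its weight and bias through nn.Parameter and is then updated by an optimizer that receives only parameters(). The class name MyLinear, the layer sizes, the learning rate, and the random data are all assumptions made for illustration.

```python
import torch
from torch import nn

class MyLinear(nn.Module):
    """A hand-written linear layer (hypothetical name, for illustration)."""

    def __init__(self, in_features, out_features):
        super().__init__()
        # Wrapping the tensors in nn.Parameter registers them with the module.
        self.weight = nn.Parameter(torch.randn(in_features, out_features))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, X):
        return X @ self.weight + self.bias

torch.manual_seed(0)
layer = MyLinear(3, 2)
names = [name for name, _ in layer.named_parameters()]
print(names)  # ['weight', 'bias']

# The optimizer only needs layer.parameters() to find what to update.
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
before = layer.weight.detach().clone()

loss = layer(torch.randn(4, 3)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

changed = not torch.equal(before, layer.weight.detach())
print(changed)  # True
```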

nn.Parameter wraps the data passed to it as a parameter. To access or use the underlying data directly rather than the parameter itself, read the data attribute of the nn.Parameter object:

a = torch.tensor([1, 2, 3]).to(torch.float32)
param = nn.Parameter(a)
print(param)
# Parameter containing:
# tensor([1., 2., 3.], requires_grad=True)
print(param.data)
# tensor([1., 2., 3.])

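2.2 Accessing parameters

nn.Module has a state_dict() method (see the official documentation) that returns the whole state of the module as a dictionary, covering both the module's parameters and its persistent buffers (I do not fully understand the latter yet, so I skip them for now). The keys of the dictionary are the names of the corresponding parameters and buffers.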

Since every module inherits from nn.Module, we can call state_dict() on any module to inspect its state:

linear_layer = nn.Linear(2, 2)
print(linear_layer.state_dict())
# OrderedDict([('weight', tensor([[ 0.2602, -0.2318],
#         [-0.5192,  0.0130]])), ('bias', tensor([0.5890, 0.2476]))])
print(linear_layer.state_dict().keys())
# odict_keys(['weight', 'bias'])
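For readers curious about the persistent buffers mentioned above, here is a small sketch. A buffer is a tensor registered with register_buffer: it is stored in state_dict() together with the parameters, but parameters() does not return it, so an optimizer never updates it. The RunningMean class and its attribute names are hypothetical and exist only for this illustration.

```python
import torch
from torch import nn

class RunningMean(nn.Module):
    """Toy module (hypothetical) holding one parameter and one buffer."""

    def __init__(self, num_features):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(num_features))
        # A buffer is state that is saved with the module but not trained.
        self.register_buffer("mean", torch.zeros(num_features))

    def forward(self, X):
        return (X - self.mean) * self.scale

m = RunningMean(3)
state_keys = list(m.state_dict().keys())
param_names = [name for name, _ in m.named_parameters()]
print(state_keys)   # ['scale', 'mean']
print(param_names)  # ['scale']
```

Built-in layers such as nn.BatchNorm1d use buffers in the same way for their running statistics.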

For a linear layer, besides state_dict(), we can also read the corresponding attributes directly:

linear_layer = nn.Linear(2, 1)
print(linear_layer.weight)
# Parameter containing:
# tensor([[-0.1990,  0.3394]], requires_grad=True)
print(linear_layer.bias)
# Parameter containing:
# tensor([0.2697], requires_grad=True)

Note that what is returned above are parameter objects. To use the data inside them, read the data attribute.

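We can of course also call parameters() and named_parameters() on an nn.Linear instance to get its parameters (both methods are available on every module). See my previous note for details; they are not repeated here.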
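As a brief reminder of what these two methods return, the sketch below lists the parameter names and shapes of the single-hidden-layer network from section 1 and counts its scalar parameters. In a Sequential, each name is prefixed with the index of the submodule that owns the parameter.

```python
from torch import nn

net = nn.Sequential(nn.Linear(20, 10), nn.ReLU(), nn.Linear(10, 5))

# named_parameters() yields (name, parameter) pairs.
shapes = {name: tuple(param.shape) for name, param in net.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)
# 0.weight (10, 20)
# 0.bias (10,)
# 2.weight (5, 10)
# 2.bias (5,)

# parameters() yields the parameters only, which is enough for counting.
num_params = sum(p.numel() for p in net.parameters())
print(num_params)  # 265
```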

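2.3 Parameter initialization

Take a neural network as an example. When we create an instance of nn.Linear(a, b), its parameters are initialized automatically: both the weights and the biases are sampled from the uniform distribution $U(-1/\sqrt{a},\, 1/\sqrt{a})$.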
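A rough empirical check of this default range is sketched below: it builds one layer and confirms that no weight or bias lies outside $1/\sqrt{a}$ in absolute value. The layer sizes are arbitrary, and the small tolerance is an assumption added to absorb float32 rounding at the boundary.

```python
import math

import torch
from torch import nn

torch.manual_seed(0)
a, b = 100, 50  # arbitrary layer sizes chosen for this check
layer = nn.Linear(a, b)

bound = 1 / math.sqrt(a)
tol = 1e-6  # allowance for float32 rounding at the boundary

w_max = layer.weight.detach().abs().max().item()
b_max = layer.bias.detach().abs().max().item()

print(bound)                 # 0.1
print(w_max <= bound + tol)  # True
print(b_max <= bound + tol)  # True
```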

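Sometimes, however, we want to initialize from a different distribution. In that case we can use the initializers built into PyTorch in torch.nn.init, or write a custom initialization.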

2.3.1 Using built-in initializers

For the single-hidden-layer network below, we want to apply a built-in initializer to its two linear layers:

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(3, 2),
            nn.ReLU(),
            nn.Linear(2, 3),
        )

    def forward(self, X):
        return self.layers(X)

Suppose the weights are sampled from $\mathcal{N}(0,1)$ and all biases are initialized to $0$. The initialization code is then

def init_normal(module):
    # Only nn.Linear submodules are initialized; activation functions have no parameters
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=0, std=1)
        nn.init.zeros_(module.bias)

net = Net()
net.apply(init_normal)
for param in net.parameters():
    print(param)
# Parameter containing:
# tensor([[-0.3560,  0.8078, -2.4084],
#         [ 0.1700, -0.3217, -1.3320]], requires_grad=True)
# Parameter containing:
# tensor([0., 0.], requires_grad=True)
# Parameter containing:
# tensor([[-0.8025, -1.0695],
#         [-1.7031, -0.3068],
#         [-0.3499,  0.4263]], requires_grad=True)
# Parameter containing:
# tensor([0., 0., 0.], requires_grad=True)

Calling apply on net applies init_normal recursively to every submodule beneath it, and finally to net itself.
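To make the recursion visible, the following sketch records the class name of every module that apply visits. The nested network and the record function are made up for this illustration; the point is that children are visited before their parent and the outermost module comes last.

```python
from torch import nn

net = nn.Sequential(
    nn.Sequential(nn.Linear(4, 8), nn.ReLU()),
    nn.Linear(8, 1),
)

visited = []

def record(module):
    # Called once for every submodule, and finally for net itself.
    visited.append(type(module).__name__)

net.apply(record)
print(visited)
# ['Linear', 'ReLU', 'Sequential', 'Linear', 'Sequential']
```

This is why an initialization function has to check the module type: it is also called on containers and activation layers that have no weight of their own.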

2.3.2 Custom initialization

Suppose we want a custom initialization, for example drawing the weights of the network from the following distribution:

$$
w \sim \begin{cases} U(5,10), & \text{prob}=0.25 \\ 0, & \text{prob}=0.5 \\ U(-10,-5), & \text{prob}=0.25 \end{cases}
$$

This is equivalent to sampling $w$ from $U(-10, 10)$ and setting it to $0$ whenever it falls in $(-5, 5)$.

def my_init(module):
    if isinstance(module, nn.Linear):
        # Sample from U(-10, 10), then zero out every weight with |w| < 5
        nn.init.uniform_(module.weight, -10, 10)
        mask = module.weight.data.abs() >= 5
        module.weight.data *= mask

net = Net()
net.apply(my_init)
for param in net.parameters():
    print(param)
# Parameter containing:
# tensor([[-0.0000, -5.9610,  8.0000],
#         [-0.0000, -0.0000,  7.6041]], requires_grad=True)
# Parameter containing:
# tensor([ 0.4058, -0.2891], requires_grad=True)
# Parameter containing:
# tensor([[ 0.0000, -0.0000],
#         [-6.9569, -9.5102],
#         [-9.0270, -0.0000]], requires_grad=True)
# Parameter containing:
# tensor([ 0.2521, -0.1500, -0.1484], requires_grad=True)
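With only a handful of weights it is hard to see whether the proportions match the target distribution. The sketch below applies the same rule to a much larger layer and estimates the three proportions. It is written with torch.no_grad() instead of .data, which is one common alternative; the function name my_init_no_grad and the layer size are assumptions made for this example.

```python
import torch
from torch import nn

def my_init_no_grad(module):
    # Same rule as my_init, written with torch.no_grad() instead of .data
    # (hypothetical function name).
    if isinstance(module, nn.Linear):
        with torch.no_grad():
            module.weight.uniform_(-10, 10)
            module.weight.mul_(module.weight.abs() >= 5)

torch.manual_seed(0)
layer = nn.Linear(200, 200)  # large enough to estimate proportions
layer.apply(my_init_no_grad)

w = layer.weight.detach()
zero_frac = (w == 0).float().mean().item()
pos_frac = (w >= 5).float().mean().item()
neg_frac = (w <= -5).float().mean().item()
print(zero_frac, pos_frac, neg_frac)  # roughly 0.5, 0.25, 0.25
```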

2.4 Parameter tying

Consider a network with three hidden layers:

net = nn.Sequential(nn.Linear(4,8), nn.ReLU(),
                    nn.Linear(8,8), nn.ReLU(),
                    nn.Linear(8,8), nn.ReLU(),
                    nn.Linear(8,1))

If we want the second and third hidden layers to share parameters, we can reuse the same layer object in both positions:

shared = nn.Linear(8,8)
net = nn.Sequential(nn.Linear(4,8), nn.ReLU(),
                    shared, nn.ReLU(),
                    shared, nn.ReLU(),
                    nn.Linear(8,1))
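A short sketch can confirm that the two positions really share their parameters rather than holding two copies. The value 100.0 written into the weight is arbitrary and serves only to make the sharing visible.

```python
import torch
from torch import nn

shared = nn.Linear(8, 8)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                    shared, nn.ReLU(),
                    shared, nn.ReLU(),
                    nn.Linear(8, 1))

# Both positions hold the very same module object, not two copies.
same_object = net[2] is net[4]
print(same_object)  # True

# Changing the weight through one position is visible through the other.
with torch.no_grad():
    net[2].weight[0, 0] = 100.0
tied_value = net[4].weight[0, 0].item()
print(tied_value)  # 100.0

# parameters() reports the shared weight and bias only once.
num_tensors = len(list(net.parameters()))
print(num_tensors)  # 6
```

Because the layer is used twice in the forward pass, backpropagation sums the gradients from both positions into the shared weight and bias.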

2.5 Saving models

Before discussing how to save a model, let us first look at how tensors are saved.

2.5.1 Saving tensors

torch.save() and torch.load() can save and load tensors, dictionaries, modules, and other picklable objects used in PyTorch. The usage is:

torch.save(obj, path)
torch.load(path)

Here path must include the file name, and the extension is usually .pt.

Taking a tensor as an example, saving and loading look like this:

t = torch.tensor([1, 2, 3])
path = './models/my_tensor.pt'
torch.save(t, path)

a = torch.load(path)
print(a)
# tensor([1, 2, 3])

Note that an error is raised if the models folder does not exist, so the target directory must be created before saving.
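One way to avoid that error is to create the directory in code before saving, as in the sketch below. A temporary directory stands in for the project folder here (an assumption made so the example can run anywhere); in a real project the path would be something like ./models/my_tensor.pt.

```python
import os
import tempfile

import torch

# A temporary directory stands in for the project folder in this sketch.
base_dir = tempfile.mkdtemp()
path = os.path.join(base_dir, "models", "my_tensor.pt")

# Create the parent directory first; exist_ok=True makes this safe to repeat.
os.makedirs(os.path.dirname(path), exist_ok=True)

t = torch.tensor([1, 2, 3])
torch.save(t, path)

loaded = torch.load(path)
print(loaded)  # tensor([1, 2, 3])
```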

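2.5.2 Saving the whole model

Saving the whole model usually means saving all of its parameters together with its architecture. Assuming the trained model is model, it can be saved and loaded as follows: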

torch.save(model,'model.pt')
model = torch.load('model.pt')

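We usually do not do this, however. The whole model is serialized with pickle, so the file is tied to the exact class definition and directory structure in use when it was saved and can fail to load after the code is refactored. It also saves little space, since almost all of the file consists of the parameters themselves. In PyTorch 2.6 and later, torch.load defaults to weights_only=True, so loading a whole model this way additionally requires passing weights_only=False. Most of the time we save only the model's parameters.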

2.5.3 Saving the model's parameters

We usually save model.state_dict(), as follows:

torch.save(model.state_dict(),'model_params.pt')

This saves only the parameters, not the architecture. To load the model, first create an instance of it and then call load_state_dict:

model.load_state_dict(torch.load('model_params.pt'))
model.eval()

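Note that calling model.eval() before inference is necessary: it switches dropout and batch normalization layers to evaluation mode. Without it, inference results can be inconsistent.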
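The whole round trip can be sketched as follows: save the state_dict, rebuild the architecture in code, load the weights, switch to evaluation mode, and compare the outputs. The make_model helper, the layer sizes, and the temporary file location are assumptions made for this example.

```python
import os
import tempfile

import torch
from torch import nn

def make_model():
    # The architecture must be rebuilt in code; only the weights are on disk.
    return nn.Sequential(
        nn.Linear(20, 10), nn.ReLU(), nn.Dropout(0.5), nn.Linear(10, 5))

torch.manual_seed(0)
model = make_model()
path = os.path.join(tempfile.mkdtemp(), "model_params.pt")
torch.save(model.state_dict(), path)

restored = make_model()                     # fresh, randomly initialized
restored.load_state_dict(torch.load(path))  # overwrite with saved weights
model.eval()
restored.eval()                             # disable dropout for inference

X = torch.randn(3, 20)
with torch.no_grad():
    same = torch.equal(model(X), restored(X))
print(same)  # True
```

If the two eval() calls are removed, the dropout layer stays active and the two outputs generally differ even though the weights are identical.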

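3. GPU

In PyTorch, the CPU and the GPU are represented by torch.device('cpu') and torch.device('cuda'). Note that the cpu device stands for all physical CPUs and memory, which means PyTorch computations will try to use all CPU cores. A cuda device, by contrast, represents a single card and its memory. With multiple GPUs, torch.device('cuda:{}'.format(i)) denotes the $i$-th GPU (counting from 0). In addition, cuda refers to the current CUDA device, which is cuda:0 by default, so the two are equivalent unless the current device has been changed.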

We can query the number of available GPUs:

print(torch.cuda.device_count())
# 1

To use the GPU, first declare the device:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
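On a machine with several GPUs, a small helper can pick a specific card and still fall back to the CPU when that card is absent. The function below is a sketch of this idea; try_gpu is a hypothetical name, not a PyTorch API. No expected output is shown because it depends on the hardware.

```python
import torch

def try_gpu(i=0):
    """Return cuda:i if that GPU exists, otherwise fall back to the CPU.

    The helper name is hypothetical; it is not part of PyTorch.
    """
    if torch.cuda.device_count() > i:
        return torch.device(f"cuda:{i}")
    return torch.device("cpu")

device = try_gpu()      # first GPU, or the CPU when none is available
fallback = try_gpu(99)  # an index that almost certainly does not exist
print(device, fallback)
```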

3.1 Moving data to the GPU

By default, tensors are created on the CPU.

We can place a tensor on the GPU directly when creating it:

t = torch.zeros(3,3, device=device)
t.device
# device(type='cuda', index=0)

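We can also move it after creation with the to method (to does not modify the tensor in place; it returns the moved tensor, so the result must be assigned):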

t = torch.zeros(3,3)
t.device
# device(type='cpu')
t = t.to(device)
t.device
# device(type='cuda', index=0)

When there is only one GPU, we can also call cuda() on a tensor to get a copy on the GPU:

t = torch.zeros(3,3)
t = t.cuda()
t.device
# device(type='cuda', index=0)

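The advantage of this approach is that no device has to be declared beforehand. Note, however, that cuda() raises an error on a machine without a usable GPU, whereas the device declared earlier falls back to the CPU.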

3.2 Moving the model to the GPU

Training on the GPU is possible only when both the data and the model have been moved to the GPU.

""" Method 1 """
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = Net()
net.to(device)

""" Method 2 """
net = Net()
net.cuda()
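Putting the two steps together, a single training step might look like the sketch below. The network, the random data, the loss function, and the learning rate are all placeholders chosen for illustration; the point is only that the model and every batch end up on the same device.

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

net = nn.Sequential(nn.Linear(20, 10), nn.ReLU(), nn.Linear(10, 1)).to(device)
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# Random data used only for illustration; move each batch to the same device.
X = torch.randn(8, 20).to(device)
y = torch.randn(8, 1).to(device)

optimizer.zero_grad()
loss = loss_fn(net(X), y)
loss.backward()
optimizer.step()

param_device = next(net.parameters()).device
print(param_device.type == X.device.type)  # True
```

If the model and the data are on different devices, the forward pass raises a RuntimeError, which is the most common symptom of forgetting one of the two moves.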


Reposted from: https://blog.csdn.net/raelum/article/details/124669588
Copyright belongs to the original author, raelum. In case of infringement, please contact us for removal.