LSTM Model Computations Explained in Detail

LSTM

Preface

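This post collects my notes from studying LSTM. You have probably already seen many LSTM blog posts and videos. What sets this one apart is that it starts from the mathematical formulas and traces the input and output dimensions of every part of the LSTM model, so that beginners can understand LSTM at the level of the equations. It also works through concrete calculations in a stock-price prediction setting and provides reference code.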

Model Structure

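The structure of the LSTM model is shown in the figure below. It consists of several repeated LSTM units. Each unit contains a forget gate, an input gate and an output gate, together with the cell state and the output state at the current time step.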

[Figure: LSTM model structure]

Model Input

An LSTM model usually processes a sequence, such as a text sequence or a time series, $X = (x_1, x_2, \dots, x_t, \dots)^T$, where the input at time step $t$ is $x_t$. We use a sliding window to split the sequence into windows of size $L$ with stride $step$. If fewer than $L$ values remain at the end of the sequence, so that a full window cannot be formed, the missing entries are padded, usually with 0 or with the most recent value. For example, suppose we have 29 time steps of input, $\vec{x} = (x_0, x_1, \dots, x_{28})^T$, a window size of 10 and a stride $step$ of 10 as well. We then split the data into three windows:

$$\vec{x_1} = (x_0, x_1, \dots, x_9)^T$$

$$\vec{x_2} = (x_{10}, x_{11}, \dots, x_{19})^T$$

$$\vec{x_3} = (x_{20}, x_{21}, \dots, x_{28}, x_{29})^T$$

Since $x_{29}$ does not exist, we set it to 0 or to the value of $x_{28}$, i.e. $\vec{x_3} = (x_{20}, x_{21}, \dots, x_{28}, 0)^T$ or $\vec{x_3} = (x_{20}, x_{21}, \dots, x_{28}, x_{28})^T$.
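The windowing and padding rule above can be sketched in a few lines of Python. This is a minimal illustration, not the author's reference code; the function name `sliding_windows` and its `pad` parameter are hypothetical, and plain integers stand in for the values $x_0, \dots, x_{28}$.

```python
def sliding_windows(seq, L, step, pad="zero"):
    """Split seq into windows of length L with stride `step` (hypothetical helper).

    If the last window runs past the end of seq, it is padded with 0
    (pad="zero") or with the last available value (pad="last").
    """
    windows = []
    for start in range(0, len(seq), step):
        win = list(seq[start:start + L])
        if len(win) < L:
            fill = 0 if pad == "zero" else seq[-1]
            win += [fill] * (L - len(win))
        windows.append(win)
        if start + L >= len(seq):
            break  # this window already reaches the end of the sequence
    return windows


x = list(range(29))  # stand-ins for x_0 ... x_28
print(len(sliding_windows(x, L=10, step=10)))            # 3
print(sliding_windows(x, L=10, step=10)[2])              # [20, 21, 22, 23, 24, 25, 26, 27, 28, 0]
print(sliding_windows(x, L=10, step=10, pad="last")[2])  # [20, 21, 22, 23, 24, 25, 26, 27, 28, 28]
```

With `step=10` the third window needs one padded value. With `step=1` the loop stops as soon as a window reaches the last element, so no padding is ever added.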

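When $step = 1$, this situation normally does not arise: the last window simply ends at the final time step, so no padding is needed. This is also the most commonly used sliding-window scheme.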

For example, for the time series $X = \{x_1, x_2, \ldots, x_{10}\}$ with window size $L = 3$ and stride $step = 1$, the sliding-window split is:

$$
\begin{aligned}
\vec{x_1} &= (x_1, x_2, x_3) \\
\vec{x_2} &= (x_2, x_3, x_4) \\
\vec{x_3} &= (x_3, x_4, x_5) \\
\vec{x_4} &= (x_4, x_5, x_6) \\
\vec{x_5} &= (x_5, x_6, x_7) \\
\vec{x_6} &= (x_6, x_7, x_8) \\
\vec{x_7} &= (x_7, x_8, x_9) \\
\vec{x_8} &= (x_8, x_9, x_{10})
\end{aligned}
$$
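For the $step = 1$ case the number of windows is $N - L + 1$, here $10 - 3 + 1 = 8$. The short sketch below reproduces the eight windows above using symbolic labels; `windows_step1` is a hypothetical helper name.

```python
def windows_step1(seq, L):
    """All length-L windows with stride 1 (hypothetical helper).

    Window i starts at position i; the last one starts at len(seq) - L,
    so it ends exactly on the final element and no padding is needed.
    """
    return [tuple(seq[i:i + L]) for i in range(len(seq) - L + 1)]


X = [f"x{k}" for k in range(1, 11)]  # symbolic x_1 ... x_10
for k, w in enumerate(windows_step1(X, 3), start=1):
    print(f"x{k}_vec = {w}")  # x1_vec = ('x1', 'x2', 'x3') ... x8_vec = ('x8', 'x9', 'x10')
```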

The inputs to an LSTM unit are the current input $\vec{x_t}$, the previous output state $h_{t-1}$ and the previous cell state $c_{t-1}$. For the first LSTM unit in the chain we initialize $h_0$ and $c_0$ manually; every later unit receives $h_{t-1}$ and $c_{t-1}$ from the unit before it. $\vec{x_t}$, $h_{t-1}$ and $c_{t-1}$ carry, respectively, the input information at the current step, the output information from the previous step and the memory from the previous step. Here $\vec{x_t} \in \mathbb{R}^{m \times 1}$, where $m$ is the window size (length) of the processed input sequence; $h_{t-1} \in \mathbb{R}^{d \times 1}$ is the previous output state, where $d$ is the hidden-state size of the LSTM unit; and $c_{t-1} \in \mathbb{R}^{d \times 1}$ is the previous cell state, which has the same shape as $h_{t-1}$.

We usually concatenate $h_{t-1}$ and $\vec{x_t}$ into a longer vector $\vec{y_t}$. The two are stacked vertically, so $\vec{y_t} \in \mathbb{R}^{(d+m) \times 1}$, as shown in the formula below, and $\vec{y_t}$ is then passed to each gate. When a batch of $n$ samples is processed at once, $\vec{y_t} \in \mathbb{R}^{(d+m) \times n}$.

$$\vec{y_t} = [h_{t-1}; \vec{x_t}] = \left[\begin{matrix} h_{t-1} \\ \vec{x_t} \end{matrix}\right]$$
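A minimal sketch of the concatenation, for a single sample and for a batch, using plain Python lists (a matrix is a list of rows). The helper names and the sizes $d = 2$, $m = 3$, $n = 4$ are hypothetical choices for illustration.

```python
def concat_column(h_prev, x_t):
    """y_t = [h_{t-1}; x_t] for one sample; column vectors stored as flat lists."""
    return list(h_prev) + list(x_t)  # length d + m


def concat_batch(H_prev, X_t):
    """Batched y_t: H_prev is d x n, X_t is m x n (lists of rows).

    Stacking the rows of H_prev on top of the rows of X_t gives a (d + m) x n matrix.
    """
    if len(H_prev[0]) != len(X_t[0]):
        raise ValueError("h_{t-1} and x_t must have the same number of columns n")
    return [list(r) for r in H_prev] + [list(r) for r in X_t]


d, m, n = 2, 3, 4
y = concat_column([0.1, -0.2], [1.0, 2.0, 3.0])
Y = concat_batch([[0.0] * n for _ in range(d)], [[1.0] * n for _ in range(m)])
print(len(y), (len(Y), len(Y[0])))  # 5 (5, 4)
```

Some libraries, for example PyTorch's `nn.LSTM`, keep separate weight matrices for $\vec{x_t}$ and $h_{t-1}$ instead of concatenating. The two views are equivalent: multiplying $W[h_{t-1}; \vec{x_t}]$ is the same as splitting $W$ into the column blocks that act on $h_{t-1}$ and on $\vec{x_t}$ and adding the two products.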

Forget Gate

The input to the forget gate is the vector $\vec{y_t}$ obtained in the Model Input section. We multiply $\vec{y_t}$ by the forget gate's weight matrix $W_f$ and add the bias $b_f$ to obtain $M_f$, then apply the sigmoid function to $M_f$ to obtain the forget gate output $f_t$. $f_t$ has the same shape as the cell state $c_t$, namely $f_t \in \mathbb{R}^{d \times 1}$, and indicates the degree of forgetting. The computation is:

$$M_f = W_f\vec{y_t} + b_f$$

$$f_t = \sigma(M_f) = \frac{1}{1 + e^{-(W_f\vec{y_t} + b_f)}}$$

where $\vec{y_t} \in \mathbb{R}^{(d+m) \times 1}$, $W_f \in \mathbb{R}^{d \times (d+m)}$, $b_f \in \mathbb{R}^{d \times 1}$ and $f_t \in \mathbb{R}^{d \times 1}$.
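The two forget-gate formulas translate almost line for line into code. The sketch below uses plain Python lists so every shape stays visible; the toy sizes ($d = 2$, $m = 1$), the weight values and the helper names `matvec` and `forget_gate` are illustrative assumptions, not the author's reference code.

```python
import math


def matvec(W, v):
    """Matrix-vector product W v, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forget_gate(W_f, b_f, y_t):
    """f_t = sigma(W_f y_t + b_f); W_f is d x (d+m), b_f and f_t have length d."""
    M_f = [s + b for s, b in zip(matvec(W_f, y_t), b_f)]
    return [sigmoid(z) for z in M_f]


# Toy sizes (assumed): d = 2, m = 1, so y_t = [h_{t-1}; x_t] has length 3.
h_prev, x_t = [0.2, -0.1], [1.0]
y_t = h_prev + x_t
W_f = [[0.5, -0.2, 0.1],
       [0.0, 0.3, -0.4]]
b_f = [0.0, 0.1]
f_t = forget_gate(W_f, b_f, y_t)
print(f_t)  # roughly [0.555, 0.418]: M_f = [0.22, -0.33] squashed into (0, 1)
```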

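Several of the LSTM gates use the sigmoid function. Its output always lies in $(0, 1)$, which makes it well suited to expressing which parts of its input should be remembered and which should be forgotten: the closer the output is to 0, the more is forgotten; the closer it is to 1, the more is remembered.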
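To see the $(0, 1)$ range concretely, here is a small sketch that evaluates the sigmoid at a few points. The two-branch form is an implementation choice of this example (it avoids `math.exp` overflow for large negative inputs), not something the original specifies.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function sigma(z) = 1 / (1 + e^{-z})."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # for z < 0 this avoids overflow in exp(-z)
    return e / (1.0 + e)


for z in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"sigma({z:+.0f}) = {sigmoid(z):.5f}")
```

Large negative inputs push the output toward 0 ("forget"), large positive inputs push it toward 1 ("remember"), and $\sigma(0) = 0.5$ sits in the middle.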

Input Gate

The input to the input gate is likewise the vector $\vec{y_t} \in \mathbb{R}^{(d+m) \times 1}$ obtained in the Model Input section. We multiply $\vec{y_t}$ by the input gate's weight matrix $W_i$ and add the bias $b_i$ to obtain $M_i$, then apply the sigmoid function to $M_i$ to obtain the input gate output $i_t$, which indicates how important the current input is. The computation is:

$$M_i = W_i\vec{y_t} + b_i$$

$$i_t = \sigma(M_i) = \frac{1}{1 + e^{-(W_i\vec{y_t} + b_i)}}$$

where $\vec{y_t} \in \mathbb{R}^{(d+m) \times 1}$, $W_i \in \mathbb{R}^{d \times (d+m)}$, $b_i \in \mathbb{R}^{d \times 1}$ and $i_t \in \mathbb{R}^{d \times 1}$.
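The batched shape mentioned in the Model Input section, $\vec{y_t} \in \mathbb{R}^{(d+m) \times n}$, works the same way for every gate: each column is one sample, and the bias is added to every column (the same effect as NumPy broadcasting a $(d, 1)$ bias over $n$ columns). A minimal sketch for the input gate, assuming toy sizes $d = 2$, $m = 1$, $n = 3$ and made-up weights; `input_gate_batch` is a hypothetical name.

```python
import math


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def input_gate_batch(W_i, b_i, Y):
    """i_t = sigma(W_i Y + b_i) for a batch.

    W_i: d x (d+m), b_i: length d, Y: (d+m) x n with one column per sample.
    b_i[r] is added to every column of row r. Returns a d x n list of rows.
    """
    n = len(Y[0])
    return [
        [sigmoid(sum(row[k] * Y[k][j] for k in range(len(row))) + b_i[r])
         for j in range(n)]
        for r, row in enumerate(W_i)
    ]


# Toy sizes (assumed): d = 2, m = 1, batch n = 3.
W_i = [[0.4, -0.1, 0.2],
       [-0.3, 0.5, 0.1]]
b_i = [0.05, -0.05]
Y = [[0.2, 0.0, -0.5],   # h_{t-1}, hidden unit 1
     [-0.1, 0.3, 0.4],   # h_{t-1}, hidden unit 2
     [1.0, 2.0, -1.0]]   # x_t
i_t = input_gate_batch(W_i, b_i, Y)
print(len(i_t), len(i_t[0]))  # 2 3
```

Column $j$ of the result is exactly what the single-sample formula gives for column $j$ of $Y$; batching only changes the shape bookkeeping.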

Output Gate

The input to the output gate is again the vector $\vec{y_t} \in \mathbb{R}^{(d+m) \times 1}$ obtained in the Model Input section. We multiply $\vec{y_t}$ by the output gate's weight matrix $W_o$ and add the bias $b_o$ to obtain $M_o$, then apply the sigmoid function to $M_o$ to obtain the output gate output $o_t$. The computation is:

$$M_o = W_o\vec{y_t} + b_o$$

$$o_t = \sigma(M_o) = \frac{1}{1 + e^{-(W_o\vec{y_t} + b_o)}}$$

where $\vec{y_t} \in \mathbb{R}^{(d+m) \times 1}$, $W_o \in \mathbb{R}^{d \times (d+m)}$, $b_o \in \mathbb{R}^{d \times 1}$ and $o_t \in \mathbb{R}^{d \times 1}$.
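Since the forget, input and output gates all share the same shapes, the number of trainable parameters per gate follows directly from the dimensions above: $d(d+m)$ weights plus $d$ biases. A small sketch of that arithmetic; the hidden size $d = 4$ is a hypothetical choice, and $m = 10$ simply reuses the window size from the earlier example.

```python
def gate_shapes(d, m):
    """Shapes used by one sigmoid gate (f_t, i_t or o_t) as defined above."""
    return {
        "y_t": (d + m, 1),
        "W": (d, d + m),
        "b": (d, 1),
        "gate output": (d, 1),
    }


def gate_param_count(d, m):
    """Trainable parameters of one gate: every entry of W plus every entry of b."""
    return d * (d + m) + d


d, m = 4, 10
per_gate = gate_param_count(d, m)
print(gate_shapes(d, m)["W"], per_gate, 3 * per_gate)  # (4, 14) 60 180
```

The last number counts only $f_t$, $i_t$ and $o_t$; the current input cell state introduced next has its own weights and bias on top of that.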

Current Input Cell State

Before computing $c_t$, we introduce the current input cell state (also called the candidate cell state) and compute $\tilde{c_t}$. $\tilde{c_t}$ is the new content derived from the current input that may be written into memory; how much of it is actually kept is controlled by the input gate $i_t$. We multiply $\vec{y_t}$ by the weight matrix $W_c$ of the current cell state and add the bias $b_c$ to obtain $M_c$, then apply tanh to $M_c$ to obtain $\tilde{c_t}$. The computation is:

$$M_c = W_c\vec{y_t} + b_c$$

$$\tilde{c_t} = \tanh(M_c) = \frac{e^{M_c} - e^{-M_c}}{e^{M_c} + e^{-M_c}} = \frac{e^{W_c\vec{y_t} + b_c} - e^{-(W_c\vec{y_t} + b_c)}}{e^{W_c\vec{y_t} + b_c} + e^{-(W_c\vec{y_t} + b_c)}}$$

where $\vec{y_t} \in \mathbb{R}^{(d + m) \times 1}$, $W_c \in \mathbb{R}^{d \times (d + m)}$, $b_c \in \mathbb{R}^{d \times 1}$ and $\tilde{c_t} \in \mathbb{R}^{d \times 1}$; tanh and the exponentials are applied element-wise.

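The candidate cell state uses the tanh function, whose range is $(-1, 1)$. A value close to $-1$ means the current input pushes the corresponding component of the cell state down (the stored information is revised), while a value close to $1$ means the current input reinforces it.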
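The computation above can be sketched in NumPy as follows. This is an illustrative sketch, not the author's reference code: the helper names `candidate_cell_state` and `tanh_via_exp` are hypothetical, and `W_c`, `b_c` and `y_t` are made-up values with the shapes defined above. The sketch also checks that the explicit exponential form of tanh agrees with `np.tanh`.

```python
import numpy as np

def candidate_cell_state(W_c, b_c, y_t):
    """Candidate cell state c~_t = tanh(W_c @ y_t + b_c) for column vectors."""
    M_c = W_c @ y_t + b_c            # (d, d+m) @ (d+m, 1) + (d, 1) -> (d, 1)
    return np.tanh(M_c)

def tanh_via_exp(M):
    """tanh spelled out as (e^M - e^-M) / (e^M + e^-M).
    It overflows once |M| exceeds roughly 709, so np.tanh is preferred in practice."""
    return (np.exp(M) - np.exp(-M)) / (np.exp(M) + np.exp(-M))

d, m = 4, 4                                   # hidden size, number of input features
rng = np.random.default_rng(0)                # hypothetical weights and input, illustration only
W_c = rng.uniform(-0.001, 0.001, size=(d, d + m))
b_c = np.ones((d, 1))
y_t = rng.normal(size=(d + m, 1))             # stands in for the concatenation [h_{t-1}; x_t]

c_tilde = candidate_cell_state(W_c, b_c, y_t)
print(c_tilde.shape)                          # (4, 1)
```

Every component of `c_tilde` lies strictly inside $(-1, 1)$, which is the range discussed above.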

Current Cell State

Next we compute the current cell state $c_t$ from the forget-gate output $f_t$, the input-gate output $i_t$, the candidate cell state $\tilde{c_t}$ and the previous cell state $c_{t-1}$. We multiply $f_t$ and $c_{t-1}$ element-wise, multiply $i_t$ and $\tilde{c_t}$ element-wise, and add the two products to obtain $c_t$:

$$c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c_t}$$

where $f_t \in \mathbb{R}^{d \times 1}$ is the forget-gate output, $i_t \in \mathbb{R}^{d \times 1}$ is the input-gate output, $\tilde{c_t} \in \mathbb{R}^{d \times 1}$ is the candidate cell state, $c_{t-1} \in \mathbb{R}^{d \times 1}$ is the previous cell state, and $\circ$ denotes **element-wise multiplication** (the Hadamard product).
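A minimal NumPy sketch of the cell-state update, assuming column vectors of shape $(d, 1)$. The function name `update_cell_state` and all gate values are hypothetical, picked so that each component shows a different forget/input behavior. In NumPy, `*` on arrays of equal shape is exactly the element-wise product $\circ$.

```python
import numpy as np

def update_cell_state(f_t, i_t, c_prev, c_tilde):
    """c_t = f_t ∘ c_{t-1} + i_t ∘ c~_t; '*' on equal-shape NumPy arrays is element-wise."""
    return f_t * c_prev + i_t * c_tilde

# Hand-picked gate values (illustrative, not from a trained model):
# component 1 blends old and new, 2 keeps the old state, 3 replaces it, 4 mostly keeps it.
f_t = np.array([[0.5], [1.0], [0.0], [0.9]])
i_t = np.array([[0.5], [0.0], [1.0], [0.1]])
c_prev = np.array([[2.0], [3.0], [4.0], [-1.0]])
c_tilde = np.array([[0.2], [0.7], [-0.6], [0.5]])

c_t = update_cell_state(f_t, i_t, c_prev, c_tilde)
print(c_t.ravel())
```

With $f_t = 1, i_t = 0$ (second component) the old state passes through unchanged; with $f_t = 0, i_t = 1$ (third component) it is replaced by the candidate.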

Model Output

The cell outputs $h_t$ and the current cell state $c_t$, where $h_t$ is determined by $c_t$ and the output-gate output $o_t$. We apply tanh to $c_t$ to obtain $d_t$, then multiply $d_t$ and $o_t$ element-wise to obtain the final $h_t$. Typically, $h_t$ is passed on to the layer above or used as the final prediction. The computation is:

$$d_t = \tanh(c_t) = \frac{e^{c_t}-e^{-c_t}}{e^{c_t}+e^{-c_t}}$$

$$h_t = o_t \circ d_t$$

where $h_t \in \mathbb{R}^{d \times 1}$ is the hidden state of the current layer, $o_t \in \mathbb{R}^{d \times 1}$ is the output-gate output, and $c_t \in \mathbb{R}^{d \times 1}$ is the current cell state.
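The output step can be sketched the same way. `lstm_output` is a hypothetical helper, and `o_t`, `c_t` are illustrative values, not results from a real run.

```python
import numpy as np

def lstm_output(o_t, c_t):
    """Return (h_t, d_t) with d_t = tanh(c_t) and h_t = o_t ∘ d_t."""
    d_t = np.tanh(c_t)     # squash the cell state into (-1, 1)
    h_t = o_t * d_t        # element-wise product with the output gate
    return h_t, d_t

# Illustrative values: an output gate of 0 hides a component entirely,
# a gate of 1 exposes tanh(c_t) unchanged.
o_t = np.array([[1.0], [0.5], [0.0], [1.0]])
c_t = np.array([[0.0], [2.0], [5.0], [-1.0]])
h_t, d_t = lstm_output(o_t, c_t)
print(h_t.ravel())
```

Because $o_t \in (0, 1)$ for a sigmoid gate and $d_t \in (-1, 1)$, every component of $h_t$ stays inside $(-1, 1)$, even when $c_t$ itself grows large.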

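| Date | Open | Close | High | Low |
|---|---|---|---|---|
| Apr 23 | 3038.6118 | 3021.9775 | 3044.9438 | 3016.5168 |
| Apr 24 | 3029.4028 | 3044.8223 | 3045.6399 | 3019.1238 |
| Apr 25 | 3037.9272 | 3052.8999 | 3060.2634 | 3034.6499 |
| Apr 26 | 3054.9793 | 3088.6357 | 3092.4300 | 3054.9793 |

Table: SH000001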

A Simple LSTM Example

Next, we work through a simple LSTM example using the computations described in the model structure above.

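As an example we take 3 trading days of data, April 23 to 25, 2024, from the Shanghai Composite Index of the Chinese A-share market (SH000001), using the open, close, high and low prices as features; the data are listed in the table, whose last row (April 26) is the prediction target. The LSTM model is used to predict the open, close, high and low prices for April 26, 2024, with mean squared error (MSE) as the loss function. We set the hidden-state size $d$ to $4$ and then run the computation to predict the next day's values.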
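For reference, MSE averages the squared errors over the four predicted prices, $\frac{1}{4}\sum_{k=1}^{4}(\hat{y}_k - y_k)^2$. A minimal sketch follows; `mse` is a hypothetical helper, and `placeholder_pred` is only a stand-in (the April 25 prices) so that there are numbers to plug in. It is not the LSTM's prediction.

```python
import numpy as np

def mse(y_pred, y_true):
    """Mean squared error averaged over all features: mean((y_pred - y_true)^2)."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))

# Target: 2024-04-26 open, close, high, low from the SH000001 table.
target = np.array([3054.9793, 3088.6357, 3092.4300, 3054.9793])
# Hypothetical placeholder prediction: the 2024-04-25 values, NOT an LSTM output.
placeholder_pred = np.array([3037.9272, 3052.8999, 3060.2634, 3034.6499])

loss = mse(placeholder_pred, target)
print(round(loss, 2))
```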

We convert the table into the $x_t$ form: the $4$ features of each day become an $m \times 1$ vector, here $(4 \times 1)$, which gives the matrix $X$ with one column per day:

$$X = (\vec{x_1}, \vec{x_2}, \vec{x_3}) = \begin{bmatrix} 3038.6118 & 3029.4028 & 3037.9272 \\ 3021.9775 & 3044.8223 & 3052.8999 \\ 3044.9438 & 3045.6399 & 3060.2634 \\ 3016.5168 & 3019.1238 & 3034.6499 \end{bmatrix}$$
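Building $X$ from the table takes one transpose, since the table stores one day per row while $X$ stores one day per column. A small NumPy sketch (the variable names `prices`, `X` and `xs` are illustrative):

```python
import numpy as np

# SH000001 daily prices, one row per trading day (2024-04-23, 04-24, 04-25),
# columns: open, close, high, low.
prices = np.array([
    [3038.6118, 3021.9775, 3044.9438, 3016.5168],
    [3029.4028, 3044.8223, 3045.6399, 3019.1238],
    [3037.9272, 3052.8999, 3060.2634, 3034.6499],
])

X = prices.T                                   # shape (m, T) = (4, 3): column t-1 holds x_t
xs = [X[:, [t]] for t in range(X.shape[1])]    # x_1, x_2, x_3 as (4, 1) column vectors
print(X.shape, xs[0].shape)                    # (4, 3) (4, 1)
```

Indexing with a list (`[t]`) rather than a plain integer keeps the column dimension, so each `x_t` has the $(4 \times 1)$ shape used in the formulas.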

Since the hidden size is $d = 4$, both $h_0$ and $c_0$ have dimension $4 \times 1$. We initialize $h_0$ and $c_0$ to the zero vector $\vec{0}$:

$$h_0 = [0, 0, 0, 0]^T, \quad c_0 = [0, 0, 0, 0]^T$$

Next we initialize the weight matrices $W_f$, $W_i$, $W_c$, $W_o$ (each of dimension $d \times (d + m)$, i.e. $4 \times 8$) and the biases $b_f$, $b_i$, $b_c$, $b_o$. Each $W$ is a random matrix with entries in $[-0.001, 0.001]$, as shown below.
  
   
    
     
     
$$W_f = \begin{bmatrix} -0.0005 & -0.0010 & -0.0010 & -0.0004 & -0.0008 & -0.0006 & -0.0006 & -0.0007 \\ 0.0004 & -0.0009 & -0.0006 & 0.0009 & 0.0001 & 0.0004 & 0.0009 & 0.0003 \\ -0.0005 & -0.0006 & 0.0007 & -0.0003 & -0.0003 & 0.0001 & 0.0004 & 0.0006 \\ -0.0007 & -0.0008 & 0.0007 & -0.0006 & 0.0005 & -0.0003 & -0.0010 & -0.0002 \end{bmatrix}$$

$$W_i = \begin{bmatrix} -0.0006 & -0.0001 & -0.0003 & 0.0002 & 0.0008 & 0.0000 & -0.0003 & -0.0003 \\ 0.0007 & -0.0002 & 0.0006 & 0.0001 & -0.0009 & -0.0005 & -0.0007 & -0.0005 \\ -0.0008 & 0.0004 & 0.0007 & -0.0008 & -0.0008 & 0.0010 & -0.0006 & -0.0009 \\ -0.0005 & 0.0010 & -0.0006 & -0.0002 & -0.0002 & 0.0006 & -0.0007 & 0.0002 \end{bmatrix}$$

$$W_c = \begin{bmatrix} 0.0001 & 0.0004 & 0.0000 & -0.0006 & -0.0006 & -0.0002 & 0.0003 & 0.0005 \\ -0.0002 & -0.0006 & 0.0005 & -0.0009 & 0.0002 & -0.0008 & -0.0003 & -0.0009 \\ 0.0002 & 0.0004 & 0.0000 & 0.0009 & 0.0003 & 0.0003 & 0.0006 & -0.0008 \\ -0.0007 & -0.0008 & 0.0009 & -0.0007 & 0.0002 & -0.0010 & -0.0006 & -0.0003 \end{bmatrix}$$

$$W_o = \begin{bmatrix} -0.0009 & -0.0005 & 0.0000 & 0.0001 & -0.0001 & -0.0004 & -0.0005 & -0.0007 \\ 0.0009 & -0.0005 & 0.0008 & -0.0009 & 0.0001 & 0.0004 & -0.0002 & 0.0004 \\ -0.0005 & -0.0004 & 0.0007 & -0.0008 & -0.0006 & 0.0008 & 0.0006 & 0.0010 \\ -0.0002 & 0.0008 & 0.0008 & -0.0002 & 0.0008 & -0.0004 & 0.0008 & -0.0002 \end{bmatrix}$$
 
  
   
   
All biases $b$ are initialized to the all-ones column vector:

$$b_f = b_i = b_c = b_o = \begin{bmatrix} 1 & 1 & 1 & 1 \end{bmatrix}^T$$
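These shapes follow from the concatenation: with a 4-dimensional hidden state and 4 input values per trading day, $[h_{t-1}; \vec{x_t}]$ has 8 entries, so each gate matrix is $4 \times 8$ and each bias has 4 entries. The Python sketch below sets up parameters with those shapes. `init_params` is a hypothetical helper, and drawing weights uniformly from $[-0.001, 0.001]$ is an assumption chosen to resemble the magnitudes printed for $W_o$, not necessarily how the original weights were generated.

```python
import random

def init_params(input_size=4, hidden_size=4, scale=1e-3, seed=0):
    """Hypothetical helper: one (W, b) pair per gate.

    Each W has shape hidden_size x (hidden_size + input_size) because it
    multiplies the concatenation [h_{t-1}; x_t]. Weights are uniform in
    [-scale, scale] (an assumption); biases are all ones, as in the text.
    """
    rng = random.Random(seed)
    params = {}
    for gate in ("f", "i", "c", "o"):
        W = [[rng.uniform(-scale, scale) for _ in range(hidden_size + input_size)]
             for _ in range(hidden_size)]
        b = [1.0] * hidden_size
        params[gate] = (W, b)
    return params

params = init_params()
W_o, b_o = params["o"]
print(len(W_o), len(W_o[0]), len(b_o))  # 4 8 4
```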

We then concatenate $h_0$ and $\vec{x_1}$ into $\vec{y_1}$:
  
   
    
     
      
      
$$\vec{y_1} = [h_0; \vec{x_1}] = \begin{bmatrix} 0 & 0 & 0 & 0 & 3038.6118 & 3021.9775 & 3044.9438 & 3016.5168 \end{bmatrix}^T$$

Next we compute, in turn, the forget gate $f_1$, the input gate $i_1$, and the output gate $o_1$:
  
   
    
     
     
$$f_1 = \sigma(W_f\vec{y_1} + b_f) = \begin{bmatrix} 0.0008 \\ 0.9985 \\ 0.9713 \\ 0.1164 \end{bmatrix},\quad i_1 = \sigma(W_i\vec{y_1} + b_i) = \begin{bmatrix} 0.8514 \\ 0.0010 \\ 0.0568 \\ 0.6491 \end{bmatrix},\quad o_1 = \sigma(W_o\vec{y_1} + b_o) = \begin{bmatrix} 0.0198 \\ 0.9577 \\ 0.9981 \\ 0.9842 \end{bmatrix}$$
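Since $W_o$ is printed in full, $o_1$ can be checked by hand. The plain-Python sketch below concatenates $h_0$ and $\vec{x_1}$ and evaluates $\sigma(W_o\vec{y_1} + b_o)$. It reproduces the listed $o_1$ only approximately, giving about $[0.0156, 0.9573, 0.9984, 0.9830]^T$: with inputs near 3000, rounding each weight to four decimals can shift a pre-activation by up to about 0.6, so small deviations are expected.

```python
import math

# W_o as printed above (rounded to four decimals), shape 4 x 8.
W_o = [
    [-0.0009, -0.0005, 0.0000, 0.0001, -0.0001, -0.0004, -0.0005, -0.0007],
    [0.0009, -0.0005, 0.0008, -0.0009, 0.0001, 0.0004, -0.0002, 0.0004],
    [-0.0005, -0.0004, 0.0007, -0.0008, -0.0006, 0.0008, 0.0006, 0.0010],
    [-0.0002, 0.0008, 0.0008, -0.0002, 0.0008, -0.0004, 0.0008, -0.0002],
]
b_o = [1.0, 1.0, 1.0, 1.0]

h_0 = [0.0, 0.0, 0.0, 0.0]                          # initial hidden state
x_1 = [3038.6118, 3021.9775, 3044.9438, 3016.5168]  # trading day 1

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

y_1 = h_0 + x_1  # list concatenation gives [h_0; x_1], 8 entries
o_1 = [sigmoid(z + b) for z, b in zip(matvec(W_o, y_1), b_o)]
print([round(v, 4) for v in o_1])
```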

Then we compute the candidate cell state $\tilde{c_1}$ for the current input:
  
   
    
     
      
      
$$\tilde{c_1} = \tanh(W_c\vec{y_1} + b_c) = \begin{bmatrix} 0.7923 & -0.9997 & 0.9805 & -0.9994 \end{bmatrix}^T$$

Next we compute the cell state $c_1$ at the current time step:
  
   
    
     
     
$$c_1 = f_1 \circ c_0 + i_1 \circ \tilde{c_1} = \begin{bmatrix} 0.0008 \\ 0.9985 \\ 0.9713 \\ 0.1164 \end{bmatrix} \circ \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0.8514 \\ 0.0010 \\ 0.0568 \\ 0.6491 \end{bmatrix} \circ \begin{bmatrix} 0.7923 \\ -0.9997 \\ 0.9805 \\ -0.9994 \end{bmatrix} = \begin{bmatrix} 0.6746 \\ -0.001 \\ 0.0557 \\ -0.6488 \end{bmatrix}$$
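The cell-state update is purely element-wise, so it can be verified directly from the vectors printed above. A short sketch with the values copied from the text; because the inputs are the rounded vectors, the result agrees with the listed $c_1$ to within about $10^{-4}$.

```python
# Values for trading day 1, copied from the text above.
f_1 = [0.0008, 0.9985, 0.9713, 0.1164]
i_1 = [0.8514, 0.0010, 0.0568, 0.6491]
c_tilde_1 = [0.7923, -0.9997, 0.9805, -0.9994]
c_0 = [0.0, 0.0, 0.0, 0.0]  # the cell state starts at zero

def hadamard(a, b):
    """Element-wise (Hadamard) product, the circle operator in the formulas."""
    return [x * y for x, y in zip(a, b)]

# c_1 = f_1 o c_0 + i_1 o c~_1
c_1 = [p + q for p, q in zip(hadamard(f_1, c_0), hadamard(i_1, c_tilde_1))]
print([round(v, 4) for v in c_1])
```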

Finally we compute the hidden-state output $h_1$ at the current time step:
  
   
    
     
     
$$h_1 = o_1 \circ d_1 = o_1 \circ \tanh(c_1) = \begin{bmatrix} 0.0116 & -0.001 & 0.0556 & -0.5618 \end{bmatrix}^T$$
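The hidden state needs only $o_1$ and $c_1$. Recomputing it from the printed vectors matches the listed $h_1$ to within about $10^{-4}$:

```python
import math

o_1 = [0.0198, 0.9577, 0.9981, 0.9842]
c_1 = [0.6746, -0.001, 0.0557, -0.6488]

# h_1 = o_1 o tanh(c_1); tanh is applied element-wise
h_1 = [o * math.tanh(c) for o, c in zip(o_1, c_1)]
print([round(v, 4) for v in h_1])
```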

This completes one forward pass of the LSTM unit for trading day 1. We now have $h_1$ and $c_1$, which are carried over to the next time step.
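Putting the five formulas together, one time step can be written as a single function. The sketch below is plain Python and assumes the parameters are stored as a dict mapping each gate name to a `(W, b)` pair; this layout and the name `lstm_step` are illustrative, not taken from the original code. The check uses all-zero weights and all-ones biases, so every gate equals $\sigma(1) \approx 0.7311$ and the candidate equals $\tanh(1) \approx 0.7616$, which makes the result easy to verify by hand.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Compute W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(params, h_prev, c_prev, x_t):
    """One LSTM time step, returning (h_t, c_t).

    params maps "f", "i", "c", "o" to (W, b); each W is H x (H + D).
    """
    y_t = h_prev + x_t                                           # [h_{t-1}; x_t]
    f = [sigmoid(z) for z in affine(*params["f"], y_t)]          # forget gate
    i = [sigmoid(z) for z in affine(*params["i"], y_t)]          # input gate
    o = [sigmoid(z) for z in affine(*params["o"], y_t)]          # output gate
    c_tilde = [math.tanh(z) for z in affine(*params["c"], y_t)]  # candidate state
    c_t = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, c_tilde)]
    h_t = [oj * math.tanh(cj) for oj, cj in zip(o, c_t)]
    return h_t, c_t

# Hypothetical test parameters: all-zero weights, all-ones biases.
H, D = 4, 4
zero_params = {g: ([[0.0] * (H + D) for _ in range(H)], [1.0] * H) for g in "fico"}
x_1 = [3038.6118, 3021.9775, 3044.9438, 3016.5168]
h_1, c_1 = lstm_step(zero_params, [0.0] * H, [0.0] * H, x_1)
print(round(c_1[0], 4), round(h_1[0], 4))
```

With these parameters every component of $c_1$ is $\sigma(1)\tanh(1) \approx 0.5568$ and every component of $h_1$ is $\sigma(1)\tanh(0.5568) \approx 0.3696$.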

In the same way, we now carry out the computation for **trading day 2**.

We concatenate $h_1$ and $\vec{x_2}$ into $\vec{y_2}$:
  
   
    
     
      
      
$$\vec{y_2} = [h_1; \vec{x_2}] = \begin{bmatrix} 0.0116 & -0.001 & 0.0556 & -0.5618 & 3029.4028 & 3044.8223 & 3045.6399 & 3019.1238 \end{bmatrix}^T$$

Next we compute, in turn, the forget gate $f_2$, the input gate $i_2$, and the output gate $o_2$:
  
   
    
     
     
$$f_2 = \sigma(W_f\vec{y_2} + b_f) = \begin{bmatrix} 0.0008 \\ 0.9985 \\ 0.9715 \\ 0.1151 \end{bmatrix},\quad i_2 = \sigma(W_i\vec{y_2} + b_i) = \begin{bmatrix} 0.8503 \\ 0.0010 \\ 0.0583 \\ 0.6527 \end{bmatrix},\quad o_2 = \sigma(W_o\vec{y_2} + b_o) = \begin{bmatrix} 0.0196 \\ 0.9581 \\ 0.9981 \\ 0.9839 \end{bmatrix}$$

Then we compute the candidate cell state $\tilde{c_2}$ for the current input:
  
   
    
     
      
      
$$\tilde{c_2} = \tanh(W_c\vec{y_2} + b_c) = \begin{bmatrix} 0.7935 & -0.9998 & 0.9806 & -0.9994 \end{bmatrix}^T$$

Next we compute the cell state $c_2$ at the current time step:
  
   
    
     
     
$$c_2 = f_2 \circ c_1 + i_2 \circ \tilde{c_2} = \begin{bmatrix} 0.6747 & -0.0010 & 0.0571 & -0.6524 \end{bmatrix}^T$$
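As a consistency check, the recurrence $c_2 = f_2 \circ c_1 + i_2 \circ \tilde{c_2}$ can be evaluated with the vectors printed above. Doing so gives approximately $[0.6753, -0.0020, 0.1113, -0.7270]^T$, whereas the listed $c_2$ agrees with $i_2 \circ \tilde{c_2}$ alone (the same pattern holds for $c_3$ below), so the listed values appear not to include the $f_2 \circ c_1$ term. Since $h_2$, the day-3 step and the loss are all computed from this state, they would shift accordingly. The sketch uses only values from the text.

```python
# Vectors for trading day 2, copied from the text above.
f_2 = [0.0008, 0.9985, 0.9715, 0.1151]
i_2 = [0.8503, 0.0010, 0.0583, 0.6527]
c_tilde_2 = [0.7935, -0.9998, 0.9806, -0.9994]
c_1 = [0.6746, -0.001, 0.0557, -0.6488]
c_2_listed = [0.6747, -0.0010, 0.0571, -0.6524]

forget_part = [f * c for f, c in zip(f_2, c_1)]         # f_2 o c_1
input_part = [i * g for i, g in zip(i_2, c_tilde_2)]    # i_2 o c~_2
c_2 = [a + b for a, b in zip(forget_part, input_part)]  # full recurrence

print("f_2*c_1 + i_2*c~_2:", [round(v, 4) for v in c_2])
print("i_2*c~_2 only:     ", [round(v, 4) for v in input_part])
print("listed c_2:        ", c_2_listed)
```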

Finally we compute the hidden-state output $h_2$ at the current time step:
  
   
    
     
     
$$h_2 = o_2 \circ d_2 = o_2 \circ \tanh(c_2) = \begin{bmatrix} 0.0115 & -0.0010 & 0.0570 & -0.5640 \end{bmatrix}^T$$

In the same way, we now carry out the computation for **trading day 3**.

We concatenate $h_2$ and $\vec{x_3}$ into $\vec{y_3}$:
  
   
    
     
      
      
$$\vec{y_3} = [h_2; \vec{x_3}] = \begin{bmatrix} 0.0115 & -0.0010 & 0.0570 & -0.5640 & 3037.9272 & 3052.8999 & 3060.2634 & 3034.6499 \end{bmatrix}^T$$

Next we compute, in turn, the forget gate $f_3$, the input gate $i_3$, and the output gate $o_3$:
  
   
    
     
     
$$f_3 = \sigma(W_f\vec{y_3} + b_f) = \begin{bmatrix} 0.0008 \\ 0.9985 \\ 0.9719 \\ 0.1135 \end{bmatrix},\quad i_3 = \sigma(W_i\vec{y_3} + b_i) = \begin{bmatrix} 0.8501 \\ 0.0010 \\ 0.0572 \\ 0.6518 \end{bmatrix},\quad o_3 = \sigma(W_o\vec{y_3} + b_o) = \begin{bmatrix} 0.0192 \\ 0.9584 \\ 0.9982 \\ 0.9841 \end{bmatrix}$$

Then we compute the candidate cell state $\tilde{c_3}$ for the current input:
  
   
    
     
      
      
$$\tilde{c_3} = \tanh(W_c\vec{y_3} + b_c) = \begin{bmatrix} 0.7956 & -0.9998 & 0.9807 & -0.9994 \end{bmatrix}^T$$

Next we compute the cell state $c_3$ at the current time step:
  
   
    
     
     
$$c_3 = f_3 \circ c_2 + i_3 \circ \tilde{c_3} = \begin{bmatrix} 0.6763 & -0.0010 & 0.0561 & -0.6515 \end{bmatrix}^T$$

Finally we compute the hidden-state output $h_3$ at the current time step:
  
   
    
     
     
$$h_3 = o_3 \circ d_3 = o_3 \circ \tanh(c_3) = \begin{bmatrix} 0.0113 & -0.0010 & 0.0559 & -0.5636 \end{bmatrix}^T$$
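The three trading days follow one pattern: every step reuses the same parameters and feeds $(h_t, c_t)$ into the next step. A minimal sketch of that loop over one window, again assuming a dict of `(W, b)` pairs per gate; `run_window` is a hypothetical name. Zero weights and all-ones biases are used so the result can be checked by hand: each step then computes $c_t = \sigma(1)\,c_{t-1} + \sigma(1)\tanh(1)$, giving $c_3 \approx 1.2614$ and $h_3 = \sigma(1)\tanh(c_3) \approx 0.6225$ in every component.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(params, h_prev, c_prev, x_t):
    y_t = h_prev + x_t  # [h_{t-1}; x_t]
    f = [sigmoid(z) for z in affine(*params["f"], y_t)]
    i = [sigmoid(z) for z in affine(*params["i"], y_t)]
    o = [sigmoid(z) for z in affine(*params["o"], y_t)]
    g = [math.tanh(z) for z in affine(*params["c"], y_t)]
    c_t = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h_t = [oj * math.tanh(cj) for oj, cj in zip(o, c_t)]
    return h_t, c_t

def run_window(params, window, hidden_size):
    """Run the LSTM over one window and return the final (h, c)."""
    h = [0.0] * hidden_size  # h_0
    c = [0.0] * hidden_size  # c_0
    for x_t in window:
        h, c = lstm_step(params, h, c, x_t)  # carry both states forward
    return h, c

window = [
    [3038.6118, 3021.9775, 3044.9438, 3016.5168],  # trading day 1
    [3029.4028, 3044.8223, 3045.6399, 3019.1238],  # trading day 2
    [3037.9272, 3052.8999, 3060.2634, 3034.6499],  # trading day 3
]
H, D = 4, 4
zero_params = {k: ([[0.0] * (H + D) for _ in range(H)], [1.0] * H) for k in "fico"}
h_3, c_3 = run_window(zero_params, window, H)
print(round(c_3[0], 4), round(h_3[0], 4))
```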

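Once we have $h_3$, we can simply take it as the prediction and measure the loss with the mean squared error (MSE), defined below. Here $\hat{y_i}$ and $y_i$ are the $i$-th components of the prediction and of the true value (a scalar, not the concatenated vector $\vec{y_t}$ used above), and $n$ is the number of components.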
  
   
    
    
$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y_i} - y_i)^2$$
  
   
    
    
With the four entries of $h_3$ as $\hat{y_i}$ and the corresponding true values as $y_i$ ($n = 4$):

$$ \text{MSE} = \frac{1}{4} \left[ (3054.9793 - 0.0113)^2 + (3088.6357 + 0.0010)^2 + (3092.43 - 0.0559)^2 + (3054.9793 + 0.5636)^2 \right] \approx 9442906.5334 $$

This gives a loss of about $9442906.5334$.
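To double-check the arithmetic, the bracketed expression can be evaluated with a few lines of NumPy. This is a minimal sketch: `y_hat` holds the four entries of $h_3$ and `y_true` the four target values exactly as they appear in the formula; the function name `mse` is our own.

```python
import numpy as np

# Prediction: the raw hidden state h_3 from the worked example (no output layer)
y_hat = np.array([0.0113, -0.0010, 0.0559, -0.5636])
# Targets as listed in the MSE formula (un-normalized index values)
y_true = np.array([3054.9793, 3088.6357, 3092.43, 3054.9793])


def mse(pred, target):
    """Mean squared error: (1/n) * sum((pred_i - target_i)^2)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))


loss = mse(y_hat, y_true)
print(f"MSE = {loss:.4f}")  # about 9442906.53 for the values listed above
```

The loss is dominated entirely by the scale gap between the targets (around 3000) and an LSTM hidden state, whose entries always lie in $(-1, 1)$.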

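This completes one pass of using an LSTM for prediction. The error is obviously very large. In practice, the data are normalized before being fed into the LSTM, the hidden-state output is passed through a linear layer, and the result is then de-normalized (inverse-transformed), which brings the predictions much closer to the actual index values.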
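The normalize, linear layer, inverse-normalize pipeline described above can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the prices are random placeholder data, `h_t` is a random stand-in for an LSTM hidden state, and the linear layer's `W_out` and `b_out` are untrained placeholders, so the resulting numbers are not meaningful predictions. The helper names `minmax_fit`, `minmax_transform` and `minmax_inverse` are ours; the scaling uses the same per-feature minimum and maximum that scikit-learn's `MinMaxScaler(feature_range=(0, 1))` relies on.

```python
import numpy as np


def minmax_fit(data):
    """Per-feature minimum and maximum of the training data."""
    return data.min(axis=0), data.max(axis=0)


def minmax_transform(data, lo, hi):
    # Scale every column into [0, 1]
    return (data - lo) / (hi - lo)


def minmax_inverse(scaled, lo, hi):
    # Map values in normalized units back to the original price scale
    return scaled * (hi - lo) + lo


rng = np.random.default_rng(0)
# Hypothetical prices: 5 days x 4 features (open, close, high, low)
prices = 3000 + 100 * rng.random((5, 4))

lo, hi = minmax_fit(prices)
scaled = minmax_transform(prices, lo, hi)  # this is what would be fed to the LSTM

# Random stand-in for an LSTM hidden state h_t (d = 4) and an untrained linear head
d, n_features = 4, 4
h_t = np.tanh(rng.standard_normal(d))
W_out = rng.standard_normal((n_features, d)) * 0.1
b_out = np.full(n_features, 0.5)

pred_scaled = W_out @ h_t + b_out                # prediction in normalized units
pred_price = minmax_inverse(pred_scaled, lo, hi)  # back to index points
print(pred_price)
```

The linear layer is what lets the model produce values outside the $(-1, 1)$ range of $h_t$, and the inverse transform restores the original units, which is why the loss becomes comparable to the price scale.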

Summary

The forward computation of an LSTM cell proceeds in the following steps:

1. The inputs to an LSTM cell are the current input $\vec{x_t}$, the previous output state $h_{t-1}$, and the previous cell state $c_{t-1}$. For the first LSTM cell we initialize $h_0$ and $c_0$ by hand (usually to zeros); for every subsequent cell, $h_{t-1}$ and $c_{t-1}$ are obtained from the previous cell. Here $\vec{x_t} \in \mathbb{R}^{m \times 1}$, where $m$ is the input feature dimension; $h_{t-1} \in \mathbb{R}^{d \times 1}$ is the previous output state, where $d$ is the hidden-state size of the LSTM cell; and $c_{t-1} \in \mathbb{R}^{d \times 1}$ is the previous cell state.

We usually concatenate $h_{t-1}$ and $\vec{x_t}$ into a longer vector $\vec{y_t}$, stacking them vertically so that $\vec{y_t} \in \mathbb{R}^{(d+m) \times 1}$. $\vec{y_t}$ is then passed to every gate.
$$ \vec{y_t} = [h_{t-1}; \vec{x_t}] = \begin{bmatrix} h_{t-1} \\ \vec{x_t} \end{bmatrix} $$
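Step 1 can be sketched in NumPy as follows. The sizes $m = d = 4$ and the input values are illustrative assumptions, and column vectors are kept as arrays of shape `(n, 1)` to match the notation above.

```python
import numpy as np

m, d = 4, 4  # input feature size m and hidden size d (assumed values)
x_t = np.array([[0.2], [0.5], [0.4], [0.3]])  # hypothetical input x_t, shape (m, 1)

# First cell: h_0 and c_0 are initialized by hand, here to zeros
h_prev = np.zeros((d, 1))  # h_{t-1}, shape (d, 1)
c_prev = np.zeros((d, 1))  # c_{t-1}, shape (d, 1)

# Stack vertically: y_t = [h_{t-1}; x_t], shape (d + m, 1)
y_t = np.vstack([h_prev, x_t])
print(y_t.shape)  # (8, 1)
```

Note that $c_{t-1}$ is not part of $\vec{y_t}$; it only enters in step 4.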

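2. Next, we compute the output of each gate. Every gate takes $\vec{y_t}$ as input: we multiply $\vec{y_t}$ by the gate's weight matrix $W$ and add the bias $b$ to get an intermediate result $M$, then apply the sigmoid to $M$ to get the gate output $g_t$. It has the same shape as the cell state $c_t$, i.e. $g_t \in \mathbb{R}^{d \times 1}$.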
$$ f_t = \sigma(W_f\vec{y_t} + b_f) = \frac{1}{1 + e^{-(W_f\vec{y_t} + b_f)}} $$
$$ i_t = \sigma(W_i\vec{y_t} + b_i) = \frac{1}{1 + e^{-(W_i\vec{y_t} + b_i)}} $$
$$ o_t = \sigma(W_o\vec{y_t} + b_o) = \frac{1}{1 + e^{-(W_o\vec{y_t} + b_o)}} $$

where $\vec{y_t} \in \mathbb{R}^{(d+m) \times 1}$, $W_f, W_i, W_o \in \mathbb{R}^{d \times (d+m)}$, $b_f, b_i, b_o \in \mathbb{R}^{d \times 1}$, and $f_t, i_t, o_t \in \mathbb{R}^{d \times 1}$.
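The three gate equations differ only in their parameters, so in code they are the same expression with different weights. In this sketch the weights are random placeholders and the biases are zeros (a trained model would learn both), and `y_t` is a random stand-in for $[h_{t-1}; \vec{x_t}]$. The point is the shapes: a $(d, d+m)$ matrix times a $(d+m, 1)$ vector gives a $(d, 1)$ gate output with every entry in $(0, 1)$.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(42)
m, d = 4, 4
y_t = rng.standard_normal((d + m, 1))  # stand-in for [h_{t-1}; x_t]

# One weight matrix (d, d + m) and one bias (d, 1) per gate; random placeholders here
W_f, W_i, W_o = (rng.standard_normal((d, d + m)) for _ in range(3))
b_f, b_i, b_o = (np.zeros((d, 1)) for _ in range(3))

f_t = sigmoid(W_f @ y_t + b_f)  # forget gate
i_t = sigmoid(W_i @ y_t + b_i)  # input gate
o_t = sigmoid(W_o @ y_t + b_o)  # output gate
print(f_t.shape, i_t.shape, o_t.shape)
```

Because each entry lies strictly between 0 and 1, a gate acts as a per-dimension "how much to let through" factor when it is multiplied element-wise with a state vector.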

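3. Compute the current input cell state (the candidate state) $\tilde{c_t}$, which carries the new content derived from the current input; how much of it is actually written into memory is controlled by the input gate $i_t$ in step 4. We multiply $\vec{y_t}$ by the weight matrix $W_c$ and add the bias $b_c$ to get an intermediate result $M_c$, then apply tanh to $M_c$ to get $\tilde{c_t}$.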
$$ \tilde{c_t} = \tanh(W_c\vec{y_t} + b_c) = \frac{e^{(W_c\vec{y_t} + b_c)} - e^{-(W_c\vec{y_t} + b_c)}}{e^{(W_c\vec{y_t} + b_c)} + e^{-(W_c\vec{y_t} + b_c)}} $$

where $\vec{y_t} \in \mathbb{R}^{(d+m) \times 1}$, $W_c \in \mathbb{R}^{d \times (d+m)}$, $b_c \in \mathbb{R}^{d \times 1}$, and $\tilde{c_t} \in \mathbb{R}^{d \times 1}$.
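A sketch of step 3 with placeholder parameters (random `W_c`, zero `b_c`, and a random stand-in for $\vec{y_t}$). It also evaluates the explicit exponential form of tanh from the formula above, which agrees with `np.tanh`; in practice one calls `np.tanh`, since the explicit form overflows for very large inputs.

```python
import numpy as np

rng = np.random.default_rng(7)
m, d = 4, 4
y_t = rng.standard_normal((d + m, 1))  # stand-in for [h_{t-1}; x_t]
W_c = rng.standard_normal((d, d + m))  # placeholder weights, shape (d, d + m)
b_c = np.zeros((d, 1))                 # bias, shape (d, 1)

M_c = W_c @ y_t + b_c   # intermediate result M_c
c_tilde = np.tanh(M_c)  # candidate cell state, entries in [-1, 1]

# The same values written out as (e^z - e^-z) / (e^z + e^-z)
c_tilde_explicit = (np.exp(M_c) - np.exp(-M_c)) / (np.exp(M_c) + np.exp(-M_c))
print(np.allclose(c_tilde, c_tilde_explicit))
```

Unlike the sigmoid gates, tanh can be negative, so the candidate can both increase and decrease entries of the cell state.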

4. Next, we compute the current cell state $c_t$ from the forget-gate output $f_t$, the input-gate output $i_t$, the candidate state $\tilde{c_t}$, and the previous cell state $c_{t-1}$. We multiply $f_t$ and $c_{t-1}$ element-wise, multiply $i_t$ and $\tilde{c_t}$ element-wise, and add the two products to obtain the current cell state $c_t$.
     
       c 
      
     
       t 
      
     
    
      = 
     
     
     
       f 
      
     
       t 
      
     
    
      ∘ 
     
     
     
       c 
      
      
      
        t 
       
      
        − 
       
      
        1 
       
      
     
    
      + 
     
     
     
       i 
      
     
       t 
      
     
    
      ∘ 
     
     
      
      
        c 
       
      
        t 
       
      
     
       ~ 
      
     
    
   
     c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c_t} 
    
   
 ct=ft∘ct−1+it∘ct~

where $f_t \in \mathbb{R}^{d \times 1}$ is the forget-gate output, $i_t \in \mathbb{R}^{d \times 1}$ is the input-gate output, $\tilde{c_t} \in \mathbb{R}^{d \times 1}$ is the current input (candidate) cell state, $c_{t-1} \in \mathbb{R}^{d \times 1}$ is the previous cell state, and $\circ$ denotes **element-wise multiplication**.
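Step 4 in code, using small hand-picked (hypothetical) values so the element-wise products can be checked by hand: for example, the first entry is $0.9 \times 0.5 + 0.2 \times 1.0 = 0.65$. In NumPy, `*` between arrays of the same shape is exactly the element-wise product $\circ$, whereas `@` would be a matrix product.

```python
import numpy as np

# Hypothetical gate outputs and states for d = 4 (column vectors of shape (4, 1))
f_t = np.array([[0.9], [0.1], [0.5], [1.0]])       # forget gate
i_t = np.array([[0.2], [0.8], [0.5], [0.0]])       # input gate
c_tilde = np.array([[1.0], [-0.5], [0.4], [0.3]])  # candidate state
c_prev = np.array([[0.5], [2.0], [-1.0], [0.7]])   # previous cell state c_{t-1}

# Element-wise products (the circle operator), then their sum
c_t = f_t * c_prev + i_t * c_tilde
print(c_t.ravel())
```

The last entry shows the two extremes: with $f = 1$ and $i = 0$ the old value $0.7$ is carried over unchanged, which is how the cell state can preserve information across many steps.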

5. Finally, the cell outputs $h_t$ and the current cell state $c_t$. $h_t$ is determined by $c_t$ and the output-gate output $o_t$: we apply tanh to $c_t$ to get $d_t$, then multiply $d_t$ and $o_t$ element-wise to get the final $h_t$.
$$ h_t = o_t \circ d_t = o_t \circ \tanh(c_t) = o_t \circ \frac{e^{c_t} - e^{-c_t}}{e^{c_t} + e^{-c_t}} $$

where $h_t \in \mathbb{R}^{d \times 1}$ is the hidden state of the current step, $o_t \in \mathbb{R}^{d \times 1}$ is the output-gate output, and $c_t \in \mathbb{R}^{d \times 1}$ is the current cell state.
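Putting steps 1 to 5 together gives a complete single-step LSTM cell. The sketch below is a from-scratch NumPy version written for this explanation: the function names `lstm_cell_step` and `init_params` are ours, and the weights are small random placeholders rather than trained values. It loops over a short hypothetical window so that each step's $h_t$ and $c_t$ become the next step's $h_{t-1}$ and $c_{t-1}$, starting from $h_0 = c_0 = 0$.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_cell_step(x_t, h_prev, c_prev, params):
    """One LSTM step following steps 1-5.

    x_t: (m, 1); h_prev, c_prev: (d, 1).
    params: W_f, W_i, W_o, W_c of shape (d, d + m) and b_f, b_i, b_o, b_c of shape (d, 1).
    """
    y_t = np.vstack([h_prev, x_t])                          # 1. y_t = [h_{t-1}; x_t]
    f_t = sigmoid(params["W_f"] @ y_t + params["b_f"])      # 2. forget gate
    i_t = sigmoid(params["W_i"] @ y_t + params["b_i"])      #    input gate
    o_t = sigmoid(params["W_o"] @ y_t + params["b_o"])      #    output gate
    c_tilde = np.tanh(params["W_c"] @ y_t + params["b_c"])  # 3. candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde                      # 4. new cell state
    h_t = o_t * np.tanh(c_t)                                # 5. new hidden state
    return h_t, c_t


def init_params(m, d, seed=0):
    # Small random placeholder weights and zero biases; training would learn these
    rng = np.random.default_rng(seed)
    params = {}
    for gate in "fioc":
        params[f"W_{gate}"] = rng.standard_normal((d, d + m)) * 0.1
        params[f"b_{gate}"] = np.zeros((d, 1))
    return params


m, d = 4, 4
params = init_params(m, d)
rng = np.random.default_rng(1)
sequence = rng.random((3, m, 1))  # hypothetical window of 3 time steps (e.g. normalized prices)

h, c = np.zeros((d, 1)), np.zeros((d, 1))  # h_0, c_0
for x_t in sequence:  # h_{t-1}, c_{t-1} come from the previous step
    h, c = lstm_cell_step(x_t, h, c, params)
print(h.ravel())
```

PyTorch's `nn.LSTM` computes the same recurrence. It stores separate input-to-hidden and hidden-to-hidden weights (with the four gate blocks stacked in the order input, forget, cell, output) plus two bias vectors, which is mathematically equivalent to a single matrix acting on the concatenated $\vec{y_t}$.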

```python
import torch
import torch.nn as nn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler


# Read the data: keep the last 30 rows and the four price columns
df = pd.read_csv('sh_data.csv')
df = df.iloc[-30:, [2, 5, 3, 4]]
df1 = df[25:28].reset_index(drop=True)  # three consecutive rows (not used further below)

data = df[['open', 'close', 'high', 'low']].values.astype(float)

# Normalize each feature to [0, 1]
scaler = MinMaxScaler(feature_range=(0, 1))
data = scaler.fit_transform(data)


# Build time-series samples: each input is a window of `time_step` days,
# each target is the day that follows the window
def create_sequences(data, time_step=1):
    X, y = [], []
    for i in range(len(data) - time_step):
        X.append(data[i:(i + time_step)])
        y.append(data[i + time_step])
    return np.array(X), np.array(y)


time_step = 2  # window length: 2 days
X, y = create_sequences(data, time_step)

# Convert to PyTorch tensors
X = torch.FloatTensor(X)
y = torch.FloatTensor(y)


class LSTM(nn.Module):
    def __init__(self, input_size, hidden_layer_size, output_size):
        super(LSTM, self).__init__()
        self.hidden_layer_size = hidden_layer_size
        self.lstm = nn.LSTM(input_size, hidden_layer_size)
        # Linear layer mapping the hidden state to the output features
        self.linear = nn.Linear(hidden_layer_size, output_size)
        # (h_0, c_0), each of shape (num_layers, batch, hidden_size)
        self.hidden_cell = (torch.zeros(1, 1, self.hidden_layer_size),
                            torch.zeros(1, 1, self.hidden_layer_size))

    def forward(self, input_seq):
        # nn.LSTM (batch_first=False) expects input of shape (seq_len, batch, input_size)
        lstm_out, self.hidden_cell = self.lstm(input_seq.view(len(input_seq), 1, -1), self.hidden_cell)
        predictions = self.linear(lstm_out.view(len(input_seq), -1))
        return predictions[-1]  # keep only the prediction at the last time step


input_size = 4  # number of input features
hidden_layer_size = 4
output_size = 4  # number of output features

model = LSTM(input_size=input_size, hidden_layer_size=hidden_layer_size, output_size=output_size)
loss_function = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Full training loop (left disabled):
# epochs = 1
# for i in range(epochs):
#     for seq, labels in zip(X, y):
#         optimizer.zero_grad()
#         model.hidden_cell = (torch.zeros(1, 1, model.hidden_layer_size),
#                              torch.zeros(1, 1, model.hidden_layer_size))
#         y_pred = model(seq)
#         single_loss = loss_function(y_pred, labels)
#         single_loss.backward()
#         optimizer.step()
#     if i % 10 == 0:
#         print(f'epoch: {i:3} loss: {single_loss.item():10.8f}')

# Run only a single training step
seq, labels = X[0], y[0]
optimizer.zero_grad()
model.hidden_cell = (torch.zeros(1, 1, model.hidden_layer_size),
                     torch.zeros(1, 1, model.hidden_layer_size))
y_pred = model(seq)
single_loss = loss_function(y_pred, labels)
single_loss.backward()
optimizer.step()
print(f'Single training loss: {single_loss.item():10.8f}')

model.eval()

# Predict the four features of the next day
with torch.no_grad():
    seq = torch.FloatTensor(data[-time_step:])
    model.hidden_cell = (torch.zeros(1, 1, model.hidden_layer_size),
                         torch.zeros(1, 1, model.hidden_layer_size))
    next_day = model(seq).numpy()

# Inverse-transform the prediction back to the original price scale
next_day = scaler.inverse_transform(next_day.reshape(-1, output_size))
print(f'Predicted features for the next day: open={next_day[0][0]}, close={next_day[0][1]}, '
      f'high={next_day[0][2]}, low={next_day[0][3]}')

# Predictions on the training set
train_predict = []
for seq in X:
    with torch.no_grad():
        model.hidden_cell = (torch.zeros(1, 1, model.hidden_layer_size),
                             torch.zeros(1, 1, model.hidden_layer_size))
        train_predict.append(model(seq).numpy())

# Inverse-transform the predictions and the actual data
train_predict = scaler.inverse_transform(np.array(train_predict).reshape(-1, output_size))
actual = scaler.inverse_transform(data)

# Plot actual vs. predicted values for each feature
plt.figure(figsize=(10, 6))
for i, col in enumerate(['open', 'close', 'high', 'low']):
    plt.subplot(2, 2, i + 1)
    plt.plot(actual[:, i], label=f'Actual {col}')
    plt.plot(range(time_step, time_step + len(train_predict)), train_predict[:, i], label=f'Train Predict {col}')
    plt.legend()

plt.tight_layout()
plt.show()
```


Reposted from: https://blog.csdn.net/weixin_44555174/article/details/140758757
Copyright belongs to the original author --fancy. In case of infringement, please contact us for removal.
