Deep Learning (15) - RNN & LSTM


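This is the last of Andrew Ng's five Deep Learning courses. Its first week covers the well-known RNN and LSTM, both of which are models for sequence problems. Typical sequence problems are shown below:

Going through them one by one: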

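  • Speech recognition
  • Music generation
  • Sentiment classification
  • DNA sequence analysis
  • Machine translation
  • Video activity recognition
  • Named entity recognition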

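What these problems share is that the input, the output, or both may be a sequence over time. In speech recognition the input is a continuous audio signal, that is, a sequence in time; video activity recognition is similar. Music generation needs no input sequence, and its output is a piece of audio, again a sequence in time. Most of the remaining tasks take a piece of meaningful text as input and output a recognition result. Meaningful text is also a sequence in time: words only carry meaning once they are arranged in order.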

Why sequence models

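Sequence problems usually involve data that is continuous in time. We could still use a classic (fully connected) neural network, but its input would then have to hold every time step at once and cope with sequences of different lengths.
A classic model also produces outputs of a fixed dimension, whereas in sequence problems the output length can vary. In speech recognition, for example, audio clips of the same length may yield transcripts of different lengths. With a classic model the outputs would have to be padded to equal length, for example by choosing a maximum length and filling the shortfall with zeros.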
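The zero-padding workaround described above can be sketched as follows. This is only an illustration: the helper name `pad_sequences`, the integer word ids and the maximum length of 5 are assumptions, not part of the course material.

```python
import numpy as np

def pad_sequences(seqs, max_len, pad_value=0):
    """Pad (or truncate) integer sequences to a common length."""
    out = np.full((len(seqs), max_len), pad_value, dtype=int)
    for i, seq in enumerate(seqs):
        seq = seq[:max_len]              # truncate anything longer than max_len
        out[i, :len(seq)] = seq          # copy the real values, keep padding after
    return out

# Three hypothetical transcripts of different lengths, as word ids
transcripts = [[4, 7, 2], [9, 1], [3, 3, 8, 6, 5]]
padded = pad_sequences(transcripts, max_len=5)
```

Every row of `padded` now has the same length, which is what a fixed-size output layer requires, at the cost of storing and predicting many meaningless zeros.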
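In a sequence problem, later inputs depend on earlier ones, and a classic neural network has no mechanism for carrying that information forward. In theory it could still reach the right prediction by statistical brute force, but the computational cost would be prohibitive, so it is of no practical use. This is why dedicated sequence models are needed. RNN and LSTM are two commonly used models that handle sequence problems well.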

RNN

Note: all examples in this chapter are explained in terms of NLP (Natural Language Processing) problems.

Basic model

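Problem:
Detect the names in a piece of text.
Input:
Harry Potter and Hermione Granger invented a new spell.
We start from a dictionary (vocabulary) vector, which typically contains 100,000 to 150,000 words. Each word of the input is denoted $x^{<i>}$: Harry is $x^{<1>}$, Potter is $x^{<2>}$, and so on. Each $x^{<i>}$ is a one-hot vector, that is, a column vector with the same dimension as the dictionary, with 1 at the position of the word and 0 everywhere else.
Output:
$y = \begin{bmatrix} 1 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \end{bmatrix}$
That is, the output is 1 for a word that is part of a name and 0 otherwise.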
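As a rough sketch of this encoding, the snippet below builds one-hot column vectors for the example sentence. The nine-word vocabulary is a toy assumption made for readability; a real dictionary would be far larger.

```python
import numpy as np

# Toy vocabulary (hypothetical); a real one would hold many thousands of words.
vocab = ["a", "and", "granger", "harry", "hermione", "invented", "new", "potter", "spell"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word, word_to_index):
    """Column vector with a 1 at the word's dictionary index and 0 elsewhere."""
    x = np.zeros((len(word_to_index), 1))
    x[word_to_index[word], 0] = 1.0
    return x

sentence = "harry potter and hermione granger invented a new spell".split()
xs = [one_hot(w, word_to_index) for w in sentence]   # the one-hot inputs, one per word
y = np.array([1, 1, 0, 1, 1, 0, 0, 0, 0])            # 1 where the word is part of a name
```

Each element of `xs` has the same dimension as the vocabulary, and `y` has one label per input word.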

Forward Propagation



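$a^{<0>} = \vec{0}$
$a^{<0>}$ exists to make the recurrence complete; it is usually initialized to the zero vector.
$a^{<t>} = g(W_{aa} a^{<t-1>} + W_{ax} x^{<t>} + b_a)$
Letting $W_a = \begin{bmatrix} W_{aa} & W_{ax} \end{bmatrix}$, the expression above becomes
$a^{<t>} = g(W_a \begin{bmatrix} a^{<t-1>} \\ x^{<t>} \end{bmatrix} + b_a)$
$\hat y^{<t>} = g(W_{ya} a^{<t>} + b_y)$
$W_{ya}$ is usually written $W_y$, so the expression above can be shortened to $\hat y^{<t>} = g(W_y a^{<t>} + b_y)$
$T_x$ is the number of input time steps and $T_y$ is the number of output time steps. In this example $T_y = T_x$, but there are also cases where they differ.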
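A minimal NumPy sketch of this forward pass is given below. The text only writes a generic activation g; using tanh for the hidden state and a sigmoid for the binary output is an assumption here, as are the layer sizes and the random initial weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_forward(xs, W_aa, W_ax, W_y, b_a, b_y):
    """Basic RNN forward pass over a list of input column vectors."""
    a = np.zeros((W_aa.shape[0], 1))                  # initial activation is the zero vector
    W_a = np.hstack([W_aa, W_ax])                     # stacked form [W_aa  W_ax]
    y_hats = []
    for x in xs:
        a = np.tanh(W_a @ np.vstack([a, x]) + b_a)    # new hidden activation (g assumed tanh)
        y_hats.append(sigmoid(W_y @ a + b_y))         # prediction for this step (g assumed sigmoid)
    return y_hats, a

rng = np.random.default_rng(0)
n_x, n_a, T_x = 9, 4, 9                               # illustrative sizes
xs = [np.eye(n_x)[:, [t]] for t in range(T_x)]        # nine one-hot inputs
W_aa = rng.standard_normal((n_a, n_a)) * 0.1
W_ax = rng.standard_normal((n_a, n_x)) * 0.1
W_y = rng.standard_normal((1, n_a)) * 0.1
b_a = np.zeros((n_a, 1))
b_y = np.zeros((1, 1))
y_hats, a_last = rnn_forward(xs, W_aa, W_ax, W_y, b_a, b_y)
```

The same weights are reused at every time step, which is what lets the network handle any sequence length; here the number of outputs equals the number of inputs.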

Other types of RNN

Many-to-one (Sentiment classification)


One-to-many (Music generation)


Many-to-many ($T_x = T_y$, named entity recognition): this is the example used above

Many-to-many ($T_y \neq T_x$, machine translation)

Backward Propagation

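Backpropagation in an RNN is also called backpropagation through time (BPTT).


In the figure above, blue arrows are forward propagation and red arrows are backpropagation.
$\mathcal L^{<t>}(\hat y^{<t>}, y^{<t>}) = -y^{<t>} \log \hat y^{<t>} - (1 - y^{<t>}) \log(1 - \hat y^{<t>})$
$\mathcal L(\hat y, y) = \sum_{t=1}^{T_y} \mathcal L^{<t>}(\hat y^{<t>}, y^{<t>})$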
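The loss that backpropagation through time differentiates can be sketched as below, assuming the per-step binary cross-entropy of the name-tagging example summed over all time steps. The predicted probabilities are made-up numbers, and the `eps` clipping is an added numerical safeguard rather than part of the formula.

```python
import numpy as np

def sequence_loss(y_hat, y, eps=1e-12):
    """Binary cross-entropy per time step, plus its sum over the sequence."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)            # avoid log(0)
    per_step = -y * np.log(y_hat) - (1.0 - y) * np.log(1.0 - y_hat)
    return per_step.sum(), per_step

y = np.array([1, 1, 0, 1, 1, 0, 0, 0, 0], dtype=float)              # true labels
y_hat = np.array([0.9, 0.8, 0.1, 0.7, 0.6, 0.2, 0.1, 0.1, 0.3])     # hypothetical predictions
total_loss, per_step_loss = sequence_loss(y_hat, y)
```

Because the total is a plain sum, its gradient with respect to the shared weights is the sum of the per-step gradients, each propagated backwards along the time axis.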

RNN applications: Language Modelling & Sequence Sampling

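Andrew spends two videos on language modelling and sequence sampling. The latter builds on the former to generate sentences, and both rely on a model trained with an RNN. A basic RNN is shown below; before it can be used, it must be trained on a large amount of text.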

Language Modelling

Once training is complete, the network structure below can be used for modelling.


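Unlike the network used during training, no sample is fed in as input here. Instead:
$$\begin{cases} x^{<t>} = \vec 0 & t = 0 \\ x^{<t>} = y^{<t-1>} & t \ge 1 \end{cases}$$
At each step the network outputs the probability vector of a softmax classifier, whose length equals that of the dictionary vector. Each component is the probability of that word appearing at this position. The cross-entropy loss of a single softmax is:
$$\mathcal L^{<t>}(\hat y^{<t>}, y^{<t>}) = -\sum_i y_i^{<t>} \log \hat y_i^{<t>}$$
Summing over all time steps gives the loss for the whole string of text ($\hat y^{<0>}, \hat y^{<1>}, \dots$), which equals the negative log of the probability the network assigns to that text:
$$\mathcal L = \sum_t \mathcal L^{<t>}(\hat y^{<t>}, y^{<t>})$$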
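The link between the summed cross-entropy loss and the probability of a sentence can be checked numerically. In the sketch below, the random softmax outputs stand in for a trained network (an assumption), and `word_ids` is a hypothetical sentence given as dictionary indices.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())                           # shift for numerical stability
    return e / e.sum()

def sequence_cross_entropy(step_probs, word_ids):
    """Sum of per-step cross-entropy losses for one-hot targets."""
    return -sum(np.log(p[i]) for p, i in zip(step_probs, word_ids))

rng = np.random.default_rng(1)
vocab_size = 6
# Stand-in for the softmax output of a trained network at four time steps
step_probs = [softmax(rng.standard_normal(vocab_size)) for _ in range(4)]
word_ids = [2, 0, 5, 3]                               # hypothetical sentence

loss = sequence_cross_entropy(step_probs, word_ids)
sentence_prob = np.exp(-loss)                         # probability of the whole sentence
```

With one-hot targets, each per-step loss is the negative log of the probability given to the correct word, so exponentiating the negated total recovers the product of those per-step probabilities.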

Sequence Sampling

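Given the softmax output at each time step t, np.random.choice can be used to draw the word $y^{<t>}$ for that step. Sampling stops when the end-of-sentence token <EOS> is produced or when enough words have been generated.
If the training set consists of news articles, the generated sequences read like news. If it consists of Shakespeare's works, the generated sequences read as if they came from a Shakespeare play.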
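The sampling loop can be sketched as follows. The function `toy_model` is a hypothetical stand-in that returns a fixed distribution; a trained RNN would instead compute the softmax output from the previously sampled word and its hidden state.

```python
import numpy as np

def sample_sequence(next_word_probs, eos_id, max_len):
    """Draw word ids one at a time until <EOS> appears or max_len is reached."""
    ids = []
    prev = None                                       # no previous word at the first step
    while len(ids) < max_len:
        p = next_word_probs(prev)                     # softmax output for this step
        prev = int(np.random.choice(len(p), p=p))     # sample an index with probability p
        ids.append(prev)
        if prev == eos_id:
            break
    return ids

vocab = ["the", "cat", "sat", "<EOS>"]                # toy dictionary (assumption)

def toy_model(prev_id):
    # Hypothetical model: ignores context and returns a fixed distribution
    return np.array([0.4, 0.3, 0.2, 0.1])

np.random.seed(0)
ids = sample_sequence(toy_model, eos_id=3, max_len=10)
words = [vocab[i] for i in ids]
```

Sampling from the distribution, rather than always taking the most likely word, is what makes repeated runs produce different sentences.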

Character-level language model

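In this model the dictionary no longer holds all words but only letters and punctuation marks, so its dimension shrinks dramatically. The RNN estimates, for each character, the probability of the character that follows it.
pros

  • No <UNK> (unknown word) token ever appears, because everything is made of letters and punctuation.

cons

  • The sequences become much longer, so the computation is more expensive.
  • Sentences usually depend on semantic links between words; with characters as the unit of prediction, this useful information is lost.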

Vanishing Gradient

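When the sequence is long enough, the unrolled RNN effectively becomes a very deep network and, like a standard neural network, runs into exploding gradients and vanishing gradients. Yet sequence problems are exactly where this must be solved. Take the sentence The cat, which already ate …, was full. The main-clause verb was must agree with whether the subject is singular or plural. If the clause in between is long and the gradient vanishes, the later outputs $\hat y^{<t>}$ are barely influenced by the early inputs $x^{<t>}$. GRU and LSTM were both proposed to address the vanishing gradient problem.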

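Exploding gradients are usually easier to deal with than vanishing gradients: setting an upper bound on the gradient values (gradient clipping) is enough to keep them from turning into NaN. Vanishing gradients are much harder to fix.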
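Capping the gradient values can be sketched in a few lines. Element-wise clipping is assumed here; the gradient names and the threshold of 5.0 are illustrative choices.

```python
import numpy as np

def clip_gradients(grads, max_value):
    """Clip every gradient entry into the range [-max_value, max_value]."""
    return {name: np.clip(g, -max_value, max_value) for name, g in grads.items()}

# Hypothetical gradients, two of which have exploded
grads = {
    "dW_aa": np.array([[0.5, -12.0], [30.0, 0.1]]),
    "db_a": np.array([[-7.0], [2.0]]),
}
clipped = clip_gradients(grads, max_value=5.0)
```

Entries already inside the range are left untouched, so clipping only changes the update when a gradient has blown up.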

GRU





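$\tilde c^{<t>} = \tanh(W_c[\Gamma_r \ast c^{<t-1>}, x^{<t>}] + b_c)$
$\Gamma_u = \sigma(W_u[c^{<t-1>}, x^{<t>}] + b_u)$
$\Gamma_r = \sigma(W_r[c^{<t-1>}, x^{<t>}] + b_r)$
$c^{<t>} = \Gamma_u \ast \tilde c^{<t>} + (1 - \Gamma_u) \ast c^{<t-1>}$

  • $c^{<t>}$ is the memory cell; in a GRU it stores the activation, that is, $c^{<t>} = a^{<t>}$
  • $\tilde c^{<t>}$ is the candidate value for updating c at time t
  • $\Gamma_u$ is the update gate, which decides whether $c^{<t>}$ is updated with $\tilde c^{<t>}$
  • $\Gamma_r$ is the relevance gate, which expresses how relevant $c^{<t-1>}$ is to $\tilde c^{<t>}$
  • All W and b are learnable parameters
  • $\ast$ denotes element-wise multiplication

The figure shows the simplified version of the GRU, without the $\Gamma_r$ gate.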
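A single GRU step with both the update gate and the relevance gate can be sketched in NumPy as below. The parameter dictionary, layer sizes and random initialization are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(c_prev, x, p):
    """One GRU time step; returns the new memory cell (also the activation)."""
    concat = np.vstack([c_prev, x])
    gamma_u = sigmoid(p["W_u"] @ concat + p["b_u"])                   # update gate
    gamma_r = sigmoid(p["W_r"] @ concat + p["b_r"])                   # relevance gate
    c_tilde = np.tanh(p["W_c"] @ np.vstack([gamma_r * c_prev, x]) + p["b_c"])
    return gamma_u * c_tilde + (1.0 - gamma_u) * c_prev               # blend old and new

rng = np.random.default_rng(0)
n_c, n_x = 3, 5                                                       # illustrative sizes
params = {f"W_{k}": rng.standard_normal((n_c, n_c + n_x)) * 0.1 for k in "urc"}
params.update({f"b_{k}": np.zeros((n_c, 1)) for k in "urc"})
c_prev = rng.standard_normal((n_c, 1))
x = rng.standard_normal((n_x, 1))
c_next = gru_step(c_prev, x, params)

# With the update gate forced shut, the cell simply carries c_prev forward.
closed = dict(params, b_u=np.full((n_c, 1), -50.0))
c_kept = gru_step(c_prev, x, closed)
```

The last two lines illustrate why the GRU helps with vanishing gradients: when the update gate is close to 0, the memory cell is copied almost unchanged across time steps, so information and gradients can survive over long spans.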

LSTM






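$\tilde c^{<t>} = \tanh(W_c[a^{<t-1>}, x^{<t>}] + b_c)$
$\Gamma_u = \sigma(W_u[a^{<t-1>}, x^{<t>}] + b_u)$
$\Gamma_f = \sigma(W_f[a^{<t-1>}, x^{<t>}] + b_f)$
$\Gamma_o = \sigma(W_o[a^{<t-1>}, x^{<t>}] + b_o)$
$c^{<t>} = \Gamma_u \ast \tilde c^{<t>} + \Gamma_f \ast c^{<t-1>}$
$a^{<t>} = \Gamma_o \ast \tanh(c^{<t>})$

  • $\Gamma_u$: update gate
  • $\Gamma_f$: forget gate
  • $\Gamma_o$: output gate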
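One LSTM step can be sketched in NumPy as below, assuming the standard formulation in which the candidate and all three gates are computed from the previous activation and the current input. The parameter names, sizes and initialization are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(a_prev, c_prev, x, p):
    """One LSTM time step; returns the new activation and memory cell."""
    concat = np.vstack([a_prev, x])
    c_tilde = np.tanh(p["W_c"] @ concat + p["b_c"])       # candidate memory
    gamma_u = sigmoid(p["W_u"] @ concat + p["b_u"])       # update gate
    gamma_f = sigmoid(p["W_f"] @ concat + p["b_f"])       # forget gate
    gamma_o = sigmoid(p["W_o"] @ concat + p["b_o"])       # output gate
    c = gamma_u * c_tilde + gamma_f * c_prev              # separate gates for new and old
    a = gamma_o * np.tanh(c)
    return a, c

rng = np.random.default_rng(0)
n_a, n_x = 3, 5                                           # illustrative sizes
params = {f"W_{k}": rng.standard_normal((n_a, n_a + n_x)) * 0.1 for k in "cufo"}
params.update({f"b_{k}": np.zeros((n_a, 1)) for k in "cufo"})
a_prev = np.zeros((n_a, 1))
c_prev = rng.standard_normal((n_a, 1))
x = rng.standard_normal((n_x, 1))
a_next, c_next = lstm_step(a_prev, c_prev, x, params)
```

Unlike the GRU, the memory cell and the activation are distinct here, and the old memory is weighted by its own forget gate rather than by one minus the update gate.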

LSTM has many variants. A well-known one lets $c^{<t-1>}$ also influence the computation of the three gates $\Gamma$; this is called a peephole connection.

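In the history of deep learning, LSTM was proposed very early, while GRU came considerably later. GRU uses fewer gates (2), so it is faster to compute and better suited to building larger sequence models. LSTM is controlled by 3 gates, which makes it more flexible but usually more expensive to compute.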

Bidirectional RNN

Start with a named entity recognition example:
He said, “Teddy bears are on sale!”
He said, “Teddy Roosevelt was a great President!”
In the first sentence Teddy is not a name; in the second sentence it is.


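When the network reaches Teddy, how can it tell whether the word is part of a name? The preceding input alone is not enough; the words that follow are needed for a correct decision. This is why a bidirectional recurrent network is needed.
The final network structure is shown below: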


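The purple path in the middle is the standard (forward) RNN, and the green path is the backward path. Both the purple and the green blocks can be GRU or LSTM units. In NLP applications, a bidirectional network built from LSTM units is a common choice.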
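A bidirectional pass can be sketched with plain tanh units, which keeps the example short. The choice of tanh cells, the sigmoid output, the sizes and the random weights are all assumptions; in practice the cells would often be LSTM or GRU units.

```python
import numpy as np

def run_direction(xs, W_aa, W_ax, b_a):
    """Run a simple tanh RNN over xs and return the state at every step."""
    a = np.zeros((W_aa.shape[0], 1))
    states = []
    for x in xs:
        a = np.tanh(W_aa @ a + W_ax @ x + b_a)
        states.append(a)
    return states

def birnn_forward(xs, fwd, bwd, W_y, b_y):
    """Combine a left-to-right and a right-to-left pass at every time step."""
    forward_states = run_direction(xs, *fwd)
    backward_states = run_direction(xs[::-1], *bwd)[::-1]    # re-align to input order
    y_hats = []
    for a_f, a_b in zip(forward_states, backward_states):
        z = W_y @ np.vstack([a_f, a_b]) + b_y
        y_hats.append(1.0 / (1.0 + np.exp(-z)))
    return y_hats

rng = np.random.default_rng(0)
n_x, n_a, T = 4, 3, 4                                        # illustrative sizes

def init_direction():
    return (rng.standard_normal((n_a, n_a)) * 0.5,
            rng.standard_normal((n_a, n_x)) * 0.5,
            np.zeros((n_a, 1)))

fwd, bwd = init_direction(), init_direction()
W_y = rng.standard_normal((1, 2 * n_a))
b_y = np.zeros((1, 1))
xs = [rng.standard_normal((n_x, 1)) for _ in range(T)]
y_hats = birnn_forward(xs, fwd, bwd, W_y, b_y)

# Changing only the last input alters the first prediction, via the backward pass.
xs_changed = xs[:-1] + [xs[-1] + 1.0]
y_changed = birnn_forward(xs_changed, fwd, bwd, W_y, b_y)
```

The prediction at each position now depends on the whole sentence, which is what the Teddy example requires. The price is that the full sequence must be available before any output can be produced.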

Deep RNN

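Unlike CNNs, RNNs are rarely stacked very deep; 3 layers is already a very large RNN. If more layers are needed, the horizontal connections are usually removed. Each RNN unit can be a standard RNN unit, a GRU unit, an LSTM unit or even a BRNN unit, as desired.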


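As shown in the figure above, $a^{[1]}$ to $a^{[3]}$ are standard RNN layers. Further deep layers can be added on top of the RNN to produce $\hat y^{<t>}$, instead of reading the output directly from the RNN.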
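Stacking recurrent layers can be sketched as follows: the sequence of states produced by one layer becomes the input sequence of the next. Three tanh layers are assumed, with illustrative sizes and random weights.

```python
import numpy as np

def deep_rnn_forward(xs, layers):
    """layers is a list of (W_aa, W_ax, b_a); each layer reads the states of the one below."""
    inputs = xs
    for W_aa, W_ax, b_a in layers:
        a = np.zeros((W_aa.shape[0], 1))              # each layer starts from a zero state
        states = []
        for x in inputs:
            a = np.tanh(W_aa @ a + W_ax @ x + b_a)    # horizontal and vertical connections
            states.append(a)
        inputs = states                               # feed this layer's states upward
    return inputs                                     # states of the top layer

rng = np.random.default_rng(0)
n_x, n_a, T = 4, 3, 5                                 # illustrative sizes
layer_inputs = [n_x, n_a, n_a]                        # input width of layers 1, 2 and 3
layers = [(rng.standard_normal((n_a, n_a)) * 0.1,
           rng.standard_normal((n_a, n_in)) * 0.1,
           np.zeros((n_a, 1))) for n_in in layer_inputs]
xs = [rng.standard_normal((n_x, 1)) for _ in range(T)]
top_states = deep_rnn_forward(xs, layers)
```

Every layer is unrolled over all time steps, so the cost grows with both depth and sequence length, which is one reason a few recurrent layers already make a large model.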

References

GRU:

  • Cho et al., 2014. On the properties of neural machine translation: Encoder-decoder approaches
  • Chung et al., 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

LSTM:

  • Hochreiter & Schmidhuber, 1997. Long short-term memory
