
LSTM: A Search Space Odyssey

  1. Introduction

Recurrent neural networks with Long Short-Term Memory (which we will concisely refer to as LSTMs) have emerged as an effective and scalable model for several learning problems related to sequential data. Earlier methods for attacking these problems were usually hand-designed workarounds to deal with the sequential nature of data such as language and audio signals. Since LSTMs are effective at capturing long-term temporal dependencies without suffering from the optimization hurdles that plague simple recurrent networks (SRNs) (Hochreiter, 1991; Bengio et al., 1994), they have been used to advance the state of the art for many difficult problems. This includes handwriting recognition (Graves et al., 2009; Pham et al., 2013; Doetsch et al., 2014) and generation (Graves et al., 2013), language modeling (Zaremba et al., 2014) and translation (Luong et al., 2014), acoustic modeling of speech (Sak et al., 2014), speech synthesis (Fan et al., 2014), protein secondary structure prediction (Sønderby & Winther, 2014), analysis of audio (Marchi et al., 2014), and video data (Donahue et al., 2014), among others.

The central idea behind the LSTM architecture is a memory cell which can maintain its state over time, and non-linear gating units which regulate the information flow into and out of the cell. Most modern studies incorporate many improvements that have been made to the LSTM architecture since its original formulation (Hochreiter & Schmidhuber, 1995; 1997). However, LSTMs are now applied to many learning problems which differ significantly in scale and nature from the problems that these improvements were initially tested on. A systematic study of the utility of the various computational components which comprise LSTMs (see Figure 1) was missing. This paper fills that gap and systematically addresses the open question of improving the LSTM architecture.

We evaluate the most popular LSTM architecture (vanilla LSTM; Section 2) and eight different variants thereof on three benchmark problems: acoustic modeling, handwriting recognition and polyphonic music modeling. Each variant differs from the vanilla LSTM by a single change, which allows us to isolate the effect of each change on the performance of the architecture. Random search (Anderson, 1953; Solis & Wets, 1981; Bergstra & Bengio, 2012) is used to find the best performing hyperparameters for each variant on each problem, enabling a reliable comparison of the performance of the different variants. We also provide insights gained about hyperparameters and their interaction using fANOVA (Hutter et al., 2014).
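As an illustration of this evaluation protocol, the following sketch shows how a plain random search over LSTM hyperparameters could be organized in Python; the parameter names, sampling ranges and the train_and_evaluate callable are hypothetical placeholders, not the exact settings used in the study.

import random

# Hypothetical search space; each entry maps a hyperparameter name to a sampler.
SEARCH_SPACE = {
    "hidden_size":   lambda: random.choice([32, 64, 128, 256]),
    "learning_rate": lambda: 10 ** random.uniform(-6, -2),  # log-uniform draw
    "momentum":      lambda: random.uniform(0.0, 0.99),
    "input_noise":   lambda: random.uniform(0.0, 1.0),
}

def random_search(train_and_evaluate, n_trials=200, seed=0):
    """Draw n_trials random configurations and keep the one with the lowest
    validation error returned by the user-supplied train_and_evaluate callable."""
    random.seed(seed)
    best_config, best_error = None, float("inf")
    for _ in range(n_trials):
        config = {name: sample() for name, sample in SEARCH_SPACE.items()}
        error = train_and_evaluate(config)
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error

Because each trial is independent, such a search parallelizes trivially across variants and datasets, which is what makes a comparison over many LSTM variants feasible.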

Figure 1. Detailed schematic of the Simple Recurrent Network (SRN) unit (left) and a Long Short-Term Memory block (right) as used in the hidden layers of a recurrent neural network.

[Figure 1 labels: LSTM block with block input, input gate, forget gate, output gate, peepholes, block output, and input/recurrent/output connections. Legend: unweighted connection; weighted connection; connection with time-lag; branching point; multiplication; sum over all inputs; gate activation function (always sigmoid); input activation function (usually tanh); output activation function (usually tanh).]

  2. Vanilla LSTM

The LSTM architecture most commonly used in the literature was originally described by Graves & Schmidhuber (2005).1 We refer to it as vanilla LSTM and use it as a reference for comparison of all the variants. The vanilla LSTM incorporates changes by Gers et al. (1999) and Gers & Schmidhuber (2000) into the original LSTM (Hochreiter & Schmidhuber, 1997) and uses full gradient training. Section 3 provides descriptions of these major LSTM changes.




A schematic of the vanilla LSTM block can be seen in Figure 1. It features three gates (input, forget and output), block input, a single cell (the Constant Error Carousel), an output activation function, and peephole connections. The output of the block is recurrently connected back to the block input and all of the gates.

The vector formulas for a vanilla LSTM layer forward pass are given below; the corresponding backpropagation through time (BPTT) formulas can be found in the supplementary material. Here x^t is the input vector at time t, the W are rectangular input weight matrices, the R are square recurrent weight matrices, the p are peephole weight vectors and the b are bias vectors. The functions σ, g and h are point-wise non-linear activation functions: the logistic sigmoid σ(x) = 1 / (1 + e^(-x)) is used as the activation function of the gates, and the hyperbolic tangent is usually used as the block input and output activation function. Point-wise multiplication of two vectors is denoted by ⊙.

z^t = g(W_z x^t + R_z y^(t-1) + b_z)                   (block input)
i^t = σ(W_i x^t + R_i y^(t-1) + p_i ⊙ c^(t-1) + b_i)   (input gate)
f^t = σ(W_f x^t + R_f y^(t-1) + p_f ⊙ c^(t-1) + b_f)   (forget gate)
c^t = i^t ⊙ z^t + f^t ⊙ c^(t-1)                        (cell state)
o^t = σ(W_o x^t + R_o y^(t-1) + p_o ⊙ c^t + b_o)       (output gate)
y^t = o^t ⊙ h(c^t)                                     (block output)

1 Note, however, that some studies omit the peephole connections.
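To make these formulas concrete, the following is a minimal NumPy sketch of one forward pass through a vanilla LSTM layer with peephole connections; the weight shapes, random initialization and variable names are illustrative assumptions rather than details taken from the study.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x_seq, W, R, p, b, n_hidden):
    """Vanilla LSTM forward pass. x_seq has shape (T, n_in); W, R, p, b are
    dicts keyed by 'z' (block input), 'i', 'f', 'o' (gates); peephole vectors p
    exist only for the gates. Returns block outputs y of shape (T, n_hidden)."""
    T = x_seq.shape[0]
    y = np.zeros((T, n_hidden))
    c_prev = np.zeros(n_hidden)   # cell state c^(t-1)
    y_prev = np.zeros(n_hidden)   # block output y^(t-1)
    for t in range(T):
        x_t = x_seq[t]
        z = np.tanh(W['z'] @ x_t + R['z'] @ y_prev + b['z'])                    # block input
        i = sigmoid(W['i'] @ x_t + R['i'] @ y_prev + p['i'] * c_prev + b['i'])  # input gate
        f = sigmoid(W['f'] @ x_t + R['f'] @ y_prev + p['f'] * c_prev + b['f'])  # forget gate
        c = i * z + f * c_prev                                                  # cell state
        o = sigmoid(W['o'] @ x_t + R['o'] @ y_prev + p['o'] * c + b['o'])       # output gate
        y[t] = o * np.tanh(c)                                                   # block output
        c_prev, y_prev = c, y[t]
    return y

# Illustrative usage with random weights.
rng = np.random.default_rng(0)
n_in, n_hidden, T = 4, 8, 5
W = {k: rng.normal(0, 0.1, (n_hidden, n_in)) for k in 'zifo'}
R = {k: rng.normal(0, 0.1, (n_hidden, n_hidden)) for k in 'zifo'}
p = {k: rng.normal(0, 0.1, n_hidden) for k in 'ifo'}
b = {k: np.zeros(n_hidden) for k in 'zifo'}
outputs = lstm_forward(rng.normal(size=(T, n_in)), W, R, p, b, n_hidden)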

  3. History of LSTM
    3.1 Original Formulation

The initial version of the LSTM block (Hochreiter & Schmidhuber, 1995; 1997) included (possibly multiple) cells, input and output gates, but no forget gate and no peephole connections. The output gate, unit biases, or input activation function were omitted for some experiments. Training was done using a mixture of Real Time Recurrent Learning (RTRL) and Backpropagation Through Time (BPTT). Only the gradient of the cell was propagated back through time, and the gradients for the other recurrent connections were truncated; thus, that study did not use the exact gradient for training. Another feature of that version was the use of full gate recurrence, which means that all gates received recurrent inputs from all gates at the previous time-step, in addition to the recurrent inputs from the block outputs. This feature did not appear in any of the later papers.
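As a rough sketch of what full gate recurrence means in the notation of Section 2, each gate would additionally receive the previous activations of all three gates; the extra recurrent weight matrices R_ii, R_fi and R_oi below are illustrative placeholders (and peephole terms are omitted, since that version had none), not notation taken from the original papers:

i^t = σ(W_i x^t + R_i y^(t-1) + R_ii i^(t-1) + R_fi f^(t-1) + R_oi o^(t-1) + b_i)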

    3.2 Forget Gate

The first paper to suggest a modification of the LSTM architecture introduced the forget gate (Gers et al., 1999), enabling the LSTM to reset its own state. This allowed learning of continual tasks such as the embedded Reber grammar.
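In the notation of Section 2, the change amounts to gating the previous cell state instead of always accumulating it; this is a paraphrase in the vanilla-LSTM notation, not a formula quoted from Gers et al. (1999):

c^t = i^t ⊙ z^t + c^(t-1)          (original cell update, no forget gate)
c^t = i^t ⊙ z^t + f^t ⊙ c^(t-1)    (cell update with the forget gate)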

    3.3 Peephole Connections

Gers & Schmidhuber (2000) argued that in order to learn precise timings, the cell needs to control the gates. So far this was only possible through an open output gate. Peephole connections (connections from the cell to the gates, shown in blue in Figure 1) were added to the architecture in order to make precise timings easier to learn. Additionally, the output activation function was omitted, as there was no evidence that it was essential for solving the problems that LSTM had been tested on so far.

