11 Neural Networks | 王冠嵩

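This chapter describes a class of learning methods that was developed separately in two fields, statistics and artificial intelligence, from essentially the same model. The central idea is to extract linear combinations of the input variables as derived features, and then to model the target as a nonlinear function of these features. The result is a powerful learning method with widespread applications in many fields. We first introduce the projection pursuit model, which evolved in semiparametric statistics and smoothing. The rest of the chapter covers neural network models.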

Chapter Outline

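11.2 Projection Pursuit Regression

Pages 389-392. Projection pursuit regression is a universal approximator: a sum of nonlinear functions of linear combinations of the inputs can approximate any continuous function arbitrarily well, provided enough terms are used. Its core idea is the same as that of the neural network model.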
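
For reference, the model summarized above can be written out explicitly. The notation here (X for the p-vector of inputs, omega_m for unit direction vectors, g_m for the nonlinear ridge functions, M for the number of terms) follows the usual textbook presentation and is not defined in this outline, so the symbols should be read as assumptions.

```latex
f(X) = \sum_{m=1}^{M} g_m\!\left(\omega_m^{T} X\right),
\qquad \lVert \omega_m \rVert = 1, \quad m = 1, \dots, M
```

Each derived feature omega_m^T X is a linear combination of the inputs, and each g_m is left unspecified and estimated from the data together with its direction omega_m. Restricting every g_m to a scaled and shifted sigmoid gives a model of the same form as a single-hidden-layer neural network, which is the sense in which the two methods share one idea.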
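11.3 Neural Networks

Pages 392-395. A neural network is, in essence, a nonlinear statistical model. This section uses the single-hidden-layer back-propagation network as an example to introduce the basic structure of a neural network.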
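
The structure of a single-hidden-layer network can be sketched as a forward pass in a few lines. This is a minimal illustration, assuming sigmoid hidden units and a softmax output for K-class classification; the parameter names (alpha0, alpha, beta0, beta) and the toy dimensions are choices made for this sketch, not something fixed by the outline.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def softmax(t):
    # Subtract the row maximum for numerical stability.
    t = t - t.max(axis=1, keepdims=True)
    e = np.exp(t)
    return e / e.sum(axis=1, keepdims=True)

def forward(X, alpha0, alpha, beta0, beta):
    """Single hidden layer: inputs -> derived features Z -> class probabilities."""
    Z = sigmoid(alpha0 + X @ alpha)   # (N, M) hidden units: nonlinear functions of linear combinations
    T = beta0 + Z @ beta              # (N, K) linear combinations of the hidden units
    return softmax(T)                 # (N, K) each row sums to 1

# Toy dimensions (assumed): N observations, p inputs, M hidden units, K classes.
rng = np.random.default_rng(0)
N, p, M, K = 5, 3, 4, 2
X = rng.normal(size=(N, p))
alpha0, alpha = rng.normal(size=M), rng.normal(size=(p, M))
beta0, beta = rng.normal(size=K), rng.normal(size=(M, K))

probs = forward(X, alpha0, alpha, beta0, beta)
print(probs.shape)
```

For regression, one would typically replace the softmax with the identity and use a single output column.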
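11.4 Fitting Neural Networks

Pages 395-397. Back-propagation (BP) fitting is essentially gradient descent that uses only first-order (gradient) information. The structure of the network makes the computation local, with each unit passing information only to and from the units it is connected to, so it can be implemented on a parallel architecture. The updates as presented are a form of batch learning, in which each step sums over all training cases; learning can also be carried out online, updating the parameters one observation at a time. Because it uses only first-order gradients, BP can converge very slowly.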
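
The fitting procedure described above can be sketched for the simplest case. This is an illustrative sketch under stated assumptions, not the book's code: a regression network with one hidden layer of sigmoid units, an identity output, and squared-error loss. The function names, the learning rate, the number of hidden units, and the uniform starting range are all hypothetical choices for the example.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def loss(params, X, y):
    alpha0, alpha, beta0, beta = params
    Z = sigmoid(alpha0 + X @ alpha)
    return np.sum((y - (beta0 + Z @ beta)) ** 2)

def gradients(params, X, y):
    """Two-pass back-propagation for the squared-error loss."""
    alpha0, alpha, beta0, beta = params
    # Forward pass: compute hidden units and fitted values with current weights.
    Z = sigmoid(alpha0 + X @ alpha)              # (N, M)
    f = beta0 + Z @ beta                         # (N,)
    # Backward pass: output errors, then propagate them to the hidden layer.
    delta = -2.0 * (y - f)                       # (N,)
    s = np.outer(delta, beta) * Z * (1.0 - Z)    # (N, M)
    # Gradients for alpha0, alpha, beta0, beta, each summed over observations.
    return (s.sum(axis=0), X.T @ s, delta.sum(), Z.T @ delta)

def gradient_step(params, grads, rate):
    return tuple(w - rate * g for w, g in zip(params, grads))

def fit(X, y, M=3, rate=0.001, epochs=200, online=False, seed=0):
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    # Starting weights near zero (the range is an assumption of this sketch).
    params = (np.zeros(M), rng.uniform(-0.7, 0.7, size=(p, M)),
              0.0, rng.uniform(-0.7, 0.7, size=M))
    for _ in range(epochs):
        if online:
            # Online learning: update after each observation, in random order.
            for i in rng.permutation(len(y)):
                grads = gradients(params, X[i:i + 1], y[i:i + 1])
                params = gradient_step(params, grads, rate)
        else:
            # Batch learning: one update per sweep, summing over all cases.
            params = gradient_step(params, gradients(params, X, y), rate)
    return params

# Toy data (assumed): two inputs, a smooth nonlinear target.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
batch_params = fit(X, y)
online_params = fit(X, y, online=True)
print(loss(batch_params, X, y), loss(online_params, X, y))
```

The fixed learning rate keeps the sketch short. In practice the rate is usually tuned or decreased over iterations, and the slow convergence noted above is one reason other optimizers are often preferred.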
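11.5 Some Issues in Training Neural Networks

Pages 397-401. Choosing starting values, using regularization to avoid overfitting, standardizing the inputs, choosing the number of hidden units and layers, and dealing with multiple local minima.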
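
Three of the issues listed above (input standardization, starting values, and regularization by weight decay) are easy to show in code. The helper names below are hypothetical and the sketch assumes NumPy arrays; whether bias terms are included in the penalty is left to the caller, since the outline does not say.

```python
import numpy as np

def standardize(X_train, X_other):
    """Scale both arrays using the training mean and standard deviation."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std = np.where(std == 0, 1.0, std)   # leave constant columns unscaled
    return (X_train - mean) / std, (X_other - mean) / std

def init_weights(shape, rng, half_width=0.7):
    """Random starting weights near zero, uniform on [-half_width, half_width]."""
    return rng.uniform(-half_width, half_width, size=shape)

def weight_decay(weights, lam):
    """Penalty lam * sum of squared weights, and its gradient for each array."""
    penalty = lam * sum(np.sum(w ** 2) for w in weights)
    grads = [2.0 * lam * w for w in weights]
    return penalty, grads

rng = np.random.default_rng(0)
# Two inputs on very different scales (assumed toy data).
X_train = rng.normal(loc=5.0, scale=[1.0, 100.0], size=(200, 2))
X_test = rng.normal(loc=5.0, scale=[1.0, 100.0], size=(50, 2))
X_train_s, X_test_s = standardize(X_train, X_test)

alpha = init_weights((2, 4), rng)
beta = init_weights(4, rng)
penalty, penalty_grads = weight_decay([alpha, beta], lam=0.01)
```

The penalty would be added to the training loss and each entry of penalty_grads to the matching gradient. To handle multiple local minima, a common approach is to repeat the fit from several random starting values and either keep the best solution or average the predictions.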
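11.6 Example: Simulated Data

Pages 401-404. Simulated-data examples demonstrate the single-hidden-layer neural network and show how the number of hidden units and the weight decay parameter affect model performance.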
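11.7 Example: ZIP Code Data

Pages 404-408. Demonstrates neural networks on a classification problem, handwritten digit recognition. In the search for the best model, one direction is to enlarge the set of candidate models so that the model can represent more complex functional structure; the other is to narrow the search using domain knowledge, for example by constraining the model's coefficients or structure. Translator's note: much of this section remains unclear to the translator and will be revisited.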
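
One way to picture "constraining the coefficients or structure" is local connectivity with and without weight sharing. The following is a deliberately simplified one-dimensional sketch, not the networks used in the book's example: the function local_layer, the patch width, and the toy input are all hypothetical, and the bias and nonlinearity are omitted to keep the constraint itself visible.

```python
import numpy as np

def local_layer(x, weights, width, stride=1):
    """Each hidden unit m sees only the patch x[m*stride : m*stride + width].

    weights has shape (n_units, width) for separate weights per unit,
    or (1, width) when every unit shares the same weights.
    """
    n_units = (len(x) - width) // stride + 1
    patches = np.stack([x[m * stride: m * stride + width] for m in range(n_units)])
    return np.sum(patches * weights, axis=1)

x = np.arange(8, dtype=float)
width = 3
n_units = len(x) - width + 1                     # 6 hidden units

rng = np.random.default_rng(0)
separate = rng.normal(size=(n_units, width))     # local connectivity: 18 free weights
shared = rng.normal(size=(1, width))             # plus weight sharing: 3 free weights

h_local = local_layer(x, separate, width)
h_shared = local_layer(x, shared, width)
print(separate.size, shared.size)
```

A fully connected layer with the same 6 hidden units would need 48 weights here, so each constraint shrinks the space being searched. With shared weights and stride 1 the layer computes the same sliding dot product as np.correlate(x, w, mode="valid").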
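11.8 Discussion

Pages 408-409. In summary, neural networks are a powerful and very general method. Because they lack interpretability, they are better suited to prediction than to explaining how the data relate to the input variables.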
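11.9 Bayesian Neural Nets (BNN) and the NIPS 2003 Challenge

Pages 409-414. Neal and Zhang (2006) won the NIPS 2003 classification challenge using Bayesian neural networks. This section uses the competition data sets to compare a range of methods.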
11.10 Computational Considerations

Page 414.

Exercises

Exercise 11.1:
Exercise 11.2: Section 11.5
Exercise 11.3: Section 11.4
Exercise 11.4: Section 11.7
Exercise 11.5:
Exercise 11.6:
Exercise 11.7:

References

Projection pursuit was proposed by Friedman and Tukey (1974), and specialized to regression by Friedman and Stuetzle (1981). Huber (1985) gives a scholarly overview, and Roosen and Hastie (1994) present a formulation using smoothing splines. The motivation for neural networks dates back to McCulloch and Pitts (1943), Widrow and Hoff (1960) (reprinted in Anderson and Rosenfeld (1988)) and Rosenblatt (1962). Hebb (1949) heavily influenced the development of learning algorithms. The resurgence of neural networks in the mid 1980s was due to Werbos (1974), Parker (1985) and Rumelhart et al. (1986), who proposed the back-propagation algorithm. Today there are many books written on the topic, for a broad range of audiences. For readers of this book, Hertz et al. (1991), Bishop (1995) and Ripley (1996) may be the most informative. Bayesian learning for neural networks is described in Neal (1996). The ZIP code example was taken from Le Cun (1989); see also Le Cun et al. (1990) and Le Cun et al. (1998).

We do not discuss theoretical topics such as the approximation properties of neural networks, including the work of Barron (1993), Girosi et al. (1995) and Jones (1992). Some of these results are summarized by Ripley (1996).