Machine Learning (2) - Neural Network

Posted on 2017-04-03 Edited on 2018-09-16


Model

$a_i^{(j)}$: activation of unit i in layer j
$\Theta^{(j)}$: matrix of weights controlling function mapping from layer j to layer j+1

This is a three-layer neural network. Each layer applies the sigmoid function from Logistic Regression, so the model is nonlinear.

The sigmoid function is

$$g(z) = \frac{1}{1 + e^{-z}}$$

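Given the hypothesis function, the cost function can be computed. The formula comes from Machine Learning (1) - Linear & Logistic Regression and is reproduced here:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right)\right]$$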

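Note:

$\Theta^{(j)}$ has size $s_{j+1} \times (s_j + 1)$, which depends on the number of nodes in the previous layer ($s_j$) and in this layer ($s_{j+1}$). The +1 is the bias node, a constant (+1) node that is not counted in the number of nodes.
The number of $\Theta$ matrices depends on the number of layers: a 3-layer network has only 2 matrices.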
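
The model and the size rule above can be illustrated with a minimal NumPy sketch. The layer sizes (3, 4, 2), the random weights, and the function names `sigmoid` and `forward` are assumptions made for illustration; they do not come from the course material.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(theta1, theta2, x):
    """Forward pass through a 3-layer network for a single input vector x."""
    a1 = np.concatenate(([1.0], x))             # input layer plus bias unit
    z2 = theta1 @ a1
    a2 = np.concatenate(([1.0], sigmoid(z2)))   # hidden layer plus bias unit
    z3 = theta2 @ a2
    return sigmoid(z3)                          # output layer, h_Theta(x)

# Hypothetical layer sizes: 3 inputs, 4 hidden units, 2 outputs.
s1, s2, s3 = 3, 4, 2
rng = np.random.default_rng(0)
theta1 = rng.uniform(-0.1, 0.1, size=(s2, s1 + 1))   # s_2 x (s_1 + 1)
theta2 = rng.uniform(-0.1, 0.1, size=(s3, s2 + 1))   # s_3 x (s_2 + 1)

h = forward(theta1, theta2, np.array([0.5, -1.0, 2.0]))
print(theta1.shape, theta2.shape, h.shape)
```

A 3-layer network needs exactly the two matrices `theta1` and `theta2`, and each one has an extra column for the bias unit of the layer it reads from.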

Cost Function

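Continuing from the formula in the previous post, the complete regularized cost function for multiclass classification is:

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[y_k^{(i)}\log\left(h_\Theta(x^{(i)})\right)_k + (1 - y_k^{(i)})\log\left(1 - \left(h_\Theta(x^{(i)})\right)_k\right)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_{l+1}}\sum_{j=1}^{s_l}\left(\Theta_{i,j}^{(l)}\right)^2$$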

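A brief explanation:

The first half of the formula is the original logistic regression cost, summed over all K classes.
The second half is the regularization term, the sum of the squares of all the parameters. There are L-1 weight matrices in total, and each contributes $s_{l+1} \times s_l$ entries $\Theta_{i,j}^{(l)}$.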

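Note:
The regularization term sums only the non-bias weights $\Theta_{i,j}^{(l)}$ with $j \ge 1$; it does not include the bias weights.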
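
The cost function can be sketched as follows, assuming the network outputs for all m samples are already collected in a matrix `h` and the labels are one-hot encoded in `y`. The function name `nn_cost` and the tiny hand-made values are hypothetical; the first column of each weight matrix is taken to be the bias column.

```python
import numpy as np

def nn_cost(h, y, thetas, lam):
    """Regularized cross-entropy cost.

    h:      (m, K) network outputs, each entry in (0, 1)
    y:      (m, K) one-hot labels
    thetas: list of weight matrices, bias column first
    lam:    regularization strength lambda
    """
    m = y.shape[0]
    # First half: logistic cost summed over all samples and all K classes.
    data_term = -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m
    # Second half: squared weights, skipping column 0 (the bias weights).
    reg_term = lam / (2 * m) * sum(np.sum(t[:, 1:] ** 2) for t in thetas)
    return data_term + reg_term

# Tiny hand-made example: 2 samples, 2 classes.
h = np.array([[0.9, 0.1], [0.2, 0.8]])
y = np.array([[1.0, 0.0], [0.0, 1.0]])
thetas = [np.array([[5.0, 1.0, 2.0]]), np.array([[7.0, 3.0]])]

cost_unreg = nn_cost(h, y, thetas, lam=0.0)
cost_reg = nn_cost(h, y, thetas, lam=1.0)
print(cost_unreg, cost_reg)
```

The bias weights 5.0 and 7.0 do not affect the difference between `cost_reg` and `cost_unreg`; only 1.0, 2.0 and 3.0 are penalized.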

Backpropagation Algorithm

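With the cost function in place, gradient descent can begin. The basic formula is:

$$\Theta_{i,j}^{(l)} := \Theta_{i,j}^{(l)} - \alpha \frac{\partial}{\partial \Theta_{i,j}^{(l)}} J(\Theta)$$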

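With a neural network, however, the partial derivatives are no longer that simple to compute, so forward propagation and backpropagation are introduced. The idea is to compute one pass forward, then one pass backward, and combine the results of both to obtain the partial derivatives of the cost function. I cannot give the proof, but Andrew provides the formulas.
The whole algorithm is as follows. Unlike the 4-layer neural network in the course material, it uses the three-layer network from these notes: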

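Given training set $\{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}$
Set $\Delta_{i,j}^{(l)} := 0$ for all (l,i,j). size($\Delta^{(l)}$) = $s_{l+1} \times (s_l + 1)$, the same as $\Theta^{(l)}$
For i = 1 to m
   1. Set $a^{(1)} := x^{(i)}$; both a and x are vectors
   2. Perform forward propagation to compute $a^{(l)}$ for l=2,3,…,L, where L is the number of layers in the network
         $z^{(2)} = \Theta^{(1)} a^{(1)}$ (remember to add the bias unit to $a^{(1)}$)
         $a^{(2)} = g(z^{(2)})$
         $z^{(3)} = \Theta^{(2)} a^{(2)}$ (remember to add the bias unit to $a^{(2)}$)
         $a^{(3)} = g(z^{(3)}) = h_\Theta(x^{(i)})$
   3. Using $y^{(i)}$, compute $\delta^{(L)} = a^{(L)} - y^{(i)}$. $y^{(i)}$ is the measured output and $a^{(L)}$ is the model's estimate of it, so $\delta^{(L)}$ is the model's error at the output.
   4. Use backpropagation to compute $\delta^{(L-1)}, \delta^{(L-2)}, \dots, \delta^{(2)}$. The input layer has no error term, i.e. there is no $\delta^{(1)}$
     (the formula originally shown at this step is probably wrong; see the related notes)
Note:
The computation of $\delta$ needs special care. A dedicated section below explains it.
   5. Compute $\Delta_{i,j}^{(l)} := \Delta_{i,j}^{(l)} + a_j^{(l)} \delta_i^{(l+1)}$

    Vectorized equation: $\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T$, where j is the node index within each layer
Note:
The accumulation here runs over the training samples.
   6. Final step, compute the partial derivative of $J(\Theta)$
         $D_{i,j}^{(l)} := \frac{1}{m}\left(\Delta_{i,j}^{(l)} + \lambda \Theta_{i,j}^{(l)}\right)$ if j ≠ 0
         $D_{i,j}^{(l)} := \frac{1}{m}\Delta_{i,j}^{(l)}$ if j = 0
         $\frac{\partial}{\partial \Theta_{i,j}^{(l)}} J(\Theta) = D_{i,j}^{(l)}$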

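The whole process is somewhat complicated, but that is what a neural network is. Apparently there are more elegant algorithms later on.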

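Computing δ

$\delta_j^{(l)}$: an error term that measures how much node j in layer l was responsible for any errors in our output.

The final formula should be:

$$\delta^{(l)} = \left((\Theta^{(l)})^T \delta^{(l+1)}\right) .* g'(z^{(l)}), \qquad g'(z^{(l)}) = a^{(l)} .* (1 - a^{(l)})$$

Here .* is the elementwise product and g = sigmoid function.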
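
The six steps and the δ formula can be put together in a short NumPy sketch for the three-layer case. This is one possible reading of the algorithm, not the course's reference code: the function name `backprop`, the layer sizes, and the random data are all assumptions, and labels are assumed to be one-hot encoded.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(theta1, theta2, X, Y, lam):
    """Gradients of the regularized cost for a 3-layer network.

    X: (m, s1) inputs, Y: (m, K) one-hot labels.
    Returns D1, D2 with the same shapes as theta1, theta2.
    """
    m = X.shape[0]
    Delta1 = np.zeros_like(theta1)
    Delta2 = np.zeros_like(theta2)
    for x, y in zip(X, Y):
        # Steps 1-2: forward propagation, adding the bias unit at each layer.
        a1 = np.concatenate(([1.0], x))
        a2 = np.concatenate(([1.0], sigmoid(theta1 @ a1)))
        a3 = sigmoid(theta2 @ a2)
        # Step 3: error at the output layer.
        d3 = a3 - y
        # Step 4: hidden-layer error, using g'(z2) = a2 * (1 - a2);
        # the first entry belongs to the bias unit and is dropped.
        d2 = ((theta2.T @ d3) * a2 * (1 - a2))[1:]
        # Step 5: accumulate over the training samples.
        Delta2 += np.outer(d3, a2)
        Delta1 += np.outer(d2, a1)
    # Step 6: average, then regularize every column except the bias column.
    D1, D2 = Delta1 / m, Delta2 / m
    D1[:, 1:] += lam / m * theta1[:, 1:]
    D2[:, 1:] += lam / m * theta2[:, 1:]
    return D1, D2

# Hypothetical sizes: 2 inputs, 4 hidden units, 3 classes, 5 samples.
rng = np.random.default_rng(0)
theta1 = rng.uniform(-0.5, 0.5, size=(4, 3))
theta2 = rng.uniform(-0.5, 0.5, size=(3, 5))
X = rng.normal(size=(5, 2))
Y = np.eye(3)[rng.integers(0, 3, size=5)]
D1, D2 = backprop(theta1, theta2, X, Y, lam=1.0)
print(D1.shape, D2.shape)
```

A common way to gain confidence in such code is to compare a few entries of `D1` and `D2` against finite differences of the cost function.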

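Tuning the Algorithm

When an algorithm's error is too large, the cause is one of two problems:

Overfitting (High Variance)
Underfitting (High Bias)
For these two problems, what we need to do is:
1. Identify them
2. Take the appropriate measures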

How to Produce the Right Model

First, split the training samples into three parts:

Training set: 60%
Cross validation set: 20%
Test set: 20%
Use the training set to fit Θ, use the cross validation set to choose the polynomial degree d, and finally use the test set error to evaluate how good the algorithm is.
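
The 60/20/20 split can be sketched like this. The function name `split_60_20_20`, the shuffling step, and the dummy data are assumptions; the notes only fix the proportions.

```python
import numpy as np

def split_60_20_20(X, y, seed=0):
    """Shuffle the samples, then split into training / cross validation / test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = len(X) * 6 // 10
    n_cv = len(X) * 2 // 10
    train, cv, test = np.split(idx, [n_train, n_train + n_cv])
    return (X[train], y[train]), (X[cv], y[cv]), (X[test], y[test])

# Dummy data: 50 samples with 2 features each.
X = np.arange(100).reshape(50, 2)
y = np.arange(50)
(train_X, train_y), (cv_X, cv_y), (test_X, test_y) = split_60_20_20(X, y)
print(len(train_X), len(cv_X), len(test_X))  # 30 10 10
```

Shuffling before splitting matters when the samples are stored in some order, for example grouped by class.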

Polynomial Degree - d

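Consider how the training error $J_{train}(\Theta)$ and the cross validation error $J_{CV}(\Theta)$ change with the polynomial degree d:

For the same λ, when the polynomial degree is low, underfitting may occur, and both $J_{train}(\Theta)$ and $J_{CV}(\Theta)$ are large.
When the polynomial degree is high, overfitting may occur, and $J_{CV}(\Theta)$ is much larger than $J_{train}(\Theta)$. This is a useful guide for choosing d.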
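
Choosing d by cross validation error can be sketched as below. The noisy sine data, the range of degrees 1 to 8, and the helper `half_mse` are assumptions for illustration; `np.polyfit` stands in for whatever training procedure fits Θ.

```python
import numpy as np

# Hypothetical data: a noisy sine curve.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=60)
y = np.sin(3 * x) + rng.normal(scale=0.1, size=60)
x_train, y_train = x[:36], y[:36]      # 60% training set
x_cv, y_cv = x[36:48], y[36:48]        # 20% cross validation set

def half_mse(coeffs, xs, ys):
    """Squared-error cost (1 / 2m) * sum((h(x) - y)^2) for a fitted polynomial."""
    return np.mean((np.polyval(coeffs, xs) - ys) ** 2) / 2

errors = {}
for d in range(1, 9):
    coeffs = np.polyfit(x_train, y_train, d)   # fit on the training set only
    errors[d] = (half_mse(coeffs, x_train, y_train), half_mse(coeffs, x_cv, y_cv))

# Pick the degree with the smallest cross validation error.
best_d = min(errors, key=lambda d: errors[d][1])
for d, (j_train, j_cv) in errors.items():
    print(d, j_train, j_cv)
print("chosen d:", best_d)
```

The training error can only go down as d grows, so it cannot be used to choose d; the cross validation error is what reveals the point where overfitting starts.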

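Regularization - λ

Create a list of lambdas (i.e. λ∈{0,0.01,0.02,0.04,…10.24});
For each λ, fit Θ on the training set, then compute the training error and cross validation error without the λ regularization term
Plot both errors against λ
As with choosing d, pick a suitable λ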
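
The λ selection steps can be sketched with regularized linear regression on polynomial features. Everything specific here is an assumption: the data, the degree-8 features, the helper names, and the reading of the elided list as a sequence that doubles from 0.01 up to 10.24.

```python
import numpy as np

def fit_regularized(X, y, lam):
    """Regularized normal equation; the bias column (column 0) is not penalized."""
    reg = lam * np.eye(X.shape[1])
    reg[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + reg, X.T @ y)

def error(theta, X, y):
    """Squared-error cost WITHOUT the regularization term."""
    return np.mean((X @ theta - y) ** 2) / 2

# Hypothetical data with degree-8 polynomial features.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=60)
y = np.sin(3 * x) + rng.normal(scale=0.1, size=60)
X = np.vander(x, 9, increasing=True)   # columns 1, x, ..., x^8
X_train, y_train = X[:36], y[:36]
X_cv, y_cv = X[36:48], y[36:48]

# Assumed list: 0, then 0.01 doubled repeatedly up to 10.24.
lambdas = [0.0] + [0.01 * 2 ** k for k in range(11)]

train_errors, cv_errors = [], []
for lam in lambdas:
    theta = fit_regularized(X_train, y_train, lam)   # lambda is used for fitting only
    train_errors.append(error(theta, X_train, y_train))
    cv_errors.append(error(theta, X_cv, y_cv))

best_lambda = lambdas[int(np.argmin(cv_errors))]
print("chosen lambda:", best_lambda)
```

λ appears only in `fit_regularized`; the two reported errors leave it out so that values for different λ are comparable.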

Random initialization

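Randomize the initial values of Θ. The ex4 exercise gives the following formula for initializing Θ:

$$\epsilon_{init} = \frac{\sqrt{6}}{\sqrt{L_{in} + L_{out}}}$$

where $L_{in} = s_l$ and $L_{out} = s_{l+1}$ are the numbers of units in the layers adjacent to $\Theta^{(l)}$, and each entry of $\Theta^{(l)}$ is drawn uniformly from $[-\epsilon_{init}, \epsilon_{init}]$.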
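
A minimal sketch of random initialization, assuming weights are drawn uniformly from a symmetric interval whose half-width is sqrt(6) / sqrt(L_in + L_out). The function name `rand_init_weights` and the layer sizes are hypothetical.

```python
import numpy as np

def rand_init_weights(l_in, l_out, rng):
    """Return an l_out x (l_in + 1) matrix drawn uniformly from [-eps, eps].

    l_in and l_out are the unit counts of the layers on either side of the
    matrix; the extra column holds the bias weights.
    """
    eps = np.sqrt(6) / np.sqrt(l_in + l_out)
    return rng.uniform(-eps, eps, size=(l_out, l_in + 1))

rng = np.random.default_rng(0)
theta1 = rand_init_weights(400, 25, rng)   # hypothetical: 400 inputs, 25 hidden units
theta2 = rand_init_weights(25, 10, rng)    # hypothetical: 25 hidden units, 10 outputs
print(theta1.shape, theta2.shape)
```

Initializing every weight to the same value would make all hidden units compute the same function and receive the same gradient, which is why the values are randomized.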

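Identifying Overfitting (High Variance) / Underfitting (High Bias)

To tell them apart, use learning curves, which plot the training error and the cross validation error against the number of training samples.

High Bias: both errors level off at a high value and end up close to each other, so adding more training samples does not help much.

High Variance: the training error stays low while the cross validation error stays much higher. The gap narrows as training samples are added, so more data is likely to help.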
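
Learning curves can be computed by training on progressively larger subsets, as in this sketch. The example deliberately uses a high-bias setup (a straight line fitted to quadratic data); the data, the subset sizes, and the function name `learning_curve` are assumptions.

```python
import numpy as np

def learning_curve(X_train, y_train, X_cv, y_cv, sizes):
    """For each n in sizes, fit on the first n training samples.

    Returns two lists: training error on those n samples, and cross
    validation error on the full cross validation set.
    """
    j_train, j_cv = [], []
    for n in sizes:
        theta, *_ = np.linalg.lstsq(X_train[:n], y_train[:n], rcond=None)
        j_train.append(np.mean((X_train[:n] @ theta - y_train[:n]) ** 2) / 2)
        j_cv.append(np.mean((X_cv @ theta - y_cv) ** 2) / 2)
    return j_train, j_cv

# Hypothetical high-bias case: linear model, quadratic data.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = x ** 2 + rng.normal(scale=0.05, size=200)
X = np.column_stack([np.ones_like(x), x])   # bias column plus the single feature x

sizes = [2, 5, 10, 20, 50, 100]
j_train, j_cv = learning_curve(X[:100], y[:100], X[100:], y[100:], sizes)
for n, jt, jc in zip(sizes, j_train, j_cv):
    print(n, jt, jc)
```

Plotting `j_train` and `j_cv` against `sizes` gives the learning curve; swapping in a much more flexible model would show the high-variance pattern instead.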

What to try next?

Getting more training examples: Fixes high variance
Trying smaller sets of features: Fixes high variance
Adding features: Fixes high bias
Adding polynomial features: Fixes high bias
Decreasing λ: Fixes high bias
Increasing λ: Fixes high variance

There is one more approach. To get a good algorithm, do two things:

Use many features and parameters => high variance, low bias
Provide a large number of training samples => low variance

Error Analysis

Start with a simple algorithm, implement it quickly, and test it early on your cross validation data.
Plot learning curves to decide if more data, more features, etc. are likely to help.
Manually examine the errors on examples in the cross validation set and try to spot a trend where most of the errors were made.