Machine Learning (2) - Neural Network

Model

![Alt text](./1491893421634.png)

$a_i^{(j)}$: **activation** of unit i in layer j
$\Theta^{(j)}$: matrix of **weights** controlling function mapping from layer ***j*** to layer ***j+1***

This is a three-layer neural network. Each layer applies the sigmoid function from Logistic Regression, so the model is nonlinear.

$a_1^{(2)} = g(\Theta_{10}^{(1)} + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3)$
$a_2^{(2)} = g(\Theta_{20}^{(1)} + \Theta_{21}^{(1)}x_1 + \Theta_{22}^{(1)}x_2 + \Theta_{23}^{(1)}x_3)$
$a_3^{(2)} = g(\Theta_{30}^{(1)} + \Theta_{31}^{(1)}x_1 + \Theta_{32}^{(1)}x_2 + \Theta_{33}^{(1)}x_3)$

The sigmoid function is $g(z) = \dfrac{1}{1+e^{-z}}$

$$h_\Theta(x) = a_1^{(3)} = g(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{(2)} + \Theta_{12}^{(2)}a_2^{(2)} + \Theta_{13}^{(2)}a_3^{(2)})$$

From the hypothesis function $h_\theta(x)$ we can compute the cost function. The formula is given in Machine Learning (1) - Linear & Logistic Regression and reproduced here:

$$J(\theta) = -\dfrac{1}{m}\sum_{i=1}^m\left[y^{(i)}\log(h_\theta(x^{(i)})) + (1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]$$

Note:

  • $\Theta^{(j)}$ has size $s_{j+1} \times (s_j+1)$, determined by the node counts of the previous layer and this layer. The +1 is the bias node, i.e. the constant (+1) node, which is not counted among the layer's nodes
  • The number of $\Theta$ matrices depends on the number of layers: a 3-layer network has only 2 $\Theta$ matrices

Cost Function

Continuing from the previous chapter's formula, the complete multi-class, regularized cost function is:

$$J(\Theta) = -\dfrac{1}{m} \sum_{i=1}^m \sum_{k=1}^K\left[y_k^{(i)}\log(h_\Theta(x^{(i)}))_k + (1-y_k^{(i)})\log(1-(h_\Theta(x^{(i)}))_k)\right] + \dfrac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}}(\Theta_{j,i}^{(l)})^2$$

A brief explanation:

  • The first part is the original hypothesis cost, accumulated over all K classes
  • The second part is the regularization term, which accumulates the squares of all the $\Theta$ parameters. There are L-1 weight matrices, one per layer transition, each of size $s_{l+1} \times (s_l+1)$


Note: the regularization term only sums the $s_{j+1} \times s_j$ entries; the bias terms $\Theta_{i0}$ are not included

Backpropagation Algorithm

With the cost function in hand, Gradient descent can begin. The basic update is:

$$\theta_j := \theta_j - \dfrac{\alpha}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})\cdot x_j^{(i)}$$

With a neural network, however, computing the partial derivatives is no longer this simple, which is why Forwardpropagation & Backpropagation are introduced: propagate forward once, then backward once, and combine the two passes to obtain the partial derivatives of the cost function. I cannot prove it, but Andrew gives the formulas.
The full algorithm is as follows. Unlike the 4-layer network in the course material, it uses the three-layer network from these notes:

Given training set $\{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}$
Set $\Delta_{ij}^{(l)} = 0$ for all (l,i,j). size($\Delta$) = $s_{j+1} \times (s_j+1)$, same as $\Theta$
For i = 1 to m
   1. Set $a^{(1)} = x^{(i)}$, both a and x are vectors
   2. Perform forward propagation to compute $a^{(l)}$ for l=2,3,…,L, where L = number of layers
         $a^{(1)} = \left[\begin{matrix} 1 & x \end{matrix}\right]$ (note the added bias term)
         $z^{(2)} = \Theta^{(1)}a^{(1)}$
         $a^{(2)} = \left[\begin{matrix} 1 & g(z^{(2)}) \end{matrix}\right]$ (note the added bias term)
         $z^{(3)} = \Theta^{(2)}a^{(2)}$
         $a^{(3)} = h_\Theta(x) = g(z^{(3)})$

   3. Using $y^{(i)}$, compute $\delta^{(L)} = a^{(L)} - y^{(i)}$. y is the measured output, $a^{(L)} = h_\Theta(x)$ is the model's estimate of the output, so $\delta^{(L)}$ is the model error.
   4. Use backpropagation to compute $\delta^{(L-1)}, \delta^{(L-2)}, \dots, \delta^{(2)}$; the input layer has no $\delta$, i.e. there is no $\delta^{(1)}$
     $\delta^{(l)} = (\Theta^{(l)})^T\delta^{(l+1)} .* a^{(l)} .* (1-a^{(l)})$ (this is probably in error; see the section on computing $\delta$)

Note:
The computation of $\delta$ needs particular care; a dedicated section below explains it.

   5. Compute $\Delta_{ij}^{(l)}$
    $\Delta_{ij}^{(l)} := \Delta_{ij}^{(l)} + a_j^{(l)}\delta_i^{(l+1)}$
    Vectorized equation: $\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)}(a^{(l)})^T$, where j is the node index within each layer
Note:
The accumulation here is over the training samples

   6. Final step, compute the partial derivative of $J(\Theta)$
    $\dfrac{\partial}{\partial\Theta_{ij}^{(l)}}J(\Theta) = D_{ij}^{(l)}$
    $D_{ij}^{(l)} := \dfrac{1}{m}\Delta_{ij}^{(l)} + \lambda\Theta_{ij}^{(l)} \quad \text{if } j \neq 0$
    $D_{ij}^{(l)} := \dfrac{1}{m}\Delta_{ij}^{(l)} \quad \text{if } j = 0$

The whole procedure is somewhat involved, but that is what a neural network is. Reportedly, more elegant algorithms come later.
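Steps 1–6 can be sketched in NumPy for the three-layer network. This is a minimal sketch, not course code: `backprop` and its argument layout are my own naming, labels are assumed one-hot, and when back-propagating $\delta$ I drop the bias row of $\Theta^{(2)}$, which is the usual convention.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(Theta1, Theta2, X, Y, lam):
    """One pass of the algorithm above over all m training samples.

    Theta1: (s2, s1+1) weights from layer 1 to layer 2
    Theta2: (K, s2+1)  weights from layer 2 to layer 3
    X: (m, s1) inputs; Y: (m, K) one-hot labels; lam: regularization strength
    """
    m = X.shape[0]
    Delta1 = np.zeros_like(Theta1)
    Delta2 = np.zeros_like(Theta2)
    for i in range(m):
        # steps 1-2: forward propagation, adding the bias term at each layer
        a1 = np.concatenate(([1.0], X[i]))
        z2 = Theta1 @ a1
        a2 = np.concatenate(([1.0], sigmoid(z2)))
        z3 = Theta2 @ a2
        a3 = sigmoid(z3)                       # h_Theta(x)
        # step 3: output-layer error
        d3 = a3 - Y[i]
        # step 4: hidden-layer error; bias row dropped, g'(z2) = g(z2)(1-g(z2))
        d2 = (Theta2[:, 1:].T @ d3) * sigmoid(z2) * (1 - sigmoid(z2))
        # step 5: accumulate over training samples (vectorized form)
        Delta2 += np.outer(d3, a2)
        Delta1 += np.outer(d2, a1)
    # step 6: partial derivatives, no regularization on the bias column (j = 0)
    D1 = Delta1 / m
    D2 = Delta2 / m
    D1[:, 1:] += lam * Theta1[:, 1:]
    D2[:, 1:] += lam * Theta2[:, 1:]
    return D1, D2
```

The returned `D1`, `D2` have the same shapes as `Theta1`, `Theta2`, as required for a gradient step.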

Computing $\delta$

$\delta$: an error term that measures how much the node was responsible for any errors in our output.

$cost(t) = y^{(t)}\log(h_\Theta(x^{(t)})) + (1-y^{(t)})\log(1-h_\Theta(x^{(t)}))$
$\delta_j^{(t)} = \dfrac{\partial}{\partial z_j^{(l)}}cost(t) = (\Theta^{(l)})^T\delta^{(l+1)} .* g'(z)$
$g'(z) = \left(\dfrac{1}{1+e^{-z}}\right)' = g(z)(1-g(z))$

The final expression should therefore be:

$$\delta^{(l)} = (\Theta^{(l)})^T\delta^{(l+1)} .* g(\Theta^{(l-1)} a^{(l-1)}) .* (1-g(\Theta^{(l-1)}a^{(l-1)}))$$

g = sigmoid function.
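The identity $g'(z) = g(z)(1-g(z))$ is easy to spot-check numerically against a finite difference; a small sketch (function names are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_gradient(z):
    # g'(z) = g(z)(1 - g(z)), as derived above
    g = sigmoid(z)
    return g * (1 - g)

# compare against a central finite difference at a few points
z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
assert np.allclose(sigmoid_gradient(z), numeric, atol=1e-8)
```

At $z = 0$ the gradient attains its maximum, $g'(0) = 0.25$, which is a handy sanity value.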

Algorithm Tuning

Excessive model error falls into two categories of problems:

  • Overfitting (High Variance)
  • Underfitting (High Bias)

To address these two problems, we need to:
    1. Identify them
    2. Take the corresponding countermeasures

How to Produce the Right Model

First split the training samples into three parts:

  • Training set: 60%
  • Cross validation set: 20%
  • Test set: 20%

Use the training set to fit $\Theta$, use the cross validation set to choose the polynomial degree and $\lambda$, and finally use the test set error $J_{test}(\Theta^{(d)})$ to evaluate the algorithm.
![Alt text|400x0](./1502938070114.png)
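The 60/20/20 split can be sketched as a shuffle followed by slicing; `split_60_20_20` is my own helper name:

```python
import numpy as np

def split_60_20_20(X, y, seed=0):
    """Shuffle, then split into training / cross validation / test sets."""
    m = X.shape[0]
    idx = np.random.default_rng(seed).permutation(m)
    n_train = int(0.6 * m)
    n_cv = int(0.2 * m)
    train = idx[:n_train]
    cv = idx[n_train:n_train + n_cv]
    test = idx[n_train + n_cv:]
    return (X[train], y[train]), (X[cv], y[cv]), (X[test], y[test])
```

Shuffling before slicing matters: if the samples are ordered by class, a plain slice would give the three sets very different label distributions.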

Polynomial Degree - d

Refer to the figure below.
![Alt text](./1493951705755.png)

For the same $\Theta$, when the polynomial degree is very low, Underfitting is likely; both $J_{cv}$ and $J_{train}$ are large.
When the polynomial degree is very high, Overfitting is likely; $J_{cv}$ is far larger than $J_{train}$. This is a useful reference for choosing d.

Regularization - $\lambda$

  1. Create a list of lambdas (i.e. λ∈{0,0.01,0.02,0.04,…10.24});
  2. Compute the train error and cross validation error without the $\lambda$ term
  3. Plot the figure below
  4. As with choosing d, pick a suitable $\lambda$

![Alt text|350x0](./1493952692430.png)
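The four steps above can be sketched as a loop over the candidate lambdas. This is a sketch only: `train_fn(X, y, lam)` and the unregularized `cost_fn(theta, X, y)` are hypothetical interfaces the caller must supply.

```python
def pick_lambda(train_fn, cost_fn, X_train, y_train, X_cv, y_cv):
    """Fit with each candidate lambda, score both sets with the
    *unregularized* cost, and return the lambda with the lowest CV error.

    train_fn(X, y, lam) -> theta and cost_fn(theta, X, y) -> error are
    assumed, hypothetical interfaces.
    """
    lambdas = [0] + [0.01 * 2**k for k in range(11)]  # 0, 0.01, 0.02, ..., 10.24
    errors = []
    for lam in lambdas:
        theta = train_fn(X_train, y_train, lam)
        errors.append((cost_fn(theta, X_cv, y_cv),      # CV error, no lambda term
                       cost_fn(theta, X_train, y_train)))  # train error, for the plot
    best = min(range(len(lambdas)), key=lambda k: errors[k][0])
    return lambdas[best], errors
```

The `(cv_error, train_error)` pairs in `errors` are exactly the two curves plotted in the figure above.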

Random initialization

Randomize the initial values of $\Theta$. In exercise ex4, the following formula is given for initializing $\Theta$:

$$\epsilon_{init} = \dfrac{\sqrt{6}}{\sqrt{L\_in + L\_out}}$$
$L\_in = s_l$
$L\_out = s_{l+1}$

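A sketch of this initialization, assuming (as in ex4) that the weights are drawn uniformly from $[-\epsilon_{init}, \epsilon_{init}]$; the function name is my own:

```python
import numpy as np

def rand_initialize_weights(L_in, L_out, seed=None):
    """Initialize Theta for a layer with L_in inputs and L_out outputs,
    drawing uniformly from [-eps_init, eps_init]."""
    eps_init = np.sqrt(6) / np.sqrt(L_in + L_out)
    rng = np.random.default_rng(seed)
    # +1 column for the bias node
    return rng.uniform(-eps_init, eps_init, size=(L_out, L_in + 1))
```

Breaking the symmetry this way is the point: if all weights started at the same value, every hidden unit would compute the same function and receive the same gradient.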
Identifying Overfitting (High Variance) / Underfitting (High Bias)

To identify them, plot Learning Curves.

High Bias:
![Alt text|350x0](./1493950678949.png)
Test error and train error are close to each other, and both are high.

High Variance:
![Alt text|350x0](./1493952926446.png)
Train error is close to the correct value, but test error remains high; as the training set size grows, test error also decreases, but only slowly.

What to try next?

  • Getting more training examples: Fixes high variance
  • Trying smaller sets of features: Fixes high variance
  • Adding features: Fixes high bias
  • Adding polynomial features: Fixes high bias
  • Decreasing λ: Fixes high bias
  • Increasing λ: Fixes high variance.

There is one more approach:
to get a good algorithm, it suffices to do two things:

  1. Introduce many variables and parameters => high variance, low bias
  2. Provide a large number of training samples => low variance

Error Analysis

  • Start with a simple algorithm, implement it quickly, and test it early on your cross validation data.
  • Plot learning curves to decide if more data, more features, etc. are likely to help.
  • Manually examine the errors on examples in the cross validation set and try to spot a trend where most of the errors were made.

Special Case: Skewed Data

When building a classifier, one class may hold an absolute majority, e.g. 99%. In that case we may want the prediction to be deliberately biased: when predicting cancer, for instance, a false alarm is preferable to a missed diagnosis, i.e. we want high Recall.
![Alt text](./1493953864306.png)
$Precision = \dfrac{true\ positive}{true\ positive + false\ positive}$ (how accurate the positive predictions are)

$Recall = \dfrac{true\ positive}{true\ positive + false\ negative}$ (the hit rate)

$Accuracy = \dfrac{true\ positive + true\ negative}{overall\ samples}$

By taking different thresholds we can produce the figure below, then choose the threshold according to the preferred trade-off between Recall and Precision.
![Alt text](./1493961624396.png)

How to evaluate an algorithm:
$F_1$-Score (F Score) = $2\dfrac{PR}{P+R}$
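The three metrics and the F1 score above can be computed directly from the binary predictions; a minimal sketch with my own function name:

```python
def precision_recall_f1(y_true, y_pred):
    """Compute Precision, Recall and the F1 score from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 = 2PR / (P + R), the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Because F1 is a harmonic mean, it stays low unless Precision and Recall are both reasonable, which is why it is preferred over Accuracy on skewed data.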
