Machine Learning (1) - Linear & Logistic Regression


Linear Regression

Linear Regression (线性回归).

Single Feature Hypothesis:
$$h_{\theta}(x) = \theta_{0} + \theta_{1}x$$

We estimate $\theta_{i}$ by minimizing the cost function. The single-feature cost function is listed below without further discussion; the focus here is on Multiple Feature Linear Regression.

Cost Function:
$$J(\theta) = \dfrac{1}{2m} \sum_{i=1}^m (h_\theta(x_{i}) - y_{i})^2$$
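
For example (numbers chosen only to illustrate the formula, not taken from the course): with the training set $\{(1,1), (2,2), (3,3)\}$ and $\theta_0 = 0$, $\theta_1 = 0.5$, the predictions are $0.5, 1, 1.5$, so
$$J(\theta) = \dfrac{1}{2 \cdot 3}\left[(0.5-1)^2 + (1-2)^2 + (1.5-3)^2\right] = \dfrac{3.5}{6} \approx 0.58$$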

Cost Function & Gradient Descent

Multiple Feature Hypothesis:
$$h_\theta(x) = \begin{bmatrix}\theta_0 \hspace{1em} \theta_1 \hspace{1em} \dots \hspace{1em} \theta_n \end{bmatrix} \begin{bmatrix}x_0 \newline x_1 \newline \vdots \newline x_n\end{bmatrix} = \theta^Tx$$

Multiple Feature Cost Function:
$$J(\theta) = \dfrac{1}{2m}\sum_{i=1}^m\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

Gradient Descent:
$$\theta_j := \theta_j - \alpha\dfrac{\partial}{\partial\theta_j} J(\theta) = \theta_j - \alpha\dfrac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)}) - y^{(i)}\right)\cdot x_j^{(i)}$$

  • j := 0…n (update all parameters $\theta_j$ simultaneously)
  • $\alpha$ - learning rate.
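
To make the update rule concrete, here is a minimal NumPy sketch of batch gradient descent for the multi-feature cost above; the names `compute_cost`, `gradient_descent`, `alpha` and `iters` are illustrative, not from the course.

```python
import numpy as np

def compute_cost(X, y, theta):
    """J(theta) = 1/(2m) * sum((X @ theta - y)^2); X includes the x0 = 1 column."""
    m = len(y)
    errors = X @ theta - y
    return (errors @ errors) / (2 * m)

def gradient_descent(X, y, theta, alpha=0.01, iters=1000):
    """Update every theta_j simultaneously each iteration; the returned cost
    history is handy for checking that J(theta) decreases (i.e. that the
    learning rate alpha is not too large)."""
    m = len(y)
    history = []
    for _ in range(iters):
        theta = theta - (alpha / m) * (X.T @ (X @ theta - y))
        history.append(compute_cost(X, y, theta))
    return theta, history

# Toy usage: one feature plus the bias column of ones.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
theta, history = gradient_descent(X, y, np.zeros(2), alpha=0.1, iters=500)
```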

How to choose the learning rate $\alpha$?

  • If $\alpha$ is too small: slow convergence.
  • If $\alpha$ is too large: $J(\theta)$ may not decrease on every iteration; it may not converge.

To choose $\alpha$, try:
…, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …
increasing by roughly 3× each step.

Feature Scaling

We can speed up gradient descent by having each of our input values in roughly the same range. This is because θ will descend quickly on small ranges and slowly on large ranges, and so will oscillate inefficiently down to the optimum when the variables are very uneven.

Preprocessing our training set with the formula below is Feature Scaling:
$$x_i := \dfrac{x_i - \mu_i}{s_i}$$

  • $\mu_i$ is the average of all the values for feature (i)
  • $s_i$ is the range of values (max - min)
  • or $s_i$ is the standard deviation.

* standard deviation (标准差) = $\sqrt{\dfrac{1}{N}\sum_{i=1}^N(x_i - \mu)^2}$
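
A minimal NumPy sketch of this preprocessing step, using the standard deviation as $s_i$ (function and variable names are illustrative; scale the raw feature columns before adding the $x_0 = 1$ bias column):

```python
import numpy as np

def feature_scale(X):
    """Mean-normalize each feature column: (x - mu) / sigma."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)      # or X.max(axis=0) - X.min(axis=0) for the range
    return (X - mu) / sigma, mu, sigma

X = np.array([[2104.0, 3.0], [1600.0, 3.0], [2400.0, 4.0]])
X_scaled, mu, sigma = feature_scale(X)
# Reuse the same mu and sigma to scale any new input before predicting.
```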

Normal Equation (正规解)

$$\theta = (X^TX)^{-1}X^Ty$$

  • Note 1: the number of training examples must be larger than the number of features; otherwise $X^TX$ is singular and there is no unique solution.
  • Note 2: the normal equation does not require feature scaling.
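
A short NumPy sketch of the normal equation; `pinv` is used here (my choice, not something the note prescribes) so the call still returns a θ even when $X^TX$ is singular:

```python
import numpy as np

def normal_equation(X, y):
    """theta = (X^T X)^{-1} X^T y, computed with the pseudoinverse."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])   # bias column + one feature
y = np.array([1.0, 2.0, 3.0])
theta = normal_equation(X, y)                        # ~ [0, 1]
```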

Logistic Regression

Hypothesis:
$$h_\theta(x) = g(\theta^Tx)$$

Sigmoid Function / Logistic Function:
$$g(z) = \dfrac{1}{1+e^{-z}}, \qquad z = \theta^Tx$$

The following image shows us what the sigmoid function looks like:

If we plugged this hypothesis into the squared-error cost from linear regression, $J(\theta)$ would be non-convex with many local optima, so Gradient Descent would not be guaranteed to converge to the global minimum. That is why logistic regression uses a different cost function.

Expressing the Hypothesis as a conditional probability: $h_\theta(x)$ is the probability that $y = 1$ given $x$, parameterized by $\theta$.
$$h_\theta(x) = P(y=1 \mid x;\theta) = 1 - P(y=0 \mid x;\theta)$$
$$P(y=0 \mid x;\theta) + P(y=1 \mid x;\theta) = 1$$
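
A small sketch of the hypothesis as a probability (the values of theta and x below are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z}); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(theta, x):
    """h_theta(x) = g(theta^T x) = P(y = 1 | x; theta)."""
    return sigmoid(theta @ x)

theta = np.array([-3.0, 1.0, 1.0])
x = np.array([1.0, 2.0, 2.0])        # x0 = 1 plus two features
p = predict_proba(theta, x)          # sigmoid(1) ~ 0.73, so predict y = 1
```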

Cost Function

$$J(\theta) = \dfrac{1}{m} \sum_{i=1}^m \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})$$
$$\mathrm{Cost}(h_\theta(x), y) = -\log(h_\theta(x)) \hspace{6em} \text{if } y = 1$$
$$\mathrm{Cost}(h_\theta(x), y) = -\log(1-h_\theta(x)) \hspace{5em} \text{if } y = 0$$

$\mathrm{Cost}(h_\theta(x), y) = 0$ if $h_\theta(x) = y$
$\mathrm{Cost}(h_\theta(x), y) \rightarrow \infty$ if $y = 0$ and $h_\theta(x) \rightarrow 1$
$\mathrm{Cost}(h_\theta(x), y) \rightarrow \infty$ if $y = 1$ and $h_\theta(x) \rightarrow 0$

This Cost term replaces the $\dfrac{1}{2}(h_\theta(x_i)-y_i)^2$ term used in linear regression.

Substituting the Cost term into the full Cost Function gives:
$$J(\theta) = -\dfrac{1}{m}\sum_{i=1}^m\left[y^{(i)}\log(h_\theta(x^{(i)})) + (1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]$$

Vectorized implementation:
$$h = g(X\theta)$$
$$J(\theta) = \dfrac{1}{m}\left(-y^T\log(h) - (1-y)^T\log(1-h)\right)$$

The Gradient Descent update rule then works out to:
$$\theta_j := \theta_j - \dfrac{\alpha}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)}) - y^{(i)}\right)\cdot x_j^{(i)}$$

Vectorized form:
$$\theta := \theta - \dfrac{\alpha}{m}X^T(g(X\theta) - \vec{y})$$
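
Putting the vectorized cost and update rule together, a minimal NumPy sketch (names such as `logistic_cost` and `logistic_gradient_step` are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y):
    """J(theta) = 1/m * (-y^T log(h) - (1-y)^T log(1-h)) with h = g(X theta)."""
    m = len(y)
    h = sigmoid(X @ theta)
    return (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m

def logistic_gradient_step(theta, X, y, alpha):
    """theta := theta - (alpha/m) * X^T (g(X theta) - y)."""
    m = len(y)
    return theta - (alpha / m) * (X.T @ (sigmoid(X @ theta) - y))

# Toy usage: bias column plus one feature, labels 0/1.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = np.zeros(2)
for _ in range(2000):
    theta = logistic_gradient_step(theta, X, y, alpha=0.1)
```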

Overfitting (过拟合)

Regularized Linear Regression

$$J(\theta) = \dfrac{1}{2m}\left[\sum_{i=1}^m\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^n\theta_j^2\right]$$
The λ, or lambda, is the regularization parameter. It determines how much the costs of our theta parameters are inflated.
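
A short sketch of this regularized cost in NumPy, assuming θ is stored with the bias term at index 0 and `lam` stands in for λ:

```python
import numpy as np

def regularized_linear_cost(theta, X, y, lam):
    """1/(2m) * [sum of squared errors + lambda * sum(theta_j^2)],
    with theta[0] (the bias term) left out of the penalty."""
    m = len(y)
    errors = X @ theta - y
    penalty = lam * (theta[1:] @ theta[1:])   # exclude theta_0
    return (errors @ errors + penalty) / (2 * m)
```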

Regularized Logistic Regression

$$J(\theta) = -\dfrac{1}{m}\sum_{i=1}^m\left[y^{(i)}\log(h_\theta(x^{(i)})) + (1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right] + \dfrac{\lambda}{2m}\sum_{j=1}^n\theta_j^2$$

The second sum, $\sum_{j=1}^n\theta_j^2$, starts at $j = 1$ and therefore explicitly excludes the bias term $\theta_0$.

Note: the regularization term does not include $\theta_0$; in Matlab/Octave's 1-based indexing, $\theta_0$ is stored as theta(1).
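
A matching sketch for the regularized logistic cost, again leaving $\theta_0$ (theta[0] here, theta(1) in Matlab/Octave) out of the penalty:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_logistic_cost(theta, X, y, lam):
    """Unregularized logistic cost plus (lambda / 2m) * sum(theta_j^2) for j >= 1."""
    m = len(y)
    h = sigmoid(X @ theta)
    cost = (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m
    penalty = (lam / (2 * m)) * (theta[1:] @ theta[1:])   # skip theta[0]
    return cost + penalty
```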
