Machine Learning (1) - Linear & Logistic Regression
Linear Regression
Linear Regression (线性回归).
Single Feature Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
The single-feature cost function is listed here without further discussion; the focus is on Multiple Feature Linear Regression.
Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$
Cost Function & Gradient Descent
Multiple Feature Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n = \theta^T x$
Multiple Feature Cost Function: $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$
Gradient Descent: repeat until convergence { $\theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$ }
- simultaneously updating for every $j := 0 \dots n$
- $\alpha$ is the learning rate.
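As a concrete illustration, here is a minimal NumPy sketch of this update rule; the function name `gradient_descent` and the assumption that `X` already carries a leading column of ones for $\theta_0$ are mine, not from the notes.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for multi-feature linear regression.

    Assumes X is an (m, n+1) design matrix whose first column is all ones,
    and y is a length-m target vector.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        error = X @ theta - y             # h_theta(x^(i)) - y^(i) for every example
        gradient = (X.T @ error) / m      # partial derivative w.r.t. every theta_j
        theta = theta - alpha * gradient  # simultaneous update of theta_0 ... theta_n
    return theta
```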
How to choose the learning rate $\alpha$?
- If $\alpha$ is too small: slow convergence.
- If $\alpha$ is too large: $J(\theta)$ may not decrease on every iteration; it may not converge.
To choose $\alpha$, try:
…, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …
Each candidate is roughly 3 times the previous one.
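One way to apply this rule of thumb is sketched below: run a short gradient descent with each candidate $\alpha$ and keep the one whose cost ends up lowest. The helper name `pick_learning_rate` and the 50-iteration budget are my choices, not from the notes.

```python
import numpy as np

def pick_learning_rate(X, y, candidates=(0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0)):
    """Try each candidate alpha for a few iterations and keep the best one."""
    m, n = X.shape
    best_alpha, best_cost = None, np.inf
    for alpha in candidates:
        theta = np.zeros(n)
        for _ in range(50):                           # a few iterations are enough to compare
            theta -= alpha / m * (X.T @ (X @ theta - y))
        cost = np.sum((X @ theta - y) ** 2) / (2 * m)  # J(theta) after the short run
        if np.isfinite(cost) and cost < best_cost:     # a diverging alpha shows up as inf/nan
            best_alpha, best_cost = alpha, cost
    return best_alpha
```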
Feature Scaling
We can speed up gradient descent by having each of our input values in roughly the same range. This is because θ will descend quickly on small ranges and slowly on large ranges, and so will oscillate inefficiently down to the optimum when the variables are very uneven.
Feature scaling means preprocessing the training set with the formula below:
$x_i := \frac{x_i - \mu_i}{s_i}$
- $\mu_i$ is the average of all the values for feature (i)
- $s_i$ is the range of values (max - min)
- or $s_i$ is the standard deviation.
* standard deviation (标准差) $= \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)} - \mu\right)^2}$
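A minimal NumPy sketch of this preprocessing; dividing by the standard deviation is a choice here (the range max - min would work equally well), and the function name `feature_scale` is mine.

```python
import numpy as np

def feature_scale(X):
    """Mean normalization: x_i := (x_i - mu_i) / s_i for every feature column."""
    mu = X.mean(axis=0)         # mu_i: average of each feature
    s = X.std(axis=0)           # s_i: standard deviation (max - min would also work)
    return (X - mu) / s, mu, s  # keep mu and s to transform new inputs the same way
```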
Normal Equation (正规解)
$\theta = (X^T X)^{-1} X^T y$
- Note 1: the number of training examples should be larger than the number of features; otherwise $X^T X$ is non-invertible and the equation has no unique solution.
- Note 2: the normal equation does not require feature scaling.
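A one-function NumPy sketch of the normal equation; using `np.linalg.pinv` instead of a plain inverse is my choice, so the code still returns an answer in the non-invertible case from note 1.

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form theta = (X^T X)^{-1} X^T y; pinv handles the non-invertible case."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y
```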
Logistic Regression
Hypothesis: $h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$, where $g(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid (logistic) function.
The sigmoid function $g(z)$ is an S-shaped curve that maps any real number into the interval $(0, 1)$.
If this hypothesis were plugged into the squared-error cost function from linear regression, the resulting $J(\theta)$ would be non-convex, with many local optima, so gradient descent could not be guaranteed to converge to the global minimum.
The hypothesis expressed as a conditional probability: $h_\theta(x) = P(y = 1 \mid x; \theta)$
i.e. the probability that $y = 1$ given input $x$, parameterized by $\theta$.
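A small sketch of this hypothesis in NumPy; the names `sigmoid` and `hypothesis` are mine, and `X` is again assumed to carry a leading column of ones.

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z}); maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """h_theta(x) = g(theta^T x), read as P(y = 1 | x; theta)."""
    return sigmoid(X @ theta)
```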
Cost Function
For a single example the Cost is defined as
$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$
which replaces the squared-error term $\frac{1}{2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$ used in linear regression.
Substituting this Cost term into the overall cost function gives:
$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right]$
Vectorized implementation:
$h = g(X\theta)$
$J(\theta) = \frac{1}{m}\left(-y^T\log(h) - (1 - y)^T\log(1 - h)\right)$
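A NumPy sketch of this vectorized cost; the function name `logistic_cost` is mine.

```python
import numpy as np

def logistic_cost(theta, X, y):
    """Vectorized J(theta) = 1/m * (-y^T log(h) - (1 - y)^T log(1 - h)), h = g(X theta)."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return (-(y @ np.log(h)) - (1 - y) @ np.log(1 - h)) / m
```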
The gradient descent update rule then becomes: repeat { $\theta_j := \theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$ }
Vectorized form: $\theta := \theta - \frac{\alpha}{m}X^T\left(g(X\theta) - y\right)$
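And the vectorized update as a short NumPy sketch, under the same design-matrix assumption and with a hypothetical function name.

```python
import numpy as np

def logistic_gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Repeat theta := theta - alpha/m * X^T (g(X theta) - y)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))  # g(X theta)
        theta -= alpha / m * (X.T @ (h - y))    # vectorized simultaneous update
    return theta
```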
Overfitting (过拟合)
Regularized Linear Regression
The regularized cost function is
$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$
The λ, or lambda, is the regularization parameter. It determines how much the costs of our theta parameters are inflated.
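A minimal NumPy sketch of this regularized cost, with $\theta_0$ left out of the penalty; the function name and the parameter name `lam` are mine.

```python
import numpy as np

def regularized_linear_cost(theta, X, y, lam):
    """J(theta) = 1/(2m) * [sum of squared errors + lambda * sum(theta_j^2 for j >= 1)]."""
    m = len(y)
    error = X @ theta - y
    penalty = lam * np.sum(theta[1:] ** 2)      # theta_0 (the bias term) is not penalized
    return (error @ error + penalty) / (2 * m)
```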
Regularized Logistic Regression
!! Note: the regularization term $\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$ does not include $\theta_0$; in Matlab/Octave's 1-based indexing that parameter is theta(1).
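A NumPy sketch of the regularized logistic cost and gradient with that exclusion made explicit; in 0-based Python indexing, `theta[0]` plays the role of Octave's theta(1), and the function name is mine.

```python
import numpy as np

def regularized_logistic_cost_grad(theta, X, y, lam):
    """Return the regularized cost and gradient; theta[0] is excluded from the penalty."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    reg = lam / (2 * m) * np.sum(theta[1:] ** 2)                  # penalty skips theta[0]
    cost = (-(y @ np.log(h)) - (1 - y) @ np.log(1 - h)) / m + reg
    grad = (X.T @ (h - y)) / m
    grad[1:] += lam / m * theta[1:]                               # no regularization on theta[0]
    return cost, grad
```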