Machine Learning (5) - Large Scale Machine Learning
Stochastic gradient descent
Recall the cost function of linear regression:

$$J_{\text{train}}(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
"Stochastic" means using only one example per update and iterating repeatedly, which achieves an effect comparable to ordinary (batch) gradient descent; a Python sketch follows the pseudocode below.
- Randomly shuffle the dataset
- Repeat
    - for i = 1, …, m
        - for j = 0, …, n: $\theta_j := \theta_j - \alpha \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$ (update all $\theta_j$ simultaneously)
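A minimal NumPy sketch of this loop, assuming `X` already contains the bias column $x_0 = 1$; the function name and parameters are illustrative, not from the notes:

```python
import numpy as np

# Stochastic gradient descent for linear regression: one example per update.
def stochastic_gradient_descent(X, y, alpha=0.01, num_epochs=10):
    m, n = X.shape
    theta = np.zeros(n)
    order = np.random.permutation(m)        # Randomly shuffle the dataset
    for _ in range(num_epochs):             # Repeat
        for i in order:                     # for i = 1, ..., m
            error = X[i] @ theta - y[i]     # h_theta(x^(i)) - y^(i)
            theta -= alpha * error * X[i]   # updates every theta_j at once
    return theta
```

Each update touches a single example, so one pass over the data performs m cheap updates instead of one expensive full-batch update.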
Mini-batch gradient descent
An improved version of stochastic gradient descent.
Batch gradient descent: Use all m examples in each iteration
Stochastic gradient descent: Use 1 example in each iteration
Mini-batch gradient descent: Use b examples in each iteration
Say b = 10, m = 1000. The update loop becomes (a Python sketch follows the pseudocode):

Repeat{
- for i = 1, 11, 21, 31, …, 991 {
    - for j = 0, …, n: $\theta_j := \theta_j - \alpha \frac{1}{10} \sum_{k=i}^{i+9} \left(h_\theta(x^{(k)}) - y^{(k)}\right) x_j^{(k)}$ (update all $\theta_j$ simultaneously)
- }
}
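A matching NumPy sketch (hypothetical names again), where each step averages the gradient over one batch of b examples:

```python
import numpy as np

# Mini-batch gradient descent: b examples per update (b = 10, m = 1000 in the text).
def minibatch_gradient_descent(X, y, alpha=0.01, b=10, num_epochs=10):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_epochs):                  # Repeat
        for i in range(0, m, b):                 # i = 0, 10, 20, ..., 990 (0-indexed)
            Xb, yb = X[i:i + b], y[i:i + b]      # the next b examples
            grad = Xb.T @ (Xb @ theta - yb) / b  # batch-averaged gradient, all theta_j
            theta -= alpha * grad
    return theta
```

With b = 10 the inner loop makes 100 vectorized updates per pass over the m = 1000 examples, a middle ground between one big batch step and m tiny stochastic steps.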
In the cost-versus-iterations picture for stochastic gradient descent, we can see that, compared with the batch algorithm, its path meanders much more and it is harder to converge: it keeps wandering in a region around the minimum. The learning rate $\alpha$ must therefore be chosen with extra care, usually by dynamic adjustment. $\alpha$ is typically held constant, but can be slowly decreased over time if we want $\theta$ to converge, e.g. $\alpha = \frac{\text{const1}}{\text{iterationNumber} + \text{const2}}$ (a small sketch of this schedule follows).
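A small sketch of such a decay schedule; `const1` and `const2` are hypothetical tuning constants, not values given in the notes:

```python
# Decaying learning rate: larger steps early, smaller steps as theta settles.
def decaying_alpha(t, const1=1.0, const2=50.0):
    return const1 / (t + const2)
```

In the loops above, `decaying_alpha(t)` would simply replace the fixed `alpha`, with `t` counting the updates performed so far.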
Online learning
Online learning is really an application of the stochastic algorithm: because an iteration does not need every sample to participate, the algorithm can collect data and optimize at the same time, taking one update step on each new example as it arrives (see the sketch below).
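For instance, a click-prediction style learner with logistic regression might look like this sketch; `stream` (an iterator of `(x, y)` pairs) and all names here are assumptions, not from the notes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Online learning: take one gradient step per arriving example, then discard it.
def online_logistic_learner(stream, n, alpha=0.01):
    theta = np.zeros(n)
    for x, y in stream:                  # examples arrive one at a time
        error = sigmoid(x @ theta) - y   # prediction error on this single example
        theta -= alpha * error * x       # one stochastic step; x, y are not stored
    return theta
```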