Machine Learning (5) - Large Scale Machine Learning
Stochastic gradient descent
Recall the cost function of linear regression:

$$J_{\text{train}}(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
"Stochastic" means using only one example per update and iterating repeatedly, which achieves an effect comparable to ordinary (batch) gradient descent; a Python sketch follows the pseudocode below.
- Randomly shuffle the dataset
- Repeat
    - for i = 1, …, m
        - for j = 0, …, n: $\theta_j := \theta_j - \alpha \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$ (update all $\theta_j$ simultaneously)
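A minimal NumPy sketch of this loop, assuming `X` already contains the bias column $x_0 = 1$; the function name and parameters are illustrative, not from the notes:

```python
import numpy as np

# Stochastic gradient descent for linear regression: one example per update.
def stochastic_gradient_descent(X, y, alpha=0.01, num_epochs=10):
    m, n = X.shape
    theta = np.zeros(n)
    order = np.random.permutation(m)        # Randomly shuffle the dataset
    for _ in range(num_epochs):             # Repeat
        for i in order:                     # for i = 1, ..., m
            error = X[i] @ theta - y[i]     # h_theta(x^(i)) - y^(i)
            theta -= alpha * error * X[i]   # updates every theta_j at once
    return theta
```

Each update touches a single example, so one pass over the data performs m cheap updates instead of one expensive full-batch update.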
Mini-batch gradient descent
An improved version of stochastic gradient descent.
Batch gradient descent: Use all m examples in each iteration
Stochastic gradient descent: Use 1 example in each iteration
Mini-batch gradient descent: Use b examples in each iteration
Say b = 10, m = 1000. The update loop becomes (a Python sketch follows the pseudocode):

Repeat{
- for i = 1, 11, 21, 31, …, 991 {
    - for j = 0, …, n: $\theta_j := \theta_j - \alpha \frac{1}{10} \sum_{k=i}^{i+9} \left(h_\theta(x^{(k)}) - y^{(k)}\right) x_j^{(k)}$ (update all $\theta_j$ simultaneously)
- }
}
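A matching NumPy sketch (hypothetical names again), where each step averages the gradient over one batch of b examples:

```python
import numpy as np

# Mini-batch gradient descent: b examples per update (b = 10, m = 1000 in the text).
def minibatch_gradient_descent(X, y, alpha=0.01, b=10, num_epochs=10):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_epochs):                  # Repeat
        for i in range(0, m, b):                 # i = 0, 10, 20, ..., 990 (0-indexed)
            Xb, yb = X[i:i + b], y[i:i + b]      # the next b examples
            grad = Xb.T @ (Xb @ theta - yb) / b  # batch-averaged gradient, all theta_j
            theta -= alpha * grad
    return theta
```

With b = 10 the inner loop makes 100 vectorized updates per pass over the m = 1000 examples, a middle ground between one big batch step and m tiny stochastic steps.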
In the cost-versus-iterations picture for stochastic gradient descent, we can see that, compared with the batch algorithm, its path meanders much more and it is harder to converge: it keeps wandering in a region around the minimum. The learning rate $\alpha$ must therefore be chosen with extra care, usually by dynamic adjustment. $\alpha$ is typically held constant, but can be slowly decreased over time if we want $\theta$ to converge, e.g. $\alpha = \frac{\text{const1}}{\text{iterationNumber} + \text{const2}}$ (a small sketch of this schedule follows).
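A small sketch of such a decay schedule; `const1` and `const2` are hypothetical tuning constants, not values given in the notes:

```python
# Decaying learning rate: larger steps early, smaller steps as theta settles.
def decaying_alpha(t, const1=1.0, const2=50.0):
    return const1 / (t + const2)
```

In the loops above, `decaying_alpha(t)` would simply replace the fixed `alpha`, with `t` counting the updates performed so far.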
Online learning
Online learning is really an application of the stochastic algorithm: because an iteration does not need every sample to participate, the algorithm can collect data and optimize at the same time, taking one update step on each new example as it arrives (see the sketch below).
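For instance, a click-prediction style learner with logistic regression might look like this sketch; `stream` (an iterator of `(x, y)` pairs) and all names here are assumptions, not from the notes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Online learning: take one gradient step per arriving example, then discard it.
def online_logistic_learner(stream, n, alpha=0.01):
    theta = np.zeros(n)
    for x, y in stream:                  # examples arrive one at a time
        error = sigmoid(x @ theta) - y   # prediction error on this single example
        theta -= alpha * error * x       # one stochastic step; x, y are not stored
    return theta
```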