Machine Learning (5) - Large Scale Machine Learning


Stochastic gradient descent

Recall the cost function for linear regression:

$$J(\theta) = \dfrac{1}{2m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2$$

Here m is the number of samples. When m is very large, say 1,000,000, every single gradient step has to run over all of the samples, which is a staggering amount of computation.

"Stochastic" here means that each update uses only one sample, iterating over and over, so as to achieve the same effect as ordinary (batch) gradient descent.

First define the cost on a single example, so that $J_{train}$ is just its average over the training set:

$cost(\theta, (x^{(i)}, y^{(i)})) = \frac{1}{2}(h_{\theta}(x^{(i)}) - y^{(i)})^2$

$J_{train}(\theta) = \frac{1}{m}\sum_{i=1}^m cost(\theta, (x^{(i)}, y^{(i)}))$

The algorithm is then:

  1. Randomly shuffle dataset
  2. Repeat
    • for i = 1, …, m
      • for j = 0, …, n
        $\theta_j := \theta_j - \alpha(h_{\theta}(x^{(i)})-y^{(i)})x_j^{(i)}$
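
A minimal sketch of this loop in Python with NumPy, assuming a feature matrix X of shape (m, n) and targets y; names like alpha and epochs are illustrative, and the inner for-j loop is vectorized into a single update:

```python
import numpy as np

def sgd_linear_regression(X, y, alpha=0.01, epochs=10):
    """Stochastic gradient descent for linear regression."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])    # prepend x_0 = 1 for theta_0
    theta = np.zeros(n + 1)
    for _ in range(epochs):
        order = np.random.permutation(m)    # step 1: randomly shuffle
        for i in order:                     # step 2: one example per update
            error = Xb[i] @ theta - y[i]    # h_theta(x^(i)) - y^(i)
            theta -= alpha * error * Xb[i]  # updates every theta_j at once
    return theta
```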

Mini-batch gradient descent

An improved version of stochastic gradient descent.

Batch gradient descent: Use all m examples in each iteration
Stochastic gradient descent: Use 1 example in each iteration
Mini-batch gradient descent: Use b examples in each iteration

Say b = 10, m = 1000
Repeat{

  • for i = 1, 11, 21, 31, …, 991{
    • for j = 0, …, n
      $\theta_j := \theta_j - \alpha\frac{1}{10}\sum_{k=i}^{i+9}(h_{\theta}(x^{(k)}) - y^{(k)})x_j^{(k)}$
    • }
  • }
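
The same loop in Python, under the same assumptions as the SGD sketch above; the sum over the b = 10 examples in a batch collapses into one matrix-vector product:

```python
import numpy as np

def minibatch_gd(X, y, alpha=0.01, b=10, epochs=10):
    """Mini-batch gradient descent: average the gradient over b examples."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])    # prepend x_0 = 1 for theta_0
    theta = np.zeros(n + 1)
    for _ in range(epochs):
        order = np.random.permutation(m)    # shuffle before slicing batches
        for start in range(0, m, b):        # i = 1, 11, 21, ..., 991
            batch = order[start:start + b]
            errors = Xb[batch] @ theta - y[batch]     # shape (b,)
            grad = Xb[batch].T @ errors / len(batch)  # (1/b) * sum over batch
            theta -= alpha * grad
    return theta
```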

In the convergence plots for the stochastic algorithm we can see that, compared with the batch algorithm, its path is much more meandering and it is harder to make it converge. So extra care is needed when choosing $\alpha$; a common approach is to set $\alpha$ dynamically.

Learning rate $\alpha$ is typically held constant. Can slowly decrease $\alpha$ over time if we want $\theta$ to converge, e.g.

$$\alpha = \frac{const1}{iterationNumber + const2}$$
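
As a sketch, such a schedule could replace the fixed alpha in either loop above; const1 and const2 are tuning constants, and the values here are purely illustrative:

```python
def decayed_alpha(iteration, const1=5.0, const2=50.0):
    # alpha = const1 / (iterationNumber + const2): large steps early on,
    # shrinking over time so theta settles instead of oscillating
    return const1 / (iteration + const2)
```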

Online learning

This is really just an application of the stochastic algorithm. Since each iteration does not require all of the samples to participate, the algorithm can collect data and run its iterative optimization at the same time.
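
A minimal sketch of the idea, assuming a continuous stream of (x, y) pairs (say, user feature vectors and click/no-click labels) and logistic regression as the hypothesis; each example is used for one update and then discarded:

```python
import numpy as np

def online_learner(stream, n, alpha=0.1):
    """Learn from an endless stream of (x, y) examples, one at a time."""
    theta = np.zeros(n)
    for x, y in stream:                       # data keeps arriving
        h = 1.0 / (1.0 + np.exp(-x @ theta))  # logistic hypothesis h_theta(x)
        theta -= alpha * (h - y) * x          # same update form as SGD
        # x, y can be thrown away here; theta keeps adapting to new users
    return theta
```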

Map-reduce and data parallelism

So-called map-reduce is just distributed computation: it relieves the computational pressure on any single machine and lends itself to running on a cluster. Applied to batch gradient descent, the sum over the training examples is split across machines, each machine computes the partial sum for its shard (the "map"), and a central server adds the partial sums together and performs the update (the "reduce").
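
A toy single-machine sketch of this, using Python's multiprocessing pool to stand in for a cluster of workers; function and parameter names such as n_workers are illustrative:

```python
import numpy as np
from multiprocessing import Pool

def shard_gradient(args):
    """Map step: partial gradient sum over one shard of the training data."""
    X_shard, y_shard, theta = args
    errors = X_shard @ theta - y_shard
    return X_shard.T @ errors                 # sum of gradient terms in shard

def mapreduce_gd_step(X, y, theta, alpha, n_workers=4):
    """One batch gradient descent step with the sum split across workers."""
    shards = list(zip(np.array_split(X, n_workers),
                      np.array_split(y, n_workers),
                      [theta] * n_workers))
    with Pool(n_workers) as pool:
        partials = pool.map(shard_gradient, shards)   # map: one shard each
    grad = sum(partials) / len(y)                     # reduce: combine sums
    return theta - alpha * grad
```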