Deep Learning (4) - Hyperparameters


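Machine learning models usually have a number of hyperparameters, for example the learning rate ($\alpha$), the number of layers in the neural network, and so on. These are settings of the model that are chosen before training rather than learned from data. By contrast, the parameters that are fitted through learning, such as W and b, are called learnable parameters. Hyperparameters usually affect how fast and how well gradient descent converges, and even whether it converges at all. So they normally have to be adjusted repeatedly until values that suit the model are found.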
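To make the distinction concrete, here is a minimal sketch of gradient descent on a toy objective. The function `gradient_descent` and the objective f(w) = w**2 are assumptions chosen for illustration and do not come from the course material; `alpha` plays the role of the hyperparameter and `w` the learnable parameter.

```python
def gradient_descent(alpha, w0=1.0, steps=50):
    """Minimize the toy objective f(w) = w**2 and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w        # derivative of w**2
        w = w - alpha * grad  # alpha is fixed by us, w is what gets learned
    return w

# Each step multiplies w by (1 - 2 * alpha), so |1 - 2 * alpha| decides
# whether the iterates shrink towards 0 or grow without bound.
for alpha in (0.01, 0.1, 1.1):
    print(f"alpha={alpha}: final w = {gradient_descent(alpha):.3e}")
```

On this toy problem a small step size converges slowly, a moderate one converges quickly, and a step size above 1 makes the iterates grow, which is the divergence risk mentioned above.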

Tuning Process

Once the various optimization algorithms (Momentum, RMSprop, ADAM) are introduced, the list of hyperparameters grows:

  • Learning rate: $\alpha$
  • Momentum: $\beta$
  • ADAM: $\beta_1, \beta_2, \epsilon$
  • Number of layers
  • Number of hidden units
  • Learning rate decay scheme
  • mini-batch size

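When tuning these hyperparameters, Andrew gives an order of priority, shown in the original post as a figure with color-coded boxes: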

To explain, among all these hyperparameters:

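  • The learning rate $\alpha$ is the most important and the first to tune; with a poorly chosen learning rate the algorithm may diverge.
  • Second in priority are the orange boxes: Momentum $\beta$, number of hidden units, and mini-batch size.
  • Next are the purple boxes: number of layers and the choice of learning rate decay scheme.
  • The ADAM parameters usually do not need tuning; the standard values tend to work well: $\beta_1=0.9$, $\beta_2=0.999$, $\epsilon=10^{-8}$.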
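As a reminder of where the ADAM hyperparameters enter, the following is a minimal scalar sketch of one ADAM update with the standard values as defaults. The function name `adam_step`, the toy objective, and the default `alpha=0.001` are assumptions for illustration; only the values of `beta1`, `beta2` and `eps` are taken from the list above.

```python
import math

def adam_step(w, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - alpha * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Toy usage (hypothetical objective): minimize f(w) = w**2, gradient 2 * w.
w, m, v = -2.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2.0 * w, m, v, t, alpha=0.01)
print(w)
```

In this sketch only `alpha` is overridden while the other three keep their defaults, which mirrors the priority order described above.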

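Try random values, Don’t use grid search

Why not use grid search? Because hyperparameters differ in importance. Take the grid search in the left-hand plot of the original figure: 5 values of $\alpha$ are combined with 5 values of $\epsilon$, and it turns out that $\epsilon$ has almost no effect on the result. Of these 25 trials, only 5 are effectively distinct. With random search, as in the right-hand plot, every trial has a different $\alpha$ and a different $\epsilon$, so all 25 trials are informative. In other words, random search is more efficient than grid search.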
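The difference between a grid and random sampling can be sketched by counting how many distinct values of an important hyperparameter each one tries in 25 trials. The ranges below and the helper `distinct_alphas` are hypothetical, and drawing the values on a log scale is an assumption of this sketch.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Grid search: 5 values of alpha crossed with 5 values of epsilon, 25 trials.
alphas = [10 ** e for e in (-5, -4, -3, -2, -1)]     # hypothetical range
epsilons = [10 ** e for e in (-10, -9, -8, -7, -6)]  # hypothetical range
grid_trials = [(a, e) for a in alphas for e in epsilons]

# Random search: 25 trials, each drawing its own alpha and epsilon.
random_trials = [
    (10 ** random.uniform(-5, -1), 10 ** random.uniform(-10, -6))
    for _ in range(25)
]

def distinct_alphas(trials):
    """Number of different alpha values that a list of trials explores."""
    return len({a for a, _ in trials})

print("grid:  ", distinct_alphas(grid_trials))
print("random:", distinct_alphas(random_trials))
```

If the second hyperparameter barely matters, the grid has explored only 5 settings of the one that does, while the random draws explore a different setting in essentially every trial.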

Coarse to fine

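Search from coarse to fine granularity, which is a fairly obvious strategy. The figure in the original post illustrates it well: if testing shows that 3 of the sampled points perform well, shrink the region around them and sample it more densely to test more points.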
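One possible way to sketch the coarse-to-fine idea: sample a wide box, keep the best few points, and resample inside their bounding box. The helpers `sample_box` and `shrink_to_best` and the objective `toy_loss` are hypothetical stand-ins for a real training and validation run.

```python
import random

def sample_box(bounds, n, rng):
    """Draw n points uniformly from a box given as [(low, high), ...]."""
    return [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(n)]

def shrink_to_best(points, scores, k):
    """Bounding box of the k points with the lowest score."""
    best = [p for _, p in sorted(zip(scores, points))[:k]]
    return [(min(c), max(c)) for c in zip(*best)]

def toy_loss(p):
    # Hypothetical stand-in for a validation loss, smallest at (0.3, 0.7).
    x, y = p
    return (x - 0.3) ** 2 + (y - 0.7) ** 2

rng = random.Random(0)
coarse_bounds = [(0.0, 1.0), (0.0, 1.0)]
coarse = sample_box(coarse_bounds, 25, rng)                  # coarse pass
fine_bounds = shrink_to_best(coarse, [toy_loss(p) for p in coarse], k=3)
fine = sample_box(fine_bounds, 25, rng)                      # fine pass

print("fine region:", fine_bounds)
print("best coarse loss:", min(toy_loss(p) for p in coarse))
print("best fine loss:  ", min(toy_loss(p) for p in fine))
```

A tight bounding box can exclude the true optimum, so in practice one would probably pad the box a little before resampling; that choice is left out of the sketch.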

Using an appropriate scale to pick hyperparameters

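The point here is that in some cases a hyperparameter should be searched uniformly on an exponential (log) scale. Take $\beta$ in Momentum: when we want to search the range 0.9 to 0.999, what we really want is to search $1-\beta \in [10^{-3}, 10^{-1}]$. So draw $r \in [-3, -1]$ uniformly and set $\beta = 1-10^r$.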
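The sampling rule $\beta = 1-10^r$ with $r \in [-3, -1]$ can be sketched as follows. The function name `sample_beta` and the sample count are assumptions for illustration.

```python
import random

def sample_beta(rng, r_low=-3.0, r_high=-1.0):
    """Draw r uniformly from [r_low, r_high] and return beta = 1 - 10**r."""
    r = rng.uniform(r_low, r_high)
    return 1.0 - 10.0 ** r

rng = random.Random(0)
betas = [sample_beta(rng) for _ in range(1000)]

# Each decade of 1 - beta should receive roughly half of the samples.
n_low = sum(1 for b in betas if b < 0.99)  # 1 - beta in (1e-2, 1e-1]
n_high = len(betas) - n_low                # 1 - beta in [1e-3, 1e-2]
print(min(betas), max(betas), n_low, n_high)
```

For comparison, drawing $\beta$ uniformly from [0.9, 0.999] would place about 91% of the samples below 0.99 and leave the region close to 1, where the result is most sensitive to $\beta$, nearly unexplored.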
