# Machine Learning (3) - Support Vector Machine (SVM)

SVM stands for Support Vector Machine (支持向量机).

Its hallmark is the large margin; compared with logistic regression, the global optimum is easier to obtain.

## How it works

SVM makes predictions based on landmarks, as shown below:

![Alt text|600x0](./1494841637067.png)

With $\theta$ trained as shown, y is predicted to be 1 when x falls near $l^{(1)}$ or $l^{(2)}$, and 0 when it falls near $l^{(3)}$. How is this achieved?

### Hypothesis

Logistic regression's hypothesis is $h_{\theta}(x) = \dfrac{1}{1+e^{-\theta^Tx}}$, and the cost it contributes per example is

$-y\,\log\dfrac{1}{1+e^{-\theta^Tx}} - (1-y)\,\log\left(1-\dfrac{1}{1+e^{-\theta^Tx}}\right)$

SVM fits piecewise-linear approximations to the two parts of this formula; see the figure below:

![Alt text|600x0](./1494842281122.png)

The part that matters is the flat end along the z axis; the exact slope of the sloped segment is unimportant.

### SVM Decision Boundary

$\min_\theta\; C\sum_{i=1}^{m}\left[y^{(i)}\mathrm{cost}_1(\theta^Tx^{(i)}) + (1-y^{(i)})\mathrm{cost}_0(\theta^Tx^{(i)})\right] + \dfrac{1}{2}\sum_{j=1}^{n}\theta_j^2$

where $C = \dfrac{1}{\lambda}$.
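The objective above can be sketched in a few lines of numpy. This is a minimal illustration, assuming the standard hinge-shaped surrogates for $\mathrm{cost}_1$ and $\mathrm{cost}_0$; for simplicity it regularizes every component of $\theta$, whereas the lecture formulation leaves $\theta_0$ unregularized.

```python
import numpy as np

def cost1(z):
    """Cost when y = 1: zero once z >= 1, linear below the margin."""
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    """Cost when y = 0: zero once z <= -1, linear above the margin."""
    return np.maximum(0.0, 1.0 + z)

def svm_objective(theta, X, y, C):
    """C * sum of per-example costs + (1/2) * sum of theta_j^2."""
    z = X @ theta  # margins theta^T x^{(i)} for every example
    data_term = C * np.sum(y * cost1(z) + (1 - y) * cost0(z))
    reg_term = 0.5 * np.sum(theta ** 2)
    return data_term + reg_term
```

With both examples correctly classified beyond the margin, only the regularization term contributes.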

### Kernel

Gaussian Kernel:

$f_i = \mathrm{similarity}(x, l^{(i)}) = \exp\left(-\dfrac{\|x-l^{(i)}\|^2}{2\sigma^2}\right)$
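The Gaussian kernel is a one-liner in numpy; the similarity is exactly 1 when x sits on the landmark and decays toward 0 as they move apart (landmark values below are made up for illustration):

```python
import numpy as np

def gaussian_kernel(x, l, sigma=1.0):
    """Similarity f_i between example x and landmark l."""
    return np.exp(-np.sum((x - l) ** 2) / (2 * sigma ** 2))

x = np.array([1.0, 2.0])
gaussian_kernel(x, np.array([1.0, 2.0]))  # landmark at x itself -> 1.0
gaussian_kernel(x, np.array([5.0, 6.0]))  # distant landmark -> near 0
```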

### Steps

1. Given $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})$
2. Choose $l^{(1)} = x^{(1)},\, l^{(2)} = x^{(2)},\, \dots,\, l^{(m)} = x^{(m)}$
3. For a new example $x$, compute the feature vector $f = (f_1, f_2, \dots, f_m)$
4. Predict "y = 1" if $\theta^Tf \geq 0$

Note: Do perform feature scaling before using the Gaussian kernel.
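The steps above can be sketched as follows. The training set and the weight vector `theta` here are made up for illustration; in practice $\theta$ comes from solving the SVM optimization, and the course formulation also includes an intercept feature $f_0 = 1$, omitted here for brevity.

```python
import numpy as np

def features(x, landmarks, sigma=1.0):
    """Step 3: map x to f = (f_1, ..., f_m), one Gaussian similarity per landmark."""
    return np.array([np.exp(-np.sum((x - l) ** 2) / (2 * sigma ** 2))
                     for l in landmarks])

# Steps 1-2: the landmarks are just the training examples themselves.
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
landmarks = X_train

# Hypothetical learned weights: positive for landmarks of the "1" region,
# negative for the "0" region.
theta = np.array([1.0, 1.0, -1.0])

def predict(x):
    # Step 4: predict y = 1 iff theta^T f >= 0.
    return 1 if theta @ features(x, landmarks) >= 0 else 0
```

Points near the first two landmarks are classified as 1, points near the third as 0.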

Multiclass classification:
Use the one-vs-all method: train K SVMs, one to distinguish y = i from the rest, for i = 1, 2, …, K, obtaining $\theta^{(1)}, \theta^{(2)}, \dots, \theta^{(K)}$.
For a new input x, pick the i that maximizes $(\theta^{(i)})^Tx$.
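The one-vs-all decision rule reduces to an argmax over the K linear scores. The parameter matrix below is made up; each row stands for one trained $\theta^{(i)}$:

```python
import numpy as np

# Hypothetical per-class parameter vectors theta^{(1..K)}, one row per class,
# as produced by training K separate SVMs.
Theta = np.array([[ 2.0, -1.0],
                  [-1.0,  2.0],
                  [ 0.5,  0.5]])

def predict_class(x):
    """Pick the i maximizing (theta^{(i)})^T x."""
    return int(np.argmax(Theta @ x))
```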

### Parameters

$C = \dfrac{1}{\lambda}$

- Large C: small $\lambda$. Lower bias, higher variance.
- Small C: large $\lambda$. Higher bias, lower variance.

$\sigma^2$

- Large $\sigma^2$: features $f_i$ vary more smoothly. Higher bias, lower variance.
  ![Alt text](./1494987748588.png)
- Small $\sigma^2$: features $f_i$ vary less smoothly. Lower bias, higher variance.
  ![Alt text](./1494987812098.png)
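When using scikit-learn's `SVC`, the lecture's two knobs map onto its `C` and `gamma` parameters; since sklearn's RBF kernel is $\exp(-\gamma\|x-l\|^2)$, we have $\gamma = 1/(2\sigma^2)$. A small helper makes the translation explicit (the function name is ours, not part of any library):

```python
def svc_params(lam, sigma):
    """Translate the lecture's lambda and sigma into the (C, gamma)
    pair expected by SVC(kernel='rbf')."""
    return {"C": 1.0 / lam,                 # C = 1 / lambda
            "gamma": 1.0 / (2.0 * sigma ** 2)}  # gamma = 1 / (2 sigma^2)
```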

Andrew mentions that there are many other kernels, though they see little use, including:

- Polynomial kernel
- String kernel
- Chi-square kernel
- Histogram intersection kernel

## Logistic regression vs. SVMs

n = number of features ($x \in \mathbb{R}^{n+1}$), m = number of training examples

- If n is large (relative to m): use logistic regression, or SVM without a kernel ("linear kernel").
- If n is small and m is intermediate: use SVM with a Gaussian kernel.
- If n is small and m is large: create/add more features, then use logistic regression or SVM without a kernel.
- A neural network is likely to work well in most of these settings, but may be slower to train.

## Recommendation for SVM implementation

[LIBSVM](https://www.csie.ntu.edu.tw/~cjlin/libsvm/)
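scikit-learn's `SVC` is built on LIBSVM, so it is an easy way to try the ideas above without calling LIBSVM directly. The tiny two-cluster dataset below is made up for illustration:

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated clusters, labeled 0 and 1.
X = np.array([[0.0, 0.0], [0.5, 0.5], [5.0, 5.0], [5.5, 4.5]])
y = np.array([0, 0, 1, 1])

# gamma = 1 / (2 * sigma^2) in the lecture's notation.
clf = SVC(kernel="rbf", C=1.0, gamma=0.5)
clf.fit(X, y)

clf.predict([[0.2, 0.2], [5.2, 5.2]])  # one query near each cluster
```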
