Machine Learning (4) - Unsupervised Learning

Clustering - K-means Algorithm

Steps

  1. Randomly initialize K cluster centroids.
    1) Should have $K < m$.
    2) Randomly pick K training examples.
    3) Set $\mu_1, \mu_2, ..., \mu_K$ equal to these K examples.
  2. Repeat (see the sketch after this list)
    for i = 1 to m
      $c^{(i)}$ := index (from 1 to K) of the cluster centroid closest to $x^{(i)}$
      (assign $x^{(i)}$ to its nearest centroid)
    for k = 1 to K
      $\mu_k$ := average (mean) of the points assigned to cluster k
      (move each centroid to the mean of the samples assigned to it)
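A minimal Octave sketch of these two steps (not code from the course; the function name run_kmeans, the randperm initialization, and the empty-cluster guard are my own choices):

```octave
% K-means sketch: X is an m-by-n data matrix, K the number of clusters.
function [centroids, idx] = run_kmeans(X, K, max_iters)
  m = size(X, 1);
  idx = zeros(m, 1);
  p = randperm(m);
  centroids = X(p(1:K), :);                  % 1. randomly pick K training examples
  for iter = 1:max_iters                     % 2. repeat
    % Cluster assignment step: c^(i) := index of the closest centroid
    for i = 1:m
      [~, idx(i)] = min(sum((centroids - X(i, :)).^2, 2));
    end
    % Move centroid step: mu_k := mean of the points assigned to cluster k
    for k = 1:K
      if any(idx == k)                       % guard against an empty cluster
        centroids(k, :) = mean(X(idx == k, :), 1);
      end
    end
  end
end
```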

Tuning

Purpose

To avoid converging to a local optimum because of a poor random initialization.

Cost Function


$$J(c^{(1)}, c^{(2)}, ..., c^{(m)}, \mu_1, \mu_2, ..., \mu_K) = \dfrac{1}{m}\sum_{i=1}^m ||x^{(i)} - \mu_{c^{(i)}}||^2$$

Tuning means minimizing this cost function. In practice, run K-means several times (Andrew recommends around 100 random initializations) and keep the run with the smallest cost.
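A sketch of that tuning loop, assuming a data matrix X, a chosen K, and the run_kmeans sketch above:

```octave
best_J = Inf;
for t = 1:100                                % ~100 random initializations, as suggested
  [centroids, idx] = run_kmeans(X, K, 10);
  % Distortion cost J = (1/m) * sum_i ||x^(i) - mu_{c^(i)}||^2
  J = mean(sum((X - centroids(idx, :)).^2, 2));
  if J < best_J
    best_J = J;
    best_centroids = centroids;
    best_idx = idx;
  end
end
```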

Choosing the number of clusters

Elbow method: plot the cost J against the number of clusters K and pick the K at the "elbow" of the curve. If the curve decreases smoothly with no clear elbow, the method fails; Andrew's advice is then to rely on domain knowledge, i.e. choose K by whatever makes the downstream or later process happy.

Dimensionality Reduction - Principal Component Analysis (PCA)

Dimensionality reduction means mapping data to a lower-dimensional space, e.g. 3D -> 2D.

Steps

  1. Feature scaling/mean normalization.
    1) $\mu_j = \dfrac{1}{m}\sum_{i=1}^m x_j^{(i)}$
    2) $x_j^{(i)} := x_j^{(i)} - \mu_j$
    3) (optional) $x_j^{(i)} := \dfrac{x_j^{(i)} - \mu_j}{s_j}$, where $s_j$ is the standard deviation of feature j
  2. Compute the covariance matrix
    $$\Sigma = \dfrac{1}{m}\sum_{i=1}^m (x^{(i)})(x^{(i)})^T$$
  3. Compute the eigenvectors of $\Sigma$ with a library routine: [U,S,V] = svd(Sigma);
  4. Ureduce = U(:,1:k); z = Ureduce'*x; (see the sketch after this list)
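A compact Octave sketch of steps 1-4, assuming X is an m-by-n data matrix (one example per row) and k has been chosen; variable names such as X_norm are my own:

```octave
mu = mean(X);                                % 1. mean normalization
sigma = std(X);
X_norm = (X - mu) ./ sigma;                  %    (optional) scale by the standard deviation

m = size(X, 1);
Sigma = (1 / m) * (X_norm' * X_norm);        % 2. covariance matrix (n-by-n)

[U, S, V] = svd(Sigma);                      % 3. eigenvectors via svd

Ureduce = U(:, 1:k);                         % 4. keep the first k principal components
Z = X_norm * Ureduce;                        %    row-wise equivalent of z = Ureduce' * x
```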

Reconstruct

x_approx = Ureduce * z, which recovers an approximation of the original x.

Choose k

Typically, choose k to be the smallest value such that

$$\dfrac{\frac{1}{m}\sum_{i=1}^m ||x^{(i)} - x_{approx}^{(i)}||^2}{\frac{1}{m}\sum_{i=1}^m ||x^{(i)}||^2} \leq 0.01$$

0.01 means 99% of the variance is retained.

Another way to compute the retained variance, using the output of the svd function:

  1. Start from k = 1.
  2. [U,S,V] = svd(Sigma)
  3. Check whether $1 - \dfrac{\sum_{i=1}^k S_{ii}}{\sum_{i=1}^n S_{ii}} \leq 0.01$; if not, increase k.
  4. Pick the smallest k that passes the check (see the sketch after this list).
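A sketch of this check in Octave; instead of a literal k = 1, 2, ... loop, it uses cumsum over the diagonal of S, which is my own shortcut:

```octave
[U, S, V] = svd(Sigma);
s = diag(S);                                 % S_11, ..., S_nn
retained = cumsum(s) / sum(s);               % variance retained for k = 1, ..., n
k = find(retained >= 0.99, 1);               % smallest k retaining at least 99% of the variance
```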

Application of PCA

  • Compression: reduce the storage needed and speed up computation
  • Visualization: k=2 or k = 3

Do not misuse PCA, e.g. by shrinking the feature set with PCA to address overfitting (regularization is the right tool for that).
Before implementing PCA, first try running whatever you want to do with the original/raw data $x^{(i)}$. Only if that doesn't do what you want should you implement PCA and consider using $z^{(i)}$.

Anomaly detection algorithm

Essentially, anomaly detection flags abnormal examples under the assumption that each feature follows a normal (Gaussian) distribution.

  1. Choose features $x_i$ that you think might be indicative of anomalous examples.
  2. Fit parameters $\mu_1, ..., \mu_n, \sigma_1, ..., \sigma_n$:
    $$\mu_j = \dfrac{1}{m}\sum_{i=1}^m x_j^{(i)}$$
    $$\sigma_j^2 = \dfrac{1}{m}\sum_{i=1}^m (x_j^{(i)} - \mu_j)^2$$
  3. Given a new example $x$, compute the probability $p(x)$:
    $$p(x) = \prod_{j=1}^n p(x_j; \mu_j, \sigma_j^2) = \prod_{j=1}^n \dfrac{1}{\sqrt{2\pi}\sigma_j}\exp\left(-\dfrac{(x_j - \mu_j)^2}{2\sigma_j^2}\right)$$
  4. Flag an anomaly if $p(x) \leq \epsilon$.

The Gaussian probability density function:

$$p(x) = \dfrac{1}{\sqrt{2\pi}\sigma}\exp\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right)$$
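A minimal Octave sketch of steps 2-4, assuming X holds the normal training examples row-wise and x_new is a single new example; the epsilon value is purely illustrative:

```octave
mu = mean(X);                                % fit mu_j for every feature
sigma2 = mean((X - mu).^2);                  % fit sigma_j^2 (the 1/m version)

% p(x) = prod_j N(x_j; mu_j, sigma_j^2)
p = prod(1 ./ sqrt(2 * pi * sigma2) .* exp(-(x_new - mu).^2 ./ (2 * sigma2)));

epsilon = 1e-3;                              % in practice, pick epsilon on the CV set
is_anomaly = (p < epsilon);
```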

Evaluating the results

How to split the data for training and validation:

Aircraft engines motivating example
10000 good(normal) engines
20 flawed engines (anomalous)

Training set: 6000 good engines
CV: 2000 good engines(y=0), 10 anomalous(y=1)
Test: 2000 good engines(y=0), 10 anomalous(y=1)
That is, the anomalous samples are reserved for the cross-validation and test sets.

Choosing the features

What if a feature does not follow a normal distribution? Andrew's suggestion is to transform it so that the result is approximately Gaussian. Commonly used transforms are:

$x_{new} = \log(x + c)$
$x_{new} = x^{\frac{1}{n}}$

Andrew did a live demo of this on the Octave command line; plotting a histogram with the hist command is a quick way to check whether a transformed feature looks Gaussian.
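A sketch of that kind of check with made-up, right-skewed data; the constants in the transforms are illustrative:

```octave
x = -5 * log(rand(10000, 1));                % an exponential-like, clearly non-Gaussian feature
hist(x, 50);                                 % skewed histogram
hist(log(x + 1), 50);                        % log transform: much closer to a bell curve
hist(x .^ 0.5, 50);                          % a fractional power transform is another option
% (run the hist calls one at a time, or call figure before each)
```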

The Error Analysis mentioned in the lecture notes refers to this situation:
some anomalies have each individual feature sitting in a high-probability region, yet a particular combination of features is very improbable; in that case you need to create a new feature. In Andrew's example, the CPU load is relatively high but still within the normal range, while the network traffic is very low but also still within the normal range. You then need a new feature such as $x_5 = \frac{CPU\ load}{network\ traffic}$ or $x_6 = \frac{(CPU\ load)^2}{network\ traffic}$, depending on which of the new features is (approximately) normally distributed.

Multivariate Gaussian distribution

The anomaly detection above assumes the feature components are independently distributed; when they are not, one option is the Error Analysis approach just described. The multivariate Gaussian is the complementary approach for correlated features; in fact, the anomaly detection above is a special case of the multivariate Gaussian (the case of a diagonal covariance matrix).

Andrew gives the vectorized probability density of the multivariate Gaussian:

$$p(x; \mu, \Sigma) = \dfrac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)$$

$\mu = \frac{1}{m}\sum_{i=1}^m x^{(i)}$
$\Sigma = \frac{1}{m}\sum_{i=1}^m (x^{(i)}-\mu)(x^{(i)}-\mu)^T$

Here $\Sigma$ is called the Covariance Matrix, and $|\Sigma|$ is the determinant of $\Sigma$ (computed in Octave with det(A)); see the Wikipedia article on determinants: https://zh.wikipedia.org/wiki/行列式

In the formula above, the covariance matrix $\Sigma$ is what carries the information about the correlations between the variables.
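A sketch of this density in Octave, vectorized over the rows of X; the function name is my own:

```octave
% X is m-by-n; mu is the 1-by-n mean, Sigma the n-by-n covariance matrix.
% Returns an m-by-1 vector of densities p(x^(i); mu, Sigma).
function p = multivariate_gaussian(X, mu, Sigma)
  n = length(mu);
  Xc = X - mu(:)';                           % center each example
  p = (2 * pi)^(-n / 2) * det(Sigma)^(-1 / 2) ...
      * exp(-0.5 * sum((Xc / Sigma) .* Xc, 2));   % row-wise (x-mu)' * inv(Sigma) * (x-mu)
end
```

The parameters are fitted as above: mu = mean(X) and Sigma = (X - mu)' * (X - mu) / m.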

Both the original model and the multivariate model can catch cases like the CPU load vs. network traffic example above. How should you choose between them? Andrew gives a comparison: although the multivariate model looks more principled, the original model is more practical (it is cheaper and works even when m is small).

Anomaly detection vs. supervised learning

Recommender Systems

Content-based recommender systems

The problem: given each movie's features (a feature vector) and the ratings users have given, predict how a user would rate the movies they have not seen (Andrew illustrates this with a table of users, movies, and ratings). The method is linear regression, minimizing the cost function below:

$$\min_{\theta^{(1)}, \theta^{(2)}, ..., \theta^{(n_u)}}\frac{1}{2}\sum_{j=1}^{n_u}\sum_{i:r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^n\left(\theta_k^{(j)}\right)^2$$

Where:

  • $n_u$: number of users
  • $n_m$: number of movies (and $n$: number of features per movie)
  • $r(i,j) = 1$: user j has rated movie i
  • $y^{(i,j)}$: the actual rating user j gave movie i
  • $(\theta^{(j)})^T x^{(i)}$: the hypothesis, i.e. the predicted rating

Gradient Descent:
for k = 0:
$$\theta_k^{(j)} := \theta_k^{(j)} - \alpha\sum_{i:r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)x_k^{(i)}$$
for k $\neq$ 0:
$$\theta_k^{(j)} := \theta_k^{(j)} - \alpha\left(\sum_{i:r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)x_k^{(i)} + \lambda\theta_k^{(j)}\right)$$
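A vectorized sketch of one such update for all users at once, under my own naming (X: n_m-by-n movie features, Theta: n_u-by-n user parameters, Y and R: n_m-by-n_u rating and indicator matrices). Unlike the k = 0 case above, this simplified version regularizes every component of theta:

```octave
E = (X * Theta' - Y) .* R;                   % prediction errors, only where ratings exist
Theta_grad = E' * X + lambda * Theta;        % regularized gradient of the cost w.r.t. Theta
Theta = Theta - alpha * Theta_grad;          % one gradient descent step
```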

Collaborative filtering

Suppose that, in the training problem of the previous section, the $\theta^{(j)}$ are known but the $x^{(i)}$ are unknown, i.e. we do not know how each movie scores on each feature; training in the same way can then recover the movies' feature vectors. Collaborative filtering goes one step further: starting with neither $\theta$ nor $x$, only the ratings $y^{(i,j)}$, it can predict both $\theta$ and $x$.

  1. Initialize $x^{(1)}, x^{(2)}, ..., x^{(n_m)}, \theta^{(1)}, \theta^{(2)}, ..., \theta^{(n_u)}$ to small random values.
  2. Minimize $J(x^{(1)}, ..., x^{(n_m)}, \theta^{(1)}, ..., \theta^{(n_u)})$ using gradient descent (or an advanced optimization algorithm), where
    $$J = \frac{1}{2}\sum_{(i,j):r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^n\left(x_k^{(i)}\right)^2 + \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^n\left(\theta_k^{(j)}\right)^2$$
  3. For a user with parameters $\theta$ and a movie with (learned) features $x$, predict a star rating of $\theta^T x$ (see the sketch after this list).
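A vectorized sketch of the collaborative filtering cost and gradients under the same naming assumptions (X, Theta, Y, R, lambda); it is how I would compute them, not necessarily the course's exact code:

```octave
E = (X * Theta' - Y) .* R;                   % errors only on the rated entries
J = 0.5 * sum(sum(E .^ 2)) ...
    + (lambda / 2) * (sum(sum(X .^ 2)) + sum(sum(Theta .^ 2)));
X_grad     = E  * Theta + lambda * X;        % gradient w.r.t. the movie features
Theta_grad = E' * X     + lambda * Theta;    % gradient w.r.t. the user parameters
% X and Theta are then updated with gradient descent (or an advanced optimizer).
```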

Because some users have not rated any movie, without special treatment the predicted rating of such a user for every movie would be 0 (Andrew works through the argument in the video). So mean normalization is needed:

  • $\mu_i$ = the mean of the existing ratings of movie i (averaged only over entries with $r(i,j) = 1$)
  • $Y := Y - \mu$, i.e. subtract each movie's mean from its row of ratings (and add $\mu_i$ back when predicting)
  • Do not divide by the standard deviation, because the ratings are already on a comparable range.
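A sketch of that normalization on the rating matrix (Ymean and Ynorm are my own names):

```octave
Ymean = sum(Y .* R, 2) ./ max(sum(R, 2), 1); % per-movie mean over the rated entries only
Ynorm = (Y - Ymean) .* R;                    % train collaborative filtering on Ynorm
% When predicting, add the mean back: rating(i, j) ~ theta(j)' * x(i) + Ymean(i)
```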

The vectorized form of collaborative filtering is:

$$h_\Theta = X\Theta^T$$

The collaborative filtering algorithm is also called low rank matrix factorization, because the predicted rating matrix $X\Theta^T$ is a low rank matrix.

When making recommendations: if $||x^{(i)} - x^{(j)}||$ is small enough, movies i and j can be considered sufficiently similar.
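For instance, a sketch of ranking the movies closest to movie i by that distance (variable names are my own):

```octave
d = sum((X - X(i, :)).^2, 2);                % squared distance ||x^(j) - x^(i)||^2 for every j
[~, order] = sort(d);                        % ascending
most_similar = order(2:6);                   % order(1) is movie i itself
```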
