Deep Learning (13) - Face Recognition

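This chapter covers a special use case of object detection: face recognition. Face recognition takes a photo, either supplied as input or captured on the spot, matches it against the data in a database, and translates it into an identity. A face recognition algorithm can usually be split into two steps: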

  • Face verification
  • Face recognition

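The former takes a photo together with an ID and decides whether the two match (a 1:1 comparison). The latter takes only a photo and matches it against the set of photos already in the database (a 1:K comparison). Once the verification algorithm is accurate enough, it can be applied to recognition.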
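
One way to read "verification can be applied to recognition" is sketched below. It assumes each photo has already been turned into a feature vector, and the names `squared_distance`, `verify`, `recognize`, the sample `database`, and the `threshold` value are all hypothetical, chosen only for illustration.

```python
from typing import Dict, Optional, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def verify(photo: Vector, reference: Vector, threshold: float) -> bool:
    """Face verification: does the photo match the claimed identity (1:1)?"""
    return squared_distance(photo, reference) <= threshold


def recognize(photo: Vector, database: Dict[str, Vector],
              threshold: float) -> Optional[str]:
    """Face recognition: verify against every enrolled identity (1:K).

    Returns the closest identity that passes verification, or None.
    """
    best_id, best_dist = None, float("inf")
    for person_id, reference in database.items():
        dist = squared_distance(photo, reference)
        if verify(photo, reference, threshold) and dist < best_dist:
            best_id, best_dist = person_id, dist
    return best_id


# Toy 2-dimensional vectors standing in for real face features (assumption).
database = {"alice": [0.1, 0.9], "bob": [0.8, 0.2]}
print(verify([0.12, 0.88], database["alice"], threshold=0.05))
print(recognize([0.75, 0.25], database, threshold=0.05))
print(recognize([5.0, 5.0], database, threshold=0.05))
```

Because recognition repeats the verification check once per enrolled person, any verification error rate is multiplied across the database, which is why verification has to be very accurate first.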

One Shot Learning

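The first problem face recognition has to solve is one-shot learning. Take a company's access-control system: each employee usually uploads only one photo, and the system must still recognize that employee correctly. Even when every employee has uploaded a photo, the sample set is still small. How to train effectively on such a small sample set and still get good results is the problem one-shot learning addresses. The Siamese network described below is one solution to it.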

Siamese Network

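A Siamese network is the kind of network shown in the figure below. It contains two or more identical branches that share the same parameters. Each branch maps its input to a final activation vector, and the similarity of two images is measured by comparing these two vectors.

[Figure: Siamese network architecture]

The difference between the two branch outputs is defined as:

$$d(x^{(i)}, x^{(j)}) = \lVert f(x^{(i)}) - f(x^{(j)}) \rVert_2^2$$

If the output layer uses a sigmoid activation, the predicted output is:

$$\hat{y} = \sigma\left( \sum_{k=1}^{128} w_k \left| f(x^{(i)})_k - f(x^{(j)})_k \right| + b \right)$$

Here $f(x)_k$ is the $k$-th component of the 128-dimensional activation vector, and $w_k$ and $b$ are the weights and bias of the output layer.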
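
As a rough illustration of these two formulas, the sketch below computes the squared L2 difference and the sigmoid prediction for two activation vectors. The function names, the 3-dimensional toy vectors (the formula above uses 128 dimensions), and the weight and bias values are assumptions made for the example, not trained values.

```python
import math
from typing import Sequence


def squared_l2_distance(f_i: Sequence[float], f_j: Sequence[float]) -> float:
    """d(x_i, x_j) = ||f(x_i) - f(x_j)||_2^2 for two activation vectors."""
    return sum((a - b) ** 2 for a, b in zip(f_i, f_j))


def predict_same_person(f_i: Sequence[float], f_j: Sequence[float],
                        weights: Sequence[float], bias: float) -> float:
    """y_hat = sigmoid(sum_k w_k * |f(x_i)_k - f(x_j)_k| + b)."""
    z = sum(w * abs(a - b) for w, a, b in zip(weights, f_i, f_j)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy activation vectors; a real network would output 128 values each.
f_same_a = [0.2, 0.5, 0.1]
f_same_b = [0.2, 0.5, 0.1]
f_other = [0.2, 0.4, 0.3]

# Hypothetical output-layer parameters: negative weights make a larger
# per-component difference lower the predicted probability.
weights = [-4.0, -4.0, -4.0]
bias = 2.0

print(squared_l2_distance(f_same_a, f_other))
print(predict_same_person(f_same_a, f_same_b, weights, bias))
print(predict_same_person(f_same_a, f_other, weights, bias))
```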

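The above describes how to use a Siamese network, once all of its weights have been trained, to predict whether two images match. So how is a Siamese network trained? Training needs two things: a loss function J for the model, and a training set suited to it.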

Triplet Loss

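To train a Siamese network, its input samples can be chosen as follows:

[Figure: example triplet of input images]

Each sample consists of three images arranged into two pairs. The training goal is for the difference function of the left pair (A, P) to be smaller than that of the right pair (A, N). To allow for random noise, a margin is added, which gives the following loss function:

$$\mathscr{L}(A,P,N) = \max\left( \lVert f(A) - f(P) \rVert_2^2 - \lVert f(A) - f(N) \rVert_2^2 + \alpha,\ 0 \right)$$

$$J = \sum_{i=1}^{m} \mathscr{L}(A^{(i)}, P^{(i)}, N^{(i)})$$

  • $\alpha$ is the margin
  • A stands for Anchor, P for Positive, N for Negative
  • m is the number of samples in the mini-batch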
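
The triplet loss and the cost J can be sketched in a few lines. The code below is a minimal illustration that works on already-computed activation vectors; the function names, the 2-dimensional toy vectors, and the margin value 0.2 are assumptions for the example.

```python
from typing import Iterable, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared L2 distance between two activation vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(f_a: Vector, f_p: Vector, f_n: Vector, alpha: float) -> float:
    """L(A, P, N) = max(||f(A)-f(P)||^2 - ||f(A)-f(N)||^2 + alpha, 0)."""
    return max(squared_distance(f_a, f_p) - squared_distance(f_a, f_n) + alpha, 0.0)


def batch_cost(triplets: Iterable[Tuple[Vector, Vector, Vector]],
               alpha: float) -> float:
    """J = sum of the triplet loss over all (A, P, N) in the mini-batch."""
    return sum(triplet_loss(a, p, n, alpha) for a, p, n in triplets)


anchor = [0.0, 0.0]
positive = [0.1, 0.0]
easy_negative = [1.0, 0.0]   # far from the anchor: constraint already met
hard_negative = [0.2, 0.0]   # close to the anchor: constraint violated
alpha = 0.2                  # hypothetical margin

print(triplet_loss(anchor, positive, easy_negative, alpha))
print(triplet_loss(anchor, positive, hard_negative, alpha))
print(batch_cost([(anchor, positive, easy_negative),
                  (anchor, positive, hard_negative)], alpha))
```

The easy negative yields a loss of exactly zero, so only the hard negative contributes to J in this toy batch.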

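Given 10k images of 1k people, a large number of APN triplets can be generated. It is best not to generate them purely at random: with random selection, A and N are likely to look very different to begin with, so the constraint $\lVert f(A) - f(P) \rVert_2^2 + \alpha \le \lVert f(A) - f(N) \rVert_2^2$ is satisfied trivially, the loss $\mathscr{L}(A,P,N)$ is already 0, and the triplet contributes nothing to training. When generating samples, and especially when choosing the A-N pair, prefer images that look similar so that training is effective.
The model parameters of the Siamese network are then obtained iteratively by gradient descent on J: forward propagation computes the cost, and backpropagation computes its gradients.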
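
The advice on choosing similar A-N pairs can be sketched as a simple "hardest negative" selection. This is only one possible strategy under stated assumptions: the activation vectors are taken as given (in practice they would come from the current network), and the function name `mine_hard_triplets` and the toy data are hypothetical. The FaceNet paper itself describes a softer variant that uses semi-hard negatives.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared L2 distance between two activation vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mine_hard_triplets(
    embeddings: Dict[str, List[Vector]],
) -> List[Tuple[Vector, Vector, Vector]]:
    """For every (anchor, positive) pair of the same person, pick the
    negative (an image of another person) that is closest to the anchor."""
    triplets = []
    for person_id, vectors in embeddings.items():
        # All images that belong to other people are candidate negatives.
        negatives = [v for other_id, others in embeddings.items()
                     if other_id != person_id for v in others]
        if not negatives:
            continue
        # A person with a single image yields no (anchor, positive) pair.
        for anchor, positive in permutations(vectors, 2):
            hardest = min(negatives, key=lambda n: squared_distance(anchor, n))
            triplets.append((anchor, positive, hardest))
    return triplets


# Toy data: person id -> activation vectors of that person's images.
embeddings = {
    "alice": [[0.0, 0.0], [0.1, 0.0]],
    "bob": [[0.3, 0.0]],
    "carol": [[2.0, 2.0]],
}
for triplet in mine_hard_triplets(embeddings):
    print(triplet)
```

In this toy data both triplets use Bob's image as the negative, since it lies much closer to Alice's images than Carol's does.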

References

  • Siamese Network: Taigman et al., 2014, DeepFace: Closing the Gap to Human-Level Performance in Face Verification.
  • Triplet Loss: Schroff et al., 2015, FaceNet: A Unified Embedding for Face Recognition and Clustering.
  • Siamese Network & Triplet Loss: https://towardsdatascience.com/siamese-network-triplet-loss-b4ca82c1aec8