Deep Learning (12) - Object Detection



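This chapter covers object detection, one of the hottest topics in computer vision today. The course moves from simple to advanced ideas and finishes with an in-depth look at YOLO, an algorithm that brings together the techniques introduced along the way.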

Model Definition


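The input is an image. A CNN processes it and outputs a predicted label, usually defined as:

$$y = \begin{bmatrix} P_c \\ b_x \\ b_y \\ b_h \\ b_w \\ c_1 \\ c_2 \\ c_3 \end{bmatrix}$$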

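  • $P_c$: whether there is an object to detect (1 = yes, 0 = no)
  • $b_x, b_y, b_h, b_w$: the bounding box (midpoint $b_x, b_y$, height $b_h$, width $b_w$)
  • $c_1, c_2, c_3$: the three classes, e.g. car, pedestrian, motorcycle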

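Loss function (squared error):

$$\mathcal{L}(\hat y, y) = \begin{cases} \sum_{i=1}^{8} (\hat y_i - y_i)^2 & \text{if } y_1 = 1 \\ (\hat y_1 - y_1)^2 & \text{if } y_1 = 0 \end{cases}$$

Here $y_1 = P_c$. When there is no object, the other entries of $y$ are meaningless, so only the $P_c$ term is penalized.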
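A minimal sketch of this loss for a single example, with the label laid out as $[P_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]$. The numbers are made up for illustration. When $P_c = 0$ the remaining label entries are "don't care" values, so the function ignores them.

```python
def detection_loss(y_hat, y):
    """Squared-error loss for one example; y = [Pc, bx, by, bh, bw, c1, c2, c3]."""
    if len(y_hat) != len(y):
        raise ValueError("prediction and label must have the same length")
    if y[0] == 1:
        # Object present: every component of the label contributes.
        return sum((p - t) ** 2 for p, t in zip(y_hat, y))
    # No object: only the Pc term is penalized.
    return (y_hat[0] - y[0]) ** 2


y_car = [1, 0.5, 0.7, 0.3, 0.4, 1, 0, 0]          # a car is present
pred = [0.9, 0.5, 0.6, 0.3, 0.4, 0.8, 0.1, 0.1]
print(detection_loss(pred, y_car))                # about 0.08

y_bg = [0, 0, 0, 0, 0, 0, 0, 0]                   # background only
print(detection_loss([0.2, 9, 9, 9, 9, 9, 9, 9], y_bg))  # about 0.04, the 9s are ignored
```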

Detection Methods

Sliding Window Detection


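As shown in the figure, the method starts with the smallest window and slides it across the whole image, from corner to corner, running the CNN on each crop to check whether it contains an object. The scan is then repeated with a larger window, then with an even larger one, and so on. This is the most obvious approach, but its drawback is the sheer amount of computation: every crop needs its own CNN forward pass, which makes it extremely inefficient.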
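The cost of the naive approach is easy to estimate by counting crops. This sketch only enumerates window positions; the CNN classifier that each crop would be fed to is not shown, and the 100x100 image, the window sizes, and the stride of 4 are hypothetical values.

```python
def sliding_windows(height, width, window, stride):
    """Yield the (top, left) corner of every window x window crop that fits in the image."""
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            yield top, left


def count_crops(height, width, window_sizes, stride):
    """Number of CNN forward passes the naive method needs: one per crop, per window size."""
    return {w: sum(1 for _ in sliding_windows(height, width, w, stride))
            for w in window_sizes}


counts = count_crops(100, 100, window_sizes=[20, 40, 60], stride=4)
print(counts)                                   # {20: 441, 40: 256, 60: 121}
print("total forward passes:", sum(counts.values()))
```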

Convolutional Implementation of Sliding Windows

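At its core, this method still samples the image with windows and outputs, for each window, whether it contains an object. But where the sliding window method works like a serial algorithm, the convolutional implementation works like a parallel one: overlapping windows share many computation steps, which makes it much more efficient.
Below, the sliding window network is modified step by step until a single convolutional pass produces the results for all windows at once.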

1. Replacing the FC layers with convolutions (1x1 convolutions, as in Network in Network)


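Suppose the 14x14 window is applied to a 14x14 image, so the window covers the whole image. Once the FC layers are rewritten as convolutions (the first as a convolution whose filters span the entire incoming volume, the later ones as 1x1 convolutions), the network's output is a 1x1x4 volume holding the scores for the 4 classes.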
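Why is this replacement legitimate? A minimal pure-Python check, assuming a 5x5x16 volume feeds the first FC layer (sizes borrowed from the course example, not stated above) and using only 4 output units instead of 400 to keep it quick: a fully connected layer and a convolution whose kernel is as large as its input compute exactly the same dot products. A 1x1 convolution applied to a 1x1 volume is the special case H = W = 1.

```python
import random


def flatten(volume):
    """Flatten an h x w x c nested list in (row, column, channel) order."""
    return [v for row in volume for pixel in row for v in pixel]


def fc_layer(x_flat, weights):
    """Fully connected layer (bias omitted): one dot product per output unit."""
    return [sum(w * v for w, v in zip(row, x_flat)) for row in weights]


def conv_whole_input(volume, kernels):
    """'Valid' convolution whose kernels are exactly as large as the input volume.

    Only one kernel placement fits, so the output is 1 x 1 x len(kernels), returned flat.
    """
    outputs = []
    for kernel in kernels:
        total = 0.0
        for i, row in enumerate(volume):
            for j, pixel in enumerate(row):
                for c, value in enumerate(pixel):
                    total += kernel[i][j][c] * value
        outputs.append(total)
    return outputs


def random_volume(h, w, c, rng):
    return [[[rng.uniform(-1, 1) for _ in range(c)] for _ in range(w)] for _ in range(h)]


rng = random.Random(0)
H, W, C, UNITS = 5, 5, 16, 4                       # hypothetical sizes
x = random_volume(H, W, C, rng)
kernels = [random_volume(H, W, C, rng) for _ in range(UNITS)]

fc_out = fc_layer(flatten(x), [flatten(k) for k in kernels])   # FC view
conv_out = conv_whole_input(x, kernels)                         # convolution view
print(all(abs(a - b) < 1e-9 for a, b in zip(fc_out, conv_out)))  # True
```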

2. Enlarging the input image


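Now enlarge the image by 2 pixels in both height and width, making it 16x16. If we keep the 14x14 window and slide it with a stride of 2, 4 window positions (2x2) cover the 16x16 image. Running the same convolutional network on the whole image produces a 2x2x4 volume, and each of its cells holds the result for one window. This is exactly the convolutional implementation of sliding windows.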
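To see where the 2x2x4 output comes from, the sketch below traces feature-map sizes through a layer stack. The stack in `LAYERS` is an assumption modeled on the course example (5x5 conv, 2x2 max pool, then the former FC layers as convolutions), since the figure with the actual layers is not reproduced here. Because the pool is the only layer with stride 2, neighboring output cells correspond to windows 2 pixels apart.

```python
# Hypothetical layer stack modeled on the course example (not given in the text above).
# Each entry: (kind, kernel size, stride, number of filters or None for pooling).
LAYERS = [
    ("conv", 5, 1, 16),    # 5x5 conv, 16 filters
    ("pool", 2, 2, None),  # 2x2 max pool, stride 2
    ("conv", 5, 1, 400),   # former FC layer 1
    ("conv", 1, 1, 400),   # former FC layer 2, now a 1x1 conv
    ("conv", 1, 1, 4),     # former output layer: 4 class scores
]


def output_shape(size, channels=3):
    """Trace a square size x size x channels input through LAYERS with no padding."""
    for kind, kernel, stride, filters in LAYERS:
        size = (size - kernel) // stride + 1
        if kind == "conv":
            channels = filters
    return size, size, channels


def window_for_cell(row, col, window=14):
    """Input crop (top, left, bottom, right) that output cell (row, col) corresponds to."""
    stride = 1
    for _, _, layer_stride, _ in LAYERS:
        stride *= layer_stride          # effective stride = product of layer strides (2 here)
    top, left = row * stride, col * stride
    return top, left, top + window, left + window


print(output_shape(14))   # (1, 1, 4)
print(output_shape(16))   # (2, 2, 4)
print([window_for_cell(r, c) for r in range(2) for c in range(2)])
# [(0, 0, 14, 14), (0, 2, 14, 16), (2, 0, 16, 14), (2, 2, 16, 16)]
```

Under the same assumptions, `output_shape(28)` returns `(8, 8, 4)`.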

What happens with an even larger image?


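Likewise, a 28x28 image has 8x8 = 64 window positions, and the final output volume is 8x8x4, holding the detection results for those 64 windows.
In a single forward pass, the convolutional implementation produces detection results for every window in the image, which is far more efficient than running the sliding windows one at a time.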

YOLO Algorithm

YOLO = You Only Look Once
It is widely regarded as a highly efficient object detection algorithm. The paper is also said to be quite difficult to read.

IOU

IOU = Intersection Over Union: the ratio of the area of the intersection of two boxes to the area of their union, as shown below:


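The blue box is the location predicted by the algorithm, and the red box is the ground-truth location. The yellow shaded region is their intersection, and the green shaded region is their union. Then:

$$\text{IOU} = \frac{\text{area of the yellow region (intersection)}}{\text{area of the green region (union)}}$$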

IOU measures the quality of a detection. For example, a detection is usually counted as correct only if its IOU is at least 0.5 (or 0.6).
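A short sketch of IOU for two axis-aligned boxes. It assumes corner coordinates `(x1, y1, x2, y2)`, a choice made here for convenience; boxes stored as midpoint plus height and width would need converting first.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h                      # 0 when the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


pred = (1, 1, 4, 4)     # predicted box (blue)
truth = (2, 2, 5, 5)    # ground-truth box (red)
print(iou(pred, truth))  # 4 / 14, about 0.286: below a 0.5 threshold, so not a valid detection
```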

Non-max Suppression Algorithm

Also called non-maximum suppression (NMS).
Consider detection results like the following:


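In principle, each object should correspond to exactly one box. In practice, however, several grid cells around an object may all produce confident predictions for it. Non-max suppression filters out the weaker predictions so that only one detection remains per object. The procedure is as follows:

  • Discard every box whose $P_c$ is at or below a threshold (e.g. 0.6).
  • While boxes remain, pick the one with the largest $P_c$ and output it as a prediction, then discard every remaining box whose IOU with it is at least 0.5.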


After non-max suppression, the final result is:


Only one detection is kept for each object.
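A minimal sketch of greedy non-max suppression for one class: drop boxes whose score is at or below `score_threshold`, then repeatedly keep the highest-scoring remaining box and discard any box whose IOU with it is at least `iou_threshold`. The defaults 0.6 and 0.5 are common choices rather than fixed values, boxes use hypothetical corner coordinates `(x1, y1, x2, y2)`, and with several classes the procedure would be run once per class.

```python
def iou(a, b):
    """IOU of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, score_threshold=0.6, iou_threshold=0.5):
    """Greedy NMS for a single class; returns the indices of the kept boxes."""
    # Discard low-confidence boxes, then sort the rest by score, best first.
    candidates = sorted((i for i, s in enumerate(scores) if s > score_threshold),
                        key=lambda i: scores[i], reverse=True)
    keep = []
    while candidates:
        best = candidates.pop(0)            # highest-scoring remaining box
        keep.append(best)
        # Drop every remaining box that overlaps the chosen one too much.
        candidates = [i for i in candidates if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52),      # two detections of car A
         (100, 20, 140, 60), (104, 24, 144, 64),  # two detections of car B
         (60, 60, 90, 90)]                        # a weak, spurious detection
scores = [0.9, 0.7, 0.8, 0.65, 0.3]
print(non_max_suppression(boxes, scores))         # [0, 2]
```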

Anchor Box

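Anchor boxes address the situation in the figure below: two objects with different shapes whose midpoints fall into the same grid cell. That single grid cell then contains two objects, which the ordinary label y cannot represent.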


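The usual label y is:

$$y = \begin{bmatrix} P_c \\ b_x \\ b_y \\ b_h \\ b_w \\ c_1 \\ c_2 \\ c_3 \end{bmatrix}$$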

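The solution is to define anchor boxes: for an image like this one, two anchor boxes. If more objects may overlap within one cell, more anchor boxes are needed. y then becomes:

$$y = \begin{bmatrix} P_c \\ b_x \\ b_y \\ b_h \\ b_w \\ c_1 \\ c_2 \\ c_3 \\ P_c \\ b_x \\ b_y \\ b_h \\ b_w \\ c_1 \\ c_2 \\ c_3 \end{bmatrix}$$

The upper half of y describes the object assigned to anchor box 1, and the lower half describes the object assigned to anchor box 2.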

The YOLO Algorithm

First divide each training image into a grid and fill in a label y for every grid cell.

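  • The top-left corner of each grid cell is (0,0) and its bottom-right corner is (1,1), so $b_x, b_y \in (0,1)$, while $b_h, b_w$ are the object's height and width as fractions of the grid cell's size (they can exceed 1 when the object is larger than the cell). For example, the bounding box of the car on the right of the figure above might have a height and width of 0.4 x 0.9, and that of the car on the left 0.5 x 0.6.
  • If the cell contains an object's midpoint, $P_c = 1$; otherwise $P_c = 0$.
  • If several objects may fall into the same cell, set up corresponding anchor boxes.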

What follows is a standard CNN, which outputs the prediction $\hat y$ for every grid cell.


The raw output might look like this:


Non-max suppression then filters these boxes down to the final predictions.
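Before non-max suppression, the raw output volume has to be turned back into boxes. This sketch assumes a grid x grid x 16 output (two anchor boxes, each laid out as $[P_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]$), scores each box as $P_c$ times its largest class value, drops boxes at or below a hypothetical threshold of 0.6, and converts the rest to pixel corners. The surviving boxes would then go through non-max suppression, separately for each class.

```python
def decode_predictions(y_hat, image_w, image_h, num_anchors=2, num_classes=3,
                       threshold=0.6):
    """Turn a grid x grid x (num_anchors * (5 + num_classes)) output into candidate boxes.

    Returns (score, class_index, (x1, y1, x2, y2)) tuples in pixel coordinates.
    """
    grid = len(y_hat)
    cell_w, cell_h = image_w / grid, image_h / grid
    step = 5 + num_classes
    detections = []
    for row in range(grid):
        for col in range(grid):
            for a in range(num_anchors):
                pc, bx, by, bh, bw, *probs = y_hat[row][col][a * step:(a + 1) * step]
                cls = max(range(num_classes), key=lambda k: probs[k])
                score = pc * probs[cls]
                if score <= threshold:
                    continue
                # Cell-relative midpoint and size back to pixels.
                xc, yc = (col + bx) * cell_w, (row + by) * cell_h
                w, h = bw * cell_w, bh * cell_h
                detections.append((score, cls, (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)))
    return detections


y_hat = [[[0.0] * 16 for _ in range(3)] for _ in range(3)]        # 3x3 grid, 2 anchors
y_hat[1][2][8:16] = [0.95, 0.4, 0.5, 0.9, 1.8, 0.9, 0.05, 0.05]   # a car in anchor box 2
print(decode_predictions(y_hat, image_w=300, image_h=300))
```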

References

YOLO: Redmon et al., 2015. You Only Look Once: Unified, Real-Time Object Detection.
