FCN

Fully Convolutional Networks for Semantic Segmentation[1]

The authors are Jonathan Long, Evan Shelhamer, and Trevor Darrell from UC Berkeley. Citation [1]: Shelhamer, Evan et al. “Fully convolutional networks for semantic segmentation.” 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014): 3431-3440.

Time

  • 2014.Nov

Key Words

  • fully convolutional network

Motivation

  1. The goal is to build a fully convolutional network that takes input of arbitrary size and produces correspondingly-sized output, with efficient inference and learning.

Summary

  1. Earlier work has used convnets for semantic segmentation, in which each pixel is labeled with the class of its enclosing object or region, but that work has shortcomings.

  2. The authors' method needs no pre- and post-processing complications, including superpixels, proposals, or post-hoc refinement by random fields or local classifiers. Semantic segmentation faces an inherent tension between semantics and location: global information resolves what, while local information resolves where.

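  3. In typical recognition nets, the fully connected layers have fixed dimensions and throw away spatial coordinates. However, these fully connected layers can also be viewed as convolutions with kernels that cover their entire input regions.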

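  4. Besides shift-and-stitch, another way to connect coarse outputs to dense pixels is interpolation. Upsampling is done in-network with deconvolution (backwards-strided convolution), whose filters can be learned, so the whole network can be trained end-to-end by backpropagation from the pixelwise loss.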

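  5. I don't fully understand the patchwise training part yet. (The paper argues that whole-image fully convolutional training is equivalent to patchwise training in which each batch consists of all the receptive fields of the units below the loss for an image.)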

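  6. Discard the final classifier layer of VGG and GoogLeNet, convert all fully connected layers to convolutions, and append a \(1 \times 1\) convolution with channel dimension 21 (the 20 PASCAL VOC classes plus background) to predict scores at each of the coarse output locations, followed by a deconvolution layer to bilinearly upsample the coarse outputs to pixel-dense outputs.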
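The scoring head of item 6, together with the deconvolution upsampling of item 4, can be sketched in NumPy as below. This is a minimal sketch under stated assumptions, not the paper's Caffe implementation: the 512-channel 3×3 feature map and the upsampling factor of 4 are hypothetical (FCN-32s upsamples by 32), and the transposed-convolution configuration (kernel size 2f, stride f, cropping f/2 pixels per side) is a common way to get exactly f-times larger output, assumed here. The filter is fixed to a bilinear kernel; the paper notes such a filter need not be fixed to bilinear upsampling but can be learned.

```python
import numpy as np

def conv1x1(features, weight, bias):
    """1x1 convolution: the same linear map applied at every spatial location.

    features: (C_in, H, W); weight: (C_out, C_in); bias: (C_out,) -> (C_out, H, W)
    """
    return np.einsum("oc,chw->ohw", weight, features) + bias[:, None, None]

def bilinear_kernel(size):
    """2-D bilinear interpolation kernel of shape (size, size)."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    taps = 1 - np.abs(np.arange(size) - center) / factor
    return np.outer(taps, taps)

def transposed_conv2d(x, kernel, stride):
    """Per-channel transposed ("de-")convolution with one shared 2-D kernel.

    Each input pixel scatters value * kernel into the output, with input
    pixels placed `stride` apart; (k - stride) // 2 pixels are cropped per
    side so the output is exactly `stride` times larger than the input
    (assumed configuration: k = 2 * stride).
    """
    c, h, w = x.shape
    k = kernel.shape[0]
    full = np.zeros((c, (h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            r, s = i * stride, j * stride
            full[:, r:r + k, s:s + k] += x[:, i, j][:, None, None] * kernel
    pad = (k - stride) // 2
    return full[:, pad:pad + h * stride, pad:pad + w * stride]

rng = np.random.default_rng(0)
features = rng.standard_normal((512, 3, 3))   # hypothetical coarse feature map
W = rng.standard_normal((21, 512)) * 0.01     # 21 = 20 PASCAL VOC classes + background
b = np.zeros(21)
scores = conv1x1(features, W, b)              # (21, 3, 3) coarse score map
up = transposed_conv2d(scores, bilinear_kernel(8), stride=4)  # (21, 12, 12)
labels = up.argmax(axis=0)                    # per-pixel class prediction
```

Upsampling each class channel separately with the same bilinear kernel keeps the classes independent; a constant input stays constant away from the borders, which is a quick sanity check that the kernel really interpolates.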
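Item 5 concerns patchwise training. The FCN paper argues that patchwise training is loss sampling: instead of cropping patches, one can restrict the loss to a randomly sampled subset of its spatial terms (equivalently, a DropConnect mask between the output and the loss) while still computing the whole image fully convolutionally. A minimal NumPy sketch of that idea follows; the function names, the Bernoulli mask, and the `keep_prob` parameter are illustrative assumptions, not the paper's code.

```python
import numpy as np

def pixelwise_cross_entropy(scores, labels):
    """Per-pixel softmax cross-entropy.

    scores: (K, H, W) class scores; labels: (H, W) integer class ids.
    Returns an (H, W) loss map with one term per output location.
    """
    shifted = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    h, w = labels.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return -log_probs[labels, rows, cols]

def sampled_loss(loss_map, keep_prob, rng):
    """Average only a random subset of the spatial loss terms.

    Each term is kept independently with probability keep_prob (a Bernoulli
    mask between output and loss); keep_prob = 1.0 gives the ordinary
    whole-image loss.
    """
    mask = rng.random(loss_map.shape) < keep_prob
    kept = mask.sum()
    return (loss_map * mask).sum() / max(kept, 1)

rng = np.random.default_rng(0)
scores = rng.standard_normal((21, 8, 8))       # hypothetical score map
labels = rng.integers(0, 21, size=(8, 8))      # hypothetical ground truth
loss_map = pixelwise_cross_entropy(scores, labels)
full = sampled_loss(loss_map, 1.0, rng)        # whole-image loss
sub = sampled_loss(loss_map, 0.25, rng)        # about a quarter of the terms kept
```

Because the forward pass still covers the whole image, overlapping "patches" share computation; only the gradient signal is thinned out by the mask.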

FCN \(Fig.1^{[1]}\): Fully convolutional networks can efficiently learn to make dense predictions for per-pixel tasks like semantic segmentation.

Transforming \(Fig.2^{[1]}\): Transforming fully connected layers into convolution layers enables a classification net to output a heatmap. Adding layers and a spatial loss produces an efficient machine for end-to-end learning.
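Item 3 and the Fig. 2 caption describe turning fully connected layers into convolutions. The NumPy sketch below checks that equivalence on tiny hypothetical sizes (4 channels, a 3×3 input window, 5 output units); in VGG16 the same reshaping turns fc6, which sees a 7×7×512 pool5 output, into a 7×7 convolution with 4096 output channels. The helper names are illustrative. Feeding the converted layer a larger input yields a spatial map of outputs instead of a single vector, which is what lets a classification net output a heatmap.

```python
import numpy as np

def fc_layer(x, weight, bias):
    """Fully connected layer on a flattened (C, k, k) input; weight: (N, C*k*k)."""
    return weight @ x.reshape(-1) + bias

def fc_as_conv(x, weight, bias, k):
    """The same layer recast as a k x k convolution (stride 1, no padding).

    The weight matrix is reshaped to (N, C, k, k). On a (C, H, W) input it
    slides over every k x k window and returns an (N, H-k+1, W-k+1) map.
    """
    n = weight.shape[0]
    c, h, w = x.shape
    kernel = weight.reshape(n, c, k, k)
    out = np.empty((n, h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = x[:, i:i + k, j:j + k]
            out[:, i, j] = np.tensordot(kernel, patch, axes=([1, 2, 3], [0, 1, 2])) + bias
    return out

rng = np.random.default_rng(0)
C, k, N = 4, 3, 5
W = rng.standard_normal((N, C * k * k))
b = rng.standard_normal(N)
x_small = rng.standard_normal((C, k, k))          # exactly the size the FC layer expects
x_large = rng.standard_normal((C, k + 4, k + 2))  # a larger input
vec = fc_layer(x_small, W, b)                     # (5,)
same = fc_as_conv(x_small, W, b, k)               # (5, 1, 1), same values as vec
heatmap = fc_as_conv(x_large, W, b, k)            # (5, 5, 3) spatial map of outputs
```

Only the weight layout changes; no parameters are added, so a pretrained classifier's fully connected weights can be reused directly as convolution kernels.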