SlowFast

SlowFast Networks for Video Recognition[1]

The authors are Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, and Kaiming He of FAIR. Citation [1]: Feichtenhofer, Christoph et al. “SlowFast Networks for Video Recognition.” 2019 IEEE/CVF International Conference on Computer Vision (ICCV) (2018): 6201-6210.

Time

  • Dec. 2018

Key Words

  • Slow pathway to capture spatial semantics
  • lightweight Fast pathway to capture temporal motion and fine temporal resolution

Motivation

  1. Not all spatiotemporal orientations are equally likely, so there is no reason to treat space and time symmetrically.
  2. Inspired by biological studies on the retinal ganglion cells in the primate visual system: about 80% of these cells are Parvocellular (P-cells) and about 20% are Magnocellular (M-cells).
    • M-cells operate at high temporal frequency \(\rightarrow\) responsive to fast temporal changes
    • P-cells capture spatial information: spatial detail and color, at lower temporal resolution

Summary

  1. Optical flow is a hand-designed representation, and two-stream methods are typically not learned end-to-end jointly with the flow.
  2. \(\alpha\) is the frame rate ratio between the Fast and Slow pathways; \(\alpha\) > 1 is the key idea of SlowFast. The Fast pathway has \(\beta\) < 1 times the channels of the Slow pathway, so its computational cost is lower.
  3. Lateral connections fuse the information of the two pathways. Because the two pathways have different temporal dimensions, a transformation is needed to match them before fusion. The Fast pathway has no temporal downsampling layers and uses non-degenerate temporal convolutions.
  4. AVA Detection:
    • RoI features are extracted at the last feature map of res5. The 2D RoI at a frame is extended to a 3D RoI by replicating it along the temporal axis. RoI features are then computed with RoIAlign and global average pooled temporally. The RoI features are then max-pooled and fed to a per-class, sigmoid-based classifier for multi-label prediction.
    • The authors use an off-the-shelf person detector trained with Detectron: Faster R-CNN with a ResNeXt-101-FPN backbone. It is pre-trained on ImageNet and the COCO human keypoint images, then fine-tuned on AVA for person detection. The region proposals for action detection are the detected person boxes with a confidence of > 0.8.
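
The box handling in the AVA detection step can be sketched in plain Python. This is only an illustration of the bookkeeping described above, not the authors' code: the function names are hypothetical, boxes are assumed to be (x1, y1, x2, y2) tuples, and the actual RoIAlign on the res5 feature map is not shown.

```python
def select_person_proposals(boxes, scores, threshold=0.8):
    """Keep detected person boxes whose confidence is strictly above the threshold."""
    return [box for box, score in zip(boxes, scores) if score > threshold]


def extend_roi_to_3d(box, num_frames):
    """Replicate a 2D RoI along the temporal axis.

    Returns one (t, x1, y1, x2, y2) entry per frame of the feature map.
    """
    return [(t, *box) for t in range(num_frames)]


def temporal_average_pool(per_frame_features):
    """Average per-frame RoI feature vectors over time (global temporal pooling)."""
    num_frames = len(per_frame_features)
    return [sum(column) / num_frames for column in zip(*per_frame_features)]


# Example with made-up detections: only the first box passes the 0.8 threshold.
boxes = [(0, 0, 10, 10), (5, 5, 20, 20)]
scores = [0.95, 0.5]
proposals = select_person_proposals(boxes, scores)
roi_3d = extend_roi_to_3d(proposals[0], num_frames=3)
```

Here `roi_3d` holds the same box once per time step, which is all that "replicating along the temporal axis" means for a single keyframe box.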

Framework

Fig. 1 [1]: A SlowFast network has a low frame rate, low temporal resolution Slow pathway and a high frame rate, α× higher temporal resolution Fast pathway. The Fast pathway is lightweight by using a fraction (β, e.g., 1/8) of channels. Lateral connections fuse them.
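
To make the roles of α and β concrete, here is a small shape-only sketch. It assumes example values of a 64-frame raw clip, a Slow-pathway stride τ = 16, α = 8 and β = 1/8; the function names are hypothetical. The fusion shown is the time-to-channel reshaping, which is one of the lateral-connection options described in the paper, and it is chosen here only because it can be expressed as pure shape arithmetic.

```python
def pathway_frame_indices(clip_len=64, tau=16, alpha=8):
    """Return the raw-clip frame indices sampled by the Slow and Fast pathways.

    The Slow pathway samples with stride tau; the Fast pathway samples
    alpha times more densely, i.e. with stride tau / alpha.
    """
    if tau % alpha != 0:
        raise ValueError("tau must be divisible by alpha in this sketch")
    slow = list(range(0, clip_len, tau))
    fast = list(range(0, clip_len, tau // alpha))
    return slow, fast


def time_to_channel(fast_shape, alpha):
    """Pack alpha consecutive Fast frames into the channel axis.

    Maps a Fast feature shape (alpha*T, H, W, beta*C) to (T, H, W, alpha*beta*C),
    so that its temporal length matches the Slow pathway before fusion.
    """
    t, h, w, c = fast_shape
    if t % alpha != 0:
        raise ValueError("Fast temporal length must be divisible by alpha")
    return (t // alpha, h, w, alpha * c)


slow_idx, fast_idx = pathway_frame_indices()
# With C = 64 Slow channels and beta = 1/8, the Fast pathway has 8 channels.
fused_shape = time_to_channel((len(fast_idx), 56, 56, 8), alpha=8)
```

With these example values the Slow pathway sees 4 frames and the Fast pathway 32, and after the reshape the Fast features have the same temporal length as the Slow features.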

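Action recognition with SlowFast

I ran into a lot of pitfalls, but I can finally get results, or at least some feedback: actions are being recognized, although many problems remain.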

Bugs

  1. libstdc++.so.6: version `GLIBCXX_3.4.20' not found

Following two Stack Overflow answers (answer1, answer2): GLIBCXX_3.4.20 was already present in my environment, so in the end I fixed this with export LD_LIBRARY_PATH=/path/to/lib:$LD_LIBRARY_PATH
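
Before changing LD_LIBRARY_PATH, it helps to confirm that the libstdc++.so.6 in the directory you point to really contains the required version. The following is a minimal sketch of that check, roughly what piping `strings` into `grep GLIBCXX` would show; the function names are hypothetical, and /path/to/lib remains a placeholder for wherever your newer library lives.

```python
import re


def glibcxx_versions(lib_path):
    """Return the sorted GLIBCXX version tuples embedded in a library file."""
    with open(lib_path, "rb") as f:
        data = f.read()
    # Version tags are stored as plain strings such as b"GLIBCXX_3.4.20".
    found = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data))
    return sorted(tuple(int(part) for part in v.decode().split(".")) for v in found)


def has_glibcxx(lib_path, version="3.4.20"):
    """Check whether the library file advertises the given GLIBCXX version."""
    wanted = tuple(int(part) for part in version.split("."))
    return wanted in glibcxx_versions(lib_path)


# Usage (placeholder path): has_glibcxx("/path/to/lib/libstdc++.so.6", "3.4.20")
```

If this returns False for the library in that directory, exporting LD_LIBRARY_PATH to it will not make the error go away.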

  2. After training SlowFast, inference produced no results at all

In the DEMO section of the .yaml config file, you need to set the already-trained weights and the yaml for the objects you want to detect.

Building an AVA dataset

  1. Dataset format and requirements

To be updated once I have finished this part.

  2. The official Google AVA site does not provide the videos for download; a link where the videos can be downloaded:

Reference links on building an AVA-format dataset: