YOWO
You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization[1]
The authors are Okan Köpüklü, Xiangyu Wei, and Gerhard Rigoll from the Technical University of Munich. Citation [1]: Köpüklü, Okan et al. “You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization.” ArXiv abs/1911.06644 (2019): n. pag.
Time
- 2019.Nov.15(v1)
- 2021.Oct.18(v5)
Key Words
- single-stage with two branches
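Summary
- Existing networks extract temporal information and keyframe spatial information with two separate networks, then use an additional mechanism to fuse them into detections. YOWO is a single-stage architecture with two branches that extract temporal and spatial information simultaneously and predict bounding boxes and action probabilities directly from video clips in one evaluation. Because the architecture is unified, it can be optimized end to end. YOWO is fast: it runs at 34 frames per second on 16-frame input clips and at 62 frames per second on 8-frame input clips. According to the paper, it was the fastest architecture on the spatiotemporal action detection (STAD) task at the time.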
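To make the two-branch, single-pass idea concrete, here is a minimal shape-bookkeeping sketch in Python. It is not the authors' implementation: the function names, the downsampling stride of 32, the channel counts, the anchor count, and the choice of channel-wise concatenation as the fusion step are all assumptions made for illustration. It only tracks how tensor shapes could flow from one clip to one detection map.

```python
from dataclasses import dataclass
from typing import Tuple

FeatureShape = Tuple[int, int, int]  # (channels, height, width)


@dataclass(frozen=True)
class ClipShape:
    """Shape of one input clip: channels x frames x height x width."""
    channels: int
    frames: int
    height: int
    width: int


def temporal_branch_shape(clip: ClipShape, out_channels: int, stride: int = 32) -> FeatureShape:
    # Hypothetical temporal branch: it sees the whole clip and is assumed
    # to collapse the frame axis to 1, so the output has no time dimension.
    if clip.frames < 1:
        raise ValueError("clip must contain at least one frame")
    return (out_channels, clip.height // stride, clip.width // stride)


def spatial_branch_shape(clip: ClipShape, out_channels: int, stride: int = 32) -> FeatureShape:
    # Hypothetical spatial branch: it sees only the key frame of the clip,
    # so the number of frames does not affect its output shape.
    return (out_channels, clip.height // stride, clip.width // stride)


def fuse_shapes(temporal: FeatureShape, spatial: FeatureShape) -> FeatureShape:
    # Assumed fusion: concatenate along the channel axis, which requires
    # both branches to produce the same spatial grid.
    if temporal[1:] != spatial[1:]:
        raise ValueError("branches must produce the same spatial grid")
    return (temporal[0] + spatial[0], temporal[1], temporal[2])


def head_shape(fused: FeatureShape, num_anchors: int, num_classes: int) -> FeatureShape:
    # Assumed detection head: per anchor, 4 box coordinates, 1 confidence
    # score, and one score per action class, predicted at every grid cell.
    return (num_anchors * (num_classes + 5), fused[1], fused[2])


if __name__ == "__main__":
    # All numbers below are illustrative, not taken from the notes above.
    clip = ClipShape(channels=3, frames=16, height=224, width=224)
    t = temporal_branch_shape(clip, out_channels=2048)
    s = spatial_branch_shape(clip, out_channels=425)
    print(head_shape(fuse_shapes(t, s), num_anchors=5, num_classes=24))
```

Under these assumptions, boxes and action scores come out of a single forward pass over the clip, with no separate proposal or fusion stage afterwards, which is what makes end-to-end optimization possible. Changing `frames` from 16 to 8 leaves the output shape unchanged in this sketch, since only the temporal branch consumes the extra frames.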