Spatio-Temporal Action Detection Under Large Motion[1]
The authors are Gurkirt Singh, Vasileios Choutas, Suman Saha, Fisher Yu,
and Luc Van Gool of ETHZ. Reference [1]: Singh, Gurkirt et al. “Spatio-Temporal
Action Detection Under Large Motion.” 2023 IEEE/CVF Winter Conference on
Applications of Computer Vision (WACV) (2022): 5998-6007.
Time
Key Words
- feature aggregation along actor tracks rather than along tubes built
from proposals
- 3 motion categories: large motion, medium motion, small motion
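
To make the first keyword concrete, here is a minimal toy sketch that contrasts pooling inside a fixed keyframe box (cuboid-style) with pooling inside the actor's own box on each frame (track-style). It is not the paper's implementation: `box_mean` is a hypothetical stand-in for RoI pooling, the feature maps are single-channel lists, and all names and numbers are assumptions chosen for illustration.

```python
from typing import List, Tuple

# (x1, y1, x2, y2) in integer pixels; x2 and y2 are exclusive. Hypothetical format.
Box = Tuple[int, int, int, int]
Frame = List[List[float]]  # single-channel H x W feature map


def box_mean(frame: Frame, box: Box) -> float:
    """Average the feature values inside one box (stand-in for RoI pooling)."""
    x1, y1, x2, y2 = box
    values = [frame[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    return sum(values) / len(values)


def cuboid_pool(frames: List[Frame], key_box: Box) -> float:
    """Cuboid-style: reuse the keyframe box on every frame, then average."""
    return sum(box_mean(f, key_box) for f in frames) / len(frames)


def track_pool(frames: List[Frame], track: List[Box]) -> float:
    """Track-style: use the actor's own box on each frame, then average."""
    return sum(box_mean(f, b) for f, b in zip(frames, track)) / len(frames)


def make_frame(actor_box: Box, size: int = 8) -> Frame:
    """Toy feature map: 1.0 where the actor is, 0.0 elsewhere."""
    x1, y1, x2, y2 = actor_box
    return [[1.0 if (x1 <= x < x2 and y1 <= y < y2) else 0.0
             for x in range(size)] for y in range(size)]


# A 2x2 actor that moves 2 pixels to the right per frame (large motion).
track = [(0, 2, 2, 4), (2, 2, 4, 4), (4, 2, 6, 4)]
frames = [make_frame(b) for b in track]
key_box = track[1]  # treat the middle frame as the keyframe

cuboid_feature = cuboid_pool(frames, key_box)
track_feature = track_pool(frames, track)
print(cuboid_feature, track_feature)
```

In this toy setup the fixed box covers the actor only on the keyframe, so the cuboid-style average is diluted by background, while the track-style average stays on the actor throughout.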
Summary
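- Current tube detection methods for spatio-temporal action detection (STAD)
often extend a bbox proposal on a given keyframe into a 3D temporal cuboid
and then pool features from neighboring frames. If the actor's position or
shape shows large 2D motion and variability across frames, this pooling
cannot accumulate meaningful spatio-temporal features. In this work, the
authors study cuboid-aware feature aggregation in action detection under
large motion. They further propose to enhance the actor feature
representation under large motion by tracking actors and performing temporal
feature aggregation along the respective tracks. Actor motion is measured by
the IoU between the actor's boxes at different fixed time scales: actions
with large motion lead to lower IoU over time, while slower actions maintain
higher IoU. The authors find that track-aware feature aggregation
consistently achieves large improvements in action detection.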
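
The IoU-based motion measure described above can be sketched as follows. This is only an illustration under stated assumptions: the `(x1, y1, x2, y2)` box format, the `motion_iou` and `motion_category` helpers, and the thresholds `0.25` and `0.75` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def motion_iou(track: List[Box], offset: int) -> float:
    """Mean IoU between boxes of one track that are `offset` frames apart."""
    pairs = list(zip(track, track[offset:]))
    if offset < 1 or not pairs:
        raise ValueError("offset must be >= 1 and shorter than the track")
    return sum(iou(a, b) for a, b in pairs) / len(pairs)


def motion_category(score: float,
                    large_below: float = 0.25,   # hypothetical threshold
                    small_above: float = 0.75) -> str:  # hypothetical threshold
    """Map a motion IoU to a category: lower IoU means larger motion."""
    if score < large_below:
        return "large"
    if score >= small_above:
        return "small"
    return "medium"


# Three toy tracks of a 10x10 box over 5 frames.
static_track = [(0.0, 0.0, 10.0, 10.0) for _ in range(5)]
slow_track = [(float(i), 0.0, float(i) + 10.0, 10.0) for i in range(5)]
fast_track = [(5.0 * i, 0.0, 5.0 * i + 10.0, 10.0) for i in range(5)]

for name, trk in [("static", static_track), ("slow", slow_track), ("fast", fast_track)]:
    score = motion_iou(trk, offset=2)
    print(name, round(score, 3), motion_category(score))
```

A longer `offset` corresponds to a coarser time scale; evaluating the same track at several offsets shows how quickly the actor drifts away from its earlier position.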