AVA Dataset
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions[1]
The authors are Chunhui Gu, Chen Sun, David A. Ross, and others from Google Research, Inria Laboratoire Jean Kuntzmann (Grenoble, France), and UC Berkeley. Citation [1]: Gu, Chunhui et al. "AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions." 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018): 6047-6056.
Time
- May 2017
Key Words
- atomic visual actions rather than composite actions
- precise spatio-temporal annotations with possibly multiple annotations for each person
- exhaustive annotation of these atomic actions over 15-minute video clips
- people temporally linked across consecutive segments
Summary
The dataset is sourced from the 15th-30th minute of 430 different movies; at a 1 Hz sampling frequency this yields nearly 900 keyframes per movie. In each keyframe, every person is labeled with one or more actions from the AVA vocabulary. Each person is linked across consecutive keyframes, providing short temporal sequences of action labels.
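The sampling arithmetic and the per-person annotation structure described above can be sketched as follows. This is a minimal illustration only: the interval endpoints follow the stated 15th-30th-minute range at 1 Hz, while the `PersonAnnotation` layout, field names, and action strings are assumptions for illustration, not the dataset's official file format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Keyframes are sampled at 1 Hz over the 15th-30th minute of each movie
# (assuming an inclusive-start, exclusive-end interval).
START_SEC = 15 * 60   # start of the annotated interval (900 s)
END_SEC = 30 * 60     # end of the annotated interval (1800 s)

keyframe_timestamps = list(range(START_SEC, END_SEC))  # one keyframe per second
print(len(keyframe_timestamps))  # 900 keyframes per movie

# Each keyframe carries per-person annotations; a person may have multiple
# action labels, and a person id links the same person across consecutive
# keyframes. This record layout is illustrative, not the official format.
@dataclass
class PersonAnnotation:
    person_id: int                  # links this person across consecutive keyframes
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) bounding box
    actions: List[str] = field(default_factory=list)  # possibly multiple labels

example = PersonAnnotation(person_id=7, box=(0.1, 0.2, 0.4, 0.9),
                           actions=["stand", "talk"])
print(example.actions)  # ['stand', 'talk']
```

Linking `person_id` across keyframes is what turns per-frame labels into the short temporal action sequences mentioned above.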