YOWOv3
YOWOv3: An Efficient and Generalized Framework for Human Action Detection and Recognition[1]
The authors are Nguyen Dang Duc Manh, Duong Viet Hang, et al. Paper reference [1]: Dang, Duc M et al. “YOWOv3: An Efficient and Generalized Framework for Human Action Detection and Recognition.” (2024).
Time
- 2024.Aug
Key Words
- one-stage detector
- different configurations to customize different model components
- efficient while reducing computational resource requirements
Summary
- YOWOv3 is an enhanced version of YOWOv2. It offers more approaches, uses different configurations to customize different models, and performs better than YOWOv2.
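- STAD (spatio-temporal action detection) is a common task in computer vision. It involves detecting the location (bbox), timing (exact frame), and type (class of action), so both temporal and spatial features must be modeled. There are many approaches to STAD, for example ViT. ViT performs well but is computationally heavy: the Hiera model has over 600M parameters and VideoMAEv2 has over 1B parameters, which raises training cost and resource consumption. To solve STAD while minimizing the cost of training and inference time, the YOWO method was proposed. Although it can run in real time, it has limitations: it is not an efficient model with low computational requirements. The framework's author has also stopped maintaining it, even though many issues remain. The contributions of this paper are as follows: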
- new lightweight framework for STAD
- efficient model
- multiple pretrained resources for application: creating a range of pretrained resources spanning from lightweight to sophisticated models to cater to diverse requirements for real-world applications.
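
To make the STAD task from the summary concrete, here is a minimal sketch of what a single prediction might carry: a location (bbox), a timing (frame index), and a type (action class). The class name `ActionDetection`, its fields, the helper `detections_in_frame`, and the sample values are all hypothetical and chosen for illustration; they are not taken from the YOWOv3 paper or its code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ActionDetection:
    """One STAD prediction. All field names are hypothetical."""
    frame_index: int                         # timing: the frame this box belongs to
    bbox: Tuple[float, float, float, float]  # location: (x1, y1, x2, y2) in pixels
    action_class: str                        # type: the action label
    score: float                             # confidence in [0, 1]


def detections_in_frame(dets: List[ActionDetection], frame_index: int,
                        min_score: float = 0.5) -> List[ActionDetection]:
    """Return the detections for one frame whose score is at least min_score."""
    return [d for d in dets
            if d.frame_index == frame_index and d.score >= min_score]


# Made-up sample data, only to exercise the helper.
example = [
    ActionDetection(10, (12.0, 30.0, 80.0, 200.0), "stand", 0.91),
    ActionDetection(10, (150.0, 40.0, 220.0, 210.0), "talk", 0.42),
    ActionDetection(11, (14.0, 31.0, 82.0, 201.0), "stand", 0.88),
]
kept = detections_in_frame(example, 10)
```

With the default threshold, `kept` holds only the "stand" detection for frame 10, because the "talk" box falls below `min_score`.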