Visual Object Tracking Survey
Visual Object Tracking (VOT)
OC-SORT
Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking[1]
The authors are Jinkun Cao, Jiangmiao Pang, Xinshuo Weng, Rawal Khirodkar, and Kris Kitani, from CMU, Shanghai AI Lab, and NVIDIA. Citation [1]: Cao, Jinkun et al. “Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking.” 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023): 9686-9696.
Time
- 2023.Feb
Key Words
- limitations of SORT: sensitivity to the noise of state estimations, error accumulation over time, and being estimation-centric (a minimal illustration follows this list)
- Observation-Centric SORT; Simple, Online, and Real-Time
- occlusion and non-linear object motion
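The first limitation is easy to see in a toy example. Below is a minimal sketch (plain NumPy, not the OC-SORT code; the constant-velocity model, noise values, and occlusion window are made up for illustration) of the Kalman-filter behavior SORT relies on: when detections are missing during an occlusion, the tracker can only propagate its own estimates, so the position uncertainty accumulates until observations return.

```python
# Minimal sketch, not OC-SORT itself: a 1-D constant-velocity Kalman filter.
# During the simulated occlusion (no detections), only predict() runs, so the
# estimate drifts and its variance grows -- the estimation-centric,
# error-accumulating behavior that OC-SORT rethinks.
import numpy as np

F = np.array([[1.0, 1.0], [0.0, 1.0]])  # constant-velocity transition (dt = 1)
H = np.array([[1.0, 0.0]])              # only the position is observed
Q = 0.01 * np.eye(2)                    # process noise (assumed value)
R = np.array([[0.1]])                   # observation noise (assumed value)

x = np.array([[0.0], [1.0]])            # state: [position, velocity]
P = np.eye(2)                           # state covariance

def predict(x, P):
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return x + K @ (z - H @ x), (np.eye(2) - K @ H) @ P

for t in range(1, 20):
    x, P = predict(x, P)
    if not (5 <= t < 15):               # detection available outside the occlusion
        x, P = update(x, P, np.array([[float(t)]]))
    print(f"t={t:2d}  est_pos={x[0, 0]:6.2f}  pos_var={P[0, 0]:6.3f}")
```

Roughly speaking, OC-SORT uses the observations on both sides of such a gap to correct the accumulated error; the sketch only shows why the gap is a problem.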
OSTrack
MixFormer
MixFormer: End-to-End Tracking with Iterative Mixed Attention[1]
The authors are Yutao Cui, Cheng Jiang, Limin Wang, and Gangshan Wu from Nanjing University. Citation [1]: Cui, Yutao et al. “MixFormer: End-to-End Tracking with Iterative Mixed Attention.” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022): 13598-13608.
Time
- 2022.Mar
Key Words
- compact tracking framework
- unify the feature extraction and target integration solely with a transformer-based architecture
Differences between VOT, MOT, and SOT
- In VOT the target is annotated in the first frame and then tracked; MOT is multi-object tracking; SOT is single-object tracking.
Motivation
Tracking usually relies on a multi-stage pipeline: feature extraction, target information integration, and bounding box estimation. To simplify this pipeline, the authors propose a compact tracking framework named MixFormer.
Here, target information integration means fusing the target (template) and search-region information; a sketch of the idea follows.
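Below is a minimal sketch of the mixed-attention idea, not the official MixFormer module (the class name, layer sizes, pre-norm layout, and fully symmetric attention over both streams are my own simplifications): template and search-region tokens are concatenated and processed by a single attention block, so feature extraction and target-search fusion happen in the same operation, with no separate correlation module.

```python
# Sketch of joint (mixed) attention over template + search tokens.
import torch
import torch.nn as nn

class MixedAttentionBlock(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, template_tokens, search_tokens):
        # concatenate target (template) and search-region tokens and let one
        # attention operation mix them jointly
        x = torch.cat([template_tokens, search_tokens], dim=1)
        y = self.norm1(x)
        x = x + self.attn(y, y, y)[0]
        x = x + self.mlp(self.norm2(x))
        n_t = template_tokens.shape[1]
        return x[:, :n_t], x[:, n_t:]   # split back into the two streams

# toy usage: 64 template tokens and 256 search tokens, embedding dim 256
blk = MixedAttentionBlock()
t, s = torch.randn(1, 64, 256), torch.randn(1, 256, 256)
t_out, s_out = blk(t, s)
print(t_out.shape, s_out.shape)
```

The real MixFormer stacks such mixed attention iteratively inside the backbone; the point of this sketch is only that fusion needs no extra stage.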
Ubuntu with Nvidia Drivers
Installing the driver for an Nvidia RTX 3060 GPU on Ubuntu 20.04
I followed an online tutorial once before, doing everything from the command line, and ended up with a black screen. After finally fixing that and reaching the desktop, Wi-Fi and Bluetooth no longer showed up; a lot of things seemed to be missing, and it was maddening. These days I am running tracking code, most of which assumes an Ubuntu environment, so I took the chance to reinstall the system and look for a new tutorial.
I found two approaches:
- In Ubuntu's built-in Software & Updates, under the Additional Drivers tab, the Nvidia GPU drivers are listed; just tick a suitable one. Very simple, no problems at all, even though the same task cost me ages last time.
- Download the driver from the Nvidia website; the file is usually named Nvidia-Linux-xxx.run. Before running it, nouveau must be disabled: add blacklist nouveau and options nouveau modeset=0 to the relevant modprobe blacklist file, reboot, and check with lsmod that nouveau is no longer loaded. Then run the .run file; while it runs it even suggests installing through Ubuntu's Additional Drivers instead. (A small verification script is sketched after this list.)
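As a small aid for the second approach, here is a sketch of a check script in Python (assuming a standard Ubuntu layout; that /proc/modules exists and that nvidia-smi is on PATH after a successful install are assumptions): it verifies that nouveau is no longer loaded and that the Nvidia driver responds.

```python
# Minimal check sketch: confirm nouveau is disabled and the Nvidia driver works.
import shutil
import subprocess

def nouveau_loaded() -> bool:
    """True if the nouveau kernel module shows up in /proc/modules (like lsmod)."""
    with open("/proc/modules") as f:
        return any(line.split()[0] == "nouveau" for line in f)

def nvidia_driver_ok() -> bool:
    """True if nvidia-smi is installed and exits successfully."""
    if shutil.which("nvidia-smi") is None:
        return False
    return subprocess.run(["nvidia-smi"], capture_output=True).returncode == 0

if __name__ == "__main__":
    print("nouveau loaded:", nouveau_loaded())           # should be False
    print("nvidia driver working:", nvidia_driver_ok())  # should be True
```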
DropMAE
Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks[1]
The authors are Qiangqiang Wu, Tianyu Yang, Ziquan Liu, Baoyuan Wu, Ying Shan, and Antoni B. Chan from CityU, IDEA, Tencent AI Lab, and CUHK (SZ). Citation [1]: Wu, Qiangqiang et al. “DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks.” 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023): 14561-14571.
Time
- 2023.Apr
Key Words
- masked autoencoder
- temporal matching-based
- spatial-attention dropout
Motivation
- The goal is to apply MAE to downstream tasks such as visual object tracking (VOT) and video object segmentation (VOS). A straightforward extension of MAE to videos is to mask out frame patches and reconstruct the frame pixels. However, the authors find that this relies heavily on spatial cues and ignores temporal relations during frame reconstruction, leading to sub-optimal temporal-matching representations for VOT and VOS. (A rough sketch of spatial-attention dropout follows.)
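The sketch below illustrates one plausible reading of spatial-attention dropout, not the authors' implementation (two frames, a fixed drop probability, and uniform random dropping are my assumptions; the paper's dropout is adaptive): within-frame attention logits are randomly suppressed before the softmax, so pixel reconstruction is pushed toward cross-frame (temporal) cues.

```python
# Sketch: drop part of the *within-frame* attention so reconstruction must use
# temporal (cross-frame) cues. Cross-frame entries are never dropped, so every
# query always has something to attend to.
import torch

def spatial_attention_dropout(logits, n_frame1_tokens, drop_prob=0.2):
    """logits: (B, heads, N, N) attention logits over frame1+frame2 tokens."""
    N = logits.shape[-1]
    same_frame = torch.zeros(N, N, dtype=torch.bool)
    same_frame[:n_frame1_tokens, :n_frame1_tokens] = True
    same_frame[n_frame1_tokens:, n_frame1_tokens:] = True
    drop = (torch.rand_like(logits) < drop_prob) & same_frame  # spatial entries only
    return logits.masked_fill(drop, float("-inf"))

# toy usage: 2 frames x 8 tokens each, 4 attention heads
logits = torch.randn(1, 4, 16, 16)
attn = spatial_attention_dropout(logits, n_frame1_tokens=8).softmax(dim=-1)
print(attn.shape)
```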
FCN
Fully Convolutional Networks for Semantic Segmentation[1]
The authors are Jonathan Long, Evan Shelhamer, and Trevor Darrell from UC Berkeley. Citation [1]: Shelhamer, Evan et al. “Fully convolutional networks for semantic segmentation.” 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015): 3431-3440.
Time
- 2014.Nov
Key Words
- fully convolutional network
Motivation
- The goal is to build a fully convolutional network that takes input of arbitrary size and produces a correspondingly-sized output, with efficient inference and learning. (A minimal sketch follows.)
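Below is a minimal sketch of the fully convolutional idea, not the FCN-32s/16s/8s architectures from the paper (the tiny backbone, channel sizes, and bilinear upsampling in place of learned deconvolution are stand-ins): with no fully connected layers, any input size is accepted, and the coarse score map is upsampled back to the input resolution for per-pixel prediction.

```python
# Toy fully convolutional network: conv backbone + 1x1 classifier + upsample.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFCN(nn.Module):
    def __init__(self, num_classes: int = 21):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # 1x1 conv as a "dense" classifier over every spatial location
        self.classifier = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        score = self.classifier(self.backbone(x))
        # upsample the coarse score map back to the input resolution
        return F.interpolate(score, size=(h, w), mode="bilinear", align_corners=False)

# arbitrary input sizes produce correspondingly sized per-pixel outputs
net = TinyFCN()
for size in [(224, 224), (321, 481)]:
    print(net(torch.randn(1, 3, *size)).shape)
```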
AI Resources
AI-related resources
Open Courses
- CS231n, CS25
- Hung-yi Lee's courses at NTU (National Taiwan University)
InternLM large-model open-source community
- Link: https://aicarrier.feishu.cn/wiki/RPyhwV7GxiSyv7k1M5Mc9nrRnbd , a Feishu document, quite comprehensive
CS self-study
- csdiy
- Computer science learning roadmap: https://hackway.org/docs/cs/intro
Personal blogs
- Su Jianlin's blog: https://spaces.ac.cn/
- https://lilianweng.github.io/
Tool websites
- AI Paper Collector
- Papers with Code
- HuggingFace docs
- AI Conference Deadline: https://aideadlin.es/?sub=ML,CV,CG,NLP,RO,SP,DM,AP,KR,HCI
- wandb, for managing deep-learning experiments
LaTeX templates for IEEE papers
- Available here: https://journals.ieeeauthorcenter.ieee.org/create-your-ieee-journal-article/authoring-tools-and-templates/tools-for-ieee-authors/ieee-article-templates/
Reference link:
季恩比特's Weibo
FCOS
FCOS: Fully Convolutional One-Stage Object Detection[1]
The authors are Zhi Tian, Chunhua Shen, Hao Chen, and Tong He from the University of Adelaide, Australia. Citation [1]: Tian, Zhi et al. “FCOS: Fully Convolutional One-Stage Object Detection.” 2019 IEEE/CVF International Conference on Computer Vision (ICCV) (2019): 9626-9635.
Time
- 2019.Apr
Key Words
- one-stage
- FCN
- per-pixel prediction fashion
Motivation
- Drawbacks of anchor-based detectors: sensitivity to several hyperparameters such as anchor aspect ratios; heavy computation; and difficulty handling objects with large shape variations. (A rough sketch of an anchor-free per-pixel head follows.)
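Below is a rough sketch of an anchor-free, per-pixel head in the FCOS style, not the paper's multi-level FPN head (single feature level, plain conv layers, and a simple exp() on the regression output are simplifications): every feature-map location directly predicts class scores, the (l, t, r, b) distances to the box sides, and a centerness score, with no anchor boxes or IoU matching involved.

```python
# Per-pixel prediction head: classification, ltrb box distances, centerness.
import torch
import torch.nn as nn

class PerPixelHead(nn.Module):
    def __init__(self, in_channels: int = 256, num_classes: int = 80):
        super().__init__()
        self.cls_logits = nn.Conv2d(in_channels, num_classes, 3, padding=1)
        self.box_ltrb = nn.Conv2d(in_channels, 4, 3, padding=1)  # distances to box sides
        self.centerness = nn.Conv2d(in_channels, 1, 3, padding=1)

    def forward(self, feat):
        cls = self.cls_logits(feat)
        # exp keeps the predicted distances positive without clipping
        ltrb = self.box_ltrb(feat).exp()
        ctr = self.centerness(feat)
        return cls, ltrb, ctr

# toy usage: one 32x32 feature map with 256 channels
head = PerPixelHead()
cls, ltrb, ctr = head(torch.randn(1, 256, 32, 32))
print(cls.shape, ltrb.shape, ctr.shape)   # one prediction per spatial location
```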