MixFormer
MixFormer: End-to-End Tracking with Iterative Mixed Attention[1]
The authors are Yutao Cui, Cheng Jiang, Limin Wang, and Gangshan Wu from Nanjing University. Reference [1]: Cui, Yutao et al. “MixFormer: End-to-End Tracking with Iterative Mixed Attention.” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022): 13598-13608.
Time
- 2022.Mar
Key Words
- compact tracking framework
- unify the feature extraction and target integration solely with a transformer-based architecture
Differences between VOT, MOT, and SOT
- In VOT the target is annotated in the first frame; MOT is multi-object tracking; SOT is single-object tracking.
Motivation
Tracking typically uses a multi-stage pipeline: feature extraction, target information integration, and bounding box estimation. To simplify this pipeline, the authors propose a compact tracking framework named MixFormer.
Here, target information integration means fusing the target and search-region information.
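A minimal single-head sketch of the mixed-attention idea: attend jointly over the concatenated template and search tokens, so feature extraction and target integration happen in one operation. The function and weight names (`mixed_attention`, `Wq`/`Wk`/`Wv`) are illustrative assumptions, not the paper's code, which uses an iterative multi-head module.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mixed_attention(template, search, Wq, Wk, Wv):
    """Joint attention over concatenated template and search tokens:
    every token can attend to both the target and the search region,
    mixing feature extraction with target information integration."""
    tokens = np.concatenate([template, search], axis=0)  # (Nt + Ns, d)
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))       # (Nt+Ns, Nt+Ns)
    return attn @ v                                      # mixed features
```

The output keeps one feature vector per token, so a bounding-box head can be attached directly on the search-region part without a separate fusion stage.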
Ubuntu with Nvidia Drivers
Installing the Nvidia driver for an RTX 3060 on Ubuntu 20.04
I tried this once before following an online tutorial, doing everything from the command line, and ended up with a black screen. After finally fixing that and reaching the desktop, Wi-Fi and Bluetooth were gone, as if a lot of components were missing; very frustrating. These days I have been running tracking code, most of which needs Ubuntu, so I took the chance to reinstall the system and look for a new tutorial.
I found two approaches:
- In Ubuntu's built-in Software & Updates, under Additional Drivers, the available Nvidia drivers are listed; just tick a suitable one. Very simple, no bugs at all, unlike the hours I wasted last time.
- Download the driver from the Nvidia website, usually named Nvidia-Linux-xxx.run. Before running it, disable nouveau by adding `blacklist nouveau` and `options nouveau modeset=0` to a modprobe configuration file, then reboot and check with lsmod that nouveau is no longer loaded. Then run the .run file; note that the installer itself suggests using Ubuntu's Additional Drivers instead.
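The nouveau-disabling steps above can be sketched as follows. The file name `blacklist-nouveau.conf` is a common convention rather than a requirement, and `Nvidia-Linux-xxx.run` stands in for whatever the downloaded file is actually called:

```shell
# Write the blacklist options into a modprobe config file:
echo -e "blacklist nouveau\noptions nouveau modeset=0" | \
    sudo tee /etc/modprobe.d/blacklist-nouveau.conf

# Rebuild the initramfs so the blacklist takes effect at boot, then reboot:
sudo update-initramfs -u
sudo reboot

# After rebooting, verify nouveau is not loaded (no output means disabled):
lsmod | grep nouveau

# Then run the downloaded installer:
chmod +x Nvidia-Linux-xxx.run
sudo ./Nvidia-Linux-xxx.run
```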
DropMAE
Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks[1]
The authors are Qiangqiang Wu, Tianyu Yang, Ziquan Liu, Baoyuan Wu, Ying Shan, and Antoni B. Chan from CityU, IDEA, Tencent AI Lab, and CUHK (Shenzhen). Reference [1]: Wu, Qiangqiang et al. “DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks.” 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023): 14561-14571.
Time
- 2023.Apr
Key Words
- masked autoencoder
- temporal matching-based
- spatial-attention dropout
Motivation
- The goal is to apply MAE pre-training to downstream tasks such as visual object tracking (VOT) and video object segmentation (VOS). A straightforward extension of MAE is to mask out frame patches in videos and reconstruct the frame pixels. However, the authors find that this relies heavily on spatial cues and ignores temporal relations during frame reconstruction, leading to sub-optimal temporal matching representations for VOT and VOS.
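The spatial-attention dropout idea can be sketched roughly as follows: randomly suppress a fraction of within-frame (spatial) attention scores before the softmax, so reconstruction has to rely more on cross-frame (temporal) cues. This is an illustrative simplification, not the paper's implementation; the function name and arguments are assumptions.

```python
import numpy as np

def spatial_attention_dropout(scores, key_frame_ids, query_frame_ids, p, rng):
    """Mask a fraction p of same-frame (spatial) attention scores.
    scores: (Nq, Nk) raw attention logits before softmax.
    key_frame_ids: (Nk,) frame index of each key token.
    query_frame_ids: (Nq,) frame index of each query token."""
    # True where query and key come from the same frame (a spatial pair).
    spatial = query_frame_ids[:, None] == key_frame_ids[None, :]
    drop = spatial & (rng.random(scores.shape) < p)
    out = scores.copy()
    out[drop] = -np.inf  # these entries become zero after the softmax
    return out
```

With the spatial entries suppressed, the subsequent softmax redistributes attention toward tokens from the other frame, which is exactly the temporal-matching behavior VOT and VOS benefit from.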
FCN
Fully Convolutional Networks for Semantic Segmentation[1]
The authors are Jonathan Long, Evan Shelhamer, and Trevor Darrell from UC Berkeley. Reference [1]: Shelhamer, Evan et al. “Fully convolutional networks for semantic segmentation.” 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014): 3431-3440.
Time
- 2014.Nov
Key Words
- fully convolutional network
Motivation
- The goal is to build a fully convolutional network that takes input of arbitrary size and produces correspondingly-sized output, with efficient inference and learning.
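To illustrate why a fully convolutional network accepts inputs of arbitrary size, here is a toy same-padded convolution in NumPy: unlike a fully connected layer, it has no fixed input dimension, and its output spatial size tracks the input, enabling dense per-pixel prediction at any resolution. A sketch of the principle, not the paper's architecture.

```python
import numpy as np

def conv2d_same(x, w):
    """Same-padded 2D convolution (cross-correlation). Because the weights
    are shared across locations, the layer works for any H x W input and
    returns an output map of the same spatial size."""
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))   # zero padding keeps the size
    H, W = x.shape
    out = np.zeros((H, W), dtype=float)
    for i in range(H):
        for j in range(W):
            out[i, j] = (xp[i:i + kh, j:j + kw] * w).sum()
    return out
```

Feeding a 5x7 image and a 9x4 image through the same 3x3 kernel both work, each producing an output of its own input size; a fully connected head would have required a fixed input.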
AI Resources
AI-related resources
Open Courses
- CS231n, CS25
- Hung-yi Lee's courses (NTU)
InternLM open-source LLM community
- Link: https://aicarrier.feishu.cn/wiki/RPyhwV7GxiSyv7k1M5Mc9nrRnbd (a Feishu doc, quite comprehensive)
CS Self-Study
- csdiy
- Computer science learning roadmap: https://hackway.org/docs/cs/intro
Personal Blogs
- Jianlin Su's blog: https://spaces.ac.cn/
- https://lilianweng.github.io/
Tool Sites
- AI Paper Collector
- Paper with code
- HuggingFace docs
- AI Conference Deadline: https://aideadlin.es/?sub=ML,CV,CG,NLP,RO,SP,DM,AP,KR,HCI
- wandb for deep-learning experiment management
IEEE paper LaTeX templates
- Available here: https://journals.ieeeauthorcenter.ieee.org/create-your-ieee-journal-article/authoring-tools-and-templates/tools-for-ieee-authors/ieee-article-templates/
Reference:
季恩比特's Weibo
FCOS
FCOS: Fully Convolutional One-Stage Object Detection[1]
The authors are Zhi Tian, Chunhua Shen, Hao Chen, and Tong He from the University of Adelaide, Australia. Reference [1]: Tian, Zhi et al. “FCOS: Fully Convolutional One-Stage Object Detection.” 2019 IEEE/CVF International Conference on Computer Vision (ICCV) (2019): 9626-9635.
Time
- 2019.Apr
Key Words
- one-stage
- FCN
- per-pixel prediction fashion
Motivation
- Drawbacks of anchor-based detectors: sensitivity to hyper-parameters such as aspect ratio; heavy computation; difficulty handling objects with large shape variations.
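The per-pixel prediction fashion can be sketched as follows: each feature-map location is mapped back to image coordinates and regresses the distances (l, t, r, b) to the four sides of a ground-truth box, with no anchors involved. The names and the simplified positive-sample rule below are illustrative assumptions, not the paper's code.

```python
import numpy as np

def fcos_targets(box, stride, fh, fw):
    """Per-pixel regression targets for one ground-truth box.
    box: (x0, y0, x1, y1) in image coordinates; stride: feature stride;
    fh, fw: feature-map size. Returns (fh, fw, 4) targets and a mask of
    positive locations (those falling inside the box)."""
    ys, xs = np.meshgrid(np.arange(fh), np.arange(fw), indexing="ij")
    # Map each feature location back to the image plane.
    cx = stride // 2 + xs * stride
    cy = stride // 2 + ys * stride
    x0, y0, x1, y1 = box
    l, t = cx - x0, cy - y0          # distances to the left/top sides
    r, b = x1 - cx, y1 - cy          # distances to the right/bottom sides
    ltrb = np.stack([l, t, r, b], axis=-1)
    inside = ltrb.min(axis=-1) > 0   # positives lie strictly inside the box
    return ltrb, inside
```

Because any location inside the box is a positive sample, this avoids the anchor hyper-parameters (aspect ratio, scale, IoU thresholds) that the motivation above criticizes.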
CornerNet
CornerNet: Detecting Objects as Paired Keypoints[1]
The authors are Hei Law and Jia Deng from Princeton. Reference [1]: Law, Hei and Jia Deng. “CornerNet: Detecting Objects as Paired Keypoints.” International Journal of Computer Vision 128 (2018): 642-656.
Time
- 2018.Aug
Key Words
Motivation
Summary
CenterNet
Objects as Points[1]
The authors are Xingyi Zhou, Dequan Wang, and Philipp Krahenbuhl from UT Austin and UC Berkeley. Reference [1]: Zhou, Xingyi et al. “Objects as Points.” ArXiv abs/1904.07850 (2019): n. pag.
Time
- 2019.Apr
Key Words
- model object as a single point -- center point of its bounding box
- keypoint estimation
Motivation
- Most object detectors enumerate a large number of potential object locations and classify each one, which is wasteful and inefficient and requires heavy post-processing.
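A rough sketch of the keypoint-estimation decoding this enables: model each object as the peak of a center heatmap, suppress non-maxima with a 3x3 neighborhood maximum (which replaces IoU-based NMS), and read off the top-k peaks. Illustrative code, not the paper's implementation.

```python
import numpy as np

def heatmap_peaks(heat, k=3):
    """Return the top-k local maxima of a center heatmap as
    (row, col, score) tuples. A 3x3 max filter keeps only locations
    that dominate their neighborhood, acting as a cheap NMS."""
    H, W = heat.shape
    padded = np.pad(heat, 1, constant_values=-np.inf)
    # Maximum over each 3x3 neighborhood (the nine shifted views).
    neigh = np.max([padded[i:i + H, j:j + W]
                    for i in range(3) for j in range(3)], axis=0)
    peaks = np.where(heat == neigh, heat, 0.0)  # zero out non-maxima
    idx = np.argsort(peaks.ravel())[::-1][:k]
    ys, xs = np.unravel_index(idx, heat.shape)
    return list(zip(ys.tolist(), xs.tolist(), peaks.ravel()[idx].tolist()))
```

Box size and offset would then be read from separate regression maps at the same peak locations, so no exhaustive candidate enumeration or post-hoc NMS is needed.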
RetinaNet
Focal Loss for Dense Object Detection[1]
The authors are Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollar from FAIR. Reference [1]: Lin, Tsung-Yi et al. “Focal Loss for Dense Object Detection.” IEEE Transactions on Pattern Analysis and Machine Intelligence 42 (2017): 318-327.
Time
- 2017.Aug
Key Words
- Focal loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
- class imbalance between foreground and background during training
- easy negatives
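The focal loss itself is FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), where p_t is the model's probability for the true class; a minimal NumPy version with the paper's defaults alpha = 0.25 and gamma = 2:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Per-example binary focal loss.
    p: predicted foreground probability, y: label in {0, 1}.
    The (1 - pt)^gamma factor shrinks the loss of well-classified
    (easy) examples, so abundant easy negatives stop dominating."""
    pt = np.where(y == 1, p, 1 - p)              # prob. of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # class-balancing weight
    return -(alpha_t * (1 - pt) ** gamma * np.log(pt))
```

With gamma = 0 and alpha_t = 1 this reduces to plain cross entropy; raising gamma pushes the easy-negative losses toward zero, which is the mechanism behind the key words above.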