Installing CUDA under a personal account on a server

  1. In the CUDA archive, find the runfile for the desired CUDA version and install it with sh xxx.run

  2. During installation, go into Options and change the toolkit and library paths; once they are set, proceed with the install

  3. After the install finishes, add the following to .bashrc:

export PATH="/xxx/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/xxx/cuda/lib64:$LD_LIBRARY_PATH"
  4. Then source .bashrc and run nvcc -V; if a version number is printed, the installation succeeded

  5. When something requires a local CUDA, you can set:

export CUDA_HOME="/xxx/cuda/"
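The steps above come down to prepending the user-local prefix to PATH and LD_LIBRARY_PATH. A minimal sketch of that mechanism, using a throwaway temp prefix and a stub nvcc (stand-ins for the real install path and toolkit) so it can run anywhere:

```shell
#!/bin/sh
# Simulate a user-local CUDA prefix with a stub nvcc to show how the
# .bashrc exports take effect. $prefix stands in for /xxx/cuda.
prefix=$(mktemp -d)
mkdir -p "$prefix/bin" "$prefix/lib64"
printf '#!/bin/sh\necho "release 11.8"\n' > "$prefix/bin/nvcc"  # stub, hypothetical version
chmod +x "$prefix/bin/nvcc"

# The same three lines you would put in .bashrc:
export PATH="$prefix/bin:$PATH"
export LD_LIBRARY_PATH="$prefix/lib64:$LD_LIBRARY_PATH"
export CUDA_HOME="$prefix"

nvcc -V    # now resolves to the user-local install
```

Because the prefix is prepended, a user-local toolkit shadows any system-wide CUDA without needing root.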

Installing GCC under a personal account on a server

  1. Download the needed GCC version from the Tsinghua mirror: https://mirrors.tuna.tsinghua.edu.cn/gnu/gcc/

  2. After extracting, enter the gcc-x.x.x directory and run: ./contrib/download_prerequisites

  3. Create a build directory and configure from inside it: mkdir build && cd build && ../configure --prefix=/home/xxx/gcc-x.x.0 --enable-shared --enable-threads=posix --enable-languages=c,c++,fortran --disable-multilib

  4. make -j10 && make install

  5. Open .bashrc and add the following:

export PATH=/path/to/install/gcc-5.4/bin:$PATH
export LD_LIBRARY_PATH=/path/to/install/gcc-5.4/lib/:/path/to/install/gcc-5.4/lib64:$LD_LIBRARY_PATH

  6. source .bashrc, then run gcc --version; if a version number is printed, the installation succeeded
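Collected into one place, the build steps look like the plan below. This is a sketch: the version number and install prefix are placeholders I chose, and the function only prints the commands (pipe its output to sh to actually run them, which takes a long time):

```shell
#!/bin/sh
# Print the out-of-tree GCC build plan for a given version and prefix.
# Version and prefix are hypothetical; substitute your own.
gcc_build_plan() {
  ver=$1; prefix=$2
  cat <<EOF
tar xf gcc-$ver.tar.gz
cd gcc-$ver
./contrib/download_prerequisites
mkdir build && cd build
../configure --prefix=$prefix --enable-shared --enable-threads=posix --enable-languages=c,c++,fortran --disable-multilib
make -j10 && make install
EOF
}

gcc_build_plan 9.5.0 "$HOME/gcc-9.5.0"
```

Configuring from a separate build directory (out-of-tree) is what the GCC docs recommend; it keeps the source tree clean if a build has to be redone.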

Matplotlib Plotting

  1. Things to watch out for when making 3D plots

https://www.codenong.com/48442713/

https://www.coder.work/article/2032713

Recommended diagramming tools

  • Visio
  • PPT
  • Feishu
  • MATLAB

Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking[1]

The authors are Jinkun Cao, Jiangmiao Pang, Xinshuo Weng, Rawal Khirodkar, and Kris Kitani, from CMU, Shanghai AI Lab, and NVIDIA. Citation [1]: Cao, Jinkun et al. “Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking.” 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023): 9686-9696.

Time

  • 2023.Feb

Key Words

  • limitations of SORT: sensitivity to the noise of state estimations, error accumulation over time and being estimation-centric
  • Observation-Centric SORT, Simple, Online, and Real-Time
  • occlusion and non-linear object motion

MixFormer: End-to-End Tracking with Iterative Mixed Attention[1]

The authors are Yutao Cui, Cheng Jiang, Limin Wang, and Gangshan Wu, from Nanjing University. Citation [1]: Cui, Yutao et al. “MixFormer: End-to-End Tracking with Iterative Mixed Attention.” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022): 13598-13608.

Time

  • 2022.Mar

Key Words

  • compact tracking framework
  • unify the feature extraction and target integration solely with a transformer-based architecture

Differences among VOT, MOT, and SOT

  1. In VOT the target is annotated in the first frame; MOT is multi-object tracking; SOT is single-object tracking.

Motivation

  1. Tracking usually uses a multi-stage pipeline: feature extraction, target information integration, and bounding box estimation. To simplify this pipeline, the authors propose a compact tracking framework named MixFormer.

  2. Here "target information integration" means fusing the target and search region information.


Installing the NVIDIA RTX 3060 driver on Ubuntu 20.04

  1. I followed an online tutorial once before, doing everything from the command line, and ended up with a black screen. After finally fixing the black screen and reaching the desktop, Wi-Fi and Bluetooth were gone, along with a lot of other things, which was maddening. These days I have been running tracking code, mostly under Ubuntu, so I took the opportunity to reinstall the system and look for a new tutorial.

  2. I found two approaches:

    • In Ubuntu's built-in Software & Updates, under Additional Drivers, the NVIDIA driver options are listed; just tick a suitable one. Very simple, no bugs at all, unlike the long struggle last time.
    • Download the driver from the NVIDIA website; the file is usually named Nvidia-Linux-xxx.run. Before running it, nouveau has to be disabled: add blacklist nouveau and options nouveau modeset=0 to a modprobe config file (conventionally /etc/modprobe.d/blacklist-nouveau.conf), then reboot and check with lsmod that nouveau is no longer loaded. When you then run the .run file, it even suggests installing via Ubuntu's Additional Drivers instead.
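The nouveau blacklist in the second approach is just a two-line modprobe config. A sketch that writes it to a temp file (installing it system-wide needs root, and the target path shown in the comments is the conventional one, so double-check it for your distro):

```shell
#!/bin/sh
# Generate the two-line nouveau blacklist config locally.
conf=$(mktemp)
cat > "$conf" <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF
cat "$conf"
# To apply it for real (root required), conventionally:
#   sudo cp "$conf" /etc/modprobe.d/blacklist-nouveau.conf
#   sudo update-initramfs -u && sudo reboot
# After rebooting, empty output from this means nouveau is disabled:
#   lsmod | grep nouveau
```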

DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks[1]

The authors are Qiangqiang Wu, Tianyu Yang, Ziquan Liu, Baoyuan Wu, Ying Shan, and Antoni B. Chan, from CityU, IDEA, Tencent AI Lab, and CUHK(SZ). Citation [1]: Wu, Qiangqiang et al. “DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks.” 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023): 14561-14571.

Time

  • 2023.Apr

Key Words

  • masked autoencoder
  • temporal matching-based
  • spatial-attention dropout

Motivation

  1. The goal is to apply MAE to downstream tasks such as visual object tracking (VOT) and video object segmentation (VOS). A straightforward extension of MAE to video is to mask out frame patches and reconstruct the frame pixels. However, the authors find that this relies heavily on spatial cues while ignoring temporal relations during frame reconstruction, which leads to sub-optimal temporal matching representations for VOT and VOS.

Fully Convolutional Networks for Semantic Segmentation[1]

The authors are Jonathan Long, Evan Shelhamer, and Trevor Darrell, from UC Berkeley. Citation [1]: Shelhamer, Evan et al. “Fully convolutional networks for semantic segmentation.” 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015): 3431-3440.

Time

  • 2014.Nov

Key Words

  • fully convolutional network

Motivation

  1. The goal is to build a fully convolutional network that takes input of arbitrary size and produces correspondingly sized output, with efficient inference and learning.