Swin Transformer: Hierarchical Vision Transformer using Shifted Windows[1]

The authors are Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo from MSRA. Citation [1]: Liu, Ze et al. “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows.” 2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2021): 9992-10002.

Time

  • 2021.Mar

Key Words

  • Shifted windows
  • non-overlapping local windows
  • hierarchical feature maps
  • linear computational complexity with respect to image size
  • much lower latency
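
A minimal sketch of the first two key words (toy code of my own, not the paper's implementation): self-attention is computed inside non-overlapping local windows, and consecutive blocks cyclically shift the feature map so information flows across window borders; since every window has a fixed size, total attention cost grows linearly with image size rather than quadratically.

```python
import torch

def window_partition(x: torch.Tensor, ws: int) -> torch.Tensor:
    """Split a (B, H, W, C) feature map into (B * num_windows, ws * ws, C) token groups."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

x = torch.randn(1, 8, 8, 96)                            # toy (B, H, W, C) feature map
windows = window_partition(x, ws=4)                     # attention runs within each 4x4 window
shifted = torch.roll(x, shifts=(-2, -2), dims=(1, 2))   # cyclic shift by ws // 2
shifted_windows = window_partition(shifted, ws=4)       # windows now straddle old borders
```

In the paper, the pixels that wrap around under the cyclic shift are handled with an attention mask, which is omitted here for brevity.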

Multiscale Vision Transformer

The authors of MViT are Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, and Christoph Feichtenhofer from FAIR and UC Berkeley. MViTv2 is by the same authors, joined by Chao-Yuan Wu. Citations [1]: Fan, Haoqi et al. “Multiscale Vision Transformers.” 2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2021): 6804-6815. [2]: Li, Yanghao et al. “MViTv2: Improved Multiscale Vision Transformers for Classification and Detection.” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022): 4794-4804.

Time

  • MViT:2021.Apr
  • MViTv2: 2021.Dec

MViT

Motivation

  1. Posits that the fundamental vision principle of resolution and channel scaling can be beneficial for transformer models across a variety of visual recognition tasks.

Key Words

  • connect the seminal idea of multiscale feature hierarchies with transformer models
  • progressively expand the channel capacity while pooling the resolution from input to output of the network (see the sketch below)
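
A rough sketch of that scaling principle (my own toy code; in MViT the downsampling actually happens through pooling attention inside the transformer blocks, not a plain strided convolution): between stages, spatial resolution shrinks while channel capacity grows.

```python
import torch
import torch.nn as nn

# Toy stage transition: halve the spatial resolution, double the channels.
stage_in = torch.randn(1, 96, 56, 56)                           # (B, C, H, W), early stage
transition = nn.Conv2d(96, 192, kernel_size=3, stride=2, padding=1)
stage_out = transition(stage_in)                                # -> (1, 192, 28, 28), later stage
```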

Grounding DINO: Marrying DINO with Grounded Pre-training for Open-Set Object Detection[1]

Time

Key Words

Motivation

Summary

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection[1]

The authors are Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, and Heung-Yeung Shum from HKUST (Guangzhou), HKUST, Tsinghua University, and IDEA Research. Citation [1]: Zhang, Hao et al. “DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection.” ArXiv abs/2203.03605 (2022): n. pag.

Time

  • 2022.Mar

Key Words

  • DETR
  • DeNoising Anchor
  • mixed query

Video Swin Transformer[1]

The authors are Ze Liu, Jia Ning, Yue Cao, Yixuan Wei, Zheng Zhang, Stephen Lin, and Han Hu from MSRA, USTC, HUST, and THU. Citation [1]: Liu, Ze et al. “Video Swin Transformer.” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022): 3192-3201.

Time

  • 2021.Jun

Motivation


ViViT: A Video Vision Transformer[1]

The authors are Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lucic, and Cordelia Schmid from Google Research. Citation [1]: Arnab, Anurag et al. “ViViT: A Video Vision Transformer.” 2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2021): 6816-6826.

Time

  • 2021.Jun

Key Words

  • spatio-temporal tokens
  • transformer
  • regularising the model and factorising it along spatial and temporal dimensions to increase efficiency and scalability (see the sketch below)
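
A toy sketch of the factorisation idea (my own code with made-up token counts; not the authors' implementation): attend over the spatial tokens of each frame first, then over the temporal tokens at each spatial location, instead of joint attention over all nt * ns spatio-temporal tokens.

```python
import torch
import torch.nn as nn

B, nt, ns, d = 2, 8, 196, 192                    # batch, temporal tokens, spatial tokens, embed dim
tokens = torch.randn(B, nt, ns, d)
spatial_attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
temporal_attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

x = tokens.reshape(B * nt, ns, d)                # each frame attends over space
x, _ = spatial_attn(x, x, x)
x = x.reshape(B, nt, ns, d).permute(0, 2, 1, 3).reshape(B * ns, nt, d)
x, _ = temporal_attn(x, x, x)                    # each location attends over time
x = x.reshape(B, ns, nt, d).permute(0, 2, 1, 3)  # back to (B, nt, ns, d)
```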

A collection of common tips and related knowledge points for working with PyTorch

  1. dim in torch (illustrated in the first sketch after this list):
Dimension
  2. The Softmax and Sigmoid functions (also covered in the first sketch after this list):

https://zhuanlan.zhihu.com/p/525276061

https://www.cnblogs.com/cy0628/p/13921725.html

  3. TorchScript is an intermediate representation of a PyTorch model. PyTorch provides a set of JIT (Just-in-Time) tools that let users convert a model to the TorchScript format; a saved TorchScript model can run in high-performance environments such as C++. TorchScript is a way to create serializable and optimizable models from PyTorch code: any TorchScript program can be saved from a Python process and loaded in an environment without a Python interpreter, and it converts the dynamic graph into a static one. TorchScript is usually used together with torch.jit, which offers two routes (see the second sketch after this list):
    • torch.jit.trace: pass in the model and an example input; it calls the model and records the operations performed during that run. With decision branches such as if-else, however, torch.jit.trace only records the path the current input takes, so control flow is erased. The generated TorchScript model can be used directly for inference without a Python interpreter, but it only captures the forward pass, so it cannot be used for training or backpropagation; and because it traces the model on concrete inputs, it may mishandle edge cases or unusual inputs.
    • torch.jit.script: use this when there is branching. The forward method is compiled by default, and methods called inside forward are compiled in call order. In contrast to tracing, torch.jit.script lets users convert an entire training loop (including forward and backward passes) into a TorchScript model, so it can be used directly for training and validation. It handles a wider range of models and computation graphs, copes better with exceptional cases, and supports custom classes and functions, which makes it more flexible and powerful.
    • To keep a method from being compiled, use @torch.jit.ignore or @torch.jit.unused.
    • Deploying a PyTorch model to a C++ platform mainly involves: converting the model, saving the serialized model, loading the serialized PyTorch model in C++, and executing the script module.
    • Related links:
      • https://mp.weixin.qq.com/s/7JjRGgg1mKlIRuSyPC9tmg
      • https://blog.csdn.net/hxxjxw/article/details/120835884
      • Chinese translation of the PyTorch documentation: https://pytorch.ac.cn/docs/stable/index.html
      • https://developer.baidu.com/article/detail.html?id=2995518
      • https://pytorch.panchuang.net/EigthSection/torchScript/
  4. Some links on commonly used attention modules:
    • https://www.cnblogs.com/wxkang/p/17133460.html, various attention mechanisms
    • https://www.cnblogs.com/Fish0403/p/17221430.html, SE and CBAM
    • https://cloud.tencent.com/developer/article/1776357, a survey on Vision Transformers
  5. A handy model-inspection tool, torchinfo (see the third sketch after this list):
    • pip install torchinfo; it shows a model's inputs and outputs, shapes, parameter counts, and other metrics, which makes the model easier to understand.
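
First sketch, for items 1 and 2 above: dim selects the axis an operation reduces or normalises over, and softmax (normalised along a chosen dim) behaves quite differently from sigmoid (applied independently to every element).

```python
import torch

x = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
print(torch.softmax(x, dim=1))   # each row sums to 1
print(torch.softmax(x, dim=0))   # each column sums to 1
print(torch.sigmoid(x))          # elementwise; nothing sums to 1
print(x.sum(dim=0))              # dim=0 collapses the rows -> tensor([5., 7., 9.])
```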
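
Second sketch, for item 3: a minimal contrast between the two conversion routes on a toy module with data-dependent control flow. Tracing records only the branch the example input happens to take (PyTorch warns about this), while scripting compiles the if/else itself; the saved file can then be loaded without the Python class definition, or from C++ via torch::jit::load.

```python
import torch
import torch.nn as nn

class Gate(nn.Module):
    def forward(self, x):
        if x.sum() > 0:                        # data-dependent branch
            return x * 2
        return x - 1

model = Gate().eval()
example = torch.randn(3)

traced = torch.jit.trace(model, example)       # records only the path `example` took
scripted = torch.jit.script(model)             # compiles both branches of the if/else

scripted.save("gate.pt")                       # serialized TorchScript model
reloaded = torch.jit.load("gate.pt")           # no Python class definition needed
```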
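
Third sketch, for item 5: basic torchinfo usage on a toy model; summary() prints per-layer output shapes and parameter counts for the given input size.

```python
import torch.nn as nn
from torchinfo import summary        # pip install torchinfo

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),
)
summary(model, input_size=(1, 3, 32, 32))   # (batch, channels, height, width)
```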

The RCNN[1], Fast RCNN[2], Faster RCNN[3], and Mask RCNN[4] series

  1. The two-stage line of object detection methods, from RCNN to Faster RCNN and Mask RCNN. The authors of RCNN are Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik from UC Berkeley; the author of Fast RCNN is Ross Girshick from Microsoft Research; the authors of Faster RCNN are Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun from Microsoft Research; the authors of Mask RCNN are Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick from Facebook AI Research. Citations [1]: Girshick, Ross B. et al. “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation.” 2014 IEEE Conference on Computer Vision and Pattern Recognition (2013): 580-587. [2]: Girshick, Ross B.. “Fast R-CNN.” (2015). [3]: Ren, Shaoqing et al. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.” (2015). [4]: He, Kaiming et al. “Mask R-CNN.” 2017 IEEE International Conference on Computer Vision (ICCV) (2017).

Related Resources

Here are some tutorials and learning materials for the algorithms in OpenMMLab; they are quite well presented:

openmmlab Book

And here are materials from Baidu's PaddlePaddle:

PaddlePaddle Edu

Time

  • RCNN: 2013.Nov
  • Fast RCNN: 2015.Apr
  • Faster RCNN: 2015.Jun
  • Mask RCNN: 2017.Mar