CornerNet: Detecting Objects as Paired Keypoints[1]

The authors are Hei Law and Jia Deng from Princeton. Citation [1]: Law, Hei and Jia Deng. “CornerNet: Detecting Objects as Paired Keypoints.” International Journal of Computer Vision 128 (2018): 642-656.

Time

  • 2018.Aug

Key Words

Motivation

Summary

Objects as Points[1]

The authors are Xingyi Zhou, Dequan Wang, and Philipp Krahenbuhl from UT Austin and UC Berkeley. Citation [1]: Zhou, Xingyi et al. “Objects as Points.” ArXiv abs/1904.07850 (2019): n. pag.

Time

  • 2019.Apr

Key Words

  • model an object as a single point: the center point of its bounding box
  • keypoint estimation

Motivation

  1. Most object detectors enumerate a large number of potential object locations and classify each one, which is wasteful, inefficient, and requires heavy post-processing.
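
The Key Words above describe the core idea: instead of enumerating candidate boxes, predict a per-class center heatmap and read objects off its peaks. Below is a minimal sketch of that decoding step, assuming a predicted heatmap tensor; the function name, tensor shape, and threshold are illustrative, not the paper's exact implementation.

```python
# Minimal sketch: decode object centers from a per-class center heatmap by
# keeping local maxima above a score threshold (3x3 max-pooling replaces box NMS).
# The shapes and the threshold below are illustrative assumptions.
import torch
import torch.nn.functional as F

def decode_centers(heatmap, threshold=0.3):
    """heatmap: (num_classes, H, W) tensor of center-point scores in [0, 1]."""
    pooled = F.max_pool2d(heatmap.unsqueeze(0), 3, stride=1, padding=1).squeeze(0)
    peaks = (heatmap == pooled) & (heatmap > threshold)   # keep only local maxima
    cls, ys, xs = peaks.nonzero(as_tuple=True)
    return cls, ys, xs, heatmap[cls, ys, xs]

# Illustrative usage with a random "prediction" for 80 classes on a 128x128 map.
cls, ys, xs, scores = decode_centers(torch.rand(80, 128, 128))
```
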
Read more »

Focal Loss for Dense Object Detection[1]

The authors are Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollar from FAIR. Citation [1]: Lin, Tsung-Yi et al. “Focal Loss for Dense Object Detection.” IEEE Transactions on Pattern Analysis and Machine Intelligence 42 (2017): 318-327.

Time

  • 2017.Aug

Key Words

  • Focal loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
  • class imbalance between foreground and background classes during training.
  • easy negatives
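
A minimal sketch of the focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), which down-weights easy, well-classified examples so the rare hard ones dominate the gradient; the tensor names and the toy inputs are my own.

```python
# Minimal sketch of the binary focal loss; names and example values are illustrative.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """logits, targets: tensors of the same shape; targets in {0, 1}."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)              # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class-balancing weight
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

# Easy, confidently-correct examples contribute very little loss.
logits = torch.tensor([-6.0, 5.0])   # confident background / confident foreground
targets = torch.tensor([0.0, 1.0])
print(focal_loss(logits, targets))
```
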
Read more »

Feature Pyramid Networks for Object Detection[1]

The authors are Tsung-Yi Lin, Piotr Dollar, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie from FAIR and Cornell. Citation [1]: Lin, Tsung-Yi et al. “Feature Pyramid Networks for Object Detection.” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016): 936-944.

Time

  • 2017.Apr

Key Words

  • multi-scale, pyramidal hierarchy
  • top-down architecture with lateral connections
  • high-level semantic feature maps at all scales.
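
A minimal sketch of one top-down merge step with a lateral connection: the coarser pyramid level is upsampled and added to a 1x1-projected backbone feature, then smoothed by a 3x3 conv. The channel counts and module names are illustrative assumptions.

```python
# Minimal sketch of a single FPN top-down step; channel sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownBlock(nn.Module):
    def __init__(self, in_channels, out_channels=256):
        super().__init__()
        self.lateral = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.smooth = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, finer_backbone_feat, coarser_pyramid_feat):
        top_down = F.interpolate(coarser_pyramid_feat, scale_factor=2, mode="nearest")
        merged = self.lateral(finer_backbone_feat) + top_down   # lateral connection
        return self.smooth(merged)   # semantically strong map at the finer scale

# Illustrative usage: C4 (1024 ch, 28x28) merged with P5 (256 ch, 14x14) -> P4.
block = TopDownBlock(in_channels=1024)
p4 = block(torch.rand(1, 1024, 28, 28), torch.rand(1, 256, 14, 14))
```
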
Read more »

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition[1]

The authors are Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Citation [1]: He, Kaiming et al. “Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition.” IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (2014): 1904-1916.

Time

  • 2014.Jun

Key Words

  • spatial pyramid pooling

Motivation

  1. Existing CNNs require a fixed-size input image, a requirement that may reduce recognition accuracy for images or sub-images of arbitrary size.
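
A minimal sketch of a spatial pyramid pooling layer: the last conv feature map is pooled into a fixed set of bin grids, so the flattened output length is independent of the input image size. The bin sizes and names here are illustrative.

```python
# Minimal sketch of spatial pyramid pooling; bin sizes are illustrative assumptions.
import torch
import torch.nn.functional as F

def spp(feature_map, bin_sizes=(4, 2, 1)):
    """feature_map: (N, C, H, W) with arbitrary H, W; returns (N, C * sum(b*b))."""
    pooled = [F.adaptive_max_pool2d(feature_map, b).flatten(1) for b in bin_sizes]
    return torch.cat(pooled, dim=1)

# Two different input sizes yield the same fixed-length vector (256 * 21 = 5376).
print(spp(torch.rand(1, 256, 13, 13)).shape)  # torch.Size([1, 5376])
print(spp(torch.rand(1, 256, 9, 17)).shape)   # torch.Size([1, 5376])
```
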
Read more »

Integrally Migrating Pre-trained Transformer Encoder-Decoders for Visual Object Detection[1]

The authors are from Prof. Qixiang Ye's group at the University of Chinese Academy of Sciences and Tsinghua University. Citation [1]: Zhang, Xiaosong et al. “Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection.” 2023 IEEE/CVF International Conference on Computer Vision (ICCV) (2022): 6802-6811.

Time

  • 2022.Dec

Key Words

  • Pretrained Encoder-Decoder for object detection
  • multi-scale feature modulator
  • few-shot object detection

Motivation

  • MAE pre-trains encoder-decoder representation models with a MIM (masked image modeling) pretext task, using the encoder for feature extraction and the decoder for image context modeling. Does the spatial context modeling in MAE's decoder also benefit object localization?

  • After reading MAE and DETR, and before seeing this paper, I had come up with an idea quite similar to the one proposed here.

Read more »

Deep OC-SORT: Multi-Pedestrian Tracking by Adaptive Re-Identification[1]

The authors are Gerard Maggiolino, Adnan Ahmad, Jinkun Cao, and Kris Kitani from CMU. Citation [1]: Maggiolino, Gerard et al. “Deep OC-Sort: Multi-Pedestrian Tracking by Adaptive Re-Identification.” 2023 IEEE International Conference on Image Processing (ICIP) (2023): 3025-3029.

Time

  • 2023.Feb

Key Words

  • Kalman filter
  • adding appearance cues to motion-based object association
  • camera motion compensation
  • introduce visual appearance to OC-SORT

Motivation

  1. Motion-based association for MOT has regained dominance thanks to recent strong detectors. Nevertheless, little work has considered incorporating appearance cues beyond heuristic models that lack robustness to feature degradation. In this paper, the authors propose a new way to leverage object appearance, adaptively integrating appearance matching into existing high-performance motion-based methods.
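
A simplified sketch of the general idea of fusing an appearance cost with a motion cost before assignment; the fixed weight below is only a stand-in for the paper's adaptive weighting, and all names are mine.

```python
# Simplified sketch (not the paper's exact formulation): blend a motion affinity
# (e.g. IoU) with an appearance similarity before Hungarian matching.
import numpy as np
from scipy.optimize import linear_sum_assignment

def fuse_and_match(iou_matrix, app_similarity, appearance_weight=0.25):
    """iou_matrix, app_similarity: (num_tracks, num_dets) affinities in [0, 1]."""
    cost = -(iou_matrix + appearance_weight * app_similarity)   # higher affinity -> lower cost
    rows, cols = linear_sum_assignment(cost)                    # minimize total cost
    return list(zip(rows, cols))

# Illustrative usage with random affinities for 3 tracks and 4 detections.
matches = fuse_and_match(np.random.rand(3, 4), np.random.rand(3, 4))
```
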
Read more »

ByteTrack: Multi-Object Tracking by Associating Every Detection Box[1]

The authors are Yifu Zhang, Peize Sun, Yi Jiang, Dongdong Yu, and others from Huazhong University of Science and Technology, the University of Hong Kong, and ByteDance. Citation [1]: Zhang, Yifu et al. “ByteTrack: Multi-Object Tracking by Associating Every Detection Box.” European Conference on Computer Vision (2021).

Time

  • 2021.Oct

Key Words

  • Multi-object tracking

Motivation

  1. Objects with low detection scores are simply thrown away, which brings non-negligible missing of true objects and fragmented trajectories.

Summary

  1. Multi-object tracking aims to estimate the bounding boxes and identities of objects. Most methods obtain identities by associating detection boxes whose scores are higher than a threshold; objects with low detection scores, e.g. occluded objects, are simply thrown away, which brings non-negligible missing of true objects and fragmented trajectories. To solve this problem, the paper proposes a simple and effective association method that tracks by associating almost every detection box instead of only the high-score ones. For low-score detection boxes, their similarity to existing tracklets is used to recover true objects and filter out background detections. The method brings improvements when applied to 9 different state-of-the-art trackers.
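
A minimal sketch of the BYTE-style two-stage association described above: high-score detections are matched to the tracks first, and the still-unmatched tracks are then matched against the low-score detections instead of discarding them. The similarity matrices, thresholds, and helper names are illustrative assumptions.

```python
# Minimal sketch of two-stage association; similarity inputs are illustrative.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_pairs(sim, min_sim=0.3):
    """sim: (num_tracks, num_dets) similarity, e.g. IoU. Returns matched (track, det) pairs."""
    if sim.size == 0:
        return []
    rows, cols = linear_sum_assignment(-sim)
    return [(r, c) for r, c in zip(rows, cols) if sim[r, c] >= min_sim]

def byte_associate(sim_high, sim_low):
    """sim_high / sim_low: track-to-detection similarity for high / low-score boxes."""
    first = match_pairs(sim_high)                          # stage 1: high-score boxes
    unmatched_tracks = [t for t in range(sim_high.shape[0])
                        if t not in {r for r, _ in first}]
    second = match_pairs(sim_low[unmatched_tracks])        # stage 2: low-score boxes
    second = [(unmatched_tracks[r], c) for r, c in second] # map back to track indices
    return first, second

# Illustrative usage: 3 tracks, 2 high-score and 2 low-score detections.
first, second = byte_associate(np.random.rand(3, 2), np.random.rand(3, 2))
```
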
Read more »

Learning Transferable Visual Models From Natural Language Supervision[1]

The authors are Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever from OpenAI. Citation [1]: Radford, Alec et al. “Learning Transferable Visual Models From Natural Language Supervision.” International Conference on Machine Learning (2021).

Time

  • 2021.Feb

Key Words

  • image-text pairs
  • CLIP: Contrastive Language-Image Pre-training
  • Learning from natural language supervision
  • perform a wide set of tasks during pre-training including OCR, geo-localization, action recognition, and more
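
A minimal sketch of CLIP's symmetric contrastive objective over a batch of image-text pairs: normalized embeddings, a scaled similarity matrix, and cross-entropy toward the matching diagonal. The embedding dimensions and temperature value are illustrative.

```python
# Minimal sketch of a symmetric image-text contrastive loss; sizes are illustrative.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """image_emb, text_emb: (N, D); the i-th image is paired with the i-th text."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature   # (N, N) similarity matrix
    labels = torch.arange(image_emb.size(0))          # positives lie on the diagonal
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2

# Illustrative usage with random 512-d embeddings for a batch of 8 pairs.
loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```
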
Read more »

Goodbye, April

In mid-April I taught myself breaststroke, and it feels really great. Since March I have been going to the pool about once a week. After my roommate took me swimming once, I still choked on and swallowed a lot of water 😹; but one Saturday in early April, I got into the water on my own and suddenly found the feel for it, and for the first time I almost made it to the other side. Every session after that brought progress: I can now swim to the other side in one go, and with a few minutes' rest in between, I can swim there and back. It feels wonderful!!! Skill +1 😁

Also, I came across a really good album, totally addictive!! 🎶🎸

May

Read more papers and write more code 💻! Keep going!!
