ByteTrack

ByteTrack: Multi-Object Tracking by Associating Every Detection Box[1]

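The authors are Yifu Zhang, Peize Sun, Yi Jiang, Dongdong Yu, and others, from Huazhong University of Science and Technology, The University of Hong Kong, and ByteDance. Citation [1]: Zhang, Yifu et al. “ByteTrack: Multi-Object Tracking by Associating Every Detection Box.” European Conference on Computer Vision (2021).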

Time

  • 2021.Oct

Key Words

  • Multi-object tracking

Motivation

  1. Objects with low detection scores are simply thrown away, which leads to a non-negligible number of missed true objects and fragmented trajectories.

Summary

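  1. Multi-object tracking (MOT) aims to estimate the bounding boxes and identities of objects. Most methods do this by associating only the detection boxes whose scores are higher than a threshold. Objects with low detection scores, e.g. occluded objects, are simply thrown away, which leads to a non-negligible number of missed true objects and fragmented trajectories. To solve this problem, the paper proposes a simple and effective association method that tracks by associating almost every detection box instead of only the high-score ones. For low-score detection boxes, it uses their similarity with tracklets to recover true objects and filter out background detections. When applied to 9 different SOTA trackers, the method improves their results.

  1. Tracking-by-detection is currently the most effective paradigm in MOT. Because video scenes are complex, detectors are prone to imperfect predictions. SOTA MOT methods have to deal with the true positive / false positive trade-off in detection boxes, and they do so by eliminating low-confidence detection boxes. But is eliminating low-confidence detection boxes the right approach? The authors' answer is no. Low-confidence detection boxes sometimes indicate the existence of objects, e.g. occluded objects. Filtering these objects out causes irreversible errors for MOT and brings non-negligible missed detections and fragmented trajectories.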

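  2. In the paper, similarity with tracklets provides a strong cue for distinguishing objects from background among low-score detection boxes. To make use of detection boxes from high scores down to low ones in the matching process, the authors propose a simple and effective method, BYTE. The name reflects the idea that each detection box is a basic unit of a tracklet, like a byte in a computer program. BYTE first associates the high-score detection boxes with tracklets, based on motion similarity or appearance similarity. A Kalman filter predicts the locations of the tracklets in the new frame. The similarity can be computed as the IoU between predicted boxes and detection boxes, or as the Re-ID feature distance. A second matching is then performed between the unmatched tracklets and the low-score detection boxes, using the same motion similarity.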

  3. An ideal MOT solution is never just a detector followed by an association step; a well-designed junction between the two also matters. BYTE's innovation lies in this junction between detection and association, where low-score detection boxes act as bridges that boost both. BYTE was applied to 9 different trackers, including Re-ID based ones, motion based ones, a chain-based one, and an attention based one, and all of them improved notably. ByteTrack uses the recent YOLOX detector to obtain detection boxes and associates them with the proposed BYTE method.

    • Tracking by Detection: Many methods use powerful detectors to obtain higher tracking performance. Many of them directly use the detection boxes from single images for tracking.
    • Detection by Tracking: Tracking can also be used to obtain more accurate detection boxes. Some methods use single object tracking (SOT) or a Kalman filter to predict the locations of tracklets in the following frame, and fuse the predicted boxes with the detection boxes to enhance the detection results.
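
The two-stage matching described for BYTE in the summary can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`iou`, `greedy_match`, `byte_associate`), the threshold values, and the track representation are all assumptions made for this example. It uses only IoU as the similarity, assumes each track's box has already been predicted for the current frame (the notes mention a Kalman filter for this step, which is omitted here), and swaps in a simple greedy matcher instead of an optimal assignment solver.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def greedy_match(tracks: Dict[int, Box], dets: List[Box], min_iou: float):
    """Pair tracks with detections in order of descending IoU.

    Greedy matching is a simplification chosen for this sketch.
    Returns ({track_id: detection_index}, [unmatched track ids]).
    """
    pairs = sorted(
        ((iou(box, d), tid, j) for tid, box in tracks.items() for j, d in enumerate(dets)),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # every remaining pair overlaps even less
        if tid in matches or j in used:
            continue
        matches[tid] = j
        used.add(j)
    unmatched = [tid for tid in tracks if tid not in matches]
    return matches, unmatched


def byte_associate(
    tracks: Dict[int, Box],
    detections: List[Tuple[Box, float]],
    high_thresh: float = 0.6,  # illustrative value, an assumption
    low_thresh: float = 0.1,   # illustrative value, an assumption
    min_iou: float = 0.3,      # illustrative value, an assumption
):
    """tracks: {id: predicted box}; detections: list of (box, score)."""
    high = [b for b, s in detections if s >= high_thresh]
    low = [b for b, s in detections if low_thresh <= s < high_thresh]

    # First association: high-score boxes against all tracks.
    first, remaining = greedy_match(tracks, high, min_iou)
    assigned = {tid: high[j] for tid, j in first.items()}

    # Second association: low-score boxes against still-unmatched tracks.
    leftover = {tid: tracks[tid] for tid in remaining}
    second, lost = greedy_match(leftover, low, min_iou)
    assigned.update({tid: low[j] for tid, j in second.items()})

    # Low-score boxes that match no track are dropped as background.
    # Unmatched high-score boxes are returned so the caller can decide
    # what to do with them (e.g. start new tracks).
    matched_high = set(first.values())
    unmatched_high = [b for j, b in enumerate(high) if j not in matched_high]
    return assigned, lost, unmatched_high
```

The point of the second pass is visible when a track overlaps only a low-score box, as happens under occlusion: a tracker that keeps just the high-score boxes would report that track as lost, while the second pass lets it continue. A low-score box that overlaps no track never enters the result, which is how background detections get filtered.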