Train With VideoMAE

Bugs encountered while training or fine-tuning VideoMAE

  1. Error when training VideoMAE:

```
File "/home/MAE-Action-Detection/run_class_finetuning.py", line 404, in main
    train_stats = train_one_epoch(
File "/home/MAE-Action-Detection/engine_for_finetuning.py", line 59, in train_one_epoch
    for step, (samples, boxes, _) in enumerate(metric_logger.log_every(data_loader, print_freq, header)):
File "/home/MAE-Action-Detection/utils.py", line 141, in log_every
    for obj in iterable:
File "/home/anaconda3/envs/VideoMAE/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
File "/home/anaconda3/envs/VideoMAE/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
File "/home/anaconda3/envs/VideoMAE/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
File "/home/anaconda3/envs/VideoMAE/lib/python3.9/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
File "av/error.pyx", line 78, in av.error.FFmpegError.__init__
TypeError: __init__() takes at least 3 positional arguments (2 given)
```

Solution: upgrade torch, torchvision, and the related packages from 1.9 to 1.13.
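A possible upgrade command is sketched below; the exact version pins and CUDA build depend on your environment (the torchvision `0.14.*` pin is an assumption based on it being the release paired with torch 1.13):

```shell
# Inside the VideoMAE conda env: upgrade torch 1.9 -> 1.13.
# torchvision 0.14.* is the matching release for torch 1.13;
# adjust the pins for your CUDA version if needed.
pip install --upgrade "torch==1.13.*" "torchvision==0.14.*"
```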

  2. AVA dataset format issue

Following AlphAction's format: the bbox format in the ava_det.json file under the boxs folder defaults to x1,y1,w,h rather than x1,y1,x2,y2. That is why there is a spot later in AVADataset that does Box(mode="xyxy").convert("xywh"). If your bbox format is x1,y1,x2,y2, the convert("xywh") call is not needed; if it is the default x1,y1,w,h, then convert("xywh") is required. This is a big pitfall, and the author does not seem to mention it anywhere.
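The difference between the two conventions can be illustrated with a minimal sketch. The helper names below are hypothetical, and AlphAction's convert("xywh") is assumed to perform the equivalent arithmetic:

```python
# Hypothetical helpers showing the two bbox conventions.
# AlphAction's Box(...).convert("xywh") is assumed to do the same arithmetic.

def xywh_to_xyxy(box):
    """(x1, y1, w, h) -> (x1, y1, x2, y2): add width/height to the corner."""
    x1, y1, w, h = box
    return (x1, y1, x1 + w, y1 + h)

def xyxy_to_xywh(box):
    """(x1, y1, x2, y2) -> (x1, y1, w, h): subtract the corners."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)

# A detection stored as x1,y1,w,h in ava_det.json:
det = (10.0, 20.0, 30.0, 40.0)
print(xywh_to_xyxy(det))  # -> (10.0, 20.0, 40.0, 60.0)
```

If a box in the x1,y1,w,h convention is mistakenly read as x1,y1,x2,y2, every box silently shrinks or shifts, which is why the missing conversion is so hard to spot.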