Problem
While training a model today, I ran into a bug. The DataLoader first reported:
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
and then:
RuntimeError: Trying to resize storage that is not resizable
The full traceback:
Traceback (most recent call last):
File "train_temp.py", line 100, in <module>
for data in train_dataloader:
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
data = self._next_data()
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
return self._process_data(data)
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
data.reraise()
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/_utils.py", line 543, in reraise
raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 61, in fetch
return self.collate_fn(data)
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 265, in default_collate
return collate(batch, collate_fn_map=default_collate_fn_map)
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 143, in collate
return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed]  # Backwards compatibility.
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 143, in <listcomp>
return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed]  # Backwards compatibility.
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 120, in collate
return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 172, in collate_numpy_array_fn
return collate([torch.as_tensor(b) for b in batch], collate_fn_map=collate_fn_map)
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 120, in collate
return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 162, in collate_tensor_fn
out = elem.new(storage).resize_(len(batch), *list(elem.size()))
RuntimeError: Trying to resize storage that is not resizable
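As the solution section explains, the root cause was samples of unequal size. Here is a minimal sketch (with a hypothetical `MismatchedDataset`, not my real one) that reproduces the collate failure. With `num_workers=0` PyTorch raises a clearer "stack expects each tensor to be equal size" message; with worker processes the same mismatch tends to surface as the "resize storage" error above.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class MismatchedDataset(Dataset):
    """Hypothetical dataset whose samples have two different spatial sizes."""
    def __len__(self):
        return 2

    def __getitem__(self, idx):
        size = 384 if idx == 0 else 256  # inconsistent crop sizes
        return np.zeros((size, size, 1), dtype=np.float32)

loader = DataLoader(MismatchedDataset(), batch_size=2, num_workers=0)

failed = False
try:
    next(iter(loader))  # default_collate cannot stack 384x384x1 with 256x256x1
except RuntimeError as e:
    failed = True
    print("collate failed:", e)
```

The batch dimension can only be created when every sample shares one shape, so any mismatch blows up inside `default_collate`, not in your own code.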
Solution
At first, I read on a blog that num_workers was the problem and should be set to 0 or to the number of GPUs.
I was skeptical, because I had previously run with num_workers=16 on 4 GPUs without any error, but since I was out of ideas I tried it anyway. Sure enough, the error remained.
Later I found another blog post (link) — thanks to its author — which mentioned at the end that inconsistent data dimensions can cause this. So I printed the shapes of my data in the DataLoader, and it turned out the input and label shapes were different!
One was 384×384×1, the other 256×256×1.
Enough to make you question everything >_<
Then I changed the crop size so the two matched, and everything worked ^_^
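The post doesn't show the crop code, but one way to guarantee that input and label always come out the same size is to crop both with a single shared window. This `paired_random_crop` helper is a hypothetical sketch, not the author's actual fix:

```python
import numpy as np

def paired_random_crop(inp, lab, size):
    """Crop input and label with the SAME window so their shapes always match."""
    h, w = inp.shape[:2]
    top = np.random.randint(0, h - size + 1)
    left = np.random.randint(0, w - size + 1)
    window = (slice(top, top + size), slice(left, left + size))
    return inp[window], lab[window]

inp = np.zeros((384, 384, 1), dtype=np.float32)
lab = np.zeros((384, 384, 1), dtype=np.float32)
ci, cl = paired_random_crop(inp, lab, 256)
print(ci.shape, cl.shape)  # both (256, 256, 1)
```

Computing the crop window once and applying it to both arrays removes any chance of the input and label drifting apart in size.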
Miscellaneous
1. num_workers is the number of worker processes loading data; it has nothing to do with the number of GPUs (the two just often happen to be equal). You can increase num_workers gradually during training until data-loading speed stops improving noticeably.
2. Check the dataset! Check the dataset!
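Point 1 above can be checked empirically by timing one pass over the loader at different num_workers settings. A small sketch (on a tiny in-memory dataset like this, worker startup overhead often dominates, so the numbers only become meaningful with real I/O-bound data):

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset standing in for a real on-disk one.
ds = TensorDataset(torch.zeros(64, 3, 32, 32))

timings = {}
for workers in (0, 2):
    loader = DataLoader(ds, batch_size=8, num_workers=workers)
    start = time.time()
    for _ in loader:  # one full pass over the data
        pass
    timings[workers] = time.time() - start
    print(f"num_workers={workers}: {timings[workers]:.3f}s")
```

Stop raising num_workers once the per-epoch loading time plateaus.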
Copyright belongs to the original author, thwwu. If anything here infringes your rights, please contact us and we will remove it.