When downloading pretrained models through transformers, checkpoints such as bert-base-cased load without much trouble via AutoTokenizer and AutoModel. Downloading deberta-v3-base, however, can trigger a series of errors.
First, the usual loading code:
from transformers import AutoTokenizer, AutoModel, AutoConfig
checkpoint = 'microsoft/deberta-v3-base'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
At this point an error is raised:
ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
The fix is to install sentencepiece alongside transformers:
pip install transformers sentencepiece
Importing the tokenizer again then fails with:
ImportError:
DebertaV2Converter requires the protobuf library but it was not found in your environment. Checkout the instructions on the
installation page of its repo: https://github.com/protocolbuffers/protobuf/tree/master/python#installation and follow the ones
that match your environment.
If you open the linked page at this point and try to follow its instructions, you may well burn an entire afternoon without fixing anything (I did exactly that).
Instead, simply run, in your environment:
pip install --no-binary=protobuf protobuf
and that's it.
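A quick way to confirm which protobuf backend ended up active. This uses `api_implementation`, an internal protobuf module, so treat it as a diagnostic sketch rather than a stable API:

```python
# Check which protobuf implementation is in use.
# A --no-binary install should yield the pure-Python backend,
# which sidesteps the descriptor-pool conflict described below.
from google.protobuf.internal import api_implementation

print(api_implementation.Type())  # 'python' for a pure-Python build
```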
Note: do NOT simply run
pip install protobuf
or you may hit the following error:
couldn't build proto file into descriptor pool: duplicate file name (sentencepiece_model.proto)
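With sentencepiece and a correctly installed protobuf in place, the original snippet should run end to end. A minimal sketch; the example sentence and the sanity check are my own additions, the loading lines are the post's code:

```python
from transformers import AutoTokenizer, AutoModel, AutoConfig

checkpoint = 'microsoft/deberta-v3-base'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)  # now builds the fast tokenizer
model = AutoModel.from_pretrained(checkpoint)
config = AutoConfig.from_pretrained(checkpoint)

# Quick sanity check: encode a sentence and run it through the model.
inputs = tokenizer("DeBERTa-v3 finally loads.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768) for deberta-v3-base
```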
References:
- 抱抱脸transformers 报错 Couldn't instantiate the backend tokenizer - 简书
- Couldn't build proto file into descriptor pool - 小米爱大豆的博客 - CSDN博客
- Protobuf · Issue #10020 · huggingface/transformers · GitHub
Original author: Guti Haz.