When downloading pretrained models through transformers, checkpoints such as bert-base-cased load without much trouble via AutoTokenizer and AutoModel. Downloading deberta-v3-base, however, can trigger a series of errors.
First, the usual loading code:
from transformers import AutoTokenizer, AutoModel, AutoConfig
checkpoint = 'microsoft/deberta-v3-base'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
At this point an error is raised:
ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
The fix is to install sentencepiece alongside transformers:
pip install transformers sentencepiece
Importing the tokenizer again then fails with:
ImportError:
DebertaV2Converter requires the protobuf library but it was not found in your environment. Checkout the instructions on the
installation page of its repo: https://github.com/protocolbuffers/protobuf/tree/master/python#installation and follow the ones
that match your environment.
If you open the linked page at this point and try to follow its instructions, you may well burn an entire afternoon without fixing anything (I did exactly that).
Instead, simply run, in your environment:
pip install --no-binary=protobuf protobuf
and that's it.
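A quick way to confirm which protobuf backend ended up active. This uses `api_implementation`, an internal protobuf module, so treat it as a diagnostic sketch rather than a stable API:

```python
# Check which protobuf implementation is in use.
# A --no-binary install should yield the pure-Python backend,
# which sidesteps the descriptor-pool conflict described below.
from google.protobuf.internal import api_implementation

print(api_implementation.Type())  # 'python' for a pure-Python build
```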
Note: do NOT simply run
pip install protobuf
or you may hit the following error:
couldn't build proto file into descriptor pool: duplicate file name (sentencepiece_model.proto)
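With sentencepiece and a correctly installed protobuf in place, the original snippet should run end to end. A minimal sketch; the example sentence and the sanity check are my own additions, the loading lines are the post's code:

```python
from transformers import AutoTokenizer, AutoModel, AutoConfig

checkpoint = 'microsoft/deberta-v3-base'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)  # now builds the fast tokenizer
model = AutoModel.from_pretrained(checkpoint)
config = AutoConfig.from_pretrained(checkpoint)

# Quick sanity check: encode a sentence and run it through the model.
inputs = tokenizer("DeBERTa-v3 finally loads.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768) for deberta-v3-base
```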
References:
- 抱抱脸transformers 报错 Couldn't instantiate the backend tokenizer - 简书
- Couldn't build proto file into descriptor pool - 小米爱大豆的博客 - CSDN博客
- Protobuf · Issue #10020 · huggingface/transformers · GitHub
Original author: Guti Haz.