Qwen2-1.5B-Instruct Inference

1. Downloading the Qwen2-1.5B-Instruct model

from modelscope import snapshot_download
# Download the Qwen model from ModelScope into the local directory
model_dir = snapshot_download("qwen/Qwen2-1.5B-Instruct", cache_dir="./", revision="master")
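
The inference script in section 2 loads the model from ./qwen/Qwen2-1___5B-Instruct/, which suggests that the downloader stores the dot in the model name as three underscores. The sketch below derives that path under this assumption and checks for a config.json file. Both expected_local_dir and looks_downloaded are hypothetical helpers, not part of ModelScope; in practice the model_dir value returned by snapshot_download is probably the more reliable source for the path.

```python
from pathlib import Path


def expected_local_dir(model_id: str, cache_dir: str = "./") -> Path:
    """Guess the local directory for a model id.

    Assumption (inferred from the paths used in this post, not from the
    ModelScope docs): each "." in the model name is stored as "___".
    """
    owner, name = model_id.split("/", 1)
    return Path(cache_dir) / owner / name.replace(".", "___")


def looks_downloaded(model_dir: Path) -> bool:
    """Cheap sanity check: a downloaded checkpoint should contain config.json."""
    return (model_dir / "config.json").is_file()


if __name__ == "__main__":
    local_dir = expected_local_dir("qwen/Qwen2-1.5B-Instruct")
    print(local_dir.as_posix())
    print("downloaded:", looks_downloaded(local_dir))
```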

2. Qwen2-1.5B-Instruct inference

Complete code:

import torch
from modelscope import AutoTokenizer
from transformers import AutoModelForCausalLM
if __name__ == '__main__':
    # Load the tokenizer and model weights from the local download directory
    tokenizer = AutoTokenizer.from_pretrained("./qwen/Qwen2-1___5B-Instruct/", use_fast=False, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained("./qwen/Qwen2-1___5B-Instruct/", device_map="auto",
                                                 torch_dtype=torch.bfloat16)
    device = model.device  # follow the device chosen by device_map="auto" instead of hard-coding "cuda"
    prompt = "给我简单介绍一下大语言模型。"  # "Give me a brief introduction to large language models."
    messages = [
        {"role": "system", "content": "你是一个有用的助手。"},  # "You are a helpful assistant."
        {"role": "user", "content": prompt}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(device)
    generated_ids = model.generate(
        **model_inputs,  # pass input_ids together with attention_mask
        max_new_tokens=512
    )
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(response)
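
For a decoder-only model such as this one, model.generate returns the prompt tokens followed by the newly generated ones, which is why the script slices each output at len(input_ids) before decoding. A minimal sketch of that step with plain Python lists (strip_prompt is a hypothetical helper and the token ids are made up for illustration):

```python
def strip_prompt(input_ids_batch, output_ids_batch):
    """Drop the echoed prompt tokens from each generated sequence."""
    return [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(input_ids_batch, output_ids_batch)
    ]


if __name__ == "__main__":
    # Made-up token ids: the output repeats the prompt, then adds new tokens.
    prompts = [[101, 7, 8, 9]]
    outputs = [[101, 7, 8, 9, 42, 43, 44]]
    print(strip_prompt(prompts, outputs))
```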

The input parameters of tokenizer.apply_chat_template are as follows:

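conversation: Union[List[Dict[str, str]], List[List[Dict[str, str]]]], a list of dicts, each with 'role' and 'content' keys, representing the chat history so far (or a list of such lists for a batch of conversations).
tools: Optional[List[Dict]] = None, a list of tools (callable functions) that will be made available to the model. If the template does not support function calling, this argument has no effect.
documents: Optional[List[Dict[str, str]]] = None, a list of dicts representing documents the model can access when it is performing RAG (retrieval-augmented generation). If the template does not support RAG, this argument has no effect.
chat_template: Optional[str] = None, the Jinja template to use for this conversion. There is usually no need to pass anything here, because the model's own template is used by default.
add_generation_prompt: bool = False, whether to end the prompt with the token(s) that mark the start of an assistant message. This is useful when you want the model to generate a response. Note that this argument is passed to the chat template, so it only takes effect if the template supports it.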
tokenize: bool = True, whether to tokenize the output. If set to False, the output is a string.
padding: bool = False, whether to pad sequences to the maximum length. Has no effect if tokenize is False.
truncation: bool = False, whether to truncate sequences at the maximum length. Has no effect if tokenize is False.
max_length: Optional[int] = None, the maximum length (in tokens) used for padding or truncation. Has no effect if tokenize is False. If not specified, the tokenizer's max_length attribute is used as the default.
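return_tensors: Optional[Union[str, TensorType]] = None, if set, returns tensors of the given framework instead of Python lists. Has no effect if tokenize is False. Accepted values:

'tf': returns TensorFlow tf.Tensor objects.

'pt': returns PyTorch torch.Tensor objects.

'np': returns NumPy np.ndarray objects.

'jax': returns JAX jnp.ndarray objects.

return_dict: bool = False, whether to return a dictionary with named outputs. Has no effect if tokenize is False.
tokenizer_kwargs: Optional[Dict[str, Any]] = None, additional keyword arguments to pass to the tokenizer.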
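
As an illustration of how these parameters combine, the sketch below wraps apply_chat_template so that it returns tensors directly rather than a string. build_model_inputs is a hypothetical helper name, and the sketch assumes a tokenizer whose chat template supports add_generation_prompt. With tokenize=True, return_tensors="pt" and return_dict=True, the result should contain input_ids and attention_mask and can be passed to model.generate with ** unpacking.

```python
def build_model_inputs(tokenizer, messages, max_length=None):
    """Render and tokenize a conversation in one call (hypothetical helper)."""
    return tokenizer.apply_chat_template(
        messages,
        tokenize=True,               # return token ids instead of a string
        add_generation_prompt=True,  # end the prompt with the assistant header
        truncation=max_length is not None,
        max_length=max_length,
        return_tensors="pt",         # PyTorch tensors
        return_dict=True,            # dict with input_ids and attention_mask
    )


# Usage, assuming `tokenizer`, `model` and `messages` from the script above:
#   inputs = build_model_inputs(tokenizer, messages).to(model.device)
#   generated_ids = model.generate(**inputs, max_new_tokens=512)
```

Tokenizing in the same call avoids the separate tokenizer([text], ...) step, at the cost of no longer having the rendered string available for inspection.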

Example output (the string returned with tokenize=False and add_generation_prompt=True):

'<|im_start|>system
你是一个有用的助手。<|im_end|>
<|im_start|>user
给我简单介绍一下大语言模型。<|im_end|>
<|im_start|>assistant
'
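
The string above follows a ChatML-style layout: each message is wrapped in <|im_start|>role ... <|im_end|>, and add_generation_prompt=True appends an open assistant header. The sketch below reproduces that layout by hand, for illustration only. render_chatml is a hypothetical function, not the tokenizer's actual Jinja template, and it ignores anything else the real template may do (such as inserting a default system message).

```python
def render_chatml(messages, add_generation_prompt=False):
    """Format messages in the ChatML-style layout shown above (illustrative)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    messages = [
        {"role": "system", "content": "你是一个有用的助手。"},
        {"role": "user", "content": "给我简单介绍一下大语言模型。"},
    ]
    print(repr(render_chatml(messages, add_generation_prompt=True)))
```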

This article is reposted from: https://blog.csdn.net/weixin_42029635/article/details/140323778
Copyright belongs to the original author, momo_42029635. In case of infringement, please contact us for removal.
