1. System Configuration
2. Environment Setup
2.1 Create a Virtual Environment
Install Anaconda; see https://blog.csdn.net/weixin_43881345/article/details/136051556 for reference.
Create a virtual environment based on Python 3.10:
conda create -n minicpm python=3.10
If creating the virtual environment fails with an error, remove the extra quotation marks from the Path environment variable as indicated by the error message.
Re-create the Python 3.10 virtual environment named minicpm, then activate it and install the dependencies:
conda create -n minicpm python=3.10
activate minicpm
pip install Pillow==10.1.0
pip install bitsandbytes==0.43.1
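As a quick sanity check, the minimal sketch below (a hypothetical helper, not part of the original steps) prints the Python version and the installed versions of the two packages:
# check_env.py (hypothetical helper script)
import sys
from importlib.metadata import version, PackageNotFoundError

print("Python:", sys.version)
for pkg in ("Pillow", "bitsandbytes"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "is not installed")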
2.2 Download the Models
Hugging Face mirror site: https://hf-mirror.com/
ModelScope: https://modelscope.cn/
MiniCPM-2B-sft-int4:https://modelscope.cn/models/OpenBMB/MiniCPM-2B-sft-int4/files
MiniCPM-2B-128k:https://modelscope.cn/models/openbmb/MiniCPM-2B-128k/files
MiniCPM-Llama3-V-2_5(8B):https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5/files
MiniCPM-Llama3-V-2_5-int4(8B):https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-int4/files
MiniCPM-Llama3-V-2_5-gguf:https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-gguf/files
Download via git (a Python alternative using the ModelScope SDK is sketched after these commands):
git clone https://www.modelscope.cn/OpenBMB/MiniCPM-2B-sft-int4.git
git clone https://www.modelscope.cn/openbmb/MiniCPM-2B-128k.git
git clone https://www.modelscope.cn/OpenBMB/MiniCPM-Llama3-V-2_5.git
git clone https://www.modelscope.cn/OpenBMB/MiniCPM-Llama3-V-2_5-int4.git
git clone https://www.modelscope.cn/OpenBMB/MiniCPM-Llama3-V-2_5-gguf.git
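The sketch below shows the same download done through the ModelScope Python SDK; it assumes the modelscope package is installed (pip install modelscope) and uses D:/model as the cache directory to match the local paths used later in this post:
# download_model.py (sketch; assumes `pip install modelscope`)
from modelscope import snapshot_download

# Downloads the int4 model into D:/model and returns the local directory
local_dir = snapshot_download('OpenBMB/MiniCPM-Llama3-V-2_5-int4', cache_dir='D:/model')
print('Model downloaded to:', local_dir)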
2.3 Download the Repository Source Code
https://github.com/OpenBMB/MiniCPM-V
https://github.com/OpenBMB/MiniCPM
2.4 Install Dependencies in the Anaconda Virtual Environment
cd MiniCPM-V
pip install -r requirements.txt
2.5 Configure torch
2.5.1 Check the CUDA Version
# hello_world.py
import torch
# If the version string ends with "+cpu", this torch build is CPU-only; a CUDA build ends with "+cuxxx"
print(f'torch version: {torch.__version__}')
print(f'CUDA available to torch: {torch.cuda.is_available()}')
print(f'GPU count: {torch.cuda.device_count()}')  # number of visible GPUs
print(f'CUDA version reported by torch: {torch.version.cuda}')  # CUDA version this torch build was compiled against
The output shows that torch cannot use the GPU (if your torch already supports the GPU, skip section 2.5).
Check the CUDA driver version (available once the NVIDIA graphics driver is installed); 12.4 on this machine:
nvidia-smi
Check the CUDA runtime version (available once the CUDA Toolkit is installed); 11.7 on this machine:
nvcc -V
2.5.2 Install a CUDA-enabled torch
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
Re-run the hello_world.py script; torch can now use the GPU through CUDA.
3. Deploy MiniCPM-Llama3-V-2_5
3.1 Local WebUI Demo Deployment
Configure the model path.
Edit MiniCPM-V-main/web_demo_2.5.py and set model_path to the local model path.
Since GPU memory is limited, the int4 quantized model is used here:
model_path = 'D:/model/MiniCPM-Llama3-V-2_5-int4'
After the change, start the WebUI service:
# For NVIDIA GPUs, run:
python web_demo_2.5.py --device cuda
# For Macs with MPS (Apple silicon or AMD GPUs), run:
PYTORCH_ENABLE_MPS_FALLBACK=1 python web_demo_2.5.py --device mps
Open http://IP:8080 in a browser.
A response takes about 19.5 s.
3.2 API Call
Create the script api_cpm_v2.5.py:
# api_cpm_v2.5.py
import torch
import time
from PIL import Image
from transformers import AutoModel, AutoTokenizer

# Set the model path
model_path = 'D:/model/MiniCPM-Llama3-V-2_5-int4'

# Load the model and tokenizer
model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.eval()

# Set a random seed for reproducibility
torch.manual_seed(0)

start_time = time.time()
image = Image.open('D:/work/svc_mini_cpm/MiniCPM-V-main/assets/airplane.jpeg').convert('RGB')
question = 'What does the picture show?'
msgs = [{'role': 'user', 'content': question}]
res = model.chat(
    image=image,
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    temperature=0.7
)
end_time = time.time()
execution_time = end_time - start_time
print(res)
print("Execution time:", execution_time, "seconds")
The result is returned successfully.
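If GPU memory allows, the non-quantized MiniCPM-Llama3-V-2_5 weights can be loaded the same way; the fragment below is a sketch in which the local path, the float16 dtype, and the explicit .to('cuda') call are assumptions for a full-precision GPU load (the int4 model above manages device placement itself via bitsandbytes). The rest of the script, including the model.chat call, stays unchanged.
# api_cpm_v2.5_fp16.py (sketch for the full-precision model; path and dtype are assumptions)
import torch
from transformers import AutoModel, AutoTokenizer

model_path = 'D:/model/MiniCPM-Llama3-V-2_5'  # non-quantized weights, assumed local path
model = AutoModel.from_pretrained(model_path, trust_remote_code=True,
                                  torch_dtype=torch.float16).to('cuda')
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.eval()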
4. Deploy MiniCPM-2B
4.1 Local WebUI Demo Deployment (failed)
Start the WebUI service:
cd MiniCPM-main
python demo/vllm_based_demo.py --model_path D:/model/MiniCPM-2B-128k
Install vLLM; download the source from https://github.com/vllm-project/vllm
cd vllm-main
pip install -r requirements-cuda.txt
An error occurs: error: subprocess-exited-with-error
pip install --upgrade setuptools
The deployment still does not succeed; vLLM does not officially support building natively on Windows, which is the likely cause of the failure.
4.2 API Call (failed)
Create the script api_cpm.py:
# api_cpm.py
from transformers import AutoModelForCausalLM, AutoTokenizer
import time
import torch

# Set a random seed for reproducibility
torch.manual_seed(0)

# Set the model path
path = 'D:/model/MiniCPM-2B-dpo-bf16'

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)

start_time = time.time()
responds, history = model.chat(tokenizer, "Which mountain is the highest in Shandong Province? Is it higher or lower than Mount Huangshan, and by how much?", temperature=0.5, top_p=0.8, repetition_penalty=1.02)
end_time = time.time()
execution_time = end_time - start_time
print(responds)
print("Execution time:", execution_time, "seconds")
The output is returned successfully.
When the model is switched to MiniCPM-2B-sft-int4, execution fails with an error.
Manually install flash_attn: download a Windows-prebuilt flash-attention wheel, choosing the build that matches the local torch (cu121).
https://github.com/bdashore3/flash-attention/releases/tag/v2.4.1
直达下载链接:https://github.com/bdashore3/flash-attention/releases/download/v2.4.1/flash_attn-2.4.1+cu121torch2.1cxx11abiFALSE-cp310-cp310-win_amd64.whl
pip install D:/work/svc_mini_cpm/flash_attn-2.4.1+cu121torch2.1cxx11abiFALSE-cp310-cp310-win_amd64.whl
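After installing the wheel, a quick import check (a minimal sketch) confirms that flash_attn loads against the current torch build:
# check_flash_attn.py
import torch
import flash_attn

print('torch:', torch.__version__, 'CUDA:', torch.version.cuda)
print('flash_attn:', flash_attn.__version__)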
5. Load Models via Ollama
5.1 Install Ollama
https://github.com/ollama/ollama
After installation, run C:\Users\<username>\AppData\Local\Programs\Ollama\ollama app.exe; on first use a model has to be downloaded.
5.2 Load an Official Model
Official model library: https://ollama.com/library
ollama run modelbest/minicpm-2b-dpo
API call:
http://localhost:11434/api/generate
Set "stream": false in the request body to disable streaming output.
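A minimal Python call to the generate endpoint might look like the sketch below (the model name matches the one pulled above; the prompt is an arbitrary example):
# ollama_api.py (sketch; assumes `pip install requests`)
import requests

payload = {
    "model": "modelbest/minicpm-2b-dpo",
    "prompt": "Introduce yourself in one sentence.",
    "stream": False,  # disable streaming output
}
resp = requests.post("http://localhost:11434/api/generate", json=payload)
print(resp.json()["response"])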
5.3 Load a Local Model
Manually create a model configuration file (named MiniCPM-2B-dpo-Q4_0.mb here).
Edit the file content:
FROM ./MiniCPM-2B-dpo-Q4_0.gguf
Import the configuration via PowerShell:
ollama create MiniCPM-2B-dpo-Q4_0 -f MiniCPM-2B-dpo-Q4_0.mb
After a successful import, run the model:
ollama run MiniCPM-2B-dpo-Q4_0
6. Optimize Inference Efficiency (failed)
Download w64devkit:
https://github.com/skeeto/w64devkit/releases
Run w64devkit.exe and check the cmake and gcc versions.
Download llama.cpp:
https://github.com/ggerganov/llama.cpp
Compile llama.cpp with w64devkit.exe:
cd d:
cd Download/models/llama.cpp-master
make
Download a precompiled llama.cpp build instead:
https://github.com/ggerganov/llama.cpp/releases
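Separately from the llama.cpp build route attempted above, the Q4_0 GGUF file used in section 5.3 can also be run from Python through the llama-cpp-python bindings; the sketch below assumes `pip install llama-cpp-python` and that the GGUF file sits under D:/model:
# llama_cpp_infer.py (sketch; assumes `pip install llama-cpp-python` and the GGUF path below)
from llama_cpp import Llama

llm = Llama(model_path='D:/model/MiniCPM-2B-dpo-Q4_0.gguf', n_ctx=2048)
output = llm('Which mountain is the highest in Shandong Province?', max_tokens=128)
print(output['choices'][0]['text'])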