Running the ChatGLM3-6B CPU version on an M1 Mac: results in 1-3 seconds

Measured result:

Input: 295 characters. Output started after 1.9 seconds, which is close to the speed of an NVIDIA T4.
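The 1.9-second figure is a time-to-first-output measurement. A minimal sketch of how such a latency could be measured on any streaming response follows; `time_to_first_chunk` and `fake_stream` are hypothetical names for illustration and are not part of chatglm-cpp. The fake stream only simulates a model, so the printed number says nothing about real performance.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_chunk(stream: Iterable[str]) -> Tuple[float, Iterator[str]]:
    """Return (seconds until the first chunk arrives, iterator over all chunks)."""
    it = iter(stream)
    start = time.perf_counter()
    try:
        first = next(it)  # blocks until the producer yields its first chunk
    except StopIteration:
        # Empty stream: report the elapsed time and an empty iterator.
        return time.perf_counter() - start, iter(())
    latency = time.perf_counter() - start

    def replay() -> Iterator[str]:
        # Hand back the chunk we already consumed, then the rest of the stream.
        yield first
        yield from it

    return latency, replay()


def fake_stream(delay: float = 0.05) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model response."""
    time.sleep(delay)  # simulates prompt processing before the first token
    yield "Hello"
    yield ", world"


if __name__ == "__main__":
    latency, chunks = time_to_first_chunk(fake_stream())
    print(f"first chunk after {latency:.2f}s:", "".join(chunks))
```

To time a real model, the fake stream would be replaced by whatever streaming iterator the inference library returns.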

The steps are as follows:

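1. Prepare the environment

Run these commands from the root of a chatglm.cpp checkout, which provides the chatglm_cpp/convert.py and examples/web_demo.py scripts used below.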

git submodule update --init --recursive

python3 -m pip install -U pip

python3 -m pip install torch tabulate tqdm transformers accelerate sentencepiece
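After installing, it can help to confirm that the packages are visible to the interpreter that will run the conversion. The sketch below checks importability without actually importing the (large) packages; `missing_packages` is a hypothetical helper, and the list simply mirrors the pip command above.

```python
import importlib.util
from typing import Iterable, List

# Top-level module names matching the pip install command above.
REQUIRED = ["torch", "tabulate", "tqdm", "transformers", "accelerate", "sentencepiece"]


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the names that cannot be found by the current interpreter."""
    # find_spec locates a top-level module without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If anything is reported missing, the usual cause is that pip installed into a different Python than the one being run, which is why the commands above use `python3 -m pip`.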

2. Download chatglm3-6b

brew install git-lfs

git lfs install

Download the model to /Users/xxx/chatglm3-6b
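The original does not show the download command itself. One common approach, stated here as an assumption, is to clone the model repository with git once git-lfs is set up. If LFS is not active during the clone, the large weight files are left behind as small text pointer files, and the conversion step will fail. The sketch below looks for such leftover pointers; `find_lfs_pointers` is a hypothetical helper, and it relies only on the fact that a Git LFS pointer file begins with the LFS spec line.

```python
from pathlib import Path
from typing import List

# Every Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir: str) -> List[str]:
    """Return relative paths of files that are still LFS pointers, not real data."""
    root = Path(model_dir)
    pointers = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip directories and git's own metadata.
        if not path.is_file() or ".git" in rel.parts:
            continue
        with path.open("rb") as fh:
            if fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                pointers.append(str(rel))
    return pointers


if __name__ == "__main__":
    import sys

    leftover = find_lfs_pointers(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("Undownloaded LFS files:", leftover or "none")
```

An empty result suggests the weights were actually downloaded; any listed file would need to be fetched (for example with `git lfs pull`) before converting.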

3. Generate the CPU version (convert the model to a q4_0-quantized GGML file)

python3 chatglm_cpp/convert.py -i /Users/xxx/chatglm3-6b -t q4_0 -o chatglm3-ggml.bin
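The `-t q4_0` option selects GGML's 4-bit block quantization. As a rough illustration of why the converted file is so much smaller than the original weights, the sketch below does the storage arithmetic for the q4_0 layout: blocks of 32 weights, each stored as one 16-bit scale plus 32 four-bit values. The parameter count of about 6.2 billion is an assumption used only for the estimate, and the real file will differ because not every tensor is quantized.

```python
Q4_0_BLOCK_WEIGHTS = 32     # weights per q4_0 block
Q4_0_BLOCK_BYTES = 2 + 16   # one fp16 scale (2 bytes) + 32 four-bit values (16 bytes)


def q4_0_bytes(n_weights: int) -> int:
    """Bytes needed to store n_weights in q4_0, rounding up to whole blocks."""
    n_blocks = -(-n_weights // Q4_0_BLOCK_WEIGHTS)  # ceiling division
    return n_blocks * Q4_0_BLOCK_BYTES


def bits_per_weight() -> float:
    """Effective storage cost per weight, including the per-block scale."""
    return Q4_0_BLOCK_BYTES * 8 / Q4_0_BLOCK_WEIGHTS


if __name__ == "__main__":
    assumed_params = 6_200_000_000  # assumption: approximate ChatGLM3-6B size
    print(f"{bits_per_weight()} bits per weight")
    print(f"~{q4_0_bytes(assumed_params) / 1e9:.2f} GB if every weight were q4_0")
    print(f"~{assumed_params * 2 / 1e9:.2f} GB for the same weights in fp16")
```

Under this assumption the quantized weights come to roughly 3.5 GB, against about 12.4 GB in fp16, which is what makes the model practical on an M1 machine's unified memory.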

4. Enable Metal for the M1 and install chatglm-cpp

CMAKE_ARGS="-DGGML_METAL=ON" python3 -m pip install -U chatglm-cpp
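The Metal build is only useful when Python itself runs natively on Apple Silicon. An x86_64 Python running under Rosetta reports `x86_64` as its machine type, and packages built by it target that architecture. The following is a small sketch of a pre-install check; `is_native_apple_silicon` is a hypothetical helper name.

```python
import platform


def is_native_apple_silicon(system: str, machine: str) -> bool:
    """True when the interpreter reports macOS running natively on arm64."""
    return system == "Darwin" and machine == "arm64"


if __name__ == "__main__":
    system, machine = platform.system(), platform.machine()
    if is_native_apple_silicon(system, machine):
        print("Native arm64 Python on macOS: the Metal build should apply.")
    else:
        print(f"Detected {system}/{machine}: check which Python is being used.")
```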

5. Run the model's web demo

python3 examples/web_demo.py -m chatglm3-ggml.bin
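A frequent stumbling block at this step is a wrong model path, which only surfaces once the demo starts loading. The sketch below wraps the same command in a small launcher that checks for the converted file first; `web_demo_command` is a hypothetical helper, and the script path and `-m` flag are taken from the command above.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def web_demo_command(model_path: str, script: str = "examples/web_demo.py") -> List[str]:
    """Build the web demo command, failing early if the model file is missing."""
    if not Path(model_path).is_file():
        raise FileNotFoundError(f"converted model not found: {model_path}")
    # sys.executable keeps the demo on the same interpreter that has chatglm-cpp.
    return [sys.executable, script, "-m", model_path]


if __name__ == "__main__":
    model = sys.argv[1] if len(sys.argv) > 1 else "chatglm3-ggml.bin"
    subprocess.run(web_demo_command(model), check=True)
```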


Reposted from: https://blog.csdn.net/wxl781227/article/details/134325649
Copyright belongs to the original author, wxl781227.