Deploying and running text-generation-webui on a Linux server (step-by-step tutorial / pitfall notes)

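I have recently been studying papers and code on LLMs (large language models), and wanted to use text-generation-webui as an interface for deploying and fine-tuning models. Because my local compute is limited, I wanted to deploy everything on a command-line-only Linux server. I could not find a complete tutorial on CSDN, so I spent a long time working it out myself before finally getting it deployed and running. The main problems I ran into were:

How to forward the web UI served on the server to a browser on my local machine;
An environment problem: loading the model failed with ImportError: .../flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi.

This post uses the CodeLlama-7b model as an example and records the deployment of the webui step by step.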

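Deploying text-generation-webui (Linux server):

About text-generation-webui: a GUI tool for deploying and fine-tuning LLMs locally.

Downloading the project:

GitHub: https://github.com/oobabooga/text-generation-webui

First download the text-generation-webui project from GitHub onto the server. You can either download it on your local machine and transfer it with a tool such as Xftp (worth trying if git fails on the server because of network problems), or download it directly on the server with git: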

git clone https://github.com/oobabooga/text-generation-webui.git

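git clone yields the project directory directly, so no extraction is needed. If you instead downloaded the project's zip archive and transferred it to the server, extract it into the target directory with unzip to get the project files:

unzip text-generation-webui-main.zip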

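Configuring the Conda environment:

Once the project is on the server, set up the environment it needs. Anaconda must be installed first (there are plenty of tutorials for that). The project requires Python 3.10 or 3.11. In the command below, [env_name] is the environment name; pick any name you like.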

conda create --name [env_name] python=3.11

Change into the text-generation-webui directory, activate the environment, and install the project's dependencies.

conda activate [env_name]
pip install -r requirements.txt
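Before the long dependency install, it can help to confirm that the activated environment really runs a supported interpreter. The following is a minimal sketch that checks only the 3.10/3.11 requirement stated above; the name python_supported is hypothetical and not part of the project.

```python
import sys

# Python versions this tutorial states the project depends on.
SUPPORTED_MINORS = {(3, 10), (3, 11)}


def python_supported(version_info=sys.version_info) -> bool:
    """Return True if the interpreter is Python 3.10 or 3.11."""
    return (version_info[0], version_info[1]) in SUPPORTED_MINORS


if __name__ == "__main__":
    found = f"{sys.version_info[0]}.{sys.version_info[1]}"
    status = "OK" if python_supported() else "unsupported for this tutorial"
    print(f"Python {found}: {status}")
```

Run it with the python from the activated environment; if it reports an unsupported version, the wrong environment is probably active.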

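Installation takes a while. Once it finishes, you also need to run the project's start script. We are on a Linux server, so the file we need is start_linux.sh; other operating systems use their own start script.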

bash start_linux.sh

That completes the installation.

Running text-generation-webui:

Running locally:

The project can be started either by running server.py or by running start_linux.sh:

python server.py

./start_linux.sh

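The startup log shows that the UI is served at 127.0.0.1:7860. On a machine with a graphical desktop you can simply open a browser at 127.0.0.1:7860 to reach the webui.

Forwarding from the server:

I am running on a command-line-only server, though, and want to open the UI in a browser on my local PC. For that, add the --listen flag so that the server accepts connections from other machines rather than only from localhost.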

python server.py --listen

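The UI can then be reached at [server address:port]. In my case the server sits on the university intranet, so I also needed the server administrator to set up forwarding for me; after that I could open the webui in a browser at [address:port].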
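If the page does not open, it is worth checking whether the port is reachable at all before suspecting the webui itself. The sketch below, run on the local PC, only tests whether a TCP connection succeeds; the function name port_is_open is hypothetical, and the host and port in the example are placeholders to replace with your own [address:port].

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder values: substitute your server address and forwarded port.
    host, port = "127.0.0.1", 7860
    print(f"{host}:{port} reachable: {port_is_open(host, port)}")
```

A False result points at the network path (firewall, missing forwarding, or the server not started with --listen) rather than at the browser.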

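Loading a model (CodeLlama-7b as the example):

The project contains a directory named models. The model we want to run has to be downloaded into a folder under models:

There are many ways to download it, for example from Hugging Face. For details see the blog post 【已解决】如何在服务器中下载huggingface模型，解决huggingface无法连接 ([Solved] How to download Hugging Face models on a server when huggingface cannot be reached).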
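One possible way to do the download from Python is the huggingface_hub package, sketched below. This is not necessarily the method the linked post uses. The repo id codellama/CodeLlama-7b-hf and the org_name folder naming are assumptions for illustration, and the helper names are hypothetical. It also assumes the server can actually reach Hugging Face.

```python
from pathlib import Path


def local_model_dir(repo_id: str, models_root: str = "models") -> Path:
    """Map 'org/name' to models/org_name (an assumed naming choice)."""
    return Path(models_root) / repo_id.replace("/", "_")


def download_model(repo_id: str, models_root: str = "models") -> Path:
    """Download all files of a Hugging Face repo into the models directory."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_model_dir(repo_id, models_root)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target


if __name__ == "__main__":
    # Assumed repo id for CodeLlama-7b; substitute the one you actually want.
    print(local_model_dir("codellama/CodeLlama-7b-hf"))
```

Calling download_model("codellama/CodeLlama-7b-hf") from the text-generation-webui directory would place the files under models, where the webui looks for them.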

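After the download finishes, open the Model tab in the webui, select the model, and click the Load button:

If all goes well, the UI reports that the model was loaded successfully.

Unfortunately, I got an error instead. (The traceback below is from someone else's report on Windows, but it is essentially the same as mine.)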

Traceback (most recent call last):
  File "C:\Users\Ma\AppData\Roaming\Python\Python310\site-packages\transformers\utils\import_utils.py", line 1353, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "D:\Anaconda\Anaconda\envs\codellama\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "C:\Users\Ma\AppData\Roaming\Python\Python310\site-packages\transformers\models\llama\modeling_llama.py", line 48, in <module>
    from flash_attn import flash_attn_func, flash_attn_varlen_func
  File "C:\Users\Ma\AppData\Roaming\Python\Python310\site-packages\flash_attn\__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "C:\Users\Ma\AppData\Roaming\Python\Python310\site-packages\flash_attn\flash_attn_interface.py", line 8, in <module>
    import flash_attn_2_cuda as flash_attn_cuda
ImportError: DLL load failed while importing flash_attn_2_cuda: 找不到指定的模块。

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "E:\模型\text-generation-webui\text-generation-webui\modules\ui_model_menu.py", line 209, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "E:\模型\text-generation-webui\text-generation-webui\modules\models.py", line 85, in load_model
    output = load_func_map[loader](model_name)
  File "E:\模型\text-generation-webui\text-generation-webui\modules\models.py", line 155, in huggingface_loader
    model = LoaderClass.from_pretrained(path_to_model, **params)
  File "C:\Users\Ma\AppData\Roaming\Python\Python310\site-packages\transformers\models\auto\auto_factory.py", line 565, in from_pretrained
    model_class = _get_model_class(config, cls._model_mapping)
  File "C:\Users\Ma\AppData\Roaming\Python\Python310\site-packages\transformers\models\auto\auto_factory.py", line 387, in _get_model_class
    supported_models = model_mapping[type(config)]
  File "C:\Users\Ma\AppData\Roaming\Python\Python310\site-packages\transformers\models\auto\auto_factory.py", line 740, in __getitem__
    return self._load_attr_from_module(model_type, model_name)
  File "C:\Users\Ma\AppData\Roaming\Python\Python310\site-packages\transformers\models\auto\auto_factory.py", line 754, in _load_attr_from_module
    return getattribute_from_module(self._modules[module_name], attr)
  File "C:\Users\Ma\AppData\Roaming\Python\Python310\site-packages\transformers\models\auto\auto_factory.py", line 698, in getattribute_from_module
    if hasattr(module, attr):
  File "C:\Users\Ma\AppData\Roaming\Python\Python310\site-packages\transformers\utils\import_utils.py", line 1343, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "C:\Users\Ma\AppData\Roaming\Python\Python310\site-packages\transformers\utils\import_utils.py", line 1355, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
DLL load failed while importing flash_attn_2_cuda: 找不到指定的模块。

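The Windows message 找不到指定的模块 in the traceback above means "The specified module could not be found." The core of my own error, on Linux, was:

ImportError: .../flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi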
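The undefined symbol is a mangled C++ name, and decoding it shows which library is involved. The sketch below is a deliberately tiny demangler that handles only plain nested names with a few builtin parameter types, which is enough for this symbol; in practice a tool such as c++filt does the same job properly. The function name demangle_simple is hypothetical.

```python
import re

# A few builtin parameter codes from the Itanium C++ ABI; not a complete table.
BUILTIN_TYPES = {"i": "int", "b": "bool", "l": "long", "f": "float", "d": "double"}


def demangle_simple(symbol: str) -> str:
    """Demangle a plain nested function name such as _ZN3c104cuda9SetDeviceEi."""
    if not symbol.startswith("_ZN"):
        raise ValueError("only _ZN...E nested names are handled in this sketch")
    pos = 3
    parts = []
    # Each component is written as <decimal length><identifier>; 'E' ends the name.
    while pos < len(symbol) and symbol[pos] != "E":
        match = re.match(r"\d+", symbol[pos:])
        if match is None:
            raise ValueError(f"unsupported component at offset {pos}")
        length = int(match.group())
        pos += len(match.group())
        parts.append(symbol[pos:pos + length])
        pos += length
    if pos >= len(symbol):
        raise ValueError("missing terminating 'E'")
    codes = symbol[pos + 1:]
    try:
        # A lone 'v' encodes an empty parameter list.
        params = [] if codes == "v" else [BUILTIN_TYPES[c] for c in codes]
    except KeyError as exc:
        raise ValueError(f"unsupported parameter code {exc}") from None
    return "::".join(parts) + "(" + ", ".join(params) + ")"


if __name__ == "__main__":
    print(demangle_simple("_ZN3c104cuda9SetDeviceEi"))  # c10::cuda::SetDevice(int)
```

c10 is the namespace of PyTorch's core library, so the compiled flash_attn extension is asking for a function that the installed torch does not export. That is the signature of a binary built against a different torch version.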

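I searched online for a long time based on this error. The few solutions I found attributed it to mismatched CUDA and PyTorch versions; see this article:

text-generation-webui加载codellama报错DLL load failed while importing flash_attn_2_cuda: 找不到指定的模块。 (text-generation-webui fails to load codellama with "DLL load failed while importing flash_attn_2_cuda")

Aligning CUDA and torch as described there did not fix my error. I then found that the real problem was the flash_attn version: as an acceleration component, flash_attn must also be built against the matching CUDA and torch versions. Absurdly, the flash_attn version installed by default from the project's requirements.txt did not match the CUDA and torch versions that the same file installed, which caused the error. It took me a long time to track this down.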

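Once the cause is known, the fix is simple: reinstall a flash_attn build that matches the environment:

flash_attn releases: https://github.com/Dao-AILab/flash-attention/releases

The releases page shows the cause: the flash_attn installed by default was version 2.6.1, in a build that supports only torch 2.0/2.1, whereas the project's torch version is 2.3, so the two did not match. I therefore downloaded a flash_attn wheel built for torch 2.3 and for cp311, matching my Python 3.11 environment.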
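The release wheels encode their build targets in the file name, so a candidate can be checked against the environment before downloading. The sketch below parses names of the shape used on that releases page; the regular expression and the helper names are my own assumptions based on the wheel name used in this post, not an official naming specification. The environment values to compare against can be read from torch.__version__ and torch.version.cuda inside the conda environment.

```python
import re

# Pattern inferred from the wheel name used in this post (an assumption).
WHEEL_RE = re.compile(
    r"flash_attn-(?P<version>[^+]+)\+cu(?P<cuda>\d+)torch(?P<torch>[\d.]+)"
    r"cxx11abi(?P<abi>TRUE|FALSE)-(?P<python>cp\d+)-cp\d+-(?P<platform>.+)\.whl"
)


def parse_wheel(filename: str) -> dict:
    """Split a flash_attn wheel file name into its build targets."""
    match = WHEEL_RE.fullmatch(filename)
    if match is None:
        raise ValueError(f"unrecognised wheel name: {filename}")
    return match.groupdict()


def wheel_matches(filename: str, torch_version: str, python_tag: str) -> bool:
    """Check the wheel against a torch version (e.g. '2.3.1+cu118') and a tag like 'cp311'."""
    info = parse_wheel(filename)
    # Wheels are built per torch major.minor, so compare only those two fields.
    torch_minor = ".".join(torch_version.split(".")[:2])
    return info["torch"] == torch_minor and info["python"] == python_tag


if __name__ == "__main__":
    name = "flash_attn-2.6.0.post1+cu118torch2.3cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"
    print(parse_wheel(name))
    print(wheel_matches(name, "2.3.1+cu118", "cp311"))
```

The check covers only the torch and Python fields; the CUDA and cxx11abi fields are parsed and printed so they can be compared by eye.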

Download the wheel to the server and install it into the conda environment used to run the project:

pip install flash_attn-2.6.0.post1+cu118torch2.3cxx11abiFALSE-cp311-cp311-linux_x86_64.whl

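The model now reports a successful load!

With that, the server runs text-generation-webui and the CodeLlama model, with the UI forwarded to my local PC.

Tags: server, llama, language model

This article is reposted from: https://blog.csdn.net/m0_52149686/article/details/140613938
Copyright belongs to the original author, ZJU_Rain; if this infringes your rights, please contact us for removal.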
