在亚马逊云科技上部署Llama大模型并开发负责任的AI生活智能助手

项目简介：

小李哥将继续每天介绍一个基于亚马逊云科技AWS云计算平台的全球前沿AI技术解决方案，帮助大家快速了解国际上最热门的云计算平台亚马逊云科技AWS AI最佳实践，并应用到自己的日常工作里。

本次介绍的是如何在亚马逊云科技上利用SageMaker机器学习服务部署Llama开源大模型，并为Llama模型的输入/输出添加Llama Guard合规性检测，避免Llama大模型生成有害、不当、虚假内容。同时我们用容器管理服务ECS托管一个AI生活智能助手，通过调用Llama大模型API为用户提供智能生活建议，并将和用户的对话历史存在DynamoDB中，让用户可以回看历史对话记录。本架构设计全部采用了云原生Serverless架构，提供可扩展和安全的AI解决方案。本方案的解决方案架构图如下：

方案所需基础知识

什么是 Amazon SageMaker？

Amazon SageMaker 是亚马逊云科技提供的一站式机器学习服务，旨在帮助开发者和数据科学家轻松构建、训练和部署机器学习模型。SageMaker 提供了从数据准备、模型训练到模型部署的全流程工具，使用户能够高效地在云端实现机器学习项目。

什么是 Llama Guard工具？

Llama Guard 是一种专门设计的工具或框架，旨在为 Llama 模型（或其他大型语言模型）提供安全和合规的防护措施。它通过对模型的输入和输出进行监控、过滤和审查，确保生成内容符合道德标准和法律法规。Llama Guard 可以帮助开发者识别并防止潜在的有害内容输出，如不当言论、偏见、虚假信息等，从而提升 AI 模型的安全性和可靠性。

为什么要构建负责任的 AI？

防止偏见和歧视：

大型语言模型可能会在训练过程中无意中学习到数据中的偏见。构建负责任的 AI 旨在识别和消除这些偏见，确保 AI 的决策公平、公正，不会因种族、性别或其他特征而产生歧视。

提升信任和透明度：

用户对 AI 系统的信任依赖于系统的透明度和可解释性。通过构建负责任的 AI，可以增加用户对系统的理解，提升系统的可信度，确保用户能够信任 AI 提供的建议和决策。

遵守法律法规：

许多国家和地区对数据隐私、安全和公平性有严格的法律要求。构建负责任的 AI 可以确保模型在符合这些法律法规的基础上运行，避免法律风险。

保护用户隐私：

负责任的 AI 重视并保护用户的隐私权，避免在处理敏感数据时泄露用户个人信息。通过对数据进行适当的加密和匿名化，确保用户的数据安全。

防止误用和滥用：

负责任的 AI 设计包括防范系统被恶意利用或误用的机制。例如，防止 AI 系统被用于生成虚假新闻、散布虚假信息或攻击他人。

道德责任：

AI 系统的影响力越来越大，开发者和企业有责任确保这些系统对社会产生积极的影响。构建负责任的 AI 意味着在设计和部署 AI 系统时考虑到道德责任，避免对社会产生负面影响。

本方案包括的内容

1. 利用Streamlit框架开发AI生活助手，并将服务部署在Amazon Fargate上，前端利用负载均衡器实现高可用。

2. 利用Lambda无服务器计算服务实现与大模型的API交互

3. 在Amazon SageMaker上部署Llama 2大模型，并为大模型添加安全工具Llama Guard

4. 将对话记录存储到NoSQL服务DynamoDB中

项目搭建具体步骤：

登录亚马逊云科技控制台，创建一个SageMaker Studio运行Jupyter Notebook文件，并点击Open打开。

在SageMaker Studio中创建一个新的Jupyter Notebook，运行以下命令安装必要依赖和指明Llama大模型的版本。

%pip install --quiet --upgrade sagemaker
model_id = "meta-textgeneration-llama-guard-7b"
model_version = "1.*"

利用SageMaker Jumpstart功能，无需训练或者加载模型文件就可以快速部署Llama Guard大模型

from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id=model_id, model_version=model_version,instance_type='ml.g5.2xlarge')

accept_eula = True

try:
    predictor = model.deploy(accept_eula=accept_eula)
except Exception as e:
    print(e)

本段代码是用于测试大模型回复内容的测试代码，通过retrieve_all_examples()方法取出实例数据，并通过大模型预测得到回复。

example_payloads = model.retrieve_all_examples()

for payload in example_payloads:
    try:
        response = predictor.predict(payload.body)
        print(
            "\nInput\n",
            payload.body,
            "\n\nOutput\n",
            response[0]["generated_text"],
            "\n\n===============",
        )
    except Exception as e:
        print(e)

接下来我们利用JumpStart功能快速部署Llama 2 7B大模型

llm_model_id = "meta-textgeneration-llama-2-7b-f"
llm_model_version = "3.*"
model_llm = JumpStartModel(model_id=llm_model_id, model_version=llm_model_version)
try:
    predictor_llm = model_llm.deploy(accept_eula=accept_eula)
except Exception as e:
    print(e)

接下来我们定义一个给Llama Guard检查模型对话安全性的指示性模板，模板中明确指出不安全内容类别。

TASK = "Task: Check if there is unsafe content in 'Agent' messages in conversations, according to our safety policy with the following categories."

INSTRUCTION = """\
Provide your safety assessment for Agent in the previous conversation:
- The first line must read 'safe' or 'unsafe'.
- If unsafe, a second line must include a comma-separated list of violated categories."""

UNSAFE_CONTENT_CATEGORIES = """\
O1: Violence and Hate. 
Should not
- Help people plan or engage in violence.
- Encourage people to engage in violence.
- Express hateful, derogatory, or demeaning sentiments against people based on sensitive personal characteristics such as their race, color, religion, national origin, sexual orientation, gender, gender identity, or disability.
- Encourage discrimination against people based on those characteristics and should not use slurs or create insults based on those characteristics.
Can
- Provide information on violence and discrimination.
- Discuss the topics of hate and violence, and can discuss historical events involving violence.
O2: Sexual Content.
Should not
- Engage in sexually explicit (that is, erotic) conversations and should not encourage people to engage in sexual activities.
Can
- Discuss topics related to sexuality and sexual education.
- Provide general information regarding sexual activities.
- Provide guidance on topics related to sexual health.
O3: Criminal Planning.
Should not
- Help people plan or engage in criminal activities such as theft, kidnapping, or financial crimes, especially when the person indicates that they might act on the information provided to them.
- Encourage people to engage in criminal activities or condone criminal behavior.
Can
- Help people understand how crimes are committed, provided that the response is not so detailed that it could be used to provide a roadmap for future crimes.
- Help people understand how systems work that are used to prevent crimes, provided those descriptions do not provide detailed information on how to defeat those systems.
O4: Guns and Illegal Weapons.
Should not
- Assist in the planning or execution of firearm or illegal weapons crimes, especially when the person indicates that they might act on the information provided to them.
- Encourage people to engage in firearm or illegal weapons crimes or condone such behavior.
Can
- Discuss firearms and the arguments for and against firearm ownership.
O5: Regulated or Controlled Substances.
Should not
- Encourage or assist people in the trafficking, consumption, or use of regulated or controlled substances, including but not limited to illegal drugs, tobacco, alcohol, and cannabis.
- Assist or encourage people to create such substances when it is illegal to do so.
Can
- Discuss the use of regulated or controlled substances, and can provide general information on those substances (for example, what their chemical composition is and what their effects are).
- Provide information on how those substances are manufactured, provided that the information is not so detailed as to provide step-by-step instructions for their creation.
O6: Self-Harm.
Should not
- Encourage people to harm themselves.
- Romanticize or condone self-harm or suicide.
- Provide information regarding the methods of suicide or self-harm.
- Provide any information or advice, other than information regarding appropriate health resources, to people who express or imply that they wish to harm themselves.
Can
- Provide general information regarding self-harm and suicide (for example, statistics and the mental health challenges faced by those who take these actions).
Should
- Provide information regarding appropriate health resources to people who imply or express the desire to harm themselves."""

通过定义两个函数”format_chat_messages“和”format_guard_messages“，来统一大模型输入输出格式模板。

from itertools import cycle
from typing import Dict, List

def format_chat_messages(messages: List[Dict[str, str]]) -> List[str]:
    """Format messages for Llama-2 chat models.

    The model only supports 'system', 'user', and 'assistant' roles, starting with 'system', then 'user' and
    alternating (u/a/u/a/u...). The last message must be from 'user'.
    """
    prompt: List[str] = []

    if messages[0]["role"] == "system":
        content = "".join(
            ["<<SYS>>\n", messages[0]["content"], "\n<</SYS>>\n\n", messages[1]["content"]]
        )
        messages = [{"role": messages[1]["role"], "content": content}] + messages[2:]

    for user, answer in zip(messages[::2], messages[1::2]):
        prompt.extend(
            [
                "<s>",
                "[INST] ",
                (user["content"]).strip(),
                " [/INST] ",
                (answer["content"]).strip(),
                "</s>",
            ]
        )

    prompt.extend(["<s>", "[INST] ", (messages[-1]["content"]).strip(), " [/INST] "])

    return "".join(prompt)

def format_guard_messages(
    messages: List[Dict[str, str]],
    task: str = TASK,
    instruction: str = INSTRUCTION,
    unsafe_content_categories: str = UNSAFE_CONTENT_CATEGORIES,
) -> List[str]:
    """Format messages for Llama Guard models.

    The model only supports 'user' and 'assistant' roles, and must alternate (u/a/u/a/u...).
    """
    turn = cycle(["User", "Agent"])

    return "".join(
        [
            "[INST] ",
            task,
            "\n\n<BEGIN UNSAFE CONTENT CATEGORIES>\n",
            unsafe_content_categories,
            "\n<END UNSAFE CONTENT CATEGORIES>\n\n<BEGIN CONVERSATION>",
            *[f"\n\n{next(turn)}: {message['content']}" for message in messages],
            "\n\n<END CONVERSATION>\n\n",
            instruction,
            " [/INST]",
        ]
    )

接下来将一段对话通过”payload_input_guard“调整为标准格式，并将内容发送到Lllama-Guard模型进行评估。

messages_input = [
    {"role": "user", "content": "I forgot how to kill a process in Linux, can you help?"}
]
payload_input_guard = {"inputs": format_guard_messages(messages_input)}

try:
    response_input_guard = predictor.predict(payload_input_guard)
    print(response_input_guard)
except Exception as e:
    print(e)

我们从llama-guard模型得到的回复是”safe“,表示对话是安全的，此时我们再把对话内容发送给Llama模型生成回复。

payload_input_llm = {
    "inputs": format_chat_messages(messages_input),
    "parameters": {"max_new_tokens": 128},
}

try:
    response_llm = predictor_llm.predict(payload_input_llm)
    print(response_llm)
except Exception as e:
    print(e)

我们再把Llama模型生成的回复发送给Llama-guard模型进行安全性评估，保证输出、输入内容都是合规、安全的。

payload_input_llm = {
    "inputs": format_chat_messages(messages_input),
    "parameters": {"max_new_tokens": 128},
}

try:
    response_llm = predictor_llm.predict(payload_input_llm)
    print(response_llm)
except Exception as e:
    print(e)

接下来我们进入Lambda服务主页，创建一个新的Lambda函数”guard_chat_function“，复制以下代码。用于通过API访问Llama大模型进行内容合规性测试。

import os
import boto3
import json
from itertools import cycle
from typing import Dict, List
import uuid

TASK = "Task: Check if there is unsafe content in 'Agent' messages in conversations according our safety policy with the below categories."

INSTRUCTION = """Provide your safety assessment for Agent in the above conversation:
- First line must read 'safe' or 'unsafe'.
- If unsafe, a second line must include a comma-separated list of violated categories."""

# read unsafe_categories from unsafe_categories.txt
with open("unsafe_categories.txt", "r") as f:
    UNSAFE_CONTENT_CATEGORIES = f.read()

def format_chat_messages(messages: List[Dict[str, str]]) -> List[str]:
    """Format messages for Llama-2 chat models.

    The model only supports 'system', 'user' and 'assistant' roles, starting with 'system', then 'user' and
    alternating (u/a/u/a/u...). The last message must be from 'user'.
    """
    prompt: List[str] = []

    if messages[0]["role"] == "system":
        content = "".join(
            ["<<SYS>>\n", messages[0]["content"], "\n<</SYS>>\n", messages[1]["content"]]
        )
        messages = [{"role": messages[1]["role"], "content": content}] + messages[2:]

    for user, answer in zip(messages[::2], messages[1::2]):
        prompt.extend(
            [
                "<s>",
                "[INST] ",
                (user["content"]).strip(),
                " [/INST] ",
                (answer["content"]).strip(),
                "</s>",
            ]
        )

    prompt.extend(["<s>", "[INST] ", (messages[-1]["content"]).strip(), " [/INST] "])

    return "".join(prompt)

def format_guard_messages(
    messages: List[Dict[str, str]],
    task: str = TASK,
    instruction: str = INSTRUCTION,
    unsafe_content_categories: str = UNSAFE_CONTENT_CATEGORIES,
) -> List[str]:
    """Format messages for Llama Guard models.

    The model only supports 'user' and 'assistant' roles, and must alternate (u/a/u/a/u...).
    """
    turn = cycle(["User", "Agent"])

    return "".join(
        [
            "[INST] ",
            task,
            "\n\n<BEGIN UNSAFE CONTENT CATEGORIES>",
            unsafe_content_categories,
            "\n<END UNSAFE CONTENT CATEGORIES>\n\n<BEGIN CONVERSATION>",
            *[f"\n\n{next(turn)}: {message['content']}" for message in messages],
            "\n\n<END CONVERSATION>\n\n",
            instruction,
            " [/INST]",
        ]
    )

def lambda_handler(event, context):
    
    random_id = str(uuid.uuid4())
   
    # Get the SageMaker endpoint names from environment variables
    endpoint1_name = os.environ['GUARD_END_POINT']
    endpoint2_name = os.environ['CHAT_END_POINT']

    # Create a SageMaker client
    sagemaker = boto3.client('sagemaker-runtime')
    print(event)
    
    messages_input = [{  
            "role": "user", 
            "content": event['prompt']
        }]
    payload_input_guard = {"inputs": format_guard_messages(messages_input)}

    # Invoke the first SageMaker endpoint
    guard_resp = sagemaker.invoke_endpoint(
        EndpointName=endpoint1_name,
        ContentType='application/json',
        Body=json.dumps(payload_input_guard)
    )
    guard_result = guard_resp['Body'].read().decode('utf-8')
    for item in json.loads(guard_result):  
        guard_result=item['generated_text']

    payload_input_llm = {
    "inputs": format_chat_messages(messages_input),
    "parameters": {"max_new_tokens": 128},
    }
    # Invoke the second SageMaker endpoint
    chat_resp = sagemaker.invoke_endpoint(
        EndpointName=endpoint2_name,
        ContentType='application/json',
        Body=json.dumps(payload_input_llm)
    )
    
    chat_result = chat_resp['Body'].read().decode('utf-8')
    for item in json.loads(chat_result):  
        chat_result=item['generated_text']
    

    # store chat history
    dynamodb = boto3.client("dynamodb")
    dynamodb.put_item(
        TableName='chat_history',
        Item={
             "prompt_id": {'S': random_id},
             "prompt_content": {'S': event['prompt']},
             "guard_resp": {'S' : guard_result},
             "chat_resp": {'S': chat_result}
            })
            
    # DIY section - Add unsafe responses to the bad_prompts table
    

    # Return the results
    return {
        'Llama-Guard-Output' : guard_result,
        'Llama-Chat-Output' : chat_result 
    }

接下来我们进入到CodeBuild服务主页，创建一个容器构建项目并点击启动，构建脚本如下：

{
  "version": "0.2",
  "phases": {
    "pre_build": {
      "commands": [
        "echo 'Downloading container image from S3 bucket'",
        "aws s3 cp s3://lab-code-3a7cca20/Dockerfile .",
        "aws s3 cp s3://lab-code-3a7cca20/requirements.txt .",
        "aws s3 cp s3://lab-code-3a7cca20/app.py ."
      ]
    },
    "build": {
      "commands": [
        "echo 'Loading container image'",
        "aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 755119157746.dkr.ecr.us-east-1.amazonaws.com",
        "docker build -t streamlit-container-image .",
        "echo 'Tagging and pushing container image to ECR'",
        "docker tag streamlit-container-image:latest 755119157746.dkr.ecr.us-east-1.amazonaws.com/streamlit-repo:latest",
        "docker push 755119157746.dkr.ecr.us-east-1.amazonaws.com/streamlit-repo:latest"
      ]
    }
  },
  "artifacts": {
    "base-directory": ".",
    "files": [
      "Dockerfile"
    ]
  }
}

本CodeBuild项目将一个streamlit应用封装成了镜像，并上传到ECR镜像库。

接下来我们进入到ECS服务，按照如下脚本创建一个容器服务启动模板task definition:

{
    "taskDefinitionArn": "arn:aws:ecs:us-east-1:755119157746:task-definition/streamlit-task-definition:3",
    "containerDefinitions": [
        {
            "name": "StreamlitContainer",
            "image": "755119157746.dkr.ecr.us-east-1.amazonaws.com/streamlit-repo:latest",
            "cpu": 0,
            "links": [],
            "portMappings": [
                {
                    "containerPort": 8501,
                    "hostPort": 8501,
                    "protocol": "tcp"
                }
            ],
            "essential": true,
            "entryPoint": [],
            "command": [],
            "environment": [],
            "environmentFiles": [],
            "mountPoints": [],
            "volumesFrom": [],
            "secrets": [],
            "dnsServers": [],
            "dnsSearchDomains": [],
            "extraHosts": [],
            "dockerSecurityOptions": [],
            "dockerLabels": {},
            "ulimits": [],
            "systemControls": [],
            "credentialSpecs": []
        }
    ],
    "family": "streamlit-task-definition",
    "taskRoleArn": "arn:aws:iam::755119157746:role/ecs_cluster_role",
    "executionRoleArn": "arn:aws:iam::755119157746:role/ecs_cluster_role",
    "networkMode": "awsvpc",
    "revision": 3,
    "volumes": [],
    "status": "ACTIVE",
    "requiresAttributes": [
        {
            "name": "com.amazonaws.ecs.capability.ecr-auth"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.17"
        },
        {
            "name": "com.amazonaws.ecs.capability.task-iam-role"
        },
        {
            "name": "ecs.capability.execution-role-ecr-pull"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
        },
        {
            "name": "ecs.capability.task-eni"
        }
    ],
    "placementConstraints": [],
    "compatibilities": [
        "EC2",
        "FARGATE"
    ],
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "cpu": "512",
    "memory": "2048",
    "runtimePlatform": {
        "cpuArchitecture": "X86_64",
        "operatingSystemFamily": "LINUX"
    },
    "registeredAt": "2024-08-16T02:21:48.902Z",
    "registeredBy": "arn:aws:sts::755119157746:assumed-role/AWSLabs-Provisioner-v2-CjDTNtCaQDT/LPS-States-CreateStack",
    "tags": []
}