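Securely and Compliantly Building AI Large-Model Training Infrastructure and AI Application Services on Amazon Web Services

Project overview:

Xiao Li Ge continues to introduce, one per day, a cutting-edge AI solution built on the Amazon Web Services (AWS) cloud platform, to help readers quickly learn AWS AI best practices and apply them in their daily work.

This post shows how to use the Service Catalog service on AWS to create and manage application products that include large AI models, how to use permission management to restrict the cloud resources employees can access according to their identity and role, and how to create a SageMaker managed machine learning environment in which to train and deploy large models, loading model files and model container images privately and securely through VPC endpoints. The architecture is built entirely on a cloud-native serverless design and provides a scalable, secure AI solution. The architecture diagram for this solution is shown below: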

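Background knowledge for this solution

What is Amazon SageMaker?

Amazon SageMaker is a fully managed machine learning service from AWS that helps developers and data scientists build, train, and deploy machine learning models. It provides tooling for the whole workflow, from data preparation through model training to model deployment, so that users can run machine learning projects efficiently in the cloud.

What is AWS Service Catalog?

AWS Service Catalog is a service that helps organizations create, manage, and distribute a collection of approved cloud services. With Service Catalog, an organization can centrally manage approved resources and configurations so that development teams follow its best practices and compliance requirements when they use cloud services. Users pick the services they need from a predefined product catalog, which simplifies resource deployment and reduces the risk of configuration errors.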
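
As a rough illustration of how an end user might launch an approved product programmatically, the sketch below converts a plain dictionary into the `ProvisioningParameters` list that the boto3 Service Catalog `provision_product` call expects. The product ID, artifact ID, and example values are hypothetical placeholders; the parameter names are borrowed from the CloudFormation template later in this post.

```python
def to_provisioning_parameters(params):
    """Convert {"Name": value} into Service Catalog's [{"Key": ..., "Value": ...}] form."""
    return [{"Key": key, "Value": str(value)} for key, value in params.items()]


def provision_notebook(product_id, artifact_id, name, params):
    """Launch a Service Catalog product (requires AWS credentials and boto3)."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("servicecatalog")
    return client.provision_product(
        ProductId=product_id,                # hypothetical, supplied by the caller
        ProvisioningArtifactId=artifact_id,  # hypothetical, supplied by the caller
        ProvisionedProductName=name,
        ProvisioningParameters=to_provisioning_parameters(params),
    )


# Example values are illustrative only.
example_params = {
    "NotebookInstanceName": "demo-notebook",
    "NotebookInstanceType": "ml.t3.medium",
    "VolumeSizeInGB": 20,
}
print(to_provisioning_parameters(example_params))
```

Only `to_provisioning_parameters` runs here; `provision_notebook` is left uncalled because it needs a real portfolio and credentials.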

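Security and compliance benefits of building AI services with SageMaker

Meeting enterprise compliance requirements

When you build AI services with SageMaker, you can use Service Catalog to define and manage configuration templates that meet your company's compliance standards ahead of time, so that every AI model and resource deployment follows the organization's security policies and industry regulations such as GDPR or HIPAA.

Data security

SageMaker offers end-to-end encryption options, covering data at rest and data in transit, to protect sensitive data throughout the AI model lifecycle. VPC endpoints can also be used to access data in S3 privately and to load the AI model container images stored in ECR.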
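
To make the VPC endpoint idea more concrete, here is a minimal sketch that builds an endpoint policy limiting an S3 gateway endpoint to a single bucket and shows how it could be passed to the EC2 `create_vpc_endpoint` call. The bucket name, VPC ID, and route table ID are hypothetical, and the read-only action list is an assumption about what a notebook needs to load model files.

```python
import json


def s3_endpoint_policy(bucket_name):
    """Endpoint policy allowing read-only access to one bucket through the endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }


def create_s3_gateway_endpoint(vpc_id, route_table_id, region, bucket_name):
    """Create the gateway endpoint (requires AWS credentials and boto3)."""
    import boto3  # imported lazily so the policy helper works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.create_vpc_endpoint(
        VpcEndpointType="Gateway",
        VpcId=vpc_id,                         # hypothetical, supplied by the caller
        ServiceName=f"com.amazonaws.{region}.s3",
        RouteTableIds=[route_table_id],       # hypothetical, supplied by the caller
        PolicyDocument=json.dumps(s3_endpoint_policy(bucket_name)),
    )


print(json.dumps(s3_endpoint_policy("example-model-bucket"), indent=2))
```

Pulling container images from ECR privately would additionally need interface endpoints for the ECR API and Docker registry, which this sketch does not create.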

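Access control and monitoring

Through integration with AWS Identity and Access Management (IAM), you can control at a fine granularity who may access and operate on SageMaker resources. Combined with monitoring tools such as CloudTrail and CloudWatch, organizations can track and audit operations in near real time, which supports both transparency and security.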
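
The CloudFormation template later in this post scopes several permissions with an `aws:PrincipalTag/ProjectID` condition. The sketch below builds one such statement in Python to show the shape of a tag-scoped policy. The action list and resource ARN pattern are illustrative assumptions, not the exact permissions used by the template.

```python
import json


def project_scoped_statement(actions, resource_arn, project_id):
    """Allow `actions` only for principals tagged with the given ProjectID."""
    return {
        "Effect": "Allow",
        "Action": list(actions),
        "Resource": resource_arn,
        "Condition": {
            "StringEquals": {"aws:PrincipalTag/ProjectID": project_id}
        },
    }


policy = {
    "Version": "2012-10-17",
    "Statement": [
        project_scoped_statement(
            ["sagemaker:DescribeNotebookInstance", "sagemaker:StopNotebookInstance"],
            "arn:aws:sagemaker:*:*:notebook-instance/*",  # illustrative pattern
            "QuickStart007",  # the template's default ProjectID
        )
    ],
}
print(json.dumps(policy, indent=2))
```

A principal without the matching `ProjectID` tag would not be granted these actions by this statement, which is how access can follow an employee's project assignment rather than a hand-maintained user list.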

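What this solution covers

1. Privately access model files in S3 through a VPC endpoint.

2. Create an AWS Service Catalog portfolio to centrally create and manage the cloud service products offered to users.

3. As a Service Catalog end user, create a SageMaker machine learning training compute instance.

Step-by-step project setup:

  1. Sign in to the AWS console, open the serverless compute service Lambda, and create a Lambda function named "SageMakerBuild". Copy in the following code, which creates the SageMaker Jupyter Notebook used to train large AI models.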
import json
import time

import boto3
# Note: requests is not bundled with the Lambda Python runtime; package it
# with the function or provide it through a Lambda layer.
import requests

def CFTFailedResponse(event, status, message):
    print("Inside CFTFailedResponse")
    responseBody = {
        'Status': status,
        'Reason': message,
        'PhysicalResourceId': event['ServiceToken'],
        'StackId': event['StackId'],
        'RequestId': event['RequestId'],
        'LogicalResourceId': event['LogicalResourceId']
    }
    
    headers={
        'content-type':'',
        'content-length':str(len(json.dumps(responseBody)))     
    }    
    print('Response = ' + json.dumps(responseBody))
    try:    
        req=requests.put(event['ResponseURL'], data=json.dumps(responseBody),headers=headers)
        print("delete_respond_cloudformation res "+str(req))        
    except Exception as e:
        print("Failed to send cf response {}".format(e))
        
def CFTSuccessResponse(event, status, data=None):
    responseBody = {
        'Status': status,
        'Reason': 'See the details in CloudWatch Log Stream',
        'PhysicalResourceId': event['ServiceToken'],
        'StackId': event['StackId'],
        'RequestId': event['RequestId'],
        'LogicalResourceId': event['LogicalResourceId'],
        'Data': data
    }
    headers={
        'content-type':'',
        'content-length':str(len(json.dumps(responseBody)))     
    }    
    print('Response = ' + json.dumps(responseBody))
    #print(event)
    try:    
        req=requests.put(event['ResponseURL'], data=json.dumps(responseBody),headers=headers)
    except Exception as e:
        print("Failed to send cf response {}".format(e))

def lambda_handler(event, context):
    ReqStatus = "SUCCESS"
    print("Event:")
    print(event)
    client = boto3.client('sagemaker')
    ec2client = boto3.client('ec2')
    data = {}

    if event['RequestType'] == 'Create':
        try:
            ## Value Intialization from CFT ##
            project_name = event['ResourceProperties']['ProjectName']
            kmsKeyId = event['ResourceProperties']['KmsKeyId']
            Tags = event['ResourceProperties']['Tags']
            env_name = event['ResourceProperties']['ENVName']
            subnet_name = event['ResourceProperties']['Subnet']
            security_group_name = event['ResourceProperties']['SecurityGroupName']

            input_dict = {}
            input_dict['NotebookInstanceName'] = event['ResourceProperties']['NotebookInstanceName']
            input_dict['InstanceType'] = event['ResourceProperties']['NotebookInstanceType']
            input_dict['Tags'] = event['ResourceProperties']['Tags']
            input_dict['DirectInternetAccess'] = event['ResourceProperties']['DirectInternetAccess']
            input_dict['RootAccess'] = event['ResourceProperties']['RootAccess']
            input_dict['VolumeSizeInGB'] = int(event['ResourceProperties']['VolumeSizeInGB'])
            input_dict['RoleArn'] = event['ResourceProperties']['RoleArn']
            input_dict['LifecycleConfigName'] = event['ResourceProperties']['LifecycleConfigName']

        except Exception as e:
            print(e)
            ReqStatus = "FAILED"
            message = "Parameter Error: "+str(e)
            CFTFailedResponse(event, "FAILED", message)
        if ReqStatus == "FAILED":
            return None
        print("Validating Environment name: "+env_name)
        print("Subnet Id Fetching.....")
        try:
            ## SageMaker subnet: looked up by the Name tag passed in from the template ##
            print(subnet_name)
            response = ec2client.describe_subnets(
                Filters=[
                    {
                        'Name': 'tag:Name',
                        'Values': [
                            subnet_name
                        ]
                    },
                ]
            )
            #print(response)
            subnetId = response['Subnets'][0]['SubnetId']
            input_dict['SubnetId'] = subnetId
            print("Desc subnet done!!")
        except Exception as e:
            print(e)
            ReqStatus = "FAILED"
            message = " Project Name is invalid - Subnet Error: "+str(e)
            CFTFailedResponse(event, "FAILED", message)
        if ReqStatus == "FAILED":
            return None
        ## Sagemaker Security group ##
        print("Security GroupId Fetching.....")
        try:
            # The security group is looked up by the Name tag passed in from the template
            response = ec2client.describe_security_groups(
                Filters=[
                    {
                        'Name': 'tag:Name',
                        'Values': [
                            security_group_name
                        ]
                    },
                ]
            )
            sgId = response['SecurityGroups'][0]['GroupId']
            input_dict['SecurityGroupIds'] = [sgId]
            print("Desc sg done!!")
        except Exception as e:
            print(e)
            ReqStatus = "FAILED"
            message = "Security Group ID Error: "+str(e)
            CFTFailedResponse(event, "FAILED", message)
        if ReqStatus == "FAILED":
            return None
        try:
            if kmsKeyId:
                input_dict['KmsKeyId'] = kmsKeyId
            else:
                print("in else")
                
            print(input_dict)
            instance = client.create_notebook_instance(**input_dict)
            print('SageMaker API response')
            print(str(instance))
            responseData = {'NotebookInstanceArn': instance['NotebookInstanceArn']}
            
            NotebookStatus = 'Pending'
            response = client.describe_notebook_instance(
                NotebookInstanceName=event['ResourceProperties']['NotebookInstanceName']
            )
            NotebookStatus = response['NotebookInstanceStatus']
            print("NotebookStatus:"+NotebookStatus)
            
            ## Notebook Failure ##
            if NotebookStatus == 'Failed':
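                # FailureReason may be missing from the response, so read it defensively
                message = NotebookStatus+": "+response.get('FailureReason', 'no failure reason reported')+" :Notebook is not coming InService"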
                CFTFailedResponse(event, "FAILED", message)
            else:
                while NotebookStatus == 'Pending':
                    time.sleep(200)
                    response = client.describe_notebook_instance(
                        NotebookInstanceName=event['ResourceProperties']['NotebookInstanceName']
                    )
                    NotebookStatus = response['NotebookInstanceStatus']
                    print("NotebookStatus in loop:"+NotebookStatus)
                
                ## Notebook Success ##
                if NotebookStatus == 'InService':
                    data['Message'] = "SageMaker Notebook name - "+event['ResourceProperties']['NotebookInstanceName']+" created successfully"
                    print("message InService :",data['Message'])
                    CFTSuccessResponse(event, "SUCCESS", data)
                else:
                    # FailureReason is absent for non-failed states such as Stopped
                    message = NotebookStatus+": "+response.get('FailureReason', 'no failure reason reported')+" :Notebook is not coming InService"
                    print("message :",message)
                    CFTFailedResponse(event, "FAILED", message)
        except Exception as e:
            print(e)
            ReqStatus = "FAILED"
            CFTFailedResponse(event, "FAILED", str(e))
    if event['RequestType'] == 'Delete':
        NotebookStatus = None
        lifecycle_config = event['ResourceProperties']['LifecycleConfigName']
        NotebookName = event['ResourceProperties']['NotebookInstanceName']

        try:
            response = client.describe_notebook_instance(
                NotebookInstanceName=NotebookName
            )
            NotebookStatus = response['NotebookInstanceStatus']
            print("Notebook Status - "+NotebookStatus)
        except Exception as e:
            print(e)
            NotebookStatus = "Invalid"
            #CFTFailedResponse(event, "FAILED", str(e))
        while NotebookStatus == 'Pending':
            time.sleep(30)
            response = client.describe_notebook_instance(
                NotebookInstanceName=NotebookName
            )
            NotebookStatus = response['NotebookInstanceStatus']
            print("NotebookStatus:"+NotebookStatus)
        if NotebookStatus != 'Failed' and NotebookStatus != 'Invalid' :
            print("Delete request for Notebook name: "+NotebookName)
            print("Stopping the Notebook.....")
            if NotebookStatus != 'Stopped':
                try:
                    response = client.stop_notebook_instance(
                        NotebookInstanceName=NotebookName
                    )
                    NotebookStatus = 'Stopping'
                    print("Notebook Status - "+NotebookStatus)
                    while NotebookStatus == 'Stopping':
                        time.sleep(30)
                        response = client.describe_notebook_instance(
                            NotebookInstanceName=NotebookName
                        )
                        NotebookStatus = response['NotebookInstanceStatus']
                    print("NotebookStatus:"+NotebookStatus)
                except Exception as e:
                    print(e)
                    NotebookStatus = "Invalid"
                    CFTFailedResponse(event, "FAILED", str(e))
                    # Stop here so a SUCCESS response is not also sent below
                    return None
                
            else:
                NotebookStatus = 'Stopped'
                print("NotebookStatus:"+NotebookStatus)
        
        if NotebookStatus != 'Invalid':
            print("Deleting The Notebook......")
            time.sleep(5)
            try:
                response = client.delete_notebook_instance(
                    NotebookInstanceName=NotebookName
                )
                print("Notebook Deleted")
                data["Message"] = "Notebook Deleted"
                CFTSuccessResponse(event, "SUCCESS", data)
            except Exception as e:
                print(e)
                CFTFailedResponse(event, "FAILED", str(e))
            
        else:
            print("Notebook Invalid status")
            data["Message"] = "Notebook is not available"
            CFTSuccessResponse(event, "SUCCESS", data)
    
    if event['RequestType'] == 'Update':
        print("Update operation for Sagemaker Notebook is not recommended")
        data["Message"] = "Update operation for Sagemaker Notebook is not recommended"
        CFTSuccessResponse(event, "SUCCESS", data)
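
The handler above polls with fixed `time.sleep` calls and no upper bound, while a Lambda function can run for at most 15 minutes, so a slow notebook could outlast the function before any response reaches CloudFormation. One possible alternative, sketched here under the assumption that the status lookup is wrapped in a callable, is a polling helper with an explicit deadline. The name `wait_for_status` and its parameters are hypothetical and not part of the original function.

```python
import time


def wait_for_status(get_status, pending=("Pending",), timeout_s=840,
                    interval_s=30, sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it leaves the pending states or the deadline passes.

    Returns the last observed status, or "TimedOut" if another wait would
    exceed the deadline.
    """
    deadline = clock() + timeout_s
    status = get_status()
    while status in pending:
        if clock() + interval_s > deadline:
            return "TimedOut"
        sleep(interval_s)
        status = get_status()
    return status


# Usage with a fake status source. In the handler, get_status could wrap
# client.describe_notebook_instance(...)['NotebookInstanceStatus'].
fake_statuses = iter(["Pending", "Pending", "InService"])
print(wait_for_status(lambda: next(fake_statuses), sleep=lambda s: None))
```

On a "TimedOut" result the handler could send a FAILED response to CloudFormation instead of letting the function be killed mid-wait.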
        
    
        
            
  2. Next, create a YAML template, copy in the following code, and upload it to an S3 bucket. CloudFormation uses it to create the SageMaker Jupyter Notebook as infrastructure as code (IaC).
AWSTemplateFormatVersion: 2010-09-09
Description: Template to create a SageMaker notebook
Metadata:
  'AWS::CloudFormation::Interface':
    ParameterGroups:
      - Label:
          default: Environment detail
        Parameters:
          - ENVName
      - Label:
          default: SageMaker Notebook configuration
        Parameters:
          - NotebookInstanceName
          - NotebookInstanceType
          - DirectInternetAccess
          - RootAccess
          - VolumeSizeInGB
      - Label:
          default: Load S3 Bucket to SageMaker
        Parameters:
          - S3CodePusher
          - CodeBucketName
      - Label:
          default: Project detail
        Parameters:
          - ProjectName
          - ProjectID
    ParameterLabels:
      DirectInternetAccess:
        default: Direct Internet Access
      NotebookInstanceName:
        default: Notebook Instance Name
      NotebookInstanceType:
        default: Notebook Instance Type
      ENVName:
        default: Environment Name
      ProjectName:
        default: Project Suffix
      RootAccess:
        default: Root access
      VolumeSizeInGB:
        default: Volume size for the SageMaker Notebook
      ProjectID:
        default: SageMaker ProjectID
      CodeBucketName:
        default: Code Bucket Name        
      S3CodePusher:
        default: Copy code from S3 to SageMaker
Parameters:
  SubnetName:
    Default: ProSM-ResourceSubnet
    Description: Name tag of the subnet for the notebook instance
    Type: String
  SecurityGroupName:
    Default: ProSM-ResourceSG
    Description: Security Group Name
    Type: String
  SageMakerBuildFunctionARN:
    Description: Service Token Value passed from Lambda Stack
    Type: String
  NotebookInstanceName:
    AllowedPattern: '[A-Za-z0-9-]{1,63}'
    ConstraintDescription: >-
      Maximum of 63 alphanumeric characters. Can include hyphens (-), but not
      spaces. Must be unique within your account in an AWS Region.
    Description: SageMaker Notebook instance name
    MaxLength: '63'
    MinLength: '1'
    Type: String
  NotebookInstanceType:
    ConstraintDescription: Must select a valid notebook instance type.
    Default: ml.t3.medium
    Description: Select Instance type for the SageMaker Notebook
    Type: String
  ENVName:
    Description: SageMaker infrastructure naming convention
    Type: String
  ProjectName:
    Description: >-
      The suffix appended to all resources in the stack.  This will allow
      multiple copies of the same stack to be created in the same account.
    Type: String
  RootAccess:
    Description: Root access for the SageMaker Notebook user
    AllowedValues:
      - Enabled
      - Disabled
    Default: Enabled
    Type: String
  VolumeSizeInGB:
    Description: >-
      The size, in GB, of the ML storage volume to attach to the notebook
      instance. The default value is 20 GB.
    Type: Number
    Default: '20'
  DirectInternetAccess:
    Description: >-
      If you set this to Disabled this notebook instance will be able to access
      resources only in your VPC. As per the Project requirement, we have
      Disabled it.
    Type: String
    Default: Disabled
    AllowedValues:
      - Disabled
    ConstraintDescription: Must be Disabled.
  ProjectID:
    Type: String
    Description: Enter a valid ProjectID.
    Default: QuickStart007
  S3CodePusher:
    Description: Do you want to load the code from S3 to SageMaker Notebook
    Default: 'NO'
    AllowedValues:
      - 'YES'
      - 'NO'
    Type: String
  CodeBucketName:
    Description: S3 Bucket name from which you want to copy the code to SageMaker.
    Default: lab-materials-bucket-1234
    Type: String    
Conditions:
  BucketCondition: !Equals 
    - 'YES'
    - !Ref S3CodePusher
Resources:
  SagemakerKMSKey:
    Type: 'AWS::KMS::Key'
    Properties:
      EnableKeyRotation: true
      Tags:
        - Key: ProjectID
          Value: !Ref ProjectID
        - Key: ProjectName
          Value: !Ref ProjectName
      KeyPolicy:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            AWS: !Sub 'arn:aws:iam::${AWS::AccountId}:root'
          Action: 
            - 'kms:Encrypt'
            - 'kms:PutKeyPolicy' 
            - 'kms:CreateKey' 
            - 'kms:GetKeyRotationStatus' 
            - 'kms:DeleteImportedKeyMaterial' 
            - 'kms:GetKeyPolicy' 
            - 'kms:UpdateCustomKeyStore' 
            - 'kms:GenerateRandom' 
            - 'kms:UpdateAlias'
            - 'kms:ImportKeyMaterial'
            - 'kms:ListRetirableGrants' 
            - 'kms:CreateGrant' 
            - 'kms:DeleteAlias'
            - 'kms:RetireGrant'
            - 'kms:ScheduleKeyDeletion' 
            - 'kms:DisableKeyRotation' 
            - 'kms:TagResource' 
            - 'kms:CreateAlias' 
            - 'kms:EnableKeyRotation' 
            - 'kms:DisableKey'
            - 'kms:ListResourceTags'
            - 'kms:Verify' 
            - 'kms:DeleteCustomKeyStore'
            - 'kms:Sign' 
            - 'kms:ListKeys'
            - 'kms:ListGrants'
            - 'kms:ListAliases' 
            - 'kms:ReEncryptTo' 
            - 'kms:UntagResource' 
            - 'kms:GetParametersForImport'
            - 'kms:ListKeyPolicies'
            - 'kms:GenerateDataKeyPair'
            - 'kms:GenerateDataKeyPairWithoutPlaintext' 
            - 'kms:GetPublicKey' 
            - 'kms:Decrypt' 
            - 'kms:ReEncryptFrom'
            - 'kms:DisconnectCustomKeyStore' 
            - 'kms:DescribeKey'
            - 'kms:GenerateDataKeyWithoutPlaintext'
            - 'kms:DescribeCustomKeyStores' 
            - 'kms:CreateCustomKeyStore'
            - 'kms:EnableKey'
            - 'kms:RevokeGrant'
            - 'kms:UpdateKeyDescription' 
            - 'kms:ConnectCustomKeyStore' 
            - 'kms:CancelKeyDeletion' 
            - 'kms:GenerateDataKey'
          Resource:
            - !Join 
              - ''
              - - 'arn:aws:kms:'
                - !Ref 'AWS::Region'
                - ':'
                - !Ref 'AWS::AccountId'
                - ':key/*'
        - Sid: Allow access for Key Administrators
          Effect: Allow
          Principal:
            AWS: 
              - !GetAtt SageMakerExecutionRole.Arn
          Action:
            - 'kms:CreateAlias'
            - 'kms:CreateKey'
            - 'kms:CreateGrant' 
            - 'kms:CreateCustomKeyStore'
            - 'kms:DescribeKey'
            - 'kms:DescribeCustomKeyStores'
            - 'kms:EnableKey'
            - 'kms:EnableKeyRotation'
            - 'kms:ListKeys'
            - 'kms:ListAliases'
            - 'kms:ListKeyPolicies'
            - 'kms:ListGrants'
            - 'kms:ListRetirableGrants'
            - 'kms:ListResourceTags'
            - 'kms:PutKeyPolicy'
            - 'kms:UpdateAlias'
            - 'kms:UpdateKeyDescription'
            - 'kms:UpdateCustomKeyStore'
            - 'kms:RevokeGrant'
            - 'kms:DisableKey'
            - 'kms:DisableKeyRotation'
            - 'kms:GetPublicKey'
            - 'kms:GetKeyRotationStatus'
            - 'kms:GetKeyPolicy'
            - 'kms:GetParametersForImport'
            - 'kms:DeleteCustomKeyStore'
            - 'kms:DeleteImportedKeyMaterial'
            - 'kms:DeleteAlias'
            - 'kms:TagResource'
            - 'kms:UntagResource'
            - 'kms:ScheduleKeyDeletion'
            - 'kms:CancelKeyDeletion'
          Resource:
            - !Join 
              - ''
              - - 'arn:aws:kms:'
                - !Ref 'AWS::Region'
                - ':'
                - !Ref 'AWS::AccountId'
                - ':key/*'
        - Sid: Allow use of the key
          Effect: Allow
          Principal:
            AWS: 
              - !GetAtt SageMakerExecutionRole.Arn

          Action:
            - kms:Encrypt
            - kms:Decrypt
            - kms:ReEncryptTo
            - kms:ReEncryptFrom
            - kms:GenerateDataKeyPair
            - kms:GenerateDataKeyPairWithoutPlaintext
            - kms:GenerateDataKeyWithoutPlaintext
            - kms:GenerateDataKey
            - kms:DescribeKey
          Resource:
            - !Join 
              - ''
              - - 'arn:aws:kms:'
                - !Ref 'AWS::Region'
                - ':'
                - !Ref 'AWS::AccountId'
                - ':key/*'
        - Sid: Allow attachment of persistent resources
          Effect: Allow
          Principal:
            AWS: 
              - !GetAtt SageMakerExecutionRole.Arn

          Action:
            - kms:CreateGrant
            - kms:ListGrants
            - kms:RevokeGrant
          Resource:
            - !Join 
              - ''
              - - 'arn:aws:kms:'
                - !Ref 'AWS::Region'
                - ':'
                - !Ref 'AWS::AccountId'
                - ':key/*'
          Condition:
            Bool:
              kms:GrantIsForAWSResource: 'true'
  KeyAlias:
    Type: AWS::KMS::Alias
    Properties:
      AliasName: 'alias/SageMaker-CMK-DS'
      TargetKeyId:
        Ref: SagemakerKMSKey
  SageMakerExecutionRole:
    Type: 'AWS::IAM::Role'
    Properties:
      Tags:
        - Key: ProjectID
          Value: !Ref ProjectID
        - Key: ProjectName
          Value: !Ref ProjectName
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - sagemaker.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      Path: /
      Policies:
        - PolicyName: !Join 
            - ''
            - - !Ref ProjectName
              - SageMakerExecutionPolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - 'iam:ListRoles'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:iam::'
                      - !Ref 'AWS::AccountId'
                      - ':role/*'
              - Sid: CloudArnResource
                Effect: Allow
                Action:
                  - 'application-autoscaling:DeleteScalingPolicy'
                  - 'application-autoscaling:DeleteScheduledAction'
                  - 'application-autoscaling:DeregisterScalableTarget'
                  - 'application-autoscaling:DescribeScalableTargets'
                  - 'application-autoscaling:DescribeScalingActivities'
                  - 'application-autoscaling:DescribeScalingPolicies'
                  - 'application-autoscaling:DescribeScheduledActions'
                  - 'application-autoscaling:PutScalingPolicy'
                  - 'application-autoscaling:PutScheduledAction'
                  - 'application-autoscaling:RegisterScalableTarget'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:autoscaling:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':*'
              - Sid: ElasticArnResource
                Effect: Allow
                Action:
                  - 'elastic-inference:Connect'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:elastic-inference:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':elastic-inference-accelerator/*'  
              - Sid: SNSArnResource
                Effect: Allow
                Action:
                  - 'sns:ListTopics'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:sns:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':*'
              - Sid: logsArnResource
                Effect: Allow
                Action:
                  - 'cloudwatch:DeleteAlarms'
                  - 'cloudwatch:DescribeAlarms'
                  - 'cloudwatch:GetMetricData'
                  - 'cloudwatch:GetMetricStatistics'
                  - 'cloudwatch:ListMetrics'
                  - 'cloudwatch:PutMetricAlarm'
                  - 'cloudwatch:PutMetricData'
                  - 'logs:CreateLogGroup'
                  - 'logs:CreateLogStream'
                  - 'logs:DescribeLogStreams'
                  - 'logs:GetLogEvents'
                  - 'logs:PutLogEvents'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:logs:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':log-group:/aws/lambda/*'
              - Sid: KmsArnResource
                Effect: Allow
                Action:
                  - 'kms:DescribeKey'
                  - 'kms:ListAliases'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:kms:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':key/*'
              - Sid: ECRArnResource
                Effect: Allow
                Action:
                  - 'ecr:BatchCheckLayerAvailability'
                  - 'ecr:BatchGetImage'
                  - 'ecr:CreateRepository'
                  - 'ecr:GetAuthorizationToken'
                  - 'ecr:GetDownloadUrlForLayer'
                  - 'ecr:DescribeRepositories'
                  - 'ecr:DescribeImageScanFindings'
                  - 'ecr:DescribeRegistry'
                  - 'ecr:DescribeImages'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:ecr:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':repository/*'
              - Sid: EC2ArnResource
                Effect: Allow
                Action:        
                  - 'ec2:CreateNetworkInterface'
                  - 'ec2:CreateNetworkInterfacePermission'
                  - 'ec2:DeleteNetworkInterface'
                  - 'ec2:DeleteNetworkInterfacePermission'
                  - 'ec2:DescribeDhcpOptions'
                  - 'ec2:DescribeNetworkInterfaces'
                  - 'ec2:DescribeRouteTables'
                  - 'ec2:DescribeSecurityGroups'
                  - 'ec2:DescribeSubnets'
                  - 'ec2:DescribeVpcEndpoints'
                  - 'ec2:DescribeVpcs'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:ec2:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':instance/*'
              - Sid: S3ArnResource
                Effect: Allow
                Action: 
                  - 's3:CreateBucket'
                  - 's3:GetBucketLocation'
                  - 's3:ListBucket'       
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:s3::'
                      - ':*sagemaker*'                  
              - Sid: LambdaInvokePermission
                Effect: Allow
                Action:
                  - 'lambda:ListFunctions'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:lambda:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':function'
                      - ':*'
              - Effect: Allow
                Action: 'sagemaker:InvokeEndpoint'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:sagemaker:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':notebook-instance-lifecycle-config/*'
                Condition:
                  StringEquals:
                    'aws:PrincipalTag/ProjectID': !Ref ProjectID
              - Effect: Allow
                Action:
                  - 'sagemaker:CreateTrainingJob'
                  - 'sagemaker:CreateEndpoint'
                  - 'sagemaker:CreateModel'
                  - 'sagemaker:CreateEndpointConfig'
                  - 'sagemaker:CreateHyperParameterTuningJob'
                  - 'sagemaker:CreateTransformJob'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:sagemaker:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':notebook-instance-lifecycle-config/*'
                Condition:
                  StringEquals:
                    'aws:PrincipalTag/ProjectID': !Ref ProjectID
                  'ForAllValues:StringEquals':
                    'aws:TagKeys':
                      - Username
              - Effect: Allow
                Action:
                  - 'sagemaker:DescribeTrainingJob'
                  - 'sagemaker:DescribeEndpoint'
                  - 'sagemaker:DescribeEndpointConfig'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:sagemaker:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':notebook-instance-lifecycle-config/*'
                Condition:
                  StringEquals:
                    'aws:PrincipalTag/ProjectID': !Ref ProjectID
              - Effect: Allow
                Action:
                  - 'sagemaker:DeleteTags'
                  - 'sagemaker:ListTags'
                  - 'sagemaker:DescribeNotebookInstance'
                  - 'sagemaker:ListNotebookInstanceLifecycleConfigs'
                  - 'sagemaker:DescribeModel'
                  - 'sagemaker:ListTrainingJobs'
                  - 'sagemaker:DescribeHyperParameterTuningJob'
                  - 'sagemaker:UpdateEndpointWeightsAndCapacities'
                  - 'sagemaker:ListHyperParameterTuningJobs'
                  - 'sagemaker:ListEndpointConfigs'
                  - 'sagemaker:DescribeNotebookInstanceLifecycleConfig'
                  - 'sagemaker:ListTrainingJobsForHyperParameterTuningJob'
                  - 'sagemaker:StopHyperParameterTuningJob'
                  - 'sagemaker:DescribeEndpointConfig'
                  - 'sagemaker:ListModels'
                  - 'sagemaker:AddTags'
                  - 'sagemaker:ListNotebookInstances'
                  - 'sagemaker:StopTrainingJob'
                  - 'sagemaker:ListEndpoints'
                  - 'sagemaker:DeleteEndpoint'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:sagemaker:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':notebook-instance-lifecycle-config/*'
                Condition:
                  StringEquals:
                    'aws:PrincipalTag/ProjectID': !Ref ProjectID
              - Effect: Allow
                Action:
                  - 'ecr:SetRepositoryPolicy'
                  - 'ecr:CompleteLayerUpload'
                  - 'ecr:BatchDeleteImage'
                  - 'ecr:UploadLayerPart'
                  - 'ecr:DeleteRepositoryPolicy'
                  - 'ecr:InitiateLayerUpload'
                  - 'ecr:DeleteRepository'
                  - 'ecr:PutImage'
                Resource: 
                  - !Join 
                    - ''
                    - - 'arn:aws:ecr:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':repository/*sagemaker*'
              - Effect: Allow
                Action:
                  - 's3:GetObject'
                  - 's3:ListBucket'
                  - 's3:PutObject'
                  - 's3:DeleteObject'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:s3:::'
                      - !Ref SagemakerS3Bucket
                  - !Join 
                    - ''
                    - - 'arn:aws:s3:::'
                      - !Ref SagemakerS3Bucket
                      - '/*'
                Condition:
                  StringEquals:
                    'aws:PrincipalTag/ProjectID': !Ref ProjectID
              - Effect: Allow
                Action: 'iam:PassRole'
                Resource:
                  - !Join 
                    - ''
                    - - 'arn:aws:iam::'
                      - !Ref 'AWS::AccountId'
                      - ':role/*'
                Condition:
                  StringEquals:
                    'iam:PassedToService': sagemaker.amazonaws.com
  CodeBucketPolicy:
    Type: 'AWS::IAM::Policy'
    Condition: BucketCondition
    Properties:
      PolicyName: !Join 
        - ''
        - - !Ref ProjectName
          - CodeBucketPolicy
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Action:
              - 's3:GetObject'
            Resource:
              - !Join 
                - ''
                - - 'arn:aws:s3:::'
                  - !Ref CodeBucketName
              - !Join 
                - ''
                - - 'arn:aws:s3:::'
                  - !Ref CodeBucketName
                  - '/*'
      Roles:
        - !Ref SageMakerExecutionRole
  SagemakerS3Bucket:
    Type: 'AWS::S3::Bucket'
    Properties:
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: AES256
      Tags:
        - Key: ProjectID
          Value: !Ref ProjectID
        - Key: ProjectName
          Value: !Ref ProjectName
  S3Policy:
    Type: 'AWS::S3::BucketPolicy'
    Properties:
      Bucket: !Ref SagemakerS3Bucket
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Sid: AllowAccessFromVPCEndpoint
            Effect: Allow
            Principal: "*"
            Action:
              - 's3:Get*'
              - 's3:Put*'
              - 's3:List*'
              - 's3:DeleteObject'
            Resource:
              - !Join 
                - ''
                - - 'arn:aws:s3:::'
                  - !Ref SagemakerS3Bucket
              - !Join 
                - ''
                - - 'arn:aws:s3:::'
                  - !Ref SagemakerS3Bucket
                  - '/*'
            Condition:
              StringEquals:
                "aws:sourceVpce": "<PASTE S3 VPC ENDPOINT ID>"
  EFSLifecycleConfig:
    Type: 'AWS::SageMaker::NotebookInstanceLifecycleConfig'
    Properties:
      NotebookInstanceLifecycleConfigName: 'Provisioned-LC'
      OnCreate:
        - Content: !Base64 
            'Fn::Join':
              - ''
              - - |
                  #!/bin/bash 
                - |
                  aws configure set sts_regional_endpoints regional 
                - yes | cp -rf ~/.aws/config /home/ec2-user/.aws/config
      OnStart:
        - Content: !Base64 
            'Fn::Join':
              - ''
              - - |
                  #!/bin/bash  
                - |
                  aws configure set sts_regional_endpoints regional 
                - yes | cp -rf ~/.aws/config /home/ec2-user/.aws/config  
  EFSLifecycleConfigForS3:
    Type: 'AWS::SageMaker::NotebookInstanceLifecycleConfig'
    Properties:
      NotebookInstanceLifecycleConfigName: 'Provisioned-LC-S3'
      OnCreate:
        - Content: !Base64 
            'Fn::Join':
              - ''
              - - |
                  #!/bin/bash 
                - |
                  # Copy Content
                - !Sub >
                  aws s3 cp s3://${CodeBucketName} /home/ec2-user/SageMaker/ --recursive 
                - |
                  # Set sts endpoint
                - >
                  aws configure set sts_regional_endpoints regional 
                - yes | cp -rf ~/.aws/config /home/ec2-user/.aws/config
      OnStart:
        - Content: !Base64 
            'Fn::Join':
              - ''
              - - |
                  #!/bin/bash  
                - |
                  aws configure set sts_regional_endpoints regional 
                - yes | cp -rf ~/.aws/config /home/ec2-user/.aws/config  
  SageMakerCustomResource:
    Type: 'Custom::SageMakerCustomResource'
    DependsOn: S3Policy
    Properties:
      ServiceToken: !Ref SageMakerBuildFunctionARN
      NotebookInstanceName: !Ref NotebookInstanceName
      NotebookInstanceType: !Ref NotebookInstanceType
      KmsKeyId: !Ref SagemakerKMSKey
      ENVName: !Join 
        - ''
        - - !Ref ENVName
          - Subnet1Id
      Subnet: !Ref SubnetName
      SecurityGroupName: !Ref SecurityGroupName
      ProjectName: !Ref ProjectName
      RootAccess: !Ref RootAccess
      VolumeSizeInGB: !Ref VolumeSizeInGB
      LifecycleConfigName: !If [BucketCondition, !GetAtt EFSLifecycleConfigForS3.NotebookInstanceLifecycleConfigName, !GetAtt EFSLifecycleConfig.NotebookInstanceLifecycleConfigName]  
      DirectInternetAccess: !Ref DirectInternetAccess
      RoleArn: !GetAtt 
        - SageMakerExecutionRole
        - Arn
      Tags:
        - Key: ProjectID
          Value: !Ref ProjectID
        - Key: ProjectName
          Value: !Ref ProjectName
Outputs:
  Message:
    Description: Execution Status
    Value: !GetAtt 
      - SageMakerCustomResource
      - Message
  SagemakerKMSKey:
    Description: KMS Key for encrypting Sagemaker resource
    Value: !Ref KeyAlias
  ExecutionRoleArn:
    Description: ARN of the Sagemaker Execution Role
    Value: !GetAtt SageMakerExecutionRole.Arn
  S3BucketName:
    Description: S3 bucket for SageMaker Notebook operation
    Value: !Ref SagemakerS3Bucket
  NotebookInstanceName:
    Description: Name of the Sagemaker Notebook instance created
    Value: !Ref NotebookInstanceName
  ProjectName:
    Description: Project name used for SageMaker deployment
    Value: !Ref ProjectName
  ProjectID:
    Description: Project ID used for SageMaker deployment
    Value: !Ref ProjectID
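  1. Next, open the VPC service home page, go to Endpoints, and click Create endpoint to create a VPC endpoint that lets SageMaker access the large-model files in the S3 bucket privately and securely.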

  1. Name the endpoint "s3-endpoint", select AWS service as the type of service the endpoint connects to, and choose s3 as the service.

  1. Select the VPC the endpoint belongs to, configure the route tables, and finally click Create.
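
The three console steps above can also be scripted. The following is a minimal sketch with boto3, not the method used in this walkthrough. It assumes the endpoint is a gateway endpoint (the type that is attached to route tables) and that the region is in the standard `aws` partition; the function names and the VPC and route table IDs are hypothetical placeholders.

```python
def s3_gateway_endpoint_request(region, vpc_id, route_table_ids, name="s3-endpoint"):
    """Build the keyword arguments for EC2 create_vpc_endpoint (S3 gateway endpoint)."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        # Assumption: standard partition naming for the S3 service.
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
        "TagSpecifications": [
            {
                "ResourceType": "vpc-endpoint",
                "Tags": [{"Key": "Name", "Value": name}],
            }
        ],
    }


def create_s3_gateway_endpoint(region, vpc_id, route_table_ids):
    """Create the endpoint and return its ID. Requires AWS credentials."""
    import boto3  # imported lazily so the request builder works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.create_vpc_endpoint(
        **s3_gateway_endpoint_request(region, vpc_id, route_table_ids)
    )
    return response["VpcEndpoint"]["VpcEndpointId"]


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    print(s3_gateway_endpoint_request("us-east-1", "vpc-0123456789abcdef0", ["rtb-0123456789abcdef0"]))
```

The endpoint ID returned by `create_s3_gateway_endpoint` (or shown in the console) is the value that replaces the `<PASTE S3 VPC ENDPOINT ID>` placeholder in the `S3Policy` resource of the CloudFormation template.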

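  1. Next, open the AWS Service Catalog home page, go to Portfolios, and click Create to create a new portfolio, which manages a complete service made up of different cloud resources in one place.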

  1. Name the portfolio "SageMakerPortfolio" and set the owner to CQ.

  1. Next, add cloud resources to the portfolio by clicking "Create product".

  1. Choose to create the product's cloud resources from a CloudFormation IaC template, name the product "SageMakerProduct", and set the owner to CQ.

  1. Add the CloudFormation template file to the product: enter the URL of the CloudFormation template that we uploaded to S3 in step 2, set the version to 1, and click Create to create the product.
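
For readers who prefer the API over the console, here is a rough boto3 sketch of the portfolio and product steps above. The names "SageMakerPortfolio", "SageMakerProduct", owner CQ, and version 1 come from this walkthrough; the function names and the template URL are hypothetical, and the idempotency tokens are left for boto3 to fill in.

```python
def portfolio_request(display_name="SageMakerPortfolio", provider="CQ"):
    """Keyword arguments for servicecatalog.create_portfolio."""
    return {"DisplayName": display_name, "ProviderName": provider}


def product_request(template_url, name="SageMakerProduct", owner="CQ", version="1"):
    """Keyword arguments for servicecatalog.create_product (CloudFormation product)."""
    return {
        "Name": name,
        "Owner": owner,
        "ProductType": "CLOUD_FORMATION_TEMPLATE",
        "ProvisioningArtifactParameters": {
            "Name": version,
            "Type": "CLOUD_FORMATION_TEMPLATE",
            "Info": {"LoadTemplateFromURL": template_url},
        },
    }


def create_portfolio_with_product(template_url):
    """Create the portfolio and product, then link them. Requires AWS credentials."""
    import boto3  # imported lazily so the request builders work without boto3

    sc = boto3.client("servicecatalog")
    portfolio_id = sc.create_portfolio(**portfolio_request())["PortfolioDetail"]["Id"]
    product = sc.create_product(**product_request(template_url))
    product_id = product["ProductViewDetail"]["ProductViewSummary"]["ProductId"]
    artifact_id = product["ProvisioningArtifactDetail"]["Id"]
    sc.associate_product_with_portfolio(ProductId=product_id, PortfolioId=portfolio_id)
    return portfolio_id, product_id, artifact_id


if __name__ == "__main__":
    # Hypothetical template URL, for illustration only.
    print(product_request("https://example-bucket.s3.amazonaws.com/sagemaker-product.yaml"))
```

The three returned IDs are kept because the later constraint, access, and launch calls all refer to them.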

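  1. Next, go to the Constraints page and click Create to create a constraint, which uses permission management to restrict what can be done to cloud resources through the Service Catalog product.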

  1. Select the product we just created, "SageMakerProduct", as the target of the constraint, and choose Launch as the constraint type.

  1. Add an IAM role to the constraint; the role carries the permission rules that govern the product. Then click Create.
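
As a sketch of what the constraint steps amount to at the API level: a launch constraint is created with `create_constraint`, and the IAM role is passed as a JSON string in `Parameters`. The function names and the role ARN below are hypothetical placeholders; the IDs would come from the portfolio and product created earlier.

```python
import json


def launch_constraint_request(portfolio_id, product_id, role_arn):
    """Keyword arguments for servicecatalog.create_constraint (launch constraint)."""
    return {
        "PortfolioId": portfolio_id,
        "ProductId": product_id,
        "Type": "LAUNCH",
        # Service Catalog expects the constraint parameters as a JSON string.
        "Parameters": json.dumps({"RoleArn": role_arn}),
    }


def create_launch_constraint(portfolio_id, product_id, role_arn):
    """Create the constraint and return its ID. Requires AWS credentials."""
    import boto3  # imported lazily so the request builder works without boto3

    sc = boto3.client("servicecatalog")
    response = sc.create_constraint(
        **launch_constraint_request(portfolio_id, product_id, role_arn)
    )
    return response["ConstraintDetail"]["ConstraintId"]


if __name__ == "__main__":
    # Hypothetical IDs and role ARN, for illustration only.
    print(launch_constraint_request(
        "port-example", "prod-example",
        "arn:aws:iam::123456789012:role/ExampleLaunchRole",
    ))
```

With a launch constraint in place, Service Catalog assumes this role when provisioning, so end users do not need direct permissions on the underlying resources.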

  1. Next, click Access and grant access, which restricts which users can reach the product's cloud resources.

  1. Add the role "SCEndUserRole", which is used to access the product and create cloud resources on behalf of users.
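
Granting access corresponds to associating an IAM principal with the portfolio. The sketch below assumes "SCEndUserRole" is an IAM role in the same account and in the standard `aws` partition; the function names, the account ID, and the portfolio ID are hypothetical.

```python
def end_user_access_request(portfolio_id, account_id, role_name="SCEndUserRole"):
    """Keyword arguments for servicecatalog.associate_principal_with_portfolio."""
    return {
        "PortfolioId": portfolio_id,
        "PrincipalARN": f"arn:aws:iam::{account_id}:role/{role_name}",
        "PrincipalType": "IAM",
    }


def grant_end_user_access(portfolio_id, account_id):
    """Associate the end-user role with the portfolio. Requires AWS credentials."""
    import boto3  # imported lazily so the request builder works without boto3

    sc = boto3.client("servicecatalog")
    sc.associate_principal_with_portfolio(
        **end_user_access_request(portfolio_id, account_id)
    )


if __name__ == "__main__":
    # Hypothetical portfolio and account IDs, for illustration only.
    print(end_user_access_request("port-example", "123456789012"))
```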

  1. Next, use the Service Catalog product to create a series of cloud resources. Select the product we just created and click Launch.

  1. Name the provisioned product "DataScientistProduct" and select version 1, which we created in the earlier step.

  1. Configure the parameters for the SageMaker resources the product will create, including the environment name and the instance name.

  1. Enter the ARN of the Lambda function we created at the very beginning, then click Launch to start provisioning.
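
The launch steps above map onto a single `provision_product` call. This is a sketch only: the parameter keys `ENVName`, `NotebookInstanceName`, and `SageMakerBuildFunctionARN` are the names referenced in the CloudFormation template, while the function names, the values, and the IDs are hypothetical. The template may require further parameters (for example `ProjectName` and `ProjectID`), which would be passed the same way.

```python
def provision_request(product_id, artifact_id, parameters, name="DataScientistProduct"):
    """Keyword arguments for servicecatalog.provision_product."""
    return {
        "ProductId": product_id,
        "ProvisioningArtifactId": artifact_id,
        "ProvisionedProductName": name,
        # Template parameters are passed as a list of Key/Value string pairs.
        "ProvisioningParameters": [
            {"Key": key, "Value": str(value)}
            for key, value in sorted(parameters.items())
        ],
    }


def launch_product(product_id, artifact_id, parameters):
    """Start provisioning and return the record ID. Requires AWS credentials
    for a principal that has access to the portfolio."""
    import boto3  # imported lazily so the request builder works without boto3

    sc = boto3.client("servicecatalog")
    response = sc.provision_product(
        **provision_request(product_id, artifact_id, parameters)
    )
    return response["RecordDetail"]["RecordId"]


if __name__ == "__main__":
    # Hypothetical IDs and values, for illustration only.
    print(provision_request(
        "prod-example",
        "pa-example",
        {
            "ENVName": "dev",
            "NotebookInstanceName": "demo-notebook",
            "SageMakerBuildFunctionARN": "arn:aws:lambda:us-east-1:123456789012:function:SageMakerBuild",
        },
    ))
```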

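  1. Finally, return to the SageMaker home page, where we can see that a new Jupyter Notebook instance has been created through the Service Catalog product. We can use this instance to develop our AI application.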
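
The same check can be done from code instead of the console. The sketch below waits for the notebook instance to reach InService and then pulls out a few fields that matter for this setup (network placement and encryption). The function names and the notebook name are hypothetical placeholders.

```python
def summarize_notebook(description):
    """Pick the security-relevant fields from a describe_notebook_instance response."""
    return {
        "status": description.get("NotebookInstanceStatus"),
        "direct_internet_access": description.get("DirectInternetAccess"),
        "subnet_id": description.get("SubnetId"),
        "kms_key_id": description.get("KmsKeyId"),
    }


def wait_and_describe(notebook_name):
    """Block until the notebook is InService, then summarize it. Requires AWS credentials."""
    import boto3  # imported lazily so summarize_notebook works without boto3

    sm = boto3.client("sagemaker")
    sm.get_waiter("notebook_instance_in_service").wait(NotebookInstanceName=notebook_name)
    return summarize_notebook(
        sm.describe_notebook_instance(NotebookInstanceName=notebook_name)
    )


if __name__ == "__main__":
    # Hypothetical response fragment, for illustration only.
    sample = {
        "NotebookInstanceStatus": "InService",
        "DirectInternetAccess": "Disabled",
        "SubnetId": "subnet-0123456789abcdef0",
    }
    print(summarize_notebook(sample))
```

If direct internet access is disabled for the instance, its traffic to S3 has to go through the VPC, which is where the S3 endpoint created earlier comes in.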

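Those are all the steps for training large AI models and developing AI applications securely and compliantly on AWS. I hope you will join me in future posts for more cutting-edge generative AI development solutions from around the world.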


Reposted from: https://blog.csdn.net/m0_66628975/article/details/141216385
Copyright belongs to the original author, 佛州小李哥. If this infringes your rights, please contact us for removal.
