Karpenter's main purpose is to dynamically provision nodes in response to pod scheduling requests, giving unschedulable pods somewhere to run; when pods are drained and nodes sit empty, it quickly deletes those nodes to release resources. Typical scheduler events and Karpenter's nominating response:
default-scheduler 0/2 nodes are available: 1 Insufficient cpu, 1 node(s) were unschedulable
default-scheduler 0/1 nodes are available: 1 Insufficient cpu.
karpenter Pod should schedule on ip-192-168-21-5.cn-north-1.compute.internal
Karpenter's workflow consists of four stages: Watching, Evaluating, Provisioning, and Removing.
Deploying and configuring Karpenter
Create an EKS cluster
#!/bin/bash
export CLUSTER_NAME="testkarpenter"
export AWS_DEFAULT_REGION="cn-north-1"
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
echo $KARPENTER_VERSION $CLUSTER_NAME $AWS_DEFAULT_REGION $AWS_ACCOUNT_ID
eksctl create cluster -f - <<EOF
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${CLUSTER_NAME}
  region: ${AWS_DEFAULT_REGION}
  version: "1.23"
  tags:
    karpenter.sh/discovery: ${CLUSTER_NAME}
managedNodeGroups:
  - instanceType: m5.large
    amiFamily: AmazonLinux2
    name: ${CLUSTER_NAME}-ng
    desiredCapacity: 1
    minSize: 1
    maxSize: 2
iam:
  withOIDC: true
EOF
Deploy Karpenter
export KARPENTER_VERSION=v0.22.1
export CLUSTER_NAME="testkarpenter"
export AWS_DEFAULT_REGION="cn-north-1"
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
echo $KARPENTER_VERSION $CLUSTER_NAME $AWS_DEFAULT_REGION $AWS_ACCOUNT_ID

TEMPOUT=$(mktemp)
curl -fsSL https://karpenter.sh/"${KARPENTER_VERSION}"/getting-started/getting-started-with-eksctl/cloudformation.yaml > $TEMPOUT \
  && aws cloudformation deploy \
    --stack-name "Karpenter-${CLUSTER_NAME}" \
    --template-file "${TEMPOUT}" \
    --capabilities CAPABILITY_NAMED_IAM \
    --parameter-overrides "ClusterName=${CLUSTER_NAME}"

eksctl create iamidentitymapping \
  --username system:node:{{EC2PrivateDNSName}} \
  --cluster "${CLUSTER_NAME}" \
  --arn "arn:aws-cn:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}" \
  --group system:bootstrappers \
  --group system:nodes

eksctl create iamserviceaccount \
  --cluster "${CLUSTER_NAME}" --name karpenter --namespace karpenter \
  --role-name "${CLUSTER_NAME}-karpenter" \
  --attach-policy-arn "arn:aws-cn:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-${CLUSTER_NAME}" \
  --role-only \
  --approve

aws iam create-service-linked-role --aws-service-name spot.amazonaws.com || true

export CLUSTER_ENDPOINT="$(aws eks describe-cluster --name ${CLUSTER_NAME} --query "cluster.endpoint" --output text)"
export KARPENTER_IAM_ROLE_ARN="arn:aws-cn:iam::${AWS_ACCOUNT_ID}:role/${CLUSTER_NAME}-karpenter"

helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter --version ${KARPENTER_VERSION} --namespace karpenter --create-namespace \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=${KARPENTER_IAM_ROLE_ARN} \
  --set settings.aws.clusterName=${CLUSTER_NAME} \
  --set settings.aws.clusterEndpoint=${CLUSTER_ENDPOINT} \
  --set settings.aws.defaultInstanceProfile=KarpenterNodeInstanceProfile-${CLUSTER_NAME} \
  --set settings.aws.interruptionQueueName=${CLUSTER_NAME} \
  --wait

# legacy deployment for v0.18:
# helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter --version ${KARPENTER_VERSION} --namespace karpenter --create-namespace \
#   --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=${KARPENTER_IAM_ROLE_ARN} \
#   --set clusterName=${CLUSTER_NAME} \
#   --set clusterEndpoint=${CLUSTER_ENDPOINT} \
#   --set aws.defaultInstanceProfile=KarpenterNodeInstanceProfile-${CLUSTER_NAME} \
#   --wait
Create a sample Provisioner
export CLUSTER_NAME="testkarpenter"
cat <<EOF | kubectl apply -f -
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
  limits:
    resources:
      cpu: 1000
  providerRef:
    name: default
  ttlSecondsAfterEmpty: 10
---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  subnetSelector:
    karpenter.sh/discovery: ${CLUSTER_NAME}
  securityGroupSelector:
    karpenter.sh/discovery: ${CLUSTER_NAME}
EOF
Deploy a test workload
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 1
EOF
kubectl scale deployment inflate --replicas 5
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller
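To verify the scale-up, and the scale-down once the node has been empty for ttlSecondsAfterEmpty, plain kubectl commands (not from the original article) are enough:

kubectl get nodes -w                             # watch the Karpenter node appear and disappear
kubectl scale deployment inflate --replicas 0    # empty the node to trigger removal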
Clean up resources
#!/bin/bash
set -x
export CLUSTER_NAME="testkarpenter"
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
# helm uninstall karpenter --namespace karpenter
aws iam detach-role-policy --role-name="${CLUSTER_NAME}-karpenter" --policy-arn="arn:aws-cn:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-${CLUSTER_NAME}"
aws iam delete-policy --policy-arn="arn:aws-cn:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-${CLUSTER_NAME}"
aws iam delete-role --role-name="${CLUSTER_NAME}-karpenter"
aws cloudformation delete-stack --stack-name "Karpenter-${CLUSTER_NAME}"
aws ec2 describe-launch-templates \
  | jq -r ".LaunchTemplates[].LaunchTemplateName" \
  | grep -i "Karpenter-${CLUSTER_NAME}" \
  | xargs -I{} aws ec2 delete-launch-template --launch-template-name {}
eksctl delete cluster --name "${CLUSTER_NAME}"
Karpenter can be configured either through a ConfigMap or through container environment variables / CLI flags:
https://karpenter.sh/v0.22.1/concepts/settings/#environment-variables–cli-flags
Environment:
  CLUSTER_NAME:      testkarpenter
  CLUSTER_ENDPOINT:  https://Bxxxxxxxxxxxxxxx39D808.sk1.cn-north-1.eks.amazonaws.com.c
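For the ConfigMap route, the settings.* Helm values used earlier are rendered into the karpenter-global-settings ConfigMap. A sketch of what it may look like for this cluster; the endpoint is a placeholder, and the key names are assumed to mirror the Helm values, so verify against the deployed chart:

apiVersion: v1
kind: ConfigMap
metadata:
  name: karpenter-global-settings
  namespace: karpenter
data:
  aws.clusterName: testkarpenter
  aws.clusterEndpoint: https://xxxx.sk1.cn-north-1.eks.amazonaws.com.cn   # placeholder endpoint
  aws.defaultInstanceProfile: KarpenterNodeInstanceProfile-testkarpenter
  aws.interruptionQueueName: testkarpenter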
provisioner
A Provisioner is Karpenter's CRD for specifying how resources are provisioned; each Provisioner manages a distinct set of nodes. Provisioners accommodate pods with different resource requirements: Karpenter schedules and provisions capacity from the pods' own attributes (the sizing calculation is not documented here), so there is no need to create node groups in advance.
- A pod can use well-known labels to request specific instances (instance type, architecture, operating system, and so on); see the sketch after this list.
- The default AWSNodeTemplate decides where nodes launch from the security group and subnet resources tagged with karpenter.sh/discovery.
- ttlSecondsAfterEmpty sets how long a node may sit empty, after its pods are deleted, before it is shut down.
- ttlSecondsUntilExpired sets the maximum lifetime of a node.
- weight sets the Provisioner's priority.
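A minimal sketch of the first bullet: a pod that requests a specific instance type and capacity type through well-known labels (the pod name and the chosen type are illustrative, not from the original article):

apiVersion: v1
kind: Pod
metadata:
  name: pinned-workload   # illustrative name
spec:
  nodeSelector:
    node.kubernetes.io/instance-type: c5.xlarge   # well-known label: instance type
    karpenter.sh/capacity-type: spot              # Karpenter label: capacity type
  containers:
    - name: app
      image: public.ecr.aws/eks-distro/kubernetes/pause:3.7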
As long as a pod's scheduling request stays within the Provisioner's limits, Karpenter finds the best match; if nothing matches, the pod remains unscheduled.
Once the cluster and the Karpenter controller are running, configure the Provisioner and workload constraints:
- Set up provisioners: the constraints that can be set are taints, labels, and requirements.
- Deploy workloads: the constraints that can be set on pods are resources, nodeSelector, nodeAffinity, podAffinity/podAntiAffinity, tolerations, topologySpreadConstraints, and persistent volume topology; see the sketch after this list.
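For instance, a minimal sketch of a pod with a soft topologySpreadConstraint across zones; the same constraint shape appears in the controller log at the end of this article, while the pod name here is illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: spread-example   # illustrative name
  labels:
    app: spread-example
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway   # soft constraint; Karpenter may relax it
      labelSelector:
        matchLabels:
          app: spread-example
  containers:
    - name: app
      image: public.ecr.aws/eks-distro/kubernetes/pause:3.7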
Karpenter automatically provisions new nodes in response to unschedulable pods: it watches cluster events and then sends commands to the underlying cloud provider.
How Provisioners are applied
A Provisioner can set constraints on the nodes it creates, limiting the zones and instance types used and adding labels at node launch. Every Provisioner configured in the cluster is evaluated in a loop. Provisioners should be mutually exclusive: a pod should not match more than one Provisioner. If several Provisioners do match, the one with the highest weight is chosen. The official Provisioner examples are a good starting point.
The following example uses the default AWSNodeTemplate, adds a taint at node launch, restricts the capacity type to on-demand, and sets a weight of 10.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: banana
spec:
  taints:
    - key: example.com/special-taint
      effect: NoSchedule
  labels:
    billing-team: my-team
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand"]
  limits:
    resources:
      cpu: 1000
  providerRef:
    name: default
  ttlSecondsAfterEmpty: 10
  weight: 10
Give the pod a matching toleration and nodeSelector; now both the default Provisioner and banana fit, so banana wins on weight, as the sketch below shows.
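A minimal sketch of such a workload; the name condition-workload matches the pod seen in the controller logs later in this article, while the replica count, image, and CPU request are assumptions chosen to line up with the log line below:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: condition-workload
spec:
  replicas: 5               # assumption; the log below reports "pods":"5"
  selector:
    matchLabels:
      app: condition-workload
  template:
    metadata:
      labels:
        app: condition-workload
    spec:
      nodeSelector:
        billing-team: my-team              # label applied by the banana Provisioner
      tolerations:
        - key: example.com/special-taint   # tolerate the banana Provisioner's taint
          operator: Exists
          effect: NoSchedule
      containers:
        - name: app
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7   # assumed image
          resources:
            requests:
              cpu: 1        # assumed request; 3 pods plus DaemonSets ≈ the 3125m in the log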
Launching node with 3 pods requesting {"cpu":"3125m","pods":"5"} from types t3a.xlarge, c5a.xlarge, t3.xlarge, c6i.xlarge, c5.xlarge and 166 other(s){"commit":"51becf8-dirty", "provisioner":"banana"}
node templates
AWSNodeTemplate carries the AWS-specific configuration; several Provisioners can reference the same AWSNodeTemplate. Think of it as overriding parameters of the launch template Karpenter would otherwise generate. A sample template showing the available parameters:
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  providerRef:
    name: default
---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  subnetSelector: { ... }          # required, discovers tagged subnets to attach to instances
  securityGroupSelector: { ... }   # required, discovers tagged security groups to attach to instances
  instanceProfile: "..."           # optional, overrides the node's identity from global settings
  amiFamily: "..."                 # optional, resolves a default ami and userdata
  amiSelector: { ... }             # optional, discovers tagged amis to override the amiFamily's default
  userData: "..."                  # optional, overrides autogenerated userdata with a merge semantic
  tags: { ... }                    # optional, propagates tags to underlying EC2 resources
  metadataOptions: { ... }         # optional, configures IMDS for the instance
  blockDeviceMappings: [ ... ]     # optional, configures storage devices for the instance
For example, an AWSNodeTemplate can pin the AMI to use: at launch Karpenter automatically detects which architectures the AMI is compatible with; if several usable AMIs are discovered, the newest one is used; if no AMI is found, no instance is launched.
https://karpenter.sh/v0.22.1/concepts/node-templates/#ami-selection
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  amiSelector:
    aws-ids: "ami-08f2bf224b42c81da"
  subnetSelector:
    karpenter.sh/discovery: testkarpenter
  securityGroupSelector:
    karpenter.sh/discovery: testkarpenter
Karpenter workflow
Karpenter narrows down what it provisions through several layers of constraints:
- cloud provider constraints, including instance types, architectures, and zones
- Provisioner constraints
- pod/Deployment constraints
Node launch flow
- Discover pods that need capacity provisioned; the log line carries the build's commit id
Found 15 provisionable pod(s){"commit":"51becf8-dirty"}
- Compute how many nodes the pods need. The expected pod count of 17 below is the 15 workload pods plus DaemonSet pods (presumably two, e.g. kube-proxy and the VPC CNI), which Karpenter accounts for as well
Launching node with 15 pods requesting {"cpu":"15125m","pods":"17"} from types c3.4xlarge, c4.4xlarge, r3.4xlarge, m4.4xlarge, c5.4xlarge and 110 other(s){"commit":"51becf8-dirty", "provisioner":"default"}
- Create a launch template and launch the instance
Created launch template, Karpenter-testkarpenter-8870469202198737887 {"commit":"51becf8-dirty", "provisioner":"default"}
Launched instance: i-09fxxxxxxxx14e, hostname: ip-192-168-39-191.cn-north-1.compute.internal, type: r5.4xlarge, zone: cn-north-1b, capacityType: spot {"commit":"51becf8-dirty", "provisioner":"default"}
The userdata in the launch template Karpenter created:
#!/bin/bash -xe
exec > >(tee /var/log/user-data.log | logger -t user-data -s 2>/dev/console) 2>&1
/etc/eks/bootstrap.sh 'testkarpenter' --apiserver-endpoint 'https://xxxxxxxx.sk1.cn-north-1.eks.amazonaws.com.cn' --b64-cluster-ca 'xxxxxxx' \
  --container-runtime containerd \
  --kubelet-extra-args '--node-labels=billing-team=my-team,karpenter.sh/capacity-type=on-demand,karpenter.sh/provisioner-name=banana --register-with-taints=example.com/special-taint=:NoSchedule'
- When the pods are deleted and the node sits empty, a termination TTL is added to the node
Added TTL to empty node{"commit":"51becf8-dirty", "node":"ip-192-168-39-191.cn-north-1.compute.internal"}
- Karpenter uses a finalizer to handle node deletion: it cordons the node, drains all pods, terminates the instance, then deletes the node object
Triggering termination after 30s for empty node {"commit":"51becf8-dirty", "node":"ip-192-168-39-191.cn-north-1.compute.internal"}
Cordoned node {"commit":"51becf8-dirty", "node":"ip-192-168-39-191.cn-north-1.compute.internal"}
Deleted node {"commit":"51becf8-dirty", "node":"ip-192-168-39-191.cn-north-1.compute.internal"}
Deleted launch template Karpenter-testkarpenter-xx (lt-0aa5exxxxbcfe0) {"commit":"51becf8-dirty"}
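The finalizer is visible on the node object while it exists; a sketch of the relevant metadata, assuming the karpenter.sh/termination finalizer name used by Karpenter in this version:

apiVersion: v1
kind: Node
metadata:
  name: ip-192-168-39-191.cn-north-1.compute.internal
  finalizers:
    - karpenter.sh/termination   # assumed finalizer name; verify with kubectl get node -o yaml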
Common errors
If no Provisioner has been created, the following error appears and pods stay in Pending:
Provisioning failed, creating scheduler, no provisioners found {"commit":"51becf8-dirty"}
If a Provisioner's configuration is incompatible with a pod's scheduling requirements and no other usable Provisioner exists, an incompatibility error appears:
controller.provisioning Could not schedule pod, incompatible with provisioner "banana", did not tolerate example.com/special-taint=:NoSchedule {"commit":"51becf8-dirty", "pod":"default/condition-workload-64f8b97857-lxsjv"}
If capacity is insufficient at launch time, the error looks like:
Provisioning failed, launching node, creating cloud provider instance, with fleet error(s), InsufficientInstanceCapacity: We currently do not have sufficient p3.8xlarge capacity in the Availability Zone you requested (cn-north-1a). Our system will be working on provisioning additional capacity. You can currently get p3.8xlarge capacity by not specifying an Availability Zone in your request or choosing cn-north-1b.
When a pod is incompatible with the default Provisioner (here, the karpenter pod itself), the controller relaxes its soft constraints and keeps trying to schedule it:
Could not schedule pod, incompatible with provisioner "default", incompatible requirements, key karpenter.sh/provisioner-name, karpenter.sh/provisioner-name DoesNotExist not in karpenter.sh/provisioner-name In [default]{"commit":"51becf8-dirty", "pod":"karpenter/karpenter-76f776664b-5r9fz"}
controller.provisioning Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.topologySpreadConstraints ={"maxSkew":1,"topologyKey":"topology.kubernetes.io/zone","whenUnsatisfiable":"ScheduleAnyway","labelSelector":{"matchLabels":{"app.kubernetes.io/instance":"karpenter","app.kubernetes.io/name":"karpenter"}}}{"commit":"51becf8-dirty", "pod":"karpenter/karpenter-76f776664b-5r9fz"}