

Monitoring Kubernetes with Prometheus


Background

Prometheus has already been installed into the Kubernetes cluster at this point.

Monitoring Kubernetes nodes

Install node_exporter. I deploy it as a DaemonSet so that every node gets its own pod:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      # These settings share the node's PID, IPC and network namespaces with the pod,
      # so there is no need to expose the port through a Service
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
      - image: bitnami/node-exporter:latest
        args:
        - --web.listen-address=$(HOSTIP):9100
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host/root
        - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
        - --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
        name: node-exporter
        env:
        - name: HOSTIP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        resources:
          requests:
            cpu: 150m
            memory: 180Mi
          limits:
            cpu: 150m
            memory: 180Mi
        securityContext:
          runAsNonRoot: true
          runAsUser: 65534
        volumeMounts: # All volumes are hostPath mounts: the host paths are mounted straight into the pod so it can read node information properly
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: root
          mountPath: /host/root
          mountPropagation: HostToContainer
          readOnly: true
        ports:
        - containerPort: 9100
          protocol: TCP
          name: http
      tolerations: # Toleration for the control-plane taint so the pod can also run on the master nodes
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: dev
        hostPath:
          path: /dev
      - name: sys
        hostPath:
          path: /sys
      - name: root
        hostPath:
          path: /
root@master1:~/k8s-prometheus# kubectl apply -f node-export.yaml
root@master1:~/k8s-prometheus# kubectl get pod -n monitor
NAME                            READY   STATUS    RESTARTS   AGE
grafana-core-5c68549dc7-t92fv   1/1     Running   0          2d17h
node-exporter-6fqvt             1/1     Running   0          2d19h
node-exporter-bgrjn             1/1     Running   0          2d19h
node-exporter-bk7m4             1/1     Running   0          2d19h
node-exporter-m7wgx             1/1     Running   0          2d19h
node-exporter-mbgtg             1/1     Running   0          2d19h
node-exporter-rdtcs             1/1     Running   0          2d19h
prometheus-7d659686d7-x62vt     1/1     Running   0          2d16h
root@master1:~/k8s-prometheus# ss -lntp | grep node_exporter
LISTEN 0 4096 10.10.21.170:9100 0.0.0.0:* users:(("node_exporter",pid=2597275,fd=3))   # it binds directly to the host's port 9100
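As an extra sanity check (a quick sketch, assuming the node IP 10.10.21.170 from the output above is reachable from where you run it), you can curl the exporter directly; it should return plain-text Prometheus metrics such as node_cpu_seconds_total:

curl -s http://10.10.21.170:9100/metrics | grep ^node_cpu_seconds_total | head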

Monitoring the CoreDNS service

CoreDNS exposes a /metrics endpoint itself, so we just configure Prometheus to scrape it on port 9153:

root@master1:~/k8s-prometheus# kubectl -n kube-system get svc
NAME       TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.100.0.10   <none>        53/UDP,53/TCP,9153/TCP   22d
root@master1:~/k8s-prometheus# kubectl -n kube-system get cm
NAME                                 DATA   AGE
coredns                              1      22d
extension-apiserver-authentication   6      22d
kube-proxy                           2      22d
kube-root-ca.crt                     1      22d
kubeadm-config                       1      22d
kubelet-config                       1      22d
root@master1:~/k8s-prometheus# kubectl -n kube-system get cm coredns -o yaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2022-10-11T10:23:37Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "228"
  uid: 632b41d8-6dda-40b7-81c1-b6cffefaf2e8
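Before wiring it into Prometheus, a quick check (a sketch, assuming the kube-dns ClusterIP 10.100.0.10 shown above is reachable from a cluster node) confirms that the "prometheus :9153" plugin in the Corefile is actually serving metrics such as coredns_dns_requests_total:

curl -sI http://10.100.0.10:9153/metrics
curl -s http://10.100.0.10:9153/metrics | grep ^coredns_dns | head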

Monitoring Ingress-nginx

Ingress-nginx also exposes a /metrics endpoint by default, but we first have to expose port 10254; once it is exposed, we simply point Prometheus at 10254 to pull the data.
There are several ways to expose the port: for example, switch the controller Deployment to hostNetwork mode so the node's 10254 is handed directly to Ingress, or add the port to the Deployment and then proxy 10254 through the Ingress controller's Service.
I used the first approach here; a rough sketch of the second follows for reference.
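For reference only, the second approach would look roughly like this (a sketch, not what is applied below; the port name prometheus is an assumption, and both snippets are fragments of the controller Deployment and its Service rather than complete manifests):

# Fragment for the controller Deployment: add the metrics container port (no hostPort needed)
ports:
- containerPort: 10254
  name: prometheus
  protocol: TCP

# Fragment for the ingress-nginx-controller Service: proxy 10254 through the Service
ports:
- name: prometheus
  port: 10254
  targetPort: prometheus
  protocol: TCP

With that noted, back to the first approach: edit the controller Deployment directly.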

root@master1:~# kubectl -n ingress-nginx edit deployments.apps ingress-nginx-controller

There are two places that need to be changed:

root@master1:~# kubectl -n ingress-nginx get deployments.apps ingress-nginx-controller -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "8"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"app.kubernetes.io/component":"controller","app.kubernetes.io/instance":"ingress-nginx","app.kubernetes.io/managed-by":"Helm","app.kubernetes.io/name":"ingress-nginx","app.kubernetes.io/version":"1.1.1","helm.sh/chart":"ingress-nginx-4.0.15"},"name":"ingress-nginx-controller","namespace":"ingress-nginx"},"spec":{"minReadySeconds":0,"revisionHistoryLimit":10,"selector":{"matchLabels":{"app.kubernetes.io/component":"controller","app.kubernetes.io/instance":"ingress-nginx","app.kubernetes.io/name":"ingress-nginx"}},"strategy":{"rollingUpdate":{"maxUnavailable":1},"type":"RollingUpdate"},"template":{"metadata":{"labels":{"app.kubernetes.io/component":"controller","app.kubernetes.io/instance":"ingress-nginx","app.kubernetes.io/name":"ingress-nginx"}},"spec":{"containers":[{"args":["/nginx-ingress-controller","--election-id=ingress-controller-leader","--controller-class=k8s.io/ingress-nginx","--configmap=$(POD_NAMESPACE)/ingress-nginx-controller","--validating-webhook=:8443","--validating-webhook-certificate=/usr/local/certificates/cert","--validating-webhook-key=/usr/local/certificates/key","--watch-ingress-without-class=true","--publish-status-address=localhost"],"env":[{"name":"POD_NAME","valueFrom":{"fieldRef":{"fieldPath":"metadata.name"}}},{"name":"POD_NAMESPACE","valueFrom":{"fieldRef":{"fieldPath":"metadata.namespace"}}},{"name":"LD_PRELOAD","value":"/usr/local/lib/libmimalloc.so"}],"image":"registry.cn-qingdao.aliyuncs.com/kubernetes_xingej/nginx-ingress-controller:v1.1.1","imagePullPolicy":"IfNotPresent","lifecycle":{"preStop":{"exec":{"command":["/wait-shutdown"]}}},"livenessProbe":{"failureThreshold":5,"httpGet":{"path":"/healthz","port":10254,"scheme":"HTTP"},"initialDelaySeconds":10,"periodSeconds":10,"successThreshold":1,"timeoutSeconds":1},"name":"controller","ports":[{"containerPort":80,"hostPort":80,"name":"http","protocol":"TCP"},{"containerPort":443,"hostPort":443,"name":"https","protocol":"TCP"},{"containerPort":8443,"name":"webhook","protocol":"TCP"}],"readinessProbe":{"failureThreshold":3,"httpGet":{"path":"/healthz","port":10254,"scheme":"HTTP"},"initialDelaySeconds":10,"periodSeconds":10,"successThreshold":1,"timeoutSeconds":1},"resources":{"requests":{"cpu":"100m","memory":"90Mi"}},"securityContext":{"allowPrivilegeEscalation":true,"capabilities":{"add":["NET_BIND_SERVICE"],"drop":["ALL"]},"runAsUser":101},"volumeMounts":[{"mountPath":"/usr/local/certificates/","name":"webhook-cert","readOnly":true}]}],"dnsPolicy":"ClusterFirst","nodeSelector":{"ingress-ready":"true","kubernetes.io/os":"linux"},"serviceAccountName":"ingress-nginx","terminationGracePeriodSeconds":0,"tolerations":[{"effect":"NoSchedule","key":"node-role.kubernetes.io/master","operator":"Equal"}],"volumes":[{"name":"webhook-cert","secret":{"secretName":"ingress-nginx-admission"}}]}}}}
  creationTimestamp: "2022-10-31T08:30:33Z"
  generation: 8
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/version: 1.1.1
    helm.sh/chart: ingress-nginx-4.0.15
  name: ingress-nginx-controller
  namespace: ingress-nginx
  resourceVersion: "3780917"
  uid: e4b28ba0-eb0b-45e3-a0b8-c03797c8f416
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: controller
      app.kubernetes.io/instance: ingress-nginx
      app.kubernetes.io/name: ingress-nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: controller
        app.kubernetes.io/instance: ingress-nginx
        app.kubernetes.io/name: ingress-nginx
    spec:
      containers:
      - args:
        - /nginx-ingress-controller
        - --election-id=ingress-controller-leader
        - --controller-class=k8s.io/ingress-nginx
        - --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
        - --validating-webhook=:8443
        - --validating-webhook-certificate=/usr/local/certificates/cert
        - --validating-webhook-key=/usr/local/certificates/key
        - --watch-ingress-without-class=true
        - --publish-status-address=localhost
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: LD_PRELOAD
          value: /usr/local/lib/libmimalloc.so
        image: bitnami/nginx-ingress-controller:1.1.1
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - /wait-shutdown
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: controller
        ports:
        - containerPort: 10254   # Change 1: these two lines expose port 10254
          hostPort: 10254
          name: prometheus
          protocol: TCP
        - containerPort: 80
          hostPort: 80
          name: http
          protocol: TCP
        - containerPort: 443
          hostPort: 443
          name: https
          protocol: TCP
        - containerPort: 8443
          hostPort: 8443
          name: webhook
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          requests:
            cpu: 100m
            memory: 90Mi
        securityContext:
          allowPrivilegeEscalation: true
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - ALL
          runAsUser: 101
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/local/certificates/
          name: webhook-cert
          readOnly: true
      dnsPolicy: ClusterFirst
      hostNetwork: true   # Change 2: declares that the pod uses the host's network, handing the node's ports directly to Ingress
      nodeSelector:
        ingress-ready: "true"
        kubernetes.io/os: linux
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: ingress-nginx
      serviceAccountName: ingress-nginx
      terminationGracePeriodSeconds: 0
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Equal
      volumes:
      - name: webhook-cert
        secret:
          defaultMode: 420
          secretName: ingress-nginx-admission
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2022-10-31T08:30:33Z"
    lastUpdateTime: "2022-10-31T08:30:33Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2022-10-31T08:46:21Z"
    lastUpdateTime: "2022-11-03T10:25:05Z"
    message: ReplicaSet "ingress-nginx-controller-87bc8b9c7" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 8
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

Now check whether the host port has been taken over by the ingress controller:

root@node1:~# ss -lntp | grep nginx-ingress
LISTEN 0 4096 127.0.0.1:10245 0.0.0.0:* users:(("nginx-ingress-c",pid=1398881,fd=8))
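Since the controller now uses hostNetwork, the metrics endpoint should also be reachable on the node itself. A quick check (a sketch; substitute the IP of a node running the ingress controller for <node-ip>):

curl -sI http://<node-ip>:10254/metrics
curl -s http://<node-ip>:10254/metrics | grep ^nginx_ingress_controller | head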

Monitoring kube-state-metrics

root@master1:~# git clone https://github.com/kubernetes/kube-state-metrics.git
root@master1:~# cd kube-state-metrics/examples/standard/
root@master1:~/kube-state-metrics/examples/standard# ls
cluster-role-binding.yaml  cluster-role.yaml  deployment.yaml  service-account.yaml  service.yaml
# The clone already contains these YAML files; with minor tweaks they can be used as-is

Here I modify service.yaml:

root@master1:~/kube-state-metrics/examples/standard# cat service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.6.0
  name: kube-state-metrics
  namespace: kube-system
spec:
  type: ClusterIP # The manifest ships as a headless Service; I changed it to ClusterIP here
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
  - name: telemetry
    port: 8081
    targetPort: telemetry
  selector:
    app.kubernetes.io/name: kube-state-metrics

Then apply the manifests and check:

root@master1:~/kube-state-metrics/examples/standard# kubectl apply -f .   # don't drop the dot: it applies every YAML file in the current directory
root@master1:~/kube-state-metrics/examples/standard# kubectl get svc,pod -n kube-system
NAME                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
service/kube-dns             ClusterIP   10.100.0.10      <none>        53/UDP,53/TCP,9153/TCP   23d
service/kube-state-metrics   ClusterIP   10.100.162.224   <none>        8080/TCP,8081/TCP        91m   # the Service we just created
NAME                                      READY   STATUS    RESTARTS       AGE
pod/coredns-c676cc86f-7bd45               1/1     Running   5 (2d2h ago)   22d
pod/coredns-c676cc86f-7l2c8               1/1     Running   0              22d
pod/coredns-c676cc86f-gl6qc               1/1     Running   0              22d
pod/coredns-c676cc86f-rxcjn               1/1     Running   0              22d
pod/kube-apiserver-master1.org            1/1     Running   6 (2d2h ago)   17d
pod/kube-apiserver-master2.org            1/1     Running   0              17d
pod/kube-apiserver-master3.org            1/1     Running   0              17d
pod/kube-controller-manager-master1.org   1/1     Running   7 (2d2h ago)   23d
pod/kube-controller-manager-master2.org   1/1     Running   0              17d
pod/kube-controller-manager-master3.org   1/1     Running   0              17d
pod/kube-proxy-5pnh5                      1/1     Running   0              21d
pod/kube-proxy-vws54                      1/1     Running   0              21d
pod/kube-proxy-wlknz                      1/1     Running   0              21d
pod/kube-proxy-wtnnf                      1/1     Running   5 (2d2h ago)   21d
pod/kube-proxy-xkkhp                      1/1     Running   0              21d
pod/kube-proxy-zf4vg                      1/1     Running   0              21d
pod/kube-scheduler-master1.org            1/1     Running   7 (2d2h ago)   23d
pod/kube-scheduler-master2.org            1/1     Running   0              22d
pod/kube-scheduler-master3.org            1/1     Running   0              22d
pod/kube-state-metrics-757ff7b448-lwwlg   1/1     Running   0              90m   # the pod we just created

Verify that the metrics endpoint returns data:

root@node1:~# curl -I 10.100.162.224:8080/metrics
HTTP/1.1 200 OK
Content-Type: text/plain; version=0.0.4
Date: Thu, 03 Nov 2022 16:25:21 GMT
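Of the two ports in the Service above, 8080 (http-metrics) serves the cluster-object metrics and 8081 (telemetry) serves the exporter's own telemetry. A couple of extra spot checks (a sketch, reusing the ClusterIP from above; kube_pod_status_phase is one of the standard kube-state-metrics series):

curl -s http://10.100.162.224:8080/metrics | grep ^kube_pod_status_phase | head
curl -sI http://10.100.162.224:8081/metrics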

Modify the Prometheus ConfigMap and reload Prometheus

root@master1:~/k8s-prometheus# cat prometheus-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']
    - job_name: 'Linux Server'
      static_configs:
      - targets:
        - 'xxx.xxx.xxx.xxx:9100'
        - 'xxx.xxx.xxx.xxx:9100'
        - 'xxx.xxx.xxx.xxx:9100'
        - 'xxx.xxx.xxx.xxx:9100'
        - 'xxx.xxx.xxx.xxx:9100'
        - 'xxx.xxx.xxx.xxx:9100'
    - job_name: 'coredns'
      static_configs: # I wrote nearly all targets as IPs here; DNS names work too, the default one being kube-dns.kube-system.svc.cluster.local
      - targets: ['xxx.xxx.xxx.xxx:9153']
    - job_name: 'ingress-nginx'
      static_configs:
      - targets: ['xxx.xxx.xxx.xxx:10254']
    - job_name: 'k8s-state-metrics'
      static_configs:
      - targets:
        - '10.100.162.224:8080'
        - '10.100.162.224:8081'
root@master1:~/k8s-prometheus# kubectl apply -f prometheus-configmap.yaml
root@master1:~/k8s-prometheus# curl -XPOST 10.100.122.13:9090/-/reload
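Besides the web UI, you can confirm that the reload picked up the new jobs through the Prometheus HTTP API (a sketch, reusing the same Prometheus address as the reload call above; the second command runs a sample query against the kube-state-metrics data):

# Count active scrape targets per job
curl -s http://10.100.122.13:9090/api/v1/targets | grep -o '"job":"[^"]*"' | sort | uniq -c

# Number of running pods per namespace, from kube_pod_status_phase
curl -s 'http://10.100.122.13:9090/api/v1/query' \
  --data-urlencode 'query=sum by (namespace) (kube_pod_status_phase{phase="Running"})'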

Check whether the changes took effect:
(screenshot)

Displaying the monitored data in Grafana

Almost every dashboard you might want to import can be found by searching the Grafana website; once you have found one, you can either copy its ID or download it as JSON.
Below is how to import by ID (to import JSON instead, simply paste the contents of the JSON file into the corresponding box).
(screenshots)

Basic node metrics

The dashboard imported here is ID 9276.
(screenshot)

CoreDNS monitoring

Here I imported ID 5926; it looks like this:
(screenshot)

Ingress monitoring

Here I imported ID 9614; it looks like this:
(screenshot)

Kubernetes cluster metrics

The dashboard imported here is ID 13105.
(screenshot)


Reposted from: https://blog.csdn.net/weixin_67405599/article/details/127674731
Copyright belongs to the original author 陈骄.
