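Introduction
In operations work, core API endpoints need to be actively probed (synthetic monitoring). The parameters each endpoint requires and the dependencies between endpoints can be fairly complex, and a business scenario is usually completed through a chain of requests. A common case is logging in first, obtaining a token, and then making the subsequent API calls. Postman supports this kind of chained scenario through its GUI, but operations teams need to monitor APIs on a schedule, driven by policy. This article walks you through building an API monitoring setup from scratch.
Prerequisites
Basic Postman usage
Docker basics
Deployment steps
- Export the collection from Postman
The files below use probing httpbin.org as the example. In the Postman GUI, export the collection to be probed as a JSON file (httpbin.json). The example contains two requests: one simulates authentication and the other simulates a business API call.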
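For readers who have not seen an exported collection, here is a minimal sketch that generates a comparable file in the Postman Collection v2.1 format. The request names `authentication` and `business-request` and the collection name `httpbin` are taken from the metric labels shown later in this article. Everything else is an assumption made for illustration: the article does not say which endpoints the author probed, so the httpbin.org paths `/uuid` and `/bearer`, the `token` variable, and the assertions are hypothetical stand-ins for a real login flow.

```python
import json

BASE_URL = "https://httpbin.org"  # assumption: the public httpbin.org service


def make_test_event(lines):
    """Wrap JavaScript lines as a Postman "test" script, which runs after the response arrives."""
    return {"listen": "test", "script": {"type": "text/javascript", "exec": lines}}


def build_collection():
    """Build a two-request collection: fetch a stand-in token, then reuse it."""
    authentication = {
        "name": "authentication",
        "request": {"method": "GET", "header": [], "url": f"{BASE_URL}/uuid"},
        "event": [make_test_event([
            "pm.test('status is 200', function () {",
            "    pm.response.to.have.status(200);",
            "});",
            "// Hypothetical: treat the returned UUID as the login token.",
            "pm.collectionVariables.set('token', pm.response.json().uuid);",
        ])],
    }
    business_request = {
        "name": "business-request",
        "request": {
            "method": "GET",
            # {{token}} is resolved by Postman from the variable set by the first request.
            "header": [{"key": "Authorization", "value": "Bearer {{token}}"}],
            "url": f"{BASE_URL}/bearer",
        },
        "event": [make_test_event([
            "pm.test('status is 200', function () {",
            "    pm.response.to.have.status(200);",
            "});",
            "pm.test('token is echoed back', function () {",
            "    pm.expect(pm.response.json().token).to.eql(pm.collectionVariables.get('token'));",
            "});",
        ])],
    }
    return {
        "info": {
            "name": "httpbin",
            "schema": "https://schema.getpostman.com/json/collection/v2.1.0/collection.json",
        },
        "item": [authentication, business_request],
    }


if __name__ == "__main__":
    # Print the collection as JSON; redirect stdout to a file to save it.
    print(json.dumps(build_collection(), indent=2))
```

Redirecting the script's output to `httpbin.json` gives a file with the same shape as a GUI export. For a real service, exporting from Postman remains the simpler path, since the GUI lets you debug the chained requests first.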
- Run the exported file in Docker
docker run -d -p 8080:8080 -v "$(pwd)/httpbin.json":/runner/collection.json kevinniu666/postman-prometheus:1.0.0
- Fetch the probe metrics
curl 10.128.120.52:8080/metrics
# TYPE postman_lifetime_runs_total counter
postman_lifetime_runs_total{collection="httpbin"} 1
# TYPE postman_lifetime_iterations_total counter
postman_lifetime_iterations_total{collection="httpbin"} 1
# TYPE postman_lifetime_requests_total counter
postman_lifetime_requests_total{collection="httpbin"} 2
# TYPE postman_stats_iterations_total gauge
postman_stats_iterations_total{collection="httpbin"} 1
# TYPE postman_stats_iterations_failed gauge
postman_stats_iterations_failed{collection="httpbin"} 0
# TYPE postman_stats_requests_total gauge
postman_stats_requests_total{collection="httpbin"} 2
# TYPE postman_stats_requests_failed gauge
postman_stats_requests_failed{collection="httpbin"} 0
# TYPE postman_stats_tests_total gauge
postman_stats_tests_total{collection="httpbin"} 2
# TYPE postman_stats_tests_failed gauge
postman_stats_tests_failed{collection="httpbin"} 0
# TYPE postman_stats_test_scripts_total gauge
postman_stats_test_scripts_total{collection="httpbin"} 4
# TYPE postman_stats_test_scripts_failed gauge
postman_stats_test_scripts_failed{collection="httpbin"} 0
# TYPE postman_stats_assertions_total gauge
postman_stats_assertions_total{collection="httpbin"} 3
# TYPE postman_stats_assertions_failed gauge
postman_stats_assertions_failed{collection="httpbin"} 0
# TYPE postman_stats_transfered_bytes_total gauge
postman_stats_transfered_bytes_total{collection="httpbin"} 794
# TYPE postman_stats_resp_avg gauge
postman_stats_resp_avg{collection="httpbin"} 541
# TYPE postman_stats_resp_min gauge
postman_stats_resp_min{collection="httpbin"} 494
# TYPE postman_stats_resp_max gauge
postman_stats_resp_max{collection="httpbin"} 588
# TYPE postman_request_status_code gauge
postman_request_status_code{request_name="authentication",iteration="0",collection="httpbin"} 200
# TYPE postman_request_resp_time gauge
postman_request_resp_time{request_name="authentication",iteration="0",collection="httpbin"} 588
# TYPE postman_request_resp_size gauge
postman_request_resp_size{request_name="authentication",iteration="0",collection="httpbin"} 54
# TYPE postman_request_status_ok gauge
postman_request_status_ok{request_name="authentication",iteration="0",collection="httpbin"} 1
# TYPE postman_request_failed_assertions gauge
postman_request_failed_assertions{request_name="authentication",iteration="0",collection="httpbin"} 0
# TYPE postman_request_total_assertions gauge
postman_request_total_assertions{request_name="authentication",iteration="0",collection="httpbin"} 1
# TYPE postman_request_status_code gauge
postman_request_status_code{request_name="business-request",iteration="0",collection="httpbin"} 200
# TYPE postman_request_resp_time gauge
postman_request_resp_time{request_name="business-request",iteration="0",collection="httpbin"} 494
# TYPE postman_request_resp_size gauge
postman_request_resp_size{request_name="business-request",iteration="0",collection="httpbin"} 740
# TYPE postman_request_status_ok gauge
postman_request_status_ok{request_name="business-request",iteration="0",collection="httpbin"} 1
# TYPE postman_request_failed_assertions gauge
postman_request_failed_assertions{request_name="business-request",iteration="0",collection="httpbin"} 0
# TYPE postman_request_total_assertions gauge
postman_request_total_assertions{request_name="business-request",iteration="0",collection="httpbin"} 2
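To sanity-check this output outside Prometheus, the text can be parsed with a few lines of Python. The following is a minimal sketch, not a full parser for the exposition format: it assumes no timestamps and no escaped characters inside label values, which holds for the output above. The function name `parse_metrics` and the embedded `SAMPLE` excerpt are introduced here for illustration.

```python
import re

# metric_name{optional labels} value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# key="value" pairs inside the braces
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Three lines copied from the metrics output above.
SAMPLE = """
# TYPE postman_stats_requests_failed gauge
postman_stats_requests_failed{collection="httpbin"} 0
# TYPE postman_request_status_code gauge
postman_request_status_code{request_name="authentication",iteration="0",collection="httpbin"} 200
postman_request_status_code{request_name="business-request",iteration="0",collection="httpbin"} 200
"""


def parse_metrics(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # TYPE / # HELP comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified pattern cannot handle
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels.get("request_name", "-"), value)
```

To run it against the live endpoint, the text could be fetched with `urllib.request.urlopen("http://10.128.120.52:8080/metrics").read().decode()` and passed to `parse_metrics` in place of `SAMPLE`.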
- Feed the metrics into Prometheus by adding the following job to the Prometheus configuration file.
    - job_name: http-api-monitor
      scrape_interval: 15s
      scrape_timeout: 10s
      metrics_path: /metrics
      scheme: http
      static_configs:
        - targets:
            - 10.128.120.52:8080  # IP address of the node running the container
          labels:
            usage: "httpbin API probing"
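- Configure Prometheus alerting rules. The two rules below are a starting point; you can define more detailed rules yourself based on the exported metrics.
    - alert: ApiStatusCodeAbnormal
      expr: postman_request_status_code != 200
      for: 1m
      labels:
        severity: error
      annotations:
        summary: "API response code is not 200"
        description: "Backend API probe failed"
    - alert: ApiResponseAssertionFailed
      expr: postman_request_failed_assertions != 0
      for: 1m
      labels:
        severity: error
      annotations:
        summary: "API response content check failed"
        description: "Backend API did not pass its Postman tests"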
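To make clear what the two expressions select, the sketch below mimics the `!=` comparison in plain Python: it keeps every series whose value differs from the healthy value. It is only an illustration of the filter, not how Prometheus evaluates rules, and it ignores the `for: 1m` clause, which additionally requires the condition to hold for a full minute before the alert fires. The function name `breaching` and the failing values in `SAMPLES` are made up for this example; the real output shown earlier is fully healthy.

```python
def breaching(samples, metric, healthy_value):
    """Return the label sets of `metric` whose value differs from `healthy_value`.

    This mirrors a rule expression of the form `metric != healthy_value`.
    """
    return [
        labels
        for name, labels, value in samples
        if name == metric and value != healthy_value
    ]


# Hypothetical samples: the business-request values are invented failures.
SAMPLES = [
    ("postman_request_status_code", {"request_name": "authentication"}, 200.0),
    ("postman_request_status_code", {"request_name": "business-request"}, 500.0),
    ("postman_request_failed_assertions", {"request_name": "authentication"}, 0.0),
    ("postman_request_failed_assertions", {"request_name": "business-request"}, 1.0),
]

if __name__ == "__main__":
    for labels in breaching(SAMPLES, "postman_request_status_code", 200):
        print("status code alert candidate:", labels["request_name"])
    for labels in breaching(SAMPLES, "postman_request_failed_assertions", 0):
        print("assertion alert candidate:", labels["request_name"])
```

Because both rules compare per-request series, each resulting alert carries the `request_name` label, which identifies the failing step in the chain.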
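What remains is to hook up Alertmanager and route the alerts to the operations team. If you need Grafana dashboards, the project documents how to set them up. Source code: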
GitHub - kevinniu666/postman-prometheus: Run Postman collections continuously and export results as Prometheus metrics
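Closing notes
The core of this solution is the container, which turns Postman runs into metrics that Prometheus can scrape. Its source code is linked above. If you are familiar with Node.js, you can modify the source yourself. Some of the container's behavior can be controlled through environment variables, listed below:
Here is a chart of business probing from a production environment: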
Copyright belongs to the original author, 一直学下去. In case of infringement, please contact us for removal.