Tencent mini project - [Metrics Monitoring Service Refactoring] 2023-08-24

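Done today

Jaeger

Features

  1. Monitor and troubleshoot distributed workflows
  2. Identify performance bottlenecks
  3. Track down root causes
  4. Analyze service dependencies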

Deployment

  • Deployment: Deployment - Jaeger documentation (jaegertracing.io)
  • ClickHouse support: jaegertracing/jaeger-clickhouse: Jaeger ClickHouse storage plugin implementation (github.com)
  • Monitoring with Prometheus: Service Performance Monitoring (SPM) - Jaeger documentation (jaegertracing.io)
  • Using Elasticsearch: docker - How to configure Jaeger with elasticsearch? - Stack Overflow

  • GitHub issue: jaeger-collector: Failed to init storage factory · Issue #931 · jaegertracing/jaeger (github.com)

version: "3"

services:
  proxy:
    image: traefik:v3.0
    container_name: proxy
    hostname: proxy
    networks:
      - elastic-jaeger
    ports:
      - "80:80"
      - "8080:8080"
    command:
      - --ping=true
      - --api.dashboard=true
      - --api.insecure=true
      - --providers.file.directory=/etc/traefik
      - --providers.file.watch=true
      - --entrypoints.web-entrypoint.address=:80
      - --entrypoints.kafka-entrypoint.address=:9092
      - --accesslog=true
      - --metrics.openTelemetry=true
      - --metrics.openTelemetry.address=jaeger-collector:4317
      - --metrics.openTelemetry.grpc=true
      - --metrics.openTelemetry.insecure=true
      - --tracing.openTelemetry=true
      - --tracing.openTelemetry.address=jaeger-collector:4317
      - --tracing.openTelemetry.grpc=true
      - --tracing.openTelemetry.insecure=true
      - --log.level=WARN # DEBUG, INFO, WARN, ERROR, FATAL, PANIC
    healthcheck:
      test: ["CMD-SHELL", "traefik healthcheck --ping"]
      interval: 5s
      timeout: 3s
      retries: 3
    volumes:
      - ./config/traefik:/etc/traefik

  elasticsearch:
    image: elasticsearch:7.17.12
    container_name: elasticsearch
    networks:
      - elastic-jaeger
    ports:
      - "127.0.0.1:9200:9200"
      - "127.0.0.1:9300:9300"
    restart: on-failure
    environment:
      - cluster.name=jaeger-cluster
      - discovery.type=single-node
      - http.host=0.0.0.0
      - transport.host=127.0.0.1
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
      - xpack.security.enabled=false
    volumes:
      - esdata:/usr/share/elasticsearch/data

  jaeger-collector:
    container_name: jaeger-collector
    image: jaegertracing/jaeger-collector
    ports:
      - "14269:14269"
      - "14268:14268"
      - "14267:14267"
      - "14250:14250"
      - "9411:9411"
      - "4317:4317"
    networks:
      - elastic-jaeger
    restart: on-failure
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
    command: ["--es.server-urls=http://elasticsearch:9200", "--es.num-shards=1", "--es.num-replicas=0", "--log-level=error"]
    depends_on:
      - elasticsearch

  jaeger-agent:
    container_name: jaeger-agent
    image: jaegertracing/jaeger-agent
    hostname: jaeger-agent
    command: ["--reporter.grpc.host-port=jaeger-collector:14250"]
    ports:
      - "5775:5775/udp"
      - "6831:6831/udp"
      - "6832:6832/udp"
      - "5778:5778"
    networks:
      - elastic-jaeger
    restart: on-failure
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
    depends_on:
      - jaeger-collector

  jaeger-query:
    container_name: jaeger-query
    image: jaegertracing/jaeger-query
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - no_proxy=localhost
    ports:
      - "16686:16686"
      - "16687:16687"
    networks:
      - elastic-jaeger
    restart: on-failure
    command: ["--es.server-urls=http://elasticsearch:9200", "--span-storage.type=elasticsearch", "--log-level=debug"]
    depends_on:
      - jaeger-agent

volumes:
  esdata:
    driver: local

networks:
  elastic-jaeger:
    driver: bridge

  • Service Performance Monitoring (SPM) — Jaeger documentation (jaegertracing.io)

The metrics are now visible.
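
For reference, the SPM documentation linked above has jaeger-query read these metrics from Prometheus. Below is a minimal sketch of the extra settings, not the configuration actually used here. It assumes a Prometheus instance reachable as `prometheus:9090` on the same Docker network; that service is hypothetical and does not appear in the compose file above.

```yaml
  jaeger-query:
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      # Tell jaeger-query to load SPM metrics from a Prometheus-compatible backend
      - METRICS_STORAGE_TYPE=prometheus
      # Assumed address of the Prometheus server (hypothetical service name)
      - PROMETHEUS_SERVER_URL=http://prometheus:9090
```

With settings like these, the Monitor tab of the Jaeger UI queries Prometheus for the span-derived metrics.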

Jaeger's trace display differs from that of Grafana and SigNoz.

An anomaly appeared although the related code had not been modified (traefik's metrics had previously been observable in Prometheus). [Fixed]

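The otel-collector pipeline

Worked out the overall flow of the otel-collector pipeline and the role of each component:

  • spanmetrics is a connector
  • It can act as a receiver that starts a metrics pipeline [it receives the span metrics from the upstream trace pipeline, where it acts as an exporter]
  • It can act as an exporter [it stores the span metrics of the trace pipeline]
  • When spanmetrics is defined as a processor, it can export span metrics to Prometheus from within the trace pipeline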
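
The connector wiring described above can be sketched as an otel-collector configuration. This is a minimal sketch rather than the project's actual configuration. It assumes the contrib distribution of the collector, which ships the spanmetrics connector. The listen addresses, the `otlp/jaeger` exporter name, and the Prometheus exporter port 8889 are illustrative assumptions.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # assumed listen address for incoming spans

connectors:
  spanmetrics: {}                # default settings; derives metrics from spans

exporters:
  otlp/jaeger:                   # hypothetical name; forwards traces to Jaeger
    endpoint: jaeger-collector:4317
    tls:
      insecure: true
  prometheus:
    endpoint: 0.0.0.0:8889       # assumed port that Prometheus scrapes

service:
  pipelines:
    traces:
      receivers: [otlp]
      # spanmetrics is the exporter side of the connector here
      exporters: [otlp/jaeger, spanmetrics]
    metrics:
      # and the receiver side here, which starts the metrics pipeline
      receivers: [spanmetrics]
      exporters: [prometheus]
```

Listing spanmetrics under the trace pipeline's exporters and under the metrics pipeline's receivers is what joins the two pipelines.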

Metrics reported by traefik, venus, and profile can now be observed!

To do tomorrow

  1. Load-test Jaeger
  2. Test switching Jaeger's storage backend to Elasticsearch

Reposted from: https://blog.csdn.net/xzx18822942899/article/details/133259443
Copyright belongs to the original author, 奥库甘道夫.
