

How to view YARN container logs, list all containers of a YARN application, and view and briefly analyze GC logs

Viewing YARN application logs: container logs while an application is running and after it has finished, plus a simple analysis of GC logs



I. Preparation: basic shell and yarn commands

1. Aliases (alias) and redirecting error output
```bash
$ alias lg='yarn logs -applicationId application_1652362266025_4019 2>/dev/null'
# Saves retyping the long fixed prefix in the tests below. Arguments typed after lg
# (e.g. -log_files <args>) are appended to the expanded text; since a redirection may
# appear anywhere in a command, they are still passed to yarn logs.
# 2>/dev/null discards stderr, which makes the results easier to read.
$ lg -log_files stdout | head
Container: container_e105_1652362266025_4019_01_000022 on ****-bg-w03_45454_1652490636096
LogAggregationType: AGGREGATED
=========================================================================================
LogType:stdout
LogLastModifiedTime:Sat May 14 09:10:36 +0800 2022
LogLength:6368
LogContents:
2022-05-14 08:11:37 Starting to run new task attempt: attempt_1652362266025_4019_1_10_000002_0
1.783: [GC pause (Metadata GC Threshold) (young) (initial-mark), 0.0371225 secs]
   [Parallel Time: 13.8 ms, GC Workers: 23]
```
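An alias cannot take parameters of its own. Bash simply pastes the alias text in front of whatever you type after it. If you switch between applications often, a shell function is a little more flexible. The following is a minimal sketch that assumes you keep the application ID in a shell variable. The names `lg` and `YARN_APP` are our own choices, not part of YARN.

```bash
# Hypothetical helper: the same idea as the alias, written as a function.
# YARN_APP is an illustrative variable name, not a YARN setting.
lg() {
  if [ -z "${YARN_APP:-}" ]; then
    echo "lg: set YARN_APP to an application ID first" >&2
    return 1
  fi
  # "$@" forwards every extra argument (e.g. -log_files stdout) unchanged;
  # stderr of yarn logs is discarded, as with the alias.
  yarn logs -applicationId "$YARN_APP" "$@" 2>/dev/null
}
```

Usage: `YARN_APP=application_1652362266025_4019` followed by `lg -log_files stdout | head`. If you already defined the alias in the same shell, run `unalias lg` first. Otherwise bash expands the alias while reading the function definition and reports a syntax error.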
2. A few basic yarn commands
```
app|application       prints application(s) report/kill application/manage long running application
applicationattempt    prints applicationattempt(s) report
container             prints container(s) report
logs                  dump container logs
```
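The first subcommand, `yarn application -status`, helps decide which approach below applies. For a running application, you can list its containers from the attempt (step 3 of section II). For a finished one, you go through `yarn logs` (step 4). The sketch below pulls the `State` field out of the report. It assumes the report contains a line of the form `State : RUNNING`, as the Hadoop CLI prints it. The helper name `app_state` is hypothetical.

```bash
# Hypothetical helper: print the YARN state (e.g. RUNNING, FINISHED) of one application.
# Assumes `yarn application -status` prints a line like "<TAB>State : RUNNING";
# the "Final-State : ..." line is skipped because only whitespace may precede "State".
app_state() {
  yarn application -status "$1" 2>/dev/null |
    awk -F' : ' '$1 ~ /^[[:space:]]*State$/ { print $2; exit }'
}
```

For example, `[ "$(app_state application_1652362266025_4832)" = RUNNING ]` can select between `yarn container -list` and `yarn logs -show_application_log_info`.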
3. The -type option of find; the -o and -e options of grep
```
find [options]
  -type c                   file is of type c:
      d   directory
      p   named pipe (FIFO)
      f   regular file
      s   socket

grep [options]
  -e, --regexp=PATTERN      use PATTERN for matching
  -E, --extended-regexp     PATTERN is an extended regular expression
  -i, --ignore-case         ignore case distinctions
  -w, --word-regexp         force PATTERN to match only whole words
  -v, --invert-match        select non-matching lines
  -n, --line-number         print line number with output lines
  -h, --no-filename         suppress the file name prefix on output
  -o, --only-matching       show only the part of a line matching PATTERN
  -r, --recursive           like --directories=recurse
```

Note: egrep is equivalent to grep -E (extended regular expressions), not grep -e.
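The difference between `-e` and `-E` matters for patterns used later in this article, such as `'\[GC pause|real='`. The option `-e` only marks the next argument as a pattern, which is useful when the pattern starts with `-` or when you pass several patterns. The option `-E` switches to extended regular expressions, in which `|` means alternation. The small check below runs on two made-up sample lines shaped like the GC output shown later.

```bash
# Two sample lines (made up) in the shape of the GC output used later.
sample='[GC pause (G1 Evacuation Pause) (young), 0.0094619 secs]
 [Times: user=0.07 sys=0.01, real=0.01 secs]'

# -E (what egrep does): "|" is alternation, so both lines match.
with_E=$(printf '%s\n' "$sample" | grep -c -E '\[GC pause|real=' || true)
# -e: still a basic regex, where a plain "|" is a literal character, so nothing matches.
# (grep -c prints 0 and exits 1 here; "|| true" keeps scripts running under set -e.)
with_e=$(printf '%s\n' "$sample" | grep -c -e '\[GC pause|real=' || true)
# -o: print only the matched part of the line.
real=$(printf '%s\n' "$sample" | grep -o 'real=[0-9.]*')
echo "grep -E: $with_E matching lines; grep -e: $with_e matching lines; -o: $real"
```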

II. Viewing application logs

1. Use yarn logs directly: you can select the log types, and you can redirect the output to a local file by hand

Example:

```bash
$ yarn logs -applicationId application_1652362266025_4019 -log_files stderr 2>/dev/null | head
Container: container_e105_1652362266025_4019_01_000022 on ***-bg-w03_45454_1652490636096
LogAggregationType: AGGREGATED
=========================================================================================
LogType:stderr
LogLastModifiedTime:Sat May 14 09:10:36 +0800 2022
LogLength:188
LogContents:
2022-05-14 08:11:37 Starting to run new task attempt: attempt_1652362266025_4019_1_10_000002_0
2022-05-14 08:11:41 Completed running task attempt: attempt_1652362266025_4019_1_10_000002_0
...
```
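The example above only prints to the terminal. The sketch below shows the manual redirection the heading mentions: it saves each requested log type of one application into its own local file. The function name `save_logs` and the `<application ID>.<log type>.log` naming are illustrative choices, not YARN conventions.

```bash
# Hypothetical helper: save selected log types of one application to local files.
# Usage: save_logs <application ID> <log type>...
save_logs() {
  local app=$1 type
  shift
  for type in "$@"; do
    # One file per log type, e.g. application_..._4019.stderr.log
    yarn logs -applicationId "$app" -log_files "$type" 2>/dev/null > "$app.$type.log"
    echo "wrote $app.$type.log"
  done
}
```

Example: `save_logs application_1652362266025_4019 stderr stdout` writes two files in the current directory.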

2. Use yarn logs with the -out option to save the logs to local files automatically

```bash
# tree could not be installed (no permission) on the first cluster, so another cluster is used here.
# Save the logs into the logs directory under the current directory
$ yarn logs -applicationId application_1650527019982_0697 -out ./logs
# Show the directory structure: each subdirectory is named after a node, and each file in it
# holds the logs of one container of this application that ran on that node
$ tree logs/
logs/
├── node-group-***1.mrs-lcnd.com_8041
│   ├── container_e03_1650527019982_0697_01_000001
│   ├── container_e03_1650527019982_0697_01_000003
│   ├── container_e03_1650527019982_0697_01_000012
│   ...
├── node-group-***2.mrs-lcnd.com_8041
│   ├── container_e03_1650527019982_0697_01_000002
│   ├── container_e03_1650527019982_0697_01_000004
│   ├── container_e03_1650527019982_0697_01_000006
│   ...
│   └── container_e03_1650527019982_0697_01_000044
└── node-group-***3.mrs-lcnd.com_8041
    ├── container_e03_1650527019982_0697_01_000005
    ├── container_e03_1650527019982_0697_01_000009
    ├── container_e03_1650527019982_0697_01_000010
    ...
# Because -log_files was not used as a filter, each file contains every log type of its container. To verify:
$ cat logs/node-group-***1.mrs-lcnd.com_8041/container_e03_1650527019982_0697_01_000001 | egrep -i '^LogType'
LogType:container-localizer-syslog
LogType:directory.info
LogType:launch_container.sh
LogType:prelaunch.out
LogType:stderr
LogType:stdout
LogType:syslog
```
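With many containers, checking one file at a time gets tedious. The sketch below walks a directory written by `-out` and prints, for each container file, the log types it holds. It relies only on the `LogType:` header lines shown above. The function name is hypothetical.

```bash
# Hypothetical helper: for every file under a `yarn logs -out` directory, print the
# file name (the container ID) followed by the log types found in that file.
list_log_types() {
  find "$1" -type f | sort | while read -r f; do
    # grep -h: no file-name prefix; cut keeps the text after "LogType:";
    # paste joins the types onto one space-separated line
    types=$(grep -h '^LogType:' "$f" | cut -d: -f2- | paste -s -d ' ' -)
    echo "$(basename "$f"): $types"
  done
}
```

Example: `list_log_types logs/`.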

3. View the container IDs and logs of a running application

```bash
# 1. Get the application's attempt ID
$ yarn applicationattempt -list application_1652362266025_4832 2>/dev/null
Total number of application attempts :1
ApplicationAttempt-Id                   State     AM-Container-Id                               Tracking-URL
appattempt_1652362266025_4832_000001    RUNNING   container_e105_1652362266025_4832_01_000001   http://****-bg-w01:8088/proxy/application_1652362266025_4832/
# 2. Get the containers of this attempt
$ yarn container -list appattempt_1652362266025_4832_000001 2>/dev/null
# The output is a table like the one above; it is transposed here for readability
Total number of containers :1
# Header             # Value
Container-Id         container_e105_1652362266025_4832_01_000001
Start Time           Sat May 14 16:19:38 +0800 2022
Finish Time          N/A
State                RUNNING
Host                 ****-bg-w18:45454
Node Http Address    http://****-bg-w18:8042
LOG-URL              http://****-bg-w01:8188/applicationhistory/logs/****-bg-w18:45454/container_e105_1652362266025_4832_01_000001/container_e105_1652362266025_4832_01_000001/dmp_operator1
# This is a Hive application with no query running at the moment, so the list contains only one
# RUNNING container; comparing it with AM-Container-Id in step 1 shows it is the container the AM runs in.
# While a query is running, several containers appear here.
# 3. View the container's logs
# (note: this example was captured for application ..._4835, not ..._4832)
$ yarn logs -containerId container_e105_1652362266025_4835_01_000001 -log_files stderr 2>/dev/null | head
Container: container_e105_1652362266025_4835_01_000001 on ****-bg-w19:45454
LogAggregationType: LOCAL
===========================================================================
LogType:stderr
LogLastModifiedTime:Sat May 14 16:32:26 +0800 2022
LogLength:472
LogContents:
2022-05-14 16:23:24 Running Dag: dag_1652362266025_4835_1
2022-05-14 16:23:59 Completed Dag: dag_1652362266025_4835_1
2022-05-14 16:25:45 Running Dag: dag_1652362266025_4835_2
...
# LOCAL (rather than AGGREGATED, as in the earlier examples) means the logs were read from the
# node's local log directory, because the running application's logs have not been aggregated yet
```
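Steps 1 and 2 can be chained so that one command prints the container IDs of a running application. The sketch below assumes the list output has the column layout shown above. The helper name is hypothetical. It picks the attempt whose state is `RUNNING`, since an application can have several attempts after an AM restart.

```bash
# Hypothetical helper: print the IDs of the containers of a running application.
# Assumes the attempt list has the attempt ID in column 1 and the state in column 2.
running_containers() {
  local attempt
  attempt=$(yarn applicationattempt -list "$1" 2>/dev/null |
    awk '$1 ~ /^appattempt_/ && $2 == "RUNNING" { print $1; exit }')
  if [ -z "$attempt" ]; then
    echo "no RUNNING attempt found for $1" >&2
    return 1
  fi
  # IDs may or may not carry an epoch part (e105_); each ID also appears inside
  # LOG-URL, so duplicates are removed with sort -u.
  yarn container -list "$attempt" 2>/dev/null |
    grep -oE 'container_(e[0-9]+_)?[0-9]+_[0-9]+_[0-9]+_[0-9]+' | sort -u
}
```

Each ID can then be passed to `yarn logs -containerId <ID>`, as in step 3.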

4. How to look up containers that have already finished

Step 3 shows that querying by attempt ID only returns containers that are currently running. To see information about containers that have already finished, use yarn logs:

```bash
$ lg -show_application_log_info
Application State: Completed.
Container: container_e105_1652362266025_4019_01_000022 on ****-bg-w03_45454_1652490636096
Container: container_e105_1652362266025_4019_01_000009 on ****-bg-w04_45454_1652490635977
Container: container_e105_1652362266025_4019_01_000014 on ****-bg-w04_45454_1652490635977
Container: container_e105_1652362266025_4019_01_000020 on ****-bg-w06_45454_1652490636083
Container: container_e105_1652362266025_4019_01_000012 on ****-bg-w07_45454_1652490636059
...
# Use egrep -o to extract the container IDs
# (this pattern assumes IDs with an epoch part such as e105)
$ lg -show_application_log_info | egrep -o 'container_e[_0-9]*'
container_e105_1652362266025_4019_01_000022
container_e105_1652362266025_4019_01_000009
container_e105_1652362266025_4019_01_000014
container_e105_1652362266025_4019_01_000020
container_e105_1652362266025_4019_01_000012
...
# Node names could be extracted the same way, but yarn logs can also list them directly
$ lg -list_nodes
****-bg-w03_45454_1652490636096
****-bg-w04_45454_1652490635977
****-bg-w06_45454_1652490636083
****-bg-w07_45454_1652490636059
****-bg-w10_45454_1652490636542
...
```
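Each `Container: ... on <node>` line carries both the container ID and its node. The same output therefore also shows how the containers were spread across nodes. The sketch below counts containers per node, reading the `-show_application_log_info` output on stdin. The function name is hypothetical.

```bash
# Hypothetical helper: count containers per node from
# `yarn logs -show_application_log_info` output read on stdin.
containers_per_node() {
  # Relevant lines look like: "Container: <container ID> on <node>"
  awk '$1 == "Container:" && $3 == "on" { n[$4]++ }
       END { for (node in n) print n[node], node }' | sort -k1,1nr -k2,2
}
```

Example: `lg -show_application_log_info | containers_per_node` lists the busiest nodes first.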

5. View GC logs

Background reading: "A 10,000-word guide to reading Java G1 GC logs" (万字长文教你看懂java G1垃圾回收日志)

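Here the GC log appears in YARN under LogType:stdout, because the JVM prints GC output to stdout unless a separate GC log file is configured (e.g. with -Xloggc).

As the figure below shows, we mainly extract two values from the log: the time offset of each pause and the actual elapsed GC time (real). Together they give a rough picture of the application's GC behavior.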

[Figure: viewing the GC log]
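The two values mentioned above can be pulled out with awk. The sketch below assumes the JDK 8 G1 format seen in this article, with no date stamps: `<offset>: [GC pause (...), <seconds> secs]`, followed later by `[Times: ... real=<seconds> secs]`. For each pause it prints the time offset together with the `real` value from the next `[Times:` line. The function name is hypothetical. Other JVM versions or logging flags produce a different format.

```bash
# Hypothetical helper: read a JDK 8 style G1 GC log on stdin and print
# "<offset in s> <real time in s>" for every "[GC pause" entry.
gc_pauses() {
  awk '
    /\[GC pause/ {
      offset = $1; sub(/:$/, "", offset)   # "273.086:" -> "273.086"
      pending = 1                          # wait for the next [Times: line
      next
    }
    pending && /\[Times:/ {
      if (match($0, /real=[0-9.]+/))
        print offset, substr($0, RSTART + 5, RLENGTH - 5)
      pending = 0                          # later [Times: lines (remark, cleanup) are ignored
    }
  '
}
```

Example: `lg -containerId <container ID> -log_files stdout | gc_pauses | sort -k2,2nr | head` lists the longest pauses of one container.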

1) View directly with yarn logs

```bash
# Application level: yarn logs -applicationId <application ID>
# Container level:   yarn logs -applicationId <application ID> -containerId <Container ID>
# In practice the container level is often more useful, because GC problems usually show up in a few individual containers
$ lg -containerId container_e105_1652362266025_4019_01_000014 -log_files stdout | egrep '\[GC pause|real='
1.529: [GC pause (Metadata GC Threshold) (young) (initial-mark), 0.0286059 secs]
 [Times: user=0.24 sys=0.02, real=0.03 secs]
 [Times: user=0.09 sys=0.00, real=0.01 secs]
 [Times: user=0.03 sys=0.01, real=0.01 secs]
...
273.086: [GC pause (G1 Evacuation Pause) (young), 0.0094619 secs]
 [Times: user=0.07 sys=0.01, real=0.01 secs]
400.869: [GC pause (G1 Humongous Allocation) (young) (initial-mark), 0.0097788 secs]
 [Times: user=0.07 sys=0.02, real=0.01 secs]
 [Times: user=0.15 sys=0.00, real=0.01 secs]
 [Times: user=0.02 sys=0.01, real=0.01 secs]
# The two extra [Times: ...] lines after an initial-mark pause most likely belong to the remark
# and cleanup phases of the concurrent cycle, whose own header lines do not match the pattern

# Format the output: split each line on commas and align the pieces in columns
$ lg -containerId container_e105_1652362266025_4019_01_000014 -log_files stdout | egrep '\[GC pause|real=' | column -s, -t
1.529: [GC pause (Metadata GC Threshold) (young) (initial-mark)   0.0286059 secs]
 [Times: user=0.24 sys=0.02                                       real=0.03 secs]
 [Times: user=0.09 sys=0.00                                       real=0.01 secs]
 [Times: user=0.03 sys=0.01                                       real=0.01 secs]
2.617: [GC pause (Metadata GC Threshold) (young) (initial-mark)   0.0365129 secs]
 [Times: user=0.37 sys=0.04                                       real=0.04 secs]
 [Times: user=0.15 sys=0.00                                       real=0.01 secs]
 [Times: user=0.03 sys=0.00                                       real=0.01 secs]
...
```

2) Save the logs locally with -out, then inspect them with shell commands

```bash
# List the paths of all log files (relative to the current directory)
$ find logs/ -type f
logs/****-bg-w03_45454_1652490636096/container_e105_1652362266025_4019_01_000022
logs/****-bg-w17_45454_1652490635505/container_e105_1652362266025_4019_01_000005
logs/****-bg-w21_45454_1652490636022/container_e105_1652362266025_4019_01_000011
logs/****-bg-w10_45454_1652490636542/container_e105_1652362266025_4019_01_000023
...
# As above, you can cat a file and filter it with egrep.
# Here we count the GC pauses in each container instead:
$ find logs/ -type f | while read -r line
> do
>   echo "$line"
>   egrep '\[GC pause' "$line" | wc -l
> done
logs/****-bg-w03_45454_1652490636096/container_e105_1652362266025_4019_01_000022
3
logs/****-bg-w17_45454_1652490635505/container_e105_1652362266025_4019_01_000005
5
logs/****-bg-w21_45454_1652490636022/container_e105_1652362266025_4019_01_000011
2
logs/****-bg-w10_45454_1652490636542/container_e105_1652362266025_4019_01_000023
1304
logs/****-bg-w06_45454_1652490636083/container_e105_1652362266025_4019_01_000020
5
...
# Clearly one container (..._000023, with 1304 pauses) runs GC far more often than the others
```
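The loop above prints two lines per file, which is hard to scan once there are dozens of containers. The sketch below prints one line per container, `<pause count> <file>`, sorted so that the busiest containers come first. It uses `grep -c` in place of `egrep | wc -l`. The function name is hypothetical.

```bash
# Hypothetical helper: rank the container log files under a `yarn logs -out`
# directory by the number of "[GC pause" lines they contain, highest first.
rank_gc_counts() {
  find "$1" -type f | while read -r f; do
    # grep -c prints 0 (and exits 1) when nothing matches; "|| true" keeps set -e scripts going
    printf '%s %s\n' "$(grep -c '\[GC pause' "$f" || true)" "$f"
  done | sort -k1,1nr
}
```

Example: `rank_gc_counts logs/ | head -n 5` would put the ..._000023 file from the run above at the top.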

[Figure: viewing the GC log]


Summary

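Once log aggregation is enabled in YARN, much of this can be viewed in the web UI. The command line, however, allows more powerful and fine-grained operations, and the official help output (e.g. `yarn logs -help`) is worth reading and experimenting with.
The commands above can also be wrapped in a simple script, so that problems can be located quickly when something goes wrong.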

If you spot a mistake or know a better way, corrections are very welcome. Let's all get better together!
Tags: bash yarn big data

Reposted from: https://blog.csdn.net/sinat_25207217/article/details/124769090
Copyright belongs to the original author, xiaofeng_37. In case of infringement, please contact us for removal.
