[Flink] Flink on YARN (yarn-session.sh) startup error

Starting yarn-session.sh for Flink fails with the error: The number of requested virtual cores for application master 1 exceeds the maximum number of virtual cores 0 available in the Yarn Cluster.

Versions:

Hadoop: 3.3.4

Flink: 1.17.1

Problem

Starting yarn-session.sh with Flink on YARN produces the following error:

ERROR org.apache.flink.yarn.cli.FlinkYarnSessionCli [] - Error while running the Flink session.
org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
    at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:437) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:608) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$4(FlinkYarnSessionCli.java:869) ~[flink-dist-1.17.1.jar:1.17.1]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_231]
    at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_231]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878) ~[hadoop-common-3.3.4.jar:?]
    at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:869) [flink-dist-1.17.1.jar:1.17.1]
Caused by: org.apache.flink.configuration.IllegalConfigurationException: The number of requested virtual cores for application master 1 exceeds the maximum number of virtual cores 0 available in the Yarn Cluster.
    at org.apache.flink.yarn.YarnClusterDescriptor.isReadyForDeployment(YarnClusterDescriptor.java:338) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:567) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:430) ~[flink-dist-1.17.1.jar:1.17.1]
    ... 7 more

------------------------------------------------------------
The program finished with the following exception:

org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
    at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:437)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:608)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$4(FlinkYarnSessionCli.java:869)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
    at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:869)
Caused by: org.apache.flink.configuration.IllegalConfigurationException: The number of requested virtual cores for application master 1 exceeds the maximum number of virtual cores 0 available in the Yarn Cluster.
    at org.apache.flink.yarn.YarnClusterDescriptor.isReadyForDeployment(YarnClusterDescriptor.java:338)
    at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:567)
    at org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:430)
    ... 7 more

Cause

I first set every parameter that looked relevant in yarn-site.xml and restarted the YARN services, but running yarn-session.sh still failed with the same error:

<property>
    <name>yarn.containers.vcores</name>
    <value>8</value>
</property>
<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>2</value>
</property>

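While looking at the cluster information in the YARN web UI, I noticed an entry under Unhealthy Nodes and opened its details:
Figure: unhealthy node report
The actual cause was that disk usage had exceeded 90% (YARN's default threshold is 90%), so the node was marked unhealthy. An unhealthy node is effectively unavailable, and because my local setup has only one node, the whole cluster had no usable resources. That is what produced the error shown at the top.
Figure: details of the unhealthy node report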
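The chain of reasoning above (disk over the threshold, node unhealthy, zero vcores available, deployment rejected) can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not YARN's or Flink's actual code: the `Node` class and all function names are hypothetical, each node is assumed to have a single disk, and the real NodeManager health check also considers other settings such as the minimum fraction of healthy disks.

```python
from dataclasses import dataclass

# Default of yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
DEFAULT_MAX_DISK_UTILIZATION_PCT = 90.0


@dataclass
class Node:
    """Hypothetical stand-in for a NodeManager; not a real YARN type."""
    name: str
    vcores: int
    disk_used_pct: float


def is_healthy(node, max_pct=DEFAULT_MAX_DISK_UTILIZATION_PCT):
    # Simplification: one disk per node; it is bad once usage exceeds the threshold.
    return node.disk_used_pct <= max_pct


def max_available_vcores(nodes, max_pct=DEFAULT_MAX_DISK_UTILIZATION_PCT):
    # Only healthy nodes count; if none is healthy the maximum falls back to 0.
    return max((n.vcores for n in nodes if is_healthy(n, max_pct)), default=0)


def can_deploy(requested_am_vcores, nodes, max_pct=DEFAULT_MAX_DISK_UTILIZATION_PCT):
    # Mirrors the wording of the error: requested vcores must not exceed the maximum.
    return requested_am_vcores <= max_available_vcores(nodes, max_pct)


if __name__ == "__main__":
    # Single-node cluster with an assumed disk usage of 93%.
    cluster = [Node("local", vcores=4, disk_used_pct=93.0)]
    print(max_available_vcores(cluster))      # no healthy node at the default threshold
    print(can_deploy(1, cluster))             # request for 1 vcore is rejected
    print(can_deploy(1, cluster, max_pct=99)) # accepted once the threshold is raised
```

In this model, no value of the vcore settings tried earlier can help, because the only node is filtered out before its capacity is ever considered.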

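Solution

Following the hint in the Health-report, I added the following parameter to yarn-site.xml:

<property>
    <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
    <value>99</value>
</property>

After restarting YARN, the node status was back to normal, and Flink's yarn-session.sh started successfully.
Figure: YARN cluster node status after the fix
Figure: Flink yarn-session.sh started successfully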
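Before restarting YARN it can be worth confirming that the override really is in the file that gets loaded. The following is a minimal sketch using only the Python standard library; `read_property` and `SAMPLE` are hypothetical names introduced for illustration, and returning the last occurrence of a property is a simplification of how Hadoop resolves configuration.

```python
import xml.etree.ElementTree as ET

KEY = "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"

# Inline sample so the sketch is self-contained; in practice, read yarn-site.xml from disk.
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
    <value>99</value>
  </property>
</configuration>"""


def read_property(xml_text, name, default=None):
    """Return the value of the last <property> whose <name> matches, else default."""
    result = default
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            value = prop.findtext("value")
            if value is not None:
                result = value.strip()
    return result


if __name__ == "__main__":
    # Falls back to the documented default of 90.0 when the property is absent.
    print(read_property(SAMPLE, KEY, default="90.0"))
```

To check a real installation, pass the contents of the yarn-site.xml in the Hadoop configuration directory instead of `SAMPLE`.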

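Summary

When yarn-session.sh fails with a YARN-related error, check the YARN web UI for unhealthy nodes and read the health report and the specific error message. Then adjust the configuration according to that information and retest until the problem is resolved.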
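The same check can also be scripted against the ResourceManager REST API, which lists cluster nodes at /ws/v1/cluster/nodes. The sketch below is an illustration rather than a definitive tool: the address http://localhost:8088 is an assumption (the default ResourceManager web port on a local setup), the function names are hypothetical, and the health report text in the sample is a placeholder.

```python
import json
from urllib.request import urlopen


def unhealthy_nodes(payload):
    """Return (id, healthReport) pairs for nodes whose state is UNHEALTHY."""
    # The node list is nested under "nodes" -> "node"; tolerate a missing or null value.
    nodes = (payload.get("nodes") or {}).get("node") or []
    return [
        (n.get("id"), n.get("healthReport", ""))
        for n in nodes
        if n.get("state") == "UNHEALTHY"
    ]


def fetch_nodes(rm_url="http://localhost:8088"):
    # rm_url is an assumed address; adjust it to your ResourceManager.
    with urlopen(f"{rm_url}/ws/v1/cluster/nodes", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Offline sample with placeholder values, shaped like the REST response.
    sample = {
        "nodes": {
            "node": [
                {"id": "host1:8041", "state": "UNHEALTHY", "healthReport": "example report"},
                {"id": "host2:8041", "state": "RUNNING", "healthReport": ""},
            ]
        }
    }
    for node_id, report in unhealthy_nodes(sample):
        print(node_id, report)
```

Against a live cluster, replace the sample with `unhealthy_nodes(fetch_nodes())`; an empty result means no node is currently reported as unhealthy.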

Tags: flink, java, hadoop

Reposted from: https://blog.csdn.net/yuxiao97/article/details/131051363
Copyright belongs to the original author, 雨潇先生. In case of infringement, please contact us for removal.
