

Hadoop Fully Distributed Deployment (3.3.6)


Prerequisites:

  • Hadoop 3.3.6 and later support JDK 8 and JDK 11 at runtime; JDK 8 is recommended

Configure environment variables in /etc/profile

export HADOOP_HOME=/opt/module/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

hadoop-env.sh needs the JDK environment variable and the HDFS daemon user settings

export JAVA_HOME=/opt/module/jdk
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root

yarn-env.sh needs the JDK environment variable and the YARN daemon user settings

export JAVA_HOME=/opt/module/jdk
export YARN_NODEMANAGER_USER=root
export YARN_RESOURCEMANAGER_USER=root

The core configuration files are listed below.

  • core-site.xml
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop/datas/tmpdir</value>
</property>
  • hdfs-site.xml
<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/module/hadoop/datas/namedir</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/module/hadoop/datas/datadir</value>
</property>
<property>
    <name>dfs.namenode.http-address</name>
    <value>master:50070</value>
</property>
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>slave1:50090</value>
</property>
<!-- Allow the root proxy user (used by Hive) to access HDFS from any host.
     Note: hadoop.proxyuser.* settings are normally placed in core-site.xml -->
<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>
<!-- Allow the root proxy user to impersonate members of any group -->
<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>
  • yarn-site.xml
<!-- The node on which the ResourceManager runs -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
</property>
<!-- Web address and port of the ResourceManager -->
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
</property>
<!-- Auxiliary service the NodeManager loads at startup -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<!-- Enable log aggregation -->
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<!-- How long to retain aggregated logs before deleting them (seconds) -->
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>86400</value>
</property>
<!-- Interval between checks for expired log files (seconds) -->
<property>
    <name>yarn.log-aggregation.retain-check-interval-seconds</name>
    <value>3600</value>
</property>
<!-- Root directory (on HDFS) for aggregated application logs -->
<property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/yarn-logs</value>
</property>
<!-- Suffix appended to each remote log directory -->
<property>
    <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
    <value>logs</value>
</property>
<!-- URL of the YARN log server -->
<property>
    <name>yarn.log.server.url</name>
    <value>http://master:19888/jobhistory/logs</value>
</property>
  • mapred-site.xml
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.http.address</name>
    <value>master:19890</value>
</property>
<property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>mapred-history/tmpdir</value>
</property>
<property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>mapred-history/donedir</value>
</property>
<property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<!-- Setting yarn.application.classpath in mapred-site.xml avoids "could not
     find or load main class" errors when MapReduce jobs run; the entries
     must be full paths -->
<property>
    <name>yarn.application.classpath</name>
    <value>
        /opt/module/hadoop/etc/hadoop,
        /opt/module/hadoop/share/hadoop/common/lib/*,
        /opt/module/hadoop/share/hadoop/common/*,
        /opt/module/hadoop/share/hadoop/hdfs,
        /opt/module/hadoop/share/hadoop/hdfs/lib/*,
        /opt/module/hadoop/share/hadoop/hdfs/*,
        /opt/module/hadoop/share/hadoop/mapreduce/lib/*,
        /opt/module/hadoop/share/hadoop/mapreduce/*,
        /opt/module/hadoop/share/hadoop/yarn,
        /opt/module/hadoop/share/hadoop/yarn/lib/*,
        /opt/module/hadoop/share/hadoop/yarn/*
    </value>
</property>
  • workers
slave1
slave2
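Every block in the files above follows the same <property><name>…</name><value>…</value></property> shape. As an illustrative aid only (not part of the deployment itself), the fragment below sketches how such a block could be generated from a plain dict; the render_site_xml helper name is hypothetical:

```python
import xml.etree.ElementTree as ET

def render_site_xml(props: dict) -> str:
    """Render a Hadoop *-site.xml <configuration> block from a name->value dict."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print with indentation (Python 3.9+)
    return ET.tostring(root, encoding="unicode")

# Example: the two core-site.xml properties shown earlier
print(render_site_xml({
    "fs.defaultFS": "hdfs://master:9000",
    "hadoop.tmp.dir": "/opt/module/hadoop/datas/tmpdir",
}))
```

Generating the files this way keeps host names and paths in one place, which helps when the same values (master, /opt/module/hadoop) recur across several files.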

Run the following steps to start Hadoop:

  1. Format the NameNode: hdfs namenode -format
  2. Start the cluster (HDFS and YARN together): start-all.sh
  3. Start the history (log) server: mapred --daemon start historyserver
  4. Verify the daemons on each node with jps

master:

  • NameNode
  • ResourceManager
  • JobHistoryServer

slave1:

  • DataNode
  • NodeManager
  • SecondaryNameNode

slave2:

  • DataNode
  • NodeManager
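The per-host jps check above can be scripted. The sketch below assumes you capture each node's jps output as text (the missing_daemons helper is hypothetical, not a Hadoop tool):

```python
# Expected daemons per host, taken from the lists above.
EXPECTED = {
    "master": {"NameNode", "ResourceManager", "JobHistoryServer"},
    "slave1": {"DataNode", "NodeManager", "SecondaryNameNode"},
    "slave2": {"DataNode", "NodeManager"},
}

def missing_daemons(host: str, jps_output: str) -> set:
    """Return the expected daemons absent from a host's `jps` output.

    jps prints one "<pid> <ClassName>" line per JVM; we keep the class name.
    """
    running = {parts[1] for line in jps_output.splitlines()
               if len(parts := line.split()) == 2}
    return EXPECTED[host] - running

sample = "2101 NameNode\n2450 ResourceManager\n2788 JobHistoryServer\n3001 Jps"
print(missing_daemons("master", sample))  # prints set() -> all daemons are up
```

An empty set means the node matches the expected layout; any names returned are daemons that failed to start and whose logs should be checked.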

HDFS web UI: http://master:50070

SecondaryNameNode web UI: http://slave1:50090

ResourceManager web UI: http://master:8088

JobHistory web UI: http://master:19888


Reprinted from: https://blog.csdn.net/2203_75584759/article/details/140277809
Copyright belongs to the original author, 操练小白; in case of infringement, please contact us for removal.
