Hadoop (HA) depends on ZooKeeper, which must be installed in advance.
My ZooKeeper instances run on datanode1, datanode2, and datanode3.
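Before going further, you can confirm ZooKeeper is running on each of those nodes; this assumes the zkServer.sh script that ships with the ZooKeeper distribution is on the PATH there:
zkServer.sh status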
1. Create the user and group
groupadd hadoop
useradd hadoop -g hadoop
passwd hadoop
2. Configure hosts
vim /etc/hosts
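For reference, /etc/hosts on every node should map each cluster hostname to its IP; the addresses below are only placeholders for illustration, substitute your own:
192.168.1.100  namenode1
192.168.1.101  namenode2
192.168.1.102  datanode1
192.168.1.103  datanode2
192.168.1.104  datanode3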
3. Configure passwordless SSH login
On the namenode, switch to the hadoop user.
Run: ssh-keygen -t rsa
Then press Enter three times to accept the defaults.
Copy the key to the other machines with: ssh-copy-id <username>@<target-server-IP>
ssh-copy-id hadoop@namenode2
Type yes when prompted, then enter the hadoop user password you set earlier.
Then copy the key to the remaining nodes in turn (including the local machine):
ssh-copy-id hadoop@datanode1
ssh-copy-id hadoop@datanode2
ssh-copy-id hadoop@datanode3
All of these commands only need to be run on the namenode; finally, verify that passwordless login works, as shown below.
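A quick check, for example against datanode1 (the session should open without asking for a password):
ssh hadoop@datanode1
exit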
4. Upload and extract the Hadoop package
Download the package from:
https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/
After uploading, extract it:
tar -zxvf hadoop-2.10.0.tar.gz
If the extracted directory is not owned by the hadoop user, change its ownership:
chown -R hadoop:hadoop hadoop-2.10.0
Switch to the root user and configure the Hadoop environment variables: vim /etc/profile
If you stay on the hadoop user instead, edit the hidden file in the user's home directory: vim .bash_profile
Jump to the end of the file and append the variables.
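A minimal sketch of the lines to append, assuming Hadoop is extracted under /opt/hadoop-2.10.0 as in the rest of this guide:
export HADOOP_HOME=/opt/hadoop-2.10.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Reload the file afterwards with source /etc/profile (or source ~/.bash_profile for the hadoop user).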
5. Edit the configuration files
Edit four files: hdfs-site.xml, core-site.xml, yarn-site.xml, and mapred-site.xml.
Enter the directory: cd /opt/hadoop-2.10.0/etc/hadoop
5.1 mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>Run the MapReduce framework on YARN</description>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
    <description>Enable compression of map output</description>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
    <description>Use LZO as the compression codec</description>
  </property>
  <property>
    <name>mapred.child.env</name>
    <value>LD_LIBRARY_PATH=/usr/local/lib</value>
    <description>Directory of the LZO native library</description>
  </property>
  <property>
    <name>mapreduce.jobtracker.hosts.exclude.filename</name>
    <value>/home/hadoop/optional/exclude_node</value>
    <description>List of DataNodes to exclude</description>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
    <description>Maximum number of map tasks each TaskTracker can run concurrently</description>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>5</value>
    <description>Maximum number of reduce tasks each TaskTracker can run concurrently</description>
  </property>
</configuration>
5.2 core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1</value>
    <description>Alias (nameservice ID) of the NameNode cluster</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/datas/hdfs</value>
    <description>Default storage path for files Hadoop generates at runtime</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>datanode1:2181,datanode2:2181,datanode3:2181</value>
    <description>Address of the ZooKeeper cluster</description>
  </property>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
    <description>Enable the various compression codecs</description>
  </property>
  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
    <description>Concrete implementation of the LZO codec</description>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>
5.3 yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
    <description>Enable ResourceManager HA</description>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yrc</value>
    <description>Alias of the ResourceManager cluster</description>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
    <description>IDs of the individual ResourceManagers</description>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>namenode1</value>
    <description>Host of rm1</description>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>namenode2</value>
    <description>Host of rm2</description>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    <description>How RM state is stored: ZooKeeper-based (ZKStore); the alternative is in-memory (MemStore)</description>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>datanode1:2181,datanode2:2181,datanode3:2181/hadoop</value>
    <description>Address of the ZooKeeper cluster</description>
  </property>
  <property>
    <name>yarn.resourcemanager.connect.retry-interval.ms</name>
    <value>10000</value>
    <description>Interval before reconnecting after losing contact with the RM</description>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
    <description>Enable automatic recovery</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
    <description>Whether to run a thread that checks each task's virtual memory usage and kills the task if it exceeds its allocation; default is true</description>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
    <description>Maximum virtual memory a task may use per 1 MB of physical memory; default is 2.1</description>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>11264</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>33792</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>33792</value>
  </property>
  <property>
    <name>yarn.resourcemanager.max-completed-applications</name>
    <value>300</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>11264</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.command-opts</name>
    <value>-Xmx13107m</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>2592000</value>
    <description>retain 30 days</description>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-check-interval-seconds</name>
    <value>86400</value>
  </property>
</configuration>
5.4 hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Number of data replicas (not the number of DataNodes); default is 3</description>
  </property>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
    <value>NEVER</value>
  </property>
  <property>
    <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
    <description>Disable permission checks</description>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
    <description>Alias (nameservice ID) of the NameNode cluster; must match core-site.xml</description>
  </property>
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
    <description>IDs of the individual NameNodes</description>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>namenode1:9000</value>
    <description>RPC address of nn1</description>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>namenode1:50070</value>
    <description>HTTP address of nn1</description>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>namenode2:9000</value>
    <description>RPC address of nn2</description>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>namenode2:50070</value>
    <description>HTTP address of nn2</description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/datas/hdfs/name</value>
    <description>Where the NameNode stores its metadata</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/datas/hdfs/data</value>
    <description>Where DataNodes store their data</description>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/datas/hdfs/journal</value>
    <description>Where JournalNodes keep their data on local disk</description>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
    <description>Enable automatic NameNode failover</description>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    <description>Implementation used for automatic failover</description>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
sshfence
shell(/bin/true)</value>
    <description>Fencing methods; separate multiple methods with newlines, one per line</description>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
    <description>The sshfence method requires passwordless SSH</description>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
    <description>Timeout for the sshfence method</description>
  </property>
  <property>
    <name>dfs.datanode.max.transfer.threads</name>
    <value>20960</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>20960</value>
    <description>Upper limit on the number of files handled concurrently</description>
  </property>
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/home/hadoop/optional/exclude_node</value>
    <description>List of DataNodes to exclude</description>
  </property>
  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>600000</value>
  </property>
  <property>
    <name>dfs.client.socket-timeout</name>
    <value>600000</value>
  </property>
</configuration>
5.5 Configure the DataNodes
## Still in the hadoop-2.10.0/etc/hadoop directory
vim slaves
datanode1
datanode2
datanode3
6. Copy, distribute, and set ownership
Send the configured Hadoop directory to the other machines:
scp -r ./hadoop-2.10.0 namenode2:$PWD
scp -r ./hadoop-2.10.0 datanode1:$PWD
scp -r ./hadoop-2.10.0 datanode2:$PWD
scp -r ./hadoop-2.10.0 datanode3:$PWD
Check the ownership of the Hadoop directory on the other servers; if it is owned by root, hand it over to the hadoop user:
chown -R hadoop:hadoop ./hadoop-2.10.0/
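For example, a quick way to check the owner (assuming the package landed under /opt as on the first node; the owner and group columns should both read hadoop):
ls -ld /opt/hadoop-2.10.0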
7. Start the cluster
Switch to the hadoop user on every node: su - hadoop
## Start the JournalNode on each of the three datanode machines
./sbin/hadoop-daemon.sh start journalnode
## The following commands only need to be run on the namenode
# Format HDFS
hdfs namenode -format
# Start Hadoop
./sbin/start-all.sh
Access the HDFS web UI in a browser; replace the IP below with your own NameNode's IP:
http://192.168.1.100:50070/
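To confirm the daemons came up, run jps on every node. With this layout you would typically see NameNode and DFSZKFailoverController on the namenode hosts (plus ResourceManager where YARN's RM is running), and DataNode, NodeManager, JournalNode, and ZooKeeper's QuorumPeerMain on the datanode hosts:
jps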