I. Connect to the Competition Node
1. Connect to the host IP with an SSH tool
User: hadoop
Password: qweQWE123!@#
Log in as root, or switch to root.
1.1 Assign a static IP
① Go to the network configuration directory
cd /etc/sysconfig/network-scripts/
② Edit the interface file
vi ifcfg-ens33
③ Modify and add the following entries
BOOTPROTO=static
ONBOOT="yes"
IPADDR="192.168.200.131"
NETMASK="255.255.255.0"
GATEWAY="192.168.200.2"
DNS1="192.168.200.2"
④ Restart the network service
systemctl restart network
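To confirm the static address took effect, you can check the interface (ens33, per the file edited above):
ip addr show ens33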
2. Edit hosts
2.1 View the current mappings
cat /etc/hosts
2.2 Edit the hostname-to-IP mapping file
vi /etc/hosts
2.3 Add the following entry (the static IP configured above):
192.168.200.131 hadoop
2.4 Set the hostname
hostnamectl set-hostname hadoop
3. Reboot
reboot
4. Verify the hostname after the reboot
hostname
5. Ping the competition node
ping hadoop
II. Disable the Firewall
1. Stop it with systemctl (temporary, lasts until the next boot)
sudo systemctl stop firewalld
2. Check the firewall status
sudo systemctl status firewalld
3. Disable it at boot (permanent)
sudo systemctl disable firewalld
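As a quick check that the service will stay off across reboots:
sudo systemctl is-enabled firewalld   # should print "disabled"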
III. Configure Time Synchronization
1. Check the current time
date
2. Install ntpdate
sudo yum -y install ntpdate
3. Update the yum repository (if the install above fails)
3.1 Back up the existing repo file
sudo mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
3.2 Download the Aliyun mirror repo file
sudo curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
3.3 Clean and rebuild the yum cache
sudo yum clean all
sudo yum makecache
4. Retry the ntpdate install
sudo yum -y install ntpdate
5. Sync the time
sudo ntpdate -u pool.ntp.org
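Optionally, you can keep the clock in sync by scheduling ntpdate through cron. A minimal sketch (the 30-minute interval and the pool.ntp.org server are illustrative choices, not part of the original steps):
(sudo crontab -l 2>/dev/null; echo "*/30 * * * * /usr/sbin/ntpdate -u pool.ntp.org") | sudo crontab -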
IV. Install the Software
1. Change into /usr/local
cd /usr/local
# optionally clear out any existing files under the directory
sudo rm -rf /usr/local/*
2. Install the JDK
2.1 Remove the preinstalled environment
① List the bundled OpenJDK packages
rpm -qa | grep java
② Delete the leftover files
rm -rf /usr/lib/jvm
③ Uninstall the packages
yum -y remove java-1.7.0-openjdk*
yum -y remove java-1.8.0-openjdk*
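Re-running the query from step ① should now return nothing:
rpm -qa | grep java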
2.2 Install the JDK
① Extract the archive
tar -zxvf jdk-8u171-linux-x64.tar.gz
② Edit the environment variables
sudo vi /etc/profile
③ Append the following
# JAVA_HOME
export JAVA_HOME=/usr/local/jdk1.8.0_171
export PATH=$PATH:$JAVA_HOME/bin
④ Reload the modified file
source /etc/profile
⑤ Check the version
java -version
3. Install Hadoop
3.1 Extract the archive
tar -zxvf hadoop-2.9.2.tar.gz
3.2 Add the Hadoop environment variables
sudo vi /etc/profile
3.3 Append the following
## HADOOP_HOME
export HADOOP_HOME=/usr/local/hadoop-2.9.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
3.4 Reload the modified file
source /etc/profile
3.5 Check the version
hadoop version
4. Configure the Hadoop Cluster
# switch to the hadoop configuration directory
cd /usr/local/hadoop-2.9.2/etc/hadoop/
4.1 Edit hadoop-env.sh
vi hadoop-env.sh
Update the JAVA_HOME path:
export JAVA_HOME=/usr/local/jdk1.8.0_171
4.2 Configure core-site.xml
vi core-site.xml
---- inside <configuration> ----
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop-2.9.2/data/tmp</value>
</property>
<property>
    <name>dfs.http.address</name>
    <value>0.0.0.0:50070</value>
</property>
4.3 Configure hdfs-site.xml
vi hdfs-site.xml
---- inside <configuration> ----
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
4.4 Format the NameNode
/usr/local/hadoop-2.9.2/bin/hdfs namenode -format
4.5 Configure yarn-env.sh
vi yarn-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_171
4.6 Configure yarn-site.xml
vi yarn-site.xml
---- inside <configuration> ----
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>0.0.0.0:8088</value>
</property>
4.7 Configure mapred-env.sh
vi mapred-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_171
4.8 Rename the template
mv mapred-site.xml.template mapred-site.xml
① Configure mapred-site.xml
vi mapred-site.xml
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
4.9 Start the cluster
/usr/local/hadoop-2.9.2/sbin/start-all.sh
Type yes and enter the password when prompted.
4.10 Check that the daemons are running
jps
With this single-node setup, jps should list NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager.
Visit the NameNode web UI at http://hadoop:50070 (the dfs.http.address configured above).
Visit the ResourceManager web UI at http://hadoop:8088 (the yarn.resourcemanager.webapp.address configured above).
V. Passwordless SSH Setup
1. Generate a key pair
Press Enter four times at the prompts; run this on every host.
ssh-keygen
2. Copy the local public key to the other machines
The receiving host must be powered on; run this on every host.
Format (when distributing, type yes first, then the target host's password):
ssh-copy-id <hostname>
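On this single node, for example (hadoop is the hostname set earlier; this lets start-all.sh reach the local daemons without a password):
ssh-copy-id hadoop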
3. Check that passwordless login works
ssh <hostname>
4. Start the Hadoop cluster
/usr/local/hadoop-2.9.2/sbin/start-all.sh
VI. Code
1. Windows environment variable
Create a variable named HADOOP_HOME whose value is the install directory:
D:\JavaSoftware\hadoop-2.9.2
2. Path variable
Append to Path:
%HADOOP_HOME%\bin
Note: running Hadoop client code on Windows typically also requires winutils.exe (and hadoop.dll) under %HADOOP_HOME%\bin.
3. Create a Maven project
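If you prefer the command line over an IDE, a skeleton matching the names used later (package cn.bdqn, artifact Hadoop_Demo) can be generated like this; adjust the coordinates as needed:
mvn archetype:generate -DgroupId=cn.bdqn -DartifactId=Hadoop_Demo -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false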
4. Import the Hadoop dependencies
<dependencies>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.8.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.9.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.9.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.9.2</version>
    </dependency>
</dependencies>
<!-- maven packaging plugins -->
<build>
    <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>2.3.2</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
5. Add log4j.properties
log4j.rootLogger=info, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
6. Overall Approach
Map phase:
- In map(), convert the incoming value to a String
- Split out the words on spaces
- Emit <word, 1>
Reduce phase:
- Total up the count for each key (word) by iterating over its values and accumulating
- Emit the total for each key
Driver:
- Get the configuration object and a job instance
- Set the local path of the program's jar
- Set the Mapper/Reducer classes
- Set the Mapper output kv types
- Set the final output kv types
- Set the job's input data path
- Set the job's output path
- Submit the job
With that outline in place, implement the Mapper, Reducer, and Driver classes below.
7. Example 1
7.1 Map phase
package cn.bdqn;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class WordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    Text k = new Text();
    IntWritable v = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // 1 Read one line
        String line = value.toString();
        // 2 Split it on spaces
        String[] words = line.split(" ");
        // 3 Emit <word, 1>
        for (String word : words) {
            k.set(word);
            context.write(k, v);
        }
    }
}
7.2 Reduce phase
package cn.bdqn;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class WordcountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    int sum;
    IntWritable v = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // 1 Accumulate the counts
        sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        // 2 Emit the total
        v.set(sum);
        context.write(key, v);
    }
}
7.3 Driver phase
package cn.bdqn;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WordcountDriver {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // 1 Get the configuration and create the job
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);
        // 2 Set the jar by class
        job.setJarByClass(WordcountDriver.class);
        // 3 Set the map and reduce classes
        job.setMapperClass(WordcountMapper.class);
        job.setReducerClass(WordcountReducer.class);
        job.setCombinerClass(WordcountReducer.class);
        // 4 Set the map output kv types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // 5 Set the final output kv types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(CombineTextInputFormat.class);
        // 6 Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // 7 Submit
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
8. Example 2
8.1 Serialization
package cn.bdqn.demo1;

import org.apache.hadoop.io.Writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// 1 Implement the Writable interface
public class SpeakBean implements Writable {

    private long selfDuration;
    private long thirdPartDuration;
    private long sumDuration;

    // 2 Deserialization reflectively invokes the no-arg constructor, so it is required
    public SpeakBean() {
    }

    public SpeakBean(long selfDuration, long thirdPartDuration) {
        this.selfDuration = selfDuration;
        this.thirdPartDuration = thirdPartDuration;
        this.sumDuration = this.selfDuration + this.thirdPartDuration;
    }

    // 3 Serialization method
    public void write(DataOutput out) throws IOException {
        out.writeLong(selfDuration);
        out.writeLong(thirdPartDuration);
        out.writeLong(sumDuration);
    }

    // 4 Deserialization method
    // 5 The read order must match the write order exactly
    public void readFields(DataInput in) throws IOException {
        this.selfDuration = in.readLong();
        this.thirdPartDuration = in.readLong();
        this.sumDuration = in.readLong();
    }

    // 6 toString() so the bean prints cleanly to the text output
    @Override
    public String toString() {
        return selfDuration + "\t" + thirdPartDuration + "\t" + sumDuration;
    }

    public long getSelfDuration() {
        return selfDuration;
    }

    public void setSelfDuration(long selfDuration) {
        this.selfDuration = selfDuration;
    }

    public long getThirdPartDuration() {
        return thirdPartDuration;
    }

    public void setThirdPartDuration(long thirdPartDuration) {
        this.thirdPartDuration = thirdPartDuration;
    }

    public long getSumDuration() {
        return sumDuration;
    }

    public void setSumDuration(long sumDuration) {
        this.sumDuration = sumDuration;
    }

    public void set(long selfDuration, long thirdPartDuration) {
        this.selfDuration = selfDuration;
        this.thirdPartDuration = thirdPartDuration;
        this.sumDuration = this.selfDuration + this.thirdPartDuration;
    }
}
8.2 Map phase
package cn.bdqn.demo1;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class SpeakDurationMapper extends Mapper<LongWritable, Text, Text, SpeakBean> {

    SpeakBean v = new SpeakBean();
    Text k = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // 1 Read one line
        String line = value.toString();
        // 2 Split the fields
        String[] fields = line.split("\t");
        // 3 Build the output: take the device id and the self/third-party duration fields
        String deviceId = fields[1];
        long selfDuration = Long.parseLong(fields[fields.length - 3]);
        long thirdPartDuration = Long.parseLong(fields[fields.length - 2]);
        k.set(deviceId);
        v.set(selfDuration, thirdPartDuration);
        // 4 Emit
        context.write(k, v);
    }
}
8.3 Reduce phase
package cn.bdqn.demo1;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class SpeakDurationReducer extends Reducer<Text, SpeakBean, Text, SpeakBean> {

    @Override
    protected void reduce(Text key, Iterable<SpeakBean> values, Context context) throws IOException, InterruptedException {
        long self_Duration = 0;
        long thirdPart_Duration = 0;
        // 1 Iterate over the beans, accumulating the self and third-party durations separately
        for (SpeakBean sb : values) {
            self_Duration += sb.getSelfDuration();
            thirdPart_Duration += sb.getThirdPartDuration();
        }
        // 2 Wrap the totals in a bean
        SpeakBean resultBean = new SpeakBean(self_Duration, thirdPart_Duration);
        // 3 Emit
        context.write(key, resultBean);
    }
}
8.4 Driver phase
package cn.bdqn.demo1;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class SpeakerDriver {

    public static void main(String[] args) throws IllegalArgumentException, IOException, ClassNotFoundException, InterruptedException {
        // When running locally, set the input/output paths to match your own machine
        // args = new String[]{"d:/input/input/speak.data", "d:/output222"};
        // 1 Get the configuration and a job instance
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);
        // 2 Set the local path of this program's jar
        job.setJarByClass(SpeakerDriver.class);
        // 3 Set the Mapper/Reducer classes for this job
        job.setMapperClass(SpeakDurationMapper.class);
        job.setReducerClass(SpeakDurationReducer.class);
        // 4 Set the mapper output kv types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(SpeakBean.class);
        // 5 Set the final output kv types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(SpeakBean.class);
        // 6 Set the job's input and output directories
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // 7 Submit the job's configuration and jar to YARN to run
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
VII. Package and Upload
1. Package the jar (with the assembly plugin above, mvn package builds the jar-with-dependencies)
2. Result
3. Create the input file (e.g. a.txt) on Linux
4. hdfs commands
4.1 Help
hadoop fs -help
4.2 Create the target directory
hadoop fs -mkdir -p /user/root
4.3 Upload
hadoop fs -put a.txt /
4.4 Remove an empty directory
hadoop fs -rmdir /user/root
4.5 Remove a file
hadoop fs -rm -f /user/root/a.txt
4.6 Run the code
hadoop jar <jar file> <main class> <input path> <output path>
Examples:
hadoop jar Hadoop_Demo-1.0-SNAPSHOT.jar cn.bdqn.WordcountDriver /b.txt /out1
hadoop jar Hadoop_Demo-1.0-SNAPSHOT.jar cn.bdqn.demo1.SpeakerDriver /a.txt /out
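To inspect the word count result, list the output directory and print the part file. With a hypothetical b.txt containing the lines "hello world" and "hello hadoop", the output would look like this (tab-separated word/count pairs, per the Text/IntWritable types above):
hadoop fs -ls /out1
hadoop fs -cat /out1/part-r-00000
# hadoop	1
# hello	2
# world	1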