educoder: MapReduce Basics in Practice, answers for every level

Level 1: Score Statistics

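Task description
Task for this level: use Map/Reduce to find the oldest student in the class.

Background
To complete this level you need to understand: 1. what MapReduce is, and 2. how to run a computation with MapReduce.

What is MapReduce
MapReduce is a programming model for data processing. Imagine the following scenario: you are asked to mine the data logs that the national meteorological center has collected in recent years. The logs are 3 TB in size, and you need to compute the highest temperature for each year. With only one computer, how would you do it? You would probably read the records one by one and compare each reading with the highest temperature seen so far; once every record has been compared, you have the maximum. Experience tells us, though, that processing this much data on a single machine is very slow.

Now suppose you are given three machines. You have probably guessed the best approach: split the data into three chunks, process each chunk separately (map), send the partial results to one machine to be combined (merge), and then compute over the merged data to summarize it (reduce) and output the result.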

That is a fairly complete MapReduce process.
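The split, map, merge, and reduce steps described above can be imitated in plain Java without a cluster. The sketch below is only an illustration: the class name MaxTemperatureSketch, its method names, and the sample readings are all invented for this example, and the three lists merely stand in for the three machines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MaxTemperatureSketch {

    // "Map" step: one machine scans its own chunk and keeps the
    // highest temperature it saw for each year.
    public static Map<Integer, Integer> mapChunk(List<int[]> chunk) {
        Map<Integer, Integer> localMax = new TreeMap<>();
        for (int[] record : chunk) {            // record = {year, temperature}
            localMax.merge(record[0], record[1], Integer::max);
        }
        return localMax;
    }

    // "Merge" and "reduce" steps: gather the partial results on one
    // machine and take the maximum per year across all of them.
    public static Map<Integer, Integer> reduce(List<Map<Integer, Integer>> partials) {
        Map<Integer, Integer> result = new TreeMap<>();
        for (Map<Integer, Integer> partial : partials) {
            for (Map.Entry<Integer, Integer> e : partial.entrySet()) {
                result.merge(e.getKey(), e.getValue(), Integer::max);
            }
        }
        return result;
    }

    // Runs the whole flow over a list of chunks (one chunk per "machine").
    public static Map<Integer, Integer> run(List<List<int[]>> chunks) {
        List<Map<Integer, Integer>> partials = new ArrayList<>();
        for (List<int[]> chunk : chunks) {
            partials.add(mapChunk(chunk));
        }
        return reduce(partials);
    }

    public static void main(String[] args) {
        // Made-up sample data: three chunks of {year, temperature} readings.
        List<List<int[]>> chunks = Arrays.asList(
                Arrays.asList(new int[]{2019, 31}, new int[]{2020, 35}),
                Arrays.asList(new int[]{2019, 38}, new int[]{2021, 29}),
                Arrays.asList(new int[]{2020, 33}, new int[]{2021, 36}));
        System.out.println(run(chunks));   // prints {2019=38, 2020=35, 2021=36}
    }
}
```

Because taking a maximum gives the same answer no matter how the readings are grouped, each chunk can be reduced locally before the partial results are merged.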

Now start your task. Good luck!

Answer code --------------------------------------

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    /********** Begin **********/

    // Mapper: for every "name number" line, emits (name, number).
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString(), "\n");
            while (itr.hasMoreTokens()) {
                // Fields are separated by a single space: name first, number second.
                String[] str = itr.nextToken().split(" ");
                String name = str[0];
                one.set(Integer.parseInt(str[1]));
                word.set(name);
                context.write(word, one);
            }
        }
    }

    // Reducer (also registered as the combiner): keeps the largest value seen for each key.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int maxAge = 0;
            for (IntWritable intWritable : values) {
                maxAge = Math.max(maxAge, intWritable.get());
            }
            result.set(maxAge);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Job.getInstance replaces the deprecated new Job(conf, name) constructor.
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        String inputfile = "/user/test/input";
        String outputFile = "/user/test/output/";
        FileInputFormat.addInputPath(job, new Path(inputfile));
        FileOutputFormat.setOutputPath(job, new Path(outputFile));
        job.waitForCompletion(true);
        /********** End **********/
    }
}

Command line

touch file01
echo Hello World Bye World
cat file01
echo Hello World Bye World >file01
cat file01
touch file02
echo Hello Hadoop Goodbye Hadoop >file02
cat file02
start-dfs.sh
hadoop fs -mkdir /usr
hadoop fs -mkdir /usr/input
hadoop fs -ls /usr/output
hadoop fs -ls /
hadoop fs -ls /usr
hadoop fs -put file01 /usr/input
hadoop fs -put file02 /usr/input
hadoop fs -ls /usr/input

Level 2: Merging File Contents and Removing Duplicates

Task description
Task for this level: write a Map/Reduce program that merges files and removes duplicate entries.
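Before looking at the Hadoop version, the goal of the task can be pictured in plain Java. This is only a sketch under assumptions: the class name MergeDedupSketch, the method mergeAndDedup, and the sample lines are invented here, and in-memory lists stand in for the two input files.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class MergeDedupSketch {

    // Merges the lines of two "files" and drops duplicates.
    // A TreeSet keeps one copy of each line and iterates in sorted order,
    // much as MapReduce groups identical keys and sorts them before reduce.
    public static Set<String> mergeAndDedup(List<String> fileA, List<String> fileB) {
        Set<String> merged = new TreeSet<>();
        merged.addAll(fileA);
        merged.addAll(fileB);
        return merged;
    }

    public static void main(String[] args) {
        // Made-up sample contents for files A and B.
        List<String> a = Arrays.asList("20150101 x", "20150102 y", "20150103 x");
        List<String> b = Arrays.asList("20150101 y", "20150102 y", "20150103 x");
        for (String line : mergeAndDedup(a, b)) {
            System.out.println(line);
        }
    }
}
```

With this sample data the six input lines collapse to four distinct lines, printed in sorted order.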

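Background
The previous level gave a rough idea of how MapReduce is used. In this level we look at the Mapper class, the Reducer class, and the Job class.

The Mapper class
Let's look at the Mapper first:

When writing a MapReduce program, you write a class that extends Mapper. Mapper is a generic type with four type parameters, which specify the types of the map() function's input key, input value, output key, and output value. In the word-count example from Level 1, the input key is a long integer, the input value is a line of text, the output key is a word, and the output value is the number of times that word occurs.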
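To make the four type parameters concrete without needing Hadoop on the classpath, here is a small plain-Java analogy. SimpleMapper is a hypothetical interface invented for this sketch, not Hadoop's API; it only mirrors the shape of Mapper<LongWritable, Text, Text, IntWritable>, with Long, String, String, and Integer standing in for the Writable types.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MapperTypesSketch {

    // Hypothetical stand-in for a mapper: KI/VI are the input key and value
    // types, KO/VO are the output key and value types.
    interface SimpleMapper<KI, VI, KO, VO> {
        void map(KI key, VI value, List<Map.Entry<KO, VO>> output);
    }

    // Input: (position of the line, line of text). Output: (word, 1).
    static class WordMapper implements SimpleMapper<Long, String, String, Integer> {
        @Override
        public void map(Long offset, String line, List<Map.Entry<String, Integer>> output) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    output.add(new SimpleEntry<>(word, 1));
                }
            }
        }
    }

    // Convenience wrapper: maps a single line and returns the emitted pairs.
    public static List<Map.Entry<String, Integer>> mapLine(long offset, String line) {
        List<Map.Entry<String, Integer>> output = new ArrayList<>();
        new WordMapper().map(offset, line, output);
        return output;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> pair : mapLine(0L, "Hello World Bye World")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

The mapper emits one pair per word and does no counting itself; summing the 1s for each word is left to the reduce step.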

Answer code -------------------

import java.io.IOException;
import java.util.Set;
import java.util.StringTokenizer;
import java.util.TreeSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Merge {

    /**
     * Merges the two files A and B, removes the duplicated content,
     * and produces a new output file C.
     */

    // Override map here. This answer reads the whitespace-separated fields in pairs and
    // emits the first of each pair as the key and the second as the value.
    // Note that map must declare: throws IOException, InterruptedException
    public static class Map extends Mapper<Object, Text, Text, Text> {
        /********** Begin **********/
        public void map(Object key, Text value, Context content)
                throws IOException, InterruptedException {
            Text text1 = new Text();
            Text text2 = new Text();
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                text1.set(itr.nextToken());
                text2.set(itr.nextToken());
                content.write(text1, text2);
            }
        }
        /********** End **********/
    }

    // Override reduce here. The key is passed through unchanged; its values are
    // de-duplicated and sorted with a TreeSet before being written out.
    // Note that reduce must declare: throws IOException, InterruptedException
    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        /********** Begin **********/
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            Set<String> set = new TreeSet<String>();
            for (Text tex : values) {
                set.add(tex.toString());
            }
            for (String tex : set) {
                context.write(key, new Text(tex));
            }
        }
        /********** End **********/
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.default.name is the older, deprecated name of fs.defaultFS.
        conf.set("fs.default.name", "hdfs://localhost:9000");
        Job job = Job.getInstance(conf, "Merge and duplicate removal");
        job.setJarByClass(Merge.class);
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        String inputPath = "/user/tmp/input/";   // set the input path here
        String outputPath = "/user/tmp/output/"; // set the output path here
        FileInputFormat.addInputPath(job, new Path(inputPath));
        FileOutputFormat.setOutputPath(job, new Path(outputPath));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Level 3: Information Mining - Mining Parent-Child Relationships

Task description
Task for this level: mine information from a given table.

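Programming requirements
Your program must mine the parent-child relationships and output a table of grandchild-grandparent relationships. The rules are:

grandchild first, grandparent second;
input file path: /user/reduce/input;
output file path: /user/reduce/output.
Test description
The platform will test the code you write:
Given the child-parent table below, mine the parent-child relationships in it and output a table of grandchild-grandparent relationships.

The input file contains: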

child     parent
Steven    Lucy
Steven    Jack
Jone      Lucy
Jone      Jack
Lucy      Mary
Lucy      Frank
Jack      Alice
Jack      Jesse
David     Alice
David     Jesse
Philip    David
Philip    Alma
Mark      David
Mark      Alma

The output file contains:

grand_child    grand_parent
Mark    Jesse
Mark    Alice
Philip    Jesse
Philip    Alice
Jone    Jesse
Jone    Alice
Steven    Jesse
Steven    Alice
Steven    Frank
Steven    Mary
Jone    Frank
Jone    Mary

Now start your task. Good luck!
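The join behind this task can be tried out in plain Java first. The sketch below is an illustration only: the class name GrandparentJoinSketch and the method join are invented here, and the sample rows are three rows taken from the input table above. The idea is that a person who appears both as someone's parent and as someone's child links a grandchild to a grandparent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GrandparentJoinSketch {

    // Takes {child, parent} rows and returns "grandchild<TAB>grandparent" rows.
    public static List<String> join(List<String[]> childParent) {
        // For every person, remember their children and their parents.
        Map<String, List<String>> childrenOf = new TreeMap<>();
        Map<String, List<String>> parentsOf = new TreeMap<>();
        for (String[] pair : childParent) {
            String child = pair[0];
            String parent = pair[1];
            childrenOf.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
            parentsOf.computeIfAbsent(child, k -> new ArrayList<>()).add(parent);
        }

        // A person with both children and parents joins the two generations:
        // each of their children is a grandchild of each of their parents.
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : childrenOf.entrySet()) {
            List<String> grandParents = parentsOf.get(e.getKey());
            if (grandParents == null) {
                continue; // this person's own parents are not in the table
            }
            for (String grandChild : e.getValue()) {
                for (String grandParent : grandParents) {
                    result.add(grandChild + "\t" + grandParent);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[]{"Steven", "Lucy"},
                new String[]{"Lucy", "Mary"},
                new String[]{"Lucy", "Frank"});
        for (String row : join(rows)) {
            System.out.println(row); // prints "Steven  Mary" then "Steven  Frank" (tab-separated)
        }
    }
}
```

In a MapReduce job the two maps are replaced by the shuffle: each row is emitted twice, once keyed by the parent and once keyed by the child, so that all records about the same person reach the same reduce call.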

Answer code ------------------------

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class simple_data_mining {
    // Ensures the header row is written only once (relies on a single reducer).
    public static int time = 0;

    /**
     * Input: a child-parent table.
     * Output: a table of grandchild-grandparent relationships.
     */

    // Map splits each input line on spaces into child and parent, then emits the row twice:
    // once keyed by the parent (flag "1") and once keyed by the child (flag "2").
    // The flag in the value tells the reducer which side of the join a record belongs to.
    public static class Map extends Mapper<Object, Text, Text, Text> {
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            /********** Begin **********/
            String line = value.toString();
            String[] childAndParent = line.split(" ");
            List<String> list = new ArrayList<>(2);
            for (String childOrParent : childAndParent) {
                if (!"".equals(childOrParent)) {
                    list.add(childOrParent);
                }
            }
            // Skip blank or incomplete lines and the "child parent" header row.
            if (list.size() >= 2 && !"child".equals(list.get(0))) {
                String childName = list.get(0);
                String parentName = list.get(1);
                String relationType = "1";
                context.write(new Text(parentName),
                        new Text(relationType + "+" + childName + "+" + parentName));
                relationType = "2";
                context.write(new Text(childName),
                        new Text(relationType + "+" + childName + "+" + parentName));
            }
            /********** End **********/
        }
    }

    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            /********** Begin **********/
            // Write the header row once.
            if (time == 0) {
                context.write(new Text("grand_child"), new Text("grand_parent"));
                time++;
            }
            // Children of the key person: the grandchild candidates.
            List<String> grandChild = new ArrayList<>();
            // Parents of the key person: the grandparent candidates.
            List<String> grandParent = new ArrayList<>();
            for (Text text : values) {
                String s = text.toString();
                String[] relation = s.split("\\+");
                String relationType = relation[0];
                String childName = relation[1];
                String parentName = relation[2];
                if ("1".equals(relationType)) {
                    // Flag "1": the key is the parent, so take the child.
                    grandChild.add(childName);
                } else {
                    // Flag "2": the key is the child, so take the parent.
                    grandParent.add(parentName);
                }
            }
            int grandParentNum = grandParent.size();
            int grandChildNum = grandChild.size();
            if (grandParentNum != 0 && grandChildNum != 0) {
                for (int m = 0; m < grandChildNum; m++) {
                    for (int n = 0; n < grandParentNum; n++) {
                        // Emit every grandchild-grandparent pair.
                        context.write(new Text(grandChild.get(m)), new Text(grandParent.get(n)));
                    }
                }
            }
            /********** End **********/
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Single table join");
        job.setJarByClass(simple_data_mining.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        String inputPath = "/user/reduce/input";   // input path
        String outputPath = "/user/reduce/output"; // output path
        FileInputFormat.addInputPath(job, new Path(inputPath));
        FileOutputFormat.setOutputPath(job, new Path(outputPath));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}


Reposted from: https://blog.csdn.net/weixin_45818379/article/details/117790528
Copyright belongs to the original author, 刘向阳啊. In case of infringement, please contact us for removal.
