

Connecting IDEA on a Local Machine (Win11) to Hadoop in a Virtual Machine and Running Programs, Super Detailed!

This article simplifies the outdated Hadoop workflow that university instructors still teach.



Preface

We recently started using Hadoop in class, but the instructor's teaching setup still runs Eclipse inside the virtual machine for development, which is quite painful for anyone used to IDEA. All commands below are executed from the Hadoop installation directory.


1. Environment Setup

1.1 Virtual Machine Environment

My VM image is ubuntukylin-16.04-desktop-amd64, the Hadoop version is 3.4.1, and the Java version inside the VM is Java 8.

1.2 Modify core-site.xml

core-site.xml is located under etc/hadoop/ in your Hadoop installation directory. Change the IP address (marked with a red line in the original screenshot) to your VM's IP address.
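The screenshot is not reproduced here; the setting in question is presumably fs.defaultFS. A minimal sketch, assuming the VM address 192.168.216.164 and port 9000 used throughout this article:

<configuration>
    <!-- HDFS endpoint; replace 192.168.216.164 with your VM's IP address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.216.164:9000</value>
    </property>
</configuration>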
To check the VM's IP address:

ip addr


1.3 Adding Maven Dependencies in IDEA (for a New Maven Project)

Note: change the version to match the Hadoop version on your VM.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>HadoopExp</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <hadoop.version>3.4.1</hadoop.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
    </dependencies>
</project>

1.4 Running an Example (Merging the Files in a Given Directory)

Code:

import java.io.IOException;
import java.io.PrintStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

// Path filter that rejects any path matching the given regex
class MyPathFilter implements PathFilter {
    String reg = null;

    MyPathFilter(String reg) {
        this.reg = reg;
    }

    public boolean accept(Path path) {
        return !path.toString().matches(reg);
    }
}

/**
 * Merge files in HDFS using FSDataOutputStream and FSDataInputStream
 */
public class MergeFile {
    Path inputPath = null;   // directory containing the files to merge
    Path outputPath = null;  // path of the merged output file

    public MergeFile(String input, String output) {
        this.inputPath = new Path(input);
        this.outputPath = new Path(output);
    }

    public void doMerge() throws IOException {
        Configuration conf = new Configuration();
        // HDFS connection settings; change the address to your own
        conf.set("fs.defaultFS", "hdfs://192.168.216.164:9000");
        conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        FileSystem fsSource = FileSystem.get(URI.create(inputPath.toString()), conf);
        FileSystem fsDst = FileSystem.get(URI.create(outputPath.toString()), conf);
        // Filter out files in the input directory whose names end in .abc
        FileStatus[] sourceStatus = fsSource.listStatus(inputPath, new MyPathFilter(".*\\.abc"));
        FSDataOutputStream fsdos = fsDst.create(outputPath);
        PrintStream ps = new PrintStream(System.out);
        // Read each remaining file and append its content to the single output file
        for (FileStatus sta : sourceStatus) {
            // Print the path, size, and permissions of each file that survived the filter
            System.out.print("Path: " + sta.getPath() + "    Size: " + sta.getLen()
                    + "   Permissions: " + sta.getPermission() + "   Content: ");
            FSDataInputStream fsdis = fsSource.open(sta.getPath());
            byte[] data = new byte[1024];
            int read = -1;
            while ((read = fsdis.read(data)) > 0) {
                ps.write(data, 0, read);
                fsdos.write(data, 0, read);
            }
            fsdis.close();
        }
        ps.close();
        fsdos.close();
    }

    public static void main(String[] args) throws IOException {
        MergeFile merge = new MergeFile(
                "hdfs://192.168.216.164:9000/user/shi/input",       // input directory
                "hdfs://192.168.216.164:9000/user/shi/merge.txt");  // output file
        merge.doMerge();
    }
}
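After running the program from IDEA, you can verify the merged result from the VM's Hadoop installation directory, using the output path hard-coded above:

./bin/hdfs dfs -cat /user/shi/merge.txt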

2. WordCount Example

2.1 Create the Sample File wordfile1.txt

Its content can be anything.
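For example, from the VM shell in the Hadoop installation directory (the content here is made up; any text works):

echo "hello hadoop hello world" > wordfile1.txt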

2.2 Upload the File to Hadoop

Go to the Hadoop installation directory:

./bin/hdfs dfs -put ./wordfile1.txt <your HDFS directory>

If the directory does not exist yet, create it with the following command; the -p flag creates parent directories as needed when the given path does not exist.

./bin/hdfs dfs -mkdir -p <your HDFS directory>

Check that the upload succeeded:

./bin/hdfs dfs -ls <your HDFS directory>
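Concretely, using the /user/shi/input path that the run command in section 2.5 expects:

./bin/hdfs dfs -mkdir -p /user/shi/input
./bin/hdfs dfs -put ./wordfile1.txt /user/shi/input
./bin/hdfs dfs -ls /user/shi/input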

2.3 Create a Directory for the jar

Any local directory will do (the jar is loaded from the local filesystem in section 2.5, not from HDFS); mine, for example:

mkdir ./myapp

2.4 Package the Program Written in IDEA (Using Maven)

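The screenshot from the original post is not reproduced here; it showed the packaging step in IDEA. From a terminal in the project root, the equivalent is roughly:

mvn clean package

With the pom.xml above, the jar is produced at target/HadoopExp-1.0-SNAPSHOT.jar; copy it into the myapp directory created in section 2.3 on the VM.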
Code:

/**
 * @author Shi
 * @version 1.0
 * @description: TODO
 * @date 2024/11/12 13:23
 */
import java.io.IOException;
import java.net.URI;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public WordCount() {
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = (new GenericOptionsParser(conf, args)).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        System.out.println("Input: " + otherArgs[0]);
        System.out.println("Output: " + otherArgs[otherArgs.length - 1]);
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCount.TokenizerMapper.class);
        job.setCombinerClass(WordCount.IntSumReducer.class);
        job.setReducerClass(WordCount.IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        // If the output directory already exists, delete it so the job can run
        Path outputPath = new Path(otherArgs[otherArgs.length - 1]);
        FileSystem fsSource = FileSystem.get(URI.create(outputPath.toString()), conf);
        if (fsSource.exists(outputPath)) {
            System.out.println("Output directory already exists, deleting: " + outputPath);
            fsSource.delete(outputPath, true);
        }
        FileOutputFormat.setOutputPath(job, outputPath);
        boolean success = job.waitForCompletion(true);
        System.out.println("Job finished");
        System.exit(success ? 0 : 1);
    }

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public TokenizerMapper() {
        }

        // Split each input line into tokens and emit (word, 1) for each token
        public void map(Object key, Text value, Mapper<Object, Text, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                this.word.set(itr.nextToken());
                context.write(this.word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public IntSumReducer() {
        }

        // Sum the counts for each word and emit (word, total)
        public void reduce(Text key, Iterable<IntWritable> values,
                           Reducer<Text, IntWritable, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            this.result.set(sum);
            context.write(key, this.result);
        }
    }
}

2.5 Run the jar on the Virtual Machine

Go to the Hadoop installation directory:

./bin/hadoop jar ./myapp/HadoopExp-1.0-SNAPSHOT.jar WordCount /user/shi/input/wordfile1.txt /user/shi/output/

Breaking down the command:

  • ./bin/hadoop jar: the Hadoop command for running a jar
  • ./myapp/HadoopExp-1.0-SNAPSHOT.jar: the jar's location; change this to your own path
  • WordCount: the main class to run
  • /user/shi/input/wordfile1.txt: the input file path
  • /user/shi/output/: the output path for the word-count results

Run the command.
View the results:

Print the output, meaning every file under the output directory:

./bin/hdfs dfs -cat /user/shi/output/*
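Each line of the result is a word and its tab-separated count, sorted by key. For the made-up sample content from section 2.1, it would look roughly like:

hadoop	1
hello	2
world	1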


Done!


Reposted from: https://blog.csdn.net/m0_74221471/article/details/143721215
Copyright belongs to the original author, 叫我伟明桑. In case of infringement, please contact us for removal.
