1. Preparation Before Installation
1.1 Update the System and Install the Required Tools
Run the following commands in a terminal:

sudo apt-get update
sudo apt-get install -y ssh rsync curl
1.2 Install Java
If Java is not yet installed on the system, you can install OpenJDK with:

sudo apt-get install -y openjdk-8-jdk

Verify that Java installed successfully:

java -version
1.3 Configure Passwordless SSH Login
Hadoop communicates between nodes over SSH. First, generate an SSH key pair:

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

Then append the public key to the list of authorized keys:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

Test the passwordless login:

ssh localhost
2. Download and Install Hadoop
2.1 Download Hadoop
Visit the official Apache Hadoop website and download the latest stable release. You can fetch it with curl:

curl -O https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
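If you want to check the integrity of the download, Apache publishes a SHA-512 checksum next to each release artifact; a quick optional verification (the .sha512 URL pattern is assumed from the download link above):

curl -O https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz.sha512
sha512sum -c hadoop-3.3.6.tar.gz.sha512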
2.2 Extract Hadoop
Once the download completes, extract the Hadoop archive:

tar -xzvf hadoop-3.3.6.tar.gz
2.3 Configure Environment Variables
Edit ~/.bashrc (in the current user's home directory) and add Hadoop's environment variables:

export HADOOP_HOME=/home/hdfs/hadoop-3.3.6
export HADOOP_HDFS_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
Save the file and reload the environment variables:
source ~/.bashrc
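To confirm the new variables are active, a quick check (assuming the paths above match your installation; note that the hadoop command may complain until JAVA_HOME is configured in section 3.4):

hadoop version          # should report Hadoop 3.3.6
echo $HADOOP_CONF_DIR   # should print /home/hdfs/hadoop-3.3.6/etc/hadoop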
3. Configure Hadoop
3.1 Configure core-site.xml
Edit the $HADOOP_HOME/etc/hadoop/core-site.xml file and set the default file system to HDFS:
<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="configuration.xsl"?><configuration><property><name>fs.defaultFS</name><value>hdfs://localhost:9000</value></property></configuration>
3.2 Configure hdfs-site.xml
Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml to set the replication factor, the NameNode RPC address (which allows remote access), and the NameNode and DataNode storage directories:
<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="configuration.xsl"?><configuration><!-- 设置数据副本数 --><property><name>dfs.replication</name><value>1</value></property><!-- NameNode的存储目录 --><property><name>dfs.namenode.name.dir</name><value>file:///home/hdfs/hadoop-3.3.6/namenode</value></property><!-- NameNode的RPC地址 --><property><name>dfs.namenode.rpc-address</name><value>0.0.0.0:9000</value></property><!-- DataNode的存储目录 --><property><name>dfs.datanode.data.dir</name><value>file:///home/hdfs/hadoop-3.3.6/datanode</value></property></configuration>
3.3 Locate the Java Installation Directory
sudo update-alternatives --config java
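update-alternatives prints the full path of each registered java binary. If there is only one JDK installed, you can also resolve the real path directly; a quick alternative (the output path is an example):

readlink -f /usr/bin/java
# e.g. /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
# JAVA_HOME is the part before /jre/bin/java (or /bin/java on newer JDKs)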
3.4 配置 JAVA_HOME
编辑 $HADOOP_HOME/etc/hadoop/hadoop-env.sh文件,配置
JAVA_HOME
设置:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
4. Start HDFS
4.1 Format the NameNode
Before starting HDFS for the first time, format the NameNode:
hdfs namenode -format
4.2 Start HDFS
Start the NameNode and DataNode services:
start-dfs.sh
You can check whether the processes started successfully with:
jps
Under normal circumstances, you should see the NameNode and DataNode processes running.
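Typical jps output looks something like the following; the PIDs are illustrative, and start-dfs.sh also launches a SecondaryNameNode by default:

12101 NameNode
12245 DataNode
12488 SecondaryNameNode
12673 Jps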
To stop a process:

kill -9 <pid>   # find the PID with jps first, then kill it (stop-dfs.sh is the cleaner way to stop everything)
4.3 Verify HDFS
You can open the NameNode web UI in a browser (the address below is the author's VM; use your own host's IP):
http://192.168.186.77:9870
4.4 Check DataNode Status
hdfs dfsadmin -report
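As a quick functional test, and to prepare for the application below (whose base directory on HDFS is /home), you can create that directory and list the root:

hdfs dfs -mkdir -p /home
hdfs dfs -ls /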
5. Project Structure
5.1 pom.xml
<?xml version="1.0" encoding="UTF-8"?><projectxmlns="http://maven.apache.org/POM/4.0.0"xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd"><modelVersion>4.0.0</modelVersion><parent><groupId>org.springframework.boot</groupId><artifactId>spring-boot-starter-parent</artifactId><version>3.3.2</version><relativePath/><!-- lookup parent from repository --></parent><groupId>org.example</groupId><artifactId>hdfs_hadoop</artifactId><dependencies><dependency><groupId>org.springframework.boot</groupId><artifactId>spring-boot-starter-web</artifactId></dependency><dependency><groupId>org.projectlombok</groupId><artifactId>lombok</artifactId><optional>true</optional></dependency><dependency><groupId>org.apache.hadoop</groupId><artifactId>hadoop-client</artifactId><version>3.3.6</version></dependency></dependencies><build><plugins><plugin><groupId>org.springframework.boot</groupId><artifactId>spring-boot-maven-plugin</artifactId><configuration><excludes><exclude><groupId>org.projectlombok</groupId><artifactId>lombok</artifactId></exclude></excludes></configuration></plugin></plugins></build></project>
5.2 HdfsHadoopApplication.java
package org.example;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class HdfsHadoopApplication {
    public static void main(String[] args) {
        SpringApplication.run(HdfsHadoopApplication.class, args);
    }
}
5.3 HDFSService.java
package org.example.service;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.example.model.SimpleFileStatusDTO;
import org.springframework.stereotype.Service;
import org.springframework.web.multipart.MultipartFile;

import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.*;

@Service
public class HDFSService {

    private static final String HDFS_URI = "hdfs://192.168.186.77:9000";
    private static final String BASE_DIR = "/home"; // base directory on HDFS

    private final FileSystem fileSystem;

    // Constructor: initialize the FileSystem
    public HDFSService() throws IOException, InterruptedException {
        Configuration configuration = new Configuration();
        configuration.set("fs.defaultFS", HDFS_URI);
        // Set the HDFS user via a system property
        System.setProperty("HADOOP_USER_NAME", "liber");
        // Initialize the FileSystem
        this.fileSystem = FileSystem.get(URI.create(HDFS_URI), configuration);
    }

    // 1. Upload a file to HDFS
    public void uploadFile(MultipartFile file, String subDirectory) throws IOException {
        // Prepend a UUID to the filename to avoid name collisions
        String originalFilename = file.getOriginalFilename();
        String newFilename = UUID.randomUUID() + "_" + originalFilename;
        // Target directory path
        String targetDirectory = BASE_DIR + (subDirectory.startsWith("/") ? subDirectory : "/" + subDirectory);
        Path directoryPath = new Path(targetDirectory);
        // Create the directory if it does not exist
        if (!fileSystem.exists(directoryPath)) {
            fileSystem.mkdirs(directoryPath);
        }
        // Target file path
        Path destinationPath = new Path(targetDirectory + "/" + newFilename);
        // Upload the file
        try (FSDataOutputStream outputStream = fileSystem.create(destinationPath)) {
            outputStream.write(file.getBytes());
        }
    }

    // 2. Delete a file or directory
    public void deleteFile(String hdfsPath) throws IOException {
        fileSystem.delete(new Path(BASE_DIR + "/" + hdfsPath), true);
    }

    // 3. List directory contents
    public Map<String, Object> listFiles(String subDirectory) throws IOException {
        String directoryPath = BASE_DIR + (subDirectory.startsWith("/") ? subDirectory : "/" + subDirectory);
        FileStatus[] fileStatuses = fileSystem.listStatus(new Path(directoryPath));
        List<SimpleFileStatusDTO> fileStatusDTOList = new ArrayList<>();
        for (FileStatus fileStatus : fileStatuses) {
            fileStatusDTOList.add(new SimpleFileStatusDTO(fileStatus));
        }
        Map<String, Object> map = new HashMap<>();
        map.put("basePath", subDirectory);
        map.put("files", fileStatusDTOList);
        return map;
    }

    // 4. Create a directory
    public void createDirectory(String subDirectory) throws IOException {
        String targetDirectory = BASE_DIR + (subDirectory.startsWith("/") ? subDirectory : "/" + subDirectory);
        Path path = new Path(targetDirectory);
        if (!fileSystem.exists(path)) {
            fileSystem.mkdirs(path);
        } else {
            throw new IOException("Directory already exists: " + targetDirectory);
        }
    }

    // 5. Download a file (returned as a stream)
    public InputStream readFileAsStream(String hdfsFilePath) throws IOException {
        Path path = new Path(BASE_DIR + hdfsFilePath);
        return fileSystem.open(path);
    }

    // 6. Rename a file or directory
    public void rename(String sourceSubDirectory, String destSubDirectory) throws IOException {
        String sourcePath = BASE_DIR + (sourceSubDirectory.startsWith("/") ? sourceSubDirectory : "/" + sourceSubDirectory);
        String destPath = BASE_DIR + (destSubDirectory.startsWith("/") ? destSubDirectory : "/" + destSubDirectory);
        Path src = new Path(sourcePath);
        Path dst = new Path(destPath);
        if (!fileSystem.rename(src, dst)) {
            throw new IOException("Failed to rename: " + sourcePath + " to " + destPath);
        }
    }
}
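One detail worth noting: the service opens a FileSystem but never closes it. A minimal cleanup sketch, not in the original code, assuming Spring Boot 3's jakarta annotations; added inside HDFSService, it releases the connection when the application shuts down:

import jakarta.annotation.PreDestroy;

// inside HDFSService
@PreDestroy
public void close() throws IOException {
    fileSystem.close(); // release the connection to the NameNode on shutdown
}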
5.4 SimpleFileStatusDTO.java
package org.example.model;

import lombok.Data;
import lombok.NoArgsConstructor;
import org.apache.hadoop.fs.FileStatus;

@Data
@NoArgsConstructor
public class SimpleFileStatusDTO {
    private String pathSuffix;
    private long length;
    private boolean isDirectory;

    public SimpleFileStatusDTO(FileStatus fileStatus) {
        String pathSuffix = fileStatus.getPath().toString();
        this.pathSuffix = pathSuffix.substring(pathSuffix.lastIndexOf("/") + 1);
        this.length = fileStatus.getLen();
        this.isDirectory = fileStatus.isDirectory();
    }
}
5.5 HDFSController.java
package org.example.controller;

import org.example.service.HDFSService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.core.io.InputStreamResource;
import org.springframework.http.HttpHeaders;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;

import java.io.IOException;
import java.util.Map;

@RestController
@RequestMapping("/hdfs")
public class HDFSController {

    private final HDFSService hdfsService;

    @Autowired
    public HDFSController(HDFSService hdfsService) {
        this.hdfsService = hdfsService;
    }

    // 1. Upload a file
    @PostMapping("/upload")
    public ResponseEntity<String> uploadFile(@RequestParam("file") MultipartFile file,
                                             @RequestParam("hdfsDirectory") String hdfsDirectory) {
        try {
            hdfsService.uploadFile(file, hdfsDirectory);
            return ResponseEntity.ok("Upload succeeded");
        } catch (IOException e) {
            return ResponseEntity.status(500).body(null);
        }
    }

    // 2. Download a file
    @GetMapping("/download")
    public ResponseEntity<InputStreamResource> downloadFile(@RequestParam("hdfsFilePath") String hdfsFilePath) {
        try {
            String filename = hdfsFilePath.substring(hdfsFilePath.lastIndexOf("/") + 1);
            InputStreamResource resource = new InputStreamResource(hdfsService.readFileAsStream(hdfsFilePath));
            return ResponseEntity.ok()
                    .header(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=\"" + filename + "\"")
                    .body(resource);
        } catch (IOException e) {
            return ResponseEntity.status(500).body(null);
        }
    }

    // 3. Delete a file or directory
    @DeleteMapping("/delete")
    public ResponseEntity<String> deleteFile(@RequestParam("hdfsPath") String hdfsPath) {
        try {
            hdfsService.deleteFile(hdfsPath);
            return ResponseEntity.ok("File deleted successfully");
        } catch (IOException e) {
            return ResponseEntity.status(500).body("Failed to delete file: " + e.getMessage());
        }
    }

    // 4. List directory contents
    @GetMapping("/list")
    public ResponseEntity<Map<String, Object>> listFiles(@RequestParam("directoryPath") String directoryPath) {
        try {
            Map<String, Object> files = hdfsService.listFiles(directoryPath);
            return ResponseEntity.ok(files);
        } catch (IOException e) {
            return ResponseEntity.status(500).body(null);
        }
    }

    // 5. Create a directory
    @PostMapping("/mkdir")
    public ResponseEntity<String> createDirectory(@RequestParam("directoryPath") String directoryPath) {
        try {
            hdfsService.createDirectory(directoryPath);
            return ResponseEntity.ok("Directory created successfully");
        } catch (IOException e) {
            return ResponseEntity.status(500).body("Failed to create directory: " + e.getMessage());
        }
    }

    // 6. Rename a file or directory
    @PostMapping("/rename")
    public ResponseEntity<String> rename(@RequestParam("sourcePath") String sourcePath,
                                         @RequestParam("destPath") String destPath) {
        try {
            hdfsService.rename(sourcePath, destPath);
            return ResponseEntity.ok("File renamed successfully");
        } catch (IOException e) {
            return ResponseEntity.status(500).body("Failed to rename file: " + e.getMessage());
        }
    }
}
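With the application running, the endpoints can be exercised from the command line; a few illustrative calls, assuming Spring Boot's default port 8080 and a hypothetical local file test.txt:

# Upload a file into /home/docs on HDFS
curl -F "file=@test.txt" -F "hdfsDirectory=/docs" http://localhost:8080/hdfs/upload

# List the contents of /home/docs
curl "http://localhost:8080/hdfs/list?directoryPath=/docs"

# Rename /home/docs/a.txt to /home/docs/b.txt
curl -X POST "http://localhost:8080/hdfs/rename?sourcePath=/docs/a.txt&destPath=/docs/b.txt"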
5.6 application.yml
spring:
  application:
    name: hdfs_hadoop
  servlet:
    multipart:
      max-file-size: 1024MB
      max-request-size: 1024MB
5.7 index.html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>HDFS File Manager</title>
    <!-- Vue.js CDN -->
    <script src="https://cdn.jsdelivr.net/npm/vue@2"></script>
    <!-- Axios CDN for HTTP requests -->
    <script src="https://cdn.jsdelivr.net/npm/axios/dist/axios.min.js"></script>
    <!-- Bootstrap CDN for styling -->
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5/dist/css/bootstrap.min.css" rel="stylesheet">
    <style>
        .current-path { font-weight: bold; font-size: 1.2em; margin-bottom: 15px; }
        .go-up-item { background-color: #f8f9fa; cursor: pointer; }
        .go-up-item:hover { background-color: #e2e6ea; }
        .file-item { cursor: pointer; }
        .file-item:hover { background-color: #f1f3f5; }
        .btn-icon { background-color: transparent; border: none; color: #007bff; cursor: pointer; padding: 0.2rem; }
        .btn-icon:hover { color: #0056b3; }
        .form-inline { display: flex; align-items: center; gap: 10px; margin-bottom: 15px; }
        .form-inline input { flex: 1; }
    </style>
</head>
<body>
<div id="app" class="container mt-5">
    <h1 class="mb-4">HDFS File Manager</h1>
    <!-- Directory listing, directory creation, and file upload -->
    <div class="mb-3">
        <h4>Manage Directory</h4>
        <div class="current-path"><span>📁 {{ currentPath }}</span></div>
        <!-- Inline form for creating a directory -->
        <div class="form-inline">
            <input type="text" v-model="newDirectoryPath" placeholder="New directory name" class="form-control">
            <button @click="createDirectory" class="btn btn-info">Create Directory</button>
            <button @click="showUploadDialog" class="btn btn-primary ms-2">Upload File</button>
        </div>
        <ul class="list-group">
            <li v-if="currentPath !== '/'" @click="goUpOneLevel" class="list-group-item go-up-item">
                <strong>🔙 Go up one level</strong>
            </li>
            <li v-for="file in files" :key="file.pathSuffix"
                class="list-group-item d-flex justify-content-between align-items-center file-item">
                <div @click="file.directory ? onDirectoryClick(file) : null">
                    <span v-if="file.directory">📁</span>
                    <span v-else>📄</span>
                    {{ file.pathSuffix }}
                    <!-- Show the size for files -->
                    <span v-if="!file.directory" class="text-muted">({{ formatFileSize(file.length) }})</span>
                </div>
                <div>
                    <button @click="showRenameDialog(file)" class="btn-icon"><span>✏️</span></button>
                    <button v-if="file.directory" @click="deleteFile(currentPath + '/' + file.pathSuffix)" class="btn-icon"><span>🗑️</span></button>
                    <button v-if="!file.directory" @click="downloadFile(currentPath + '/' + file.pathSuffix)" class="btn-icon"><span>⬇️</span></button>
                    <button v-if="!file.directory" @click="deleteFile(currentPath + '/' + file.pathSuffix)" class="btn-icon"><span>🗑️</span></button>
                </div>
            </li>
        </ul>
    </div>
    <!-- Upload modal -->
    <div class="modal" tabindex="-1" role="dialog" id="uploadModal">
        <div class="modal-dialog" role="document">
            <div class="modal-content">
                <div class="modal-header">
                    <h5 class="modal-title">Upload File</h5>
                    <button type="button" class="btn-close" data-bs-dismiss="modal" aria-label="Close"></button>
                </div>
                <div class="modal-body">
                    <input type="file" @change="onFileChange" class="form-control">
                </div>
                <div class="modal-footer">
                    <button type="button" class="btn btn-secondary" data-bs-dismiss="modal">Close</button>
                    <button type="button" class="btn btn-primary" @click="handleUpload">Upload</button>
                </div>
            </div>
        </div>
    </div>
    <!-- Rename modal -->
    <div class="modal" tabindex="-1" role="dialog" id="renameModal">
        <div class="modal-dialog" role="document">
            <div class="modal-content">
                <div class="modal-header">
                    <h5 class="modal-title">Rename</h5>
                    <button type="button" class="btn-close" data-bs-dismiss="modal" aria-label="Close"></button>
                </div>
                <div class="modal-body">
                    <input type="text" v-model="renameNewName" class="form-control">
                </div>
                <div class="modal-footer">
                    <button type="button" class="btn btn-secondary" data-bs-dismiss="modal">Close</button>
                    <button type="button" class="btn btn-primary" @click="handleRename">Rename</button>
                </div>
            </div>
        </div>
    </div>
</div>
<script>
new Vue({
    el: '#app',
    data: {
        uploadFile: null,
        currentPath: '/',     // current directory path
        newDirectoryPath: '',
        files: [],
        renameFile: null,     // file or directory being renamed
        renameNewName: '',    // new name
    },
    methods: {
        // Handle file selection
        onFileChange(event) {
            this.uploadFile = event.target.files[0];
        },
        // Show the upload modal
        showUploadDialog() {
            const modal = new bootstrap.Modal(document.getElementById('uploadModal'));
            modal.show();
        },
        // Show the rename modal
        showRenameDialog(file) {
            this.renameFile = file;
            this.renameNewName = file.pathSuffix;
            const modal = new bootstrap.Modal(document.getElementById('renameModal'));
            modal.show();
        },
        // Upload a file
        async handleUpload() {
            try {
                const formData = new FormData();
                formData.append('file', this.uploadFile);
                formData.append('hdfsDirectory', this.currentPath);
                await axios.post('/hdfs/upload', formData, { headers: { 'Content-Type': 'multipart/form-data' } });
                this.listFiles(); // refresh the file list after uploading
                const modal = bootstrap.Modal.getInstance(document.getElementById('uploadModal'));
                modal.hide(); // hide the modal after uploading
            } catch (error) {
                console.error('Error uploading file:', error);
            }
        },
        // Rename a file or directory
        async handleRename() {
            try {
                const sourcePath = this.currentPath + '/' + this.renameFile.pathSuffix;
                const destPath = this.currentPath + '/' + this.renameNewName;
                await axios.post('/hdfs/rename', null, { params: { sourcePath, destPath } });
                this.listFiles(); // refresh the file list after renaming
                const modal = bootstrap.Modal.getInstance(document.getElementById('renameModal'));
                modal.hide(); // hide the modal after renaming
            } catch (error) {
                console.error('Error renaming file or directory:', error);
            }
        },
        // List the files in the current directory
        async listFiles() {
            try {
                const response = await axios.get('/hdfs/list', { params: { directoryPath: this.currentPath } });
                this.files = response.data.files;          // extract the files array
                this.currentPath = response.data.basePath; // update the current path
            } catch (error) {
                console.error('Error listing files:', error);
            }
        },
        // Download a file
        async downloadFile(filePath) {
            try {
                const response = await axios.get('/hdfs/download', {
                    params: { hdfsFilePath: filePath },
                    responseType: 'blob'
                });
                const url = window.URL.createObjectURL(new Blob([response.data]));
                const link = document.createElement('a');
                link.href = url;
                link.setAttribute('download', filePath.split('/').pop());
                document.body.appendChild(link);
                link.click();
            } catch (error) {
                console.error('Error downloading file:', error);
            }
        },
        // Delete a file or directory
        async deleteFile(filePath) {
            try {
                await axios.delete('/hdfs/delete', { params: { hdfsPath: filePath } });
                this.listFiles(); // refresh the file list
            } catch (error) {
                console.error('Error deleting file or directory:', error);
            }
        },
        // Create a new directory
        async createDirectory() {
            try {
                await axios.post('/hdfs/mkdir', { }, { params: { directoryPath: this.currentPath + '/' + this.newDirectoryPath } });
                this.newDirectoryPath = ''; // clear the input field
                this.listFiles();           // refresh the file list
            } catch (error) {
                console.error('Error creating directory:', error);
            }
        },
        // Go up one directory level
        goUpOneLevel() {
            const pathParts = this.currentPath.split('/').filter(part => part);
            if (pathParts.length > 1) {
                pathParts.pop();
                this.currentPath = '/' + pathParts.join('/');
            } else {
                this.currentPath = '/';
            }
            this.listFiles(); // refresh the file list
        },
        // Enter a directory
        onDirectoryClick(file) {
            if (!this.currentPath.endsWith('/')) {
                this.currentPath += '/';
            }
            if (!this.currentPath.endsWith(file.pathSuffix)) {
                this.currentPath += file.pathSuffix;
            }
            this.listFiles(); // refresh the list to show the clicked directory's contents
        },
        // Format a file size for display
        formatFileSize(size) {
            if (size < 1024) return size + ' B';
            else if (size < 1048576) return (size / 1024).toFixed(2) + ' KB';
            else if (size < 1073741824) return (size / 1048576).toFixed(2) + ' MB';
            else return (size / 1073741824).toFixed(2) + ' GB';
        }
    },
    mounted() {
        this.listFiles(); // load the initial directory when the page loads
    }
});
</script>
<!-- Bootstrap JS for the modals -->
<script src="https://cdn.jsdelivr.net/npm/@popperjs/core@2/dist/umd/popper.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5/dist/js/bootstrap.min.js"></script>
</body>
</html>
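One assumption worth stating: for Spring Boot to serve this page at the application root, index.html would typically live at src/main/resources/static/index.html, Spring Boot's default static-resource location; the original post does not say where the file is placed.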
6. Testing and Verification
6.1 Create a directory
6.2 Creation result
6.3 Upload a file
6.4 Upload result
6.5 Rename test
6.6 Rename result
6.7 Others
Deletion, download, and the remaining operations work the same way and are not shown again here.
6.8 Browsing files in HDFS's built-in web file manager
7. Errors Encountered
Error 1
java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
Download the native Windows binaries matching your Hadoop version from https://github.com/cdarlint/winutils.
Then add the following to the Windows system environment variables. Note that HADOOP_HOME must point at the Hadoop directory itself, not at its bin subfolder:

HADOOP_HOME=D:\hadoop-3.3.6

Then append %HADOOP_HOME%\bin to the Path variable.
Important enough to say three times: remember to restart, restart, restart the IDEA IDE so it picks up the new environment variables.
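Alternatively, if you prefer not to modify the system environment, the same hint can be supplied programmatically before any Hadoop class is loaded; a minimal sketch, assuming winutils was unpacked to D:\hadoop-3.3.6 (the property name comes from the exception message itself):

// e.g. the first line of main(), before any Hadoop code runs
System.setProperty("hadoop.home.dir", "D:\\hadoop-3.3.6");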
Error 2
Engine2 : Call: addBlock took 169ms
2024-08-11T19:14:22.716+08:00 DEBUG 13116 --- [hdfs_hadoop] [ Thread-5] org.apache.hadoop.hdfs.DataStreamer : pipeline = [DatanodeInfoWithStorage[127.0.0.1:9866,DS-d52f1df8-88e2-4807-bc48-842e7b9f07a2,DISK]], blk_1073741826_1002
2024-08-11T19:14:22.716+08:00 DEBUG 13116 --- [hdfs_hadoop] [ Thread-5] org.apache.hadoop.hdfs.DataStreamer : Connecting to datanode 127.0.0.1:9866
2024-08-11T19:14:22.718+08:00 WARN 13116 --- [hdfs_hadoop] [ Thread-5] org.apache.hadoop.hdfs.DataStreamer : Exception in createBlockOutputStream blk_1073741826_1002
java.net.ConnectException: Connection refused: getsockopt
at java.base/sun.nio.ch.Net.pollConnect(Native Method) ~[na:na]
at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:682) ~[na:na]
org.apache.hadoop.ipc.RemoteException: File /home/7dff6c94-88d2-4b62-83b9-92f93253b473_01.jpg could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2350)
Explanation: at first I suspected that a single node was the problem, so I tried a cluster, and it still failed. What finally fixed it was the following change:
**Edit the /etc/hosts file**:
sudo nano /etc/hosts
- I changed the liber-vmware-virtual-platform entry to map to 192.168.186.77, after which the connection succeeded; by default the hostname maps to 127.0.0.1, which a remote client cannot reach.
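For concreteness, the relevant /etc/hosts line would look roughly like this after the change (the hostname and address are from the author's VM; substitute your own):

192.168.186.77   liber-vmware-virtual-platform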
Again, worth saying three times: remember to restart, restart, restart Ubuntu after changing /etc/hosts.
8. Summary
This post walked through installing Hadoop and configuring HDFS on Ubuntu 24.04 LTS, then building a simple file management system with Spring Boot and Vue.