Using FileSystem in Hadoop

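Preface:

When working with Hadoop's HDFS file system, we usually run HADOOP_HOME/bin/hdfs dfs [command], where command is the file operation to perform. That is the shell approach.

Hadoop also provides a Java API for operating on HDFS files.

In this post we take a brief look at how to work with HDFS from Java.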

Preparation:

* Create a Maven project named hadoop

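* Add the Hadoop dependency hadoop-client (you can find the matching version at https://mvnrepository.com/ ; the author uses hadoop-2.7.0, so that version is added here)

<properties>
    <hadoop.version>2.7.0</hadoop.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
</dependencies>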

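1. Getting a FileSystem

FileSystem is an abstract class that defines Hadoop's file system interface. It contains essentially all of the APIs for operating on HDFS files.

The code for obtaining a FileSystem is as follows:

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HDFSDemo {

    FileSystem fileSystem;

    // Corresponds to fs.defaultFS in core-site.xml
    String hdfsUri = "hdfs://hadoop:9000";

    /**
     * Obtain the FileSystem.
     * FileSystem is an abstraction over HDFS and is used to operate on it;
     * its operations are equivalent to those of HADOOP_HOME/bin/hdfs dfs.
     *
     * @throws URISyntaxException if hdfsUri is not a valid URI
     * @throws IOException if the file system cannot be obtained
     * @throws InterruptedException if the calling thread is interrupted
     */
    public void getFileSystem() throws IOException, URISyntaxException, InterruptedException {
        Configuration configuration = new Configuration();
        // The URI can be given explicitly, as here, or picked up by loading core-site.xml
        // hxw is the superuser
        fileSystem = FileSystem.get(new URI(hdfsUri), configuration, "hxw");
    }
}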

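Note: the Hadoop superuser on the author's cluster is hxw, so the FileSystem is obtained as that user in order to have full permissions. Otherwise, the operations below may fail for lack of permissions.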
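The hdfsUri string is an ordinary java.net.URI: its scheme tells Hadoop which FileSystem implementation to use, and its host and port point at the NameNode. The sketch below uses only the JDK to show how such a string is split up, and why a malformed value surfaces as a URISyntaxException. The class name HdfsUriParts and the describe helper are hypothetical names introduced for illustration; they are not part of Hadoop.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper (not a Hadoop class): shows the parts of an HDFS URI.
class HdfsUriParts {

    // Returns "scheme|host|port"; port is -1 when the URI does not specify one.
    static String describe(String uri) {
        try {
            URI parsed = new URI(uri);
            return parsed.getScheme() + "|" + parsed.getHost() + "|" + parsed.getPort();
        } catch (URISyntaxException e) {
            // new URI(...) throws this checked exception for malformed input
            throw new IllegalArgumentException("Not a valid URI: " + uri, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("hdfs://hadoop:9000")); // hdfs|hadoop|9000
    }
}
```

Checking the parsed parts before calling FileSystem.get can make configuration mistakes, such as a missing port, easier to spot.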

2. Common FileSystem operations

// Additional imports for the top of HDFSDemo.java
// (JUnit 4 must also be on the classpath for @Test):
import java.io.FileNotFoundException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.junit.Test;

// The following methods belong inside the HDFSDemo class.

    /** Create a directory. */
    public void createDir() throws IOException {
        Path dir = new Path("/user/hadoop/mapreduce/input");
        fileSystem.mkdirs(dir);
    }

    /** Create a file. */
    public void createFile() throws IOException {
        Path path = new Path("/user/hadoop/mapreduce/input/wordcount.txt");
        String data = "I believe, for every drop of rain that falls, A flower grows";
        // try-with-resources closes the stream, which flushes the data to HDFS
        try (FSDataOutputStream out = fileSystem.create(path)) {
            // Write UTF-8 bytes; writeChars would emit two bytes per character
            out.write(data.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Delete a directory or file. */
    public void deleteFile() throws IllegalArgumentException, IOException {
        Path path = new Path("/user/");
        if (fileSystem.exists(path)) {
            fileSystem.delete(path, true); // true = delete directories recursively
        } else {
            System.out.println(path.getName() + " does not exist");
        }
    }

    /** Read a file. */
    public void readFile() throws IOException {
        Path path = new Path("/user/hadoop/mapreduce/input/wordcount.txt");
        if (fileSystem.isFile(path)) {
            byte[] buf = new byte[1024];
            try (FSDataInputStream in = fileSystem.open(path)) {
                int read;
                while ((read = in.read(buf)) != -1) {
                    // Print only the bytes actually read in this pass
                    System.out.write(buf, 0, read);
                }
                System.out.flush();
            }
        }
    }

    /** List files. */
    public void listFiles() throws FileNotFoundException, IOException {
        Path path = new Path("/user");
        // Direct children (directories and files) of the path
        FileStatus[] listStatus = fileSystem.listStatus(path);
        for (FileStatus fileStatus : listStatus) {
            System.out.println(fileStatus);
        }
        // All files under the path, recursively
        RemoteIterator<LocatedFileStatus> listFiles = fileSystem.listFiles(path, true);
        while (listFiles.hasNext()) {
            System.out.println(listFiles.next());
        }
    }

    /** Query file attributes. */
    public void queryPosition() throws IOException {
        Path path = new Path("/user/hadoop/mapreduce/input/wordcount.txt");
        FileStatus fileStatus = fileSystem.getFileStatus(path);
        // Where the file's blocks are located in the cluster
        BlockLocation[] fileBlockLocations =
                fileSystem.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
        for (BlockLocation blockLocation : fileBlockLocations) {
            System.out.println(blockLocation); // offset,length,hosts, e.g. 0,60,hadoop
        }
        // Checksum
        FileChecksum fileChecksum = fileSystem.getFileChecksum(path);
        System.out.println(fileChecksum); // e.g. MD5-of-0MD5-of-512CRC32C:<hex digest>
        // Information about all DataNodes in the cluster (the FileSystem must be HDFS)
        DistributedFileSystem dfs = (DistributedFileSystem) fileSystem;
        DatanodeInfo[] dataNodeStats = dfs.getDataNodeStats();
        for (DatanodeInfo datanodeInfo : dataNodeStats) {
            System.out.println(datanodeInfo); // e.g. 192.168.241.129:50010
        }
    }

    @Test
    public void readHDFSFile() {
        HDFSDemo d = new HDFSDemo();
        try {
            d.getFileSystem(); // obtain the FileSystem
            // d.deleteFile(); // delete
            // d.createDir(); // create a directory
            // d.createFile();
            // d.listFiles();
            // d.readFile();
            d.queryPosition();
        } catch (IllegalArgumentException | IOException | URISyntaxException | InterruptedException e) {
            e.printStackTrace();
        }
    }
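FSDataOutputStream extends java.io.DataOutputStream, so it inherits both writeChars and write(byte[]). The two store the same string very differently: writeChars emits two bytes per character (UTF-16, big-endian), while writing UTF-8 bytes stores ASCII text as one byte per character. The sketch below illustrates the difference using an in-memory ByteArrayOutputStream as a stand-in for HDFS; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares the byte counts of two ways of writing text.
class WriteCharsVsUtf8 {

    // Bytes produced by DataOutputStream.writeChars (2 bytes per char).
    static int writeCharsSize(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.writeChars(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.size();
    }

    // Bytes produced by writing the UTF-8 encoding of the text.
    static int utf8Size(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.size();
    }

    public static void main(String[] args) {
        String data = "I believe, for every drop of rain that falls, A flower grows";
        System.out.println(writeCharsSize(data)); // 120
        System.out.println(utf8Size(data));       // 60
    }
}
```

For a file that is later processed as plain text, for example as word-count input, UTF-8 bytes are usually the safer choice.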

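Note: the fileSystem field used by these methods is the FileSystem obtained in section 1.

Summary: the operations above are all fairly routine and simple, so the author does not describe them in detail. Interested readers can experiment with the other methods.

Publisher: 全栈程序员-站长. When reposting, please credit the source: https://javaforall.net/207550.html Original link: https://javaforall.net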
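When reading a stream in chunks, only the first n bytes of the buffer are valid after each read call, where n is the value read returned. Printing or decoding the whole buffer would also emit stale bytes left over from earlier passes. The sketch below shows the pattern with an in-memory stream standing in for FSDataInputStream, which is also a java.io.InputStream; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: reads a stream chunk by chunk without stale bytes.
class ChunkedRead {

    // Reads the stream to the end and decodes the collected bytes as UTF-8.
    static String readAll(InputStream in, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buf = new byte[bufferSize];
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                collected.write(buf, 0, n); // keep only the n bytes just read
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Decode once at the end so multi-byte characters are never split
        return new String(collected.toByteArray(), StandardCharsets.UTF_8);
    }

    // Convenience wrapper: feeds the text through an in-memory stream.
    static String roundTrip(String text, int bufferSize) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return readAll(new ByteArrayInputStream(bytes), bufferSize);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello world", 4)); // hello world
    }
}
```

Collecting the bytes first suits small files like the one in this post; for large files it is usually better to stream the bytes straight to their destination.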
