Hadoop ecosystem: Python + MapReduce + WordCount

Start the Hadoop processes.

Upload the input files to HDFS:

hdfs dfs -put /home/hadoop/hadoop/input /user/hadoop/input 

Check what is currently stored in HDFS:

[hadoop@master0 hadoop]$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2019-12-04 02:17 /user

Check that the files were uploaded correctly.

Run the program to count how many times each word occurs.

 

View the output:

[hadoop@master0 hdfs]$ hdfs dfs -cat /user/hadoop/output/*
work,	63
worker	315
would	62
write-operations.	62
written	62
........

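Next, write the MapReduce job yourself as two Python scripts, a mapper and a reducer. Run them with Hadoop Streaming, which feeds the input to each script on standard input and collects what the script prints to standard output.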

mapper.py:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# mapper.py: reads text from standard input and prints "word<TAB>1" for every
# whitespace-separated word. Hadoop Streaming treats the text before the tab as the key.
__author__ = '未昔/AngelFate'
__date__ = '2019/12/4 20:20'

import sys

for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print("%s\t%s" % (word, 1))
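You can check what the mapper hands to Hadoop without a cluster. Wrap the mapper's loop in a function and feed it a few sample lines. This is a minimal sketch, and `map_lines` is a hypothetical helper that is not part of the original scripts.

```python
from typing import Iterable, Iterator


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Same logic as mapper.py: yield 'word<TAB>1' for every whitespace-separated token."""
    for line in lines:
        for word in line.strip().split():
            yield "%s\t%s" % (word, 1)


if __name__ == "__main__":
    sample = ["HDFS is a distributed file system.", "HDFS stores data."]
    for record in map_lines(sample):
        print(record)
```

Each token becomes its own record with a count of 1. Repeated words such as `HDFS` are not merged at this stage; combining them is the reducer's job.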
reducer.py:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# reducer.py: Hadoop Streaming sorts the mapper output by key, so all lines for the
# same word arrive one after another. Sum their counts and print "word<TAB>total" once per word.
__author__ = '未昔/AngelFate'
__date__ = '2019/12/4 20:25'

import sys

current_word = None  # the previous word, used for comparison
current_count = 0

for line in sys.stdin:
    line = line.strip()
    try:
        word, count = line.split('\t', 1)
        count = int(count)
    except ValueError:
        # skip malformed lines (no tab, or a count that is not an integer)
        continue
    if current_word == word:
        current_count += count
    else:
        if current_word is not None:
            print("%s\t%s" % (current_word, current_count))
        current_count = count
        current_word = word

# print the total for the last word (nothing is printed for empty input)
if current_word is not None:
    print("%s\t%s" % (current_word, current_count))
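Between the two scripts, Hadoop Streaming sorts the mapper output by key. That sort is what lets the reducer assume all lines for one word arrive together.

The sketch below imitates the map, sort and reduce stages in a single process:

- `sorted()` stands in for Hadoop's shuffle.
- `itertools.groupby` replaces the manual `current_word` bookkeeping.

It is a local approximation for testing, not how Hadoop actually distributes the work. The function names are hypothetical.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    # Same tokenisation as mapper.py: split on whitespace, count 1 per token.
    return [(word, 1) for line in lines for word in line.strip().split()]


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    # groupby only merges *adjacent* equal keys, so the input must already be sorted,
    # just as reducer.py relies on Hadoop's sort.
    return [(word, sum(c for _, c in group))
            for word, group in groupby(pairs, key=lambda p: p[0])]


def run_pipeline(lines: Iterable[str]) -> List[Tuple[str, int]]:
    mapped = map_phase(lines)
    shuffled = sorted(mapped)  # stand-in for Hadoop's sort/shuffle by key
    return reduce_phase(shuffled)


if __name__ == "__main__":
    text = ["the namenode and the datanode", "the client"]
    for word, count in run_pipeline(text):
        print("%s\t%s" % (word, count))
```

You can run the same kind of check with the real scripts in a shell pipe: `cat file | python3 mapper.py | sort | python3 reducer.py`. Leaving out the `sort` step reproduces the failure mode that Hadoop's shuffle prevents: the same word is reported more than once.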
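Submit the job with the Hadoop Streaming jar under share/hadoop/tools/lib. The options work as follows:

- `-file` ships each script to the cluster nodes.
- `-mapper` and `-reducer` give the command that runs each script.
- `-output` must name a directory that does not exist yet and that differs from the input directory, otherwise the job fails. `/user/hadoop/streaming_output` is used here.

[hadoop@master0 hadoop]$ bin/hadoop jar share/hadoop/tools/lib/****.jar \
    -file mapper.py -mapper "python mapper.py" \
    -file reducer.py -reducer "python reducer.py" \
    -input /user/hadoop/input -output /user/hadoop/streaming_output

View the result:

[hadoop@master0 hadoop]$ hadoop fs -cat /user/hadoop/streaming_output/part-00000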

(API),	62
(C++,	62
(FUSE)	62
(HDFS)	62
(HDFS).	63
(JAR)	63
(JRE)	63
(RPC)	62
(SPoF),	62
(a,b,c)	62
(a,b,c),	62
(but	62
(either	63
(more	63
(ssh)	63
(the	62
(typically	62
(webapp)	62
(x,y,z)	62
(x,y,z).	62
1.6	63
2.0	62
2012,[66]	62
3,	62
3rd-party	62
A	369
API	124
ARchive	63
An	62
Append.	62
B	124
Because	62
C#,	62
Clients	62
Cocoa,	62
Common	126
Data	62
DataNode	126
DataNode.	63
Distributed	63
Each	62
Environment	63
Erlang,	62
Failure	62
Federation,	62
File	182
Filesystem	62
For	119
HDFS	732
HDFS,	62
HDFS-UI	62
HDFS.	62
HTTP,	62
Hadoop	986
Hadoop-compatible	63
Hadoop.	63
Haskell,	62
I/O	62
In	121
It	62
Java	312
Java,	62
Job	63
JobTracker	63
Linux	62
MapReduce	126
MapReduce/MR1	63
May	62
Moreover,	62
NameNode	126
NameNode)	62
NameNode,	189
OCaml),	62
OS	63
PHP,	62
POSIX	124
POSIX-compliant	62
POSIX-compliant,	62
Perl,	62
Point	62
Python,	62
RAID	124
Ruby,	62
Runtime	63
Secure	63
Shell	63
Similarly,	63
Single	62
Smalltalk,	62
Some	62
System	63
TCP/IP	62
Task	63
TaskTracker,	63
The	552
These	125
This	187
Thrift	62
Tracker,	126
Unix	62
Userspace	62
Web	62
When	125
With	62
YARN/MR2)[58]	63
a	1994
abstractions,	63
access	62
achieved	62
achieves	62
across	250
actions,	62
acts	63
added	62
addition,	62
advantage	124
aims	62
allowing	62
also	62
alternate	63
although	62
always	62
amount	62
an	187
and	1747
and,	63
announced	62
application	124
application.	62
applications	63
applications.	63
approach	63
architecture	63
are	437
around,	62
as	249
automatic	62
available	62
available.	125
awareness	124
awareness:	63
backbone	63
backup	62
backup.	62
be	497
because	62
become	62
been	62
between	187
block	62
blocks	62
both	63
bottleneck	124
browsed	62
builds	62
but	62
by	249
call	62
can	561
capabilities,	62
certain	62
checkpointed	62
choosing	62
client	124
cluster	376
cluster,	63
code	63
command-line	62
commands	62
communicate	62
communication.	62
compliance	62
compute-only	63
concurrent	62
configurations	62
connects	62
consider	62
consists	126
contains	187
copies	62
corruption	63
create	62
criticality.	62
data	934
data,	62
data-intensive	62
data-only	63
data.	63
datanode	62
datanodes,	62
dedicated	63
default	62
demonstrated	62
designed	62
developing	62
differ	62
different	62
directly	62
directories.	62
directory	124
distributed	124
distributed,	62
does	124
due	124
each	124
edit	62
effective	63
engine	63
entire	62
equivalents.	63
especially	62
every	63
example:	62
execute	63
extent	62
fact,	62
fail	62
fail-over.	62
failed	62
failing	63
failure;	63
failures	63
file	810
file-system	249
file-system-specific	63
files	125
files,	62
files.	62
files[65]	62
for	869
framework.	62
from	62
fully	124
generate	125
gigabytes	62
goals	62
goes	124
hardware	63
has	186
have	125
having	124
hence	62
high-availability	62
high.	62
higher.	63
host	63
hosts	62
hosts,	62
huge	124
if	125
images	62
immutable	62
impact	125
in	560
inability	62
includes	125
incorrectly	62
increase	62
increased	62
index,	63
information	63
information,	62
instead	62
interface	62
interface,	62
interpret	62
is	622
is,	63
is.	63
issue,	62
issues	62
it	187
its	124
job	249
job-completion	62
jobs	62
jobs.	62
journal	62
keep	62
lack	62
language	62
large	124
larger	63
letting	62
level	63
libraries.A	6
libraries.File	6
libraries.For	6
libraries.HDFS	15
libraries.Hadoop	16
libraries.In	4
libraries.The	9
local	62
location	63
location.	62
log	62
loss	63
machines.	62
main	62
manage	63
managed	63
management	62
manually	62
map	186
master	126
may	62
memory	63
metadata	124
metadata,	62
method	63
methods	62
might	62
misleading	62
mostly	62
mounted	62
mounted,[62]	62
move	62
multi-node	63
multiple	312
name	125
namely,	62
namenode	496
namenode's	125
namenode,	62
namenodes.	62
namespaces	62
native	62
necessary	63
needed	63
network	249
new	62
node	500
nodes	251
nodes.	189
nodes:	62
nominally	62
non-POSIX	62
nonstandard	63
normally	63
not	310
number	124
occurs,	63
of	1622
offline.	62
on	622
one	125
only	63
operations	62
options	62
or	562
other	248
other.	62
outage	63
over	248
package	63
package,	63
perform	124
performance	124
plus	62
point	62
portable	62
possible	63
power	63
precisely,	63
preventing	63
prevents	62
primary	248
problem	62
problem,	62
procedure	62
programming	62
project	62
protocol	62
provide	125
provides	63
rack	126
rack,	62
rack.	62
rack/switch	63
racks.	63
range	62
rebalance	62
reduce	249
reduces	125
redundancy	125
regularly	62
release	62
reliability	62
remain	63
remote	124
replaced	63
replay	62
replicating	125
replication	124
request.	62
require	125
requirements	62
requires	63
requiring	62
restart	62
running	62
same	125
saves	62
scalability	62
scalable,	62
scheduled	62
schedules	124
scheduling	126
scripts	126
secondary	250
separate	62
served	62
server	188
serves	62
set	63
shell	62
should	63
shutdown	63
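The counts above treat `HDFS`, `HDFS,` and `HDFS.` as different words. This happens because the mapper splits only on whitespace and keeps case and punctuation.

If you want those merged, one option is to lowercase each line and extract words with a regular expression before emitting them. This is a hedged variant, not the original mapper. The pattern and the name `normalize_tokens` are assumptions about what should count as a word.

```python
import re
from typing import List

# Hypothetical normalisation rule: lowercase, then keep runs of letters/digits,
# optionally joined by '-' or an apostrophe ("file-system", "namenode's").
WORD_RE = re.compile(r"[a-z0-9]+(?:[-'][a-z0-9]+)*")


def normalize_tokens(line: str) -> List[str]:
    """Return the words of a line with case folded and surrounding punctuation dropped."""
    return WORD_RE.findall(line.lower())


if __name__ == "__main__":
    # In a streaming mapper this loop would run over sys.stdin instead of a sample string.
    for word in normalize_tokens("(HDFS). HDFS, the namenode's file-system"):
        print("%s\t%s" % (word, 1))
```

With this rule, `(HDFS).` and `HDFS,` both become `hdfs`. A token such as `YARN/MR2)[58]` is split into `yarn`, `mr2` and `58`. Whether that is desirable depends on the text being counted.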
Publisher: 全栈程序员-站长 (Full-Stack Programmer, site admin). When reposting, please credit the source: https://javaforall.net/143518.html (original site: https://javaforall.net)