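Base environment
Three Linux machines are required. This walkthrough uses three VMware virtual machines running Linux AS 5, with IP addresses planned on VMware's NAT network
as follows:
Hostname | IP | Role |
hadoop00 | 192.168.91.10 | Master: nameNode, secondaryNameNode, jobTracker |
hadoop01 | 192.168.91.11 | Slave: dataNode, taskTracker |
hadoop02 | 192.168.91.12 | Slave: dataNode, taskTracker |
Configure the IP address and host names on all three machines.
Add the following to /etc/hosts: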
192.168.91.10 hadoop00
192.168.91.11 hadoop01
192.168.91.12 hadoop02
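Because the same three entries must be added on every machine, it can help to add them idempotently so that rerunning the step does not create duplicates. The following is only a sketch: add_host_entry is a hypothetical helper name, not part of Hadoop or any standard tool, and it assumes one "IP hostname" pair per line.

```shell
#!/usr/bin/env bash
# add_host_entry (hypothetical helper): append "IP hostname" to a hosts file
# unless a non-comment line already lists that hostname.
add_host_entry() {
    local hosts_file="$1" ip="$2" name="$3"
    # Look for the hostname as a whole field (field 2 onward) on non-comment lines.
    if awk -v n="$name" '
            !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_file" 2>/dev/null; then
        return 0    # already present, nothing to do
    fi
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}
```

Run as root on each machine, for example: add_host_entry /etc/hosts 192.168.91.10 hadoop00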
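User preparation
Create a dedicated user and group for running Hadoop. Here I use hadoop as both the user name and the group name. Create them on each of the three machines.
groupadd hadoop
useradd -g hadoop -G hadoop hadoop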
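Configuring key-based passwordless login
Hadoop needs the nameNode to log in to and access every dataNode without a password, so key-based passwordless SSH login must be set up for the operating-system user that runs Hadoop.
Only the nameNode (hadoop00, 192.168.91.10) needs passwordless access to the dataNodes. Generate a public/private key pair on the nameNode, then send the public key to each dataNode.
On the nameNode: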
[hadoop@hadoop00 ~]$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
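Press Enter if prompted. When the command finishes, ~/.ssh/ contains two files: id_dsa (the private key) and id_dsa.pub (the public key). They come as a pair, much like a key and its lock. Next, append id_dsa.pub to the authorized keys (no authorized_keys file exists at this point):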
[hadoop@hadoop00 ~]$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
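Note: the permissions on .ssh and authorized_keys must be tightened, otherwise login may fail.
[hadoop@hadoop00 ~]$ chmod 700 ~/.ssh
[hadoop@hadoop00 ~]$ chmod 600 ~/.ssh/authorized_keys
Test passwordless login on the local machine
[hadoop@hadoop00 ~]$ ssh localhost
Copy the public key id_dsa.pub to each dataNode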
[hadoop@hadoop00 ~]$ scp ~/.ssh/id_dsa.pub hadoop@hadoop01:/home/hadoop/
[hadoop@hadoop00 ~]$ scp ~/.ssh/id_dsa.pub hadoop@hadoop02:/home/hadoop/
Log in to each dataNode and append the public key id_dsa.pub to that dataNode's authorized_keys (shown for hadoop01; repeat on hadoop02)
[hadoop@hadoop01 ~]$ mkdir .ssh
[hadoop@hadoop01 ~]$ chmod 700 .ssh
[hadoop@hadoop01 ~]$ cat id_dsa.pub >> .ssh/authorized_keys
[hadoop@hadoop01 ~]$ chmod 600 .ssh/authorized_keys
Test passwordless access from the nameNode to a dataNode
[hadoop@hadoop00 ~]$ ssh hadoop01
Last login: Thu Sep 22 07:57:07 2011 from hadoop00
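The copy-then-append steps above can also be folded into a single ssh call per dataNode by piping the public key over standard input. This is a different technique from the scp-based steps in the text, shown only as a sketch: install_pubkey, PUBKEY and DRY_RUN are hypothetical names introduced here, and the first call to each host will still ask for the hadoop user's password.

```shell
#!/usr/bin/env bash
# Path of the public key generated on the nameNode (assumed default from the steps above).
PUBKEY="${PUBKEY:-$HOME/.ssh/id_dsa.pub}"

# Commands executed on the dataNode: create ~/.ssh, append the key read from
# stdin, and apply the same permissions as the manual steps.
REMOTE_CMD='mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'

# install_pubkey (hypothetical helper): takes a target such as hadoop@hadoop01.
# With DRY_RUN=1 it prints the command instead of running it.
install_pubkey() {
    local target="$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "ssh $target '$REMOTE_CMD' < $PUBKEY"
    else
        ssh "$target" "$REMOTE_CMD" < "$PUBKEY"
    fi
}
```

For example: install_pubkey hadoop@hadoop01, then install_pubkey hadoop@hadoop02. Setting DRY_RUN=1 first prints the commands so they can be reviewed before anything is changed.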
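Installation and environment variables
Download and install hadoop-0.21.0
http://mirror.bjtu.edu.cn/apache/hadoop/common/hadoop-0.21.0/hadoop-0.21.0.tar.gz
JDK version to download: jdk-6u24-linux-i586.bin
After downloading Hadoop, extract it directly in the hadoop user's home directory
[hadoop@hadoop00 ~]$ cd /home/hadoop
[hadoop@hadoop00 ~]$ tar -xzvf hadoop-0.21.0.tar.gz
Once configuration is complete, this directory is copied as-is to each dataNode.
JDK installation itself is not covered here. Configure the environment variables as follows (identical on the master and on every slave)
vi ~/.bash_profile
Append to the end of the file: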
# java env
export JAVA_HOME=/usr/java/jdk1.6.0_24
export JRE_HOME=$JAVA_HOME/jre
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
# hadoop env
export HADOOP_HOME=/home/hadoop/hadoop-0.21.0
export PATH=$HADOOP_HOME/bin:$PATH
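After reloading the profile (for example with source ~/.bash_profile), a quick sanity check on each machine can catch a mistyped path before Hadoop is started. The sketch below is an illustration only: check_env is a hypothetical helper, and it assumes the JDK and Hadoop layouts used above, with bin/java and bin/hadoop under the respective home directories.

```shell
#!/usr/bin/env bash
# check_env (hypothetical helper): verify that JAVA_HOME and HADOOP_HOME point
# at directories containing the expected executables.
# Returns 0 when both look right, 1 otherwise.
check_env() {
    local status=0
    if [ ! -x "${JAVA_HOME:-}/bin/java" ]; then
        echo "JAVA_HOME does not contain bin/java: '${JAVA_HOME:-}'" >&2
        status=1
    fi
    if [ ! -x "${HADOOP_HOME:-}/bin/hadoop" ]; then
        echo "HADOOP_HOME does not contain bin/hadoop: '${HADOOP_HOME:-}'" >&2
        status=1
    fi
    return "$status"
}
```

Usage: check_env && echo "environment looks fine"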
Configuring Hadoop
Configure Hadoop on the nameNode
1. Configure the Hadoop environment script: hadoop-0.21.0/conf/hadoop-env.sh
# The java implementation to use. Required.
export JAVA_HOME=/usr/java/jdk1.6.0_24
2. Configure hadoop-0.21.0/conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoopdata</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop00:9000</value>
  </property>
  <property>
    <name>dfs.hosts.exclude</name>
    <value>excludes</value>
  </property>
</configuration>
3. Configure hadoop-0.21.0/conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/hadoopname</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/hadoopdata</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
4. Configure hadoop-0.21.0/conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop00:9001</value>
  </property>
</configuration>
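The steps above do not show the conf/masters and conf/slaves files, which the start scripts of this Hadoop generation conventionally read to decide where to launch the secondaryNameNode and the dataNode/taskTracker daemons. The startup log later in this article prints IP addresses for each node, which suggests those files listed IPs, but that is an inference and not something the article states. As a sketch under that assumption, with write_node_lists being a hypothetical helper:

```shell
#!/usr/bin/env bash
# write_node_lists (hypothetical helper): write conf/masters and conf/slaves.
#   $1      = Hadoop conf directory
#   $2      = host that runs the secondaryNameNode (written to masters)
#   $3...   = hosts that run dataNode and taskTracker (written to slaves)
write_node_lists() {
    local conf_dir="$1" master="$2"
    shift 2
    printf '%s\n' "$master" > "$conf_dir/masters"
    printf '%s\n' "$@" > "$conf_dir/slaves"    # one slave per line
}
```

For the cluster in this article that would be: write_node_lists ~/hadoop-0.21.0/conf 192.168.91.10 192.168.91.11 192.168.91.12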
Copy the configured Hadoop directory from the nameNode to the same path on each dataNode
[hadoop@hadoop00 ~]$ zip -r hadoop-0.21.0.zip hadoop-0.21.0
[hadoop@hadoop00 ~]$ scp hadoop-0.21.0.zip hadoop@hadoop01:/home/hadoop
[hadoop@hadoop00 ~]$ scp hadoop-0.21.0.zip hadoop@hadoop02:/home/hadoop
Log in to each of the two dataNodes and unpack hadoop-0.21.0.zip
[hadoop@hadoop01 ~]$ unzip hadoop-0.21.0.zip
[hadoop@hadoop02 ~]$ unzip hadoop-0.21.0.zip
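With passwordless SSH already in place, the copy and unpack steps can be driven from the nameNode in one loop, which is convenient whenever the configuration changes and has to be pushed again. This is a sketch, not the author's procedure: push_hadoop, run and DRY_RUN are hypothetical names, and unzip is assumed to be installed on every dataNode.

```shell
#!/usr/bin/env bash
# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# push_hadoop (hypothetical helper): copy the archive to each host and unpack
# it in /home/hadoop, overwriting existing files (-o) without listing them (-q).
#   $1    = path to the zip archive on the nameNode
#   $2... = dataNode host names
push_hadoop() {
    local archive="$1" name host
    shift
    name=$(basename "$archive")
    for host in "$@"; do
        run scp "$archive" "hadoop@$host:/home/hadoop/"
        run ssh "hadoop@$host" "cd /home/hadoop && unzip -o -q $name"
    done
}
```

For example: push_hadoop /home/hadoop/hadoop-0.21.0.zip hadoop01 hadoop02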
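Starting and stopping Hadoop
Hadoop is started by running a command on the nameNode. The start and stop scripts connect from the nameNode to every dataNode over SSH and start or stop the daemons there.
1. Start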
[hadoop@hadoop00 ~]$ ~/hadoop-0.21.0/bin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-mapred.sh
starting namenode, logging to /home/hadoop/hadoop-0.21.0/bin/../logs/hadoop-hadoop-namenode-hadoop00.out
192.168.91.11: starting datanode, logging to /home/hadoop/hadoop-0.21.0/bin/../logs/hadoop-hadoop-datanode-hadoop01.out
192.168.91.12: starting datanode, logging to /home/hadoop/hadoop-0.21.0/bin/../logs/hadoop-hadoop-datanode-hadoop02.out
192.168.91.10: starting secondarynamenode, logging to /home/hadoop/hadoop-0.21.0/bin/../logs/hadoop-hadoop-secondarynamenode-hadoop00.out
starting jobtracker, logging to /home/hadoop/hadoop-0.21.0/bin/../logs/hadoop-hadoop-jobtracker-hadoop00.out
192.168.91.12: starting tasktracker, logging to /home/hadoop/hadoop-0.21.0/bin/../logs/hadoop-hadoop-tasktracker-hadoop02.out
192.168.91.11: starting tasktracker, logging to /home/hadoop/hadoop-0.21.0/bin/../logs/hadoop-hadoop-tasktracker-hadoop01.out
2. Stop
[hadoop@hadoop00 ~]$ ~/hadoop-0.21.0/bin/stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-mapred.sh
stopping namenode
192.168.91.12: stopping datanode
192.168.91.11: stopping datanode
192.168.91.10: stopping secondarynamenode
stopping jobtracker
192.168.91.11: stopping tasktracker
192.168.91.12: stopping tasktracker
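The script output only says that each daemon was launched, not that it stayed up. One common way to confirm this after a start is the JDK's jps tool, which lists running Java processes as "pid main-class". The sketch below, with missing_daemons as a hypothetical helper, reads jps output on standard input and prints every expected daemon that is absent; the daemon class names are the usual ones for this Hadoop generation.

```shell
#!/usr/bin/env bash
# missing_daemons (hypothetical helper): read `jps` output on stdin and print
# each expected daemon name (given as arguments) that does not appear in it.
missing_daemons() {
    local jps_output expected
    jps_output=$(cat)
    for expected in "$@"; do
        # jps prints "<pid> <main class>"; compare the second field exactly so
        # that NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" |
                awk -v d="$expected" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$expected"
        fi
    done
}
```

On the master: jps | missing_daemons NameNode SecondaryNameNode JobTracker
On each slave: jps | missing_daemons DataNode TaskTracker
No output means every expected daemon is running.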
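Initial HDFS setup
1. Format the HDFS file system (do this once on the nameNode, before starting the cluster for the first time)
[hadoop@hadoop00 ~]$ hadoop namenode -format
2. List the HDFS root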
[hadoop@hadoop00 ~]$ hadoop fs -ls /
11/09/24 07:49:55 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
11/09/24 07:49:56 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
Found 4 items
drwxr-xr-x - hadoop supergroup 0 2011-09-22 08:05 /home
drwxr-xr-x - hadoop supergroup 0 2011-09-22 11:29 /jobtracker
drwxr-xr-x - hadoop supergroup 0 2011-09-22 11:23 /user
3. Inspect Hadoop through the web interfaces
Cluster status: http://192.168.91.10:50070/dfshealth.jsp
Job status: http://192.168.91.10:50030/jobtracker.jsp
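Running the Hadoop wordcount example
Wordcount is a simple program that counts how many times each word occurs in the input files and writes the result to a given directory. It is one of the official examples and ships in the hadoop-0.21.0 installation directory as hadoop-mapred-examples-0.21.0.jar
Create the program's input directory and file locally and upload them to HDFS. The output directory /wordcount/output is created by the job itself and must not exist beforehand.
[hadoop@hadoop00 ~]$ mkdir input
[hadoop@hadoop00 ~]$ echo "a a a a a b b b c c c c c c c c c 1 1 1" > input/file
[hadoop@hadoop00 ~]$ hadoop fs -mkdir /wordcount
[hadoop@hadoop00 ~]$ hadoop fs -put input /wordcount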
[hadoop@hadoop00 ~]$ hadoop jar hadoop-0.21.0/hadoop-mapred-examples-0.21.0.jar wordcount /wordcount/input /wordcount/output
11/09/24 08:11:25 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
11/09/24 08:11:26 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
11/09/24 08:11:26 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/09/24 08:11:26 INFO input.FileInputFormat: Total input paths to process : 2
11/09/24 08:11:26 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
11/09/24 08:11:26 INFO mapreduce.JobSubmitter: number of splits:2
11/09/24 08:11:27 INFO mapreduce.JobSubmitter: adding the following namenodes' delegation tokens:null
11/09/24 08:11:27 INFO mapreduce.Job: Running job: job_201109240745_0002
11/09/24 08:11:28 INFO mapreduce.Job: map 0% reduce 0%
11/09/24 08:11:44 INFO mapreduce.Job: map 50% reduce 0%
11/09/24 08:11:50 INFO mapreduce.Job: map 100% reduce 0%
11/09/24 08:11:57 INFO mapreduce.Job: map 100% reduce 100%
11/09/24 08:11:59 INFO mapreduce.Job: Job complete: job_201109240745_0002
11/09/24 08:11:59 INFO mapreduce.Job: Counters: 34
……
[hadoop@hadoop00 ~]$ hadoop fs -cat /wordcount/output/part-r-00000
11/09/24 08:18:09 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
11/09/24 08:18:09 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
1 3
a 5
b 3
c 9
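For an input this small, the job's result can be cross-checked locally with standard Unix tools. The pipeline below is a sketch of the same counting logic, not what Hadoop runs internally: local_wordcount is a hypothetical helper that splits on whitespace, sorts in byte order, counts duplicates, and prints each word followed by a tab and its count.

```shell
#!/usr/bin/env bash
# local_wordcount (hypothetical helper): count whitespace-separated words read
# from stdin and print "word<TAB>count", sorted by word in byte order.
local_wordcount() {
    tr -s '[:space:]' '\n' |      # one word per line
        sed '/^$/d' |             # drop empty lines
        LC_ALL=C sort |           # group identical words together
        uniq -c |                 # prefix each distinct word with its count
        awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

Usage: local_wordcount < input/file
For the input file created above this gives 1 3, a 5, b 3 and c 9, one pair per line, matching the part-r-00000 contents shown.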