Installation
- Here, we are going to set up a simple Hadoop cluster.
- We are going to use three Amazon Elastic Compute Cloud (EC2) instances to set up this cluster.
- Each of the three instances is:
- Red Hat Enterprise Linux 7.2 (HVM), EBS General Purpose (SSD) Volume Type
- 1 vCPU, 1 GB memory
- Overview
Hostname | Role
---|---
hadoop-01 | NameNode
hadoop-02 | DataNode
hadoop-03 | DataNode
1 Launch three EC2 Instances
Name them hadoop-01, hadoop-02, hadoop-03.
Note that every EC2 instance belongs to a Security Group, and the default rules may block some of the inbound or outbound traffic the cluster needs. Make sure your instances use a Security Group that allows all traffic between them, or adjust the rules of their current Security Group accordingly.
2 Create Hadoop user
On each EC2 instance:
2.1 Add user
sudo useradd -u 900 hadoop
sudo passwd hadoop
2.2 Add hadoop to sudoers
Make /etc/sudoers writable:
sudo chmod +w /etc/sudoers
Then edit /etc/sudoers (sudo vi /etc/sudoers) and add a line like this:
hadoop ALL=(ALL) NOPASSWD: ALL
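A quick sanity check (optional): switch to the hadoop user and confirm that passwordless sudo works.
su - hadoop
sudo whoami   # should print "root" without asking for a password
You may also want to restore the read-only permission on /etc/sudoers afterwards (sudo chmod -w /etc/sudoers).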
3 Config ssh
3.1 Installation
ssh is installed in most Linux distributions. Check whether it is installed:
rpm -qa|grep ssh
Got:
libssh2-1.4.3-10.el7.x86_64
openssh-server-6.6.1p1-22.el7.x86_64
openssh-clients-6.6.1p1-22.el7.x86_64
openssh-6.6.1p1-22.el7.x86_64
So we don't need to install ssh.
3.2 Configuration
3.2.1 Config hosts
In the Amazon EC2 Dashboard, find each instance's Private IP and add them to /etc/hosts on all three EC2 instances (sudo vi /etc/hosts):
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.31.24.205 hadoop-01
172.31.28.133 hadoop-02
172.31.17.172 hadoop-03
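To verify that the hostnames resolve (a quick optional check, run from any of the instances):
getent hosts hadoop-02
ping -c 1 hadoop-03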
3.2.2 Config ssh key
In hadoop-01, log in as hadoop and generate a public/private RSA key pair:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Add the public key to authorized_keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Check whether it works:
ssh localhost
Got:
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
Not OK. I fixed this with:
sudo chmod -R 700 /home/hadoop
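A more targeted fix should also work, since sshd refuses key files that are group- or world-accessible; tightening only the .ssh directory and key files is usually enough:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys ~/.ssh/id_rsa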
Then copy the /home/hadoop/.ssh folder to hadoop-02 and hadoop-03.
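For example (a sketch that assumes you can authenticate to the other instances as the hadoop user, e.g. with password authentication enabled; otherwise stage the files through another account):
scp -r /home/hadoop/.ssh hadoop-02:/home/hadoop/
scp -r /home/hadoop/.ssh hadoop-03:/home/hadoop/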
On hadoop-02 and hadoop-03, run:
sudo chmod -R 700 /home/hadoop
Check whether it works (from hadoop-01):
ssh hadoop-02
ssh hadoop-03
OK!
4 Install JDK
In hadoop-01, check whether a JDK is installed:
java -version
Got -bash: java: command not found, so we need to install a JDK. In this case, we use java-1.8.0-openjdk:
sudo yum install java-1.8.0-openjdk
Install the JDK on hadoop-02 and hadoop-03 from hadoop-01. (Of course, you can also log in to hadoop-02 and hadoop-03 and install it there directly.)
ssh -t hadoop-02 'sudo yum install java-1.8.0-openjdk'
ssh -t hadoop-03 'sudo yum install java-1.8.0-openjdk'
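To confirm the installation on all three nodes (an optional check):
java -version
ssh hadoop-02 'java -version'
ssh hadoop-03 'java -version'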
5 Install Hadoop
In hadoop-01
5.1 Download Hadoop
curl --remote-name https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
tar -xf hadoop-2.6.0.tar.gz
cd hadoop-2.6.0
5.2 Config JAVA_HOME
Find the JDK directory:
sudo find / -name '*openjdk*'
Got:
/usr/lib/jvm/jre-1.8.0-openjdk
Edit the file etc/hadoop/hadoop-env.sh to define some parameters as follows:
# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk
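A quick way to confirm that Hadoop picks up this JAVA_HOME (optional):
./bin/hadoop version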
5.3 Config Hadoop
Edit etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-01/</value>
</property>
</configuration>
Edit etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
Edit etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-01</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Edit etc/hadoop/slaves
hadoop-02
hadoop-03
5.4 Copy hadoop to hadoop-02, hadoop-03
cd ..
scp -r hadoop-2.6.0 hadoop-02:/home/hadoop/
scp -r hadoop-2.6.0 hadoop-03:/home/hadoop/
6 Start Hadoop Cluster
6.1 Format HDFS filesystem
cd hadoop-2.6.0
./bin/hdfs namenode -format
6.2 Start HDFS
./sbin/start-dfs.sh
6.3 Start Yarn
./sbin/start-yarn.sh
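A couple of optional checks to confirm the cluster is up: the HDFS report should list two live DataNodes, and YARN should list two NodeManagers.
./bin/hdfs dfsadmin -report
./bin/yarn node -list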