Installation

  1. Here, we are going to set up a simple Hadoop cluster.
  2. We are going to use three Amazon Elastic Compute Cloud (EC2) instances to set up this cluster.
  3. Each of the three instances is:
    • Red Hat Enterprise Linux version 7.2 (HVM), EBS General Purpose (SSD) Volume Type
    • 1 vCPU, 1 GB memory
  4. Overview

Hostname   Role
hadoop-01  NameNode
hadoop-02  DataNode
hadoop-03  DataNode

1 Launch three EC2 Instances

Name them hadoop-01, hadoop-02, and hadoop-03.
Note that each EC2 instance belongs to a Security Group, which by default may block some inbound or outbound traffic. Either assign your instances to a Security Group that allows all traffic between them, or adjust the rules of their current Security Group.
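If you open only specific ports rather than all traffic, a quick way to test TCP reachability between nodes, without installing extra tools, is bash's built-in /dev/tcp redirection. The check_port helper below is written for this walkthrough (not a standard tool), and the hostname/port are examples:

```shell
# Probe a TCP port using bash's /dev/tcp; prints "open" or "closed".
check_port() {
  local host=$1 port=$2
  # The subshell's connect attempt fails (non-zero) if the port is unreachable.
  if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# Example: the NameNode RPC port (8020 by default in Hadoop 2.x)
check_port hadoop-01 8020
```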

2 Create Hadoop user

On each EC2 instance:

2.1 Add user

sudo useradd -u 900 hadoop
sudo passwd hadoop

2.2 Add hadoop to sudoers

Make /etc/sudoers writable:

sudo chmod +w /etc/sudoers

Edit /etc/sudoers (sudo vi /etc/sudoers) and add a line like this:

hadoop  ALL=(ALL)       NOPASSWD: ALL

Then restore the original permissions:

sudo chmod -w /etc/sudoers

(Alternatively, sudo visudo edits the file with a syntax check before saving.)

3 Config ssh

3.1 Installation

OpenSSH is installed by default in most Linux distributions. Check whether it is installed:

rpm -qa | grep ssh

Got:

libssh2-1.4.3-10.el7.x86_64
openssh-server-6.6.1p1-22.el7.x86_64
openssh-clients-6.6.1p1-22.el7.x86_64
openssh-6.6.1p1-22.el7.x86_64

So we don't need to install ssh.

3.2 Configuration

3.2.1 Config hosts

In the Amazon EC2 Dashboard, find each instance's Private IP and add the entries to /etc/hosts on all three instances (sudo vi /etc/hosts):

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.31.24.205   hadoop-01
172.31.28.133   hadoop-02
172.31.17.172   hadoop-03
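Keeping the entries in one shell variable avoids typos when repeating this on three machines. A minimal sketch, using the example IPs above:

```shell
# Build the host entries once; reuse the same text on every node.
entries='172.31.24.205   hadoop-01
172.31.28.133   hadoop-02
172.31.17.172   hadoop-03'

echo "$entries"
# On each instance, append them with:
#   echo "$entries" | sudo tee -a /etc/hosts
```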

3.2.2 Config ssh key

On hadoop-01, log in as hadoop and generate a public/private RSA key pair:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

Add the public key to authorized keys

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Check whether it is OK

ssh localhost

Got

Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

Not OK. sshd rejects public-key login when directory permissions are too open; I fixed this by:

sudo chmod -R 700 /home/hadoop
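For reference, the reason this works: with OpenSSH's default StrictModes setting, sshd refuses public-key login if the home directory, ~/.ssh, or authorized_keys is group- or world-writable. A narrower fix with the same effect is to tighten only the paths sshd actually checks:

```shell
# Tighten only what sshd's StrictModes inspects, instead of chmod -R on all of /home/hadoop.
chmod go-w "$HOME"                      # home dir must not be group/world-writable
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"                  # .ssh accessible only by the owner
touch "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"  # key file readable/writable only by the owner
```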

Then copy the /home/hadoop/.ssh folder to hadoop-02 and hadoop-03 (for example with scp -r; password authentication is still required at this point).
On hadoop-02 and hadoop-03:

sudo chmod -R 700 /home/hadoop

Check whether it is OK

ssh hadoop-02
ssh hadoop-03

OK!

4 Install JDK

In hadoop-01: Check whether JDK is installed.

java -version

I got -bash: java: command not found, so we need to install a JDK. In this case, we use java-1.8.0-openjdk.

sudo yum install java-1.8.0-openjdk

Install the JDK on hadoop-02 and hadoop-03 from hadoop-01. (Of course, you can also install the JDK directly on hadoop-02 and hadoop-03.)

ssh -t hadoop-02 'sudo yum install java-1.8.0-openjdk'
ssh -t hadoop-03 'sudo yum install java-1.8.0-openjdk'

5 Install Hadoop

In hadoop-01

5.1 Download Hadoop

curl --remote-name https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
tar -xf hadoop-2.6.0.tar.gz
cd hadoop-2.6.0
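Mirrors occasionally serve truncated downloads, so it is worth checking the tarball against the checksum Apache publishes next to it on the mirror. A small sketch (verify_sha256 is a helper written for this walkthrough; the expected hash comes from the .mds/.sha file alongside the tarball):

```shell
# Compare a file's SHA-256 against an expected value; returns non-zero on mismatch.
verify_sha256() {
  local file=$1 expected=$2
  local actual
  actual=$(sha256sum "$file" | awk '{print $1}')
  [ "$actual" = "$expected" ]
}

# Usage (placeholder hash -- take the real value from the mirror):
#   verify_sha256 hadoop-2.6.0.tar.gz "<expected sha256>" && echo "checksum OK"
```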

5.2 Config JAVA_HOME

Find the JDK directory (quote the pattern so the shell does not expand it before find runs):

sudo find / -name '*openjdk*'

Got:

/usr/lib/jvm/jre-1.8.0-openjdk

Edit the file etc/hadoop/hadoop-env.sh to set JAVA_HOME as follows:

# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk

5.3 Config Hadoop

Edit etc/hadoop/core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-01/</value>
  </property>
</configuration>

Edit etc/hadoop/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

Edit etc/hadoop/yarn-site.xml

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-01</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

Edit etc/hadoop/slaves

hadoop-02
hadoop-03
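
Optionally, if you also want MapReduce jobs to run on YARN rather than the local runner, set mapreduce.framework.name as the Hadoop cluster-setup docs do; this is not required just to bring the cluster up. Edit etc/hadoop/mapred-site.xml (copy it from mapred-site.xml.template first):

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```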

5.4 Copy hadoop to hadoop-02, hadoop-03

cd ..
scp -r hadoop-2.6.0 hadoop-02:/home/hadoop/
scp -r hadoop-2.6.0 hadoop-03:/home/hadoop/

6 Start Hadoop Cluster

6.1 Format HDFS filesystem

cd hadoop-2.6.0
./bin/hdfs namenode -format

6.2 Start HDFS

./sbin/start-dfs.sh

6.3 Start Yarn

./sbin/start-yarn.sh

To verify, run jps on each node: hadoop-01 should show NameNode and ResourceManager, while hadoop-02 and hadoop-03 should show DataNode and NodeManager. The NameNode web UI listens on port 50070 and the ResourceManager UI on port 8088 (your Security Group must allow these ports for you to reach the UIs from a browser).
