Steps For Installing & Configuring Hadoop in Standalone Mode
You might want to create a dedicated user for running Apache Hadoop, but it is not a prerequisite. In this demonstration, we will use a default user to run Hadoop.
Environment
Ubuntu 10.10
JDK 6 or above
Hadoop-1.1.2 (Any stable release)
Follow these steps for installing and configuring Hadoop on a single node:
Step-1. Install Java
In this tutorial we will use Java 1.6, so its installation is described in detail.
Use one of the commands below to install Java:
$ sudo apt-get install openjdk-6-jdk
or
$ sudo apt-get install sun-java6-jdk
This will install the full JDK under the /usr/lib/jvm/java-6-sun directory.
Step-2. Verify Java installation
You can verify the Java installation using the following command:

$ java -version

On executing this command, you should see output similar to the following:
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)
Step-3. SSH configuration
- Install SSH using the command: sudo apt-get install ssh
- Generate an SSH key: ssh-keygen -t rsa -P "" (press Enter when asked for a file name; this generates a passwordless SSH key)
- Now copy the public key (id_rsa.pub) of the current machine to authorized_keys. The command below appends the generated public key to the .ssh/authorized_keys file: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
- Verify the SSH configuration using the command ssh localhost. Pressing yes will add localhost to known hosts. (The full sequence is consolidated in the sketch after this list.)
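Putting it together, the whole SSH setup is only a few commands (a consolidated sketch, assuming OpenSSH and the default key location ~/.ssh/id_rsa):

$ sudo apt-get install ssh
$ ssh-keygen -t rsa -P ""                          # accept the default file name
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys  # enable passwordless login
$ ssh localhost                                    # answer yes to add localhost to known_hosts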
Step-4. Download Hadoop
Download the latest stable release of Apache Hadoop from http://hadoop.apache.org/releases.html.
Unpack the release: tar -zxvf hadoop-1.1.2.tar.gz
Move the extracted folder to an appropriate location; HADOOP_HOME will point to this directory.
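A minimal sketch of this whole step (the archive URL and the ~/hadoop target location are assumptions; use any Apache mirror and location you prefer):

$ wget http://archive.apache.org/dist/hadoop/core/hadoop-1.1.2/hadoop-1.1.2.tar.gz
$ tar -zxvf hadoop-1.1.2.tar.gz
$ mv hadoop-1.1.2 ~/hadoop    # HADOOP_HOME will point here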
Step-5. Verify Hadoop
Check that the following directories exist under HADOOP_HOME: bin, conf, lib.
Use the following command to create an environment variable that points to the Hadoop installation directory (HADOOP_HOME):
export HADOOP_HOME=/home/user/hadoop
Now place the Hadoop binary directory on your command-line path by executing the command
export PATH=$PATH:$HADOOP_HOME/bin
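Note that these exports last only for the current shell session. To make them permanent, you can append them to ~/.bashrc (a sketch, reusing the example path above):

$ echo 'export HADOOP_HOME=/home/user/hadoop' >> ~/.bashrc
$ echo 'export PATH=$PATH:$HADOOP_HOME/bin' >> ~/.bashrc
$ source ~/.bashrc    # reload the file into the current shell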
Use this command to verify your Hadoop installation:
hadoop version
The output should be similar to the one below:
Hadoop 1.1.2
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r911707
Compiled by chrisdo on Fri Feb 19 08:07:34 UTC 2010
Step-6. Configure JAVA_HOME
Hadoop needs to know the Java installation path, so we will set the JAVA_HOME environment variable to point to our Java installation directory.
JAVA_HOME can be configured in the ~/.bash_profile or ~/.bashrc file. Alternatively, you can let Hadoop know by setting JAVA_HOME in the conf/hadoop-env.sh file (see the sketch at the end of this step).
Use the command below to set JAVA_HOME on Ubuntu:
export JAVA_HOME=/usr/lib/jvm/java-6-sun
JAVA_HOME can be verified with the command:
echo $JAVA_HOME
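If you prefer the conf/hadoop-env.sh route mentioned above, uncomment and edit its JAVA_HOME line (a sketch, assuming the Sun JDK path from Step-1):

# in $HADOOP_HOME/conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-6-sun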
Step-7. Create Data Directory for Hadoop
An advantage of Hadoop is that it can be set up to work correctly with just a handful of directories. Let us create a directory named hdfs with three sub-directories: name, data, and tmp.
Since the Hadoop user needs read-write access to these directories, change their permissions to 755 (or 777) for the Hadoop user, as sketched below.
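A minimal sketch of the commands (assuming the hdfs directory lives under /home/girish, matching the XML configuration in Step-8):

$ mkdir -p /home/girish/hdfs/name /home/girish/hdfs/data /home/girish/hdfs/tmp
$ chmod -R 755 /home/girish/hdfs    # use 777 only if 755 is not enough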
Step-8. Configure Hadoop XML files
Next, we will configure the Hadoop XML files. The Hadoop configuration files are in the HADOOP_HOME/conf directory.
conf/core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/girish/hdfs/tmp</value>
  </property>
</configuration>
conf/hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/girish/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/girish/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
conf/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
conf/masters
Not required in single node cluster.
conf/slaves
Not required in single node cluster.
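For reference, in a stock Hadoop 1.x tarball both files already contain just localhost, which is exactly what a single-node setup needs (shown here as an assumption about the unmodified distribution):

$ cat conf/masters
localhost
$ cat conf/slaves
localhost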
Step-9. Format the Hadoop NameNode
Execute the command below from the Hadoop home directory:
$ ~/hadoop/bin/hadoop namenode -format
Step-10. Start Hadoop daemons
$ ~/hadoop/bin/start-all.sh
Step-11. Verify the daemons are running
$ jps    (if jps is not on your PATH, try /usr/java/latest/bin/jps)
The output will look similar to this:
9316 SecondaryNameNode
9203 DataNode
9521 TaskTracker
9403 JobTracker
9089 NameNode
Now we have all five daemons running.
Note: If your master server fails to start due to the dfs safe mode issue, execute this on the Hadoop command line:
hadoop dfsadmin -safemode leave
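You can also check the current safe-mode state before forcing Hadoop out of it; get is a standard sub-command of dfsadmin alongside leave:

$ hadoop dfsadmin -safemode get    # prints whether safe mode is ON or OFF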
Also, make sure to format the NameNode again if you make changes to your configuration.
Step-12. Verify the NameNode & JobTracker UIs
Open a browser window and type the following URLs:
namenode UI: http://machine_host_name:50070
job tracker UI: http://machine_host_name:50030
Substitute 'machine_host_name' with the hostname or public IP of your node, e.g. http://localhost:50070
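On a headless server you can sanity-check both ports without a browser (a sketch using curl, which may need to be installed separately):

$ curl -s http://localhost:50070 | head    # NameNode UI
$ curl -s http://localhost:50030 | head    # JobTracker UI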
Now you have successfully installed and configured Hadoop on a single node.
BASIC HADOOP ADMIN COMMANDS
(Source: Getting Started with Hadoop):
The ~/hadoop/bin directory contains the scripts used to launch the Hadoop DFS and Hadoop Map/Reduce daemons (a usage example follows the list). These are:
- start-all.sh – Starts all Hadoop daemons, the namenode, datanodes, the jobtracker and tasktrackers.
- stop-all.sh – Stops all Hadoop daemons.
- start-mapred.sh – Starts the Hadoop Map/Reduce daemons, the jobtracker and tasktrackers.
- stop-mapred.sh – Stops the Hadoop Map/Reduce daemons.
- start-dfs.sh – Starts the Hadoop DFS daemons, the namenode and datanodes.
- stop-dfs.sh – Stops the Hadoop DFS daemons.
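For example, a full restart of the single-node cluster using these scripts looks like this:

$ ~/hadoop/bin/stop-all.sh
$ ~/hadoop/bin/start-dfs.sh
$ ~/hadoop/bin/start-mapred.sh
$ jps    # confirm all five daemons are back up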