{"id":292,"date":"2019-07-27T06:53:19","date_gmt":"2019-07-27T01:23:19","guid":{"rendered":"https:\/\/techieshouts.com\/?p=292"},"modified":"2022-08-09T19:07:29","modified_gmt":"2022-08-09T13:37:29","slug":"hadoop-installation-steps","status":"publish","type":"post","link":"https:\/\/techieshouts.com\/home\/hadoop-installation-steps\/","title":{"rendered":"HADOOP installation steps"},"content":{"rendered":"\n<p><\/p>\n\n\n\n<div class=\"wp-block-media-text alignwide has-media-on-the-right\"><figure class=\"wp-block-media-text__media\"><img loading=\"lazy\" width=\"200\" height=\"140\" src=\"https:\/\/techieshouts.com\/wp-content\/uploads\/2019\/07\/HadoopInstallation-1.jpg\" alt=\"\" class=\"wp-image-318\"\/><\/figure><div class=\"wp-block-media-text__content\">\n<p style=\"text-align:left\">In real-world deployments, <a href=\"https:\/\/techieshouts.com\/what-is-hadoop\/\">Hadoop<\/a> is installed on a network of machines to form a cluster. In this article, we will walk through installing Hadoop step by step on a single Ubuntu system. The post assumes that you already know what a Name node, a Data node, <a href=\"https:\/\/techieshouts.com\/what-is-hdfs\/\">HDFS<\/a>, etc. are, along with the basic <a href=\"https:\/\/techieshouts.com\/what-is-hadoop\/\">Hadoop<\/a> components. 
If not, please visit the previous articles to learn about these basic concepts.<\/p>\n<\/div><\/div>\n\n\n\n<div class=\"wp-block-ub-table-of-contents ub_table-of-contents\" data-showtext=\"show\" data-hidetext=\"hide\"><div class=\"ub_table-of-contents-header\"><div class=\"ub_table-of-contents-title\">Table of contents<\/div><\/div><div style=\"display:block\" class=\"ub_table-of-contents-container ub_table-of-contents-2-column\"><ul><li><a href=\"#0--setting-up--core-sitexml-\"> Setting up core-site.xml<\/a><\/li><li><a href=\"#1-setting-up--hdfs-sitexml-\">Setting up hdfs-site.xml<\/a><\/li><li><a href=\"#2-setting-up--yarn-sitexml-\">Setting up yarn-site.xml<\/a><\/li><li><a href=\"#3-setting-up--mapred-sitexml-\">Setting up&nbsp;mapred-site.xml<\/a><\/li><li><a href=\"#4-updating-slaves-file\">Updating slaves file<\/a><\/li><\/ul><\/div><\/div>\n\n\n\n<p><\/p>\n\n\n\n<ul><li><a href=\"http:\/\/hadoop.apache.org\/releases.html\" target=\"_blank\" rel=\"noopener\"><strong>Download<\/strong><\/a> the Hadoop version that you want to install to your local machine (the stable release is recommended).<\/li><li><strong><a href=\"http:\/\/www.oracle.com\/technetwork\/java\/javase\/archive-139210.html\" target=\"_blank\" rel=\"noopener\">Download<\/a><\/strong> Java 1.7 (preferred for recent Hadoop versions) to your local machine. Please check whether you need the 32-bit or 64-bit build before downloading.<\/li><li>Assuming that the downloaded tarballs are present under the home directory\u00a0of the logged-in user, let&#8217;s extract them. <ol><li>~$ tar -xvf hadoop-2.7.1.tar.gz<\/li><li>~$ tar -xvf jdk-7u79-linux-x86_64.gz <\/li><\/ol><\/li><li>After the extraction completes, you will see these two folders under the home directory.\u00a0<\/li><li>Now we have to set the paths for these items in the .bashrc file under the home directory. If you are not able to see the file, press Ctrl+H to make hidden files visible. Open the file and add the lines below. 
<\/li><li>export JAVA_HOME=\/home\/user\/jdk1.7.0_79<br> export HADOOP_PREFIX=\/home\/user\/hadoop-2.7.1<br> export HADOOP_HOME=${HADOOP_PREFIX}<br> export HADOOP_CONF_DIR=${HADOOP_PREFIX}\/etc\/hadoop<br> export PATH=$JAVA_HOME\/bin:$HADOOP_HOME\/bin:$HADOOP_HOME\/sbin:$PATH <\/li><li>After appending these lines, save and close the file. For the values to take effect in the current shell, source the file with the\u00a0~$ source ~\/.bashrc command.<\/li><li>To confirm that Java and Hadoop are installed properly, run these two commands:\u00a0~$ echo $JAVA_HOME and\u00a0~$ hadoop version.<\/li><li>Now that we have Hadoop on our machine, we need to set the Hadoop configurations to make it work. Navigate to \u201c\/home\/user\/hadoop-2.7.1\/etc\/hadoop\u201d to see the configuration files of Hadoop. <ul><li>Setting up <strong>core-site.xml<\/strong>. <\/li><li>Setting up <strong>hdfs-site.xml<\/strong><\/li><li>Setting up <strong>yarn-site.xml<\/strong><\/li><li>Setting up\u00a0<strong>mapred-site.xml<\/strong><\/li><\/ul><\/li><\/ul>\n\n\n\n<h2 id=\"0--setting-up--core-sitexml-\"> Setting up <strong>core-site.xml<\/strong><\/h2>\n\n\n\n<p>Open the core-site.xml in the path \u201c\/home\/user\/hadoop-2.7.1\/etc\/hadoop\u201d using a text editor. 
Here you need to set the default file system and the temp directory.<\/p>\n\n\n\n<pre data-mode=\"xml\" data-theme=\"chrome\" data-fontsize=\"14\" data-lines=\"Infinity\" class=\"wp-block-simple-code-block-ace\">&lt;?xml version=\"1.0\" encoding=\"UTF-8\"?>\n&lt;configuration>\n   &lt;property>\n      &lt;name>fs.defaultFS&lt;\/name>\n      &lt;value>hdfs:\/\/localhost:8020&lt;\/value>\n   &lt;\/property>\n   &lt;property>\n      &lt;name>hadoop.tmp.dir&lt;\/name>\n      &lt;value>\/home\/user\/tmp&lt;\/value>\n   &lt;\/property>\n&lt;\/configuration><\/pre>\n\n\n\n<h2 id=\"1-setting-up--hdfs-sitexml-\">Setting up <strong>hdfs-site.xml<\/strong><\/h2>\n\n\n\n<p>Here you need to set the storage paths for the name node and the data node.<\/p>\n\n\n\n<pre data-mode=\"xml\" data-theme=\"chrome\" data-fontsize=\"14\" data-lines=\"Infinity\" class=\"wp-block-simple-code-block-ace\">&lt;?xml version=\"1.0\" encoding=\"UTF-8\"?>\n&lt;configuration>\n   &lt;property>\n      &lt;name>dfs.namenode.name.dir&lt;\/name>\n      &lt;value>\/home\/user\/name&lt;\/value>\n   &lt;\/property>\n   &lt;property>\n      &lt;name>dfs.datanode.data.dir&lt;\/name>\n      &lt;value>\/home\/user\/data&lt;\/value>\n   &lt;\/property>\n&lt;\/configuration><\/pre>\n\n\n\n<h2 id=\"2-setting-up--yarn-sitexml-\">Setting up <strong>yarn-site.xml<\/strong><\/h2>\n\n\n\n<p>Here you will set the properties related to the node manager and the resource manager (MR2).<\/p>\n\n\n\n<pre data-mode=\"xml\" data-theme=\"chrome\" data-fontsize=\"14\" data-lines=\"Infinity\" class=\"wp-block-simple-code-block-ace\">&lt;?xml version=\"1.0\" encoding=\"UTF-8\"?>\n&lt;configuration>\n   &lt;property>\n      &lt;name>yarn.resourcemanager.hostname&lt;\/name>\n      &lt;value>localhost&lt;\/value>\n   &lt;\/property>\n   &lt;property>\n      &lt;name>yarn.nodemanager.aux-services&lt;\/name>\n      &lt;value>mapreduce_shuffle&lt;\/value>\n   &lt;\/property>\n   &lt;property>\n      &lt;name>yarn.log-aggregation-enable&lt;\/name>\n      
&lt;value>true&lt;\/value>\n   &lt;\/property>\n   &lt;property>\n      &lt;name>yarn.nodemanager.remote-app-log-dir&lt;\/name>\n      &lt;value>hdfs:\/\/localhost:8020\/log\/&lt;\/value>\n   &lt;\/property>\n&lt;\/configuration><\/pre>\n\n\n\n<h2 id=\"3-setting-up--mapred-sitexml-\">Setting up&nbsp;<strong>mapred-site.xml<\/strong><\/h2>\n\n\n\n<p>You will see&nbsp;mapred-site.xml.template in the same folder. Just create a copy of it and rename it to mapred-site.xml. Here you can set the properties related to the MapReduce framework and the JobHistory web UI.<\/p>\n\n\n\n<pre data-mode=\"xml\" data-theme=\"chrome\" data-fontsize=\"14\" data-lines=\"Infinity\" class=\"wp-block-simple-code-block-ace\">&lt;?xml version=\"1.0\" encoding=\"UTF-8\"?>\n&lt;configuration>\n   &lt;property>\n      &lt;name>mapreduce.framework.name&lt;\/name>\n      &lt;value>yarn&lt;\/value>\n   &lt;\/property>\n   &lt;property>\n      &lt;name>mapreduce.jobhistory.address&lt;\/name>\n      &lt;value>localhost:10020&lt;\/value>\n   &lt;\/property>\n   &lt;property>\n      &lt;name>mapreduce.jobhistory.webapp.address&lt;\/name>\n      &lt;value>localhost:19888&lt;\/value>\n   &lt;\/property>\n&lt;\/configuration><\/pre>\n\n\n\n<h2 id=\"4-updating-slaves-file\">Updating slaves file<\/h2>\n\n\n\n<p>After setting up the above configurations, you need to add localhost to the slaves file. This is because we are setting up a pseudo-distributed cluster, which means both master and slave will be localhost.<\/p>\n\n\n\n<p>Navigate to the ~\/hadoop-2.7.1\/etc\/hadoop path and open the &#8220;slaves&#8221; file in an editor. Just add &#8220;localhost&#8221; to the file and save it.<\/p>\n\n\n\n<ul><li>After setting all these configurations, Hadoop is ready to start in pseudo-distributed mode.<\/li><li>Now you have to format the name node. Please remember that this is a one-time activity and should be done only during the initial setup. 
You can use the below command to perform this.<\/li><\/ul>\n\n\n\n<pre data-mode=\"scss\" data-theme=\"twilight\" data-fontsize=\"14\" data-lines=\"Infinity\" class=\"wp-block-simple-code-block-ace\">cd $HADOOP_CONF_DIR\nhdfs namenode -format<\/pre>\n\n\n\n<ul><li>Once the formatting is done, we need to enable passwordless SSH authentication. To accomplish that, execute the below set of commands in the given order. This avoids being prompted for a password every time the daemons are started.<\/li><\/ul>\n\n\n\n<pre data-mode=\"scss\" data-theme=\"twilight\" data-fontsize=\"14\" data-lines=\"Infinity\" class=\"wp-block-simple-code-block-ace\">~$ sudo apt-get install openssh-server\n~$ sudo service ssh start\n~$ ssh-keygen\n~$ cd .ssh\n~$ cat id_rsa.pub >> authorized_keys\n~$ chmod 600 authorized_keys<\/pre>\n\n\n\n<p>All set for launching Hadoop on your machine. Now you can start and stop the Hadoop daemons using the following commands.<\/p>\n\n\n\n<p><strong>start-all.sh \u2013&nbsp;<\/strong>All five Hadoop daemons (name node, data node, secondary name node, resource manager and node manager) will be started.<\/p>\n\n\n\n<p><strong>start-dfs.sh<\/strong> \u2013 The first three daemons (the HDFS daemons) from the above list will be started.<\/p>\n\n\n\n<p><strong>start-yarn.sh \u2013 <\/strong>The&nbsp;last two daemons (the YARN daemons) will be started.<\/p>\n\n\n\n<p>You can verify which daemons are running using the\u00a0~$ jps command.<\/p>\n\n\n\n<p>You can also start a specific daemon by running<\/p>\n\n\n\n<p><strong>hadoop-daemon.sh start namenode<\/strong><\/p>\n\n\n\n<p>To stop the daemons, use the below commands.<\/p>\n\n\n\n<p><strong>stop-all.sh<\/strong> \u2013 Will stop all the daemons.<\/p>\n\n\n\n<p><strong>stop-dfs.sh<\/strong> \u2013 &nbsp;Will stop the HDFS daemons.<\/p>\n\n\n\n<p><strong>stop-yarn.sh<\/strong> \u2013 Will stop the YARN daemons.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In realtime, Hadoop will be installed into a network of machines to form a cluster. Here, in this article, we will see the installation of Hadoop step by step in a single Ubuntu system. 
The post is written with an assumption that you already know what is a Name node, Data node, HDFS, etc and\u2026 <span class=\"read-more\"><a href=\"https:\/\/techieshouts.com\/home\/hadoop-installation-steps\/\">Read More &raquo;<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":304,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[11,125],"tags":[25,65,22,64],"_links":{"self":[{"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/posts\/292"}],"collection":[{"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/comments?post=292"}],"version-history":[{"count":16,"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/posts\/292\/revisions"}],"predecessor-version":[{"id":326,"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/posts\/292\/revisions\/326"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/media\/304"}],"wp:attachment":[{"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/media?parent=292"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/categories?post=292"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/tags?post=292"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}