Hadoop Admin Course

New Batch Details

Free Demo: 04 May 2017, 8:30 PM EST
Start Date: 12 May 2017
Days: Mon, Wed & Fri
Time: 08:30 PM EST
Type: Online, Live Instructor-Led

Hadoop Administration training from TechyBees gives participants expertise in all the steps necessary to operate and maintain a Hadoop cluster, from planning, installation and configuration through load balancing, security and tuning. The course provides hands-on preparation for the real-world challenges faced by Hadoop administrators, and the curriculum follows the Apache Hadoop distribution.

Course Objectives

During the Hadoop Administration Online training, you'll master:
  1. Hadoop Architecture, HDFS, Hadoop Cluster and the Hadoop Administrator's role
  2. Planning and deploying a Hadoop Cluster
  3. Loading data and running applications
  4. Configuration and performance tuning
  5. Managing, maintaining, monitoring and troubleshooting a Hadoop Cluster
  6. Cluster security, backup and recovery
  7. Insights into Hadoop 2.0: NameNode High Availability, HDFS Federation, YARN and MapReduce v2
  8. Oozie, HCatalog/Hive and HBase administration, plus a hands-on project

Who should go for this course?

The Hadoop Administration course is best suited to professionals with IT Admin experience such as:
  • Linux / Unix Administrator
  • Database Administrator
  • Windows Administrator
  • Infrastructure Administrator
  • System Administrator

What are the pre-requisites for this Course?

This course requires basic Linux knowledge; prior knowledge of Apache Hadoop is not required. TechyBees also offers a complimentary "Linux Fundamentals" course to all Hadoop Administration course participants.

How will I do practicals in Online Training?

Practical Set Up: We will help you set up a virtual machine on your system; for VM installation, 8 GB of RAM is required. You can also create an account with AWS EC2 and use 'Free tier usage' eligible servers. This is currently the most preferred option, as most deployments happen over the cloud, and TechyBees provides a step-by-step procedure guide on the LMS. Additionally, our 24x7 expert support team will be available to assist you with any queries.
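If you choose the AWS route, the practice machines are ordinary Free Tier eligible EC2 instances. As a rough sketch (the AMI ID, key pair and security group below are placeholders, not values from the course), launching one with the AWS CLI might look like this:

    # Placeholder values throughout -- substitute your own AMI ID, key pair
    # and security group. The t2.micro instance type is Free Tier eligible.
    aws ec2 run-instances \
        --image-id ami-0123456789abcdef0 \
        --instance-type t2.micro \
        --count 1 \
        --key-name my-hadoop-key \
        --security-group-ids sg-0123456789abcdef0

    # Find the public DNS name to SSH into:
    aws ec2 describe-instances --query 'Reservations[].Instances[].PublicDnsName'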

Why Learn Hadoop Administration?

"Big Data & Hadoop Market is expected to reach $99.31B by 2022 growing at a CAGR of 42.1% from 2015 Forbes

McKinsey predicts that by 2018 there will be a shortage of 1.5M data experts

Mckinsey Report

Average Hadoop Admin Salary is $123k

Indeed.com Salary Data"

Which Case-Studies will be a part of the Course?

Towards the end of the course, you will get an opportunity to work on a live project that uses the different Hadoop ecosystem components together in a Hadoop implementation to solve big data problems. A sample command walkthrough follows the task list below.

  1. Set up a minimum 2-node Hadoop cluster:
    Node 1 - NameNode, JobTracker, DataNode, TaskTracker
    Node 2 - Secondary NameNode, DataNode, TaskTracker
  2. Create a simple text file and copy it to HDFS.
    Find out the location of the node it went to.
    Find out which DataNode the output files are written to.
  3. Create a large text file and copy it to HDFS with a block size of 256 MB. Keep all the other files at the default block size and find out how block size impacts performance.
  4. Set a space quota of 200 MB for the projects directory and copy a 70 MB file with replication=2.
    Identify why the system does not let you copy the file.
    How will you solve this problem without increasing the space quota?
  5. Configure rack awareness and copy the file to HDFS.
    Find its rack distribution and identify the command used for it.
    Find out how to change the replication factor of the existing file.
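As an illustration, tasks 2-5 map onto commands along these lines (a sketch assuming the Hadoop 1.x-era CLI that matches the daemons above; file names and paths are examples, not course material):

    # 2. Copy a text file to HDFS, then find which datanodes hold its blocks:
    hadoop fs -put sample.txt /user/hadoop/sample.txt
    hadoop fsck /user/hadoop/sample.txt -files -blocks -locations

    # 3. Copy a large file with a 256 MB block size; other files keep the default:
    hadoop fs -D dfs.block.size=268435456 -put large.txt /user/hadoop/large.txt

    # 4. A 200 MB space quota on /projects rejects a 70 MB file at replication=2,
    #    because HDFS reserves a full block per replica as soon as the write
    #    starts, which already exceeds the quota. Writing with replication=1
    #    is one way around it without raising the quota.
    hadoop dfsadmin -setSpaceQuota 209715200 /projects
    hadoop fs -D dfs.replication=1 -put file70mb.dat /projects/

    # 5. With rack awareness configured, inspect rack placement and change
    #    the replication factor of an existing file:
    hadoop fsck /user/hadoop/sample.txt -files -blocks -locations -racks
    hadoop fs -setrep -w 3 /user/hadoop/sample.txt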
The final certification project is based on real-world use cases, as follows:
Problem Statement 1:
  1. Set up a Hadoop cluster, either single-node or 2-node, with all daemons (NameNode, DataNode, JobTracker, TaskTracker and a Secondary NameNode) running in the cluster, and with block size = 128 MB.
  2. Note down the Namespace ID of the cluster, and create a directory with a name quota of 10 and a space quota of 100 MB.
  3. Use the distcp command to copy the data to the same cluster or a different cluster, and create the list of DataNodes participating in the cluster (a command walkthrough follows this list).
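A possible command walkthrough for this problem statement; the directory name and NameNode address are assumptions, not part of the exercise:

    # Block size of 128 MB is set in hdfs-site.xml before starting the daemons:
    #   dfs.block.size = 134217728

    # Directory with a name quota of 10 and a space quota of 100 MB:
    hadoop fs -mkdir /project1
    hadoop dfsadmin -setQuota 10 /project1
    hadoop dfsadmin -setSpaceQuota 104857600 /project1
    hadoop fs -count -q /project1          # verify both quotas

    # Copy the data with distcp, within the cluster or across clusters:
    hadoop distcp hdfs://namenode:8020/project1 hdfs://namenode:8020/project1-copy

    # List the datanodes participating in the cluster:
    hadoop dfsadmin -report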
Problem Statement 2:
  1. Save the namespace of the NameNode without using the Secondary NameNode, and ensure the edits file is merged without stopping the NameNode daemon.
  2. Set up an include file so that no other nodes can talk to the NameNode.
  3. Set the cluster re-balancer threshold to 40%.
  4. Set the map and reduce slots to 4 and 2 respectively for each node (a command sketch follows this list).
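A minimal sketch of the commands and properties involved, assuming Hadoop 1.x-era names:

    # 1. Merge the edits log into fsimage on the namenode itself, without a
    #    secondary namenode and without stopping the daemon; saveNamespace
    #    requires safe mode:
    hadoop dfsadmin -safemode enter
    hadoop dfsadmin -saveNamespace
    hadoop dfsadmin -safemode leave

    # 2. An include file (the dfs.hosts property in hdfs-site.xml, pointing
    #    at a file listing permitted datanodes) keeps all other nodes from
    #    registering with the namenode. Re-read it with:
    hadoop dfsadmin -refreshNodes

    # 3. Run the balancer with a 40% threshold:
    hadoop balancer -threshold 40

    # 4. Per-node slot counts are mapred-site.xml properties:
    #    mapred.tasktracker.map.tasks.maximum    = 4
    #    mapred.tasktracker.reduce.tasks.maximum = 2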
Course Curriculum

Introduction to Big Data and Hadoop
  • Need for a different technique for Data Storage
  • Need for a different paradigm for Data Analysis
  • The 3 V's of Big Data
  • Different distributions of Hadoop

HDFS
  • HDFS Features
  • HDFS Design Assumptions
  • Overview of HDFS Architecture
  • Writing and Reading Files
  • Hands-On Exercise

MapReduce
  • What Is MapReduce?
  • Features of MapReduce
  • Basic MapReduce Concepts
  • Architectural Overview
  • What is a Combiner?
  • What is a Partitioner?
  • Hands-On Exercise

The Hadoop Ecosystem
  • What is the Hadoop Ecosystem?
  • Integration Tools
  • Analysis Tools
  • Data Storage and Retrieval Tools

Planning Your Hadoop Cluster
  • General Planning Considerations
  • Choosing the Right Hardware
  • Network Considerations
  • Configuring Nodes
  • Deployment Types

Installation and Configuration
  • Installing Hadoop
  • Basic Configuration Parameters
  • Hands-On Exercise on a Pseudo-Cluster
  • Hands-On Exercise on a Multi-Node Cluster
  • Advanced Parameters
  • core-site.xml parameters
  • mapred-site.xml parameters
  • hdfs-site.xml parameters
  • Configuring Rack Awareness

Hadoop Security
  • Why Hadoop Security Is Important
  • Hadoop's Security System Concepts
  • What Kerberos Is and How it Works
  • Integrating a Secure Cluster with Other Systems

Managing and Scheduling Jobs
  • Managing Running Jobs
  • Hands-On Exercise
  • The FIFO Scheduler
  • The Fair Scheduler
  • The Capacity Scheduler
  • Configuring the Fair Scheduler
  • Evaluating the Different Schedulers
  • Hands-On Exercise

Cluster Maintenance
  • Checking HDFS Status
  • Hands-On Exercise
  • Copying Data Between Clusters
  • Adding and Removing Cluster Nodes
  • Rebalancing the Cluster
  • Hands-On Exercise
  • Name Node Metadata Backup
  • Cluster Upgrading

Cluster Monitoring and Troubleshooting
  • General System Monitoring
  • Managing Hadoop's Log Files
  • Using the Name Node and Job Tracker Web UIs
  • Hands-On Exercise
  • Cluster Monitoring with Ganglia
  • Common Troubleshooting Issues
  • Benchmarking Your Cluster

Ecosystem Component Administration
  • Hive
  • Pig
  • HBase
  • Oozie

Towards the end of the course, you will work on a live project using Pig, Hive, HBase and MapReduce to perform Big Data analytics. Below are a few industry-specific Big Data case studies included in our Big Data and Hadoop certification, drawn from domains such as Finance, Retail, Media and Aviation, which you can consider for your project work.

Apart from these, there are some twenty more use cases to choose from:

  • Market data Analysis
  • Twitter Data Analysis

Industry: Social Media

Data: Information gathered from bookmarking sites such as reddit.com and stumbleupon.com, which allow you to bookmark, review, rate and search links on any topic. The data is in XML format and contains the various link/post URLs, the categories defining them and the ratings linked with them.

Problem Statement: Analyze the data in the Hadoop ecosystem to:

  1. Fetch the data into the Hadoop Distributed File System and analyze it with the help of MapReduce, Pig and Hive to find the top-rated links based on user comments, likes, etc.
  2. Using MapReduce, convert the semi-structured XML data into a structured format and categorize the user rating as positive or negative for each of the thousand links.
  3. Push the output to HDFS and then feed it into Pig, which splits the data into two parts: category data and ratings data.
  4. Write Hive queries to analyze the data further and push the output into a relational database (RDBMS) using Sqoop.
  5. Use a web server running on Grails/Java/Ruby/Python to render the result on a website in real time (a sample pipeline follows this list).
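One way such a pipeline might be wired together from the shell is sketched below. The jar, job class, Pig script, Hive table and JDBC connection string are all hypothetical placeholders:

    # Land the raw XML in HDFS:
    hadoop fs -put links.xml /data/social/raw/

    # MapReduce job (your own jar) that structures the XML and labels each
    # link's ratings as positive or negative:
    hadoop jar xml-ratings.jar com.example.RatingsJob /data/social/raw /data/social/structured

    # Pig script that splits the structured output into category data and
    # ratings data:
    pig -param INPUT=/data/social/structured split_links.pig

    # Hive summary of top-rated links, then a Sqoop export into an RDBMS:
    hive -e "SELECT link, AVG(rating) AS avg_rating FROM links
             GROUP BY link ORDER BY avg_rating DESC LIMIT 10;"
    sqoop export --connect jdbc:mysql://dbhost/social --table top_links \
        --export-dir /user/hive/warehouse/top_links --username dbuser -P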

Industry: Retail

Data: A publicly available dataset containing a few lakh observations, with attributes such as CustomerId, payment mode, product details, complaint, location, status of the complaint, etc.

Problem Statement: Analyze the data in the Hadoop ecosystem to:

  1. Get the number of complaints filed under each product
  2. Get the total number of complaints filed from a particular location
  3. Get the list of complaints grouped by location which have had no timely response (sample queries follow this list)
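As an illustration, the three reports could be written as Hive queries along these lines, assuming the data has been loaded into a Hive table named complaints with matching column names:

    # 1. Number of complaints filed under each product:
    hive -e "SELECT product, COUNT(*) AS num_complaints FROM complaints GROUP BY product;"

    # 2. Total number of complaints filed from a particular location:
    hive -e "SELECT COUNT(*) FROM complaints WHERE location = 'New York';"

    # 3. Complaints grouped by location that had no timely response:
    hive -e "SELECT location, COUNT(*) FROM complaints WHERE timely_response = 'No' GROUP BY location;"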

Industry: Tourism

Data: The dataset comprises attributes like: City pair (combination of from and to), adults traveling, seniors traveling, children traveling, air booking price, car booking price, etc.

Problem Statement: Find the following insights from the data:

  1. The top 20 destinations people travel to most frequently: based on the given data, find the most popular destinations using the number of trips booked for each destination (a sample query follows this list)
  2. The top 20 locations from which most trips start, based on booked trip count
  3. The top 20 high air-revenue destinations, i.e. the 20 cities that generate the highest airline revenue, so that discount offers can be used to attract more bookings for these destinations.
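For example, the first insight maps naturally onto a top-N Hive query; the bookings table and its columns are assumed names:

    hive -e "
      SELECT to_city, COUNT(*) AS trips_booked
      FROM bookings
      GROUP BY to_city
      ORDER BY trips_booked DESC
      LIMIT 20;
    "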

Industry: Aviation

Data: Publicly available dataset which contains the flight details of various airlines such as: Airport id, Name of the airport, Main city served by airport, Country or territory where airport is located, Code of Airport, Decimal degrees, Hours offset from UTC, Timezone, etc.

Problem Statement: Analyze the airlines' data to:

  1. Find the list of airports operating in the country
  2. Find the list of airlines with zero stops
  3. Find the list of airlines operating with code shares
  4. Find which country (or territory) has the highest number of airports
  5. Find the list of active airlines in the United States

Industry: Banking and Finance

Data: Publicly available dataset which contains complete details of all the loans issued, including the current loan status (Current, Late, Fully Paid, etc.) and latest payment information.

Problem Statement: Find the number of cases per location, categorize the count by the reason for taking the loan, and display the average risk score.

Industry: Media

Data: Publicly available data from sites like Rotten Tomatoes, IMDb, etc.

Problem Statement: Analyze the movie ratings by different users to:

  1. Get the user who has rated the most movies
  2. Get the count of the total number of movies rated by users belonging to a specific occupation
  3. Get the number of underage users

Data: YouTube video data, containing attributes such as VideoID, uploader, age, category, length, views, ratings, comments, etc.

Problem Statement: Identify the top 5 categories with the most uploaded videos, the top 10 rated videos, and the top 10 most viewed videos.

FAQs

How will I do the practicals?

For your practical work, we will help you set up a virtual machine on your system; for VM installation, 8 GB of RAM is required. You can also create an account with AWS EC2 and use 'Free tier usage' eligible servers to create your Hadoop cluster on AWS EC2. This is the most preferred option, and TechyBees provides a step-by-step procedure guide on the LMS.

Who are the instructors?

All our instructors are working professionals from the industry with at least 10-12 years of relevant experience in various domains. They are subject matter experts trained by TechyBees to provide online training, so that participants get a great learning experience.

How long is the support team available?

Your access to the Support Team is for lifetime and will be available 24x7. The team will help you resolve queries during and after the course.

What if I miss a class?

You will never lose any lecture. You can choose either of two options: 1. View the recorded session of the class, available in your LMS. 2. Attend the missed session in any other live batch.

Who should take up Hadoop Administration training?

Professionals with administration experience can take up the Hadoop Administration course; it is a natural career progression. If you are planning for a Big Data Architect role, you may consider both Hadoop Developer and Hadoop Administration training, sequentially.

When do I get access to the LMS?

Post-enrolment, LMS access is provided instantly and is available for lifetime. You will be able to access the complete set of previous class recordings, PPTs, PDFs and assignments. Access to our 24x7 support team is granted instantly as well, so you can start learning right away.

Is the course material accessible after the course ends?

Yes, access to the course material is available for lifetime once you have enrolled in the course.

Can I enroll now and join the classes later?

Yes, you can enroll in the early bird batches and join the classes later.

How are the classes conducted?

The classes are completely online, live, instructor-led interactive sessions. You will have a chat option available to discuss your queries with the instructor during class.

You can do that by posting a question on the training blog.

Whom do I contact for more information?

You can give us a call at +1-603-689-9045 or email us at info@techybees.com.

Do you provide demo sessions before enrollment?

Yes, we schedule free demo sessions before starting any new batch. You can also go through the sample class recordings, which will give you a clear insight into how the classes are conducted, the quality of the instructors and the level of interaction in the class.

Can I reschedule my classes after enrolling?

Yes, it is possible. You can enroll now and reschedule your classes in the future; you have complete flexibility on this at TechyBees. You can also check the website to know more about our future batches.

What is the class schedule, and how much effort is required?

Depending on the batch you select, your live classes will be held either every weekend for 5 weeks or over 15 weekdays. Typically, 6-7 hours of effort is needed each week after the live sessions, comprising hands-on assignments.

What internet speed is required for the live classes?

An internet speed of 1 Mbps or above is preferable to attend the live classes.

Do you provide placement assistance?

TechyBees is committed to providing you an awesome learning experience through world-class content and best-in-class instructors. Through this training we will create an ecosystem that enables you to convert opportunities into job offers by presenting your skills at the time of an interview. We can assist you with resume building and share important interview questions once you are done with the training. However, please understand that we are not into job placements.

How can I make the payment?

You can pay by credit card, debit card or net banking from all the leading banks; we use the CCAvenue payment gateway. For USD payments, you can pay via PayPal.
