Hadoop Overview Training Course

Summary

Hadoop Overview is designed for management, decision makers, technical leads, architects, and anyone who needs a thorough understanding of how to use Hadoop to solve data scalability problems. We will cover Hadoop basics and discuss best practices for using Hadoop in enterprises that deal with large data sets. We will look into the data problems you are currently dealing with and potential use cases for Hadoop in your infrastructure. The class covers the Hadoop architecture and its main components: the Hadoop Distributed File System (HDFS) and MapReduce. We will present case studies on how other enterprises are using Hadoop and look into what it takes to get Hadoop up and running in your environment. Lab sessions give attendees hands-on experience with Hadoop.

Duration

2 Days

Course Objectives

On completion of this Hadoop course, participants should be able to:

  • Understand Hadoop's main components and architecture
  • Understand Hadoop Distributed File System (HDFS)
  • Understand the MapReduce abstraction and how it works
  • Understand how to plan your Hadoop cluster
  • Understand what it takes to deploy and administer a Hadoop cluster
  • Understand the benefits of using Hadoop and its impact on end-users
  • Know best practices for using Hadoop in the enterprise
  • Make a decision on whether Hadoop is a suitable solution to your data problems and whether it will help you scale

Audience

This course is designed for non-Hadoop engineers and management, decision makers, technical leads, architects and anyone who wants to understand what Hadoop is and how it is used today to solve complex data problems.

Pre-requisites

Big Data Overview

Course Outline

Introduction to Hadoop

  • The scale of data processing today
  • What Hadoop is and why it is important
  • Hadoop comparison with traditional systems
  • Hadoop history
  • Hadoop main components and architecture

Hadoop Distributed File System (HDFS)

  • HDFS overview and design
  • HDFS architecture
  • HDFS file storage
  • Component failures and recoveries
  • Block placement
  • Balancing the Hadoop cluster

Planning your Hadoop cluster

  • Planning a Hadoop cluster and its capacity
  • Hadoop software and hardware configuration
  • HDFS Block replication and rack awareness
  • Network topology for Hadoop cluster

Hadoop Deployment

  • Different Hadoop deployment types
  • Hadoop distribution options
  • Hadoop competitors

MapReduce Abstraction

  • What MapReduce is and why it is popular
  • The big picture of MapReduce
  • MapReduce process and terminology
  • Working with MapReduce

What it takes to run a Hadoop cluster

  • Potential problems and solutions when running Hadoop / What to look for…
  • Adding and removing nodes
  • MapReduce component failures and recoveries
  • Scheduling Hadoop jobs
  • Best practices for monitoring a Hadoop cluster

Introduction to Hive, HBase and Pig

  • Hive as a data warehouse infrastructure
  • HBase as the “Hadoop Database”
  • Using Pig as a scripting language for Hadoop

Hadoop Case studies

  • How different organizations use Hadoop clusters in their infrastructure

How can Hadoop help you?

  • What data problems are you dealing with today?
  • Potential use cases for using Hadoop in your organization
  • Is Hadoop the right choice to help you scale?

Lab Sessions

  • Installing and running Hadoop
  • Basic Commands
  • Demonstration
