Wednesday, 5 March 2014

Big Data - Hadoop Training Course Outline

This course is designed for all types of audiences, from architects and administrators to developers.

For any questions regarding duration, fees, or schedule, call me at 9840014739.


Module 1
Big Data - Getting Started
What is Big Data?
What is Big Data Analytics?
What is Apache Hadoop?
History of Hadoop
Understanding distributed file systems and Hadoop
Hadoop ecosystem components
Hadoop use cases
Ubuntu Installation
JDK Installation
Module 2
Hadoop Distributed File System

Eclipse Installation
Overview of HDFS
Communication Protocols
Rack Awareness
Hadoop cluster topology
Setting up SSH for Hadoop Cluster
Running Hadoop in pseudo-distributed mode
Linux basic commands
HDFS file commands
Reading and writing to HDFS programmatically
Module 3
MapReduce Framework

Java Basics
Anatomy of a MapReduce Program
Writables
InputFormat
OutputFormat
Streaming API
Inherent failure handling
Reading and writing
Module 4
Advanced MapReduce Programming
Input splits, Record Reader, Mapper, Partition & Shuffle, Reduce, OutputFormat
Writing a MapReduce program
Streaming in Hadoop
Counters
Performance Tuning
Joins
Sorting
Module 5
Apache Hadoop Administration
Best Practices for Hadoop setup and infrastructure

Hadoop cluster installation preparation
   - Cluster network design
   - Installation of Linux operating system
   - Configuring SSH
   - Walkthrough on rack topology and setup

Managing Hadoop cluster
   - HDFS cluster management
   - Secondary NameNode configuration
   - TaskTracker management
   - Configuring the HDFS quota
   - Configuring Fair Scheduler
   - Upgrading Hadoop
   - Deploying and managing Hadoop clusters with Ambari

Monitoring Hadoop cluster
   - Monitoring Hadoop cluster with Ganglia
   - Monitoring Hadoop cluster with Ambari
   - Monitoring Hadoop cluster with Nagios

Hadoop Cluster Performance Tuning
   - Benchmarking and profiling
   - Using compression for input and output
   - Configuring optimal map and reduce slots for the TaskTracker
   - Fine-tuning JobTracker configuration
   - Fine-tuning TaskTracker configuration
   - Tuning shuffle, merge and sort parameters

Security Implementation
   - Kerberos security implementation
Workflow Scheduler
   - Capacity Scheduler
   - Fair Scheduler

dfsadmin & mradmin commands

Administration of HCatalog and Hive

Backup and Recovery
Scenario-based exercises
- DataNode failure & recovery
- NameNode failure & recovery
- JobTracker & TaskTracker failure & recovery
- Removing DataNodes
- Adding DataNodes


Module 6
Pig and Pig Latin
Installation and configuration
Running Pig Latin through Grunt
Writing programs
- Filter, Load & Store functions
Writing user defined functions

Working with Scripts
Lab Exercises
Module 7
HBase and ZooKeeper
NoSQL vs SQL
CAP theorem
Architecture
Installation
Configuration
Java API
MapReduce integration
Performance Tuning
Lab Exercises
Module 8
Hive
Features of Hive
Architecture
Installation and configuration
HiveQL

Lab Exercises
Module 9
Other Hadoop ecosystem components
Overview of Ambari, Oozie, Mahout
Installing & configuring Sqoop, mysql-server
Installing & configuring Flume

Lab Exercises


http://big-data-training-in-chennai.blogspot.in/