This course is designed for all types of audiences, from architects and administrators to developers.
For any questions regarding duration, fees, or schedule, call me at 9840014739.
http://big-data-training-in-chennai.blogspot.in/
Module 1
Getting Started with Big Data
|
What is Big Data?
What is Big Data Analytics?
What is Apache Hadoop?
History of Hadoop
Understanding distributed file systems and Hadoop
Hadoop ecosystem components
Hadoop use cases
Ubuntu Installation
JDK Installation
|
Module 2
Hadoop Distributed File system
|
Eclipse Installation
Overview of HDFS
Communication Protocols
Rack Awareness
Hadoop cluster Topology
Setting up SSH for Hadoop Cluster
Running Hadoop in pseudo-distributed mode
Linux basic commands
HDFS file commands
Reading and writing to HDFS programmatically
|
Module 3
MapReduce Framework
|
Java Basics
Anatomy of a MapReduce Program
Writables
InputFormat
OutputFormat
Streaming API
Inherent failure handling
Reading and writing
|
Module 4
Advanced MapReduce Programming
|
Input splits, Record Reader, Mapper, Partition & Shuffle, Reduce, OutputFormat
Writing MapReduce program
Streaming in Hadoop
Counters
Performance Tuning
Joins
Sorting
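As a preview of the flow this module walks through (map, partition and shuffle, reduce), here is a minimal plain-Python sketch of a word count. It does not use Hadoop or its Java API; the function names (`mapper`, `partition`, `shuffle`, `reducer`, `run_job`) and the toy partitioning rule are assumptions made only for illustration.

```python
from collections import defaultdict


def mapper(line):
    # Map step: emit a (word, 1) pair for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def partition(key, num_reducers):
    # Toy partitioner (an assumption, not Hadoop's rule): route a key to a
    # reducer by the code point of its first character.
    return ord(key[0]) % num_reducers


def shuffle(pairs, num_reducers):
    # Shuffle step: group all values for the same key within its partition.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[partition(key, num_reducers)][key].append(value)
    return partitions


def reducer(key, values):
    # Reduce step: collapse the grouped values for one key into a total.
    return key, sum(values)


def run_job(lines, num_reducers=2):
    # Run map over every record, shuffle by key, then reduce each partition
    # with its keys visited in sorted order.
    mapped = [pair for line in lines for pair in mapper(line)]
    output = {}
    for part in shuffle(mapped, num_reducers):
        for key in sorted(part):
            word, total = reducer(key, part[key])
            output[word] = total
    return output


if __name__ == "__main__":
    print(run_job(["big data big cluster", "data node"]))
```

In a real Hadoop job the framework performs the split, shuffle, and sort steps across machines; the sketch only mirrors the order of the stages on a single process.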
|
Module 5
Apache Hadoop Administration
|
Best Practices for Hadoop setup and infrastructure
Hadoop cluster Installation preparation
- Cluster network design
- Installation of Linux operating system
- Configuring SSH
- Walkthrough on rack topology and setup
Managing Hadoop cluster
- HDFS cluster management
- Secondary NameNode configuration
- Task Tracker management
- Configuring the HDFS quota
- Configuring Fair Scheduler
- Upgrading Hadoop
- Deploying and managing Hadoop clusters with Ambari
Monitoring Hadoop cluster
- Monitoring Hadoop cluster with Ganglia
- Monitoring Hadoop cluster with Ambari
- Monitoring Hadoop cluster with Nagios
Hadoop Cluster Performance Tuning
- Benchmarking and profiling
- Using compression for input and output
- Configuring optimal map and reduce slots for the TT
- Fine tuning Job Tracker config
- Fine tuning Task Tracker config
- Tuning shuffle, merge and sort parameters
Security Implementation
Kerberos security implementation
Workflow Scheduler
Capacity Scheduler
Fair Scheduler
dfsadmin & mradmin commands
Administration of HCatalog and Hive
Backup and Recovery
Scenario based exercises
- Data node failure & Recovery
- Name Node Failure & Recovery
- JT & TT failure & Recovery
- Removing data nodes
- Adding Data nodes
|
Module 6
Pig and Pig Latin
|
Installation and configuration
Running Pig Latin through Grunt
Writing programs
- Filter, Load & Store functions
Writing user defined functions
Working with Scripts
Lab Exercises
|
Module 7
HBase and ZooKeeper
|
NoSQL Vs SQL
CAP Theorem
Architecture
Installation
Configuration
Java API
MR integration
Performance Tuning
Lab Exercises
|
Module 8
Hive
|
Features of Hive
Architecture
Installation and configuration
HiveQL
Lab Exercises
|
Module 9
Other Hadoop ecosystem components
|
Overview of Ambari, Oozie, Mahout
Installing & configuring Sqoop, mysql-server
Installing & configuring Flume
Lab Exercises
|