Big Data Hadoop – Vidushi Gyaanpeeth

Vidushi Gyaanpeeth Pune provides real-time, online, placement-oriented training. Our extensive 32-hour Big Data/Hadoop virtual training covers both basic and advanced topics to help you become an expert Hadoop professional.

COURSE OVERVIEW

Big data refers to very large volumes of data, and the term has been in use for about two decades. In practice, it is the large body of data a company owns, which is collected and processed with newer techniques to turn it into a valuable, usable form.

Introduction to Big Data and Hadoop

What is Big Data?
Types of Data
Need for Big Data
Characteristics of Big Data
Traditional IT Analytics Approach
Big Data—Use Cases
Handling Limitations of Big Data
Introduction to Hadoop
History and Milestones of Hadoop

Getting Started With Hadoop

VMware Player—Introduction
Installing VMware Player
Setting up the Virtual Environment
Using Oracle VirtualBox to Open a VM

Hadoop Architecture

Hadoop cluster on commodity hardware
Hadoop core services and components
Regular file system vs. Hadoop
HDFS layer
HDFS operation principle

Hadoop Deployment

Introduction to Ubuntu Server
Hadoop installation
Single node and multi node configuration
Hadoop Configuration in cluster environment
Installing Hadoop 2.0

MapReduce

Introduction to MapReduce
Hadoop MapReduce example
Hadoop MapReduce Characteristics
Setting up your MapReduce Environment
Building a MapReduce Program
MapReduce Requirements and Features
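
As a rough preview of the MapReduce topics listed above, the sketch below imitates the map, shuffle, and reduce phases of a word count in plain Python. It does not use Hadoop itself; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample sentences are illustrative assumptions, not part of the course material or of any Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as a framework would do
    between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, used only for illustration.
    sample = ["big data on hadoop", "hadoop stores big data"]
    print(word_count(sample))
```

In a real Hadoop job the mapper and reducer would run on different nodes and the framework would handle the shuffle; here everything runs in one process so the flow of key/value pairs is easy to follow.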

PIG

Introduction to Pig
Components of Pig
Pig Data Model
Pig Vs. SQL
Installing Pig Engine
Datasets for Pig Development
Pig Latin
Filtering and Transforming Data
Grouping and Sorting
Pig Commands

HIVE

What is Hive?
Characteristics of Hive
System Architecture and Components of Hive
Hive Data Models
Serialization/De-serialization
Hive file formats
Hive Query Language
Hive: Installing, Running, and Programming
Hive Functions
Difference between Hive and Pig

HBase

HBase introduction
Characteristics of HBase
HBase Architecture
Storage Model of HBase
When to use HBase
HBase Data Model
HBase Families
HBase Components
Row Distribution between region servers
Data Storage
Installation of HBase
HBase Shell Commands

Commercial Distribution of Hadoop

Cloudera
Downloading Cloudera Quickstart VM
Starting the Cloudera VM
Exploring the Welcome Page
Understanding Hue
Understanding Cloudera Manager
Hortonworks Data Platform

Brief Introduction to ZooKeeper, Sqoop, and Flume

Introduction to ZooKeeper
Features of ZooKeeper
Introduction to Sqoop (Why, what, processing, under the hood)
Importing data into Hive
Exporting data from Hadoop using Sqoop
Introduction to Flume

Introduction to Spark & Scala (Duration: 8 Hrs.)

Introduction
Evolution of Distributed Systems
Need for New-Generation Distributed Systems
Limitations of MapReduce in Hadoop
Batch vs. Real-Time Processing
Introduction to Programming in Scala
Features of RDDs
Performing Basic Operations on Files with RDDs in the Spark Shell
Importance of Spark SQL
Benefits of Spark SQL
DataFrames
SQLContext
Creating a DataFrame
DML Operations: Hive Queries
