Big Data & Hadoop

Vidushi Gyaanpeeth Pune provides real-time, online, placement-oriented training. Our extensive 32-hour Big Data/Hadoop virtual training covers both basic and advanced topics to help you become an expert Hadoop professional.


At its simplest, big data is a very large volume of data. The concept has been around for about two decades. In practice, it refers to the large amount of data a company owns, which is collected and processed with newer techniques to turn it into a valuable, usable form.

Course Contents (32 Hrs)

  1. Introduction to Big Data and Hadoop

  • What is Big Data?
  • Types of Data
  • Need for Big Data
  • Characteristics of Big Data
  • Traditional IT Analytics Approach
  • Big Data—Use Cases
  • Handling Limitations of Big Data
  • Introduction to Hadoop
  • History and Milestones of Hadoop 
  2. Getting Started With Hadoop
  • VMware Player—Introduction
  • Installing VMware Player
  • Setting up the Virtual Environment
  • Using Oracle VirtualBox to Open a VM
  3. Hadoop Architecture
  • Hadoop Cluster on Commodity Hardware
  • Hadoop core services and components
  • Regular file system vs. Hadoop
  • HDFS layer
  • HDFS operation principle
  4. Hadoop Deployment
  • Introduction to Ubuntu Server
  • Hadoop installation
  • Single node and multi node configuration
  • Hadoop Configuration in cluster environment
  • Installing Hadoop 2.0
  5. MapReduce
  • Introduction to MapReduce
  • Hadoop MapReduce example
  • Hadoop MapReduce Characteristics
  • Setting up your MapReduce Environment
  • Building a MapReduce Program
  • MapReduce Requirements and Features 
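
To give a flavour of the MapReduce module, here is a minimal sketch of the classic word-count example in plain Python. It only imitates the map, shuffle, and reduce phases in a single process and does not use the Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample lines are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as a framework would do
    between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, not course data.
    sample = ["big data needs hadoop", "hadoop runs mapreduce", "big clusters"]
    print(word_count(sample))
```

On a real cluster the mapper and reducer run on different nodes and the framework handles the shuffle; the sketch keeps everything in memory so that the data flow is easy to follow.
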
  6. PIG
  • Introduction to PIG
  • Components of Pig
  • Pig Data Model
  • Pig Vs. SQL
  • Installing Pig Engine
  • Datasets for Pig Development
  • Pig Latin
  • Filtering and Transforming Data
  • Grouping and Sorting
  • Pig Commands
  7. HIVE
  • What is HIVE
  • Characteristics of Hive
  • System Architecture and Components of Hive
  • Hive Data Models
  • Serialization/De-serialization
  • Hive file formats
  • Hive Query Language
  • HIVE: Installing, running, and programming
  • Hive Functions
  • Difference between Hive and PIG
  8. HBase
  • HBase introduction
  • Characteristics of HBase
  • HBase Architecture
  • Storage Model of HBase
  • When to use HBase
  • HBase Data Model
  • HBase Families
  • HBase Components
  • Row Distribution between region servers
  • Data Storage
  • Installation of HBase
  • HBase Shell Commands
  9. Commercial Distribution of Hadoop
  • Cloudera
  • Downloading Cloudera Quickstart VM
  • Starting the Cloudera VM
  • Exploring the Welcome Page
  • Understanding Hue
  • Understanding Cloudera Manager
  • Hortonworks Data Platform 
  10. Brief Introduction to ZooKeeper, Sqoop, and Flume
  • Introduction to ZooKeeper
  • Features of ZooKeeper
  • Introduction to Sqoop (Why, what, processing, under the hood)
  • Importing data into Hive
  • Exporting data from Hadoop using Sqoop
  • Introduction to Flume
  11. Introduction to Spark & Scala (8 Hrs)
  • Introduction
  • Evolution of Distributed Systems
  • Need for New-Generation Distributed Systems
  • Limitations of MapReduce in Hadoop
  • Batch vs. Real-Time Processing
  • Introduction to Programming in Scala
  • Features of RDDs
  • Performing Basic Operations on Files in the Spark Shell Using RDDs
  • Importance of Spark SQL
  • Benefits of Spark SQL
  • DataFrames
  • SQLContext
  • Creating a DataFrame
  • DML Operations: Hive Queries