Big Data & Hadoop
Vidushi Gyaanpeeth Pune provides real-time, online, placement-oriented training. Our extensive 32-hour Big Data/Hadoop virtual training covers both basic and advanced topics to help you become an expert Hadoop professional.
COURSE OVERVIEW
Big data refers to very large volumes of data, and the concept has been around for about two decades. In practice, it is the large amount of data a company owns, collected and processed with newer techniques so that it can be turned into a valuable, usable form as effectively as possible.
Course Contents (32 Hrs)
1. Introduction to Big Data and Hadoop
- What is Big Data?
- Types of Data
- Need for Big Data
- Characteristics of Big Data
- Traditional IT Analytics Approach
- Big Data—Use Cases
- Handling Limitations of Big Data
- Introduction to Hadoop
- History and Milestones of Hadoop
2. Getting Started With Hadoop
- VMware Player—Introduction
- Installing VMware Player
- Setting up the Virtual Environment
- Oracle VirtualBox to Open a VM
3. Hadoop Architecture
- Hadoop Cluster on Commodity Hardware
- Hadoop core services and components
- Regular file system vs. Hadoop
- HDFS layer
- HDFS operation principle
4. Hadoop Deployment
- Introduction to Ubuntu Server
- Hadoop installation
- Single node and multi node configuration
- Hadoop Configuration in cluster environment
- Installing Hadoop 2.0
5. MapReduce
- Introduction to MapReduce
- Hadoop MapReduce example
- Hadoop MapReduce Characteristics
- Setting up your MapReduce Environment
- Building a MapReduce Program
- MapReduce Requirements and Features
6. Pig
- Introduction to Pig
- Components of Pig
- Pig Data Model
- Pig vs. SQL
- Installing Pig Engine
- Datasets for Pig Development
- Pig Latin
- Filtering and Transforming Data
- Grouping and Sorting
- Pig Commands
7. Hive
- What is Hive?
- Characteristics of Hive
- System Architecture and Components of Hive
- Hive Data Models
- Serialization/De-serialization
- Hive file formats
- Hive Query Language
- Hive: Installing, Running, and Programming
- Hive Functions
- Difference between Hive and Pig
8. HBase
- HBase introduction
- Characteristics of HBase
- HBase Architecture
- Storage Model of HBase
- When to use HBase
- HBase Data Model
- HBase Families
- HBase Components
- Row Distribution between region servers
- Data Storage
- Installation of HBase
- HBase Shell Commands
9. Commercial Distribution of Hadoop
- Cloudera
- Downloading Cloudera Quickstart VM
- Starting the Cloudera VM
- Exploring the Welcome Page
- Understanding Hue
- Understanding Cloudera Manager
- Hortonworks Data Platform
10. Brief Introduction to ZooKeeper, Sqoop, and Flume
- Introduction to ZooKeeper
- Features of ZooKeeper
- Introduction to Sqoop (Why, what, processing, under the hood)
- Importing data into Hive
- Exporting data from Hadoop using Sqoop
- Introduction to Flume
11. Introduction to Spark & Scala (8 Hrs)
- Introduction
- Evolution of Distributed Systems
- Need for New-Generation Distributed Systems
- Limitations of MapReduce in Hadoop
- Batch vs. Real-Time Processing
- Introduction to Programming in Scala
- Features of RDDs
- Performing Basic Operations on Files in the Spark Shell Using RDDs
- Importance of Spark SQL
- Benefits of Spark SQL
- DataFrames
- SQLContext
- Creating a DataFrame
- DML Operations: Hive Queries