EDUREKA HADOOP CERTIFICATION TRAINING (www.edureka.co/big-data-and-hadoop)
What is Hadoop?
What are we going to learn?
1. Big Data Introduction
2. Why Big Data Analytics?
3. What is Big Data Analytics?
4. Stages in Big Data Analytics
5. Big Data Analytic Domains
6. Big Data Analytics Use Cases
Exploding Global Data
Fun Facts about Global Data
6.1 billion global smartphone users by 2020
In 5 years there will be over 50 billion smart connected devices in the world
We create 2.5 quintillion bytes (2.5 exabytes) of data every day
Fun Facts About Big Data
If the total global data were stored on discs and the discs piled into a stack, the stack would be taller than the Eiffel Tower (300 m)
What is Big Data?
5 V’s Definition of Big Data
Volume
Variety
Velocity
Veracity
Value
Big Data Growth Drivers
Rapid adoption rate of digital infrastructure: 5x faster than electricity & telephony
World population (billions): 6.307 (2003), 6.721 (2008), 6.894 (2010), 7.347 (2015), 7.83 (2020)
Smart objects: 50 billion, with an inflection point marked on the chart
Tablets, laptops, phones: "~6 things online" per person
Sensors, smart objects, device clustered systems
IoT: 50 billion devices by 2020
Data Generated Every Minute
FACEBOOK: users like 4,166,667 posts
TWITTER: users send 347,222 tweets
REDDIT: users cast 18,327 votes
INSTAGRAM: users like 1,736,111 posts
YOUTUBE: users upload 300 hours of new video
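To put the per-minute figures in perspective, they can be scaled to a full day. The snippet below is a minimal sketch that assumes each rate stays constant around the clock, which the slide does not claim; the dictionary labels are ours.

```python
# Per-minute figures copied from the slide; the labels are our own wording.
PER_MINUTE = {
    "Facebook post likes": 4_166_667,
    "Tweets sent": 347_222,
    "Reddit votes cast": 18_327,
    "Instagram post likes": 1_736_111,
    "YouTube hours uploaded": 300,
}

MINUTES_PER_DAY = 24 * 60  # 1,440 minutes


def per_day(per_minute: int) -> int:
    """Scale a per-minute rate to a per-day total (assumes a constant rate)."""
    return per_minute * MINUTES_PER_DAY


daily_totals = {name: per_day(rate) for name, rate in PER_MINUTE.items()}

for name, total in daily_totals.items():
    print(f"{name}: {total:,} per day")
```

Under that assumption the Facebook figure works out to roughly 6 billion likes per day and the Twitter figure to roughly 500 million tweets per day.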
Why Big Data Analytics?
Cost Reduction: cost-effective storage system for huge data sets
Faster and Better Decision Making: provides ways to analyze information quickly and make decisions
Improved Services or Products: evaluation of customer needs & satisfaction
Next Generation Products: automated cars, healthcare, etc.
What is Big Data Analytics?
“Big data analytics examines large and different types of data to uncover hidden patterns, correlations and other insights”
Stages in Big Data Analytics
1. Identifying Problem
2. Designing Data Requirement
3. Pre-processing Data
4. Performing Analytics Over Data
5. Visualizing Data
Big Data Domains
Web & E-Tailing
Telecommunication
Government
Healthcare
Finance & Banking
Retail
Big Data Analytics Use Cases
Big Data helped Donald Trump win against Hillary Clinton in the US election
• Messages were targeted based on voter profiles using platforms such as Facebook, Snapchat, Pandora radio, etc.
• An algorithm was built that generated the top cities to reach the highest concentration of persuadable voters
• Personal data was collected from various sources such as club cards, newspaper subscriptions, social media, etc.
Walmart boosted its sales by leveraging the power of Big Data
While forecasting the demand for emergency supplies ahead of the approaching Hurricane Sandy, Walmart gained some surprising insights:
• Along with flashlights and emergency equipment, they found an upsurge in sales of strawberry Pop-Tarts
• Extra supplies of strawberry Pop-Tarts were dispatched to stores in Hurricane Sandy's path in 2012, and sold extremely well
Apixio uses big data analytics to improve healthcare decisions
• 80% of medical and clinical information about patients is in unstructured format, such as written physician notes
• Medical data is analyzed using a variety of machine-learning-based methodologies and algorithms with NLP capabilities
• The generated patient data model is aggregated across the population to derive larger insights such as disease prevalence, treatment patterns, etc.
Big Data Analytical Tools
Hadoop provides a scalable solution to store and process huge data sets in a parallel and distributed fashion.
Apache Hive is a data warehousing tool that allows us to perform big data analytics using Hive Query Language, which is very similar to SQL.
Apache Pig is a platform used to analyze large data sets by representing them as data flows.
Apache Spark is an in-memory data processing engine that allows us to efficiently execute streaming, machine learning or SQL workloads that require fast iterative access to datasets.
Apache HBase is a NoSQL database that allows us to store unstructured and semi-structured data with ease and provides real-time read/write access.
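To make "parallel and distributed" concrete, here is a minimal single-machine sketch of the map, shuffle and reduce phases behind the MapReduce model that the case study later in this deck relies on. It is plain Python, not Hadoop API code: the thread pool only stands in for cluster nodes, and the sample blocks and function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input: each string stands in for one block of a large file.
BLOCKS = [
    "hotel search hotel booking",
    "flight search hotel search",
    "flight booking",
]


def map_block(block: str) -> list[tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in one block."""
    return [(word, 1) for word in block.split()]


def shuffle(mapped: list[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Shuffle phase: group all emitted values by key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Reduce phase: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Threads stand in for worker nodes; each one maps a block independently.
with ThreadPoolExecutor(max_workers=3) as pool:
    mapped_blocks = list(pool.map(map_block, BLOCKS))

word_counts = reduce_counts(shuffle(mapped_blocks))
print(word_counts)
```

On a real cluster the blocks would live on different machines and the framework would handle the shuffle over the network; the three phases keep the same shape.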
Big Data Analytics Courses at Edureka
Data Engineer, Spark and Hadoop Developer
• Apache Spark Certification Training
• Big Data Hadoop Certification Training
• Linux Administration Certification Training
• Data Analytics with R Certification Training
Hadoop Admin, Data Analyst
• Data Analytics with R Certification Training
• Big Data Hadoop Certification Training
• Big Data Hadoop Certification Training
• Hadoop Administration Certification Training
Data Scientist
• Data Science Certification Training
• Big Data Hadoop Certification Training
• Statistics Essentials for Analytics
• Data Analytics with R Certification Training
• Machine Learning with Mahout Certification Training
Hadoop Case Study – Orbitz Worldwide
Challenges:
• The current data infrastructure was not capable of storing and processing the data generated by users every day
• Enhancing the current infrastructure was very expensive and offered limited scalability
Data flow: users -> orbitz.com -> processing -> warehouse
• 500 GB of log data per day
• 1.5 million flight and 1 million hotel searches every day
Requirement:
• Efficient, long-term storage system that can store any kind of data
• Analytical tool for making important business decisions
• Cost effective
Solution: Apache Hadoop
• Open source framework used to store and process huge data sets
• Easily scalable as needed
• Comes with various analytical tools
Accomplishment with Hadoop:
• Performance of the previous methodology was compared with the Hadoop implementation
• Months' worth of data is archived easily
• The earlier process took 109m 14s to extract and process logs, whereas the MapReduce process took only 25m 58s
• Various metrics for analytics, which were tedious to derive earlier, can now be derived easily
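The two run times quoted for log extraction and processing (109m 14s earlier, 25m 58s with MapReduce) translate into a speedup factor as follows. This is plain arithmetic on the slide's figures; the function and variable names are ours.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'Xm Ys' duration to seconds."""
    return minutes * 60 + seconds


earlier = to_seconds(109, 14)    # previous process: 6,554 s
mapreduce = to_seconds(25, 58)   # MapReduce process: 1,558 s

speedup = earlier / mapreduce
time_saved = 1 - mapreduce / earlier

print(f"Speedup: {speedup:.2f}x")
print(f"Run time reduced by {time_saved:.1%}")
```

That is a speedup of about 4.2x, or roughly three quarters of the original run time saved.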
Analysis of Website Log at Orbitz
Types of Website Logs
1. Impression List:
• It contains the ranking of each hotel in the search bar along with the session id of the visitor who clicked on it.
• Format of Impression List:
(session_id, hotel_id, position, rate)
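A record in the Impression List format (session_id, hotel_id, position, rate) could be parsed along the following lines. This is only a sketch: the deck gives the field names but not the encoding, so the comma-separated layout, the field types and the sample lines are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Impression:
    session_id: str
    hotel_id: str
    position: int   # assumed: rank of the hotel in the search listing
    rate: float     # assumed: displayed room rate


def parse_impression(line: str) -> Optional[Impression]:
    """Return an Impression, or None when the line is malformed."""
    fields = [field.strip() for field in line.strip().strip("()").split(",")]
    if len(fields) != 4:
        return None
    session_id, hotel_id, position, rate = fields
    try:
        return Impression(session_id, hotel_id, int(position), float(rate))
    except ValueError:
        return None


# Hypothetical sample lines, including two that should be rejected.
sample_lines = [
    "(s001, h42, 1, 129.99)",
    "(s002, h17, 3, 89.50)",
    "(s003, h42, three, 129.99)",  # non-numeric position
    "garbage line",                # wrong number of fields
]

impressions = [imp for imp in map(parse_impression, sample_lines) if imp is not None]
print(impressions)
```

Returning None for bad lines, instead of raising, lets a caller skip uncleaned records without stopping the whole run.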
2. WebTrends Log:
• It contains the details of customers who have booked a hotel through the website.
• Format of WebTrends Log:
(session_id, visitors_ip, hotel_id, booking_date, number_of_guests, booking_time)
Problem Statement:
Analyze the uncleaned website logs, i.e. the WebTrends Log and the Impression List, and find the position of each hotel in the search bar against its frequency of booking
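One way to read the problem statement is as a join between the two logs followed by a count. The sketch below assumes the logs are already parsed into tuples, that (session_id, hotel_id) is the join key, and that "frequency of booking" means the number of bookings per hotel and position; none of this is spelled out in the deck, and the sample records are invented.

```python
from collections import Counter

# Impression List records: (session_id, hotel_id, position, rate)
impressions = [
    ("s1", "h1", 1, 120.0),
    ("s1", "h2", 2, 95.0),
    ("s2", "h2", 1, 95.0),
    ("s2", "h3", 2, 80.0),
    ("s3", "h1", 3, 120.0),
    ("s4", "h1", 1, 120.0),
]

# WebTrends Log records:
# (session_id, visitors_ip, hotel_id, booking_date, number_of_guests, booking_time)
bookings = [
    ("s1", "10.0.0.1", "h1", "2012-01-05", 2, "10:15"),
    ("s2", "10.0.0.2", "h2", "2012-01-05", 1, "11:40"),
    ("s3", "10.0.0.3", "h1", "2012-01-06", 2, "09:05"),
    ("s4", "10.0.0.4", "h1", "2012-01-06", 3, "14:30"),
    ("s9", "10.0.0.9", "h7", "2012-01-06", 1, "12:00"),  # no matching impression
]


def bookings_by_hotel_position(impressions, bookings):
    """Count bookings for each (hotel_id, position) pair."""
    # Index the displayed position by the assumed join key.
    position_of = {(sid, hid): pos for sid, hid, pos, _rate in impressions}
    counts = Counter()
    for session_id, _ip, hotel_id, *_rest in bookings:
        position = position_of.get((session_id, hotel_id))
        if position is not None:  # skip bookings with no impression record
            counts[(hotel_id, position)] += 1
    return counts


result = bookings_by_hotel_position(impressions, bookings)
for (hotel_id, position), n in sorted(result.items()):
    print(f"hotel {hotel_id} at position {position}: {n} booking(s)")
```

In the deck's architecture the same join and count would be expressed as a Hive query over the cleaned tables; the Python version only illustrates the logic.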
Hadoop Deployment:
Local log data source -> load data -> Hadoop Cluster (MapReduce for processing uncleaned data) -> Apache Hive -> query result -> Analyst
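The "MapReduce for processing uncleaned data" step might resemble the following mapper, written in the style of a Hadoop Streaming script that reads raw lines and emits tab-separated records. The deck does not show the actual job, so the cleaning rules (exactly six fields, non-empty session and hotel ids, a positive integer guest count), the function name and the sample lines are assumptions.

```python
from typing import Iterable, Iterator

# session_id, visitors_ip, hotel_id, booking_date, number_of_guests, booking_time
EXPECTED_FIELDS = 6


def clean_webtrends(lines: Iterable[str]) -> Iterator[str]:
    """Yield tab-separated records for well-formed lines and drop the rest."""
    for line in lines:
        fields = [field.strip() for field in line.strip().strip("()").split(",")]
        if len(fields) != EXPECTED_FIELDS:
            continue  # wrong number of fields
        session_id, _ip, hotel_id, _date, guests, _time = fields
        if not session_id or not hotel_id:
            continue  # missing identifiers
        if not guests.isdigit() or int(guests) == 0:
            continue  # guest count must be a positive integer
        yield "\t".join(fields)


# Hypothetical raw log lines, three of which should be rejected.
raw = [
    "(s1, 10.0.0.1, h1, 2012-01-05, 2, 10:15)",
    "(s2, 10.0.0.2, , 2012-01-05, 1, 11:40)",      # missing hotel_id
    "(s3, 10.0.0.3, h1, 2012-01-06, two, 09:05)",  # non-numeric guest count
    "corrupted###",
]

cleaned = list(clean_webtrends(raw))
print(cleaned)

# Run as a streaming mapper, the same function would consume standard input:
#     import sys
#     for record in clean_webtrends(sys.stdin):
#         print(record)
```

Tab-separated output is a convenient hand-off format because a Hive table can be declared over delimited text files.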
Large amount of unstructured log data generated every day
-> Can store any type of data
-> Can process data faster in parallel
-> Output: structured data
-> Hive Query Language: write a query to analyze hotel position in the search bar using log data
-> Analytical report
Summary
What is Big Data?
What is Big Data Analytics?
Why Big Data Analytics?
Stages in Big Data Analytics
Big Data Analytics Use Cases
Big Data Domains
Thank You…
Questions/Queries/Feedback


Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutorial | Edureka