This Edureka Big Data Analytics Tutorial will help you understand the basics of the Big Data domain and learn how to analyze Big Data. Below are the topics covered in this tutorial:
1) Big Data Introduction
2) What is Big Data Analytics?
3) Why Big Data Analytics?
4) Stages in Big Data Analytics
5) Big Data Analytics Domains
6) Big Data Analytics Use Cases
Check out our complete Hadoop playlist here: https://goo.gl/4OyoTW
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutorial | Edureka
www.edureka.co/big-data-and-hadoop EDUREKA HADOOP CERTIFICATION TRAINING
What are we going to learn?
1) Big Data Introduction
2) What is Big Data Analytics?
3) Why Big Data Analytics?
4) Stages in Big Data Analytics
5) Big Data Analytics Domains
6) Big Data Analytics Use Cases
Fun Facts about Global Data
6.1 billion global smartphone users by 2020
In 5 years, there will be over 50 billion smart connected devices in the world
We create 2.5 quintillion bytes of data every day (2.5 × 10^18 bytes, or about 2.5 exabytes)
Rapid adoption rate of digital infrastructure: 5x faster than electricity & telephony
IoT: 50 Billion Devices by 2020
[Figure: number of smart objects, reaching 50 billion by 2020, plotted against world population (6.307, 6.721, 6.894, 7.347 and 7.83 billion in 2003, 2008, 2010, 2015 and 2020), with an inflection point marked. Connected things range from tablets, laptops and phones to sensors, smart objects and device clustered systems, reaching "~6 things online" per person.]
Data Generated Every Minute
Facebook: users like 4,166,667 posts
Twitter: users send 347,222 tweets
Reddit: users cast 18,327 votes
Instagram: users like 1,736,111 posts
YouTube: users upload 300 hours of new video
Why Big Data Analytics?
Cost Reduction: cost-effective storage system for huge data sets
Faster and Better Decision Making: provides ways to analyze information quickly and make decisions
Improved Services or Products: evaluation of customer needs & satisfaction
Next Generation Products: automated cars, healthcare, etc.
Big Data Analytics Use Cases
Big Data helped Donald Trump win against Hillary Clinton in the US election:
Personal data was collected from various sources, such as club cards, newspaper subscriptions and social media.
An algorithm was built that generated the top cities in which to reach the highest concentration of persuadable voters.
Messages were targeted based on voter profiles using platforms such as Facebook, Snapchat and Pandora radio.
Big Data Analytics Use Cases
Walmart boosted its sales by leveraging the power of Big Data. While forecasting the demand for emergency supplies for the approaching Hurricane Sandy, it gained some surprising insights:
Along with flashlights and emergency equipment, it found an upsurge in sales of strawberry Pop-Tarts.
Extra supplies of strawberry Pop-Tarts were dispatched to stores in Hurricane Sandy's path in 2012, and they sold extremely well.
Big Data Analytics Use Cases
Apixio uses big data analytics to improve healthcare decisions:
80% of the medical and clinical information about patients is in an unstructured format, such as written physician notes.
The medical data is analyzed using a variety of methodologies and algorithms that are machine-learning based and have NLP capabilities.
The generated patient data model is aggregated across the population to derive larger insights, such as disease prevalence and treatment patterns.
Big Data Analytical Tools
Hadoop provides a scalable solution to store and process huge data sets in a parallel and distributed fashion.
Apache Hive is a data warehousing tool that allows us to perform big data analytics using Hive Query Language (HiveQL), which is very similar to SQL.
Apache Pig is a platform used to analyze large data sets by representing them as data flows.
Apache Spark is an in-memory data processing engine that allows us to efficiently execute streaming, machine learning or SQL workloads that require fast iterative access to datasets.
Apache HBase is a NoSQL database that allows us to store unstructured and semi-structured data with ease and provides real-time read/write access.
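The following sketch makes the "parallel and distributed" idea behind Hadoop more concrete. It shows the split, map and merge pattern that MapReduce applies across a cluster, but runs on a single machine. It is not Hadoop code. Threads from Python's `concurrent.futures` stand in for cluster nodes, and because of the GIL they do not give true CPU parallelism. The function names `map_chunk` and `word_count` and the sample log lines are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List


def map_chunk(lines: List[str]) -> Counter:
    """Map step: count the words in one chunk (one 'node' worth of input)."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def word_count(lines: List[str], n_workers: int = 4) -> Counter:
    """Split the input into chunks, map them concurrently, then merge the results."""
    size = max(1, -(-len(lines) // n_workers))  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    # Threads stand in for cluster nodes here (hypothetical, single machine).
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    total: Counter = Counter()
    for part in partials:  # reduce step: sum the per-chunk counts
        total.update(part)
    return total


logs = ["hotel search paris", "flight search london",
        "hotel booking paris", "hotel search rome"]
wc = word_count(logs)
print(wc.most_common(3))
```

On a real cluster, the chunks would be HDFS blocks on different machines. The merge would happen in reducers after a shuffle, not in a single loop.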
Edureka Big Data Analytics Courses
Roles: Data Engineer, Spark and Hadoop Developer, Hadoop Admin, Data Analyst, Data Scientist
Courses: Apache Spark Certification Training, Big Data Hadoop Certification Training, Linux Administration Certification Training, Data Analytics with R Certification Training, Hadoop Administration Certification Training, Data Science Certification Training, Statistics Essentials for Analytics, Machine Learning with Mahout Certification Training
Hadoop Case Study – Orbitz Worldwide
Challenges:
Every day, users of orbitz.com run 1.5 million flight searches and 1 million hotel searches, generating 500 GB of log data that passes through processing into the warehouse.
The current data infrastructure is not capable of storing and processing the data generated by users every day.
Enhancing the current infrastructure was very expensive and offered limited scalability.
Requirement:
Efficient, long-term storage system that can store any kind of data
Analytical tool for making important business decisions
Cost effective
Solution: Apache Hadoop
Open source framework used to store and process huge data sets
Easily scalable as needed
Comes with various analytical tools
Accomplishment with Hadoop:
Comparison of the performance of the previous methodology with the Hadoop implementation:
Months' worth of data is archived easily.
The earlier process took 109m 14s to extract and process the logs, whereas the MapReduce process took only 25m 58s.
Hadoop allows them to easily derive various metrics for analytics, which was a tedious task earlier.
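You can check the speedup implied by these timings with a few lines of arithmetic. This small Python sketch uses only the two durations quoted on the slide. The helper `to_seconds` is a hypothetical name introduced here.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert an 'XmYs' duration into whole seconds."""
    return minutes * 60 + seconds


# Durations reported on the slide for the Orbitz log-processing job.
before = to_seconds(109, 14)  # earlier process
after = to_seconds(25, 58)    # MapReduce process

speedup = before / after      # how many times faster the new job is
saved = before - after        # seconds saved per run
reduction = saved / before    # fraction of the runtime eliminated

print(f"before={before}s after={after}s")
print(f"speedup = {speedup:.2f}x")
print(f"saved = {saved // 60}m {saved % 60}s ({reduction:.1%} less time)")
```

With these numbers, the MapReduce run is about 4.2 times faster (6554 s versus 1558 s). That saves 83m 16s, roughly 76% of the original runtime.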
Types of Website Logs
1. Impression List:
It contains the ranking of each hotel in the search bar, along with the session ID of the visitor who clicked on it.
Format of Impression List:
(session_id, hotel_id, position, rate)
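These website logs are un-cleaned, so a typical first step is to parse each line into typed fields and drop records that do not fit the format. The sketch below assumes one comma-separated record per line, with `position` as an integer and `rate` as a number. The slide gives only the field order, so this layout, the `Impression` type, the `parse_impressions` function and the sample lines are all hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class Impression(NamedTuple):
    """One Impression List record: (session_id, hotel_id, position, rate)."""
    session_id: str
    hotel_id: str
    position: int
    rate: float


def parse_impressions(lines: Iterable[str]) -> Iterator[Impression]:
    """Yield well-formed records and skip lines that cannot be parsed.

    Assumes one comma-separated record per line (hypothetical layout).
    """
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 4 or not all(fields):
            continue  # wrong field count or an empty field: treat as dirty
        session_id, hotel_id, position, rate = fields
        try:
            pos, price = int(position), float(rate)
        except ValueError:
            continue  # non-numeric position or rate
        yield Impression(session_id, hotel_id, pos, price)


raw = [
    "s1,h42,1,129.99",
    "s1,h17,2,99.00",
    "garbage line",      # no separators
    "s2,h42,,150.00",    # missing position
    "s3,h17,top,80.00",  # non-numeric position
]
records = list(parse_impressions(raw))
print(records)
```

Only the first two sample lines survive. The other three are rejected for a wrong field count, an empty field and a non-numeric position, respectively.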
2. WebTrends Log:
It contains the details of customers who have booked a hotel through the website.
Format of WebTrends Log:
(session_id, visitors_ip, hotel_id, booking_date, number_of_guests, booking_time)
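Here is a minimal sketch of cleaning the WebTrends Log. The code parses each line and rejects malformed records: a wrong field count, an invalid IP address, or an unparseable date, time or guest count. It also drops exact duplicates, then counts bookings per hotel. Several details are assumptions: the comma separator, the `YYYY-MM-DD` date format and the `HH:MM:SS` time format. The names `Booking`, `parse_booking` and `bookings_per_hotel` and the sample lines are also hypothetical. The field order follows the slide.

```python
import ipaddress
from collections import Counter
from datetime import datetime
from typing import Iterable, NamedTuple, Optional


class Booking(NamedTuple):
    """One WebTrends Log record (field order as on the slide)."""
    session_id: str
    visitors_ip: str
    hotel_id: str
    booked_at: datetime  # booking_date and booking_time combined
    number_of_guests: int


def parse_booking(line: str) -> Optional[Booking]:
    """Return a Booking, or None if the line is malformed.

    Assumes comma-separated fields, dates as YYYY-MM-DD and times as
    HH:MM:SS (hypothetical formats; the slide only gives the field order).
    """
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 6:
        return None
    session_id, ip, hotel_id, date_s, guests, time_s = fields
    try:
        ipaddress.ip_address(ip)  # raises ValueError on an invalid address
        booked_at = datetime.strptime(f"{date_s} {time_s}", "%Y-%m-%d %H:%M:%S")
        n_guests = int(guests)
    except ValueError:
        return None
    if not session_id or not hotel_id or n_guests < 1:
        return None
    return Booking(session_id, ip, hotel_id, booked_at, n_guests)


def bookings_per_hotel(lines: Iterable[str]) -> Counter:
    """Count bookings per hotel_id, ignoring malformed and duplicate lines."""
    seen = set()
    counts: Counter = Counter()
    for line in lines:
        booking = parse_booking(line)
        if booking is None or booking in seen:
            continue
        seen.add(booking)
        counts[booking.hotel_id] += 1
    return counts


raw = [
    "s1,10.0.0.5,h42,2016-05-01,2,14:03:11",
    "s1,10.0.0.5,h42,2016-05-01,2,14:03:11",   # exact duplicate
    "s3,999.1.1.1,h17,2016-05-01,1,09:00:00",  # invalid IP address
    "s4,10.0.0.9,h17,2016-05-02,3,18:45:00",
]
counts = bookings_per_hotel(raw)
print(counts)
```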
Problem Statement:
Analyze the un-cleaned website logs, i.e. the WebTrends Log and the Impression List, and find the position of each hotel in the search bar against its frequency of booking.
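One way to frame this problem in MapReduce terms is a reduce-side join. It has three phases:

1. Map: both logs are mapped to the key `(session_id, hotel_id)`.
2. Shuffle: matching records are grouped together.
3. Reduce: each booking is paired with the position at which that hotel was shown.

The sketch below simulates those phases in plain Python on already-parsed sample records. It illustrates the technique under the assumption that each (session, hotel) pair has at most one impression. It is not the pipeline Orbitz actually ran, and all function names and sample values are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterator, List, Tuple

Key = Tuple[str, str]     # (session_id, hotel_id)
Tagged = Tuple[str, int]  # ("IMP", position) or ("BOOK", 1)


def map_impression(rec: Tuple[str, str, int, float]) -> Iterator[Tuple[Key, Tagged]]:
    """Mapper for Impression List records: emit the hotel's search position."""
    session_id, hotel_id, position, _rate = rec
    yield (session_id, hotel_id), ("IMP", position)


def map_booking(rec: Tuple[str, str, str, str, int, str]) -> Iterator[Tuple[Key, Tagged]]:
    """Mapper for WebTrends Log records: emit one booking marker."""
    session_id, _ip, hotel_id, _date, _guests, _time = rec
    yield (session_id, hotel_id), ("BOOK", 1)


def shuffle(pairs: List[Tuple[Key, Tagged]]) -> Dict[Key, List[Tagged]]:
    """Group mapper output by key, as the framework does between map and reduce."""
    groups: Dict[Key, List[Tagged]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(values: List[Tagged]) -> Iterator[Tuple[int, int]]:
    """Reducer: emit (position, number_of_bookings) for a booked (session, hotel)."""
    positions = [v for tag, v in values if tag == "IMP"]
    n_bookings = sum(v for tag, v in values if tag == "BOOK")
    if positions and n_bookings:
        # Assumes at most one impression per (session_id, hotel_id).
        yield positions[0], n_bookings


impressions = [("s1", "h42", 1, 129.99), ("s1", "h17", 2, 99.0),
               ("s2", "h17", 1, 99.0), ("s3", "h42", 3, 120.0)]
bookings = [("s1", "10.0.0.5", "h42", "2016-05-01", 2, "14:03:11"),
            ("s2", "10.0.0.7", "h17", "2016-05-01", 1, "10:15:00"),
            ("s3", "10.0.0.9", "h42", "2016-05-02", 3, "18:45:00")]

mapped = [kv for rec in impressions for kv in map_impression(rec)]
mapped += [kv for rec in bookings for kv in map_booking(rec)]

bookings_by_position: Counter = Counter()
for values in shuffle(mapped).values():
    for position, n in reduce_join(values):
        bookings_by_position[position] += n

print(sorted(bookings_by_position.items()))
```

Tagging each value with its source ("IMP" or "BOOK") is what lets a single reducer tell the two logs apart after the shuffle.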
Solution Flow:
Large amounts of unstructured log data are generated every day.
Hadoop stores the data, since it can store any type of data.
Hadoop processes the data in parallel, which is faster, and outputs structured data.
A query written in Hive Query Language analyzes each hotel's position in the search bar using the log data.
The result is an analytical report.
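The Hive step in this flow can be approximated with SQL. In the sketch below, Python's built-in `sqlite3` module stands in for Hive. The two tables follow the field lists given earlier for the Impression List and the WebTrends Log, and the rows are made-up sample data. The `SELECT` uses only standard join and aggregation syntax of the kind HiveQL mirrors. However, it has not been checked against an actual Hive installation, so treat it as an approximation of the Hive query rather than the query itself.

```python
import sqlite3

# In-memory SQLite stands in for Hive; the table layouts follow the field
# lists on the slides, and the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE impressions (session_id TEXT, hotel_id TEXT, position INTEGER, rate REAL);
CREATE TABLE webtrends (session_id TEXT, visitors_ip TEXT, hotel_id TEXT,
                        booking_date TEXT, number_of_guests INTEGER, booking_time TEXT);
""")
conn.executemany("INSERT INTO impressions VALUES (?, ?, ?, ?)", [
    ("s1", "h42", 1, 129.99), ("s1", "h17", 2, 99.0),
    ("s2", "h17", 1, 99.0), ("s3", "h42", 3, 120.0),
])
conn.executemany("INSERT INTO webtrends VALUES (?, ?, ?, ?, ?, ?)", [
    ("s1", "10.0.0.5", "h42", "2016-05-01", 2, "14:03:11"),
    ("s2", "10.0.0.7", "h17", "2016-05-01", 1, "10:15:00"),
    ("s3", "10.0.0.9", "h42", "2016-05-02", 3, "18:45:00"),
])

# Join each booking to the position at which that hotel was shown in the
# same session, then count bookings per search position.
query = """
SELECT i.position, COUNT(*) AS bookings
FROM webtrends w
JOIN impressions i
  ON w.session_id = i.session_id AND w.hotel_id = i.hotel_id
GROUP BY i.position
ORDER BY i.position
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

Each result row pairs a search position with the number of bookings made from it, which is the raw material for the analytical report.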