Spark is going to replace Apache Hadoop! Know Why?

www.edureka.co/apache-spark-scala-training

Agenda
At the end of the session, you will be able to:
• Understand why you should learn Spark
• Know the advantages of Spark and the results of its 2015 survey
• Discover the Spark career path
• Understand how companies are using Spark

Why Spark?

Rise of Big Data
[Figure: growth of structured and unstructured data, 2005 to 2015]
• By 2020, IDC (International Data Corporation) predicts that the total volume of data will reach 40,000 EB, or 40 zettabytes (ZB).
• The world’s information is doubling every two years. By 2020, there will be 5,200 GB of data for every person on Earth.
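
As a rough consistency check on the two figures above, the short Python sketch below divides the projected 40 ZB by 5,200 GB per person. It assumes decimal units (1 GB = 10^9 bytes, 1 ZB = 10^21 bytes), which the slide does not specify, and the variable names are purely illustrative.

```python
# Decimal (SI) units are an assumption; the slide does not say which it uses.
GB = 10**9    # bytes in a gigabyte
EB = 10**18   # bytes in an exabyte
ZB = 10**21   # bytes in a zettabyte

total_bytes = 40 * ZB          # projected total data volume for 2020
per_person_bytes = 5_200 * GB  # projected data per person for 2020

# 40,000 EB and 40 ZB describe the same quantity.
same_quantity = (40_000 * EB == total_bytes)

# Number of people implied by the two figures taken together.
implied_people = total_bytes / per_person_bytes

print(same_quantity)                                  # True
print(f"{implied_people / 1e9:.2f} billion people")   # 7.69 billion people
```

Under these assumptions, the two figures agree with each other if the world population in 2020 is roughly 7.7 billion.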

Application of Big Data
Source: Twitter


Hadoop is not Enough!
Limitations:
• Real-time processing: Hadoop MapReduce is limited to batch processing, so real-time processing is not possible with it.
• Not fast enough: Hadoop MapReduce is fast, but not fast enough.
Conclusion:
Fast, real-time processing is essential, and it can be achieved using Spark!
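
To make the batch limitation concrete, here is a minimal pure-Python sketch. It uses neither the Hadoop nor the Spark API; it only imitates the two processing styles. `batch_word_count` follows the MapReduce pattern, where the complete input must be available before map, shuffle and reduce produce any result. `incremental_word_count` follows the streaming style, where the totals are updated as each record arrives. Both function names and the sample data are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List


def batch_word_count(lines: List[str]) -> Dict[str, int]:
    """MapReduce-style count: needs the complete input before it returns."""
    # Map phase: emit a (word, 1) pair for every word.
    pairs = [(word, 1) for line in lines for word in line.split()]
    # Shuffle phase: group the emitted values by key.
    grouped = defaultdict(list)
    for word, one in pairs:
        grouped[word].append(one)
    # Reduce phase: sum the values for each key.
    return {word: sum(ones) for word, ones in grouped.items()}


def incremental_word_count(stream: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Streaming-style count: yields updated totals after every record."""
    totals = Counter()
    for line in stream:
        totals.update(line.split())
        yield dict(totals)  # a result is available right away


# Hypothetical sample data.
lines = ["spark is fast", "hadoop is batch", "spark is general"]

batch_result = batch_word_count(lines)                # one answer, at the end
snapshots = list(incremental_word_count(lines))       # one answer per record
```

Both approaches end with the same totals. The difference is that the incremental version has a usable answer after the first record, while the batch version has nothing to show until the whole job finishes.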

Spark Survey and its Advantages

Spark Survey 2015!
Source: Typesafe 2015

Advantages of Spark
• Runs everywhere
• Generality
• Ease of use
• Up to 100x faster than MapReduce (MR)

Feature Comparison
Source: Databricks

Hadoop MapReduce        Spark
Fast                    Up to 100x faster than MapReduce
Batch processing        Batch and real-time processing
Stores data on disk     Stores data in memory
Open source             Open source
Written in Java         Written in Scala

Spark Features/Modules in Demand

New Features in 2015
DataFrames
• Similar API to data frames in R and Pandas
• Automatically optimised via Spark SQL
• Released in Spark 1.3
SparkR 
• Released in Spark 1.4
• Exposes DataFrames, RDDs and the ML library in R
Machine Learning Pipelines 
• High Level API
• Featurization
• Evaluation
• Model Tuning
External Data Sources 
• Platform API to plug data sources into Spark
• Pushes logic into sources
Source: Databricks

Spark Career Path

Job Roles & Industry Focus

Job Trends

Major Companies Using Hadoop

Industry Adoption

How Companies are using Spark?

General Business Goals

Demo

The Big Question!
Is Spark going to replace Hadoop?

Answer: Yes. Spark will be used on top of Hadoop and will replace MapReduce.
Reasons:
1. Hadoop MapReduce cannot handle real-time processing
2. Hadoop MapReduce is slower than Spark
3. With the rise of IoT, Spark is a must

Questions

Survey
Your feedback is important to us, be it a compliment, a suggestion or a complaint. It helps us make the course better!
Please spare a few minutes to take the survey after the webinar.
