Spark is going to replace Hadoop! Know Why?
www.edureka.co/apache-spark-scala-training
Agenda
At the end of the session, you will be able to:
• Understand why you should learn Spark
• Know the advantages of Spark and the results of its 2015 survey
• Discover the Spark career path
• Understand how companies are using Spark
Slide 2 www.edureka.co/apache-spark-scala-training
Why Spark?
Rise of Big Data
[Chart: growth of structured and unstructured data, 2005 to 2015]
• By 2020, IDC (International Data Corporation) predicts that the world's total volume of data will have reached 40,000 EB, or 40 zettabytes (ZB).
• The world’s information is doubling every two years. By 2020, there will be 5,200 GB of data for every person on Earth.
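As a quick check on how the two figures above fit together, the per-person number follows from dividing the 2020 total by the world population. The population of roughly 7.7 billion and the use of decimal (SI) units, where 1 ZB = 10^21 bytes and 1 GB = 10^9 bytes, are assumptions made here, not figures from the slide:

$$
\frac{40\ \text{ZB}}{7.7\times10^{9}\ \text{people}}
= \frac{40\times10^{21}\ \text{B}}{7.7\times10^{9}}
\approx 5.19\times10^{12}\ \text{B}
\approx 5{,}200\ \text{GB per person}
$$

Read the same way, "doubling every two years" means the volume grows by a factor of 2^(Δt/2). Over the decade from 2010 to 2020 that is 2^5 = 32, so the implied 2010 volume would be about 1.25 ZB. This back-calculation is only what the doubling claim implies, not a separate IDC figure:

$$
V(t) = V_{2020}\cdot 2^{(t-2020)/2}, \qquad
V(2010) = \frac{40\ \text{ZB}}{2^{5}} = 1.25\ \text{ZB}
$$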
Application of Big Data
Source: Twitter
Application of Big Data
Hadoop is not Enough!
Limitations:
• Hadoop MapReduce is limited to batch processing: real-time processing was a big “No” in Hadoop.
• Hadoop MapReduce is fast, but not fast enough.
Conclusion:
Real-time processing is essential, and it can be achieved using Spark!
Spark Survey and its Advantages
Spark Survey 2015!
Source: Typesafe 2015
Advantages of Spark
• Runs everywhere
• Generality
• Ease of use
• Up to 100x faster than MapReduce (in memory)
Feature Comparison
Source: Databricks
Hadoop MapReduce | Spark
Fast | Up to 100x faster than MapReduce
Batch processing | Batch and real-time processing
Stores data on disk | Stores data in memory
Open source | Open source
Written in Java | Written in Scala
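The "batch and real-time processing" row is easiest to understand through Spark Streaming's micro-batch model, in which an incoming stream is cut into short time intervals and each interval is processed as a small batch job. The pure-Python sketch below illustrates only that idea. It does not use Spark, and the event format, the function names (`micro_batches`, `batch_word_count`, `run_stream`) and the one-second interval are hypothetical choices for illustration. It also assumes that events arrive in timestamp order.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event type for this sketch: (timestamp in seconds, word).
Event = Tuple[float, str]


def micro_batches(events: Iterable[Event], interval: float) -> Iterator[List[str]]:
    """Yield consecutive micro-batches from time-ordered events.

    A batch is emitted as soon as an event from a later interval arrives.
    Intervals that contain no events produce no batch.
    """
    current: List[str] = []
    current_key = None
    for ts, word in events:
        key = int(ts // interval)  # index of the interval [k*interval, (k+1)*interval)
        if current_key is not None and key != current_key:
            yield current
            current = []
        current_key = key
        current.append(word)
    if current:
        yield current


def batch_word_count(words: List[str]) -> Counter:
    """An ordinary batch job: count the words in one finite batch."""
    return Counter(words)


def run_stream(events: Iterable[Event], interval: float) -> Tuple[List[Counter], Counter]:
    """Apply the same batch job to each micro-batch as it is emitted,
    and keep a running total across batches."""
    per_batch: List[Counter] = []
    running: Counter = Counter()
    for batch in micro_batches(events, interval):
        counts = batch_word_count(batch)
        per_batch.append(counts)
        running.update(counts)
    return per_batch, running


if __name__ == "__main__":
    stream = [(0.1, "spark"), (0.4, "hadoop"), (1.2, "spark"),
              (1.9, "spark"), (2.5, "hadoop")]
    batches, totals = run_stream(stream, interval=1.0)
    for i, counts in enumerate(batches):
        print(f"batch {i}: {dict(counts)}")
    print("running totals:", dict(totals))
```

The batch logic in `batch_word_count` never changes. Streaming comes from running it repeatedly on small slices of the input, which is why a single engine can serve both kinds of workload. Real Spark Streaming also distributes each batch across a cluster and handles fault tolerance, and this sketch omits both.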
Spark Features/Modules in Demand
Source: Typesafe 2015
New Features in 2015
DataFrames
• Similar API to data frames in R and Pandas
• Automatically optimised via Spark SQL
• Released in Spark 1.3
SparkR
• Released in Spark 1.4
• Exposes DataFrames, RDDs and the ML library in R
Machine Learning Pipelines
• High-level API
• Featurization
• Evaluation
• Model tuning
External Data Sources
• Platform API to plug data sources into Spark
• Pushes logic, such as filtering, down into the sources
Source: Databricks
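"Pushes logic into sources" means that the engine hands work such as row filters and column selection to the data source, so less data has to be read and transferred into Spark. The pure-Python sketch below illustrates that idea with a toy in-memory source. The `ToySource` class, its `scan` method and the `(column, operator, value)` filter format are hypothetical and are not Spark's actual Data Sources API. The count of values the source returns stands in for I/O or network transfer.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]
# Hypothetical filter representation: (column, operator, value).
Filter = Tuple[str, str, object]

OPS: Dict[str, Callable[[object, object], bool]] = {
    "=": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}


class ToySource:
    """Hypothetical plug-in data source that counts how many values it
    hands back to the engine (a stand-in for I/O or transfer cost)."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.values_returned = 0

    def scan(self, columns: Sequence[str], filters: Sequence[Filter]) -> List[Row]:
        out: List[Row] = []
        for row in self.rows:
            # Apply any filters inside the source itself.
            if all(OPS[op](row[col], val) for col, op, val in filters):
                # Return only the requested columns (column pruning).
                projected = {c: row[c] for c in columns}
                self.values_returned += len(projected)
                out.append(projected)
        return out


def query_without_pushdown(source: ToySource, columns: Sequence[str],
                           filters: Sequence[Filter]) -> List[Row]:
    """Engine reads every column of every row, then filters and projects itself."""
    all_columns = list(source.rows[0].keys()) if source.rows else []
    rows = source.scan(all_columns, [])
    kept = [r for r in rows if all(OPS[op](r[c], v) for c, op, v in filters)]
    return [{c: r[c] for c in columns} for r in kept]


def query_with_pushdown(source: ToySource, columns: Sequence[str],
                        filters: Sequence[Filter]) -> List[Row]:
    """Engine passes the filters and the needed columns down to the source."""
    return source.scan(columns, filters)


if __name__ == "__main__":
    data = [{"name": "a", "age": 25, "city": "x"},
            {"name": "b", "age": 41, "city": "y"},
            {"name": "c", "age": 37, "city": "x"}]
    naive, pushed = ToySource(data), ToySource(data)
    print(query_without_pushdown(naive, ["name"], [("age", ">", 30)]), naive.values_returned)
    print(query_with_pushdown(pushed, ["name"], [("age", ">", 30)]), pushed.values_returned)
```

With the example data, both queries return the same two rows. The pushed-down query moves only 2 values out of the source, while the naive query moves all 9. The savings grow with the width of the table and the selectivity of the filter.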
Spark Career Path
Job Roles & Industry Focus
Source: Typesafe 2015
Job Trends
Major Companies Using Hadoop
Industry Adoption
Source: Typesafe 2015
How Are Companies Using Spark?
General Business Goals
Source: Typesafe 2015
Demo
The Big Question!
Is Spark going to replace Hadoop?
Answer: Yes, in the sense that Spark will run on top of Hadoop and replace MapReduce.
Reasons:
1. Hadoop MapReduce cannot handle real-time processing.
2. Hadoop MapReduce is slower than Apache Spark.
3. With the rise of IoT, Spark is a must.
Questions
Survey
Your feedback is important to us, be it a compliment, a suggestion or a complaint. It helps us make the course better!
Please spare a few minutes to take the survey after the webinar.