A few questions about
large-scale Machine Learning
March 9th, 2016
Theodore Vasiloudis, SICS/KTH
What can we do with machine
learning these days?
What can we do with machine learning these days?
● We can paint pictures!
○ Optimization problems
○ i.e. we can approximate unknown functions
Source: Nikulin (2016)
NeuralDoodle: Turning Two-Bit Doodles into Fine Artwork. Source: Champandard (2016)
● We can beat professionals at Go!
○ Probabilistic problems
○ i.e. we can also approximate unknown distributions*
AlphaGo beats Lee Se-dol in first of five matches
*definition abuse warning, see Silver et al. (2016) for details
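A gloss on “we can approximate unknown functions” (my notation, not from the slides): supervised learning picks, from a model class F, the function that minimizes the average loss over the training examples:

```latex
\hat{f} \;=\; \operatorname*{arg\,min}_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(x_i),\, y_i\big)
```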
What can we not do with machine
learning these days?
What can we not do with machine learning these days?
r/MachineLearning
What is large-scale machine
learning?
What do we mean?
● Small-scale learning
○ We have a small-scale learning problem when the active budget constraint is the number of examples.
● Large-scale learning
○ We have a large-scale learning problem when the active budget constraint is the computing time.
Source: Léon Bottou
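This distinction follows Bottou and Bousquet's tradeoff analysis (summarized here from the Bottou reference at the end; it is not spelled out on the slide). Test error splits into three terms:

```latex
\mathcal{E} \;=\; \mathcal{E}_{\mathrm{app}} \;+\; \mathcal{E}_{\mathrm{est}} \;+\; \mathcal{E}_{\mathrm{opt}}
```

approximation error (how well the model class can fit at all), estimation error (from having finitely many examples), and optimization error (from stopping the solver early). When examples are the binding constraint it pays to optimize exactly; when time is the binding constraint, a cruder optimizer such as SGD that touches more examples per second usually wins.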
Do we need a cluster to do large-scale machine learning?
Do I need a cluster?
Probably not.
Do I need a cluster?
● But I want one!
○ Nobody ever got fired for buying a cluster (Appuswamy 2013)
BID Data Toolkit
● Canny et al.: “Big Data Analytics with Small Footprint: Squaring the Cloud”, KDD 2013
● Canny et al.: “BIDMach: Large-scale Learning with Zero Memory Allocation”, NIPS 2013 BigLearn workshop
Matrix factorization on the complete Netflix dataset

System    Nodes/cores  Dim  Error  Time (s)  Cost    Energy (kJ)
GraphLab  18/576       100  -      376       $3.50   10,000
Spark     32/128       100  0.82   146       $0.40   1,000
BIDMach   1            100  0.83   90        $0.015  20
Spark     32/128       200  0.82   544       $1.45   3,500
BIDMach   1            200  0.83   129       $0.02   30
BIDMach   1            500  0.83   600       $0.10   150
K-means on MNIST-8M

System        Nodes/cores  nclust  Error     Time (s)  Cost   Energy (kJ)
Spark         32/128       256     1.5e13    180       $0.45  1,150
BIDMach       1            256     1.44e13   320       $0.06  90
Scikit-learn  1/8          256     -         3200x4 *  $1.00  10
Spark         96/384       4096    1.05e13   1100      $9.00  22,000
BIDMach       1            4096    0.995e13  735       $0.12  140
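For comparison, a minimal single-node baseline in the spirit of the table, using scikit-learn's mini-batch K-means. MNIST-8M is not bundled with any library, so random data stands in for the 784-pixel images; the "Error" column above is presumably the K-means objective, which corresponds to `inertia_` here.

```python
# Single-node K-means sketch; random data stands in for MNIST-8M.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

X = np.random.rand(100_000, 784).astype(np.float32)

km = MiniBatchKMeans(n_clusters=256, batch_size=10_000,
                     max_iter=10, random_state=0)
km.fit(X)

# inertia_ is the K-means objective: summed squared distance from each
# point to its assigned centroid (comparable in spirit to "Error" above).
print(km.inertia_)
```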
BID Data Toolkit: Roofline design
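For context (my addition): the roofline model (Williams et al., 2009) bounds the attainable throughput of a kernel by either the machine's peak compute rate or its memory bandwidth times the kernel's arithmetic intensity:

```latex
P_{\mathrm{attainable}} \;=\; \min\!\big( P_{\mathrm{peak}},\; B_{\mathrm{mem}} \times I \big)
```

The BID Data papers argue, roughly, that well-tuned single-node GPU kernels sit near this roofline, so adding cluster nodes buys little until one machine is truly exhausted.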
Do I need a cluster?
● But I have one!
○ Install Hadoop!
○ Nobody ever got fired for using Hadoop on a cluster (Rowstron 2012)
Petuum
● Xing et al.: “Petuum: A New Platform for Distributed Machine Learning on Big Data”, KDD 2015
Source: Xing (2015)
Left: Petuum vs. YahooLDA. Right: Petuum vs. Graphlab and Spark
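Much of Petuum's speedup comes from its bounded-staleness (SSP) parameter server: workers proceed asynchronously but may never run more than a fixed number of clock ticks ahead of the slowest worker. A toy sketch of that gate (illustrative names, not Petuum's actual API):

```python
# Toy stale-synchronous-parallel (SSP) gate: each worker calls tick()
# once per iteration and blocks only if it gets too far ahead.
import threading

class StalenessGate:
    def __init__(self, n_workers, staleness):
        self.clocks = [0] * n_workers
        self.staleness = staleness
        self.cond = threading.Condition()

    def tick(self, worker_id):
        with self.cond:
            self.clocks[worker_id] += 1
            self.cond.notify_all()
            # Block while this worker is more than `staleness` clocks
            # ahead of the slowest worker.
            while self.clocks[worker_id] > min(self.clocks) + self.staleness:
                self.cond.wait()
```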
Do I need a cluster?
● But I want fault tolerance!
○ You’ll have to use an existing data processing system
Distributed ML Systems
● General purpose data processing systems
○ Apache Spark
○ Apache Flink
● Both provide large-scale ML libraries (MLlib & FlinkML)
● Quite different in terms of maturity
● Biased opinion: Invest early in Flink
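To make the "large-scale ML library" point concrete, a minimal sketch of training with Spark's RDD-based MLlib (the current API at the time of this talk; the data path is a hypothetical placeholder):

```python
# Minimal Spark MLlib sketch: load LIBSVM data from a distributed
# filesystem and fit a logistic regression model across the cluster.
from pyspark import SparkContext
from pyspark.mllib.classification import LogisticRegressionWithLBFGS
from pyspark.mllib.util import MLUtils

sc = SparkContext(appName="mllib-sketch")

# Hypothetical path; any LIBSVM-format file works.
data = MLUtils.loadLibSVMFile(sc, "hdfs:///data/train.libsvm")

model = LogisticRegressionWithLBFGS.train(data, iterations=100)
print(model.weights)
```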
What are the problems with ML
systems?
What are the problems with ML systems?
● “Hidden Technical Debt in Machine Learning Systems”, Sculley et al. (NIPS 2015)
What are the problems with ML systems?
● Boundary erosion
○ Entanglement
■ CACE: Changing Anything Changes Everything
○ Undeclared consumers
● Data dependencies
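A toy illustration of entanglement/CACE (mine, not from the paper): with correlated features and any regularized learner, changing one input changes the weights learned for all inputs.

```python
# Toy CACE demo: rescaling ONE feature shifts ALL learned weights.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
x0 = rng.normal(size=1000)
x1 = x0 + 0.3 * rng.normal(size=1000)     # x1 strongly correlated with x0
X = np.column_stack([x0, x1])
y = x0 + 2.0 * x1 + 0.1 * rng.normal(size=1000)

print(Ridge(alpha=10.0).fit(X, y).coef_)  # weights before the change

X[:, 0] *= 10.0                           # "change anything": rescale x0 only
print(Ridge(alpha=10.0).fit(X, y).coef_)  # ...and every weight moves
```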
What are the problems with ML systems?
Data → ML Code → Model
A few useful questions
A few useful questions
● How easily can an entirely new algorithmic approach be tested at full scale?
● What is the transitive closure of all data dependencies?
● How precisely can the impact of a new change to the system be measured?
● Does improving one model or signal degrade others?
● How quickly can new members of the team be brought up to speed?
Where are we taking large-scale
machine learning?
Where are we taking large-scale machine learning?
● Communication-efficient algorithms and systems
● “Computation-efficient” algorithms and systems
Active research area: how to get here?
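On the communication-efficiency bullet above: one widely studied pattern is local SGD / model averaging, where workers take many cheap local steps and only occasionally pay for an all-reduce. A minimal illustrative sketch on a synthetic regression task (all names and parameters are mine):

```python
# Illustrative local SGD: communication is paid once per round,
# not once per gradient step.
import numpy as np

rng = np.random.default_rng(0)
dim, n_workers, rounds, local_steps, lr = 10, 4, 20, 50, 0.05
w_true = rng.normal(size=dim)            # ground-truth model to recover

workers = [np.zeros(dim) for _ in range(n_workers)]
for _ in range(rounds):
    for k in range(n_workers):
        for _ in range(local_steps):     # cheap: no communication here
            x = rng.normal(size=dim)
            grad = (workers[k] @ x - w_true @ x) * x
            workers[k] = workers[k] - lr * grad
    avg = np.mean(workers, axis=0)       # expensive: one all-reduce per round
    workers = [avg.copy() for _ in range(n_workers)]

print(np.linalg.norm(avg - w_true))      # should be close to zero
```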
Where are we taking large-scale machine learning?
● Unsupervised learning
● Higher-order representations
● Reasoning about uncertainty
Thank You.
@thvasilo
tvas@sics.se
References
Parts of the structure and content of this presentation come from the KDD tutorial “A new look at the system, algorithm and theory foundations of Distributed ML” and are used with permission from Eric Xing.
● Léon Bottou: Learning with Large Datasets
● Silver et al. (2016): Mastering the game of Go with deep neural networks and tree search
● Nikulin (2016): Exploring the Neural Algorithm of Artistic Style
● Champandard (2016): Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artwork
● BID Data Toolkit: BID Data Project Website
● CMU Petuum: Petuum Project
● Apache Spark: spark.apache.org
● Apache Flink: flink.apache.org
Other references found in the text.
And finally:
