A few questions about large-scale machine learning
March 9th, 2016
Theodore Vasiloudis, SICS/KTH
- 10. What can we do with machine learning these days?
● We can paint pictures!
○ Optimization problems
○ i.e. we can approximate unknown functions
● We can beat professionals at Go!
○ Probabilistic problems
○ i.e. we can also approximate unknown distributions*
* definition abuse warning; see Silver et al. (2016) for details
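The "approximating unknown functions" framing can be made concrete with a minimal sketch (an assumed illustration, not from the talk): fit a polynomial by least squares to noisy samples of a function the learner only ever sees through data.

```python
import numpy as np

# Assumed illustration: learning as optimization. We approximate an
# "unknown" target function from noisy samples by minimizing squared
# error over a polynomial hypothesis class.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = np.sin(3 * x) + 0.05 * rng.normal(size=200)  # samples of the unknown target

coeffs = np.polyfit(x, y, deg=5)                 # least-squares optimization
mse = float(np.mean((np.polyval(coeffs, x) - y) ** 2))
```

The learner never sees `sin(3x)` in closed form; it only optimizes fit to samples, which is the sense in which both slides describe approximation problems.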
- 11. What can we not do with machine learning these days?
- 12. What can we not do with machine learning these days?
(Slides 12-19: screenshots from r/MachineLearning)
- 24. What do we mean?
● Small-scale learning
○ We have a small-scale learning problem when the active budget constraint is the number of examples.
● Large-scale learning
○ We have a large-scale learning problem when the active budget constraint is the computing time.
Source: Léon Bottou
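Bottou's distinction has a practical consequence that a small sketch can illustrate (an assumed example, not from the slides): under a fixed compute budget, a cheap per-example method such as SGD can beat exact batch gradient descent, because it spends the budget on many noisy steps rather than a few exact ones.

```python
import numpy as np

# Assumed illustration of the small-scale vs. large-scale distinction:
# give batch gradient descent and SGD the same compute budget (the same
# number of example visits) on a linear regression problem.
rng = np.random.default_rng(0)
n, d = 10_000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def loss(w):
    return float(np.mean((X @ w - y) ** 2))

# Batch gradient descent: each step is a full pass over all n examples,
# so a budget of 5n example visits buys only 5 steps.
w_batch = np.zeros(d)
for _ in range(5):
    w_batch -= 0.1 * (2.0 / n) * X.T @ (X @ w_batch - y)

# SGD: each step visits one example, so the same budget buys 5n steps.
w_sgd = np.zeros(d)
for t in range(5 * n):
    i = t % n
    w_sgd -= 0.01 * 2.0 * (X[i] @ w_sgd - y[i]) * X[i]
```

With the budget measured in example visits, the SGD iterate typically ends up much closer to the optimum in this setup; when the number of examples is the binding constraint instead, exact methods regain the advantage.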
- 25. Do we need a cluster to do large-scale machine learning?
- 28. Do I need a cluster?
● But I want one!
○ Nobody ever got fired for buying a cluster (Appuswamy 2013)
- 29. BID Data Toolkit
● Canny et al.: “Big Data Analytics with Small Footprint: Squaring the Cloud”, KDD 2013
● Canny et al.: “BIDMach: Large-scale Learning with Zero Memory Allocation”, BigLearn workshop, NIPS 2013
- 30. Matrix factorization on the complete Netflix dataset
System    Nodes/cores  Dim  Error  Time (s)  Cost    Energy (KJ)
GraphLab  18/576       100  n/a    376       $3.50   10,000
Spark     32/128       100  0.82   146       $0.40   1,000
BIDMach   1            100  0.83   90        $0.015  20
Spark     32/128       200  0.82   544       $1.45   3,500
BIDMach   1            200  0.83   129       $0.02   30
BIDMach   1            500  0.83   600       $0.10   150
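As a reference point for what these systems compute, here is a toy sketch of the underlying technique: low-rank matrix factorization trained with SGD on observed entries. The sizes, learning rate, and synthetic data are assumptions for illustration; this is not BIDMach's or Spark's implementation.

```python
import numpy as np

# Toy matrix factorization with SGD (illustration only; all parameters
# and data here are made up, not the benchmarked implementations).
rng = np.random.default_rng(1)
n_users, n_items, dim = 200, 100, 8

# Synthetic "ratings": a low-rank matrix plus noise, ~20% of entries observed.
U_true = rng.normal(size=(n_users, dim))
V_true = rng.normal(size=(n_items, dim))
R = U_true @ V_true.T + 0.1 * rng.normal(size=(n_users, n_items))
mask = rng.random((n_users, n_items)) < 0.2

U = 0.3 * rng.normal(size=(n_users, dim))
V = 0.3 * rng.normal(size=(n_items, dim))
users, items = np.nonzero(mask)
lr, reg = 0.02, 0.01
for _ in range(30):                       # epochs over the observed entries
    for u, i in zip(users, items):
        err = R[u, i] - U[u] @ V[i]
        u_row = U[u].copy()               # keep the pre-update user factor
        U[u] += lr * (err * V[i] - reg * U[u])
        V[i] += lr * (err * u_row - reg * V[i])

rmse = float(np.sqrt(np.mean((R[mask] - (U @ V.T)[mask]) ** 2)))
```

The per-entry update is what makes the workload compute-bound and GPU-friendly, which is the regime where a single BIDMach node competes with the clusters in the table above.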
- 31. K-means on MNIST-8M
System    Nodes/cores  nclust  Error     Time (s)  Cost   Energy (KJ)
Spark     32/128       256     1.5e13    180       $0.45  1,150
BIDMach   1            256     1.44e13   320       $0.06  90
Sk-Learn  1/8          256     n/a       3200x4 *  $1.0   10
Spark     96/384       4096    1.05e13   1100      $9.00  22,000
BIDMach   1            4096    0.995e13  735       $0.12  140
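For comparison, the algorithm being benchmarked is Lloyd's K-means iteration. A minimal sketch on synthetic blobs (an assumed illustration, not the Spark, BIDMach, or scikit-learn code):

```python
import numpy as np

# Minimal Lloyd's K-means: alternate point-to-center assignment with
# center re-estimation. Illustration only; data and sizes are made up.
def kmeans(X, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Update step: move each center to the mean of its points.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers, labels

# Two well-separated synthetic blobs stand in for the real dataset here.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(8.0, 1.0, (100, 2))])
centers, labels = kmeans(X, k=2)
```

The assignment step dominates the cost (all points against all centers each iteration), which is why throughput-oriented single-node implementations do well on this workload.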
- 36. Do I need a cluster?
● But I have one!
○ Install Hadoop!
○ Nobody ever got fired for using Hadoop on a cluster (Rowstron 2012)
- 37. Petuum
● Xing et al.: “Petuum: A New Platform for Distributed Machine Learning on Big Data”, KDD 2015
- 40. Do I need a cluster?
● But I want fault tolerance!
○ You will have to use an existing data processing system.
- 44. Distributed ML Systems
● General purpose data processing systems
○ Apache Spark
○ Apache Flink
● Both provide large-scale ML libraries (MLlib & FlinkML)
● Quite different in terms of maturity
● Biased opinion: Invest early in Flink
- 47. What are the problems with ML systems?
● “Hidden Technical Debt in Machine Learning Systems”, Sculley et al. (NIPS 2015)
- 52. What are the problems with ML systems?
● Boundary erosion
○ Entanglement
■ CACE: Changing Anything Changes Everything
○ Undeclared consumers
● Data dependencies
- 54. What are the problems with ML systems?
(Diagram relating Data, ML Code, and Model)
- 57. A few useful questions
● How easily can an entirely new algorithmic approach be tested at full scale?
● What is the transitive closure of all data dependencies?
● How precisely can the impact of a new change to the system be measured?
● Does improving one model or signal degrade others?
● How quickly can new members of the team be brought up to speed?
- 60. Where are we taking large-scale machine learning?
● Communication-efficient algorithms and systems
● “Computation-efficient” algorithms and systems
- 65. Where are we taking large-scale machine learning?
● Unsupervised learning
● Higher-order representations
● Reasoning about uncertainty
- 67. References
Parts of the structure and content of this presentation come from the KDD tutorial “A new look at the system, algorithm and theory foundations of Distributed ML” and are used with permission from Eric Xing.
● Léon Bottou: Learning with Large Datasets
● Silver et al. (2016): Mastering the game of Go with deep neural networks and tree search
● Nikulin (2016): Exploring the Neural Algorithm of Artistic Style
● Champandard (2016): Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artwork
● BID Data Toolkit: BID Data Project website
● CMU Petuum: Petuum project website
● Apache Spark: spark.apache.org
● Apache Flink: flink.apache.org
Other references are given in the text.