A few questions about
large-scale Machine Learning
March 9th, 2016
Theodore Vasiloudis, SICS/KTH
What can we do with machine
learning these days?
What can we do with machine learning these days?
● We can paint pictures!
○ Optimization problems
○ i.e. we can approximate unknown functions
Source: Nikulin (2016)
NeuralDoodle: Turning Two-Bit Doodles into Fine Artwork. Source: Champandard (2016)
● We can beat professionals at Go!
○ Probabilistic problems
○ i.e. we can also approximate unknown distributions*
AlphaGo beats Lee Se-dol in first of five matches
*definition abuse warning, see Silver et al. (2016) for details
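A gloss on “we can approximate unknown functions” (my notation, not from the slides): supervised learning picks, from a model class F, the function that minimizes the average loss over the training examples:

```latex
\hat{f} \;=\; \operatorname*{arg\,min}_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(x_i),\, y_i\big)
```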
What can we not do with machine
learning these days?
What can we not do with machine learning these days?
r/MachineLearning
What is large-scale machine
learning?
What do we mean?
● Small-scale learning
○ We have a small-scale learning problem when the active budget constraint is the number of examples.
● Large-scale learning
○ We have a large-scale learning problem when the active budget constraint is the computing time.
Source: Léon Bottou
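This distinction follows Bottou and Bousquet's tradeoff analysis (summarized here from the Bottou reference at the end; it is not spelled out on the slide). Test error splits into three terms:

```latex
\mathcal{E} \;=\; \mathcal{E}_{\mathrm{app}} \;+\; \mathcal{E}_{\mathrm{est}} \;+\; \mathcal{E}_{\mathrm{opt}}
```

approximation error (how well the model class can fit at all), estimation error (from having finitely many examples), and optimization error (from stopping the solver early). When examples are the binding constraint it pays to optimize exactly; when time is the binding constraint, a cruder optimizer such as SGD that touches more examples per second usually wins.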
Do we need a cluster to do large-scale machine learning?
Do I need a cluster?
Probably not.
Do I need a cluster?
● But I want one!
○ Nobody ever got fired for buying a cluster (Appuswamy 2013)
BID Data Toolkit
● Canny et al.: “Big Data Analytics with Small Footprint: Squaring the Cloud”, KDD 2013
● Canny et al.: “BIDMach: Large-scale Learning with Zero Memory Allocation”, NIPS 2013 BigLearn workshop
Matrix factorization on the complete Netflix dataset

System    Nodes/cores  Dim  Error  Time (s)  Cost    Energy (kJ)
GraphLab  18/576       100  -      376       $3.50   10,000
Spark     32/128       100  0.82   146       $0.40   1,000
BIDMach   1            100  0.83   90        $0.015  20
Spark     32/128       200  0.82   544       $1.45   3,500
BIDMach   1            200  0.83   129       $0.02   30
BIDMach   1            500  0.83   600       $0.10   150
K-means on MNIST-8M

System        Nodes/cores  nclust  Error     Time (s)  Cost   Energy (kJ)
Spark         32/128       256     1.5e13    180       $0.45  1,150
BIDMach       1            256     1.44e13   320       $0.06  90
Scikit-learn  1/8          256     -         3200x4 *  $1.00  10
Spark         96/384       4096    1.05e13   1100      $9.00  22,000
BIDMach       1            4096    0.995e13  735       $0.12  140
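For comparison, a minimal single-node baseline in the spirit of the table, using scikit-learn's mini-batch K-means. MNIST-8M is not bundled with any library, so random data stands in for the 784-pixel images; the "Error" column above is presumably the K-means objective, which corresponds to `inertia_` here.

```python
# Single-node K-means sketch; random data stands in for MNIST-8M.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

X = np.random.rand(100_000, 784).astype(np.float32)

km = MiniBatchKMeans(n_clusters=256, batch_size=10_000,
                     max_iter=10, random_state=0)
km.fit(X)

# inertia_ is the K-means objective: summed squared distance from each
# point to its assigned centroid (comparable in spirit to "Error" above).
print(km.inertia_)
```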
BID Data Toolkit: Roofline design
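For context (my addition): the roofline model (Williams et al., 2009) bounds the attainable throughput of a kernel by either the machine's peak compute rate or its memory bandwidth times the kernel's arithmetic intensity:

```latex
P_{\mathrm{attainable}} \;=\; \min\!\big( P_{\mathrm{peak}},\; B_{\mathrm{mem}} \times I \big)
```

The BID Data papers argue, roughly, that well-tuned single-node GPU kernels sit near this roofline, so adding cluster nodes buys little until one machine is truly exhausted.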
Do I need a cluster?
● But I have one!
○ Install Hadoop!
○ Nobody ever got fired for using Hadoop on a cluster (Rowstron 2012)
Petuum
● Xing et al.: “Petuum: A New Platform for Distributed Machine Learning on Big Data”, KDD 2015
Source: Xing (2015)
Left: Petuum vs. YahooLDA. Right: Petuum vs. Graphlab and Spark
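Much of Petuum's speedup comes from its bounded-staleness (SSP) parameter server: workers proceed asynchronously but may never run more than a fixed number of clock ticks ahead of the slowest worker. A toy sketch of that gate (illustrative names, not Petuum's actual API):

```python
# Toy stale-synchronous-parallel (SSP) gate: each worker calls tick()
# once per iteration and blocks only if it gets too far ahead.
import threading

class StalenessGate:
    def __init__(self, n_workers, staleness):
        self.clocks = [0] * n_workers
        self.staleness = staleness
        self.cond = threading.Condition()

    def tick(self, worker_id):
        with self.cond:
            self.clocks[worker_id] += 1
            self.cond.notify_all()
            # Block while this worker is more than `staleness` clocks
            # ahead of the slowest worker.
            while self.clocks[worker_id] > min(self.clocks) + self.staleness:
                self.cond.wait()
```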
Do I need a cluster?
● But I want fault tolerance!
○ You’ll have to use an existing data processing system
Distributed ML Systems
● General purpose data processing systems
○ Apache Spark
○ Apache Flink
● Both provide large-scale ML libraries (MLlib & FlinkML)
● Quite different in terms of maturity
● Biased opinion: Invest early in Flink
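To make the "large-scale ML library" point concrete, a minimal sketch of training with Spark's RDD-based MLlib (the current API at the time of this talk; the data path is a hypothetical placeholder):

```python
# Minimal Spark MLlib sketch: load LIBSVM data from a distributed
# filesystem and fit a logistic regression model across the cluster.
from pyspark import SparkContext
from pyspark.mllib.classification import LogisticRegressionWithLBFGS
from pyspark.mllib.util import MLUtils

sc = SparkContext(appName="mllib-sketch")

# Hypothetical path; any LIBSVM-format file works.
data = MLUtils.loadLibSVMFile(sc, "hdfs:///data/train.libsvm")

model = LogisticRegressionWithLBFGS.train(data, iterations=100)
print(model.weights)
```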
What are the problems with ML
systems?
What are the problems with ML systems?
● “Hidden Technical Debt in Machine Learning Systems”, Sculley et al. (NIPS 2015)
What are the problems with ML systems?
● Boundary erosion
○ Entanglement
■ CACE: Changing Anything Changes Everything
○ Undeclared consumers
● Data dependencies
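A toy illustration of entanglement/CACE (mine, not from the paper): with correlated features and any regularized learner, changing one input changes the weights learned for all inputs.

```python
# Toy CACE demo: rescaling ONE feature shifts ALL learned weights.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
x0 = rng.normal(size=1000)
x1 = x0 + 0.3 * rng.normal(size=1000)     # x1 strongly correlated with x0
X = np.column_stack([x0, x1])
y = x0 + 2.0 * x1 + 0.1 * rng.normal(size=1000)

print(Ridge(alpha=10.0).fit(X, y).coef_)  # weights before the change

X[:, 0] *= 10.0                           # "change anything": rescale x0 only
print(Ridge(alpha=10.0).fit(X, y).coef_)  # ...and every weight moves
```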
What are the problems with ML systems?
Data → ML Code → Model
A few useful questions
A few useful questions
● How easily can an entirely new algorithmic approach be tested at full scale?
● What is the transitive closure of all data dependencies?
● How precisely can the impact of a new change to the system be measured?
● Does improving one model or signal degrade others?
● How quickly can new members of the team be brought up to speed?
Where are we taking large-scale
machine learning?
Where are we taking large-scale machine learning?
● Communication-efficient algorithms and systems
● “Computation-efficient” algorithms and systems
Active research area: how to get here?
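On the communication-efficiency bullet above: one widely studied pattern is local SGD / model averaging, where workers take many cheap local steps and only occasionally pay for an all-reduce. A minimal illustrative sketch on a synthetic regression task (all names and parameters are mine):

```python
# Illustrative local SGD: communication is paid once per round,
# not once per gradient step.
import numpy as np

rng = np.random.default_rng(0)
dim, n_workers, rounds, local_steps, lr = 10, 4, 20, 50, 0.05
w_true = rng.normal(size=dim)            # ground-truth model to recover

workers = [np.zeros(dim) for _ in range(n_workers)]
for _ in range(rounds):
    for k in range(n_workers):
        for _ in range(local_steps):     # cheap: no communication here
            x = rng.normal(size=dim)
            grad = (workers[k] @ x - w_true @ x) * x
            workers[k] = workers[k] - lr * grad
    avg = np.mean(workers, axis=0)       # expensive: one all-reduce per round
    workers = [avg.copy() for _ in range(n_workers)]

print(np.linalg.norm(avg - w_true))      # should be close to zero
```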
Where are we taking large-scale machine learning?
● Unsupervised learning
● Higher-order representations
● Reasoning about uncertainty
Thank You.
@thvasilo
tvas@sics.se
References
Parts of the structure and content of this presentation come from the KDD tutorial “A new look at the system, algorithm and theory foundations of Distributed ML” and are used with permission from Eric Xing.
● Léon Bottou: Learning with Large Datasets
● Silver et al. (2016): Mastering the game of Go with deep neural networks and tree search
● Nikulin (2016): Exploring the Neural Algorithm of Artistic Style
● Champandard (2016): Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artwork
● BID Data Toolkit: BID Data Project Website
● CMU Petuum: Petuum Project
● Apache Spark: spark.apache.org
● Apache Flink: flink.apache.org
Other references found in the text.
And finally:
