TensorFrames:
Google TensorFlow on
Apache Spark
Tim Hunter
Meetup 08/2016 - Salesforce
How familiar are you with Spark?
1. What is Apache Spark?
2. I have used Spark
3. I am using Spark in production or I contribute to its development
2
How familiar are you with TensorFlow?
1. What is TensorFlow?
2. I have heard about it
3. I am training my own neural networks
3
About Databricks
Founded by the team who created Apache Spark
Offers a hosted service:
- Apache Spark in the cloud
- Notebooks
- Cluster management
- Production environment
4
About me
Software engineer at Databricks
Apache Spark contributor
Ph.D. in Machine Learning, UC Berkeley
(and Spark user since Spark 0.5)
5
Outline
• Numerical computing with Apache Spark
• Using GPUs with Spark and TensorFlow
• Performance details
• The future
6
Numerical computing for Data Science
• Queries are data-heavy
• However, algorithms are computation-heavy
• They operate on simple data types: integers, floats, doubles, vectors, matrices
7
The case for speed
• Numerical bottlenecks are good targets for optimization
• Let data scientists get faster results
• Faster turnaround for experimentation
• How can we run these numerical algorithms faster?
8
Evolution of computing power
9
[Diagram: hardware options, from scale-out clusters ("failure is not an option: it is a fact") to scale-up machines, GPGPUs, and dedicated chips ("when you can afford your dedicated chip")]
Evolution of computing power
10
[Diagram: software frameworks mapped onto that hardware landscape, e.g. NLTK and Theano; today's talk: Spark + TensorFlow]
Evolution of computing power
• Processor speed cannot keep up with memory and network improvements
• Access to the processor is the new bottleneck
• Project Tungsten in Spark: leverage the processor's heuristics for executing code and fetching memory
• Does not account for the fact that the problem is numerical
11
Asynchronous vs. synchronous
• Asynchronous algorithms perform updates concurrently
• Spark uses a synchronous model; deep learning frameworks are usually asynchronous
• A large number of ML computations are synchronous
• Even deep learning may benefit from synchronous updates
12
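To make the synchronous/asynchronous distinction concrete, here is a toy Python sketch (not from the deck; all names are illustrative) of the two update styles on a trivial quadratic loss:

import random

def grad(w, x):
    return w - x  # gradient of (w - x)^2 / 2

data = [random.gauss(1.0, 0.1) for _ in range(1000)]
workers = [data[i::4] for i in range(4)]  # 4 partitions

# Synchronous (Spark-like): a barrier per iteration; every worker
# computes against the same w, and one averaged update is applied.
w = 0.0
for _ in range(100):
    grads = [sum(grad(w, x) for x in p) / len(p) for p in workers]
    w -= 0.1 * sum(grads) / len(grads)

# Asynchronous (parameter-server-like): each worker applies its update
# as soon as it finishes, possibly against a stale w; no barrier.
w_async = 0.0
for _ in range(100):
    for p in workers:  # imagine these iterations interleaving across machines
        w_async -= 0.1 * sum(grad(w_async, x) for x in p) / len(p)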
Outline
• Numerical computing with Apache Spark
• Using GPUs with Spark and TensorFlow
• Performance details
• The future
13
GPGPUs
14
• Graphics Processing Units for General Purpose computations
[Bar charts: theoretical peak throughput and theoretical peak bandwidth, GPU vs. CPU]
Google TensorFlow
• Library for writing “machine intelligence” algorithms
• Very popular for deep learning and neural networks
• Can also be used for general purpose numerical computations
• Interface in C++ and Python
15
Numerical dataflow with TensorFlow
16
import tensorflow as tf

x = tf.placeholder(tf.int32, name="x")
y = tf.placeholder(tf.int32, name="y")
output = tf.add(x, 3 * y, name="z")
session = tf.Session()
output_value = session.run(output, {x: 3, y: 5})  # 3 + 3 * 5 = 18
[Graph: placeholders x (int32) and y (int32); y flows through "mul 3", then joins x at the add node z]
Numerical dataflow with Spark
import tensorflow as tf
import tensorframes as tfs

df = sqlContext.createDataFrame(…)
x = tf.placeholder(tf.int32, name="x")
y = tf.placeholder(tf.int32, name="y")
output = tf.add(x, 3 * y, name="z")
output_df = tfs.map_rows(output, df)
output_df.collect()
df: DataFrame[x: int, y: int]
output_df: DataFrame[x: int, y: int, z: int]
[Graph: columns x (int32) and y (int32); y flows through "mul 3", then joins x at the add node z]
Demo
18
Outline
• Numerical computing with Apache Spark
• Using GPUs with Spark and TensorFlow
• Performance details
• The future
19
20
It is a communication problem
[Diagram: each row travels Tungsten binary format → Java object → Python pickle inside the Spark worker process, over to a separate Python worker process, then Python pickle → C++ buffer]
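For contrast, a minimal sketch (not from the deck) of the plain Python UDF path: every row makes the full pickle round-trip pictured above.

from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

df2 = sqlContext.createDataFrame([(1, 2), (3, 4)], ["x", "y"])
add3 = udf(lambda x, y: x + 3 * y, IntegerType())  # executes in a Python worker
df2.withColumn("z", add3(df2.x, df2.y)).collect()  # rows pickled out and back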
21
TensorFrames: native embedding of TensorFlow
[Diagram: everything stays inside the Spark worker process: Tungsten binary format → Java object → C++ buffer, with no Python round-trip]
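The same computation with TensorFrames stays in that single process. A sketch built from the tfs calls shown on the earlier slide:

import tensorflow as tf
import tensorframes as tfs

df2 = sqlContext.createDataFrame([(1, 2), (3, 4)], ["x", "y"])
x = tf.placeholder(tf.int32, name="x")  # bound by name to column "x"
y = tf.placeholder(tf.int32, name="y")
z = tf.add(x, 3 * y, name="z")
tfs.map_rows(z, df2).collect()  # no Python worker, no per-row pickling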
An example: kernel density scoring
• Estimation of a distribution from samples
• Non-parametric
• Unknown bandwidth parameter
• Can be evaluated with goodness of fit
22
An example: kernel density scoring
• In practice, compute:
score(x) = log( (1 / (N·b)) Σₖ exp( −(x − z_k)² / (2b²) ) )
with the numerically stable log-sum-exp form:
score(x) = m − log(N·b) + log Σₖ exp(d_k − m),  where d_k = −(x − z_k)² / (2b²) and m = maxₖ d_k
• In a nutshell: a complex numerical function
23
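A minimal NumPy sketch of that score (not from the deck; z holds the samples z_k, b the bandwidth) may make the log-sum-exp shift clearer:

import numpy as np

def kde_log_score(x, z, b):
    # log((1 / (N b)) * sum_k exp(-(x - z_k)^2 / (2 b^2))),
    # shifted by the largest exponent so nothing overflows
    d = -(x - z) ** 2 / (2 * b * b)
    m = d.max()
    return m - np.log(b * len(z)) + np.log(np.exp(d - m).sum())

z = np.random.normal(size=1000)  # samples defining the density estimate
print(kde_log_score(0.0, z, b=0.5))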
24
Speedup
[Bar chart: runtime in seconds (0–180) for Scala UDF, Scala UDF (optimized), TensorFrames, TensorFrames + GPU]
def score(x: Double): Double = {
  // Distance term to each sample point z_k, scaled by the bandwidth b
  val dis = points.map { z_k =>
    - (x - z_k) * (x - z_k) / (2 * b * b)
  }
  // Shifted log-sum-exp of the terms
  val minDis = dis.min
  val exps = dis.map(d => math.exp(d - minDis))
  minDis - math.log(b * N) + math.log(exps.sum)
}
val scoreUDF = sqlContext.udf.register("scoreUDF", score _)
sql("select sum(scoreUDF(sample)) from samples").collect()
25
Speedup
[Bar chart: runtime in seconds (0–180) for Scala UDF, Scala UDF (optimized), TensorFrames, TensorFrames + GPU]
def score(x: Double): Double = {
  // Manual while-loops over a preallocated array avoid closure and boxing overhead
  val dis = new Array[Double](N)
  var idx = 0
  while (idx < N) {
    val z_k = points(idx)
    dis(idx) = - (x - z_k) * (x - z_k) / (2 * b * b)
    idx += 1
  }
  val minDis = dis.min
  var expSum = 0.0
  idx = 0
  while (idx < N) {
    expSum += math.exp(dis(idx) - minDis)
    idx += 1
  }
  minDis - math.log(b * N) + math.log(expSum)
}
val scoreUDF = sqlContext.udf.register("scoreUDF", score _)
sql("select sum(scoreUDF(sample)) from samples").collect()
26
Speedup
[Bar chart: runtime in seconds (0–180) for Scala UDF, Scala UDF (optimized), TensorFrames, TensorFrames + GPU]
# X holds the sample points z_k and N their count; ops come from TensorFlow 1.x
from tensorflow import constant, device, exp, identity, log, reduce_max, reduce_sum, square
import tensorframes as tfs

def cost_fun(block, bandwidth):
    # Same kernel density score, computed on a whole block of rows at once
    distances = - square(constant(X) - block) / (2 * bandwidth * bandwidth)
    m = reduce_max(distances, 0)
    x = log(reduce_sum(exp(distances - m), 0))
    return identity(x + m - log(bandwidth * N), name="score")

sample = tfs.block(df, "sample")
score = cost_fun(sample, bandwidth=0.5)
df.agg(sum(tfs.map_blocks(score, df))).collect()
27
Speedup
[Bar chart: runtime in seconds (0–180) for Scala UDF, Scala UDF (optimized), TensorFrames, TensorFrames + GPU]
def cost_fun(block, bandwidth):
    distances = - square(constant(X) - block) / (2 * bandwidth * bandwidth)
    m = reduce_max(distances, 0)
    x = log(reduce_sum(exp(distances - m), 0))
    return identity(x + m - log(bandwidth * N), name="score")

# Same computation, with the graph pinned to the GPU
with device("/gpu"):
    sample = tfs.block(df, "sample")
    score = cost_fun(sample, bandwidth=0.5)
df.agg(sum(tfs.map_blocks(score, df))).collect()
Demo: Deep dreams
28
Demo: Deep dreams
29
Outline
• Numerical computing with Apache Spark
• Using GPUs with Spark and TensorFlow
• Performance details
• The future
30
31
Improving communication
[Diagram: inside the Spark worker process, a direct memory copy and columnar storage would connect the Tungsten binary format straight to the C++ buffer, bypassing the Java object step]
The future
• Integration with Tungsten:
  • Direct memory copy
  • Columnar storage
• Better integration with MLlib data types
• GPU instances in Databricks: official support coming this fall
32
Recap
• Spark: an efficient framework for running computations on thousands of computers
• TensorFlow: high-performance numerical framework
• Get the best of both with TensorFrames:
  • Simple API for distributed numerical computing
  • Can leverage the hardware of the cluster
33
Try these demos yourself
• TensorFrames source code and documentation:
github.com/databricks/tensorframes
spark-packages.org/package/databricks/tensorframes
• Demo notebooks available on Databricks
• The official TensorFlow website:
www.tensorflow.org
34
Spark Summit EU 2016
15% Discount Code: DatabricksEU16
35
Thank you.


Editor's Notes

  1. Explain that TensorFlow is a library for deep learning
  2. List a few algorithms: deep learning, clustering, classification, etc. Business logic and analysis are usually more concerned with complex structures: text, lists, associations like dictionaries. The bread and butter of data science can be told in three words: integers, floats, and doubles. Slicing and dicing data means matrices, vectors, reals.
  3. Not everybody is a Fortran or C++ programmer. There is considerable friction in writing optimized algorithms. How can we lower the barrier?
  4. Scale up or scale out: you have two options, better computers or more computers. The holy grail: a large number of specialized processors.
  5. For all these configurations of hardware, there are even more frameworks and libraries to access them, each with strengths and weaknesses: the classics for single-machine use; the distributed frameworks (Spark, Mahout, MapReduce); the libraries that target specialized hardware (CUDA and OpenCL for parallel programming); and, in the middle, MPI, which is hard to program and not very resilient to hardware failures. On top of these, frameworks for deep learning and computer vision have been built in recent years. The trend is to have multiple graphics cards communicate.
  6. MLlib has KDE, but how about making it work for other data types like floats, or other kernels?
  7. My Ph.D. adviser used to tell me that you always have to include one equation to show that you mean serious business.
  8. Do not dwell on the term UDF; simply say you can wrap a Scala function inside the SQL engine. A UDF is a Scala function that you can run inside a SQL query.
  9. Start from the login/home page, disable the debug menu, and go more slowly for the demo.