Domino Data Lab November 10, 2015
Faster data science — without a cluster
Parallel programming in Python
Manojit Nandi
mnandi92@gmail.com
@mnandi92
Who am I?
• Data Scientist at STEALTHBits Technologies
• Data Science Evangelist at Domino Data Lab
• BS in Decision Science
Agenda and Goals
• Motivation
• Conceptual intro to parallelism, general principles and pitfalls
• Machine learning applications
• Demos
Goal: Leave you with principles, and practical concrete tools, that will
help you run your code much faster
Motivation
• Lots of “medium data” problems
• Can fit in memory on one machine
• Lots of naturally parallel problems

• Easy to access large machines
• Clusters are hard
• Not everything fits map-reduce
“CPUs with multiple cores have become the standard in the recent
development of modern computer architectures and we can not only find
them in supercomputer facilities but also in our desktop machines at
home, and our laptops; even Apple's iPhone 5S got a 1.3 GHz dual-core
processor in 2013.”
- Sebastian Raschka
Parallel programming 101
Domino Data Lab November 10, 2015
• Think about independent tasks (hint: “for” loops are a good place to start!)
• Should be CPU-bound tasks
• Warnings and pitfalls
• Not a substitute for good code
• Overhead
• Shared resource contention
• Thrashing
Source: Blaise Barney, Lawrence Livermore National Laboratory
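A minimal sketch of the idea above, not from the slides: take a `for` loop whose iterations are independent and CPU-bound, and replace it with a parallel map using Python's standard-library `multiprocessing` module. The function name and inputs are illustrative.

```python
from multiprocessing import Pool

def cpu_bound_task(n):
    # Each call is independent: it reads only its own argument and
    # shares no state, so the iterations can run in any order.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    inputs = [10_000, 20_000, 30_000, 40_000]
    # Serial version: results = [cpu_bound_task(n) for n in inputs]
    with Pool() as pool:  # defaults to one worker per CPU core
        results = pool.map(cpu_bound_task, inputs)
    print(results)
```

For tiny tasks, the process startup and pickling overhead mentioned above can easily outweigh the speed-up.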
Can parallelize at different “levels”
Will focus on algorithms, with some brief comments on experiments.
• Math ops: run against underlying libraries that parallelize low-level operations, e.g., OpenBLAS, ATLAS
• Algorithms: write your code (or use a package) to parallelize functions or steps within your analysis
• Experiments: run different analyses at once
Parallelize tasks to match your resources
• Computing something (CPU)
• Reading from disk/database
• Writing to disk/database
• Network IO (e.g., web scraping)
Saturating a resource will create a bottleneck.
Don't oversaturate your resources
itemIDs = [1, 2, … , n]
parallel-for-each(i = itemIDs) {
    item = fetchData(i)              // disk/network read inside the parallel loop
    result = computeSomething(item)
    saveResult(result)               // parallel writes contend for the same disk/database
}
Parallelize tasks to match your resources
items = fetchData([1, 2, … , n])     // one batched read, done serially
results = parallel-for-each(item = items) {
    computeSomething(item)           // only the CPU-bound step runs in parallel
}
saveResult(results)                  // one batched write, done serially
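The same pattern in Python with `multiprocessing.Pool`; `fetch_data` and `compute_something` are hypothetical stand-ins for the pseudocode's functions.

```python
from multiprocessing import Pool

def fetch_data(item_ids):
    # Stand-in for one batched read from disk/database, done serially
    return [i * 10 for i in item_ids]

def compute_something(item):
    # The CPU-bound step: the only part that runs in parallel
    return item * item

if __name__ == "__main__":
    items = fetch_data(range(1, 5))   # saturate IO once, up front
    with Pool() as pool:
        results = pool.map(compute_something, items)
    print(results)                    # one batched write would go here
    # [100, 400, 900, 1600]
```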
Avoid modifying global state
itemIDs = [0, 0, 0, 0]
parallel-for-each(i = 0:3) {
    itemIDs[i] = i                   // modifies a per-process copy, not the original
}
[0,0,0,0]                                Array initialized in the original process
[0,0,0,0] [0,0,0,0] [0,0,0,0] [0,0,0,0]  Array copied to each sub-process
[0,0,0,0] [0,1,0,0] [0,0,2,0] [0,0,0,3]  Each copy is modified independently
[0,0,0,0]                                When all parallel tasks finish, the array in the original process remains unchanged
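A minimal Python demonstration of this pitfall, assuming process-based parallelism (`multiprocessing`); each worker mutates its own copy, and the parent's list is untouched.

```python
from multiprocessing import Pool

item_ids = [0, 0, 0, 0]

def set_item(i):
    # Mutates the copy of item_ids living inside the worker process
    item_ids[i] = i
    return item_ids[i]

if __name__ == "__main__":
    with Pool(4) as pool:
        pool.map(set_item, range(4))
    print(item_ids)  # still [0, 0, 0, 0] in the original process
```

If you need results back, return them from the worker function (as `pool.map` does) instead of writing to shared globals.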
Demo
Many ML tasks are parallelized
Listed roughly from most intuitive to hardest to parallelize:
• Cross-Validation
• Grid Search Selection
• Random Forest
• Kernel Density Estimation
• K-Means Clustering
• Probabilistic Graphical Models
• Online Learning
• Neural Networks (Backpropagation)
Cross validation
Grid search
Example parameter grid:
C: 1, 10, 100, 1000
Kernel: Linear, RBF
Each of the 8 combinations can be trained and scored independently.
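This kind of grid maps directly onto scikit-learn's `GridSearchCV` with `n_jobs`; the iris dataset here is just a placeholder, not from the slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# 4 values of C x 2 kernels = 8 candidates; each (candidate, CV fold)
# pair is an independent fit, so n_jobs=-1 spreads them over all cores
param_grid = {"C": [1, 10, 100, 1000], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=3, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```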
Random forest
Parallel programming in Python
• Joblib: pythonhosted.org/joblib/parallel.html
• scikit-learn (n_jobs): scikit-learn.org
  • GridSearchCV
  • RandomForest
  • KMeans
  • cross_val_score
• IPython Notebook clusters: www.astro.washington.edu/users/vanderplas/Astr599/notebooks/21_IPythonParallel
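A minimal joblib sketch of the `Parallel`/`delayed` idiom the first link documents; `sqrt` over squares is just a toy workload.

```python
from math import sqrt
from joblib import Parallel, delayed

# Dispatch independent calls to 2 worker processes; results come
# back in input order, just like the serial list comprehension
# [sqrt(i ** 2) for i in range(10)]
results = Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
print(results)  # [0.0, 1.0, 2.0, ..., 9.0]
```

scikit-learn uses joblib internally, which is why `n_jobs` shows up across its estimators.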
Demo
Parallel Programming using the GPU
• GPUs are essential to deep learning because they can yield a 10x
speed-up when training neural networks.
• Use the PyCUDA library to write Python code that executes on the
GPU.
Demo
Can compose layers of parallelism
[Diagram: three machines, one per experiment (RF, NN, grid-searched SVC), each spreading its work across cores c1 … cn]
Demo
FYI: Parallel programming in R
• General purpose
  • parallel
  • foreach: cran.r-project.org/web/packages/foreach
• More specialized
  • randomForest: cran.r-project.org/web/packages/randomForest
  • caret: topepo.github.io/caret
  • plyr: cran.r-project.org/web/packages/plyr
dominodatalab.com
blog.dominodatalab.com
@dominodatalab
Check us out!
