Working Hard to Keep it Simple. Martin Odersky, Typesafe
The Challenge: The world of mainstream software is changing. Moore’s law is now achieved by increasing the number of cores, not clock cycles. Huge-volume workloads require horizontal scaling. The “PPP” Grand Challenge. Data from Kunle Olukotun, Lance Hammond, Herb Sutter, Burton Smith, Chris Batten, and Krste Asanovic.
Concurrency and Parallelism: Parallel programming: execute programs faster on parallel hardware. Concurrent programming: manage concurrent execution threads explicitly. Both are too hard!
The Root of The Problem
Non-determinism is caused by concurrent threads accessing shared mutable state. It helps to encapsulate state in actors or transactions, but the fundamental problem stays the same. So, non-determinism = parallel processing + mutable state. To get deterministic processing, avoid the mutable state! Avoiding mutable state means programming functionally.
    var x = 0
    async { x = x + 1 }
    async { x = x * 2 }
    // can give 0, 1, 2
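
To make the contrast concrete, here is a minimal sketch of the functional alternative, assuming only the Scala standard library. The names `Determinism`, `inc`, `dbl`, and `parallelSum` are made up for illustration and do not come from the talk. The two updates from the slide become pure functions, so the result depends on how they are composed in the program text rather than on thread scheduling.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object Determinism {
  // Pure functions: no shared variable is read or written.
  val inc: Int => Int = _ + 1
  val dbl: Int => Int = _ * 2

  // The order of application is fixed by the program, not by a scheduler.
  val incThenDbl: Int = (inc andThen dbl)(0) // (0 + 1) * 2
  val dblThenInc: Int = (dbl andThen inc)(0) // (0 * 2) + 1

  // Independent pure computations can still run in parallel: each Future
  // works on its own immutable half, and the results are combined at the end.
  def parallelSum(xs: Vector[Int]): Int = {
    val (left, right) = xs.splitAt(xs.length / 2)
    val l = Future(left.map(dbl).sum)
    val r = Future(right.map(dbl).sum)
    Await.result(for (a <- l; b <- r) yield a + b, 10.seconds)
  }
}
```

Because neither half mutates anything the other can see, `parallelSum` returns the same value on every run, whichever future finishes first.
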
Space vs Time: Time (imperative/concurrent), Space (functional/parallel)
Scala is a Unifier: Agile, with lightweight syntax. Object-Oriented, Scala, Functional. Safe and performant, with strong static typing.
Scala is a Unifier: Agile, with lightweight syntax. Parallel, Object-Oriented, Scala, Functional, Sequential. Safe and performant, with strong static typing.
Oscon keynote: Working hard to keep it simple
Some adoption vectors: Web platforms, trading platforms, financial modeling, simulation. Fast to first product, scalable afterwards.
Scala’s Toolbox
Different Tools for Different Purposes. Parallelism: Parallel Collections, Collections, Distributed Collections, Parallel DSLs. Concurrency: Actors, Software transactional memory, Akka, Futures.
Let’s see an example:
A class ...
... in Java:
    public class Person {
        public final String name;
        public final int age;
        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }
... in Scala:
    class Person(val name: String, val age: Int)
... and its usage
... in Java:
    import java.util.ArrayList;
    ...
    Person[] people;
    Person[] minors;
    Person[] adults;
    {
        ArrayList<Person> minorsList = new ArrayList<Person>();
        ArrayList<Person> adultsList = new ArrayList<Person>();
        for (int i = 0; i < people.length; i++)
            (people[i].age < 18 ? minorsList : adultsList).add(people[i]);
        // Pass a fresh array: toArray(people) would write into people itself.
        minors = minorsList.toArray(new Person[0]);
        adults = adultsList.toArray(new Person[0]);
    }
... in Scala:
    val people: Array[Person]
    val (minors, adults) = people partition (_.age < 18)
In the Scala version, (minors, adults) is a simple pattern match, people partition ... is an infix method call, and (_.age < 18) is a function value.
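
The Scala snippet above declares `people` without a value, so it only compiles inside a class or trait. As a self-contained sketch with made-up sample data (the names and ages are assumptions, not from the slides), the same partition can be run as follows. The call is written with a dot here, which is equivalent to the infix form on the slide.

```scala
object PartitionDemo {
  class Person(val name: String, val age: Int)

  // Sample data invented for this example.
  val people: Array[Person] = Array(
    new Person("Ann", 34),
    new Person("Bob", 12),
    new Person("Cy", 17),
    new Person("Dee", 18)
  )

  // partition returns a pair of arrays; the tuple pattern on the left
  // binds both halves in one definition. Relative order is preserved.
  val (minors, adults) = people.partition(_.age < 18)
}
```

Here `minors` holds Bob and Cy, and `adults` holds Ann and Dee; the original `people` array is left unchanged.
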
Going Parallel
... in Java:
    ?
... in Scala:
    val people: Array[Person]
    val (minors, adults) = people.par partition (_.age < 18)
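
For readers who want to see roughly what a parallel partition involves, here is a hand-rolled sketch that uses standard-library Futures instead of `.par`. This is a swapped-in technique, not how parallel collections are implemented, and `parPartition` is a hypothetical helper, not a library function. It is written this way partly because, from Scala 2.13 on, `.par` lives in the separate scala-parallel-collections module rather than in the core library.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParPartitionSketch {
  class Person(val name: String, val age: Int)

  // Sample data invented for this example.
  val sample: Vector[Person] = Vector(
    new Person("Ann", 34), new Person("Bob", 12),
    new Person("Cy", 17), new Person("Dee", 18)
  )

  // Hypothetical helper: partition each chunk in its own Future, then
  // concatenate the per-chunk results in the original order.
  def parPartition(people: Vector[Person], chunks: Int): (Vector[Person], Vector[Person]) = {
    val n = math.max(1, chunks)
    val size = math.max(1, (people.length + n - 1) / n)
    val parts = people.grouped(size).toVector.map { chunk =>
      Future(chunk.partition(_.age < 18))
    }
    val done = Await.result(Future.sequence(parts), 10.seconds)
    (done.flatMap(_._1), done.flatMap(_._2))
  }
}
```

The predicate is pure and each chunk is immutable, so the chunks can be processed in any order and the combined result is the same as the sequential one.
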
Actors for Concurrent Programming: A simple message-oriented programming model for multi-threading. Serializes access to shared resources using queues and function passing. Makes it easier for programmers to create reliable concurrent processing. Removes many sources of contention, races, locking, and deadlocks.
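
The idea of serializing access through a queue can be sketched without any actor library. The following is an illustrative stand-in, not Akka or the Scala actors API: `CounterActor` and `ActorDemo` are hypothetical names, and the mailbox is simply a single-thread executor from `java.util.concurrent`.

```scala
import java.util.concurrent.{Callable, Executors}

// Hypothetical minimal "actor": a single mailbox thread handles one message
// at a time, so `count` is never touched by two threads at once.
final class CounterActor {
  private val mailbox = Executors.newSingleThreadExecutor()
  private var count = 0 // mutable, but confined to the mailbox thread

  // Send a message; the caller does not wait for it to be handled.
  def !(n: Int): Unit = mailbox.execute(new Runnable {
    def run(): Unit = count += n
  })

  // Ask for the current total; the request queues behind earlier messages.
  def total(): Int = mailbox.submit(new Callable[Int] {
    def call(): Int = count
  }).get()

  def stop(): Unit = mailbox.shutdown()
}

object ActorDemo {
  // Four sender threads each send 1000 increments to the same actor.
  def run(): Int = {
    val actor = new CounterActor
    val senders = (1 to 4).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = (1 to 1000).foreach(_ => actor ! 1)
      })
    }
    senders.foreach(_.start())
    senders.foreach(_.join())
    val result = actor.total()
    actor.stop()
    result
  }
}
```

No lock appears in the sender code: the senders race only to enqueue messages, and the state update itself always runs on one thread, so no increment is lost.
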
Going further: Parallel DSLs. But how do we keep a bunch of Fermis happy? How do we find and deal with 10000+ threads in an application? Parallel collections and actors are necessary but not sufficient for this. Our bet for the mid-term future: parallel embedded DSLs. Find parallelism in domains: physics simulation, machine learning, statistics, ... Joint work with Kunle Olukotun and Pat Hanrahan @ Stanford. EPFL side funded by ERC.
EPFL / Stanford Research (layered diagram)
Applications: Virtual Worlds, Personal Robotics, Data informatics, Scientific Engineering
Domain Specific Languages: Physics (Liszt), Scripting, Probabilistic (RandomT), Machine Learning (OptiML), Rendering
DSL Infrastructure: Domain Embedding Language (Scala) with Polymorphic Embedding, Staging, Static Domain Specific Opt.; Parallel Runtime (Delite, Sequoia, GRAMPS) with Dynamic Domain Spec. Opt., Task & Data Parallelism, Locality Aware Scheduling
Heterogeneous Hardware: OOO Cores, SIMD Cores, Threaded Cores, Specialized Cores
Hardware Architecture: Programmable Hierarchies, Scalable Coherence, Isolation & Atomicity, On-chip Networks, Pervasive Monitoring
Example: Liszt - A DSL for Physics Simulation. Mesh-based numeric simulation. Huge domains: millions of cells. Example: unstructured Reynolds-averaged Navier-Stokes (RANS) solver. Figure labels: Fuel injection, Transition, Thermal, Turbulence, Turbulence, Combustion.
Liszt as Virtualized Scala
    // calculating scalar convection (Liszt)
    val Flux = new Field[Cell,Float]
    val Phi = new Field[Cell,Float]
    val cell_volume = new Field[Cell,Float]
    val deltat = .001
    ...
    untilconverged {
      for(f <- interior_faces) {
        val flux = calc_flux(f)
        Flux(inside(f)) -= flux
        Flux(outside(f)) += flux
      }
      for(f <- inlet_faces) {
        Flux(outside(f)) += calc_boundary_flux(f)
      }
      for(c <- cells(mesh)) {
        Phi(c) += deltat * Flux(c) / cell_volume(c)
      }
      for(f <- faces(mesh))
        Flux(f) = 0.f
    }
Diagram: DSL Library, AST, Optimisers, Generators, Schedulers, …; Hardware: GPU, Multi-Core, etc.
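
The slide's diagram suggests that DSL code is turned into an AST that optimisers and generators then work on. As a toy illustration of that idea only, and not of how Liszt or Delite are actually built, the sketch below embeds a tiny arithmetic DSL in Scala. Every name in it (`MiniDsl`, `Exp`, `Sym`, `optimise`, `generate`) is hypothetical.

```scala
object MiniDsl {
  // DSL operators build an AST instead of computing values directly.
  sealed trait Exp {
    def +(that: Exp): Exp = Add(this, that)
    def *(that: Exp): Exp = Mul(this, that)
  }
  final case class Const(value: Double) extends Exp
  final case class Sym(name: String) extends Exp
  final case class Add(l: Exp, r: Exp) extends Exp
  final case class Mul(l: Exp, r: Exp) extends Exp

  // "Optimiser": fold constants, drop additions of 0 and multiplications by 1.
  def optimise(e: Exp): Exp = e match {
    case Add(l, r) => (optimise(l), optimise(r)) match {
      case (Const(a), Const(b)) => Const(a + b)
      case (Const(0.0), x)      => x
      case (x, Const(0.0))      => x
      case (a, b)               => Add(a, b)
    }
    case Mul(l, r) => (optimise(l), optimise(r)) match {
      case (Const(a), Const(b)) => Const(a * b)
      case (Const(1.0), x)      => x
      case (x, Const(1.0))      => x
      case (a, b)               => Mul(a, b)
    }
    case other => other
  }

  // "Generator": emit target source text; here just a parenthesised expression.
  def generate(e: Exp): String = e match {
    case Const(v)  => v.toString
    case Sym(n)    => n
    case Add(l, r) => s"(${generate(l)} + ${generate(r)})"
    case Mul(l, r) => s"(${generate(l)} * ${generate(r)})"
  }

  // Reads like ordinary arithmetic, but yields a tree the library can inspect.
  val update: Exp = Sym("phi") + Const(0.5) * Const(2.0) * Sym("flux")
}
```

Calling `MiniDsl.generate(MiniDsl.optimise(MiniDsl.update))` yields `(phi + flux)`: the library sees the whole expression before anything runs, which is what lets a real DSL framework pick domain-specific optimisations and hardware targets.
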
Follow us on Twitter: @typesafe, scala-lang.org, typesafe.com

Editor's Notes

  1. This leads to our vision: applications driven by a set of interoperable DSLs. We are developing DSLs to provide evidence of their effectiveness in extracting parallel performance. But we are also very interested in empowering others to easily build such DSLs, so we are investing heavily in developing frameworks and runtimes to make parallel DSL development easier. The goal is to run single-source programs on a variety of very different hardware targets.
  2. Liszt is another language we have implemented. It is designed to support the creation of solvers for mesh-based partial differential equations. Problems in this domain typically simulate complex physical systems such as fluid flow or mechanics by breaking up space into discrete cells. A typical mesh may contain hundreds of millions of these cells (here we are visualizing a scramjet designed to work at hypersonic speeds). Liszt is an ideal candidate for a DSL because, while the problems are large and highly parallel, the mesh introduces many data dependencies that are difficult to reason about, which makes writing solvers tedious.