exascaleproject.org
ECP Application Development
Andrew Siegel, ANL
HPC User Forum
Argonne National Laboratory
Sept. 10, 2019
2
Exascale Computing Project: Application Development
Code Porting, Algorithmic Restructuring, Alternate Choice of Physical Models, New Numerical Approaches
Hardware has significant impact on all aspects of simulation strategy
Goal: Ensure that exascale hardware impacts DOE science/engineering mission
Approach: Significant investment in scientific applications well in advance of
exascale machines
3
Portfolio of ECP Applications
Application Categories – Number of Projects:
• Chemistry and Materials – 6
• Energy (generation) – 5
• Earth and Space Sciences – 5
• Data Analytics and Optimization – 4
• National Security – 4
24 Domain Science/Engineering Simulation Projects
50+ separate codes
Well defined, evolving dependencies on ECP
software technology projects
2/3 C++; 1/3 Fortran
Most pure MPI, or MPI+OpenMP at outset
4
What defines an application project?
New physics capabilities – not just faster/bigger versions of existing codes
1. Scientific or Engineering exascale challenge problem.
2. Detailed completion criteria for (1) on exascale platform
3. A Figure of Merit (FOM) improvement > 50x required for project success
5
Quick Flyover of all 21 non-NNSA AD Application Projects
6
Energy Applications
Harden wind plant design
and layout against energy
loss susceptibility; higher
penetration of wind energy
Lead: NREL
DOE EERE
ExaWind: Turbine Wind Plant
Efficiency
Commercial-scale demo of
transformational energy
technologies - curbing CO2
emissions at fossil fuel
power plants by 2030
Lead: NETL
DOE EERE
MFIX-Exa: Scale-up of Clean
Fossil Fuel Combustion
Virtual test reactor for
advanced designs via
experimental-quality
simulations of reactor
behavior
Lead: ORNL
DOE NE
ExaSMR: Design and
Commercialization of Small
Modular Reactors
Combustion-PELE: High-
Efficiency, Low-Emission
Combustion Engine Design
Reduction or elimination
of current cut-and-try
approaches for
combustion system
design
Lead: SNL
DOE BES, EERE
WarpX: Plasma Wakefield
Accelerator Design
Virtual design of 100-stage
1 TeV collider; dramatically
cut accelerator size and
design cost
Lead: LBNL
DOE HEP
Prepare for ITER
experiments and increase
ROI of validation data and
understanding; prepare for
beyond-ITER devices
Lead: PPPL
DOE FES
WDMApp: High-Fidelity Whole
Device Modeling of Magnetically
Confined Fusion Plasmas
7
Chemistry and Materials Applications
ExaAM: Additive Manufacturing of
Qualifiable Metal Parts
Accelerate the widespread
adoption of AM by enabling
routine fabrication of
qualifiable metal parts
Lead: ORNL
DOE NNSA / EERE
GAMESS: Biofuel Catalyst Design
Design more robust and
selective catalysts orders
of magnitude more
efficient at temperatures
hundreds of degrees lower
Lead: Ames
DOE BES
EXAALT: Materials for Extreme
Environments
Simultaneously address
time, length, and accuracy
requirements for predictive
microstructural evolution of
materials
Lead: LANL
DOE BES, FES, NE
QMCPACK: Find, Predict, Control
Materials & Properties at Quantum
Level
Design and optimize next-
generation materials from
first principles with
predictive accuracy
Lead: ORNL
DOE BES
NWChemEx: Catalytic Conversion
of Biomass-Derived Alcohols
Develop new optimal catalysts
while changing the current
design processes that remain
costly, time consuming, and
dominated by trial-and-error
Lead: PNNL
DOE BER, BES
LatticeQCD: Validate Fundamental
Laws of Nature
Correct light quark masses;
properties of light nuclei
from first principles; <1%
uncertainty in simple
quantities
Lead: FNAL
DOE NP, HEP
8
Earth and Space Science Applications
Subsurface: Carbon Capture,
Fossil Fuel Extraction, Waste
Disposal
Reliably guide safe
long-term consequential
decisions about storage,
sequestration, and
exploration
Lead: LBNL
DOE BES, EERE, FE, NE
EQSIM: Earthquake Hazard Risk
Assessment
Replace conservative and
costly earthquake retrofits
with safe purpose-fit
retrofits and designs
Lead: LBNL
DOE NNSA / NE, EERE
Forecast water resources
and severe weather with
increased confidence;
address food supply
changes
Lead: SNL
DOE BER
E3SM-MMF: Accurate Regional
Impact Assessment in Earth
Systems
Unravel key unknowns in
the dynamics of the
Universe: dark energy,
dark matter, and inflation
Lead: ANL
DOE HEP
ExaSky: Cosmological Probe of
the Standard Model of Particle
Physics
ExaStar: Demystify Origin of
Chemical Elements
What is the origin of the
elements? Behavior of
matter at extreme
densities? Sources of
gravity waves?
Lead: LBNL
DOE NP
9
Data Analytics and Optimization Applications
ExaBiome: Metagenomics for
Analysis of Biogeochemical Cycles
Discover knowledge useful
for environmental
remediation and the
manufacture of novel
chemicals and medicines
Lead: LBNL
DOE BER
ExaFEL: Light Source-Enabled
Analysis of Protein and Molecular
Structure and Design
Process data without beam time loss;
determine nanoparticle size
& shape changes; engineer
functional properties in
biology and material science
Lead: SLAC
DOE BES
Optimize power grid
planning, operation,
control and improve
reliability and efficiency
Lead: PNNL
DOE EDER, CESER, EERE
ExaSGD: Reliable and Efficient
Planning of the Power Grid
Develop predictive pre-clinical
models & accelerate diagnostic
and targeted therapy thru
predicting mechanisms of
RAS/RAF driven cancers
Lead: ANL
NIH
CANDLE: Accelerate and Translate
Cancer Research
10
Application Development Milestones
AD: Mapping of
applications to
target exascale
architecture with
machine-specific
performance
analysis including
challenges and
projections.
CD-2/3 Approval
AD: Early results
on pre-exascale
architectures
with analysis of
performance
challenges and
projections.
(Timeline axis: FY18–FY23, quarterly milestone markers)
AD, ST, HI:
Demonstration of
Application
Performance on
Exascale Challenge
Problems
AD: Assess
application
status relative
to challenge
problem
Q4
AD: Results
on early
exascale
hardware
CD-4 Approve
Project Completion
Q2
11
Baseline Platforms, O(10PF): Sequoia (10), Cori (12), Theta (24), Mira (21), Titan (9), Trinity (6)
Current ECP Focus, O(100PF): Trinity (6), Summit (1), NERSC-9 Perlmutter
ECP Target Exascale Platforms, O(1EF): Aurora
12
Current Figure of Merit Improvements on Summit/Sierra
13
Applications Face Common Challenges
1) Flat performance profiles
2) Strong Scaling
3) Understanding/analyzing accelerator performance
4) Choice of programming model
5) Selecting mathematical models that fit architecture
6) Software dependencies
14
Strong scaling on modern high-throughput cores
• High-throughput processors do not perform near their peak in the starvation
limit
– High value of n_1/2 (the number of concurrent tasks needed to reach half of peak rate)
– Require an abundance of fine-grained tasks that are efficiently scheduled on available
resources
– Otten et al., for example, demonstrate that, on Titan, certain problems can run faster by
exploiting the additional granularity afforded by the all-CPU model rather than
using the highly tuned GPU code (albeit with a 2.5x increase in power)
• Important because work is linear in time, so a 50x rate difference has a major
performance impact
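The starvation effect described above can be sketched with the classic Hockney-style throughput model; the peak rate and n_1/2 values below are illustrative assumptions, not measurements from any ECP system:

```python
# Hockney-style throughput model: sustained rate r(n) = r_max * n / (n + n_half),
# where n_half is the number of concurrent tasks needed to reach half of peak.
# r_max and n_half here are illustrative, not measured.

def sustained_rate(n_tasks, r_max, n_half):
    """Sustained throughput when n_tasks fine-grained tasks are in flight."""
    return r_max * n_tasks / (n_tasks + n_half)

r_max = 1.0        # normalized peak rate
n_half = 10_000    # large n_half, typical of throughput-oriented processors

# Efficiency is exactly 0.5 when available parallelism equals n_half ...
print(sustained_rate(n_half, r_max, n_half))   # 0.5
# ... and collapses in the starvation limit of a strong-scaled run:
print(sustained_rate(100, r_max, n_half))      # ~0.0099, about 1% of peak
```

The point of the model is simply that, with a large n_half, a strong-scaled run that leaves too few tasks per processor pays a dramatic rate penalty.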
15
Fast reductions are another key component of strong scaling
• Conjugate Gradient
– If vector reductions are performed in software
• n/P at η = 0.5 is ≥ 8500–12000 for P = 10^6–10^9
– If vector reductions are performed in hardware
• n/P at η = 0.5 is ≥ 1200 for P = 10^6–10^9 !
• Multigrid
– n/P at η = 0.5 is ~10000–20000 on a machine like BG/Q
– 2–4 times faster with hardware support for addition/prefix ops
– Bottom line: enables the same simulation to run faster
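A toy cost model makes the grain-size argument concrete: efficiency reaches 0.5 exactly when local work time equals reduction time, so faster reductions shrink the minimum useful n/P proportionally. All rates and latencies below are assumed for illustration, not the measurements behind the 8500–12000 and 1200 figures:

```python
# Toy cost model for one distributed CG iteration: local work on n/P points
# plus one global reduction. All constants are illustrative assumptions.

def cg_efficiency(grain, flops_per_pt=10.0, flop_rate=1e9, t_reduce=20e-6):
    """Parallel efficiency: local-work time over (local work + reduction)."""
    t_work = grain * flops_per_pt / flop_rate
    return t_work / (t_work + t_reduce)

def grain_at_half_efficiency(flops_per_pt=10.0, flop_rate=1e9, t_reduce=20e-6):
    # eta = 0.5 exactly when local-work time equals the reduction time
    return t_reduce * flop_rate / flops_per_pt

g = grain_at_half_efficiency()      # 2000 points per rank with these numbers
print(g, cg_efficiency(g))          # efficiency 0.5 at this grain size
# A 10x faster (hardware) reduction shrinks the required grain 10x:
print(grain_at_half_efficiency(t_reduce=2e-6))
```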
16
• Instead of solving equation, simulate
individual neutrons directly
• Use known probability distributions
for events (distance to collision,
reaction, etc.)
• Count (or “tally”) the number of
events that occur
• Simulating many (think millions+)
particles gives average behavior
Exemplar: Nuclear Engineering (ExaSMR)
Steve Hamilton (ORNL)
Approach: Monte Carlo Method
Why is this hard on accelerator architectures?
17
History-based Algorithm
Entire life of a particle
history
for each particle do
while particle is alive
calculate next interaction
endwhile
endfor
Thread divergence: not a natural fit for GPUs
One particle at a time
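A minimal sketch of the history-based loop above, with a stand-in physics model (random absorption) rather than ExaSMR's actual interaction sampling. The branchy per-particle while loop is exactly what diverges across GPU threads:

```python
# History-based Monte Carlo sketch: each particle's whole life runs to
# completion before the next begins. Physics is a stand-in (random
# absorption), not ExaSMR's interaction sampling.
import random

def simulate_history(rng, absorb_prob=0.25):
    """Follow one particle until absorption; return its event count (tally)."""
    events = 0
    alive = True
    while alive:                           # "while particle is alive"
        events += 1                        # "calculate next interaction"
        alive = rng.random() > absorb_prob
    return events

rng = random.Random(42)
tallies = [simulate_history(rng) for _ in range(10_000)]  # one particle at a time
mean = sum(tallies) / len(tallies)
print(mean)   # ~4 events per history: 1/absorb_prob for a geometric distribution
```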
18
Get vector of particles
while any particle alive do
for each event type do
for particle ∈ event queue do
Process event
end for
end for
end while
Event-based Algorithm
Data-level parallelism?
• Do one step at a time
• Sort by event type
• Process as SIMD
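A minimal sketch of the event-based loop above, with stand-in physics (random absorption, a single illustrative event type) rather than ExaSMR's actual kernels. All live particles advance one step per pass, grouped into per-event queues so each queue can be processed as a wide, divergence-free kernel:

```python
# Event-based Monte Carlo sketch: one step at a time, particles grouped by
# event type. Event types and physics are illustrative stand-ins.
import random

def simulate_event_based(n_particles, absorb_prob=0.25, seed=0):
    rng = random.Random(seed)
    alive = list(range(n_particles))       # "get vector of particles"
    tallies = [0] * n_particles
    while alive:                           # "while any particle alive"
        queues = {"collision": alive}      # "sort by event type" (one type here)
        survivors = []
        for event, queue in queues.items():
            for p in queue:                # on a GPU: one SIMD kernel per queue
                tallies[p] += 1            # "process event"
                if rng.random() > absorb_prob:
                    survivors.append(p)
        alive = survivors
    return tallies

tallies = simulate_event_based(10_000)
mean = sum(tallies) / len(tallies)
print(mean)   # same ~1/absorb_prob average behavior as the history-based form
```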
19
Algorithmic mapping to hardware – neutron particle transport
• Reduce thread divergence – change from history- to event-based algorithm
• Flatten algorithms to reduce kernel size; smaller kernels = higher occupancy
• Partition events based on fuel and non-fuel regions
• Take advantage of other architectural improvements
20
• Machine-learned MD potential that aims for quantum-chemistry accuracy
• Neighbors of each atom are mapped onto unit sphere in 4D
• Density around each atom is expanded in a basis of 4D hyperspherical harmonics
• Bispectrum components of the 4D hyperspherical harmonic expansion are used as the geometric
descriptors of the local environment
• Preserves universal physical symmetries
• Invariant to rotation, translation, permutation
• Size-consistent
• SNAP uses linear regression to fit coefficients to DFT data
(θ0, θ, φ) = ( θ0_max · r/r_cut , cos⁻¹(z/r) , tan⁻¹(y/x) )
Exemplar: SNAP Potential (Danny Perez, LANL)
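The neighbor-to-3-sphere map above can be sketched directly. The theta0_max value and sample coordinates below are illustrative choices, not SNAP's production settings, and the quadrant-aware atan2 stands in for the slide's tan⁻¹(y/x):

```python
# Sketch of the 4D mapping: a neighbor at (x, y, z) inside the cutoff
# becomes angles (theta0, theta, phi) on a 3-sphere in 4D.
import math

def to_4d_sphere(x, y, z, r_cut, theta0_max=0.99 * math.pi):
    r = math.sqrt(x * x + y * y + z * z)
    assert 0.0 < r <= r_cut, "neighbor must lie inside the cutoff sphere"
    theta0 = theta0_max * r / r_cut   # radial distance -> extra polar angle
    theta = math.acos(z / r)          # usual spherical polar angle
    phi = math.atan2(y, x)            # azimuthal angle, quadrant-aware
    return theta0, theta, phi

# A neighbor on the +z axis at half the cutoff radius:
theta0, theta, phi = to_4d_sphere(0.0, 0.0, 1.0, r_cut=2.0)
print(theta0, theta, phi)   # theta0 = theta0_max / 2; theta = phi = 0
```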
21
SNAP GPU Performance Over Time
(Chart annotation: OLCF GPU Hackathon)
22
SNAP Performance Improvements
• Aidan Thompson (Sandia) extracted the SNAP CPU code from LAMMPS →
TestSNAP, a stand-alone (realistic) force kernel that includes a correctness check
• Building on an idea from Nick Lubbers (LANL), Aidan made algorithmic improvements that
reduced FLOP count and eliminated some intermediate storage → ~2x
speedup on CPUs
• Aidan reduced memory use by collapsing multidimensional arrays into
compact lists
• Rahul Gayatri (NERSC):
1. broke the one monster kernel into many smaller kernels, which reduces register pressure and allows tailoring
launch parameters for each kernel, but increases memory use
2. inverted loops and changed data layouts to improve memory access
• Also had help from Sarah Anderson (Cray) and Evan Weinberg (NVIDIA)
• These improvements were ported to Kokkos SNAP in LAMMPS by Stan
Moore
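The array-collapsing idea can be illustrated with a hypothetical triangular index set (valid only for j2 ≤ j1), a stand-in for SNAP's actual index structure: precompute the list of valid index pairs once, then store values in a flat array with no padding.

```python
# Collapsing a padded multidimensional array into a compact list: a dense
# J x J array would waste nearly half its storage when only j2 <= j1 is
# valid. The triangular index set is a hypothetical stand-in for SNAP's.

J = 8
# Precompute the compact list of valid (j1, j2) pairs once ...
pairs = [(j1, j2) for j1 in range(J) for j2 in range(j1 + 1)]
index = {p: k for k, p in enumerate(pairs)}     # (j1, j2) -> flat slot
values = [0.0] * len(pairs)                     # flat storage, no padding

values[index[(5, 3)]] = 1.25                    # same logical access, compact layout

print(len(pairs), J * J)   # 36 slots instead of 64
```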
23
EXAALT FOM/KPP Projection for Summit
• Mira (IBM BG/Q) FOM baseline: 0.182 Katom-steps/s/node * 49152 Mira nodes
• 2018 LAMMPS performance on Summit: 33.7 Katom-steps/s/node * 4608 Summit nodes:
projected 17.4x faster than Mira baseline
• New LAMMPS performance on Summit: 175.1 Katom-steps/s/node * 4608 Summit nodes:
projected 90.2x faster than Mira baseline
• Recently ported energy minimization in LAMMPS to Kokkos, which is needed by ParSplice
• Danny Perez (LANL) planning to validate these projections with large-scale Summit run soon
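The projections above are simple products and ratios; this reproduces the slide's arithmetic (FOM here is Katom-steps/s summed over all nodes, and the speedup is the ratio to the Mira baseline):

```python
# Reproducing the slide's EXAALT FOM arithmetic.

mira_baseline = 0.182 * 49152    # Katom-steps/s/node * Mira nodes
summit_2018   = 33.7  * 4608     # Katom-steps/s/node * Summit nodes
summit_new    = 175.1 * 4608

print(round(summit_2018 / mira_baseline, 1))   # 17.4x
print(round(summit_new  / mira_baseline, 1))   # 90.2x
```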
24
Overall …
• ECP is a very difficult project with many moving parts (specialized node
architectures, system software, programming models, application-level libraries,
etc.) enabling ambitious science and performance goals.
• Early adoption of intermediate (100PF) systems, test hardware, and hardware
simulators is critical to lowering risk by enabling progress tracking and early
identification of issues.
• Surprisingly good progress to date; we need to continue to push early adoption of
exascale-class hardware and ensure a proper balance of domain expertise and
performance engineering. Facilities engagement programs are critical to
achieving this.