What is Bayesian Optimization?
Bayesian optimization is a sequential design strategy for global optimization. Many workflows require you to find a powerful set of parameters to solve a problem. The challenge is finding those parameters robustly in as little time as possible.
Applied to Computational Chemistry: the BOA-accelerated workflow uses 1/3 of the calculations to achieve a four-order-of-magnitude increase in resolution.
Applied to Engineering Design: BOA performed in 19 hours and ~30 simulations what an expert designer would do in 3 weeks.
Applied to Drug Discovery: brute-force screening methods require 20,000 experiments; the BOA-accelerated method required ~200.
IBM BOA gets HPC clients to their designs faster than any other method.
Blackbox function optimization
• No derivatives f'(x).
• No analytic form.
• Possibly multiple minima.
• Possibly noisy data.
• Expensive to calculate.
• Grid search – exhaustive
• Random search – luck
• Simulated annealing – large number of evaluations
• Gradient descent – requires derivatives
• Genetic algorithm – hard to tune
Bayesian Optimization
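To make the contrast concrete, here is a minimal random-search baseline on a hypothetical black-box objective (the function `black_box` is an illustrative stand-in, not from the deck). Bayesian optimization replaces the blind sampling below with a surrogate-guided choice of the next point:

```python
# A minimal random-search baseline on a hypothetical black-box objective.
# Every candidate is drawn blindly; no information from past evaluations
# is reused, which is exactly what Bayesian optimization improves on.
import numpy as np

def black_box(x):                     # hypothetical expensive function
    return np.sin(3 * x) + 0.1 * x**2

rng = np.random.default_rng(0)
best_x, best_y = None, float("inf")
for _ in range(100):                  # 100 expensive evaluations
    x = rng.uniform(-2, 2)
    y = black_box(x)
    if y < best_y:
        best_x, best_y = x, y
print(best_x, best_y)
```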
The black box can be an air flow simulation around a car, an FPGA synthesizer, a reservoir simulation, etc., running on high-performance computing infrastructure (x86/Power Systems/Cloud, etc.) such as a GPU-accelerated Power server (IBM AC922). BOA sends parameter values to the black box and receives back a result (chip power consumption, for example).
Bayesian optimization
1. Pick a surrogate that represents your prior belief about black-box behavior.
2. Define an acquisition function over the surrogate.
3. Repeat (see the sketch below):
• Select the parameter values by optimizing the acquisition function.
• Evaluate the black box for those parameter values.
• Update the surrogate through posterior inference.
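A minimal sketch of this loop, assuming a Gaussian-process surrogate built with scikit-learn and a Probability-of-Improvement acquisition (the objective `black_box` and all hyperparameters are illustrative assumptions, not BOA internals):

```python
# Minimal Bayesian-optimization loop: GP surrogate + PI acquisition.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

def black_box(x):                      # hypothetical expensive simulation
    return np.sin(3 * x) + 0.1 * x**2

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(5, 1))    # initial design (random here)
y = black_box(X).ravel()

kernel = Matern(nu=2.5) + WhiteKernel()        # Matern52 + noise kernel
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)

for _ in range(25):
    gp.fit(X, y)                               # posterior update of surrogate
    cand = np.linspace(-2, 2, 500).reshape(-1, 1)
    mu, sigma = gp.predict(cand, return_std=True)
    eps = 0.01                                 # small exploration margin
    gamma = (y.min() - mu - eps) / np.maximum(sigma, 1e-9)
    x_next = cand[np.argmax(norm.cdf(gamma))]  # maximize PI
    X = np.vstack([X, [x_next]])               # evaluate black box, append
    y = np.append(y, black_box(x_next))

print("best:", X[np.argmin(y)], y.min())
```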
NEW: Interface Functions
In a traditional HPC workflow, input data goes into the solver (SOLVE) and output data comes back. BOA wraps this workflow with interface functions: an interface function (in) that turns BOA's proposed parameter values into solver input data, and an interface function (out) that turns the solver's output data into the objective value. A scheduler coordinates the runs. The objective function and the interface functions are user defined and unique to each client.
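A hypothetical illustration of what such user-defined functions could look like; the function names, file formats, and solver invocation below are assumptions for the sketch, not the BOA API:

```python
# Hypothetical user-defined interface functions wrapping an HPC solver.
import json
import subprocess

def interface_in(params):
    """Write BOA-proposed parameter values into the solver's input file."""
    with open("solver_input.json", "w") as f:
        json.dump({"clock_mhz": params[0], "voltage": params[1]}, f)

def interface_out():
    """Parse the solver's output and return the objective value."""
    with open("solver_output.json") as f:
        return json.load(f)["chip_power_w"]   # e.g. chip power consumption

def objective(params):
    """Objective function BOA evaluates: in -> SOLVE -> out."""
    interface_in(params)
    subprocess.run(["./solver", "solver_input.json"], check=True)  # hypothetical solver
    return interface_out()
```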
IBM BOA Differentiation
• Dimensionality mitigation based on compressive
sensing
• Smart initialization
• Novel acquisition functions
• Explainability
• Ease of use through software abstractions
BOA Settings
"model": {
    "gaussian_process": {
        "kernel_func": "Matern52",
        "scale_y": True,
        "scale_x": False,
        "noise_kernel": True,
        "use_scikit": False,
        "optimizer": "LBFGS"
    }
}
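As a rough interpretation (an assumption, not BOA's internal implementation), these settings correspond to a Gaussian-process surrogate along these lines in scikit-learn terms:

```python
# Rough scikit-learn analogue of the settings above (illustrative only).
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

kernel = Matern(nu=2.5)          # "kernel_func": "Matern52"
kernel = kernel + WhiteKernel()  # "noise_kernel": True
gp = GaussianProcessRegressor(
    kernel=kernel,
    normalize_y=True,            # "scale_y": True
    optimizer="fmin_l_bfgs_b",   # "optimizer": "LBFGS"
)
```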
Explore-Exploit Trade-Off
• Exploration – prefers to acquire new knowledge.
• Exploitation – prefers to lean heavily on what is already known to drive optimization.
• Need to strike a balance – too much exploration is inefficient, too much exploitation can result in poor performance.
[Figure: PI acquisition under pure exploitation (ε = 0) vs. exploration (ε = 0.2)]
Probability of Improvement
Probability of Improvement does not consider the amount by which an improvement occurs, just that an improvement occurs.

Denote the improvement as

$$\gamma(x) = \frac{f(x_{\mathrm{best}}) - \mu(x)}{\sigma(x)}$$

Probability of Improvement (PI) (Kushner):

$$\alpha_{\mathrm{PI}}(x) = \Phi(\gamma(x))$$

Can we tweak this to tell the algorithm what constitutes a significant improvement? Adding a trade-off parameter $\varepsilon$ does exactly that:

$$\gamma(x) = \frac{f(x_{\mathrm{best}}) - \left(\mu(x) + \varepsilon\right)}{\sigma(x)}$$
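A small numerical sketch of PI with the ε knob (the helper below is illustrative, assuming a minimization problem and a Gaussian posterior mean/std from the surrogate):

```python
# Probability of Improvement with an exploration knob epsilon,
# assuming minimization and a Gaussian posterior (mu, sigma).
from scipy.stats import norm

def probability_of_improvement(mu, sigma, f_best, epsilon=0.0):
    gamma = (f_best - (mu + epsilon)) / max(sigma, 1e-12)
    return norm.cdf(gamma)

# epsilon = 0 exploits; a larger epsilon demands a bigger improvement,
# pushing the search toward uncertain (exploratory) regions.
print(probability_of_improvement(mu=0.9, sigma=0.1, f_best=1.0, epsilon=0.0))  # ~0.84
print(probability_of_improvement(mu=0.9, sigma=0.1, f_best=1.0, epsilon=0.2))  # ~0.16
```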
"sampling_function": {
"type": "adaptive_expected_improvement",
"epsilon": 0.03,
"optimize_acq": False,
"outlier": False,
"bounds": None,
"scale": False }