11institutetext: David J. Gardner and Carol S. Woodward 22institutetext: Lawrence Livermore National Laboratory, Center for Applied Scientific Computing, Livermore, CA, USA, 22email: gardner48, woodward6@llnl.gov 33institutetext: Lynda L. LoDestro 44institutetext: Lawrence Livermore National Laboratory, Fusion Energy Sciences Program, Livermore, CA, USA, 44email: lodestro1@llnl.gov

Towards the Use of Anderson Acceleration in Coupled Transport-Gyrokinetic Turbulence Simulations

David J. Gardner\orcidID0000-0002-7993-8282    Lynda L. LoDestro\orcidID0000-0001-6503-4189    Carol S. Woodward\orcidID0000-0002-6502-8659
Abstract

Predicting the behavior of a magnetically confined fusion plasma over long time periods requires methods that can bridge the difference between transport and turbulent time scales. The nonlinear transport solver, Tango, enables simulations of very long times, in particular to steady state, by advancing each process independently with different time step sizes and couples them through a relaxed iteration scheme. We examine the use of Anderson Acceleration (AA) to reduce the total number of coupling iterations required by interfacing Tango with the AA implementation, including several extensions to AA, provided by the KINSOL nonlinear solver package in SUNDIALS. The ability to easily enable and adjust algorithmic options through KINSOL allows for rapid experimentation to evaluate different approaches with minimal effort. Additionally, we leverage the GPTune library to automate the optimization of algorithmic parameters within KINSOL. We show that AA can enable faster convergence in stiff and very stiff tests cases without noise present and in all cases, including with noisy fluxes, increases robustness and reduces sensitivity to the choice of relaxation strength.

1 Introduction

Accurately simulating turbulent transport is a critical aspect of modeling magnetically confined fusion plasmas. Turbulent fluxes have a strong nonlinear dependence on the gradients of averaged fields and are the primary mechanism for transporting energy from the plasma interior. A major challenge in simulations of very long times, in particular steady state, is the significant difference in characteristic time scales for transport (seconds) and turbulence (microseconds) in magnetic-fusion-energy burning plasmas post2004report . This challenge is further complicated by the presence of noise in the averaged transport fluxes arising from turbulent simulations.

Evolving both processes at the same time step is prohibitively expensive; however, as the dynamics on each time scale are well separated, each process can be advanced independently at its own rate and coupled through an iteration scheme to bridge the scale gap. Turbulence model evaluations are the dominant computational expense, and the speed at which the coupling iteration converges determines the number of turbulence model evaluations. In this work, we aim to reduce the total number of coupling iterations necessary and thus lower the overall simulation cost by accelerating the convergence of the iterative method. To this end, we examine the use of Anderson Acceleration (AA) and extensions to the method through the KINSOL nonlinear solver package from the SUNDIALS library.

The remainder of this paper is organized as follows. In Section 2, we overview turbulent transport simulations for magnetically confined fusion plasmas and the coupling method applied. We introduce the AA algorithm and detail the various extensions we consider to improve performance in Section 3. In Section 4 we overview the Tango code for coupled simulations and its interfacing with the AA implementation from KINSOL as well as the use of GPTune to optimize parameter selection. In Section 5 we discuss results from numerical simulations with Tango in both stiff and very stiff settings along with the impact of noisy fluxes on performance. Finally, in Section 6, we provide concluding remarks and directions for future investigation.

2 Coupling Transport & Turbulence

Simulating turbulence-driven transport in toroidal magnetic-fusion-energy plasmas with near-first-principles accuracy requires solving a set of gyrokinetic equations describing the distribution of different species in a 5D phase space. 5D turbulence due to fast time-scale instabilities drives transport of averaged fields (i.e., density, momentum, and energy, which are constant on magnetic flux surfaces and thus spatially 1D in the magnetic flux coordinate) across the magnetic surfaces confining the plasma. Because the transport time scales are much slower than the turbulence time scales, a formal separation-of-scales approach is possible, permitting a multiscale computational solution which proceeds by coupling separate equation sets for the fast and slow scales shestakov2003self . For this work, we utilize a simplified model for a single species to evaluate algorithms for turbulent transport without the expense of 5D simulation or the complexities of the toroidal geometry. Consider the transport equation,

pt+qx=S,𝑝𝑡𝑞𝑥𝑆\frac{\partial p}{\partial t}+\frac{\partial q}{\partial x}=S,divide start_ARG ∂ italic_p end_ARG start_ARG ∂ italic_t end_ARG + divide start_ARG ∂ italic_q end_ARG start_ARG ∂ italic_x end_ARG = italic_S , (1)

where p(t,x)𝑝𝑡𝑥p(t,x)italic_p ( italic_t , italic_x ) is the plasma profile, q(p(t,x))𝑞𝑝𝑡𝑥q(p(t,x))italic_q ( italic_p ( italic_t , italic_x ) ) is the average turbulent flux, and S(t,x)𝑆𝑡𝑥S(t,x)italic_S ( italic_t , italic_x ) includes all local sources. Because the turbulence-driven transport is stiff, an implicit time-advance is required. Discretizing (1) in space with finite differences and in time with backward Euler, we obtain the implicit system,

p(n+1)+H𝒟xq(n+1)=p(n)+HS(n+1),superscript𝑝𝑛1𝐻subscript𝒟𝑥superscript𝑞𝑛1superscript𝑝𝑛𝐻superscript𝑆𝑛1p^{(n+1)}+H\mathcal{D}_{x}q^{(n+1)}=p^{(n)}+HS^{(n+1)},italic_p start_POSTSUPERSCRIPT ( italic_n + 1 ) end_POSTSUPERSCRIPT + italic_H caligraphic_D start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT italic_q start_POSTSUPERSCRIPT ( italic_n + 1 ) end_POSTSUPERSCRIPT = italic_p start_POSTSUPERSCRIPT ( italic_n ) end_POSTSUPERSCRIPT + italic_H italic_S start_POSTSUPERSCRIPT ( italic_n + 1 ) end_POSTSUPERSCRIPT , (2)

for a time step from t(n)superscript𝑡𝑛t^{(n)}italic_t start_POSTSUPERSCRIPT ( italic_n ) end_POSTSUPERSCRIPT to t(n+1)superscript𝑡𝑛1t^{(n+1)}italic_t start_POSTSUPERSCRIPT ( italic_n + 1 ) end_POSTSUPERSCRIPT with step size H𝐻Hitalic_H where superscripts denote the time at which a quantity is approximated, and 𝒟xsubscript𝒟𝑥\mathcal{D}_{x}caligraphic_D start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT is the difference operator for the finite difference method. Using the LoDestro method shestakov2003self we solve (2) with the iteration,

pk+1H𝒟x(Dk𝒟xpk+1ckpk+1)=p(n)+HS(n+1),subscript𝑝𝑘1𝐻subscript𝒟𝑥subscript𝐷𝑘subscript𝒟𝑥subscript𝑝𝑘1subscript𝑐𝑘subscript𝑝𝑘1superscript𝑝𝑛𝐻superscript𝑆𝑛1p_{k+1}-H\mathcal{D}_{x}(D_{k}\mathcal{D}_{x}p_{k+1}-c_{k}p_{k+1})=p^{(n)}+HS^% {(n+1)},italic_p start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_H caligraphic_D start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT ( italic_D start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT caligraphic_D start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_c start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ) = italic_p start_POSTSUPERSCRIPT ( italic_n ) end_POSTSUPERSCRIPT + italic_H italic_S start_POSTSUPERSCRIPT ( italic_n + 1 ) end_POSTSUPERSCRIPT , (3)

where pk+1subscript𝑝𝑘1p_{k+1}italic_p start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT is the new iterate for p(n+1)superscript𝑝𝑛1p^{(n+1)}italic_p start_POSTSUPERSCRIPT ( italic_n + 1 ) end_POSTSUPERSCRIPT and q(n+1)superscript𝑞𝑛1q^{(n+1)}italic_q start_POSTSUPERSCRIPT ( italic_n + 1 ) end_POSTSUPERSCRIPT is split into diffusive and convective terms, with coefficients Dk=θkq(pk)/𝒟xpksubscript𝐷𝑘subscript𝜃𝑘𝑞subscript𝑝𝑘subscript𝒟𝑥subscript𝑝𝑘D_{k}=-\theta_{k}q(p_{k})/\mathcal{D}_{x}p_{k}italic_D start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = - italic_θ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_q ( italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) / caligraphic_D start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT and ck=(1θk)q(pk)/pksubscript𝑐𝑘1subscript𝜃𝑘𝑞subscript𝑝𝑘subscript𝑝𝑘c_{k}=(1-\theta_{k})q(p_{k})/p_{k}italic_c start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = ( 1 - italic_θ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) italic_q ( italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) / italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT respectively. The parameter θksubscript𝜃𝑘\theta_{k}italic_θ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT controls the diffusive-convective split at each mesh point. The purpose of the split is to keep D𝐷Ditalic_D positive (assuring a parabolic linear equation (2)) and within bounds. Without some such scheme, profile maxima in the course of iterating would cause violent spatial swings in D𝐷Ditalic_D, and converged solutions with the flux, q𝑞qitalic_q, running uphill (anti-diffusive) would be unobtainable. The value of θ𝜃\thetaitalic_θ may be chosen in different ways shestakov2003self ; parker2018bringing . Here we use the default strategy in Tango,

θk={0whereD^k<Dmin12(1+DmaxD^kDmaxDmin)whereDminD^kDmaxand|𝒟xpk|<1012whereD^k>Dmaxand|𝒟xpk|<10,subscript𝜃𝑘cases0wheresubscript^𝐷𝑘subscript𝐷min121subscript𝐷maxsubscript^𝐷𝑘subscript𝐷maxsubscript𝐷minwheresubscript𝐷minsubscript^𝐷𝑘subscript𝐷maxandsubscript𝒟𝑥subscript𝑝𝑘1012wheresubscript^𝐷𝑘subscript𝐷maxandsubscript𝒟𝑥subscript𝑝𝑘10\theta_{k}=\begin{cases}0&\text{where}\quad\hat{D}_{k}<D_{\text{min}}\\ \frac{1}{2}\left(1+\frac{D_{\text{max}}-\hat{D}_{k}}{D_{\text{max}}-D_{\text{% min}}}\right)&\text{where}\quad D_{\text{min}}\leq\hat{D}_{k}\leq D_{\text{max% }}\;\text{and}\;|\mathcal{D}_{x}p_{k}|<10\\ \frac{1}{2}&\text{where}\quad\hat{D}_{k}>D_{\text{max}}\;\text{and}\;|\mathcal% {D}_{x}p_{k}|<10\end{cases},italic_θ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = { start_ROW start_CELL 0 end_CELL start_CELL where over^ start_ARG italic_D end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT < italic_D start_POSTSUBSCRIPT min end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL divide start_ARG 1 end_ARG start_ARG 2 end_ARG ( 1 + divide start_ARG italic_D start_POSTSUBSCRIPT max end_POSTSUBSCRIPT - over^ start_ARG italic_D end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG start_ARG italic_D start_POSTSUBSCRIPT max end_POSTSUBSCRIPT - italic_D start_POSTSUBSCRIPT min end_POSTSUBSCRIPT end_ARG ) end_CELL start_CELL where italic_D start_POSTSUBSCRIPT min end_POSTSUBSCRIPT ≤ over^ start_ARG italic_D end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ≤ italic_D start_POSTSUBSCRIPT max end_POSTSUBSCRIPT and | caligraphic_D start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | < 10 end_CELL end_ROW start_ROW start_CELL divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_CELL start_CELL where over^ start_ARG italic_D end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT > italic_D start_POSTSUBSCRIPT max end_POSTSUBSCRIPT and | caligraphic_D start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | < 10 end_CELL end_ROW , (4)

with D^k=q(pk)/𝒟xpksubscript^𝐷𝑘𝑞subscript𝑝𝑘subscript𝒟𝑥subscript𝑝𝑘\hat{D}_{k}=-q(p_{k})/\mathcal{D}_{x}p_{k}over^ start_ARG italic_D end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = - italic_q ( italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) / caligraphic_D start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, Dmin=105subscript𝐷minsuperscript105D_{\text{min}}=10^{-5}italic_D start_POSTSUBSCRIPT min end_POSTSUBSCRIPT = 10 start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT, and Dmax=1013subscript𝐷maxsuperscript1013D_{\text{max}}=10^{13}italic_D start_POSTSUBSCRIPT max end_POSTSUBSCRIPT = 10 start_POSTSUPERSCRIPT 13 end_POSTSUPERSCRIPT. Observe that if the iteration converges, the fluxes exactly sum to q𝑞qitalic_q. Obtaining the new profile in (3) requires solving the linear system,

Mkpk+1=b,subscript𝑀𝑘subscript𝑝𝑘1𝑏M_{k}p_{k+1}=b,italic_M start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = italic_b , (5)

within each iteration, where Mk=IH𝒟x(Dk𝒟xck)subscript𝑀𝑘𝐼𝐻subscript𝒟𝑥subscript𝐷𝑘subscript𝒟𝑥subscript𝑐𝑘M_{k}=I-H\mathcal{D}_{x}(D_{k}\mathcal{D}_{x}-c_{k})italic_M start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_I - italic_H caligraphic_D start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT ( italic_D start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT caligraphic_D start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT - italic_c start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) and b=p(n)+HS(n+1)𝑏superscript𝑝𝑛𝐻superscript𝑆𝑛1b=p^{(n)}+HS^{(n+1)}italic_b = italic_p start_POSTSUPERSCRIPT ( italic_n ) end_POSTSUPERSCRIPT + italic_H italic_S start_POSTSUPERSCRIPT ( italic_n + 1 ) end_POSTSUPERSCRIPT. In practice, relaxation of the iteration is required to ensure convergence; i.e., pk+1=βpk+1+(1β)pk1subscript𝑝𝑘1𝛽subscript𝑝𝑘11𝛽subscript𝑝𝑘1p_{k+1}=\beta p_{k+1}+(1-\beta)p_{k-1}italic_p start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = italic_β italic_p start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT + ( 1 - italic_β ) italic_p start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT. The overall method is outlined in Algorithm 1.

Algorithm 1 LoDestro Method

Input: p0subscript𝑝0p_{0}italic_p start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, β𝛽\betaitalic_β, and tol       Output: psuperscript𝑝p^{*}italic_p start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT

1:for k=0,1,,kmax𝑘01subscript𝑘maxk=0,1,\ldots,k_{\rm{max}}italic_k = 0 , 1 , … , italic_k start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT do
2:     Compute the flux qk(pk)subscript𝑞𝑘subscript𝑝𝑘q_{k}(p_{k})italic_q start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) and split it into Dksubscript𝐷𝑘D_{k}italic_D start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT and cksubscript𝑐𝑘c_{k}italic_c start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT
3:     Construct Mksubscript𝑀𝑘M_{k}italic_M start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT and solve Mkpk+1=bsubscript𝑀𝑘subscript𝑝𝑘1𝑏M_{k}p_{k+1}=bitalic_M start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = italic_b
4:     Relax the profile pk+1=βpk+1+(1β)pksubscript𝑝𝑘1𝛽subscript𝑝𝑘11𝛽subscript𝑝𝑘p_{k+1}=\beta p_{k+1}+(1-\beta)p_{k}italic_p start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = italic_β italic_p start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT + ( 1 - italic_β ) italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT
5:     if pk+1pk<tolnormsubscript𝑝𝑘1subscript𝑝𝑘tol\|p_{k+1}-p_{k}\|<\text{tol}∥ italic_p start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ < tol return pk+1subscript𝑝𝑘1p_{k+1}italic_p start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT as psuperscript𝑝p^{*}italic_p start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT
6:end for
Refer to caption
Figure 1: A diagram illustrating the coupling between the transport and turbulence models in Algorithm 1. The transport code (1D) uses a large implicit time step that requires iterating to convergence. During this iteration, the turbulence model (5D for physics runs; 1D in this paper) advances with a small explicit time step. The two models periodically exchange fluxes and profiles until the iteration has converged.

As illustrated in Fig. 1, the transport iteration and the turbulent simulation are run concurrently and periodically exchange information with the transport code providing profiles to the turbulent code and receiving fluxes. Concurrent running minimizes the number of time steps taken by the 5D code per iteration. The turbulent fluxes are obtained from a 5D global gyrokinetic code running with an explicit step size hHmuch-less-than𝐻h\ll Hitalic_h ≪ italic_H. Evolving the turbulence model requires far more computational resources and expense than solving the linear system (5) in the transport iteration. As such, minimizing the number of iterations for convergence is critical to overall performance.

3 Anderson Acceleration

The LoDestro method can be viewed as a damped fixed-point iteration given by

pk+1=βG(pk)+(1β)pk,subscript𝑝𝑘1𝛽𝐺subscript𝑝𝑘1𝛽subscript𝑝𝑘p_{k+1}=\beta G(p_{k})+(1-\beta)p_{k},italic_p start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = italic_β italic_G ( italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) + ( 1 - italic_β ) italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , (6)

where G(p)𝐺𝑝G(p)italic_G ( italic_p ) comprises steps 2 and 3 in Algorithm 1. This iteration may converge quite slowly (β1much-less-than𝛽1\beta\ll 1italic_β ≪ 1 may be required for stability of the iterations), and, thus, we are interested in methods for accelerating convergence. One such method is Anderson Acceleration (AA) anderson1965iterative ; kelley2018numerical which computes an iteration update that approximately minimizes the linearized fixed-point residual over a subspace constructed from prior iterations. In a linear setting and under certain conditions, AA is essentially equivalent to the generalized minimal residual (GMRES) method walker2011anderson , and, with a sufficiently good initial iterate, AA performs no worse than a fixed-point iteration toth2015convergence .

The AA algorithm with fixed damping (β𝛽\betaitalic_β) and subspace size, or depth (m𝑚mitalic_m), is given in Algorithm 2. Following walker2011anderson , the optimization problem in step 6 for the weights, γkmksubscript𝛾𝑘superscriptsubscript𝑚𝑘\gamma_{k}\in\mathbb{R}^{m_{k}}italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_m start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUPERSCRIPT, to compute the new iterate is presented as an unconstrained linear least-squares problem. Utilizing this formulation, rather than the constrained form of the problem, allows for an efficient solution via a QR factorization of ksubscript𝑘\mathcal{F}_{k}caligraphic_F start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT i.e., Rkγk=QkTfksubscript𝑅𝑘subscript𝛾𝑘subscriptsuperscript𝑄𝑇𝑘subscript𝑓𝑘R_{k}\gamma_{k}=Q^{T}_{k}f_{k}italic_R start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_Q start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT where QkN×mksubscript𝑄𝑘superscript𝑁subscript𝑚𝑘Q_{k}\in\mathbb{R}^{N\times m_{k}}italic_Q start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_N × italic_m start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUPERSCRIPT is an orthogonal matrix, Rkmk×Nsubscript𝑅𝑘superscriptsubscript𝑚𝑘𝑁R_{k}\in\mathbb{R}^{m_{k}\times N}italic_R start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_m start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT × italic_N end_POSTSUPERSCRIPT is an upper triangular matrix, and fkNsubscript𝑓𝑘superscript𝑁f_{k}\in\mathbb{R}^{N}italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT is the nonlinear residual, G(pk)pk𝐺subscript𝑝𝑘subscript𝑝𝑘G(p_{k})-p_{k}italic_G ( italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) - italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT. Updating the factorization and solving the linear system as the residual history changes is typically a minimal cost compared to the function evaluation loffeld2016considerations ; lockhart2022performance and thus not a major contributor to the overall iteration run time.

Algorithm 2 Anderson Acceleration

Input: p0subscript𝑝0p_{0}italic_p start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, mmaxsubscript𝑚maxm_{\rm{max}}italic_m start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT, kmaxsubscript𝑘maxk_{\rm{max}}italic_k start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT, tol, and β𝛽\betaitalic_β       Output: psuperscript𝑝p^{*}italic_p start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT

1:Set p1=βG(p0)+(1β)p0subscript𝑝1𝛽𝐺subscript𝑝01𝛽subscript𝑝0p_{1}=\beta G(p_{0})+(1-\beta)p_{0}italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = italic_β italic_G ( italic_p start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) + ( 1 - italic_β ) italic_p start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT
2:for k=1,,kmax𝑘1subscript𝑘maxk=1,\ldots,k_{\rm{max}}italic_k = 1 , … , italic_k start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT do
3:     Set mk=min{k,mmax}subscript𝑚𝑘𝑘subscript𝑚maxm_{k}=\min\{k,m_{\rm{max}}\}italic_m start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = roman_min { italic_k , italic_m start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT }
4:     Set 𝒢k=[ΔGkmk,,ΔGk1]subscript𝒢𝑘Δsubscript𝐺𝑘subscript𝑚𝑘Δsubscript𝐺𝑘1\mathcal{G}_{k}=[\Delta G_{k-m_{k}},\ldots,\Delta G_{k-1}]caligraphic_G start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = [ roman_Δ italic_G start_POSTSUBSCRIPT italic_k - italic_m start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT , … , roman_Δ italic_G start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ] where ΔGi=G(pi+1)G(pi)Δsubscript𝐺𝑖𝐺subscript𝑝𝑖1𝐺subscript𝑝𝑖\Delta G_{i}=G(p_{i+1})-G(p_{i})roman_Δ italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_G ( italic_p start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT ) - italic_G ( italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT )
5:     Set k=[Δfkmk,,Δfk1]subscript𝑘Δsubscript𝑓𝑘subscript𝑚𝑘Δsubscript𝑓𝑘1\mathcal{F}_{k}=[\Delta f_{k-m_{k}},\ldots,\Delta f_{k-1}]caligraphic_F start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = [ roman_Δ italic_f start_POSTSUBSCRIPT italic_k - italic_m start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT , … , roman_Δ italic_f start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ] where Δfi=fi+1fiΔsubscript𝑓𝑖subscript𝑓𝑖1subscript𝑓𝑖\Delta f_{i}=f_{i+1}-f_{i}roman_Δ italic_f start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_f start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT - italic_f start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and fi=G(pi)pisubscript𝑓𝑖𝐺subscript𝑝𝑖subscript𝑝𝑖f_{i}=G(p_{i})-p_{i}italic_f start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_G ( italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) - italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT
6:     Determine γksubscript𝛾𝑘\gamma_{k}italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT that minimizes fkkγk2subscriptnormsubscript𝑓𝑘subscript𝑘subscript𝛾𝑘2\|f_{k}-\mathcal{F}_{k}\gamma_{k}\|_{2}∥ italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - caligraphic_F start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT
7:     Set pk+1=G(pk)𝒢kγk(1β)(fkQkRkγk)subscript𝑝𝑘1𝐺subscript𝑝𝑘subscript𝒢𝑘subscript𝛾𝑘1𝛽subscript𝑓𝑘subscript𝑄𝑘subscript𝑅𝑘subscript𝛾𝑘p_{k+1}=G(p_{k})-\mathcal{G}_{k}\gamma_{k}-(1-\beta)(f_{k}-Q_{k}R_{k}\gamma_{k})italic_p start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = italic_G ( italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) - caligraphic_G start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - ( 1 - italic_β ) ( italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_Q start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT )
8:     if pk+1pi<tolnormsubscript𝑝𝑘1subscript𝑝𝑖tol\|p_{k+1}-p_{i}\|<\text{tol}∥ italic_p start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∥ < tol return pk+1subscript𝑝𝑘1p_{k+1}italic_p start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT as psuperscript𝑝p^{*}italic_p start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT
9:end for

In this work, we consider several extensions to AA, including delaying the start of acceleration as well as non-stationary variants with variable damping and/or depth (subspace size), leading to Algorithm 3. When the initial iterate is far from the fixed-point, delaying acceleration can be advantageous in order to obtain a better initial iterate before enabling AA. Different approaches to adaptive damping in evans2020proof and chen2022non demonstrated significant improvements in convergence when compared to fixed damping values. The method in chen2022non computes an “optimal” damping value by solving a minimization problem at the cost of two additional function evaluations. To avoid extra evaluations, we use the heuristic approach from evans2020proof with the damping value

βk=0.9ωβΓkwithΓk=1(QTfk/fk)2,formulae-sequencesubscript𝛽𝑘0.9subscript𝜔𝛽subscriptΓ𝑘withsubscriptΓ𝑘1superscriptnormsuperscript𝑄𝑇subscript𝑓𝑘normsubscript𝑓𝑘2\beta_{k}=0.9-\omega_{\beta}\Gamma_{k}\quad\mathrm{with}\quad\Gamma_{k}=\sqrt{% 1-(\|Q^{T}f_{k}\|/\|f_{k}\|)^{2}},italic_β start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = 0.9 - italic_ω start_POSTSUBSCRIPT italic_β end_POSTSUBSCRIPT roman_Γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT roman_with roman_Γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = square-root start_ARG 1 - ( ∥ italic_Q start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ / ∥ italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG , (7)

where ΓksubscriptΓ𝑘\Gamma_{k}roman_Γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is the AA convergence gain in the k𝑘kitalic_k-th iteration, Q𝑄Qitalic_Q is readily available from the QR solve in the AA optimization step, and ωβsubscript𝜔𝛽\omega_{\beta}italic_ω start_POSTSUBSCRIPT italic_β end_POSTSUBSCRIPT is a tuning parameter. In chen2022composite the adaptive damping algorithm from chen2022non is combined with a composite version of AA as a means of adapting the depth and further improves the convergence rate. A different adaptivity approach is taken in pollock2023filtering where a filtering method is applied to remove columns from the history matrix, ksubscript𝑘\mathcal{F}_{k}caligraphic_F start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, in order to control the condition number of the least-squares problem. This method also leads to improved convergence. In this work, we utilize a strategy similar to that employed in pollock2021anderson where the depth is determined by the magnitude of the current residual with

mk=min{m~k,mk1+1,mmax},subscript𝑚𝑘subscript~𝑚𝑘subscript𝑚𝑘11subscript𝑚𝑚𝑎𝑥m_{k}=\min\{\tilde{m}_{k},m_{k-1}+1,m_{max}\},italic_m start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = roman_min { over~ start_ARG italic_m end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT + 1 , italic_m start_POSTSUBSCRIPT italic_m italic_a italic_x end_POSTSUBSCRIPT } , (8)

where m0=0subscript𝑚00m_{0}=0italic_m start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = 0, m~k=max{0,log10(ωmfk)}subscript~𝑚𝑘0subscript10subscript𝜔𝑚normsubscript𝑓𝑘\tilde{m}_{k}=\max\{0,\lfloor-\log_{10}(\omega_{m}\|f_{k}\|)\rfloor\}over~ start_ARG italic_m end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = roman_max { 0 , ⌊ - roman_log start_POSTSUBSCRIPT 10 end_POSTSUBSCRIPT ( italic_ω start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ∥ italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ ) ⌋ }, and ωmsubscript𝜔𝑚\omega_{m}italic_ω start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT is a turning parameter.

Algorithm 3 Anderson Acceleration with delay and variable damping and depth

Input: p0subscript𝑝0p_{0}italic_p start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, mmaxsubscript𝑚maxm_{\rm{max}}italic_m start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT, kmaxsubscript𝑘maxk_{\rm{max}}italic_k start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT, tol, d𝑑ditalic_d, and β𝛽\betaitalic_β       Output: psuperscript𝑝p^{*}italic_p start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT

1:Set p1=βG(p0)+(1β)p0subscript𝑝1𝛽𝐺subscript𝑝01𝛽subscript𝑝0p_{1}=\beta G(p_{0})+(1-\beta)p_{0}italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = italic_β italic_G ( italic_p start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) + ( 1 - italic_β ) italic_p start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT
2:for k=1,,kmax𝑘1subscript𝑘maxk=1,\ldots,k_{\rm{max}}italic_k = 1 , … , italic_k start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT do
3:     if kd𝑘𝑑k\leq ditalic_k ≤ italic_d then
4:         Set pk+1=βG(pk)+(1β)pksubscript𝑝𝑘1𝛽𝐺subscript𝑝𝑘1𝛽subscript𝑝𝑘p_{k+1}=\beta G(p_{k})+(1-\beta)p_{k}italic_p start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = italic_β italic_G ( italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) + ( 1 - italic_β ) italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT
5:     else
6:         Select mksubscript𝑚𝑘m_{k}italic_m start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT using (8)
7:         Set 𝒢k=[ΔGkmk,,ΔGk1]subscript𝒢𝑘Δsubscript𝐺𝑘subscript𝑚𝑘Δsubscript𝐺𝑘1\mathcal{G}_{k}=[\Delta G_{k-m_{k}},\ldots,\Delta G_{k-1}]caligraphic_G start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = [ roman_Δ italic_G start_POSTSUBSCRIPT italic_k - italic_m start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT , … , roman_Δ italic_G start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ] where ΔGi=G(pi+1)G(pi)Δsubscript𝐺𝑖𝐺subscript𝑝𝑖1𝐺subscript𝑝𝑖\Delta G_{i}=G(p_{i+1})-G(p_{i})roman_Δ italic_G start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_G ( italic_p start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT ) - italic_G ( italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT )
8:         Set k=[Δfkmk,,Δfk1]subscript𝑘Δsubscript𝑓𝑘subscript𝑚𝑘Δsubscript𝑓𝑘1\mathcal{F}_{k}=[\Delta f_{k-m_{k}},\ldots,\Delta f_{k-1}]caligraphic_F start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = [ roman_Δ italic_f start_POSTSUBSCRIPT italic_k - italic_m start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT , … , roman_Δ italic_f start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ] where Δfi=fi+1fiΔsubscript𝑓𝑖subscript𝑓𝑖1subscript𝑓𝑖\Delta f_{i}=f_{i+1}-f_{i}roman_Δ italic_f start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_f start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT - italic_f start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and fi=G(pi)pisubscript𝑓𝑖𝐺subscript𝑝𝑖subscript𝑝𝑖f_{i}=G(p_{i})-p_{i}italic_f start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_G ( italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) - italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT
9:         Determine γksubscript𝛾𝑘\gamma_{k}italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT that minimizes fkkγk2subscriptnormsubscript𝑓𝑘subscript𝑘subscript𝛾𝑘2\|f_{k}-\mathcal{F}_{k}\gamma_{k}\|_{2}∥ italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - caligraphic_F start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT
10:         Select βksubscript𝛽𝑘\beta_{k}italic_β start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT using (7)
11:         Set pk+1=G(pk)𝒢kγk(1βk)(fkQkRkγk)subscript𝑝𝑘1𝐺subscript𝑝𝑘subscript𝒢𝑘subscript𝛾𝑘1subscript𝛽𝑘subscript𝑓𝑘subscript𝑄𝑘subscript𝑅𝑘subscript𝛾𝑘p_{k+1}=G(p_{k})-\mathcal{G}_{k}\gamma_{k}-(1-\beta_{k})(f_{k}-Q_{k}R_{k}% \gamma_{k})italic_p start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = italic_G ( italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) - caligraphic_G start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - ( 1 - italic_β start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ( italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_Q start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT )
12:     end if
13:     if pk+1pi<tolnormsubscript𝑝𝑘1subscript𝑝𝑖tol\|p_{k+1}-p_{i}\|<\text{tol}∥ italic_p start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT - italic_p start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∥ < tol return pk+1subscript𝑝𝑘1p_{k+1}italic_p start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT as psuperscript𝑝p^{*}italic_p start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT
14:end for

4 Implementation

To evaluate whether the application of AA improves the convergence rate of the LoDestro method, we utilize the Tango 1D transport solver parker2018bringing interfaced with the KINSOL nonlinear system solver from the SUNDIALS library hindmarsh2005sundials ; gardner2022enabling . Tango is an open-source Python code that both solves the 1D transport equation (3) by way of the linear system (5), and implements the coupling Algorithm 1. The turbulent fluxes can be computed either by coupling with the global gyrokinetic code, GENE goerler2011global ; genecode , to simulate unstable drift-wave (such as the ion-temperature-gradient mode)-driven turbulence or by using analytic models for rapid algorithmic development. For this work we utilize the latter option.

KINSOL provides a number of algorithms for solving nonlinear algebraic systems including an Anderson-accelerated fixed-point iteration. AA in KINSOL leverages several implementation optimizations for greater performance such as efficient QR factorization updates and solves for computing the acceleration weights fang2009two ; kresse1996efficiency ; walker2011anderson or the use of low synchronization orthogonalization methods within the QR solve to minimize the number of global reductions lockhart2022performance . For this work, we have extended KINSOL’s existing support for delay and constant damping with a fixed depth to allow for adapting the depth and/or damping. These new capabilities will be included in an upcoming SUNDIALS release. Through the interface between Tango and KINSOL, we can readily access an efficient implementation of AA and easily experiment with different variants of AA with minimal modification to existing Tango code.

The interfacing between Tango and KINSOL primarily consists of creating a class within Tango with a method to evaluate G(p)𝐺𝑝G(p)italic_G ( italic_p ) using Tango’s native functions for computing and splitting fluxes and setting up and solving the linear system. Additional class methods call KINSOL functions to configure the sovler based on command line options and solve the nonlinear system. As SUNDIALS packages are written in C, we leverage SWIG beazley1996swig to generate Python interfaces that align with the C API while allowing Tango to access data stored in SUNDIALS vectors as NumPy arrays without having to copy data between KINSOL and Tango data structures.

Algorithm 3 includes a number of parameters that can have a significant impact on performance. The optimal combination of settings is unclear a priori and depends on the application and problem considered. To automate parameter selection we utilize the GPTune package liu2021gptune , a Python-based auto-tuning framework using Bayesian optimization methodologies with support for multi-task learning, multi-objective tuning, sensitivity analysis, and many other features. For this work we use the single-task learning autotuning (SLA) capability in GPTune for a single-objective function. That is, we consider one configuration of the problem (1) when optimizing the algorithm parameters to minimize the number of iterations to reach a given solution tolerance. The problem class created for interfacing with KINSOL also streamlines using GPUTune, as the objective function only needs to construct the Tango problem class given a dictionary of input options, call the solve method, and return the number of solver iterations. Additionally, the driver program defines the input, output, and parameter spaces, as well as any constraints, to create the tuning problem. Once the problem is defined, the driver creates the GPTune object and runs the SLA method which calls Tango as needed through the objective function.

5 Numerical Results

In this section we present results applying AA to the LoDestro method for a nonlinear diffusion problem from shestakov2003self that mimics the challenges observed in turbulence-driven transport simulations. The problem is given by (1) on the interval x[0,1]𝑥01x\in[0,1]italic_x ∈ [ 0 , 1 ] with

q=Dpx+ϵwhereD=(1ppx)r,formulae-sequence𝑞𝐷𝑝𝑥italic-ϵwhere𝐷superscript1𝑝𝑝𝑥𝑟q=-D\frac{\partial p}{\partial x}+\epsilon\quad\text{where}\quad D=\left(\frac% {1}{p}\frac{\partial p}{\partial x}\right)^{r},italic_q = - italic_D divide start_ARG ∂ italic_p end_ARG start_ARG ∂ italic_x end_ARG + italic_ϵ where italic_D = ( divide start_ARG 1 end_ARG start_ARG italic_p end_ARG divide start_ARG ∂ italic_p end_ARG start_ARG ∂ italic_x end_ARG ) start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , (9)

and ϵitalic-ϵ\epsilonitalic_ϵ is spatially correlated, temporally white Gaussian noise that mimics the noise present in GENE simulations parker2018investigation . Below, we first consider cases without noise. The system is stiff with r=2𝑟2r=2italic_r = 2, and the stiffness increases with the value of r𝑟ritalic_r. The use of an analytic flux model allows for rapid testing and corresponds to running the turbulence code to saturation each coupling exchange. The source term in (1) is defined as

S(x)={1for0x0.10for0.1<x1,𝑆𝑥cases1for0𝑥0.10for0.1𝑥1S(x)=\begin{cases}1&\text{for}\quad 0\leq x\leq 0.1\\ 0&\text{for}\quad 0.1<x\leq 1\end{cases},italic_S ( italic_x ) = { start_ROW start_CELL 1 end_CELL start_CELL for 0 ≤ italic_x ≤ 0.1 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL for 0.1 < italic_x ≤ 1 end_CELL end_ROW , (10)

and the boundary conditions are p(0)=0superscript𝑝00p^{\prime}(0)=0italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ( 0 ) = 0 and p(1)=0.01𝑝10.01p(1)=0.01italic_p ( 1 ) = 0.01. For the LoDestro method, the relaxation strength depends on the degree of stiffness, and the default value is β=0.6/r𝛽0.6𝑟\beta=0.6/ritalic_β = 0.6 / italic_r. AA setups with zero depth (m=0𝑚0m=0italic_m = 0) and constant damping correspond to applying the unaccelerated LoDestro method as implemented in Tango and serves as the baseline comparison point.

5.1 A Stiff Test Case

We first consider the stiff case (r=2𝑟2r=2italic_r = 2) and the impact of different AA configurations on the convergence of the iteration.

Fixed Damping, Depth, and Delay

To begin, we consider AA with static damping, β𝛽\betaitalic_β, and depth, m𝑚mitalic_m, sweeping over the parameter space to better understand how each impacts performance. Fig. 2 shows a heat map for the number of iterations to reach a residual of 1011superscript101110^{-11}10 start_POSTSUPERSCRIPT - 11 end_POSTSUPERSCRIPT for different β𝛽\betaitalic_β and m𝑚mitalic_m values with and without a delay, d𝑑ditalic_d, before starting AA. From the first row (m=0𝑚0m=0italic_m = 0, no acceleration), we see the default, β=0.3𝛽0.3\beta=0.3italic_β = 0.3, is near optimal in this case, but β=0.4𝛽0.4\beta=0.4italic_β = 0.4 can reduce the number of iterations by 16%percent1616\%16 %. However, increasing β𝛽\betaitalic_β too much leads to unphysical values or divergence (both denoted by empty cells).

Refer to caption
(a) Without Delay
Refer to caption
(b) Delay (d𝑑ditalic_d) AA by 1 iteration
Figure 2: Heat map of the number of iterations to reach a residual of 1011superscript101110^{-11}10 start_POSTSUPERSCRIPT - 11 end_POSTSUPERSCRIPT. Empty cells correspond to failed solves. Without a delay, m2𝑚2m\geq 2italic_m ≥ 2 is needed to see a benefit in iteration counts from AA. With a one iteration delay, AA consistently reduces the number of iterations and decreases sensitivity to the value of β𝛽\betaitalic_β.

Without delay, adding acceleration, m=1𝑚1m=1italic_m = 1, gives some robustness in the choice of β𝛽\betaitalic_β, enabling convergence for larger values than possible without AA. However, the convergence rate is improved only for β𝛽\betaitalic_β far from the optimal value. Increasing the depth, m>1𝑚1m>1italic_m > 1, yields more consistent gains with AA. Comparing the best results, AA takes approximately 14%percent1414\%14 % fewer iterations. For a non-optimal β𝛽\betaitalic_β, larger reductions in iteration count are realized due to the greater robustness and reduced sensitivity to the choice of β𝛽\betaitalic_β. Improved robustness in β𝛽\betaitalic_β with AA has also been observed in other multiphysics coupling applications hamilton2016assessment .

As the first iterate diverges from final solution before steadily approaching the correct value at each subsequent iteration, introducing a small delay before starting AA significantly strengthens the performance in all aspects. Generally, the results continue to improve with increased depth but with diminishing returns, especially near the optimal β𝛽\betaitalic_β. Delaying AA further can improve the results slightly but also reduces robustness with larger β𝛽\betaitalic_β and may increase iteration counts with smaller β𝛽\betaitalic_β. The size of the delay necessary will depend on the quality of the initial iterate, and the best depth value is often problem dependent.

Optimized Fixed Damping, Depth, and Delay

In the prior section we manually explored the parameter space to understand the impact of different options on algorithmic performance. In this section we investigate optimizing these selections with GPTune and examine the convergence history. We consider three sets of parameters to optimize: β𝛽\betaitalic_β with m=0𝑚0m=0italic_m = 0; β𝛽\betaitalic_β and m𝑚mitalic_m; and β𝛽\betaitalic_β, m𝑚mitalic_m, and d𝑑ditalic_d. Each set is optimized for three residual tolerances (106superscript10610^{-6}10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT, 108superscript10810^{-8}10 start_POSTSUPERSCRIPT - 8 end_POSTSUPERSCRIPT, and 1011superscript101110^{-11}10 start_POSTSUPERSCRIPT - 11 end_POSTSUPERSCRIPT). We set the initial number of GPTune parameter-space samples to 10 and the total number of samples to 50. The convergence histories for the best parameter values from each set are shown in Fig. 3.

[width=7.5cm]optimize_fixed_params_p2.pdf

Figure 3: Residual history for autotuned configurations. The default (faded circles) and one setup from Fig. 2(b) (faded stars) are included for reference. The autotuned results align well with the parameter sweep in Fig. 2.

In the first few iterations there is little difference between the various options with smaller β𝛽\betaitalic_β values performing slightly better. At more moderate residuals, optimizing β𝛽\betaitalic_β alone is one or two iterations better than the AA results. However, for values below 107superscript10710^{-7}10 start_POSTSUPERSCRIPT - 7 end_POSTSUPERSCRIPT the AA setups give the best results with the optimized parameter set edging out the best result from Fig. 2(b) by one or two iterations. The benefit of AA increases for values less than 109superscript10910^{-9}10 start_POSTSUPERSCRIPT - 9 end_POSTSUPERSCRIPT where the convergence of the unaccelerated version slows. This result indicates that AA delivers accelerated convergence to a tight tolerance. Overall the optimized parameter sets from GPTune closely align with the best setups found with the manual sweep.

Adaptive Damping and Depth

In the previous section we saw that the best parameter configurations depend on the desired residual norm. Performance also depends significantly on whether the initial iterate is in the asymptotic convergence regime of the iterative scheme. In this section we evaluate AA with adaptive damping and/or depth to find setups that perform well across residual regimes. We again consider three sets of parameters to optimize: β𝛽\betaitalic_β with adaptive m𝑚mitalic_m; m𝑚mitalic_m and d𝑑ditalic_d with adaptive β𝛽\betaitalic_β, and adapting β𝛽\betaitalic_β and m𝑚mitalic_m together. We only optimize d𝑑ditalic_d in the fixed-m𝑚mitalic_m case as the strategy for adapting m𝑚mitalic_m automatically finds an optimum d𝑑ditalic_d by initially selecting m=0𝑚0m=0italic_m = 0. When adapting β𝛽\betaitalic_β and/or m𝑚mitalic_m, GPTune optimizes the factors ωβsubscript𝜔𝛽\omega_{\beta}italic_ω start_POSTSUBSCRIPT italic_β end_POSTSUBSCRIPT and ωmsubscript𝜔𝑚\omega_{m}italic_ω start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT in (7) and (8), respectively. As before, each set is optimized for the same three residual tolerances and we use the same numbers of initial and total parameter-space samples. The convergence history for the best parameter values from each set are shown in Fig. 4, and history of β𝛽\betaitalic_β or m𝑚mitalic_m from the adaptive cases are given in Fig. 5.

[width=7.5cm]optimize_adaptive_params_p2.pdf

Figure 4: Residual history for autotuned configurations with adaptive β𝛽\betaitalic_β and/or m𝑚mitalic_m. The adaptive results closely follow the autotuned fixed parameter results in Fig. 3 (faded triangles and squares included for reference).
Refer to caption
(a) β=0.37𝛽0.37\beta=0.37italic_β = 0.37 and adaptive m𝑚mitalic_m
Refer to caption
(b) Adaptive β𝛽\betaitalic_β, m=10𝑚10m=10italic_m = 10, and d=4𝑑4d=4italic_d = 4
Refer to caption
(c) Adaptive β𝛽\betaitalic_β and m𝑚mitalic_m
Refer to caption
(d) Adaptive β𝛽\betaitalic_β and m𝑚mitalic_m
Figure 5: Residual (left axis) and parameter history (right axis) for adaptive β𝛽\betaitalic_β and/or m𝑚mitalic_m configurations. Adapting m𝑚mitalic_m automatically selects the AA delay, and adapting β𝛽\betaitalic_β alone leads to values near the max allowed. Adapting β𝛽\betaitalic_β and m𝑚mitalic_m together leads to large, frequent changes in β𝛽\betaitalic_β slightly degrading performance compared to other setups.

Optimizing a fixed β𝛽\betaitalic_β and an adaptive m𝑚mitalic_m gives similar results to optimizing fixed β𝛽\betaitalic_β, m𝑚mitalic_m, and d𝑑ditalic_d. Fig. 5(a) shows that adaptive m𝑚mitalic_m delays AA six iterations and has a max m𝑚mitalic_m of three. Given the low sensitivity observed in Fig. 2(b), it is not surprising that, although the m𝑚mitalic_m and d𝑑ditalic_d are slightly different, the convergence history only differs by one iteration. Optimizing an adaptive β𝛽\betaitalic_β with fixed m𝑚mitalic_m and d𝑑ditalic_d has improved convergence at larger residuals, and, after five iterations, the setup is within two iterations of the best results. Fig. 5(b) shows that the performance gain at larger residuals is due to using a small β𝛽\betaitalic_β value initially before immediately growing to almost the maximum β𝛽\betaitalic_β once AA starts. This behavior occurs because an optimum fixed β𝛽\betaitalic_β is computed by GPTune in the delay region before AA starts, as ΓksubscriptΓ𝑘\Gamma_{k}roman_Γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is not available to estimate β𝛽\betaitalic_β. Once AA starts, the iteration switches to adaptive values and keeps β𝛽\betaitalic_β close to the max allowed.

Optimizing both β𝛽\betaitalic_β and m𝑚mitalic_m, is the least performant of the three configurations but, due to the low sensitivity to β𝛽\betaitalic_β and m𝑚mitalic_m, the results are still relatively close (within five iterations) of the other cases. Again adapting m𝑚mitalic_m automatically delays AA (five iterations) before settling on a max depth. Unlike when adapting β𝛽\betaitalic_β alone, there are large oscillations in the β𝛽\betaitalic_β after the initial delay. This variation in β𝛽\betaitalic_β appears to be the cause of lower overall performance and suggests that considering the gain history to reduce large, frequent changes in β𝛽\betaitalic_β may improve results.

Overall, in this stiff test case there are modest gains in numbers of iterations with AA compared to optimized damping values at large to moderate residual values. For residuals below 108superscript10810^{-8}10 start_POSTSUPERSCRIPT - 8 end_POSTSUPERSCRIPT AA has a larger impact as convergence of the iteration without AA slows. However, AA greatly increases robustness and decreases sensitivity to the choice of β𝛽\betaitalic_β with a small delay in AA increasing these effects.

5.2 A Very Stiff Test Case

We now consider a very stiff case (r=10𝑟10r=10italic_r = 10) relevant to magnetic-fusion-energy problems and examine the impact of AA on convergence with optimized fixed and adaptive parameter configurations as in the prior two sections. Fig. 6 shows the convergence history for the default iteration (β=0.06𝛽0.06\beta=0.06italic_β = 0.06), an optimized damping value (β=0.14𝛽0.14\beta=0.14italic_β = 0.14), and three of the best performing AA configurations.

[width=7.5cm]optimize_p10.pdf

Figure 6: Residual history for autotuned setups with the default iteration (β=0.06𝛽0.06\beta=0.06italic_β = 0.06), optimized damping (β=0.14𝛽0.14\beta=0.14italic_β = 0.14), and various AA configurations. Markers are placed every 5 iterations. For a wide range of residuals, AA leads to reduced numbers of iterations, especially for smaller residual values.

As in the stiff case, optimizing β𝛽\betaitalic_β gives better results than the default β𝛽\betaitalic_β and, in this case, is generally the best option for larger residuals. Enabling AA with β=0.01𝛽0.01\beta=0.01italic_β = 0.01 and m=3𝑚3m=3italic_m = 3 consistently outperforms optimizing β𝛽\betaitalic_β alone for residuals below 104superscript10410^{-4}10 start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT. Again, delaying AA generally leads to better results overall and, given the larger initial residual and stronger damping applied in this case, the best AA configurations tend to have a longer delay. For residuals below 107superscript10710^{-7}10 start_POSTSUPERSCRIPT - 7 end_POSTSUPERSCRIPT, configurations with fixed β𝛽\betaitalic_β, m𝑚mitalic_m, and d𝑑ditalic_d or fixed β𝛽\betaitalic_β and adaptive m𝑚mitalic_m are the most efficient with results that differ by less than six iterations. With an adaptive m𝑚mitalic_m, AA is automatically delayed by twelve iterations, the same as in the fixed-delay case, but allows more depth going up to m=7𝑚7m=7italic_m = 7. Unlike in the stiff test, the depth does not immediately ramp up after the delay but instead keeps m𝑚mitalic_m constant for 2 to 7 iterations before increasing m𝑚mitalic_m. The stair-step behavior of m𝑚mitalic_m reflects the overall slower convergence in this case, which requires additional iterations before a residual reduction triggers an increase in m𝑚mitalic_m using the heuristic (8). Adapting β𝛽\betaitalic_β can also improve results but, like in the stiff case, does not generally perform as well as other setups due to large oscillations in β𝛽\betaitalic_β. Overall the results with this very stiff test are improved when comparing against the stiff case, with AA producing more significant gains over a larger range of residuals.

5.3 Adding Noise in a Stiff Test Case

Finally, we revisit the stiff test (r=2𝑟2r=2italic_r = 2) when accounting for fluctuations in q𝑞qitalic_q. The 5D (3 physical- plus 2 velocity-space dimensions) turbulence code indicated in Fig. 1 reaches a saturated state with short spatial- and temporal-scale fluctuations. (It is the electric fields involved in these fluctuations that kick particles across the magnetic surfaces, i.e., cause the transport.) Even when a heat or particle flux is integrated over four dimensions, the 1D flux, q𝑞qitalic_q, still has short-scale structure. We include the effects of these fluctuations on our coupling problem with the “noise” term, ϵitalic-ϵ\epsilonitalic_ϵ, in Eq. (9). We use the same spatially correlated, temporally white Gaussian noise model studied in parker2018investigation (see Figs. 4a and 5b in parker2018investigation for a comparison of the model with q𝑞qitalic_q from GENE).

(a) *
(b) *
Figure 7: Residual history and heat map for the number of iterations to reach a residual of 104superscript10410^{-4}10 start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT both with a delay of two iterations. AA increases robustness and reduces sensitivity to β𝛽\betaitalic_β but only lowers iteration counts away from the optimal β𝛽\betaitalic_β.

[.49]Refer to caption [.49] Refer to caption

Fig. 7(a) shows the residual history for the default setup (β=0.3𝛽0.3\beta=0.3italic_β = 0.3) and a more optimal configuration (β=0.4𝛽0.4\beta=0.4italic_β = 0.4), with and without AA, along with a heat map for other combinations of β𝛽\betaitalic_β and m𝑚mitalic_m. Unlike in the cases without noise, the performance with AA is similar to that without AA when using the best β𝛽\betaitalic_β value. This lack of improvement is due to the higher, noise-dependent residual floor at which the iteration stagnates as noted in parker2018investigation for the LoDestro method and similarly observed in toth2017local when applying AA to problems with inaccurate function evaluations. While the residual floor limits the benefits in lowering iteration counts, AA still improves overall robustness and reduces sensitivity to β𝛽\betaitalic_β, which leads to faster convergence for non-optimal β𝛽\betaitalic_β values. This reduced sensitivity is significant as it shows that with AA, less manual tuning of β𝛽\betaitalic_β is necessary for an efficient solution approach.

6 Conclusions

In this work we used the KINSOL nonlinear solver library in SUNDIALS to evaluate the potential of several variations of Anderson acceleration to improve the convergence of the LoDestro method for coupled turbulence and transport simulations of a magnetically confined fusion plasma. Combining the flexibility of KINSOL with the GPTune autotuning library enabled rapid evaluation of the parameter space to compare the impact of different options on algorithmic performance.

For all test cases, AA allows for convergence with larger relaxation parameter values and decreases sensitivity to the choice of the relaxation parameter. Comparing to the optimal damping value, AA can provide a small decrease in the number of iterations for moderate residual values with the benefit increasing for smaller target residuals leading to a reduction of 20% or more in the number of iterations. Moreover, these benefits improve as the problem difficulty increases, going from stiff to very stiff tests; and for non-optimal damping values, the improvements are larger. Utilizing adaptive depth with AA provides a means for automatically selecting the delay and maximum allowed depth with results generally matching the best setups with optimized fixed parameter values. We note that use of GPTune removes the need for time-consuming hand optimizations of the parameters. When noisy fluxes are considered, AA does not provide an improvement in iterations with optimal damping due to the higher residual floor parker2018investigation . However, AA still increases robustness and reduces sensitivity to the damping parameter, which lowers iteration counts for non-optimal damping.

In future efforts we plan to assess the performance of AA with a modified analytic flux model that reflects the time dependence of the turbulence calculation as it is coupled to the transport code (see Fig. 1). We will also apply recently developed approaches for analyzing AA convergence de2022linear to better understand how the presence of noise and the addition of time dependence in the coupling impact convergence of the algorithm. Additionally, we will conduct studies to quantify parameter sensitivity in AA using GPUTune. Finally, the interfaces to KINSOL will allow for testing new variants of AA as they are integrated into the library, such as alternating and composite AA chen2022composite , without modifying Tango.

Acknowledgements.
The authors wish to acknowledge the initial explorations applying Anderson acceleration in Tango by Jeff Parker, Jack G. Paulson, and H. Hunter Schwartz and thank them for their help in using Tango and interfacing with KINSOL. We would also like to thank Cody J. Balos for his work in creating Python interfaces to KINSOL. Support for this work was provided in part by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) Program through the Frameworks, Algorithms, and Scalable Technologies for Mathematics (FASTMath) Institute, under Lawrence Livermore National Laboratory subcontract B626484 and DOE award DE-SC0021354. This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research and Office of Fusion Energy Sciences, Scientific Discovery through Advanced Computing (SciDAC) program. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-JRNL-864392.
\ethics

Competing Interests The authors have no conflicts of interest to declare that are relevant to the content of this chapter.

References

  • (1) Anderson, D.G.: Iterative procedures for nonlinear integral equations. Journal of the ACM (JACM) 12(4), 547–560 (1965). DOI 10.1145/321296.321305
  • (2) Beazley, D.M., et al.: SWIG: An easy to use tool for integrating scripting languages with C and C++. In: Tcl/Tk Workshop, vol. 43, p. 74 (1996). DOI 10.5555/1267498.1267513
  • (3) Chen, K., Vuik, C.: Composite Anderson acceleration method with two window sizes and optimized damping. International Journal for Numerical Methods in Engineering 123(23), 5964–5985 (2022)
  • (4) Chen, K., Vuik, C.: Non-stationary Anderson acceleration with optimized damping. arXiv preprint arXiv:2202.05295 (2022)
  • (5) De Sterck, H., He, Y.: Linear asymptotic convergence of Anderson acceleration: fixed-point analysis. SIAM Journal on Matrix Analysis and Applications 43(4), 1755–1783 (2022). DOI 10.1137/21M1449579
  • (6) Evans, C., Pollock, S., Rebholz, L.G., Xiao, M.: A proof that Anderson acceleration improves the convergence rate in linearly converging fixed-point methods (but not in those converging quadratically). SIAM Journal on Numerical Analysis 58(1), 788–810 (2020). DOI 10.1137/19M1245384
  • (7) Fang, H.r., Saad, Y.: Two classes of multisecant methods for nonlinear acceleration. Numerical linear algebra with applications 16(3), 197–221 (2009). DOI 10.1002/nla.617
  • (8) Gardner, D.J., Reynolds, D.R., Woodward, C.S., Balos, C.J.: Enabling new flexibility in the SUNDIALS suite of nonlinear and differential/algebraic equation solvers. ACM Transactions on Mathematical Software (TOMS) 48(3), 1–24 (2022). DOI 10.1145/3539801
  • (9) GENE Developers: Gyrokinetic Electromagnetic Numerical Experiment (GENE). URL http://genecode.org/. Accessed 2024-05-06
  • (10) Goerler, T., Lapillonne, X., Brunner, S., Dannert, T., Jenko, F., Merz, F., Told, D.: The global version of the gyrokinetic turbulence code GENE. Journal of Computational Physics 230(18), 7053–7071 (2011). DOI 10.1016/j.jcp.2011.05.034
  • (11) Hamilton, S., Berrill, M., Clarno, K., Pawlowski, R., Toth, A., Kelley, C., Evans, T., Philip, B.: An assessment of coupling algorithms for nuclear reactor core physics simulations. Journal of Computational Physics 311, 241–257 (2016). DOI 10.1016/j.jcp.2016.02.012
  • (12) Hindmarsh, A.C., Brown, P.N., Grant, K.E., Lee, S.L., Serban, R., Shumaker, D.E., Woodward, C.S.: SUNDIALS: Suite of nonlinear and differential/algebraic equation solvers. ACM Transactions on Mathematical Software (TOMS) 31(3), 363–396 (2005). DOI 10.1145/1089014.1089020
  • (13) Kelley, C.T.: Numerical methods for nonlinear equations. Acta Numerica 27, 207–287 (2018)
  • (14) Kresse, G., Furthmüller, J.: Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. Computational materials science 6(1), 15–50 (1996). DOI 10.1016/0927-0256(96)00008-0
  • (15) Liu, Y., Sid-Lakhdar, W.M., Marques, O., Zhu, X., Meng, C., Demmel, J.W., Li, X.S.: GPTune: Multitask learning for autotuning exascale applications. In: Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 234–246 (2021). DOI 10.1145/3437801.3441621
  • (16) Lockhart, S., Gardner, D.J., Woodward, C.S., Thomas, S., Olson, L.N.: Performance of low synchronization orthogonalization methods in Anderson accelerated fixed point solvers. In: Proceedings of the 2022 SIAM Conference on Parallel Processing for Scientific Computing, pp. 49–59. SIAM (2022). DOI 10.1137/1.9781611977141.5
  • (17) Loffeld, J., Woodward, C.S.: Considerations on the implementation and use of anderson acceleration on distributed memory and gpu-based parallel computers. In: Advances in the Mathematical Sciences: Research from the 2015 Association for Women in Mathematics Symposium, pp. 417–436. Springer (2016). DOI 10.1007/978-3-319-34139-2˙21
  • (18) Parker, J.B., LoDestro, L.L., Campos, A.: Investigation of a multiple-timescale turbulence-transport coupling method in the presence of random fluctuations. Plasma 1(1), 126–143 (2018). DOI 10.3390/plasma1010012
  • (19) Parker, J.B., LoDestro, L.L., Told, D., Merlo, G., Ricketson, L.F., Campos, A., Jenko, F., Hittinger, J.A.: Bringing global gyrokinetic turbulence simulations to the transport timescale using a multiscale approach. Nuclear Fusion 58(5), 054004 (2018). DOI 10.1088/1741-4326/aab5c8
  • (20) Pollock, S., Rebholz, L.G.: Anderson acceleration for contractive and noncontractive operators. IMA Journal of Numerical Analysis 41(4), 2841–2872 (2021). DOI 10.1093/imanum/draa095
  • (21) Pollock, S., Rebholz, L.G.: Filtering for Anderson acceleration. SIAM Journal on Scientific Computing 45(4), A1571–A1590 (2023). DOI 10.1137/22M1536741
  • (22) Post, D.E., Batchelor, D.B., Bramley, R.B., Cary, J.R., Cohen, R.H., Colella, P., Jardin, S.C.: Report of the fusion simulation project steering committee. Journal of fusion energy 23, 1–26 (2004). DOI 10.1007/s10894-004-1868-0
  • (23) Shestakov, A., Cohen, R., Crotinger, J., LoDestro, L., Tarditi, A., Xu, X.: Self-consistent modeling of turbulence and transport. Journal of Computational Physics 185(2), 399–426 (2003). DOI 10.1016/S0021-9991(02)00063-3
  • (24) Toth, A., Ellis, J.A., Evans, T., Hamilton, S., Kelley, C.T., Pawlowski, R., Slattery, S.: Local improvement results for Anderson acceleration with inaccurate function evaluations. SIAM Journal on Scientific Computing 39(5), S47–S65 (2017). DOI 10.1137/16M1080677
  • (25) Toth, A., Kelley, C.T.: Convergence analysis for Anderson acceleration. SIAM Journal on Numerical Analysis 53(2), 805–819 (2015). DOI 10.1137/130919398
  • (26) Walker, H.F., Ni, P.: Anderson acceleration for fixed-point iterations. SIAM Journal on Numerical Analysis 49(4), 1715–1735 (2011). DOI 10.1137/10078356X