What if? Numerical weather prediction at the crossroads

Peter Bauer (peter.bauer@mpimet.de), corresponding author

Max-Planck-Institute for Meteorology, Bundesstrasse 53, 20146 Hamburg, Germany

Author contributions: Conceptualization of this study, Writing - Original draft preparation

This article presents scenarios for future numerical weather prediction operational computing through federated computing, data handling and machine learning.

Abstract

This paper provides an outlook on the future of operational weather prediction given the recent evolution in science, computing and machine learning. In many parts, this evolution strongly deviates from the strategy operational centres formulated only a few years ago. New opportunities in digital technology have greatly accelerated progress, and the full integration of computational science in numerical weather prediction centres is common knowledge now. Within the last few years, a vast machine learning research community has emerged for creating new and tailor-made products, accelerating processing and - most of all - creating emulators for the entire production of global forecasts that outperform traditional systems. In this context, the role of both numerical models and observations is changing from being equation driven to data driven. Analyses and reanalyses are becoming the new currency for training machine learning, and operational centres are in a powerful position as they generate these datasets based on decades' worth of experience. This environment creates incredible opportunities to progress much faster than in the past, but also uncertainties about the strategic implications for defining cost-effective and sustainable research and operations, and about how to achieve sufficient high-performance computing and data handling capacities. It will take individual national public services a while to understand what to focus on and how to coordinate their substantial investments in staff and infrastructure at institutional, national and international level. This paper addresses the new situation operational weather prediction finds itself in by formulating the most likely "what if?" scenarios for the near future, and provides an outline for how weather centres could adapt.

Keywords: Numerical weather prediction; Machine learning; High-performance computing

Highlights

The time-critical operational production of numerical weather forecasts by individual centres is reaching affordable computational limits.

Machine learning promises alternative options but requires complementary investments in physics-based simulations and reanalyses.

Future cost-effective analysis and forecast production needs a federated approach to extreme-scale computing and data handling.

1 Introduction

Numerical weather prediction (NWP) is the foundation for civil protection services to society, it serves many industries with dependencies on the environment, and it is in every citizen’s mind when planning and managing their lives and well-being. With the growing impacts of extreme weather on lives and infrastructures in the context of climate change, the expectations on greater reliability of forecasts are increasing faster than ever before.

The steady progress of forecasting skill is well documented and known as the quiet revolution of NWP (Bauer et al., 2015). The meteorological community is likely one of the best organised science and service communities in terms of globally concerted investments in billion-dollar observing systems, the free and open exchange of data, the coordination of research and development, and the standardisation and quality control of outputs. These responsibilities are shared between national and international weather prediction centres under the umbrella of the World Meteorological Organization (WMO) (Brunet et al., 2023). The more recent thrust of private companies into this domain has further accelerated progress and led to a very diverse weather service ecosystem serving global and local needs at the same time.

Digital technologies have always been one of the main enablers of NWP, mostly for performing the forecast model calculations fast enough to generate timely output. High-performance computing (HPC) has provided the backbone for forecast production since the 1950s. The key specifications of simulation models, e.g., spatial resolution, time steps, Earth-system complexity, and observational data volumes used in data assimilation have grown commensurate with the available (and affordable) computing power (Michalakes, 2020) by a million times in the past 20 years.

The almost natural progression of NWP skill following technology evolution is coming to an end, because we are reaching the upper limits of (i) processor capacity for roughly the same cost according to Moore’s law and Dennard scaling (Shalf, 2020), and of (ii) the achievable sustained computing performance of complex NWP codes on this technology. This means that further enhancements ultimately need bigger and more expensive machines so that acquisition and operation cost are likely to become unaffordable for operational weather centres.

Machine learning (ML) promises alternative options if not a way out of this dilemma. The underlying methods have re-emerged after decades of dormancy because of a processor and software technology revolution that was stimulated by a vast range of commercial applications outside environmental science. The huge data amounts and computing capacities available today have turned 20th century’s ML into 21st century’s deep learning (footnote 1).

Footnote 1: For simplicity, we will use the term ’machine learning’ (ML) in the remainder of the paper.

This situation creates an environment of great optimism but also fear of the public losing ownership over this important domain. It is fuelled by substantial public funding for ML and a huge push from private companies discovering the profits to be made in weather and climate services. ML offers accelerated progress to the public players but requires new levels of science and technology coordination, including the need to respond faster to what technology has to offer and how research can be translated into operational benefit while still maintaining the community’s quality standards (Frolov et al., 2024; Bauer et al., 2023).

While it is easy to see that weather centres will need to adjust quickly and, in parts, radically to the rapid developments, it is more difficult to see how operational predictions will actually be done in practice in the future. For this purpose, this paper describes what-if scenarios and possible pathways towards them:

  1. What if the quiet revolution has reached its limits and large km-scale ensembles (the scientific reference output) cannot be run operationally on the next two generations of supercomputers at the weather centres?

  2. What if both data assimilation used for creating initial conditions and forecasts based on ML will be unbeatable by physics-based models for operational predictions?

  3. What if the present engagement of ’big tech’ companies with NWP will continue and result in alternative operational products?

  4. What if ML models continue to grow in size (e.g., via the use of foundation models) and the training of ML applications will become the largest HPC application in Earth science?

For the rest of the paper, we assume that all of the what-if scenarios become reality to address our headline concern, namely that computing and data handling cost will become unaffordable following traditional ways. If adjusted, the computing cost of the actual operational forecasts will be reduced significantly when using ML, and the main drivers for compute power for NWP will be km-scale model simulations, ML training dataset generation, and ML training. These tasks will be too big for the local supercomputing facilities at individual weather centres, so that the following adaptations by the NWP community in Europe and world-wide will become necessary:

  (A) The time-critical production suite for both initial conditions and operational forecasts will be based on ML inference.

  (B) Continuous reference dataset production cycles will be performed to create the next-generation training datasets.

  (C) The costly generation of training datasets is shared between operational centres and third-party programmes to optimise the use of national and international computing resources and to democratise the outcomes.

  (D) The provision of both intellectual and digital infrastructure resources will be governed through a sustainable public-private partnership framework.

  (E) Software and data management capability and challenges will change massively and data handling will need to be addressed as a federated data problem.

The paper will provide arguments why the what-if scenarios are realistic in Section 2 and explain in more detail what the anticipated changes will mean for the NWP community in the section thereafter.

2 Why the what-if scenarios are likely

2.1 Background on forecast quality

Today, operational centres tend to produce longer-term, visionary strategies with a 5-10-year focus, and then more detailed, technical implementation plans including targets for operational system version upgrades for 1-5 years. The former align with big investments such as new HPC infrastructures while the latter align with regular core budget cycles and externally funded research efforts and service programmes.

These strategies are guided by service commitments that advance the value for society returned from the investments made in public centres, the translation of these services into science ambitions and requirements for the necessary technology. The national centre strategies also account for the role of the international service ecosystem and commercial providers, and include the research and development funding opportunities offered from third-party programmes. Here, WMO plays an important role for international strategic research collaboration and for the coordination of long-term commitments to Earth observation.

Figure 1: Time series of deterministic forecast skill of ECMWF high-resolution model expressed as the anomaly correlation (= correlation between forecasts and verifying analyses, normalised by climatological signal) for the 500 hPa height in the northern hemisphere at 6-day lead time for the ML-model AIFS (blue), today’s numerical model IFS (red) and its 2016 version also used to create the ERA5 reanalysis (black). The figure is reproduced from Lang et al. (2024).

Today’s shorter-term implementation planning is based on the assumption that science and technology developments need to advance at a steady pace, that the core of the operational production is based on numerical systems to be run over time-critical schedules on big machines preferably owned by NWP centres, and that weather and climate derivative services inherit this infrastructure to also help produce their next version of service suites. In addition, commercial company business models mostly create added value on top of public data output, but there are exceptions where large companies aim to run independent prediction systems (like The Weather Company) and act as primary Earth observation data providers (like SPIRE). This approach has paid off in the past and established a framework of predictable progress and a well-funded and acknowledged public service space complemented by sufficient opportunities for business.

Weather forecast skill is measured by comparing forecasts with observations and analyses. The latter are produced by combining short-range forecasts with observations through data assimilation. Observations are sparse in space and time, and it takes a physics-based dynamical model to describe the physically consistent evolution of all weather parameters at all grid points. These analyses are also used to initialise the next forecasts.

The continual cycle of observing, analysing and forecasting feeds forecast verification with solid statistical data from days to seasons. Figure 1 shows a well-known example of NWP forecast skill evolution from the European Centre for Medium-Range Weather Forecasts (ECMWF). The graphs show the skill for the synoptic-scale 500 hPa geopotential height in the northern hemisphere, a metric that denotes the ability to predict weather patterns such as fronts. All operational centres produce such skill monitoring and contribute to well-established global performance assessments supervised by WMO.

The figure illustrates the steady growth of medium-range skill, here at day 6, overlaid by an annual cycle in the northern hemisphere that shows more variability due to smaller-scale weather systems that are harder to predict in summer. The operational system (red) overtakes the skill of the ERA5 reanalysis (black) in 2017 since ERA5 is based on the 2016 operational model version. Before this, ERA5 performs better as it is a more recent model version applied to past periods. The ML-based AIFS (blue) has outperformed the other systems for this skill score straight away since its publication in 2022.
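As an illustration of the skill score plotted in Figure 1, the following minimal sketch computes a centred anomaly correlation between a forecast and its verifying analysis. It is a simplified form under stated assumptions: the function name `anomaly_correlation` and the sample values are hypothetical, and no area weighting of grid points is applied, which an operational verification would normally include.

```python
import math

def anomaly_correlation(forecast, analysis, climatology):
    """Centred anomaly correlation of forecast against verifying analysis.

    The three arguments are equally long sequences of values at the same
    grid points. Grid points are weighted equally (simplifying assumption).
    """
    # Anomalies are departures from the climatological reference.
    f_anom = [f - c for f, c in zip(forecast, climatology)]
    a_anom = [a - c for a, c in zip(analysis, climatology)]
    n = len(f_anom)
    f_mean = sum(f_anom) / n
    a_mean = sum(a_anom) / n
    cov = sum((f - f_mean) * (a - a_mean) for f, a in zip(f_anom, a_anom))
    f_var = sum((f - f_mean) ** 2 for f in f_anom)
    a_var = sum((a - a_mean) ** 2 for a in a_anom)
    return cov / math.sqrt(f_var * a_var)

# Made-up 500 hPa geopotential height values (metres) at four grid points.
climatology = [5500.0, 5520.0, 5540.0, 5560.0]
analysis = [5480.0, 5530.0, 5570.0, 5550.0]
forecast = [5485.0, 5525.0, 5560.0, 5555.0]

acc = anomaly_correlation(forecast, analysis, climatology)
print(f"anomaly correlation: {acc:.3f}")
```

A forecast identical to the analysis yields a value of 1, and a forecast whose anomalies are unrelated to the analysed ones yields a value near 0.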

The cornerstones of skill improvement are well documented: stable and accurate numerical schemes to solve the equations of motion and thermodynamics, so-called physical parametrisations that describe the impact of subgrid-scale processes like radiation and clouds on the resolved scales, data assimilation methods combining simulated and observed data to produce best estimates of the state of the system at a given time, and ensemble methods to characterise uncertainties and how they propagate from initial states through forecasts. Increasingly, weather models have included land, sea-surface wave and ocean sub-models as these are relevant for weather, and more seamlessly represent processes ranging from days to seasons. The steady improvements in forecast skill for the IFS over several decades represent the quiet revolution of steady progress in NWP (Bauer et al., 2015).

Steady improvements in the past meant that centres, supported by the wider research community, aimed to improve these components within the existing framework and without radically changing the underlying numerical system. Improvements in the observations as well as in the numerical methods and algorithms used for both the forecast model and data assimilation have led to significant skill gains over time. A leading driver for improvements, however, is the increase of spatial resolution, as it reduces the range of unresolved and therefore approximated processes, with immediate benefits when describing flow in mountainous areas, deep convection and cloud formation, but also atmosphere-land interaction and waves (Wedi, 2014). The limited-area prediction systems at very high resolution that all national centres maintain to produce more reliable forecasts locally are also testimony to this fact.

At present, the ultimate goal for global prediction is to resolve storms and even cloud systems as these are responsible for the vertical energy transfer and interact with the large-scale circulation influencing weather globally. This goal translates to spatial resolutions of 1 km or better (Stevens et al., 2019; Wedi et al., 2020; Hohenegger et al., 2023), and is also relevant for climate simulations predicting the future of global weather (Palmer and Stevens, 2019; Hewitt et al., 2022; Rackow et al., 2022).

Km-scale forecasts also require km-scale data assimilation to create the initial conditions, which is presently impossible not only due to HPC limitations but also because today’s data assimilation methods are not equipped for dealing with non-linear processes acting across this range of scales (Carrassi et al., 2018). The same data assimilation methods are also used to create reanalyses of the weather evolution over past decades, for which the same considerations apply. Lastly, the km-scale vision also includes the need for quantifying uncertainty through ensembles (Palmer, 2020), which always multiplies the computational load.

While global km-scale models are expected to provide a significant push in realism and forecast quality, and are exploited in programmes such as Destination Earth in Europe (Commission, 2024), storm and even cloud resolving simulations result in nearly insurmountable computational hurdles for operational global weather and climate prediction alike (Schulthess et al., 2018). This burden is becoming unmanageable for operational weather centres. ECMWF is, for example, currently running 50-member ensemble forecasts at 9 km resolution operationally twice per day. If their compute budget were to remain at a similar level, the expected improvements in computing power from the next generations of supercomputers would not suffice to push the resolution of operational ensembles to 1 km. This is because a km-scale ensemble would cost approximately 9x9x9 = 729 times as much as the current one.
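The 9x9x9 factor can be reproduced with a short calculation. The sketch below assumes that the three factors stand for the two horizontal directions and a time step that shrinks in proportion to the grid spacing, with the number of vertical levels and ensemble members unchanged; this reading and the function name `relative_cost` are assumptions made for illustration.

```python
import math

def relative_cost(current_dx_km, target_dx_km, refined_dims=3):
    """Cost multiplier when refining the grid spacing, all else held fixed.

    refined_dims=3 counts two horizontal directions plus the time step
    (assumed to scale linearly with the grid spacing).
    """
    return (current_dx_km / target_dx_km) ** refined_dims

factor = relative_cost(9.0, 1.0)      # 9 km -> 1 km
doublings = math.log2(factor)         # performance doublings needed at fixed budget

print(f"cost factor: {factor:.0f}")
print(f"equivalent doublings of computing power: {doublings:.1f}")
```

Under these assumptions the step from 9 km to 1 km corresponds to between nine and ten successive doublings of sustained computing power for the same ensemble size.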

This is why operational reanalyses and high-resolution simulations in ensemble mode are only performed by a few centres given the substantial research, testing and operational constraints, many of which are linked to computing. Many of the output products are shared with the wider community at no cost, and increasingly open-data policies are applied. Examples are reanalyses and other products in Europe that are funded by the European Commission (Copernicus, 2024b), a sub-set of the operational ECMWF data (ECMWF, 2024b) and NOAA’s Global Forecast System output (NOAA, 2024).

2.2 Background on high-performance computing

The traditional approach of most national centres in Europe, USA, Japan, China and India, but also of ECMWF, for the acquisition and operation of HPC resources is to buy or lease a system over which the centre has full control during the contracted lifetime. There are deviations from this principle where several smaller centres share a system (e.g., Nordic countries in Europe) or manage an allocation on a larger system that is owned by another public entity (e.g., Bureau of Meteorology in Australia, MeteoSwiss in Europe). These HPC systems usually comprise both computing and data handling resources and are contracted for four years or longer with HPC providers.

These HPC procurements have a lead time of at least two years, used for exploring the target requirements and potential vendors, and about one year is spent on system installation and acceptance testing before the active operational period starts. The most important disadvantage of this approach is that technology moves fast and that the HPC system already contains outdated components by the time it becomes operational. While selected components can be upgraded in phases during contracts, the basic architecture remains unchanged, and vast deviations from a chosen computing/data handling configuration with implications on networks and power/cooling supply are impossible. The growing complexity of both NWP codes and HPC systems therefore makes the cost-effective transfer of new science into operations increasingly difficult.

Table 1: Key data of selected HPC systems from operational NWP and other centres (ECMWF: European Centre for Medium-Range Weather Forecasts, NOAA: National Oceanic and Atmospheric Administration, JMA: Japan Meteorological Agency, KMA: Korea Meteorological Administration, CSC: Finnish Information Technology Center for Science (LUMI: Large Unified Modern Infrastructure), CINECA: Consorzio Interuniversitario del Nord-Est per il Calcolo Automatico, BSC: Barcelona Supercomputing Center, CSCS: Swiss National Supercomputing Centre, DOE: Department of Energy). Note 1: performance figures may denote the first phase of a system to be upgraded in future phases of a contract. Note 2: performance achieved with the High-Performance Linpack (HPL) benchmark in double precision (Dongarra et al., 2003).

Centre, country | Sustained performance in Pflop/s (notes 1, 2) | Power consumption in MW | Main vendor/chip provider: nodes
ECMWF, Europe | 4x7 | 4x1.2 | Atos/AMD: 4 x 1,890 CPU
Met Office, UK | 50 | N/A | HPE/AMD: N/A
NOAA, USA | 2x10 | N/A | HPE/AMD: 2 x 2,560 CPU
JMA, Japan | 2x13.4 | 2x0.9 | Fujitsu/Fujitsu: 9,216 CPU
KMA, Korea | 2x18 | 2x3.3 | Lenovo/Intel: 2 x 4,023 CPU
CSC, Finland/Europe (LUMI) | 380 | 7.1 | HPE/AMD: 2,048 CPU, 2,978 GPU
CINECA, Italy/Europe (Leonardo) | 241 | 7.5 | Atos/NVIDIA: 1,536 CPU, 3,456 GPU
BSC, Spain/Europe (MareNostrum5) | 175 | 4.2 | Atos/Intel&NVIDIA: 6,408 CPU, 1,120 GPU
CSCS, Switzerland (Alps) | 270 | 5.2 | HPE/AMD&NVIDIA: 1,024 CPU, 2,688 GPU
RIKEN, Japan (Fugaku) | 442 | 29.9 | Fujitsu/Fujitsu: 158,976 CPU
DOE, USA (Frontier) | 1,206 | 22.8 | HPE/AMD: 9,408 GPU

More recently, NWP centres have started to outsource some of their post-processing and data handling to commercial providers using cloud-based services. Some of these are based on a software service platform owned by selected NWP centres serving the wider community (e.g. European Weather Cloud). The Copernicus services in Europe also piggy-back on the operational centre systems by paying for computing and data handling allocations, and by contracting their own cloud-based services in their vicinity. This makes the adaptation of codes based on NWP software easier and avoids costly data transfers to separate infrastructures.

The first operational centres to break away from the prime HPC-ownership model have been NOAA and the UK Met Office. NOAA sub-contracted General Dynamics Information Technology (GDIT) (HPCWire, 2023) in 2020 and the Met Office sub-contracted Microsoft in 2021 (HPCWire, 2021). Both are ten-year commitments with total envelopes over $500 million and £1.2 billion, respectively. The expectation is that HPC-as-a-service will reduce the centre-internal spin-up and enhance technological agility, eventually delivering better value for the investment. To what extent this choice is economical and free of lock-in to vendor-specific solutions remains to be seen, but HPC-as-a-service appears to be scalable beyond what is presently available through the public ownership model.

Table 1 summarises the key computing figures of the largest NWP centres and adds, for comparison, the leading European HPC systems co-funded by the European Commission through the EuroHPC programme and selected Department of Energy (DOE) and Japan’s RIKEN machines. They provide between 150 and 300 Pflop/s (1 Pflop/s = $10^{15}$ floating-point operations per second at 64 bit) sustained performance, a performance which is mostly derived from benchmarking idealised numerical problems on large GPU allocations that maximise computational intensity.

Figure 2: Illustration of typical daily HPC node-hour allocation at NWP centres such as ECMWF: twice-daily, time-critical production of initial conditions and forecasts, other operational workloads such as reforecasts and limited-area models, continuous reanalysis generation and research experiments. Capability is the maximum available node allocation, capacity is the full node allocation integrated over time.

Most of the present operational NWP suite and workflow set-ups already stretch the existing system capacities. First, the prediction suites produce the initial conditions (called analyses) for forecasts based on the latest incoming observations and short-range forecasts from previous cycles. This is done multiple times per day. The computational load for analyses and medium-range forecasts is very similar, as the former are based on solving a complex global four-dimensional optimisation problem ingesting hundreds of millions of new observations, and the latter solve the complex numerical Earth-system equation framework in thousands of time steps for the next days to weeks. Both analyses and forecasts also include separate, ensemble-based suites that produce uncertainty estimates, and these dominate the computing cost as they multiply the cost by the number of ensemble members used.

For an individual HPC system this means that the analysis and forecast suites determine the minimum number of compute nodes that need to be allocated to an analysis or forecast cycle at a given time. As an example, ECMWF’s 51-member data assimilation ensemble presently occupies about 1,600 HPC nodes and the ensemble forecast about 2,500 nodes of the system in Table 1. Each takes about one hour to complete so that forecasts can be issued as quickly as possible. This means that already today, a substantial part of ECMWF’s present allocation for operations on their ca. 8,000 node system is used for calculating either analyses or forecasts in daily, time-critical production. This is illustrated in Figure 2, where initial condition and forecast generation reach the upper limits of a substantial part of the HPC system (capability) and overall capacity (= allocation over time) is fully used by the sum of all operational and research workloads.
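The difference between capability and capacity can be made concrete with the node counts quoted above. The following sketch is only indicative: it assumes two production cycles per day and one hour per run, it covers only the two ensemble suites and none of the other workloads shown in Figure 2, and the constant and function names are hypothetical.

```python
TOTAL_NODES = 8000          # "ca. 8,000 node system"
DA_ENSEMBLE_NODES = 1600    # data assimilation ensemble
FC_ENSEMBLE_NODES = 2500    # ensemble forecast
HOURS_PER_RUN = 1.0         # "about one hour to complete"
CYCLES_PER_DAY = 2          # assumption: twice-daily production

def capability_share(nodes, total_nodes=TOTAL_NODES):
    """Fraction of the machine occupied while a suite is running."""
    return nodes / total_nodes

def daily_node_hours(nodes, hours=HOURS_PER_RUN, cycles=CYCLES_PER_DAY):
    """Node-hours consumed per day by a suite."""
    return nodes * hours * cycles

fc_share = capability_share(FC_ENSEMBLE_NODES)
da_share = capability_share(DA_ENSEMBLE_NODES)

used = daily_node_hours(DA_ENSEMBLE_NODES) + daily_node_hours(FC_ENSEMBLE_NODES)
available = TOTAL_NODES * 24.0
capacity_share = used / available

print(f"ensemble forecast occupies {fc_share:.0%} of the nodes while running")
print(f"assimilation ensemble occupies {da_share:.0%} of the nodes while running")
print(f"both suites use {capacity_share:.1%} of the daily node-hours")
```

Under these assumptions the time-critical suites claim a large share of the machine at once but only a small share of the daily node-hours, which is why the peak allocation rather than the integrated load sets the minimum system size.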

Projecting this capacity into the future reveals the challenge. The procurement of the next-generation HPC systems is usually based on a 5-year extrapolation of the analysis and forecast suites in terms of spatial resolution and ensemble size. At ECMWF, this has led to a doubling of horizontal resolution of the single-shot forecast in 24 months in the past (Schulthess et al., 2018). However, if this rate were to be maintained, and the number of ensemble members were to stay the same, the next generation of HPC systems at national centres would approach the size of the present EuroHPC infrastructures (see Table 1), requiring acquisition budgets in the range €150-250 million. It is not obvious that publicly funded centre budgets will be able to afford such amounts given the economic conditions in times where other geo-political pressures dominate. As a baseline, this means that individual NWP centres may struggle to afford HPC systems that can sustain more than 100 Pflop/s (1 Pflop/s = $10^{15}$ floating-point operations per second), which may also consume more than 10 MW.

As a general principle, HPC limitations to progress in NWP are not new. Both weather and climate communities have invested in augmenting the computing performance of their simulations for more than a decade, and the concerns about the scalability of complex Earth-system models have produced several joint projects at both national and international level (Lawrence et al., 2018; Schulthess et al., 2018; Müller et al., 2019; Bauer et al., 2021; Govett et al., 2024).

Realistically though, present-day models achieve a sustained floating-point performance of only a few percent of peak because the ratio between computation and data movement is fairly poor (Carman et al., 2017). Investments in parallelisation and memory access patterns improve this ratio, and acceleration gains of a factor of 10 and more can be obtained for individual model components while also benefiting from the core counts in GPUs (Müller et al., 2019; Dahm et al., 2023). These efforts are only partly supported by generic programming model options though and involve deep-dives into algorithms and code design. Porting to other, more specialised processors like field programmable gate arrays (FPGA) is possible and may produce good performance and energy efficiency gains. However, such ports only work for specific tasks, are tedious to carry out and are almost impossible to generalise (Targett et al., 2021). While the community has spent significant efforts and made very good progress in making global NWP models portable and more efficient, a step-change that meets the above stated requirements cannot be expected from code adaptation to heterogeneous processor and memory technologies alone in the coming years.
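One common way to reason about the link between data movement and sustained performance is the roofline model, in which attainable performance is bounded by the smaller of peak compute and memory bandwidth times arithmetic intensity. The sketch below applies it to a hypothetical compute node; the peak, bandwidth and intensity values are illustrative assumptions and are not taken from any system in Table 1.

```python
def attainable_flops(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Roofline bound on sustained performance in flop/s.

    peak_flops            peak floating-point rate (flop/s)
    mem_bandwidth         memory bandwidth (byte/s)
    arithmetic_intensity  floating-point operations per byte moved (flop/byte)
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)

# Hypothetical node and kernel characteristics (assumptions for illustration).
PEAK = 3.0e12        # 3 Tflop/s peak
BANDWIDTH = 3.0e11   # 300 GByte/s memory bandwidth
INTENSITY = 0.25     # flop per byte, typical order for a memory-bound kernel

sustained = attainable_flops(PEAK, BANDWIDTH, INTENSITY)
fraction_of_peak = sustained / PEAK

print(f"sustained: {sustained / 1e9:.0f} Gflop/s")
print(f"fraction of peak: {fraction_of_peak:.1%}")
```

With these numbers the kernel is limited by memory bandwidth to a few percent of peak, and raising the arithmetic intensity through better memory access patterns raises the bound proportionally until the compute peak is reached.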

An important aspect is electrical power consumption, which ranges from about 5 MW for the operational centre systems to 30 MW for the leading flagship computers shown in Table 1. This translates to about $5-30 million energy cost per year, ideally to be drawn from renewable resources. This can be more easily achieved in northern countries where free cooling is more effective and wind and hydro-power sources are abundant, but this is not the case for the existing systems. As the energy-per-flop ratio is not decreasing for modern HPC hardware and more flop/s are needed for running higher-resolution and more complex models, the overall power consumption at most compute sites will approach hard limits. Apart from affordability, this makes it likely that weather centres may need to move their compute facility into regions with more sustainable energy availability in the future.
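The relation between power draw and annual energy cost quoted above can be checked with a few lines of arithmetic. The sketch assumes a constant power draw over a 365-day year and derives the electricity price implied by the quoted figures; the function names are hypothetical and the resulting price is an inference, not a number stated in the text.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours

def annual_energy_cost(power_mw, price_per_kwh):
    """Yearly cost in dollars for a constant power draw in MW."""
    return power_mw * 1000.0 * HOURS_PER_YEAR * price_per_kwh

def implied_price_per_kwh(power_mw, annual_cost):
    """Electricity price in dollars per kWh implied by a yearly cost."""
    return annual_cost / (power_mw * 1000.0 * HOURS_PER_YEAR)

price_low = implied_price_per_kwh(5.0, 5.0e6)     # 5 MW at $5 million per year
price_high = implied_price_per_kwh(30.0, 30.0e6)  # 30 MW at $30 million per year

print(f"implied price: {price_low:.3f} $/kWh")
print(f"energy per year at 5 MW: {5.0 * HOURS_PER_YEAR / 1000.0:.1f} GWh")
```

Both ends of the quoted range correspond to roughly $1 million per MW-year, that is, a little over 11 cents per kWh under the constant-draw assumption.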

The present European, UK, US, Japanese and Korean weather-centre HPC systems are between 10% and 25% of the 150-300 Pflop/s dimension. They are still heavily CPU based, with capacities between 1,000 and 8,000 nodes. They also include smaller allocations (of the order of tens) of GPU nodes, mostly dedicated to ML and research. At ECMWF, a realistic assumption is that about half of the centre’s capacity is used for operational production, the other half for research and other projects. Data handling capacity follows suit. Operational data output rates are presently about 100 TByte/day and will reach 1 PByte/day in the next few years.

2.3 Background on machine learning

In recent years, computing technology supporting commercial artificial intelligence (AI) applications has boomed, easy-to-use and efficient software environments have become available, and large public funding programmes have appeared. This environment has stimulated a large wave of research projects. It is also fuelled by substantial commercial investments that spill into NWP. Consequently, the data-based training for solving larger and more complex tasks has grown at an unprecedented rate.

During the initial phase of this development, weather and climate centres focused on hybrid ML applications that couple ML tools with the conventional prediction workflow, while big technology companies, including NVIDIA, Google DeepMind, Huawei and Microsoft, were the first to realise that ML models that emulate the entire forecast production can actually compete with and, in fact, outperform physics-based NWP systems in both deterministic and ensemble predictions (Pathak et al., 2022; Bi et al., 2022; Lam et al., 2022; Price et al., 2023). The capability to build such models was clearly underestimated by the weather community when initial feasibility experiments were performed (Dueben and Bauer, 2018).

Consequently, ML has changed the NWP landscape significantly during the last two years, and has had an impact on the basic understanding of how forecasts are produced. Weather and climate modelling centres are now catching up quickly. For example, ECMWF has pushed their first global ML model (called AIFS) into semi-operational use (Lang et al., 2024). The quality of predictions in terms of forecast scores of the AIFS represents a step-change when compared to physics-based models (see Figure 1), and pure ML models are able to predict many kinds of extreme events and show much more physical consistency than originally expected by domain scientists (Bouallègue et al., 2024). It is unlikely that the use of physics-based NWP models will stop entirely anytime soon, as they provide the physical-process based insight that is necessary to understand performance. They are also better able to represent singular events, such as aerosol injections into the atmosphere from volcanic eruptions, for which ML models are more difficult to train. However, the recent developments of ML models seem to mark the end of the quiet revolution of physics-based models, at least in terms of daily, time-critical production. Today’s business-as-usual approach will be replaced by a new generation of ML models that produce most of the predictions for end-users in the near future.

One of today’s most important questions is whether ML can also be used for data assimilation as successfully as for forecast modelling. In terms of information content, it seems hard to imagine that the presently available observations, despite comprising hundreds of millions of satellite and conventional data points per day, can describe the four-dimensional state-space of a global Earth-system at, say, 5 km resolution and 100 vertical levels. However, ML appears to be suitable to replace all individual steps that are performed in data assimilation, from observation operators, to observation quality control, to error estimation for observations and the model, to interpolation in space and time, to the blending of information from observations and the model. Further, conventional methods only exploit 5-10% of the truly available observational data volume (e.g. ECMWF, 2024c), and it appears unlikely that the full state-vector for a day, which has $10^{14}$ to $10^{15}$ degrees of freedom, represents independent variables that are all necessary to provide good predictions for the most influential model fields such as precipitation, 2-metre temperature or 10-metre surface winds. It is therefore not a big surprise that first tests where ML performs the full data assimilation process provide very promising results (Huang et al., 2024; Vaughan et al., 2024).
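The mismatch between the size of the model state and the number of daily observations can be illustrated with a back-of-the-envelope count. The sketch below takes the 5 km resolution and 100 levels from the text; the number of variables per grid point, the model time step and the exact observation count are assumptions chosen only for illustration, so the result is indicative of the order of magnitude and is not a reconstruction of the figure quoted above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
EARTH_SURFACE_KM2 = 4.0 * math.pi * EARTH_RADIUS_KM**2  # about 5.1e8 km^2


def grid_columns(dx_km):
    """Number of horizontal grid columns for a uniform grid spacing dx_km."""
    return EARTH_SURFACE_KM2 / dx_km**2


def state_values_per_day(dx_km, n_levels, n_variables, timestep_s):
    """Values needed to describe the model state at every time step of one day."""
    steps_per_day = 86400.0 / timestep_s
    return grid_columns(dx_km) * n_levels * n_variables * steps_per_day


# Resolution (5 km) and level count (100) are from the text.
# The three values below are ASSUMED for illustration only.
N_VARIABLES = 10     # assumption: variables stored per grid point
TIMESTEP_S = 60.0    # assumption: model time step in seconds
OBS_PER_DAY = 5.0e8  # assumption: one reading of "hundreds of millions"

values = state_values_per_day(5.0, 100, N_VARIABLES, TIMESTEP_S)
print(f"grid columns:           {grid_columns(5.0):.2e}")
print(f"state values per day:   {values:.2e}")
print(f"values per observation: {values / OBS_PER_DAY:.1e}")
```

The count scales linearly with the assumed number of variables and inversely with the assumed time step, so it shifts by an order of magnitude quite easily. Under any of these choices, the state values outnumber the daily observations by several orders of magnitude, which is the point made in the paragraph above.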

The next challenge for ML in the weather prediction domain is to push ML models to represent the full Earth system including land, ocean, sea-ice, and waves. First ML-ocean models are already available (Wang et al., 2024a). This also leads to a closer investigation of the usability of ML models for climate predictions, noting that first simulations of the so-called Atmospheric Model Intercomparison Project type (Gates et al., 1999) have already been successful. Another question is whether hybrid approaches that couple an ML model with a conventional model (Kochkov et al., 2023) will eventually be outperformed by pure ML methods in terms of forecast scores and forecast consistency. ML methods enhancing conventional-model output also exist, for example, to perform online bias corrections within the model simulations (Bonavita and Laloyaux, 2020; Laloyaux et al., 2022), or for post-processing and spatial down-scaling (Harris et al., 2022; Bouallègue et al., 2024).

Currently, the cost for the training of pure ML models is still significantly lower than executing high-resolution physics based models, even with tens of thousands of node-hours spent for the training of the larger models. However, as the ML models are pushed to higher resolution and are trained for ensemble forecasts using, for example, diffusion and ensemble score-based loss functions (Price et al., 2023; Pacchiardi et al., 2024), the cost for training is increasing rapidly.

Furthermore, the trend in ML is moving towards so-called foundation models. These are ML models that can be used for several application areas and are trained using several different input and output data types in a representation learning approach. Their training amounts to filling gaps in the dataset, so that the model learns to probabilistically transfer different input streams in space and time to retrieve the chosen outputs. The first results with such foundation models in Earth system science are also very promising (Lessig et al., 2023; Nguyen et al., 2023; Bodnar et al., 2024).

As the training datasets and networks for foundation models are much larger when compared to task-based models, they have the potential to become the largest HPC application in the domain of weather and climate predictions in the near future (Wang et al., 2024b) and will likely have a significant impact on how ML is used for weather forecasting. However, they still need to show that they can eventually produce better results for important task-based applications when compared to pure ML-emulator NWP models.

An important aspect of ML research is that it has already established a wide public-private collaboration environment where private actors aim to create and exploit new business opportunities that were not accessible to them in the classical NWP world. Public actors, of course, understand the huge potential for accelerating progress and benefiting from substantial funding programmes that national governments release in order to remain competitive. The work and power distribution between public and private entities has therefore changed (Bauer et al., 2023), in particular, as some of the big technology companies seem determined to turn their research results into operational weather products for their users. But if commercial companies stop sharing their data-driven models and start patenting their products (Cheon and Mun, 2023; ClimateAI, 2024), this public-private collaboration environment may change yet again.

3 How would NWP centres adapt?

Given today’s situation, the main question therefore is: how can operational NWP centres create an affordable environment to maintain and even accelerate progress, given the urgent need to deliver better services for a society that is increasingly exposed to the impacts of extreme weather?

Help could come from a scenario that departs from present operational practice in favour of a more efficient and agile technical set-up of the prediction system. It would also boost scientific research, reduce the pressure of managing operational workloads at NWP centres, and make more cost-effective use of digital technology and HPC resources:

  (A) The time-critical production suite for both analyses and forecasts including uncertainty quantification would be based on ML inference:

    • the inference suites would cover global, regional and local scales, and would include suites targeted at, for example, hydrology, air quality, agriculture or coastal management;

    • the refresh cycles would be shortened and include on-demand suites that are only activated when necessary, for example in situations leading towards a flooding event;

    • the inference suites would be open source and easily transferable between centres, HPC systems, and countries.

  (B) There would be a continuous, shortened reference dataset production cycle for generating the next-generation training datasets:

    • several reference data generation suites including reanalyses, model simulations and observational data sets would be produced in parallel depending on the training target - global vs regional or weather vs climate focused;

    • a specific effort would be created to identify, promote and advance numerical models that offer the best physical realism and perform well when combined with observations;

    • beyond the consolidation of the already available and openly accessible observational framework, a specific effort would be made for ingesting experimental, internet-of-things (IoT), and commercial datasets as fast as possible.

  (C) The costly generation of training datasets would be shared between operational centres and third-party providers to optimise the use of computing resources and democratise the outcomes of their use:

    • the allocation of resources at individual centres would remain reasonable while the collective allocation would go beyond the full size of any single HPC system;

    • sustainable funding programmes external to NWP centres would be included, for example national/international research and development funds, space agencies, and digital-technology programmes.

  (D) The provision of both intellectual and digital infrastructure resources would be governed through a sustainable public-private partnership framework:

    • the global public space would be shared by a few high-performance generic data producers and many specialised data users that also feed user-specific, tailor-made suites for a wide range of local and topical applications into the community;

    • extreme-scale HPC resources would be drawn from selected national and international centres that also serve other scientific communities, but where weather (and climate) data production obtains sufficiently large and sustainable allocations;

    • private companies would support the public parties with computing and software services but also provide anonymised observational and application specific data; in turn, private companies would have access to quality controlled generic data generated by public entities.

  (E) Data handling and access would be turned into a federated data programme:

    • the output data management would be handled through a decentralised infrastructure approach with large active data spaces near the production site for fast data analytics access, sufficient long-term storage elsewhere, and dedicated high-speed network links between selected high-performance data producers to facilitate access to large holdings;

    • output from ML inference would be produced locally from initial conditions that are shared;

    • an end-to-end workflow based on ML that includes data assimilation would help to reduce the need to share large datasets as observational data is sparse;

    • federated compute power would be available next to the federated datasets and post-processing would include cloud based services;

    • reference datasets (reanalyses, model simulations, observations) could be transformed into each other via foundation models; all data would employ the same API and uncertainty representation;

    • weather and climate domain specific text and knowledge repositories would be created, maintained and enhanced that feed large-language models helping to interpret prediction system output and to turn data into information.

Such a scenario clearly deviates from the present philosophy of only relying on a single, entirely physics-numerical methods based prediction system that generates analyses and forecasts at all scales and serves all downstream applications on time-critical schedules. The scenario does not abolish such a system but rather introduces a clear distinction between what needs to run in near real-time, how ML training data is generated and how the remaining numerical methods based systems are managed. It is unlikely that physics based predictions will be removed entirely from operations, but the fraction of compute power used for time-critical numerical vs ML models will shift significantly.

Our scenario also introduces the notion of a much wider reaching collaboration framework that is necessary to share workloads and make the best use of the collectively available HPC resources rather than the single centre–single production–single HPC infrastructure thinking. The need for sharing the multi-lateral computing resources also implies a more democratised approach to data and software for users everywhere. Eventually, this will also allow a much easier transfer of knowledge and value to less developed countries.

Table 2 proposes a timeline for the evolution from present-day operations towards our scenario. The node allocations are ballpark figures based on the existing performance and what could be expected from the acceleration by ML. The main message of this table is that operational centres could achieve smaller node-hour allocations for their time-critical tasks while still investing in both numerical systems and the upgrades to ML-based suites. The heavy workloads requiring larger machines than affordable for individual NWP centres will increasingly be moved to external HPC centres. These are storm-resolving ensemble simulations at 1-5 km (called $x$km) and reanalyses. Daily production for observations mostly refers to space and meteorological agencies while regular production outside NWP centres refers to Tier 1 & 0 HPC infrastructures maintained by large national/international programmes.

Table 2: Scenario for three stages of future evolution of NWP-centre operational production. Research experimentation and research-to-operations testing are not included. Node allocations are rough estimates based on experience with today’s technology. Numbers in brackets denote the duration of calculations. GPU nodes include CPU processors. $x$km refers to resolutions of several km.
Operational NWP centres | HPC centres
Time-critical daily cycles (duration) | Regular production (duration) | Daily cycles (duration) | Regular production (duration)
Today: Observation processing, numerical analyses & forecasts: 1,000-5,000 CPU nodes (hours); Experimental ML forecast inference: 1 GPU node (minutes); Numerical reanalyses: 500 CPU nodes (years); Experimental ML forecast training: 10s GPU nodes (weeks); Observational data gathering and pre-processing & parameter retrievals: 100 CPU nodes (hours); Observational data reprocessing: 100 CPU nodes (months); Numerical km-scale simulations: 1,000 GPU nodes (months)
Shorter-term (1-3 years): Observation processing, ML accelerated numerical analyses, ML forecasts: 1,000 CPU-GPU nodes (minutes-hours); Experimental ML analyses inference: 10 GPU nodes (minutes); Experimental ML $x$km-scale forecast inference: 10 GPU nodes (minutes); Numerical reanalyses: 500-1,000 CPU nodes (years); Experimental ML analysis training: 100s GPU nodes (weeks); Experimental ML km-scale forecast training: 100-1,000 GPU nodes (weeks); Observational data gathering and pre-processing, ML parameter retrievals: 100 CPU nodes, 10 GPU nodes (minutes-hours); Observational data reprocessing: 100 CPU nodes (months); Experimental ML reanalysis training and inference: 100-1,000 GPU nodes (months); ML accelerated numerical km-scale simulations: 1,000-10,000 GPU nodes (months); Experimental ML foundation models: 100 GPU nodes (weeks-months)
Longer-term (3-5 years): Observation processing, ML analyses & $x$km-scale forecasts: 10-100 GPU nodes; ML accelerated numerical reanalyses: 500-1,000 CPU nodes (months); ML reanalyses: 100-500 GPU nodes (months); ML analysis and reanalysis training: 100 GPU nodes (weeks-months); Observational data gathering and pre-processing, ML parameter retrievals: 100 CPU nodes, 10-100 GPU nodes (minutes-hours); Observational data reprocessing: 100 CPU nodes; ML km-scale simulations: 1,000 GPU nodes (months); ML foundation models: 1,000 GPU nodes (months)

3.1 Implications on high-performance computing

The adoption of this approach translates to the following set-up of the operational analysis and forecast suites at weather centres. Today, the operational, ML-inference based production suites can be run on a few GPUs in seconds to minutes. More sophisticated ML architectures, the focus on ensemble predictions, and an increase in resolution and complexity (e.g., adding ocean and land components) will cause a significant increase in the computing cost. However, ML inference will still be much cheaper when compared to today’s physics-based models. Re-training of the largest models and foundation models from scratch would only be necessary when scientifically justified, but it could be afforded multiple times per year. For smaller-sized domains/products, re-training is much cheaper and could be performed more frequently as required by specific products and applications. Alternatively, the effort can focus on the frequent training of so-called tail networks that adjust the large and generic foundation models to specific application needs.

However, the generation of training datasets will also incur very large costs. As an example, we take the plans for the future ECMWF reanalysis ERA-6 that is funded by the European Commission’s Copernicus programme. ERA reanalyses rely on numerical model and data assimilation advances provided by ECMWF’s operational system and add previously inaccessible observational datasets and features helping to create a seamless multi-decadal, physically consistent time series of weather (Hersbach et al., 2020). Such global reanalyses provide boundary conditions for limited-area systems that focus on regions of special interest at higher resolution but usually cover shorter time periods (e.g., in the Arctic: Bromwich et al., 2016).

Upgrades of these reanalyses are presently lagging behind operational suites in terms of the computing cost drivers (model version, spatial resolution, ensemble size) by about 8-10 years, and new versions are produced every 8 years or so. Since forecast skill presently increases at a rate of about 1 day of lead time per decade (Bauer et al., 2015), creating new reanalysis versions at least every 2 years would be well justified, were it not for the substantial computing effort and for most users relying on copying vast amounts of reanalysis data to feed their needs.

According to first estimates (H. Hersbach, pers. communication), ERA-6 is expected to run its main analysis and medium-range forecast suites as an 11-member ensemble, of which the control analysis occupies at least 12 nodes and the 10 perturbed members probably slightly less. If we assume 100 nodes for the ensemble to be run in multiple parallel streams for the period 1950-2025, it would take 4 streams and a continuous allocation of 400 nodes to complete 40 analysis years in 1.5 years, or 80 years in 3 years. A more ambitious set-up with more parallel streams, higher resolution and with bigger ensembles would clearly challenge present NWP centre HPC capacities as this workload would be added on top of the daily operational tasks.
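The arithmetic behind this estimate can be written out as a short sketch. The stream count, nodes per stream and durations are taken from the paragraph above; the function name and the conversion to node-hours (using 365.25 days per year) are illustrative additions, not part of the ERA-6 planning.

```python
HOURS_PER_YEAR = 365.25 * 24  # 8766 hours; assumed calendar-year length


def reanalysis_allocation(analysis_years, n_streams, nodes_per_stream, wallclock_years):
    """Summarise a reanalysis production run split into parallel streams.

    Returns the total node count, the node-hours consumed, and the pace
    each stream must sustain (analysis years per wall-clock year).
    """
    total_nodes = n_streams * nodes_per_stream
    node_hours = total_nodes * wallclock_years * HOURS_PER_YEAR
    pace_per_stream = (analysis_years / n_streams) / wallclock_years
    return total_nodes, node_hours, pace_per_stream


# Figures from the text: 4 streams of 100 nodes each.
for years, wallclock in [(40, 1.5), (80, 3.0)]:
    nodes, node_hours, pace = reanalysis_allocation(years, 4, 100, wallclock)
    print(f"{years} analysis years in {wallclock} years: "
          f"{nodes} nodes, {node_hours / 1e6:.2f} million node-hours, "
          f"{pace:.2f} analysis years per wall-clock year per stream")
```

Both cases imply the same pace of roughly 6.7 analysis years per wall-clock year in each stream, and the 40-year case consumes about 5.3 million node-hours. Adding streams shortens the wall-clock time only at the price of a proportionally larger continuous allocation.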

Faster progress in NWP also means a capacity for running highly realistic, i.e., high-resolution, more complex Earth-system models in both forecast and analysis mode, and in both deterministic and ensemble mode to drive innovation. Apart from running research experiments towards this goal, longer-period trial suites in a near-operational setting also need to be executed in a timely manner, so that statistically representative datasets can be created for performance and stability assessments, and innovation can be transferred into operations. Here too, the present 5-10-year innovation cycle is too long. If km-scale coupled atmosphere-ocean-land simulations with a throughput of 1 simulated year per day are the target, the requirement for node allocations would be of the order of an entire machine like Alps at CSCS (see Table 1). This is not affordable now or in the near future by individual centres. This also applies to the resulting data handling challenge (Hoefler et al., 2023).

Advancing the speed of the creation of these reference datasets will require both acceleration by ML where possible and deployment on larger infrastructures than individual NWP centres can afford. These datasets will become one of the most important assets of the NWP community, and their accelerated development requires investments in new ways of collaboration.

3.2 Implications on collaborations

Figure 3: Conceptual model of future work distribution framework between operational NWP centres, data providers and national/international HPC centres, which also interface with digital technology companies and the wider public-private service ecosystem and other data spaces. Note that both data providers and HPC centres have further links to other data spaces and public-private services that are not shown here.

Figure 3 provides a concept for the interplay between selected operational NWP centres, HPC centres and data providers. This concept deviates from the existing single-entity focus and introduces much more collaboration on computing and data handling to collectively meet the imminent challenges.

The conceptual model would only be effective for those NWP centres that are in charge of producing the most demanding analysis and forecast datasets. Such centres would co-produce these datasets at extreme scale, namely very high-resolution coupled reanalyses and very high-resolution simulations providing the most realistic simulations of weather. These so-called master datasets would serve the wider community, spawn research, and their production would be increasingly delegated to HPC centres having the extreme-scale HPC capacities that NWP centres will ultimately lack. The master datasets will increasingly include ML components and, depending on the ongoing research, could be complemented by foundation models. This should limit the HPC needs of NWP centres to about where they are at present and, as more and more cost-effective ML workloads provide the operational output, release sufficient resources for ML training and physics-based research. Based on these master datasets, many other refinements for specific applications can be added with additional data and models.

The option of running generic NWP suites within flexible software and cluster configuration environments at multiple HPC centres is already being prototyped, for example at the Swiss CSCS (Alam et al., 2023). Their so-called vcluster (versatile software-defined clusters) technology allows specific portions of the same HPC system to be allocated to workloads that require different (hardware-dependent) software environments and software stacks, and therefore allows separate change management. This infrastructure-as-code approach introduces much more flexibility for the portability of complex workloads between machines. Since Switzerland is part of the LUMI consortium, such an approach could scale across northern European countries that can also more easily fulfil the sustainable power and cooling requirements for very large systems. ECMWF is also already installing a Tbit/second datalink to CSCS and CINECA’s Leonardo, which could become a part of such a distributed technical set-up, and is setting up parts of its modelling workflows on several EuroHPC machines as part of Destination Earth.

An essential input for this evolution (shown in Table 2) and co-production will be the collection and standardisation of the observational data records from operational and commercial providers as well as data from individual field campaigns. Ideally, this data pool is continually updated, documented and replenished. The mechanisms existing at space agencies and governed by WMO may be sufficient to manage this data space on a general principle, but they do not include sufficient public-private data governance concepts.

Analyses and reanalyses only using observations without additional forecasts may be the ultimate outcome. Today’s operational observational database at ECMWF amounts to approximately 150 GByte/day including 800 million observations. This data has already been quality controlled and reduced to avoid redundancy. Once produced at the NWP centre, it can be easily shared with other data analytics users, even if this volume were to grow by, say, an order of magnitude in the next few years. To democratise usage, the hosted observational database in Figure 3 would become a central part of the collaboration between agencies, and between data providers and users, making ML methods more effective.

The increase in workloads outsourced from NWP to HPC centres produces technical and governance implications. Some of the technical ones are already being tested by Destination Earth through the so-called data bridges in Figure 3, next to the presently largest European HPC infrastructures called LUMI, Leonardo and MareNostrum5 (see also Table 1). These data bridges come with several PBytes worth of storage both inside and outside the firewall-protected HPC systems that users can access. Data is continuously streamed from the HPC production suites onto these bridges, and users can make use of a vast range of data management tools as part of Destination Earth’s so-called digital-twin engine (ECMWF, 2024a). Similar open-access infrastructures should be placed near other centres following the same template and deploy similar software and data analytics services.

All these systems also provide access to federated and generic cloud services, to which post-processing and other production tasks can be delegated, avoiding the need to move datasets in the PByte size range. The Copernicus Climate Change Service (Copernicus, 2024a) can serve as an example of how open data is efficiently made available to a very large user base. There are also ideas for scaling this up to a wider federated data handling concept (Hoefler et al., 2023).
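The difference between sharing the daily observational database and moving PByte-scale model output can be made concrete with a simple transfer-time estimate. The sketch below uses the 150 GByte/day and 800 million observations quoted earlier and the Tbit/second link speed mentioned for the ECMWF datalink; decimal units, a fully utilised link and zero protocol overhead are idealising assumptions, and the function name is illustrative.

```python
GBYTE = 1e9   # decimal units assumed throughout
PBYTE = 1e15
TBIT_PER_S = 1e12


def transfer_time_s(volume_bytes, link_bits_per_s, efficiency=1.0):
    """Ideal time to move volume_bytes over a link.

    `efficiency` is the assumed fraction of the nominal bandwidth that is
    actually achieved (1.0 means a perfectly utilised link).
    """
    return volume_bytes * 8.0 / (link_bits_per_s * efficiency)


obs_daily_bytes = 150 * GBYTE   # from the text
obs_daily_count = 800e6         # from the text

print(f"bytes per observation:       {obs_daily_bytes / obs_daily_count:.1f}")
print(f"daily observations, 1 Tbit/s: {transfer_time_s(obs_daily_bytes, TBIT_PER_S):.1f} s")
print(f"ten times that volume:        {transfer_time_s(10 * obs_daily_bytes, TBIT_PER_S):.1f} s")
print(f"1 PByte dataset:              {transfer_time_s(PBYTE, TBIT_PER_S) / 3600:.1f} h")
```

Under these idealised assumptions the daily observational volume moves in about a second, and in about twelve seconds after a tenfold increase, whereas a single PByte takes more than two hours even on a dedicated Tbit/second link. Most users have far slower connections, which is why processing next to the data is preferable for the large holdings.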

The implications for the governance of resources owned by national extreme-scale HPC centres and/or administered by internationally funded allocations (e.g., EuroHPC) need to be addressed with urgency. The workloads in this concept would go beyond classical one-off research experiments managed by competitive calls as they need sustainable allocations of thousands of GPU nodes for weeks to months. Moreover, computing resource management has to go hand in hand with data management. In our scenario, the HPC centres need an approach for securing sustainable allocations and an operational workflow and dataflow management system that interfaces with the NWP centres for these allocations. The cost for this resource could be shared between countries.

Since the leading HPC centres collaborate with digital technology companies through their contracts and a well-established network, these companies can also support the growing NWP workloads by providing access to new technology, by running selected large workloads, and by co-developing software infrastructures. In turn, the commercial sector can benefit from master datasets for training ML methods feeding their business models, and contribute to setting up quality standards for data-driven NWP models as well as for large-language models that will help with data interpretation and user interaction (Bauer et al., 2023). This should be the foundation of a new public-private partnership model that serves both, pending mutually beneficial agreements on code and data sharing.

4 Concluding remarks

This paper has been motivated by the concern that the operational production set-up of NWP centres, mostly based on owning and fully operating HPC and software infrastructures, will soon reach its limits. It is likely that the drivers of computing and data handling needs in NWP will soon be the generation of ML training datasets based on km-scale model simulations and comprehensive observational dataset records, but also foundation models, and that these will ultimately require HPC resources that individual NWP centres cannot afford. They will even stretch the limits of the largest national HPC centres.

The take-over of ML suites in time-critical operational production would enhance efficiency and provide more flexibility with regard to the ingestion of new science and new products. It does not mean, however, that traditional, first-principles based numerical modelling loses importance, as the take-over merely simplifies production. The most advanced physics-based simulations also reach computing and data limits, but these can be managed more easily outside time-critical production and through concerted efforts between NWP centres, a wide range of top-tier HPC centres and digital technology providers.

This adds a new level of coordination and dependencies between centres, and we argue that the outsourcing of the largest computing and data handling demands onto the available HPC infrastructures needs to be co-funded by several countries and happen in co-development with digital technology companies to benefit from the latest technology.

The link between public entities and commercial companies providing both digital technologies and methodological ML solutions relies on mutual benefits. If future extreme-scale HPC architectures rely entirely on low-precision (e.g., 16 bit or less) processing to support artificial-intelligence applications, the generation of reference simulations and reanalyses will not be supported on these machines. And if companies start patenting their algorithms, the presently rather open exchange on fundamental ML science may struggle to survive. Weather extremes and climate change are important enough to invest in partnerships that provide for both sustainable science development and commercial interests.

Our concept would also break the single-centre, single-vendor approach that will ultimately limit cost effectiveness. If the shared NWP-HPC centre concept is applied in various countries at the same time, it will benefit from a wider range of technologies and solutions and can react with more agility to the fast-paced technology evolution. There are already developments in Europe, funded by national programmes and the European Commission, that can be used to prepare such a federated framework. In principle, our collaboration model adds a new stage to the reason why ECMWF was founded 50 years ago: pool resources to move beyond what individual centres could do, except that pooling in a single place would now become a federation.

The climate modelling community is also looking into concepts that produce more cost-effective decadal and centennial simulations on the largest HPC infrastructures in the world, add a more NWP-type approach to developing the next generation of models and transferring them into production, and make the data available more quickly and with more data analytics support (Stevens et al., 2023). There are many common elements between weather and climate prediction in terms of their HPC requirements and the role for acceleration and model surrogation through ML. Some of the solutions in our concept may serve the needs of both communities, which would be another important efficiency gain.

Author contribution

PB conceptualised and wrote this paper.

Competing interests

The author declares that there are no competing interests.

Acknowledgements

The author would like to thank Peter Dueben for his substantial contributions to this work, and Torsten Hoefler for providing valuable comments and proposals for improvement. He would also like to thank Hans Hersbach for sharing first estimates of the ERA-6 configuration, acknowledging that these may still change.

References

  • Alam et al. [2023] Sadaf R Alam, Miguel Gila, Mark Klein, Maxime Martinasso, and Thomas C Schulthess. Versatile software-defined HPC and cloud clusters on Alps supercomputer for diverse workflows. The International Journal of High Performance Computing Applications, 37(3-4):288–305, 2023.
  • Bauer et al. [2015] Peter Bauer, Alan Thorpe, and Gilbert Brunet. The quiet revolution of numerical weather prediction. Nature, 525(7567):47–55, 2015.
  • Bauer et al. [2021] Peter Bauer, Peter D Dueben, Torsten Hoefler, Tiago Quintino, Thomas C Schulthess, and Nils P Wedi. The digital revolution of earth-system science. Nature Computational Science, 1(2):104–113, 2021.
  • Bauer et al. [2023] Peter Bauer, Peter Dueben, Matthew Chantry, Francisco Doblas-Reyes, Torsten Hoefler, Amy McGovern, and Bjorn Stevens. Deep learning and a changing economy in weather and climate prediction. Nature Reviews Earth & Environment, 4(8):507–509, 2023.
  • Bi et al. [2022] Kaifeng Bi, Lingxi Xie, Hengheng Zhang, Xin Chen, Xiaotao Gu, and Qi Tian. Pangu-Weather: A 3D high-resolution model for fast and accurate global weather forecast. arXiv preprint arXiv:2211.02556, 2022.
  • Bodnar et al. [2024] Cristian Bodnar, Wessel P Bruinsma, Ana Lucic, Megan Stanley, Johannes Brandstetter, Patrick Garvan, Maik Riechert, Jonathan Weyn, Haiyu Dong, Anna Vaughan, et al. Aurora: A foundation model of the atmosphere. arXiv preprint arXiv:2405.13063, 2024.
  • Bonavita and Laloyaux [2020] Massimo Bonavita and Patrick Laloyaux. Machine learning for model error inference and correction. Journal of Advances in Modeling Earth Systems, 12(12):e2020MS002232, 2020. https://doi.org/10.1029/2020MS002232.
  • Bouallègue et al. [2024] Zied Ben Bouallègue, Jonathan A Weyn, Mariana CA Clare, Jesper Dramsch, Peter Dueben, and Matthew Chantry. Improving medium-range ensemble weather forecasts with hierarchical ensemble transformers. Artificial Intelligence for the Earth Systems, 3(1):e230027, 2024.
  • Bouallègue et al. [2024] Zied Ben Bouallègue, Mariana C A Clare, Linus Magnusson, Estibaliz Gascón, Michael Maier-Gerber, Martin Janoušek, Mark Rodwell, Florian Pinault, Jesper S Dramsch, Simon T K Lang, Baudouin Raoult, Florence Rabier, Matthieu Chevallier, Irina Sandu, Peter Dueben, Matthew Chantry, and Florian Pappenberger. The rise of data-driven weather forecasting: A first statistical assessment of machine learning-based weather forecasts in an operational-like context. Bulletin of the American Meteorological Society, 2024. https://doi.org/10.1175/BAMS-D-23-0162.1. URL https://journals.ametsoc.org/view/journals/bams/aop/BAMS-D-23-0162.1/BAMS-D-23-0162.1.xml.
  • Bromwich et al. [2016] David H Bromwich, Aaron B Wilson, Le-Sheng Bai, George WK Moore, and Peter Bauer. A comparison of the regional arctic system reanalysis and the global era-interim reanalysis for the arctic. Quarterly Journal of the Royal Meteorological Society, 142(695):644–658, 2016.
  • Brunet et al. [2023] Gilbert Brunet, David B Parsons, Dimitar Ivanov, Boram Lee, Peter Bauer, Natacha B Bernier, Veronique Bouchet, Andy Brown, Antonio Busalacchi, Georgina Campbell Flatter, et al. Advancing weather and climate forecasting for our changing world. Bulletin of the American Meteorological Society, 104(4):E909–E927, 2023.
  • Carman et al. [2017] Jessie C Carman, Thomas Clune, Francis Giraldo, Mark Govett, Anke Kamrath, Tsengdar Lee, David McCarren, John Michalakes, Scott Sandgathe, and Tim Whitcomb. Position paper on high performance computing needs in earth system prediction. 2017.
  • Carrassi et al. [2018] Alberto Carrassi, Marc Bocquet, Laurent Bertino, and Geir Evensen. Data assimilation in the geosciences: An overview of methods, issues, and perspectives. Wiley Interdisciplinary Reviews: Climate Change, 9(5):e535, 2018.
  • Cheon and Mun [2023] Minjong Cheon and Changbae Mun. The climate of innovation: Ai’s growing influence in weather prediction patents and its future prospects. Sustainability, 15(24):16681, 2023.
  • ClimateAI [2024] ClimateAI. Climateai has u.s. patent granted for genai-based approach applied to weather forecasting, 2024. https://climate.ai/blog/climateai-patent-genai-applied-to-weather-forecasting/#:~:text=This%20newly%20patented%20ClimateAi%20system,biases%20in%20current%20weather%20models.
  • Commission [2024] European Commission. Building a highly accurate digital twin of the earth, 2024. https://destination-earth.eu.
  • Copernicus [2024a] Copernicus. The copernicus climate data store, 2024a. https://cds.climate.copernicus.eu/#!/home.
  • Copernicus [2024b] Copernicus. List of noaa open data dissemination program datasets, 2024b. https://www.copernicus.eu/en/access-data.
  • Dahm et al. [2023] Johann Dahm, Eddie Davis, Florian Deconinck, Oliver Elbert, Rhea George, Jeremy McGibbon, Tobias Wicky, Elynn Wu, Christopher Kung, Tal Ben-Nun, et al. Pace v0.2: a python-based performance-portable atmospheric model. Geoscientific Model Development, 16(9):2719–2736, 2023.
  • Dongarra et al. [2003] Jack J Dongarra, Piotr Luszczek, and Antoine Petitet. The linpack benchmark: past, present and future. Concurrency and Computation: practice and experience, 15(9):803–820, 2003.
  • Dueben and Bauer [2018] Peter D Dueben and Peter Bauer. Challenges and design choices for global weather and climate models based on machine learning. Geoscientific Model Development, 11(10):3999–4009, 2018.
  • ECMWF [2024a] ECMWF. The digital twin engine, 2024a. https://stories.ecmwf.int/the-digital-twin-engine/.
  • ECMWF [2024b] ECMWF. Ecmwf releases a much larger open dataset, 2024b. https://www.ecmwf.int/en/about/media-centre/news/2024/ecmwf-releases-much-larger-open-dataset.
  • ECMWF [2024c] Reading/Bonn/Bologna ECMWF. Ecmwf observational data monitoring, 2024c. https://www.ecmwf.int/en/forecasts/quality-our-forecasts/monitoring-observing-system#Satellite.
  • Frolov et al. [2024] Sergey Frolov, Kevin Garrett, Isidora Jankov, Daryl Kleist, Jebb Q Stewart, and John Ten Hoeve. Integration of emerging data-driven models into the noaa research to operation pipeline for numerical weather prediction. Bulletin of the American Meteorological Society, 2024.
  • Gates et al. [1999] W Lawrence Gates, James S Boyle, Curt Covey, Clyde G Dease, Charles M Doutriaux, Robert S Drach, Michael Fiorino, Peter J Gleckler, Justin J Hnilo, Susan M Marlais, et al. An overview of the results of the atmospheric model intercomparison project (amip i). Bulletin of the American Meteorological Society, 80(1):29–56, 1999.
  • Govett et al. [2024] Mark Govett, Bubacar Bah, Peter Bauer, Dominique Berod, Veronique Bouchet, Susanna Corti, Chris Davis, Yihong Duan, Tim Graham, Yuki Honda, et al. Exascale computing and data handling: Challenges and opportunities for weather and climate prediction. Bulletin of the American Meteorological Society, 2024.
  • Harris et al. [2022] Lucy Harris, Andrew TT McRae, Matthew Chantry, Peter D Dueben, and Tim N Palmer. A generative deep learning approach to stochastic downscaling of precipitation forecasts. Journal of Advances in Modeling Earth Systems, 14(10):e2022MS003120, 2022.
  • Hersbach et al. [2020] Hans Hersbach, Bill Bell, Paul Berrisford, Shoji Hirahara, András Horányi, Joaquín Muñoz-Sabater, Julien Nicolas, Carole Peubey, Raluca Radu, Dinand Schepers, et al. The era5 global reanalysis. Quarterly Journal of the Royal Meteorological Society, 146(730):1999–2049, 2020.
  • Hewitt et al. [2022] Helene Hewitt, Baylor Fox-Kemper, Brodie Pearson, Malcolm Roberts, and Daniel Klocke. The small scales of the ocean may hold the key to surprises. Nature Climate Change, 12(6):496–499, 2022.
  • Hoefler et al. [2023] Torsten Hoefler, Bjorn Stevens, Andreas F Prein, Johanna Baehr, Thomas Schulthess, Thomas F Stocker, John Taylor, Daniel Klocke, Pekka Manninen, Piers M Forster, et al. Earth virtualization engines: a technical perspective. Computing in Science & Engineering, 25(3):50–59, 2023.
  • Hohenegger et al. [2023] Cathy Hohenegger, Peter Korn, Leonidas Linardakis, René Redler, Reiner Schnur, Panagiotis Adamidis, Jiawei Bao, Swantje Bastin, Milad Behravesh, Martin Bergemann, et al. Icon-sapphire: simulating the components of the earth system and their interactions at kilometer and subkilometer scales. Geoscientific Model Development, 16(2):779–811, 2023.
  • HPCWire [2021] HPCWire. Behind the met office’s procurement of a billion-dollar microsoft system, 2021. https://www.hpcwire.com/2021/05/13/behind-the-met-offices-procurement-of-a-billion-dollar-microsoft-system/.
  • HPCWire [2023] HPCWire. Gdit expands noaa supercomputing capacity for advanced national weather forecasting, 2023. https://www.hpcwire.com/off-the-wire/gdit-expands-noaa-supercomputing-capacity-for-advanced-national-weather-forecasting/.
  • Huang et al. [2024] Langwen Huang, Lukas Gianinazzi, Yuejiang Yu, Peter D Dueben, and Torsten Hoefler. Diffda: a diffusion model for weather-scale data assimilation. arXiv preprint arXiv:2401.05932, 2024.
  • Kochkov et al. [2023] Dmitrii Kochkov, Janni Yuval, Ian Langmore, Peter Norgaard, Jamie Smith, Griffin Mooers, James Lottes, Stephan Rasp, Peter Düben, Milan Klöwer, et al. Neural general circulation models. arXiv preprint arXiv:2311.07222, 2023.
  • Laloyaux et al. [2022] Patrick Laloyaux, Thorsten Kurth, Peter Dominik Dueben, and David Hall. Deep learning to estimate model biases in an operational nwp assimilation system. Journal of Advances in Modeling Earth Systems, 14(6):e2022MS003016, 2022. https://doi.org/10.1029/2022MS003016. URL https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2022MS003016.
  • Lam et al. [2022] Remi Lam, Alvaro Sanchez-Gonzalez, Matthew Willson, Peter Wirnsberger, Meire Fortunato, Ferran Alet, Suman Ravuri, Timo Ewalds, Zach Eaton-Rosen, Weihua Hu, et al. Graphcast: Learning skillful medium-range global weather forecasting. arXiv preprint arXiv:2212.12794, 2022.
  • Lang et al. [2024] Simon Lang, Mihai Alexe, Matthew Chantry, Jesper Dramsch, Florian Pinault, Baudouin Raoult, Mariana CA Clare, Christian Lessig, Michael Maier-Gerber, Linus Magnusson, et al. Aifs-ecmwf’s data-driven forecasting system. arXiv preprint arXiv:2406.01465, 2024.
  • Lawrence et al. [2018] Bryan N Lawrence, Michael Rezny, Reinhard Budich, Peter Bauer, Jörg Behrens, Mick Carter, Willem Deconinck, Rupert Ford, Christopher Maynard, Steven Mullerworth, et al. Crossing the chasm: how to develop weather and climate models for next generation computers? Geoscientific Model Development, 11(5):1799–1821, 2018.
  • Lessig et al. [2023] Christian Lessig, Ilaria Luise, Bing Gong, Michael Langguth, Scarlet Stadler, and Martin Schultz. Atmorep: A stochastic model of atmosphere dynamics using large scale representation learning. arXiv preprint arXiv:2308.13280, 2023.
  • Michalakes [2020] John Michalakes. Hpc for weather forecasting. Parallel Algorithms in Computational Science and Engineering, pages 297–323, 2020.
  • Müller et al. [2019] Andreas Müller, Willem Deconinck, Christian Kühnlein, Gianmarco Mengaldo, Michael Lange, Nils Wedi, Peter Bauer, Piotr K Smolarkiewicz, Michail Diamantakis, Sarah-Jane Lock, et al. The escape project: energy-efficient scalable algorithms for weather prediction at exascale. Geoscientific Model Development, 12(10):4425–4441, 2019.
  • Nguyen et al. [2023] Tung Nguyen, Johannes Brandstetter, Ashish Kapoor, Jayesh K Gupta, and Aditya Grover. Climax: A foundation model for weather and climate. arXiv preprint arXiv:2301.10343, 2023.
  • NOAA [2024] NOAA. List of noaa open data dissemination program datasets, 2024. https://www.noaa.gov/nodd/datasets.
  • Pacchiardi et al. [2024] Lorenzo Pacchiardi, Rilwan A Adewoyin, Peter Dueben, and Ritabrata Dutta. Probabilistic forecasting with generative networks via scoring rule minimization. Journal of Machine Learning Research, 25(45):1–64, 2024.
  • Palmer [2020] Tim Palmer. A vision for numerical weather prediction in 2030. arXiv preprint arXiv:2007.04830, 2020.
  • Palmer and Stevens [2019] Tim Palmer and Bjorn Stevens. The scientific challenge of understanding and estimating climate change. Proceedings of the National Academy of Sciences, 116(49):24390–24395, 2019.
  • Pathak et al. [2022] Jaideep Pathak, Shashank Subramanian, Peter Harrington, Sanjeev Raja, Ashesh Chattopadhyay, Morteza Mardani, Thorsten Kurth, David Hall, Zongyi Li, Kamyar Azizzadenesheli, et al. Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. arXiv preprint arXiv:2202.11214, 2022.
  • Price et al. [2023] Ilan Price, Alvaro Sanchez-Gonzalez, Ferran Alet, Timo Ewalds, Andrew El-Kadi, Jacklynn Stott, Shakir Mohamed, Peter Battaglia, Remi Lam, and Matthew Willson. Gencast: Diffusion-based ensemble forecasting for medium-range weather. arXiv preprint arXiv:2312.15796, 2023.
  • Rackow et al. [2022] Thomas Rackow, Sergey Danilov, Helge F Goessling, Hartmut H Hellmer, Dmitry V Sein, Tido Semmler, Dmitry Sidorenko, and Thomas Jung. Delayed antarctic sea-ice decline in high-resolution climate change simulations. Nature communications, 13(1):637, 2022.
  • Schulthess et al. [2018] Thomas C Schulthess, Peter Bauer, Nils Wedi, Oliver Fuhrer, Torsten Hoefler, and Christoph Schär. Reflecting on the goal and baseline for exascale computing: a roadmap based on weather and climate simulations. Computing in Science & Engineering, 21(1):30–41, 2018.
  • Shalf [2020] John Shalf. The future of computing beyond moore’s law. Philosophical Transactions of the Royal Society A, 378(2166):20190061, 2020.
  • Stevens et al. [2019] Bjorn Stevens, Masaki Satoh, Ludovic Auger, Joachim Biercamp, Christopher S Bretherton, Xi Chen, Peter Düben, Falko Judt, Marat Khairoutdinov, Daniel Klocke, et al. Dyamond: the dynamics of the atmospheric general circulation modeled on non-hydrostatic domains. Progress in Earth and Planetary Science, 6(1):1–17, 2019.
  • Stevens et al. [2023] Bjorn Stevens, Stefan Adami, Tariq Ali, Hartwig Anzt, Zafer Aslan, Sabine Attinger, Jaana Bäck, Johanna Baehr, Peter Bauer, Natacha Bernier, et al. Earth virtualization engines (eve). Earth System Science Data Discussions, 2023:1–14, 2023.
  • Targett et al. [2021] James Stanley Targett, Wayne Luk, Michael Lange, and Olivier Marsden. Systematically migrating an operational microphysics parameterisation to fpga technology. In 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), pages 69–77. IEEE, 2021.
  • Vaughan et al. [2024] Anna Vaughan, Stratis Markou, Will Tebbutt, James Requeima, Wessel P Bruinsma, Tom R Andersson, Michael Herzog, Nicholas D Lane, J Scott Hosking, and Richard E Turner. Aardvark weather: end-to-end data-driven weather forecasting. arXiv preprint arXiv:2404.00411, 2024.
  • Wang et al. [2024a] Xiang Wang, Renzhi Wang, Ningzi Hu, Pinqiang Wang, Peng Huo, Guihua Wang, Huizan Wang, Sengzhang Wang, Junxing Zhu, Jianbo Xu, et al. Xihe: A data-driven model for global ocean eddy-resolving forecasting. arXiv preprint arXiv:2402.02995, 2024a.
  • Wang et al. [2024b] Xiao Wang, Aristeidis Tsaris, Siyan Liu, Jong-Youl Choi, Ming Fan, Wei Zhang, Junqi Yin, Moetasim Ashfaq, Dan Lu, and Prasanna Balaprakash. Orbit: Oak ridge base foundation model for earth system predictability. arXiv preprint arXiv:2404.14712, 2024b.
  • Wedi [2014] Nils P Wedi. Increasing horizontal resolution in numerical weather prediction and climate simulations: illusion or panacea? Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 372(2018):20130289, 2014.
  • Wedi et al. [2020] Nils P Wedi, Inna Polichtchouk, Peter Dueben, Valentine G Anantharaj, Peter Bauer, Souhail Boussetta, Philip Browne, Willem Deconinck, Wayne Gaudin, Ioan Hadade, et al. A baseline for global weather and climate simulations at 1 km resolution. Journal of Advances in Modeling Earth Systems, 12(11):e2020MS002192, 2020.