Extreme Data Challenges Towards Exascale Weather Forecasting Systems

11 Mar 2021
16:30 - 16:50

Numerical Weather Prediction (NWP) is a highly data-intensive HPC application. ECMWF operational weather forecasts generate massive amounts of I/O in short bursts, accumulating to tens of TiB in hourly forecast cycle windows. From this output, millions of user-defined daily products are generated and disseminated to member states and commercial customers all over the world. These products are processed from the raw output of the IFS model, within the time-critical path and under a strict delivery schedule.

With upcoming programs such as EuroHPC and Destination Earth, in addition to ECMWF’s own 2025 NWP strategy, higher model resolution and growing demand will increase both the size and the number of these weather forecast products.

The adoption of software-defined, semantic data storage, including object stores, for time-critical operations has opened the door to more comprehensive improvements to the NWP post-processing chain and enabled new access paths to very high-resolution, time-critical datasets. These improvements will bring product generation and data analytics closer to the NWP model and its output data, to build truly data-centric processing and analytics workflows, including novel data-intensive Machine Learning models.

These improvements are part of ECMWF's plans to achieve Exascale NWP by 2025 and to empower our users and member states with novel and increased usage of our weather forecast data. As Exascale NWP datasets are expected to reach between 250 TiB and 1 PiB per forecast cycle, the data-centric approach is critical to using them efficiently, by minimising data transport and bringing post-processing and insight discovery closer to the data source.

We present the Exascale I/O challenges ECMWF is facing, and the latest developments in model I/O, product generation, and data access and storage. We show how ECMWF is reworking its operational workflows to adapt to forthcoming architectures and memory-storage hierarchies, as we build bridges from HPC data producers to cloud-based data analytics workflows.