
Amplify Renewables - Training Weather Models on the Intel® Gaudi® AI Accelerator


Authors: Rachit Singh, Rahul Unnikrishnan Nair

Predicting the weather is a complex challenge, and for wind and solar power, precision is critical. Amplify Renewables, a participant in the Intel® Liftoff for AI Startups Catalyst Track, has enhanced its energy forecasting capabilities to ensure more accurate and efficient renewable power dispatch to the grid.


So, what advancements did they make to improve their forecasting?

They teamed up with Intel® Tiber™ AI Cloud, Intel's public cloud offering for AI startups and enterprises, and the Intel® Liftoff team to train large-scale weather models on Intel® Gaudi® 2 HPUs, making grid forecasting faster and more precise.

Training Big Models with Big Data

Machine learning is changing the way we predict the weather. Amplify Renewables wanted to take things a step further by training their own global weather model, but that meant processing massive amounts of data - terabytes of weather patterns from both public and private sources.

Between December 2023 and January 2024, they worked with the Intel® Tiber™ AI Cloud and Intel® Liftoff team to scale up their model training. The workload involved high-volume data processing, utilizing 90GB+ VRAM per Intel® Gaudi® 2 HPU and distributed data-parallel training across eight Gaudi 2 cards on a bare-metal system. This system was equipped with 1TB of RAM and multiple high-speed NVMe drives, efficiently managing large datasets through a filesystem-based storage service. That’s some serious computing power!
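
For context, distributed data-parallel training on Gaudi follows the standard PyTorch DDP pattern, with Habana's HCCL backend taking the place of NCCL. The sketch below is illustrative rather than Amplify's actual training code: it assumes a multi-card Gaudi system with the `habana_frameworks` package installed and one process launched per card (e.g. `torchrun --nproc_per_node 8 train.py`), and the model is a placeholder.

```python
import torch
import torch.distributed as dist
import habana_frameworks.torch.core as htcore            # Gaudi PyTorch bridge
import habana_frameworks.torch.distributed.hccl  # noqa: F401 (registers the HCCL backend)

def main():
    # One process per Gaudi card; torchrun provides RANK / WORLD_SIZE / LOCAL_RANK.
    dist.init_process_group(backend="hccl")
    device = torch.device("hpu")

    # Placeholder standing in for the weather model.
    model = torch.nn.Linear(128, 64).to(device)
    ddp_model = torch.nn.parallel.DistributedDataParallel(model)
    optimizer = torch.optim.AdamW(ddp_model.parameters(), lr=1e-4)

    # One illustrative step; a real run iterates over a sharded dataset.
    x = torch.randn(32, 128, device=device)
    target = torch.randn(32, 64, device=device)

    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(ddp_model(x), target)
    loss.backward()
    optimizer.step()
    htcore.mark_step()  # in lazy mode, flush the accumulated graph to the device

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```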

How They Made It Work

To build their models, the team utilized the Habana PyTorch framework and Hugging Face's ecosystem of libraries on the Intel® Tiber™ AI Cloud, experimenting with three execution modes: Lazy, Eager, and Eager with `torch.compile` (see the sketch after the list). Each mode offered distinct advantages:

  • Lazy Mode: Operations are accumulated into a graph and executed in a deferred manner, allowing the graph compiler to optimize device execution through operator fusion, data layout management, and other enhancements.
  • Eager Mode: Executes operations immediately, one at a time, facilitating straightforward debugging and rapid model iteration.
  • Eager Mode with `torch.compile`: Introduced in PyTorch 2.0, this mode allows parts of the model to be wrapped into a graph for improved performance, combining the immediate execution benefits of Eager Mode with the optimization capabilities of graph execution.

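To make the three modes concrete, here is a minimal sketch of how they are selected (assuming a Gaudi machine with the `habana_frameworks` package installed; the model and batch are placeholders). The `PT_HPU_LAZY_MODE` environment variable toggles Lazy vs. Eager execution, and `torch.compile` with the Gaudi backend adds graph compilation on top of Eager mode:

```python
import os

# Choose the execution mode before importing the Habana bridge:
# "1" (default) = Lazy mode, "0" = Eager mode (used with torch.compile).
os.environ.setdefault("PT_HPU_LAZY_MODE", "0")

import torch
import habana_frameworks.torch.core as htcore  # Gaudi PyTorch bridge

device = torch.device("hpu")
model = torch.nn.Linear(128, 64).to(device)   # placeholder model
x = torch.randn(32, 128, device=device)       # placeholder batch

# Eager mode: each operation executes immediately.
y = model(x)

# Eager + torch.compile (PyTorch 2.x): wrap the model into a compiled
# graph using the Gaudi backend.
compiled = torch.compile(model, backend="hpu_backend")
y = compiled(x)

# In Lazy mode (PT_HPU_LAZY_MODE=1), ops accumulate into a graph and
# htcore.mark_step() triggers optimized execution:
#   y = model(x); htcore.mark_step()
print(y.shape)
```
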
In upcoming Intel® Gaudi® software releases, Eager Mode extended with `torch.compile` is expected to replace Lazy Mode, offering comparable performance without the need for graph rebuilding in each iteration. Intel® Gaudi® accelerators integrate seamlessly with popular AI frameworks, including PyTorch and TensorFlow, through the Habana® SynapseAI® SDK. Startups leveraging Hugging Face models can take advantage of Optimum Habana to optimize model performance on Gaudi hardware with minimal code changes.
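
As a sketch of the minimal-code-changes claim (not code from this project): with Optimum Habana, a stock Hugging Face training script mostly swaps `Trainer`/`TrainingArguments` for their Gaudi counterparts. The model, tiny dataset, and Gaudi config below are illustrative.

```python
from datasets import Dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.habana import GaudiTrainer, GaudiTrainingArguments

model_name = "bert-base-uncased"  # illustrative model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Tiny illustrative dataset; a real run would use the actual training data.
ds = Dataset.from_dict({"text": ["sunny", "stormy"], "label": [1, 0]})
ds = ds.map(lambda e: tokenizer(e["text"], padding="max_length", max_length=16),
            batched=True)

# The Gaudi-specific part: enable the HPU and lazy mode, and point at a
# published Gaudi configuration on the Hugging Face Hub.
args = GaudiTrainingArguments(
    output_dir="./out",
    use_habana=True,
    use_lazy_mode=True,
    gaudi_config_name="Habana/bert-base-uncased",
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = GaudiTrainer(model=model, args=args, train_dataset=ds)
trainer.train()
```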

What the Data Showed

The results spoke for themselves. Amplify Renewables trained multiple models and found key advantages:

  • Easy setup: PyTorch models required minimal changes to run on Habana HPUs.
  • Flexible execution: All three modes worked for inference, with lazy and eager modes also running smoothly for training.
  • Scalability: Linear performance for inference and close-to-linear scaling for distributed training.
  • High memory efficiency: The large VRAM on Intel® Gaudi® 2 HPUs made a huge difference in handling complex computations.

See It in Action

Advancements in AI-Based Weather Prediction

The graph below presents a timeline of state-of-the-art AI models for weather prediction, showing their performance in Z300 RMSE skill compared to ECMWF HRES (a high-resolution weather prediction benchmark).

The orange line represents the ECMWF HRES baseline, the high-resolution numerical weather prediction model mentioned above that is widely used for global forecasting. The Y-axis (Z300 RMSE skill vs. ECMWF HRES) measures the relative performance of AI-based models compared to ECMWF HRES, with higher values indicating better forecasting accuracy.

By benchmarking against ECMWF HRES, this graph highlights the progression of AI models in surpassing traditional weather prediction techniques.
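
The post does not spell out how the skill score is computed; a common convention for plots of this kind (an assumption here, not a definition from the source) is relative RMSE skill against the baseline:

$$\text{skill} = 1 - \frac{\mathrm{RMSE}_{\text{model}}}{\mathrm{RMSE}_{\text{HRES}}}$$

Under that convention, a skill of 0 matches HRES and positive values beat it.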

 

[Figure: Timeline of AI weather prediction models' Z300 RMSE skill relative to the ECMWF HRES baseline]

 

Key Takeaways from the Graph:

  1. Progression Over Time:
  • AI-based weather prediction models have steadily improved since 2019, with a significant leap around 2022.
  • Earlier models (such as Dueben & Bauer, WeatherBench, and Weyn et al.) exhibited moderate performance, whereas more recent models (FourCastNet, Keisler, and Pangu) demonstrate substantial improvements in forecast accuracy.
  2. Recent Breakthroughs (2023-2024):
  • Pangu (2023) set a new baseline for AI weather prediction accuracy, surpassing previous models.
  • Since then, GraphCast, FengWu, SFNO, and FuXi have pushed accuracy further, refining AI-based forecasting capabilities.
  • In 2024, newer models like NeuralGCM, HEAL-ViT, WindBorne, GenCast, Aurora, and ArchesWeather continue to refine accuracy and prediction reliability.

 

Understanding the Training and Inference Timing Graphs

The graphs below illustrate training and inference performance across different execution modes (Lazy, Eager, and Eager with torch.compile) on Intel® Gaudi® 2 HPUs. These experiments were conducted using both single-HPU and multi-HPU (8 HPUs) configurations.

Key Observations

Inference Performance:

The first set of graphs compares inference times on 1 HPU vs. 8 HPUs. With 1 HPU, there is a noticeable compilation delay, but Lazy mode consistently outperforms Eager mode even after accounting for the initial variance caused by compilation overhead. While the extra compilation delay might seem like a downside, the faster steady-state speed often makes the tradeoff worthwhile. When scaling to 8 HPUs, Lazy mode's advantage over Eager mode becomes more pronounced, and scaling is efficient and close to linear, with only slight sublinearity.

Training Performance:

Training with 1 HPU shows significant initial variance, likely due to graph compilation and initialization costs, though the difference between Lazy and Eager modes remains minimal. In contrast, training with 8 HPUs exhibits roughly linear scaling, with Lazy mode continuing to show superior performance.

[Figures: training and inference timing for Lazy, Eager, and Eager with torch.compile modes, on 1 HPU and on 8 HPUs]

 

Why is Lazy Mode Faster in Training?

Lazy execution compiles operations into a computational graph and optimizes execution before running the batch, leading to lower execution times. In contrast, Eager mode executes operations immediately, which introduces additional overhead for each operation.
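
Concretely, the per-iteration pattern in lazy mode looks like the sketch below (placeholder model and data; assumes the `habana_frameworks` package with lazy mode enabled, its default): nothing executes on the device until `mark_step()`, which lets the graph compiler fuse and optimize the whole step, whereas Eager mode would dispatch each operation as it is called.

```python
import torch
import habana_frameworks.torch.core as htcore  # Gaudi PyTorch bridge

device = torch.device("hpu")
model = torch.nn.Linear(128, 64).to(device)        # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(10):
    x = torch.randn(32, 128, device=device)        # placeholder batch
    target = torch.randn(32, 64, device=device)

    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), target)
    loss.backward()
    htcore.mark_step()   # execute the fused forward/backward graph
    optimizer.step()
    htcore.mark_step()   # execute the optimizer update
```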

Why Does Scaling Improve Performance?

Using multiple HPUs (8 HPUs) allows parallelization across devices, reducing per-batch execution time compared to a single HPU. This effect is especially noticeable in inference workloads where latency is critical.

 

Why This Matters

With this breakthrough, Amplify Renewables can now put their models to the test against existing public and private forecasts. By refining their predictions for solar and wind power output, they're helping make grid forecasting more reliable, which is critical for the future of renewable energy.

 

Thoughts from the Team

"We were very happy with our experience porting our PyTorch models over to Habana HPUs, especially the new eager mode. The large VRAM made it easy for us to get started, and we didn't run into any major issues getting our models running on the Habana PyTorch framework." — Rachit Singh, CTO, Amplify Renewables

 

What’s Next?

Now that their training framework is up and running, Amplify Renewables is looking to expand. Next steps include testing a wider range of models and experimenting with new pre-training strategies. More models, better predictions, and a stronger renewable energy grid… sounds like a bright future ahead!

 

About Intel® Liftoff

Intel® Liftoff is a virtual program that connects early-stage AI and ML startups with the tools and support they need to push their innovations forward.

Through Intel® Liftoff, startups gain access to cutting-edge hardware on Intel's public cloud offering, Intel® Tiber™ AI Cloud, expert mentorship, and a thriving developer community. Whether you're optimizing deep learning workloads or training massive models, the program helps you do it faster and more efficiently.

Ready to scale smarter? Join the program here.

 

Related resources

For those looking to replicate these results or dive deeper into the Intel® Gaudi® 2 ecosystem and Intel’s cloud offering, here are some helpful links:

Intel® Tiber™ AI Cloud - Cloud platform for AI development and deployment. Find pricing information for GPUs/CPUs here.

Apply to Intel Liftoff and gain access to a startup package including credits on Intel® Tiber™ AI Cloud for select AI startups.

Intel® Gaudi® 2 AI accelerator - High-performance AI training processor designed for deep learning workloads.

Intel® Gaudi® 3 AI accelerator - Built on the high-efficiency Intel® Gaudi® platform with proven MLPerf benchmark performance, Intel Gaudi 3 AI accelerators are built to handle demanding training and inference.

Optimum Habana (Hugging Face) - An optimized library for running Hugging Face models on Gaudi accelerators with minimal modifications.

About the Author
I'm a proud team member of Intel® Liftoff for Startups, an innovative, free virtual program dedicated to accelerating the growth of early-stage AI startups.