Dawn Nafus is a principal engineer and research manager of the Sociotechnical Systems team at Intel Labs.
Highlights
- Energy transparency is increasingly a priority for policymakers in the responsible deployment and use of artificial intelligence (AI), but most developers are not trained in how to effectively measure energy consumption.
- Intel and the National Renewable Energy Laboratory's (NREL) Joint Institute for Strategic Energy Analysis recently published a guide on factors to consider when measuring energy in data centers.
- Collaborators on the team share the real-world challenges they encountered in developing effective measurement skills and strategies.
How can we protect the environment effectively given that AI can consume significant amounts of energy? Calls for energy transparency are growing by the day, but the measurement methods needed to provide that transparency are not standardized. Many sources of error can easily go undetected, and energy measurement is not typically a core part of AI developer training.
To close that gap, Intel and the National Renewable Energy Laboratory’s Joint Institute for Strategic Energy Analysis teamed up to create A Beginner's Guide to Power and Energy Measurement and Estimation for Computing and Machine Learning. This in-depth guide equips AI developers and other software professionals with the skills to make intelligent measurement decisions, from choosing between at-the-wall and on-device measurements to selecting sampling strategies, spotting likely sources of error, and deciding when proxy measures are sufficient. These are vital first steps in pinpointing which optimizations and model choices have the greatest impact on sustainability.
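To make the on-device side of that choice concrete, here is a minimal sketch of reading a CPU package energy counter through Linux's powercap (RAPL) sysfs interface. The path and availability of the counter vary by platform and kernel, reading it may require elevated permissions, and an at-the-wall measurement would instead require an external power meter; the unit-conversion helper is separated out so the arithmetic stands on its own.

```python
import time
from pathlib import Path

# Typical location of the CPU package energy counter on Linux systems
# that expose Intel RAPL via powercap; the exact path varies by platform.
RAPL = Path("/sys/class/powercap/intel-rapl:0/energy_uj")

def delta_joules(start_uj, end_uj):
    """Convert two microjoule counter readings into joules consumed."""
    # Note: the real counter wraps around; production code must handle that.
    return (end_uj - start_uj) / 1_000_000

def package_energy_joules(interval_s=1.0):
    """Energy consumed by the CPU package over interval_s, in joules."""
    start = int(RAPL.read_text())
    time.sleep(interval_s)
    end = int(RAPL.read_text())
    return delta_joules(start, end)

if RAPL.exists():
    print(f"package energy over 1 s: {package_energy_joules():.2f} J")
else:
    print("RAPL interface not available on this system")
```

Even a sketch this small shows why scoping matters: this counter covers one CPU package only, not memory, accelerators, cooling, or anything else an at-the-wall meter would capture.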
For the collaborators on our team, these were no abstract, high-minded topics. In fact, this guide began because of our own struggles in getting a reliable measurement, which I share here to show the depth and extent of the challenge. As energy transparency becomes a growing expectation, now is the right time to get specific about how to actually do it.
The Real-World Challenge of Energy Measurement
It started with a call from a colleague who would become my co-author on the guide. “It’s not going to work,” she said. That’s not the kind of call you want to get when kicking off a new project.
She was trying to compare the energy consumption of different natural language processing (NLP) models. This was a first step on a much more ambitious path to asking: What tools would help AI developers orient themselves to the world of energy and sustainability? It is one thing to know what a watt is, and that it can lead to greenhouse gas emissions, but quite another to know and work with it in the same way that engineers know and work with FLOPs or epochs.
Orientations can make or break progress on climate change. We knew that a majority of NLP professionals are deeply concerned about the climate effects of their work, but there is a difference between saying that sustainability should be a first-order metric and actually acting accordingly. Factors such as orientations, skills, and even what’s sometimes called professional vision contribute to that difference. They dictate what concepts and tools people notice and use, and what is considered irrelevant information. Without an energy orientation, a watt-hour is irrelevant. We get stuck.
Energy is outside of most people’s sensibilities, potentially even yours. Most of us could barely hazard a guess at how many watt-hours it would take to charge our phones, let alone run an AI inference. This is a major problem in a policy and standards environment that recognizes the environmental problems of AI and is increasingly calling for energy and emissions transparency.
That means seeing energy and measuring it in the first place. Data centers are complex places where seeing energy is no small feat. Creating a credible and useful estimate requires some understanding of how to usefully scope the measurements, and what can interfere with measurement reliability.
But what about model optimization? It is true that machine learning (ML) developers are no strangers to efficiency, but not all ML optimizations translate into energy savings. To know whether an optimization saves energy, you must measure energy rather than assume. ML developers are often good at anticipating training runtime, but what does it take to get the same feel for energy? That is an interesting social and technical question that our team set out to tackle.
The conversation with my co-author continued, “Look, I’ll show you what I did.”
She wanted to find out how much energy each epoch consumed, which required frequently sampling the energy used during training. The more she learned, the more she came to distrust her own measurements — an uneasy feeling anyone in research knows well.
The issue that caught my eye was the sampling overhead problem she found: The more times you ping the processor for energy readings, the more energy you consume just by doing the measuring. This is minor noise for coarse measurements, but for granular ones, it muddies the signal. I knew about the observer effect from physics, but I hardly expected to find it here.
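The effect is easy to model. The toy calculation below, with entirely made-up numbers, shows why sampling overhead is negligible for coarse measurements but drowns the signal at high sampling rates:

```python
# Toy model of the sampling overhead problem: assume each energy
# reading itself costs a small fixed amount of energy. All numbers
# here are illustrative, not measured.

def measured_energy_j(true_energy_j, duration_s, sample_rate_hz,
                      overhead_per_sample_j=0.05):
    """Energy a naive sampler would report, including its own cost."""
    n_samples = int(duration_s * sample_rate_hz)
    return true_energy_j + n_samples * overhead_per_sample_j

true_j = 600.0  # hypothetical 600 J consumed over a 60-second run
coarse = measured_energy_j(true_j, 60, sample_rate_hz=1)     # 1 Hz
fine = measured_energy_j(true_j, 60, sample_rate_hz=1000)    # 1 kHz

print(f"1 Hz sampling:  {coarse:.0f} J ({100 * (coarse / true_j - 1):.1f}% inflated)")
print(f"1 kHz sampling: {fine:.0f} J ({100 * (fine / true_j - 1):.1f}% inflated)")
```

Under these hypothetical numbers, one reading per second distorts the total by a fraction of a percent, while a thousand readings per second inflates it several times over — the act of observing changes what is observed.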
Other issues started to flow in from other team members, from “What do I do when hyperthreading is enabled?” to “How do I account for processes running in the background?” While there are off-the-shelf tools that help, the problems were coming fast and hard because ultimately, when you are trained to build AI-enabled interactions for ALS patients like Stephen Hawking, how energy flows through a data center is not a front and center concern.
Building the Guide
Luckily, we had a deep bench of experts to call upon for help, including fellow Intel researchers who had been designing orchestration technology to flexibly meet energy key performance indicators (KPIs). We also called colleagues at NREL’s Joint Institute for Strategic Energy Analysis whose Green Computing Catalyzer spearheads research in energy efficient computing. NREL researchers regularly work with an extensively instrumented high performance computing center (an impressive 1 million metrics per minute for those counting).
We knew that the questions we put to them were unlikely to be ours alone, and so they joined us in documenting what we wished we knew at the beginning. Even with all the tooling now available (which we catalogued in the paper), “Is this energy number good enough for what I want to do with it?” is almost always the right question.
If you find yourself asking it, think of A Beginner's Guide to Power and Energy Measurement and Estimation for Computing and Machine Learning as a kind of marauder’s map. At the top of the map lies simplicity itself, with clear and obvious statements that can always be made — for example, that a gigantic large language model (LLM) consumes more energy in training than a tiny one. The link between software design and electrons is as plain as day, even if you know absolutely nothing about the specific setup, or anything at all about hardware. The problem is that these statements, while not wrong, are almost never useful. So, you move along the path, digging deeper.
At the bottom of the map lies a seductive impossibility: tracing each and every electron through each and every subsystem and instruction set to arrive at a conclusive, and fine-grained sum of energy used. It beckons, but observer effects and a host of other things mean that ultimately, that’s not where clarity lies.
The beginner’s guide is there to help you steer a path that embraces the reliability of simplicity while not oversimplifying. It shows just enough of what lurks below your feet to know when it is time to dip down further, and in which directions thar be dragons.
Looking Forward
In the long term, techniques of observability do need to improve, and we need more consensus on the kinds of energy transparency developers can expect from one another. Responsibility for energy use is not a buck to be passed along the value chain, whether between developers or between developers and data centers.
In the short term, good enough measurements are not just feasible, but useful and important once you know how to look. It can even be as simple as going to one of the many offline estimators and putting in information from your last fine-tuning run. You might be surprised at what you see.
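As a sketch of the kind of back-of-envelope arithmetic those estimators perform, the calculation looks roughly like this — every input below is a placeholder you would replace with your own hardware and facility figures, and real estimators refine each term:

```python
# Back-of-envelope energy and emissions estimate for a training or
# fine-tuning run. All inputs are assumptions: device power draw,
# utilization, runtime, data-center PUE, and grid carbon intensity.

def estimate_run(power_w, utilization, hours, pue=1.5,
                 grid_kg_co2_per_kwh=0.4):
    """Return (facility kWh, kg CO2e) for one device over one run."""
    device_kwh = power_w * utilization * hours / 1000.0
    facility_kwh = device_kwh * pue  # scale up for cooling and overhead
    co2_kg = facility_kwh * grid_kg_co2_per_kwh
    return facility_kwh, co2_kg

# Hypothetical run: one 300 W accelerator at 80% utilization for 4 hours.
kwh, co2 = estimate_run(power_w=300, utilization=0.8, hours=4)
print(f"~{kwh:.2f} kWh at the facility, ~{co2:.2f} kg CO2e")
```

The point is not the specific numbers but the habit: once you can plug your own run into even a rough formula like this, energy stops being an abstraction and becomes something you can compare across model and optimization choices.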