Hi,
I've observed that the latest version of the Pro firmware causes an idle power usage regression with the Arc B50 card in Linux. The setup is a Proxmox host with SR-IOV, and a Linux VM running light enc/dec and Vulkan workloads.
- w/ 32.0.101.6979 -- 3-8W reported power -- constant high fan noise, despite low reported RPM (I believe this is a different, known bug)
- w/ 32.0.101.8306 -- 22+W reported power -- saner fan behavior, but higher temps
I have verified several times that the card firmware is the only difference by updating it using Windows.
The power usage is not a reporting problem: whole system power usage monitoring confirms the difference.
Details:
- Proxmox host -- i7-12700K -- Asus W680 -- 6.17.4-2-pve kernel
- Latest BIOS / ME
- SR-IOV
- ASPM enabled
- Setting pcie_aspm to powersave doesn't make a difference
- Linux guest -- 6.17.13+deb13 kernel (VF passed through)
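For anyone reproducing this, a minimal sketch for confirming which ASPM policy is actually in effect on the host. It assumes the kernel exposes the policy at /sys/module/pcie_aspm/parameters/policy and marks the active entry with square brackets; the helper name active_aspm_policy is made up for illustration.

```shell
#!/bin/bash
# Print the active ASPM policy from a policy string such as
# "default performance [powersave] powersupersave".
# The active policy is the one wrapped in square brackets.
active_aspm_policy() {
    # $1: contents of the pcie_aspm policy file
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Assumed location of the policy file; skipped quietly if it is not readable.
policy_file=/sys/module/pcie_aspm/parameters/policy
if [ -r "$policy_file" ]; then
    echo "ASPM policy: $(active_aspm_policy "$(cat "$policy_file")")"
fi
```

This only shows the global policy, not whether ASPM was negotiated on the card's own link.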
Hello mag1024,
Thank you for posting on Intel Community Forum.
I understand that you’re seeing a power usage regression, and the details you’ve shared so far are very helpful. I’ll review everything and get back to you as soon as I have an update.
Best regards
Jed G.
Intel Customer Support Technician
Hello mag1024,
I'm reaching out to get some information that will help with our investigation. Please see below.
1. Does the power regression occur even when the Linux VM is shut down or only when the VM is running?
2. Have you tried the latest stable kernel 6.18 and firmware?
3. Is the issue also happening on the supported Linux OS, Ubuntu* 22.04?
I look forward to your response.
Best regards
Jed G.
Intel Customer Support Technician
Hello mag1024,
I wanted to check if you had the chance to review the questions I posted. Please let me know at your earliest convenience so that we can determine the best course of action to resolve this matter.
Best regards,
Jed G.
Intel Customer Support Technician
Thanks for your response!
The power usage only goes up when the VM that is running an intermittent/light inference workload is running. I was not able to test newer kernels, but I did try the linux-firmware from git, in combination with my existing 6.17.4 kernel.
Before: xe/bmg_guc_70.bin version 70.49.4, xe/bmg_huc.bin version 8.2.10
After: xe/bmg_guc_70.bin version 70.55.3, xe/bmg_huc.bin version 8.2.10
This helped a lot.
The power is no longer pegged at 22+W, and fluctuates more similarly to how it did with the .6979 firmware.
It still seems higher on average, but I don't have conclusive data for this.
I guess I'll wait for the newer kernels to come to Proxmox.
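For anyone wanting to compare GuC/HuC versions the same way, here is a rough sketch that pulls the firmware file and version out of the kernel log. It assumes the xe driver logs lines containing "firmware from <file> version <x.y.z>", which is consistent with the strings quoted above but may differ between kernel versions; the function name xe_fw_versions is hypothetical.

```shell
#!/bin/bash
# Read kernel log lines on stdin and print "<firmware file> <version>"
# for each line that matches the assumed pattern
# "... firmware from xe/bmg_guc_70.bin version 70.49.4".
xe_fw_versions() {
    sed -n 's/.*firmware from \([^ ]*\) version \([0-9.]*\).*/\1 \2/p'
}

# Usage (reading the kernel log usually needs root):
#   dmesg | xe_fw_versions
```

Lines that do not match the pattern are ignored, so the whole dmesg output can be piped through it.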
Hello mag1024,
Thank you for keeping me posted. I’m glad to hear that trying a different firmware resolved the issue.
I’d also like to share that Intel Arc graphics currently support Ubuntu only and it may contribute to the issue you are experiencing. Other Linux distributions are not supported at this time.
You may also find this article helpful: Where Can I Find Linux Drivers for Intel® Arc™ Graphics?
I look forward to your response.
Best regards
Jed G.
Intel Customer Support Technician
Hello mag1024,
I have not heard back from you so I will close this inquiry now. If you need further assistance, please submit a new question as this thread will no longer be monitored.
Best regards,
Jed G.
Intel Customer Support Technician
Hello guys,
I'm experiencing exactly the same situation with my Intel Arc Pro B50 as reported by user @mag1024. The idle power behavior is inconsistent (most probably caused by firmware).
Software:
- Ubuntu linux 26.04
- Kernel: 7.0.0-15-generic (newest one, but also tested other recent kernels)
- Driver: xe
- linux-firmware: latest from git
- bmg_guc_70.bin (70.60.0)
- bmg_huc.bin (8.2.10)
- bmg_dmc.bin (v2.6)
Case 1 – No monitoring tools (baseline - card is completely idle)
- GPU idle power usage: ~20–22W stable, no spikes up or down
- Fan: normal / quiet ~1100rpm
- GPU temperature ~ 45-49°C
Case 2 – With nvtop (or similar GPU polling)
- GPU power: drops to ~5-6W - stable, no spikes (confirmed by my power wattmeter)
- GPU temperature drops to 34°C
- BUT:
- GPU fan ramps up to very high speed (audibly ~100%) - it literally sounds like a jet engine starting up
- but nvtop reports 0 RPM
- This state persists as long as monitoring is active
- similar situation with sensors (lmsensors)
3. Fan behavior indicates failsafe mode
- When low power state is reached (via nvtop/polling):
- fan ramps up aggressively
- RPM reporting becomes inconsistent (nvtop shows 0 RPM)
- This looks like a failsafe cooling mode triggered by missing/invalid thermal control
dmesg also shows this:
xe 0000:05:00.0: [drm] *ERROR* PCODE Mailbox failed: -6 Illegal Command
xe 0000:05:00.0: [drm] Thermal mailbox not supported by card firmware
Here are also the values from my monitoring script on idle system (polling interval 2s):
root@microserver:~# ./intel_top.sh
Xe GPU monitor (W / temps / fan / runtime)
-------------------------------------------
Runtime: suspended | State: D3hot | GPU: 20.81 W | PKG: 11.68 W | Temp: 50.0°C / 50.0°C | Fan: 2104 RPM
Runtime: suspended | State: D3hot | GPU: 20.59 W | PKG: 11.64 W | Temp: 50.0°C / 50.0°C | Fan: 1330 RPM
Runtime: suspended | State: D3hot | GPU: 20.60 W | PKG: 11.65 W | Temp: 50.0°C / 50.0°C | Fan: 1187 RPM
Runtime: suspended | State: D3hot | GPU: 20.61 W | PKG: 11.68 W | Temp: 50.0°C / 50.0°C | Fan: 1200 RPM
Runtime: suspended | State: D3hot | GPU: 20.61 W | PKG: 11.67 W | Temp: 50.0°C / 50.0°C | Fan: 1187 RPM
Runtime: suspended | State: D3hot | GPU: 20.54 W | PKG: 11.62 W | Temp: 50.0°C / 50.0°C | Fan: 1195 RPM
Runtime: suspended | State: D3hot | GPU: 20.57 W | PKG: 11.61 W | Temp: 50.0°C / 50.0°C | Fan: 1195 RPM
Runtime: suspended | State: D3hot | GPU: 20.58 W | PKG: 11.62 W | Temp: 50.0°C / 50.0°C | Fan: 1194 RPM
Runtime: suspended | State: D3hot | GPU: 20.60 W | PKG: 11.62 W | Temp: 50.0°C / 50.0°C | Fan: 1194 RPM
When running "nvtop" in a second terminal, the values are rather different, but don't be fooled by those low fan RPM readings: the actual fan speed was at 100% and the fan sounded like a jet engine at full throttle!
Runtime: active | State: D0 | GPU: 1.25 W | PKG: 0.71 W | Temp: 51.0°C / 54.0°C | Fan: 62 RPM
Runtime: active | State: D0 | GPU: 5.79 W | PKG: 0.79 W | Temp: 51.0°C / 50.0°C | Fan: 0 RPM
Runtime: active | State: D0 | GPU: 8.59 W | PKG: 0.80 W | Temp: 51.0°C / 50.0°C | Fan: 1000 RPM
Runtime: active | State: D0 | GPU: 5.75 W | PKG: 0.77 W | Temp: 51.0°C / 48.0°C | Fan: 589 RPM
Runtime: active | State: D0 | GPU: 5.75 W | PKG: 1.52 W | Temp: 51.0°C / 48.0°C | Fan: 811 RPM
Runtime: active | State: D0 | GPU: 12.39 W | PKG: 5.85 W | Temp: 51.0°C / 48.0°C | Fan: 307 RPM
Runtime: active | State: D0 | GPU: 5.70 W | PKG: 1.40 W | Temp: 51.0°C / 48.0°C | Fan: 0 RPM
Runtime: active | State: D0 | GPU: 5.75 W | PKG: 1.39 W | Temp: 51.0°C / 48.0°C | Fan: 207 RPM
Runtime: active | State: D0 | GPU: 5.78 W | PKG: 1.60 W | Temp: 51.0°C / 48.0°C | Fan: 354 RPM
Runtime: active | State: D0 | GPU: 11.06 W | PKG: 4.69 W | Temp: 51.0°C / 48.0°C | Fan: 156 RPM
Runtime: active | State: D0 | GPU: 5.82 W | PKG: 1.36 W | Temp: 51.0°C / 48.0°C | Fan: 138 RPM
Runtime: active | State: D0 | GPU: 5.83 W | PKG: 1.35 W | Temp: 51.0°C / 46.0°C | Fan: 124 RPM
Runtime: active | State: D0 | GPU: 3.09 W | PKG: 1.33 W | Temp: 51.0°C / 46.0°C | Fan: 113 RPM
Runtime: active | State: D0 | GPU: 6.32 W | PKG: 1.34 W | Temp: 51.0°C / 46.0°C | Fan: 104 RPM
Runtime: active | State: D0 | GPU: 6.41 W | PKG: 1.31 W | Temp: 51.0°C / 46.0°C | Fan: 192 RPM
Runtime: active | State: D0 | GPU: 9.57 W | PKG: 3.34 W | Temp: 51.0°C / 46.0°C | Fan: 90 RPM
Runtime: active | State: D0 | GPU: 6.43 W | PKG: 1.72 W | Temp: 51.0°C / 44.0°C | Fan: 334 RPM
Runtime: active | State: D0 | GPU: 9.14 W | PKG: 2.50 W | Temp: 51.0°C / 44.0°C | Fan: 79 RPM
Runtime: active | State: D0 | GPU: 10.00 W | PKG: 1.29 W | Temp: 51.0°C / 44.0°C | Fan: 75 RPM
Runtime: active | State: D0 | GPU: 3.57 W | PKG: 1.74 W | Temp: 51.0°C / 44.0°C | Fan: 281 RPM
Runtime: active | State: D0 | GPU: 8.60 W | PKG: 1.72 W | Temp: 51.0°C / 44.0°C | Fan: 134 RPM
Runtime: active | State: D0 | GPU: 7.28 W | PKG: 1.79 W | Temp: 51.0°C / 44.0°C | Fan: 256 RPM
Runtime: active | State: D0 | GPU: 8.35 W | PKG: 1.25 W | Temp: 51.0°C / 44.0°C | Fan: 123 RPM
Runtime: active | State: D0 | GPU: 12.24 W | PKG: 1.48 W | Temp: 51.0°C / 44.0°C | Fan: 0 RPM
Runtime: active | State: D0 | GPU: 12.55 W | PKG: 1.90 W | Temp: 51.0°C / 44.0°C | Fan: 2143 RPM
Runtime: active | State: D0 | GPU: 8.85 W | PKG: 1.27 W | Temp: 51.0°C / 44.0°C | Fan: 1213 RPM
Runtime: active | State: D0 | GPU: 11.31 W | PKG: 5.74 W | Temp: 51.0°C / 44.0°C | Fan: 429 RPM
Here's the code of my "intel_top.sh" script:
#!/bin/bash
# Monitor power, temperatures, fan speed and runtime PM state of an xe GPU.
# The PCI address 0000:05:00.0 is specific to this system.
GPU_PATH=$(ls -d /sys/bus/pci/devices/0000:05:00.0/hwmon/hwmon*)
RT_PATH=$(ls -d /sys/bus/pci/devices/0000:05:00.0)
E1="$GPU_PATH/energy1_input"
E2="$GPU_PATH/energy2_input"
T2="$GPU_PATH/temp2_input"
T3="$GPU_PATH/temp3_input"
F1="$GPU_PATH/fan1_input"
RT="$RT_PATH/power/runtime_status"
ST="$RT_PATH/power_state"
# Initial timestamp and energy counter readings
prev_time=$(date +%s.%N)
prev1=$(cat "$E1")
prev2=$(cat "$E2")
echo "Xe GPU monitor (W / temps / fan / runtime)"
echo "-------------------------------------------"
while true; do
    # sleep 2
    # runtime=$(cat "$RT" 2>/dev/null)
    # state=$(cat "$ST" 2>/dev/null)
    # Time elapsed since the previous sample, in seconds
    cur_time=$(date +%s.%N)
    dt=$(awk "BEGIN {print $cur_time - $prev_time}")
    # Energy consumed since the previous sample
    cur1=$(cat "$E1")
    cur2=$(cat "$E2")
    d1=$((cur1 - prev1))
    d2=$((cur2 - prev2))
    # µJ -> W
    w1=$(awk "BEGIN {printf \"%.2f\", $d1 / 1000000 / $dt}")
    w2=$(awk "BEGIN {printf \"%.2f\", $d2 / 1000000 / $dt}")
    # Temperatures are reported in millidegrees Celsius
    temp2=$(awk "BEGIN {printf \"%.1f\", $(cat "$T2")/1000}")
    temp3=$(awk "BEGIN {printf \"%.1f\", $(cat "$T3")/1000}")
    fan=$(cat "$F1" 2>/dev/null)
    sleep 2
    # Runtime PM state is sampled after the sleep, just before printing
    runtime=$(cat "$RT" 2>/dev/null)
    state=$(cat "$ST" 2>/dev/null)
    printf "Runtime: %-9s | State: %-5s | GPU: %6s W | PKG: %6s W | Temp: %s°C / %s°C | Fan: %4s RPM\n" \
        "$runtime" "$state" "$w1" "$w2" "$temp2" "$temp3" "$fan"
    prev1=$cur1
    prev2=$cur2
    prev_time=$cur_time
done
So the expected behavior is that the GPU should be able to enter low idle power (~5 W) and maintain a correct fan curve / thermal control, and monitoring tools should NOT trigger fan runaway.
But the actual behaviour is that there are two mutually exclusive states:
- ~20 W idle, correct fan behavior, higher temps
- ~5-8 W idle, broken fan control (max fan = jet engine), lower temps
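To put a single number on each of the two states, the monitor output can be averaged. The following is only a sketch: it assumes the exact "|"-separated line format printed by intel_top.sh above, and the function name avg_gpu_watts is made up for illustration.

```shell
#!/bin/bash
# Average the "GPU: <n> W" field of intel_top.sh output read from stdin.
# Lines without a "GPU:" field (such as the two header lines) are skipped.
avg_gpu_watts() {
    awk -F'|' '
        /GPU:/ {
            # The third "|"-separated field looks like " GPU:  20.81 W "
            split($3, parts, " ")
            sum += parts[2]
            n++
        }
        END { if (n) printf "%.2f\n", sum / n }
    '
}

# Usage: capture about a minute of samples, then average them, e.g.
#   timeout 60 ./intel_top.sh | avg_gpu_watts
```

Running it once with the card left alone and once with nvtop open in a second terminal gives two averages that can be compared directly with wall-meter readings.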
Do you guys experience the same issue?
many thanks,
Antonin