Playing to the Strengths of Intel® Core™ Ultra Processors: How GOALS Delivers Sustained, Competitive Esports Performance on Handheld PCs — Part 2: Handheld Profiles, XeSS Integration, and CPU Variance Reduction
By Torbjörn Söderman, Technical Director, GOALS
Console-Style Defaults for Known Intel Arc Handhelds
The power management system described in Part 1 addresses the high-end desktop problem. For Intel® Core™ Ultra Processors for Handheld PC Gaming, there is an additional layer: certification. Shipping GOALS on Xbox Game Pass through WinGDK requires that default settings "just work" on supported devices without player intervention. On a traditional PC, you can ask the player to run a benchmark. On a handheld, the player needs to be able to open the device, start playing, and have the game feel good immediately.
The Intel Handheld Constraint Model
To understand why unconstrained defaults fail on handhelds, we must look at the three physical constraints that directly shape every handheld engineering decision:
- A unified power envelope. The CPU and GPU share a total package power limit (PL1/PL2). A transient CPU spike from physics, animation evaluation, or pathfinding will cause the driver to redistribute power away from the GPU almost immediately. On a discrete GPU machine, a CPU spike is annoying. On an Intel Arc iGPU handheld, it can cause a visible GPU frequency drop mid-frame.
- A shared memory bus. System RAM services both the CPU cores and the Xe GPU compute units. On the MSI Claw and similar devices, LPDDR5X bandwidth is the primary shared resource. Overdraw, high material complexity, and excessive draw call pressure all compete for the same bus. Bandwidth pressure that is invisible on a discrete setup becomes a genuine bottleneck on integrated hardware.
- Transient boost frequencies. Intel Core Ultra processors boost aggressively in short bursts via Intel® Turbo Boost, but sustained workloads must be designed for the equilibrium clock rates the device settles into after thermal saturation. For a competitive title where players routinely run multiple matches back-to-back in a single session, sustained behaviour is the only behaviour that matters.
The engineering principle that flows from these constraints: design for steady-state power delivery, not peak benchmark performance.
Bespoke Handheld Profiles
Our answer to these constraints was to treat known Intel Arc-powered handhelds the way you treat a console: with a bespoke, handcrafted configuration for each device. These profiles define resolution scale and upscaling configuration, anti-aliasing method, Intel XeSS quality preset, simulation budget defaults, and animation system constraints. They are authored by an engineer who has played the game on the device, profiled it under sustained load, and tuned the settings for its specific thermal and power envelope. These device profiles are shared between WinGDK and Steam builds, authored once and applied consistently across both distribution paths.
Intel XeSS Integration: Upscaling and Frame Generation as a Handheld Strategy
GOALS integrates Intel XeSS 3 (Xe Super Sampling, version 3.0.1.2) via the Unreal Engine plugin, supporting both XeSS Upscaling and Intel XeFG (Xe Frame Generation). On handheld devices, these are not simply visual quality options — they are core tools for managing the relationship between render workload, thermal budget, and perceived frame rate.
One technical detail matters here: XeSS has two operating modes depending on the silicon. On hardware with XMX matrix acceleration units — present in Arc discrete GPUs and in a reduced form in some Core Ultra integrated configurations — XeSS uses its dedicated neural inference path. On hardware without XMX, it falls back to a DP4a-based path running on standard shader cores. Current MSI Claw hardware with Core Ultra 7 and Core Ultra 9 silicon uses the DP4a path. The device profiles and CVar values described below reflect that reality.
The baseline for all MSI Claw profiles is XeSS Balanced upscaling (goals.PresetRenderingAAModeIndex=0), set in the BaseClaw profile that every Claw device inherits from. This gives a reasonable internal resolution on the Claw's 1080p display while keeping GPU workload within the sustained thermal envelope. The base Claw profile also sets the upscaler explicitly to XeSS:
ini
[BaseClaw DeviceProfile]
; MSI Claw uses Intel chips — set upscaler to XeSS Balanced
+CVars=goals.PresetRenderingAAMethod=5
+CVars=goals.PresetRenderingAAModeIndex=0
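To make the "reasonable internal resolution" point concrete, the following is a minimal sketch of how an upscaling preset's per-axis scale factor maps to an internal render resolution. The helper names are hypothetical, and the 2.0x factor used in the usage note is an assumption for illustration only; the actual Balanced factor should be checked against the XeSS SDK version in use.

```cpp
#include <cassert>
#include <cmath>

struct Resolution {
    int Width;
    int Height;
};

// PerAxisScale = output resolution / internal render resolution, per axis.
// The caller supplies it; this sketch does not hard-code any preset table.
inline Resolution InternalResolution(Resolution Output, double PerAxisScale) {
    assert(PerAxisScale >= 1.0);
    return Resolution{
        static_cast<int>(std::lround(Output.Width / PerAxisScale)),
        static_cast<int>(std::lround(Output.Height / PerAxisScale))};
}

// Fraction of output pixels that are actually shaded per rendered frame.
inline double ShadedPixelFraction(double PerAxisScale) {
    return 1.0 / (PerAxisScale * PerAxisScale);
}
```

Under the assumed 2.0x factor, a 1920x1080 output would render internally at 960x540, so only a quarter of the output pixels are shaded each frame. That reduction in shading and bandwidth cost is what keeps the GPU inside its sustained envelope.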
Per-Silicon XeSS Profiles
Rather than applying a single XeSS setting across all Claw hardware, the device profile system maps upscaling quality directly to the silicon's compute capability.
Core Ultra 7 devices (original Claw A1M) inherit the Balanced baseline. The Xe LPG GPU on this SKU operates at 15–20W TDP in handheld mode, and Balanced gives the right trade between image quality and sustained thermal headroom. The profile also sets a cheaper temporal history format to reduce bandwidth consumption on the shared LPDDR5X memory bus:
ini
+CVars=r.XeSS.HistoryFormat=1
This halves XeSS's temporal history memory bandwidth, a meaningful saving on hardware where system RAM is shared between the CPU cores and the Xe GPU compute units.
Core Ultra 9 devices (revised Claw with Core Ultra 9 185H) step up to XeSS Quality mode (goals.PresetRenderingAAModeIndex=1), reflecting the additional shader resources and higher sustained TDP available on this SKU. The profile also raises overall rendering quality and increases the animation budget worker thread ceiling to 55% of available thread time. On this silicon, the higher internal resolution and rendering complexity can be sustained without exceeding the thermal envelope.
The Claw A8 Exception
The MSI Claw A8 is a good example of why device profiles require genuine per-device engineering rather than blanket silicon mapping. The Claw A8 ships with an AMD Ryzen Z2 Extreme processor rather than an Intel chip — the one device in the Claw family that breaks the pattern. Its profile therefore inherits GenericRyzenZ2Extreme directly rather than the Claw Intel base profile:
ini
[ClawA8 DeviceProfile]
; Claw A8 ships with AMD Ryzen Z2 Extreme — use AMD profile, not Intel Claw base
BaseProfileName=GenericRyzenZ2Extreme
The upscaler selection follows the silicon, not the brand. A Claw with an AMD chip gets FSR; a Claw with an Intel chip gets XeSS.
Frame Generation as a Thermal Tool
Intel XeFG (Xe Frame Generation) is available in the Visual Settings menu. On handheld hardware the thermal logic behind it is worth making explicit.
Reducing the actual render frame rate means the GPU does less work per second and draws less power. XeFG then fills the gap, generating interpolated frames to hit the display's refresh rate without the full render cost of native frames. The result is a display frame rate that feels competitive while the GPU runs cooler and draws less from the shared power budget.
XeSS 3 introduces multiplied frame generation modes beyond a simple on/off toggle: 2X, 3X, and 4X frame generation are all supported, each setting how many frames are displayed per rendered frame, so 2X inserts one interpolated frame between rendered frames, 3X inserts two, and 4X inserts three. On a handheld sustaining 45 rendered FPS, 2X frame generation targets a 90 FPS display rate; 3X targets 135 FPS. The higher multipliers come with greater latency sensitivity, which is exactly why XeLL's pacing mechanism is mandatory when XeFG is active regardless of which multiplier is in use.
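The multiplier arithmetic can be written down directly. This is a small sketch with hypothetical helper names, not part of the XeSS API; it only restates the relationship between rendered rate, multiplier, and display rate.

```cpp
#include <cassert>

// Display rate targeted when each rendered frame becomes `Multiplier` displayed frames.
constexpr int TargetDisplayFps(int RenderedFps, int Multiplier) {
    return RenderedFps * Multiplier;
}

// Generated frames inserted between two consecutive rendered frames.
constexpr int GeneratedFramesPerRenderedFrame(int Multiplier) {
    return Multiplier - 1;
}

// Rendered rate the GPU must sustain to fill a display of the given refresh rate.
constexpr double RequiredRenderedFps(double DisplayHz, int Multiplier) {
    return DisplayHz / Multiplier;
}

static_assert(TargetDisplayFps(45, 2) == 90, "2X at 45 rendered FPS");
static_assert(TargetDisplayFps(45, 3) == 135, "3X at 45 rendered FPS");
static_assert(GeneratedFramesPerRenderedFrame(4) == 3, "4X inserts three frames");
```

Read the other way around, a 120 Hz panel driven at 3X only needs 40 rendered frames per second, which is where the thermal saving comes from.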
On Intel hardware, XeFG works alongside XeSS upscaling — the combination of a lower internal render resolution and frame interpolation means the Xe GPU can maintain a competitive perceived frame rate at a power envelope the handheld can sustain for a full play session, rather than only during a benchmark run.
One configuration detail worth noting for handheld deployments: the integration sets r.XeFG.UICompositionState=1, which ensures the game's HUD and menus are composited correctly over generated frames rather than being interpolated along with scene geometry. Without this, UI elements can appear to stutter or lag behind the scene during frame generation, a particularly visible issue on the smaller displays and fixed controller layouts typical of handheld play.
XeLL: Low-Latency Input Pipeline
Intel XeLL (Xe Low Latency) reduces input-to-display latency by inserting pipeline markers at key points in the engine's frame loop and using a sleep-based pacing mechanism to align CPU work more precisely with GPU consumption.
XeLL is integrated as a native Unreal Engine module via FXeLLLatencyMarkers and FXeLLMaxTickRateHandler. The latency marker system instruments the full frame pipeline, covering input sample, simulation start and end, render submit start and end, and present start and end, by calling xellAddMarkerData() at each boundary. This gives the XeLL runtime a precise picture of where time is being spent across the CPU, allowing it to schedule xellSleep() calls through the max tick rate handler to pace simulation work without holding the CPU idle in a hot spin.
One important integration detail: XeLL is hard-enforced while XeFG is running, not merely recommended. If a caller attempts to set r.XeLL.Enabled=0 while XeFG is active, the request is blocked and a warning is logged: "Can't close XeLL as XeFG is on." The OnPreSetXeFGEnabled() hook sets XELL_ENABLED_BY_XEFG=1 before XeFG initialises, making the coupling explicit in the state machine. This is not a limitation but the correct behaviour. Frame generation increases the gap between rendered frames and display frames, which would otherwise inflate perceived input latency. XeLL's pacing closes that gap, keeping the game feeling responsive even when the rendered frame rate is intentionally held below the display refresh rate.
For a competitive title where players are acutely sensitive to how their inputs feel, that latency reduction matters regardless of whether frame generation is active.
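The XeFG/XeLL coupling can be illustrated with a small state holder. This is a simplified sketch under stated assumptions: LatencyState and its methods are hypothetical stand-ins, not the plugin's real classes, and the behaviour when XeFG is switched off again (XeLL stays on until the caller changes it) is an assumption rather than something the integration is confirmed to do.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the plugin state; not the real Unreal or XeSS types.
class LatencyState {
public:
    // Mirrors the idea of the pre-set hook: XeLL is forced on before XeFG starts.
    void SetXeFGEnabled(bool bEnable) {
        if (bEnable) {
            bXeLLEnabled = true;
            bXeLLEnabledByXeFG = true;
        } else {
            bXeLLEnabledByXeFG = false;  // assumption: XeLL stays on until changed
        }
        bXeFGEnabled = bEnable;
    }

    // Returns false and records a warning when the request is blocked.
    bool SetXeLLEnabled(bool bEnable) {
        if (!bEnable && bXeFGEnabled) {
            Warnings.push_back("Can't close XeLL as XeFG is on.");
            return false;
        }
        bXeLLEnabled = bEnable;
        return true;
    }

    bool IsXeLLEnabled() const { return bXeLLEnabled; }
    bool IsXeFGEnabled() const { return bXeFGEnabled; }
    bool WasXeLLForcedByXeFG() const { return bXeLLEnabledByXeFG; }
    const std::vector<std::string>& GetWarnings() const { return Warnings; }

private:
    bool bXeLLEnabled = false;
    bool bXeFGEnabled = false;
    bool bXeLLEnabledByXeFG = false;
    std::vector<std::string> Warnings;
};
```

Putting the check in the setter, rather than in the UI, means every caller (console, settings menu, device profile) hits the same rule.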
Validated Through Certification
This handheld-first approach was not theoretical. GOALS recently passed Xbox XPA Certification covering Xbox Series X|S, WinGDK PC, and Handhelds. Passing certification required demonstrating that our XeSS configurations delivered visual stability and performance consistency across the full spectrum of Windows-based handheld devices, and the device profile hierarchy described above was central to meeting that bar.
Layered Fallback for Unknown and Future Intel Devices
We cannot handcraft profiles for every future handheld. The Intel Core Ultra handheld market is expanding rapidly, with new devices using Intel Core Ultra Series processors already available or coming soon to market. We therefore implement a deliberate fallback hierarchy.
A known Intel handheld with a bespoke profile receives its handcrafted configuration, plus the iGPU-specific performance safeguards described below.
Detected Intel handhelds without a bespoke profile receive conservative iGPU-oriented defaults and the full suite of runtime performance safeguards. Detection uses the Microsoft API for handheld identification, supplemented by our own additional checks for devices that are not yet in the MS registry, such as the Lenovo Legion Go.
Standard PC games rely on a short, synthetic hardware benchmark run at first launch to determine graphics quality. On shared-TDP handhelds, these short workloads execute during the CPU's transient "boost" phase (the PL2 power limit), severely overestimating the performance the device can actually sustain. This leads to excessive heat, rapid battery drain, and eventual aggressive thermal throttling once the hardware saturates. By defaulting detected handhelds that lack a bespoke profile to a pre-optimized, conservative iGPU baseline (XeSS Balanced with optimized, bandwidth-saving graphics settings) at boot, we ensure they run strictly within their steady-state thermal and power envelope (PL1) from the very first frame. This protects the hardware from thermal oscillation and guarantees a smooth, stutter-free competitive experience without relying on unstable benchmark results.
An undetected device falls back to standard PC Experience build behaviour: Unreal's hardware benchmark determines an appropriate quality tier and sets defaults accordingly. Importantly, the benchmark now populates both AC and battery quality tiers simultaneously, so even an undetected device gets appropriate power-state-aware defaults rather than a single fixed configuration.
This hierarchy means no future Intel Arc handheld receives a completely uncurated experience. They get a handcrafted profile, a conservative iGPU-aware fallback, or standard PC benchmark behaviour, in that priority order.
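The priority order can be sketched as a single selection function. The type and field names below are hypothetical, and the two booleans stand in for whatever device-profile matching and handheld detection the game actually performs.

```cpp
#include <cassert>

enum class EDefaultsSource {
    BespokeProfile,     // handcrafted per-device configuration
    ConservativeIGpu,   // detected handheld without a bespoke profile
    HardwareBenchmark   // undetected device: standard PC benchmark behaviour
};

struct DeviceInfo {
    bool bHasBespokeProfile;   // hypothetical: a named device profile matched
    bool bDetectedAsHandheld;  // hypothetical: platform API or extra checks matched
};

// Picks the first tier that applies, in priority order.
constexpr EDefaultsSource SelectDefaultsSource(DeviceInfo Device) {
    return Device.bHasBespokeProfile
               ? EDefaultsSource::BespokeProfile
               : (Device.bDetectedAsHandheld ? EDefaultsSource::ConservativeIGpu
                                             : EDefaultsSource::HardwareBenchmark);
}

// Runtime performance safeguards apply to the first two tiers only.
constexpr bool UsesRuntimeSafeguards(EDefaultsSource Source) {
    return Source != EDefaultsSource::HardwareBenchmark;
}
```

A device that is detected as a handheld but has no profile lands in the middle tier: it skips the benchmark and still gets the runtime safeguards.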
CPU Spike Mitigation on Intel Arc iGPU Handhelds
For detected Intel handhelds, we enable techniques originally developed for the Performance build, specifically those focused on controlling CPU cost variance.
The key insight: on Intel Arc iGPU hardware, reducing CPU cost variance often improves sustained GPU frequency more than reducing average GPU load. The driver responds to spikes, not averages. Smooth, predictable CPU behaviour directly translates to stable GPU clocks — and on a device with Intel's shared power budget, a CPU spike does not just cost CPU cycles, it steals headroom from the Xe GPU within the same PL1 envelope.
A Custom Animation Budget System
Rather than using Epic's AnimationBudgetAllocator plugin directly, we built our own animation budget subsystem, UGoalsAnimationBudgetSubsystem, with football-specific priority logic and tighter integration with our frame time telemetry.
The core mechanism will be familiar to anyone who has used Epic's system: each registered UGoalsBudgetedSkeletalMeshComponent tracks its own game thread evaluation time, game thread wait time, and worker thread time at cycle-counter precision. The subsystem sorts components by priority each frame and updates them in descending priority order until the frame's thread budgets are exhausted. Lower-priority components are skipped, with their animation interpolating from the last evaluated pose.
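The sort-then-spend loop described above can be sketched in plain C++. This is a simplified illustration, not the subsystem's code: it uses a single budget instead of separate game thread and worker thread budgets, and the struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct BudgetedComponent {
    int Id;
    float Priority;         // higher is more important
    float EstimatedCostMs;  // measured cost of the last evaluation
    int FramesSinceUpdate;
};

// Updates components in descending priority until the budget is exhausted.
// Returns the ids that were updated this frame, in processing order.
inline std::vector<int> RunBudgetedUpdate(std::vector<BudgetedComponent>& Components,
                                          float BudgetMs) {
    std::stable_sort(Components.begin(), Components.end(),
                     [](const BudgetedComponent& A, const BudgetedComponent& B) {
                         return A.Priority > B.Priority;
                     });

    std::vector<int> Updated;
    float SpentMs = 0.f;
    bool bBudgetExhausted = false;
    for (BudgetedComponent& Component : Components) {
        if (!bBudgetExhausted && SpentMs + Component.EstimatedCostMs <= BudgetMs) {
            SpentMs += Component.EstimatedCostMs;
            Component.FramesSinceUpdate = 0;
            Updated.push_back(Component.Id);
        } else {
            // Skipped this frame: the component keeps interpolating from its last pose.
            bBudgetExhausted = true;
            ++Component.FramesSinceUpdate;
        }
    }
    return Updated;
}
```

With three components costing 0.4 ms each and a 1.0 ms budget, the two highest-priority components update and the third is skipped, with its FramesSinceUpdate counter incremented.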
What makes it GOALS-specific is the priority function. Rather than prioritising by screen-space size or camera distance as a generic engine system would, CalculatePriority() uses distance to the ball as its primary signal:
cpp
float UGoalsAnimationBudgetSubsystem::CalculatePriority(const ComponentFrameData& FrameData) const
{
    // Local player: always full fidelity.
    if (FrameData.Component->GetOwner() == ControlledActor)
        return 1.f;
    // Invisible components never compete for budget.
    if (!FrameData.Component->IsVisible())
        return 0.f;

    // Primary signal: 2D distance to the ball, falling linearly to zero at BallEffectRange.
    const float DistanceToBall = FVector2D::Distance(
        FVector2D(FrameData.Component->GetComponentLocation()), BallPosition);
    const float DistanceFactor = FMath::Max(1.f - DistanceToBall / BallEffectRange, 0.f);

    // Anti-starvation: the longer a component goes without an update, the closer it gets to 1.
    const float TimeFactor = FMath::Min(
        static_cast<float>(FrameData.FramesSinceUpdate) / CVarAnimationBudgetMaxUpdateInterval, 1.f);

    return FMath::Lerp(DistanceFactor, 1.f, TimeFactor);
}
The local player's controlled actor is pinned at priority 1.0, always evaluated at full fidelity. Every other character's priority is determined by proximity to the ball, not proximity to the camera. In football, what matters most perceptually is what is happening near the ball, not what is near the edge of the screen.
The TimeFactor term is an anti-starvation mechanism: a character who hasn't been updated for several frames gradually climbs back toward priority 1.0 regardless of ball proximity. This prevents characters from being permanently frozen at the edge of the pitch simply because play has moved away from them.
The budget itself is dynamically derived from actual frame telemetry. UpdateFrameTimesMs() receives real game thread, render thread, RHI thread, and GPU frame times each frame. When the game thread is the bottleneck, the animation budget tightens. When the game thread has headroom, the budget relaxes gradually. Animation quality automatically scales with how much CPU the rest of the frame is consuming, rather than operating against a fixed cap.
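One way to express "tighten when game-thread bound, relax gradually otherwise" is a multiplicative cut paired with a small additive recovery step. The sketch below is an assumption about the shape of such a controller, not the logic inside UpdateFrameTimesMs(); every constant and name in it is illustrative.

```cpp
#include <algorithm>
#include <cassert>

struct FrameTimesMs {
    float GameThread;
    float RenderThread;
    float RHIThread;
    float GPU;
};

struct BudgetTuning {            // all values are illustrative assumptions
    float MinBudgetMs = 0.5f;
    float MaxBudgetMs = 3.0f;
    float TightenFactor = 0.8f;  // multiplicative cut when game-thread bound
    float RelaxStepMs = 0.05f;   // small additive step when there is headroom
};

// The game thread is the bottleneck when it is the slowest stage of the frame.
inline bool IsGameThreadBound(const FrameTimesMs& T) {
    return T.GameThread >= std::max({T.RenderThread, T.RHIThread, T.GPU});
}

// Computes next frame's animation budget from this frame's telemetry.
inline float NextAnimationBudgetMs(float CurrentMs, const FrameTimesMs& T,
                                   const BudgetTuning& Tuning = BudgetTuning()) {
    const float Proposed = IsGameThreadBound(T) ? CurrentMs * Tuning.TightenFactor
                                                : CurrentMs + Tuning.RelaxStepMs;
    return std::max(Tuning.MinBudgetMs, std::min(Proposed, Tuning.MaxBudgetMs));
}
```

The asymmetry is deliberate in this sketch: cutting quickly removes a CPU spike within a few frames, while recovering slowly avoids oscillating between tight and loose budgets.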
On hardware with two or fewer logical cores, the subsystem detects this at initialisation and disables parallel animation evaluation entirely by setting a.ParallelAnimEvaluation, a.ParallelAnimUpdate, and a.ParallelAnimInterpolation to zero. Spawning worker threads that immediately contend for the same two cores would cost more than the parallelism saves.
Pre-Baked Animation for Background Characters
For crowd and background characters, those rarely near the ball and never the controlled actor, we have a second system: UGoalsSimpleSkeletalMeshComponent.
Rather than evaluating skeletal animation at runtime, this component records the full animation to a compressed transform array at initialisation time, sampling at a configurable FPS and storing per-frame bone transforms as FCompressedTransform. Once recorded, playback is a direct array lookup with no animation graph evaluation, no blend tree, and no skeletal solve:
cpp
// Map the playback position to the nearest recorded frame, clamped to the last one.
const int32 FrameIndex = FMath::Min(
    FMath::RoundToInt(AnimationPosition * SampledFPS), AnimationFrameCount - 1);

// Each frame stores BoneCount consecutive transforms; decompress that slice into the pose.
TArray<FTransform>& CurrentTransforms = GetEditableComponentSpaceTransforms();
CurrentTransforms.Reset();
for (int32 BoneIndex = FrameIndex * BoneCount; BoneIndex < (FrameIndex + 1) * BoneCount; ++BoneIndex)
{
    CurrentTransforms.Add(AnimationTransforms[BoneIndex].Decompress());
}
The result is that background characters cost essentially nothing per frame on the CPU. On an Intel Arc iGPU handheld where every saved millisecond of game thread work translates directly into more stable GPU clocks, this is a meaningful contribution to sustained performance.
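For readers outside Unreal, the same frame-major lookup can be sketched with standard containers. The Transform and BakedAnimation types here are hypothetical stand-ins: compression is omitted, and the lower clamp to frame zero is an addition that the engine snippet does not show.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Transform { float Tx, Ty, Tz; };  // stand-in for a decompressed bone transform

struct BakedAnimation {
    float SampledFps = 30.f;
    int BoneCount = 0;
    int FrameCount = 0;
    std::vector<Transform> Transforms;  // frame-major: [frame * BoneCount + bone]

    // Nearest recorded frame for a playback position, clamped to the valid range.
    int FrameIndexAt(float PositionSeconds) const {
        const int Rounded = static_cast<int>(std::lround(PositionSeconds * SampledFps));
        return std::max(0, std::min(Rounded, FrameCount - 1));
    }

    // Copies one frame's worth of bone transforms into OutPose.
    void SamplePose(float PositionSeconds, std::vector<Transform>& OutPose) const {
        assert(BoneCount > 0 && FrameCount > 0);
        const std::ptrdiff_t First =
            static_cast<std::ptrdiff_t>(FrameIndexAt(PositionSeconds)) * BoneCount;
        OutPose.assign(Transforms.begin() + First, Transforms.begin() + First + BoneCount);
    }
};
```

Playback cost is one index computation plus a contiguous copy of BoneCount transforms, independent of how complex the original animation graph was.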
Battery vs. AC: Runtime Power State Awareness
Gaming laptops face a meaningful performance difference between AC and battery: the device firmware adjusts the processor's available TDP based on power source, and a configuration tuned for AC performance may run uncomfortably hot or drain the battery too quickly once the device is unplugged.
One important design decision shapes how this works in GOALS: IsGoalsRunningOnBattery() currently returns false on handheld devices, treating them as always-AC. The code comment is explicit: "For now treat Handhelds on PC as AC only." The reasoning is that handhelds have bespoke device profiles that already encode the right settings for their specific power envelope, and their AC and battery performance characteristics are close enough that a separate battery profile adds complexity without meaningful benefit. The battery adaptation system is therefore a laptop feature for now, not a handheld one.
For laptops, GOALS now handles AC and battery states dynamically and automatically. The system is built around a BatteryOverrides struct in UGoalsGameUserSettings that stores a separate value for every graphics setting — upscaling method, upscaling quality mode, render quality, frame generation mode, and more — alongside the standard AC value. When a player changes a setting, it is written to either the AC slot or the battery slot depending on which power state they are currently in. Both are persisted to the user config independently.
A HandlePowerTicker polls the power state every 5 seconds via Unreal's FTSTicker. When a poll detects that the power state has changed, EnableAntiAliasingAndFrameGenDeferred() applies the appropriate slot without requiring a restart or player action:
cpp
const bool bIsOnBattery = IsGoalsRunningOnBattery();
int32 ActiveAAMethod = bIsOnBattery ? BatteryOverrides.AntiAliasingMethod : AntiAliasingMethod;
// ... resolve XeSS mode index, frame generation mode, render quality from the correct slot
For XeSS specifically, a laptop player can run XeSS Quality on AC and drop to XeSS Balanced on battery, with the switch happening automatically on the first poll after they unplug, which is at most about five seconds later. Frame generation follows the same path: BatteryOverrides.FrameGenerationIndex can differ from FrameGenerationIndex, so enabling XeFG on AC does not force it on when the player switches to battery.
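The two-slot idea can be sketched independently of Unreal. GraphicsSlot and PowerAwareSettings are hypothetical names, the fields are a small subset of the real settings, and the poll is reduced to a function that reports whether the power state changed since the last call.

```cpp
#include <cassert>

struct GraphicsSlot {
    int AntiAliasingMethod = 0;
    int UpscalingModeIndex = 0;  // in this article: 0 = Balanced, 1 = Quality
    int FrameGenerationIndex = 0;
};

class PowerAwareSettings {
public:
    // Player edits are written to whichever slot matches the current power state.
    GraphicsSlot& EditableSlot(bool bOnBattery) { return bOnBattery ? Battery : Ac; }

    const GraphicsSlot& ActiveSlot(bool bOnBattery) const {
        return bOnBattery ? Battery : Ac;
    }

    // Called from a periodic poll; returns true when the active slot must be re-applied.
    bool OnPowerStatePolled(bool bOnBatteryNow) {
        const bool bChanged = (bOnBatteryNow != bLastOnBattery);
        bLastOnBattery = bOnBatteryNow;
        return bChanged;
    }

private:
    GraphicsSlot Ac;
    GraphicsSlot Battery;
    bool bLastOnBattery = false;
};
```

Because each slot is stored independently, a change made while unplugged never overwrites the player's AC configuration, and the reverse also holds.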
The hardware benchmark, run when a device is first seen, now populates both slots simultaneously. If it runs on AC, the battery quality tier is set one step lower. If it runs on battery, the AC tier is estimated one step higher. The result is that even a completely unknown laptop gets a power-state-aware starting configuration from first launch rather than a single fixed point.
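The one-step-lower and one-step-higher rule is simple to write down. In this sketch the tier range (0 to MaxTier) and the clamping at both ends are assumptions, and the names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

struct QualityTiers {
    int Ac;
    int Battery;
};

// Derives both tiers from one benchmark run.
// Assumption: tiers run from 0 (lowest) to MaxTier (highest) and are clamped.
inline QualityTiers DeriveTiers(int BenchmarkTier, bool bRanOnBattery, int MaxTier = 3) {
    const int Measured = std::max(0, std::min(BenchmarkTier, MaxTier));
    if (bRanOnBattery) {
        // Measured on battery: estimate AC one step higher.
        return QualityTiers{std::min(Measured + 1, MaxTier), Measured};
    }
    // Measured on AC: set battery one step lower.
    return QualityTiers{Measured, std::max(Measured - 1, 0)};
}
```

For a benchmark result of tier 2, a run on AC yields AC 2 and battery 1, while a run on battery yields AC 3 and battery 2.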
Testing confirmed why this distinction makes sense. On gaming laptops, unplugging can halve performance or worse — thermal and power limits on the Razer RTX 3060 laptop we tested kicked in almost immediately on battery, with clock speeds dropping significantly compared to AC. Handhelds behaved very differently: battery life was better preserved and performance remained far more consistent between plugged and unplugged states. The handheld power delivery architecture is simply better suited to sustained gaming loads on battery, making the cliff that justifies a separate battery profile much less pronounced. The bespoke device profiles handle the handheld case well enough without needing a second settings tier.
The Architecture in Summary
Looking across both parts of this article, a coherent design pattern emerges.
- High-end desktops get GPUThrottle: IGCL-powered thermal and acoustic regulation that finds and holds the highest sustainable frame rate within a comfortable operating envelope. Players get competitive frame rates without the noise and heat of an unconstrained GPU.
- Known Intel Arc handhelds get console-style bespoke profiles: handcrafted, empirically validated, and shared across both WinGDK and Steam distribution paths.
- Unknown Intel handhelds get conservative adaptive behaviour: iGPU-aware safeguards and CPU spike mitigation even without a device-specific profile.
- All iGPU hardware benefits from CPU variance reduction: the custom animation budget system and football-aware priority scoring that stabilise GPU clocks by smoothing the CPU load presented to the shared power budget.
- Battery and AC states are handled as first-class configurations for laptops: separate settings slots, automatic switching on power state change, and benchmark-derived defaults for both states from first launch. Handhelds are treated as always-AC, as their bespoke profiles already encode the right settings for their power envelope.
- IGCL closes the loop across all of it: real-time telemetry that lets runtime decisions respond to what the hardware is actually doing, rather than what a static profile predicted it would do.
Closing Thoughts
The through-line connecting these systems, from high-end desktop thermal management to handheld device profiles to automatic battery-aware adaptation, is a single principle: treat frame rate and GPU load as variables to be managed, not targets to be maximised.
On discrete GPUs, maximising GPU load is often fine because the thermal and acoustic consequences are confined to the GPU itself. On Intel Core Ultra processors, where CPU and GPU share a power budget and a memory bus, the consequences of an unmanaged GPU load propagate immediately and visibly into the player experience.
Intel's platform (the Xe GPU power model, the Core architecture, and the IGCL telemetry API) gives developers the tools to manage this precisely. The question is whether you use them.
The player in a hot climate with their fan at maximum RPM, or the player on an MSI Claw watching the battery drain twice as fast as it should: these are solvable problems. The hardware APIs exist. The data is available at runtime. Using them is an engineering choice.
GOALS launched in June 2026 on PlayStation, Xbox (including Xbox Game Pass), Steam, and Epic Games Store, with more platforms set to follow later this year. The GOALS team is a remote-first team spread across Europe, with HQ based in Stockholm, Sweden.