Many stalled stages detected on Arc A770 dGPU when running an LLM model

WeiSeng · ‎09-04-2024

Hi Team,

Recently we have been using VTune to analyze LLM model inference on an Arc A770 dGPU.

The VTune trace shows many stalled stages rather than active stages.

I have attached a screenshot captured from VTune.

Any suggestions?

NormanS_Intel · ‎09-10-2024

Hello WeiSeng,

Thank you for posting in the community!

To further investigate this issue, could you please confirm if you are using Intel® VTune™ Profiler? If not, could you provide the exact name of the software you are using?

Best regards,

Norman S.

Intel Customer Support Engineer

WeiSeng · ‎09-10-2024

Hello,

Yes, we are using the Intel VTune Profiler to capture the Arc A770 GPU metrics.

Thanks!

JedG_Intel · ‎09-11-2024

Hello WeiSeng,

Thank you for sharing this information.

To ensure you receive the most specialized assistance, we have a dedicated forum that addresses these specific concerns. Therefore, I will be moving this discussion to our Developer Software Forum. This will allow our knowledgeable community and experts to provide you with timely and accurate solutions.

Have a good one!

Best regards,

Jed G.

Intel Customer Support Technician

yuzhang3_intel · ‎09-13-2024

In general, stalls are related to memory footprint. Model optimizations, such as reducing model compilation time and graph fusion, can also reduce memory usage and inference time. You can also use shared local memory (SLM) to improve memory access latency for some kernels. Using oneDNN also helps with LLM optimization. Here are some documents you can refer to:

https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html

https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2024-2/kernels.html
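To get a rough feel for the memory footprint mentioned above, here is a minimal back-of-the-envelope sketch in Python. It is only an illustration, not a measurement from this thread: the model dimensions (7 billion parameters, 32 layers, 32 KV heads, head size 128, 2048-token context) and the 16 GiB memory budget are hypothetical values, and lower-precision weights are assumed as one example of a model optimization. Replace them with the numbers for your own model and card.

```python
GIB = 1024 ** 3


def weight_bytes(num_params: int, bits_per_weight: int) -> int:
    """Approximate size of the model weights in bytes."""
    return num_params * bits_per_weight // 8


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1,
                   bytes_per_value: int = 2) -> int:
    """Approximate KV cache size in bytes.

    The leading factor 2 accounts for one key tensor and one value
    tensor per layer; bytes_per_value=2 assumes FP16 cache entries.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical model and device figures (assumptions, not from the thread).
num_params = 7_000_000_000
vram_budget = 16 * GIB

kv = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=2048)

for bits in (16, 8, 4):
    total = weight_bytes(num_params, bits) + kv
    print(f"{bits:>2}-bit weights: {total / GIB:6.2f} GiB "
          f"({100 * total / vram_budget:5.1f}% of the assumed budget)")
```

If the total is close to the device memory, or far larger than what the caches can hold, memory traffic rather than compute is a likely source of the stalls that VTune reports, and reducing the footprint is a reasonable first thing to try.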

clevels · ‎09-30-2024

@WeiSeng Please see @yuzhang3_intel's recommendations above.