
VTune shows many stalled stages on Arc A770 dGPU when running LLM inference

WeiSeng
Employee

Hi Team,

We recently used VTune to analyze LLM inference on an Arc A770 dGPU.

The VTune trace shows many stalled stages rather than active stages.

I have attached a screenshot captured from VTune.

Any suggestions?

 

NormanS_Intel
Moderator

Hello WeiSeng,


Thank you for posting in the community!


To further investigate this issue, could you please confirm if you are using Intel® VTune™ Profiler? If not, could you provide the exact name of the software you are using?


Best regards,

Norman S.

Intel Customer Support Engineer


WeiSeng
Employee

Hello,

Yes, we are using Intel VTune Profiler to capture the Arc A770 GPU metrics.

Thanks!

JedG_Intel
Moderator

Hello WeiSeng,

 

Thank you for sharing this information.

 

To ensure you receive the most specialized assistance, we have a dedicated forum that addresses these specific concerns. Therefore, I will be moving this discussion to our Developer Software Forum. This will allow our knowledgeable community and experts to provide you with timely and accurate solutions.

 

Have a good one!

 

 

Best regards,

Jed G.

Intel Customer Support Technician


yuzhang3_intel
Moderator

In general, stalls are related to memory footprint. Model optimization, such as reducing model compilation time and fusing graph operations, can also reduce memory usage and inference time. You can also use shared local memory (SLM) to reduce memory access latency for some kernels. Using oneDNN is also helpful for LLM optimization. Here are some documents you can refer to:

https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html

https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2024-2/kernels.html
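
To make the memory footprint point concrete, here is a rough back-of-envelope sketch in Python. It assumes that, during token generation, every weight is read from device memory about once per token, so the weight size divided by memory bandwidth gives a lower bound on time per token. The 7B parameter count and the 560 GB/s bandwidth figure are illustrative assumptions, not measurements from this thread, and the function names are hypothetical.

```python
# Rough lower bound on per-token decode time when weight reads dominate.
# All numbers used below are illustrative assumptions, not measurements.

# Approximate storage cost per parameter for common weight precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_footprint_gb(num_params: float, precision: str) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def min_ms_per_token(num_params: float, precision: str, bandwidth_gb_s: float) -> float:
    """Time to stream every weight once from device memory, in milliseconds."""
    return weight_footprint_gb(num_params, precision) / bandwidth_gb_s * 1e3


if __name__ == "__main__":
    params = 7e9        # assumed: a 7B-parameter model
    bandwidth = 560.0   # assumed: peak memory bandwidth in GB/s
    for precision in ("fp16", "int8", "int4"):
        size = weight_footprint_gb(params, precision)
        ms = min_ms_per_token(params, precision, bandwidth)
        print(f"{precision}: {size:.1f} GB weights, >= {ms:.2f} ms/token")
```

Under these assumptions, FP16 weights come to about 14 GB and at least 25 ms per token, while INT4 weights come to about 3.5 GB and a bound roughly four times lower. Real kernels will not reach peak bandwidth, and the estimate ignores the KV cache and activations, so treat it only as a way to see why a smaller footprint tends to reduce time spent stalled on memory.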

 

 

 

 

clevels
Moderator

@WeiSeng Please see @yuzhang3_intel's recommendations above.
