
Many stalled stages detected on ARC770 dGPU when running LLM inference

WeiSeng
Employee
1,523 views

Hi Team,

We recently used VTune to analyze LLM inference on an ARC770 dGPU.

The VTune trace shows many stalled stages rather than active ones.

I have attached a screenshot captured from VTune.

Any suggestions?

5 Replies
NormanS_Intel
Moderator
1,411 views

Hello WeiSeng,


Thank you for posting in the community!


To further investigate this issue, could you please confirm if you are using Intel® VTune™ Profiler? If not, could you provide the exact name of the software you are using?


Best regards,

Norman S.

Intel Customer Support Engineer


WeiSeng
Employee
1,383 views

Hello,

Yes, we are using Intel VTune Profiler to capture the ARC770 GPU metrics.

Thanks!

JedG_Intel
Moderator
1,353 views

Hello WeiSeng,

Thank you for sharing this information.

To ensure you receive the most specialized assistance, we have a dedicated forum that addresses these specific concerns. Therefore, I will be moving this discussion to our Developer Software Forum. This will allow our knowledgeable community and experts to provide you with timely and accurate solutions.

Have a good one!

Best regards,

Jed G.

Intel Customer Support Technician


yuzhang3_intel
Moderator
1,315 views

In general, stalls are related to memory footprint. Model-level optimizations, such as reducing model compilation time and graph fusion, can also reduce memory usage and inference time. For some kernels, you can use shared local memory (SLM) to improve memory access latency. Using oneDNN also helps with LLM optimization. Here are some documents you can refer to:

https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html

https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2024-2/kernels.html
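
Since stalls tend to track memory footprint, a quick back-of-the-envelope estimate can help decide whether lower-precision weights are worth trying. The sketch below is only an illustration: the model dimensions (7B parameters, 32 layers, 32 KV heads, head size 128) are hypothetical, the function names are made up for this example, and it counts only weights plus KV cache, ignoring activations and runtime overhead.

```python
# Rough estimate of LLM inference memory footprint (weights + KV cache).
# All model dimensions used here are hypothetical, for illustration only.

# Approximate storage cost per value for common precisions.
BYTES_PER_VALUE = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_bytes(num_params, precision):
    """Bytes needed to hold the model weights at the given precision."""
    return num_params * BYTES_PER_VALUE[precision]


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size, precision="fp16"):
    """Bytes needed for the key/value cache.

    The factor 2 accounts for storing both keys and values.
    """
    values = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return values * BYTES_PER_VALUE[precision]


def footprint_gib(num_params, weight_precision, **kv_args):
    """Weights plus KV cache, in GiB (activations and overhead ignored)."""
    total = weight_bytes(num_params, weight_precision) + kv_cache_bytes(**kv_args)
    return total / 2**30


if __name__ == "__main__":
    # Hypothetical 7B-parameter model with a 2048-token context.
    kv = dict(num_layers=32, num_kv_heads=32, head_dim=128,
              seq_len=2048, batch_size=1)
    for precision in ("fp16", "int8", "int4"):
        gib = footprint_gib(7_000_000_000, precision, **kv)
        print(f"{precision} weights: ~{gib:.2f} GiB")
```

Compare the result against the memory available on your card; if the estimate is close to or above that, weight compression is likely a good first step before tuning individual kernels.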


clevels
Employee
1,105 views

@WeiSeng Please see @yuzhang3_intel's recommendations above.
