Hello WeiSeng,
Thank you for posting in the community!
To further investigate this issue, could you please confirm if you are using Intel® VTune™ Profiler? If not, could you provide the exact name of the software you are using?
Best regards,
Norman S.
Intel Customer Support Engineer
Hello,
Yes, I am using the Intel VTune Profiler to capture the Arc A770 GPU metrics.
Thanks!
Hello WeiSeng,
Thank you for sharing this information.
To ensure you receive the most specialized assistance, we have a dedicated forum that addresses these specific concerns. Therefore, I will be moving this discussion to our Developer Software Forum. This will allow our knowledgeable community and experts to provide you with timely and accurate solutions.
Have a good one!
Best regards,
Jed G.
Intel Customer Support Technician
In general, stalls are related to memory footprint. Model optimizations such as reducing model compilation time and graph fusion can also lower memory usage and inference time. For some kernels, you can use shared local memory (SLM) to reduce memory access latency. Using oneDNN also helps with LLM optimization. Here are some documents you can refer to:
https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html
https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2024-2/kernels.html
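As a rough starting point for the compilation-time suggestion above, here is a minimal sketch using the OpenVINO Python API. It assumes OpenVINO is installed and that the model is available as an IR or ONNX file. The helper names `build_gpu_config` and `compile_for_gpu` and the `ov_cache` directory are made up for illustration; `CACHE_DIR` and `PERFORMANCE_HINT` are standard OpenVINO properties. Whether this reduces the stalls you see in VTune depends on the model, so treat it as something to try and then re-profile.

```python
from pathlib import Path


def build_gpu_config(cache_dir: str, latency_first: bool = True) -> dict:
    """Return OpenVINO properties for compiling a model on the GPU plugin.

    Hypothetical helper, not part of OpenVINO itself.
    """
    return {
        # Model caching: reuse compiled kernels across runs to cut compile time.
        "CACHE_DIR": cache_dir,
        # High-level hint; the plugin derives its own tuning from it.
        "PERFORMANCE_HINT": "LATENCY" if latency_first else "THROUGHPUT",
    }


def compile_for_gpu(model_path: str, cache_dir: str = "ov_cache"):
    """Compile a model for the GPU device (hypothetical helper)."""
    # Imported lazily so build_gpu_config works even without OpenVINO installed.
    import openvino as ov

    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    core = ov.Core()
    # Device names look like "GPU" or "GPU.0", "GPU.1" on multi-GPU systems.
    if not any(dev.startswith("GPU") for dev in core.available_devices):
        raise RuntimeError("No GPU device visible to OpenVINO")
    return core.compile_model(model_path, "GPU", build_gpu_config(cache_dir))
```

Calling `compile_for_gpu("model.xml")` (the path is a placeholder) a second time should load from the cache directory instead of recompiling, which makes it easier to separate compilation cost from inference cost when you capture a new VTune trace.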
