
Advisor Roofline for Ideal Hardware

JNorw
Beginner
950 Views

The Advisor Roofline could be a useful tool for evaluating AI hardware efficiency against some ideal hardware. By "ideal", I mean: analyze the number of operations in the neural net model and evaluate the minimum latency through the model if all operations could be executed asynchronously and with zero wait time for memory accesses. That minimum would then serve as the roofline, rather than the limits of the existing hardware.
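To make the idea concrete, here is a minimal sketch of the "ideal" bound described above. It is not an Advisor feature; the `Layer` type, the function names, and all numbers are hypothetical and only illustrate the arithmetic: the ideal latency is the total operation count divided by a peak compute rate, with memory wait time assumed to be zero.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical description of one layer of a neural net model."""
    name: str
    flops: float  # floating-point operations for one inference pass


def ideal_latency_seconds(layers, peak_flops_per_s):
    """Lower bound on latency: total work divided by the peak compute rate.

    Assumes perfect overlap of all operations and zero memory wait time.
    """
    total_flops = sum(layer.flops for layer in layers)
    return total_flops / peak_flops_per_s


def efficiency(measured_latency_s, layers, peak_flops_per_s):
    """Fraction of the ideal bound that a measured run achieves (0..1)."""
    return ideal_latency_seconds(layers, peak_flops_per_s) / measured_latency_s


# Made-up example values, for illustration only.
model = [Layer("conv1", 2e9), Layer("fc1", 1e9)]
peak = 1e12  # assumed peak rate in FLOP/s
print(ideal_latency_seconds(model, peak))
print(efficiency(12e-3, model, peak))
```

Comparing a measured latency against this bound gives a single efficiency number per model, which is the kind of hardware-independent reference point the question is about.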

Is there some configuration of the tool that could accomplish this?

0 Kudos
1 Solution
Zakhar_M_Intel1
Employee
898 Views

Hello,

What you are asking for can be seen as an "ideal throughput model", under the assumption that there are no compute latency and no cache/memory latency effects at all, right?

So, for example, for the memory subsystem this would correspond to the memory bandwidth peak assuming all data fits into registers, while for compute it would correspond to, e.g., an FMA (or VNNI, etc.) benchmark with no data-flow dependencies. Is this understanding correct?

How would you use this kind of roofline in practice? Understanding your usage model may help us prioritize and define the feature better.

I should mention that, de facto, the current implementation is not that far from what you are asking for:

  • The FMA compute benchmarks are highly optimized and implemented so that there is almost no latency in the system.
  • Register-only benchmarks are not provided, but an L1 benchmark is, and it is already much closer to the ideal than to a practical peak, because few workloads fit into L1 even partially. (That is why the L1 benchmark works so well for the CARM Roofline, which focuses on algorithmic and fundamental limits, whereas the "MLR" or Classic Roofline in Advisor is oriented more toward highlighting current bottlenecks.)
  • Some elements of "Offload Advisor", released in oneAPI Gold, may also fit what you are asking for at some point in the future, but that again depends on your usage model, as I asked above.
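As an illustration of why the L1 roof sits closer to an ideal bound than a DRAM roof, here is a minimal sketch of the standard roofline formula, attainable performance = min(compute peak, arithmetic intensity × bandwidth). The function names and all peak and bandwidth values are assumptions chosen for the example; they are not Advisor benchmark results or Advisor APIs.

```python
def attainable_gflops(ai_flop_per_byte, peak_gflops, bandwidth_gb_s):
    """Roofline bound: limited by either the compute peak or the memory roof."""
    return min(peak_gflops, ai_flop_per_byte * bandwidth_gb_s)


def ridge_point(peak_gflops, bandwidth_gb_s):
    """Arithmetic intensity above which a kernel becomes compute bound."""
    return peak_gflops / bandwidth_gb_s


# Hypothetical machine: one compute peak, two memory roofs.
PEAK_GFLOPS = 500.0
ROOFS_GB_S = {"L1": 1000.0, "DRAM": 50.0}

ai = 0.25  # FLOP per byte for some hypothetical kernel
for level, bandwidth in ROOFS_GB_S.items():
    bound = attainable_gflops(ai, PEAK_GFLOPS, bandwidth)
    ridge = ridge_point(PEAK_GFLOPS, bandwidth)
    print(f"{level}: bound {bound} GFLOP/s, ridge at {ridge} FLOP/byte")
```

With these made-up numbers the same kernel is bounded far higher under the L1 roof than under the DRAM roof, which is the sense in which an L1-based roofline approximates the limit for a workload whose data movement costs almost nothing.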

 


0 Kudos
4 Replies
GouthamK_Intel
Moderator
932 Views

Hi,

As your issue is related to the Intel Advisor tool and we have a dedicated forum for Analyzers, we are moving this thread to the Analyzers forum for a faster response.

Have a good day!

 

Thanks & Regards

Goutham

 

0 Kudos
Gopika_Intel
Moderator
877 Views

Hi,

Has your query been resolved? If so, shall we stop monitoring this thread?

Regards

Gopika


0 Kudos
Gopika_Intel
Moderator
849 Views

Hi,


We haven't heard back from you, so we will no longer monitor this thread, as the answer provided by Zakhar has been accepted as the solution. For further assistance, please post a new thread.


Regards

Gopika


0 Kudos