guo__jian
Beginner
195 Views

Can VTune 2019 profile Python code with Numba decorators?

I am using VTune to profile some Python code in order to measure GFLOPS.

The baseline code is written with NumPy and the optimized code uses Numba (with the @njit and @vectorize decorators). The Numba code is about 8 times faster than the NumPy baseline; however, VTune shows that NumPy and Numba achieve the same GFLOPS.

I just want to confirm whether the latest VTune can report GFLOPS correctly for Numba-compiled Python code.

Are there any benchmarks or example code for profiling Python Numba code with VTune?

 

Thanks and regards

3 Replies
guo__jian
Beginner
195 Views

Reply for testing

Anton_M_Intel
Employee
195 Views

The upcoming Update 1 release of Parallel Studio will contain better support for Numba profiling, but that mostly concerns how Numba code is displayed and referred to in VTune. I'll leave it to others to comment on how the GFLOPS metric works in VTune, but I can explain the difference in performance between Numba and NumPy. Numba fuses all the vectorized operations into a single loop over the data, so it does not need to store intermediate results to memory and read them back for the next operation, which NumPy usually does. So they access memory at the same rate, but Numba is much more efficient in terms of the number of memory operations.
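To illustrate the fusion point above, here is a minimal sketch, not taken from the original poster's code. The function names `numpy_version` and `fused_version` and the expression `a * b + c` are hypothetical, chosen only for illustration. The `try`/`except` fallback is an assumption added so the sketch still runs (slowly) where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # the real Numba decorator
except ImportError:
    # Assumption for illustration: fall back to plain Python so the
    # sketch still runs without Numba (much slower, same result).
    def njit(func):
        return func


def numpy_version(a, b, c):
    # Each operator produces a full temporary array: the product a * b
    # is written to memory, then read back to add c.
    return a * b + c


@njit
def fused_version(a, b, c):
    # One pass over the data: each element is loaded once, combined,
    # and stored once. No intermediate array is materialized.
    out = np.empty_like(a)
    for i in range(a.size):
        out[i] = a[i] * b[i] + c[i]
    return out


n = 100_000
rng = np.random.default_rng(0)
a, b, c = rng.random(n), rng.random(n), rng.random(n)

result_numpy = numpy_version(a, b, c)
result_fused = fused_version(a, b, c)
```

Both versions perform the same arithmetic (one multiply and one add per element); what differs is how many times the arrays travel to and from memory.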

guo__jian
Beginner
195 Views

Anton Malakhov (Intel) wrote:

The upcoming Update 1 release of Parallel Studio will contain better support for Numba profiling, but that mostly concerns how Numba code is displayed and referred to in VTune. I'll leave it to others to comment on how the GFLOPS metric works in VTune, but I can explain the difference in performance between Numba and NumPy. Numba fuses all the vectorized operations into a single loop over the data, so it does not need to store intermediate results to memory and read them back for the next operation, which NumPy usually does. So they access memory at the same rate, but Numba is much more efficient in terms of the number of memory operations.

Thanks very much for your reply.

I have another question about profiling Python (NumPy-based) code with VTune. When VTune calculates and reports GFLOPS, does it count both kernel compute time and API call time, or just kernel compute time? Can VTune report the total number of floating-point operations (flop) in a program? (I think the flop count should be available, but I am sorry that I have no idea how to check it.)
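For comparison, a GFLOPS figure can be estimated by hand as the flop count divided by elapsed time. The sketch below is only a rough cross-check and says nothing about how VTune derives its own metric. The helper `measure_gflops` is hypothetical, the flop count is assumed from the expression rather than measured, and the wall-clock timing includes Python and NumPy call overhead, not just the kernel.

```python
import time

import numpy as np


def measure_gflops(func, flop_count, repeats=5):
    """Time func() several times; return (best_seconds, gflops)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    # GFLOPS = floating-point operations / seconds / 1e9
    return best, flop_count / best / 1e9


n = 1_000_000
x = np.ones(n)
y = np.ones(n)

# Assumption: 2.0 * x + y costs one multiply and one add per element.
flop_count = 2 * n

seconds, gflops = measure_gflops(lambda: 2.0 * x + y, flop_count)
print(f"{flop_count} flop in {seconds:.6f} s -> {gflops:.3f} GFLOPS")
```

Because the timer wraps the whole call, this estimate corresponds to "kernel time plus API call time"; separating the two requires a profiler.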

Thanks again.