Hi, I have a question regarding the performance of the slasrt2 function. I'm using VTune and am seeing the following results, which show that my code is not reaching peak FPU utilization or anywhere near it. I was wondering about the percentage of time spent in the slasrt2 routines: is there any switch or environment variable I need to set to override MKL's default settings and get higher FPU utilization?
The function ?lasrt2 is a sorting routine for numbers. It uses quick sort and falls back to insertion sort for short arrays (n<20). This kind of computation consists mostly of moving data between memory and registers; the only operations that touch the FPU are the floating-point comparisons. Insertion sort takes O(n^2) time, and the number of moves it performs cannot be reduced. Quick sort takes O(n log n) time on average and O(n^2) in the worst case. Unlike a multiplication-heavy routine, which keeps the FPU busy with arithmetic, ?lasrt2 cannot be expected to show high FPU utilization.
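To make the point about moves versus floating-point work concrete, here is a minimal sketch of a quick sort that switches to insertion sort on short ranges and counts both kinds of operation. It is only an illustration and not MKL's implementation: the name `hybrid_sort`, the `CUTOFF` constant (set to 20 from the n<20 remark above), the middle-element pivot and the way moves are counted are all assumptions made for this example.

```python
CUTOFF = 20  # assumed threshold, taken from the "n<20" remark above


def hybrid_sort(values, cutoff=CUTOFF):
    """Sort a copy of `values` in increasing order.

    Returns (sorted_list, counts), where counts holds the number of element
    comparisons and element moves (loads, stores and swap halves).
    Hypothetical illustration only; not the ?lasrt2 source.
    """
    a = list(values)
    counts = {"compares": 0, "moves": 0}

    def less(x, y):
        # The only place a floating-point operation happens.
        counts["compares"] += 1
        return x < y

    def insertion(lo, hi):
        for i in range(lo + 1, hi + 1):
            key = a[i]                      # load the element to insert
            counts["moves"] += 1
            j = i - 1
            while j >= lo and less(key, a[j]):
                a[j + 1] = a[j]             # shift one slot to the right
                counts["moves"] += 1
                j -= 1
            a[j + 1] = key                  # store the element in its slot
            counts["moves"] += 1

    def quick(lo, hi):
        # Partition while the range is long; finish short ranges by insertion.
        while hi - lo + 1 >= cutoff:
            pivot = a[(lo + hi) // 2]
            i, j = lo, hi
            while i <= j:
                while less(a[i], pivot):
                    i += 1
                while less(pivot, a[j]):
                    j -= 1
                if i <= j:
                    a[i], a[j] = a[j], a[i]
                    counts["moves"] += 2
                    i += 1
                    j -= 1
            # Recurse into the shorter part, loop on the longer one.
            if j - lo < hi - i:
                quick(lo, j)
                lo = i
            else:
                quick(i, hi)
                hi = j
        if lo < hi:
            insertion(lo, hi)

    quick(0, len(a) - 1)
    return a, counts


if __name__ == "__main__":
    import random

    data = [random.random() for _ in range(100_000)]
    result, counts = hybrid_sort(data)
    print(counts)
```

Nothing in the sketch adds or multiplies: every counted operation is either a comparison or a data move, which is why a profile of a sort like this shows load/store traffic and branches rather than FPU arithmetic.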
My advice is to look at the assembly code in VTune. If it does consist largely of mov instructions, it is perfectly normal that you cannot reach peak FPU utilization.