I am using the LAPACK subroutine DGELSD to compute the linear least-squares solution of a system, i.e. to minimize ||Ax - b||. I link against the parallel Intel MKL library. When I run my code, I can see that only 57% of the total CPU is used. Setting the number of threads for MKL also has no effect. To set it I used:
call mkl_set_num_threads( 32 )
I am working on a workstation with the following specs:
Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10 GHz, Cores = 16, Logical processors = 32, Windows 10 Pro, 64-bit operating system, x64-based processor.
Please suggest how I can make use of the available processing capacity. At present my code takes a long time to produce results, and most of that time is spent in the call to DGELSD, which computes the least-squares solution.
The parallel MKL library is not parallel from beginning to end; some parts are still sequential. Please use VTune to analyze your program and its source code to find the hotspots.
mkl_set_num_threads() should be set to the number of cores rather than the number of logical processors. Please try mkl_set_num_threads(16) to see whether there is any improvement.
BTW, please also check whether hyperthreading is turned on.
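As a rough sketch of the suggestion above, the thread count can be set and then read back before the solver is called. This assumes the program is linked against the threaded (parallel) MKL library and that 16 matches the number of cores on the machine. The program name and the placeholder comment for the DGELSD call are illustrative only and are not taken from the original code.

```fortran
program mkl_thread_check
  implicit none
  ! MKL service function that reports the maximum number of threads MKL may use
  integer, external :: mkl_get_max_threads
  integer :: nthreads

  ! Assumption: 16 cores, as listed in the workstation specs above
  call mkl_set_dynamic(0)        ! ask MKL not to lower the thread count on its own
  call mkl_set_num_threads(16)   ! request one thread per core

  ! Read the setting back to confirm that it took effect
  nthreads = mkl_get_max_threads()
  print *, 'MKL max threads: ', nthreads

  ! The existing call to DGELSD would go here (placeholder, not shown)
end program mkl_thread_check
```

If the value printed is 1, the program is probably linked against the sequential MKL library, in which case mkl_set_num_threads has no effect. Even with 16 threads, CPU usage may stay below 100% because parts of the solver run sequentially.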