Call to ippsSortDescend_32f_I API hangs/doesn't return

Ganesh_K_ · ‎11-09-2016

Hello Folks,

My application uses the IPP library. I am currently using IPP library version 6.1.6. Due to some customer requirements, I cannot upgrade my IPP library to a later version.

I am seeing a weird behaviour wherein a call to the ippsSortDescend_32f_I API hangs and doesn't return. This occurs intermittently (roughly 2 out of 10 runs). I see this non-responsive behaviour on Linux & Mac (I haven't tried on Windows).

I initially suspected some threading issue within the IPP library & posted my query here: http://stackoverflow.com/questions/40475792/how-to-set-number-of-threads-while-using-static-intel-ipp-library

On further investigation, I found that the IPP library doesn't spawn any new threads of its own (no internal multithreading).

According to this doc: http://www.csbi.mit.edu/technology/intel_ipp/doc/ThreadedFunctionsList.txt

it looks like the function ippsSortDescend_32f_I is not even threaded.

If I use a C-style quick sort instead of this API, my application works as expected. I am not sure why a call to this synchronous API would not return.
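For reference, the kind of C-style replacement meant here can be sketched with the standard library's qsort. This is only a minimal illustration, not the exact code from my application: the names compare_float_desc and sort_descend_f32 are hypothetical, and the sketch assumes the input contains no NaN values (the comparator treats NaN as equal to everything, so NaN elements would end up in unspecified positions).

```c
#include <assert.h>
#include <stdlib.h>

/* Comparator for qsort that orders floats from largest to smallest.
   It uses comparisons instead of subtraction, so the result is always
   -1, 0 or 1 and cannot overflow or lose precision. */
static int compare_float_desc(const void *a, const void *b)
{
    float x = *(const float *)a;
    float y = *(const float *)b;
    return (x < y) - (x > y);
}

/* Hypothetical helper: sorts `len` floats in place, in descending order. */
static void sort_descend_f32(float *data, size_t len)
{
    assert(data != NULL || len == 0);
    if (len > 1) {
        qsort(data, len, sizeof data[0], compare_float_desc);
    }
}
```

Calling sort_descend_f32(vec, 2028) on the same buffer that was previously passed to ippsSortDescend_32f_I is the intended usage; unlike the IPP call it returns nothing, so there is no status code to check.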

Are there any conditions under which this function can run into an infinite loop? (Just my guess.)

Any help would be highly appreciated.

Thanks,

Ganesh

Jing_Xu · ‎11-10-2016

Hi Ganesh,

ippsSortDescend_32f_I won't run into an infinite loop.

Could you provide us a test case for investigation, please?

Ganesh_K_ · ‎11-11-2016

Hello Jing,

Thanks for the response.

As I have mentioned earlier, this issue doesn't happen every time, and as such it is a bit difficult to provide an exact test case. In my case though, I am passing a float vector containing 2028 elements. I verified that each element in this vector is indeed a float & that there is no data type mismatch of any sort.

I am attaching a file which contains the disassembly of the ippsSortDescend_32f_I API. I stepped through the API in assembly.
There appears to be a loop running through Line # 374 -> 378 -> 381 -> 383 -> 387 -> 390.

The outer loop seems to be starting from Line 72: 0x10175ada4 <+72>: cmpl $0x9, %esi

This line seems to jump back to Line 72 multiple times: 0x10175afea <+654>: jne 0x10175ada4 ; <+72>

Kindly let me know if you need more info from my end.

Thanks,

Ganesh

Jing_Xu · ‎11-13-2016

Hi Ganesh,

Could you provide us more detailed information about your usage, please?

1. OS and its version

2. Hardware information, such as CPU SKU, etc.

According to your attached file, "y8_ippsSortDescend_32f_I" was called. This variant is specifically optimized for processors with Intel SSE4.2; y8 is one of the CPU identification codes associated with processor-specific libraries (https://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-understanding-cpu-optimized-code-used-in-intel-ipp). You can test whether the mx version of this function, which is generic code optimized for processors with Intel® Streaming SIMD Extensions (Intel® SSE), works on your machine. To do so, call the ippInitCpu function before any other IPP function to select the processor type manually. Alternatively, if other platforms are available, you can run the program on them to check whether the same symptom appears. Platform here means a different generation of CPU, such as Haswell, Broadwell, Skylake, and so on.

3. Could you integrate the following code to gather IPP version into your program and copy the result to us, please?

/* Query the version of the IPP signal-processing library in use. */
const IppLibraryVersion* lib = ippsGetLibVersion();

printf("major = %d\n",lib->major);
printf("minor = %d\n",lib->minor);
printf("majorBuild = %d\n",lib->majorBuild);
printf("build = %d\n",lib->build);
printf("targetCpu = %c%c%c%c\n",lib->targetCpu[0],lib->targetCpu[1],lib->targetCpu[2],lib->targetCpu[3]);
printf("Name = %s\n", lib->Name);
printf("Version = %s\n", lib->Version);
printf("BuildDate = %s\n", lib->BuildDate);

4. Could you describe in more detail how you call the IPP functions, please? For example, whether the code is linked statically or dynamically, whether it is built into an executable or into a static or dynamic library, whether this function is called inside a thread, and so on.

I have also forwarded the issue to our engineering team. The version is extremely old and it is quite difficult to investigate with only the disassembly, but we will try our best to find a solution.

Ganesh_K_ · ‎11-14-2016

Hello Jing,

Thanks for the details.

Here are the details you requested.

1. OS: OS X El Capitan Version: 10.11.5

2. I am not sure how to get the SKU info. I used the following two commands, and their outputs look like this:

Command-1: sysctl -n machdep.cpu.brand_string

Output: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz

Command-2: system_profiler SPHardwareDataType

Output:

Model Name: MacBook Pro
Model Identifier: MacBookPro11,2
Processor Name: Intel Core i7
Processor Speed: 2.5 GHz
Number of Processors: 1
Total Number of Cores: 4
L2 Cache (per Core): 256 KB
L3 Cache: 6 MB
Memory: 16 GB
Boot ROM Version: MBP112.0138.B17
SMC Version (system): 2.18f15
Serial Number (system): C02NL23TG3QN
Hardware UUID: 3AE68F30-13DF-57B5-BB7C-CFAB7B3632CA

Unfortunately, I am not very well versed in the different generations of Intel CPUs. Are Haswell, Broadwell & Skylake older versions of the i7 processor? I have an i7-based machine. Most of the machines in our QA department are also i7-based. Do I need to run tests on older processors, or on i3- or i5-based machines?

3. The output from the corresponding printf statements is as follows:

major = 6
minor = 1
majorBuild = 137
build = 837
targetCpu = y8
Name = ippsy8.a+
Version = 6.1 build 137.46
BuildDate = Nov 27 2009

4. We do not build executables. We build both static/dynamic libraries & deliver the libraries to our customers. Our library doesn't create any threads and as such it doesn't call the function from inside a thread, but our customers who use our libraries in their applications are free to create & use threads in their application (External threading).

Kindly let me know if you need any more info from my end.

Thanks,

Ganesh

Jing_Xu · ‎11-14-2016

Hi Ganesh,

When the problem you mentioned occurred, how was your library linked, how was the IPP library linked to your library, and was the API function running single-threaded or multi-threaded?

Are there any deterministic behaviors or patterns when the problem occurs?

Ganesh_K_ · ‎11-14-2016

Hello Jing,

Thanks for the response.

IPP library is statically linked to our library. The application which I run to reproduce this issue is multi-threaded.

When the api hangs, there is only one thread which is accessing this api. There are no "Race Conditions" in my application.

Thanks,

Ganesh

Jing_Xu · ‎11-15-2016

Hi Ganesh,

Here are 3 suggestions for this issue.

1. Could you replace ippInit() with ippInitCpu(ippCpuEM64T) in your source code to test whether the PX/MX code works on your machine? As I mentioned in #4, PX/MX is generic code optimized for processors with Intel® Streaming SIMD Extensions (Intel® SSE). PX is for 32-bit and MX is for 64-bit. What you are using (Y8) is optimized for SSE4.2.

2. Will changing from GNU gcc/g++ to icc, or vice versa, work for you?

3. Our engineering team found a very good workaround:

IPPAPI(IppStatus, ippsSortRadixDescend_32f_I, (Ipp32f *pSrcDst, Ipp32f *pTmp, Ipp32s len))

It is significantly faster than the ordinary sort. The only drawback is that this function requires a temporary buffer of the same size as the input vector.
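To illustrate why a radix sort needs that temporary buffer, here is a minimal, self-contained sketch of a byte-wise (LSD) radix sort for 32-bit floats in descending order. It is not the IPP implementation, whose internals are not described in this thread: the names descending_key and radix_sort_descend_f32 are hypothetical, and only the argument order (data, temporary buffer, length) is modelled on the prototype above. The sketch assumes IEEE 754 single-precision floats, and NaN values are simply ordered by their bit pattern.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Map a float to an unsigned key such that sorting keys in ascending
   order yields floats in descending order (assumes IEEE 754 binary32). */
static uint32_t descending_key(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    /* Ascending-order key: flip every bit of negative values,
       flip only the sign bit of non-negative values. */
    uint32_t asc = (u & 0x80000000u) ? ~u : (u | 0x80000000u);
    return ~asc; /* invert to get descending order */
}

/* Hypothetical helper: sorts src_dst in place, descending.
   tmp must hold at least len floats; each pass scatters the data from
   one buffer into the other, which is why the second buffer is needed. */
static void radix_sort_descend_f32(float *src_dst, float *tmp, size_t len)
{
    assert((src_dst != NULL && tmp != NULL) || len == 0);
    float *in = src_dst;
    float *out = tmp;
    for (int shift = 0; shift < 32; shift += 8) {
        size_t count[256] = { 0 };
        /* Histogram of the current key byte. */
        for (size_t i = 0; i < len; i++)
            count[(descending_key(in[i]) >> shift) & 0xFFu]++;
        /* Turn counts into starting offsets (exclusive prefix sum). */
        size_t offset = 0;
        for (int b = 0; b < 256; b++) {
            size_t c = count[b];
            count[b] = offset;
            offset += c;
        }
        /* Stable scatter into the other buffer. */
        for (size_t i = 0; i < len; i++) {
            uint32_t b = (descending_key(in[i]) >> shift) & 0xFFu;
            out[count[b]++] = in[i];
        }
        float *swap = in;
        in = out;
        out = swap;
    }
    /* Four passes mean the sorted data ends up back in src_dst. */
}
```

The running time is a fixed number of linear passes over the data, independent of the input order, at the cost of the extra len-element buffer that the caller has to allocate.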

Please try them and let us know if they do not work for you.