I am converting from IPP 4.1 to 5.3.4 and finally have it compiling and linking. When I run the application, the first (?) IPP function that gets called, probably ippmTranspose_ma_64f_P, hangs. Looking at the call stack in WinDbg, I see the following:
2df7e69c 7c90d85c ntdll!KiFastSystemCallRet
2df7e6a0 7c8023ed ntdll!NtDelayExecution+0xc
2df7e6f8 7c802451 kernel32!SleepEx+0x61
2df7e708 0278a4c3 kernel32!Sleep+0xf
WARNING: Stack unwind information not available. Following frames may be wrong.
2df7e714 02775cea libguide40!_kmp_reap_monitor+0x227
2df7e748 02775975 libguide40!_kmp_wait_sleep+0x162
2df7e79c 02774dc7 libguide40!_kmp_fork_call+0x2085
2df7e7b4 02774e83 libguide40!_kmp_fork_call+0x14d7
2df7e7dc 0277beec libguide40!_kmp_fork_call+0x1593
2df7e814 1de808bf libguide40!_kmpc_fork_call+0x34
2df7e8b8 1fc3f9c9 ippmv8_5_3!ippmTranspose_ma_64f_P+0xc580b
2df7e960 1fc3d0f8 GcCoordXformInterpFilter!BiCubicInterpolator::Interpolate+0x149
I set OMP_NUM_THREADS=1, but it still appears to be trying to do multi-threading. Did I miss an initialization step somewhere? I couldn't find one in the docs, but I have been going back and forth between 4.1 and 5.3 so much that it wouldn't surprise me if it's in there.
It's a Core 2 Duo processor running WinXP Pro.
OK, I added an ippSetNumThreads(1) call in my mainline, and that seems to allow the application to come up. Is there an environment variable that the IPP library honors for this? OMP_NUM_THREADS doesn't seem to work in this case. Also, will this affect using regsvr32 to register a DLL that isn't the main one?
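For anyone hitting the same hang, here is a minimal sketch of the workaround described above, extended so that the thread count comes from an environment variable. It assumes the application reads OMP_NUM_THREADS itself and hands the value to ippSetNumThreads; that is an application-level choice, not something IPP is known to do on its own. The helper names thread_count_from_env and limit_ipp_threads and the USE_IPP build macro are made up for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef USE_IPP            /* hypothetical build flag: define it when linking against IPP */
#include <ipp.h>          /* declares ippSetNumThreads and IppStatus */
#endif

/* Hypothetical helper: parse a positive thread count from an environment
   variable, returning `fallback` when the variable is unset or invalid. */
static int thread_count_from_env(const char *name, int fallback)
{
    const char *value = getenv(name);
    char *end = NULL;
    long n;

    if (value == NULL || *value == '\0')
        return fallback;

    n = strtol(value, &end, 10);
    if (*end != '\0' || n < 1 || n > 1024)   /* reject trailing junk and silly values */
        return fallback;

    return (int)n;
}

/* Hypothetical helper: call once near the start of the application,
   before any other IPP function, to cap IPP's internal threading. */
static int limit_ipp_threads(void)
{
    /* Assumption: the application chooses to honor OMP_NUM_THREADS and
       defaults to a single thread when it is not set. */
    int n = thread_count_from_env("OMP_NUM_THREADS", 1);

#ifdef USE_IPP
    IppStatus st = ippSetNumThreads(n);
    if (st != ippStsNoErr)
        fprintf(stderr, "ippSetNumThreads(%d) returned status %d\n", n, (int)st);
#endif

    return n;
}
```

Calling limit_ipp_threads() as the first statement of the mainline mirrors the ippSetNumThreads(1) call that got the application running; without USE_IPP defined, the sketch only parses the variable, so it can be tried out without the library installed.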
I'm sorry for not explaining myself more completely. Sometimes I forget everyone is not familiar with how I/we work here. Anyway...
No, we don't have a small sample that shows the issue. Our application is rather large. We have DLLs that include libs that make IPP calls. We do not use OpenMP for anything. Our application creates 40 or so threads of its own. We are running on Core 2 Duo processors (2 cores) with 4 GB of RAM. The system is a mix: most of it is compiled with MSVC, and some libs/DLLs are compiled with the Intel compiler. Those are historically calculation-intensive libs that benefited from the additional optimizations we got from ICL, a 15% or greater speedup in most cases. Good job!!
We have reached a hold point in the project and are updating from ICL 9.1 and IPP 4.1 to ICL 10.1 and IPP 5.3. ICL 10.1 is causing us problems right now because it seems to have trouble with regsvr32 calls on DLLs that have IPP libs linked in. I am working with Jennifer to resolve that. If I compile with 9.1, the regsvr32 calls work, but running the application fails with a hang. The offending lib is the one indicated in my first post above. The ippmTranspose call appears to hang in libguide calls that try to initialize OpenMP. I found that I can avoid this hang by adding a call to ippSetNumThreads near the start of our application. My questions are the following.
1) Why does IPP assume it can create as many threads as there are processors? Shouldn't it default to the previous behavior and only create more if allowed/requested?
2) It appears the Intel compiler uses an environment variable to control how many OpenMP threads are allowed. Why doesn't IPP look at the same one, or at one of its own?
3) Does going to IPP 6.0 (just released), with its different OpenMP library, change any of this?