Threading crashes with VML vsExp

jamiecook · ‎10-25-2010

I have symptoms very similar to this post : http://software.intel.com/en-us/forums/showthread.php?t=76294

I'm building a DLL which plugs in to another application and I'm trying to use MKL VML goodness to speed up some exponential calculations.

The DLL is built with the multi-threaded DLL flag (/MD) and I'm linking against mkl_sequential_dll.lib mkl_intel_c.lib mkl_core.lib.

However, whenever I call vsExp in code which is multithreaded using TBB, I get a big old crash. I've tried all the combinations I can think of {mkl_sequential.lib, mkl_intel_thread.lib, mkl_intel_thread_dll.lib} and it always crashes. Sometimes it kills the entire program outright and sometimes it is the same OpenMP problem that was reported in the above thread:

"OMP: Error #134: Cannot set thread affinity mask. OMP: System error #87: The parameter is incorrect."

I'm using the MKL that comes with Intel compiler 11.1.065 (the actual compiler is version 11.1.071, but there is no separate MKL with that version), and I'm using TBB 3.0 with a simple parallel_for. The memory for both the input and the output is thread local.

What am I doing wrong?

VipinKumar_E_Intel · ‎10-25-2010

Can you try setting MKL_SERIAL=YES env variable and check?

--Vipin

jamiecook · ‎10-25-2010

I have set this env variable and then restarted the host application, and I have exactly the same problem. I am not using nmake; I am using Visual Studio 2008 and the Intel compiler.

Also, I have added a call to vsExp in the non-threaded section of the process and it works fine.

VipinKumar_E_Intel · ‎10-25-2010

Have you also tried using mkl_set_num_threads(1) in the code? I hope you are also not using the -openmp compiler flag.

jamiecook · ‎10-25-2010

I will try it first thing in the morning but what does mkl_set_num_threads actually do?

Does it set the mkl routines to use a single thread?

I'm not using the openmp flag... as I said, I'm using TBB as my threading platform.

TimP · ‎10-25-2010

To make sure of TBB compatibility, as I understand it, you should be using the mkl_sequential library, and calling your MKL functions from TBB threads, if you want parallel execution. I suppose it's possible that even the MKL_SERIAL environment option is broken by the incompatibility between TBB and the OpenMP library used by MKL. In that case, I wouldn't be surprised if mkl_set_num_threads is broken (unless disabled by using mkl_sequential or openmp stubs libraries). There are rumors of MKL being switched eventually to the TBB compatible threading model, but that's many months away.
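
The pattern described here (a sequential kernel called by each worker on its own slice of the data) can be sketched without MKL or TBB. This is only an illustration under stated assumptions: `expKernel` is a hypothetical stand-in for a sequential call such as vsExp, and `expInChunks` is a hypothetical helper that does the range splitting a tbb::parallel_for over a blocked range would do. Here the chunks simply run one after another.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a sequential vector kernel such as vsExp:
// writes exp(in[i]) to out[i] for i in [0, n).
inline void expKernel(std::size_t n, const float* in, float* out)
{
    assert(n == 0 || (in != 0 && out != 0));
    for (std::size_t i = 0; i < n; ++i) out[i] = std::exp(in[i]);
}

// Hypothetical helper: split [0, in.size()) into chunks of at most `grain`
// elements and call the kernel once per chunk. With TBB, each chunk would be
// handled by one invocation of the parallel_for body; here they run in order.
// Returns the number of kernel calls made.
inline std::size_t expInChunks(const std::vector<float>& in,
                               std::vector<float>& out,
                               std::size_t grain)
{
    out.resize(in.size());
    if (in.empty() || grain == 0) return 0;  // nothing to do, or no valid chunk size

    std::size_t calls = 0;
    for (std::size_t begin = 0; begin < in.size(); begin += grain) {
        const std::size_t len = std::min(grain, in.size() - begin);
        // Each call reads and writes only its own slice, so slices never overlap.
        expKernel(len, &in[begin], &out[begin]);
        ++calls;
    }
    return calls;
}
```

With 10 elements and a grain of 4 this makes three kernel calls, on slices of 4, 4 and 2 elements.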

jamiecook · ‎10-26-2010

Tim, thanks for your reply - I am trying both the runtime method [mkl_set_num_threads(1)] and the linking method to ensure that mkl isn't trying to utilise openmp. But I am coming up with the same crash.

To be clear I have the following libraries linked: mkl_sequential.lib mkl_intel_c.lib mkl_core.lib

I have created a test project (a standalone executable) and I've shown that I can run the VML calls from within my TBB code and achieve a factor-of-two speed-up. Results are below:

                         Call Count     Average Time     Total Time
Regular exponential               1              408            408
MKL     exponential               1               18             18
TBB     exponential               1              138            138
TBB+MKL exponential               1                8              8

This test project is compiled using exactly the same linking options as the plugin DLL but it works and the DLL crashes and I can't figure out why. The underlying code is the same for both scenarios

namespace OTUtils {
    template <typename T> inline void vectorExp(vector<T>& in, vector<T>& out);
}

// Single precision: forwards to VML vsExp
template<>
void OTUtils::vectorExp<float>(vector<float>& in, vector<float>& out)
{
    vsExp(in.size(),&(in[0]),&(out[0]));
}

// Double precision: forwards to VML vdExp
template<>
void OTUtils::vectorExp<double>(vector<double>& in, vector<double>& out)
{
    vdExp(in.size(),&(in[0]),&(out[0]));
}

template <typename T>
T OTUtils::calcUsage(vector<T>& costs, vector<T>& usage, T phi)
{
    T   totalUtility = 0.0;
    T   minCost      = vectorMin(costs);
    int numOptions   = costs.size();
    vector<T> utility(numOptions, 0.0);
    vector<T> normCost(numOptions);
    usage.resize(numOptions, 0.0);

    for (int i = 0; i < numOptions; ++i)
    {
        normCost.at(i) = (costs.at(i) == numeric_limits<T>::infinity())
            ? -numeric_limits<T>::infinity()
            : phi * (costs.at(i) - minCost);
    }
    // This calls to VML and crashes everything
    vectorExp(normCost, utility);
    // This call does the same function as the above line, but doesn't crash
    // for (int i=0; i < numOptions; i++) utility.at(i)=exp(normCost.at(i));
}
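
To make the normalisation loop concrete, here is a minimal self-contained sketch of the same computation for float, with std::exp standing in for the VML call. The function name `normalisedExp` is hypothetical and the sample values in the note below are illustrative only. An infinite cost maps to -infinity, whose exponential is 0, and the cheapest option maps to exp(0) = 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper mirroring the loop in calcUsage:
// utility[i] = exp(phi * (costs[i] - minCost)), with infinite costs giving 0.
inline std::vector<float> normalisedExp(const std::vector<float>& costs, float phi)
{
    assert(!costs.empty());  // min_element on an empty range cannot be dereferenced
    const float inf     = std::numeric_limits<float>::infinity();
    const float minCost = *std::min_element(costs.begin(), costs.end());

    std::vector<float> utility(costs.size());
    for (std::size_t i = 0; i < costs.size(); ++i) {
        const float normCost = (costs[i] == inf) ? -inf
                                                 : phi * (costs[i] - minCost);
        utility[i] = std::exp(normCost);  // the code in the post calls vsExp instead
    }
    return utility;
}
```

For costs {2, 5, infinity} and phi = -1 this yields utilities of 1, exp(-3) and 0.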

jamiecook · ‎10-26-2010

Okay, so I've narrowed it down to my use of a vector to store the memory that is passed to vsExp. As you can see in the following fragment, if I use new or _aligned_malloc to get some memory it works, but if I pass in memory that came out of a vector (with the standard allocator) then it crashes.

[cpp]    // If I allocate my memory myself, it works!
    // T* utility = (T*)_aligned_malloc(sizeof(T)*numOptions, 16);
    // T* utility = new T[numOptions];

    // If I let vector allocate my memory... it crashes
    vector<T> utility(numOptions);

    // populate the utility vector with values
    for (int i = 0; i < numOptions; ++i)
    {
        utility[i] = (costs[i] == numeric_limits<T>::infinity())
            ? -numeric_limits<T>::infinity()
            : phi * (costs[i] - minCost);
    }

    // transform the utility in place (to its exponential)
    // NOTE: this is the same form for both vector and T[]
    vsExp(numOptions, &(utility[0]), &(utility[0]));[/cpp]

Once again, any help here is appreciated.
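
One difference between the two allocation paths that may be worth ruling out: `new T[0]` returns a usable pointer, whereas `&(utility[0])` on an empty vector is undefined behaviour, and Visual Studio 2008 enables checked iterators (`_SECURE_SCL`) by default, so an out-of-range `operator[]` can terminate the program. Whether numOptions is ever zero in the plugin is an assumption; the thread does not say. The sketch below shows a guard under that assumption. `expStandIn` is a hypothetical stand-in for vsExp and `expInPlace` is a hypothetical wrapper.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for vsExp(n, a, y); the real VML call would go here.
inline void expStandIn(std::size_t n, const float* a, float* y)
{
    assert(n == 0 || (a != 0 && y != 0));
    for (std::size_t i = 0; i < n; ++i) y[i] = std::exp(a[i]);
}

// Hypothetical wrapper: in-place exponential that never indexes an empty vector.
// Returns the number of elements transformed.
inline std::size_t expInPlace(std::vector<float>& v)
{
    if (v.empty()) return 0;       // &v[0] would be undefined behaviour here
    float* p = &v[0];              // valid because size() >= 1
    expStandIn(v.size(), p, p);    // input and output alias, as in the post
    return v.size();
}
```

If the crash disappears with a guard like this, the vector itself was never the problem; if it persists with non-empty vectors, this explanation can be discarded.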

TimP · ‎10-26-2010

Do you have the same problem when you link dynamic MKL libraries? It seems possible to create link order problems when linking static libraries into a .dll.

Ilya_B_Intel · ‎10-26-2010

One more question: if you replace
vsExp(numOptions,&(utility[0]),&(utility[0]));
with
for(i=0;i<numOptions;i++){
*(&(utility[0])+i) =expf(*(&(utility[0])+i) );
}
will it work?
Also, where are your STL headers from?

jamiecook · ‎10-26-2010

I'm not sure how this would be an issue given the steps for reproducing that I've outlined. But anyway, I'm not entirely sure how to dynamically link the MKL... is that by linking against mkl_sequential_dll.lib? I didn't have much luck when I tried that either.

jamiecook · ‎10-26-2010

Yes, it does work when I replace it with

for (int i=0; i < numOptions; ++i) utility[i] = exp(utility[i]);

I'm pretty sure that what you have written is the same but with the memory arithmetic written out longhand.

My STL libraries are the standard Visual Studio 2008 ones.

TimP · ‎10-26-2010

That would be the _dll versions of all 3 MKL libraries.

jamiecook · ‎10-28-2010

Tim,

I've tried it with the DLL versions as well and I get exactly the same thing.

The strange thing is that, as far as I can tell, using a vector to do the allocation (of the array to be processed) should be (almost) exactly the same as using new/delete. But it's very reproducible: use new/delete and call vsExp and it works perfectly; use a vector and call vsExp(vec.size(), &(vec[0]), &(vec[0])) and it crashes hard!

Another interesting thing is that even when it does work... it's not actually giving me a performance improvement over the version which just uses a for loop

[cpp]for (int i=0; i<vec_size; ++i) vec[i] = exp(vec[i]); // 2min 34sec through critical section
vsExp(vec_size, &(vec[0]), &(vec[0]));                // 2min 38sec through critical section[/cpp]

Given that I've seen quite a considerable performance improvement in my other tests, this is a bit perplexing. Maybe even the version which doesn't crash isn't working 100% correctly.
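
The timings above cover the whole critical section, so they mix the cost of the exponential with everything else in that region. A small harness that times each variant in isolation can show whether the call itself is any faster at the vector sizes the plugin uses. This is a sketch only: `elapsedMs` and `scalarExp` are hypothetical names, and std::chrono requires C++11, which is newer than Visual Studio 2008.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: run `work` `repeats` times and return elapsed milliseconds.
template <typename Work>
double elapsedMs(Work work, int repeats)
{
    assert(repeats > 0);
    const std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) work();
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Scalar reference: the plain loop variant, writing to a separate output so
// that repeated runs always see the same input values.
inline void scalarExp(const std::vector<float>& in, std::vector<float>& out)
{
    out.resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = std::exp(in[i]);
}
```

A call such as `elapsedMs([&]{ scalarExp(in, out); }, 1000)` gives the baseline; passing a lambda that calls vsExp on the same buffers gives the figure to compare it with.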

Regards, Jamie

Gennady_F_Intel · ‎11-01-2010

Hello Jamie,

Both of the problems you reported (the run-time crash and the missing performance improvement) are unknown to us for the current version of VML. Could you give us the exact test case so that we can check these problems on our side?

What CPU type are you working on?

--Gennady

fbaralli · ‎11-11-2010

Jamie,

have you done some other benchmarks (i.e. sequential FFT-MKL + TBB)?

I'm currently evaluating the possibility of using TBB+MKL_sequential or OpenMP+MKL_parallel for my application and I'm mainly concerned about issues similar to what you are reporting.

Thanks

Francesco

Ilya_B_Intel · ‎12-16-2010

I am attaching example code main.cpp based on your inputs. It uses TBB and MKL. I compile it with the line:

icl main.cpp mkl_intel_c.lib mkl_sequential.lib mkl_core.lib

This code works on my side and I see no crash of any kind. Does it work on your side? What should be changed here for it to crash as in your case?