FFT Failure Within SideFX Houdini

Harry_B_ · ‎02-12-2014

Hi guys,

This is a long shot but I'm hoping you can help, as I'm totally out of ideas and don't know where to go with this. I'm trying to run a plugin inside SideFX's Houdini that uses MKL's FFT. I had absolutely no problems with this with Houdini 12.5, but in Houdini 13 the calls are failing.
My test case is to run this code (have taken out status-checking for brevity):

    const int fft_size = 16384;

    std::complex<float>* input = new std::complex<float>[fft_size/2+1];
    for (int i(0); i < fft_size/2+1; ++i)
    {
        input[i] = std::complex<float>(i, i);
    }

    float* output = new float[fft_size];

    MKL_LONG len = fft_size;
    DFTI_DESCRIPTOR_HANDLE fftHandle;
    DftiCreateDescriptor (&fftHandle, DFTI_SINGLE, DFTI_REAL, 1, static_cast<MKL_LONG>(len));
    DftiSetValue(fftHandle, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
    DftiCommitDescriptor (fftHandle);
    DftiComputeBackward (fftHandle, input, output);

When I run this inside Houdini 12.5 I have no problems, but when I run it inside 13.0 the resulting "output" array is all NaNs or just junk data. I can catch floating point exceptions, and it usually fails inside an inverse radix norm inside MKL. I'm using MKL 11.0, but I've found the same problem occurs with 10.3. The interesting thing is that this only fails for power-of-two sizes; e.g. 16384 fails but 16383 is fine. Again, no problems inside Houdini 12.5.
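One way to make the "all NaNs" symptom easy to detect after the backward transform is to scan the output buffer for non-finite values. The following is only a minimal sketch; `count_non_finite` is a hypothetical helper name introduced here for illustration and is not part of MKL or the Houdini toolkit.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not an MKL or Houdini API): counts the entries of
// a float buffer that are NaN or +/-infinity.
std::size_t count_non_finite(const float* data, std::size_t n)
{
    assert(data != nullptr || n == 0);

    std::size_t bad = 0;
    for (std::size_t i = 0; i < n; ++i)
    {
        // std::isfinite is false for NaN and for both infinities.
        if (!std::isfinite(data[i]))
            ++bad;
    }
    return bad;
}
```

Called as `count_non_finite(output, fft_size)` on the buffer from the test case, a non-zero result would flag the failure. Note that -ffast-math lets GCC assume NaNs and infinities never occur, so a check like this may be optimised away when that flag is used.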

The compilation is with g++ 4.4.6 and these flags:

# flags from SideFX, generated with "hcustom -c" and "hcustom -m"
HOUDINI_BUILD_FLAGS := -DVERSION=\"$(HOUDINI_VERSION)\" -D_GNU_SOURCE -DLINUX -DAMD64 -m64 -fPIC -DSIZEOF_VOID_P=8 -DFBX_ENABLED=1 -DOPENCL_ENABLED=1 -DOPENVDB_ENABLED=1 -DSESI_LITTLE_ENDIAN -DENABLE_THREADS -DUSE_PTHREADS -D_REENTRANT -D_FILE_OFFSET_BITS=64 -c -DGCC4 -DGCC3 -Wno-deprecated -I$(HOUDINIROOT)/toolkit/include -Wall -W -Wno-parentheses -Wno-sign-compare -Wno-reorder -Wno-uninitialized -Wunused -Wno-unused-parameter -O2 -fno-strict-aliasing

# flags from Intel, taken from Intel's Link Line Advisor (http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor)
INTEL_LINK_FLAGS := -L$(INTELROOT)/mkl/lib/intel64 -lmkl_rt -lpthread -lm
INTEL_BUILD_FLAGS := -m64 -I$(INTELROOT)/mkl/include

All on the following linux version (albeit heavily customised), although I've tried it on the latest Ubuntu too:

Linux version 2.6.32-279.14.1.el6.x86_64 (mockbuild@c6b8.bsys.dev.centos.org) (gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Tue Nov 6 23:43:09 UTC 2012

I've tried adding -ffast-math but no joy. Can anybody think of anything I might try to figure out this problem? I've been banging my head for two days straight and am totally stuck!

I've attached the exact code I am using packaged into a standalone example.

Thanks in advance,

Harry

Zhang_Z_Intel · ‎02-12-2014

You may want to check with SideFX Houdini technical support first. If this problem is with the SideFX Houdini environment, they may already have a solution.

Can you reproduce the problem outside of SideFX Houdini? For example, if you can provide standalone code that exhibits the same errors, it will be much easier for us to take a look and get back to you quickly.

Dmitry_B_Intel · ‎02-13-2014

Calling DftiSetValue(fftHandle, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX) before committing the descriptor should help.

Harry_B_ · ‎02-13-2014

@Dmitry: Thanks for the line. I've added it in, although it unfortunately doesn't solve the problem. Besides this, is my use of the library otherwise correct?

@Zhang Z: I got in touch with SideFX yesterday and they sent me this reply:

I've been able to reproduce this problem but don't know why the results differ between versions. My simplified version of your program is attached. Running this through valgrind on linux shows some uninitialized memory use errors within the intel library, which may be related to the problem. It does seem that some uninitialized data is used, since the resulting values change on every run.

I've tried to recreate the problem standalone, but it always completes fine. It only fails when running inside the Houdini 13 environment. This surely means it's something in that runtime environment that's screwing things up for MKL, but I've no idea where to look, or whether the problem is in MKL, on my end, or on SideFX's end.

I've got this valgrind report (attached) from my standalone run, which I presume shows the memory leaks they are talking about, along with the simpler example they are referring to. Is this all expected stuff or does it point to a problem on your end? If you guys give me the all clear on this valgrind report and can confirm that I'm using, compiling against, and linking the library correctly, I can bounce this back to SideFX again. Any advice about how to proceed, or any other tests that can be run, would be fantastic also.

Thanks for all your help!

Harry

Evgueni_P_Intel · ‎02-13-2014

Dear Harry,

I see "Pass" in the valgrind log. Does it mean that the problem has gone?

The attached code allocates 2 floats fewer than are needed for the input.

Backward FFT expects that the first and the last elements of input are real numbers.
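To illustrate the layout being described: in complex-complex conjugate-even storage, a real transform of even length N takes N/2+1 complex inputs (N+2 floats), and only the real parts of the first and last bins carry information. The sketch below is a naive O(N^2) reference written for illustration only, not MKL's algorithm; `naive_backward_real` is a hypothetical name, and the result is left unscaled on the assumption that the backward scale is at its default of 1.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical reference routine (not part of MKL): naive backward real DFT
// from a conjugate-even half spectrum of n/2 + 1 complex bins.
// The output is unscaled, i.e. no 1/n factor is applied.
std::vector<float> naive_backward_real(
    const std::vector<std::complex<float> >& half, std::size_t n)
{
    assert(n % 2 == 0 && half.size() == n / 2 + 1);

    const double two_pi = 6.283185307179586;
    std::vector<float> out(n);
    for (std::size_t t = 0; t < n; ++t)
    {
        // Bins 0 and n/2 contribute only their real parts;
        // any imaginary part stored there is ignored.
        double sum = half[0].real()
                   + ((t % 2 == 0) ? 1.0 : -1.0) * half[n / 2].real();

        // Bins k and n-k are conjugates, so together they contribute
        // 2 * (Re X[k] * cos(angle) - Im X[k] * sin(angle)).
        for (std::size_t k = 1; k < n / 2; ++k)
        {
            const double angle = two_pi * double(k) * double(t) / double(n);
            sum += 2.0 * (half[k].real() * std::cos(angle)
                        - half[k].imag() * std::sin(angle));
        }
        out[t] = static_cast<float>(sum);
    }
    return out;
}
```

For small sizes, comparing the library output against a reference like this is one way to tell a usage or layout mistake apart from an environment problem.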

Valgrind's messages about __intel_sse2_strlen and __intel_sse2_strdup are known to be false positive.

How many threads are available to your application? Do you see NaNs with OMP_NUM_THREADS set to 1?

Thank you.

Evgueni.

Harry_B_ · ‎02-13-2014

Hi Evgueni,

Thanks for the advice. I've changed this line:

    float *input = new float[fft_size * 2];

to this:

    float *input = new float[fft_size * 2 + 2];

That valgrind output was from the standalone run; so that one went through, but I do still have the problem with Houdini. Thanks for the information about __intel_sse2_strlen and __intel_sse2_strdup.

The machine I am running on has 32 logical cores. If I set OMP_NUM_THREADS to 1 my test completes just fine inside Houdini! Calling mkl_domain_set_num_threads(1, MKL_FFT) before the MKL calls also works, although omp_set_num_threads(1) doesn't. This suggests that SideFX are doing something funky that is causing these threads to conflict?

Evgueni_P_Intel · ‎02-14-2014

We cannot blame SideFX yet :)

MKL chooses the FFT algorithm depending on the value of OMP_NUM_THREADS.

You mentioned that your Linux is "heavily customised" -- what are the customisations? E.g. MKL needs some 128K of stack in each thread...
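As a rough first check of the stack question on a customised Linux system, the process's soft stack limit can be queried with the POSIX getrlimit call. This is only a sketch under stated assumptions: `stack_limit_at_least` is a hypothetical helper name, and the limit it reads is just part of the picture, since the stack size of OpenMP worker threads is normally governed by the OpenMP runtime (for example through OMP_STACKSIZE) rather than by this limit alone.

```cpp
#include <sys/resource.h>

// Hypothetical helper (not an MKL API): returns true if the soft
// RLIMIT_STACK of the current process is unlimited or at least min_bytes.
// Returns false if the limit is smaller or if the query itself fails.
bool stack_limit_at_least(unsigned long long min_bytes)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_STACK, &lim) != 0)
        return false;

    if (lim.rlim_cur == RLIM_INFINITY)
        return true;

    return static_cast<unsigned long long>(lim.rlim_cur) >= min_bytes;
}
```

With the 128K figure mentioned above, the call would be `stack_limit_at_least(128ULL * 1024)`, made from inside the host process so that it reflects the environment the plugin actually runs in.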

Please run the application with KMP_AFFINITY=verbose, KMP_SETTINGS=1, the default OMP_NUM_THREADS, and share the output.

Overall, I agree with Zhang: we need a standalone reproducer or access to a machine where we can reproduce the issue.

Harry_B_ · ‎02-15-2014

Hi Evgueni & Zhang,

I've attached the output from OMP_NUM_THREADS set to 12 (the number of cores available on the particular machine I was using) and the other environment variables as you requested. The "Fail" output is from my own test code.

I'm setting up a clean test environment for you guys to get access to: fresh install of Linux, gcc, MKL etc. I've also heard more from SideFX. They've tracked it down to their use of Pixar's OpenSubdiv library, and they speculate it's something in the initialisation that library does. Good news, as OpenSubdiv is an open source library! I'm trying to see if I can reproduce the problem with just OpenSubdiv and MKL.