Optimization reports in Intel

Adam_G_2 · ‎09-05-2016

When trying to build our open source project:
https://github.com/deeplearning4j/libnd4j

We are seeing a compilation error similar to:
https://gist.github.com/treo/d0b7610f9072f18449b600a0d585dad4

The full error report is here:

https://github.com/deeplearning4j/libnd4j/issues/280

This is with the new Knights Landing beta.

Thanks!

Judith_W_Intel · ‎09-05-2016

I tried the zip file from the GitHub site and am seeing these compilation errors, which look valid. Is there something wrong with the zip file or with include/op_boilerplate.h?

I see similar errors with g++ (after I change -qopenmp to -fopenmp).

cd /home/nsl/jward4/d/BUGS/FORUM6/libnd4j-master/blasbuild/cpu/blas && /home/nsl/jward4/d/workspaces/cfe/dev/build_objs/efi2linux_debug/bin/icpc   -D__CPUBLAS__=true -Dnd4j_EXPORTS -I/home/nsl/jward4/d/BUGS/FORUM6/libnd4j-master/include -I/usr/local/include -Wall -O3 -Wl,-rpath,RIGIN/ -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=2 -fopt-info-vec -fopt-info-vec-missed -qopenmp -Wall -O3 -std=c++11 -fassociative-math -funsafe-math-optimizations -fPIC   -o CMakeFiles/nd4j.dir/cpu/NativeOps.cpp.o -c /home/nsl/jward4/d/BUGS/FORUM6/libnd4j-master/blas/cpu/NativeOps.cpp
icpc: command line warning #10006: ignoring unknown option '-ftree-vectorizer-verbose=2'
icpc: command line warning #10006: ignoring unknown option '-fopt-info-vec'
icpc: command line warning #10006: ignoring unknown option '-fopt-info-vec-missed'
icpc: command line warning #10006: ignoring unknown option '-fassociative-math'
icpc: command line warning #10006: ignoring unknown option '-funsafe-math-optimizations'
icpc: warning #10193: -vec is default; use -x and -ax to configure vectorization
In file included from /home/nsl/jward4/d/BUGS/FORUM6/libnd4j-master/include/broadcasting.h(17),
                 from /home/nsl/jward4/d/BUGS/FORUM6/libnd4j-master/blas/cpu/../NativeOpExcutioner.h(8),
                 from /home/nsl/jward4/d/BUGS/FORUM6/libnd4j-master/blas/cpu/NativeOps.cpp(6):
/home/nsl/jward4/d/BUGS/FORUM6/libnd4j-master/include/op_boilerplate.h(471): error: the #endif for this directive is missing
#ifdef __clang__
   ^

/home/nsl/jward4/d/BUGS/FORUM6/libnd4j-master/include/indexreduce.h(372): error: argument list for class template "simdOps::IndexMax" is missing
                      RETURNING_DISPATCH_BY_OPNUM(execScalar, PARAMS(x, xShapeInfo, extraParams), INDEX_REDUCE_OPS);
                                                                                                  ^
          detected during instantiation of "T NativeOpExcutioner<T>::execIndexReduceScalar(int, T *, int *, T *) [with T=double]" at line 34 of "/home/nsl/jward4/d/BUGS/FORUM6/libnd4j-master/blas/cpu/NativeOps.cpp"

/home/nsl/jward4/d/BUGS/FORUM6/libnd4j-master/include/indexreduce.h(372): error: identifier "PARAMS" is undefined
                      RETURNING_DISPATCH_BY_OPNUM(execScalar, PARAMS(x, xShapeInfo, extraParams), INDEX_REDUCE_OPS);
                                                              ^
          detected during:
            instantiation of "T functions::indexreduce::IndexReduce<T>::execScalar(int, T *, int *, T *) [with T=double]" at line 37 of "/home/nsl/jward4/d/BUGS/FORUM6/libnd4j-master/blas/cpu/../NativeOpExcutioner.h"
            instantiation of "T NativeOpExcutioner<T>::execIndexReduceScalar(int, T *, int *, T *) [with T=double]" at line 34 of "/home/nsl/jward4/d/BUGS/FORUM6/libnd4j-master/blas/cpu/NativeOps.cpp"'

...

Judy

Adam_G_2 · ‎09-05-2016

Did you make sure to check out the right branch?

Download the zip from here:

https://github.com/deeplearning4j/libnd4j/tree/icc_compilation

Judith_W_Intel · ‎09-05-2016

OK, thanks, I can reproduce it now. I'm seeing an internal error in our optimizer code, i.e.:

...

/home/nsl/jward4/d/BUGS/FORUM6/libnd4j-icc_compilation/include/summarystatsreduce.h(613): warning #1011: missing return statement at end of non-void function "functions::summarystats::SummaryStatsReduce<T>::execScalar(int, bool, T *, int *, T *) [with T=float]"
        }
        ^
          detected during:
            instantiation of "T functions::summarystats::SummaryStatsReduce<T>::execScalar(int, bool, T *, int *, T *) [with T=float]" at line 401 of "/home/nsl/jward4/d/BUGS/FORUM6/libnd4j-icc_compilation/blas/cpu/../NativeOpExcutioner.h"
            instantiation of "T NativeOpExcutioner<T>::execSummaryStatsScalar(int, T *, int *, T *, bool) [with T=float]" at line 1346 of "/home/nsl/jward4/d/BUGS/FORUM6/libnd4j-icc_compilation/blas/cpu/NativeOps.cpp"

": internal error: #20000_3471: epair && enode (shared/hpo/hpo_vector_avr.c, line 3471)

I'll try to reduce it to a small example and see if I can come up with a workaround and also file a bug report. Stay tuned.

Judy

Judith_W_Intel · ‎09-05-2016

This is a small reproducer. The compiler crashes if you compile it with icpc -c -O3 -fopenmp bug.cpp:

struct IndexValue {
    int value;
    unsigned int index;
};

IndexValue update(IndexValue o, IndexValue old)
{
    if (o.value > old.value)
        return o;
    return old;
}

int foo() {
    IndexValue curr;
#pragma omp simd
    for (int i = 0; i < 3; i++) {
        curr = update(curr, curr);
    }
    return curr.index;
}

I have submitted an internal bug defect (DPD200414099) for this problem.

Possible workarounds are:

(1) Disable high level optimization when compiling this file (i.e. compile with -O1 or lower)

(2) The bug seems to be triggered by the #pragma omp simd directives on line 446 and line 646 of indexreduce.h, in particular by the last assignment statement, which uses the variable startingIndex (and indexValue) on both the left-hand and right-hand side:

#pragma omp simd
for (Nd4jIndex i = 0; i < length; i++) {
    IndexValue<T> curr;
    curr.value = x;
    curr.index = i;
    startingIndex = OpType::update(startingIndex, curr,

So another workaround is to disable the two omp simds in this header file.

Thanks for reporting this and sorry for the inconvenience.

Judy

Serge_P_ · ‎09-11-2016

While the compiler should never crash, the code causing the crash is also incorrect. It contains a cross-iteration dependency (a reduction-like update of "startingIndex") that is not declared as a reduction. So removing #pragma omp simd is not just a workaround; it makes the code correct.
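If the loop should stay vectorized, the other way to resolve the dependency is to declare it as a reduction. The following is only a sketch of that idea using an OpenMP 4.0 user-defined reduction. The IndexValue struct, max_by_value and index_of_max are hypothetical stand-ins written for illustration and are not libnd4j's actual code; whether a given compiler version vectorizes a struct reduction like this well is not something this thread establishes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the project's IndexValue<T>; not the real type.
struct IndexValue {
    double value;
    std::size_t index;
};

// Combiner: keep the larger value and prefer the lower index on ties,
// so the result does not depend on the order in which partial results merge.
inline IndexValue max_by_value(IndexValue a, IndexValue b) {
    if (b.value > a.value) return b;
    if (b.value == a.value && b.index < a.index) return b;
    return a;
}

// Tell OpenMP how to merge two partial results and how to seed private copies.
// Seeding with the original value is safe because taking a maximum is idempotent.
#pragma omp declare reduction(index_max : IndexValue : \
        omp_out = max_by_value(omp_out, omp_in)) \
    initializer(omp_priv = omp_orig)

// Returns the index of the largest element (lowest index on ties).
std::size_t index_of_max(const std::vector<double>& x) {
    assert(!x.empty());
    const double* data = x.data();
    const std::size_t n = x.size();
    IndexValue best{data[0], 0};
    // The cross-iteration update of 'best' is now declared as a reduction.
#pragma omp simd reduction(index_max : best)
    for (std::size_t i = 0; i < n; ++i) {
        IndexValue curr{data[i], i};
        best = max_by_value(best, curr);
    }
    return best.index;
}
```

Without OpenMP enabled the pragmas are ignored and the loop runs serially with the same result, which makes it easy to compare the two builds.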

SergeyKostrov · ‎09-13-2016

>>icpc: command line warning #10006: ignoring unknown option '-ftree-vectorizer-verbose=2'
>>icpc: command line warning #10006: ignoring unknown option '-fopt-info-vec'
>>icpc: command line warning #10006: ignoring unknown option '-fopt-info-vec-missed'
>>icpc: command line warning #10006: ignoring unknown option '-fassociative-math'
>>icpc: command line warning #10006: ignoring unknown option '-funsafe-math-optimizations'

The Intel C++ compiler does not support these GCC command-line options.

Serge_P_ · ‎09-14-2016

Optimization reports in the Intel Compiler are controlled by the -qopt-report set of options. -qopt-report=<level> sets the level of detail, where the highest level is 5. -qopt-report-phase=<phases_list> selects the phases; use 'vec' for the vectorization report. Reports are written to files named <obj_name>.optrpt; use -qopt-report-file=stderr to redirect the output to the terminal window.

The last two options in that list (-fassociative-math and -funsafe-math-optimizations) are controlled via the -fp-model switch in the Intel Compiler. The first is enabled by default, while the second is too compiler-specific: the Intel Compiler and gcc implement different sets of optimizations. So it is hard to tell how -fp-model=fast and -fp-model=fast2 map to -funsafe-math-optimizations.
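As a rough illustration of the report options described above, here is a small loop with example icpc command lines in the leading comment. The file name saxpy.cpp and the function itself are made up for this sketch; only the -qopt-report flags come from the description above.

```cpp
// saxpy.cpp (hypothetical file name)
//
// Vectorization report written to a file next to the object (saxpy.optrpt):
//   icpc -c -O3 -qopt-report=5 -qopt-report-phase=vec saxpy.cpp
//
// Same report, redirected to the terminal:
//   icpc -c -O3 -qopt-report=5 -qopt-report-phase=vec -qopt-report-file=stderr saxpy.cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y[i] = a * x[i] + y[i]; a simple loop the vectorizer can report on.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const float* xs = x.data();
    float* ys = y.data();
    const std::size_t n = x.size();
    for (std::size_t i = 0; i < n; ++i) {
        ys[i] = a * xs[i] + ys[i];
    }
}
```

These would take the place of the gcc-only -ftree-vectorizer-verbose and -fopt-info-vec options that icpc ignored in the build log.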

TimP · ‎09-14-2016

Associative-math optimizations are included in -fp-model fast[=1], as are many of the icc unsafe-math optimizations.

fast=2 adds complex-limited-range and the domain shortcuts for divide and sqrt, which I don't think have counterparts in gcc. Those options are available separately.

As Judy mentioned, using OpenMP directives without declaring reductions (or firstprivate where needed) is likely to produce wrong results without warning, although it shouldn't crash the compiler.
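For readers wondering why associative-math is grouped with the value-unsafe optimizations in both compilers, the sketch below shows that floating-point addition is not associative, so a compiler that regroups a sum can change the result. The function names are made up for this example, and the volatile temporaries are there only to force each intermediate to be rounded to double.

```cpp
#include <cassert>

// Computes (a + b) + c, rounding the intermediate to double.
double sum_left(double a, double b, double c) {
    volatile double t = a + b;
    return t + c;
}

// Computes a + (b + c), rounding the intermediate to double.
double sum_right(double a, double b, double c) {
    volatile double t = b + c;
    return a + t;
}
```

With a = 1e17, b = -1e17 and c = 1.0, sum_left yields 1.0, while sum_right yields 0.0 because -1e17 + 1.0 rounds back to -1e17 in double precision.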
