Solved: Why won't this loop vectorize?

kfsone · 03-29-2010

This loop doesn't vectorize, even if I add #pragma ivdep.

_callbackArray is a simple C/C++ array, nothing fancy.

[cpp]void teulFreeAllCallbacks()
{
    for ( UINT32 x = 0 ; x < 256 ; ++x )  // Line 434
    {
        for ( UINT32 y = 0 ; y < 256 ; ++y ) // Line 436
        {
            REAL_TEUL_CALLBACK* realObject = _callbackArray[x][y] ;
            if ( realObject != NULL)
            {
                _callbackArray[x][y] = NULL ;
                memFree(realObject);
            }
        }
    }
}
[/cpp]

osmith@ubuntu:~/pn/build$ icc --version

icc (ICC) 11.1 20100203

/opt/intel/Compiler/11.1/069/bin/ia32/icpc -DHAVE_CONFIG_H=1 -DplatOS=OS_LINUX -DAPP_ID=12 -DAS_CLIENT -DKIT_SERVER \
  -m32 -Wall -Wshadow -Wno-deprecated -pthread -Werror -w1 -xSSE -restrict -std=c++0x -vec-guard-write -openmp \
  -opt-class-analysis -fno-math-errno -fp-model strict -Wformat-security -Wextra-tokens -Wcheck -Wno-pragma-once \
  -Wuninitialized -Wunused-function -Wwrite-strings -Woverloaded-virtual -scalar-rep -Wreturn-type -Wreorder \
  -opt-calloc -freg-struct-return -global-hoist -ip -parallel -vec-report2 -par-report2 -diag-enable warn -O2 \
  -unroll-aggressive -include hostproj/syscfg.h -g \
  -I/home/osmith/pn/host -I/home/osmith/pn/host/hostproj -I/home/osmith/pn/build -I/home/osmith/pn/host/include \
  -I/home/osmith/pn/host/mem -I/home/osmith/pn/host/str -I/home/osmith/pn/host/sys -I/usr/include/mysql \
  -I/usr/local/include/libirc -I/usr/local/include/raknet -I/usr/include/lua5.1 \
  -o CMakeFiles/libteulserver.dir/teul/teulCallback.cpp.o -c /home/osmith/pn/host/teul/teulCallback.cpp

...

procedure: teulFreeAllCallbacks

/home/osmith/pn/host/teul/teulCallback.cpp(434): (col. 2) remark: loop was not parallelized: existence of parallel dependence.

/home/osmith/pn/host/teul/teulCallback.cpp(436): (col. 3) remark: loop was not parallelized: existence of parallel dependence.

kfsone · 03-29-2010

Additionally, I tried adding #pragma omp parallel for schedule(static, 8) to the loop and compiling with -par-report3; apparently ICC still tries to auto-parallelize the loop.

[cpp]void teulFreeAllCallbacks()
{
    #pragma omp parallel for schedule(static, 8)
    for ( UINT32 x = 0 ; x < 256 ; ++x ) // Line 435
    {
        for ( UINT32 y = 0 ; y < 256 ; ++y )
        {
            REAL_TEUL_CALLBACK*& objPtr = _callbackArray[x][y] ;
            if ( objPtr != NULL)
            {
                free(objPtr);  // memFree is #defined as free when we are not debugging memory
                objPtr = NULL ;
            }
        }
    }
}
[/cpp]

procedure: teulFreeAllCallbacks

procedure: teulFreeAllCallbacks

/home/osmith/pn/host/teul/teulCallback.cpp(435): (col. 2) remark: loop was not parallelized: existence of parallel dependence.

/home/osmith/pn/host/teul/teulCallback.cpp(443): (col. 5) remark: parallel dependence: assumed FLOW dependence between objPtr line 443 and objPtr line 440.

/home/osmith/pn/host/teul/teulCallback.cpp(440): (col. 4) remark: parallel dependence: assumed ANTI dependence between objPtr line 440 and objPtr line 443.

/home/osmith/pn/host/teul/teulCallback.cpp(443): (col. 5) remark: parallel dependence: assumed FLOW dependence between objPtr line 443 and objPtr line 440.

/home/osmith/pn/host/teul/teulCallback.cpp(440): (col. 4) remark: parallel dependence: assumed ANTI dependence between objPtr line 440 and objPtr line 443.

/home/osmith/pn/host/teul/teulCallback.cpp(443): (col. 5) remark: parallel dependence: assumed FLOW dependence between objPtr line 443 and objPtr line 440.

/home/osmith/pn/host/teul/teulCallback.cpp(440): (col. 4) remark: parallel dependence: assumed ANTI dependence between objPtr line 440 and objPtr line 443.

/home/osmith/pn/host/teul/teulCallback.cpp(443): (col. 5) remark: parallel dependence: assumed FLOW dependence between objPtr line 443 and objPtr line 440.

I tried adding #pragma ivdep and #pragma parallel to no avail...

vectorizor · 03-30-2010

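There is a difference between vectorization and parallelization. The remarks you posted say the compiler cannot parallelize the loop (split its iterations across threads), whereas you want to vectorize it (use SIMD instructions within a single thread).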
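A minimal sketch of the two ideas, using a hypothetical `scale` workload rather than the code from this thread (the function names are made up for illustration). Vectorization means one thread handling several elements per SIMD instruction; with SSE that is four floats per 128-bit register, and a compiler may do it automatically for a simple counted loop with no calls and no cross-iteration dependence. Parallelization means dividing the iterations among threads, which is what `#pragma omp parallel for` or ICC's `-parallel` auto-parallelizer attempts. Whether a given compiler version actually vectorizes the first function depends on its flags and heuristics.

```cpp
#include <cstddef>

// Vectorization candidate (hypothetical example): a counted loop with no
// function calls and no dependence between iterations. A vectorizing
// compiler may process several elements per SIMD instruction here,
// all within one thread.
void scale_simd(float* out, const float* in, std::size_t n, float s)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] * s;
}

// Parallelization (hypothetical example): the same work split across
// threads. With OpenMP enabled (-openmp on ICC 11.1, -fopenmp on GCC)
// each thread runs a chunk of the iteration range; without OpenMP the
// pragma is ignored and the loop simply runs serially.
void scale_threads(float* out, const float* in, long n, float s)
{
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        out[i] = in[i] * s;
}
```

The two are independent: a loop can be vectorized, parallelized, both, or neither, and the `-vec-report` and `-par-report` options report on each separately.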

In any case, there is a function call (memFree/free) in the inner loop, and a loop containing a call like that can't be vectorized. It's also just releasing resources, so I don't think there is much performance to gain: this loop is not going to consume many CPU cycles.
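A sketch of how the cleanup could be restructured, under stated assumptions: the thread never shows the declaration of `_callbackArray`, so the code below uses a hypothetical 256×256 table `callbackTable` of pointers to a stand-in type `Callback`, and it calls `std::free` directly, as the non-debug build does via `#define memFree free`. The `free()` calls still cannot be vectorized, but the null test can go because the C standard says `free(NULL)` does nothing, and resetting the pointers becomes a separate, call-free store pass that a compiler is at least allowed to vectorize. If the debug `memFree` does not accept NULL, keep the test in the first pass.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-ins: the real REAL_TEUL_CALLBACK type and the
// declaration of _callbackArray are not shown in the thread.
struct Callback { int id; };
static Callback* callbackTable[256][256];

void freeAllCallbacks()
{
    // Pass 1: release every entry. free(NULL) is a no-op, so no null
    // check is needed. These calls remain scalar regardless of flags.
    for (int x = 0; x < 256; ++x)
        for (int y = 0; y < 256; ++y)
            std::free(callbackTable[x][y]);

    // Pass 2: clear the table, one row at a time. This is a plain store
    // loop with no calls, so it is a candidate for vectorization.
    Callback* const none = NULL;
    for (int x = 0; x < 256; ++x)
        std::fill(callbackTable[x], callbackTable[x] + 256, none);
}
```

Any speedup is likely small for the reason given above: when many entries are set, the time goes into the `free()` calls rather than into the 65,536 pointer stores.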
