Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Why won't this loop vectorize?

kfsone
New Contributor I
400 Views
The loop below doesn't vectorize, even if I add #pragma ivdep.
_callbackArray is a simple C/C++ array, nothing fancy.
[cpp]void teulFreeAllCallbacks()
{
    for ( UINT32 x = 0 ; x < 256 ; ++x )  // Line 434
    {
        for ( UINT32 y = 0 ; y < 256 ; ++y ) // Line 436
        {
            REAL_TEUL_CALLBACK* realObject = _callbackArray ;
            if ( realObject != NULL)
            {
                _callbackArray = NULL ;
                memFree(realObject);
            }
        }
    }
}
[/cpp]
osmith@ubuntu:~/pn/build$ icc --version
icc (ICC) 11.1 20100203
Copyright (C) 1985-2010 Intel Corporation. All rights reserved.
/opt/intel/Compiler/11.1/069/bin/ia32/icpc -DHAVE_CONFIG_H=1 -DplatOS=OS_LINUX -DAPP_ID=12 -DAS_CLIENT -DKIT_SERVER \
    -m32 -Wall -Wshadow -Wno-deprecated -pthread -Werror -w1 -xSSE -restrict -std=c++0x -vec-guard-write -openmp \
    -opt-class-analysis -fno-math-errno -fp-model strict -Wformat-security -Wextra-tokens -Wcheck -Wno-pragma-once \
    -Wuninitialized -Wunused-function -Wwrite-strings -Woverloaded-virtual -scalar-rep -Wreturn-type -Wreorder \
    -opt-calloc -freg-struct-return -global-hoist -ip -parallel -vec-report2 -par-report2 -diag-enable warn -O2 \
    -unroll-aggressive -include hostproj/syscfg.h -g \
    -I/home/osmith/pn/host -I/home/osmith/pn/host/hostproj -I/home/osmith/pn/build -I/home/osmith/pn/host/include \
    -I/home/osmith/pn/host/mem -I/home/osmith/pn/host/str -I/home/osmith/pn/host/sys -I/usr/include/mysql \
    -I/usr/local/include/libirc -I/usr/local/include/raknet -I/usr/include/lua5.1 \
    -o CMakeFiles/libteulserver.dir/teul/teulCallback.cpp.o -c /home/osmith/pn/host/teul/teulCallback.cpp
...
procedure: teulFreeAllCallbacks
procedure: teulFreeAllCallbacks
/home/osmith/pn/host/teul/teulCallback.cpp(434): (col. 2) remark: loop was not parallelized: existence of parallel dependence.
/home/osmith/pn/host/teul/teulCallback.cpp(436): (col. 3) remark: loop was not parallelized: existence of parallel dependence.
1 Solution
vectorizor
New Contributor I
400 Views
There is a difference between vectorization and parallelization. The remarks you posted say the compiler cannot parallelize the loop (they come from -par-report), whereas you want to vectorize it.

In any case, there is a function call in the inner loop, and you can't vectorize that. It's also just a call to free resources, so I don't think there is much performance gain to be had. It's not going to consume many CPU cycles.

A
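
To make the distinction concrete, here is a minimal sketch (not the poster's code). The names `Callback`, `kDim`, `table`, `countLive` and `freeAll` are hypothetical, and the flat array-of-pointers layout is an assumption. The first loop contains no calls, only a compare and an add per element, which is the kind of loop a vectorizer can usually handle. The second loop calls free() in its body, which is the shape discussed above.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-ins for the poster's types; not taken from the thread.
struct Callback { int id; };
const std::size_t kDim = 256;
static Callback* table[kDim * kDim];   // assumed layout: flat array of pointers

// Call-free loop: one compare and one add per element, so it is a
// candidate for vectorization.
std::size_t countLive()
{
    std::size_t live = 0;
    for (std::size_t i = 0; i < kDim * kDim; ++i)
        live += (table[i] != NULL) ? 1 : 0;
    return live;
}

// Loop with a call to free() in the body: the call is what stands in the
// way of vectorizing it.
void freeAll()
{
    for (std::size_t i = 0; i < kDim * kDim; ++i)
    {
        if (table[i] != NULL)
        {
            std::free(table[i]);
            table[i] = NULL;
        }
    }
}
```

Comparing the -vec-report output for the two functions should show the difference; the exact remarks depend on the compiler version and flags.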

2 Replies
kfsone
New Contributor I
400 Views
Additionally: I tried adding #pragma omp parallel for schedule(static, 8) to the outer loop and compiling with -par-report3. ICC apparently still tries to auto-parallelize the loop.
[cpp]void teulFreeAllCallbacks()
{
    #pragma omp parallel for schedule(static, 8)
    for ( UINT32 x = 0 ; x < 256 ; ++x ) // Line 435
    {
        for ( UINT32 y = 0 ; y < 256 ; ++y )
        {
            REAL_TEUL_CALLBACK*& objPtr = _callbackArray ;
            if ( objPtr != NULL)
            {
                free(objPtr);  // memFree is #defined as free when we are not debugging memory
                objPtr = NULL ;
            }
        }
    }
}
[/cpp]
procedure: teulFreeAllCallbacks
procedure: teulFreeAllCallbacks
/home/osmith/pn/host/teul/teulCallback.cpp(435): (col. 2) remark: loop was not parallelized: existence of parallel dependence.
/home/osmith/pn/host/teul/teulCallback.cpp(443): (col. 5) remark: parallel dependence: assumed FLOW dependence between objPtr line 443 and objPtr line 440.
/home/osmith/pn/host/teul/teulCallback.cpp(440): (col. 4) remark: parallel dependence: assumed ANTI dependence between objPtr line 440 and objPtr line 443.
/home/osmith/pn/host/teul/teulCallback.cpp(443): (col. 5) remark: parallel dependence: assumed FLOW dependence between objPtr line 443 and objPtr line 440.
/home/osmith/pn/host/teul/teulCallback.cpp(440): (col. 4) remark: parallel dependence: assumed ANTI dependence between objPtr line 440 and objPtr line 443.
/home/osmith/pn/host/teul/teulCallback.cpp(443): (col. 5) remark: parallel dependence: assumed FLOW dependence between objPtr line 443 and objPtr line 440.
/home/osmith/pn/host/teul/teulCallback.cpp(440): (col. 4) remark: parallel dependence: assumed ANTI dependence between objPtr line 440 and objPtr line 443.
/home/osmith/pn/host/teul/teulCallback.cpp(443): (col. 5) remark: parallel dependence: assumed FLOW dependence between objPtr line 443 and objPtr line 440.
I tried adding #pragma ivdep and #pragma parallel to no avail...
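
For anyone trying to reproduce this, a self-contained sketch of the loop follows. It rests on two assumptions: that `_callbackArray` is a 256 x 256 array of pointers indexed by x and y (the subscripts are not visible in the listings above), and that a placeholder struct can stand in for `REAL_TEUL_CALLBACK`. Each iteration touches only its own element, so the explicit `omp parallel for` should be safe provided the allocator's free() is thread-safe, as glibc's is. When compiled without OpenMP support the pragma is ignored and the loop runs serially.

```cpp
#include <cstddef>
#include <cstdlib>

// Placeholder for the real callback type; its contents are an assumption.
struct REAL_TEUL_CALLBACK { int dummy; };

const int DIM = 256;
// Assumed shape: a 256 x 256 table of owning pointers.
static REAL_TEUL_CALLBACK* _callbackArray[DIM][DIM];

void teulFreeAllCallbacks()
{
    // A signed loop index is used because OpenMP versions before 3.0
    // require one; y is declared inside the region, so it is private.
    #pragma omp parallel for schedule(static, 8)
    for (int x = 0; x < DIM; ++x)
    {
        for (int y = 0; y < DIM; ++y)
        {
            // Explicit subscripts: each iteration refers to a distinct element.
            REAL_TEUL_CALLBACK*& objPtr = _callbackArray[x][y];
            if (objPtr != NULL)
            {
                std::free(objPtr);
                objPtr = NULL;
            }
        }
    }
}
```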