Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

How to deactivate stack reusing

Alexander_M_1
Beginner

Hello,

I have code, which looks more or less like this:

template<typename B>
struct A
{
    B &b;
    A(B &b) : b(b) {}
};

struct B {…};

void f(B b)
{
    A<B> a(b);
    int i = 0;
    /* more code using "a", but NOT "b". */
}

int main()
{
    B b{…};
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < 256; ++i)
            f(b);
    }
    return 0;
}

So I have a function "f" that is called many times in an OpenMP loop and receives an object of class "B" by value. Inside the function, "b" is wrapped in another class "A". After that, "b" is no longer used directly, only through "a". However, I noticed that although I use "a" much later in the code, the (stack?) memory of the parameter "b" is reused. Since the data member "b" in "A" is a reference, this also invalidates my "a" variable.

Unfortunately, this small snippet does not reproduce the failure, and my failing code is too big and too complex to post. However, since I noticed that the stack is reused, I am curious whether I can disable this feature.

My code runs fine with -O1 but starts to fail with -O2. I tested Intel C++ compiler versions 17.0.0 and 17.0.2.

Thanks in advance and best regards,

Alexander Matthes

9 Replies
Viet_H_Intel
Moderator

Hi Alexander,

Does -fno-defer-pop help?

Regards,

Viet Hoang

jimdempseyatthecove
Honored Contributor III

The issue (IMHO) is that the compiler's optimization is inlining the function f(b) and, in the process, removing the copy of the value b.

Try using

#pragma noinline

in front of f(b)

or

__attribute__ ((noinline))

on function declaration on Linux

or

__declspec(noinline)

on function declaration on Windows
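
As a rough illustration of the two declaration attributes above, the following sketch wraps them in a single macro. The macro name NOINLINE, the contents of B, and the helper call_f_once are assumptions made for this example and do not come from the original code.

```cpp
#include <cassert>

// Hypothetical portability macro (the name NOINLINE is an assumption).
#if defined(_MSC_VER)
#  define NOINLINE __declspec(noinline)        // Windows, MSVC-compatible mode
#else
#  define NOINLINE __attribute__((noinline))   // Linux, GCC-compatible mode
#endif

struct B { int x; int y; };   // placeholder members; the real B is elided in the thread

// The attribute goes on the declaration of the function that should not be inlined.
NOINLINE int f(B b)
{
    return b.x + b.y;
}

// Small driver: passes b by value, so f works on its own copy.
int call_f_once()
{
    B b{1, 2};
    int result = f(b);
    assert(result == 3);
    return result;
}
```

The attribute only asks the compiler not to inline f; it does not by itself control how stack slots inside f are allocated.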

Jim Dempsey

Alexander_M_1
Beginner

Hello Viet and Jim,

Viet Hoang (Intel) wrote:
Does -fno-defer-pop help?

jimdempseyatthecove wrote:
Try using

#pragma noinline

in front of f(b)

or

__attribute__ ((noinline))

on function declaration on Linux

 

I tried both the -fno-defer-pop option and marking the function and the function call as not inlined, but neither helped. I even checked the IPO report to make sure that my function is not inlined. In my production code, the function is an operator() of a struct and is furthermore wrapped with std::bind, so it is not obvious when which stack is (re)used.

One working solution is to not declare "b" as a reference in "A". However, this may result in poor performance in some cases.
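
A minimal sketch of that workaround, assuming a small B with two int members (the real contents of B are elided in my post) and using a hypothetical template parameter name T and helper demo_value_member:

```cpp
#include <cassert>

struct B { int x; int y; };   // placeholder members; the real B is elided

// Variant of A that owns a copy of the wrapped object instead of referring to it.
template <typename T>
struct A
{
    T b;                                  // stored by value, independent of any other object's lifetime
    explicit A(const T& src) : b(src) {}
};

int f(B b)
{
    A<B> a(b);     // a.b is a separate object from the parameter b
    b.x = -1;      // overwriting the parameter no longer affects a
    return a.b.x + a.b.y;
}

// Small driver for the sketch.
int demo_value_member()
{
    B b{1, 2};
    int r = f(b);
    assert(r == 3);
    return r;
}
```

The cost is one extra copy of B per construction of A, which is where the possible performance loss for large wrapped types comes from.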

Regards,

Alexander Matthes

jimdempseyatthecove
Honored Contributor III

Because you want f(b) to instantiate a copy of b...

    B b{…};
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < 256; ++i)
            f(b);
    }

The code above would therefore need b to be private for each thread's call of f(b).

Modify your code

#pragma omp parallel firstprivate(b)
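
A minimal sketch of how the clause fits into the loop, assuming a placeholder B with two int members and an f that returns a value so the result can be checked (the reduction clause and the function name run_loop are additions for this example only):

```cpp
#include <cassert>

struct B { int x; int y; };         // placeholder members; the real B is elided

int f(B b) { return b.x + b.y; }    // by value, as in the original post

int run_loop()
{
    B b{1, 2};
    int sum = 0;
    // firstprivate(b): every thread gets its own b, copy-initialized from the b above.
    #pragma omp parallel firstprivate(b) reduction(+:sum)
    {
        #pragma omp for
        for (int i = 0; i < 256; ++i)
            sum += f(b);            // copies the thread-private b into f's parameter
    }
    assert(sum == 256 * 3);
    return sum;
}
```

Compiled without OpenMP support, the pragmas are ignored and the loop runs serially with the same result.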

Jim Dempsey

jimdempseyatthecove
Honored Contributor III

Note: ideally, you would have the copy of b instantiated once per thread (a private copy for each thread) and then pass a reference to that private copy on each call of f(b), as opposed to performing a copy on each call of f(b). IOW, reduce the number of copy operations.

Jim Dempsey

Alexander_M_1
Beginner

jimdempseyatthecove wrote:

Because you want f(b) to instantiate a copy of b...

    B b{…};
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < 256; ++i)
            f(b);
    }

The code above would therefore need b to be private for each thread's call of f(b).

Modify your code

#pragma omp parallel firstprivate(b)

Jim Dempsey

Okay, this creates an initialized copy of "b" for every OpenMP thread. With four OpenMP threads, each thread's 64 calls of "f(b)" would still be passed the same "b". Furthermore, I only read "b", so it doesn't matter to me whether the optimizer saves some stack space by using the global "b" directly instead of really making a copy by value (except for cache-line contention, which I ignore for now).

jimdempseyatthecove
Honored Contributor III

The private copy was made under the presumption that b, or some portion of b, was used as temporary storage. If (when) b is read-only, then you would not want to make copies. IOW

void f(B& b)

should suffice.
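
A sketch of the by-reference variant, under the assumption that b is truly read-only (hence const B& rather than B&). The members of B, the template parameter name T, and the helper run_shared are hypothetical.

```cpp
#include <cassert>

struct B { int x; int y; };   // placeholder members; the real B is elided

template <typename T>
struct A
{
    const T& b;                           // refers to the caller's object, no copy
    explicit A(const T& src) : b(src) {}
};

int f(const B& b)
{
    A<B> a(b);     // a.b refers to the object the caller passed, which outlives this call
    return a.b.x + a.b.y;
}

int run_shared()
{
    const B b{1, 2};   // lives for the whole function, like b in main()
    int sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 256; ++i)
        sum += f(b);   // all threads read the same shared b
    assert(sum == 768);
    return sum;
}
```

Here the referenced object is never a parameter slot inside f, so the lifetime of f's own stack frame no longer matters for a.b.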

IOW you have expressed a desire to use the same copy of b for all threads and all calls of f(b) thus why not reuse the same reference rather than create additional copies of the reference.

Though code elsewhere might require the copy (not stated in your problem statement).

Jim Dempsey

Alexander_M_1
Beginner

jimdempseyatthecove wrote:

The private copy was made under the presumption that b, or some portion of b, was used as temporary storage. If (when) b is read-only, then you would not want to make copies. IOW

void f(B& b)

should suffice.

Well, if different OpenMP threads are running on different NUMA nodes, it may make sense to copy "b" to nearby memory. But anyway, "b" is only 16 bytes big and the function has a quite long run time, so it doesn't matter at all.

jimdempseyatthecove wrote:

IOW you have expressed a desire to use the same copy of b for all threads and all calls of f(b) thus why not reuse the same reference rather than create additional copies of the reference.

Though code elsewhere might require the copy (not stated in your problem statement).

Jim Dempsey

My desire is to turn off the reuse of variable spaces on the stack, which the compiler thinks are not used anymore. I strongly suspect that the check for whether a variable's space is still in use within a function yields a false positive, so that the space of "b" is reused although it is still used through "a". Making the function parameter "b" a reference would indeed probably help, but the surrounding framework does not allow this.

Regards,

Alexander Matthes

jimdempseyatthecove
Honored Contributor III

>>But anyway, "b" is only 16 bytes big and the function has a quite long run time, so it doesn't matter at all.

b will then likely be located within L1 or L2 (or LLC) of the core (or CPU) so NUMA placement is moot.

>>My desire is to turn off the reuse of variable spaces on the stack, which the compiler thinks are not used anymore.

int main()
{
    B b{…};
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < 256; ++i)
            f(b);
    }
    return 0;
}

"b", declared in the first statement of main(), persists for the duration of main. If your code were written so that f(b) receives a reference (pointer) to main's b, then stack-space reuse should cause no problems.

Also note that, as originally described, if f(b) makes a copy of main's b (as intended), this copy is stack-local to the thread calling f(b) and should live for the duration of the function f (in that thread). On the other hand, if a reference is used (or the call is inlined), then the original object in main is used through a (copy of the) reference.

That said, if you observe that for some invocations the reference held in a (constructed from b) refers to "junk" (IOW data that is not a B), then this is likely a compiler error in which the stack offset is incorrect.

Check for a typographical error by using:

void f(B _b)
{
    A<B> a(_b);
    /* rest of f unchanged */
}

(i.e., reusing the same name for too many different things can lead to programming problems.)
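
One hedged way to probe the "incorrect stack offset" hypothesis is to compare the address the wrapper refers to with the address of the parameter late in the function. This is only a debugging sketch: the members of B, the helper refers_to, the return value of f, and the driver demo_check are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdio>

struct B { int x; int y; };   // placeholder members; the real B is elided

template <typename T>
struct A
{
    T& b;
    explicit A(T& src) : b(src) {}
};

// True if the wrapper still refers to the given parameter object.
bool refers_to(const A<B>& a, const B& param)
{
    return &a.b == &param;
}

int f(B _b)
{
    A<B> a(_b);
    /* ... long-running work that uses a ... */
    if (!refers_to(a, _b))
        std::fprintf(stderr, "a.b is at %p but _b is at %p\n",
                     static_cast<const void*>(&a.b),
                     static_cast<const void*>(&_b));
    return a.b.x + a.b.y;
}

// Small driver for the sketch.
int demo_check()
{
    int r = f(B{1, 2});
    assert(r == 3);
    return r;
}
```

Be aware that taking the address of _b late in the function keeps the parameter alive longer, so the check itself may hide the problem it is looking for.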

Jim Dempsey