Free memory after offload region - memleak?

Oliver_P_
Beginner
986 views

Hi,

I am trying to offload some of my code, and it runs without any errors. The problem is that the program's memory usage increases over time (when NUM2 is big enough, it rises to 32GB+). I think I am freeing everything properly (the same code runs on the host without offloading and without growing memory usage), so I cannot explain what is causing this memory leak.

Code:

for(int x = 0; x < 10000; ++x)
{     
    ....
    double*   a;
    double* pvP;
    double* psP;
    double* pfP;
    double* ovP;
    double* osP;
    double* ofP;

    int ok;
    ok = posix_memalign((void**)&a, 64, NUM2*sizeof(double));
    ok = posix_memalign((void**)&pvP, 64, NUM2*sizeof(double));
    ok = posix_memalign((void**)&psP, 64, NUM2*sizeof(double));
    ok = posix_memalign((void**)&pfP, 64, NUM2*sizeof(double));
    ok = posix_memalign((void**)&ovP, 64, NUM2*sizeof(double));
    ok = posix_memalign((void**)&osP, 64, NUM2*sizeof(double));
    ok = posix_memalign((void**)&ofP, 64, NUM2*sizeof(double));

    .....

    #pragma offload target(mic:0) in(mP, dtf) in(pfP, psP, pvP, a:length(NUM2)) inout(osP, ofP, ovP:length(NUM2))
    {
       ....
        #pragma omp parallel for default(shared) 
        for(int i = 0; i < 187500; ++i)
        {
            #pragma vector aligned
            #pragma ivdep
            for(int j = 0; j < 8; ++j)
            {
                osP[i*8+j] = psP[i*8+j] + ovP[i*8+j] * dt;
            }
        }
        ....
    } 
    .....
    free(  a);
    free(pvP);
    free(psP);
    free(pfP);
    free(ovP);
    free(osP);
    free(ofP); 
    ....
}

I know this is far from optimal, but I am just trying to get it running somehow and will work from there. My main question is whether it is sufficient to just call free on these pointers after the offload region, or whether I need to take special measures to free all the memory allocated beforehand. For example, does the inout leave the previously allocated memory region unreachable and thus cause the leak? If that's the case, how can I fix it?

Thanks,

Oliver

5 Replies
Rajiv_D_Intel
Employee

It's possible that there's a memory leak in the offload processing. The effect is likely being amplified because you have a loop running for 10000 iterations with 7 variables allocated and freed inside the loop.

In any case, allocating and freeing buffers on the device for each iteration is inefficient. You would be better off allocating MIC buffers outside any loop. For example:

// Allocate a buffer for a
#pragma offload_transfer target(mic) nocopy(a:length(l) : alloc_if(1) free_if(0))

for (...)
{
    // Do the offloads, reusing the buffer that is already on the device
    #pragma offload target(mic) in(a:length(l) : alloc_if(0) free_if(0))
    {
    }
}

// Free the buffer for a
#pragma offload_transfer target(mic) nocopy(a:length(l) : alloc_if(0) free_if(1))

 

Oliver_P_
Beginner

Thanks for your answer!

I only added the outer loop to show that this part of the code gets called quite often. In reality it sits in a method that is called frequently.

As far as I understand your answer, you assumed that I ran out of memory on the MIC? That is not the case; I should have written it more clearly. I am running out of memory on the host. It seems that after the offload has run, the data copied back to the host leaves an unreachable memory region on the host (read: I can't free the memory allocated before the offload), which, as you pointed out, accumulates over time because of the outer loop.

Unfortunately, I couldn't find an example where the memory allocated before an offload statement is actually freed before the program terminates.

However, I understand that allocating and freeing the buffers on the MIC itself, as you described, is the way to go. I just don't think it would solve the problem I am seeing right now, since the memory on the host would still be unreachable (if I'm correct) and memory usage would keep increasing.
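
One way to confirm that it is the host process that grows (and not the MIC) is to log the host's peak resident set size between calls. The following is only a minimal sketch, assuming Linux, where getrusage reports ru_maxrss in kilobytes. The names host_peak_rss_kb and one_iteration are hypothetical and not part of my program; one_iteration merely stands in for one call of the method that contains the offload.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

/* Hypothetical helper: peak resident set size of this (host) process.
   Assumption: Linux, where ru_maxrss is given in kilobytes.
   Returns -1 if getrusage fails. */
long host_peak_rss_kb(void)
{
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1;
    return usage.ru_maxrss;
}

/* Hypothetical stand-in for one call of the offloading method:
   allocate a buffer, touch it, and free it again.
   Returns 0 on success, -1 on allocation failure. */
int one_iteration(size_t bytes)
{
    assert(bytes > 0);
    char *buf = malloc(bytes);
    if (buf == NULL)
        return -1;
    memset(buf, 1, bytes);                    /* touch the pages */
    int ok = (buf[bytes - 1] == 1) ? 0 : -1;  /* read back one byte */
    free(buf);
    return ok;
}
```

Calling host_peak_rss_kb() every few hundred iterations and printing the value shows whether the host footprint keeps climbing. Note that ru_maxrss is a peak value, so it never decreases; a steady rise across iterations that each free everything they allocate points to a host-side leak.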

Rajiv_D_Intel
Employee

I understand that you observe a memory leak on the CPU. We will look into that.

My suggestion had a two-fold goal: 1) to improve offload performance, and 2) to reduce the offload-related activity on the CPU with the hope that some of the extra activity might have been involved in the CPU memory leak.

Oliver_P_
Beginner

I agree that this is probably the best chance to find the root cause of my problem. I am going to implement everything as you described and will report back with the outcome.

Thanks a lot!

Oliver_P_
Beginner

Just in case somebody stumbles upon this and wonders what the solution was: The solution really was allocating everything I need at the beginning, running the loop with alloc_if(0) free_if(0) and then freeing the memory before the program terminates. I still don't know why exactly this solved my problem, but it did.
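
For anyone who wants to see the shape of it, here is a minimal sketch of that pattern, reduced to three of the arrays. It is not my actual code: run_steps, alloc_doubles, and the initial values are hypothetical, and the clause syntax (one colon followed by space-separated modifiers) is my assumption of the documented Intel offload form. Compilers without offload support ignore the pragmas and simply run the loop on the host.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: 64-byte aligned array of n doubles, NULL on failure. */
static double *alloc_doubles(size_t n)
{
    void *p = NULL;
    if (posix_memalign(&p, 64, n * sizeof(double)) != 0)
        return NULL;
    return p;
}

/* Hypothetical driver: allocate once, offload `steps` times while reusing
   the device buffers, then free everything once.
   Stores osP[n-1] in *last; returns 0 on success, -1 on allocation failure. */
int run_steps(int n, int steps, double dt, double *last)
{
    assert(n > 0 && steps > 0 && last != NULL);

    /* Host buffers are allocated once, outside the loop. */
    double *psP = alloc_doubles((size_t)n);
    double *ovP = alloc_doubles((size_t)n);
    double *osP = alloc_doubles((size_t)n);
    if (!psP || !ovP || !osP) {
        free(psP); free(ovP); free(osP);
        return -1;
    }
    for (int i = 0; i < n; ++i) {   /* placeholder input data */
        psP[i] = 1.0; ovP[i] = 2.0; osP[i] = 0.0;
    }

    /* Create the device buffers once and keep them alive (free_if(0)). */
    #pragma offload_transfer target(mic:0) \
        in(psP, ovP : length(n) alloc_if(1) free_if(0)) \
        nocopy(osP : length(n) alloc_if(1) free_if(0))

    for (int x = 0; x < steps; ++x) {
        /* Reuse the existing device buffers: no allocation, no free. */
        #pragma offload target(mic:0) in(dt) \
            nocopy(psP, ovP : length(n) alloc_if(0) free_if(0)) \
            out(osP : length(n) alloc_if(0) free_if(0))
        {
            #pragma omp parallel for
            for (int i = 0; i < n; ++i)
                osP[i] = psP[i] + ovP[i] * dt;
        }
    }

    /* Release the device buffers once, after the last offload. */
    #pragma offload_transfer target(mic:0) \
        nocopy(psP, ovP, osP : length(n) alloc_if(0) free_if(1))

    *last = osP[n - 1];
    free(psP); free(ovP); free(osP);   /* host buffers freed once */
    return 0;
}
```

In the real program the allocation and the final free sit outside the frequently called method, so each call only performs the alloc_if(0) free_if(0) offload in the middle.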
