
Data alignment problem

Nick_L_1
Beginner
787 Views

Hi there

I am trying to offload some computation to the MIC using the offload pragma, sending data addressed by a pointer p. How can I ensure that the data is aligned on the MIC after it has been received? Does "__assume(p, 64)" work? I want to use intrinsics to load the data into the vector register file, which requires the data to be aligned.

Another problem: I am trying to activate many threads for the calculation using "#pragma omp parallel for", and some arrays inside the loop must be thread-private as well as 64-byte aligned.

I was using "_mm_malloc()" inside the loop to ensure both, but this leads to repeated and unnecessary allocation.

Thanks.

 

Frances_R_Intel
Employee

Could you possibly post a small sample code? Thanks.

Nick_L_1
Beginner


In the main function:

.......         
double * p;
p = (double * )malloc(sizeof(double)*1024);
#pragma offload target(mic:0) in(p:length(128))
          foo(p);
.......
 

The data addressed by p is transferred to the MIC, and the function foo is defined like this:

__attribute__((target(mic)))void foo( double * p)
{
#ifdef __MIC__
......
long long iter;
#pragma omp parallel for private(iter)
        for(iter = 0 ; iter < N ; iter ++)
        {
                __m512d _A, _B;
                double * p1;
                p1 = (double *)_mm_malloc(sizeof(double)*1024, 512);  // p1 has to be thread-private
                ......
                _A = _mm512_load_pd((void*)p);   // p has to be aligned
                _B = _mm512_load_pd((void*)p1);  // p1 has to be aligned
                ......
                /* Calculations */
                ......
                _mm_free(p1);
        }
#endif
}
 

Thus p1 is allocated repeatedly inside the loop to make sure it's thread-private, while p1 has to be aligned.
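
For the first question (alignment of p after the transfer), the following is a minimal sketch rather than a confirmed answer from the thread. It assumes the Intel compiler's offload extensions: the `align(...)` modifier in the offload clause to request an aligned buffer on the coprocessor, and `__assume_aligned(p, 64)` as the alignment hint. That hint is only a promise to the compiler and does not align anything by itself, so the sketch also checks the pointer at run time. The names `is_aligned`, `foo_checked`, `run_offload`, and `TARGET_MIC` are hypothetical, and C11 `aligned_alloc` is used as a portable stand-in for `_mm_malloc`. Without offload support the pragma is skipped and everything runs on the host.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGNMENT 64    /* bytes: one 512-bit vector */
#define COUNT     1024  /* doubles in the buffer, matching the malloc call above */

/* Mark functions for the coprocessor only when offload is enabled. */
#ifdef __INTEL_OFFLOAD
#define TARGET_MIC __attribute__((target(mic)))
#else
#define TARGET_MIC
#endif

/* Hypothetical helper: nonzero if ptr is a multiple of alignment bytes. */
TARGET_MIC static int is_aligned(const void *ptr, size_t alignment)
{
    /* alignment must be a nonzero power of two for the mask test below */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* Hypothetical stand-in for foo(): returns 0 if p may be used for
 * aligned loads, -1 otherwise. */
TARGET_MIC int foo_checked(double *p)
{
    if (!is_aligned(p, ALIGNMENT))
        return -1;
#ifdef __INTEL_COMPILER
    __assume_aligned(p, ALIGNMENT);  /* compiler hint only (assumed Intel extension) */
#endif
    /* aligned loads such as _mm512_load_pd(p) would be safe from here on */
    return 0;
}

/* Allocates an aligned host buffer and passes it to foo_checked(). */
int run_offload(void)
{
    double *p = aligned_alloc(ALIGNMENT, sizeof(double) * COUNT);
    int status = -1;

    if (p == NULL)
        return -1;
    for (int i = 0; i < COUNT; i++)
        p[i] = (double)i;

#ifdef __INTEL_OFFLOAD
    /* align(...) asks for an aligned coprocessor-side buffer (assumed modifier) */
    #pragma offload target(mic:0) in(p : length(COUNT) align(ALIGNMENT))
#endif
    status = foo_checked(p);

    free(p);
    return status;
}
```

Calling `run_offload()` returns 0 when the pointer seen by `foo_checked` is 64-byte aligned. Keeping the run-time check in place is cheap, and it catches the case where the hint would otherwise be a false promise.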

 

James_C_Intel2
Employee

At the very least you should structure that more like this (which allocates once per thread, rather than once per iteration):

#pragma omp parallel
{
    long long iter;     // Though does it *really* need to be 64 bits!? How many iterations do you have?
                        // 64bit indexes are likely inefficient.
    double * p1 = (double *) _mm_malloc (sizeof(double)*1024, 512);

#pragma omp for
    for (iter=0; iter<N; iter++)
    {
        __m512d _A;
       ... etc ...
    }

    _mm_free (p1);
}
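
The once-per-thread pattern above can be fleshed out into a small, compilable sketch. This is an illustration under assumptions, not the original poster's code: the function name `process`, the constants, and the workload (scaling the first eight values of p by the iteration index) are hypothetical stand-ins for the elided calculations, and C11 `aligned_alloc` replaces `_mm_malloc` so that it builds with any C11 compiler. Plain scalar code stands in for the `_mm512_load_pd` intrinsics. Built without OpenMP, the pragmas are ignored and the loop runs serially.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGNMENT 64    /* bytes: one 512-bit vector */
#define SCRATCH   1024  /* doubles per scratch buffer, as in the thread */
#define VLEN      8     /* doubles per 512-bit vector */

/* Hypothetical workload: each iteration scales the first VLEN values of p
 * by the iteration index into the thread's scratch buffer and adds them
 * to a running total. Returns 0 on success, -1 if any thread failed to
 * allocate its scratch buffer. */
int process(const double *p, long long n_iter, double *out)
{
    double total = 0.0;
    int failed = 0;

    #pragma omp parallel reduction(+:total) reduction(|:failed)
    {
        /* Declared inside the region, so every thread has its own p1. */
        double *p1 = aligned_alloc(ALIGNMENT, sizeof(double) * SCRATCH);
        if (p1 == NULL)
            failed = 1;
        else
            assert((uintptr_t)p1 % ALIGNMENT == 0);

        long long iter;  /* also private: declared inside the region */
        #pragma omp for
        for (iter = 0; iter < n_iter; iter++) {
            if (p1 == NULL)
                continue;  /* breaking out of a worksharing loop is not allowed */
            for (int k = 0; k < VLEN; k++) {
                p1[k] = p[k] * (double)iter;
                total += p1[k];
            }
        }

        free(p1);  /* free(NULL) is a no-op */
    }

    *out = total;
    return failed ? -1 : 0;
}
```

With p holding eight values of 1.0 and `n_iter` set to 4, the total is (0 + 1 + 2 + 3) * 8 = 48. Allocation failure is reported through a reduction flag because a thread cannot return early from inside the parallel region.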

 

Nick_L_1
Beginner


I really do have that many iterations. Restructuring the code helps, thanks.
