Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

__m512 to float*

mti
Beginner
1,067 Views

Hi,

I am using the gcc 9.1 compiler on Linux to develop a gradient descent function with AVX-512, and I think I got the algorithm right. The issue I am facing is that I would like to return the optimal solution for both theta0 and theta1. According to the intrinsics documentation, __m512 holds a vector of 16 floats. I tried defining theta[2] = {0,0}, but when I loaded it with _mm512_loadu_ps and inspected the loaded data in gdb, only the first two entries held the actual data; everything else was filled with garbage, which in turn affects the final computation of the results. The following is the code for the gradient descent:

static inline float* avx512GradientDescent(float *_x, float *_y, float _alpha, size_t num_iter){
    float* thetas = (float *)aligned_alloc(ALIGNE, col*sizeof(float));
    trans(_x, xtrans);
    __m512 nsamples = _mm512_set1_ps(2*col); // broadcast to all 16 values
    __m512 samples = _mm512_set1_ps(col);    // broadcast to all 16 values
    __m512 theta = _mm512_setzero_ps();
    //assert(col % 16 == 0);
    for(uint64_t i = 0; i < col; i += ALIGNE){
        __m512 hypothesis = _mm512_setzero_ps();
        __m512 loss = _mm512_setzero_ps();
        __m512 J = _mm512_setzero_ps();
        __m512 gradient = _mm512_setzero_ps();
        __m512 alpha = _mm512_set1_ps(_alpha); // broadcast to all 16 values
        __m512 xtemp = _mm512_loadu_ps(&(_x[i]));
        __m512 ytemp = _mm512_loadu_ps(&(_y[i]));
        __m512 xtranspose = _mm512_loadu_ps(&(xtrans[i]));
        for(uint64_t iter = 0; iter < num_iter; iter += 16)
        {
            hypothesis = _mm512_mul_ps(xtemp, theta);
            loss = _mm512_sub_ps(hypothesis, ytemp);
            J = _mm512_div_ps(_mm512_fmadd_ps(loss, loss, J), nsamples);
            gradient = _mm512_div_ps(_mm512_mul_ps(xtranspose, loss), samples);
            theta = _mm512_sub_ps(theta, _mm512_mul_ps(alpha, gradient));
        }
        _mm512_storeu_ps(thetas, theta);
    }
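
For context on the garbage values described above: _mm512_loadu_ps always reads 16 floats (64 bytes), so loading from a two-element array reads 14 floats past its end. One possible way around this, sketched below, is a masked load and store that touch only the first n lanes. The helper names load_partial16 and store_partial16 are hypothetical and not part of the original code, and the scalar fallback is only there so the sketch also builds when AVX-512F is not enabled (for example, without -mavx512f).

```c
#include <assert.h>
#include <string.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: copy the first n (<= 16) floats of src into a
 * 16-lane buffer and zero the remaining lanes. */
static void load_partial16(const float *src, unsigned n, float out[16])
{
    assert(n <= 16);
#if defined(__AVX512F__)
    /* Mask with the low n bits set; masked-off lanes are zeroed
     * and their memory is not accessed. */
    __mmask16 k = (__mmask16)((1u << n) - 1u);
    __m512 v = _mm512_maskz_loadu_ps(k, src);
    _mm512_storeu_ps(out, v);
#else
    /* Scalar fallback with the same observable result. */
    memset(out, 0, 16 * sizeof(float));
    memcpy(out, src, n * sizeof(float));
#endif
}

/* Hypothetical helper: write only the first n (<= 16) lanes back to dst. */
static void store_partial16(const float lanes[16], unsigned n, float *dst)
{
    assert(n <= 16);
#if defined(__AVX512F__)
    __mmask16 k = (__mmask16)((1u << n) - 1u);
    __m512 v = _mm512_loadu_ps(lanes);
    _mm512_mask_storeu_ps(dst, k, v);  /* lanes >= n are left untouched in memory */
#else
    memcpy(dst, lanes, n * sizeof(float));
#endif
}
```

With float theta[2], calling load_partial16(theta, 2, lanes) would leave lanes 2 to 15 at zero instead of whatever happens to follow theta in memory.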

 

4 Replies
RahulV_intel
Moderator
1,033 Views

Hi,

 

Can you try the Intel compiler and check whether the issue persists with icpc as well?

 

Command:

icpc -xCORE-AVX512 filename.cpp

 

Regards,

Rahul

 

jimdempseyatthecove
Honored Contributor III
1,024 Views

When you paste code, please use the paste button on the toolbar (it looks like </>), then, in the Markup pulldown, select the source code format (C++).

Note (gripe to Intel): when the last line of the pasted code does not contain a line terminator, clicking OK returns to the main reply page with the newly pasted code selected. When you then continue typing your reply, the selected text (the code) is deleted. To work around this, after inserting the code and clicking OK, press the right arrow key, then Enter.

mti,

A problem I see with your code is that it does not use the results it generates. Compilers now tend to optimize such code away.

>> I tried defining theta[2] = {0,0} // struct of two int's

__m512 theta[2];
theta[0] = _mm512_setzero_ps();
theta[1] = _mm512_setzero_ps();
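
To illustrate how two vector accumulators could end up as two floats that are actually returned to the caller, here is a minimal sketch of a single gradient descent step. It assumes the model y ≈ theta[0] + theta[1]*x and a sample count that is a multiple of 16; the function name gd_step and its signature are hypothetical and do not come from the original post. The results are written through the theta pointer so the compiler cannot discard them, and the scalar branch is only a fallback for builds without AVX-512F.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical sketch: one batch gradient-descent step for
 * y ~ theta[0] + theta[1]*x. Assumes n is a nonzero multiple of 16. */
static void gd_step(const float *x, const float *y, size_t n,
                    float alpha, float theta[2])
{
    assert(n > 0 && n % 16 == 0);
    float g0, g1;  /* sums of loss and loss*x over all samples */
#if defined(__AVX512F__)
    const __m512 t0 = _mm512_set1_ps(theta[0]);  /* broadcast theta0 */
    const __m512 t1 = _mm512_set1_ps(theta[1]);  /* broadcast theta1 */
    __m512 acc0 = _mm512_setzero_ps();
    __m512 acc1 = _mm512_setzero_ps();
    for (size_t i = 0; i < n; i += 16) {
        __m512 xv = _mm512_loadu_ps(x + i);
        __m512 yv = _mm512_loadu_ps(y + i);
        /* loss = (theta1 * x + theta0) - y */
        __m512 loss = _mm512_sub_ps(_mm512_fmadd_ps(t1, xv, t0), yv);
        acc0 = _mm512_add_ps(acc0, loss);        /* accumulate loss      */
        acc1 = _mm512_fmadd_ps(loss, xv, acc1);  /* accumulate loss * x  */
    }
    /* Horizontal sums collapse the 16 lanes of each accumulator. */
    g0 = _mm512_reduce_add_ps(acc0);
    g1 = _mm512_reduce_add_ps(acc1);
#else
    g0 = 0.0f;
    g1 = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float loss = theta[0] + theta[1] * x[i] - y[i];
        g0 += loss;
        g1 += loss * x[i];
    }
#endif
    /* Update both parameters in the caller's two-element array. */
    theta[0] -= alpha * g0 / (float)n;
    theta[1] -= alpha * g1 / (float)n;
}
```

A caller would keep float theta[2] = {0, 0} and invoke gd_step once per iteration; no 512-bit load or store ever touches the two-element array.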

Jim Dempsey

RahulV_intel
Moderator
1,000 Views

@mti ,

 

Could you let me know if the above suggestions worked for you?

 

Thanks,

Rahul

 

RahulV_intel
Moderator
973 Views

Hi,


I have not heard back from you, so I will go ahead and close this thread from my end. However, please note that this thread will remain open for community discussion. Feel free to post a new question if you still face any issues.


--Rahul

