
__m512 to float*

Hi,

I am using gcc 9.1 on Linux to write a gradient descent function with AVX-512, and I think I have the algorithm right. The issue I am facing is that I would like to return the optimal solution for both theta0 and theta1. According to the intrinsics documentation, an __m512 holds a vector of 16 floats. I tried defining theta[2] = {0,0}, but when I loaded it with _mm512_loadu_ps and inspected the loaded data in gdb, only the first two entries held the actual data; everything else was filled with garbage, which in turn corrupts the final result. The following is the code for the gradient descent:

// col, ALIGNE, xtrans and trans() are declared elsewhere in the program (not shown).
static inline float* avx512GradientDescent(float *_x, float *_y, float _alpha, size_t num_iter){
    float* thetas = (float *)aligned_alloc(ALIGNE, col*sizeof(float));
    trans(_x, xtrans);
    __m512 nsamples = _mm512_set1_ps(2*col);   // broadcast 2*col to all 16 lanes
    __m512 samples = _mm512_set1_ps(col);      // broadcast col to all 16 lanes
    __m512 theta = _mm512_setzero_ps();
    //assert(col % 16 == 0);
    for(uint64_t i = 0; i < col; i += ALIGNE){
        __m512 hypothesis = _mm512_setzero_ps();
        __m512 loss = _mm512_setzero_ps();
        __m512 J = _mm512_setzero_ps();
        __m512 gradient = _mm512_setzero_ps();
        __m512 alpha = _mm512_set1_ps(_alpha); // broadcast _alpha to all 16 lanes
        __m512 xtemp = _mm512_loadu_ps(&(_x[i]));          // 16 x values
        __m512 ytemp = _mm512_loadu_ps(&(_y[i]));          // 16 y values
        __m512 xtranspose = _mm512_loadu_ps(&(xtrans[i]));
        for(uint64_t iter = 0; iter < num_iter; ++iter)    // one update per iteration
        {
            hypothesis = _mm512_mul_ps(xtemp, theta);      // h = x * theta, per lane
            loss = _mm512_sub_ps(hypothesis, ytemp);       // h - y
            J = _mm512_div_ps(_mm512_fmadd_ps(loss, loss, J), nsamples);
            gradient = _mm512_div_ps(_mm512_mul_ps(xtranspose, loss), samples);
            theta = _mm512_sub_ps(theta, _mm512_mul_ps(alpha, gradient));
        }
        _mm512_storeu_ps(thetas, theta);       // copy the 16 lanes of theta to thetas[0..15]
    }
}
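A minimal sketch of one way to get a single theta0 and theta1 out, assuming the intended model is simple linear regression, y ≈ theta0 + theta1·x, with cost J = (1/2m)·Σ(h - y)² (the 2*col and col divisors in the code above suggest this). Each iteration sweeps all m samples 16 at a time, accumulates Σ(h - y) and Σ(h - y)·x in two __m512 registers, reduces each to a float with _mm512_reduce_add_ps, and then updates both parameters once. A masked load handles the last n % 16 samples, so nothing is read past the end of x or y. The function names, the float theta[2] output parameter, and the scalar fallback chosen at run time with the GCC/Clang builtin __builtin_cpu_supports are illustrative assumptions, not the original poster's code.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* AVX-512 pass: computes sum(err) and sum(err * x) over all n samples,
   where err = theta0 + theta1 * x - y (the hypothesis minus the target). */
__attribute__((target("avx512f")))
static void grad_sums_avx512(const float *x, const float *y, size_t n,
                             float theta0, float theta1,
                             float *sum_err, float *sum_err_x)
{
    const __m512 t0 = _mm512_set1_ps(theta0);   /* broadcast to 16 lanes */
    const __m512 t1 = _mm512_set1_ps(theta1);
    __m512 acc_e  = _mm512_setzero_ps();        /* per-lane partial sums */
    __m512 acc_ex = _mm512_setzero_ps();
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {              /* full 16-float chunks */
        __m512 xv  = _mm512_loadu_ps(x + i);
        __m512 yv  = _mm512_loadu_ps(y + i);
        __m512 err = _mm512_sub_ps(_mm512_fmadd_ps(t1, xv, t0), yv);
        acc_e  = _mm512_add_ps(acc_e, err);
        acc_ex = _mm512_fmadd_ps(err, xv, acc_ex);
    }
    if (i < n) {                                /* 1..15 leftover samples */
        __mmask16 m = (__mmask16)((1u << (n - i)) - 1u);
        __m512 xv  = _mm512_maskz_loadu_ps(m, x + i);  /* no read past the end */
        __m512 yv  = _mm512_maskz_loadu_ps(m, y + i);
        /* zero the unused lanes so they do not add theta0 to the sums */
        __m512 err = _mm512_maskz_sub_ps(m, _mm512_fmadd_ps(t1, xv, t0), yv);
        acc_e  = _mm512_add_ps(acc_e, err);
        acc_ex = _mm512_fmadd_ps(err, xv, acc_ex);
    }
    *sum_err   = _mm512_reduce_add_ps(acc_e);   /* __m512 -> one float */
    *sum_err_x = _mm512_reduce_add_ps(acc_ex);
}

/* Scalar fallback for CPUs without AVX-512F. */
static void grad_sums_scalar(const float *x, const float *y, size_t n,
                             float theta0, float theta1,
                             float *sum_err, float *sum_err_x)
{
    float se = 0.0f, sex = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float err = theta0 + theta1 * x[i] - y[i];
        se  += err;
        sex += err * x[i];
    }
    *sum_err = se;
    *sum_err_x = sex;
}

/* Batch gradient descent for y ~ theta[0] + theta[1] * x.
   Both fitted parameters are written to theta[0] and theta[1]. */
void gradient_descent(const float *x, const float *y, size_t n,
                      float alpha, size_t num_iter, float theta[2])
{
    assert(n > 0 && theta != NULL);
    const int use_avx512 = __builtin_cpu_supports("avx512f");
    float t0 = 0.0f, t1 = 0.0f;
    for (size_t iter = 0; iter < num_iter; ++iter) {  /* one step per iteration */
        float se, sex;
        if (use_avx512)
            grad_sums_avx512(x, y, n, t0, t1, &se, &sex);
        else
            grad_sums_scalar(x, y, n, t0, t1, &se, &sex);
        /* dJ/dtheta0 = se / n and dJ/dtheta1 = sex / n, both from the old thetas */
        t0 -= alpha * se  / (float)n;
        t1 -= alpha * sex / (float)n;
    }
    theta[0] = t0;
    theta[1] = t1;
}
```

As a usage example, with 37 samples x[i] = i/37 and y[i] = 2 + 3·x[i], calling gradient_descent(x, y, 37, 0.5f, 5000, theta) should converge to roughly theta[0] ≈ 2 and theta[1] ≈ 3. Because only the AVX-512 helper carries __attribute__((target("avx512f"))), the file builds with plain gcc -O2, and the vector path is taken only on CPUs that report AVX-512F.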

 

Moderator

Hi,

 

Can you try with the Intel compiler and see whether the issue persists with icpc as well?

 

Command:

icpc -xCORE-AVX512 filename.cpp

 

Regards,

Rahul

 


When you paste code, please use the code button on the toolbar (it looks like </>), open the Markup pulldown, and select the source code format (C++).

Note (gripe to Intel): when the last line of the pasted code does not end with a line terminator, clicking OK returns to the main reply page with the newly pasted code still selected. If you then continue typing your reply, the selected text (the code) is deleted. To work around this, after inserting the code and clicking OK, press the right arrow key, then Enter.

mti,

A problem I see with your code is that it does not use the results it generates. Compilers now tend to optimize such unused computation away.

>> I tried defining theta[2] = {0,0}   // that is two scalars, not two vectors

__m512 theta[2];                  // two vectors of 16 floats each
theta[0] = _mm512_setzero_ps();
theta[1] = _mm512_setzero_ps();

Jim Dempsey
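About the garbage seen in gdb: _mm512_loadu_ps always reads 16 floats (64 bytes), so loading from a two-element float theta[2] also reads the 14 floats that follow the array in memory, which is undefined behavior. A minimal sketch, assuming the aim is to move exactly two floats into and out of an __m512: a masked load with mask 0x3 reads only lanes 0 and 1 and zeroes the rest, _mm512_storeu_ps copies all 16 lanes into an ordinary float[16] (the "__m512 to float*" direction), and a masked store writes back only the two lanes that belong to theta. The function names and the scalar fallback (so the sketch also runs without AVX-512F) are hypothetical helpers, not from the thread.

```c
#include <assert.h>
#include <immintrin.h>
#include <string.h>

/* Moves exactly two floats into an __m512, scales them, then copies the result
   out: all 16 lanes to lanes[], and only lanes 0-1 back to theta[]. */
__attribute__((target("avx512f")))
static void pair_through_m512_avx512(float theta[2], float factor, float lanes[16])
{
    const __mmask16 first_two = 0x3;                    /* bit k set => lane k active */
    __m512 v = _mm512_maskz_loadu_ps(first_two, theta); /* {t0, t1, 0, ..., 0}; nothing read past theta[1] */
    v = _mm512_mul_ps(v, _mm512_set1_ps(factor));
    _mm512_storeu_ps(lanes, v);                         /* __m512 -> float[16] */
    _mm512_mask_storeu_ps(theta, first_two, v);         /* writes theta[0] and theta[1] only */
}

/* Scalar fallback producing the same results on CPUs without AVX-512F. */
static void pair_through_m512_scalar(float theta[2], float factor, float lanes[16])
{
    memset(lanes, 0, 16 * sizeof(float));
    theta[0] *= factor;
    theta[1] *= factor;
    lanes[0] = theta[0];
    lanes[1] = theta[1];
}

void pair_through_m512(float theta[2], float factor, float lanes[16])
{
    assert(theta != NULL && lanes != NULL);
    if (__builtin_cpu_supports("avx512f"))
        pair_through_m512_avx512(theta, factor, lanes);
    else
        pair_through_m512_scalar(theta, factor, lanes);
}
```

With theta = {1.5f, -2.0f} and factor 2.0f, theta becomes {3.0f, -4.0f}, lanes[0] and lanes[1] hold the same two values, and lanes[2] through lanes[15] are 0.0f rather than whatever happened to follow theta in memory.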

Moderator

@mti,

 

Could you let me know if the above suggestions worked for you?

 

Thanks,

Rahul

 

Moderator

Hi,


I have not heard back from you, so I will go ahead and close this thread from my end. However, please note that this thread will remain open for community discussion. Feel free to post a new question if you still face any issues.


--Rahul

