mti

Beginner

07-06-2020
10:06 AM

__m512 to float*

Hi,

I am using gcc 9.1 on Linux to develop a gradient descent function with AVX-512, and I think I have the algorithm right. The problem is that I would like to return the optimal solution for both theta0 and theta1. According to the intrinsics documentation, an __m512 holds a vector of 16 floats. I tried defining theta[2] = {0,0}, but when I loaded it with _mm512_loadu_ps and inspected the loaded data in gdb, only the first two entries held the actual data; everything else was filled with garbage, which in turn corrupts the final result. The following is the code for the gradient descent:

    static inline float* avx512GradientDescent(float *_x, float *_y, float _alpha, size_t num_iter){
        float* thetas = (float *)aligned_alloc(ALIGNE , col*sizeof(float));
        trans(_x, xtrans);
        __m512 nsamples = _mm512_set1_ps(2*col);// broadcast to all 16 values
        __m512 samples = _mm512_set1_ps(col);// broadcast to all 16 values
        __m512 theta = _mm512_setzero_ps();
        //assert(col % 16 == 0);
        for(uint64_t i = 0; i < col; i += ALIGNE){
            __m512 hypothesis = _mm512_setzero_ps();
            __m512 loss = _mm512_setzero_ps();
            __m512 J = _mm512_setzero_ps();
            __m512 gradient = _mm512_setzero_ps();
            __m512 alpha = _mm512_set1_ps(_alpha);// broadcast to all 16 values
            __m512 xtemp = _mm512_loadu_ps(&(_x[i]));
            __m512 ytemp = _mm512_loadu_ps(&(_y[i]));
            __m512 xtranspose = _mm512_loadu_ps(&(xtrans[i]));
            for(uint64_t iter = 0; iter < num_iter; iter +=16)
            {
                hypothesis = _mm512_mul_ps(xtemp, theta);
                loss = _mm512_sub_ps(hypothesis, ytemp);
                J = _mm512_div_ps(_mm512_fmadd_ps(loss,loss, J), nsamples);
                gradient = _mm512_div_ps(_mm512_mul_ps(xtranspose, loss), samples);
                theta = _mm512_sub_ps(theta, _mm512_mul_ps(alpha, gradient));
            }
            _mm512_storeu_ps(thetas, theta);
        }
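
Editorial note on the garbage lanes described above: `_mm512_loadu_ps` always reads 16 consecutive floats, so pointing it at a 2-element array reads 14 floats past the end of that array. One way to load only the valid elements is a zero-masked load. The following is a minimal sketch, not code from the thread; the helper name `load_partial16` and the portable fallback for builds without AVX-512 are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* Hypothetical helper (not from the thread): loads the first n floats
 * (n <= 16) of src into a 16-lane result and zeroes the remaining lanes. */
static void load_partial16(const float *src, size_t n, float lanes[16])
{
#ifdef __AVX512F__
    /* Low n bits set: only those lanes are read from memory. */
    __mmask16 k = (__mmask16)((1u << n) - 1u);
    __m512 v = _mm512_maskz_loadu_ps(k, src);  /* masked-off lanes become 0 */
    _mm512_storeu_ps(lanes, v);
#else
    /* Portable fallback with the same observable result. */
    memset(lanes, 0, 16 * sizeof(float));
    memcpy(lanes, src, n * sizeof(float));
#endif
}
```

Calling `load_partial16(theta, 2, lanes)` on a `float theta[2]` leaves the two values in lanes 0 and 1 and zeros in lanes 2 to 15. With gcc the AVX-512 path is only compiled when the target enables it, for example with `-mavx512f`.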

4 Replies

RahulV_intel

Moderator

07-16-2020
04:54 AM

Hi,

Can you try the Intel compiler and see whether the issue persists with icpc as well?

Command:

`icpc -xCORE-AVX512 filename.cpp`

Regards,

Rahul

jimdempseyatthecove

Black Belt

07-16-2020
07:07 AM

When you paste code, please use the paste button on the toolbar (it looks like </>) and, in the Markup pulldown, select the source code format (C++).

Note (gripe to Intel): when the last line of the pasted code does not end with a line terminator, clicking OK returns to the main reply page with the newly pasted code selected. When you then continue typing your reply, the selected text (the code) is deleted. To work around this, after inserting the code and clicking OK, press the right arrow key, then Enter.

mti,

A problem I see with your code is that it does not use the results it generates. Compilers now tend to optimize such code out.

>> I tried defining theta[2] = {0,0} // struct of two int's

```
__m512 theta[2];
theta[0] = _mm512_setzero_ps();
theta[1] = _mm512_setzero_ps();
```
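
As an illustration of how two vector accumulators could be turned back into the two scalars theta0 and theta1, here is a minimal sketch. It assumes the goal is a least-squares fit of y ≈ theta0 + theta1 * x, which the thread does not state explicitly. The function name `fit_line`, its signature, and the scalar fallback are hypothetical and are not the original poster's code. The results are written to `out`, so the compiler cannot discard the computation.

```c
#include <stddef.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* Hypothetical helper: batch gradient descent for y ~ theta0 + theta1 * x.
 * Writes theta0 to out[0] and theta1 to out[1]. */
static void fit_line(const float *x, const float *y, size_t n,
                     float alpha, size_t num_iter, float out[2])
{
    float theta0 = 0.0f, theta1 = 0.0f;
    for (size_t iter = 0; iter < num_iter; ++iter) {
        float g0 = 0.0f, g1 = 0.0f;  /* sums of loss and loss * x */
        size_t i = 0;
#ifdef __AVX512F__
        __m512 acc[2] = { _mm512_setzero_ps(), _mm512_setzero_ps() };
        const __m512 t0 = _mm512_set1_ps(theta0);
        const __m512 t1 = _mm512_set1_ps(theta1);
        for (; i + 16 <= n; i += 16) {           /* 16 samples per step */
            __m512 xv = _mm512_loadu_ps(x + i);
            __m512 yv = _mm512_loadu_ps(y + i);
            /* loss = (theta1 * x + theta0) - y */
            __m512 loss = _mm512_sub_ps(_mm512_fmadd_ps(t1, xv, t0), yv);
            acc[0] = _mm512_add_ps(acc[0], loss);
            acc[1] = _mm512_fmadd_ps(loss, xv, acc[1]);
        }
        g0 = _mm512_reduce_add_ps(acc[0]);       /* horizontal sums */
        g1 = _mm512_reduce_add_ps(acc[1]);
#endif
        for (; i < n; ++i) {                     /* scalar tail or fallback */
            float loss = theta0 + theta1 * x[i] - y[i];
            g0 += loss;
            g1 += loss * x[i];
        }
        theta0 -= alpha * g0 / (float)n;
        theta1 -= alpha * g1 / (float)n;
    }
    out[0] = theta0;
    out[1] = theta1;
}
```

The vector lanes here run over samples, not over parameters: each parameter gets its own 16-lane accumulator, and a horizontal sum collapses it to one float per iteration. Without `-mavx512f` (or an equivalent target option) only the scalar loop is compiled.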

Jim Dempsey

RahulV_intel

Moderator

07-21-2020
03:16 AM

RahulV_intel

Moderator

08-04-2020
10:36 PM

Hi,

I have not heard back from you, so I will go ahead and close this thread from my end. However, please note that this thread will remain open for community discussion. Feel free to post a new question if you still face any issues.

--Rahul