Intel® ISA Extensions

Performing a two element broadcast/load in AVX

Bharat_N_
Beginner

I have the following problem:

Say, at location 'A', I have: c1 d1 c3 d3, which are all doubles (64-bit). I want to fill two registers, a00 and a01 with:

a00-> |d1|c1|d1|c1| ; a01-> |d3|c3|d3|c3|

That is, I want to broadcast the first two elements (c1, d1) into both 128-bit halves of register a00, and the next two elements (c3, d3) into both halves of register a01.

Currently, I'm doing it as follows:

[cpp]

a_t = _mm256_load_pd(A);                       // a_t[255:128] = |d3|c3| and a_t[127:0] = |d1|c1|

a00 = _mm256_permute2f128_pd(a_t, a_t, 0x00);  // a00[127:0] = a_t[127:0] and a00[255:128] = a_t[127:0]

a01 = _mm256_permute2f128_pd(a_t, a_t, 0x33);  // 0x33 == 51: a01[127:0] = a_t[255:128] and a01[255:128] = a_t[255:128]

[/cpp]

However, this takes one load and two permute instructions. Is it possible to do this in 2 instructions?

Thanks,

Bharat.

6 Replies
SergeyKostrov
Valued Contributor II
I've looked at the Intel manuals and I think this is not possible even with the broadcast instructions, such as VBROADCASTF128/I128 - Broadcast 128-Bit Data (page 5-177 in the Instruction Set Reference), because you're changing the order (!) from c1 d1 and c3 d3 to d1 c1 d1 c1 and d3 c3 d3 c3.
Bharat_N_
Beginner

Hi Sergey,

I am not changing the order.

To be clearer: A[0]=c1, A[1]=d1, A[2]=c3, A[3]=d3. I write registers with the highest element on the left, so when I say that register a00 should contain |d1|c1|d1|c1|, 'c1' is at a00[63:0]. That is the same position it would have had if I had done, say:

[cpp]

a00 = _mm256_load_pd(A);  // a00 -> |d3|c3|d1|c1|

[/cpp]

SergeyKostrov
Valued Contributor II
Hi Bharat, tomorrow I'll take a look at both cases you've described with _mm256_broadcast_pd.
SergeyKostrov
Valued Contributor II
>>...I think the _mm256_broadcast_pd() should work since the order remains the same...

Yes, and take a look at the screenshot.

>>...but don't I need to load it into a __m128 register first?..

No. Use a cast instead.

>>...Can I directly use the address and pass it as: _mm256_broadcast_pd((__m128 const* ) &addr);?? I tried this but the result doesn't come out correctly...

Here is a screenshot. Take a look at the values in the b00 / b01 and c00 / c01 variables. Do they look good to you?

avxbroadcastvalues.jpg
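
Since the screenshot is not reproduced here, the following is a minimal sketch of what the cast-based broadcast could look like. It is not Sergey's exact code. The helper name broadcast_pairs, the output arrays, and the AVX_TARGET macro are assumptions made for illustration. Note that _mm256_broadcast_pd takes a __m128d const* (the double-precision type), and each call typically compiles to a single VBROADCASTF128 with a memory operand, so the two registers are filled with two instructions and no separate load.

```cpp
#include <immintrin.h>

// Assumption: on GCC/Clang this attribute lets the function use AVX
// intrinsics even when the file is not compiled with -mavx.
#if defined(__GNUC__) || defined(__clang__)
#define AVX_TARGET __attribute__((target("avx")))
#else
#define AVX_TARGET
#endif

// Hypothetical helper. A points to four doubles {c1, d1, c3, d3}.
// out00 receives {c1, d1, c1, d1}, i.e. a00 = |d1|c1|d1|c1|.
// out01 receives {c3, d3, c3, d3}, i.e. a01 = |d3|c3|d3|c3|.
AVX_TARGET void broadcast_pairs(const double* A, double* out00, double* out01) {
    // Cast the address directly; no intermediate __m128d load is required.
    __m256d a00 = _mm256_broadcast_pd(reinterpret_cast<const __m128d*>(A));
    __m256d a01 = _mm256_broadcast_pd(reinterpret_cast<const __m128d*>(A + 2));

    // Unaligned stores, used here only so the result can be inspected.
    _mm256_storeu_pd(out00, a00);
    _mm256_storeu_pd(out01, a01);
}
```

Calling broadcast_pairs with A = {1.0, 2.0, 3.0, 4.0} should leave {1.0, 2.0, 1.0, 2.0} in out00 and {3.0, 4.0, 3.0, 4.0} in out01, which matches the layout requested in the original question.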
Bharat_N_
Beginner

Hi Sergey,

Yes, this is what I was referring to. Thanks! I'll try this and get back to you.

Bharat_N_
Beginner

Yeah, it works properly. Thanks!
