Intel® ISA Extensions

Performing a two element broadcast/load in AVX

Bharat_N_
Beginner
2,005 Views

I have the following problem:

Say, at location 'A', I have: c1 d1 c3 d3, which are all doubles (64-bit). I want to fill two registers, a00 and a01 with:

a00-> |d1|c1|d1|c1| ; a01-> |d3|c3|d3|c3|

That is, I want to broadcast the first two elements to register a00 and the next two elements to register a01.

Currently, I'm doing it as follows:

[cpp]

// One 256-bit load: a_t[255:128] = |d3|c3| and a_t[127:0] = |d1|c1|
a_t = _mm256_load_pd(A);

// imm8 = 0: both 128-bit lanes of a00 take a_t[127:0]
a00 = _mm256_permute2f128_pd(a_t, a_t, 0);

// imm8 = 51 (0x33): both 128-bit lanes of a01 take a_t[255:128]
a01 = _mm256_permute2f128_pd(a_t, a_t, 51);

[/cpp]

However, this takes one load and two permute instructions. Is it possible to do this in 2 instructions?

Thanks,

Bharat.

6 Replies
SergeyKostrov
Valued Contributor II
I've looked at the Intel manuals, and I think this is not possible even with broadcast instructions such as VBROADCASTF128/VBROADCASTI128 (Broadcast 128-Bit Data, page 5-177 in the Instruction Set Reference), because you're changing the order (!) from c1 d1 and c3 d3 to d1 c1 d1 c1 and d3 c3 d3 c3.
Bharat_N_
Beginner

Hi Sergey,

I am not changing the order.

To be clearer: A[0]=c1, A[1]=d1, A[2]=c3, A[3]=d3. When I say that register a00 should contain |d1|c1|d1|c1|, 'c1' is at a00[63:0], which is the same position it would have if I had done, say:

[cpp]

a00 = _mm256_load_pd(A); // a00 -> |d3|c3|d1|c1|

[/cpp]

SergeyKostrov
Valued Contributor II
Hi Bharat, I'll take a look at both cases with _mm256_broadcast_pd you've described tomorrow.
SergeyKostrov
Valued Contributor II
>>...I think the _mm256_broadcast_pd() should work since the order remains the same...

Yes, and take a look at the screenshot.

>>...but don't I need to load it into a __m128 register first?..

No. Use a cast instead.

>>...Can I directly use the address and pass it as: _mm256_broadcast_pd((__m128 const* ) &addr);?? I tried this but
>>the result doesn't come out correctly...

Here is a screenshot. Take a look at the values in the b00 / b01 and c00 / c01 variables. Do they look good to you?

Attachment: avxbroadcastvalues.jpg
Bharat_N_
Beginner

Hi Sergey,

Yes, this is what I was referring to. Thanks! I'll try this and get back to you.

Bharat_N_
Beginner

Yeah, it works properly. Thanks!
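
For readers who cannot see the screenshot, here is a minimal sketch of the two-broadcast approach discussed in this thread. It is an illustration under assumptions, not the code from the screenshot: the helper name `broadcast_pairs` and the output arrays are hypothetical, and the `target("avx")` attribute assumes GCC or Clang (with other compilers, enable AVX through the usual compiler option instead). Each `_mm256_broadcast_pd` call takes a pointer to 128 bits in memory, so no separate 128-bit register load is needed; the pointer is simply cast to `__m128d const*`.

```cpp
#include <cassert>
#include <immintrin.h>

// Hypothetical helper, not from the thread: fills out00 with |A[1]|A[0]|A[1]|A[0]|
// and out01 with |A[3]|A[2]|A[3]|A[2]| (highest element written first in this notation).
// Assumption: GCC/Clang, where the attribute enables AVX for this one function.
#if defined(__GNUC__)
__attribute__((target("avx")))
#endif
void broadcast_pairs(const double* A, double* out00, double* out01)
{
    assert(A && out00 && out01);

    // Broadcast the 128 bits at A (A[0], A[1]) into both lanes of a00.
    __m256d a00 = _mm256_broadcast_pd(reinterpret_cast<const __m128d*>(A));

    // Broadcast the 128 bits at A + 2 (A[2], A[3]) into both lanes of a01.
    __m256d a01 = _mm256_broadcast_pd(reinterpret_cast<const __m128d*>(A + 2));

    // Unaligned stores, so the output arrays need no special alignment.
    _mm256_storeu_pd(out00, a00);
    _mm256_storeu_pd(out01, a01);
}
```

With A = {c1, d1, c3, d3}, out00 should hold {c1, d1, c1, d1} and out01 should hold {c3, d3, c3, d3} in memory order, which matches the |d1|c1|d1|c1| and |d3|c3|d3|c3| register layout asked for in the original question, using two broadcast loads instead of one load and two permutes.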
