Intel® ISA Extensions

Performing a two element broadcast/load in AVX

Bharat_N_
Beginner
2,000 Views

I have the following problem:

Say, at location 'A', I have: c1 d1 c3 d3, which are all doubles (64-bit). I want to fill two registers, a00 and a01 with:

a00-> |d1|c1|d1|c1| ; a01-> |d3|c3|d3|c3|

i.e., I want to broadcast the first two elements to register a00 and the next two elements to register a01.

Currently, I'm doing it as follows:

[cpp]

a_t = _mm256_load_pd(A); // a_t[255:128] = |d3|c3| and a_t[127:0] = |d1|c1| (A must be 32-byte aligned)

a00 = _mm256_permute2f128_pd(a_t, a_t, 0); // a00[127:0] = a_t[127:0] and a00[255:128] = a_t[127:0]

a01 = _mm256_permute2f128_pd(a_t, a_t, 51); // 51 = 0x33: a01[127:0] = a_t[255:128] and a01[255:128] = a_t[255:128]

[/cpp]

However, this takes one load and two permute instructions. Is it possible to do this in 2 instructions?

Thanks,

Bharat.

0 Kudos
6 Replies
SergeyKostrov
Valued Contributor II
2,000 Views
I've looked at the Intel manuals, and I don't think this is possible even with broadcast instructions such as VBROADCASTF128/I128 (Broadcast 128-Bit Data, page 5-177 of the Instruction Set Reference), because you're changing the order (!) from c1 d1 and c3 d3 to d1 c1 d1 c1 and d3 c3 d3 c3.
0 Kudos
Bharat_N_
Beginner
2,000 Views

Hi Sergey,

I am not changing the order.

To be clearer: A[0]=c1, A[1]=d1, A[2]=c3, A[3]=d3. When I say that register a00 should contain |d1|c1|d1|c1|, 'c1' is at a00[63:0], which is the same position it would have had if I had done, say:

[cpp]

a00 = _mm256_load_pd(A); // a00 -> |d3|c3|d1|c1|

[/cpp]
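
The register notation above is written from the highest bits on the left to the lowest on the right, so the first element in memory lands in bits [63:0]. A minimal sketch that checks this, assuming a hypothetical helper named `lane_low_element` (not from the thread) and an AVX-capable CPU:

```cpp
#include <cassert>
#include <immintrin.h>

// Hypothetical helper (not from the thread): loads four doubles from A and
// returns the double held in the lowest 64 bits of one 128-bit lane.
// upper == false selects bits [127:0]; upper == true selects bits [255:128].
#if defined(__GNUC__)
__attribute__((target("avx")))  // lets GCC/Clang build this without -mavx
#endif
double lane_low_element(const double* A, bool upper) {
    assert(A != nullptr);
    __m256d v = _mm256_loadu_pd(A);  // unaligned load, A[0] goes to bits [63:0]
    __m128d half = upper ? _mm256_extractf128_pd(v, 1)   // bits [255:128]
                         : _mm256_castpd256_pd128(v);    // bits [127:0]
    return _mm_cvtsd_f64(half);      // lowest double of the selected lane
}
```

With A = {c1, d1, c3, d3}, the lower lane should yield c1 and the upper lane c3, which matches the |d3|c3|d1|c1| picture.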

0 Kudos
SergeyKostrov
Valued Contributor II
2,000 Views
Hi Bharat, I'll take a look at both cases with _mm256_broadcast_pd you've described tomorrow.
0 Kudos
SergeyKostrov
Valued Contributor II
2,000 Views
>>...I think the _mm256_broadcast_pd() should work since the order remains the same...

Yes, and take a look at the screenshot.

>>...but don't I need to load it into a __m128 register first?..

No. Use a cast instead.

>>...Can I directly use the address and pass it as: _mm256_broadcast_pd((__m128 const* ) &addr);?? I tried this but
>>the result doesn't come out correctly...

Here is a screenshot. Take a look at the values in the b00 / b01 and c00 / c01 variables: do they look good to you?

avxbroadcastvalues.jpg
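
Since the screenshot is not reproduced here, the cast-based approach can be sketched roughly as follows. This is an illustration under assumptions, not the exact code from the screenshot: the function name `broadcast_pairs` and the output buffers are hypothetical, and the pointer is cast to `__m128d const*`, which is the parameter type `_mm256_broadcast_pd` takes for doubles.

```cpp
#include <cassert>
#include <immintrin.h>

// Hypothetical helper (not from the thread): broadcasts A[0..1] into every
// 128-bit lane of one register and A[2..3] into every lane of another, then
// stores both registers so the result can be inspected.
#if defined(__GNUC__)
__attribute__((target("avx")))  // lets GCC/Clang build this without -mavx
#endif
void broadcast_pairs(const double* A, double* out00, double* out01) {
    assert(A != nullptr && out00 != nullptr && out01 != nullptr);
    // Each call loads 128 bits straight from memory and duplicates them into
    // both lanes (VBROADCASTF128), so no separate load or permute is needed.
    __m256d a00 = _mm256_broadcast_pd(reinterpret_cast<const __m128d*>(A));
    __m256d a01 = _mm256_broadcast_pd(reinterpret_cast<const __m128d*>(A + 2));
    _mm256_storeu_pd(out00, a00);  // {A[0], A[1], A[0], A[1]}
    _mm256_storeu_pd(out01, a01);  // {A[2], A[3], A[2], A[3]}
}
```

With A = {c1, d1, c3, d3}, a00 holds |d1|c1|d1|c1| and a01 holds |d3|c3|d3|c3|, using two instructions instead of one load plus two permutes.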
0 Kudos
Bharat_N_
Beginner
2,000 Views

Hi Sergey,

Yes, this is what I was referring to. Thanks! I'll try this and get back to you.

0 Kudos
Bharat_N_
Beginner
2,000 Views

Yeah, it works properly. Thanks!

0 Kudos