Do you need to call _mm_load_pd, or is it kosher to cast an array to __m128d* and then call _mm_add_pd (etc.) on elements of that array? I've tested this out, and it seems that the casting approach works and is an order of magnitude faster than calling _mm_load_pd.
Some example code below:
double *r = (double*)_aligned_malloc(sizeof(double)*SIZE, SIMD_WORD_SIZE_BYTES);
double *d1 = (double*)_aligned_malloc(sizeof(double)*SIZE, SIMD_WORD_SIZE_BYTES);
double *d2 = (double*)_aligned_malloc(sizeof(double)*SIZE, SIMD_WORD_SIZE_BYTES);
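for (int x=0;x<SIZE;x++) {
    r[x] = 0;
    d1[x] = x;
    d2[x] = x/2;
}
__m128d *md1 = (__m128d*)d1;
__m128d *md2 = (__m128d*)d2;
/* each __m128d holds two doubles, so there are SIZE/2 vector elements */
for (int x=0;x<SIZE/2;x++) {
    __m128d mr = _mm_add_pd(*(md1+x), *(md2+x));
    _mm_store_pd(r+(2*x),mr);
}
_aligned_free(r);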
_aligned_free(d1);
_aligned_free(d2);
2 Replies
There should be no difference: the compiler is smart enough to replace = with movaps/movdqa, but this only works if the pointers are aligned, obviously. If you are unsure, check the generated assembly code.
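To make the comparison concrete, here is a minimal sketch of the two variants side by side. The function names add_with_load and add_with_cast are made up for illustration and are not from the original post. Both assume 16-byte-aligned buffers and an even element count, and the asserts only check those assumptions in debug builds.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: __m128d, _mm_load_pd, _mm_add_pd, _mm_store_pd */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns nonzero if p is 16-byte aligned. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}

/* Variant 1: explicit aligned loads through _mm_load_pd. */
static void add_with_load(double *r, const double *a, const double *b, size_t n)
{
    assert(n % 2 == 0);
    assert(is_aligned16(r) && is_aligned16(a) && is_aligned16(b));
    for (size_t i = 0; i < n; i += 2) {
        __m128d va = _mm_load_pd(a + i);
        __m128d vb = _mm_load_pd(b + i);
        _mm_store_pd(r + i, _mm_add_pd(va, vb));
    }
}

/* Variant 2: cast to __m128d* and dereference; the compiler emits the
   aligned moves itself, so misaligned pointers would fault here too. */
static void add_with_cast(double *r, const double *a, const double *b, size_t n)
{
    assert(n % 2 == 0);
    assert(is_aligned16(r) && is_aligned16(a) && is_aligned16(b));
    const __m128d *va = (const __m128d *)a;
    const __m128d *vb = (const __m128d *)b;
    __m128d *vr = (__m128d *)r;
    for (size_t i = 0; i < n / 2; ++i)
        vr[i] = _mm_add_pd(va[i], vb[i]);
}
```

Compiling both with optimization enabled and comparing the assembly listings (for example with gcc -O2 -S, or /FA in VC) is the quickest way to confirm whether your compiler treats them the same.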
Intel C is sometimes more flexible about this than other compilers. If you have something which works with VC and gcc, you probably have it covered.
As to performance of the alternatives, it seemed to make less difference in my tests on Core i7 in 64-bit mode than on some older platforms.
