_mm_lddqu_si128 and _mm_loadu_si128

Hi,
I would like to ask how much improvement I can get by replacing _mm_loadu_si128 with _mm_lddqu_si128 on a 64-bit machine. I wrote a simple program to compare the two load intrinsics, but I could not see any improvement at all. As I understand it, _mm_lddqu_si128 handles unaligned loads better than _mm_loadu_si128. The following is my test code. Any comments or advice are appreciated!
----------------------------------
time1 = get_time();
srand(time(0));
for(i=0; i<999999; i++)
{
    k = rand();
    t1 = _mm_loadu_si128((__m128i*)(array+k)); // array is NOT 16-byte aligned
    //t1 = _mm_lddqu_si128((__m128i*)(array+k));
}
time2 = get_time();
printf("Total Time = %8.4lfms\n", (time2-time1)*1000);
-----------------------------------
Thanks,
Ivan
1 Reply

There used to be a difference on the Pentium 4; on modern processors there is none.
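
For anyone repeating the test: the loop in the question never uses t1, so an optimizing compiler may drop the loads entirely and the timing would then say nothing about either intrinsic. Below is a minimal sketch of a fairer comparison, assuming a byte buffer. The names xor_loadu, xor_lddqu, fold128 and USE_SSE3 are made up for illustration and are not from the original post. Each function XORs every loaded value into an accumulator so the loads cannot be discarded, and the two results should match because both intrinsics read the same 16 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128 */
#include <pmmintrin.h>  /* SSE3: _mm_lddqu_si128 */

/* GCC and Clang require SSE3 to be enabled for the function that calls
   _mm_lddqu_si128; other compilers are assumed to accept it as-is. */
#if defined(__GNUC__) || defined(__clang__)
#define USE_SSE3 __attribute__((target("sse3")))
#else
#define USE_SSE3
#endif

/* Fold a 128-bit accumulator down to 32 bits so the result is observable. */
static uint32_t fold128(__m128i v)
{
    uint32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, v);
    return lanes[0] ^ lanes[1] ^ lanes[2] ^ lanes[3];
}

/* XOR together the 16-byte loads taken at every byte offset that fits
   in buf[0..len-1], using the SSE2 unaligned load. */
uint32_t xor_loadu(const unsigned char *buf, size_t len)
{
    __m128i acc = _mm_setzero_si128();
    for (size_t k = 0; k + 16 <= len; k++)
        acc = _mm_xor_si128(acc, _mm_loadu_si128((const __m128i *)(buf + k)));
    return fold128(acc);
}

/* Same traversal, using the SSE3 unaligned load. */
USE_SSE3 uint32_t xor_lddqu(const unsigned char *buf, size_t len)
{
    __m128i acc = _mm_setzero_si128();
    for (size_t k = 0; k + 16 <= len; k++)
        acc = _mm_xor_si128(acc, _mm_lddqu_si128((const __m128i *)(buf + k)));
    return fold128(acc);
}
```

To time it, call each function on the same buffer between two get_time() calls, as in the question, and print the returned value so the compiler has to keep the work. Stepping one byte at a time also covers every misalignment, including loads that cross a cache line.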