Hi,
I would like to ask how much improvement I can expect from replacing _mm_loadu_si128 with _mm_lddqu_si128 on a 64-bit machine. I wrote a simple program to compare the two load instructions, but I could not see any improvement at all. As I understand it, _mm_lddqu_si128 handles unaligned loads better than _mm_loadu_si128. My test code follows. Any comments or advice are appreciated!
----------------------------------
time1 = get_time();
srand(time(0));
for(i=0; i<999999; i++)
{
    k = rand();
    t1 = _mm_loadu_si128((__m128i*)(array+k)); // array is NOT 16-byte aligned
    //t1 = _mm_lddqu_si128((__m128i*)(array+k));
}
time2 = get_time();
printf("Total Time = %8.4lfms\n", (time2-time1)*1000);
-----------------------------------
Thanks,
Ivan
1 Reply
There used to be a difference on the Pentium 4; on modern processors there is no difference.
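For anyone who wants to check this on their own hardware, here is a minimal sketch of a fairer comparison. It is an illustration, not the original poster's code. The loop in the question discards t1, so an optimizing compiler may drop the load entirely, and the rand() call is likely to cost more than the load itself. The sketch XORs every loaded vector into an accumulator so the loads cannot be removed, and takes its offsets from a small linear congruential generator that stays inside the buffer. The names checksum_loadu, checksum_lddqu, next_offset and fold are hypothetical helpers, and the target("sse3") attribute assumes GCC or Clang on x86.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128 */
#include <pmmintrin.h>  /* SSE3: _mm_lddqu_si128 */

/* Assumption: GCC or Clang. Enables SSE3 for one function without -msse3. */
#if defined(__GNUC__)
#define TARGET_SSE3 __attribute__((target("sse3")))
#else
#define TARGET_SSE3
#endif

/* Cheap pseudo-random offset in [0, limit), far cheaper than rand(). */
static uint32_t next_offset(uint32_t *state, uint32_t limit)
{
    *state = *state * 1664525u + 1013904223u;  /* simple LCG step */
    return (*state >> 8) % limit;
}

/* Fold a 128-bit vector into 64 bits so the result can be compared. */
static uint64_t fold(__m128i v)
{
    uint64_t lanes[2];
    memcpy(lanes, &v, sizeof lanes);
    return lanes[0] ^ lanes[1];
}

/* XOR together `iters` unaligned 16-byte loads issued with MOVDQU. */
static uint64_t checksum_loadu(const uint8_t *buf, size_t len,
                               size_t iters, uint32_t seed)
{
    assert(len >= 16);
    uint32_t limit = (uint32_t)(len - 15);  /* valid offsets: 0 .. len-16 */
    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < iters; i++) {
        uint32_t k = next_offset(&seed, limit);
        acc = _mm_xor_si128(acc, _mm_loadu_si128((const __m128i *)(buf + k)));
    }
    return fold(acc);
}

/* Same loop, but each load is issued with LDDQU. */
TARGET_SSE3
static uint64_t checksum_lddqu(const uint8_t *buf, size_t len,
                               size_t iters, uint32_t seed)
{
    assert(len >= 16);
    uint32_t limit = (uint32_t)(len - 15);
    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < iters; i++) {
        uint32_t k = next_offset(&seed, limit);
        acc = _mm_xor_si128(acc, _mm_lddqu_si128((const __m128i *)(buf + k)));
    }
    return fold(acc);
}
```

To compare the two, call each function with the same buffer, iteration count and seed, wrap each call in whatever timer you already use (get_time() in the question), and print the returned checksum so the compiler has to keep the work. Both functions should return the same checksum, since the two instructions load the same bytes and differ only in how the hardware may perform the access.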
