What are the performance benefits of using vload4 instead of loading the data one element at a time if the buffers are not aligned on a float4 boundary? On the other hand, if the buffers are aligned on a float4 boundary, is there a performance penalty for using vload4 instead of dereferencing a float4 pointer (*float4Ptr)?
Thanks in advance
3 Replies
According to the spec, the behavior is undefined if the address you load from with vloadn is not correctly aligned. The vloadn functions take two arguments, an offset and a base pointer, so the address p + offset*n must be aligned.
For the second part of your question: if your buffers are aligned (for float4 that means 16-byte alignment), there should be no difference in performance.
Thanks,
Raghu
As per the spec, the start address passed to vloadn for float data only needs to be 4-byte aligned; it is not required to be 16-byte aligned. Please correct me if I am wrong. I would like to know the performance benefit of using vloadn when the buffer address is aligned on a float boundary but not on a float4 boundary.
Thanks.
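To make the two alignment cases concrete, here is a minimal host-side sketch in plain C (not OpenCL C). The helpers `is_aligned` and `vload4_address` are hypothetical names introduced only for illustration; the sketch just computes the address that `vload4(offset, p)` would read from and checks it against a 4-byte and a 16-byte boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if p is a multiple of `alignment` bytes. */
static bool is_aligned(const void *p, size_t alignment) {
    assert(alignment != 0); /* a zero alignment makes no sense here */
    return ((uintptr_t)p % alignment) == 0;
}

/* Hypothetical helper: the address vload4(offset, p) reads from,
 * i.e. p + offset * 4 counted in float elements. */
static const float *vload4_address(size_t offset, const float *p) {
    return p + offset * 4;
}
```

With a 16-byte aligned buffer `buf`, `vload4_address(0, buf + 1)` is aligned to `sizeof(float)` but not to 16 bytes, which is the situation asked about above.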
Sorry, I misread your original post.
Yes, vloadn requires the address (address + offset*n) to be aligned to sizeof(gentype). If the data is already aligned to 16 bytes, I don't think there is any performance difference between the two approaches. If the data is only aligned to a float boundary, you have to use vload4, since dereferencing a float4 pointer requires 16-byte alignment.
Thanks,
Raghu
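The difference between the two access styles can be sketched on the host in plain C. This is an emulation under stated assumptions, not OpenCL C: `float4_t`, `load4_unaligned`, and `load4_aligned` are hypothetical stand-ins for `float4`, `vload4(offset, p)`, and `*float4Ptr`, and the sketch says nothing about how a given device implements either load.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the OpenCL float4 type. */
typedef struct { float x, y, z, w; } float4_t;

/* Emulates vload4(offset, p): reads four floats starting at
 * p + offset * 4. Only float (4-byte) alignment is needed. */
static float4_t load4_unaligned(size_t offset, const float *p) {
    float4_t v;
    memcpy(&v, p + offset * 4, sizeof v);
    return v;
}

/* Emulates *float4Ptr: the precondition is 16-byte alignment,
 * checked here with an assert instead of being left undefined. */
static float4_t load4_aligned(const float *p) {
    assert(((uintptr_t)p % 16) == 0);
    float4_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

For a 16-byte aligned array `buf`, `load4_unaligned(1, buf + 1)` reads elements 5 through 8, while `load4_aligned(buf + 1)` would trip the assert because `buf + 1` is only float-aligned.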