Hello,
Is the following code sample for adding vectors correct? Can I make it even faster using vectorized operations?
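void vectorAdd(float *a, float *b, float *r, int size)
{
    /* Copy a and b to the coprocessor; r is copied in and back out. */
    #pragma offload target(mic) in(a:length(size)) in(b:length(size)) inout(r:length(size))
    /* i is declared in the loop header, so it is already private to each thread. */
    #pragma omp parallel for shared(a,b,r)
    for (int i = 0; i < size; ++i)
    {
        r[i] = a[i] + b[i];
    }
}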
1 Reply
To get auto-vectorization, you probably need to declare the pointers as float * restrict a, float * restrict b, float * restrict r (or use one of the ivdep pragmas). Alignment would also help, provided you make all the OpenMP chunks a multiple of 32.
A single offloaded vector operation like this would spend the majority of its time on data transfer rather than on the addition itself.
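As a rough illustration of the restrict suggestion, here is a minimal host-only sketch. The function name vector_add_restrict is hypothetical, the const qualifiers and the assert are additions for illustration, and the offload pragma is left out so the snippet builds with any C99 compiler. Whether the loop actually vectorizes depends on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variant of vectorAdd: restrict promises the compiler that
   a, b, and r never overlap, which removes the aliasing concern that can
   otherwise block auto-vectorization of the loop. */
void vector_add_restrict(const float *restrict a, const float *restrict b,
                         float *restrict r, int size)
{
    assert(a != NULL && b != NULL && r != NULL);

    /* This pragma is ignored unless OpenMP is enabled at compile time. */
    #pragma omp parallel for
    for (int i = 0; i < size; ++i) {
        r[i] = a[i] + b[i];
    }
}
```

Calling it with overlapping buffers (for example, r pointing into a) would break the restrict promise and is undefined behavior, so this form only fits when the three arrays are known to be distinct.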