I am testing some simple vectorization cases, such as the one below (some non-essential rows are removed):
#include <stdio.h>
#include <stdlib.h>
// get_timestamp_now() and the other removed rows are in the complete code linked below.
#define SIZE 134217728
#define ITER 128
void linear_func(const unsigned int arr[], unsigned int results[], const size_t arr_len) {
    unsigned int a = rand() % 1024;
    unsigned int b = rand() % 1024;
    // size_t index avoids a signed/unsigned comparison with arr_len.
    for (size_t i = 0; i < arr_len; ++i) {
        results[i] = a * arr[i] + b;
    }
}
int main() {
    unsigned int* arr = malloc(SIZE * sizeof(unsigned int));
    unsigned int* results = malloc(SIZE * sizeof(unsigned int));
    double* elapsed_times = malloc(ITER * sizeof(double));
    unsigned long avg_et = 0; // accumulator; its declaration is not shown in the original excerpt
    for (int i = 0; i < SIZE; ++i) {
        arr[i] = rand() % SIZE;
    }
    for (int j = 0; j < ITER; ++j) {
        unsigned long long start_time = get_timestamp_now();
        linear_func(arr, results, SIZE);
        elapsed_times[j] = get_timestamp_now() - start_time;
        avg_et += (unsigned long)elapsed_times[j];
        printf("%.0lfms(%u), ", elapsed_times[j], results[rand() % SIZE]);
        // Pick and print one element from results so that even the smartest compiler can't optimize the loop away.
    }
    printf("\n");
    printf("Average: %lums\n", avg_et / ITER);
    return 0;
}
If I compile it with
it works and takes around 60 ms to finish one iteration. By comparison, if I add the -no-vec flag to the command, each iteration takes around 100 ms. So far so good.
However, if I move the loop function into separate files, `func.c` and `func.h`, and compile them with:
it still compiles, but each iteration takes 100 ms to complete, which is basically the same as the non-vectorized version.
What's wrong here?
If the complete code is needed, it can be found here:
1st approach, function and main() in the same file: https://github.com/alex-lt-kong/the-nitty-gritty/tree/main/c/15_vectorization/1_hello-world
2nd approach, function and main() in different files: https://github.com/alex-lt-kong/the-nitty-gritty/tree/main/c/15_vectorization/2_vectorization-in-o-files
Hi,
Thank you for posting in Intel Communities.
Could you please provide the details below so that we can investigate your issue further?
1. The OS version
2. The Base toolkit version
Thanks & Regards
Priyanshu
In the working case, the compiler can see the defined macro SIZE (and see that the call passes SIZE as the length), and thus might choose to vectorize the loop.
I suggest inserting
#pragma vector always
or
#pragma simd
in front of the for loop in linear_func.
Also, consider performing aligned allocation of your arrays (_aligned_malloc/_aligned_free).
Jim Dempsey
Hi @jimdempseyatthecove ,
I actually tested what you suggested and a few other variants; none of them seemed to help.
After some investigation, I found that the issue is not vectorization: I checked the disassembly, and both versions are vectorized in an almost identical manner. I believe this Stack Overflow post (https://stackoverflow.com/questions/18159455/why-vectorizing-the-loop-does-not-have-performance-improvement) explains what I observed. In my code above, I load a large amount of data from memory and use each element only once, so memory I/O becomes the bottleneck. I reduced the size of the arrays so that they fit into the CPU caches and re-ran the test; both the separately linked version and the single-file version then show a large improvement when vectorization is turned on.
Thanks,
Alex
Hi,
We assume that your issue is resolved. Could you please confirm whether we can close this issue from our end?
Thanks & Regards,
Priyanshu Kumar
Hi,
Yes, the issue has been resolved. Please close the ticket.
Thanks.
Hi,
Thanks for the confirmation.
As your issue is resolved, we will no longer respond to this thread.
If you need any additional information, please post a new question.
Thanks,
Priyanshu
