Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

vectorization not improving performance if linked from an object file

alexkong
Novice

I am testing some simple vectorization cases, such as the one below (some non-essential lines are removed):

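#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SIZE 134217728
#define ITER 128

// Stand-in for the timing helper removed from this excerpt; it is assumed
// to return the current time in milliseconds.
unsigned long long get_timestamp_now(void) {
  struct timespec ts;
  timespec_get(&ts, TIME_UTC);
  return ts.tv_sec * 1000ULL + ts.tv_nsec / 1000000;
}

void linear_func(const unsigned int arr[], unsigned int results[], const size_t arr_len) {
  unsigned int a = rand() % 1024;
  unsigned int b = rand() % 1024;
  // size_t index matches the type of arr_len (avoids a signed/unsigned comparison)
  for (size_t i = 0; i < arr_len; ++i) {
    results[i] = a * arr[i] + b;
  }
}

int main(void) {
  unsigned int* arr = malloc(SIZE * sizeof(unsigned int));
  unsigned int* results = malloc(SIZE * sizeof(unsigned int));
  double* elapsed_times = malloc(ITER * sizeof(double));
  if (arr == NULL || results == NULL || elapsed_times == NULL) {
    return 1;
  }
  for (int i = 0; i < SIZE; ++i) {
    arr[i] = rand() % SIZE;
  }

  double avg_et = 0;  // running sum of per-iteration times, divided by ITER below
  for (int j = 0; j < ITER; ++j) {
    unsigned long long start_time = get_timestamp_now();
    linear_func(arr, results, SIZE);
    elapsed_times[j] = get_timestamp_now() - start_time;
    avg_et += elapsed_times[j];
    printf("%.0fms(%u), ", elapsed_times[j], results[rand() % SIZE]);
    // We pick and print one element from results so that even the smartest
    // compiler can't optimize the loop away.
  }
  printf("\n");
  printf("Average: %.0fms\n", avg_et / ITER);
  free(arr);
  free(results);
  free(elapsed_times);
  return 0;
}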

 

If I compile it with

icc main.c -o main.out -g -O3 -lm

it works, and each iteration takes around 60ms. For comparison, if I add the -no-vec flag to the command, each iteration takes around 100ms. So far so good.

 

However, if I move the loop function into separate files, `func.c` and `func.h`, and compile them with:

icc -c func.c -O3
icc main.c func.o -o main.out -g -O3 -lm

it still compiles, but each iteration takes around 100ms to complete, which is basically the same as the non-vectorized version.

 

What's wrong here?

If the complete code is needed, it can be found here:

1st approach, function and main() in the same file: https://github.com/alex-lt-kong/the-nitty-gritty/tree/main/c/15_vectorization/1_hello-world

2nd approach, function and main() in different files: https://github.com/alex-lt-kong/the-nitty-gritty/tree/main/c/15_vectorization/2_vectorization-in-o-files

 

6 Replies
PriyanshuK_Intel
Moderator

Hi,

 

Thank you for posting in Intel Communities.

 

Could you please provide the following details so that we can investigate your issue further?

 1. The OS version

 2. The Base toolkit version


Thanks & Regards

Priyanshu


jimdempseyatthecove
Honored Contributor III

In the working case, the compiler can see the defined macro SIZE (and can see that the call passes SIZE), and thus might choose to vectorize the loop.

 

I suggest inserting

     #pragma vector always

or

    #pragma simd

in front of the for loop in linear_func.

 

Also, consider performing aligned allocation of your arrays (_aligned_malloc/_aligned_free).

 

Jim Dempsey

alexkong
Novice

Hi @jimdempseyatthecove ,

 

I actually tested what you suggested and a few other variants; none of them seems to help.

 

After some investigation, it turns out the issue is not vectorization: I checked the disassembly, and both versions are vectorized in an almost identical manner. I believe this post on Stack Overflow (https://stackoverflow.com/questions/18159455/why-vectorizing-the-loop-does-not-have-performance-improvement) explains what I observed. In my code above, I stream a very large array through memory and use each element only once, so memory bandwidth, not computation, becomes the bottleneck. When I reduced the array size so that it fits into the CPU caches and re-ran the test, both the linked version and the single-file version showed a large improvement after turning on vectorization.

 

Thanks,

Alex

PriyanshuK_Intel
Moderator

Hi, 


We assume that your issue is resolved. Could you please confirm whether we can close this issue from our end?


Thanks & Regards,

Priyanshu Kumar



alexkong
Novice

Hi,

Yes, the issue has been resolved. Please close the ticket.

Thanks.

PriyanshuK_Intel
Moderator

Hi,


Thanks for the confirmation.

As your issue is resolved, we will no longer respond to this thread.

If you need any additional information, please post a new question.


Thanks,

Priyanshu


