I posted this question on StackOverflow. Thought I'd repost it here.
I have a C++ function written with OpenMP. Its outer loop is parallelized with a static schedule and a few private variables, though I don't think the precise nature of the function matters here. When I compile it into a dynamic library with g++:
g++ -fopenmp -shared -fPIC -O3 -march=native testing.cpp -o test.so
everything works correctly downstream. When I compile it into a dynamic library with the equivalent icc command:
icc -qopenmp -shared -fPIC -O3 -march=native testing.cpp -o test.so
the function executes in about the same time but gives the wrong result. Any ideas?
To reproduce: clone https://github.com/marsupialtail/icc-problem and run compile.sh with either icc or gcc. Comparing the two builds shows that the results don't agree.
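The post does not say how far apart the two results are. As a rough sketch (the helper name `max_abs_diff` and the idea of dumping both outputs into vectors are assumptions, not part of the linked repository), one way to quantify the disagreement between the gcc and icc builds is to take the largest element-wise difference. A tiny difference may only reflect different floating-point evaluation order under -O3, while a large one points to a real bug.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: largest absolute element-wise difference between
// two result buffers (e.g. one from the gcc build, one from the icc build).
// Returns +infinity when the buffers have different lengths.
float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b) {
    if (a.size() != b.size()) {
        return std::numeric_limits<float>::infinity();
    }
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}
```

Reporting this number alongside the compiler versions would make the discrepancy easier for others to judge.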
Hi,
Thank you for reporting this behavior. We need a few more details:
- Does this behavior occur only with dynamic libraries, or also with normal executables?
- Please send us a minimal reproducer that shows the problem, so that we can raise it as a bug.
- Please also tell us the processor and compiler version you used to test the reproducer, so that we can reproduce it on our end.
Warm Regards,
Abhishek
While I haven't attempted to build and run your program, perhaps you can explain:
..., const float * __restrict__ BC, ...
...
#pragma omp parallel for schedule(static) private(ACC,RC,val,zero)
for(int C_block = 0; C_block < 28; C_block ++){
int C_offset = C_block * (12544 / 28);
...
for(int lane =0; lane < Tsz; lane += 4){
RC = _mm256_load_ps(&BC[0 + C_offset + lane]);
...
Your incoming arrays are float.
You are using _mm256 intrinsics, which have a SIMD vector width of 8 floats,
while your lane loop advances by only 4 lanes per iteration.
Is this a programming error (in other words, did you take SSE code and "convert" it to AVX code)?
Jim Dempsey
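To illustrate the concern above, here is a scalar stand-in for a vectorized reduction, not the poster's actual kernel. The function name `sum_in_chunks` and its parameters are hypothetical. Each "load" reads `width` consecutive floats, and the loop advances by `stride`. When the stride is smaller than the width, neighbouring loads overlap and elements are processed more than once.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical scalar model of a SIMD loop: every iteration reads `width`
// consecutive floats starting at `lane`, then advances `lane` by `stride`.
// Returns the sum of every element touched, counting repeats.
float sum_in_chunks(const std::vector<float>& data,
                    std::size_t width, std::size_t stride) {
    if (stride == 0) {
        return 0.0f;  // avoid an endless loop on a zero stride
    }
    float total = 0.0f;
    // Unlike a raw intrinsic load, this model stops before reading past the end.
    for (std::size_t lane = 0; lane + width <= data.size(); lane += stride) {
        for (std::size_t i = 0; i < width; ++i) {
            total += data[lane + i];
        }
    }
    return total;
}
```

With 16 elements all equal to 1.0f, a width of 8 and a stride of 8 touch each element once and give 16. A width of 8 with a stride of 4 issues loads at offsets 0, 4, and 8 and gives 24, because the middle elements are counted twice.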
Hi,
In the code, the increment of 4 is for Arm, as defined by preprocessor directives. For x86 I am using an increment of 8. The code is correct when NOT executed as a dynamic shared library.
The reproducible example is included in the GitHub link.
Thank you!
>>In the code, the increment of 4 is for Arm, as defined by preprocessor directives. For x86 I am using an increment of 8.
That is not what the snippet I excerpted from your code shows. You are explicitly using += 4 (a literal) combined with _mm256... intrinsics. It appears that you may have a copy/paste error.
Jim Dempsey
Hi,
Please give us an update on your issue and also give us a minimal reproducer.
Thank you
Hi,
Please give us an update on your issue. Jim's observation appears to be correct, so please send us the relevant information so that we can dig into this issue.
Warm Regards,
Abhishek
Hi,
Please give us an update on your issue.
Thank you
We have not heard back from you, so we will no longer be monitoring this thread. If you need further assistance, please post a new thread.
Thank you
Abhishek
