Dear experts,
I'm having some trouble using the _mm512_extload_epi32 intrinsic. I want to load 16 signed char elements and convert them to an int32 vector. The instruction is:
```c
__m512i v = _mm512_extload_epi32(buffer, _MM_UPCONV_EPI32_SINT8, _MM_BROADCAST32_NONE, _MM_HINT_NONE); // buffer is aligned to 16 bytes
```
When I compiled it, icc said "catastrophic error: Invalid upconversion argument to intrinsic."
icc version 14.0.2 (gcc version 4.4.7 compatibility). MPSS version 3.1.4.
Can someone tell me where the mistake is?
Thanks,
Enzo
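For reference, the `_MM_UPCONV_EPI32_SINT8` upconversion reads 16 signed 8-bit values (16 bytes, hence the 16-byte alignment) and sign-extends each one into a 32-bit lane. The plain-C sketch below spells out that lane-by-lane effect so results can be checked on the host; the function name `upconv_sint8_to_epi32` is hypothetical and not part of any intrinsic header.

```c
#include <stdint.h>

#define LANES 16 /* a 512-bit vector holds 16 int32 lanes */

/* Hypothetical scalar reference: sign-extend 16 signed 8-bit values into
   16 int32 lanes, i.e. the per-lane effect of an SINT8 upconversion. */
static void upconv_sint8_to_epi32(const int8_t src[LANES], int32_t dst[LANES])
{
    for (int i = 0; i < LANES; i++)
        dst[i] = (int32_t)src[i]; /* e.g. -1 (0xFF) becomes -1 (0xFFFFFFFF) */
}
```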
Are the macros you use actually defined? (In other words, you may be missing a header, have mistyped a parameter, or the documentation may state a parameter incorrectly.)
Insert before the offending line:
```c
#if defined(_DEBUG)
_MM_UPCONV_EPI32_ENUM conv_ = _MM_UPCONV_EPI32_SINT8;
_MM_BROADCAST32_ENUM bc_ = _MM_BROADCAST32_NONE;
int hint_ = _MM_HINT_NONE;
#endif
```
Jim Dempsey
Yes, the macros are defined. I can print their values.
Here is Intel documentation for _mm512_extload_epi32 => https://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-5DA89227-25A6-49F5-975A-747822B2B6CE.htm
It seems to me that I'm not using this function correctly, but I can't see where the mistake is.
I cannot re-create the error so it would be helpful to see other source code/usage details in a complete reproducer. Can you please post a complete reproducer?
Hi Kevin,
Here is a complete reproducer:
```c
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <immintrin.h>
#include <omp.h>

#define SIZE 4096
#define NUM_THREADS 256
#define VECTOR_LENGTH 16

/************** MAIN *************/
int main(int argc, char *argv[])
{
    int i, iters, *a, *c;
    char *b;

    a = (int *) _mm_malloc(SIZE*sizeof(int), 64);
    b = (char *) _mm_malloc(SIZE*sizeof(char), 64);
    c = (int *) _mm_malloc(SIZE*sizeof(int), 64);

    for (i = 0; i < SIZE; i++) a[i] = 0;
    for (i = 0; i < SIZE; i++) b[i] = 'A';

    iters = SIZE / VECTOR_LENGTH; /* one iteration per 16-element chunk */

    #pragma offload target(mic) in(a: length(SIZE)) in(b: length(SIZE) align(16)) out(c: length(SIZE))
    #pragma omp parallel shared(iters) num_threads(NUM_THREADS)
    {
        __declspec(align(64)) __m512i aux1, aux2, aux3;

        // do C = A + B
        #pragma omp for schedule(dynamic)
        for (i = 0; i < iters; i++) {
            aux1 = _mm512_extload_epi32(b+i*VECTOR_LENGTH, _MM_UPCONV_EPI32_SINT8, _MM_BROADCAST32_NONE, _MM_HINT_NONE);
            aux2 = _mm512_load_epi32(a+i*VECTOR_LENGTH);
            aux3 = _mm512_add_epi32(aux1, aux2);
            _mm512_store_epi32(c+i*VECTOR_LENGTH, aux3);
        }
    }

    _mm_free(a);
    _mm_free(b);
    _mm_free(c);
    return 0;
}
```
Thank you. The error results from the Xeon Phi™ intrinsic code being compiled for the host as part of the offload compilation of your offloaded code section. I believe you need to do something like:
```c
#ifdef __MIC__
    <existing Xeon Phi™ intrinsic code here>
#else
    <host equivalent code here>
#endif
```
However, I notice the code compiles without -openmp, so as a workaround, perhaps try the structure above. Meanwhile, I will discuss this with the developers.
I submitted this to Development (see internal tracking id below) for further investigation. I believe the compiler should issue the same error both with and without -openmp.
(Internal tracking id: DPD200256211)
Hi Kevin,
I've tried your suggestion, but the compiler still reports the same error. I'll wait for the fix. Thank you very much,
Enzo
There was an underlying defect associated with the compilation error that occurs when compiling with -openmp.
That defect is now fixed in the latest Intel® Parallel Studio XE 2015 Update 1 release (2015.0.133 - Linux) and the code as written in your example now compiles with -openmp.
You do have to be careful about the code as written. The error that occurred is actually correct and is issued when the Xeon Phi™ intrinsics (not valid for the host) are seen by the host compilation. In the current form, when offload is mandatory (the default), the Xeon Phi™ intrinsics are not included within the host compilation; however, if the offload is made optional (say by adding an if clause to the offload pragma), those intrinsics participate in the host compilation and will once again trigger the earlier compilation error. With offload optional, you must ensure the intrinsics are only active for the target compilation. One method for doing so is the __MIC__ predefine.
For my earlier suggestion of using __MIC__, here is the placement inside the OMP loop that worked for me:
```c
#pragma omp for schedule(dynamic)
for (i=0; i < iters; i++) {
#ifdef __MIC__
    printf("MIC\n");
    aux1 = _mm512_extload_epi32(b+i*VECTOR_LENGTH, _MM_UPCONV_EPI32_SINT8, _MM_BROADCAST32_NONE, _MM_HINT_NONE);
    aux2 = _mm512_load_epi32(a+i*VECTOR_LENGTH);
    aux3 = _mm512_add_epi32(aux1,aux2);
    _mm512_store_epi32(c+i*VECTOR_LENGTH,aux3);
#else
    printf("Place the HOST equivalent code here\n");
    // c = a + b;
#endif
}
```
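A self-contained sketch of the same guard, assuming the loop body is moved into a helper (the name `add_chunk16` is hypothetical) so the host branch can hold a plain scalar loop in place of the placeholder `printf`. Only the `__MIC__` branch uses the Xeon Phi™ intrinsics from this thread; the `#else` branch is ordinary C that any host compiler accepts. When such a helper is called inside the offload region with the Intel compiler, it would also need to be marked for the coprocessor (for example with `__attribute__((target(mic)))`); that attribute is left out here so the sketch builds with any C compiler.

```c
#define VECTOR_LENGTH 16

#ifdef __MIC__
#include <immintrin.h>
#endif

/* Hypothetical helper: adds one 16-element chunk, c[j] = a[j] + b[j].
   The __MIC__ branch is compiled only in the coprocessor pass; every other
   compilation (including the host half of an offload build) sees only the
   portable loop. For the MIC path, a and c must be 64-byte aligned and b
   16-byte aligned. */
static void add_chunk16(const int *a, const signed char *b, int *c)
{
#ifdef __MIC__
    __m512i vb = _mm512_extload_epi32(b, _MM_UPCONV_EPI32_SINT8,
                                      _MM_BROADCAST32_NONE, _MM_HINT_NONE);
    __m512i va = _mm512_load_epi32(a);
    _mm512_store_epi32(c, _mm512_add_epi32(va, vb));
#else
    for (int j = 0; j < VECTOR_LENGTH; j++)
        c[j] = a[j] + b[j]; /* b[j] is sign-extended to int by integer promotion */
#endif
}
```

Inside the loop above, the call would be `add_chunk16(a + i*VECTOR_LENGTH, (const signed char *)(b + i*VECTOR_LENGTH), c + i*VECTOR_LENGTH);`, which keeps a single code path in the loop while the preprocessor picks the implementation per compilation pass.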