Software Archive
Read-only legacy content

Catastrophic error while using _mm512_extload_epi32

Enzo_R_
Beginner

Dear experts,

I'm having some trouble using the _mm512_extload_epi32 intrinsic. I want to load 16 signed char elements and convert them to an int32 vector. The call is:

__m512i v = _mm512_extload_epi32(buffer, _MM_UPCONV_EPI32_SINT8, _MM_BROADCAST32_NONE, _MM_HINT_NONE); // buffer is aligned to 16 bytes

When I compiled it, icc said "catastrophic error: Invalid upconversion argument to intrinsic."

icc version 14.0.2 (gcc version 4.4.7 compatibility). MPSS version 3.1.4. 
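
For reference, the conversion described above amounts to sign-extending each of 16 signed 8-bit values to a 32-bit integer. The scalar sketch below illustrates only those semantics. The function name upconvert_sint8_to_epi32 is hypothetical, and this is not how the intrinsic is implemented. It uses int8_t rather than char because the signedness of plain char is implementation-defined.

```c
#include <stdint.h>

/* Hypothetical scalar illustration of a SINT8 -> int32 upconversion:
   each of the 16 signed bytes is widened (sign-extended) to 32 bits. */
static void upconvert_sint8_to_epi32(const int8_t src[16], int32_t dst[16])
{
    for (int k = 0; k < 16; k++) {
        dst[k] = (int32_t)src[k]; /* e.g. 0xFF (-1) becomes 0xFFFFFFFF (-1) */
    }
}
```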

Can someone tell me where the mistake is?

Thanks,

Enzo
jimdempseyatthecove
Honored Contributor III

Are the macros you use defined? (In other words: are you missing a header, did you mistype a parameter, or does the documentation state a parameter incorrectly?)

Insert before the offending line:

#if defined(_DEBUG)
_MM_UPCONV_EPI32_ENUM conv_ = _MM_UPCONV_EPI32_SINT8;
_MM_BROADCAST32_ENUM bc_ = _MM_BROADCAST32_NONE;
int hint_ = _MM_HINT_NONE;
#endif

Jim Dempsey

Enzo_R_
Beginner

Yes, the macros are defined. I can print their values.

Here is the Intel documentation for _mm512_extload_epi32 => https://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-5DA89227-25A6-49F5-975A-747822B2B6CE.htm

It seems to me that I'm not using this function correctly, but I can't see where the mistake is.

Kevin_D_Intel
Employee

I cannot re-create the error, so it would help to see the rest of your source code and how the intrinsic is used. Can you please post a complete reproducer?

Enzo_R_
Beginner

Hi Kevin,

Here is a complete reproducer:

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <immintrin.h>
#include <omp.h>

#define SIZE 4096
#define NUM_THREADS 256
#define VECTOR_LENGTH 16

/************** MAIN *************/
int main(int argc, char *argv[]) {

	int i, iters, * a, *c;
	char * b;
		

	a = (int *) _mm_malloc(SIZE*sizeof(int), 64);
	b = (char *) _mm_malloc(SIZE*sizeof(char), 64);
	c = (int *) _mm_malloc(SIZE*sizeof(int), 64);

	for (i=0; i< SIZE ; i++) a[i] = 0;
	for (i=0; i< SIZE ; i++) b[i] = 'A';

	iters = SIZE / NUM_THREADS;
	
	#pragma offload target(mic) in(a: length(SIZE)) in(b:length(SIZE) align(16)) out(c: length(SIZE))
	#pragma omp parallel shared(iters) num_threads(NUM_THREADS) 
	{

		__declspec(align(64)) __m512i aux1, aux2, aux3;
// do C = A + B
		#pragma omp for schedule(dynamic)
		for (i=0; i < iters; i++) {
			aux1 = _mm512_extload_epi32(b+i*VECTOR_LENGTH, _MM_UPCONV_EPI32_SINT8 , _MM_BROADCAST32_NONE, _MM_HINT_NONE );
			aux2 = _mm512_load_epi32(a+i*VECTOR_LENGTH);
			aux3 = _mm512_add_epi32(aux1,aux2);
			_mm512_store_epi32(c+i*VECTOR_LENGTH,aux3);
		}
	}

	_mm_free(a); _mm_free(b); _mm_free(c);

	return 0;
}
Kevin_D_Intel
Employee

Thank you. The error occurs because the Xeon Phi™ intrinsic code in your offloaded section is also compiled for the host as part of the offload compilation. I believe you need something like:

#ifdef __MIC__
<existing Xeon Phi™ intrinsic code here>
#else
<host equivalent code here>
#endif
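
Applied to the reproducer's C = A + B loop, that structure might look like the sketch below for one 16-element chunk. The function name add_chunk and its signature are hypothetical. The target branch reuses the intrinsic calls from the reproducer, and the host branch is a plain scalar loop. Because __MIC__ is predefined only during the coprocessor compilation, the host compiler never sees the Xeon Phi™ intrinsics. For the same reason, the header is also included only in that branch.

```c
#ifdef __MIC__
#include <immintrin.h>
#endif

#define VECTOR_LENGTH 16

/* Hypothetical helper: c[k] = a[k] + b[k] for one chunk of 16 elements. */
static void add_chunk(const int *a, const signed char *b, int *c)
{
#ifdef __MIC__
    /* Target-only path (assumes a and c are 64-byte aligned and b is
       16-byte aligned, as in the reproducer's allocations). */
    __m512i vb = _mm512_extload_epi32(b, _MM_UPCONV_EPI32_SINT8,
                                      _MM_BROADCAST32_NONE, _MM_HINT_NONE);
    __m512i va = _mm512_load_epi32(a);
    _mm512_store_epi32(c, _mm512_add_epi32(va, vb));
#else
    /* Host path: scalar equivalent of the same 16 additions. */
    for (int k = 0; k < VECTOR_LENGTH; k++) {
        c[k] = a[k] + b[k];
    }
#endif
}
```

Inside the loop, a call such as add_chunk(a + i*VECTOR_LENGTH, (const signed char *)b + i*VECTOR_LENGTH, c + i*VECTOR_LENGTH) would stand in for the four intrinsic statements. The cast is needed because the reproducer declares b as plain char *. On the host build, only the scalar branch is compiled.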

However, I notice the code compiles without -openmp, so as a workaround, perhaps try the structure above. Meanwhile, I will discuss this with the developers.

Kevin_D_Intel
Employee

I submitted this to Development (see internal tracking id below) for further investigation. I believe the compiler should issue the same error both with and without -openmp.

(Internal tracking id: DPD200256211)

Enzo_R_
Beginner

Hi Kevin,

I've tried your suggestion, but the compiler still reports the same error. I'll wait for the fix. Thank you very much,

Enzo

Kevin_D_Intel
Employee

There was an underlying defect associated with the compilation error that occurs when compiling with -openmp.

That defect is now fixed in the latest Intel® Parallel Studio XE 2015 Update 1 release (2015.0.133 - Linux) and the code as written in your example now compiles with -openmp.

You do have to be careful with the code as written. The original error is in fact correct: it is issued when the host compilation sees Xeon Phi™ intrinsics, which are not valid for the host. In the current form, when offload is mandatory (the default), the Xeon Phi™ intrinsics are not included in the host compilation. However, if the offload is made optional (for example, by adding an if clause to the offload pragma), those intrinsics take part in the host compilation and will trigger the earlier error again. With optional offload, you must ensure the intrinsics are active only in the target compilation. One way to do so is the __MIC__ predefine.
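
To illustrate the optional-offload case, the sketch below adds an if clause to the offload pragma. The function where_did_it_run and the use_mic flag are hypothetical. Because the block may now run on the host, it is compiled for both sides, and only the __MIC__ branch may contain target-only code. Compilers that do not recognize the offload pragma ignore it (the C standard requires unrecognized pragmas to be ignored), so with them the block always takes the host branch.

```c
/* Hypothetical helper: returns 1 if the block executed on the coprocessor,
   0 if it executed on the host. */
static int where_did_it_run(int use_mic)
{
    int ran_on_mic = 0;

    /* if(use_mic) makes the offload optional, so this block is also
       compiled for (and may run on) the host. */
    #pragma offload target(mic) if(use_mic) inout(ran_on_mic)
    {
#ifdef __MIC__
        ran_on_mic = 1; /* target compilation: Xeon Phi intrinsics are safe here */
#else
        ran_on_mic = 0; /* host compilation: scalar code only */
#endif
    }
    return ran_on_mic;
}
```

Calling where_did_it_run(0) should always return 0. With the Intel compiler and an available coprocessor, where_did_it_run(1) is expected to return 1.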

For my earlier suggestion of using __MIC__, here is the placement inside the OpenMP loop that worked for me:

		#pragma omp for schedule(dynamic)
		for (i=0; i < iters; i++) {
#ifdef __MIC__
			printf("MIC\n");
			aux1 = _mm512_extload_epi32(b+i*VECTOR_LENGTH, _MM_UPCONV_EPI32_SINT8, _MM_BROADCAST32_NONE, _MM_HINT_NONE);
			aux2 = _mm512_load_epi32(a+i*VECTOR_LENGTH);
			aux3 = _mm512_add_epi32(aux1,aux2);
			_mm512_store_epi32(c+i*VECTOR_LENGTH,aux3);
#else
			printf("Place the HOST equivalent code here\n");
//			c = a + b;
#endif
		}

 
