Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.
7876 Discussions

How to avoid _intel_fast_memcpy code generation?

kirill-prazdnikov
998 Views

I'm using Intel C++ Version 9.1 Build 20061103Z. I'm writing driver code and cannot link any libraries. This simple piece of code generates a call to __intel_fast_memcpy:

void copyFloat(float const *src, float *dst, int n) {
    for (int i = 0; i != n; ++i)
        dst[i] = src[i];
}

Is there a way to avoid generating the call to __intel_fast_memcpy?

0 Kudos
8 Replies
Dale_S_Intel
Employee

Well, there are a few possibilities, although unfortunately I couldn't find a simple way to do exactly what you're asking. If your concern is the need for dynamic libraries, you can build it statically ('-static' on Linux/MacOS). That will pull what you need into the executable so you won't have any runtime library dependencies. If the resulting code is too large for your needs, then it gets more complicated. You could try replacing your loop with a call to __builtin_memcpy(). That seems to work in the trivial case, at least.
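
As a rough sketch of the __builtin_memcpy() idea, the copyFloat function from the question might be rewritten as below. This assumes the compiler accepts the GCC-style builtin, and the guard on n is an addition for illustration, not part of the original loop. Whether the builtin is expanded inline or still ends up as a library call depends on the compiler version and options, so the generated assembly is worth checking.

```cpp
#include <cstddef>

// Sketch only: copyFloat from the question, with the loop replaced by an
// explicit __builtin_memcpy call (a GCC-compatible builtin, assumed here to
// be accepted by the compiler in use).
// The n > 0 guard is an illustrative addition; src and dst must not overlap.
void copyFloat(float const *src, float *dst, int n) {
    if (n > 0)
        __builtin_memcpy(dst, src, static_cast<std::size_t>(n) * sizeof(float));
}
```

Called as copyFloat(a, b, 3), this copies the first three floats of a into b.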

Another possibility is to enable the vectorizer with the appropriate -x[KWNPB] switch (see 'icc -help' for more details on -x options). Very likely the vectorizer will vectorize any similar loop, which prevents it from being converted to a memcpy call.

Do any of those work for you?

Dale

0 Kudos
ILevi1
Valued Contributor I
  • Why are you doing memory copying (and even worse float values) in the driver?
  • Why do you use Intel Compiler at all if you don't want your code to work fast?
  • Why don't you call RtlCopyMemory() instead of writing such a function?
  • What exactly are you trying to do?!?

    Hopefully I will never buy something that needs your driver...

0 Kudos
gpseek
Beginner
I have to bring this old thread back.
I just re-discovered this _intel_fast_memcpy thing today.

void cpy_int(int* src, int* dst, int count)
{
    int i = 0;

    do
    {
        dst[i] = src[i];
    } while (++i < count);
}

I call it like this:

int a[4];
cpy_int(&a[0], &a[2], 2);

The above is translated into a call to _intel_fast_memcpy, which I don't want to see.
As you can see, we only copy 2 integers. A function call (with its register pushes and pops) is killing the performance.

So, how to disable _intel_fast_memcpy code generation?

Thanks!

0 Kudos
JenniferJ
Moderator
Try "/Qfreestanding" if it's on Windows.
On Linux, use -ffreestanding.

Jennifer
0 Kudos
TimP
Black Belt
On the wish list here would be for
#pragma loop count(2)
to tell the compiler that you want optimization for the short loop. However, if the count is always 2, it would be faster and shorter code simply to write the two copies out.
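
For the fixed count of 2, writing it out could look something like the sketch below. The name cpy_int_pair is made up for illustration and does not come from the thread. With no loop left, there is nothing for the compiler to recognize as a memcpy pattern, though it is still worth checking the generated code.

```cpp
// Hypothetical helper (the name cpy_int_pair is not from the thread):
// copies exactly two ints with straight-line assignments instead of a loop.
inline void cpy_int_pair(const int* src, int* dst) {
    dst[0] = src[0];
    dst[1] = src[1];
}
```

With the array from the earlier post, the call would be cpy_int_pair(&a[0], &a[2]).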
0 Kudos
gpseek
Beginner
"/Qfreestanding" works!

Thanks a lot, Jennifer.

On a side note, this switch is a global one, which kills _intel_fast_memcpy everywhere.
It would be ideal if Intel came up with a pragma that turns off individual memcpy translations, because _intel_fast_memcpy is great in the general case anyway.

Thanks again.
0 Kudos
JenniferJ
Moderator

Actually, something like that already exists.

It is "#pragma optimization_level K". _intel_fast_memcpy will not be generated at /O1.

Try the following:

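// Intended to lower the optimization level to 1 for cpy_int,
// so that its loop is not turned into _intel_fast_memcpy.
#pragma optimization_level 1
void cpy_int(int* src, int* dst, int count)
{
    int i = 0;
    do {
        dst[i] = src[i];
    } while (++i < count);
}

// No pragma here, for comparison: presumably still compiled at /O2.
void cpy_int2(int* src, int* dst, int count)
{
    int i = 0;
    do {
        dst[i] = src[i];
    } while (++i < count);
}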


compile cmd: icl /O2 /c t.cpp

0 Kudos
gpseek
Beginner
Jennifer,

I just tried #pragma optimization_level 1, but it didn't work for me. However, I ran the test in a fairly complicated project, not something like the t.cpp above.
0 Kudos