Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Multiple Problems with ICL

mtlroom
Beginner
Hello everybody,
I ported ffmpeg lib to be able to compile it with intel c++ compiler for windows. The only reason I did so was to be able to debug ffmpeg related code in ms visual studio.

While porting code I noticed some problems with the compiler and I had to do workarounds. Overall ffmpeg binaries compiled with icl work ok, but the biggest problem is that debug info is somewhat broken.

I'll list some of the problems with intel compiler that I encountered while compiling ffmpeg library.


So, I decided to post a question here on the forum and at the same time check what's new. Good news is that I see some of the problems that I encountered are fixed.

Basically, the biggest problem with debug info is that it's broken. Very often variables show wrong values in the debugger.
For example:

[cpp]struct some_context * h;
...
h->function_ptr(a,b,c);
h->var1 = 123;
decode(h, 1, 2, 3);
[/cpp]

and then if I get an assertion inside decode(..), I see the value of h as 0, which is impossible considering the code that executed before calling decode(...).
Ffmpeg cannot be compiled without compiler optimization, so this kind of error could be related to optimizations, but I highly suspect that's not the case. Whenever I see a variable with a wrong value, I just switch to another function in the call stack and there the variable has the correct value. In the posted example, inside decode(...) the passed value of h could be 0, but if I go back up the stack, then inside the function that calls decode I see the correct value of h.
This is one of the problems; the other one is more serious: I get a completely broken call stack.
Sometimes I have asserts in code and I examine the call stack and I'm sure it can't be real - I see something like av_malloc calling some encoding function, which obviously cannot happen. So I tried to put a breakpoint before that function... everything is fine, but the moment I step into the function (F10) the call stack in the debugger's window becomes garbage (the functions lower in the list become completely different).
Any info on that??

While debugging some problems, I found out that they were related to variable length arrays. Replacing them with alloca calls fixed the problem. I see from some of the posts that this problem has been fixed in the latest build. While debugging that variable length array problem I was also getting a similarly weird call stack in the debugger window; could it be related?


====
Next problem is related to performance/optimizations.
In ffmpeg there are a lot of constants that are used in different encoders/decoders. Many of these constants are 64-bit. The issue with the intel compiler is that in most cases it emits such bad junk that it would be difficult to make the code any slower than what icl produces.
I had a test app a while ago, but can't find it at the moment. Basically, I had a 64-bit const and I was assigning that 64-bit const to an mmx register. My intention was to generate code that moves 64-bit data into an mmx register directly from a static const variable. Instead, the intel compiler emits code that pushes two 32-bit ints onto the stack and then loads the mmx register through the stack pointer. I'm not sure if it's clear what I'm saying... in short, here's a code example:
[cpp] Instead of 
 	movq _some64bit_const, mmx_reg //_some64bit_const is {0x12345678, 0x56781234 } 
 It would emit junk like
 	push 0x12345678
 	push 0x56781234
 	movq esp, mmx_reg
[/cpp]
At first when I saw that bloat I thought that the intel compiler probably knows what it's doing and produces faster code instead of making the processor load a variable located god knows where outside of the function body, but in practice pushing onto the stack and loading through the stack pointer appears to be much slower and bigger in size, which is unacceptable for highly optimized encoder code. To avoid this "optimization" I had to combine most of the 64-bit static consts into arrays:
[cpp]static const int64_t var1 = 0x0101010101010101;
static const int64_t var2 = 0x0202020202020202;

became:

static const int64_t var1_var2[] = {0x0101010101010101, 0x0202020202020202};
[/cpp]

In this case icl doesn't try to "optimize" static consts anymore.


====
Another problem is related to the fact that icl reports some 3dnow inline asm instructions as invalid. This is known, and was reported I think on these forums. For that reason some projects chose to completely not support intel compiler.

====
Last problem I encountered just recently.
Somewhere on the web I read a post saying that the performance of a math-related library drops severalfold when using intel's math lib instead of the math lib that comes with the ms compiler. So I decided to run a test, as I was using mathimf.h instead of math.h. The reason I used mathimf.h instead of MS's math.h was that microsoft's math.h doesn't have some of the c99 math functions that ffmpeg uses. These are rint, lrint, lrintf, lrintl, isnan, isinf etc. So, I tested these functions and the results are astonishing (garbage code generated).
I did this test just a while ago and I'll post a complete example program to test that.
///main.c

[cpp]//#include "mathimf.h"
int main(int argc, char **argv)
{
	double d;
	int x;

	d = 1.123;
	x = lrintf(d);

	return x;
}
[/cpp]
Guess what the value of x is at exit? That's right, x is ... -2147483648 !!! icl complains about function "lrintf" declared implicitly, but it still links and runs, with garbage results. I tried to step through and it goes to libmmd anyway, but if you uncomment the first line to include mathimf.h then the result of x at exit is correct: 1, BUT ... I tried to step through the generated assembly and icl definitely generates a big load of gunk with multiple tests of some variables, running the cpuid instruction, doing something I have no idea about, and finally doing what needed to be done: the fistpl instruction. Instead I opted to use code from gcc, which works fine for me:

[cpp]static __inline long lrintf(float x)
{
	long retval;
	__asm__ __volatile__("fistpl %0"  : "=m" (retval) : "t" (x) : "st");
	return retval;
}[/cpp]

23 Replies
mtlroom
Beginner
icl version: w_cproc_p_11.0.072
Microsoft Visual Studio 2008, Version 9.0.30729.1 SP

example of broken stack in attachment.
mtlroom
Beginner
I attached a screenshot, have no idea why it doesn't show up...
TimP
Honored Contributor III
If you mean to use lrint() without a prototype, you certainly can't expect lrintf() to work the same. With a prototype, of course, you are inserting a cast lrintf((float)d) which is different from what you say you intended.
You hit on one of the many difficulties in using fistpl; since it is an x87 instruction available only on SSE3 CPUs, you would need to specify x87 code and use inline asm, for which there is no compatibility between gcc and Microsoft syntax. x87 is supported only as a compatibility option for older 32-bit only CPUs.
mtlroom
Beginner
Quoting - tim18
If you mean to use lrint() without a prototype, you certainly can't expect lrintf() to work the same. With a prototype, of course, you are inserting a cast lrintf((float)d) which is different from what you say you intended.

I don't get what you are trying to say; to me it's nonsense. Are you saying that the code I posted has undefined behavior or is supposed to return that garbage value?? What kind of prototype are you talking about? It's pure c code that doesn't even produce warnings except lrintf undefined. Will it make it more obvious that something is wrong if float is used in that example instead of double? Replacing double with float yields exactly the same result.

Quoting - tim18
You hit on one of the many difficulties in using fistpl; since it is an x87 instruction available only on SSE3 CPUs, you would need to specify x87 code and use inline asm, for which there is no compatibility between gcc and Microsoft syntax. x87 is supported only as a compatibility option for older 32-bit only CPUs.

I don't really understand... it's available only on sse3 CPUs ... and it's supported only as a compatibility option for older 32-bit only CPUs. What does that mean? What's the preferable way in asm to do lrintf() then?
As I understand it, lrintf does conversion from float to int with the currently set rounding mode. That's exactly what fistpl does.
mtlroom
Beginner
Just an update, I tested with 11.1.038 and the lrintf problem is still there. I didn't check the others, but I see that I have another problem now.
The other problem is related to visual studio integration.
As seen in my jumbo screenshot, I have a few projects in a solution. To make sure that all of them have correct compilation settings I use property sheets instead of per-project configurations.
The value is located at Configuration Properties->General->Inherited Project Property Sheets.
I used this value: $(ConfigurationName)$(TargetExt).vsprops
Which means that a release lib would inherit Release.lib.vsprops, a debug dll would inherit Debug.dll.vsprops etc. Everything obviously works fine with visual studio and worked fine with the 11.0.072 compiler. With the newer build that I installed this doesn't work anymore; it's as if there are no property sheets set at all. I had to manually change $(ConfigurationName)$(TargetExt).vsprops to Release.lib.vsprops etc., and I don't like that at all. What's going on, how come stuff like that got broken?

Steps to reproduce:
Create a static library project with visual studio (defaults are ok).
Create a Release.lib.vsprops file in the same folder as the .vcproj file with the following contents:
[xhtml]

	
	

[/xhtml]
Then set the inherited property sheet config value of the release build to $(ConfigurationName)$(TargetExt).vsprops. Save the solution, exit visual studio and reopen the solution. Now, in the release settings check "Command Line" - it has all the extra defines and settings inherited from the property sheet. Now if I convert the project to the intel compiler and check "Command Line", it no longer has the settings from the property sheet. The settings are simply ignored. In the Property Manager (View->Property Manager) I see my property sheets as I should, but the intel compiler doesn't see them.
Then if I change the inherited property sheet value from $(ConfigurationName)$(TargetExt).vsprops to Release.lib.vsprops, the intel compiler properly inherits the settings (the solution has to be reloaded). I didn't need to manually expand environment variables with 11.0.072. I didn't check, but it's very possible that the new build has a similar bug in other places where environment variables are used in the configuration.
JenniferJ
Moderator
Wow, 5 issues.

I had them noted as following:

1. debug information and stack problem --- I'd need a test case for this.
2. optimization: use inline asm to assign a 64-bit const to mmx register. --- I'll try with a simple test.

static const int64_t var1 = 0x0101010101010101;
__asm movq _some64bit_const, mmx_reg

3. icl gives invalid asm for some 3dnow inline asm --- can you post some code snippets here?

4. lrintf() --- I'll try as well.
5. property sheet issue. --- I'll have to check with the latest compiler update. we've been supporting property sheet feature for a while. this shouldn't happen.

Thank you !
Jennifer


mtlroom
Beginner
Wow, 5 issues.

I had them noted as following:

1. debug information and stack problem --- I'd need a test case for this.



The debug problem is the most important problem in my case. That's the reason I ported entire ffmpeg to intel compiler to be able to debug ffmpeg code in visual studio. And it's the most difficult to reproduce. I have that problem on my side all the time if I try to step through ffmpeg code, but I don't have this problem in small programs. I compile ffmpeg in debug, omit frame pointer is disabled. But I know that ffmpeg internally uses frame pointer register. Could that be the reason for bad stack trace? I tried different debug settings and I'm always getting invalid stack trace.


2. optimization: use inline asm to assign a 64-bit const to mmx register. --- I'll try with a simple test.


This one should be simple to reproduce. I used gcc style asm for loading to the register. If I recall correctly same happens with MS style asm. I suspect that same problem might exist for all 64bit const variables.


3. icl gives invalid asm for some 3dnow inline asm --- can you post some code snippets here?


There are many (if not all of them); off the top of my head: pavgusb, femms, prefetch, prefetchw, cltd (which is cdq). Here are more that I found by searching the code: pf2id, pfmul, pfadd, pfsub, pswapd etc. Icl gives this error for all of these: error: unknown opcode "pswapd" -- __asm. This error doesn't exist in ICC as far as I know. I even did a full binary search on the entire installation of the intel compiler - nothing contains strings like pfmul etc., but if I search for supported opcodes, they are present in some dll or exe file. So, some of the opcodes are simply missing...
The problem with lrintf can easily be reproduced. I posted a full program for that. Same for property sheets. The problem with the newer version is not that it doesn't support them at all, but that it doesn't work if I use environment variables in the name of the property sheet so that the required properties are picked based on the selected configuration.
mtlroom
Beginner
I'll add some more to the list.

I'm getting an assert in ffmpeg. Because of corrupted stack I wasn't able to see where exactly assertion was happening.
Finally I found out where it happens. It's inside one of the deeply nested macros. Here's the piece of code that causes the assert:

[cpp]#define H264_MC_H(OPNAME, SIZE, MMX, ALIGN) 
....

static void OPNAME ## h264_qpel ## SIZE ## _mc12_ ## MMX(uint8_t *dst, uint8_t *src, int stride){
    DECLARE_ALIGNED(ALIGN, uint8_t, temp[SIZE*(SIZE<8?12:24)*2 + SIZE*SIZE]);
    uint8_t * const halfHV= temp;
    int16_t * const halfV= (int16_t*)(temp + SIZE*SIZE);
    assert(((int)temp & 7) == 0); 
    put_h264_qpel ## SIZE ## _hv_lowpass_ ## MMX(halfHV, halfV, src, SIZE, SIZE, stride);
    OPNAME ## pixels ## SIZE ## _l2_shift5_ ## MMX(dst, halfV+2, halfHV, stride, SIZE, SIZE);
}
[/cpp]

Once I attached with a debugger I wasn't able to inspect variables, so I added some printf's in place of the assert:

[cpp] if(! (((int)temp & 7) == 0)){av_log(0, AV_LOG_ERROR, "ALIGN: %d, sizeof(temp): %d PPS TODO: %s ((int)temp & 7) => %d, temp: %d\n", ALIGN, sizeof(temp), __FUNCTION__, ((int)temp & 7), (int)temp); /*__debugbreak();*/}
[/cpp]
here's what I got:
[cpp] ALIGN: 8, sizeof(temp): 112 PPS TODO: put_h264_qpel4_mc12_mmx2 ((int)temp & 7) => 4, temp: 1227276
[/cpp]

Here's DECLARE_ALIGNED:

[cpp]#elif defined(_MSC_VER)
    #define DECLARE_ALIGNED(n,t,v)      __declspec(align(n)) t v
    #define DECLARE_ASM_CONST(n,t,v)    __declspec(align(n)) static const t v
#else[/cpp]


so.... all in all, it comes down to this simple function:

[cpp]static void put_h264_qpel4_mc12_mmx2(uint8_t *dst, uint8_t *src, int stride){
	__declspec(align(8)) uint8_t temp[4*(4<8?12:24)*2 + 4*4];
    uint8_t * const halfHV= temp;
    int16_t * const halfV= (int16_t*)(temp + 4*4);
	assert(((int)temp & 7) == 0);
	....
}[/cpp]


So, it's obvious that declare aligned in this case didn't produce expected results.
temp was supposed to be 8-bytes aligned, but it's only 4 bytes aligned.

I'm not sure if a simple hello world would produce the same wrong results. It could perhaps be related to the fact that my code uses variable length arrays and alloca at the same time.
Just to let you know - there are no major defects with my build or anything. If I define NDEBUG then my build passes all validation tests from ffmpeg. In this particular case unaligned data would probably just result in a performance penalty. In some other cases a long time ago I had the program terminate when 128-bit data wasn't 16-byte aligned for an sse register, or something like that, but I fixed it with some code tweaking.

EDIT:
As a workaround I modified this line
[cpp]    DECLARE_ALIGNED(ALIGN, uint8_t, temp[SIZE*(SIZE<8?12:24)*2 + SIZE*SIZE]);[/cpp]
and changed ALIGN to ALIGN*2, and now it's all good.
mtlroom
Beginner
Some more to the list...
I'm sure I'm not the only one who had this problem and I have this problem very often (a few times a day if I build with intel compiler).
the problem:
1>C:FFMPEGDebugobjavcodec-52/vorbis_dec.obj : fatal error LNK1136: invalid or corrupt file

This is not related to one specific file; it happens with many different files. Most of the time the invalid or corrupt file is zero bytes in size (I haven't seen a non-zero one yet). Very often I start a build, terminate it in the middle and then build again (not a full rebuild). My guess is that the compiler first creates inputfile.obj, and if compilation is aborted it doesn't delete this file. Then the dependency tracker considers the file valid and up to date and doesn't recompile it.
mtlroom
Beginner
In another thread I added some more info that could also be related to VS integration:
levicki
Valued Contributor I
As for your first issue (debug problem) -- one of the reasons such issues can happen is when you (inadvertently) mix calling conventions or ABI. In other words, yes, it is possible that a leaf function decode() sees h as zero if it doesn't interpret stack in the expected way. This is especially true for ported projects which have parts written in assembler for a different compiler.

As for your second issue, using Intel Parallel Studio ICL compiler version 11.1.063 with default options, and with #include I am getting the correct result of 1. Without #include compiler correctly aborts with an error "identifier "lrintf" is undefined".

As for the "big load of gunk" remark, compiler has to generate auto-dispatch code if you are compiling for multiple CPU targets. Even if you don't, compiler still has to check whether the instruction (FISTP) is supported in hardware before attempting to execute it or the application will crash with an #UD exception (undefined instruction). That checking code is executed only once at start of main() though so it has minimal (if any) impact on performance.

As for the instruction itself, it is available on CPUs which support SSE3 instruction set (Prescott and newer) -- it won't run on Pentium 4 (Northwood) CPUs, Pentium 3 or AMD CPUs prior to Athlon 64 Venice core if I remember correctly.

As for the static const "issue", Intel Compiler usually knows what it is doing. Depending on the optimization settings and the code itself it may choose different approach for constant loading into SIMD registers. If you haven't performed extensive performance benchmarking of those two different approaches you should not be so quick to dismiss the code generated by the compiler. Perhaps it is advantageous to have those values on stack to keep good data locality or the data (or the stack space) is being reused later, etc.

In any case, if I remember correctly there is no MMX intrinsic to load 64-bit value from memory but writing:

[cpp]#include "mmintrin.h"

__m64 m0;

void test(void)
{
	static const int x1 = 0x12345678;
	static const int x0 = 0x87654321;
	m0 = _mm_set_pi32(x1, x0);
//	_mm_empty();
}
[/cpp]

Would do what you wanted (make the compiler use the MOVQ instruction). Of course, you can always use it directly in inline asm code.

As for the 3DNow instructions, if I remember correctly Intel Compiler does not support those instructions. Supporting them would not make much sense because Intel CPUs do not support them in hardware and those instructions are terribly outdated anyway, so what you have seen can be classified as expected behavior.
mtlroom
Beginner
Hi Igor,
thanks for the reply.

For issue #1: I thought about that possibility, but in that case the code would be crashing here and there, especially because I have omit frame pointer optimization enabled everywhere. The thing is that the code doesn't crash (except in one specific case because of unaligned memory and a movaps instruction; the unaligned memory problem was, by the way, caused because __declspec(align(x)) didn't make the memory aligned as requested; see one of my other posts in this thread for more info). Moreover, all that ported code is compiled completely by the intel compiler; it doesn't use any precompiled libraries or dll's. I also tried to simulate a wrong calling convention, and the compiler complained that the function was previously declared bla-bla... so it would be seen in the build log if it happened. There isn't much asm code used, but there is a lot of inline asm (in ffmpeg). On top of that, I was getting that parameter as 0 when there was no asm or anything like that in the call stack chain... But what about the completely broken stack? When the function call order doesn't make any sense and at the same time everything works just as expected?..

Issue #2: with mathimf.h I'm getting correct behavior, as I said, but without it (default ms project compiled with intel compiler) I'm getting garbage. Tested with 11.0.072 and 11.1.038 and got the same output. Where can I d/l the version that you are using? What I see is that 11.1.038 for windows and 11.1.046 for linux are available now.

Issue #3: Well, from the asm code I approximately understand what's going on there. Even though I don't think it's mentioned in the standard, I assume that the lrintf function is supposed to deliver the highest performance, otherwise a plain c-style cast would be used. In some projects they measured that lrintf is preferable over casting in performance sensitive applications. I didn't measure, but I think that casting will work better with icl. At least the auto-dispatch code could have been written using a function pointer that gets set on the first call, so that the static variable used for the cpu check wouldn't need to be compared every time.

Issue #4 (const issue): I actually did perform extensive benchmarking. I tried hard to write code that would run faster, but it was always slower; not 10% or 15%, it was 2-3 times slower. And the generated code was much bigger, of course. I compared the results produced by gcc, cl and icl, and my conclusion was that icl produces a "big load of gunk"! As a side note: I did all that not because I was bored or afraid to lose those extra cpu cycles, but because with that generated junk ffmpeg did not compile anymore. Some of the functions use ALL available registers, and pushing stuff on the stack and then loading it from the stack also required extra registers, and as a result the compiler gave errors that it can't find available registers or something like that. To make it clear: that code doesn't use intrinsics, it uses inline asm and the 64-bit static const is passed as a memory operand.


levicki
Valued Contributor I
You are welcome, too bad that it did not help much.

Highly optimized code is very hard to debug since the code goes through many transformations. Thus, it is hard to match debug info and the actual code. Fault may be in PDB file generation, Microsoft debugger or in their interaction. If you are seeing correct values on stack (using memory window) and in registers, then at least the code is ok.

Compiler I am using is part of the Intel Parallel Studio (Update 1). Compiling function call without a prototype makes compiler assume the argument type which can lead to the error you described. Calling unprototyped functions is considered bad programming practice and I really do not understand why you are insisting on it and how it happened that the compiler does not abort with an error -- perhaps you suppressed errors or made some particular errors to be warnings?

As for the FISTP, if you are doing conversion to int using (int) cast, I guess you should be using FISTTP which always performs truncation. You can also use SSE2 for that -- CVTTSS2SI (but only for 32-bit floats).

As for dispatcher, I guess that the conditional jump is more likely to be predicted than a jump through a function pointer and thus have better performance, but I haven't bothered to verify that assumption.

64-bit const has to be passed as the memory operand. If you do not use intrinsics then you can use inline assembler to perform MOVQ mm0, qword ptr [mem].

mtlroom
Beginner
I understand about debugging highly optimized code and that stepping through won't always follow the code as written: it may jump up and down and go backwards as a result of instruction reordering etc. I think I saw a few times that code that doesn't touch ffmpeg also has that problem with 0 parameters, and if compiled with the ms compiler it doesn't have that problem.

What does "Compiling function call without a prototype" mean? I'm a c++ programmer and to me it sounds like something illegal and c-related. Is it like casting void pointer to a function and calling it (ignoring calling convention)?

Well, I didn't check either conditional jump vs function pointer, but I think that it will still need to go and see what's the value of that condition variable that might be located in some random location.

I just downloaded that latest version of icl and I'll post here results related to lrintf.

MOVQ mm0, qword ptr [mem]: that's approximately what I was doing... I don't have my test code (I did it in c:\tmp) anymore, but from what I remember I had something like:
asm("movq (%0), xmm0", ::"m"(_64bit_const));
and suppose that the 64-bit constant was declared as: static const int64_t _64bit_const = 0x1111111122222222;

the generated asm would be something like this:

push 0x11111111
push 0x22222222
movq [esp], xmm0

I was surprised that compiler was smart enough to optimize this kind of code. But this kind of optimization only would be good with 32 bit consts but not 64 bit.


What about that problem with VS integration and property sheets? Any updates from intel people?
Thanks

mtlroom
Beginner
I just installed a trial of parallel studio and I'm getting exactly the same results.
If I do a multithreaded dll debug build then for some reason icl still links my binary to libmmd.dll (as seen in the screenshot).
Then I tried to step into lrintf to see what's going on. It goes through that dispatching logic, then it calls either lrintf.A or lrintf.J. libmmd.dll is loaded from the parallel studio install folder (it shows up in the list of modules). Output in both dll and static builds is the same (as with the previous versions). If I replace math.h with mathimf.h then everything works as expected. When stepping through, on my pc it calls lrintf.A which does fld; fistp; then on top of that it executes LOTS of other instructions and at the end I have garbage.
So, whether or not I include mathimf, libmm from intel is still being used? But if I don't include mathimf then something goes wrong. If I simply add a forward declaration of lrintf: int lrintf(float x); then it works as expected and I get the correct result. So, if lrintf isn't declared the compiler treats it as: int lrintf(void) and that's why I get garbage in the output. Therefore, the question I have: why does icl link to its own math lib (is it so by design?)

Attached a jumbo screenshot :)



levicki
Valued Contributor I
As you have noticed yourself, there is a function declaration and a function definition and they are not C specific.

If the compiler does not see a declaration before the call to the function it doesn't know how to pass parameters and receive back the result.

As for the compiler replacing MOVQ from const variable with two push immediate instructions you will have to show some sort of proof. I have just tested it with this code:

[cpp]const __int64 x = 0x1234567887654321;

void test(void)
{
	__asm	{
		movq	mm0, qword ptr [x]
	}
}
[/cpp]
And I am getting the following assembler code:

[cpp]; -- Machine type PW
; mark_description "Intel C++ Compiler for applications running on IA-32, Version 11.1    Build 20090624 %s";
; mark_description "-c -FAs";
	.686P
 	.387
	OPTION DOTNAME
	ASSUME	CS:FLAT,DS:FLAT,SS:FLAT
_TEXT	SEGMENT PARA PUBLIC FLAT 'CODE'
;	COMDAT ?test@@YAXXZ
TXTST0:
; -- Begin  ?test@@YAXXZ
; mark_begin;
IF @Version GE 800
  .MMX
ELSEIF @Version GE 612
  .MMX
  MMWORD TEXTEQU 
ENDIF
IF @Version GE 800
  .XMM
ELSEIF @Version GE 614
  .XMM
  XMMWORD TEXTEQU 
ENDIF
       ALIGN     16
	PUBLIC ?test@@YAXXZ
?test@@YAXXZ	PROC NEAR 
.B1.1:                          ; Preds .B1.0

;;; {

                                ; LOE ebx ebp esi edi
.B1.2:                          ; Preds .B1.1
; Begin ASM

;;; 	__asm	{
;;; 		movq	mm0, qword ptr [x]

        movq      mm0, QWORD PTR [?x@@4_JB]                     ;7.3
; End ASM
                                ; LOE ebx ebp esi edi
.B1.3:                          ; Preds .B1.2

;;; 	}
;;; }

        ret                                                     ;9.1
        ALIGN     16
                                ; LOE
; mark_end;
?test@@YAXXZ ENDP
;?test@@YAXXZ	ENDS
_TEXT	ENDS
_DATA	SEGMENT DWORD PUBLIC FLAT 'DATA'
_DATA	ENDS
; -- End  ?test@@YAXXZ
_RDATA	SEGMENT DWORD PUBLIC FLAT 'DATA'
?x@@4_JB	DD	087654321H,012345678H
_RDATA	ENDS
_DATA	SEGMENT DWORD PUBLIC FLAT 'DATA'
_DATA	ENDS
	END
[/cpp]
As for condition variable, all frequently accessed variables are in L1 cache. Your concern is ill-placed. You seem to be focusing on minor issues (I'd dare to call them cosmetic) instead of profiling the code to identify hotspots worthy of scrutiny and further optimization.

Intel compiler is always using its optimized math and runtime libraries. You can try to override that by specifying /NODEFAULTLIB:library_name to the linker and using your preferred library instead.

mtlroom
Beginner
The declaration problem is C-specific. In C++ it's an error, that's why this went unnoticed. I mistakenly thought that the intel compiler links to the math lib that comes with visual studio and that including mathimf.h makes it link to its own lib.

So, the lrintf problem is resolved as not an error in icl.

What about that __declspec(align(X)) problem? In many places, to fix it I had to align to 32 bytes instead of 16 to avoid an unaligned access exception. Not a big problem for me, but still a problem with support of this ms extension. By the way, while trying to find out about that I saw on some forums that someone had a similar problem with the ms compiler.

Problem with VS integration and property sheets: since the parallel studio integration seems to be completely different from the regular c++ compiler release, I tried to check whether it works as it did in 11.0.072.
It appears that the problem is still there; moreover, some other stuff is missing: disable warning, for instance, and some other options are simply removed from the configuration options in VS. So I had to manually add /wdNNN to the additional command line options to avoid thousands of warnings coming out of ffmpeg.
mtlroom
Beginner
Ok, I just retested myself with 11.1.063 to verify if it was modified/fixed.
The problem is still there; I probably made a mistake saying that it has this problem in ms style asm. This problem exists only in gas inline assembler:

[cpp]// test.c     
    static const unsigned long long C64 = 0x1234567844441111ULL;  
    extern "C" void set_mm0()  
    {  
        __asm__ volatile ("movq %0, %%mm0" ::"m"(C64));  
    }  
      
    extern "C" void set_mm0_masm()  
   {  
       __asm movq    mm0, C64;  
   }  
[/cpp]

asm listing:

[plain]    PUBLIC _set_mm0:  
        sub       esp, 8 
        mov       eax, 1145311505 
        mov       edx, 305419896 
        mov       DWORD PTR [esp], eax 
        mov       DWORD PTR [4+esp], edx 
        movq      mm0, QWORD PTR [esp] 
        add       esp, 8 
        ret 
       ALIGN     16 
     
  PUBLIC _set_mm0_masm:  
       movq      mm0, QWORD PTR [?C64@@4_KB]  
       ret  
[/plain]



Results will be identical if I modify c code as follows (set_mm0_masm unmodified):

[cpp]static const unsigned long long C64[] = {0x1234567844441111ULL, 0};

extern "C" void set_mm0()
{
    __asm__ volatile ("movq %0, %%mm0" ::"m"(C64[0]));
}
[/cpp]


mtlroom
Beginner
Any info on that code? Isn't it what I was talking about?

PUBLIC _set_mm0:
sub esp, 8
mov eax, 1145311505
mov edx, 305419896
mov DWORD PTR [esp], eax
mov DWORD PTR [4+esp], edx
movq mm0, QWORD PTR [esp]
add esp, 8
ret
ALIGN 16
levicki
Valued Contributor I
I am not familiar with AT&T ASM syntax, but there is a chance that you are using it incorrectly. Please check whether you are getting the same assembler code with gcc as well. If you do, then there is an error on your side. If not, then you should submit an issue to Intel Premier Support along with a reproducible test case.

As for integration, try uninstalling the complete compiler package and then install it and integrate it again. For me all options work with the latest compiler version and Visual Studio 2008 Pro.