[cpp]struct some_context * h;
...
h->function_ptr(a,b,c);
h->var1 = 123;
decode(h, 1, 2, 3);
[/cpp]
[cpp]
// Instead of
movq _some64bit_const, mmx_reg // _some64bit_const is {0x12345678, 0x56781234}
// it would emit junk like
push 0x12345678
push 0x56781234
movq esp, mmx_reg
[/cpp]
[cpp]static const int64_t var1 = 0x0101010101010101;
static const int64_t var2 = 0x0202020202020202;
// became:
static const int64_t var1_var2[] = {0x0101010101010101, 0x0202020202020202};
[/cpp]
[cpp]//#include "mathimf.h"
int main(int argc, char **argv)
{
    double d;
    int x;
    d = 1.123;
    x = lrintf(d);
    return x;
}
[/cpp]Guess what the value of x is at exit? That's right, x is ... -2147483648! icl complains that function "lrintf" is declared implicitly, but it still links and runs, with garbage results. I stepped through it and it goes into libmmd anyway. If you uncomment the first line to include mathimf.h, then the value of x at exit is correct: 1. BUT ... I stepped through the generated assembly, and icl definitely generates a big load of gunk: multiple tests of some variables, running the cpuid instruction, doing something I have no idea about, and only then doing what needed to be done, the fistpl instruction. Instead I opted to use the code from gcc, which works fine for me:
[cpp]static __inline long lrintf(float x)
{
    long retval;
    __asm__ __volatile__("fistpl %0" : "=m" (retval) : "t" (x) : "st");
    return retval;
}
[/cpp]
You hit on one of the many difficulties in using fistpl. It is an x87 instruction (its truncating variant, fisttpl, is the one available only on SSE3 CPUs), so you would need to specify x87 code and use inline asm, for which there is no compatibility between gcc and Microsoft syntax. x87 is supported only as a compatibility option for older 32-bit-only CPUs.
[xhtml]Then set the inherited property sheet config value of a release build to $(ConfigurationName)$(TargetExt).vsprops. Save the solution, exit Visual Studio and reopen the solution. Now, in the release settings, check "Command Line": it has all the extra defines and settings inherited from the property sheet. If I then convert the project to the Intel compiler and check "Command Line" again, it no longer has the settings from the property sheet. The settings are simply ignored. In the Property Manager (View->Property Manager) I see my property sheets as I should, but the Intel compiler doesn't see them.[/xhtml]
I had them noted as follows:
1. Debug information and stack problem --- I'd need a test case for this.
2. Optimization: use inline asm to assign a 64-bit const to an mmx register. --- I'll try with a simple test.
static const int64_t var1 = 0x0101010101010101;
__asm movq _some64bit_const, mmx_reg
3. icl gives invalid asm for some 3dnow inline asm --- can you post some code snippets here?
4. lrintf() --- I'll try as well.
5. Property sheet issue --- I'll have to check with the latest compiler update. We've been supporting the property sheet feature for a while; this shouldn't happen.
Thank you!
Jennifer
I had them noted as follows:
1. Debug information and stack problem --- I'd need a test case for this.
2. Optimization: use inline asm to assign a 64-bit const to an mmx register. --- I'll try with a simple test.
3. icl gives invalid asm for some 3dnow inline asm --- can you post some code snippets here?
[cpp]#define H264_MC_H(OPNAME, SIZE, MMX, ALIGN) \
....
static void OPNAME ## h264_qpel ## SIZE ## _mc12_ ## MMX(uint8_t *dst, uint8_t *src, int stride){\
    DECLARE_ALIGNED(ALIGN, uint8_t, temp[SIZE*(SIZE<8?12:24)*2 + SIZE*SIZE]);\
    uint8_t * const halfHV= temp;\
    int16_t * const halfV= (int16_t*)(temp + SIZE*SIZE);\
    assert(((int)temp & 7) == 0);\
    put_h264_qpel ## SIZE ## _hv_lowpass_ ## MMX(halfHV, halfV, src, SIZE, SIZE, stride);\
    OPNAME ## pixels ## SIZE ## _l2_shift5_ ## MMX(dst, halfV+2, halfHV, stride, SIZE, SIZE);\
}
[/cpp]
[cpp]
if(! (((int)temp & 7) == 0)){
    av_log(0, AV_LOG_ERROR, "ALIGN: %d, sizeof(temp): %d PPS TODO: %s ((int)temp & 7) => %d, temp: %d\n", ALIGN, (int)sizeof(temp), __FUNCTION__, ((int)temp & 7), (int)temp);
    /*__debugbreak();*/
}
[/cpp]
[cpp] ALIGN: 8, sizeof(temp): 112 PPS TODO: put_h264_qpel4_mc12_mmx2 ((int)temp & 7) => 4, temp: 1227276 [/cpp]
[cpp]#elif defined(_MSC_VER)
    #define DECLARE_ALIGNED(n,t,v) __declspec(align(n)) t v
    #define DECLARE_ASM_CONST(n,t,v) __declspec(align(n)) static const t v
#else
[/cpp]
[cpp]static void put_h264_qpel14_mc21_mmx2(uint8_t *dst, uint8_t *src, int stride){
    __declspec(align(8)) uint8_t temp[4*(4<8?12:24)*2 + 4*4];
    uint8_t * const halfHV= temp;
    int16_t * const halfV= (int16_t*)(temp + 4*4);
    assert(((int)temp & 7) == 0);
    ....
}
[/cpp]
[cpp] DECLARE_ALIGNED(ALIGN, uint8_t, temp[SIZE*(SIZE<8?12:24)*2 + SIZE*SIZE]);[/cpp]In this declaration I changed ALIGN to ALIGN*2, and now it's all good.
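The alignment check from the posts above can be reproduced in a portable way. This is only a minimal sketch, assuming a C++11 compiler: `alignas` stands in for the `__declspec(align(n))` behind `DECLARE_ALIGNED`, and `is_aligned` and `temp_is_aligned` are hypothetical helper names, not ffmpeg functions. The cast goes through `uintptr_t` rather than `int`, because `(int)temp` would truncate the pointer on 64-bit targets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p is aligned to n bytes.
// n must be a power of two, otherwise the mask trick is meaningless.
static bool is_aligned(const void *p, std::size_t n) {
    assert(n != 0 && (n & (n - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (n - 1)) == 0;
}

// Same size expression as temp[] in the macro for SIZE == 4 (112 bytes),
// declared with the standard alignment specifier instead of __declspec.
static bool temp_is_aligned() {
    alignas(8) std::uint8_t temp[4 * (4 < 8 ? 12 : 24) * 2 + 4 * 4];
    temp[0] = 0; // touch the buffer so it is a real stack object
    return is_aligned(temp, 8);
}
```

If a check like this fails for a buffer declared with the requested alignment, that points at the compiler's handling of the alignment request rather than at the calling code.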
As for your second issue, using Intel Parallel Studio ICL compiler version 11.1.063 with default options, and with #include
As for the "big load of gunk" remark, the compiler has to generate auto-dispatch code if you are compiling for multiple CPU targets. Even if you don't, the compiler still has to check whether the instruction (FISTP) is supported in hardware before attempting to execute it, or the application will crash with a #UD exception (undefined instruction). That checking code is executed only once, at the start of main(), so it has minimal (if any) impact on performance.
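The "check once, then branch" idea can be sketched as follows. This is an illustration only, not how the Intel compiler actually implements its dispatch: `cpu_has_fast_path` is a hypothetical stand-in for a CPUID query, and `convert_generic`, `convert_fast`, `convert` and `g_checks` are made-up names. The feature test runs a single time and later calls only pay for a conditional branch on the cached result.

```cpp
#include <cassert>

// Hypothetical stand-in for a CPUID-based feature query.
static bool cpu_has_fast_path() { return false; }

// Two interchangeable implementations (identical here, for illustration).
static long convert_generic(float x) { return static_cast<long>(x); }
static long convert_fast(float x)    { return static_cast<long>(x); }

static int g_checks = 0; // counts how often the feature query really runs

// The function-local static is initialized exactly once, on first use.
static bool fast_path_enabled() {
    static const bool enabled = [] { ++g_checks; return cpu_has_fast_path(); }();
    return enabled;
}

// Every call takes a conditional branch on the cached flag.
static long convert(float x) {
    return fast_path_enabled() ? convert_fast(x) : convert_generic(x);
}

static void dispatch_demo() {
    assert(convert(1.9f) == 1);
    assert(convert(2.9f) == 2);
    assert(g_checks == 1); // the feature check ran only once
}
```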
As for the instruction itself, it is available on CPUs which support the SSE3 instruction set (Prescott and newer) -- it won't run on Pentium 4 (Northwood) CPUs, Pentium 3, or AMD CPUs prior to the Athlon 64 Venice core, if I remember correctly.
As for the static const "issue", the Intel Compiler usually knows what it is doing. Depending on the optimization settings and the code itself, it may choose a different approach for loading constants into SIMD registers. If you haven't performed extensive performance benchmarking of those two approaches, you should not be so quick to dismiss the code generated by the compiler. Perhaps it is advantageous to have those values on the stack to keep good data locality, or the data (or the stack space) is being reused later, etc.
In any case, if I remember correctly there is no MMX intrinsic to load 64-bit value from memory but writing:
[cpp]#include "mmintrin.h"

__m64 m0;

void test(void)
{
    static const int x1 = 0x12345678;
    static const int x0 = 0x87654321;
    m0 = _mm_set_pi32(x1, x0);
    // _mm_empty();
}
[/cpp]
Would do what you wanted (make the compiler use the MOVQ instruction). Of course, you can always use it directly in inline asm code.
As for the 3DNow instructions, if I remember correctly Intel Compiler does not support those instructions. Supporting them would not make much sense because Intel CPUs do not support them in hardware and those instructions are terribly outdated anyway, so what you have seen can be classified as expected behavior.
Highly optimized code is very hard to debug since the code goes through many transformations. Thus, it is hard to match debug info and the actual code. Fault may be in PDB file generation, Microsoft debugger or in their interaction. If you are seeing correct values on stack (using memory window) and in registers, then at least the code is ok.
The compiler I am using is part of Intel Parallel Studio (Update 1). Compiling a function call without a prototype makes the compiler assume the argument types, which can lead to the error you described. Calling unprototyped functions is considered bad programming practice, and I really do not understand why you are insisting on it or how it happened that the compiler did not abort with an error -- perhaps you suppressed errors or turned some particular errors into warnings?
As for the FISTP, if you are doing conversion to int using (int) cast, I guess you should be using FISTTP which always performs truncation. You can also use SSE2 for that -- CVTTSS2SI (but only for 32-bit floats).
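The difference between the two conversions is easy to see in portable code. A minimal sketch, assuming the default floating-point rounding mode (round to nearest, ties to even); the function names are made up for illustration. `std::lrintf` rounds according to the current rounding mode, which is what FISTP does, while a cast truncates toward zero, which is what FISTTP and CVTTSS2SI do.

```cpp
#include <cassert>
#include <cmath>

// Rounds using the current rounding mode (nearest-even by default).
static long round_current_mode(float x) { return std::lrintf(x); }

// A cast always truncates toward zero, regardless of the rounding mode.
static long truncate_toward_zero(float x) { return static_cast<long>(x); }

static void rounding_demo() {
    assert(round_current_mode(1.123f) == 1);
    assert(round_current_mode(1.75f) == 2);
    assert(truncate_toward_zero(1.75f) == 1);
    assert(round_current_mode(-1.75f) == -2);
    assert(truncate_toward_zero(-1.75f) == -1);
    assert(round_current_mode(2.5f) == 2); // ties go to the even neighbour
}
```

So a hand-written fistpl replacement is a drop-in for lrintf, but not for an (int) cast.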
As for dispatcher, I guess that the conditional jump is more likely to be predicted than a jump through a function pointer and thus have better performance, but I haven't bothered to verify that assumption.
A 64-bit const has to be passed as a memory operand. If you do not use intrinsics, then you can use inline assembler to perform MOVQ mm0, qword ptr [mem].
If the compiler does not see a declaration before the call to the function it doesn't know how to pass parameters and receive back the result.
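One plausible way such a mismatch can turn 1.123 into garbage can be simulated in portable code. This is a sketch under loudly stated assumptions, not a trace of what icl did: it assumes a little-endian target and a calling convention in which the caller stores the 8-byte double and the callee reads the first 4 bytes as a float. `low_bytes_as_float` and `fits_in_int32` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstring>

// Assumption: little-endian target. Reinterprets the low-order 4 bytes of
// a double as a float, as a callee expecting a float argument might.
static float low_bytes_as_float(double d) {
    float f;
    std::memcpy(&f, &d, sizeof f);
    return f;
}

// True if f can be converted to a 32-bit integer without overflow.
// NaN fails both comparisons and therefore does not fit.
static bool fits_in_int32(float f) {
    return f >= -2147483648.0f && f < 2147483648.0f;
}

static void misread_demo() {
    // The low half of the bit pattern of 1.123 is not a small float at all,
    // so an integer conversion of it cannot produce a meaningful result.
    assert(!fits_in_int32(low_bytes_as_float(1.123)));
}
```

On x87, converting an out-of-range value with FISTP stores the "integer indefinite" pattern 0x80000000, which is consistent with the -2147483648 reported earlier in the thread.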
As for the compiler replacing MOVQ from const variable with two push immediate instructions you will have to show some sort of proof. I have just tested it with this code:
[cpp]const __int64 x = 0x1234567887654321;

void test(void)
{
    __asm {
        movq mm0, qword ptr [x]
    }
}
[/cpp]
And I am getting the following assembler code:
[cpp]; -- Machine type PW
; mark_description "Intel C++ Compiler for applications running on IA-32, Version 11.1 Build 20090624 %s";
; mark_description "-c -FAs";
        .686P
        .387
        OPTION DOTNAME
        ASSUME CS:FLAT,DS:FLAT,SS:FLAT
_TEXT SEGMENT PARA PUBLIC FLAT 'CODE'
; COMDAT ?test@@YAXXZ
TXTST0:
; -- Begin ?test@@YAXXZ
; mark_begin;
IF @Version GE 800
  .MMX
ELSEIF @Version GE 612
  .MMX
  MMWORD TEXTEQU <QWORD>
ENDIF
IF @Version GE 800
  .XMM
ELSEIF @Version GE 614
  .XMM
  XMMWORD TEXTEQU <OWORD>
ENDIF
        ALIGN 16
        PUBLIC ?test@@YAXXZ
?test@@YAXXZ PROC NEAR
.B1.1:                          ; Preds .B1.0
;;; {
                                ; LOE ebx ebp esi edi
.B1.2:                          ; Preds .B1.1
; Begin ASM
;;; __asm {
;;; movq mm0, qword ptr [x]
        movq mm0, QWORD PTR [?x@@4_JB]          ;7.3
; End ASM
                                ; LOE ebx ebp esi edi
.B1.3:                          ; Preds .B1.2
;;; }
;;; }
        ret                                     ;9.1
        ALIGN 16
                                ; LOE
; mark_end;
?test@@YAXXZ ENDP
;?test@@YAXXZ ENDS
_TEXT ENDS
_DATA SEGMENT DWORD PUBLIC FLAT 'DATA'
_DATA ENDS
; -- End ?test@@YAXXZ
_RDATA SEGMENT DWORD PUBLIC FLAT 'DATA'
?x@@4_JB DD 087654321H,012345678H
_RDATA ENDS
_DATA SEGMENT DWORD PUBLIC FLAT 'DATA'
_DATA ENDS
        END
[/cpp]
As for the condition variable, all frequently accessed variables are in L1 cache. Your concern is ill-placed. You seem to be focusing on minor issues (I'd dare to call them cosmetic) instead of profiling the code to identify hotspots worthy of scrutiny and further optimization.
Intel compiler is always using its optimized math and runtime libraries. You can try to override that by specifying /NODEFAULTLIB:library_name to the linker and using your preferred library instead.
So, the lrintf problem is resolved as not an error in icl.
What about that __declspec(align(X)) problem? In many places I had to align to 32 bytes instead of 16 to avoid an unaligned access exception. Not a big problem for me, but still a problem with the support of this MS extension. By the way, while trying to find out about that, I saw on some forums that someone had a similar problem with the MS compiler.
Problem with VS integration and property sheets: since the Parallel Studio integration seems to be completely different from the regular C++ compiler release, I tried to check whether it works as it did in 11.0.072.
It appears that the problem is still there; moreover, some other things are missing: the disable-warning option, for instance, and some other options are simply removed from the configuration options in VS. So I had to manually add /wdNNN to the additional command line options to avoid thousands of warnings coming out of ffmpeg.
The problem is still there; I probably made a mistake saying that it has this problem in ms style asm. This problem exists only in gas inline assembler:
[cpp]// test.c
static const unsigned long long C64 = 0x1234567844441111ULL;

extern "C" void set_mm0()
{
    __asm__ volatile ("movq %0, %%mm0" ::"m"(C64));
}

extern "C" void set_mm0_masm()
{
    __asm movq mm0, C64;
}
[/cpp]
asm listing:
[plain]
PUBLIC _set_mm0:
        sub esp, 8
        mov eax, 1145311505
        mov edx, 305419896
        mov DWORD PTR [esp], eax
        mov DWORD PTR [4+esp], edx
        movq mm0, QWORD PTR [esp]
        add esp, 8
        ret
        ALIGN 16
PUBLIC _set_mm0_masm:
        movq mm0, QWORD PTR [?C64@@4_KB]
        ret
[/plain]
The results are identical if I modify the C code as follows (set_mm0_masm unmodified):
[cpp]static const unsigned long long C64[] = {0x1234567844441111ULL, 0};

extern "C" void set_mm0()
{
    __asm__ volatile ("movq %0, %%mm0" ::"m"(C64[0]));
}
[/cpp]
As for integration, try uninstalling the complete compiler package, then install it and integrate it again. For me, all options work with the latest compiler version and Visual Studio 2008 Pro.