Deyang_Gu · ‎07-11-2013

Hi,

I have a piece of assembly code written in NASM syntax. It is vectorized code, so we want to test it on Xeon Phi. I apologize if my question sounds too naive; this is my first day with a Xeon Phi device. My question is:

How do I test this code? The first problem is that NASM/YASM doesn't seem to support Xeon Phi yet. Rewriting the code in C looks difficult, because the algorithm is designed around specific instructions and no C description of it exists.

I think the Intel compiler can recognize a .s file as assembly, but the syntax is different from NASM. Is it still the same for Xeon Phi? I mean the syntax and format; I know the instructions have changed.

Another question: my current code involves intensive vector operations (it is coded with AVX/AVX2 using the VEX prefix). I read somewhere that on Xeon Phi a single thread can only issue a vector operation every other cycle, so running multiple threads per core is recommended, but I have a question about that. Say my code uses all 32 available zmm registers. The code is designed per core rather than per thread (I was thinking of it as running on a CPU), so keeping values in registers looks like a problem to me. If the algorithm uses 32 zmm registers and I run 4 threads per core, then every thread would need its own 32 zmm registers. I guess that is not possible? So I am just curious...

My questions might sound naive and I apologize again if they really are. I will try to ask more sophisticated questions next time...when I have more knowledge about the device.

Thanks a lot!

Best

xiangpisai

TimP · ‎07-11-2013

The assembler is a GNU binutils assembler, presumably with the usual option to accept Intel in place of AT&T syntax. However, it would likely recognize only the basic x86 instructions, not including automatic translations from AVX to MIC instructions.

There is a full program-accessible register set for each of the 4 logical hardware threads per core, including 32 zmm registers per thread.

Programming at the instruction level isn't encouraged, considering the incompatibility with AVX and the pending change for the KNL. There is a fair amount of use of the MIC intrinsics which are installed with icc.

As you said, you need at least 2 threads per core to fill the VPU pipeline.

Paul_C_7 · ‎07-11-2013

I have been developing a compiler that used to use nasm as a back end, and I have found that it is possible to switch the distribution MIC assembler to recognise a variant of Intel syntax that is similar to the nasm one, but not identical. The syntax for floating point stack operations is different, the syntax for operand size qualifiers is stricter, and you have to put the word ptr after the operand size, i.e.

mov eax, dword ptr [ rbx]

instead of simply

mov eax, [rbx]

or

mov eax, dword[rbx]

You will also find that any instruction introduced since the original Pentium has vanished, so the XMM and YMM registers are not there, and there are no conditional move instructions and no floating point truncate-and-store instruction for the FPU.
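A minimal sketch of the stricter operand-size form described above. It assumes the switch to Intel syntax is made with GNU as's `.intel_syntax noprefix` directive, and it uses whatever `as` is on the host as a stand-in for the MIC assembler; the file name and the label `load_word` are made up for illustration.

```shell
#!/bin/sh
# Write a tiny GNU as source file that uses Intel syntax.
dir=$(mktemp -d)
src="$dir/ptr_demo.s"          # hypothetical file name

cat > "$src" <<'EOF'
        .intel_syntax noprefix
        .text
        .globl  load_word
load_word:
        mov     eax, dword ptr [rbx]    # size qualifier followed by "ptr"
        ret
EOF

# Assemble only when an x86-64 assembler is actually available on this host.
if command -v as >/dev/null 2>&1 && [ "$(uname -m)" = "x86_64" ]; then
    as "$src" -o "$dir/ptr_demo.o" && echo "assembled" || echo "assembler rejected the file"
fi
```

Replacing the `mov` line with the NASM form `mov eax, dword[rbx]` is a quick way to see how the assembler reacts to the missing `ptr`.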

Deyang_Gu · ‎07-12-2013

Thanks a lot for the help here!

I've tried a little, but I found it won't recognize stack variables. In NASM you have section data, but here you cannot do that, and you cannot declare a constant with something like const: dd 0.5 either. Do you know if there is a syntax manual for the assembler used by the Intel compiler? Thanks!

I have found it, please see my next reply :D

I don't think cmov will be a problem because you can do it in two instructions anyway, like a conditional jump and then a move. I know this is slower, but it won't stop you from porting your algorithms to MIC.

As for my code, almost all of them are about AVX. I use a lot of blend and gather/scatter. That's the core part of my code. So I think MIC can do these parts pretty well. In addition, all my data movements are done by movaps and gather/scatter. So I guess that's fine :)

Good luck on developing your new compiler!

xiangpisai

Deyang_Gu · ‎07-12-2013

Thanks Tim.

I think I have found the answer to the question in my previous reply. I am reading "Using GNU as" now. Thanks for letting me know! I don't think I would use automatic translations.

BTW: according to your reply, it seems there will probably be a change in the MIC instructions? Will the change be backward compatible with what we are using now? Thanks! Also, I've never used intrinsics before, but I can try. Since my original algorithm is designed around AVX/AVX2 instructions and the whole algorithm description is written in terms of instructions as well, I am not sure how easy it is to switch to intrinsics...

Thanks again!

xiangpisai

McCalpinJohn · ‎07-12-2013

On a related topic, the Intel 13 compilers emit Xeon Phi vector assembly code using a syntax that the assembler does not understand.
This happens when a vector instruction has multiple modifiers. For example, an instruction needing the "round to nearest" option and the "suppress all exceptions" option would be emitted by the compiler as "{rn-sae}". Unfortunately the assembler only understands the syntax "{rn}{sae}" for this combination.

If this is the only combination that is causing trouble, a simple sed script can modify the file in-place:
sed -i 's/-sae/\}\{sae/g' program.s

This has fixed all the problems that I have run across, but I don't typically try to activate a lot of the available options to the MIC vector instructions, so there could be other common occurrences of this syntax problem.

I filed an issue on premier for this bug, but I thought I should also post my workaround here in case anyone else is seeing this trouble.
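A small sketch of what that substitution does, run on a single sample line instead of editing a file in place. The mnemonic and operands are placeholders rather than real compiler output, and the braces are left unescaped in the replacement, which sed treats as literal characters.

```shell
#!/bin/sh
# Hypothetical line of compiler-emitted assembly; "some_vector_op" is a placeholder.
line='    some_vector_op  %zmm1, %zmm2{rn-sae}'

# Same substitution as above, applied to a stream rather than with -i.
fixed=$(printf '%s\n' "$line" | sed 's/-sae/}{sae/g')

printf 'before: %s\n' "$line"
printf 'after:  %s\n' "$fixed"
```

Running it on a copy of the real .s file first (without -i) and diffing the result is a cheap way to check that nothing else containing "-sae" is touched.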

Deyang_Gu · ‎07-12-2013

Thanks for mentioning that John.

I will test my code asap and I will report to you if I have any result.

Best

xiangpisai

Paul_C_7 · ‎07-12-2013

I found by experience that to create data sections or text sections you simply write

.text

.data

You do not use the word section.

For data declarations, instead of dd you write things like

.single 3.7

.double 100.8
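Putting those directives together, a sketch of a complete source file might look like the one written out below. It assumes Intel syntax is selected with `.intel_syntax noprefix` and uses the host `as` as a stand-in for the MIC assembler; the labels `const_half`, `const_big` and `load_half` are made-up names.

```shell
#!/bin/sh
# Write a small GNU as file with a data section and a text section.
dir=$(mktemp -d)
src="$dir/sections.s"          # hypothetical file name

cat > "$src" <<'EOF'
        .intel_syntax noprefix

        .data                           # no "section" keyword
const_half:
        .single 0.5                     # NASM: dd 0.5
const_big:
        .double 100.8                   # NASM: dq 100.8

        .text
        .globl  load_half
load_half:
        mov     eax, dword ptr [rip + const_half]
        ret
EOF

# Assemble only when an x86-64 assembler is actually available on this host.
if command -v as >/dev/null 2>&1 && [ "$(uname -m)" = "x86_64" ]; then
    as "$src" -o "$dir/sections.o" && echo "assembled" || echo "assembler rejected the file"
fi
```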

Deyang_Gu · ‎07-12-2013

Yeah, I also found them in the GNU as manual. TimP told me the two share the same syntax.

Assembly code on Xeon Phi