Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Custom instruction opcode inside code

morca
Beginner

I have a question in mind and have tested it with GCC without success. So I would like to know whether it is *indeed* possible to do this with GCC or ICC. The question is really about the machine instructions fetched by the CPU.

I have written a single byte (0x00) as code inside the main function. But after compilation, GDB shows the 00 opcode merged with the following instructions. In other words, the compiler (or GDB, I don't know which) treats 00 as an ADD instruction and automatically joins the following bytes to it. I don't want that!

 

(gdb) set disassembly-flavor intel
(gdb) list
1       void main()
2       {
3         __asm__(".byte 0x00");
4       }
(gdb) disass /r main
Dump of assembler code for function main:
   0x00000000004004ed <+0>:     55      push   rbp
   0x00000000004004ee <+1>:     48 89 e5        mov    rbp,rsp
   0x00000000004004f1 <+4>:     00 5d c3        add    BYTE PTR [rbp-0x3d],bl
End of assembler dump. 

So, I want to see

  0x00000000004004ed <+0>:     55      push   rbp
  0x00000000004004ee <+1>:     48 89 e5        mov    rbp,rsp  
    ==========>                00            SOMETHING
  0x00000000004004f1 <+4>:     5d      pop    rbp    
  0x00000000004004f2 <+5>:     c3      ret

 

I actually want to get an invalid opcode exception from the processor itself, not a segmentation fault raised by software.

Any comments? Can I accomplish that with ICC?

jimdempseyatthecove
Honored Contributor III

The one-byte ADD opcode (00h) must be followed by a ModR/M byte that specifies the operands, plus optional SIB and displacement bytes depending on the ModR/M value:

A.2.4.1 One-Byte Opcode Instructions
The opcode map for 1-byte opcodes is shown in Table A-2. The opcode map for 1-byte opcodes is arranged by row
(the least-significant 4 bits of the hexadecimal value) and column (the most-significant 4 bits of the hexadecimal
value). Each entry in the table lists one of the following types of opcodes:
• Instruction mnemonics and operand types using the notations listed in Section A.2
• Opcodes used as an instruction prefix
For each entry in the opcode map that corresponds to an instruction, the rules for interpreting the byte following
the primary opcode fall into one of the following cases:
• A ModR/M byte is required and is interpreted according to the abbreviations listed in Section A.1 and Chapter
2, “Instruction Format,” of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A.
Operand types are listed according to notations listed in Section A.2.
• A ModR/M byte is required and includes an opcode extension in the reg field in the ModR/M byte. Use Table A-6
when interpreting the ModR/M byte.
• Use of the ModR/M byte is reserved or undefined. This applies to entries that represent an instruction prefix or
entries for instructions without operands that use ModR/M (for example: 60H, PUSHA; 06H, PUSH ES).
Example A-1. Look-up Example for 1-Byte Opcodes
Opcode 030500000000H for an ADD instruction is interpreted using the 1-byte opcode map (Table A-2) as follows:
• The first digit (0) of the opcode indicates the table row and the second digit (3) indicates the table column. This
locates an opcode for ADD with two operands.
• The first operand (type Gv) indicates a general register that is a word or doubleword depending on the operand-size
attribute. The second operand (type Ev) indicates a ModR/M byte follows that specifies whether the
operand is a word or doubleword general-purpose register or a memory address.
• The ModR/M byte for this instruction is 05H, indicating that a 32-bit displacement follows (00000000H). The
reg/opcode portion of the ModR/M byte (bits 3-5) is 000, indicating the EAX register.
The instruction for this opcode is ADD EAX, mem_op, and the offset of mem_op is 00000000H.
Some 1- and 2-byte opcodes point to group numbers (shaded entries in the opcode map table). Group numbers
indicate that the instruction uses the reg/opcode bits in the ModR/M byte as an opcode extension (refer to Section
A.4).
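To make the quoted rules concrete, here is a minimal C sketch (the helper name add_eb_gb_length is hypothetical; it assumes 64-bit mode and no prefixes) that computes how many bytes an instruction beginning with opcode 00h occupies, using the Mod and R/M fields of the ModR/M byte and, when present, the SIB byte:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length in bytes of an instruction starting with opcode 00h (ADD Eb, Gb),
 * assuming 64-bit mode and no prefixes.
 * Returns 0 if `avail` bytes are not enough to hold the whole instruction. */
static size_t add_eb_gb_length(const uint8_t *p, size_t avail)
{
    assert(avail > 0 && p[0] == 0x00);
    if (avail < 2)
        return 0;                       /* the ModR/M byte is mandatory */

    uint8_t modrm = p[1];
    unsigned mod = modrm >> 6;          /* bits 7:6 */
    unsigned rm  = modrm & 7;           /* bits 2:0 */
    size_t len = 2;                     /* opcode + ModR/M */

    if (mod != 3 && rm == 4) {          /* memory form with R/M = 100b: SIB follows */
        if (avail < 3)
            return 0;
        len += 1;
        if (mod == 0 && (p[2] & 7) == 5)
            len += 4;                   /* SIB base = 101b with Mod = 00: disp32 */
    }
    if (mod == 1)
        len += 1;                       /* disp8 */
    else if (mod == 2)
        len += 4;                       /* disp32 */
    else if (mod == 0 && rm == 5)
        len += 4;                       /* RIP-relative disp32 */

    return len <= avail ? len : 0;
}
```

Fed the bytes from the GDB dump in the first post (00 5d c3), it returns 3: ModR/M 5Dh has Mod = 01, so a one-byte displacement (C3h) follows, which is why the pop rbp and ret bytes were absorbed into add BYTE PTR [rbp-0x3d],bl.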

Jim Dempsey

morca
Beginner

Let me rewrite the problem in some other words.

1- When the CPU starts to fetch a new instruction, how does it know whether to fetch one, two, or more bytes? As I understand what you say, in the fetch cycle the CPU gets one byte from memory and decodes it (sends the opcode to the decode stage). If it finds an ADD instruction, does it re-enter the fetch stage and get one more byte from memory? That would mean the two bytes are not fetched consecutively and there are some cycles in between to decide whether to fetch the second byte or not.

Is that correct? Where is this operation documented?

 

2- That 0x00 (add) was just an example. Assume I use another code which should be invalid. For example, I see "05 iw" listed as "add ax, imm16". A correct instruction would then be "05 01 00", which means "add ax, 0x0001" (the immediate is stored little-endian). Now I want to try "05 00" in order to raise an invalid instruction exception from the CPU and catch it manually instead of getting the default segmentation fault. How can I do that?

McCalpinJohn
Honored Contributor III

Instruction fetching for variable-length instructions is a large topic. In the IA-32 and Intel 64 architectures it is complicated by the translation of "instructions" into "micro-ops" (uops). The encoding of uops is entirely internal to the processor, so many of the details are not published.

At the hardware level, the instruction fetch unit in the processor will typically fetch a fixed number of bytes (usually 16 Bytes) and will then decide on the instruction boundaries within that block. 

Because determining instruction boundaries in the machine code is a serial operation, many processors have hardware support for minimizing the cost of this operation. Some processors save "uops" (rather than the public machine instructions) in the instruction cache (or in some smaller, less documented cache) after the instructions have been translated. Other processors keep the original instruction bytes in the instruction cache, but keep markers for the beginning of each instruction after it has been decoded.
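A rough software model of why boundary finding is serial: the sketch below (hypothetical helper names, and a toy length decoder that only knows the few encodings seen in this thread) walks a fetch block and marks where each instruction starts. The start of instruction N+1 is only known once the length of instruction N has been worked out, and an instruction that is cut off at the end of the block stops the walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy length decoder: recognizes only a handful of 64-bit-mode encodings.
 * Returns 0 for anything else or for a truncated instruction. */
static size_t toy_length(const uint8_t *p, size_t avail)
{
    if (avail == 0)
        return 0;
    switch (p[0]) {
    case 0x55: case 0x5d: case 0xc3: case 0xcc: case 0x90:
        return 1;                        /* push rbp, pop rbp, ret, int3, nop */
    case 0x48:                           /* REX.W 89 /r with a register ModR/M */
        return (avail >= 3 && p[1] == 0x89 && (p[2] >> 6) == 3) ? 3 : 0;
    case 0x0f:
        return (avail >= 2 && p[1] == 0x0b) ? 2 : 0;   /* ud2 */
    default:
        return 0;
    }
}

/* Marks the first byte of each instruction in a fetch block. Each boundary
 * depends on the previous instruction's decoded length, so the loop cannot
 * be parallelized naively. Returns the number of instructions found. */
static size_t mark_boundaries(const uint8_t *block, size_t n, uint8_t *is_start)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        is_start[i] = 0;
    for (size_t off = 0; off < n; ) {
        size_t len = toy_length(block + off, n - off);
        if (len == 0)
            break;                       /* unknown byte, or split across the block end */
        is_start[off] = 1;
        off += len;
        count++;
    }
    return count;
}
```

Real hardware uses predecode marks and uop caches to avoid repeating this walk; the sketch only illustrates the data dependency between consecutive instructions.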

jimdempseyatthecove
Honored Contributor III

Think of instruction fetch and decode as similar to a buffered read from a file. A chunk of data (instructions in this case) is read in, then the words (some short, some long) are parsed from the buffer, with special handling for words (instructions) split across buffer boundaries.

To complicate matters, you have the L1 instruction cache (64-byte lines), followed (presumably) by the instruction pipeline fetch buffer(s) (dependent on architecture, potentially 16 bytes as John mentions), both of which include pre-fetch, one with stride prediction and the other with branch prediction. This is just for starters. Then you may have "public instruction" to uop conversion and register renaming, together with the ability to re-order the instruction sequence in a manner that is consistent with the "public instruction" sequence.

Jim Dempsey

jimdempseyatthecove
Honored Contributor III

Note, if your intention is to insert a single byte into your code so that the program stops and you can look around, then you can insert 0xCC, which is the INT3 (INT 03) instruction. If a debugger is running, it will stop as if at an unplanned breakpoint; if a debugger is not running, an unhandled exception occurs (on Linux, the process is terminated by SIGTRAP).
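A minimal sketch of the no-debugger case on x86-64 Linux with GCC-style inline assembly (assumptions: POSIX sigaction is available; the function names are hypothetical). Because INT3 is a trap, the saved instruction pointer already points past the CC byte, so a SIGTRAP handler can simply return and execution continues with the next instruction:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t trap_count = 0;

/* SIGTRAP handler: INT3 is a trap, so returning resumes after the 0xCC byte. */
static void on_sigtrap(int sig)
{
    (void)sig;
    trap_count++;
}

/* Executes one INT3 with a SIGTRAP handler installed and returns how many
 * traps the handler observed (1 when no debugger intercepts the trap). */
static int hit_int3_once(void)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigtrap;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTRAP, &sa, &old) != 0)
        return -1;

    trap_count = 0;
    __asm__ volatile (".byte 0xcc" ::: "memory");   /* same encoding as "int3" */

    sigaction(SIGTRAP, &old, NULL);                 /* restore the previous disposition */
    return (int)trap_count;
}
```

Under GDB the debugger typically intercepts the trap first, so the handler only runs when the program is started on its own.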

Jim Dempsey

McCalpinJohn
Honored Contributor III

For the specific use case of generating an invalid opcode exception, there are a few instructions that are architecturally defined for this purpose: UD0, UD1, UD2.    All of these are multi-byte instructions.  The easiest is UD2, since it has no register arguments -- it is just the 2-Byte sequence 0F 0B.
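A sketch of catching UD2 in user space on x86-64 Linux (assumptions: POSIX sigaction and sigsetjmp/siglongjmp; helper names are hypothetical). Unlike INT3, #UD is a fault: the saved instruction pointer still points at the 0F 0B bytes, so the handler jumps back to a checkpoint instead of returning, which would re-execute UD2 forever:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf ud_env;

/* SIGILL handler: jump back to the checkpoint set in ud2_raises_sigill(). */
static void on_sigill(int sig)
{
    siglongjmp(ud_env, sig);
}

/* Returns 1 if executing UD2 (0F 0B) raised SIGILL, 0 if it did not,
 * -1 if the handler could not be installed. */
static int ud2_raises_sigill(void)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return -1;

    volatile int raised = 0;
    if (sigsetjmp(ud_env, 1) == 0)                  /* 1: restore the signal mask on jump */
        __asm__ volatile (".byte 0x0f, 0x0b");      /* ud2 */
    else
        raised = 1;

    sigaction(SIGILL, &old, NULL);
    return raised;
}
```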

morca
Beginner

Thanks for the explanations. I will try them. Meanwhile I tried 0x3f which is known to be AAS (ascii adjust). However, GDB says

3f (bad)

and when I run the program (compiled with gcc), I get "illegal instruction" error. I haven't tried icc but it seems that my problem is more general than the compiler itself.

morca
Beginner

jimdempseyatthecove (Blackbelt) wrote:

Note, if your intention is to insert a single byte into your code to permit the program to stop for you to look around, then you can insert 0xCC which is the INT 03 instruction. Then if a debugger is running an undefined break will occur, if a debugger is not running then an unhandled exception occurs.

Jim Dempsey

So, I use __asm__(".byte 0xcc, 0x00, 0xcc"); but I see

cc   int3

00 cc   add  %cl,%ah

morca
Beginner

McCalpin, John (Blackbelt) wrote:

For the specific use case of generating an invalid opcode exception, there are a few instructions that are architecturally defined for this purpose: UD0, UD1, UD2.    All of these are multi-byte instructions.  The easiest is UD2, since it has no register arguments -- it is just the 2-Byte sequence 0F 0B.

I tried 0F 0B and that yields "illegal instruction". However, that is not what I want. I don't want to explicitly write invalid instruction.

I am trying to find a way to detect/mask/catch the execution of invalid (custom) opcode.

jimdempseyatthecove
Honored Contributor III

>>So, I use __asm__(".byte 0xcc, 0x00, 0xcc"); but I see

Which is

A one-byte INT3 (0xCC)
followed by the two-byte instruction 0x00, 0xCC (the one-byte ADD opcode followed by its ModR/M byte)

What you see is correct.

>>I am trying to find a way to detect/mask/catch the execution of invalid (custom) opcode.

Most debuggers have the ability to break on an illegal instruction (as opposed to an inserted breakpoint). If, as an example, your test code executes an AVX-512 instruction on a system without AVX-512 (or, say, an FMA instruction on a system without FMA), then ensure that your debugger traps on invalid instructions (if the default isn't set that way already).

If you are rolling your own handler, look at the sections

INT n/INTO/INT 3 -- Call to Interrupt Procedure, in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2
and
Chapter 6.4 INTERRUPTS AND EXCEPTIONS

of the Intel 64 and IA-32 Architectures Software Developer's Manual, Combined Volumes 1, 2A, 2B, 2C, 3A, 3B and 3C
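For a user-space equivalent of rolling your own handler on Linux, here is a sketch (assumes glibc on x86-64, where ucontext_t exposes the saved registers as uc_mcontext.gregs[REG_RIP]; names are hypothetical) that "masks" UD2 by advancing the saved RIP past the faulting instruction and resuming. It only works here because the instruction length (2 bytes) is known in advance; a general handler would have to decode the instruction at the fault address first:

```c
#define _GNU_SOURCE                      /* exposes REG_RIP in glibc's <sys/ucontext.h> */
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>

static volatile sig_atomic_t skipped = 0;

/* SA_SIGINFO handler: if the faulting bytes are UD2 (0F 0B), advance the
 * saved RIP by 2 so the interrupted code resumes after it. Anything else
 * gets the default action (termination) when the instruction re-executes. */
static void skip_ud2(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)info;
    ucontext_t *uc = ctx;
    const uint8_t *ip = (const uint8_t *)uc->uc_mcontext.gregs[REG_RIP];
    if (ip[0] == 0x0f && ip[1] == 0x0b) {
        uc->uc_mcontext.gregs[REG_RIP] += 2;
        skipped++;
    } else {
        signal(SIGILL, SIG_DFL);         /* async-signal-safe per POSIX */
    }
}

/* Executes two UD2 instructions with the handler installed and returns
 * how many were skipped (2 if both were masked). */
static int run_masked_ud2(void)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = skip_ud2;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return -1;

    skipped = 0;
    __asm__ volatile ("ud2\n\tud2" ::: "memory");

    sigaction(SIGILL, &old, NULL);
    return (int)skipped;
}
```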

Jim Dempsey

morca
Beginner

OK but before creating the handler, I would like to verify the method with the currently available handling method. The question is simple, but I don't know why the implementation is hard...

All I want to do is to feed in a random sequence of bytes and be notified of the illegal instructions so I can handle them (with the default or my own handler). So the sequences look like

FF FF

01 01 01

FE FE FE FE FE FE FE FE

Is that really possible to do? Maybe I am in the wrong direction...

morca
Beginner

Intel C. wrote:

Thanks for the explanations. I will try them. Meanwhile I tried 0x3f which is known to be AAS (ascii adjust). However, GDB says

3f (bad)

and when I run the program (compiled with gcc), I get "illegal instruction" error. I haven't tried icc but it seems that my problem is more general than the compiler itself.

Any comment on this?

McCalpinJohn
Honored Contributor III

I am not sure I understand....  In the first note, you say

Actually I want to get an invalid opcode exception by the processor itself and not segmentation fault raised by software.

When the processor is presented with an illegal instruction, it will tag the instruction to generate an exception when it is time for that instruction to retire. In this case, the exception will be interrupt vector 6 ["Invalid Opcode Exception (#UD)"], which will be handled by software. Software may refer to this as a "segmentation fault", but that is just a matter of reporting (Linux, for example, delivers #UD to the process as SIGILL, which the shell reports as "Illegal instruction"). The classes of invalid opcodes that raise this exception are listed in Section 6.15 of Volume 3 of the Intel Architectures SW Developer's Manual (document 325384-070, May 2019).
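To see the distinction in practice, here is a sketch for x86-64 Linux (hypothetical helper names; assumes page 0 is unmapped, as it is for normal Linux processes) that runs a function under SIGILL and SIGSEGV handlers and reports which signal arrived: UD2 produces SIGILL (#UD), while a load from address 0 produces SIGSEGV (a page fault).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf fault_env;
static volatile sig_atomic_t last_sig;

/* Records which signal arrived and jumps back to the checkpoint. */
static void on_fault(int sig)
{
    last_sig = sig;
    siglongjmp(fault_env, 1);
}

/* Runs fn with SIGILL and SIGSEGV caught. Returns the signal number that
 * was delivered, or 0 if fn returned normally. */
static int which_signal(void (*fn)(void))
{
    struct sigaction sa, old_ill, old_segv;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGILL, &sa, &old_ill);
    sigaction(SIGSEGV, &sa, &old_segv);

    last_sig = 0;
    if (sigsetjmp(fault_env, 1) == 0)    /* 1: restore the signal mask on jump */
        fn();

    sigaction(SIGILL, &old_ill, NULL);
    sigaction(SIGSEGV, &old_segv, NULL);
    return (int)last_sig;
}

static void exec_ud2(void) { __asm__ volatile ("ud2"); }

/* Reads address 0 through inline asm so the compiler cannot treat it as UB. */
static void load_from_zero(void)
{
    __asm__ volatile ("movb (%0), %%al" : : "r"((void *)0) : "rax", "memory");
}

static void do_nothing(void) { }
```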

Then later, you wrote:

I tried 0F 0B and that yields "illegal instruction". However, that is not what I want. I don't want to explicitly write invalid instruction.

From Section 6.15 of Volume 3 of the Intel Architectures SW Developer's Manual, Interrupt #6 is called the "Invalid Opcode Exception (#UD)" -- so  "invalid opcode" and "undefined instruction" clearly mean the same thing.

Are you looking for a way to *detect* illegal instructions in an executable file without executing it?   This requires an instruction parser, but if there are branches whose target is not visible at parse-time, then you can't guarantee that executing the program will not hit an illegal instruction exception.  Obfuscating code by generating branches into the middle of an instruction is a time-honored hacking technique in the x86 world....
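A small illustration of that last point, using a toy decoder that only knows four encodings (hypothetical names; 64-bit mode, no prefixes): the same six bytes decode as a harmless MOV followed by RET when entered at offset 0, but as UD2 when a branch lands at offset 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy decoder for four encodings only. Returns the length, or 0 if unknown. */
static size_t toy_decode(const uint8_t *p, size_t avail, const char **name)
{
    if (avail >= 5 && p[0] == 0xb8) { *name = "mov eax, imm32"; return 5; }
    if (avail >= 2 && p[0] == 0x0f && p[1] == 0x0b) { *name = "ud2"; return 2; }
    if (avail >= 1 && p[0] == 0x90) { *name = "nop"; return 1; }
    if (avail >= 1 && p[0] == 0xc3) { *name = "ret"; return 1; }
    return 0;
}

/* Linear-sweep decode starting at byte offset `start`; stores up to `max`
 * mnemonics in `names` and returns how many were decoded. */
static size_t sweep(const uint8_t *code, size_t n, size_t start,
                    const char **names, size_t max)
{
    size_t count = 0;
    for (size_t off = start; off < n && count < max; ) {
        size_t len = toy_decode(code + off, n - off, &names[count]);
        if (len == 0)
            break;
        off += len;
        count++;
    }
    return count;
}
```

With code = B8 0F 0B 90 90 C3, a sweep from offset 0 yields "mov eax, imm32" and "ret", while a sweep from offset 1 yields "ud2", "nop", "nop", "ret". A static parser that only follows the first path never sees the UD2.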

 

jimdempseyatthecove
Honored Contributor III

>>All I want to do is to give a random sequence of bytes and be notified the illegal instructions to handle them (with the default or my own handler).

Are you intending to discover which byte sequences generate Invalid Opcode Exception for a given CPU?

Rather than generating random sequences, I'd suggest you use the documented instruction set description, and then probe the undefined regions. Of course this will not tell you what a non-exception undocumented instruction sequence does, nor will it tell you what Intel intends for a new sequence. As a basis for your discovery program, you might want to look at an Intel 64/IA-32 emulator to help you produce a set of tables of "known opcodes to skip", then skip testing sequences found in "known opcodes to skip".
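If you do go the probing route, here is a rough sketch of a single probe on x86-64 Linux (assumptions: the system permits a writable and executable anonymous mapping, which some hardened configurations forbid; helper names are hypothetical). It copies candidate bytes followed by RET onto a fresh page, calls it, and reports whether SIGILL (#UD) was raised. It catches nothing else, so it is only safe for candidates that do not touch memory, the stack, or control flow:

```c
#define _GNU_SOURCE                      /* for MAP_ANONYMOUS */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

static sigjmp_buf probe_jmp;

static void probe_handler(int sig)
{
    (void)sig;
    siglongjmp(probe_jmp, 1);
}

/* Returns 1 if the candidate bytes raised #UD (SIGILL), 0 if they ran and
 * returned normally, -1 on setup failure. */
static int raises_ud(const uint8_t *candidate, size_t len)
{
    if (len + 1 > 4096)
        return -1;
    uint8_t *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;
    memcpy(page, candidate, len);
    page[len] = 0xc3;                    /* ret back to the caller */

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGILL, &sa, &old);

    volatile int result = 0;
    if (sigsetjmp(probe_jmp, 1) == 0)
        ((void (*)(void))page)();        /* POSIX permits this data-to-function pointer cast */
    else
        result = 1;

    sigaction(SIGILL, &old, NULL);
    munmap(page, 4096);
    return result;
}
```

For example, UD2 (0F 0B) and AAS (3F, invalid in 64-bit mode) both report 1, while NOP (90) reports 0. A sweep over many candidates would need far more isolation, for example one child process per probe.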

Jim Dempsey

morca
Beginner

John and Jim,

I appreciate your time helping with this. Although the primary goal is not hacking, it seems that this area is better known, so it is better for me to map my question onto it.

I want to increase the fault tolerance of code execution. What are the faults? 1) random bit changes in memory, the instruction cache, or the instruction register due to noise, and 2) intentional bit changes that produce undefined/illegal/privileged instructions that are not documented (hacking).

So I don't want to intentionally raise the illegal instruction exception known as #UD. But for some tests, I do have to inject a bit change and raise #UD to see what is what. For example, I may get an illegal instruction error when the instruction is actually privileged. Or gcc/icc mark something as bad but the CPU is able to run it (undocumented), and so on. Then I would be able to catch and mask illegal instructions in case they are executed by the CPU, and handle them.

I found a Black Hat video about breaking x86. Following it, I wrote a script that tests 15-byte sequences (the maximum x86 instruction length) and reports which are known to be bad and which are valid n-byte instructions.

For example, I wrote 0xff,0xff,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00

Assuming that the first byte is a prefix and the second byte contains a 6-bit opcode, both gcc and icc show (output of objdump)

 

   4:   ff                      (bad)
   5:   ff 00                   inc    DWORD PTR [rax]
   7:   00 00                   add    BYTE PTR [rax],al
   9:   00 00                   add    BYTE PTR [rax],al
   b:   00 00                   add    BYTE PTR [rax],al
   d:   00 00                   add    BYTE PTR [rax],al
   f:   00 00                   add    BYTE PTR [rax],al
  11:   00 00                   add    BYTE PTR [rax],al

 

That means there is no FF prefix, but FF 00 is a valid instruction. If I compile with gcc|icc -g machine3.c -o machine3 and run it, I get

$ ./machine3
Illegal instruction

That is because the first byte is considered as an instruction which is bad.
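The objdump output can be explained from the opcode map: FF is a group opcode (Group 5) whose operation is selected by the REG field of the next byte. A minimal sketch (hypothetical function name) of that lookup:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Opcode FF (Group 5): bits 5:3 of the following ModR/M byte select the
 * operation. /7 is undefined, so the function returns NULL for it.
 * (The far forms /3 and /5 additionally require a memory operand; that
 * extra check is not done here.) */
static const char *ff_group5(uint8_t modrm)
{
    static const char *const ops[8] = {
        "inc", "dec", "call near", "call far",
        "jmp near", "jmp far", "push", NULL   /* /7: reserved */
    };
    return ops[(modrm >> 3) & 7];
}
```

For FF FF the REG field is 7, which the opcode map leaves undefined, so objdump prints (bad) for the first FF, resynchronizes one byte later, and decodes FF 00 as inc DWORD PTR [rax] (REG = 0).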

 

As another example, I see

4:    f0 06                    lock (bad)

However, 06 is a valid opcode. For example, I can verify

4:    00 06                    add    BYTE PTR [rsi],al

 

For me, it is not clear what an "invalid instruction" really is. Although the term looks simple, it is actually complex.

If we rely on the output of the compiler and disassembler, we have to keep in mind that they in turn rely on the processor's documentation.

Any comment?

jimdempseyatthecove
Honored Contributor III

>>I want to increase the fault tolerancy of code execution. What are faults? 1) random bit changes in memory/icache or instruction register due to noises, 2) intentional bit changes for undefined/illegal/privileged instructions that are not documented (hacking).

1) A random bit change in an instruction sequence is just as likely, if not more likely, to produce a valid instruction sequence, albeit not the intended one. To protect against this, consider using a system with ECC memory.

2) Intentional malfeasance is a different story, use of Execute Only and/or SGX may be of help.

My 2 cents on Spectre, Meltdown and their variants is that this is not a fault of the CPU architecture, but rather more the fault of the operating system not managing the virtual memory page tables properly. IOW, the "protected" memory could not possibly be misused through a not-taken branch of an indirect JMP table if that page of physical memory weren't still mapped (regardless of protection bits), or if the page were mapped to a page with innocuous data.

Jim Dempsey 

McCalpinJohn
Honored Contributor III

Instruction encoding in x86-64 is certainly not easy.   I recommend spending some time reading Chapter 2 of Volume 2 of the Intel Architectures Software Developer's Manual (document 325383-070, May 2019), with another window open to Appendix A ("Opcode Map") of the same volume. 

From your example "F0 06", the first byte (F0) is the "LOCK" prefix, and the second byte (06) is the one-byte opcode for the "PUSH ES" instruction. "PUSH ES" is not valid in 64-bit mode (as noted in table A-2 and in the description for the PUSH instruction), so a LOCK PUSH ES is not valid either.
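A partial sketch of the LOCK rule (hypothetical function name; one-byte opcodes only, following the list of lockable instructions in the LOCK description of SDM Volume 2): LOCK is accepted only on certain read-modify-write instructions with a memory destination, and anything else, including F0 06, raises #UD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a LOCK (F0) prefix is permitted before this one-byte
 * opcode with this ModR/M byte. 0F-prefixed lockable instructions
 * (CMPXCHG, XADD, BTS, ...) are deliberately not covered. */
static bool lock_allowed(uint8_t opcode, uint8_t modrm)
{
    bool mem_dest = (modrm >> 6) != 3;   /* Mod = 11b means a register operand */
    unsigned reg = (modrm >> 3) & 7;

    switch (opcode) {
    case 0x00: case 0x01:                /* ADD Eb,Gb / Ev,Gv */
    case 0x08: case 0x09:                /* OR  */
    case 0x10: case 0x11:                /* ADC */
    case 0x18: case 0x19:                /* SBB */
    case 0x20: case 0x21:                /* AND */
    case 0x28: case 0x29:                /* SUB */
    case 0x30: case 0x31:                /* XOR */
    case 0x86: case 0x87:                /* XCHG */
        return mem_dest;
    case 0x80: case 0x81: case 0x83:     /* Group 1: all but /7 (CMP) */
        return mem_dest && reg != 7;
    case 0xf6: case 0xf7:                /* Group 3: /2 NOT, /3 NEG */
        return mem_dest && (reg == 2 || reg == 3);
    case 0xfe: case 0xff:                /* Groups 4 and 5: /0 INC, /1 DEC */
        return mem_dest && reg <= 1;
    default:                             /* includes 06h (PUSH ES) */
        return false;
    }
}
```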

The alternate case "00 06" has no prefix.   From Table A-2, the first byte is the ADD (byte) instruction.  The table entry for opcode 00 includes the text "Eb, Gb".   To understand this field, you have to look in the "Key to Abbreviations" (Section A.2.1) and "Codes for Operand Types" (Section A.2.2).  From these, we see that the "E" in the table entry indicates that the opcode is followed by a "ModR/M" byte and the "G" field indicates that the register field of the ModR/M byte selects a general purpose register.   The "b" suffix indicates that the instruction operates on Byte (8-bit=r8) values.   (This is in agreement with the description of the ADD instruction on page 3-31.)

The meaning of the ModR/M byte comes from Table 2-2.  (Note that the encoding is the same as 32-bit mode, since there was no prefix on the instruction.)   From the text in Section 2.1.5 (above Tables 2-1 and 2-2), the 8 bits of the ModR/M byte are divided into three fields:

  • bits 7:6 are the "Mod" field -- in this case '00'
  • bits 5:3 are the "REG" field -- in this case '000'
  • bits 2:0 are the "RM" field -- in this case '110'

To use Table 2-2, select the column using the "REG" field -- this is the first column, so the general purpose register is "AL" (for a byte operand). Then look for the row with "Mod" = 00 and "RM" = 110. This row says that the "Effective Address" is "[ESI]" -- the memory address pointed to by the ESI register. Assuming that you are operating in 64-bit mode, addresses in registers are assumed to be 64-bit values, so this would be the memory address pointed to by the RSI register.

So after all this bouncing back and forth between Chapter 2 and Appendix A, I have convinced myself that this instruction will read the byte at the address held in the RSI register, add the byte value in register AL to it, and store the sum back to that memory location (the memory operand is the destination) -- just as the disassembler reports....
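The same table lookups can be written down as a small sketch (hypothetical names; 64-bit mode, no REX prefix, and only the Mod = 00 memory forms):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct modrm { unsigned mod, reg, rm; };

/* Splits a ModR/M byte into its three fields (SDM Vol. 2, Section 2.1.5). */
static struct modrm split_modrm(uint8_t b)
{
    struct modrm m = { (b >> 6) & 3, (b >> 3) & 7, b & 7 };
    return m;
}

/* 8-bit register selected by a 3-bit field when no REX prefix is present. */
static const char *reg8(unsigned field)
{
    static const char *const names[8] =
        { "al", "cl", "dl", "bl", "ah", "ch", "dh", "bh" };
    return names[field & 7];
}

/* Memory operand for Mod = 00 in 64-bit mode without REX. R/M = 100b
 * (SIB follows) and 101b (RIP-relative) need more bytes; reported as NULL. */
static const char *mod00_base(unsigned rm)
{
    static const char *const names[8] =
        { "[rax]", "[rcx]", "[rdx]", "[rbx]", NULL, NULL, "[rsi]", "[rdi]" };
    return names[rm & 7];
}
```

For 06h this gives Mod = 00, REG = 000 (AL) and R/M = 110 ([rsi]), i.e. add BYTE PTR [rsi],al. It also explains the earlier 00 CC case: CCh splits into Mod = 11, REG = 001 (CL) and R/M = 100 (AH), i.e. add ah, cl, which objdump prints in AT&T order as add %cl,%ah.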

morca
Beginner

John, Thanks for the detailed explanation. It is quite complex. Although I am reading the chapters you mentioned, I see many weird things, IMO, that are not explained or are explained somewhere which I haven't seen.

For example, section 2.1.1 from Volume II says

Repeat prefixes (F2H, F3H) cause an instruction to be repeated for each element of a string. Use these prefixes only with string and I/O instructions (MOVS, CMPS, SCAS, LODS, STOS, INS, and OUTS). Use of repeat prefixes and/or undefined opcodes with other Intel 64 or IA-32 instructions is reserved; such use may cause unpredictable behavior.

I did a test with F2,XX,00,00,00... where I used a fixed prefix and generated all 256 possible second bytes. I see many instructions that are related to computation, the stack, and so on:

f2 0b 00                 repnz or eax,DWORD PTR [rax]
f2 1a 00                 repnz sbb al,BYTE PTR [rax]
f2 1e                    repnz (bad)
f2 46 00 00              repnz rex.RX add BYTE PTR [rax],r8b
f2 5a                    repnz pop rdx
f2 95                    repnz xchg ebp,eax

So, are these the reserved instructions? If they are, the compiler shouldn't generate them.

Moreover, the same section says

The LOCK prefix (F0H) forces an operation that ensures exclusive use of shared memory in a multiprocessor environment. See “LOCK—Assert LOCK# Signal Prefix” in Chapter 3, “Instruction Set Reference, A-L,” for a description of this prefix.

But I see


26 f0 00 00              lock add BYTE PTR es:[rax],al
66 f0 00 00              data16 lock add BYTE PTR [rax],al

where the first prefix byte is not F0.

jimdempseyatthecove
Honored Contributor III

>>So, are these the reserved instructions? If they are, compiler shouldn't generate them.

No, you should not insert __asm__ to generate invalid instruction sequences. You got what you asked the compiler to generate.

Keep in mind that should your compiler predate a new CPU design, and you are required to generate a newer instruction sequence for that CPU, you'd have to resort to using __asm__(... to resolve the issue, and do so without the compiler complaining "Won't do this - don't know that instruction".

>>...where the prefix is not F0

2.1.1 Instruction Prefixes
Instruction prefixes are divided into four groups, each with a set of allowable prefix codes.
For each instruction, it is only useful to include up to one prefix code from each of the
four groups (Groups 1, 2, 3, 4).
Groups 1 through 4 may be placed in any order relative to each other.

Jim Dempsey

jimdempseyatthecove
Honored Contributor III

>>26 f0 00 00              lock add BYTE PTR es:[rax],al

The disassembler could have displayed  es: lock  add BYTE PTR [rax],al
but by programming convention it knows better, and displays it better.

You could try: 36 26 f0 00 00

This is SS segment override, ES segment override, LOCK, add byte ptr [rax], al

Note the 2.1.1 section "it is only useful to include up to one prefix code from each of the four groups" does not state it is illegal to include more than one from a given group... AND it does not say which one is used should there be more than one. I assume one of them will be used, but it is not stated which one for a given CPU series.
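A sketch of a prefix scanner (hypothetical names) that accepts legacy prefixes in any order, records which of the four groups each belongs to, and flags a group that appears twice. It deliberately does not try to decide which duplicate the CPU honors, since the quoted text leaves that unspecified:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Legacy prefix group (1-4) of a byte, or 0 if it is not a legacy prefix. */
static int prefix_group(uint8_t b)
{
    switch (b) {
    case 0xf0: case 0xf2: case 0xf3:           return 1; /* LOCK, REPNE, REP */
    case 0x2e: case 0x36: case 0x3e: case 0x26:
    case 0x64: case 0x65:                      return 2; /* segment overrides / branch hints */
    case 0x66:                                 return 3; /* operand-size override */
    case 0x67:                                 return 4; /* address-size override */
    default:                                   return 0;
    }
}

/* Skips legacy prefixes and sets *repeated_group if any group occurs more
 * than once. Returns the offset of the first non-prefix byte. A REX byte
 * (40h-4Fh), which must come right before the opcode in 64-bit mode, is
 * not handled here. */
static size_t skip_prefixes(const uint8_t *p, size_t n, bool *repeated_group)
{
    bool seen[5] = { false };
    size_t i = 0;
    *repeated_group = false;
    for (; i < n; i++) {
        int g = prefix_group(p[i]);
        if (g == 0)
            break;
        if (seen[g])
            *repeated_group = true;
        seen[g] = true;
    }
    return i;
}
```

For 26 F0 00 00 it skips two prefixes from different groups; for 36 26 F0 00 00 it skips three and reports that group 2 (segment override) appears twice.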

Jim Dempsey
