Software Archive
Read-only legacy content

Failed to enter protected mode in nested virtualization system

Tao_W_
Beginner
1,835 Views

I am testing VMX and building a Linux kernel module that acts as a hypervisor.

The kernel module is loaded into a Linux VM running in VMware Workstation.

To test the hypervisor (the kernel module), I built a guest from the xv6 bootasm.S code at http://pages.cs.wisc.edu/~skobov/cs537/P3/xv6/kernel/bootasm.S and loaded it into the hypervisor.

Once loaded, the guest runs, but it fails to long jump (ljmp) to start32.
I dumped the guest's GDT; the entries are:
entry 0: 00000000,00000000
entry 1: 00cf9a00,0000ffff
entry 2: 00cf9200,0000ffff
They look correct.
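One way to double-check that reading is to decode the two 32-bit halves of each descriptor. The sketch below is a hypothetical C helper (decode_desc and struct seg_desc are illustrative names), assuming each entry is printed as "high dword,low dword" as in the dump above. For entry 1 it yields base 0, an effective limit of 0xFFFFFFFF (G=1), and a present, DPL 0, 32-bit (D/B=1) execute/read code segment (type 0xA); entry 2 decodes the same way with type 0x2 (read/write data).

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of a legacy 8-byte code/data descriptor. */
struct seg_desc {
    uint32_t base;
    uint32_t limit;   /* effective byte limit, after applying G */
    uint8_t  type;    /* access byte bits 3:0 */
    bool     s;       /* 1 = code/data, 0 = system descriptor */
    uint8_t  dpl;
    bool     present;
    bool     db;      /* D/B: 1 = 32-bit default operand size */
    bool     g;       /* 1 = limit counted in 4 KiB units */
};

/* hi and lo are the two 32-bit halves printed as "hi,lo" in the dump. */
struct seg_desc decode_desc(uint32_t hi, uint32_t lo)
{
    struct seg_desc d;
    uint32_t raw_limit = (lo & 0xFFFFu) | (hi & 0x000F0000u);

    d.base    = (lo >> 16) | ((hi & 0xFFu) << 16) | (hi & 0xFF000000u);
    d.type    = (uint8_t)((hi >> 8) & 0xFu);
    d.s       = (hi >> 12) & 1u;
    d.dpl     = (uint8_t)((hi >> 13) & 3u);
    d.present = (hi >> 15) & 1u;
    d.db      = (hi >> 22) & 1u;
    d.g       = (hi >> 23) & 1u;
    d.limit   = d.g ? (raw_limit << 12) | 0xFFFu : raw_limit;
    return d;
}
```

For example, decode_desc(0x00cf9a00, 0x0000ffff).limit evaluates to 0xFFFFFFFF.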
I suspect the real-mode guest may not be executing the ljmp instruction.
The guest's CR0 is 0x60000011, CR4 is 0x2000, and RFLAGS is 0x3006.
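For reference, 0x60000010 is the architectural CR0 value after reset (CD, NW and ET set), so 0x60000011 is that value with PE turned on. A small hypothetical C helper (format_cr0 is an illustrative name) that lists the set CR0 flags makes such values quicker to read; the bit positions follow the Intel SDM's description of CR0.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* CR0 bit positions as defined in the Intel SDM, Vol. 3A ("Control Registers"). */
static const struct { unsigned bit; const char *name; } cr0_bits[] = {
    {0, "PE"}, {1, "MP"}, {2, "EM"}, {3, "TS"}, {4, "ET"}, {5, "NE"},
    {16, "WP"}, {18, "AM"}, {29, "NW"}, {30, "CD"}, {31, "PG"},
};

/* Writes the names of the set CR0 flags into buf (len >= 1),
 * lowest bit first, separated by single spaces. */
void format_cr0(uint32_t cr0, char *buf, size_t len)
{
    size_t used = 0;

    buf[0] = '\0';
    for (size_t i = 0; i < sizeof cr0_bits / sizeof cr0_bits[0]; i++) {
        if (!(cr0 & (UINT32_C(1) << cr0_bits[i].bit)))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? " " : "", cr0_bits[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;  /* out of space; snprintf has NUL-terminated the output */
        used += (size_t)n;
    }
}
```

format_cr0(0x60000011, buf, sizeof buf) produces "PE ET NW CD"; the guest CR0 of 0x31 in the later VMCS dumps produces "PE ET NE".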
 

This problem happens in a nested virtualization environment.

When I tested it on Linux running on bare metal, it worked well.

I don't know what I am missing in my code.

Thanks,

-Tao

6 Replies
Quoc-Thai_L_Intel

Since you are running under VMware, could this be an issue with the VMware software? You may want to check with VMware on this as well. I have also forwarded your question to my peers for input.

-Thai

Quoc-Thai_L_Intel

Hello, I got some feedback from a peer who would like you to try the following.

Replace

ljmp    $(SEG_KCODE<<3), $start32

with

        .byte   0x66, 0xea
        .long   start32
        .word   (SEG_KCODE<<3)

Regards,

-Thai
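The hand-assembled form spells out the instruction bytes: 0x66 is the operand-size prefix, which in 16-bit code selects the ptr16:32 form of the far jump, 0xEA is the far JMP opcode, and then come the 32-bit offset and the 16-bit selector, both little-endian. The hypothetical C function below (encode_far_jmp32 is an illustrative name) builds the same 8 bytes; the offset 0x7C38 in the usage example is only an illustrative value.

```c
#include <stddef.h>
#include <stdint.h>

/* Builds "jmp far ptr16:32" as emitted by
 *     .byte 0x66, 0xea ; .long offset ; .word selector
 * inside .code16. Returns the instruction length in bytes. */
size_t encode_far_jmp32(uint8_t out[8], uint32_t offset, uint16_t selector)
{
    out[0] = 0x66;                       /* operand-size override: 16 -> 32 */
    out[1] = 0xEA;                       /* JMP ptr16:16 / ptr16:32 */
    for (int i = 0; i < 4; i++)          /* 32-bit offset, little-endian */
        out[2 + i] = (uint8_t)(offset >> (8 * i));
    out[6] = (uint8_t)selector;          /* 16-bit selector, little-endian */
    out[7] = (uint8_t)(selector >> 8);
    return 8;
}
```

encode_far_jmp32(buf, 0x7C38, 1 << 3) fills buf with 66 EA 38 7C 00 00 08 00, where 1 << 3 is SEG_KCODE<<3.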

Tao_W_

Thai,

Thank you very much for your reply.

I changed the code as you suggested, but it still failed to jump to start32.

Here is the Makefile:

all: bootblock
OBJDUMP=objdump
OBJCOPY=objcopy

CFLAGS = -fno-pic -static -fno-builtin -fno-strict-aliasing -Wall -MD -ggdb -m32 -Werror -fno-omit-frame-pointer
CFLAGS += $(shell $(CC) -fno-stack-protector -E -x c /dev/null >/dev/null 2>&1 && echo -fno-stack-protector)
ASFLAGS = -m32 -gdwarf-2 -Wa,-divide
# FreeBSD ld wants ``elf_i386_fbsd''
LDFLAGS += -m $(shell $(LD) -V | grep elf_i386 2>/dev/null)

bootblock: bootasm.S bootmain.c
        $(CC) $(CFLAGS) -fno-pic -nostdinc -I. -c bootasm.S
        $(CC) $(CFLAGS) -fno-pic -nostdinc -I. -c bootmain.c
        $(LD) $(LDFLAGS) -N -e start -Ttext 0x7C00 -o bootblock.o bootasm.o bootmain.o
        $(OBJDUMP) -S bootblock.o > bootblock.asm
        $(OBJCOPY) -S -O binary -j .text bootblock.o bootblock.bin

clean:
        rm *.o
        rm *.bin

 

The GCC version is:

t@ubuntu:~/test/vmxx/kermod/linuxvmxx/toyvmm/vm$ cc -v
Using built-in specs.
COLLECT_GCC=cc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 5.4.0-6ubuntu1~16.04.9' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9)

Thanks,

-Tao

Tao_W_

Here is more information for your reference.

The VMCS fields for the guest:

 0x0000003F = control_VMX_pin_based
 0xA501E1F2 = control_VMX_cpu_based
 0x00000082 = control_VMX_proc2_based
 0x00000000 = control_exception_bitmap
 0x00000000 = control_pagefault_errorcode_mask
 0xFFFFFFFF = control_pagefault_errorcode_match
 0x00000000 = control_CR3_target_count
 0x00036FFB = control_VM_exit_controls
 0x000011FB = control_VM_entry_controls
 0x00000000 = control_VM_entry_interruption_information
 0x00000000 = control_VM_entry_exception_errorcode
 0x00000000 = control_VM_entry_instruction_length

 0xFFFFFFFFFFFFFFF7 = control_CR0_mask
 0xFFFFFFFFFFFFF871 = control_CR4_mask
 0x0000000060000010 = control_CR0_shadow
 0x0000000000000000 = control_CR4_shadow
 0x0000000000000000 = control_CR3_target0
 0x00000000B2E98000 = control_CR3_target1
 0x0000000000000000 = control_CR3_target2
 0x0000000000000000 = control_CR3_target3
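The CR0 mask and shadow decide how the guest sees CR0. With control_CR0_mask = 0xFFFFFFFFFFFFFFF7 every bit except TS (bit 3) is host-owned: a guest MOV to CR0 exits whenever a host-owned bit differs from the read shadow, and a guest read of CR0 returns the shadow for those bits. The sketch below models just those two rules in C (the function names are illustrative; CLTS and LMSW are not modeled). Applied to this dump, the guest's write of 0x60000011 exits because PE differs from the shadow 0x60000010, and the guest would read back 0x60000010 (PE clear) even though the guest CR0 in the state dump is 0x31.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value a guest MOV from CR0 returns: bits set in the guest/host mask
 * (host-owned) come from the read shadow, the rest from the guest CR0. */
uint64_t guest_visible_cr0(uint64_t guest_cr0, uint64_t mask, uint64_t shadow)
{
    return (shadow & mask) | (guest_cr0 & ~mask);
}

/* A guest MOV to CR0 causes a VM exit if any host-owned bit of the value
 * being written differs from the read shadow. (CLTS and LMSW follow
 * separate rules and are not modeled here.) */
bool mov_to_cr0_exits(uint64_t new_val, uint64_t mask, uint64_t shadow)
{
    return ((new_val ^ shadow) & mask) != 0;
}
```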

The VMX capability MSRs as read in the Linux VM running on VMware ESXi:

 VMX-Capability Model-Specific Registers

     00D8100000000001 = IA32_VMX_BASIC_MSR
     0000003F00000016 = IA32_VMX_PINBASED_CTLS_MSR
     FFF9FFFE0401E172 = IA32_VMX_PROCBASED_CTLS_MSR
     003FFFFF00036DFF = IA32_VMX_EXIT_CTLS_MSR
     0000F3FF000011FF = IA32_VMX_ENTRY_CTLS_MSR
     00000000000401E0 = IA32_VMX_MISC_MSR
     0000000080000021 = IA32_VMX_CR0_FIXED0_MSR
     00000000FFFFFFFF = IA32_VMX_CR0_FIXED1_MSR
     0000000000002000 = IA32_VMX_CR4_FIXED0_MSR
     00000000001727FF = IA32_VMX_CR4_FIXED1_MSR
     000000000000005A = IA32_VMX_VMCS_ENUM_MSR
     000038FE00000000 = IA32_VMX_PROCBASED_CTLS2
     00000F0106114141 = IA32_VMX_EPT_VPID_CAP
     0000003F00000016 = IA32_VMX_TRUE_PINBASED_CTLS
     FFF9FFFE04006172 = IA32_VMX_TRUE_PROCBASED_CTLS
     003FFFFF00036DFB = IA32_VMX_TRUE_EXIT_CTLS
     0000F3FF000011FB = IA32_VMX_TRUE_ENTRY_CTLS
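These MSRs can be used to confirm that the VMCS control fields are legal on this (virtual) CPU. Bit 55 of IA32_VMX_BASIC, set in 0x00D8100000000001, says the IA32_VMX_TRUE_* MSRs apply; in each capability MSR the low 32 bits mark controls that must be 1 and the high 32 bits mark controls that may be 1. The C sketch below (function names are illustrative) encodes that rule. By this check, the pin-based, primary and secondary processor-based, exit and entry controls in the VMCS dump all fit the reported MSRs, and control_VMX_proc2_based = 0x82 enables EPT (bit 1) and unrestricted guest (bit 7).

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_VMX_BASIC bit 55: the IA32_VMX_TRUE_*_CTLS MSRs are supported and
 * should be consulted instead of the legacy pin-based, primary
 * processor-based, exit and entry capability MSRs. */
bool vmx_has_true_ctls(uint64_t vmx_basic)
{
    return (vmx_basic >> 55) & 1;
}

/* Capability MSR layout: bits 31:0 = allowed 0-settings (a 1 means the
 * control must be 1), bits 63:32 = allowed 1-settings (a 0 means the
 * control must be 0). */
bool vmx_ctl_valid(uint64_t cap_msr, uint32_t ctl)
{
    uint32_t must_be_one = (uint32_t)cap_msr;
    uint32_t may_be_one  = (uint32_t)(cap_msr >> 32);
    return (ctl & must_be_one) == must_be_one && (ctl & ~may_be_one) == 0;
}

/* Common way to build a control value: force must-be-1 bits on and
 * must-be-0 bits off, keeping whatever optional bits were requested. */
uint32_t vmx_adjust_ctl(uint64_t cap_msr, uint32_t wanted)
{
    return (wanted | (uint32_t)cap_msr) & (uint32_t)(cap_msr >> 32);
}
```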


 original_CR0=80050033  PG=1 CD=0 NW=0 AM=1 WP=1 NE=1 ET=1 TS=0 EM=0 MP=1 PE=1
 original_CR4=001406E0  VMXE=0 PGE=1 MCE=1 PAE=1 PSE=0 DE=0 TSD=0 PVI=0 VME=0

The guest state just before the ljmp:

 VMX Guest State

 CR0=0000000000000031  CR3=0000000000000000  CR4=0000000000002050

 RSP=00000000000017FA  SYSENTER_ESP=0000000000000000
 RIP=000000000000182E  SYSENTER_EIP=0000000000000000
 DR7=0000000000000400  SYSENTER_CS=00000000  RFLAGS=0000000000000006

   ES=0000  [ base=0000000000000000 limit=0000FFFF rights=00000093 ]
   CS=0000  [ base=0000000000000000 limit=0000FFFF rights=0000009B ]
   SS=0000  [ base=0000000000000000 limit=0000FFFF rights=00000093 ]
   DS=0000  [ base=0000000000000000 limit=0000FFFF rights=00000093 ]
   FS=0000  [ base=0000000000000000 limit=0000FFFF rights=00000093 ]
   GS=0000  [ base=0000000000000000 limit=0000FFFF rights=00000093 ]
 LDTR=0000  [ base=0000000000000000 limit=0000FFFF rights=00000082 ]
   TR=0000  [ base=0000000000000000 limit=0000FFFF rights=0000008B ]
      GDTR  [ base=0000000000000000 limit=00000000 ]
      IDTR  [ base=0000000000000000 limit=0000FFFF ]

 EAX=60000011  ECX=00000000  ESI=00000000  ESP=000017FA   extints=0
 EBX=00000000  EDX=00000000  EDI=00000000  EBP=00000000   nmiints=0
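The guest CR0 and CR4 here can be checked against the fixed-bit MSRs listed earlier. IA32_VMX_CR0_FIXED0 = 0x80000021 would require PG, NE and PE, but with the unrestricted-guest control enabled (bit 7 of control_VMX_proc2_based = 0x82) PE and PG are exempt; IA32_VMX_CR4_FIXED0 = 0x2000 requires VMXE. The C sketch below (illustrative names) applies those rules plus the VM-entry rule that PG=1 requires PE=1. By these checks, CR0=0x31 and CR4=0x2050 are both acceptable guest values, so they do not flag a problem.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE (UINT64_C(1) << 0)
#define CR0_PG (UINT64_C(1) << 31)

/* Bits that are 1 in FIXED0 must be 1; bits that are 0 in FIXED1 must be 0. */
bool cr_fixed_ok(uint64_t cr, uint64_t fixed0, uint64_t fixed1)
{
    return (cr & fixed0) == fixed0 && (cr & ~fixed1) == 0;
}

/* Guest CR0 as checked at VM entry: PG=1 requires PE=1, and with the
 * "unrestricted guest" control set, PE and PG are exempt from FIXED0. */
bool guest_cr0_ok(uint64_t cr0, uint64_t fixed0, uint64_t fixed1,
                  bool unrestricted_guest)
{
    if ((cr0 & CR0_PG) && !(cr0 & CR0_PE))
        return false;
    if (unrestricted_guest)
        fixed0 &= ~(CR0_PE | CR0_PG);
    return cr_fixed_ok(cr0, fixed0, fixed1);
}
```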

 

Thanks,

-Tao

Quoc-Thai_L_Intel

Thanks, Tao! I have forwarded your additional information to my peers.

-Thai

Tao_W_

Hi Thai,

Here are the updated guest state and VMCS fields for your reference.

VMCS fields:
0x0000003F = control_VMX_pin_based
0xA501E1F2 = control_VMX_cpu_based
0x00000082 = control_VMX_proc2_based
0x00000000 = control_exception_bitmap
0x00000000 = control_pagefault_errorcode_mask
0xFFFFFFFF = control_pagefault_errorcode_match
0x00000000 = control_CR3_target_count
0x00036FFB = control_VM_exit_controls
0x000011FB = control_VM_entry_controls
0x00000000 = control_VM_entry_interruption_information
0x00000000 = control_VM_entry_exception_errorcode
0x00000000 = control_VM_entry_instruction_length

0xFFFFFFFFFFFFFFF7 = control_CR0_mask
0xFFFFFFFFFFFFF871 = control_CR4_mask
0x0000000060000010 = control_CR0_shadow
0x0000000000000000 = control_CR4_shadow
0x0000000000000000 = control_CR3_target0
0x00000000B7934000 = control_CR3_target1
0x0000000000000000 = control_CR3_target2
0x0000000000000000 = control_CR3_target3


Guest state:
CR0=0000000000000031  CR3=0000000000000000  CR4=0000000000002050

RSP=0000000000007BFA  SYSENTER_ESP=0000000000000000
RIP=0000000000007C2E  SYSENTER_EIP=0000000000000000
DR7=0000000000000400  SYSENTER_CS=00000000  RFLAGS=0000000000000006

   ES=0000  [ base=0000000000000000 limit=0000FFFF rights=00000093 ]
   CS=0000  [ base=0000000000000000 limit=0000FFFF rights=0000009B ]
   SS=0000  [ base=0000000000000000 limit=0000FFFF rights=00000093 ]
   DS=0000  [ base=0000000000000000 limit=0000FFFF rights=00000093 ]
   FS=0000  [ base=0000000000000000 limit=0000FFFF rights=00000093 ]
   GS=0000  [ base=0000000000000000 limit=0000FFFF rights=00000093 ]
LDTR=0000  [ base=0000000000000000 limit=0000FFFF rights=00000082 ]
   TR=0000  [ base=0000000000000000 limit=0000FFFF rights=0000008B ]
      GDTR  [ base=0000000000007C3C limit=00000017 ]
      IDTR  [ base=0000000000000000 limit=0000FFFF ]

EAX=60000011  ECX=00000000  ESI=00000000  ESP=00007BFA   extints=0
EBX=00000000  EDX=00000000  EDI=00000000  EBP=00000000   nmiints=0
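Comparing the two dumps: in the first, GDTR was base 0 and limit 0, so once PE is set the far jump's selector 0x08 (SEG_KCODE<<3) would lie outside the GDT and the jump would raise #GP. Here GDTR is base 0x7C3C, limit 0x17 (three 8-byte descriptors), so selector 0x08 is in range and its descriptor sits at 0x7C44. Counting the instruction lengths in the code16.S listing below, RIP=0x7C2E is the rdmsr right after mov %eax,%cr0, so this state was captured after PE was set but before the far jump. The C sketch below shows the range check (illustrative function names; LDT selectors are not handled).

```c
#include <stdbool.h>
#include <stdint.h>

/* Among other checks, loading a GDT selector into CS raises #GP unless the
 * whole 8-byte descriptor lies within GDTR.limit; a null selector also
 * faults. */
bool selector_in_gdt(uint16_t selector, uint16_t gdtr_limit)
{
    uint32_t off = selector & ~7u;   /* index * 8, dropping TI and RPL */

    if (selector & 4)                /* TI=1: LDT selector, not handled */
        return false;
    if (off == 0)                    /* null selector */
        return false;
    return off + 7 <= gdtr_limit;
}

/* Linear address of the descriptor that a GDT selector refers to. */
uint64_t selector_desc_addr(uint64_t gdtr_base, uint16_t selector)
{
    return gdtr_base + (selector & ~7u);
}
```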

Here is /proc/cpuinfo from the Linux VM running in VMware:
processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 63
model name      : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
stepping        : 2
microcode       : 0x3c
cpu MHz         : 2397.291
cache size      : 15360 KB
physical id     : 2
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 15
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology nonstop_tsc eagerfpu pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm tpr_shadow vnmi ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid xsaveopt arat
bugs            :
bogomips        : 4801.89
clflush size    : 64
cache_alignment : 64
address sizes   : 43 bits physical, 48 bits virtual
power management:

 

The guest code is as follows:

#define SEG_KCODE 1  // kernel code
#define SEG_KDATA 2  // kernel data+stack
#define SEG_KCPU  3  // kernel per-cpu data
#define SEG_UCODE 4  // user code
#define SEG_UDATA 5  // user data+stack
#define SEG_TSS   6  // this process's task state

#define CR0_PE          0x00000001      // Protection Enable

#define SEG_NULLASM                                             \
    .word 0, 0;                                             \
    .byte 0, 0, 0, 0

// The 0xC0 means the limit is in 4096-byte units
// and (for executable segments) 32-bit mode.
#define SEG_ASM(type,base,lim)                                  \
        .word (((lim) >> 12) & 0xffff), ((base) & 0xffff);      \
        .byte (((base) >> 16) & 0xff), (0x90 | (type)),         \
        (0xC0 | (((lim) >> 28) & 0xf)), (((base) >> 24) & 0xff)

#define STA_X     0x8       // Executable segment
#define STA_E     0x4       // Expand down (non-executable segments)
#define STA_C     0x4       // Conforming code segment (executable only)
#define STA_W     0x2       // Writeable (non-executable segments)
#define STA_R     0x2       // Readable (executable segments)
#define STA_A       0x1 // Accessed
# Start the first CPU: switch to 32-bit protected mode, jump into C.

        .code16
        .global code16, code16_end
code16:
        xor %ecx, %ecx
        mov %cr3, %eax
        mov %eax, %cr3
    seta20.1:
        inb     $0x64,%al               # Wait for not busy
        testb   $0x2,%al
        jnz     seta20.1

        movb    $0xd1,%al               # 0xd1 -> port 0x64
        outb    %al,$0x64

    seta20.2:
        inb     $0x64,%al               # Wait for not busy
        testb   $0x2,%al
        jnz     seta20.2

        movb    $0xdf,%al               # 0xdf -> port 0x60
        outb    %al,$0x60

        wrmsr

        lgdt    gdtdesc
        movl    %cr0, %eax
        orl     $CR0_PE, %eax
        movl    %eax, %cr0

        rdmsr 
//PAGEBREAK!
# Complete transition to 32-bit protected mode by using long jmp
# to reload %cs and %eip.  The segment descriptors are set up with no
# translation, so that the mapping is still the identity mapping.

        .byte 0x66, 0xea              # far jmp ptr16:32 (0x66 = operand-size prefix)
        .long start32                 # 32-bit target offset
        .word (SEG_KCODE<<3)          # code segment selector


        .code32  # Tell assembler to generate 32-bit code now.
start32:
cid:
        cpuid
        # Bootstrap GDT

        .p2align 2                                # force 4 byte alignment
gdt:
        SEG_NULLASM                              # NULL seg
        SEG_ASM(STA_X|STA_R, 0x0, 0xffffffff)   # code seg
        SEG_ASM(STA_W, 0x0, 0xffffffff)         # data seg

gdtdesc:

        .word   (gdtdesc - gdt - 1)             # sizeof(gdt) - 1

        .long   gdt
code16_end:
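To tie the SEG_ASM macro to the GDT dump at the top of the thread, here is a C mirror of it (seg_asm is an illustrative name; the STA_* values are copied from the listing). It packs the same bytes into one 64-bit little-endian value. For the code and data entries it produces 0x00CF9A000000FFFF and 0x00CF92000000FFFF, which match the dumped "00cf9a00,0000ffff" and "00cf9200,0000ffff".

```c
#include <stdint.h>

#define STA_X 0x8   /* Executable segment */
#define STA_W 0x2   /* Writeable (non-executable segments) */
#define STA_R 0x2   /* Readable (executable segments) */

/* C mirror of SEG_ASM(type, base, lim): returns the 8-byte descriptor as
 * a 64-bit value whose low byte is the first byte the macro emits. */
uint64_t seg_asm(uint8_t type, uint32_t base, uint32_t lim)
{
    uint64_t d = 0;

    d |= (uint64_t)((lim >> 12) & 0xffff);              /* limit 15:0, 4 KiB units */
    d |= (uint64_t)(base & 0xffff) << 16;               /* base 15:0 */
    d |= (uint64_t)((base >> 16) & 0xff) << 32;         /* base 23:16 */
    d |= (uint64_t)(0x90 | type) << 40;                 /* P=1, DPL=0, S=1, type */
    d |= (uint64_t)(0xC0 | ((lim >> 28) & 0xf)) << 48;  /* G=1, D/B=1, limit 19:16 */
    d |= (uint64_t)((base >> 24) & 0xff) << 56;         /* base 31:24 */
    return d;
}
```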

 

The Makefile used to build it:

G_CFLAGS = -fno-pic -static -fno-builtin -fno-strict-aliasing -Wall -MD -ggdb -m32 -Werror -fno-omit-frame-pointer
G_CFLAGS += $(shell $(CC) -fno-stack-protector -E -x c /dev/null >/dev/null 2>&1 && echo -fno-stack-protector)

 

        $(CC) $(G_CFLAGS) -fno-pic -nostdinc -I. -c code16.S
        $(LD) $(G_LDFLAGS) -N -e code16 -Ttext 0x7C00 -o bootblock.o code16.o
        $(OBJCOPY) -S -O binary -j .text bootblock.o bootblock.bin
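-Ttext 0x7C00 fixes the absolute addresses in the image: lgdt gdtdesc, the .long gdt inside gdtdesc, and the far-jump offset all assume the code runs at guest address 0x7C00. The first guest-state dump (RIP=0x182E, GDTR base 0 and limit 0) looks consistent with the image running at a different address, while the second (RIP=0x7C2E, GDTR base 0x7C3C) matches the link address. The C sketch below is hypothetical (load_boot_image and BOOT_LOAD_ADDR are illustrative names, and guest memory is modeled as a flat buffer starting at guest-physical address 0); it only illustrates placing bootblock.bin at the address it was linked for.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOOT_LOAD_ADDR 0x7C00u  /* must match "-Ttext 0x7C00" in the Makefile */

/* Hypothetical helper: copy a flat binary (e.g. bootblock.bin) into the
 * buffer backing guest-physical memory at its link address. Returns the
 * initial guest RIP (CS base 0, so RIP == address), or 0 if it does not fit. */
uint64_t load_boot_image(uint8_t *guest_mem, size_t mem_size,
                         const uint8_t *image, size_t image_size)
{
    if (mem_size < BOOT_LOAD_ADDR || image_size > mem_size - BOOT_LOAD_ADDR)
        return 0;
    memcpy(guest_mem + BOOT_LOAD_ADDR, image, image_size);
    return BOOT_LOAD_ADDR;
}
```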

Thank you very much for your help.

 
