Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Updated from IPP 7.1.1 to 8.2.1, seeing segmentation faults on AVX (e9)

Bob_Kirnum
Beginner
730 Views

We have been using the Intel IPP libraries for many years now (Dialogic was once an Intel company :)).  A few years back we updated to version 7.1.1, and all was well until we ran into segmentation faults on certain newer systems.  The crashes occurred on systems whose processors supported AVX and AVX2. We found that we could work around this by limiting the CPU type to AVX.

We recently updated to IPP 8.2.1, hoping that this limitation would no longer be required.  However, we are now seeing more frequent segmentation faults on AVX-capable systems, inside the e9 (AVX) IPP functions.

The first crash is in the crypto library.  The trace below is from when we were still using the deprecated functions; updating to the newer AES APIs did not resolve the issue.

Apr 30 08:58:46 sut-1330 kernel: [6765] trap invalid opcode ip:7fe0be224e7a sp:7fde82bc8b80 error:0 in 

#0 0x00007fe0be224e7a in e9_EncryptCTR_RIJ128pipe_AES_NI () from /usr/dialogic/data/ssp.mlm
#1 0x00007fde82bc8cd0 in ?? ()
#2 0x00007fe0be22425d in e9_ippsRijndael128EncryptCTR () from /usr/dialogic/data/ssp.mlm
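
Since the kernel reports an invalid opcode inside a routine whose name ends in AES_NI, one possible cross-check that does not depend on IPP is to ask CPUID directly whether the process sees AES-NI. The following is only a sketch: the helper names are hypothetical, and it assumes a compiler that ships <cpuid.h> (recent GCC or Clang; the gcc 4.1.2 toolchain mentioned in this thread may not provide it). AES-NI is reported in CPUID leaf 1, ECX bit 25.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header; assumed to be available */
#endif

/* CPUID leaf 1, ECX bit 25 advertises the AES-NI instructions. */
#define ECX_AESNI (1u << 25)

/* Pure helper: does a leaf-1 ECX value advertise AES-NI? */
int ecx_has_aesni(unsigned int ecx)
{
    return (ecx & ECX_AESNI) != 0;
}

/* Reads CPUID leaf 1 ECX into *ecx_out.
 * Returns 1 on success, 0 if the leaf cannot be queried on this target. */
int read_cpuid_leaf1_ecx(unsigned int *ecx_out)
{
    assert(ecx_out != NULL);
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    *ecx_out = ecx;
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if AES-NI is reported, 0 if not, -1 if CPUID is unavailable. */
int cpu_reports_aesni(void)
{
    unsigned int ecx = 0;
    if (!read_cpuid_leaf1_ecx(&ecx))
        return -1;
    return ecx_has_aesni(ecx);
}
```

Calling cpu_reports_aesni() from the crashing process (or from a small program run on the same host or virtual machine) shows what CPUID reports there; a result of 0 on a machine where aesenc traps would point at the environment rather than at the library.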

Second . . . 

#0 0x00007fb554d4cee1 in e9_owniCopyReplicateBorder_8u_C1R ()

Debug output I added to show the IPP settings in use:

Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: APInit.c.162:DisplayIPPCPUFeatures: 0x46 : 0x46
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: APInit.c.175:DisplayIPPCPUFeatures: Limiting from 0x46 to 0x46
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: ippCore 8.2.1 (r44077)
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: ippIP AVX (e9) 8.2.1 (r44077)
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: ippSP AVX (e9) 8.2.1 (r44077)
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: ippVC AVX (e9) 8.2.1 (r44077)
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: Processor supports Advanced Vector Extensions instruction set
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot:     8 cores on die
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: ippGetMaxCacheSizeB 20480 k
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: Available 0xfdf Enabled 0xfdf
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: MMX       A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: SSE       A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: SSE2      A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: SSE3      A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: SSSE3     A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: MOVBE     X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: SSE41     A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: SSE42     A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: AVX       A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: AVX(OS)   A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: AES       A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: CLMUL     A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: ABR       X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: RDRRAND   X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: F16C      X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: AVX2      X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: ADCOX     X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: RDSEED    X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: PREFETCHW X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: SHA       X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: KNC       X X
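
The "Available 0xfdf Enabled 0xfdf" line and the per-feature rows above appear to describe the same bitmask. As a rough illustration, the sketch below decodes such a mask under the assumption that row i of the table corresponds to bit i (bit 0 = MMX, bit 1 = SSE, and so on). That mapping is inferred from the log, not taken from IPP headers, and the function names are hypothetical. The first twelve rows are consistent with 0xfdf; the later rows are unconfirmed, and KNC is left out because its position cannot be inferred this way.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Row labels in the order the log prints them. ASSUMPTION: row i maps to
 * bit i of the mask. "RDRAND" is printed as "RDRRAND" in the log. */
static const char *const feature_names[] = {
    "MMX", "SSE", "SSE2", "SSE3", "SSSE3", "MOVBE", "SSE41", "SSE42",
    "AVX", "AVX(OS)", "AES", "CLMUL", "ABR", "RDRAND", "F16C", "AVX2",
    "ADCOX", "RDSEED", "PREFETCHW", "SHA"
};
#define FEATURE_COUNT (sizeof(feature_names) / sizeof(feature_names[0]))

/* Returns 1 if the named feature's bit is set in mask, 0 if it is clear,
 * and -1 if the name is not in the table. */
int feature_enabled(unsigned long long mask, const char *name)
{
    size_t i;
    assert(name != NULL);
    for (i = 0; i < FEATURE_COUNT; ++i) {
        if (strcmp(feature_names[i], name) == 0)
            return (int)((mask >> i) & 1ULL);
    }
    return -1;
}

/* Prints one "NAME A" or "NAME X" line per known feature. */
void print_features(unsigned long long mask)
{
    size_t i;
    for (i = 0; i < FEATURE_COUNT; ++i)
        printf("%-9s %c\n", feature_names[i],
               ((mask >> i) & 1ULL) ? 'A' : 'X');
}
```

Under that assumption, feature_enabled(0xfdf, "AES") is 1 and feature_enabled(0xfdf, "MOVBE") is 0, which matches the rows in the log: the library believes AES-NI is available and enabled on this system.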

 

We use gcc for building our product which links with the IPP libs.

gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46)
Copyright (C) 2006 Free Software Foundation, Inc.

redhat-release-5Server-5.4.0.3
redhat-release-notes-5Server-29

 

 

16 Replies
Ying_H_Intel
Employee
Hi Bob,

Sorry about that. This looks like an unknown problem. Do you have a small test program, for example one calling ippiCopyReplicateBorder(), that reproduces the problem on these systems? Something like:

    Ipp8u src[8*4] = {5, 4, 3, 4, 5, 8, 8, 8,
                      3, 2, 1, 2, 3, 8, 8, 8,
                      3, 2, 1, 2, 3, 8, 8, 8,
                      5, 4, 3, 4, 5, 8, 8, 8};
    Ipp8u dst[9*8];
    IppiSize srcRoi = { 5, 4 };
    IppiSize dstRoi = { 9, 8 };
    int topBorderHeight = 2;
    int leftBorderWidth = 2;
    ippiCopyReplicateBorder_8u_C1R(src, 8, srcRoi, dst, 9, dstRoi, topBorderHeight, leftBorderWidth);

Best Regards,
Ying
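
When building such a reproducer, it can help to have a known-good result to compare the IPP output against. The following plain C routine is a hypothetical reference, not IPP's implementation: it fills every destination pixel from the nearest source pixel, which is the usual meaning of a replicate border for an 8-bit single-channel image. Steps are in bytes and sizes in pixels, mirroring the arguments in the snippet above.

```c
#include <assert.h>
#include <stddef.h>

/* Clamp v into the closed range [lo, hi]. */
static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Reference replicate-border copy (hypothetical helper, not part of IPP).
 * The source ROI lands at (left, top) in the destination; everything
 * outside it is filled from the nearest source edge pixel. */
void ref_copy_replicate_border_8u(const unsigned char *src, int srcStep,
                                  int srcW, int srcH,
                                  unsigned char *dst, int dstStep,
                                  int dstW, int dstH,
                                  int top, int left)
{
    int x, y;
    assert(src != NULL && dst != NULL);
    assert(srcW > 0 && srcH > 0);
    for (y = 0; y < dstH; ++y) {
        int sy = clamp_int(y - top, 0, srcH - 1);
        for (x = 0; x < dstW; ++x) {
            int sx = clamp_int(x - left, 0, srcW - 1);
            dst[(size_t)y * dstStep + x] = src[(size_t)sy * srcStep + sx];
        }
    }
}
```

With the 5x4 source and 9x8 destination from the snippet, calling ref_copy_replicate_border_8u(src, 8, 5, 4, dst, 9, 9, 8, 2, 2) gives a buffer that can be compared byte for byte with the dst produced by the IPP call.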

Igor_A_Intel
Employee

Hi Bob,

Could you provide the opcodes at "trap invalid opcode ip:7fe0be224e7a"? I mean, use the disassemble command under gdb and post a fragment of about 10 disassembly lines on either side of the address ip:7fe0be224e7a (gdb marks the faulting instruction with an arrow).

regards, Igor

Bob_Kirnum
Beginner

Hey Igor,

Thanks for the reply, and here you go . . . 

   0x00007fe0be224e6c <+460>:   aesenc -0x20(%r10),%xmm0
   0x00007fe0be224e73 <+467>:   aesenc -0x10(%r10),%xmm0
=> 0x00007fe0be224e7a <+474>:   aesenc (%r10),%xmm0
   0x00007fe0be224e80 <+480>:   aesenc 0x10(%r10),%xmm0
   0x00007fe0be224e87 <+487>:   aesenc 0x20(%r10),%xmm0

 

Thanks,

Bob

Igor_A_Intel
Employee

Hey Bob,

I'm curious why several "aesenc" instructions are valid while the next one raises an "invalid opcode" exception. Could you provide some more info from the gdb session:

1) x /100b $r10                      - in order to be sure that this is not access violation while accessing memory pointed by r10

2) x /100b 0x7fe0be224e6c      - in order to be sure that "aesenc" encoding is correct (gdb understands encoding, but for sure...)

From my side, I'll check and disassemble the 8.2.1 e9 crypto binaries and make sure that this code is called and works in our test system (each IPP function has a number of mandatory tests: algorithm, bad argument, misalignment, multi-thread safety, mem-bound, performance, etc.).

regards, Igor

Igor_A_Intel
Employee

This code works well in our environment. I do not believe in miracles, so let's find the root of this issue...

OS: RedHat_6.3_x86_64
Memory: 8GB
CPUCount: 1   CoreCount: 4   HT: no
CPU Model: Genuine Intel(R) CPU 0000 @ 2.60GHz

bash-4.1$ gdb ts_ippcp_mrg_compl_st_gcc412
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /nfs/inn/disks/sv-ssg_ipp_sandbox/usr/iastakh/cp/ts_ippcp_mrg_compl_st_gcc412...(no debugging symbols found)...done.
(gdb) set args -B -o -TAVX -fippsRijndael128EncryptCTR
(gdb) b e9_EncryptCTR_RIJ128pipe_AES_NI
Breakpoint 1 at 0x986fa0
(gdb) r
Starting program: /nfs/inn/disks/sv-ssg_ipp_sandbox/usr/iastakh/cp/ts_ippcp_mrg_compl_st_gcc412 -B -o -TAVX -fippsRijndael128EncryptCTR
[Thread debugging using libthread_db enabled]
-T AVX: ippInitCpu = ippStsNoErr: No errors.
ippGetNumThreads: 1
+----------------------------------------------------------------------------+
| CPU      : Genuine Intel(R) processor 4x2.6 GHz,                           |
|                                                                            |
| OS       : Linux (2.6.32-279.el6.x86_64, x86_64)                           |
| Library  : ippCP AVX (e9), 8.2.1 (r44077), Oct  9 2014                     |
| Library  : ippCore, 8.2.1 (r44077), Oct  9 2014                            |
|                                                    Wed May  6 16:43:50 2015|
+----------------------------------------------------------------------------+
-T AVX: ippInitCpu = ippStsNoErr: No errors.
ippGetNumThreads: 1
+----------------------------------------------------------------------------+
|Test        : tsRijn128_EncDecCTR_Alg               Wed May  6 16:43:50 2015|
|Function    : ippsRijndael128EncryptCTR / ippsRijndael128DecryptCTR         |
|Description : Algorithm's test for functions.                               |
|Class       : Algorithm                                                     |
|Source      : ts_rijnctr_vb.cpp                                             |
|Executable  : ts_ippcp_mrg_compl_st_gcc412                                  |
+----------------------------------------------------------------------------+
*** Beginning of the test:
msg size (bytes) =16

Breakpoint 1, 0x0000000000986fa0 in e9_EncryptCTR_RIJ128pipe_AES_NI ()
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.el6.x86_64 libgcc-4.4.6-4.el6.x86_64 libstdc++-4.4.6-4.el6.x86_64

(gdb) display /i $pc

1: x/i $pc
=> 0x986fa0 <e9_EncryptCTR_RIJ128pipe_AES_NI>:  push   %rbx
(gdb) si
0x0000000000986fa1 in e9_EncryptCTR_RIJ128pipe_AES_NI ()
1: x/i $pc
=> 0x986fa1 <e9_EncryptCTR_RIJ128pipe_AES_NI+1>:        mov    0x10(%rsp),%rax


(gdb) disas
Dump of assembler code for function e9_EncryptCTR_RIJ128pipe_AES_NI:
=> 0x0000000000986fa0 <+0>:     push   %rbx
   0x0000000000986fa1 <+1>:     mov    0x10(%rsp),%rax
   0x0000000000986fa6 <+6>:     movdqu (%rax),%xmm8
   0x0000000000986fab <+11>:    movdqu (%r9),%xmm0
   0x0000000000986fb0 <+16>:    movdqa %xmm8,%xmm9
   0x0000000000986fb5 <+21>:    pandn  %xmm0,%xmm9
   0x0000000000986fba <+26>:    mov    (%r9),%rbx
   0x0000000000986fbd <+29>:    mov    0x8(%r9),%rax
   0x0000000000986fc1 <+33>:    bswap  %rbx
   0x0000000000986fc4 <+36>:    bswap  %rax
   0x0000000000986fc7 <+39>:    movslq %r8d,%r8
   0x0000000000986fca <+42>:    sub    $0x40,%r8
   0x0000000000986fce <+46>:    jl     0x987117 <e9_EncryptCTR_RIJ128pipe_AES_NI+375>
   0x0000000000986fd4 <+52>:    movdqa -0x5c(%rip),%xmm4        # 0x986f80 <e9_EncryptCFB128_RIJ128_AES_NI+160>
   0x0000000000986fdc <+60>:    pinsrq $0x0,%rax,%xmm0
   0x0000000000986fe3 <+67>:    pinsrq $0x1,%rbx,%xmm0
   0x0000000000986fea <+74>:    pshufb %xmm4,%xmm0
   0x0000000000986fef <+79>:    pand   %xmm8,%xmm0
   0x0000000000986ff4 <+84>:    por    %xmm9,%xmm0
   0x0000000000986ff9 <+89>:    add    $0x1,%rax
   0x0000000000986ffd <+93>:    adc    $0x0,%rbx
   0x0000000000987001 <+97>:    pinsrq $0x0,%rax,%xmm1
   0x0000000000987008 <+104>:   pinsrq $0x1,%rbx,%xmm1
   0x000000000098700f <+111>:   pshufb %xmm4,%xmm1
   0x0000000000987014 <+116>:   pand   %xmm8,%xmm1
   0x0000000000987019 <+121>:   por    %xmm9,%xmm1
   0x000000000098701e <+126>:   add    $0x1,%rax
   0x0000000000987022 <+130>:   adc    $0x0,%rbx
   0x0000000000987026 <+134>:   pinsrq $0x0,%rax,%xmm2
   0x000000000098702d <+141>:   pinsrq $0x1,%rbx,%xmm2
   0x0000000000987034 <+148>:   pshufb %xmm4,%xmm2
   0x0000000000987039 <+153>:   pand   %xmm8,%xmm2
   0x000000000098703e <+158>:   por    %xmm9,%xmm2
   0x0000000000987043 <+163>:   add    $0x1,%rax
   0x0000000000987047 <+167>:   adc    $0x0,%rbx
   0x000000000098704b <+171>:   pinsrq $0x0,%rax,%xmm3
   0x0000000000987052 <+178>:   pinsrq $0x1,%rbx,%xmm3
   0x0000000000987059 <+185>:   pshufb %xmm4,%xmm3
   0x000000000098705e <+190>:   pand   %xmm8,%xmm3
   0x0000000000987063 <+195>:   por    %xmm9,%xmm3
   0x0000000000987068 <+200>:   movdqa (%rcx),%xmm4
   0x000000000098706c <+204>:   mov    %rcx,%r10
   0x000000000098706f <+207>:   pxor   %xmm4,%xmm0
   0x0000000000987073 <+211>:   pxor   %xmm4,%xmm1
   0x0000000000987077 <+215>:   pxor   %xmm4,%xmm2
   0x000000000098707b <+219>:   pxor   %xmm4,%xmm3
   0x000000000098707f <+223>:   movdqa 0x10(%r10),%xmm4
   0x0000000000987085 <+229>:   add    $0x10,%r10
   0x0000000000987089 <+233>:   mov    %rdx,%r11
   0x000000000098708c <+236>:   sub    $0x1,%r11
   0x0000000000987090 <+240>:   aesenc %xmm4,%xmm0
   0x0000000000987095 <+245>:   aesenc %xmm4,%xmm1
   0x000000000098709a <+250>:   aesenc %xmm4,%xmm2
   0x000000000098709f <+255>:   aesenc %xmm4,%xmm3
   0x00000000009870a4 <+260>:   movdqa 0x10(%r10),%xmm4
   0x00000000009870aa <+266>:   add    $0x10,%r10
   0x00000000009870ae <+270>:   dec    %r11
   0x00000000009870b1 <+273>:   jne    0x987090 <e9_EncryptCTR_RIJ128pipe_AES_NI+240>
   0x00000000009870b3 <+275>:   aesenclast %xmm4,%xmm0
   0x00000000009870b8 <+280>:   aesenclast %xmm4,%xmm1
   0x00000000009870bd <+285>:   aesenclast %xmm4,%xmm2
   0x00000000009870c2 <+290>:   aesenclast %xmm4,%xmm3
   0x00000000009870c7 <+295>:   movdqu (%rdi),%xmm4
   0x00000000009870cb <+299>:   movdqu 0x10(%rdi),%xmm5
   0x00000000009870d0 <+304>:   movdqu 0x20(%rdi),%xmm6
   0x00000000009870d5 <+309>:   movdqu 0x30(%rdi),%xmm7
   0x00000000009870da <+314>:   add    $0x40,%rdi
   0x00000000009870de <+318>:   pxor   %xmm4,%xmm0
   0x00000000009870e2 <+322>:   movdqu %xmm0,(%rsi)
   0x00000000009870e6 <+326>:   pxor   %xmm5,%xmm1
   0x00000000009870ea <+330>:   movdqu %xmm1,0x10(%rsi)
   0x00000000009870ef <+335>:   pxor   %xmm6,%xmm2
   0x00000000009870f3 <+339>:   movdqu %xmm2,0x20(%rsi)
   0x00000000009870f8 <+344>:   pxor   %xmm7,%xmm3
   0x00000000009870fc <+348>:   movdqu %xmm3,0x30(%rsi)
   0x0000000000987101 <+353>:   add    $0x1,%rax
   0x0000000000987105 <+357>:   adc    $0x0,%rbx
   0x0000000000987109 <+361>:   add    $0x40,%rsi
   0x000000000098710d <+365>:   sub    $0x40,%r8
   0x0000000000987111 <+369>:   jge    0x986fd4 <e9_EncryptCTR_RIJ128pipe_AES_NI+52>
   0x0000000000987117 <+375>:   add    $0x40,%r8
   0x000000000098711b <+379>:   je     0x987217 <e9_EncryptCTR_RIJ128pipe_AES_NI+631>
   0x0000000000987121 <+385>:   lea    0x0(,%rdx,4),%r10
   0x0000000000987129 <+393>:   lea    -0x90(%rcx,%r10,4),%r10
   0x0000000000987131 <+401>:   pinsrq $0x0,%rax,%xmm0
   0x0000000000987138 <+408>:   pinsrq $0x1,%rbx,%xmm0
   0x000000000098713f <+415>:   pshufb -0x1c8(%rip),%xmm0        # 0x986f80 <e9_EncryptCFB128_RIJ128_AES_NI+160>
   0x0000000000987148 <+424>:   pand   %xmm8,%xmm0
   0x000000000098714d <+429>:   por    %xmm9,%xmm0
   0x0000000000987152 <+434>:   pxor   (%rcx),%xmm0
   0x0000000000987156 <+438>:   cmp    $0xc,%rdx
   0x000000000098715a <+442>:   jl     0x98717a <e9_EncryptCTR_RIJ128pipe_AES_NI+474>
   0x000000000098715c <+444>:   je     0x98716c <e9_EncryptCTR_RIJ128pipe_AES_NI+460>
   0x000000000098715e <+446>:   aesenc -0x40(%r10),%xmm0
   0x0000000000987165 <+453>:   aesenc -0x30(%r10),%xmm0
   0x000000000098716c <+460>:   aesenc -0x20(%r10),%xmm0
   0x0000000000987173 <+467>:   aesenc -0x10(%r10),%xmm0
   0x000000000098717a <+474>:   aesenc (%r10),%xmm0
   0x0000000000987180 <+480>:   aesenc 0x10(%r10),%xmm0
   0x0000000000987187 <+487>:   aesenc 0x20(%r10),%xmm0
   0x000000000098718e <+494>:   aesenc 0x30(%r10),%xmm0
   0x0000000000987195 <+501>:   aesenc 0x40(%r10),%xmm0
   0x000000000098719c <+508>:   aesenc 0x50(%r10),%xmm0
   0x00000000009871a3 <+515>:   aesenc 0x60(%r10),%xmm0
   0x00000000009871aa <+522>:   aesenc 0x70(%r10),%xmm0
   0x00000000009871b1 <+529>:   aesenc 0x80(%r10),%xmm0
   0x00000000009871bb <+539>:   aesenclast 0x90(%r10),%xmm0
   0x00000000009871c5 <+549>:   add    $0x1,%rax
   0x00000000009871c9 <+553>:   adc    $0x0,%rbx
   0x00000000009871cd <+557>:   sub    $0x10,%r8
   0x00000000009871d1 <+561>:   jl     0x9871f2 <e9_EncryptCTR_RIJ128pipe_AES_NI+594>
   0x00000000009871d3 <+563>:   movdqu (%rdi),%xmm4
   0x00000000009871d7 <+567>:   pxor   %xmm4,%xmm0
   0x00000000009871db <+571>:   movdqu %xmm0,(%rsi)
   0x00000000009871df <+575>:   add    $0x10,%rdi
   0x00000000009871e3 <+579>:   add    $0x10,%rsi
   0x00000000009871e7 <+583>:   cmp    $0x0,%r8
   0x00000000009871eb <+587>:   je     0x987217 <e9_EncryptCTR_RIJ128pipe_AES_NI+631>
   0x00000000009871ed <+589>:   jmpq   0x987131 <e9_EncryptCTR_RIJ128pipe_AES_NI+401>
   0x00000000009871f2 <+594>:   add    $0x10,%r8
   0x00000000009871f6 <+598>:   pextrb $0x0,%xmm0,%r10d
   0x00000000009871fd <+605>:   psrldq $0x1,%xmm0
   0x0000000000987202 <+610>:   movzbl (%rdi),%r11d
   0x0000000000987206 <+614>:   xor    %r11,%r10
   0x0000000000987209 <+617>:   mov    %r10b,(%rsi)
   0x000000000098720c <+620>:   inc    %rdi
   0x000000000098720f <+623>:   inc    %rsi
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) b *0x98717a
Breakpoint 2 at 0x98717a
(gdb) d 1
(gdb) c
Continuing.

Breakpoint 2, 0x000000000098717a in e9_EncryptCTR_RIJ128pipe_AES_NI ()
1: x/i $pc
=> 0x98717a <e9_EncryptCTR_RIJ128pipe_AES_NI+474>:      aesenc (%r10),%xmm0
(gdb) si
0x0000000000987180 in e9_EncryptCTR_RIJ128pipe_AES_NI ()
1: x/i $pc
=> 0x987180 <e9_EncryptCTR_RIJ128pipe_AES_NI+480>:      aesenc 0x10(%r10),%xmm0
(gdb)
0x0000000000987187 in e9_EncryptCTR_RIJ128pipe_AES_NI ()
1: x/i $pc
=> 0x987187 <e9_EncryptCTR_RIJ128pipe_AES_NI+487>:      aesenc 0x20(%r10),%xmm0
(gdb)
0x000000000098718e in e9_EncryptCTR_RIJ128pipe_AES_NI ()
1: x/i $pc
=> 0x98718e <e9_EncryptCTR_RIJ128pipe_AES_NI+494>:      aesenc 0x30(%r10),%xmm0
(gdb)
0x0000000000987195 in e9_EncryptCTR_RIJ128pipe_AES_NI ()
1: x/i $pc
=> 0x987195 <e9_EncryptCTR_RIJ128pipe_AES_NI+501>:      aesenc 0x40(%r10),%xmm0
(gdb)
0x000000000098719c in e9_EncryptCTR_RIJ128pipe_AES_NI ()
1: x/i $pc
=> 0x98719c <e9_EncryptCTR_RIJ128pipe_AES_NI+508>:      aesenc 0x50(%r10),%xmm0
(gdb)
0x00000000009871a3 in e9_EncryptCTR_RIJ128pipe_AES_NI ()
1: x/i $pc
=> 0x9871a3 <e9_EncryptCTR_RIJ128pipe_AES_NI+515>:      aesenc 0x60(%r10),%xmm0
(gdb)

regards, Igor

Bob_Kirnum
Beginner

Registers below. FYI, this is an Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz.

(gdb) info reg
rax            0x1d080a499a2a0000       2091933338148929536
rbx            0xfd53b43722a3b0c3       -192612210149445437
rcx            0x7fe0a6d263f0   140602848207856
rdx            0xa      10
rsi            0x7fe0a720a4d4   140602853336276
rdi            0x7fde82bc8cf0   140593652862192
rbp            0x7fde82bc8c70   0x7fde82bc8c70
rsp            0x7fde82bc8b80   0x7fde82bc8b80
r8             0x10     16
r9             0x7fde82bc8cd0   140593652862160
r10            0x7fe0a6d26400   140602848207872
r11            0x7fde82bc8ba0   140593652861856
r12            0x7fde82bc8cf0   140593652862192
r13            0x7fde82bc8bae   140593652861870
r14            0x0      0
r15            0x7fe0a720a4d4   140602853336276
rip            0x7fe0be224e7a   0x7fe0be224e7a <e9_EncryptCTR_RIJ128pipe_AES_NI+474>
eflags         0x10293  [ CF AF SF IF RF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0
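
These registers can be tied back to the disassembly Igor posted earlier in the thread. The sketch below assumes the faulting build contains the same code as that listing (the offsets and instructions match wherever both are shown); the helper names are hypothetical. The two lea instructions at +385 and +393 compute r10 from rcx and rdx, and the cmp/jl/je at +438 to +444 choose the first aesenc of the tail path from the value in rdx.

```c
#include <assert.h>
#include <stdint.h>

/* From the listing:  lea 0x0(,%rdx,4),%r10
 *                    lea -0x90(%rcx,%r10,4),%r10
 * which amounts to   r10 = rcx + 16*rdx - 0x90. */
uint64_t key_pointer(uint64_t rcx, uint64_t rdx)
{
    return rcx + 16u * rdx - 0x90u;
}

/* From the listing:  cmp $0xc,%rdx ; jl <+474> ; je <+460> ; else <+446>.
 * Returns the function offset of the first aesenc executed on that path. */
int first_aesenc_offset(int64_t rdx)
{
    assert(rdx > 0);
    if (rdx < 12)
        return 474;
    if (rdx == 12)
        return 460;
    return 446;
}

/* The non-VEX aesenc form needs a 16-byte aligned memory operand. */
int is_16_byte_aligned(uint64_t addr)
{
    return (addr & 0xFu) == 0;
}
```

With rcx = 0x7fe0a6d263f0 and rdx = 0xa, key_pointer() gives 0x7fe0a6d26400, which is the r10 value in the dump and is 16-byte aligned, and first_aesenc_offset() gives 474, the offset of the trapping instruction. If the assumption holds, the aesenc at +474 would be the first one reached on this path rather than the third in a run, and a misaligned operand would be expected to raise a general-protection fault rather than an invalid opcode.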

Igor_A_Intel
Employee

Bob, I didn't ask for the registers' contents; I asked for:

1) x /100b $r10                      - in order to be sure that this is not access violation while accessing memory pointed by r10

2) x /100b 0x7fe0be224e6c      - in order to be sure that "aesenc" encoding is correct (gdb understands encoding, but for sure...)

The gdb command "x /100b $r10" means the following: "x" - examine memory, "/100b" - show the first 100 bytes, "$r10" - the address to examine (the address that is in r10 at the time of the trap).

"x /100b 0x7fe0be224e6c" - the same, but examines memory at the address where the "aesenc" instructions start, to check whether the encoding is correct.
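
Once the bytes are available (asking gdb for hex with "x /16xb" makes them easier to read), they can be compared with the expected encoding. As a rough sketch, and assuming the usual non-VEX encoding of aesenc (66 0F 38 DC /r, with a REX.B prefix of 41 because the base register is r10), the three forms in the listing would be 6, 7 and 10 bytes long, which agrees with the address gaps in the disassembly. The function name below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the instruction length in bytes if buf starts with
 * "aesenc disp(%r10),%xmm0" in one of the three addressing forms that
 * appear in the listing, or 0 if the bytes do not match. */
size_t match_aesenc_r10_xmm0(const unsigned char *buf, size_t len)
{
    /* 66 = mandatory prefix, 41 = REX.B (selects r10), 0F 38 DC = aesenc */
    static const unsigned char prefix[5] = { 0x66, 0x41, 0x0F, 0x38, 0xDC };
    size_t i;

    assert(buf != NULL);
    if (len < 6)
        return 0;
    for (i = 0; i < 5; ++i) {
        if (buf[i] != prefix[i])
            return 0;
    }
    switch (buf[5]) {                       /* ModRM byte, reg = xmm0 */
    case 0x02: return 6;                    /* (%r10)                */
    case 0x42: return len >= 7 ? 7 : 0;     /* disp8(%r10)           */
    case 0x82: return len >= 10 ? 10 : 0;   /* disp32(%r10)          */
    default:   return 0;
    }
}
```

For the trapping instruction, "aesenc (%r10),%xmm0", the expected bytes under this assumption are 66 41 0F 38 DC 02. A different byte sequence at 0x7fe0be224e7a would suggest that the code in memory is not what the disassembler listing implies.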

regards, Igor

Bob_Kirnum
Beginner

1) x /100b $r10 - in order to be sure that this is not access violation while accessing memory pointed by r10

(gdb) x /100b $r10
0x7fc538010300: -62     50      -97     -15     -102    41      27      -98
0x7fc538010308: -65     85      -73     -88     -31     -45     121     89
0x7fc538010310: -90     -124    84      9       60      -83     79      -105
0x7fc538010318: -125    -8      -8      63      98      43      -127    102
0x7fc538010320: 83      -120    103     -93     111     37      40      52
0x7fc538010328: -20     -35     -48     11      -114    -10     81      109
0x7fc538010330: 25      89      91      -70     118     124     115     -114
0x7fc538010338: -102    -95     -93     -123    20      87      -14     -24
0x7fc538010340: 82      -48     -64     64      36      -84     -77     -50
0x7fc538010348: -66     13      16      75      -86     90      -30     -93
0x7fc538010350: -52     72      -54     -20     -24     -28     121     34
0x7fc538010358: 86      -23     105     105     -4      -77     -117    -54
0x7fc538010360: -31     117     -66     92

2) x /100b 0x7fe0be224e6c - in order to be sure that "aesenc" encoding is correct (gdb understands encoding, but for sure...)

(gdb)  x /100b 0x7fe0be224e6c
0x7fe0be224e6c: Cannot access memory at address 0x7fe0be224e6c

 

Bob_Kirnum
Beginner

I have attached a more complete log from a single source to ensure the details are consistent.  The previous details were mixed from several people looking at the same issue.

Program terminated with signal 4, Illegal instruction.
#0  0x00007fe0be224e7a in e9_EncryptCTR_RIJ128pipe_AES_NI () from /usr/dialogic/data/ssp.mlm


Backtrace

#0  0x00007fe0be224e7a in e9_EncryptCTR_RIJ128pipe_AES_NI () from /usr/dialogic/data/ssp.mlm
#1  0x00007fde82bc8cd0 in ?? ()
#2  0x00007fe0be22425d in e9_ippsRijndael128EncryptCTR () from /usr/dialogic/data/ssp.mlm
#3  0x00007fe0bcf7d8e3 in srtpKeyDerivation (pDynamicInfo=0x7fe0a720a48c, KeyDerivationRate=<value optimized out>, pKey=0x7fe0b1366388, index=4, label=<value optimized out>)
    at srtpalg.c:319
#4  0x00007fe0bcf7b001 in srtpFromRtp (pSrtpObj=0x7fe0a720a48c, pPktData=0x7fe087cd4bd4 "\200", size=0x7fde82bc8dec) at srtp.c:796
#5  0x00007fe0bcf64dbb in rtpEncrypt (portType=<value optimized out>, srtp=0x1d080a499a2a0000, data=0x7fe0a720a4d4 "", count=0x7fe0a6d263f0) at rtpport.c:1134
#6  0x00007fe0bcf65070 in rtpSendPort (prtpHandle=0x7fded3bb7f18, handle=0x7fe0aad8203c, data=0x7fe087cd4bd4 "\200", count=172) at rtpport.c:1219
#7  0x00007fe0bcf4c13a in DoEncoder (handle=0x7fe0ab06a4bc, buf=0x7fe0810242d0, size=160, count=<value optimized out>, pCoder=0x7fe081024248, beforePktSendTimestampUpdSize=160, 
    afterPktSendTimestampUpdSize=160, pkt=0x7fe0b1374cd4) at pio.c:2308
#8  pioWrite (handle=0x7fe0ab06a4bc, buf=0x7fe0810242d0, size=160, count=<value optimized out>, pCoder=0x7fe081024248, beforePktSendTimestampUpdSize=160, afterPktSendTimestampUpdSize=160, 
    pkt=0x7fe0b1374cd4) at pio.c:705
#9  0x00007fe0bcf19563 in ptxWorkFxn (ptx=0x7fe081024200, pTaskMem=0x7fe0b400041c, cIndex=416, pRealTimeTraceItems=<value optimized out>, weightMask=0x7fde82bc9e1f "\001") at ptx.c:5954
#10 0x00007fe0bcf1dbc1 in ptx_workfxn (ptx=0x7fe081024200, pTaskMem=0x7fe0a720a4d4, cIndex=2193394896, pRealTimeTraceItems=0x7fe0a6d263f0, weightMask=0x10 <Address 0x10 out of bounds>)
    at ptx.c:4604
#11 0x00007fe0bcf050f7 in wrkTaskFxn (args=<value optimized out>) at wrk.c:1419
#12 0x00007fe0caff910c in helperEntry (pInfo=0x7fdee44c0100) at source/GEN_threadulx.c:871
#13 0x00007fe0cba32851 in start_thread (arg=0x7fde82bca700) at pthread_create.c:301
#14 0x00007fe0cb78090d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115


Contents of r10 and xmm

r10            0x7fe0a6d26400    140602848207872

xmm0           {
  v4_float = {0x0, 0xf4764800, 0x0, 0x11e78000}, 
  v2_double = {0x8000000000000000, 0x8000000000000000}, 
  v16_int8 = {0x97, 0x87, 0x81, 0x28, 0x92, 0x1d, 0x3d, 0x50, 0x36, 0xc7, 0xf9, 0xa4, 0x31, 0xdc, 0x8f, 0xd2}, 
  v8_int16 = {0x8797, 0x2881, 0x1d92, 0x503d, 0xc736, 0xa4f9, 0xdc31, 0xd28f}, 
  v4_int32 = {0x28818797, 0x503d1d92, 0xa4f9c736, 0xd28fdc31}, 
  v2_int64 = {0x503d1d9228818797, 0xd28fdc31a4f9c736}, 
  uint128 = 0xd28fdc31a4f9c736503d1d9228818797
}


Output of GDB "info all-registers" command

rax            0x1d080a499a2a0000    2091933338148929536
rbx            0xfd53b43722a3b0c3    -192612210149445437
rcx            0x7fe0a6d263f0    140602848207856
rdx            0xa    10
rsi            0x7fe0a720a4d4    140602853336276
rdi            0x7fde82bc8cf0    140593652862192
rbp            0x7fde82bc8c70    0x7fde82bc8c70
rsp            0x7fde82bc8b80    0x7fde82bc8b80
r8             0x10    16
r9             0x7fde82bc8cd0    140593652862160
r10            0x7fe0a6d26400    140602848207872
r11            0x7fde82bc8ba0    140593652861856
r12            0x7fde82bc8cf0    140593652862192
r13            0x7fde82bc8bae    140593652861870
r14            0x0    0
r15            0x7fe0a720a4d4    140602853336276
rip            0x7fe0be224e7a    0x7fe0be224e7a <e9_EncryptCTR_RIJ128pipe_AES_NI+474>
eflags         0x10293    [ CF AF SF IF RF ]
cs             0x33    51
ss             0x2b    43
ds             0x0    0
es             0x0    0
fs             0x0    0
gs             0x0    0
st0            0    (raw 0x00000000000000000000)
st1            0    (raw 0x00000000000000000000)
st2            0    (raw 0x00000000000000000000)
st3            0    (raw 0x00000000000000000000)
st4            0    (raw 0x00000000000000000000)
st5            0    (raw 0x00000000000000000000)
st6            0    (raw 0x00000000000000000000)
st7            0    (raw 0x00000000000000000000)
fctrl          0x37f    895
fstat          0x0    0
ftag           0xffff    65535
fiseg          0x0    0
fioff          0x0    0
foseg          0x0    0
fooff          0x0    0
fop            0x0    0
mxcsr          0x1fa1    [ IE PE IM DM ZM OM UM PM ]
ymm0           {
  v8_float = {0x0, 0xf4764800, 0x0, 0x11e78000, 0x0, 0x0, 0x0, 0x0}, 
  v4_double = {0x8000000000000000, 0x8000000000000000, 0x0, 0x0}, 
  v32_int8 = {0x97, 0x87, 0x81, 0x28, 0x92, 0x1d, 0x3d, 0x50, 0x36, 0xc7, 0xf9, 0xa4, 0x31, 0xdc, 0x8f, 0xd2, 0x0 <repeats 16 times>}, 
  v16_int16 = {0x8797, 0x2881, 0x1d92, 0x503d, 0xc736, 0xa4f9, 0xdc31, 0xd28f, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v8_int32 = {0x28818797, 0x503d1d92, 0xa4f9c736, 0xd28fdc31, 0x0, 0x0, 0x0, 0x0}, 
  v4_int64 = {0x503d1d9228818797, 0xd28fdc31a4f9c736, 0x0, 0x0}, 
  v2_int128 = {0xd28fdc31a4f9c736503d1d9228818797, 0x00000000000000000000000000000000}
}
ymm1           {
  v8_float = {0xc0000000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v4_double = {0x0, 0x8000000000000000, 0x0, 0x0}, 
  v32_int8 = {0x3d, 0x88, 0x60, 0xda, 0x54, 0x99, 0x67, 0x1, 0x14, 0x84, 0xf2, 0x25, 0x72, 0x0, 0x72, 0x72, 0x0 <repeats 16 times>}, 
  v16_int16 = {0x883d, 0xda60, 0x9954, 0x167, 0x8414, 0x25f2, 0x72, 0x7272, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v8_int32 = {0xda60883d, 0x1679954, 0x25f28414, 0x72720072, 0x0, 0x0, 0x0, 0x0}, 
  v4_int64 = {0x1679954da60883d, 0x7272007225f28414, 0x0, 0x0}, 
  v2_int128 = {0x7272007225f2841401679954da60883d, 0x00000000000000000000000000000000}
}
ymm2           {
  v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v4_double = {0x8000000000000000, 0x8000000000000000, 0x0, 0x0}, 
  v32_int8 = {0x0, 0x4, 0x8, 0xc, 0xff <repeats 12 times>, 0x0 <repeats 16 times>}, 
  v16_int16 = {0x400, 0xc08, 0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v8_int32 = {0xc080400, 0xffffffff, 0xffffffff, 0xffffffff, 0x0, 0x0, 0x0, 0x0}, 
  v4_int64 = {0xffffffff0c080400, 0xffffffffffffffff, 0x0, 0x0}, 
  v2_int128 = {0xffffffffffffffffffffffff0c080400, 0x00000000000000000000000000000000}
}
ymm3           {
  v8_float = {0x5f400000, 0x0, 0x43c00000, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v4_double = {0x0, 0x0, 0x0, 0x0}, 
  v32_int8 = {0x6, 0x5, 0xbe, 0xd5, 0x0, 0x0, 0x0, 0x0, 0xf, 0x85, 0x38, 0x56, 0x0 <repeats 20 times>}, 
  v16_int16 = {0x506, 0xd5be, 0x0, 0x0, 0x850f, 0x5638, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v8_int32 = {0xd5be0506, 0x0, 0x5638850f, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v4_int64 = {0xd5be0506, 0x5638850f, 0x0, 0x0}, 
  v2_int128 = {0x000000005638850f00000000d5be0506, 0x00000000000000000000000000000000}
}
ymm4           {
  v8_float = {0x0, 0x0, 0x5f400000, 0x43c00000, 0x0, 0x0, 0x0, 0x0}, 
  v4_double = {0x8000000000000000, 0x8000000000000000, 0x0, 0x0}, 
  v32_int8 = {0xbf, 0x6d, 0x7a, 0xeb, 0xc2, 0xa3, 0x40, 0x5f, 0x6, 0x5, 0xbe, 0xd5, 0xf, 0x85, 0x38, 0x56, 0x0 <repeats 16 times>}, 
  v16_int16 = {0x6dbf, 0xeb7a, 0xa3c2, 0x5f40, 0x506, 0xd5be, 0x850f, 0x5638, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v8_int32 = {0xeb7a6dbf, 0x5f40a3c2, 0xd5be0506, 0x5638850f, 0x0, 0x0, 0x0, 0x0}, 
  v4_int64 = {0x5f40a3c2eb7a6dbf, 0x5638850fd5be0506, 0x0, 0x0}, 
  v2_int128 = {0x5638850fd5be05065f40a3c2eb7a6dbf, 0x00000000000000000000000000000000}
}
ymm5           {
  v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v4_double = {0x0, 0x0, 0x0, 0x0}, 
  v32_int8 = {0xbf, 0x6d, 0x7a, 0xeb, 0x0, 0x0, 0x0, 0x0, 0xc2, 0xa3, 0x40, 0x5f, 0x0 <repeats 20 times>}, 
  v16_int16 = {0x6dbf, 0xeb7a, 0x0, 0x0, 0xa3c2, 0x5f40, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v8_int32 = {0xeb7a6dbf, 0x0, 0x5f40a3c2, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v4_int64 = {0xeb7a6dbf, 0x5f40a3c2, 0x0, 0x0}, 
  v2_int128 = {0x000000005f40a3c200000000eb7a6dbf, 0x00000000000000000000000000000000}
}
ymm6           {
  v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v4_double = {0x0, 0x0, 0x0, 0x0}, 
  v32_int8 = {0xf1, 0x0, 0x0, 0x0, 0xe1, 0x0, 0x0, 0x0, 0xa4, 0x0, 0x0, 0x0, 0xf2, 0x0 <repeats 19 times>}, 
  v16_int16 = {0xf1, 0x0, 0xe1, 0x0, 0xa4, 0x0, 0xf2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v8_int32 = {0xf1, 0xe1, 0xa4, 0xf2, 0x0, 0x0, 0x0, 0x0}, 
  v4_int64 = {0xe1000000f1, 0xf2000000a4, 0x0, 0x0}, 
  v2_int128 = {0x000000f2000000a4000000e1000000f1, 0x00000000000000000000000000000000}
}
ymm7           {
  v8_float = {0xfffffff1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v4_double = {0x0, 0x0, 0x0, 0x0}, 
  v32_int8 = {0x2a, 0x95, 0x72, 0xc1, 0x6c, 0xcf, 0x68, 0x84, 0x62, 0x15, 0x7a, 0xe9, 0x16, 0x22, 0x35, 0x9b, 0x0 <repeats 16 times>}, 
  v16_int16 = {0x952a, 0xc172, 0xcf6c, 0x8468, 0x1562, 0xe97a, 0x2216, 0x9b35, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v8_int32 = {0xc172952a, 0x8468cf6c, 0xe97a1562, 0x9b352216, 0x0, 0x0, 0x0, 0x0}, 
  v4_int64 = {0x8468cf6cc172952a, 0x9b352216e97a1562, 0x0, 0x0}, 
  v2_int128 = {0x9b352216e97a15628468cf6cc172952a, 0x00000000000000000000000000000000}
}
ymm8           {
  v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v4_double = {0x0, 0x8000000000000000, 0x0, 0x0}, 
  v32_int8 = {0x0 <repeats 14 times>, 0xff, 0xff, 0x0 <repeats 16 times>}, 
  v16_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xffff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v8_int32 = {0x0, 0x0, 0x0, 0xffff0000, 0x0, 0x0, 0x0, 0x0}, 
  v4_int64 = {0x0, 0xffff000000000000, 0x0, 0x0}, 
  v2_int128 = {0xffff0000000000000000000000000000, 0x00000000000000000000000000000000}
}
ymm9           {
  v8_float = {0x0, 0xfffffe9f, 0x8a081, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v4_double = {0xef5cddc84bac0300, 0x0, 0x0, 0x0}, 
  v32_int8 = {0xfd, 0x53, 0xb4, 0x37, 0x22, 0xa3, 0xb0, 0xc3, 0x1d, 0x8, 0xa, 0x49, 0x9a, 0x2a, 0x0 <repeats 18 times>}, 
  v16_int16 = {0x53fd, 0x37b4, 0xa322, 0xc3b0, 0x81d, 0x490a, 0x2a9a, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v8_int32 = {0x37b453fd, 0xc3b0a322, 0x490a081d, 0x2a9a, 0x0, 0x0, 0x0, 0x0}, 
  v4_int64 = {0xc3b0a32237b453fd, 0x2a9a490a081d, 0x0, 0x0}, 
  v2_int128 = {0x00002a9a490a081dc3b0a32237b453fd, 0x00000000000000000000000000000000}
}
ymm10          {
  v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v4_double = {0x0, 0x0, 0x0, 0x0}, 
  v32_int8 = {0xa4, 0x0, 0x0, 0x0, 0xf2, 0x0, 0x0, 0x0, 0xa4, 0x0, 0x0, 0x0, 0xf2, 0x0 <repeats 19 times>}, 
  v16_int16 = {0xa4, 0x0, 0xf2, 0x0, 0xa4, 0x0, 0xf2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v8_int32 = {0xa4, 0xf2, 0xa4, 0xf2, 0x0, 0x0, 0x0, 0x0}, 
  v4_int64 = {0xf2000000a4, 0xf2000000a4, 0x0, 0x0}, 
  v2_int128 = {0x000000f2000000a4000000f2000000a4, 0x00000000000000000000000000000000}
}
ymm11          {
  v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v4_double = {0x0, 0x0, 0x0, 0x0}, 
  v32_int8 = {0xf3, 0x55, 0xa0, 0xa2, 0x0 <repeats 28 times>}, 
  v16_int16 = {0x55f3, 0xa2a0, 0x0 <repeats 14 times>}, 
  v8_int32 = {0xa2a055f3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v4_int64 = {0xa2a055f3, 0x0, 0x0, 0x0}, 
  v2_int128 = {0x000000000000000000000000a2a055f3, 0x00000000000000000000000000000000}
}
ymm12          {
  v8_float = {0xfe4673ba, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v4_double = {0x0, 0x0, 0x0, 0x0}, 
  v32_int8 = {0x23, 0xc6, 0xdc, 0xcb, 0x0 <repeats 28 times>}, 
  v16_int16 = {0xc623, 0xcbdc, 0x0 <repeats 14 times>}, 
  v8_int32 = {0xcbdcc623, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v4_int64 = {0xcbdcc623, 0x0, 0x0, 0x0}, 
  v2_int128 = {0x000000000000000000000000cbdcc623, 0x00000000000000000000000000000000}
}
ymm13          {
  v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v4_double = {0x0, 0x0, 0x0, 0x0}, 
  v32_int8 = {0x38, 0xd1, 0xc1, 0xd9, 0x0, 0x0, 0x0, 0x0, 0xa8, 0x1, 0x71, 0x39, 0x0 <repeats 20 times>}, 
  v16_int16 = {0xd138, 0xd9c1, 0x0, 0x0, 0x1a8, 0x3971, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v8_int32 = {0xd9c1d138, 0x0, 0x397101a8, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v4_int64 = {0xd9c1d138, 0x397101a8, 0x0, 0x0}, 
  v2_int128 = {0x00000000397101a800000000d9c1d138, 0x00000000000000000000000000000000}
}
ymm14          {
  v8_float = {0x0, 0x0, 0xfe4673ba, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v4_double = {0x0, 0x0, 0x0, 0x0}, 
  v32_int8 = {0xf3, 0x55, 0xa0, 0xa2, 0x0, 0x0, 0x0, 0x0, 0x23, 0xc6, 0xdc, 0xcb, 0x0 <repeats 20 times>}, 
  v16_int16 = {0x55f3, 0xa2a0, 0x0, 0x0, 0xc623, 0xcbdc, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v8_int32 = {0xa2a055f3, 0x0, 0xcbdcc623, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v4_int64 = {0xa2a055f3, 0xcbdcc623, 0x0, 0x0}, 
  v2_int128 = {0x00000000cbdcc62300000000a2a055f3, 0x00000000000000000000000000000000}
}
ymm15          {
  v8_float = {0x0, 0x0, 0x0, 0xfe4673ba, 0x0, 0x0, 0x0, 0x0}, 
  v4_double = {0x0, 0x8000000000000000, 0x0, 0x0}, 
  v32_int8 = {0x38, 0xd1, 0xc1, 0xd9, 0xa8, 0x1, 0x71, 0x39, 0xf3, 0x55, 0xa0, 0xa2, 0x23, 0xc6, 0xdc, 0xcb, 0x0 <repeats 16 times>}, 
  v16_int16 = {0xd138, 0xd9c1, 0x1a8, 0x3971, 0x55f3, 0xa2a0, 0xc623, 0xcbdc, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v8_int32 = {0xd9c1d138, 0x397101a8, 0xa2a055f3, 0xcbdcc623, 0x0, 0x0, 0x0, 0x0}, 
  v4_int64 = {0x397101a8d9c1d138, 0xcbdcc623a2a055f3, 0x0, 0x0}, 
  v2_int128 = {0xcbdcc623a2a055f3397101a8d9c1d138, 0x00000000000000000000000000000000}
}


Disassembly of the function that contains the "invalid instruction":

Dump of assembler code for function e9_EncryptCTR_RIJ128pipe_AES_NI:
   0x00007fe0be224ca0 <+0>:    push   %rbx
   0x00007fe0be224ca1 <+1>:    mov    0x10(%rsp),%rax
   0x00007fe0be224ca6 <+6>:    movdqu (%rax),%xmm8
   0x00007fe0be224cab <+11>:    movdqu (%r9),%xmm0
   0x00007fe0be224cb0 <+16>:    movdqa %xmm8,%xmm9
   0x00007fe0be224cb5 <+21>:    pandn  %xmm0,%xmm9
   0x00007fe0be224cba <+26>:    mov    (%r9),%rbx
   0x00007fe0be224cbd <+29>:    mov    0x8(%r9),%rax
   0x00007fe0be224cc1 <+33>:    bswap  %rbx
   0x00007fe0be224cc4 <+36>:    bswap  %rax
   0x00007fe0be224cc7 <+39>:    movslq %r8d,%r8
   0x00007fe0be224cca <+42>:    sub    $0x40,%r8
   0x00007fe0be224cce <+46>:    jl     0x7fe0be224e17 <e9_EncryptCTR_RIJ128pipe_AES_NI+375>
   0x00007fe0be224cd4 <+52>:    movdqa -0x5c(%rip),%xmm4        # 0x7fe0be224c80
   0x00007fe0be224cdc <+60>:    pinsrq $0x0,%rax,%xmm0
   0x00007fe0be224ce3 <+67>:    pinsrq $0x1,%rbx,%xmm0
   0x00007fe0be224cea <+74>:    pshufb %xmm4,%xmm0
   0x00007fe0be224cef <+79>:    pand   %xmm8,%xmm0
   0x00007fe0be224cf4 <+84>:    por    %xmm9,%xmm0
   0x00007fe0be224cf9 <+89>:    add    $0x1,%rax
   0x00007fe0be224cfd <+93>:    adc    $0x0,%rbx
   0x00007fe0be224d01 <+97>:    pinsrq $0x0,%rax,%xmm1
   0x00007fe0be224d08 <+104>:    pinsrq $0x1,%rbx,%xmm1
   0x00007fe0be224d0f <+111>:    pshufb %xmm4,%xmm1
   0x00007fe0be224d14 <+116>:    pand   %xmm8,%xmm1
   0x00007fe0be224d19 <+121>:    por    %xmm9,%xmm1
   0x00007fe0be224d1e <+126>:    add    $0x1,%rax
   0x00007fe0be224d22 <+130>:    adc    $0x0,%rbx
   0x00007fe0be224d26 <+134>:    pinsrq $0x0,%rax,%xmm2
   0x00007fe0be224d2d <+141>:    pinsrq $0x1,%rbx,%xmm2
   0x00007fe0be224d34 <+148>:    pshufb %xmm4,%xmm2
   0x00007fe0be224d39 <+153>:    pand   %xmm8,%xmm2
   0x00007fe0be224d3e <+158>:    por    %xmm9,%xmm2
   0x00007fe0be224d43 <+163>:    add    $0x1,%rax
   0x00007fe0be224d47 <+167>:    adc    $0x0,%rbx
   0x00007fe0be224d4b <+171>:    pinsrq $0x0,%rax,%xmm3
   0x00007fe0be224d52 <+178>:    pinsrq $0x1,%rbx,%xmm3
   0x00007fe0be224d59 <+185>:    pshufb %xmm4,%xmm3
   0x00007fe0be224d5e <+190>:    pand   %xmm8,%xmm3
   0x00007fe0be224d63 <+195>:    por    %xmm9,%xmm3
   0x00007fe0be224d68 <+200>:    movdqa (%rcx),%xmm4
   0x00007fe0be224d6c <+204>:    mov    %rcx,%r10
   0x00007fe0be224d6f <+207>:    pxor   %xmm4,%xmm0
   0x00007fe0be224d73 <+211>:    pxor   %xmm4,%xmm1
   0x00007fe0be224d77 <+215>:    pxor   %xmm4,%xmm2
   0x00007fe0be224d7b <+219>:    pxor   %xmm4,%xmm3
   0x00007fe0be224d7f <+223>:    movdqa 0x10(%r10),%xmm4
   0x00007fe0be224d85 <+229>:    add    $0x10,%r10
   0x00007fe0be224d89 <+233>:    mov    %rdx,%r11
   0x00007fe0be224d8c <+236>:    sub    $0x1,%r11
   0x00007fe0be224d90 <+240>:    aesenc %xmm4,%xmm0
   0x00007fe0be224d95 <+245>:    aesenc %xmm4,%xmm1
   0x00007fe0be224d9a <+250>:    aesenc %xmm4,%xmm2
   0x00007fe0be224d9f <+255>:    aesenc %xmm4,%xmm3
   0x00007fe0be224da4 <+260>:    movdqa 0x10(%r10),%xmm4
   0x00007fe0be224daa <+266>:    add    $0x10,%r10
   0x00007fe0be224dae <+270>:    dec    %r11
   0x00007fe0be224db1 <+273>:    jne    0x7fe0be224d90 <e9_EncryptCTR_RIJ128pipe_AES_NI+240>
   0x00007fe0be224db3 <+275>:    aesenclast %xmm4,%xmm0
   0x00007fe0be224db8 <+280>:    aesenclast %xmm4,%xmm1
   0x00007fe0be224dbd <+285>:    aesenclast %xmm4,%xmm2
   0x00007fe0be224dc2 <+290>:    aesenclast %xmm4,%xmm3
   0x00007fe0be224dc7 <+295>:    movdqu (%rdi),%xmm4
   0x00007fe0be224dcb <+299>:    movdqu 0x10(%rdi),%xmm5
   0x00007fe0be224dd0 <+304>:    movdqu 0x20(%rdi),%xmm6
   0x00007fe0be224dd5 <+309>:    movdqu 0x30(%rdi),%xmm7
   0x00007fe0be224dda <+314>:    add    $0x40,%rdi
   0x00007fe0be224dde <+318>:    pxor   %xmm4,%xmm0
   0x00007fe0be224de2 <+322>:    movdqu %xmm0,(%rsi)
   0x00007fe0be224de6 <+326>:    pxor   %xmm5,%xmm1
   0x00007fe0be224dea <+330>:    movdqu %xmm1,0x10(%rsi)
   0x00007fe0be224def <+335>:    pxor   %xmm6,%xmm2
   0x00007fe0be224df3 <+339>:    movdqu %xmm2,0x20(%rsi)
   0x00007fe0be224df8 <+344>:    pxor   %xmm7,%xmm3
   0x00007fe0be224dfc <+348>:    movdqu %xmm3,0x30(%rsi)
   0x00007fe0be224e01 <+353>:    add    $0x1,%rax
   0x00007fe0be224e05 <+357>:    adc    $0x0,%rbx
   0x00007fe0be224e09 <+361>:    add    $0x40,%rsi
   0x00007fe0be224e0d <+365>:    sub    $0x40,%r8
   0x00007fe0be224e11 <+369>:    jge    0x7fe0be224cd4 <e9_EncryptCTR_RIJ128pipe_AES_NI+52>
   0x00007fe0be224e17 <+375>:    add    $0x40,%r8
   0x00007fe0be224e1b <+379>:    je     0x7fe0be224f17 <e9_EncryptCTR_RIJ128pipe_AES_NI+631>
   0x00007fe0be224e21 <+385>:    lea    0x0(,%rdx,4),%r10
   0x00007fe0be224e29 <+393>:    lea    -0x90(%rcx,%r10,4),%r10
   0x00007fe0be224e31 <+401>:    pinsrq $0x0,%rax,%xmm0
   0x00007fe0be224e38 <+408>:    pinsrq $0x1,%rbx,%xmm0
   0x00007fe0be224e3f <+415>:    pshufb -0x1c8(%rip),%xmm0        # 0x7fe0be224c80
   0x00007fe0be224e48 <+424>:    pand   %xmm8,%xmm0
   0x00007fe0be224e4d <+429>:    por    %xmm9,%xmm0
   0x00007fe0be224e52 <+434>:    pxor   (%rcx),%xmm0
   0x00007fe0be224e56 <+438>:    cmp    $0xc,%rdx
   0x00007fe0be224e5a <+442>:    jl     0x7fe0be224e7a <e9_EncryptCTR_RIJ128pipe_AES_NI+474>
   0x00007fe0be224e5c <+444>:    je     0x7fe0be224e6c <e9_EncryptCTR_RIJ128pipe_AES_NI+460>
   0x00007fe0be224e5e <+446>:    aesenc -0x40(%r10),%xmm0
   0x00007fe0be224e65 <+453>:    aesenc -0x30(%r10),%xmm0
   0x00007fe0be224e6c <+460>:    aesenc -0x20(%r10),%xmm0
   0x00007fe0be224e73 <+467>:    aesenc -0x10(%r10),%xmm0
=> 0x00007fe0be224e7a <+474>:    aesenc (%r10),%xmm0
   0x00007fe0be224e80 <+480>:    aesenc 0x10(%r10),%xmm0
   0x00007fe0be224e87 <+487>:    aesenc 0x20(%r10),%xmm0
   0x00007fe0be224e8e <+494>:    aesenc 0x30(%r10),%xmm0
   0x00007fe0be224e95 <+501>:    aesenc 0x40(%r10),%xmm0
   0x00007fe0be224e9c <+508>:    aesenc 0x50(%r10),%xmm0
   0x00007fe0be224ea3 <+515>:    aesenc 0x60(%r10),%xmm0
   0x00007fe0be224eaa <+522>:    aesenc 0x70(%r10),%xmm0
   0x00007fe0be224eb1 <+529>:    aesenc 0x80(%r10),%xmm0
   0x00007fe0be224ebb <+539>:    aesenclast 0x90(%r10),%xmm0
   0x00007fe0be224ec5 <+549>:    add    $0x1,%rax
   0x00007fe0be224ec9 <+553>:    adc    $0x0,%rbx
   0x00007fe0be224ecd <+557>:    sub    $0x10,%r8
   0x00007fe0be224ed1 <+561>:    jl     0x7fe0be224ef2 <e9_EncryptCTR_RIJ128pipe_AES_NI+594>
   0x00007fe0be224ed3 <+563>:    movdqu (%rdi),%xmm4
   0x00007fe0be224ed7 <+567>:    pxor   %xmm4,%xmm0
   0x00007fe0be224edb <+571>:    movdqu %xmm0,(%rsi)
   0x00007fe0be224edf <+575>:    add    $0x10,%rdi
   0x00007fe0be224ee3 <+579>:    add    $0x10,%rsi
   0x00007fe0be224ee7 <+583>:    cmp    $0x0,%r8
   0x00007fe0be224eeb <+587>:    je     0x7fe0be224f17 <e9_EncryptCTR_RIJ128pipe_AES_NI+631>
   0x00007fe0be224eed <+589>:    jmpq   0x7fe0be224e31 <e9_EncryptCTR_RIJ128pipe_AES_NI+401>
   0x00007fe0be224ef2 <+594>:    add    $0x10,%r8
   0x00007fe0be224ef6 <+598>:    pextrb $0x0,%xmm0,%r10d
   0x00007fe0be224efd <+605>:    psrldq $0x1,%xmm0
   0x00007fe0be224f02 <+610>:    movzbl (%rdi),%r11d
   0x00007fe0be224f06 <+614>:    xor    %r11,%r10
   0x00007fe0be224f09 <+617>:    mov    %r10b,(%rsi)
   0x00007fe0be224f0c <+620>:    inc    %rdi
   0x00007fe0be224f0f <+623>:    inc    %rsi
   0x00007fe0be224f12 <+626>:    dec    %r8
   0x00007fe0be224f15 <+629>:    jne    0x7fe0be224ef6 <e9_EncryptCTR_RIJ128pipe_AES_NI+598>
   0x00007fe0be224f17 <+631>:    pinsrq $0x0,%rax,%xmm0
   0x00007fe0be224f1e <+638>:    pinsrq $0x1,%rbx,%xmm0
   0x00007fe0be224f25 <+645>:    pshufb -0x2ae(%rip),%xmm0        # 0x7fe0be224c80
   0x00007fe0be224f2e <+654>:    pand   %xmm8,%xmm0
   0x00007fe0be224f33 <+659>:    por    %xmm9,%xmm0
   0x00007fe0be224f38 <+664>:    movdqu %xmm0,(%r9)
   0x00007fe0be224f3d <+669>:    vzeroupper 
   0x00007fe0be224f40 <+672>:    pop    %rbx
   0x00007fe0be224f41 <+673>:    retq   
   0x00007fe0be224f42 <+674>:    nop
   0x00007fe0be224f43 <+675>:    nop
   0x00007fe0be224f44 <+676>:    nop
   0x00007fe0be224f45 <+677>:    nop
   0x00007fe0be224f46 <+678>:    nop
   0x00007fe0be224f47 <+679>:    nop
   0x00007fe0be224f48 <+680>:    nop
   0x00007fe0be224f49 <+681>:    nop
   0x00007fe0be224f4a <+682>:    nop
   0x00007fe0be224f4b <+683>:    nop
   0x00007fe0be224f4c <+684>:    nop
   0x00007fe0be224f4d <+685>:    nop
   0x00007fe0be224f4e <+686>:    nop
   0x00007fe0be224f4f <+687>:    nop
   0x00007fe0be224f50 <+688>:    femms  
   0x00007fe0be224f52 <+690>:    or     $0x90a0b0c,%eax
   0x00007fe0be224f57 <+695>:    or     %al,(%rdi)
   0x00007fe0be224f59 <+697>:    (bad)  
   0x00007fe0be224f5a <+698>:    add    $0x1020304,%eax
   0x00007fe0be224f5f <+703>:    add    %dl,0x48(%rbx)
End of assembler dump.


Dump of instruction bytes.  The "invalid instruction" is at 0x00007fe0be224e7a.
=> 0x00007fe0be224e7a <+474>:    aesenc (%r10),%xmm0

0x7fe0be224e5e <e9_EncryptCTR_RIJ128pipe_AES_NI+446>:    0x66    0x41    0x0f    0x38    0xdc    0x42    0xc0    0x66
0x7fe0be224e66 <e9_EncryptCTR_RIJ128pipe_AES_NI+454>:    0x41    0x0f    0x38    0xdc    0x42    0xd0    0x66    0x41
0x7fe0be224e6e <e9_EncryptCTR_RIJ128pipe_AES_NI+462>:    0x0f    0x38    0xdc    0x42    0xe0    0x66    0x41    0x0f
0x7fe0be224e76 <e9_EncryptCTR_RIJ128pipe_AES_NI+470>:    0x38    0xdc    0x42    0xf0    0x66    0x41    0x0f    0x38
0x7fe0be224e7e <e9_EncryptCTR_RIJ128pipe_AES_NI+478>:    0xdc    0x02    0x66    0x41    0x0f    0x38    0xdc    0x42
0x7fe0be224e86 <e9_EncryptCTR_RIJ128pipe_AES_NI+486>:    0x10    0x66    0x41    0x0f    0x38    0xdc    0x42    0x20
0x7fe0be224e8e <e9_EncryptCTR_RIJ128pipe_AES_NI+494>:    0x66    0x41    0x0f    0x38    0xdc    0x42    0x30    0x66
0x7fe0be224e96 <e9_EncryptCTR_RIJ128pipe_AES_NI+502>:    0x41    0x0f    0x38    0xdc    0x42    0x40    0x66    0x41
0x7fe0be224e9e <e9_EncryptCTR_RIJ128pipe_AES_NI+510>:    0x0f    0x38    0xdc    0x42    0x50    0x66    0x41    0x0f
0x7fe0be224ea6 <e9_EncryptCTR_RIJ128pipe_AES_NI+518>:    0x38    0xdc    0x42    0x60    0x66    0x41    0x0f    0x38
0x7fe0be224eae <e9_EncryptCTR_RIJ128pipe_AES_NI+526>:    0xdc    0x42    0x70    0x66    0x41    0x0f    0x38    0xdc
0x7fe0be224eb6 <e9_EncryptCTR_RIJ128pipe_AES_NI+534>:    0x82    0x80    0x00    0x00    0x00    0x66    0x41    0x0f
0x7fe0be224ebe <e9_EncryptCTR_RIJ128pipe_AES_NI+542>:    0x38    0xdd    0x82    0x90    0x00    0x00    0x00    0x48
0x7fe0be224ec6 <e9_EncryptCTR_RIJ128pipe_AES_NI+550>:    0x83    0xc0    0x01    0x48    0x83    0xd3    0x00    0x49
0x7fe0be224ece <e9_EncryptCTR_RIJ128pipe_AES_NI+558>:    0x83    0xe8    0x10    0x7c    0x1f    0xf3    0x0f    0x6f
0x7fe0be224ed6 <e9_EncryptCTR_RIJ128pipe_AES_NI+566>:    0x27    0x66    0x0f    0xef    0xc4    0xf3    0x0f    0x7f

 

0 Kudos
Igor_A_Intel
Employee
730 Views

Did you execute these instructions just after the exception/trap? Your code and memory buffers are loaded at different addresses from run to run, so the address for the #2 instruction can differ. I took this from your dump:

   0x00007fe0be224e6c <+460>:   aesenc -0x20(%r10),%xmm0
   0x00007fe0be224e73 <+467>:   aesenc -0x10(%r10),%xmm0
=> 0x00007fe0be224e7a <+474>:   aesenc (%r10),%xmm0
   0x00007fe0be224e80 <+480>:   aesenc 0x10(%r10),%xmm0
   0x00007fe0be224e87 <+487>:   aesenc 0x20(%r10),%xmm0

You can see that the content of r10 changed between runs: it was 0x7fc538010300 in the previous answer (from "x /100b $r10" in gdb) and 0x7fe0a6d26400 in the earlier answer (when you used "info reg"). So I guess the address used by "0x00007fe0be224e6c <+460>:   aesenc -0x20(%r10),%xmm0" also changed in your last run.

I think the encoding is fine, since GDB disassembles the code bytes correctly. I don't have any more ideas for now (for "remote" debugging); I think I need a reproducer of this issue in any form, buildable source or executable. One more piece of "food for thought": there are no "v" prefixes on any instruction in the disassembled code (and no ymm/zmm registers); the only AVX-related instruction is "vzeroupper" in the epilogue. This means the function has no special AVX or AVX2 code. It is just the y8 code (developed for Westmere) compiled for AVX/AVX2. So there is nothing AVX-specific here...

The same is true for CopyReplicateBorder: I need the parameters (size, step) with which this function crashes. And of course the best approach is to provide us with a small reproducer (as Ying asked).

regards, Igor

0 Kudos
Bob_Kirnum
Beginner
730 Views

If we remove the CPU limit (default was 0x46 for AVX), we no longer use the 'e9' code set on this system.  We use 'l9' and we no longer see a segmentation fault.

0 Kudos
Igor_A_Intel
Employee
730 Views

Bob, you indicated 2 problems:

1) illegal instruction (ippCP)

2) seg fault (ippIP)

According to your last message, I guess you've solved the 2nd issue. Am I right? What about the 1st one?

If the 1st one has not been solved yet, could you run the IPP perf tests (available with each IPP release in the ../tools subfolder):

ps_ippcp -B -r -TAVX -fippsRijndael128EncryptCTR

This test executes exactly the same e9 code as your application (you can verify this under GDB), just to check for the illegal instruction exception.

regards, Igor

 

0 Kudos
Bob_Kirnum
Beginner
730 Views

Issue 2 was unrelated; it was an old bug (ours) in an old build.  It seemed related because it was also in the 'e9' code.

Issue 1 appears to be a real issue with the 'e9' code on this specific processor.  Since we appear to work fine with the 'l9' code the 'e9' issue is no longer a high priority for us.  We used the CPU limit to avoid similar issues running IPP 7.1.1 on newer processors (ones which supported AVX2).  It was initially left in even after updating to IPP 8.2.1 (default value but can be overridden).  Removing the restriction works on this system, but there could be systems we run into which might have similar problems.  We never know what our customers will use.

0 Kudos
Igor_A_Intel
Employee
730 Views

Hi Bob,

Let's sort out this very strange issue. As you've said, your customers can still hit this issue on some SNB CPUs if e9 code is dispatched. I've analyzed your last dump, and the encoding is absolutely correct even at the "illegal instruction" address:

   66| 41/ 0F 38 DC 02    aesenc xmm0, [r10]

It's very strange that "aesenc" raises an "illegal instruction" exception after it has been executed successfully several times before. Could you share your executable or some reproducer so that we can sort out this problem?

regards, Igor 
 

0 Kudos
Bob_Kirnum
Beginner
730 Views

The executable is a rather large, complex product.  It's our media server engine.  Although I could possibly come up with a scaled down test executable it would take some time.  Also, I do not have access to the system on which the failure is observed.

0 Kudos
Sergey_K_Intel
Employee
730 Views

Hi Bob,

You haven't answered Igor's question about running the IPP function in question in the affected environment:

> ps_ippcp -B -r -TAVX -fippsRijndael128EncryptCTR

This simple exercise could give us hints for further investigation: whether the problem is in the function itself or in the specific environment in which your application calls it. Of course, it should be run on the problematic system.

0 Kudos