We have been using Intel IPP for many years now (Dialogic was once an Intel company :)). A few years back we updated to version 7.1.1, and all was well until we ran into segmentation faults on certain newer systems. The crashes occurred on systems whose processors support AVX and AVX2. We were able to work around this by limiting the CPU type to AVX.
We recently updated to IPP 8.2.1, hoping that this limitation would no longer be required. However, we are now seeing more frequent segmentation faults in the e9 IPP functions on systems that support AVX.
The first crash is in the crypto libraries. This trace is from when we were still using the deprecated functions; updating to the newer AES APIs did not resolve the issue.
Apr 30 08:58:46 sut-1330 kernel: [6765] trap invalid opcode ip:7fe0be224e7a sp:7fde82bc8b80 error:0 in
#0 0x00007fe0be224e7a in e9_EncryptCTR_RIJ128pipe_AES_NI () from /usr/dialogic/data/ssp.mlm
#1 0x00007fde82bc8cd0 in ?? ()
#2 0x00007fe0be22425d in e9_ippsRijndael128EncryptCTR () from /usr/dialogic/data/ssp.mlm
Second . . .
#0 0x00007fb554d4cee1 in e9_owniCopyReplicateBorder_8u_C1R ()
Debug output I added to show the IPP settings in use . . .
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: APInit.c.162:DisplayIPPCPUFeatures: 0x46 : 0x46
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: APInit.c.175:DisplayIPPCPUFeatures: Limiting from 0x46 to 0x46
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: ippCore 8.2.1 (r44077)
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: ippIP AVX (e9) 8.2.1 (r44077)
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: ippSP AVX (e9) 8.2.1 (r44077)
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: ippVC AVX (e9) 8.2.1 (r44077)
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: Processor supports Advanced Vector Extensions instruction set
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: 8 cores on die
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: ippGetMaxCacheSizeB 20480 k
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: Available 0xfdf Enabled 0xfdf
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: MMX A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: SSE A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: SSE2 A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: SSE3 A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: SSSE3 A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: MOVBE X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: SSE41 A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: SSE42 A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: AVX A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: AVX(OS) A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: AES A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: CLMUL A E
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: ABR X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: RDRRAND X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: F16C X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: AVX2 X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: ADCOX X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: RDSEED X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: PREFETCHW X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: SHA X X
Apr 30 08:57:07 sut-1330 ssp_x86Linux_boot: KNC X X
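The `Available 0xfdf Enabled 0xfdf` line can be cross-checked against the per-feature table above with a small decoder. This is a minimal sketch: the bit positions are an assumption inferred from the order in which the log prints the features (they are not copied from the IPP headers), the names `FEATURE_BITS`, `decode_features`, and `missing_features` are hypothetical, and KNC is left out because its bit position cannot be inferred from the log.

```python
# Feature names in the order the boot log prints them. The bit positions
# (index in this list) are an ASSUMPTION inferred from that order.
FEATURE_BITS = [
    "MMX", "SSE", "SSE2", "SSE3", "SSSE3", "MOVBE", "SSE41", "SSE42",
    "AVX", "AVX(OS)", "AES", "CLMUL", "ABR", "RDRRAND", "F16C", "AVX2",
    "ADCOX", "RDSEED", "PREFETCHW", "SHA",
]


def decode_features(mask):
    """Return the feature names whose (assumed) bit is set in mask."""
    return [name for bit, name in enumerate(FEATURE_BITS) if (mask >> bit) & 1]


def missing_features(mask):
    """Return the feature names whose (assumed) bit is clear in mask."""
    return [name for bit, name in enumerate(FEATURE_BITS) if not (mask >> bit) & 1]


print(decode_features(0xFDF))
```

Under that assumed layout, `0xfdf` decodes to exactly the rows marked `A E` in the log (MMX through CLMUL, without MOVBE), so the mask and the table appear to agree with each other.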
We build our product with gcc and link it against the IPP libraries.
gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46)
Copyright (C) 2006 Free Software Foundation, Inc.
redhat-release-5Server-5.4.0.3
redhat-release-notes-5Server-29
Hi Bob,
Could you provide the opcodes at "trap invalid opcode ip:7fe0be224e7a"? I mean, run the disassemble command in gdb and post a fragment of about ±10 disassembly lines around the address ip:7fe0be224e7a (an arrow will point at the invalid opcode).
regards, Igor
Hey Igor,
Thanks for the reply, and here you go . . .
0x00007fe0be224e6c <+460>: aesenc -0x20(%r10),%xmm0
0x00007fe0be224e73 <+467>: aesenc -0x10(%r10),%xmm0
=> 0x00007fe0be224e7a <+474>: aesenc (%r10),%xmm0
0x00007fe0be224e80 <+480>: aesenc 0x10(%r10),%xmm0
0x00007fe0be224e87 <+487>: aesenc 0x20(%r10),%xmm0
Thanks,
Bob
Hey Bob,
I'm curious why several "aesenc" instructions are valid while the next one raises an "invalid opcode" exception. Could you provide some more info from the gdb session:
1) x /100b $r10 - to be sure that this is not an access violation while accessing the memory pointed to by r10
2) x /100b 0x7fe0be224e6c - to be sure that the "aesenc" encoding is correct (gdb understands the encoding, but just to be sure...)
On my side, I'll check/disassemble the 8.2.1 e9 crypto binaries and make sure that this code is called and works in our test system (each IPP function has a number of mandatory tests: algorithm, bad argument, misalignment, multi-thread safety, mem-bound, performance, etc.).
regards, Igor
This code works well in our environment. I do not believe in miracles, so let's find the root of this issue...
OS: RedHat_6.3_x86_64
Memory: 8GB
CPUCount: 1 CoreCount: 4 HT: no
CPU Model: Genuine Intel(R) CPU 0000 @ 2.60GHz
bash-4.1$ gdb ts_ippcp_mrg_compl_st_gcc412
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /nfs/inn/disks/sv-ssg_ipp_sandbox/usr/iastakh/cp/ts_ippcp_mrg_compl_st_gcc412...(no debugging symbols found)...done.
(gdb) set args -B -o -TAVX -fippsRijndael128EncryptCTR
(gdb) b e9_EncryptCTR_RIJ128pipe_AES_NI
Breakpoint 1 at 0x986fa0
(gdb) r
Starting program: /nfs/inn/disks/sv-ssg_ipp_sandbox/usr/iastakh/cp/ts_ippcp_mrg_compl_st_gcc412 -B -o -TAVX -fippsRijndael128EncryptCTR
[Thread debugging using libthread_db enabled]
-T AVX: ippInitCpu = ippStsNoErr: No errors.
ippGetNumThreads: 1
+----------------------------------------------------------------------------+
| CPU : Genuine Intel(R) processor 4x2.6 GHz, |
| |
| OS : Linux (2.6.32-279.el6.x86_64, x86_64) |
| Library : ippCP AVX (e9), 8.2.1 (r44077), Oct 9 2014 |
| Library : ippCore, 8.2.1 (r44077), Oct 9 2014 |
| Wed May 6 16:43:50 2015|
+----------------------------------------------------------------------------+
-T AVX: ippInitCpu = ippStsNoErr: No errors.
ippGetNumThreads: 1
+----------------------------------------------------------------------------+
|Test : tsRijn128_EncDecCTR_Alg Wed May 6 16:43:50 2015|
|Function : ippsRijndael128EncryptCTR / ippsRijndael128DecryptCTR |
|Description : Algorithm's test for functions. |
|Class : Algorithm |
|Source : ts_rijnctr_vb.cpp |
|Executable : ts_ippcp_mrg_compl_st_gcc412 |
+----------------------------------------------------------------------------+
*** Beginning of the test:
msg size (bytes) =16
Breakpoint 1, 0x0000000000986fa0 in e9_EncryptCTR_RIJ128pipe_AES_NI ()
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.el6.x86_64 libgcc-4.4.6-4.el6.x86_64 libstdc++-4.4.6-4.el6.x86_64
(gdb) display /i $pc
1: x/i $pc
=> 0x986fa0 <e9_EncryptCTR_RIJ128pipe_AES_NI>: push %rbx
(gdb) si
0x0000000000986fa1 in e9_EncryptCTR_RIJ128pipe_AES_NI ()
1: x/i $pc
=> 0x986fa1 <e9_EncryptCTR_RIJ128pipe_AES_NI+1>: mov 0x10(%rsp),%rax
(gdb) disas
Dump of assembler code for function e9_EncryptCTR_RIJ128pipe_AES_NI:
=> 0x0000000000986fa0 <+0>: push %rbx
0x0000000000986fa1 <+1>: mov 0x10(%rsp),%rax
0x0000000000986fa6 <+6>: movdqu (%rax),%xmm8
0x0000000000986fab <+11>: movdqu (%r9),%xmm0
0x0000000000986fb0 <+16>: movdqa %xmm8,%xmm9
0x0000000000986fb5 <+21>: pandn %xmm0,%xmm9
0x0000000000986fba <+26>: mov (%r9),%rbx
0x0000000000986fbd <+29>: mov 0x8(%r9),%rax
0x0000000000986fc1 <+33>: bswap %rbx
0x0000000000986fc4 <+36>: bswap %rax
0x0000000000986fc7 <+39>: movslq %r8d,%r8
0x0000000000986fca <+42>: sub $0x40,%r8
0x0000000000986fce <+46>: jl 0x987117 <e9_EncryptCTR_RIJ128pipe_AES_NI+375>
0x0000000000986fd4 <+52>: movdqa -0x5c(%rip),%xmm4 # 0x986f80 <e9_EncryptCFB128_RIJ128_AES_NI+160>
0x0000000000986fdc <+60>: pinsrq $0x0,%rax,%xmm0
0x0000000000986fe3 <+67>: pinsrq $0x1,%rbx,%xmm0
0x0000000000986fea <+74>: pshufb %xmm4,%xmm0
0x0000000000986fef <+79>: pand %xmm8,%xmm0
0x0000000000986ff4 <+84>: por %xmm9,%xmm0
0x0000000000986ff9 <+89>: add $0x1,%rax
0x0000000000986ffd <+93>: adc $0x0,%rbx
0x0000000000987001 <+97>: pinsrq $0x0,%rax,%xmm1
0x0000000000987008 <+104>: pinsrq $0x1,%rbx,%xmm1
0x000000000098700f <+111>: pshufb %xmm4,%xmm1
0x0000000000987014 <+116>: pand %xmm8,%xmm1
0x0000000000987019 <+121>: por %xmm9,%xmm1
0x000000000098701e <+126>: add $0x1,%rax
0x0000000000987022 <+130>: adc $0x0,%rbx
0x0000000000987026 <+134>: pinsrq $0x0,%rax,%xmm2
0x000000000098702d <+141>: pinsrq $0x1,%rbx,%xmm2
0x0000000000987034 <+148>: pshufb %xmm4,%xmm2
0x0000000000987039 <+153>: pand %xmm8,%xmm2
0x000000000098703e <+158>: por %xmm9,%xmm2
0x0000000000987043 <+163>: add $0x1,%rax
0x0000000000987047 <+167>: adc $0x0,%rbx
0x000000000098704b <+171>: pinsrq $0x0,%rax,%xmm3
0x0000000000987052 <+178>: pinsrq $0x1,%rbx,%xmm3
0x0000000000987059 <+185>: pshufb %xmm4,%xmm3
0x000000000098705e <+190>: pand %xmm8,%xmm3
0x0000000000987063 <+195>: por %xmm9,%xmm3
0x0000000000987068 <+200>: movdqa (%rcx),%xmm4
0x000000000098706c <+204>: mov %rcx,%r10
0x000000000098706f <+207>: pxor %xmm4,%xmm0
0x0000000000987073 <+211>: pxor %xmm4,%xmm1
0x0000000000987077 <+215>: pxor %xmm4,%xmm2
0x000000000098707b <+219>: pxor %xmm4,%xmm3
0x000000000098707f <+223>: movdqa 0x10(%r10),%xmm4
0x0000000000987085 <+229>: add $0x10,%r10
0x0000000000987089 <+233>: mov %rdx,%r11
0x000000000098708c <+236>: sub $0x1,%r11
0x0000000000987090 <+240>: aesenc %xmm4,%xmm0
0x0000000000987095 <+245>: aesenc %xmm4,%xmm1
0x000000000098709a <+250>: aesenc %xmm4,%xmm2
0x000000000098709f <+255>: aesenc %xmm4,%xmm3
0x00000000009870a4 <+260>: movdqa 0x10(%r10),%xmm4
0x00000000009870aa <+266>: add $0x10,%r10
0x00000000009870ae <+270>: dec %r11
0x00000000009870b1 <+273>: jne 0x987090 <e9_EncryptCTR_RIJ128pipe_AES_NI+240>
0x00000000009870b3 <+275>: aesenclast %xmm4,%xmm0
0x00000000009870b8 <+280>: aesenclast %xmm4,%xmm1
0x00000000009870bd <+285>: aesenclast %xmm4,%xmm2
0x00000000009870c2 <+290>: aesenclast %xmm4,%xmm3
0x00000000009870c7 <+295>: movdqu (%rdi),%xmm4
0x00000000009870cb <+299>: movdqu 0x10(%rdi),%xmm5
0x00000000009870d0 <+304>: movdqu 0x20(%rdi),%xmm6
0x00000000009870d5 <+309>: movdqu 0x30(%rdi),%xmm7
0x00000000009870da <+314>: add $0x40,%rdi
0x00000000009870de <+318>: pxor %xmm4,%xmm0
0x00000000009870e2 <+322>: movdqu %xmm0,(%rsi)
0x00000000009870e6 <+326>: pxor %xmm5,%xmm1
0x00000000009870ea <+330>: movdqu %xmm1,0x10(%rsi)
0x00000000009870ef <+335>: pxor %xmm6,%xmm2
0x00000000009870f3 <+339>: movdqu %xmm2,0x20(%rsi)
0x00000000009870f8 <+344>: pxor %xmm7,%xmm3
0x00000000009870fc <+348>: movdqu %xmm3,0x30(%rsi)
0x0000000000987101 <+353>: add $0x1,%rax
0x0000000000987105 <+357>: adc $0x0,%rbx
0x0000000000987109 <+361>: add $0x40,%rsi
0x000000000098710d <+365>: sub $0x40,%r8
0x0000000000987111 <+369>: jge 0x986fd4 <e9_EncryptCTR_RIJ128pipe_AES_NI+52>
0x0000000000987117 <+375>: add $0x40,%r8
0x000000000098711b <+379>: je 0x987217 <e9_EncryptCTR_RIJ128pipe_AES_NI+631>
0x0000000000987121 <+385>: lea 0x0(,%rdx,4),%r10
0x0000000000987129 <+393>: lea -0x90(%rcx,%r10,4),%r10
0x0000000000987131 <+401>: pinsrq $0x0,%rax,%xmm0
0x0000000000987138 <+408>: pinsrq $0x1,%rbx,%xmm0
0x000000000098713f <+415>: pshufb -0x1c8(%rip),%xmm0 # 0x986f80 <e9_EncryptCFB128_RIJ128_AES_NI+160>
0x0000000000987148 <+424>: pand %xmm8,%xmm0
0x000000000098714d <+429>: por %xmm9,%xmm0
0x0000000000987152 <+434>: pxor (%rcx),%xmm0
0x0000000000987156 <+438>: cmp $0xc,%rdx
0x000000000098715a <+442>: jl 0x98717a <e9_EncryptCTR_RIJ128pipe_AES_NI+474>
0x000000000098715c <+444>: je 0x98716c <e9_EncryptCTR_RIJ128pipe_AES_NI+460>
0x000000000098715e <+446>: aesenc -0x40(%r10),%xmm0
0x0000000000987165 <+453>: aesenc -0x30(%r10),%xmm0
0x000000000098716c <+460>: aesenc -0x20(%r10),%xmm0
0x0000000000987173 <+467>: aesenc -0x10(%r10),%xmm0
0x000000000098717a <+474>: aesenc (%r10),%xmm0
0x0000000000987180 <+480>: aesenc 0x10(%r10),%xmm0
0x0000000000987187 <+487>: aesenc 0x20(%r10),%xmm0
0x000000000098718e <+494>: aesenc 0x30(%r10),%xmm0
0x0000000000987195 <+501>: aesenc 0x40(%r10),%xmm0
0x000000000098719c <+508>: aesenc 0x50(%r10),%xmm0
0x00000000009871a3 <+515>: aesenc 0x60(%r10),%xmm0
0x00000000009871aa <+522>: aesenc 0x70(%r10),%xmm0
0x00000000009871b1 <+529>: aesenc 0x80(%r10),%xmm0
0x00000000009871bb <+539>: aesenclast 0x90(%r10),%xmm0
0x00000000009871c5 <+549>: add $0x1,%rax
0x00000000009871c9 <+553>: adc $0x0,%rbx
0x00000000009871cd <+557>: sub $0x10,%r8
0x00000000009871d1 <+561>: jl 0x9871f2 <e9_EncryptCTR_RIJ128pipe_AES_NI+594>
0x00000000009871d3 <+563>: movdqu (%rdi),%xmm4
0x00000000009871d7 <+567>: pxor %xmm4,%xmm0
0x00000000009871db <+571>: movdqu %xmm0,(%rsi)
0x00000000009871df <+575>: add $0x10,%rdi
0x00000000009871e3 <+579>: add $0x10,%rsi
0x00000000009871e7 <+583>: cmp $0x0,%r8
0x00000000009871eb <+587>: je 0x987217 <e9_EncryptCTR_RIJ128pipe_AES_NI+631>
0x00000000009871ed <+589>: jmpq 0x987131 <e9_EncryptCTR_RIJ128pipe_AES_NI+401>
0x00000000009871f2 <+594>: add $0x10,%r8
0x00000000009871f6 <+598>: pextrb $0x0,%xmm0,%r10d
0x00000000009871fd <+605>: psrldq $0x1,%xmm0
0x0000000000987202 <+610>: movzbl (%rdi),%r11d
0x0000000000987206 <+614>: xor %r11,%r10
0x0000000000987209 <+617>: mov %r10b,(%rsi)
0x000000000098720c <+620>: inc %rdi
0x000000000098720f <+623>: inc %rsi
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) b *0x98717a
Breakpoint 2 at 0x98717a
(gdb) d 1
(gdb) c
Continuing.
Breakpoint 2, 0x000000000098717a in e9_EncryptCTR_RIJ128pipe_AES_NI ()
1: x/i $pc
=> 0x98717a <e9_EncryptCTR_RIJ128pipe_AES_NI+474>: aesenc (%r10),%xmm0
(gdb) si
0x0000000000987180 in e9_EncryptCTR_RIJ128pipe_AES_NI ()
1: x/i $pc
=> 0x987180 <e9_EncryptCTR_RIJ128pipe_AES_NI+480>: aesenc 0x10(%r10),%xmm0
(gdb)
0x0000000000987187 in e9_EncryptCTR_RIJ128pipe_AES_NI ()
1: x/i $pc
=> 0x987187 <e9_EncryptCTR_RIJ128pipe_AES_NI+487>: aesenc 0x20(%r10),%xmm0
(gdb)
0x000000000098718e in e9_EncryptCTR_RIJ128pipe_AES_NI ()
1: x/i $pc
=> 0x98718e <e9_EncryptCTR_RIJ128pipe_AES_NI+494>: aesenc 0x30(%r10),%xmm0
(gdb)
0x0000000000987195 in e9_EncryptCTR_RIJ128pipe_AES_NI ()
1: x/i $pc
=> 0x987195 <e9_EncryptCTR_RIJ128pipe_AES_NI+501>: aesenc 0x40(%r10),%xmm0
(gdb)
0x000000000098719c in e9_EncryptCTR_RIJ128pipe_AES_NI ()
1: x/i $pc
=> 0x98719c <e9_EncryptCTR_RIJ128pipe_AES_NI+508>: aesenc 0x50(%r10),%xmm0
(gdb)
0x00000000009871a3 in e9_EncryptCTR_RIJ128pipe_AES_NI ()
1: x/i $pc
=> 0x9871a3 <e9_EncryptCTR_RIJ128pipe_AES_NI+515>: aesenc 0x60(%r10),%xmm0
(gdb)
regards, Igor
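For readers following the listing, the `bswap` / `add` / `adc` / `pshufb` / `pand` / `por` pattern at the top of the function looks like a masked counter update: the 128-bit big-endian counter is incremented, and only the bits selected by a mask are taken from the incremented value, while the rest come from the original block. The sketch below is one reading of the disassembly, not Intel's source; the function name `next_counter_block` is hypothetical.

```python
def next_counter_block(block, mask):
    """Advance a 16-byte big-endian counter, changing only bits set in mask.

    block: current 16-byte counter block
    mask:  16-byte mask; 1 bits mark the counter field (xmm8 in the listing)
    """
    assert len(block) == 16 and len(mask) == 16
    # add $0x1,%rax / adc $0x0,%rbx: 128-bit increment with wrap-around
    bumped = (int.from_bytes(block, "big") + 1) % (1 << 128)
    bumped_bytes = bumped.to_bytes(16, "big")
    # pand mask / por (original & ~mask): merge the counter field into the block
    return bytes(
        (b & m) | (o & ~m & 0xFF)
        for b, m, o in zip(bumped_bytes, mask, block)
    )


# Example: a 16-bit counter field in the last two bytes wraps without
# carrying into byte 13.
mask = bytes(14) + b"\xff\xff"
block = bytes(range(14)) + b"\xff\xff"
print(next_counter_block(block, mask).hex())
```

A mask with only the last two bytes set corresponds to a 16-bit counter field; the carry out of that field is discarded by the mask rather than propagated into the rest of the block.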
Registers below. FYI, this is an Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz.
(gdb) info reg
rax 0x1d080a499a2a0000 2091933338148929536
rbx 0xfd53b43722a3b0c3 -192612210149445437
rcx 0x7fe0a6d263f0 140602848207856
rdx 0xa 10
rsi 0x7fe0a720a4d4 140602853336276
rdi 0x7fde82bc8cf0 140593652862192
rbp 0x7fde82bc8c70 0x7fde82bc8c70
rsp 0x7fde82bc8b80 0x7fde82bc8b80
r8 0x10 16
r9 0x7fde82bc8cd0 140593652862160
r10 0x7fe0a6d26400 140602848207872
r11 0x7fde82bc8ba0 140593652861856
r12 0x7fde82bc8cf0 140593652862192
r13 0x7fde82bc8bae 140593652861870
r14 0x0 0
r15 0x7fe0a720a4d4 140602853336276
rip 0x7fe0be224e7a 0x7fe0be224e7a <e9_EncryptCTR_RIJ128pipe_AES_NI+474>
eflags 0x10293 [ CF AF SF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
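The register values can be checked against the disassembly posted earlier in the thread. The two `lea` instructions at +385 and +393 compute `r10 = rcx + 16*rdx - 0x90`, and the `cmp $0xc,%rdx` that follows selects the entry point into the `aesenc` chain. The sketch below replays that arithmetic with the dumped `rcx` and `rdx`; it is a reading of the listing under the assumption that the crashing binary matches it, and the function names are hypothetical.

```python
U64 = (1 << 64) - 1  # registers wrap at 64 bits


def single_block_key_pointer(rcx, rdx):
    """Replay the two lea instructions: r10 = rcx + 16*rdx - 0x90."""
    r10 = (rdx * 4) & U64                 # lea 0x0(,%rdx,4),%r10
    return (rcx + r10 * 4 - 0x90) & U64   # lea -0x90(%rcx,%r10,4),%r10


def first_aesenc_offset(rdx):
    """Offset of the first aesenc reached after cmp $0xc,%rdx."""
    if rdx < 0xC:       # jl  -> +474
        return 474
    if rdx == 0xC:      # je  -> +460
        return 460
    return 446          # fall through


rcx, rdx = 0x7FE0A6D263F0, 0xA
r10 = single_block_key_pointer(rcx, rdx)
print(hex(r10), r10 % 16 == 0, first_aesenc_offset(rdx))
```

With `rdx = 0xa` the computed pointer is `0x7fe0a6d26400`, which matches the dumped `r10` and is 16-byte aligned. A misaligned memory operand would normally raise a general-protection fault rather than an invalid-opcode trap, so alignment does not look like the cause. The branch also lands at +474, the faulting address. If this reading is right, and given that `r8 = 0x10` suggests the 64-byte pipelined loop was skipped, the `aesenc` at +474 would be the first AES-NI instruction executed in this call, and the `aesenc` lines listed before it were bypassed rather than executed successfully.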
Bob, I didn't ask for the registers' contents; I asked for:
1) x /100b $r10 - to be sure that this is not an access violation while accessing the memory pointed to by r10
2) x /100b 0x7fe0be224e6c - to be sure that the "aesenc" encoding is correct (gdb understands the encoding, but just to be sure...)
The gdb command "x /100b $r10" means the following: "x" examines memory, "/100b" shows the first 100 bytes, and "$r10" is the address to examine (the address that is in r10 at the time of the trap).
"x /100b 0x7fe0be224e6c" does the same, but examines memory at the address where the "aesenc" instructions start, to check whether the encoding is correct.
regards, Igor
1) x /100b $r10 - in order to be sure that this is not access violation while accessing memory pointed by r10
(gdb) x /100b $r10
0x7fc538010300: -62 50 -97 -15 -102 41 27 -98
0x7fc538010308: -65 85 -73 -88 -31 -45 121 89
0x7fc538010310: -90 -124 84 9 60 -83 79 -105
0x7fc538010318: -125 -8 -8 63 98 43 -127 102
0x7fc538010320: 83 -120 103 -93 111 37 40 52
0x7fc538010328: -20 -35 -48 11 -114 -10 81 109
0x7fc538010330: 25 89 91 -70 118 124 115 -114
0x7fc538010338: -102 -95 -93 -123 20 87 -14 -24
0x7fc538010340: 82 -48 -64 64 36 -84 -77 -50
0x7fc538010348: -66 13 16 75 -86 90 -30 -93
0x7fc538010350: -52 72 -54 -20 -24 -28 121 34
0x7fc538010358: 86 -23 105 105 -4 -77 -117 -54
0x7fc538010360: -31 117 -66 92
2) x /100b 0x7fe0be224e6c - in order to be sure that "aesenc" encoding is correct (gdb understands encoding, but for sure...)
(gdb) x /100b 0x7fe0be224e6c
0x7fe0be224e6c: Cannot access memory at address 0x7fe0be224e6c
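The `x /100b` output above prints each byte as a signed decimal, which is hard to compare against key material or instruction encodings. Running `x /100xb $r10` in gdb would print hex directly; for a dump that has already been captured, a small conversion is enough. This is a minimal sketch, and the helper name `signed_dump_to_hex` is hypothetical.

```python
def signed_dump_to_hex(line):
    """Convert one 'ADDRESS: b b b ...' line of signed decimal bytes to hex."""
    address, _, values = line.partition(":")
    # Masking with 0xFF maps a signed byte (-128..127) to its unsigned value.
    hex_bytes = " ".join(f"{int(v) & 0xFF:02x}" for v in values.split())
    return f"{address}: {hex_bytes}"


print(signed_dump_to_hex("0x7fc538010300: -62 50 -97 -15 -102 41 27 -98"))
```

Applied to the first line of the dump, this gives `0x7fc538010300: c2 32 9f f1 9a 29 1b 9e`.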
Attached is a more complete log from a single source, to ensure the details are consistent. The previous details were mixed together from several people looking at the same issue.
Program terminated with signal 4, Illegal instruction.
#0 0x00007fe0be224e7a in e9_EncryptCTR_RIJ128pipe_AES_NI () from /usr/dialogic/data/ssp.mlm
Backtrace
#0 0x00007fe0be224e7a in e9_EncryptCTR_RIJ128pipe_AES_NI () from /usr/dialogic/data/ssp.mlm
#1 0x00007fde82bc8cd0 in ?? ()
#2 0x00007fe0be22425d in e9_ippsRijndael128EncryptCTR () from /usr/dialogic/data/ssp.mlm
#3 0x00007fe0bcf7d8e3 in srtpKeyDerivation (pDynamicInfo=0x7fe0a720a48c, KeyDerivationRate=<value optimized out>, pKey=0x7fe0b1366388, index=4, label=<value optimized out>)
at srtpalg.c:319
#4 0x00007fe0bcf7b001 in srtpFromRtp (pSrtpObj=0x7fe0a720a48c, pPktData=0x7fe087cd4bd4 "\200", size=0x7fde82bc8dec) at srtp.c:796
#5 0x00007fe0bcf64dbb in rtpEncrypt (portType=<value optimized out>, srtp=0x1d080a499a2a0000, data=0x7fe0a720a4d4 "", count=0x7fe0a6d263f0) at rtpport.c:1134
#6 0x00007fe0bcf65070 in rtpSendPort (prtpHandle=0x7fded3bb7f18, handle=0x7fe0aad8203c, data=0x7fe087cd4bd4 "\200", count=172) at rtpport.c:1219
#7 0x00007fe0bcf4c13a in DoEncoder (handle=0x7fe0ab06a4bc, buf=0x7fe0810242d0, size=160, count=<value optimized out>, pCoder=0x7fe081024248, beforePktSendTimestampUpdSize=160,
afterPktSendTimestampUpdSize=160, pkt=0x7fe0b1374cd4) at pio.c:2308
#8 pioWrite (handle=0x7fe0ab06a4bc, buf=0x7fe0810242d0, size=160, count=<value optimized out>, pCoder=0x7fe081024248, beforePktSendTimestampUpdSize=160, afterPktSendTimestampUpdSize=160,
pkt=0x7fe0b1374cd4) at pio.c:705
#9 0x00007fe0bcf19563 in ptxWorkFxn (ptx=0x7fe081024200, pTaskMem=0x7fe0b400041c, cIndex=416, pRealTimeTraceItems=<value optimized out>, weightMask=0x7fde82bc9e1f "\001") at ptx.c:5954
#10 0x00007fe0bcf1dbc1 in ptx_workfxn (ptx=0x7fe081024200, pTaskMem=0x7fe0a720a4d4, cIndex=2193394896, pRealTimeTraceItems=0x7fe0a6d263f0, weightMask=0x10 <Address 0x10 out of bounds>)
at ptx.c:4604
#11 0x00007fe0bcf050f7 in wrkTaskFxn (args=<value optimized out>) at wrk.c:1419
#12 0x00007fe0caff910c in helperEntry (pInfo=0x7fdee44c0100) at source/GEN_threadulx.c:871
#13 0x00007fe0cba32851 in start_thread (arg=0x7fde82bca700) at pthread_create.c:301
#14 0x00007fe0cb78090d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
Contents of r10 and xmm
r10 0x7fe0a6d26400 140602848207872
xmm0 {
v4_float = {0x0, 0xf4764800, 0x0, 0x11e78000},
v2_double = {0x8000000000000000, 0x8000000000000000},
v16_int8 = {0x97, 0x87, 0x81, 0x28, 0x92, 0x1d, 0x3d, 0x50, 0x36, 0xc7, 0xf9, 0xa4, 0x31, 0xdc, 0x8f, 0xd2},
v8_int16 = {0x8797, 0x2881, 0x1d92, 0x503d, 0xc736, 0xa4f9, 0xdc31, 0xd28f},
v4_int32 = {0x28818797, 0x503d1d92, 0xa4f9c736, 0xd28fdc31},
v2_int64 = {0x503d1d9228818797, 0xd28fdc31a4f9c736},
uint128 = 0xd28fdc31a4f9c736503d1d9228818797
}
Output of GDB "info all-registers" command
rax 0x1d080a499a2a0000 2091933338148929536
rbx 0xfd53b43722a3b0c3 -192612210149445437
rcx 0x7fe0a6d263f0 140602848207856
rdx 0xa 10
rsi 0x7fe0a720a4d4 140602853336276
rdi 0x7fde82bc8cf0 140593652862192
rbp 0x7fde82bc8c70 0x7fde82bc8c70
rsp 0x7fde82bc8b80 0x7fde82bc8b80
r8 0x10 16
r9 0x7fde82bc8cd0 140593652862160
r10 0x7fe0a6d26400 140602848207872
r11 0x7fde82bc8ba0 140593652861856
r12 0x7fde82bc8cf0 140593652862192
r13 0x7fde82bc8bae 140593652861870
r14 0x0 0
r15 0x7fe0a720a4d4 140602853336276
rip 0x7fe0be224e7a 0x7fe0be224e7a <e9_EncryptCTR_RIJ128pipe_AES_NI+474>
eflags 0x10293 [ CF AF SF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
st0 0 (raw 0x00000000000000000000)
st1 0 (raw 0x00000000000000000000)
st2 0 (raw 0x00000000000000000000)
st3 0 (raw 0x00000000000000000000)
st4 0 (raw 0x00000000000000000000)
st5 0 (raw 0x00000000000000000000)
st6 0 (raw 0x00000000000000000000)
st7 0 (raw 0x00000000000000000000)
fctrl 0x37f 895
fstat 0x0 0
ftag 0xffff 65535
fiseg 0x0 0
fioff 0x0 0
foseg 0x0 0
fooff 0x0 0
fop 0x0 0
mxcsr 0x1fa1 [ IE PE IM DM ZM OM UM PM ]
ymm0 {
v8_float = {0x0, 0xf4764800, 0x0, 0x11e78000, 0x0, 0x0, 0x0, 0x0},
v4_double = {0x8000000000000000, 0x8000000000000000, 0x0, 0x0},
v32_int8 = {0x97, 0x87, 0x81, 0x28, 0x92, 0x1d, 0x3d, 0x50, 0x36, 0xc7, 0xf9, 0xa4, 0x31, 0xdc, 0x8f, 0xd2, 0x0 <repeats 16 times>},
v16_int16 = {0x8797, 0x2881, 0x1d92, 0x503d, 0xc736, 0xa4f9, 0xdc31, 0xd28f, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v8_int32 = {0x28818797, 0x503d1d92, 0xa4f9c736, 0xd28fdc31, 0x0, 0x0, 0x0, 0x0},
v4_int64 = {0x503d1d9228818797, 0xd28fdc31a4f9c736, 0x0, 0x0},
v2_int128 = {0xd28fdc31a4f9c736503d1d9228818797, 0x00000000000000000000000000000000}
}
ymm1 {
v8_float = {0xc0000000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v4_double = {0x0, 0x8000000000000000, 0x0, 0x0},
v32_int8 = {0x3d, 0x88, 0x60, 0xda, 0x54, 0x99, 0x67, 0x1, 0x14, 0x84, 0xf2, 0x25, 0x72, 0x0, 0x72, 0x72, 0x0 <repeats 16 times>},
v16_int16 = {0x883d, 0xda60, 0x9954, 0x167, 0x8414, 0x25f2, 0x72, 0x7272, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v8_int32 = {0xda60883d, 0x1679954, 0x25f28414, 0x72720072, 0x0, 0x0, 0x0, 0x0},
v4_int64 = {0x1679954da60883d, 0x7272007225f28414, 0x0, 0x0},
v2_int128 = {0x7272007225f2841401679954da60883d, 0x00000000000000000000000000000000}
}
ymm2 {
v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v4_double = {0x8000000000000000, 0x8000000000000000, 0x0, 0x0},
v32_int8 = {0x0, 0x4, 0x8, 0xc, 0xff <repeats 12 times>, 0x0 <repeats 16 times>},
v16_int16 = {0x400, 0xc08, 0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v8_int32 = {0xc080400, 0xffffffff, 0xffffffff, 0xffffffff, 0x0, 0x0, 0x0, 0x0},
v4_int64 = {0xffffffff0c080400, 0xffffffffffffffff, 0x0, 0x0},
v2_int128 = {0xffffffffffffffffffffffff0c080400, 0x00000000000000000000000000000000}
}
ymm3 {
v8_float = {0x5f400000, 0x0, 0x43c00000, 0x0, 0x0, 0x0, 0x0, 0x0},
v4_double = {0x0, 0x0, 0x0, 0x0},
v32_int8 = {0x6, 0x5, 0xbe, 0xd5, 0x0, 0x0, 0x0, 0x0, 0xf, 0x85, 0x38, 0x56, 0x0 <repeats 20 times>},
v16_int16 = {0x506, 0xd5be, 0x0, 0x0, 0x850f, 0x5638, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v8_int32 = {0xd5be0506, 0x0, 0x5638850f, 0x0, 0x0, 0x0, 0x0, 0x0},
v4_int64 = {0xd5be0506, 0x5638850f, 0x0, 0x0},
v2_int128 = {0x000000005638850f00000000d5be0506, 0x00000000000000000000000000000000}
}
ymm4 {
v8_float = {0x0, 0x0, 0x5f400000, 0x43c00000, 0x0, 0x0, 0x0, 0x0},
v4_double = {0x8000000000000000, 0x8000000000000000, 0x0, 0x0},
v32_int8 = {0xbf, 0x6d, 0x7a, 0xeb, 0xc2, 0xa3, 0x40, 0x5f, 0x6, 0x5, 0xbe, 0xd5, 0xf, 0x85, 0x38, 0x56, 0x0 <repeats 16 times>},
v16_int16 = {0x6dbf, 0xeb7a, 0xa3c2, 0x5f40, 0x506, 0xd5be, 0x850f, 0x5638, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v8_int32 = {0xeb7a6dbf, 0x5f40a3c2, 0xd5be0506, 0x5638850f, 0x0, 0x0, 0x0, 0x0},
v4_int64 = {0x5f40a3c2eb7a6dbf, 0x5638850fd5be0506, 0x0, 0x0},
v2_int128 = {0x5638850fd5be05065f40a3c2eb7a6dbf, 0x00000000000000000000000000000000}
}
ymm5 {
v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v4_double = {0x0, 0x0, 0x0, 0x0},
v32_int8 = {0xbf, 0x6d, 0x7a, 0xeb, 0x0, 0x0, 0x0, 0x0, 0xc2, 0xa3, 0x40, 0x5f, 0x0 <repeats 20 times>},
v16_int16 = {0x6dbf, 0xeb7a, 0x0, 0x0, 0xa3c2, 0x5f40, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v8_int32 = {0xeb7a6dbf, 0x0, 0x5f40a3c2, 0x0, 0x0, 0x0, 0x0, 0x0},
v4_int64 = {0xeb7a6dbf, 0x5f40a3c2, 0x0, 0x0},
v2_int128 = {0x000000005f40a3c200000000eb7a6dbf, 0x00000000000000000000000000000000}
}
ymm6 {
v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v4_double = {0x0, 0x0, 0x0, 0x0},
v32_int8 = {0xf1, 0x0, 0x0, 0x0, 0xe1, 0x0, 0x0, 0x0, 0xa4, 0x0, 0x0, 0x0, 0xf2, 0x0 <repeats 19 times>},
v16_int16 = {0xf1, 0x0, 0xe1, 0x0, 0xa4, 0x0, 0xf2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v8_int32 = {0xf1, 0xe1, 0xa4, 0xf2, 0x0, 0x0, 0x0, 0x0},
v4_int64 = {0xe1000000f1, 0xf2000000a4, 0x0, 0x0},
v2_int128 = {0x000000f2000000a4000000e1000000f1, 0x00000000000000000000000000000000}
}
ymm7 {
v8_float = {0xfffffff1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v4_double = {0x0, 0x0, 0x0, 0x0},
v32_int8 = {0x2a, 0x95, 0x72, 0xc1, 0x6c, 0xcf, 0x68, 0x84, 0x62, 0x15, 0x7a, 0xe9, 0x16, 0x22, 0x35, 0x9b, 0x0 <repeats 16 times>},
v16_int16 = {0x952a, 0xc172, 0xcf6c, 0x8468, 0x1562, 0xe97a, 0x2216, 0x9b35, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v8_int32 = {0xc172952a, 0x8468cf6c, 0xe97a1562, 0x9b352216, 0x0, 0x0, 0x0, 0x0},
v4_int64 = {0x8468cf6cc172952a, 0x9b352216e97a1562, 0x0, 0x0},
v2_int128 = {0x9b352216e97a15628468cf6cc172952a, 0x00000000000000000000000000000000}
}
ymm8 {
v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v4_double = {0x0, 0x8000000000000000, 0x0, 0x0},
v32_int8 = {0x0 <repeats 14 times>, 0xff, 0xff, 0x0 <repeats 16 times>},
v16_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xffff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v8_int32 = {0x0, 0x0, 0x0, 0xffff0000, 0x0, 0x0, 0x0, 0x0},
v4_int64 = {0x0, 0xffff000000000000, 0x0, 0x0},
v2_int128 = {0xffff0000000000000000000000000000, 0x00000000000000000000000000000000}
}
ymm9 {
v8_float = {0x0, 0xfffffe9f, 0x8a081, 0x0, 0x0, 0x0, 0x0, 0x0},
v4_double = {0xef5cddc84bac0300, 0x0, 0x0, 0x0},
v32_int8 = {0xfd, 0x53, 0xb4, 0x37, 0x22, 0xa3, 0xb0, 0xc3, 0x1d, 0x8, 0xa, 0x49, 0x9a, 0x2a, 0x0 <repeats 18 times>},
v16_int16 = {0x53fd, 0x37b4, 0xa322, 0xc3b0, 0x81d, 0x490a, 0x2a9a, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v8_int32 = {0x37b453fd, 0xc3b0a322, 0x490a081d, 0x2a9a, 0x0, 0x0, 0x0, 0x0},
v4_int64 = {0xc3b0a32237b453fd, 0x2a9a490a081d, 0x0, 0x0},
v2_int128 = {0x00002a9a490a081dc3b0a32237b453fd, 0x00000000000000000000000000000000}
}
ymm10 {
v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v4_double = {0x0, 0x0, 0x0, 0x0},
v32_int8 = {0xa4, 0x0, 0x0, 0x0, 0xf2, 0x0, 0x0, 0x0, 0xa4, 0x0, 0x0, 0x0, 0xf2, 0x0 <repeats 19 times>},
v16_int16 = {0xa4, 0x0, 0xf2, 0x0, 0xa4, 0x0, 0xf2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v8_int32 = {0xa4, 0xf2, 0xa4, 0xf2, 0x0, 0x0, 0x0, 0x0},
v4_int64 = {0xf2000000a4, 0xf2000000a4, 0x0, 0x0},
v2_int128 = {0x000000f2000000a4000000f2000000a4, 0x00000000000000000000000000000000}
}
ymm11 {
v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v4_double = {0x0, 0x0, 0x0, 0x0},
v32_int8 = {0xf3, 0x55, 0xa0, 0xa2, 0x0 <repeats 28 times>},
v16_int16 = {0x55f3, 0xa2a0, 0x0 <repeats 14 times>},
v8_int32 = {0xa2a055f3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v4_int64 = {0xa2a055f3, 0x0, 0x0, 0x0},
v2_int128 = {0x000000000000000000000000a2a055f3, 0x00000000000000000000000000000000}
}
ymm12 {
v8_float = {0xfe4673ba, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v4_double = {0x0, 0x0, 0x0, 0x0},
v32_int8 = {0x23, 0xc6, 0xdc, 0xcb, 0x0 <repeats 28 times>},
v16_int16 = {0xc623, 0xcbdc, 0x0 <repeats 14 times>},
v8_int32 = {0xcbdcc623, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v4_int64 = {0xcbdcc623, 0x0, 0x0, 0x0},
v2_int128 = {0x000000000000000000000000cbdcc623, 0x00000000000000000000000000000000}
}
ymm13 {
v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v4_double = {0x0, 0x0, 0x0, 0x0},
v32_int8 = {0x38, 0xd1, 0xc1, 0xd9, 0x0, 0x0, 0x0, 0x0, 0xa8, 0x1, 0x71, 0x39, 0x0 <repeats 20 times>},
v16_int16 = {0xd138, 0xd9c1, 0x0, 0x0, 0x1a8, 0x3971, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v8_int32 = {0xd9c1d138, 0x0, 0x397101a8, 0x0, 0x0, 0x0, 0x0, 0x0},
v4_int64 = {0xd9c1d138, 0x397101a8, 0x0, 0x0},
v2_int128 = {0x00000000397101a800000000d9c1d138, 0x00000000000000000000000000000000}
}
ymm14 {
v8_float = {0x0, 0x0, 0xfe4673ba, 0x0, 0x0, 0x0, 0x0, 0x0},
v4_double = {0x0, 0x0, 0x0, 0x0},
v32_int8 = {0xf3, 0x55, 0xa0, 0xa2, 0x0, 0x0, 0x0, 0x0, 0x23, 0xc6, 0xdc, 0xcb, 0x0 <repeats 20 times>},
v16_int16 = {0x55f3, 0xa2a0, 0x0, 0x0, 0xc623, 0xcbdc, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v8_int32 = {0xa2a055f3, 0x0, 0xcbdcc623, 0x0, 0x0, 0x0, 0x0, 0x0},
v4_int64 = {0xa2a055f3, 0xcbdcc623, 0x0, 0x0},
v2_int128 = {0x00000000cbdcc62300000000a2a055f3, 0x00000000000000000000000000000000}
}
ymm15 {
v8_float = {0x0, 0x0, 0x0, 0xfe4673ba, 0x0, 0x0, 0x0, 0x0},
v4_double = {0x0, 0x8000000000000000, 0x0, 0x0},
v32_int8 = {0x38, 0xd1, 0xc1, 0xd9, 0xa8, 0x1, 0x71, 0x39, 0xf3, 0x55, 0xa0, 0xa2, 0x23, 0xc6, 0xdc, 0xcb, 0x0 <repeats 16 times>},
v16_int16 = {0xd138, 0xd9c1, 0x1a8, 0x3971, 0x55f3, 0xa2a0, 0xc623, 0xcbdc, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v8_int32 = {0xd9c1d138, 0x397101a8, 0xa2a055f3, 0xcbdcc623, 0x0, 0x0, 0x0, 0x0},
v4_int64 = {0x397101a8d9c1d138, 0xcbdcc623a2a055f3, 0x0, 0x0},
v2_int128 = {0xcbdcc623a2a055f3397101a8d9c1d138, 0x00000000000000000000000000000000}
}
Disassembly of the function that contains the "invalid instruction"
Dump of assembler code for function e9_EncryptCTR_RIJ128pipe_AES_NI:
0x00007fe0be224ca0 <+0>: push %rbx
0x00007fe0be224ca1 <+1>: mov 0x10(%rsp),%rax
0x00007fe0be224ca6 <+6>: movdqu (%rax),%xmm8
0x00007fe0be224cab <+11>: movdqu (%r9),%xmm0
0x00007fe0be224cb0 <+16>: movdqa %xmm8,%xmm9
0x00007fe0be224cb5 <+21>: pandn %xmm0,%xmm9
0x00007fe0be224cba <+26>: mov (%r9),%rbx
0x00007fe0be224cbd <+29>: mov 0x8(%r9),%rax
0x00007fe0be224cc1 <+33>: bswap %rbx
0x00007fe0be224cc4 <+36>: bswap %rax
0x00007fe0be224cc7 <+39>: movslq %r8d,%r8
0x00007fe0be224cca <+42>: sub $0x40,%r8
0x00007fe0be224cce <+46>: jl 0x7fe0be224e17 <e9_EncryptCTR_RIJ128pipe_AES_NI+375>
0x00007fe0be224cd4 <+52>: movdqa -0x5c(%rip),%xmm4 # 0x7fe0be224c80
0x00007fe0be224cdc <+60>: pinsrq $0x0,%rax,%xmm0
0x00007fe0be224ce3 <+67>: pinsrq $0x1,%rbx,%xmm0
0x00007fe0be224cea <+74>: pshufb %xmm4,%xmm0
0x00007fe0be224cef <+79>: pand %xmm8,%xmm0
0x00007fe0be224cf4 <+84>: por %xmm9,%xmm0
0x00007fe0be224cf9 <+89>: add $0x1,%rax
0x00007fe0be224cfd <+93>: adc $0x0,%rbx
0x00007fe0be224d01 <+97>: pinsrq $0x0,%rax,%xmm1
0x00007fe0be224d08 <+104>: pinsrq $0x1,%rbx,%xmm1
0x00007fe0be224d0f <+111>: pshufb %xmm4,%xmm1
0x00007fe0be224d14 <+116>: pand %xmm8,%xmm1
0x00007fe0be224d19 <+121>: por %xmm9,%xmm1
0x00007fe0be224d1e <+126>: add $0x1,%rax
0x00007fe0be224d22 <+130>: adc $0x0,%rbx
0x00007fe0be224d26 <+134>: pinsrq $0x0,%rax,%xmm2
0x00007fe0be224d2d <+141>: pinsrq $0x1,%rbx,%xmm2
0x00007fe0be224d34 <+148>: pshufb %xmm4,%xmm2
0x00007fe0be224d39 <+153>: pand %xmm8,%xmm2
0x00007fe0be224d3e <+158>: por %xmm9,%xmm2
0x00007fe0be224d43 <+163>: add $0x1,%rax
0x00007fe0be224d47 <+167>: adc $0x0,%rbx
0x00007fe0be224d4b <+171>: pinsrq $0x0,%rax,%xmm3
0x00007fe0be224d52 <+178>: pinsrq $0x1,%rbx,%xmm3
0x00007fe0be224d59 <+185>: pshufb %xmm4,%xmm3
0x00007fe0be224d5e <+190>: pand %xmm8,%xmm3
0x00007fe0be224d63 <+195>: por %xmm9,%xmm3
0x00007fe0be224d68 <+200>: movdqa (%rcx),%xmm4
0x00007fe0be224d6c <+204>: mov %rcx,%r10
0x00007fe0be224d6f <+207>: pxor %xmm4,%xmm0
0x00007fe0be224d73 <+211>: pxor %xmm4,%xmm1
0x00007fe0be224d77 <+215>: pxor %xmm4,%xmm2
0x00007fe0be224d7b <+219>: pxor %xmm4,%xmm3
0x00007fe0be224d7f <+223>: movdqa 0x10(%r10),%xmm4
0x00007fe0be224d85 <+229>: add $0x10,%r10
0x00007fe0be224d89 <+233>: mov %rdx,%r11
0x00007fe0be224d8c <+236>: sub $0x1,%r11
0x00007fe0be224d90 <+240>: aesenc %xmm4,%xmm0
0x00007fe0be224d95 <+245>: aesenc %xmm4,%xmm1
0x00007fe0be224d9a <+250>: aesenc %xmm4,%xmm2
0x00007fe0be224d9f <+255>: aesenc %xmm4,%xmm3
0x00007fe0be224da4 <+260>: movdqa 0x10(%r10),%xmm4
0x00007fe0be224daa <+266>: add $0x10,%r10
0x00007fe0be224dae <+270>: dec %r11
0x00007fe0be224db1 <+273>: jne 0x7fe0be224d90 <e9_EncryptCTR_RIJ128pipe_AES_NI+240>
0x00007fe0be224db3 <+275>: aesenclast %xmm4,%xmm0
0x00007fe0be224db8 <+280>: aesenclast %xmm4,%xmm1
0x00007fe0be224dbd <+285>: aesenclast %xmm4,%xmm2
0x00007fe0be224dc2 <+290>: aesenclast %xmm4,%xmm3
0x00007fe0be224dc7 <+295>: movdqu (%rdi),%xmm4
0x00007fe0be224dcb <+299>: movdqu 0x10(%rdi),%xmm5
0x00007fe0be224dd0 <+304>: movdqu 0x20(%rdi),%xmm6
0x00007fe0be224dd5 <+309>: movdqu 0x30(%rdi),%xmm7
0x00007fe0be224dda <+314>: add $0x40,%rdi
0x00007fe0be224dde <+318>: pxor %xmm4,%xmm0
0x00007fe0be224de2 <+322>: movdqu %xmm0,(%rsi)
0x00007fe0be224de6 <+326>: pxor %xmm5,%xmm1
0x00007fe0be224dea <+330>: movdqu %xmm1,0x10(%rsi)
0x00007fe0be224def <+335>: pxor %xmm6,%xmm2
0x00007fe0be224df3 <+339>: movdqu %xmm2,0x20(%rsi)
0x00007fe0be224df8 <+344>: pxor %xmm7,%xmm3
0x00007fe0be224dfc <+348>: movdqu %xmm3,0x30(%rsi)
0x00007fe0be224e01 <+353>: add $0x1,%rax
0x00007fe0be224e05 <+357>: adc $0x0,%rbx
0x00007fe0be224e09 <+361>: add $0x40,%rsi
0x00007fe0be224e0d <+365>: sub $0x40,%r8
0x00007fe0be224e11 <+369>: jge 0x7fe0be224cd4 <e9_EncryptCTR_RIJ128pipe_AES_NI+52>
0x00007fe0be224e17 <+375>: add $0x40,%r8
0x00007fe0be224e1b <+379>: je 0x7fe0be224f17 <e9_EncryptCTR_RIJ128pipe_AES_NI+631>
0x00007fe0be224e21 <+385>: lea 0x0(,%rdx,4),%r10
0x00007fe0be224e29 <+393>: lea -0x90(%rcx,%r10,4),%r10
0x00007fe0be224e31 <+401>: pinsrq $0x0,%rax,%xmm0
0x00007fe0be224e38 <+408>: pinsrq $0x1,%rbx,%xmm0
0x00007fe0be224e3f <+415>: pshufb -0x1c8(%rip),%xmm0 # 0x7fe0be224c80
0x00007fe0be224e48 <+424>: pand %xmm8,%xmm0
0x00007fe0be224e4d <+429>: por %xmm9,%xmm0
0x00007fe0be224e52 <+434>: pxor (%rcx),%xmm0
0x00007fe0be224e56 <+438>: cmp $0xc,%rdx
0x00007fe0be224e5a <+442>: jl 0x7fe0be224e7a <e9_EncryptCTR_RIJ128pipe_AES_NI+474>
0x00007fe0be224e5c <+444>: je 0x7fe0be224e6c <e9_EncryptCTR_RIJ128pipe_AES_NI+460>
0x00007fe0be224e5e <+446>: aesenc -0x40(%r10),%xmm0
0x00007fe0be224e65 <+453>: aesenc -0x30(%r10),%xmm0
0x00007fe0be224e6c <+460>: aesenc -0x20(%r10),%xmm0
0x00007fe0be224e73 <+467>: aesenc -0x10(%r10),%xmm0
=> 0x00007fe0be224e7a <+474>: aesenc (%r10),%xmm0
0x00007fe0be224e80 <+480>: aesenc 0x10(%r10),%xmm0
0x00007fe0be224e87 <+487>: aesenc 0x20(%r10),%xmm0
0x00007fe0be224e8e <+494>: aesenc 0x30(%r10),%xmm0
0x00007fe0be224e95 <+501>: aesenc 0x40(%r10),%xmm0
0x00007fe0be224e9c <+508>: aesenc 0x50(%r10),%xmm0
0x00007fe0be224ea3 <+515>: aesenc 0x60(%r10),%xmm0
0x00007fe0be224eaa <+522>: aesenc 0x70(%r10),%xmm0
0x00007fe0be224eb1 <+529>: aesenc 0x80(%r10),%xmm0
0x00007fe0be224ebb <+539>: aesenclast 0x90(%r10),%xmm0
0x00007fe0be224ec5 <+549>: add $0x1,%rax
0x00007fe0be224ec9 <+553>: adc $0x0,%rbx
0x00007fe0be224ecd <+557>: sub $0x10,%r8
0x00007fe0be224ed1 <+561>: jl 0x7fe0be224ef2 <e9_EncryptCTR_RIJ128pipe_AES_NI+594>
0x00007fe0be224ed3 <+563>: movdqu (%rdi),%xmm4
0x00007fe0be224ed7 <+567>: pxor %xmm4,%xmm0
0x00007fe0be224edb <+571>: movdqu %xmm0,(%rsi)
0x00007fe0be224edf <+575>: add $0x10,%rdi
0x00007fe0be224ee3 <+579>: add $0x10,%rsi
0x00007fe0be224ee7 <+583>: cmp $0x0,%r8
0x00007fe0be224eeb <+587>: je 0x7fe0be224f17 <e9_EncryptCTR_RIJ128pipe_AES_NI+631>
0x00007fe0be224eed <+589>: jmpq 0x7fe0be224e31 <e9_EncryptCTR_RIJ128pipe_AES_NI+401>
0x00007fe0be224ef2 <+594>: add $0x10,%r8
0x00007fe0be224ef6 <+598>: pextrb $0x0,%xmm0,%r10d
0x00007fe0be224efd <+605>: psrldq $0x1,%xmm0
0x00007fe0be224f02 <+610>: movzbl (%rdi),%r11d
0x00007fe0be224f06 <+614>: xor %r11,%r10
0x00007fe0be224f09 <+617>: mov %r10b,(%rsi)
0x00007fe0be224f0c <+620>: inc %rdi
0x00007fe0be224f0f <+623>: inc %rsi
0x00007fe0be224f12 <+626>: dec %r8
0x00007fe0be224f15 <+629>: jne 0x7fe0be224ef6 <e9_EncryptCTR_RIJ128pipe_AES_NI+598>
0x00007fe0be224f17 <+631>: pinsrq $0x0,%rax,%xmm0
0x00007fe0be224f1e <+638>: pinsrq $0x1,%rbx,%xmm0
0x00007fe0be224f25 <+645>: pshufb -0x2ae(%rip),%xmm0 # 0x7fe0be224c80
0x00007fe0be224f2e <+654>: pand %xmm8,%xmm0
0x00007fe0be224f33 <+659>: por %xmm9,%xmm0
0x00007fe0be224f38 <+664>: movdqu %xmm0,(%r9)
0x00007fe0be224f3d <+669>: vzeroupper
0x00007fe0be224f40 <+672>: pop %rbx
0x00007fe0be224f41 <+673>: retq
0x00007fe0be224f42 <+674>: nop
0x00007fe0be224f43 <+675>: nop
0x00007fe0be224f44 <+676>: nop
0x00007fe0be224f45 <+677>: nop
0x00007fe0be224f46 <+678>: nop
0x00007fe0be224f47 <+679>: nop
0x00007fe0be224f48 <+680>: nop
0x00007fe0be224f49 <+681>: nop
0x00007fe0be224f4a <+682>: nop
0x00007fe0be224f4b <+683>: nop
0x00007fe0be224f4c <+684>: nop
0x00007fe0be224f4d <+685>: nop
0x00007fe0be224f4e <+686>: nop
0x00007fe0be224f4f <+687>: nop
0x00007fe0be224f50 <+688>: femms
0x00007fe0be224f52 <+690>: or $0x90a0b0c,%eax
0x00007fe0be224f57 <+695>: or %al,(%rdi)
0x00007fe0be224f59 <+697>: (bad)
0x00007fe0be224f5a <+698>: add $0x1020304,%eax
0x00007fe0be224f5f <+703>: add %dl,0x48(%rbx)
End of assembler dump.
Dump of instruction bytes. The "invalid instruction" is at 0x00007fe0be224e7a.
=> 0x00007fe0be224e7a <+474>: aesenc (%r10),%xmm0
0x7fe0be224e5e <e9_EncryptCTR_RIJ128pipe_AES_NI+446>: 0x66 0x41 0x0f 0x38 0xdc 0x42 0xc0 0x66
0x7fe0be224e66 <e9_EncryptCTR_RIJ128pipe_AES_NI+454>: 0x41 0x0f 0x38 0xdc 0x42 0xd0 0x66 0x41
0x7fe0be224e6e <e9_EncryptCTR_RIJ128pipe_AES_NI+462>: 0x0f 0x38 0xdc 0x42 0xe0 0x66 0x41 0x0f
0x7fe0be224e76 <e9_EncryptCTR_RIJ128pipe_AES_NI+470>: 0x38 0xdc 0x42 0xf0 0x66 0x41 0x0f 0x38
0x7fe0be224e7e <e9_EncryptCTR_RIJ128pipe_AES_NI+478>: 0xdc 0x02 0x66 0x41 0x0f 0x38 0xdc 0x42
0x7fe0be224e86 <e9_EncryptCTR_RIJ128pipe_AES_NI+486>: 0x10 0x66 0x41 0x0f 0x38 0xdc 0x42 0x20
0x7fe0be224e8e <e9_EncryptCTR_RIJ128pipe_AES_NI+494>: 0x66 0x41 0x0f 0x38 0xdc 0x42 0x30 0x66
0x7fe0be224e96 <e9_EncryptCTR_RIJ128pipe_AES_NI+502>: 0x41 0x0f 0x38 0xdc 0x42 0x40 0x66 0x41
0x7fe0be224e9e <e9_EncryptCTR_RIJ128pipe_AES_NI+510>: 0x0f 0x38 0xdc 0x42 0x50 0x66 0x41 0x0f
0x7fe0be224ea6 <e9_EncryptCTR_RIJ128pipe_AES_NI+518>: 0x38 0xdc 0x42 0x60 0x66 0x41 0x0f 0x38
0x7fe0be224eae <e9_EncryptCTR_RIJ128pipe_AES_NI+526>: 0xdc 0x42 0x70 0x66 0x41 0x0f 0x38 0xdc
0x7fe0be224eb6 <e9_EncryptCTR_RIJ128pipe_AES_NI+534>: 0x82 0x80 0x00 0x00 0x00 0x66 0x41 0x0f
0x7fe0be224ebe <e9_EncryptCTR_RIJ128pipe_AES_NI+542>: 0x38 0xdd 0x82 0x90 0x00 0x00 0x00 0x48
0x7fe0be224ec6 <e9_EncryptCTR_RIJ128pipe_AES_NI+550>: 0x83 0xc0 0x01 0x48 0x83 0xd3 0x00 0x49
0x7fe0be224ece <e9_EncryptCTR_RIJ128pipe_AES_NI+558>: 0x83 0xe8 0x10 0x7c 0x1f 0xf3 0x0f 0x6f
0x7fe0be224ed6 <e9_EncryptCTR_RIJ128pipe_AES_NI+566>: 0x27 0x66 0x0f 0xef 0xc4 0xf3 0x0f 0x7f
Did you execute these instructions just after the exception/trap? Your code and memory buffers move around in memory from run to run, so the address for the #2 instruction can be different. I took it from your dump:
0x00007fe0be224e6c <+460>: aesenc -0x20(%r10),%xmm0
0x00007fe0be224e73 <+467>: aesenc -0x10(%r10),%xmm0
=> 0x00007fe0be224e7a <+474>: aesenc (%r10),%xmm0
0x00007fe0be224e80 <+480>: aesenc 0x10(%r10),%xmm0
0x00007fe0be224e87 <+487>: aesenc 0x20(%r10),%xmm0
You can see that the content of r10 has changed between runs: it was 0x7fc538010300 in the previous answer ((gdb) x /100b $r10) and 0x7fe0a6d26400 in the earlier answer (when you used "info reg"). So I guess the address for "0x00007fe0be224e6c <+460>: aesenc -0x20(%r10),%xmm0" has also changed for your last run.
I think everything is OK with the encoding, as GDB translates the code bytes correctly. I don't have any more ideas for now (on "remote" debugging); I think I need a reproducer of this issue (in any form, buildable source or executable). One more piece of "food for thought": there are no "v" prefixes before any instruction in the disassembled code (and no ymm/zmm registers); the only AVX-related instruction is "vzeroupper" in the epilogue. This means that this function doesn't have any special AVX or AVX2 code; it's just y8 code (developed for Westmere) compiled for AVX/AVX2. Therefore there is nothing AVX-specific here...
The same is true for CopyReplicateBorder: I need the parameters (size, step) at which this function crashes. And of course the best approach is to provide us with a small reproducer (as Ying asked).
regards, Igor
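For anyone following the r10 discussion, here is a rough sketch of the address arithmetic in the single-block tail path of the disassembly above (offsets +385 to +539). Reading %rcx as the base of the expanded round keys and %rdx as the round count is an inference from the code, not something documented in this thread, and the function names are made up for illustration. The alignment helper is there because the legacy (non-VEX) form of aesenc with a memory operand requires a 16-byte aligned address; a misaligned operand would normally raise a general protection fault rather than an invalid opcode trap, so this only rules out one possibility.

```python
def tail_round_key_offsets(rounds):
    """Byte offsets from the key base (%rcx) read by aesenc/aesenclast
    in the tail path, for a given round count (assumed to be in %rdx)."""
    # lea 0x0(,%rdx,4),%r10 ; lea -0x90(%rcx,%r10,4),%r10
    r10 = rounds * 4 * 4 - 0x90          # r10 relative to %rcx
    if rounds < 12:                      # jl: enter at +474, operand (%r10)
        first = 0x00
    elif rounds == 12:                   # je: enter at +460, operand -0x20(%r10)
        first = -0x20
    else:                                # fall through at +446, operand -0x40(%r10)
        first = -0x40
    # Displacements step by 0x10 up to 0x90 (the aesenclast operand).
    return [r10 + disp for disp in range(first, 0x90 + 1, 0x10)]


def is_aligned16(address):
    """True if the address is a multiple of 16 bytes."""
    return address % 16 == 0


if __name__ == "__main__":
    for rounds in (10, 12, 14):
        print(rounds, [hex(off) for off in tail_round_key_offsets(rounds)])
    # The two r10 values quoted in this thread.
    for r10 in (0x7fc538010300, 0x7fe0a6d26400):
        print(hex(r10), is_aligned16(r10))
```

Under those assumptions, every entry path reads round keys 1 through N at 16-byte strides starting at offset 0x10, and both r10 values quoted in the thread are 16-byte aligned.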
If we remove the CPU limit (default was 0x46 for AVX), we no longer use the 'e9' code set on this system. We use 'l9' and we no longer see a segmentation fault.
Bob, you indicated 2 problems:
1) illegal instruction (ippCP)
2) seg fault (ippIP)
According to your last message, I guess you've solved the 2nd issue. Am I right? What about the 1st one?
If the 1st one has not been solved yet, could you run the IPP perf tests (available with each IPP release in the ../tools subfolder):
ps_ippcp -B -r -TAVX -fippsRijndael128EncryptCTR
This test will execute exactly the same e9 code as your application (you can verify this under GDB), just to check for the illegal instruction exception.
regards, Igor
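Alongside the perf test, it may also be worth checking what the kernel on the failing machine reports for the CPU. The sketch below parses the text of /proc/cpuinfo and lists which of the instruction-set flags used by the disassembled function are absent. The required set is an assumption derived only from the instructions visible in the dump (aesenc, pshufb, pinsrq/pextrb, vzeroupper), written with the Linux flag names; it is not IPP's actual dispatch criterion, and the function names are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Return the flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_for_dump(flags):
    """Flags needed by the instructions seen in the dump but not reported."""
    # Assumption: aes (aesenc), ssse3 (pshufb), sse4_1 (pinsrq, pextrb),
    # avx (vzeroupper). This is not IPP's own feature check.
    required = {"aes", "ssse3", "sse4_1", "avx"}
    return sorted(required - flags)


if __name__ == "__main__":
    # Linux only: read the live CPU description.
    with open("/proc/cpuinfo") as handle:
        print(missing_for_dump(cpu_flags(handle.read())))
```

An empty list would suggest the flags are all advertised, which would point away from a missing AES-NI feature; a list containing "aes" would be worth reporting here.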
Issue 2 was unrelated: there was an old bug (ours) in an old build. It seemed related because it was also in the 'e9' code.
Issue 1 appears to be a real issue with the 'e9' code on this specific processor. Since we appear to work fine with the 'l9' code the 'e9' issue is no longer a high priority for us. We used the CPU limit to avoid similar issues running IPP 7.1.1 on newer processors (ones which supported AVX2). It was initially left in even after updating to IPP 8.2.1 (default value but can be overridden). Removing the restriction works on this system, but there could be systems we run into which might have similar problems. We never know what our customers will use.
Hi Bob,
Let's sort out this very strange issue. As you said, your customers can still run into it on some SNB CPUs if e9 code is dispatched. I've analyzed your last dump: the encoding is absolutely correct, even at the "illegal instruction" address:
66 41 0F 38 DC 02    aesenc xmm0, [r10]
It's very strange that "aesenc" raises an "illegal instruction" exception after it has been executed successfully several times before. Could you share your executable or some reproducer so that we can sort out this problem?
regards, Igor
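To make the encoding check above reproducible from the byte dump, here is a minimal sketch that decodes only the instruction forms seen in this function: a 66 prefix, a REX byte, the 0F 38 escape, opcode DC (aesenc) or DD (aesenclast), and a ModRM byte with a plain base register and optional displacement. It is not a general x86 decoder, and the function name and return layout are my own.

```python
import struct


def decode_aesenc(code):
    """Decode one '66 REX 0F 38 DC/DD modrm [disp]' instruction.

    Returns (mnemonic, xmm_register, base_register, displacement, length).
    """
    b = bytes(code)
    if len(b) < 6 or b[0] != 0x66 or (b[1] & 0xF0) != 0x40 or b[2:4] != b"\x0f\x38":
        raise ValueError("not a 66 REX 0F 38 xx instruction")
    rex, opcode, modrm = b[1], b[4], b[5]
    names = {0xDC: "aesenc", 0xDD: "aesenclast"}
    if opcode not in names:
        raise ValueError("opcode is not aesenc/aesenclast")
    mod = modrm >> 6
    low_rm = modrm & 7
    # Register-direct, SIB and RIP-relative forms are outside this sketch.
    if mod == 3 or low_rm == 4 or (mod == 0 and low_rm == 5):
        raise ValueError("only [base] and [base+disp] forms are handled")
    reg = ((modrm >> 3) & 7) | ((rex & 0x4) << 1)   # REX.R extends reg
    base = low_rm | ((rex & 0x1) << 3)              # REX.B extends rm
    if mod == 0:
        disp, length = 0, 6
    elif mod == 1:
        disp, length = struct.unpack_from("<b", b, 6)[0], 7
    else:
        disp, length = struct.unpack_from("<i", b, 6)[0], 10
    return names[opcode], reg, base, disp, length


if __name__ == "__main__":
    # Bytes at 0x7fe0be224e7a from the dump: aesenc (%r10),%xmm0
    print(decode_aesenc([0x66, 0x41, 0x0F, 0x38, 0xDC, 0x02]))
```

The six bytes at the faulting address decode to aesenc with xmm0 as destination and register 10 (r10) as base with no displacement, which agrees with what GDB shows.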
The executable is a rather large, complex product. It's our media server engine. Although I could possibly come up with a scaled down test executable it would take some time. Also, I do not have access to the system on which the failure is observed.
Hi Bob,
You haven't answered Igor's question about running the IPP function in question in the test environment:
> ps_ippcp -B -r -TAVX -fippsRijndael128EncryptCTR
This simple exercise could give us hints for further investigation: whether the problem is in the function itself or in the specific environment in which your application calls it. Of course, it should be run on the problematic system.