Getting stuck in e9_ownSearchOptimalPulsePos_M122_GSMAMR_16s_optSSE

Bob_Kirnum · ‎05-02-2016

One of our customers is reporting an issue which we have isolated to the Intel IPP GSMAMR processing. After forcing a core dump we determined that a thread randomly gets stuck in e9_ownSearchOptimalPulsePos_M122_GSMAMR_16s_optSSE. We had been using IPP 8.2.1 on Linux and, because of issues we had previously observed on Windows, updated to IPP 8.2.3, but the problem persists. In addition to the IPP update, we changed the sample code to use the ippsAlgebraicCodebookSearchEX function, as was recommended for the Windows issue. We would greatly appreciate any suggestions to resolve or work around this issue.

Thanks - Bob / Dialogic

Backtrace from the forced core dump, taken while the thread was hung:

Thread 62 (Thread 0x7f58eb9fc700 (LWP 26864)):
#0 0x00007f598a730fe8 in e9_ownSearchOptimalPulsePos_M122_GSMAMR_16s_optSSE () from /usr/dialogic/data/ssp.mlm
#1 0x00007f598a54232f in e9_ownAlgebraicCodebookSearch_M122_GSMAMR_16s () from /usr/dialogic/data/ssp.mlm
#2 0x00007f598a541f0a in e9_ownsAlgebraicCodebookSearch_GSMAMR_16s () from /usr/dialogic/data/ssp.mlm
#3 0x00007f598a516ad0 in e9_ippsAlgebraicCodebookSearchEX_GSMAMR_16s () from /usr/dialogic/data/ssp.mlm
#4 0x00007f598a4ec7f5 in ownEncode_GSMAMR (encSt=0x7f5971e9dc18, rate=<value optimized out>, pAnaParam=0x7f58eb9fb5ce,
pVad=<value optimized out>, pSynthVec=0x7f58eb9fb470)
at /cm/vobs/3rdparty/components/intel/ipp-samples.7.1.1.013/sources/speech-codecs/codec/speech/gsmamr/src/encgsmamr.c:589
#5 0x00007f598a4ecefd in apiGSMAMREncode (encoderObj=0x7f5971e9dc00, src=<value optimized out>, rate=GSMAMR_RATE_12200,
dst=0x7f589188ef10 "", pVad=0x7f58eb9fb7d4)
at /cm/vobs/3rdparty/components/intel/ipp-samples.7.1.1.013/sources/speech-codecs/codec/speech/gsmamr/src/encgsmamr.c:313
#6 0x00007f598a068063 in GSMAMR_Encode (handle=0x7f58eb9fa8c0, src=0x2, rate=GSMAMR_RATE_DTX, dst=
0xffff7e2f <Address 0xffff7e2f out of bounds>, pVad=0x7) at x86/gsmamrapi.c:154
#7 0x00007f598a2ae413 in GSMAMREncode (pCodec=0x7f589188ee88, pSrcData=0x2, ppCodedData=0x7f58eb9fbdb0,
numSamples=<value optimized out>, idtmfFlag=<value optimized out>, silenceFlag=1207968416) at codec.c:1740

Environment details from the IPP debug output in our code:

DisplayIPPCPUFeatures: 0x4a : 0x60
ippCore 8.2.3 (r48108)
ippIP AVX2 (l9) 8.2.3 (r48108)
ippSP AVX2 (l9) 8.2.3 (r48108)
ippVC AVX2 (l9) 8.2.3 (r48108)
Processor supports Advanced Vector Extensions 2 instruction set
4 cores on die
ippGetMaxCacheSizeB 8192 k
Available 0xefff Enabled 0xefff
MMX A E
SSE A E
SSE2 A E
SSE3 A E
SSSE3 A E
MOVBE A E
SSE41 A E
SSE42 A E
AVX A E
AVX(OS) A E
AES A E
CLMUL A E
ABR X X
RDRRAND A E
F16C A E
AVX2 A E
ADCOX X X
RDSEED X X
PREFETCHW X X
SHA X X
KNC X X

Ying_H_Intel · ‎05-03-2016

Hi Bob,

Thank you for reporting the issue. I saw your issue on premier.intel.com. We will investigate the two reports together and get back to you later.

Please note that all of the speech codec functions are deprecated, so the related development and support work has been discontinued.

Regarding the e9_ownSearchOptimalPulsePos_M122_GSMAMR_16s_optSSE issue, I got an idea from another forum thread, 628141.

IPP dispatches optimized code according to the CPU type.

For example, see the table in https://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-understanding-cpu-optimized-code-used-in-intel-ipp

and the related article: https://software.intel.com/en-us/articles/ipp-dispatcher-control-functions-ippinit-functions

The e9 prefix is for AVX (Sandy Bridge µarchitecture):

Platform	Architecture	SIMD Requirements	Processor / µarchitecture	Notes
IA-32	px	C optimized for all IA-32 processors	i386+
	w7	SSE2	P4, Xeon, Centrino
	v8	Supplemental SSE3	Core 2, Xeon® 5100, Atom
	p8	SSE4.1, SSE4.2, AES-NI	Penryn, Nehalem, Westmere	see notes below
	g9	AVX	Sandy Bridge µarchitecture	new since IPP v.6.1
	h9	AVX2	Haswell µarchitecture
Intel® 64 (EM64T)	mx	C-optimized for all Intel® 64 platforms	P4	SSE2 minimum
	m7	SSE3	Prescott
	u8	Supplemental SSE3	Core 2, Xeon® 5100, Atom
	y8	SSE4.1, SSE4.2, AES-NI	Penryn, Nehalem, Westmere	see notes below
	e9	AVX	Sandy Bridge µarchitecture	new in 6.1
	l9	AVX2	Haswell µarchitecture
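
As an illustration of how the table can be applied to a backtrace, here is a minimal sketch that maps the two-character prefix of an IPP internal symbol to the SIMD level listed above. The struct, the table contents and the helper name ipp_symbol_simd are hypothetical and are copied from the table in this post; they are not part of the IPP API.

```c
#include <stddef.h>
#include <string.h>

/* One row of the dispatch table quoted in the post (not an IPP type). */
struct ipp_dispatch_entry {
    const char *prefix; /* two-character symbol prefix, e.g. "e9" */
    const char *simd;   /* SIMD requirement listed for that prefix */
};

/* Contents transcribed from the table above: IA-32 rows, then Intel 64 rows. */
static const struct ipp_dispatch_entry ipp_dispatch_table[] = {
    {"px", "C"},    {"w7", "SSE2"},  {"v8", "SSSE3"},
    {"p8", "SSE4.1/SSE4.2/AES-NI"},  {"g9", "AVX"},  {"h9", "AVX2"},
    {"mx", "C"},    {"m7", "SSE3"},  {"u8", "SSSE3"},
    {"y8", "SSE4.1/SSE4.2/AES-NI"},  {"e9", "AVX"},  {"l9", "AVX2"},
};

/*
 * Hypothetical helper: returns the SIMD level for a symbol such as
 * "e9_ippsFoo", or NULL when the symbol has no known "xx_" prefix.
 */
const char *ipp_symbol_simd(const char *symbol)
{
    size_t i;
    size_t n = sizeof(ipp_dispatch_table) / sizeof(ipp_dispatch_table[0]);

    /* A dispatched symbol is assumed to look like "<2 chars>_<name>". */
    if (symbol == NULL || strlen(symbol) < 3 || symbol[2] != '_') {
        return NULL;
    }
    for (i = 0; i < n; i++) {
        if (strncmp(symbol, ipp_dispatch_table[i].prefix, 2) == 0) {
            return ipp_dispatch_table[i].simd;
        }
    }
    return NULL;
}
```

Passing frame #0 of the backtrace, e9_ownSearchOptimalPulsePos_M122_GSMAMR_16s_optSSE, to this helper gives the AVX row, while a non-dispatched frame such as ownEncode_GSMAMR gives NULL.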

From your output, the dispatched code should be the 64-bit l9 variant.

Could you please try calling ippInitCpu() with a CPU-type argument for y8 or a lower CPU type, for example:

ippCpuSSE = 0x40, /* Processor supports Pentium(R) III processor instruction set */
ippCpuSSE2, /* Processor supports Streaming SIMD Extensions 2 instruction set */
ippCpuSSE3, /* Processor supports Streaming SIMD Extensions 3 instruction set */
ippCpuSSSE3, /* Processor supports Supplemental Streaming SIMD Extensions 3 instruction set */

and see whether that works around the issue?

Please also print the library information at run time with the following function:

/* Query the dispatched signal-processing library and print its version. */
const IppLibraryVersion *lib = ippsGetLibVersion();
printf("%s %s %d.%d.%d.%d\n",
       lib->Name, lib->Version,
       lib->major, lib->minor, lib->majorBuild, lib->build);

Best Regards,

Ying

Bob_Kirnum · ‎05-09-2016

In parallel with posting here, we have been trying a number of things to replicate the issue our customer is reporting. We already had a means of changing the CPU type value using an environment variable. When we limit the CPU type to 0x45 (ippCpuSSE42), we see the following.

May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: APInit.c.164:DisplayIPPCPUFeatures: 0x46 : 0x60
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: 532: APInit.c.179:DisplayIPPCPUFeatures: dsp_framework, ipp_cpu_limit: Limiting from 0x46 to 0x45
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: ippCore 8.2.3 (r48108)
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: ippIP SSE4.1/4.2 (y8)+ 8.2.3 (r48108)
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: ippSP SSE4.1/4.2 (y8)+ 8.2.3 (r48108)
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: ippVC SSE4.1/4.2 (y8)+ 8.2.3 (r48108)
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: Processor supports Advanced Vector Extensions instruction set
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: 16 cores on die
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: ippGetMaxCacheSizeB 4096 k
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: Available 0xdf Enabled 0xdf
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: MMX A E
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: SSE A E
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: SSE2 A E
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: SSE3 A E
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: SSSE3 A E
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: MOVBE X X
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: SSE41 A E
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: SSE42 A E
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: AVX X X
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: AVX(OS) X X
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: AES X X
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: CLMUL X X
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: ABR X X
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: RDRRAND X X
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: F16C X X
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: AVX2 X X
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: ADCOX X X
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: RDSEED X X
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: PREFETCHW X X
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: SHA X X
May 5 10:05:24 bl-108-vm01 ssp_x86Linux_boot: KNC X X

Unfortunately this results in a segmentation fault rather quickly in our testing. The back trace appears corrupted.

#0 0x00007f6075f8e570 in y8_ipps_cRadix4FwdNorm_32fc () from /usr/dialogic/data/ssp.mlm
#1 0x0000000000000000 in ?? ()

Based on the error above, I assumed the selected CPU type is not quite valid. Since both the compiler and the Intel documentation flag ippInitCpu as deprecated, we changed our code to use ippSetCpuFeatures instead, passing a mask value to override the 'available features mask'.

As far as I can tell, the recommended CPU type value (ippCpuSSE, 0x40) should correspond to a feature mask of 0x1f, which in turn selects the u8 instruction set. Am I missing something? What feature mask value(s) should we try?
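
For reference, here is a minimal sketch of how I am reading these masks. It assumes that bit N of the mask corresponds to the Nth feature in the order our debug output lists them, which agrees with both logs in this thread (0xefff leaves only ABR unset, 0xdf leaves MOVBE unset and stops at SSE42). Only the first 16 features are covered because the logs cannot confirm the bit positions of the remaining ones. The names feature_names and mask_has_feature are hypothetical and are not IPP functions.

```c
#include <stddef.h>
#include <string.h>

/* Feature labels in the order our debug output prints them (bit 0 first). */
static const char *const feature_names[] = {
    "MMX",  "SSE",     "SSE2", "SSE3",  "SSSE3", "MOVBE",   "SSE41", "SSE42",
    "AVX",  "AVX(OS)", "AES",  "CLMUL", "ABR",   "RDRRAND", "F16C",  "AVX2",
};

/*
 * Hypothetical helper: returns 1 if the named feature's bit is set in mask,
 * 0 if it is clear, and -1 if the name is not one of the 16 labels above.
 */
int mask_has_feature(unsigned long mask, const char *name)
{
    size_t i;
    size_t n = sizeof(feature_names) / sizeof(feature_names[0]);

    for (i = 0; i < n; i++) {
        if (strcmp(feature_names[i], name) == 0) {
            return (int)((mask >> i) & 1UL);
        }
    }
    return -1;
}
```

Under this reading, 0x1f has MMX through SSSE3 set and SSE41 clear, which is why I would expect it to select u8 rather than y8.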