
Error message when running Intel® Optimized LINPACK Benchmark for Linux* OS on Intel Xeon Phi cards.

Tinway_Chen

Hi,

I am trying the Intel® Optimized LINPACK Benchmark for Linux* OS on a configuration with multiple Intel Xeon Phi cards.

 (http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_userguide_lnx/GUID-D15B5C2F-07AC-4449-B148-6AF1DFDE674D.htm).

 

My test environment :

  1. AIC Sandy Bridge EP-4S server system with 4 Sandy Bridge EP-4S processors and 98 GB of memory
  2. Intel Xeon Phi : 3 pcs of 3110 and 4 pcs of 3115
  3. OS: Redhat Enterprise Linux 6.2 x64
  4. Xeon Phi MPSS: KNC_gold_update_2-2.1.5889-16-rhel-6.2.tar
  5. Intel Composer XE : l_ccompxe_2013.3.163.tgz
  6. Intel MPI : l_mpi_p_4.1.0.024.tgz or l_mpi_p_4.1.0.030.tgz

After running the runme_xeon64_ao script, which enables acceleration by offloading computations to the Intel Xeon Phi coprocessors available on the system, I found that when I increase the HPL problem size (Ns) beyond a certain value, the Linpack process (xlinpack_xeon64) runs endlessly and never finishes, and related error messages appear in the host system log. For example, with 7 Phi cards, the problem occurs when I set the HPL problem size (Ns) to 46000. The threshold appears to depend on the number of Phi cards: with 1 Phi card, I can increase the HPL problem size (Ns) to 100000 without any problem.
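For reference, here is a rough sketch of the host memory needed for the HPL matrix at the two problem sizes mentioned above. It assumes the usual dense Ns x Ns double-precision matrix (8 bytes per element) and ignores workspace and offload buffers. The function name is only illustrative.

```python
def hpl_matrix_bytes(n):
    """Bytes for a dense n x n matrix of 8-byte doubles (workspace ignored)."""
    return 8 * n * n


# Problem sizes taken from the description above.
for n in (46000, 100000):
    gib = hpl_matrix_bytes(n) / 2**30
    print(f"Ns={n}: {gib:.1f} GiB")
```

Under that assumption both sizes fit within the 98 GB of host memory, so the matrix size alone would not obviously explain why the 7-card run hangs at the smaller Ns.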

 

The error messages are below:

 

__scif_fence_wait 3041 err -16

dma_mark_wait 1080 TO chan 0x0

drain_dma_intr 1151 err -16

micscif_rma_destroy_temp_windows 2082 DMA channel 0 hung ep->state 2 window->dma_mark 0x1c0 channel_mark 0x1c2

------------[ cut here ]------------

WARNING: at /home/build/sandbox/mpss/MPSS_4982/k1om/rhel-6.2/mpss/.rpmbuild_4982/BUILD/intel-mic-kmod-2.1.4982/micscif_rma.c:2084 micscif_rma_destroy_temp_windows+0x314/0x540 [mic]() (Not tainted)

Hardware name: SB301-TO

Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 mic(U) microcode sg ixgbe dca mdio sb_edac edac_core iTCO_wdt iTCO_vendor_support shpchp e1000e i2c_i801 i2c_core ext4 mbcache jbd2 sr_mod cdrom usb_storage sd_mod crc_t10dif ahci isci libsas scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 2812, comm: SCIF_MISC Not tainted 2.6.32-220.el6.x86_64 #1

Call Trace:

 [<ffffffff81069b77>] ? warn_slowpath_common+0x87/0xc0

 [<ffffffff81069bca>] ? warn_slowpath_null+0x1a/0x20

 [<ffffffffa0235664>] ? micscif_rma_destroy_temp_windows+0x314/0x540 [mic]

 [<ffffffffa02321b5>] ? micscif_rma_handle_remote_fences+0x155/0x380 [mic]

 [<ffffffff814eca40>] ? thread_return+0x4e/0x77e

 [<ffffffff8100bc0e>] ? apic_timer_interrupt+0xe/0x20

 [<ffffffffa022a0f0>] ? micscif_misc_handler+0x0/0xc0 [mic]

 [<ffffffffa022a10a>] ? micscif_misc_handler+0x1a/0xc0 [mic]

 [<ffffffffa022a0f0>] ? micscif_misc_handler+0x0/0xc0 [mic]

 [<ffffffff8108b2b0>] ? worker_thread+0x170/0x2a0

 [<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40

 [<ffffffff8108b140>] ? worker_thread+0x0/0x2a0

 [<ffffffff81090886>] ? kthread+0x96/0xa0

 [<ffffffff8100c14a>] ? child_rip+0xa/0x20

 [<ffffffff810907f0>] ? kthread+0x0/0xa0

 [<ffffffff8100c140>] ? child_rip+0x0/0x20

---[ end trace e0d2c31584645743 ]---
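In case it helps anyone reproduce this, below is a minimal sketch for pulling the "DMA channel hung" lines out of a saved host log. The regular expression is written only against the single message format shown above, and the function name is hypothetical. Other driver versions may print the message differently.

```python
import re

# Matches the "DMA channel N hung" message as it appears in the log above.
HUNG_RE = re.compile(
    r"DMA channel (?P<channel>\d+) hung ep->state (?P<state>\d+) "
    r"window->dma_mark (?P<dma_mark>0x[0-9a-fA-F]+) "
    r"channel_mark (?P<channel_mark>0x[0-9a-fA-F]+)"
)


def find_hung_channels(log_text):
    """Return one dict per 'DMA channel hung' line found in log_text."""
    hits = []
    for line in log_text.splitlines():
        m = HUNG_RE.search(line)
        if m:
            hits.append({
                "channel": int(m.group("channel")),
                "ep_state": int(m.group("state")),
                "dma_mark": int(m.group("dma_mark"), 16),
                "channel_mark": int(m.group("channel_mark"), 16),
            })
    return hits


# Sample line copied from the log above.
sample = (
    "micscif_rma_destroy_temp_windows 2082 DMA channel 0 hung "
    "ep->state 2 window->dma_mark 0x1c0 channel_mark 0x1c2"
)
for hit in find_hung_channels(sample):
    print(hit)
```

To use it on a real system, the text passed in could be the saved output of dmesg or the contents of the system log file. Counting hits per run would show whether the hang always involves the same DMA channel.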
