For some reason the SCIF interface on my compute nodes is refusing connections. Any ideas on what's wrong, or where to start investigating?
The node has a Mellanox ConnectX-3 HCA and the latest MPSS (Gold Update 2), with everything else set up "by the book". All the IB services and modules load cleanly and appear to work, and I can ssh into the MIC and run code natively.
However, any offload application (LEO or OpenCL) hangs. Running it under strace shows the same ioctl failing repeatedly with ECONNREFUSED:
[plain]
mmap(NULL, 10489856, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f737396e000
mprotect(0x7f737396e000, 4096, PROT_NONE) = 0
clone(child_stack=0x7f737436dfd0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f737436e9d0, tls=0x7f737436e700, child_tidptr=0x7f737436e9d0) = 26801
open("/dev/mic/scif", O_RDWR) = 5
fcntl(5, F_SETFD, FD_CLOEXEC) = 0
ioctl(5, 0xc0087303, 0x7fffa02d2710) = 0
futex(0x7f737436e9d0, FUTEX_WAIT, 26801, NULL) = 0
close(4) = 0
ioctl(3, 0xc0087303, 0x7fffa02d27d0) = -1 ECONNREFUSED (Connection refused)
nanosleep({0, 10000000}, NULL) = 0
ioctl(3, 0xc0087303, 0x7fffa02d27d0) = -1 ECONNREFUSED (Connection refused)
nanosleep({0, 20000000}, NULL) = 0
ioctl(3, 0xc0087303, 0x7fffa02d27d0) = -1 ECONNREFUSED (Connection refused)
nanosleep({0, 40000000}, NULL) = 0
[/plain]
Pinpointed the problem: we use a slightly customized system for user management on the MICs, and because of that the 'micuser' user was missing during mpssd and ofed-mic initialization. I have now added the user and offloading seems to work again. Suggestion: it would be nice to have a sanity check for this.
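For anyone who wants such a check in the meantime, here is a minimal sketch in Python. It only looks for a 'micuser' entry in passwd-format text. The helper names are made up for illustration, and which passwd file actually matters (the host's, or the one that ends up on the card) is an assumption you would need to confirm for your own setup.

```python
def user_in_passwd(passwd_text, username):
    """Return True if passwd-format text contains an entry for username."""
    for line in passwd_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # The login name is the first colon-separated field.
        if line.split(":", 1)[0] == username:
            return True
    return False


def check_passwd_file(path, username="micuser"):
    """Hypothetical helper: run the check against a passwd file on disk."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return user_in_passwd(fh.read(), username)
```

Calling `check_passwd_file("/etc/passwd")` before the services start, and refusing to continue when it returns False, would have flagged the problem here, assuming that is the file your user management rewrites.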
Olli-Pekka
