Hi All:
I ran the following command on Windows 7 x64:
mpiexec.exe -localonly -n 2 MyMPIApp.exe
and got the following errors:
result command received but the wait_list is empty.
unable to handle the command: "cmd=result src=1 dest=0 tag=0 cmd_tag=0 cmd_orig=start_dbs kvs_name=30BA3E56-CB7A-40dd-9D07-8F5342D03976 domain_name=CB42CD80-05B0-4e8b-9F76-CCD5882F9593 result=SUCCESS "
error closing the unknown context socket: Error = -1
sock_op_close returned while unknown context is in state: SMPD_IDLE
Is there any way to do further debugging? Thank you!
regards,
Seifer
6 Replies
Hi Seifer,
There are several debugging options you can try. Setting the environment variable I_MPI_DEBUG at runtime will generate debugging information; 5 is generally a good starting value.
[plain]mpiexec -n 2 -env I_MPI_DEBUG 5 test.exe[/plain]
You can also compile with -check_mpi to link against the correctness checking libraries. To get logs from smpd, use
[plain]smpd -traceon (as administrator)
mpiexec -n 2 test.exe
smpd -traceoff (as administrator)[/plain]
I would recommend trying this, as the error you are getting appears to come from smpd. What version of the Intel MPI Library are you using? Have you been able to run one of the provided sample programs located in the test folder in the installation path?
Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools
Hi James:
The following is the log from smpd.
[01:183868]./SMPDU_Sock_post_connect
[01:183868].\smpd_enter_at_state
[01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait
[01:183868]../SMPDU_Sock_wait
[01:183868]..SOCK_OP_CONNECT
[01:183868]..\smpd_handle_op_connect
[01:183868]../smpd_handle_op_connect
[01:183868]./smpd_enter_at_state
[01:183868].\smpd_create_command
[01:183868]..\smpd_init_command
[01:183868]../smpd_init_command
[01:183868]./smpd_create_command
[01:183868].\smpd_add_command_int_arg
[01:183868]./smpd_add_command_int_arg
[01:183868].\smpd_add_command_arg
[01:183868]./smpd_add_command_arg
[01:183868].\smpd_add_command_arg
[01:183868]./smpd_add_command_arg
[01:183868].\smpd_add_command_arg
[01:183868]./smpd_add_command_arg
[01:183868].\smpd_add_command_int_arg
[01:183868]./smpd_add_command_int_arg
[01:183868].\smpd_post_write_command
[01:183868]..\smpd_package_command
[01:183868]../smpd_package_command
[01:183868]..\SMPDU_Sock_get_sock_id
[01:183868]../SMPDU_Sock_get_sock_id
[01:183868]..smpd_post_write_command on the pmi context sock 880: 118 bytes for command: "cmd=init src=1 dest=0 tag=0 ctx_key=0 name=57D3EA7D-F5DC-40ef-BEB6-A661AA3784B1 key=7 value=8 node_id=1 "
[01:183868]..\SMPDU_Sock_post_writev
[01:183868]../SMPDU_Sock_post_writev
[01:183868]./smpd_post_write_command
[01:183868].\smpd_post_read_command
[01:183868]..\SMPDU_Sock_get_sock_id
[01:183868]../SMPDU_Sock_get_sock_id
[01:183868]..posting a read for a command header on the pmi context, sock 880
[01:183868]..\SMPDU_Sock_post_read
[01:183868]...\SMPDU_Sock_post_readv
[01:183868].../SMPDU_Sock_post_readv
[01:183868]../SMPDU_Sock_post_read
[01:183868]./smpd_post_read_command
[01:183868].\smpd_enter_at_state
[01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait
[01:183868]../SMPDU_Sock_wait
[01:183868]..SOCK_OP_READ event.error = 0, result = 0, context->type=15
[01:183868]..\smpd_handle_op_read
[01:183868]...\smpd_state_reading_cmd_header
[01:183868]....read command header
[01:183868]....command header read, posting read for data: 66 bytes
[01:183868]....\SMPDU_Sock_post_read
[01:183868].....\SMPDU_Sock_post_readv
[01:183868]...../SMPDU_Sock_post_readv
[01:183868]..../SMPDU_Sock_post_read
[01:183868].../smpd_state_reading_cmd_header
[01:183868]../smpd_handle_op_read
[01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait
[01:183868]../SMPDU_Sock_wait
[01:183868]..SOCK_OP_READ event.error = 0, result = 0, context->type=15
[01:183868]..\smpd_handle_op_read
[01:183868]...\smpd_state_reading_cmd
[01:183868]....read command
[01:183868]....\smpd_parse_command
[01:183868]..../smpd_parse_command
[01:183868]....read command: "cmd=result src=0 dest=1 tag=9 cmd_tag=0 ctx_key=0 result=SUCCESS "
[01:183868]....\smpd_handle_command
[01:183868].....handling command:
[01:183868]..... src = 0
[01:183868]..... dest = 1
[01:183868]..... cmd = result
[01:183868]..... tag = 9
[01:183868]..... ctx = pmi
[01:183868]..... len = 66
[01:183868]..... str = cmd=result src=0 dest=1 tag=9 cmd_tag=0 ctx_key=0 result=SUCCESS
[01:183868].....\smpd_command_destination
[01:183868]......1 -> 1 : returning NULL context
[01:183868]...../smpd_command_destination
[01:183868].....\smpd_handle_result
[01:183868]......ERROR:result command received but the wait_list is empty.
[01:183868]...../smpd_handle_result
[01:183868]..../smpd_handle_command
[01:183868]....ERROR:unable to handle the command: "cmd=result src=0 dest=1 tag=9 cmd_tag=0 ctx_key=0 result=SUCCESS "
[01:183868].../smpd_state_reading_cmd
[01:183868]../smpd_handle_op_read
[01:183868]..SOCK_OP_READ failed - result = -1, closing pmi context.
[01:183868]..\SMPDU_Sock_post_close
[01:183868]...\SMPDU_Sock_post_read
[01:183868]....\SMPDU_Sock_post_readv
[01:183868]..../SMPDU_Sock_post_readv
[01:183868].../SMPDU_Sock_post_read
[01:183868]../SMPDU_Sock_post_close
[01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait
[01:183868]../SMPDU_Sock_wait
[01:183868]..SOCK_OP_CLOSE
[01:183868]..\smpd_handle_op_close
[01:183868]...\smpd_get_state_string
[01:183868].../smpd_get_state_string
[01:183868]...op_close received - SMPD_CLOSING state.
[01:183868]...Unaffiliated pmi context closing.
[01:183868]...\smpd_free_context
[01:183868]....freeing pmi context.
[01:183868]....\smpd_init_context
[01:183868].....\smpd_init_command
[01:183868]...../smpd_init_command
[01:183868]..../smpd_init_context
[01:183868].../smpd_free_context
[01:183868]../smpd_handle_op_close
[01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait
[01:183868]../SMPDU_Sock_wait
[01:183868]..SOCK_OP_CLOSE
[01:183868]..\smpd_handle_op_close
[01:183868]...\smpd_get_state_string
[01:183868].../smpd_get_state_string
[01:183868]...op_close received - SMPD_IDLE state.
[01:183868]...\smpd_get_state_string
[01:183868].../smpd_get_state_string
[01:183868]...ERROR:sock_op_close returned while unknown context is in state: SMPD_IDLE
[01:183868]...\smpd_free_context
[01:183868]....freeing a context not in the global list - this should be impossible.
[01:183868].../smpd_free_context
[01:183868]../smpd_handle_op_close
[01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait
[01:183868]./SMPDU_Sock_post_connect
[01:183868].\smpd_enter_at_state
[01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait
[01:183868]../SMPDU_Sock_wait
[01:183868]..SOCK_OP_CONNECT
[01:183868]..\smpd_handle_op_connect
[01:183868]../smpd_handle_op_connect
[01:183868]./smpd_enter_at_state
[01:183868].\smpd_create_command
[01:183868]..\smpd_init_command
[01:183868]../smpd_init_command
[01:183868]./smpd_create_command
[01:183868].\smpd_add_command_int_arg
[01:183868]./smpd_add_command_int_arg
[01:183868].\smpd_add_command_arg
[01:183868]./smpd_add_command_arg
[01:183868].\smpd_add_command_arg
[01:183868]./smpd_add_command_arg
[01:183868].\smpd_add_command_arg
[01:183868]./smpd_add_command_arg
[01:183868].\smpd_add_command_int_arg
[01:183868]./smpd_add_command_int_arg
[01:183868].\smpd_post_write_command
[01:183868]..\smpd_package_command
[01:183868]../smpd_package_command
[01:183868]..\SMPDU_Sock_get_sock_id
[01:183868]../SMPDU_Sock_get_sock_id
[01:183868]..smpd_post_write_command on the pmi context sock 880: 118 bytes for command: "cmd=init src=1 dest=0 tag=0 ctx_key=0 name=57D3EA7D-F5DC-40ef-BEB6-A661AA3784B1 key=7 value=8 node_id=1 "
[01:183868]..\SMPDU_Sock_post_writev
[01:183868]../SMPDU_Sock_post_writev
[01:183868]./smpd_post_write_command
[01:183868].\smpd_post_read_command
[01:183868]..\SMPDU_Sock_get_sock_id
[01:183868]../SMPDU_Sock_get_sock_id
[01:183868]..posting a read for a command header on the pmi context, sock 880
[01:183868]..\SMPDU_Sock_post_read
[01:183868]...\SMPDU_Sock_post_readv
[01:183868].../SMPDU_Sock_post_readv
[01:183868]../SMPDU_Sock_post_read
[01:183868]./smpd_post_read_command
[01:183868].\smpd_enter_at_state
[01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait
[01:183868]../SMPDU_Sock_wait
[01:183868]..SOCK_OP_READ event.error = 0, result = 0, context->type=15
[01:183868]..\smpd_handle_op_read
[01:183868]...\smpd_state_reading_cmd_header
[01:183868]....read command header
[01:183868]....command header read, posting read for data: 66 bytes
[01:183868]....\SMPDU_Sock_post_read
[01:183868].....\SMPDU_Sock_post_readv
[01:183868]...../SMPDU_Sock_post_readv
[01:183868]..../SMPDU_Sock_post_read
[01:183868].../smpd_state_reading_cmd_header
[01:183868]../smpd_handle_op_read
[01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait
[01:183868]../SMPDU_Sock_wait
[01:183868]..SOCK_OP_READ event.error = 0, result = 0, context->type=15
[01:183868]..\smpd_handle_op_read
[01:183868]...\smpd_state_reading_cmd
[01:183868]....read command
[01:183868]....\smpd_parse_command
[01:183868]..../smpd_parse_command
[01:183868]....read command: "cmd=result src=0 dest=1 tag=9 cmd_tag=0 ctx_key=0 result=SUCCESS "
[01:183868]....\smpd_handle_command
[01:183868].....handling command:
[01:183868]..... src = 0
[01:183868]..... dest = 1
[01:183868]..... cmd = result
[01:183868]..... tag = 9
[01:183868]..... ctx = pmi
[01:183868]..... len = 66
[01:183868]..... str = cmd=result src=0 dest=1 tag=9 cmd_tag=0 ctx_key=0 result=SUCCESS
[01:183868].....\smpd_command_destination
[01:183868]......1 -> 1 : returning NULL context
[01:183868]...../smpd_command_destination
[01:183868].....\smpd_handle_result
[01:183868]......ERROR:result command received but the wait_list is empty.
[01:183868]...../smpd_handle_result
[01:183868]..../smpd_handle_command
[01:183868]....ERROR:unable to handle the command: "cmd=result src=0 dest=1 tag=9 cmd_tag=0 ctx_key=0 result=SUCCESS "
[01:183868].../smpd_state_reading_cmd
[01:183868]../smpd_handle_op_read
[01:183868]..SOCK_OP_READ failed - result = -1, closing pmi context.
[01:183868]..\SMPDU_Sock_post_close
[01:183868]...\SMPDU_Sock_post_read
[01:183868]....\SMPDU_Sock_post_readv
[01:183868]..../SMPDU_Sock_post_readv
[01:183868].../SMPDU_Sock_post_read
[01:183868]../SMPDU_Sock_post_close
[01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait
[01:183868]../SMPDU_Sock_wait
[01:183868]..SOCK_OP_CLOSE
[01:183868]..\smpd_handle_op_close
[01:183868]...\smpd_get_state_string
[01:183868].../smpd_get_state_string
[01:183868]...op_close received - SMPD_CLOSING state.
[01:183868]...Unaffiliated pmi context closing.
[01:183868]...\smpd_free_context
[01:183868]....freeing pmi context.
[01:183868]....\smpd_init_context
[01:183868].....\smpd_init_command
[01:183868]...../smpd_init_command
[01:183868]..../smpd_init_context
[01:183868].../smpd_free_context
[01:183868]../smpd_handle_op_close
[01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait
[01:183868]../SMPDU_Sock_wait
[01:183868]..SOCK_OP_CLOSE
[01:183868]..\smpd_handle_op_close
[01:183868]...\smpd_get_state_string
[01:183868].../smpd_get_state_string
[01:183868]...op_close received - SMPD_IDLE state.
[01:183868]...\smpd_get_state_string
[01:183868].../smpd_get_state_string
[01:183868]...ERROR:sock_op_close returned while unknown context is in state: SMPD_IDLE
[01:183868]...\smpd_free_context
[01:183868]....freeing a context not in the global list - this should be impossible.
[01:183868].../smpd_free_context
[01:183868]../smpd_handle_op_close
[01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait
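For what it's worth, the trace format is quite regular: the bracketed prefix looks like a process identifier, each leading dot marks one nesting level, `\` marks entry into a function, and `/` marks its exit; anything else is a log message at that depth. A short Python sketch (my own helper for reading these logs, not part of Intel MPI; the format is inferred from the log above, not from documentation) that parses such lines and pulls out just the ERROR messages:

```python
import re

# "[01:183868]..\smpd_handle_op_read"
#  ^ id        ^ depth dots  ^ entry (\), exit (/), or plain message
LINE = re.compile(r"^\[(?P<id>[^\]]+)\](?P<dots>\.*)(?P<rest>.*)$")

def parse_trace_line(line):
    """Split one smpd trace line into id, nesting depth, kind, and text."""
    m = LINE.match(line)
    if not m:
        return None
    rest = m.group("rest")
    if rest.startswith("\\"):
        kind, text = "enter", rest[1:]
    elif rest.startswith("/"):
        kind, text = "exit", rest[1:]
    else:
        kind, text = "msg", rest
    return {"id": m.group("id"), "depth": len(m.group("dots")),
            "kind": kind, "text": text}

def errors_in(lines):
    """Return only the ERROR messages, stripped of trace decoration."""
    out = []
    for line in lines:
        rec = parse_trace_line(line)
        if rec and rec["kind"] == "msg" and rec["text"].startswith("ERROR:"):
            out.append(rec["text"])
    return out
```

Running errors_in() over a full trace reduces it to just the ERROR lines, which is handy when comparing traces from repeated runs.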
regards,
Seifer
Hi James:
Yesterday we did another test with
-genv I_MPI_FABRICS shm
And we still got the same error:
[01:441348]......ERROR:result command received but the wait_list is empty.
[01:441348]....ERROR:unable to handle the command: "cmd=result src=0 dest=1 tag=9 cmd_tag=0 ctx_key=0 result=SUCCESS "
[01:441348]..unable to read the cmd header on the pmi context, Error = -1.
[01:441348]...ERROR:sock_op_close returned while unknown context is in state: SMPD_IDLE
regards,
Seifer
Hi Seifer,
Have you tried using
[plain]mpiexec -n 2 -env I_MPI_DEBUG 5 test.exe[/plain]
both with your program and with the test program provided with the Intel MPI Library? What output do you get from this?
Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools
Hi James:
The options for mpiexec.exe are
-genv I_MPI_DEBUG 5 -genv I_MPI_PLATFORM auto -genv I_MPI_FABRICS shm -genv I_MPI_WAIT_MODE 1
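For completeness, combined with the launch from the first post that would presumably look something like the following (cmd.exe line continuations added for readability; MyMPIApp.exe as before):

```shell
mpiexec.exe -genv I_MPI_DEBUG 5 -genv I_MPI_PLATFORM auto ^
    -genv I_MPI_FABRICS shm -genv I_MPI_WAIT_MODE 1 ^
    -localonly -n 2 MyMPIApp.exe
```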
Since the problems occurred on our customer's computer, I tested only our program.
However, the problem doesn't occur every time in batch jobs.
Is "the test program provided with the Intel MPI Library" you mentioned IMB-MPI1.exe?
regards,
Seifer
Hi Seifer,
I meant for you to compile one of the files in \test\ and use it. These are simple hello-world MPI programs that can test basic connectivity and MPI functionality.
Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools