Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

The error message of Intel MPI

Seifer_Lin
Beginner
950 Views

Hi All:

I ran the following command on Windows 7 x64:

mpiexec.exe -localonly -n 2 MyMPIApp.exe

and got the following errors:

result command received but the wait_list is empty.

unable to handle the command: "cmd=result src=1 dest=0 tag=0 cmd_tag=0 cmd_orig=

start_dbs kvs_name=30BA3E56-CB7A-40dd-9D07-8F5342D03976 domain_name=CB42CD80-05B

0-4e8b-9F76-CCD5882F9593 result=SUCCESS "

error closing the unknown context socket: Error = -1

sock_op_close returned while unknown context is in state: SMPD_IDLE

Is there any way to do further debugging? Thank you!


regards,

Seifer

6 Replies
James_T_Intel
Moderator
Hi Seifer,

There are several debugging options you can try. Setting the environment variable I_MPI_DEBUG at runtime will generate some debugging information; 5 is generally a good starting value.

[plain]mpiexec -n 2 -env I_MPI_DEBUG 5 test.exe[/plain]
You can also compile with -check_mpi in order to link to the correctness checking libraries. You can get logs from smpd by using

[plain]smpd -traceon (as administrator)
mpiexec -n 2 test.exe
smpd -traceoff (as administrator)[/plain]
I would recommend using this, as the error you are getting appears to come from smpd. What version of the Intel MPI Library are you using? Have you been able to run one of the provided sample programs located in the test folder in the installation path?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools
Seifer_Lin
Beginner
Hi James:
The following is the log from smpd:
[01:183868]./SMPDU_Sock_post_connect
[01:183868].\smpd_enter_at_state
[01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait
[01:183868]../SMPDU_Sock_wait
[01:183868]..SOCK_OP_CONNECT
[01:183868]..\smpd_handle_op_connect
[01:183868]../smpd_handle_op_connect
[01:183868]./smpd_enter_at_state
[01:183868].\smpd_create_command
[01:183868]..\smpd_init_command
[01:183868]../smpd_init_command
[01:183868]./smpd_create_command
[01:183868].\smpd_add_command_int_arg
[01:183868]./smpd_add_command_int_arg
[01:183868].\smpd_add_command_arg
[01:183868]./smpd_add_command_arg
[01:183868].\smpd_add_command_arg
[01:183868]./smpd_add_command_arg
[01:183868].\smpd_add_command_arg
[01:183868]./smpd_add_command_arg
[01:183868].\smpd_add_command_int_arg
[01:183868]./smpd_add_command_int_arg
[01:183868].\smpd_post_write_command
[01:183868]..\smpd_package_command
[01:183868]../smpd_package_command
[01:183868]..\SMPDU_Sock_get_sock_id
[01:183868]../SMPDU_Sock_get_sock_id
[01:183868]..smpd_post_write_command on the pmi context sock 880: 118 bytes for command: "cmd=init src=1 dest=0 tag=0 ctx_key=0 name=57D3EA7D-F5DC-40ef-BEB6-A661AA3784B1 key=7 value=8 node_id=1 "
[01:183868]..\SMPDU_Sock_post_writev
[01:183868]../SMPDU_Sock_post_writev
[01:183868]./smpd_post_write_command
[01:183868].\smpd_post_read_command
[01:183868]..\SMPDU_Sock_get_sock_id
[01:183868]../SMPDU_Sock_get_sock_id
[01:183868]..posting a read for a command header on the pmi context, sock 880
[01:183868]..\SMPDU_Sock_post_read
[01:183868]...\SMPDU_Sock_post_readv
[01:183868].../SMPDU_Sock_post_readv
[01:183868]../SMPDU_Sock_post_read
[01:183868]./smpd_post_read_command
[01:183868].\smpd_enter_at_state
[01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait
[01:183868]../SMPDU_Sock_wait
[01:183868]..SOCK_OP_READ event.error = 0, result = 0, context->type=15
[01:183868]..\smpd_handle_op_read
[01:183868]...\smpd_state_reading_cmd_header
[01:183868]....read command header
[01:183868]....command header read, posting read for data: 66 bytes
[01:183868]....\SMPDU_Sock_post_read
[01:183868].....\SMPDU_Sock_post_readv
[01:183868]...../SMPDU_Sock_post_readv
[01:183868]..../SMPDU_Sock_post_read
[01:183868].../smpd_state_reading_cmd_header
[01:183868]../smpd_handle_op_read
[01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait
[01:183868]../SMPDU_Sock_wait
[01:183868]..SOCK_OP_READ event.error = 0, result = 0, context->type=15
[01:183868]..\smpd_handle_op_read
[01:183868]...\smpd_state_reading_cmd
[01:183868]....read command
[01:183868]....\smpd_parse_command
[01:183868]..../smpd_parse_command
[01:183868]....read command: "cmd=result src=0 dest=1 tag=9 cmd_tag=0 ctx_key=0 result=SUCCESS "
[01:183868]....\smpd_handle_command
[01:183868].....handling command:
[01:183868]..... src = 0
[01:183868]..... dest = 1
[01:183868]..... cmd = result
[01:183868]..... tag = 9
[01:183868]..... ctx = pmi
[01:183868]..... len = 66
[01:183868]..... str = cmd=result src=0 dest=1 tag=9 cmd_tag=0 ctx_key=0 result=SUCCESS
[01:183868].....\smpd_command_destination
[01:183868]......1 -> 1 : returning NULL context
[01:183868]...../smpd_command_destination
[01:183868].....\smpd_handle_result
[01:183868]......ERROR:result command received but the wait_list is empty.
[01:183868]...../smpd_handle_result
[01:183868]..../smpd_handle_command
[01:183868]....ERROR:unable to handle the command: "cmd=result src=0 dest=1 tag=9 cmd_tag=0 ctx_key=0 result=SUCCESS "
[01:183868].../smpd_state_reading_cmd
[01:183868]../smpd_handle_op_read
[01:183868]..SOCK_OP_READ failed - result = -1, closing pmi context.
[01:183868]..\SMPDU_Sock_post_close
[01:183868]...\SMPDU_Sock_post_read
[01:183868]....\SMPDU_Sock_post_readv
[01:183868]..../SMPDU_Sock_post_readv
[01:183868].../SMPDU_Sock_post_read
[01:183868]../SMPDU_Sock_post_close
[01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait
[01:183868]../SMPDU_Sock_wait
[01:183868]..SOCK_OP_CLOSE
[01:183868]..\smpd_handle_op_close
[01:183868]...\smpd_get_state_string
[01:183868].../smpd_get_state_string
[01:183868]...op_close received - SMPD_CLOSING state.
[01:183868]...Unaffiliated pmi context closing.
[01:183868]...\smpd_free_context
[01:183868]....freeing pmi context.
[01:183868]....\smpd_init_context
[01:183868].....\smpd_init_command
[01:183868]...../smpd_init_command
[01:183868]..../smpd_init_context
[01:183868].../smpd_free_context
[01:183868]../smpd_handle_op_close
[01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait
[01:183868]../SMPDU_Sock_wait
[01:183868]..SOCK_OP_CLOSE
[01:183868]..\smpd_handle_op_close
[01:183868]...\smpd_get_state_string
[01:183868].../smpd_get_state_string
[01:183868]...op_close received - SMPD_IDLE state.
[01:183868]...\smpd_get_state_string
[01:183868].../smpd_get_state_string
[01:183868]...ERROR:sock_op_close returned while unknown context is in state: SMPD_IDLE
[01:183868]...\smpd_free_context
[01:183868]....freeing a context not in the global list - this should be impossible.
[01:183868].../smpd_free_context
[01:183868]../smpd_handle_op_close
[01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait
[01:183868]./SMPDU_Sock_post_connect
[01:183868].\smpd_enter_at_state
[01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait
[01:183868]../SMPDU_Sock_wait
[01:183868]..SOCK_OP_CONNECT
[01:183868]..\smpd_handle_op_connect
[01:183868]../smpd_handle_op_connect
[01:183868]./smpd_enter_at_state
[01:183868].\smpd_create_command
[01:183868]..\smpd_init_command
[01:183868]../smpd_init_command
[01:183868]./smpd_create_command
[01:183868].\smpd_add_command_int_arg
[01:183868]./smpd_add_command_int_arg
[01:183868].\smpd_add_command_arg
[01:183868]./smpd_add_command_arg
[01:183868].\smpd_add_command_arg
[01:183868]./smpd_add_command_arg
[01:183868].\smpd_add_command_arg
[01:183868]./smpd_add_command_arg
[01:183868].\smpd_add_command_int_arg
[01:183868]./smpd_add_command_int_arg
[01:183868].\smpd_post_write_command
[01:183868]..\smpd_package_command
[01:183868]../smpd_package_command
[01:183868]..\SMPDU_Sock_get_sock_id
[01:183868]../SMPDU_Sock_get_sock_id
[01:183868]..smpd_post_write_command on the pmi context sock 880: 118 bytes for command: "cmd=init src=1 dest=0 tag=0 ctx_key=0 name=57D3EA7D-F5DC-40ef-BEB6-A661AA3784B1 key=7 value=8 node_id=1 "
[01:183868]..\SMPDU_Sock_post_writev
[01:183868]../SMPDU_Sock_post_writev
[01:183868]./smpd_post_write_command
[01:183868].\smpd_post_read_command
[01:183868]..\SMPDU_Sock_get_sock_id
[01:183868]../SMPDU_Sock_get_sock_id
[01:183868]..posting a read for a command header on the pmi context, sock 880
[01:183868]..\SMPDU_Sock_post_read
[01:183868]...\SMPDU_Sock_post_readv
[01:183868].../SMPDU_Sock_post_readv
[01:183868]../SMPDU_Sock_post_read
[01:183868]./smpd_post_read_command
[01:183868].\smpd_enter_at_state
[01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait
[01:183868]../SMPDU_Sock_wait
[01:183868]..SOCK_OP_READ event.error = 0, result = 0, context->type=15
[01:183868]..\smpd_handle_op_read
[01:183868]...\smpd_state_reading_cmd_header
[01:183868]....read command header
[01:183868]....command header read, posting read for data: 66 bytes
[01:183868]....\SMPDU_Sock_post_read
[01:183868].....\SMPDU_Sock_post_readv
[01:183868]...../SMPDU_Sock_post_readv
[01:183868]..../SMPDU_Sock_post_read
[01:183868].../smpd_state_reading_cmd_header
[01:183868]../smpd_handle_op_read
[01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait
[01:183868]../SMPDU_Sock_wait
[01:183868]..SOCK_OP_READ event.error = 0, result = 0, context->type=15
[01:183868]..\smpd_handle_op_read
[01:183868]...\smpd_state_reading_cmd
[01:183868]....read command
[01:183868]....\smpd_parse_command
[01:183868]..../smpd_parse_command
[01:183868]....read command: "cmd=result src=0 dest=1 tag=9 cmd_tag=0 ctx_key=0 result=SUCCESS "
[01:183868]....\smpd_handle_command
[01:183868].....handling command:
[01:183868]..... src = 0
[01:183868]..... dest = 1
[01:183868]..... cmd = result
[01:183868]..... tag = 9
[01:183868]..... ctx = pmi
[01:183868]..... len = 66
[01:183868]..... str = cmd=result src=0 dest=1 tag=9 cmd_tag=0 ctx_key=0 result=SUCCESS
[01:183868].....\smpd_command_destination
[01:183868]......1 -> 1 : returning NULL context
[01:183868]...../smpd_command_destination
[01:183868].....\smpd_handle_result
[01:183868]......ERROR:result command received but the wait_list is empty.
[01:183868]...../smpd_handle_result
[01:183868]..../smpd_handle_command
[01:183868]....ERROR:unable to handle the command: "cmd=result src=0 dest=1 tag=9 cmd_tag=0 ctx_key=0 result=SUCCESS "
[01:183868].../smpd_state_reading_cmd
[01:183868]../smpd_handle_op_read
[01:183868]..SOCK_OP_READ failed - result = -1, closing pmi context.
[01:183868]..\SMPDU_Sock_post_close
[01:183868]...\SMPDU_Sock_post_read
[01:183868]....\SMPDU_Sock_post_readv
[01:183868]..../SMPDU_Sock_post_readv
[01:183868].../SMPDU_Sock_post_read
[01:183868]../SMPDU_Sock_post_close
[01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait
[01:183868]../SMPDU_Sock_wait
[01:183868]..SOCK_OP_CLOSE
[01:183868]..\smpd_handle_op_close
[01:183868]...\smpd_get_state_string
[01:183868].../smpd_get_state_string
[01:183868]...op_close received - SMPD_CLOSING state.
[01:183868]...Unaffiliated pmi context closing.
[01:183868]...\smpd_free_context
[01:183868]....freeing pmi context.
[01:183868]....\smpd_init_context
[01:183868].....\smpd_init_command
[01:183868]...../smpd_init_command
[01:183868]..../smpd_init_context
[01:183868].../smpd_free_context
[01:183868]../smpd_handle_op_close
[01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait
[01:183868]../SMPDU_Sock_wait
[01:183868]..SOCK_OP_CLOSE
[01:183868]..\smpd_handle_op_close
[01:183868]...\smpd_get_state_string
[01:183868].../smpd_get_state_string
[01:183868]...op_close received - SMPD_IDLE state.
[01:183868]...\smpd_get_state_string
[01:183868].../smpd_get_state_string
[01:183868]...ERROR:sock_op_close returned while unknown context is in state: SMPD_IDLE
[01:183868]...\smpd_free_context
[01:183868]....freeing a context not in the global list - this should be impossible.
[01:183868].../smpd_free_context
[01:183868]../smpd_handle_op_close
[01:183868]..sock_waiting for the next event.
[01:183868]..\SMPDU_Sock_wait
regards,
Seifer
Seifer_Lin
Beginner
Hi James:
Yesterday we ran another test with
-genv I_MPI_FABRICS shm
and we still got the same errors:
[01:441348]......ERROR:result command received but the wait_list is empty.
[01:441348]....ERROR:unable to handle the command: "cmd=result src=0 dest=1 tag=9 cmd_tag=0 ctx_key=0 result=SUCCESS "unable to read the cmd header on the pmi context, Error = -1.
[01:441348]...ERROR:sock_op_close returned while unknown context is in state: SMPD_IDLE
regards,
Seifer
James_T_Intel
Moderator
Hi Seifer,

Have you tried using

[plain]mpiexec -n 2 -env I_MPI_DEBUG 5 test.exe[/plain]
Both with your program and with the test program provided with the Intel MPI Library? What output do you get from this?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools
Seifer_Lin
Beginner
Hi James:
The options for mpiexec.exe are
-genv I_MPI_DEBUG 5 -genv I_MPI_PLATFORM auto -genv I_MPI_FABRICS shm -genv I_MPI_WAIT_MODE 1
Since the problems occurred on our customer's computer, I tested only our program.
But the problems don't occur every time in batch jobs.
Is the test program you mentioned IMB-MPI1.exe?
regards,
Seifer
James_T_Intel
Moderator
Hi Seifer,

I meant for you to compile one of the files in \test\ and use it. These are simple hello-world MPI programs that can test basic connectivity and MPI functionality.
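
For reference, a minimal MPI hello-world along the lines of those samples (a sketch, not necessarily identical to the files shipped in the test folder) looks like this:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

    printf("Hello world from rank %d of %d\n", rank, size);

    MPI_Finalize();                        /* shut the runtime down */
    return 0;
}
```

Compile it with the Intel MPI compiler wrapper (mpiicc, or mpicc for other compilers) and launch it with mpiexec -n 2. If even this program triggers the smpd errors, the problem lies in the process-manager setup rather than in your application.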

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools