Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

MPI Coding

kmvinoth
Beginner
894 Views
Hi, I am doing MPI coding in C++ using the Intel compiler on a Linux cluster. When I compile my program I get the message "remark: Partial loop was vectorized". After googling for some time I added #pragma novector before the loop, and the message went away. But when I run the MPI program it does not give the expected answer: only some of the processors in the node produce a result, while the others do not. The output is given below. I hope someone can help me out. Thanks in advance.

Q from process 0 = 0.000000e+00
Q from process 2 = 0.000000e+00
Q from process 6 = 0.000000e+00
Q from process 4 = 0.000000e+00
Q from process 7 = 2.684355e+08
Q from process 1 = 2.684355e+08
Q from process 5 = 2.684355e+08
Q from process 3 = 2.684355e+08
Total Q = 1.073742e+09

Regards
vinoth
0 Kudos
1 Solution
jimdempseyatthecove
Honored Contributor III
894 Views

What is sizeof(int) on each system? If sizeof(int)==4 then 2^36 will not fit in an int variable.

Change the type of your indexing variable(s) and of the variables used for limits/counts to intptr_t (an integer whose size is that of a pointer).

BTW, the PDP-10 was built by Digital Equipment Corporation back in the late 1960s. These machines had a 36-bit word size.

Jim Dempsey


0 Kudos
17 Replies
jimdempseyatthecove
Honored Contributor III
894 Views

Vinoth,

Check for uninitialized/unused variables (either input or output). If fewer than 8 threads are required to perform the work, then you might see junk or NULL as the result for an uninitialized/unused variable.

Jim Dempsey
0 Kudos
kmvinoth
Beginner
894 Views

Jim

It is not clear to me what you said. I have checked for uninitialized/unused variables in my code and there are none. Apart from that, the even-numbered processes (0, 2, 4, 6) always give zero as the answer, while the odd-numbered ones give some answer. I hope I made it clear.
0 Kudos
jimdempseyatthecove
Honored Contributor III
894 Views

I will type this slower so you can read this more easily...

Assume you have a parallel distribution point (e.g. parallel for) that is distributed across multiple machines using OpenMPI. However, also assume the iteration space for the parallel distribution point has fewer distributions than you have systems (threads if viewing as OpenMP). What do you expect for the results from the processors NOT scheduled?

Also, your configuration may be set up such that, for processors with HT (Hyper-Threading), only one of the siblings is scheduled. IOW, if your iteration space is larger than the number of "processors", only half get scheduled.

Therefore, if your setup is for each processor in your "system of systems" to write back results to a mailbox, one slot per "processor" (HW thread), then for the processor(s) (thread(s)) NOT scheduled you will have no delivery into the mailbox. IOW, the mailbox will have stale data (0, old value, or uninitialized data).

Jim Dempsey


0 Kudos
kmvinoth
Beginner
894 Views

hi Jim

I understood what you said, but it has not solved my problem. My question is how to overcome that issue and get rid of the problem.

Regards
vinoth
0 Kudos
jimdempseyatthecove
Honored Contributor III
894 Views

Vinoth,

Insert code to have each process write diagnostic information that helps track down the problem, e.g.:

Are all the processes actually called?
Is each process called with the input data you assume it is being called with?
Are the results produced within the process (and diagnostically stored) the results you expect?
Are (all of) the results produced within the process returned to the controlling process?
Is the controlling process waiting for all of the results?

When you discover the problem I anticipate a "Eureka moment".

Good luck hunting. I am sure it is just a small oversight.

Jim
0 Kudos
kmvinoth
Beginner
894 Views

jim

I have given my code for your reference to track down the problem. Each processor should give the answer 8589934592, and the final value of Q (total partition function) should be 6.871947674*10^10. I am also trying to discover the problem.
0 Kudos
jimdempseyatthecove
Honored Contributor III
894 Views

Not knowing the systems in your cluster...

Are the even numbered systems 32-bit?

# define N 36
# define SIZE pow(2,N)
...
int i,n,p,tp,sv,ev;
int a[36];
n=(int)SIZE; // *** overflows on 32-bit system

Jim Dempsey

0 Kudos
jimdempseyatthecove
Honored Contributor III
894 Views

??
Was this code originaly written on 36-bit PDP-10?

Jim Dempsey
0 Kudos
kmvinoth
Beginner
894 Views

jim

No, I don't know what a 36-bit PDP-10 is. I wrote the code for 2^25 and it runs well without any problem. Based on that, I went to 2^36, where the problem occurs ("Partial loop was vectorized").

Let me explain the cluster (Vega Cluster, hpce.iitm.ac.in/website/vega.html) where I am running my code. We have 8 processors in each node, each processor is 64-bit, and I am using the 64-bit Intel C++ compiler. What more does my code need in order to run successfully on the cluster?

Regards
vinoth
0 Kudos
jimdempseyatthecove
Honored Contributor III
895 Views
What is sizeof(int) on each system? If sizeof(int)==4 then 2^36 will not fit in an int variable.

Change the type of your indexing variable(s) and of the variables used for limits/counts to intptr_t (an integer whose size is that of a pointer).

BTW, the PDP-10 was built by Digital Equipment Corporation back in the late 1960s. These machines had a 36-bit word size.

Jim Dempsey


0 Kudos
jimdempseyatthecove
Honored Contributor III
894 Views

I forgot to add that your code may have worked by accident, depending on the contents of the data immediately following the indexing variables of your loop.

Jim
0 Kudos
TimP
Honored Contributor III
889 Views

36-bit machines, such as Honeywell 6000, in the early 80's, had 9-bit bytes, so sizeof(int) == 4. Most C programmers had moved on to 32-bit machines by then. We didn't even have a C compiler for it.
0 Kudos
kmvinoth
Beginner
894 Views
Jim

Thank you very much for your useful suggestions over the past week on this problem. I changed the data type of the variables as you suggested and it works well now (really a eureka moment). I have one more question: what is the maximum number I can go up to? Can I go to 2^256?

Regards
vinoth
0 Kudos
jimdempseyatthecove
Honored Contributor III
894 Views

For systems with 64-bit words, an unsigned 64-bit integer has a maximum of 2^64-1. If you are using a signed int, the range is +2^63-1 max down to -2^63 min.

If you wish to express integers of larger size you can create a class with operators for this purpose. There are some available on the web. Here is one link http://www.codeproject.com/KB/cpp/lint.aspx

How do you intend to use a number (variable) containing 2^256?
If you are only interested in the power to which 2 is raised, then consider holding the n of 2^n instead of the result.

Jim Dempsey

0 Kudos
kmvinoth
Beginner
894 Views

jim

What you are saying is not clear to me. It would be very useful if you could explain it with an example.

Regards
vinoth
0 Kudos
jimdempseyatthecove
Honored Contributor III
894 Views
Vinoth,

In an 8-bit system

00000001 = 2^0 = 1
00000010 = 2^1 = 2
00000100 = 2^2 = 4
00001000 = 2^3 = 8
00010000 = 2^4 = 16
00100000 = 2^5 = 32
01000000 = 2^6 = 64
10000000 = 2^7 = 128 (unsigned) or -128 (signed)
11111111 = 2^7+2^6+2^5+2^4+2^3+2^2+2^1+2^0 = 255 (unsigned) or -1 (signed)

A 16-bit system would extend this by 8 more bits,
A 32-bit ... more bits
A 64-bit ... more bits

[cpp]void DoSomething()
{
    // ... your code to do something
}

// Do something 2^N times: each level runs the level below it twice
void DoSomething(int N)
{
    if (N > 0) {
        DoSomething(N - 1);
        DoSomething(N - 1);
    } else {
        DoSomething();
    }
}[/cpp]

Jim Dempsey
0 Kudos
kmvinoth
Beginner
894 Views
jim

I got it. Thank you for your explanation, Jim.

regards
vinoth
0 Kudos