Hi, I am writing MPI code in C++ using the Intel compiler on a Linux cluster. When I compile my program I get the message "remark: Partial loop was vectorized". After googling for some time I added #pragma novector before the loop, which got rid of the message. But when I run the MPI program it does not give the expected answer: only certain processors in the nodes do any work while the others do not. The output is given below. I hope someone can help me out. Thanks in advance.
Q from process 0 = 0.000000e+00
Q from process 2 = 0.000000e+00
Q from process 6 = 0.000000e+00
Q from process 4 = 0.000000e+00
Q from process 7 = 2.684355e+08
Q from process 1 = 2.684355e+08
Q from process 5 = 2.684355e+08
Q from process 3 = 2.684355e+08
Total Q = 1.073742e+09
Regards
vinoth
17 Replies
Vinoth,
Check for uninitialized or unused variables (either input or output). If fewer than 8 threads are required to perform the work, then you might see junk or NULL as the result for an uninitialized or unused variable.
Jim Dempsey
It is not clear to me what you said. I have checked my code for uninitialized/unused variables and there are none. Apart from that, it is always the even-numbered processors (0, 2, 4, 6) that give zero as the answer, while the odd-numbered ones give a nonzero answer. I hope I have made this clear.
I will type this slower so you can read this more easily...
Assume you have a parallel distribution point (e.g. a parallel for) that is distributed across multiple machines using OpenMPI. However, also assume the iteration space for the parallel distribution point has fewer distributions than you have systems (threads, if viewing it as OpenMP). What do you expect for the results from the processors NOT scheduled?
Also, your configuration may be set up such that for processors with HT (Hyper-Threading), only one of the siblings is scheduled. IOW, if your iteration space is larger than the number of "processors", only half of them get scheduled.
Therefore, if your setup is for each processor in your "system of systems" to write its results back to a mailbox, one slot per "processor" (HW thread), then for the processor(s) (thread(s)) NOT scheduled you will have no delivery into the mailbox. IOW, the mailbox will hold stale data (0, an old value, or uninitialized data).
Jim Dempsey
I understood what you said, but it has not solved my problem. My question is how to overcome that issue and get rid of the problem.
Regards
vinoth
Vinoth,
Insert code to have each process write diagnostic information that helps track down the problem, e.g.:
Are all the processes actually called?
Is each process called with the input data you assume it is being called with?
Are the results produced within the process (and diagnostically stored) the results you expect?
Are (all of) the results produced within the process returned to the controlling process?
Is the controlling process waiting for all of the results?
When you discover the problem I anticipate a "Eureka moment".
Good luck hunting. I am sure it is just a small oversight.
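As a rough illustration of the checks listed above, here is a minimal sketch of a per-process diagnostic line. It assumes the work is split into contiguous blocks of a 64-bit iteration count. The names Range, block_range and diagnostic_line are hypothetical and are not taken from the posted program.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper type: the half-open range [start, end) given to one process.
struct Range {
    std::int64_t start;
    std::int64_t end;
};

// Split `total` iterations into `nprocs` contiguous blocks. The first
// (total % nprocs) ranks receive one extra iteration each.
Range block_range(int rank, int nprocs, std::int64_t total) {
    assert(nprocs > 0 && rank >= 0 && rank < nprocs && total >= 0);
    const std::int64_t base = total / nprocs;
    const std::int64_t extra = total % nprocs;
    const std::int64_t start = rank * base + (rank < extra ? rank : extra);
    const std::int64_t count = base + (rank < extra ? 1 : 0);
    return Range{start, start + count};
}

// Build the line a process would print before it starts working, so that
// missing ranks or unexpected ranges show up in the job output.
std::string diagnostic_line(int rank, int nprocs, std::int64_t total) {
    const Range r = block_range(rank, nprocs, total);
    std::ostringstream out;
    out << "rank " << rank << " of " << nprocs
        << ": start=" << r.start << " end=" << r.end
        << " count=" << (r.end - r.start);
    return out.str();
}
```

In an MPI program the rank and the process count would normally come from MPI_Comm_rank and MPI_Comm_size, and each process would print its own line. A rank that prints nothing, or prints an empty or negative range, is the one to look at first.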
Jim
Jim,
I have given my code for your reference to track down the problem. Each processor should give the answer 8589934592, and the final value of Q (the total partition function) should be 6.871947674*10^10. I am also trying to discover the problem.
Not knowing the systems in your cluster...
Are the even numbered systems 32-bit?
# define N 36
# define SIZE pow(2,N)
...
int i,n,p,tp,sv,ev;
int a[36];
n=(int)SIZE; // *** overflows on 32-bit system
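The overflow flagged in the comment above can be checked without actually performing the out-of-range cast, which is undefined behaviour in C++. The following is a minimal sketch: pow2 and fits_in_int are hypothetical helper names, and the power of two is computed with a 64-bit shift rather than with pow.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// 2^exponent computed in 64-bit integer arithmetic (no floating-point pow).
// Valid for 0 <= exponent <= 62, so the shift cannot overflow.
std::int64_t pow2(int exponent) {
    assert(exponent >= 0 && exponent <= 62);
    return std::int64_t{1} << exponent;
}

// True if `value` can be stored in a plain int on this platform.
bool fits_in_int(std::int64_t value) {
    return value >= std::numeric_limits<int>::min() &&
           value <= std::numeric_limits<int>::max();
}
```

With a 4-byte int, fits_in_int(pow2(25)) is true and fits_in_int(pow2(36)) is false, which would be consistent with a program that runs correctly for 2^25 and fails for 2^36.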
Jim Dempsey
??
Was this code originally written on a 36-bit PDP-10?
Jim Dempsey
Jim,
No, I don't know what a 36-bit PDP-10 is. I wrote the code for 2^25 and it ran well without any problem. Based on that I went on to 2^36, where the problem occurs (Partial loop was vectorized).
Let me explain the cluster (Vega Cluster, hpce.iitm.ac.in/website/vega.html) where I am running my code. We have 8 processors in each node, each processor is 64-bit, and I am using the 64-bit Intel C++ compiler. What more does my code need in order to run successfully on the cluster?
Regards
vinoth
What is sizeof(int) on each system? If sizeof(int)==4 then 2^36 will not fit in an int variable.
Change the type of your indexing variable(s) and of the variables used for limits/counts to intptr_t (an integer type whose size is that of a pointer).
BTW, the PDP-10 was built by Digital Equipment Corporation back in the late 1960s. These machines had a 36-bit word size.
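The change of index type suggested above can be sketched as follows. This sketch uses std::int64_t rather than intptr_t so that the width does not depend on the build. On a 64-bit build intptr_t is normally 64 bits wide as well. The function names total_states and count_iterations are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Total iteration count 2^n_bits, held in a 64-bit integer.
// Valid for 0 <= n_bits <= 62 (N = 36 is the value used in this thread).
std::int64_t total_states(int n_bits) {
    assert(n_bits >= 0 && n_bits <= 62);
    return std::int64_t{1} << n_bits;
}

// Loop over [start, end) with a 64-bit index and limits, returning the
// number of iterations executed. With a 32-bit int index, limits near
// 2^36 could not be represented at all.
std::int64_t count_iterations(std::int64_t start, std::int64_t end) {
    std::int64_t count = 0;
    for (std::int64_t i = start; i < end; ++i) {
        ++count;
    }
    return count;
}
```

With these types total_states(36) is 68719476736, and dividing it among 8 processes gives 8589934592 per process, which matches the per-process value and the total Q expected earlier in the thread.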
Jim Dempsey
I forgot to add that your code may have worked by accident, depending on the contents of the data immediately following the indexing variables of your loop.
Jim
Quoting - tim18
36-bit machines, such as Honeywell 6000, in the early 80's, had 9-bit bytes, so sizeof(int) == 4. Most C programmers had moved on to 32-bit machines by then. We didn't even have a C compiler for it.
Jim,
Thank you very much for your useful suggestions over the past week on this problem. I changed the data types of the variables as you suggested and it works well now (really a eureka moment). I have one more question: what is the maximum number I can go to? Can I go up to 2^256?
Regards
vinoth
On systems with 64-bit words, an unsigned 64-bit integer goes up to 2^64-1. If you are using a signed integer, the range is from +2^63-1 (max) down to -2^63 (min).
If you wish to express larger integers you can create a class with operators for this purpose. There are some available on the web. Here is one link: http://www.codeproject.com/KB/cpp/lint.aspx
How do you intend to use a number (variable) containing 2^256?
If you are only interested in the power that 2 is raised to, then consider holding the n of 2^n instead of the result.
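A small sketch of the last suggestion, under the assumption that only powers of two are needed: the exponent is stored instead of the value, so 2^256 is simply the number 256. PowerOfTwo, multiply and fits_in_uint64 are hypothetical names used for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Represents the value 2^exponent by storing only the exponent.
struct PowerOfTwo {
    int exponent;
};

// 2^a * 2^b = 2^(a+b), so multiplying two powers of two adds the exponents.
PowerOfTwo multiply(PowerOfTwo a, PowerOfTwo b) {
    return PowerOfTwo{a.exponent + b.exponent};
}

// True if the represented value could also be stored directly in an
// unsigned 64-bit integer, whose maximum is 2^64-1.
bool fits_in_uint64(PowerOfTwo p) {
    return p.exponent >= 0 && p.exponent <= 63;
}
```

For example, multiply(PowerOfTwo{128}, PowerOfTwo{128}) holds the exponent 256, a value that fits_in_uint64 reports as too large for a 64-bit variable. The limits quoted above are available as std::numeric_limits<std::uint64_t>::max() and std::numeric_limits<std::int64_t>::max().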
Jim Dempsey
What you are saying is not clear to me. It would be very helpful if you could explain it with an example.
Regards
vinoth
Vinoth,
In an 8-bit system:
[cpp]
00000001 = 2^0 = 1
00000010 = 2^1 = 2
00000100 = 2^2 = 4
00001000 = 2^3 = 8
00010000 = 2^4 = 16
00100000 = 2^5 = 32
01000000 = 2^6 = 64
10000000 = 2^7 = 128 (unsigned) or -128 (signed)
11111111 = 2^7+2^6+2^5+2^4+2^3+2^2+2^1+2^0 = 255 (unsigned) or -1 (signed)
[/cpp]
A 16-bit system would extend this by 8 more bits,
A 32-bit ... more bits
A 64-bit ... more bits
[cpp]
void DoSomething()
{
    // ... your code to do something
}

// Do something 2^N times
void DoSomething(int N)
{
    if (N > 0) {
        // 2^N = 2^(N-1) + 2^(N-1): run the half-sized job twice
        DoSomething(N - 1);
        DoSomething(N - 1);
    } else {
        DoSomething(); // N == 0: 2^0 = 1 call
    }
}
[/cpp]
Jim Dempsey
I got it. Thank you for your explanation, Jim.
Regards
vinoth