
Drastic reduction in performance when compute node is running at half load

NThek
Beginner

We have compute nodes with 24 cores (48 threads) and 64 GB of RAM (2x32 GB). When I run a sample program (matrix multiplication) on one of the compute nodes in a single thread, it takes only 4 seconds. But when I start more runs (copies of the same program) on the same compute node, the time taken increases drastically. When the number of running programs reaches 24 (I used a maximum of 24 because only 24 physical cores are present), the time taken rises to around 40 seconds (about 10 times slower). When I checked the temperature, it was below 40 °C.
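The scaling test described above can be reproduced with a small driver script. The following is only a sketch and not the poster's program: the pure-Python worker, the matrix size `n`, and the names `WORKER`, `run_copies`, and `slowdown` are all assumptions made for illustration.

```python
import subprocess
import sys

# Worker program run by every copy. It times a naive matrix multiplication.
# The default size n is an arbitrary illustrative value, not the poster's.
WORKER = """
import sys, time
n = int(sys.argv[1])
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i - j) for j in range(n)] for i in range(n)]
start = time.perf_counter()
c = [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
     for i in range(n)]
print(time.perf_counter() - start)
"""


def run_copies(copies, n=60):
    """Start `copies` identical workers at once; return each one's run time."""
    procs = [
        subprocess.Popen([sys.executable, "-c", WORKER, str(n)],
                         stdout=subprocess.PIPE, text=True)
        for _ in range(copies)
    ]
    return [float(p.communicate()[0]) for p in procs]


def slowdown(copies, n=60):
    """Ratio of the slowest concurrent run to a single isolated run."""
    baseline = run_copies(1, n)[0]
    return max(run_copies(copies, n)) / baseline


if __name__ == "__main__":
    for copies in (1, 6, 12, 24):
        times = run_copies(copies)
        print(f"{copies:2d} copies: slowest run {max(times):.3f} s")
```

With a matrix this small the working set fits in cache, so the script mainly shows whether the cores themselves scale. To probe the memory system, the worker would need a working set much larger than the caches, for example a compiled matrix multiplication on large matrices like the original test.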

When I searched the Internet about this issue, I found some people saying that it may be caused by the transfer of data from RAM to the processor slowing down when many programs run at once. I was not satisfied with this explanation, because compute nodes are designed to run at maximum load without much decrease in performance. Also, we are using only 1 GB of memory even with 24 programs running. Since performance drops to about 1/10, I guess the problem is something else.
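Using little memory does not by itself rule out a memory bottleneck: capacity (GB) and bandwidth (GB/s) are separate limits. The rough estimate below is not from the thread and rests on stated assumptions: DDR4-2400 modules (the thread only says DDR4), one memory channel per installed DIMM, and an 8-byte data bus per channel. The function names are hypothetical.

```python
def peak_bandwidth_gb_s(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s over all populated channels."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000.0


def per_core_gb_s(channels, mega_transfers_per_s, cores):
    """Peak bandwidth share per core when every core runs one process."""
    return peak_bandwidth_gb_s(channels, mega_transfers_per_s) / cores


if __name__ == "__main__":
    # 2 channels: the two DIMMs from the post, assuming one channel each.
    # 8 channels: a hypothetical fully populated dual-socket node with
    # four channels per socket, shown only for comparison.
    for channels in (2, 8):
        share = per_core_gb_s(channels, 2400, cores=24)
        print(f"{channels} channels: {share:.1f} GB/s per core at 24 processes")
```

These are theoretical ceilings. Whether the matrix multiplication actually reaches them depends on its working-set size and cache behaviour, which the thread does not state.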

11 Replies
idata
Employee

Hello namshad,

Thank you for joining the Intel® Community.

Before starting any troubleshooting, could you please specify the node, chassis, processor, board, RAM, interconnect, and OS models so that we can get a better picture of this issue?

We will be waiting for your info.

Regards,

Jose A.

NThek
Beginner

Hello,

Thank you for your prompt response.

The details of our compute node are as follows:

Chassis: Chenbro RM23608

Processor: E-2560 v4

Board: S2600

RAM: 2x32GB DDR4

We are using ROCKS version 6.2.

In our lab, we mainly use serial codes only. Also, file writing is almost negligible, so node-to-node communication is not important.

Right now I am checking the performance of individual compute nodes only. I am putting load on only one compute node.

idata
Employee

Hello namshad,

Thanks for the info. About the board, could you specify exactly which model you have installed? There are quite a few S2600 models out there, and although I know they share the C600 chipset, it is important for us to know which board we are talking about.

About the OS: ROCKS is not a supported one, but CentOS is, so I assume you are using the CentOS drivers.

I will be waiting for your info.

Regards,

Jose A.

NThek
Beginner

Hi,

Thank you for your support.

We have an Intel S2600CW2R board. About the OS, of course we have CentOS (version 6.2) and we are using the CentOS drivers.

Regards

Namshad T

idata
Employee

Hello namshad,

Thanks for the info provided.

We are trying to determine whether this is related to the hardware or the processor, so we will try to rule out hardware issues first.

What firmware level is this board running?

Does this happen when running other tasks besides this matrix multiplication?

Could you please run this offline confidence test to rule out hardware errors: https://downloadcenter.intel.com/download/25923/EFI-Platform-Confidence-Test-Utility-for-Intel-Server-Systems-and-Boards-Based-on-the-Intel-Xeon-Processor-E5-2600-v3-and-v4-Product-Family?product=88278

Could you please attach the board logs so that we can look for errors? You can download the log viewer from here: https://downloadcenter.intel.com/download/26988/System-Event-Log-SEL-Viewer-Utility?product=88278

Last but not least, keep in mind that the operating system is not validated. We may need to test this in a validated environment. For more info go to https://www.intel.com/content/www/us/en/support/articles/000006988/server-products/server-boards.html

Regards,

Jose A.

idata
Employee

Hello namshad,

Do you have any updates, questions or comments regarding this issue?

Please do not hesitate to contact us again.

If you consider the issue resolved, please let us know so that we can mark this thread as resolved.

Regards,

Jose A.

NThek
Beginner

Sorry for the late reply.

I tried to run all the tests you specified. The confidence test you suggested was successful. I tried to run the Intel SEL Viewer utility, but it failed: I am getting errors while installing ncurses-lib-5.7-3.20090208.el6.i686.rpm. I am still working on it.

Since Ubuntu 16.04 was listed as a supported OS, I have also installed Ubuntu 16.04 on another compute node (which has the same configuration) to confirm that the problem is not specific to one operating system. This compute node is now running like an independent desktop PC, but the performance issue is still there. So I am guessing that it has something to do with memory transfer between the processor and the RAM. Since I am using only 2 RAM modules (32 GB each), the memory transfer between the processor and the RAM may become slow if I start too many runs. In our old cluster, we had compute nodes with only 16 cores and 2x8 GB of RAM. On those, performance did not drop significantly even at full load. I would like to know your opinion on this.
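One way to see how the two modules are spread over the memory channels is to list which DIMM slots are populated. The sketch below assumes the usual text layout of `dmidecode -t memory` output (blocks headed "Memory Device" with "Size:" and "Locator:" fields). The sample text and slot names are illustrative and are not taken from the poster's node; `dimm_slots` is a hypothetical helper.

```python
# Illustrative excerpt in the layout printed by `dmidecode -t memory`.
# Slot names and handles are made up for this example.
SAMPLE = """\
Handle 0x0020, DMI type 17, 40 bytes
Memory Device
\tSize: 32 GB
\tLocator: DIMM_A1
\tBank Locator: NODE 1

Handle 0x0021, DMI type 17, 40 bytes
Memory Device
\tSize: No Module Installed
\tLocator: DIMM_A2
\tBank Locator: NODE 1
"""


def dimm_slots(dmidecode_text):
    """Map each slot locator to True if a module is installed in it."""
    slots = {}
    for block in dmidecode_text.split("\n\n"):
        lines = [line.strip() for line in block.splitlines()]
        if "Memory Device" not in lines:
            continue  # skip blocks that do not describe a DIMM slot
        fields = {}
        for line in lines:
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "Locator" in fields:
            installed = fields.get("Size", "") != "No Module Installed"
            slots[fields["Locator"]] = installed
    return slots


if __name__ == "__main__":
    for slot, installed in dimm_slots(SAMPLE).items():
        print(slot, "populated" if installed else "empty")
```

On a real node the function would be given the text printed by `sudo dmidecode -t memory` instead of `SAMPLE`. The board's documentation then shows which slot belongs to which channel and socket.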

idata
Employee

Hello namshad,

We will consult internally and update you soon.

Regards,

Jose A.

idata
Employee

Hello namshad,

Thanks for your patience with this issue. After escalating this case, we were asked to gather logs using the EFI version of the log retrieval utility.

Also keep in mind that we have only basic installation/compatibility support for Ubuntu 16.04.

We will wait for the info.

Regards,

Jose A.

idata
Employee

Hello namshad,

Do you have any updates, questions or comments regarding this issue?

Please do not hesitate to contact us again.

If you consider the issue resolved, please let us know so that we can mark this thread as resolved.

Regards,

Jose A.

idata
Employee

Hello namshad,

We will proceed to mark this thread as resolved. If you have further issues or questions, just create a new topic.

Jose A.