We have compute nodes with 24 cores (48 threads) and 64 GB of RAM (2x32 GB). When I run a sample code (matrix multiplication) on one of the compute nodes in a single thread, it takes only 4 seconds. But when I start more runs (copies of the same program) on the same compute node, the time taken increases drastically. When the number of running programs reaches 24 (I used a maximum of 24 since only 24 physical cores are present), the time taken rises to around 40 seconds (about 10 times slower). When I checked the temperature, it was below 40 degrees Celsius.
When I searched the Internet about this issue, I found some people saying that it may be due to data transfer from RAM to the processor slowing down when many programs run at once. I was not satisfied with this explanation, because compute nodes are designed to run at maximum load without much decrease in performance. Also, we are using only 1 GB of memory even with 24 programs running. Since performance drops to about 1/10, I guess the problem is something else.
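The measurement described above can be reproduced with a small driver script. This is only a minimal sketch, not the exact procedure I used: `run_copies`, `scaling`, and `STAND_IN_CMD` are hypothetical names, and the stand-in workload is a trivial Python loop that should be replaced by the real matrix multiplication binary.

```python
import subprocess
import sys
import time

# Stand-in workload so the sketch runs anywhere; replace it with the real
# benchmark command, e.g. ["./matmul"] (hypothetical name).
STAND_IN_CMD = [sys.executable, "-c", "sum(i * i for i in range(300000))"]


def run_copies(cmd, copies):
    """Start `copies` identical processes at once and wait for all of them.

    Returns (wall-clock seconds until the last copy finished, exit codes).
    """
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for _ in range(copies)]
    codes = [p.wait() for p in procs]
    return time.perf_counter() - start, codes


def scaling(cmd, copy_counts):
    """Return (copies, seconds, slowdown relative to the first entry) rows."""
    rows = []
    baseline = None
    for copies in copy_counts:
        seconds, _ = run_copies(cmd, copies)
        if baseline is None:
            baseline = seconds
        rows.append((copies, seconds, seconds / baseline))
    return rows


if __name__ == "__main__":
    for copies, seconds, ratio in scaling(STAND_IN_CMD, [1, 6, 12, 24]):
        print(f"{copies:2d} copies: {seconds:6.2f} s  ({ratio:.1f}x)")
```

If the copies did not compete for any shared resource, the slowdown column would stay close to 1x up to 24 copies; in my case it reaches about 10x.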
Hello namshad,
Thank you for joining the Intel® Community.
Before starting any troubleshooting, could you please specify the node, chassis, processor, board, RAM, interconnect, and OS models so that we can get a better picture of this issue?
We will be waiting for your info.
Regards.
Jose A.
Hello,
Thank you for your prompt response.
The details of our compute node are as follows:
Chassis: Chenbro RM23608
Processor: E-2560 v4
Board: S2600
RAM: 2x32GB DDR4
We are using ROCKS version 6.2.
In our lab, we mainly use serial codes. Also, file writing is almost negligible, so node-to-node communication is not important.
Right now I am checking the performance of individual compute nodes only. I am putting load on only one compute node.
Hello namshad,
Thanks for the info. Regarding the board, could you specify the exact model you have installed? There are quite a few S2600 models out there which, as far as I know, share the C600 chipset, but it is important for us to know which board we are talking about.
Regarding the OS, ROCKS is not a supported one, but CentOS is, so I assume you are using the CentOS drivers.
I will be waiting for your info.
Regards
Jose A.
Hi,
Thank you for your support.
We have an Intel S2600CW2R board. As for the OS, we do have CentOS (version 6.2) and we are using the CentOS drivers.
Regards
Namshad T
Hello namshad,
Thanks for the info provided.
We are trying to determine whether this is related to the hardware or the processor, so we will try to rule out hardware issues first.
What firmware level is this board running?
Does this happen when running other tasks besides this matrix multiplication?
Could you please run this offline confidence test to rule out hardware errors: https://downloadcenter.intel.com/download/25923/EFI-Platform-Confidence-Test-Utility-for-Intel-Server-Systems-and-Boards-Based-on-the-Intel-Xeon-Processor-E5-2600-v3-and-v4-Product-Family?product=88278
Could you please attach the board logs so that we can look for errors? You can download the utility from here: https://downloadcenter.intel.com/download/26988/System-Event-Log-SEL-Viewer-Utility?product=88278
Last but not least, keep in mind that the operating system is not validated. There is a possibility that we would need to test this under a validated environment. For more info, go to https://www.intel.com/content/www/us/en/support/articles/000006988/server-products/server-boards.html
Regards
Jose A.
Hello namshad,
Do you have any updates, questions or comments in regards to this issue?
Please do not hesitate to contact us back.
If you consider the issue to be completed please let us know so we can proceed to mark this thread as resolved.
Regards
Jose A.
Sorry for the late reply.
I was trying to do all the tests you specified. The confidence test you suggested was successful. I tried to run the Intel SEL Viewer utility, but it failed due to some errors. I am getting errors while installing ncurses-lib-5.7-3.20090208.el6.i686.rpm. I am still working on it.
Since Ubuntu 16.04 was listed as a supported OS, I have also installed Ubuntu 16.04 independently on another compute node (which has the same configuration) in order to confirm that the problem is not specific to one operating system. So this compute node is now running like an independent desktop PC. But the performance issue is still there. So I am guessing that it has something to do with data transfer between the processor and the RAM. Since I am using only 2 RAM modules (32 GB each), the transfer between the processor and the RAM may become slow when I start too many runs. In our old cluster, we had compute nodes with only 16 cores and 2x8 GB of RAM. On those nodes, performance did not drop significantly even at full load. I would like to know your opinion on this.
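As a rough back-of-the-envelope check of this memory hypothesis, the sketch below compares the theoretical peak bandwidth of two populated memory channels against a fully populated board. The numbers are assumptions, not measurements from our nodes: DDR4 at 2400 MT/s, a 64-bit (8-byte) channel, each of the two DIMMs sitting on its own channel, and a dual-socket board with 4 channels per socket (8 in total). The function names are hypothetical.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak memory bandwidth in GB/s for the populated channels."""
    return channels * mega_transfers_per_s * bus_bytes / 1000.0


def per_process_share_gbs(channels, mega_transfers_per_s, processes):
    """Peak bandwidth divided evenly among concurrently running processes."""
    return peak_bandwidth_gbs(channels, mega_transfers_per_s) / processes


if __name__ == "__main__":
    mts = 2400        # assumed DIMM speed in MT/s
    processes = 24    # one serial job per physical core
    for label, channels in [("2 DIMMs (2 channels)", 2),
                            ("8 DIMMs (8 channels)", 8)]:
        total = peak_bandwidth_gbs(channels, mts)
        share = per_process_share_gbs(channels, mts, processes)
        print(f"{label}: {total:.1f} GB/s total, {share:.1f} GB/s per process")
```

Under these assumptions, two channels give 38.4 GB/s in total (1.6 GB/s per process with 24 jobs), against 153.6 GB/s (6.4 GB/s per process) with all eight channels populated. This does not prove that bandwidth is the cause, but it shows that the number of populated channels, rather than the amount of memory used, limits how fast 24 jobs can be fed.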
Hello namshad,
We will do some consultations and will update you soon.
Regards
Jose A.
Hello namshad,
Thanks for your patience with this issue. After escalating this case, we received a request to gather logs using the EFI version of the log retrieval utility.
Also keep in mind that we have only basic installation/compatibility support for Ubuntu 16.04.
We will wait for the info.
Regards
Jose A.
Hello namshad,
Do you have any updates, questions or comments in regards to this issue?
Please do not hesitate to contact us back.
If you consider the issue to be completed please let us know so we can proceed to mark this thread as resolved.
Regards
Jose A.
Hello namshad,
We will proceed to mark this thread as resolved. If you have further issues or questions just create a new topic.
Jose A.