Microsoft Visual Studio 2010 & Intel® Visual Fortran exe runs slower than Linux ifortran exe does

Bakhbergen · ‎09-11-2020

In 2012, one of my former colleagues gave me an exe file and the Fortran source code for it. According to the Makefile included with his project, the colleague built the exe file with Intel Fortran on Linux.

As a Windows user, I recently built an exe file from the same code using Microsoft Visual Studio 2010 and Intel® Visual Fortran 2013. After fixing a few errors and warnings in Debug mode, I produced a Release (Win32) executable. I have found that my exe runs significantly slower than my colleague's.

I have lost contact with the colleague. Does anyone know what may be causing this? I would appreciate any help.

Bakhbergen · ‎09-13-2020

All, please find attached Release BuildLog.htm files for both projects. I hope you find them informative.

mecej4 · ‎09-13-2020

Both build logs contain this alert, which may merit your attention:

...\subroutine_019.f(768): warning #6371: A jump into a block from outside the block may have occurred. [150]
IF (CUMPV.LT.WHPV) GO TO 150

Bakhbergen · ‎09-13-2020

mecej4, I know. I still need to spend some time looking into this warning.

JohnNichols · ‎09-19-2020

...\subroutine_019.f(768): warning #6371: A jump into a block from outside the block may have occurred. [150]

You can get this warning for an error branch on a read statement if it jumps to the error-handling line. These can take some time to eliminate.

Bakhbergen · ‎09-13-2020

My apologies for the duplicate post.

JohnNichols · ‎09-13-2020

You are chasing a furphy inside a zephyr. I run a single program on multiple computers across the world, and we record the run time of each loop. A loop takes about 8 seconds, and the loop times follow a non-Gaussian distribution, which is often the problem with data generated by computers. I can explain if required; we have probably 10 million records. If I normalize my results and your results, then you are within what I would call expected limits.

Computers do a lot of things you do not see and that affects run time. Two runs one after the other can vary a lot.

You are wasting your time. Get a better computer, a better compiler and a modern version of Windows, say version 20211, and you will get better results.

Or a modern Linux, although we are having trouble with Linux and Ethernet issues. You will have the same issues; we see this on NUCs, Pis and Dells.

JMN

Bakhbergen · ‎09-13-2020

JohnNichols, thank you for your opinion. The problem is that I am getting about one and a half times slower performance with a newer compiler, a newer version of Microsoft Visual Studio, etc. I am OK with an overall computation time of 2 minutes vs 3 minutes of CPU time. But what about 2 days vs 3 days?

JohnNichols · ‎09-19-2020

Timing can be critical at 8 seconds. I suggest adding some write statements to a log file to see where the hell the slowdown is.
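One way to act on this suggestion is to wrap each stage of the program in a timer and append the elapsed times to a log file. The sketch below is in Python purely for illustration; the stage names, the `timed` helper and the log path are assumptions, not part of the original code. In the Fortran source the same idea could be done with the standard SYSTEM_CLOCK intrinsic and WRITE statements to a log unit.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, log):
    """Record (label, elapsed wall-clock seconds) in log when the block exits."""
    start = time.perf_counter()
    try:
        yield
    finally:
        log.append((label, time.perf_counter() - start))


def run_once():
    """Run two hypothetical stages and return the result plus the timing log."""
    log = []
    with timed("setup", log):
        data = [i * 0.5 for i in range(10_000)]  # stand-in for reading input
    with timed("compute", log):
        total = sum(x * x for x in data)  # stand-in for the real computation
    return total, log


def format_log(log):
    """Turn the timing records into one text line per stage."""
    return [f"{label} {seconds:.6f}" for label, seconds in log]


def write_log(log, path):
    """Append the formatted timing lines to a log file (path is an assumption)."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write("\n".join(format_log(log)) + "\n")
```

Comparing the per-stage lines from the Windows and Linux builds should show whether the slowdown is spread evenly or concentrated in one routine.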

Bernard · ‎09-18-2020

@JohnNichols wrote:

You are chasing a furphy inside a zephyr. [...] The loop time is observed to be a non-Gaussian distribution, which is often the problem with data generated by computers. [...]

Computers do a lot of things you do not see and that affects run time. Two runs one after the other can vary a lot.

Unfortunately (from a performance analyst's perspective) that is true. On a daily basis I measure the performance of our L1-PHY (5G physical (upper) layer) simulation, and I have seen huge variations in the results of the same test module gathered by VTune (perf collector and sep5.ko collector). The distributions are not normal, and for 100 runs they are usually multimodal.

The main contributor to those huge variations is the performance measurement process itself. The other factors are mainly OS-kernel generated (context switching, thread migration, periodic APIC timer activity, interrupt handling) and hardware related (voltage ramp-up, frequency throttling, and other thermal events).

JohnNichols · ‎09-19-2020

I use two NUCs, one with a Core i3-6100 and one with a Core i3-7100.

I have them installed in identical situations running identical code, every 8 seconds, never to stop.

The 7100 has run perfectly for years; the 6100 hangs the programs at odd intervals, from days to weeks. Even slight differences can show up problems. I am still trying to solve the conflict that stuffs up the 6100 but not the 7100.

If I could replace the 6100 I would, but it means a 3 day drive.

Good hunting with your problem - I understand the frustration.

Bakhbergen · ‎09-19-2020

Thank you everyone for contributing to the discussion on this topic. Your answers and comments are very informative and helpful.

Bernard · ‎09-19-2020

The 7100 has run perfectly for years, the 6100 hangs the programs at odd intervals from days to weeks.

What type of hang is it? Does it require a reboot, or is it a process (your exe container) hang?

JohnNichols · ‎09-20-2020

The exe container is a watcher program. The main program is multithreaded, connected to a MySQL database in the cloud and reading data from a source. It crashes from time to time on some machines, which is the reason for the standard watcher program. The problem with the 6100 is that, from time to time, the watcher does not restart the main program. I now have a machine that does this about every 24 hours, so I can start to look at the issue. Before, it might last a month, and a monthly problem is very hard to debug.

The real problem is that the closer you get to NASA TRL 9, the fewer mistakes you are allowed, and really, if the system makes a mistake it has to be self-correcting. The 7100 has run for years; the only thing that stops it is a power outage, and it can recover from that. It does exactly the same thing as the 6100 and the setup is identical. I only use Samsung SSDs and I only use NUCs.

I checked the temperatures and they are all within the normal range, but I cannot replace the 6100 without a 3 day drive, and man, I do not want to do that.
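For readers unfamiliar with the watcher arrangement described in this post, a minimal sketch of the idea follows. It is not JohnNichols's actual watcher: the `watch` function, its parameters and the polling interval are all assumptions made for illustration, and it is written in Python rather than the language of the real program.

```python
import subprocess
import sys
import time


def watch(cmd, max_starts, poll_seconds=0.05):
    """Start cmd, restart it each time it exits, and return the exit codes.

    max_starts bounds the loop so the sketch terminates; a real watcher
    would typically run until told to stop.
    """
    exit_codes = []
    while len(exit_codes) < max_starts:
        child = subprocess.Popen(cmd)
        # poll() returns None while the child is still running.
        while child.poll() is None:
            time.sleep(poll_seconds)
        exit_codes.append(child.returncode)
    return exit_codes


def demo():
    """Watch a hypothetical child that exits immediately with code 3."""
    crashing_child = [sys.executable, "-c", "import sys; sys.exit(3)"]
    return watch(crashing_child, max_starts=2)
```

Note that a watcher of this shape only reacts when the child process exits. A child that hangs without exiting is never restarted, so detecting that case would need something extra, such as a heartbeat the main program updates.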

Bernard · ‎09-20-2020

the main program is multithreaded - connected to a mysql database in the cloud and reading data from a source - it crashes from time to time on some machines, which is the reason for the standard watcher program.

So this is a main program (process) hang. You can configure the OS (I presume you are using Windows) to collect a minidump file of the failed process and try to analyze the root cause in WinDbg. It may be very hard to find the culprit, of course, but at least you will be able to investigate in a little more depth.

JohnNichols · ‎09-20-2020

No, the main process stops completely, but the watcher in this instance does not restart it. I have it running in a debug loop.

jimdempseyatthecove · ‎09-21-2020

>>the main process stops completely - but the watcher in this instance does not restart it

Apparently your watcher needs debugging. Can you perform an attach to process from the debugger?

I've experienced instances in a multi-threaded program where the threads are oversubscribed .AND. where one or more threads depend on a different thread in order to make progress. The programmer in this case (me) addressed this proactively by inserting thread yield calls in the wait-for-other-thread-to-complete loop. The problem with this, as I discovered after lengthy investigations, is that the thread yield (on Windows) appears to yield to threads that were preempted and not to threads that may have been pending on I/O. IOW the (formerly) pending-on-I/O threads would not resume as long as a full subscription of threads was spinning in thread yield loops.

The corrective measure was to use Sleep/SleepQQ for 0ms.

Jim Dempsey

JohnNichols · ‎09-22-2020

I started the watcher program inside VS in debug mode and the blasted thing has run for 2 days -- I wonder why this helps?

jimdempseyatthecove · ‎09-22-2020

Start the watcher program normally (i.e. .NOT. from VS)

Upon hang, then start MS VS | Debug | Attach to Process | Pick Watcher Process | Break

You may need to compile the watcher with Full Debugging (IOW Release build with debugging) .AND. have the Linker .NOT. remove debug symbols.

Doing the above, you have the watcher executable .AND. runtime environment as-was the hanging environment.

Jim Dempsey

JohnNichols · ‎09-22-2020

Jim:

Thanks. I have used attach to process, but it did not give me anything. I will make your modifications. The NUCs are great computers; Intel scored a home run there.

JMN

Bernard · ‎09-23-2020

In addition to Jim's advice, you may also experiment with WinDbg, which is more powerful than the VS debugger and has a lot of built-in meta-commands for your debugging convenience.

In your specific case you may use WinDbg and issue the !runaway 3 command (strictly speaking, an extension) to find the threads that consume the most user-mode and kernel-mode time.

Here is the detailed description

https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/tracking-down-a-processor-hog

Bernard · ‎09-18-2020

You may try to assess the performance delta between those two executables by using the Intel VTune profiler. The hotspots analysis should suffice for now. It is very hard to know the real root cause of the aforementioned performance delta just by looking at an absolute time figure. As @JohnNichols said, the distribution might not be normal and usually is not (in my measurements it is either log-normal or multimodal), so I would suggest running at least 10 profiling sessions for each executable (be aware that even this is not enough!) and analyzing the results.
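Because the run times are unlikely to be normally distributed, it can help to compare repeated runs with order statistics (median, quartiles) rather than a single time or a mean. The following Python sketch shows one way to do that; the function names and the sample timings are made up for illustration and are not measurements from this thread.

```python
import statistics


def summarize(samples):
    """Return order-statistic summaries that do not assume a normal distribution."""
    ordered = sorted(samples)
    q1, _, q3 = statistics.quantiles(ordered, n=4)  # quartile cut points
    return {
        "n": len(ordered),
        "min": ordered[0],
        "median": statistics.median(ordered),
        "iqr": q3 - q1,
        "max": ordered[-1],
    }


def median_ratio(slow, fast):
    """Ratio of median run times; less sensitive to outlier runs than a mean."""
    return statistics.median(slow) / statistics.median(fast)


# Hypothetical wall-clock times in seconds for five runs of each executable.
windows_runs = [180.0, 175.0, 190.0, 178.0, 260.0]
linux_runs = [120.0, 118.0, 125.0, 119.0, 121.0]
```

Calling `summarize` on each list and `median_ratio(windows_runs, linux_runs)` gives a comparison that a single slow outlier, such as the 260-second run in the made-up data, does not dominate. With only five or ten runs the quartiles are still rough, which is the point of the warning that ten sessions are not enough.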