Intel® Fortran Compiler

Internal reads slower in IVF compared to CVF

time-steps
Beginner
1,355 Views
We have been converting our CVF 6.6 programs to IVF 10.1, and we've discovered that internal reads take more than twice as long in IVF.

The following code snippet executes in 7 seconds with CVF and in 14 seconds with IVF on our reference computer. Both compilers use their default settings.

Is there any way to improve the performance of IVF here?

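program internal_read_test
implicit none
integer*4 i,j
character*(20) c
c='123'
do j=1,1000000
   read(c,'(i)') i   ! '(i)' without a field width is a CVF/IVF extension
end do
end program internal_read_test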

Thank you for your time.
13 Replies
Steven_L_Intel1
Employee
When I try this with CVF 6.6c and IVF 10.1.024 on my aging Pentium 4 system, I get 2.1 seconds for CVF and 1.8 seconds for IVF. If I modify the program to use the value of i outside the loop, I get 2.1 seconds for CVF and 2.4 seconds for IVF, nowhere near the difference you saw.
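To repeat this kind of measurement, it helps to time only the loop and to use the value read afterwards, as Steve did, so the compiler has no reason to drop the work. The following is a minimal sketch in standard Fortran, not the code either poster ran: the module and procedure names are hypothetical, timing uses the intrinsic SYSTEM_CLOCK, and the format is written as '(I20)' instead of the extension '(i)' so that it compiles with any compiler.

```fortran
! Hypothetical benchmark module: times n internal reads of the string '123'.
module internal_read_bench
  implicit none
contains
  subroutine time_internal_reads(n, seconds, last)
    integer, intent(in)  :: n        ! number of internal reads to perform
    real,    intent(out) :: seconds  ! elapsed wall-clock time in seconds
    integer, intent(out) :: last     ! value from the final read, for the caller to use
    character(len=20) :: c
    integer :: j, t0, t1, rate

    c = '123'
    last = 0
    call system_clock(t0, rate)
    do j = 1, n
       read(c, '(I20)') last         ! explicit field width (standard Fortran)
    end do
    call system_clock(t1)
    seconds = 0.0
    if (rate > 0) seconds = real(t1 - t0) / real(rate)  ! rate is 0 if no clock exists
  end subroutine time_internal_reads
end module internal_read_bench
```

A main program would call time_internal_reads(1000000, s, v) and print both s and v. Building it once against each run-time library (for IVF, /libs:static and /libs:dll, both with /threads) gives a like-for-like comparison.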

Looking at the generated code, the IVF code is shorter and cleaner, but the real work is going on inside the run-time library, which evolved from the CVF library.

Is this really representative of your application?
time-steps
Beginner
Thanks for your hint about the run-time library. I compiled our test program in IVF against both the Multithreaded and the Multithreaded DLL run-time libraries.

For the Multithreaded DLL, our performance numbers correspond with yours (i.e., IVF is marginally slower). However, when linking statically (Multithreaded), the program takes four times as long to execute. Do you have any suggestions as to why this could happen?

We currently have IVF 10.1.021, but I will update and try again.
Steven_L_Intel1
Employee
Sure. If you specify the multithreaded library, it has to synchronize against possible I/O in other threads, and these synchronization calls can be slow. I tried the threaded library and saw a 2X slowdown with IVF and about 1.7X with CVF. A newer version isn't going to change that.

If internal I/O is the bottleneck in your program, perhaps you should look at another way of accomplishing this.
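One alternative, when the strings are simple decimal integers, is to convert them in user code so that the I/O library (and its locking) is never entered. The sketch below is an illustration only, not an Intel-provided routine: the module and subroutine names are hypothetical, it assumes an optional sign followed by digits with optional surrounding blanks, and it does not check for overflow. Unlike a formatted internal read, where blanks inside the field are ignored by default, it rejects embedded blanks such as '1 2'.

```fortran
! Hypothetical replacement for read(str, '(I20)') value on simple inputs.
module fast_parse
  implicit none
contains
  pure subroutine parse_int(str, value, ok)
    character(len=*), intent(in)  :: str
    integer,          intent(out) :: value  ! parsed value (0 when ok is .false.)
    logical,          intent(out) :: ok     ! .true. only if str held a valid integer
    integer :: k, first, last, sgn, d

    value = 0
    ok    = .false.
    first = verify(str, ' ')           ! position of first non-blank, 0 if all blank
    if (first == 0) return
    last  = len_trim(str)              ! position of last non-blank
    sgn   = 1
    if (str(first:first) == '-' .or. str(first:first) == '+') then
       if (str(first:first) == '-') sgn = -1
       first = first + 1
       if (first > last) return        ! a sign with no digits
    end if
    do k = first, last
       d = index('0123456789', str(k:k)) - 1   ! digit value, or -1 if not a digit
       if (d < 0) then
          value = 0
          return                       ! embedded blank or other character
       end if
       value = 10 * value + d
    end do
    value = sgn * value
    ok = .true.
  end subroutine parse_int
end module fast_parse
```

In the original loop, read(c,'(i)') i would become call parse_int(c, i, ok), with ok checked by the caller. This sketch has not been timed against the library; any speed-up should be measured on the real workload.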
jimdempseyatthecove
Honored Contributor III

Steve,

If internal reads take locks to get exclusive access to an internal formatting buffer, you might suggest to your compiler writers that they use thread-local storage for that buffer, i.e., avoid locks for internal reads.

Jim Dempsey

Steven_L_Intel1
Employee
The locks protect the library's data structures, not the buffer. Internal I/O is treated like normal I/O to a special unit number, and we have to protect against access from other threads.
time-steps
Beginner
Just to summarize the problem:

1) We have two run-time libraries available for our console program (Fortran -> Libraries -> Runtime Library) in VS2005:
- Multithreaded (/libs:static /threads)
- Multithreaded DLL (/libs:dll /threads)
2) The Multithreaded DLL gives the expected performance for internal reads (on par with CVF).
3) The Multithreaded static library is four times slower.

What causes the difference in performance? We would prefer to link statically.
Steven_L_Intel1
Employee
I don't see that kind of difference on my system.
time-steps
Beginner
Well, thank you for your help. I'll keep looking for the cause of the difference; in the meantime, we'll link against the DLLs.
jimdempseyatthecove
Honored Contributor III

Steve,

>>The locks are to protect the data structures, not the buffer. Internal I/O is treated like normal I/O to a special unit number. We have to protect against access from another thread.

Can you give a reasonable explanation of why this protection is warranted for internal I/O?

After all, you could have

ArrayA = ArrayB

being issued by one thread while another thread is issuing

ArrayA = ArrayC

By the same reasoning that requires internal I/O to take locks, you would also have to require locks on the array copy (and on every other statement that gives the appearance of an atomic operation).

If a programmer is performing internal I/O to a shared variable with proper synchronization in user code, they will never have a conflicting use.

If a programmer is performing internal I/O to a shared variable without proper synchronization in user code, they will occasionally have a conflicting use.

When the conflict occurs with locking on internal I/O, you will have a consistent but incorrect value in the shared buffer, and you have a programming error.

When the conflict occurs without locking on internal I/O, you may have an inconsistent and incorrect value in the shared buffer, and you have a programming error.

In either case you have a programming error. At the point of error, does consistency matter?

If consistency does not matter when there is a programming error, then performance should trump consistency, and internal I/O should therefore be done without locks.

Jim Dempsey

Steven_L_Intel1
Employee
Jim,

The actual access to the user data is not synchronized by the library; that is up to the user. The I/O library has internal data structures used to keep track of I/O operations in progress, and these structures are synchronized.
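To illustrate this division of responsibility, here is a small sketch assuming OpenMP is available (for example /Qopenmp with IVF or -fopenmp with gfortran; without it, the !$omp lines are ordinary comments and the loop runs serially). The library serializes its own bookkeeping, while the user keeps the data race-free: each thread gets a private buffer, and the shared total is combined with a reduction. The module and function names are hypothetical. A program that does internal I/O from several threads needs the thread-safe run-time library, so it should not be built with /reentrancy:none.

```fortran
! Hypothetical example: internal I/O from several threads on thread-private buffers.
module parallel_reads
  implicit none
contains
  function sum_roundtrip(n) result(total)
    integer, intent(in) :: n
    integer :: total
    integer :: j, v, acc
    character(len=20) :: buf

    acc = 0
    ! buf and v are private to each thread, so no user-level lock is needed;
    ! per-thread partial sums in acc are combined by the reduction clause.
    !$omp parallel do private(buf, v) reduction(+:acc)
    do j = 1, n
       write(buf, '(I20)') j     ! internal write into this thread's buffer
       read(buf, '(I20)') v      ! internal read back from the same buffer
       acc = acc + v
    end do
    !$omp end parallel do
    total = acc
  end function sum_roundtrip
end module parallel_reads
```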
Lorri_M_Intel
Employee

Please try this:

Go to Properties -> Fortran -> Command Line and add

/reentrancy:none

to the command line

When the default library configuration changed from "single-threaded" to "multithreaded", there was a period during which the "reentrancy:threaded" attribute was also set.

If I read between the lines correctly, you're not really doing anything multithreaded, so you don't need to worry about reentrancy.

- Lorri

jimdempseyatthecove
Honored Contributor III

Got it,

Then the user should call CRT functions that do not lock, or write their own.
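Calling a C run-time conversion routine from Fortran is straightforward with the standard ISO_C_BINDING module. The sketch below binds to the C library's atoi; the Fortran names are hypothetical, and it makes no claim about whether a particular CRT's atoi takes any locks internally. Note that atoi does no error reporting (it returns 0 when no conversion can be performed), so strtol is the better choice when validation matters.

```fortran
! Hypothetical wrapper around the C library function int atoi(const char *).
module crt_parse
  use, intrinsic :: iso_c_binding, only: c_int, c_char, c_null_char
  implicit none
  interface
     function c_atoi(s) bind(c, name='atoi') result(r)
       import :: c_int, c_char
       character(kind=c_char), dimension(*), intent(in) :: s
       integer(c_int) :: r
     end function c_atoi
  end interface
contains
  function crt_atoi(str) result(v)
    character(len=*), intent(in) :: str
    integer :: v
    ! C expects a NUL-terminated string; trailing blanks are dropped first,
    ! and atoi itself skips any leading whitespace.
    v = int(c_atoi(trim(str) // c_null_char))
  end function crt_atoi
end module crt_parse
```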

It seems a pity that internal I/O modifies internal data structures and thus requires a lock. A little work by the compiler writers could eliminate this (e.g., postpone the lock until deeper into the I/O routine, where the lock is actually required).

Jim Dempsey

time-steps
Beginner
Yes, we are not doing any multithreaded operations in this case. When I add the compiler flag /reentrancy:none, the statically linked version is as fast as the DLL version. However, I do get a linker warning:
LIBC.lib(crt0init.obj) : warning LNK4254: section '.CRT' (40000040) merged into '.data' (C0000040) with different attributes