Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

How much work to take advantage of multi-core? + RAM drive

jimb2
Beginner

I regularly use a CFD code and am hoping that upgrading to a 2- or 4-core XP64 or Vista64 system will greatly improve performance. I have the source code, but I get upgrades every 6-12 months, so I don't want to make too many changes.

Are there compile options to do some auto-parallelization across cores? If so, can you give me some rough idea of what you've seen in terms of performance improvements? For example, if you see almost a 4X increase in speed with a rewritten code (which I definitely do not have!), do you see a 2X or 3X increase by just turning on compiler switches? I know YMMV, but I'd like some info to help me decide whether paying the 4-core premium is worth it.

In addition, this code does a lot of I/O to scratch files, maybe a TB or so. My current CPU keeps up pretty well, but going to dual or quad core may just bring me to the I/O bottleneck. If I make a RAM drive (1 GB should do it), will that help? Or will the program/OS assume that all disk calls are slow and not take full advantage of running from RAM?

Finally, I think I saw that IVF 9.1 doesn't support Vista64. Is this likely to change in the next few months, or should I aim for an XP64 system? I'm looking at getting the new system in April or May if it looks like I'll get a big performance boost; otherwise, I may try to wait another year.

Thanks, Jim

TimP
Honored Contributor III
The degree of performance improvement from auto-parallelization, how much effort you would need to tune options by subroutine, and the amount of additional improvement from relatively easy work on the source are entirely dependent on the application. I suppose that a well-written application (easily auto-parallelizable) should be expected to speed up by a factor of nearly 3 on a 2-socket dual-core platform, maybe just under 2 on a single-socket quad core. Questions such as whether it vectorizes, whether it uses single or double precision, and how much time is spent in nested loops visible to the compiler have a strong bearing on this. I find it hard to visualize a CFD application which spends much time writing scratch files, so I won't comment, except to point out that high-end Core 2 Duo platforms are available with Windows support for striping across 2 disk drives. Most commercial CFD codes have had effective support for threaded parallel operation for 5 years or more.
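To put rough numbers on the 2X-versus-3X question, here is a small Python sketch of Amdahl's law (an illustration added here, not something from the replies). If a fraction p of the single-core run time parallelizes perfectly over n cores and the rest stays serial, the speedup is S = 1/((1 - p) + p/n). Treat it as an upper bound under that idealized assumption: it ignores memory-bandwidth contention and threading overhead, so real figures like the platform estimates above can come in lower. The function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup when `parallel_fraction` of the single-core run time
    parallelizes perfectly over `cores` and the remainder stays serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


def required_parallel_fraction(target_speedup: float, cores: int) -> float:
    """Inverse of amdahl_speedup: the parallel fraction needed to reach
    `target_speedup` on `cores` cores."""
    if cores < 2 or not 1.0 <= target_speedup <= cores:
        raise ValueError("need cores >= 2 and 1 <= target_speedup <= cores")
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / cores)


if __name__ == "__main__":
    for p in (0.50, 0.75, 0.90, 0.95):
        print(f"p = {p:.2f}: 2 cores -> {amdahl_speedup(p, 2):.2f}x, "
              f"4 cores -> {amdahl_speedup(p, 4):.2f}x")
    print(f"3x on 4 cores needs p >= {required_parallel_fraction(3.0, 4):.3f}")
```

For example, reaching 3x on four cores requires about 89% of the original run time to parallelize perfectly (`required_parallel_fraction(3.0, 4)` is 8/9), and at p = 0.9 two cores give at most about 1.82x. Serial scratch-file I/O counts against p unless it is overlapped with computation.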
There should be an ifort release supporting Vista, along with improvements in auto-parallelization for some cases, within the time frame you mention. A full ifort license covers any version you choose, new or old, during the license term. 64-bit XP, of course, is probably the most satisfactory current Windows version, and is likely suitable for your purpose.
jimb2
Beginner

Thanks for your quick response. Regarding the scratch files: unfortunately, I'm not using a commercial code. I'm stuck using a tool that is written for an extremely specific application. It's a 20-year-old code that gets regular "performance enhancements" (slap in more code) from engineers with poor CS backgrounds. Welcome to the blunt, bloody edge of my field.

Any comment on a RAM drive? The scratch files get repeatedly rewritten, so I only need 1-2 GB to hold them.

TimP
Honored Contributor III
I guess I should have put plenty of blank lines between paragraphs; it sure came out as run-on stuff.

I can imagine that C code written by people who don't care about style could be bad news, and difficult for a compiler to optimize. The argument against Fortran back then was that it could be used by such people, but at least Fortran is less vulnerable to such problems.

I have a customer who has used exclusively Fortran since before Intel existed, and now is planning to add some C code which is considered well written, but still potentially difficult to optimize.

I suppose it should be easy to set up a RAM disk if you are using a 64-bit OS with the additional RAM present, in case the I/O system doesn't manage to cache the files effectively. No doubt there are trickier ways of accomplishing it on 32-bit systems.
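Whether a RAM drive helps depends on how well the OS file cache is already absorbing the scratch traffic, so it may be worth measuring before buying hardware. The Python sketch below (an illustration, not a standard tool; the function name and defaults are hypothetical) writes and rereads a scratch file in a given directory several times, roughly mimicking scratch files that get rewritten, and returns the wall-clock time. It deliberately does not call fsync, so it measures what the program actually sees, OS cache included.

```python
import os
import tempfile
import time


def time_scratch_io(directory: str, record_bytes: int = 1 << 20,
                    records: int = 64, passes: int = 3) -> float:
    """Write `records` records of `record_bytes` bytes to a scratch file in
    `directory`, read them back, repeat `passes` times, and return the
    elapsed wall-clock seconds. The scratch file is always deleted."""
    payload = os.urandom(record_bytes)
    fd, path = tempfile.mkstemp(dir=directory, suffix=".scr")
    os.close(fd)
    try:
        start = time.perf_counter()
        for _ in range(passes):
            # Rewrite the whole file, as the scratch files in the thread are.
            with open(path, "wb") as f:
                for _ in range(records):
                    f.write(payload)
            # Read it all back; no fsync anywhere, so OS caching is included.
            with open(path, "rb") as f:
                while f.read(record_bytes):
                    pass
        return time.perf_counter() - start
    finally:
        os.remove(path)
```

Run it once against a directory on the disk and once against the RAM drive (for example `time_scratch_io(r"C:\scratch", records=1024)` versus the same call on the RAM drive's path; both paths are hypothetical), scaling `records` so the volume approaches the real 1-2 GB scratch set. If the two times are close, the OS cache is already doing most of the work.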
jimdempseyatthecove
Honored Contributor III

Depending on the nature of your application, a RAM drive is not always a significant plus; it can even hurt performance. For example, your code, I assume, is single-threaded. If you take some effort to introduce OpenMP into the application to add explicit threading, you could do so in a manner that assigns the I/O to a separate thread. When the I/O goes to a disk drive, a significant portion of the data transport is performed by hardware independently of the CPU. Some memory bandwidth is still consumed, but the CPU is relatively unburdened (with a RAM drive, by contrast, the CPU itself performs the copy). Then, by parallelizing the computation (e.g. using OpenMP), the processing time is reclaimed.
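The separate-I/O-thread idea can be sketched as a producer/consumer pair: the main loop computes each step and hands the resulting record to a background writer thread through a bounded queue, so computation continues while the data goes to disk. This is a Python illustration of the structure only, with hypothetical function names; in the Fortran code it would be done with OpenMP threads as described above. Python's GIL keeps pure-Python computation on one core, but blocking file writes release the GIL, so the I/O does overlap with the compute.

```python
import queue
import threading


def _drain_to_file(f, records, errors):
    """Background writer: append each queued record to file object `f`
    until the None sentinel arrives. If a write fails, remember the error
    and keep draining so the producer never blocks on a full queue."""
    while True:
        rec = records.get()
        if rec is None:
            return
        if not errors:
            try:
                f.write(rec)
            except Exception as exc:
                errors.append(exc)


def run_with_background_io(path, steps, compute_step, max_in_flight=4):
    """Run compute_step(0..steps-1); each call returns a bytes record that a
    separate thread writes to `path` while the next step is computed."""
    records = queue.Queue(maxsize=max_in_flight)  # bounds memory held in flight
    errors = []
    with open(path, "wb") as f:
        writer = threading.Thread(target=_drain_to_file,
                                  args=(f, records, errors))
        writer.start()
        try:
            for step in range(steps):
                # Blocks only if the writer has fallen max_in_flight behind.
                records.put(compute_step(step))
        finally:
            records.put(None)  # sentinel: no more records
            writer.join()      # wait for pending writes before closing f
    if errors:
        raise errors[0]
```

The bounded queue caps how many finished records wait in memory: if the disk cannot keep up, the compute loop simply blocks, and a write error is re-raised in the main thread once the writer has finished.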

Without seeing your application, it is hard to determine the actual bottleneck. If you were to use a profiler (e.g. VTune from Intel or CodeAnalyst from AMD), you could see where the bottlenecks are in your program.

In my experience, 20-year-old code tends to be written well for 20-year-old platforms, where 1 MB of RAM may have been the upper limit. As such, apps with larger data requirements had to rely on a disk to hold the complete working data set, paging subsets of the data in and out for processing. Your thought of using a RAM disk would seem to indicate this is the case. If so, I would suggest reworking that portion of the code so that it uses pointers (introduced in F90) to reference a complete, non-paged (buffered) working dataset in memory.

Two years ago, I made a similar change and saw over a 10x improvement in performance. Since you have the source, you can tweak it here and there.
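The rework suggested above keeps the whole working set resident and references subsets in place rather than paging them through a scratch file. In Fortran that means F90 pointers to array sections; the Python sketch below uses a flat `array` of doubles plus `memoryview` slices as a stand-in for those pointers. The `WorkingSet` class and its method names are hypothetical, and the sketch assumes the complete data set fits in RAM, as the 1-2 GB of scratch data mentioned earlier should on a 64-bit system.

```python
from array import array


class WorkingSet:
    """Complete working data set held in memory as one contiguous array of
    doubles, split into fixed-size blocks that are accessed in place."""

    def __init__(self, n_blocks: int, block_len: int) -> None:
        self.n_blocks = n_blocks
        self.block_len = block_len
        self.data = array("d", [0.0]) * (n_blocks * block_len)  # zero-filled
        self._view = memoryview(self.data)  # no copy; shares data's buffer

    def block(self, i: int) -> memoryview:
        """Zero-copy view of block i. Writing through it updates self.data
        directly, the role a pointer to an array section plays in Fortran."""
        if not 0 <= i < self.n_blocks:
            raise IndexError(f"block {i} out of range")
        start = i * self.block_len
        return self._view[start:start + self.block_len]


if __name__ == "__main__":
    ws = WorkingSet(n_blocks=4, block_len=3)
    b = ws.block(2)                 # elements 6..8 of ws.data
    for k in range(len(b)):
        b[k] = float(k + 1)         # no scratch-file write/read needed
    print(ws.data.tolist())
    # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 0.0, 0.0, 0.0]
```

Because each view aliases the single in-memory array, there is no read-back step: whatever a processing routine writes through a block is immediately visible to every other routine that holds a view of the same data.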

Jim Dempsey
