I regularly use a CFD code and am hoping that upgrading to a 2- or 4-core XP64 or Vista64 system will greatly improve performance. I have the source code, but I get upgrades every 6-12 months, so I don't want to add too many changes of my own.
Are there compile options to do some auto-parallelization across cores? If so, can you give me some rough idea of what you've seen in terms of performance improvements? For example, if a rewritten code sees almost a 4X speedup (which I definitely do not have!), do you see a 2X or 3X increase just by turning on compile switches? I know YMMV, but I'd like some info to help me decide whether paying the 4-core premium is worth it.
In addition, this code does a lot of I/O to scratch files, maybe a TB or so. My current CPU keeps up pretty well, but going to dual or quad core may just bring me to the I/O bottleneck. If I make a RAM drive (1 GB should do it), will that help? Or will the program/OS assume that any disk calls are slow and fail to take full advantage of running from RAM?
Finally, I think I saw that IVF 9.1 doesn't support Vista64. Is this likely to change in the next few months, or should I aim for an XP64 system? I'm looking at getting the new system in April or May if it looks like I'll get a big performance boost; otherwise, I may try to wait another year.
Thanks, Jim
An ifort release supporting Vista, with improvements in auto-parallelization for some cases, should appear within the time frame you mention. A full ifort license covers any version you choose, new or old, during the license term. 64-bit XP is, of course, probably the most satisfactory current Windows version, and likely suitable for your purpose.
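For reference, the auto-parallelizer is enabled with a single switch, so you can try it without touching the source. A minimal sketch (the source file name here is hypothetical):

    ifort /O3 /Qparallel cfd_solver.f90

Whether that approaches a 2X gain depends entirely on how many of your hot loops the compiler can prove independent; loops containing I/O or unanalyzable dependencies are left serial.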
Thanks for your quick response. Regarding the scratch files - unfortunately, I'm not using a commercial code. I'm stuck with a tool written for an extremely specific application. It's a 20-year-old code that gets regular "performance enhancements" (slap in more code) from engineers with poor CS backgrounds. Welcome to the blunt, bloody edge of my field.
Any comment on a RAM drive? The scratch files get repeatedly rewritten, so I only need 1-2 GB to hold them.
I can imagine that C code written by people who don't care about style could be bad news, and difficult for a compiler to optimize. The argument against Fortran back then was that it could be used by such people, but at least it is less vulnerable to such problems.
I have a customer who has used Fortran exclusively since before Intel existed, and who is now planning to add some C code which is considered well written, but still potentially difficult to optimize.
I suppose it should be easy to set up a RAM disk on a 64-bit OS with the additional RAM present, if the I/O system doesn't already succeed in caching the files effectively. No doubt there are trickier ways of accomplishing it on 32-bit systems.
Depending on the nature of your application, a RAM drive is not always a significant plus; it has the potential to hurt performance. For example: your code, I assume, is single-threaded. If you take some effort to introduce OpenMP into the application, you could do so in a manner that assigns the I/O to a separate thread (see the sketch below). When the I/O goes to a disk drive, a significant portion of the data-transport overhead is performed by hardware independent of a CPU; there remains some memory-bandwidth loss, but that CPU is relatively unburdened. By parallelizing the code (e.g., using OpenMP), the processing time is reclaimed.
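A minimal sketch of that pattern, compiled with ifort's /Qopenmp switch (the array names, sizes, and scratch file name are invented for illustration):

    program io_overlap
      implicit none
      real, allocatable :: work(:), snapshot(:)
      allocate(work(1000000), snapshot(1000000))
      work = 1.0
      snapshot = work                    ! copy the data the I/O thread will write
    !$omp parallel sections num_threads(2)
    !$omp section
      ! one thread writes the previous results to the scratch file...
      open(10, file='scratch.dat', form='unformatted', status='replace')
      write(10) snapshot
      close(10)
    !$omp section
      ! ...while the other continues computing on the working array
      work = sqrt(work) + 0.5
    !$omp end parallel sections
    end program io_overlap

The unformatted write overlaps useful computation instead of stalling it.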
Without seeing your application, it is hard to determine the actual bottleneck. If you were to use a profiler (e.g., VTune from Intel or CodeAnalyst from AMD), you could see where the bottlenecks are in your program.
In my experience, 20-year-old code tends to be written well for 20-year-old platforms, where 1 MB of RAM may have been the upper limit. Apps with larger data requirements could only rely on a disk to hold the complete working data set and page subsets in and out for processing. Your thought of using a RAM disk suggests this is the case. If so, I would suggest reworking that portion of the code so that it uses pointers (introduced in F90) to reference a complete, non-paged (buffered) working dataset, as sketched below.
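A rough sketch of that rework, with invented module, routine, and array names (the real code would size the dataset from its own inputs):

    module working_set
      implicit none
      real, pointer :: dataset(:,:) => null()   ! complete working set lives in RAM
    contains
      subroutine init_working_set(nvars, ncells)
        integer, intent(in) :: nvars, ncells
        allocate(dataset(nvars, ncells))        ! allocated once, never paged to scratch
        dataset = 0.0
      end subroutine init_working_set

      subroutine process_block(first, last)
        integer, intent(in) :: first, last
        real, pointer :: block(:,:)
        block => dataset(:, first:last)         ! alias the subset; no disk read needed
        block = block + 1.0                     ! stand-in for the real computation
      end subroutine process_block
    end module working_set

The pointer assignment replaces the read of a scratch-file block, and there is no write-back at all.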
Two years ago, I made a similar change and saw over a 10X improvement in performance. Since you have the source, you can tweak it here and there.
Jim Dempsey
