How can I speed up my program?

milad_m_ · ‎06-14-2016

hi

I have a serial Fortran code and use Visual Studio with Intel Parallel Studio 2015. The running process uses about 25 percent of my CPU. In C# I simply used tasks to speed things up, but in Fortran I can't find anything similar. I could use about 70 percent of the CPU.

Can anyone help?

With regards

TimP · ‎06-14-2016

Intel Fortran has effective auto-parallel and auto-vectorizing compile options. Their effectiveness, and the value of "modernization" optimization, depend strongly on your application. Parallel Studio includes several tools intended to help with that process.

If your CPU has Hyper-Threading, optimum performance may well leave some of those resources idle.

milad_m_ · ‎06-14-2016

Thanks, Tim.

But how can I change the settings in VS to use all of the cores/threads, so that I see maximum CPU usage in Task Manager during the run?

John_Campbell · ‎06-15-2016

As Tim has indicated, you have two main approaches to speed up your program, but only one will increase the %CPU usage.

Vectorizing is by far the easier to achieve, but you typically need inner loops to be written so that vectors of calculation can be processed in SSE or AVX registers. This approach does not increase %CPU but does use better instructions to speed up the program. It can be easily achieved by using the compiler options /O2, /Ofast or /Qx<code> (for example /QxHost). Vectorizing is best thought of as improving the speed of inner loops.
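
As a minimal sketch of what "written so that vectors of calculation can be processed" can mean, the program below contrasts a loop whose iterations are independent with one that carries a dependency from one iteration to the next. The array names and sizes are made up for illustration, and whether a given compiler actually vectorizes either loop depends on the options used.

```fortran
program vector_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 100000      ! hypothetical problem size
  real(dp) :: a(n), b(n), c(n)
  integer :: i

  do i = 1, n
     a(i) = real(i, dp)
     b(i) = 2.0_dp
  end do

  ! Good candidate for vectorization: every iteration is independent
  ! and the arrays are accessed with unit stride.
  do i = 1, n
     c(i) = a(i) * b(i) + 1.0_dp
  end do

  ! Poor candidate: iteration i needs the result of iteration i-1,
  ! so the iterations cannot simply be processed several at a time.
  do i = 2, n
     c(i) = 0.5_dp * c(i-1) + a(i)
  end do

  print *, 'c(n) =', c(n)
end program vector_sketch
```

Neither loop changes the %CPU shown in Task Manager; the first one can simply finish sooner when it is vectorized.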

Multi-threaded or OpenMP coding will increase the %CPU, but it can involve more changes to your code to make it suitable. There are also run-time overheads for starting an !$OMP parallel region, so you should place the !$OMP directives around outer loops, rather than having the parallel region restarted many (thousands of) times inside nested DO loops or routine calls. Multi-threading is best thought of as running large groups of code in parallel in multiple threads, where these groups of calculation can act independently on separate parts of memory. You need to modify your code to implement this approach.
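
A rough sketch of placing the directive around the outer loop is given below. The grid arrays `u` and `unew` and the averaging stencil are hypothetical stand-ins for a real computation. It assumes the code is built with OpenMP enabled (/Qopenmp with Intel Fortran on Windows); without that option the directives are treated as comments and the program runs serially.

```fortran
program omp_outer_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nx = 400, ny = 400   ! hypothetical grid size
  real(dp), allocatable :: u(:,:), unew(:,:)
  integer :: i, j

  allocate(u(nx,ny), unew(nx,ny))
  u = 0.0_dp
  u(:,1) = 1.0_dp          ! simple boundary value along one edge
  unew = u

  ! One parallel region around the OUTER loop: the threads are started once
  ! and each thread handles a block of columns j. Each thread writes only
  ! its own unew(:,j) and only reads u, so the threads do not interfere.
  !$omp parallel do private(i) shared(u, unew)
  do j = 2, ny-1
     do i = 2, nx-1
        unew(i,j) = 0.25_dp * (u(i-1,j) + u(i+1,j) + u(i,j-1) + u(i,j+1))
     end do
  end do
  !$omp end parallel do

  print *, 'sample value unew(2,2) =', unew(2,2)
end program omp_outer_sketch
```

Putting the directive on the inner `i` loop instead would restart the parallel work once for every value of `j`, which is the overhead described above.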

While Tim indicated Intel Fortran has effective auto-parallel (/Qparallel) and auto-vectorizing (/O2 or /Ofast) compile options, you need some understanding of what these approaches are trying to achieve. The auto features are only effective for suitable coding approaches and do little for code that is not suitable. You will probably have to change your coding approach to get the best out of them, as neither method can be applied to all coding approaches.

Do some more reading and use some of the simpler examples that are provided to understand what can be achieved.

I think this is a very good question as the goal of Intel Fortran needs to be a more effective auto-parallel and auto-vectorizing solution, directed at users that are not familiar with the detail of what is required.

andrew_4619 · ‎06-15-2016

%CPU use is one measure, but what is more important is your actual run time. CPU use might be low because (for example) the bottleneck in your program is reading/writing data to files. You need to understand what the bottlenecks in your program are. You could consider using tools such as Intel VTune to analyse your program and find what the bottlenecks actually are.
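
Before reaching for a profiler, a first rough check is to compare elapsed (wall-clock) time with CPU time for a section of code. The sketch below uses the standard `system_clock` and `cpu_time` intrinsics; the summation loop is only a placeholder for real work. If the wall time is much larger than the CPU time, the section is probably waiting on something such as file I/O.

```fortran
program timing_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: i8 = selected_int_kind(18)
  integer(i8) :: count0, count1, rate
  real(dp) :: cpu0, cpu1, s
  integer :: i

  call system_clock(count0, rate)   ! wall-clock ticks and ticks per second
  call cpu_time(cpu0)               ! processor time used so far

  ! Placeholder workload (hypothetical); replace with the section to measure.
  s = 0.0_dp
  do i = 1, 20000000
     s = s + sqrt(real(i, dp))
  end do

  call cpu_time(cpu1)
  call system_clock(count1)

  print *, 'result       :', s
  print *, 'CPU time  (s):', cpu1 - cpu0
  print *, 'wall time (s):', real(count1 - count0, dp) / real(rate, dp)
end program timing_sketch
```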

Changes in algorithm/program structure (if feasible) can often yield far bigger performance increases than vectorisation/parallelisation.

andrew_4619 · ‎06-15-2016

I should also have noted your question is very very general and the answers given can only be general. You might want to give some more specifics about your program.

milad_m_ · ‎06-16-2016

Thanks, all. It is better to ask my question this way.

This is a simple code:

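program timing_test
implicit none
double precision t_start,t_end,c
c=0
call CPU_TIME(t_start)   ! CPU time before the loop

do while (c<300000)
c=c+1
print*,c
enddo

call CPU_TIME(t_end)     ! CPU time after the loop
print*,'time',t_end-t_start
pause
end program timing_test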

Run time = 2.95 s and CPU usage is 25%.

The Fortran optimization setting is: Maximize Speed plus Higher Level Optimizations (/O3)

So how can I run this serial code in less time (with changes to the code)?

Is there any setting in VS available to speed it up?

My real source code, used for fluid dynamics, takes about 3 min per run, so 100 runs take about 300 min. So my concern is to make it run in as little time as possible.

Can changing the number of processors speed up a serial code, or must the code be parallel to reach this goal?

Because I'm new to VS, I would appreciate simple references and opinions.

Thanks

jimdempseyatthecove · ‎06-16-2016

The print *,c will make your program I/O bound, as well as serialize that section (print statement) should your (actual) code be parallelizable.
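
To illustrate the point about I/O, here is a hedged variant of the loop above that reports progress only occasionally rather than on every iteration. The reporting interval of 100000 is an arbitrary choice for illustration.

```fortran
program io_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: c, t0, t1
  integer :: i

  call cpu_time(t0)
  c = 0.0_dp
  do i = 1, 300000
     c = c + 1.0_dp
     ! Print only every 100000th iteration so that console output
     ! no longer dominates the time spent in the loop.
     if (mod(i, 100000) == 0) print *, 'progress:', i
  end do
  call cpu_time(t1)

  print *, 'final c:', c
  print *, 'loop CPU time (s):', t1 - t0
end program io_sketch
```

With the output removed from the hot loop, what remains is the actual computation, which is the part worth timing and optimizing.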

Your code above is a serial application; it will consume at most 100% of one of the hardware threads on your system. Apparently there are 4 hardware threads available, which is why Task Manager shows about 25%.

In fluid dynamics problems you generally have a 3D (or 2D) model where you:

iterate{compute forces, perform movements}

You obtain performance improvements through parallelization. While auto-parallelization can be used over portions of this problem, specific parallelization will be more effective. What you generally do is partition the model space into sub-spaces that each hardware thread can work on individually. A caution (or should I say, a learning experience) for new parallel programmers: the boundaries between sub-spaces require special care so that you do not get multi-thread race conditions when updating, for example, the net force.

There are several examples of N-body problems where you can see different ways to avoid (or protect against) race conditions at sub-section boundaries.
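
One simple way to avoid such a race, sketched below under made-up assumptions (1D particle positions in `x` and a toy inverse-square interaction), is to let each thread accumulate only the force on its own particles. No two threads ever write the same element of `f`, at the cost of computing every pair twice. This is only one of several possible strategies, and it assumes an OpenMP build (/Qopenmp with Intel Fortran on Windows).

```fortran
program nbody_force_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 500         ! hypothetical particle count
  real(dp) :: x(n), f(n), d
  integer :: i, j

  do i = 1, n
     x(i) = real(i, dp)                 ! hypothetical 1D positions
  end do

  ! Each iteration i writes only f(i), so threads never update the same
  ! element. Updating f(j) as well inside this loop (to reuse the pair)
  ! would let two threads write one element at once: a race condition.
  !$omp parallel do private(j, d) shared(x, f)
  do i = 1, n
     f(i) = 0.0_dp
     do j = 1, n
        if (j /= i) then
           d = x(j) - x(i)
           f(i) = f(i) + sign(1.0_dp, d) / (d*d)   ! toy interaction
        end if
     end do
  end do
  !$omp end parallel do

  print *, 'force on first and last particle:', f(1), f(n)
end program nbody_force_sketch
```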

Jim Dempsey

Steven_L_Intel1 · ‎06-16-2016

This is a pretty silly program to be trying to optimize. Almost all of the elapsed time will be taken up by the PRINT. The value of C after the loop is a constant.

Don't waste your time trying to analyze performance of toy programs. Concentrate instead on your real application.

As a general comment, there is a /fast option that applies a set of options that usually improves performance. I don't think there's a single property for this but you could look at the description in the documentation to see what it does. Among these are /QxHost (Optimize for host processor) and /Qipo (Whole-program interprocedural optimization).

milad_m_ · ‎06-16-2016

Thanks, Jim and Steve.

I will follow your advice about the code and settings.