Intel® Fortran Compiler

Stepwise calculation for large datasets in Fortran

bhvj
Beginner

Hi,

I am trying to do a stepwise (day-by-day) calculation in a Fortran program, applying an equation to the attached datasets (each dataset has 18262 values, one per day). The output would be a new dataset in which the two inputs are combined per the equation (for example, 0.5*TC1 + 7*TC2) for each day.

What would be the best way to approach this? Would it be a good idea to create two arrays of size 18262 for the input, read the data into them, and then create a third array of size 18262 to hold the output?

Any suggestions or insight into this would be greatly helpful.

Thank you.

 

jimdempseyatthecove
Honored Contributor III

And then after you have the 18262 results, what do you intend to do with them?

If the answer is nothing but write them out, then the answer would be to:

read line (header) of tc1 and verify it is as expected
read line (header) of tc2 and verify it is as expected
loop:
read line of tc1, exit loop on EOF, scan for TAB character, remember position
read line of tc2, error if EOF, scan for TAB character, remember position
if text to left of tab in line of tc1 not same as text to the left of tab in line of tc2 error
(optional, if texts do not look like dates, error)
(optional, if gap in dates, error)
convert text to right of tab in line of tc1 to REAL being careful to note you have 0's not 0.'s, error when not number
convert text to right of tab in line of tc2 to REAL being careful to note you have 0's not 0.'s, error when not number
perform operation on the two reals, producing result
write line to output file in desired format
goto loop

Note, tab formatted files are generally not desired in Fortran. If the results file is intended to be exported to Excel, you can insert tab characters (or you could use commas).
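
A minimal Fortran sketch of the loop above, under several assumptions that are not in the thread: each input file has exactly one header line, every line fits in 256 characters, and the names tc_stream, split_line and combine_files are made up for illustration. Header verification and the optional date checks are left out, and a trailing blank line in either file is treated as an error.

```fortran
module tc_stream
  implicit none
  private
  public :: split_line, combine_files
contains

  ! Split "date<TAB>value" into the date text and the numeric value.
  subroutine split_line(line, date, val, ok)
    character(len=*), intent(in)  :: line
    character(len=*), intent(out) :: date
    real,             intent(out) :: val
    logical,          intent(out) :: ok
    integer :: p, ios

    date = ''
    val  = 0.0
    ok   = .false.
    p = index(line, achar(9))                 ! position of the TAB
    if (p <= 1 .or. p >= len_trim(line)) return
    date = line(1:p-1)
    ! List-directed input reads "0" as well as "0." into a REAL.
    read(line(p+1:), *, iostat=ios) val
    ok = (ios == 0)
  end subroutine split_line

  ! Stream both files line by line and write w1*tc1 + w2*tc2 per date.
  ! stat = 0 on success, nonzero on any open, read, or mismatch error.
  subroutine combine_files(file1, file2, outfile, w1, w2, stat)
    character(len=*), intent(in)  :: file1, file2, outfile
    real,             intent(in)  :: w1, w2
    integer,          intent(out) :: stat
    character(len=256) :: line1, line2        ! assumed maximum line length
    character(len=32)  :: d1, d2
    real    :: v1, v2
    logical :: ok1, ok2
    integer :: u1, u2, uo, ios

    open(newunit=u1, file=file1, status='old', action='read', iostat=stat)
    if (stat /= 0) return
    open(newunit=u2, file=file2, status='old', action='read', iostat=stat)
    if (stat /= 0) then
      close(u1)
      return
    end if
    open(newunit=uo, file=outfile, status='replace', action='write', iostat=stat)
    if (stat /= 0) then
      close(u1)
      close(u2)
      return
    end if

    ! Skip one header line per input (header verification is omitted here).
    read(u1, '(a)', iostat=stat) line1
    if (stat == 0) read(u2, '(a)', iostat=stat) line2

    do while (stat == 0)
      read(u1, '(a)', iostat=ios) line1
      if (ios /= 0) then
        if (.not. is_iostat_end(ios)) stat = ios   ! genuine read error
        exit                                       ! otherwise normal EOF
      end if
      read(u2, '(a)', iostat=stat) line2
      if (stat /= 0) exit                          ! tc2 is shorter than tc1
      call split_line(line1, d1, v1, ok1)
      call split_line(line2, d2, v2, ok2)
      if (.not. (ok1 .and. ok2) .or. d1 /= d2) then
        stat = 1                                   ! bad number or date mismatch
        exit
      end if
      write(uo, '(a,a,f0.4)') trim(d1), achar(9), w1*v1 + w2*v2
    end do

    close(u1)
    close(u2)
    close(uo)
  end subroutine combine_files

end module tc_stream
```

A caller would then use something like call combine_files('tc1.txt', 'tc2.txt', 'tc3.txt', 0.5, 7.0, stat) and check that stat is zero. Nothing is held in arrays, so this form only suits the case where the results are simply written out.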

Jim Dempsey

mecej4
Honored Contributor III

Unless this is an assigned task with the stipulation that you must solve it by writing Fortran code, you can use a variety of tools that can do trivial tasks such as this with much less code. Here is, for example, an AWK script ("gawkf") to do the job:

BEGIN {getline ; getline < "tc2.txt";} # header lines
{
  d1=$1; t1=$2;                                 # data in tc1.txt
  getline < "tc2.txt";
  d2=$1; t2=$2;                                 # data in tc2.txt
  if ( d1 != d2 ) {
     print "Error, dates do not match on line ",NR," ",d1," ",d2
     exit 1
     }
  printf("%10.10s %5.2f\n", $1,t1+t2);
  }
END {}

The command gawk -f gawkf tc1.txt > tc3.txt will place the results in file tc3.txt. If the fields are separated by tabs instead of blanks, add the option -F "\t" to the command.

bhvj
Beginner

Thank you Jim for your detailed explanation for the approach. I am still considering the above two options for the approach.

Attached is the Fortran code file in which I have implemented what I have understood so far. Could you please suggest how to proceed from here, so that I can better understand your explanation? Thank you once again for your valuable suggestions.

Thank you mecej4 for the shell script. Since this is going to be part of an overall Fortran program, I can use this shell script by calling it from the overall Fortran program using the SYSTEM function, right?

But this module has two further tasks, which I left out to keep the post simple. If both can be accomplished within this shell script, then the overall Fortran program could call this module using the SYSTEM function:

1. Is there a way within this shell script to look up a value (indexed by both row and column) from another table (tc_coeff.txt, as attached in "shell_script_tc_test.zip") and multiply it as a coefficient with tc1 and/or tc2 in the equation?

2. Is there a way within this shell script to compute a 5-day moving average of the output and get the final output as daily values?

Lastly, I tried to run the shell script, as attached ("shell_script_tc_test.zip"), in bash, and it gave me the following error (is this script specific to the shell it is run in?):

#./tc-test.scr: line 1: syntax error near unexpected token `}'
#./tc-test.scr: line 1: `BEGIN {getline ; getline < "tc2.txt";} # header lines '

As always thank you very much for your valuable suggestions.

 

mecej4
Honored Contributor III

As I clearly stated in #3, the script that I posted is an AWK/GAWK script, not a shell script. See http://en.wikipedia.org/wiki/AWK 

However, since you now describe the real goal to be much more complex than that stated in the original post, I think that you may be better off sticking to Fortran. Secondly, although it is wise for a beginner to start out with a simple program and then add more capabilities, there are often times when the design of the program is no longer fit to meet the added requirements, and a fresh start is indicated.

It is for this reason that some may refrain from answering questions such as "what is the best way..." if they suspect that the problem description is over-simplified and incomplete. Just as with my AWK script, a well-meant attempt to help will miss the mark.

bhvj
Beginner

Thank you for the clarification regarding the AWK/GAWK script. Knowing about this kind of tool, and your reference for it, is very helpful in the kind of research we do, because of its simplicity.

Most importantly, thank you for the suggestion regarding the approach for this task. Henceforth, I will try to include all the elements of the problem description in any request for help that I post.

As regards my two questions (looking up a value to use in an equation, and the moving average calcs), I believe I definitely need some help as to how to proceed. Any suggestions and pointers would be greatly helpful.

Attached is a spreadsheet whose calculations I am trying to code in Fortran. Similar calculations need to be done many times (50 to 200 times), and I would like to avoid using Excel macros.

You had previously helped me with reformatting an input (also 50 to 200 times) feeding to another Fortran executable.

I am working on using a shell script to do all this together in Linux (run the Fortran executable and these Excel calcs) 50 to 200 times.

The pieces I have yet to work out in Fortran are the lookup table and the moving average. Any suggestions and pointers would be greatly helpful.

mecej4
Honored Contributor III

You are not going to get the job done if you keep asking for ways to implement specific features that are present in Excel (or another such productivity destruction tool) using Fortran. There is no such one-to-one mapping, by and large. For example, table look-up is not a built-in feature of Fortran. On the other hand, the task that you have here is rather simple, can be implemented in about 100 lines of Fortran, and the program will take about 1 second to perform the task and output a file of moving averages for each input date. By today's standards, a 20,000 line data file is "small" rather than a "large dataset".
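
To give a sense of how little code the moving-average part needs, here is a minimal sketch. The thread does not say how the 5-day window is aligned, so this assumes a trailing window (the current day and the days before it) that is simply shortened for the first few days. The names moving_avg and trailing_mean are illustrative only.

```fortran
module moving_avg
  implicit none
  private
  public :: trailing_mean
contains

  ! Trailing moving average over a window of w values.
  ! out(i) is the mean of x(i-w+1:i); near the start of the series the
  ! window is shortened to the values that exist (assumed behaviour).
  pure function trailing_mean(x, w) result(out)
    real,    intent(in) :: x(:)
    integer, intent(in) :: w
    real :: out(size(x))
    integer :: i, lo, width

    width = max(w, 1)                ! guard against a non-positive window
    do i = 1, size(x)
      lo = max(1, i - width + 1)     ! first index inside the window
      out(i) = sum(x(lo:i)) / real(i - lo + 1)
    end do
  end function trailing_mean

end module moving_avg
```

With the combined daily values held in a REAL array, a call such as avg = trailing_mean(combined, 5) returns one averaged value per input day. For roughly 18000 values, re-summing each window is cheap enough that a running sum is not worth the extra code.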

My suggestion is that you improve your grasp of the steps that are needed. One way to do this is to prepare a small spreadsheet (no, not in Excel, but on real sheets of ruled paper spread on a large tabletop) with, say, ten rows, and do the calculations by hand. After you have done this, write code in Fortran to do the same thing. While you are doing this, have Fortran books and manuals readily available and consult them often.

At first, limit the program to processing a small number of rows. After the program runs correctly, alter it to process larger input data sets and incorporate checks on the correctness of the input data -- for example, if the coefficients are given for years 2001 to 2050, but the TC data contains xx/xx/1945, what should the program do?
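
The lookup and the range check described above can be sketched as follows. This assumes the coefficient table has already been read into an integer vector of years and a two-dimensional array of values with one row per year; the names coeff_lookup, lookup, years and coeffs are hypothetical. Instead of deciding what to do about a missing year, the routine reports it through a flag and leaves that decision to the caller.

```fortran
module coeff_lookup
  implicit none
  private
  public :: lookup
contains

  ! Look up coeffs(row, col), where row is the position of yr in years.
  ! found is .false. if yr is not in the table, col is out of range,
  ! or the two arrays disagree in their number of rows.
  subroutine lookup(yr, col, years, coeffs, val, found)
    integer, intent(in)  :: yr, col
    integer, intent(in)  :: years(:)
    real,    intent(in)  :: coeffs(:,:)     ! coeffs(row, column)
    real,    intent(out) :: val
    logical, intent(out) :: found
    integer :: row

    val   = 0.0
    found = .false.
    if (size(coeffs, 1) /= size(years)) return
    if (col < 1 .or. col > size(coeffs, 2)) return

    ! Linear search; a table of a few dozen years needs nothing faster.
    do row = 1, size(years)
      if (years(row) == yr) then
        val   = coeffs(row, col)
        found = .true.
        return
      end if
    end do
  end subroutine lookup

end module coeff_lookup
```

For a date such as xx/xx/1945 against a table covering 2001 to 2050, found comes back .false., and the calling program can stop with a message or skip the record, whichever the task requires.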

bhvj
Beginner

Your comments are extremely helpful. You described my situation exactly as it is. I will now focus on putting everything down on paper and writing a draft program first (even though it might not be efficient), making sure the results are correct, and then I'll consult you on ways to improve its efficiency. Your opinion that this task is rather simple makes me feel better. The rough idea of the number of lines of code, and of the time it should take to run, is also very helpful.

As always thank you very much for your valuable responses.

bhvj
Beginner

Hi

Attached are a draft program for the lookup table and the table itself. In this program I tried to create a subroutine with the following four arguments: 1. the first-column value ("yrin"), 2. the index value for the column header, 3. the first-column ("year") vector, and 4. the values array ("param"). I appear to be having trouble at line 12 and line 15, where I am trying to read in 1. and 2. from the table.

Any suggestions or pointers with this draft program will be greatly helpful.

Thank you.
