Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
28381 Discussions

Problem with variable that change their values without apparent reason

Matteo
Novice
7,779 Views
I would propose a part of my code:INTEGER :: n,a,b
REAL*8 :: c
LOGICAL, ALLOCATABLE, DIMENSION(:,:) :: pop, new_pop
LOGICAL, ALLOCATABLE, DIMENSION(:,:,:;:) :: position
!
OPEN(10, FILE = 'printout.txt')
n = 150
ALLOCATE(pop(1:n,1:n))
ALLOCATE(new_pop(1:n,1:n))
ALLOCATE (position(1:n,1:n,1:n,1:n))
pop(:,:) = .FALSE.
position(:,:,:,:) = .FALSE.
! some assignation to pop matrix
! position should trace the assignation, but it is not important for this example
pop(2,5) = .TRUE.
pop(75,1) = .TRUE. !...ans so on
DO a=1,n
DO b=1,n
IF (a.gt.100.and.b.lt.50) THEN
CYCLE
ELSE
CALL random_number(c)
IF (x.gt.0.5d0) THEN
new_pop(a,b) = .TRUE.
ELSE
new_pop(a,b) = pop(a,b)
END IF
END IF
END DO
END DO
! I would write that
DO a=1,n
WRITE(10,1000) pop(a,:)
END DO

It is old fashion, but it should work.
Inside the two DO cycle "pop" is never on the left side of the assignment.
Anyway, when I try to write, "pop(128,1)" doesn't report the correct value. In fact, it should have the initial value .FALSE., but in debug I found .TRUE.

I fear it can be linked to the memory.
In fact, I use a 32bit compiler on Windows 7 (Intel Fortan 10.x) and I fear Fortran can make some mess with the memory management (these are the only variables and the arraies are very large).
I should set better my compiler?

Regards,
Matteo
 
Labels (2)
0 Kudos
42 Replies
Matteo
Novice
2,061 Views
Hi @mecej4,
thank you for your improvements.
I am replying by my mobile now, buy I am going to run your code as soon as possible.
You suggested to improve my algorithm before managing the code.
I accept your advice surely.
Anyway I opened this post not to improve the code speed, but to understand the issue that change the pop values where this variable is not called at left side of the assignment.
When I will be sure the code works properly, I will have the time to move my mind to optimization activities
0 Kudos
mecej4
Honored Contributor III
2,050 Views

Matteo,

I look forward to hearing from you regarding whether the modified code results appear correct to you.

Regards.

0 Kudos
Matteo
Novice
2,022 Views

Hi @mecej4 ,

thank you for the fantastic job. 
Now the code runs as Usain Bolt.
Let' me to ask for some clarifications in order I can learn something.

  1. integer, parameter :: nvm = 70  ! maximum number of connections for each (a,b)

    Why did you fix the maximum number of connection to 70?

  2. nv = nvir(a,b)

    I understood nvir track how many contacts pop(a,b) has added in this day. It should be equal to (usc+1) for any values at the end of j-day. Is it right?

  3. IF (r(2).ge.ymin.and.r(2).le.ymax) THEN
                               nv = nvir(a,b)
                               found = .false.
                               if(nv /= 0)call search(r1(1,a,b),r(1),r2(1,a,b),r(2),nv,found)
                                  added=.false.
                                  if(.not.found)then
                                     added = .true.
                                     nv = nv+1
                                     if(nv > nvm) then
                                        print *,'At a,b = ',a,b
                                        stop 'nvir1 limit'
                                     endif
                                     nvmu = max(nvmu,nv)
                                     nvir(a,b) = nv
                                     r1(nv,a,b) = r(1)
                                     r2(nv,a,b) = r(2)
                                  endif
                                  nv = nvir(r(1),r(2))
                                  found=.false.
                                  if(nv /= 0)call search(r1(1,r(1),r(2)),a,r2(1,r(1),r(2)),b,nv,found)
                                  if(.not.found)then
                                     nv = nv+1
                                     if(nv > nvm) then
                                        print *,'At r1,r2 = ',r
                                        stop 'nvir2 limit'
                                     endif
                                     nvmu = max(nvmu,nv)
                                     nvir(r(1),r(2)) = nv
                                     r1(nv,r(1),r(2)) = a
                                     r2(nv,r(1),r(2)) = b
                                     added=.true.
                                  endif

    I hope I understood your smart logic. In this way you scroll a shorter array.
    I would to be sure I explained correctly the sense of this check in my mind.
    For each day, pop(a,b) can visit only one time pop(r(1),r(2)).
    If pop(a,b) has already visited pop(r(1),r(2))pop(r(1),r(2)) cannot visit pop(a,b) during the same day.

  4.                                   IF (c.le.cont_rate) THEN
                                      ! Calcolo R0 e propago virus
    								     pop_new(a,b) = .TRUE.
    									 pop_new(r(1),r(2)) = .TRUE.
                                         IF ((.not. pop(a,b)).and.(immuni(a,b).eq.0)) THEN
                                                R0(r(1),r(2)) = R0(r(1),r(2)) + 1
                                                !pop_new(r(1),r(2)) = .FALSE.    !CNXS add?
                                         ELSEIF ((.not. pop(r(1), r(2))).and.(immuni(r(1), r(2)).eq.0)) THEN
                                                R0(a,b) = R0(a,b) + 1
                                                !pop_new(a,b) = .FALSE.          !CNXS add?
                                         END IF
                                      END IF

    I did a modification: both pop(a,b) and pop(r(1),r(2)) switch to .TRUE., so we are sure it works.

  5. You improved confina. It should be faster now, true?
  6. You modified cosd in cos. Is it better not use trigonometric function with degree arguments?

About the results.
At the beginning of the day the 4 infected people are:

Cattura.PNG

So, in my algorithm no pop inside the square (100:150, 1:51) should be infected, since no pop equal  .TRUE. is in that box from the beginning and "external" pop cannot go inside that block.
Anyway, in res.txt you can find a 1 value for pop(150,13).
I sowed some breakpoints with condition.
In particular:

 

CALL random_number(c) 
!breakpoint for ((a.eq.150).and.(b.eq.13)).or.(r(1).eq.150).and.(r(2).eq.13))

 

and I found:

Cattura_2.PNG

I don't understand why pop(r(1),r(2)) is .TRUE.

Anyway, you can find in attachment the whole project.
You find another breakopint, similar to the previous one in:

 

if(mod(k,5) == 0)then
!(a.eq.150).and.(b.eq.13)

 

but it is not triggered before the previous one.
I modified something: in prt_pop, pop was not a logical variable and I insert the writing of some file, which I need for my plots.
The "res" files that you can find have been generated by the Release configuration. It seems they found different results, but the same issue.

Regards

 

 

0 Kudos
mecej4
Honored Contributor III
2,009 Views

Matteo,

I think that there are some errors in the Monte Carlo procedure.

  • There is confusion between the numbering of nodes and grid lines. If you have a grid of n X n squares, you would have used (n+1) X (n+1) grid lines to form those squares. The squares have subscripts (1:n,1:n), but the grid points should have subscripts (0:n,0:n) -- or, if you choose, (1:n+1,1:n+1). One evidence of this confusion: The call to linspace() returns X = (1, 51, 100, 150) for the bounding coordinates of the 9 sections. Should that be (0,50,100,150)? If you have nonuniform spacing, the Monte Carlo results will be affected by this bias. However, with a little careful work, making the distinction between grid line coordinates and node numbers, this defect can be fixed.
  • The selection of a new point is  made using a pair of random numbers -- the first one normally distributed, which is scaled (by a factor of ~41) to produce a radius, and a second one, uniformly distributed, to produce an angle between 0 and 2 Pi. You take the absolute value in obtaining the radius, but that is not enough. If the point (a,b) is close to an edge or a corner of the limiting box, adding (r cos Phi, r sin Phi) to (a,b) may well put the new position in a different limiting box, or even outside the overall grid. This is a serious issue, and I do not know enough about the physical model to suggest a fix.

I added a few lines of code to monitor crossing the 9 box boundaries, and found that a violation occurred at the end of day 15, at node (a,b) = (36, 51). This violation precedes the one that you noticed.

I prefer to ask the computer to do such checks by adding code to the program rather than using the debugger and setting breakpoints, which is much more time consuming as well as running the risk of letting errors go unnoticed because, in reality, we have to watch all 150 X 150 values in a number of arrays for each day.

There are a number of your questions regarding the code that I have yet to answer, but I think that they are of less importance and can be answered quite easily later.

Do you have a verbal/mathematical description of the problem and the procedure? I should prefer a precise-in-Italian version over a vague-in-spotty-English version. Having such a description would allow me to overcome the handicap of guessing the problem and algorithm based on 

0 Kudos
mecej4
Honored Contributor III
2,007 Views

Last bit, which got cut off, and posted in a new post because the forum software does not let me edit my posts:

… based on reading the Fortran program.

0 Kudos
Matteo
Novice
1,997 Views

@mecej4 ,

my reply: 

  • I didn't catch the point.
    Ok, linspace could work differently, but that's not the origin of the problem.
    In fact, it doesn't matter how the boundaries are created and their shape.
    The important aspect is they are not violated during the computation.

  • Yes. It's a polar formula to create a displacement vector to move (a,b,) to (r(1),r(2)).
    This displacement vector can push (a,b) out of the matrix (n,n), of course.
    For this reason, at each time, two IF check r(1),r(2) is inside, at least, in the (n,n) (i.e. xmin,xmax,ymin,ymax). 
    Isn't it computationally efficient? Ok, but I can be satisfied at the moment

Sorry for my poor knowledge of English.
I don't have even a draft of the algorithm at the moment, since it is not for professional use. I will write something in Italian in the next few days and attach it.

Regards.

0 Kudos
mecej4
Honored Contributor III
1,989 Views

Matteo, 

I have found the reason for POP being set to .TRUE. at some nodes in zones where that is not expected.

Consider this part of the code:

! Calcolo R0 e propago virus
   IF ((.not.pop(a,b)).and.(immuni(a,b).eq.0)) THEN
          R0(r(1),r(2)) = R0(r(1),r(2)) + 1
          pop_new(a,b) = .TRUE.
   ELSEIF ((.not. pop(r(1), r(2))).and.(immuni(r(1), r(2)).eq.0)) THEN
          R0(a,b) = R0(a,b) + 1
          pop_new(r(1),r(2)) = .TRUE.
   END IF

Consider what happens the IF clause is evaluated when (a,b) is in one of the "forbidden" zones. We have already compared a with (xmin, xmax) and b with (ymin, ymax), and pop(a,b) will be .FALSE. However, if immuni(a,b) == 0, that THEN..ELSEIF block will be executed, and pop_new(a,b) will be set to .TRUE. Similarly for pop_new(r(1),r(2)) in the ELSEIF..ENDIF block.

After the DO a = 1,n and DO b = 1,n loops are done, we are going to copy pop_new to pop, and the inconsistency will have been created.

Thus, there is an inconsistency between the expectation that a zone with pop=.F. everywhere inside that zone will remain infection-free, and the update/evolution rules in the block of code above.

I hope that these observations will help you to improve the update rules so as to preserve the infection-free zones as such.

I tested a modification of the code in which the IF condition also includes "and (a,b) is not in an infection-free zone" and the ELSEIF condition also includes "and (r(1),r(2)) is not in an infection-free zone". The code ran more slowly (1 minute instead of a few seconds), because the number of rejected steps increases quite a bit. Furthermore, this modification may be equivalent to "sweeping the dirt under the carpet".

0 Kudos
Matteo
Novice
1,977 Views
Hi mecej4,

Thank you for your reply.
Maybe my IF system is wrong.
Theoretically a,b and r are checked to be in their own box before to propagate the infection.
Now I am far from my PC, but I will check asap
0 Kudos
mecej4
Honored Contributor III
1,581 Views

Matteo,

I tweaked the logic for keeping the current location from crossing the section boundaries, and added code to produce JPEG files for the bitmaps. The source code and the composite of the 60 Jpeg files (for days 1 to 60) are attached. I used Silverfrost FTN95 and Clearwin graphics to do the Jpeg encoding.

To run the 60-day simulation and output the Jpeg files took about 10 s of CPU time on an i7-2720.

Spread of infection, Days 1 to 60Spread of infection, Days 1 to 60

0 Kudos
Matteo
Novice
1,971 Views

@mecej4 ,

I think to have just found the solution.
I am organizing better this code, when I modified this IF condition.

IF ((r(1).gt.xmin.and.r(1).lt.xmax).and.(r(2).gt.ymin.and.r(2).lt.ymax)) THEN

Before this modification, as you remember, I applied .ge. and .le. 
In this way infected pop on the boundaries, exactly on the boundaries, can cross the infection between two sectors.
I don't know why it escaped from previous debugging.
Now it seems the code works properly and I payed attention to the pop initialization to verify not infected people is living on the boundaries.

At day 60, only 4 sector are contaminated, while the other sectors are infection free.
Now it is late, but I could run strictly test tomorrow.

In the attachment you can find the final version of the code.
If you want to run it, you could confirm my suppositions.
Anyway thanks a lot for the support that you gave in these days.

Regards.

Matteo

 

0 Kudos
mecej4
Honored Contributor III
1,954 Views

Matteo,

These errors are present in your poptrack_rev1.f90:

1. The order of the subscripts of arrays R1 and R2 is not consistent between the main program and subroutine SEARCH. It is more efficient to have the order (seq_no, x-coord, y-coord).

2. In SEARCH, you are replacing the last entries instead of adding two new entries, since you did not update nvir(a,b) and nvir(r(1),r(2)).

I fixed these errors, and added code to trap attempts to set pop(:,:) at locations within blocks that are required to be "uninfected", i.e., those that have no nodes within them where pop = .TRUE. I have also cleaned up the formatting of the code, and made the subroutines CONTAINed subroutines. The source file is attached.

The algorithm error is not only still present, but occurs quite early in the run for this version of the code. The error is in the lines

!
      IF (c.le.cont_rate) THEN
! Calcolo R0 e propago virur
         IF ((.not. pop(a,b)).and.(immuni(a,b).eq.0)) THEN
            pop_new(a,b) = .TRUE.
            R0(r(1),r(2)) = R0(r(1),r(2)) + 1
         ELSEIF ((.not. pop(r(1), r(2))).and.(immuni(r(1), r(2)).eq.0)) THEN
            pop_new(r(1),r(2)) = .TRUE.
            R0(a,b) = R0(a,b) + 1
         END IF
      END IF

It is not enough to check that pop(a,b) or pop(r(1),r(2)) is false before updating pop_new. We must also check and ensure that no other node in the same block as (a,b) or (r(1),r(2)) has pop = true.

0 Kudos
mecej4
Honored Contributor III
1,951 Views

CORRECTION: 

Please change

"We must also check and ensure that no other node in the same block as (a,b) or (r(1),r(2)) has pop = true."

to 

"We must also check and ensure that at least another node in the same block as (a,b) or (r(1),r(2)) has pop = true."

0 Kudos
Matteo
Novice
1,941 Views

Hi @mecej4 ,

concerning r1 e r2 dimension.
In the PROGRAM is so identified:

 

allocate(nvir(1:n,1:n), r1(1:nvm,1:n,1:n), r2(1:nvm,1:n,1:n),stat=error(8))

 

and inside the SUBROUTINE is so defined:

 

integer(kind=2), dimension(1:nvm,1:n,1:n), intent(inout) :: r1,r2

 

I think it respect your suggestion.

In search there was the bug you found. In fact, I spotted that only after the previous post issuing.
Not it is correct.

About this part of the code:

IF (pop(a,b).eqv.pop(r(1),r(2))) THEN ! entrambi infettati o no
                                CYCLE
                               END IF
                               CALL random_number(c)
                               IF (c.le.cont_rate) THEN
                                  ! Calcolo R0 e propago virur								   								 
                                     IF ((.not. pop(a,b)).and.(immuni(a,b).eq.0)) THEN
									        pop_new(a,b) = .TRUE.
                                            R0(r(1),r(2)) = R0(r(1),r(2)) + 1
                                     ELSEIF ((.not. pop(r(1), r(2))).and.(immuni(r(1), r(2)).eq.0)) THEN
											pop_new(r(1),r(2)) = .TRUE.
                                            R0(a,b) = R0(a,b) + 1
                                     END IF
                               END IF

 

You suggested further check.
As I said, pop(r(1),r(2)) is checked to be in the same sector/block of pop(a,b) at the beginning of IF lists.
(I corrected the check and it seems to work greatly).

If both are TRUE, they are already infected, so they cannot infect each other.
If both are FALSE, they are sane and they cannot infect each other.
So, the random propagation is computed only when a person is sane and the other no.
After that, I don't know who is infect. Anyway, if pop(a.b) is FALSE, it means pop(r(1), r(2)) is TRUE and it infects pop_new(a,b), pop(a,b,) of the next day.
Vice-versa, if pop(r(1),r(2)) is FALSE.

I attached the source code.
If you plot res_0060.txt you will find free infection sectors.

0 Kudos
mecej4
Honored Contributor III
1,934 Views

Matteo,

poptrack_rev2.f90 still has errors.

In SEARCH, there is this declaration:

integer(kind=2), dimension(1:nvm,1:n,1:n), intent(inout) :: r1,r2

which also agrees with the declaration (and allocation) in the main program.

However, in SEARCH r1 and r2 are used with inconsistent subscript order:

if (r1(a,b,i).eq.r(1).and.r2(a,b,i).eq.r(2)) then

The correct version is:

if (r1(i,a,b).eq.r(1).and.r2(i,a,b).eq.r(2)) then

There are four additional lines in SEARCH where the same kind of error occurs.

You may not see any adverse effects caused by these errors on the results because r1 and r2 have no elements referenced elsewhere (i.e., only the whole arrays are referenced). If, another day, you reference r1 or r2 with subscripts, in the main program or a different subroutine, there will be major errors. Secondly, the inconsistency is unseemly and unnecessary.

Unrelated question: What software do you use to visualize the output files?

0 Kudos
Matteo
Novice
1,927 Views
Hi, you are right.
I made a typing error last night for that index order.
I am fixing that.

I use some Matlab lines to plot blue scatter for zero values and red ones for one values
0 Kudos
JohnNichols
Valued Contributor III
1,904 Views

Permit me to drive home an important conclusion: Optimize the algorithm before you optimize the code.

I have not been following this post, but it was interesting to read, as usual mecej4 provided a pithy statement of truth. 

This type of problem is interesting -- modelling the contact problem is challenge, I have some starting code that I was looking at,  I was wondering if you have a 150 by 150 array and some are true and some false, add an extra element - if all are true on a row or a column set the last element to true, I mean once someone is in a dead group they do not leave?  Excluding a miracle and we will leave miracles to Python.  

0 Kudos
Matteo
Novice
1,892 Views

@JohnNichols ,

if you have followed the whole discussion, you should know this is a prototype.
I did not ask for optimize my code (e.g. having a faster code), but I wanted a support to find a bug in my code.
@mecej4  helped me in this task, with interesting suggestion to not use too much memory (despite that fact the solution doesn't cover worst case, but it works, so...).
In any case, if I had a 64bit compiler I did not have any problems in this sense.

I did not catch the point on the last part of your post.
What do you mean for "dead group"? The code doesn't consider this event and, instead, after some days the pop pass from TRUE to FALSE.

I think Fortran is a great language that it has nothing to envy to more recent languages.

0 Kudos
JohnNichols
Valued Contributor III
1,887 Views

1. I read the entire set of posts.

2. mecej4 often comments on the algorithmic needs -- he has said this recently to me on a Number Theory problem - it was an observation

3. All code is prototype even when it is released, there is always a subsequent version 

4. If you are doing a pandemic type model then there is a dead group, once you are in the dead group we need no longer search on your code data

5. IF you have a pandemic or epidemic type study and you want to do all the population then you can start to break it down into households, if your mother has the disease you are likely to get the disease, the households belong to streets, if there is flu on Elder Street you need to worry about the people on Elder Street and the shops they visit.  So this is easier than modelling all of the people as individual boxes. Relationships are important, we know if a Fungal infection has occurred in a hospital in March then there is a higher probability we will get one in April, but if April is disease free then May's chance reduce.  A simple statistical analysis - the cell phone data has helped with the COVID modelling but it is not perfect, people like me never take their cell phone with them - my 13 year old daughter does, so if you are modelling the owner you are out of luck.

6. This is a fairly relaxed bunch of people, it pays to stay relaxed 

7. I was not really interested in your problem I was merely making some observations,  they are for everyone who reads these posts not just the first person who started the post.  

8. There are always discussions about languages, the real trick is the ability of the programmer and the time for the solution, there is a reason Intel Fortran is on most supercomputers and Python less so -- it can take Python 5 hours to make an FEM model in RHINO and 2 minutes in Fortran. 

Stay well and merry xmas

0 Kudos
Matteo
Novice
1,877 Views
Hi John,
I am not a virology expert, so I cannot create a realistic model. My code is only an educative proprotype to test how some variables can impact the spread of a disease.
That's all. I let more realistic models are managed by people which has that capabilities to deal with this topic.

It is a prototype since it has not requriments in terms of speed, memory usage and so on.
0 Kudos
JohnNichols
Valued Contributor III
1,885 Views

Matteo:

You are to be congratulated for the work.  I think there are some interesting ideas.   The problem is the virus people are not programmers and often their statistics is suspect.  The reliance on the 1927 model with a million minor variations shows how old things can last forever, the IHME model has a number of challenges.  If I am modelling an ODE with a sixth order polynomial I have a problem. 

I was sent an article this morning, one of the referenced articles suggested the covid death rate increase with age was exponential, although a linear model provided the best R squared for the age > 60.   I prefer people like you who offer an example and then can listen to the likes of mecej4.  

The interesting argument is whether there are better programmers than this small group, excluding myself,  in this thought I balance the utility of their work and offering to the community, so someone like the DegenerateConic has some features, but his work is hidden - so it is hard to know - although he does have a few interesting things. 

So do not give up.  And remember mecej4 will continue to offer dry comments that often strike like a sledge hammer, but he means well and is always right as are the really good ones. 

JMN

 

 

0 Kudos
mecej4
Honored Contributor III
1,608 Views

John,

Matteo did say very early that he was more interested in tracking down why some aspect of the computed result did not seem correct, not optimization.

Regardless, this topic provides a good example of how a slow algorithm may hinder debugging and, conversely, a faster algorithm or computer can facilitate debugging. 

The original algorithm (linear search through a single long list of all previously visited locations) was such that simulating a 60-day infection propagation took 6 hours on an i7-2720. With such a long run, if one wants to use a symbolic debugger to catch bugs in the program, when a bug is spotted on, say, day 59, and one wishes to know "what happened at node xx, yy on day 59-1 in subroutine zz?", one has to restart the program with new breakpoints and wait another 6 hours to find the answer.

Similar issues occur if one uses PRINT statements instead of a symbolic debugger.

I was able to suggest a faster search algorithm, with which Matteo and I were able to find and fix a number of bugs in the program and to learn some characteristics of the infection spread model. It is far easier to find bugs and try variations to the model and the algorithm when the run can be completed in 0.6 minutes rather than in 6 hours!

0 Kudos
Reply