
Corona Virus Analysis

JohnNichols
Valued Contributor II
1,658 Views

DateRep,Day,Month,Year,Cases,Deaths,Countries and territories,GeoId,Pop_Data.2018


25/03/2020,25,3,2020,2,0,Afghanistan,AF,37172386
21/03/2020,21,3,2020,2,0,Cape_Verde,CV,543767
10/03/2020,10,3,2020,-9,1,Cases_on_an_international_conveyance_Japan,JPG11668,
2/03/2020,2,3,2020,0,0,Cases_on_an_international_conveyance_Japan,JPG11668,
1/03/2020,1,3,2020,0,0,Cases_on_an_international_conveyance_Japan,JPG11668,

The death data file for the Corona Virus is in the above format. I had a small play with the data in C# but ran into graphing problems, so I am translating the program into Fortran. There appear to be some interesting features in the FFT of the data, which I hope to publish to help the health statistics people.

Does anyone have a good idea for reading each line and then taking it apart into fields?

21/03/2020 -- ignore

The values from 21 through 0 are all integers. The name varies in length, the GeoId (CV here) is usually only 2 characters, and the population is an integer, but the population is missing from the JPG line and its id is not two characters.
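One possible way to take such a line apart, offered only as a sketch: split on commas with the INDEX intrinsic, then convert the numeric fields with internal reads. The program name, the helper split_csv, the field widths and the -1 marker for a missing population are all assumptions made for illustration, and the splitter assumes no field contains a quoted comma.

```fortran
program parse_line_demo
    implicit none
    integer, parameter :: i8 = selected_int_kind(12)   ! wide enough for any population
    character(len=200) :: iline
    character(len=64)  :: fields(9)
    integer :: nfields, day, month, year, cases, deaths, ios
    integer(i8) :: pop

    ! Sample record from the file above (population field is empty).
    iline = '10/03/2020,10,3,2020,-9,1,Cases_on_an_international_conveyance_Japan,JPG11668,'
    call split_csv(iline, fields, nfields)

    ! Field 1 (DateRep) is ignored; fields 2-6 are integers.
    read(fields(2), *) day
    read(fields(3), *) month
    read(fields(4), *) year
    read(fields(5), *) cases
    read(fields(6), *) deaths

    ! The population may be blank, so flag it with -1 (hypothetical marker)
    ! instead of letting the read fail.
    read(fields(9), *, iostat=ios) pop
    if (ios /= 0) pop = -1

    print '(5(I0,1X),A,1X,A,1X,I0)', day, month, year, cases, deaths, &
        trim(fields(7)), trim(fields(8)), pop
contains
    ! Split text on commas into fields(1:n); empty fields come back blank.
    subroutine split_csv(text, fields, n)
        character(len=*), intent(in)  :: text
        character(len=*), intent(out) :: fields(:)
        integer, intent(out) :: n
        integer :: start, pos, last

        fields = ''
        n = 0
        start = 1
        last = len_trim(text)
        do while (n < size(fields))
            pos = index(text(start:last), ',')
            n = n + 1
            if (pos == 0) then
                fields(n) = text(start:last)      ! final field (may be empty)
                exit
            end if
            fields(n) = text(start:start+pos-2)   ! text before the comma
            start = start + pos                   ! continue after the comma
        end do
    end subroutine split_csv
end program parse_line_demo
```

Because the name and GeoId stay as character strings, their varying lengths do not matter; TRIM is applied only when they are used.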

There is a new file every day 

Regards

John

 

JohnNichols
Valued Contributor II
249 Views

4. Reporting days are not aligned to UTC, so comparing US and European data is interesting.

JohnNichols
Valued Contributor II
236 Views
        if ((iline(2:2) .eq. '-') .or. (iline(2:2) .eq. '/')) then
            iline = "0"//iline
        endif
        flag = flag + 1
        IF ((iline(1:2)) .EQ.'DA' .OR. (iline(1:2)) .EQ.'da' .or. (iline(4:5) .eq. 'da')) THEN    ! If line tagged as node
            write(*,100)iline
100         Format(A130)

 

 

Brute force method -- raw data from Europe for today -- US data has been corrected

 

jimdempseyatthecove
Black Belt
236 Views

>>How do I look for these characters and ignore them in Fortran? 

When the first byte of the file is the start of the UTF-8 byte order mark (the 3-byte sequence EF BB BF), start your input line parse at character position 4 instead of 1.

US source files could be either US-ASCII or UTF-8 (with ASCII following the signature). I advise testing only the first byte of the file for the UTF-8 signature. I cannot promise that you will never see multi-byte UTF-8 sequences elsewhere; you may see them in the "Countries and territories" field.

Jim Dempsey

jimdempseyatthecove
Black Belt
236 Views

! first line of file
UTF8 = 0 ! US ASCII file offset = +0
read(inputFile,"(A)") iLine
if (iLine(1:1) == char(239)) UTF8 = 3 ! first BOM byte is hex EF (displays as 'ï'); UTF8 offset = +3
if (((iline(UTF8+2:UTF8+2) .eq. '-') .or. (iline(UTF8+2:UTF8+2) .eq. '/'))) then
   ... ! all subscripts pre-pended with UTF8+

For the remaining lines of the file, do not add the UTF8 offset to the indices.

In other words, the UTF-8 file format signature exists only in the first 3 bytes of the file. A UTF-8 file may have this header; a US-ASCII file does not.
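A minimal sketch of that first-line check, assuming the signature is the standard UTF-8 byte order mark (bytes 239, 187, 191). The function name bom_offset and the sample header text are made up for illustration, and unlike the snippet above it compares all three bytes rather than only the first.

```fortran
program strip_bom_demo
    implicit none
    ! UTF-8 byte order mark: EF BB BF in hex.
    character(len=3), parameter :: bom = char(239)//char(187)//char(191)
    character(len=130) :: iline
    integer :: utf8

    ! Stand-in for the first record read from the file.
    iline = bom//'DateRep,Day,Month,Year'

    utf8 = bom_offset(iline)
    print '(A,I0)', 'offset = ', utf8
    print '(A)', trim(iline(utf8+1:))   ! header text without the signature
contains
    ! Returns 3 when text starts with the UTF-8 BOM, otherwise 0.
    pure integer function bom_offset(text) result(offset)
        character(len=*), intent(in) :: text
        offset = 0
        if (len(text) >= 3) then
            if (text(1:3) == bom) offset = 3
        end if
    end function bom_offset
end program strip_bom_demo
```

Calling this only on the first record keeps the offset at zero for every later line, which matches the advice above.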

Also, do not assume UTF-8 files have US formatted dates. You may need to examine data (or file name) to figure this out.

Jim Dempsey

 

JohnNichols
Valued Contributor II
236 Views

"Also, do not assume UTF-8 files have US formatted dates. You may need to examine data (or file name) to figure this out."

Luckily the file is done in Europe and they are using d/m/yr -- if not then I would need to check the numbers.

Thanks jim

 

 

JohnNichols
Valued Contributor II
236 Views

mecej4 wrote:

John Nichols has displayed semi-log plots and has conjectured some death rates based on exponential growth models. Such models correspond to linear autonomous differential equations. Realistic models of predator-prey interaction, financial collapse or infection propagation are linear only in the early stages of a rare event. With exponential growth/decay, the only possibilities are reaching infinity or zero (plus, perhaps, a nonzero offset or bias).

Consider a more reasonable model that is simple but nonlinear: the SIR model (see https://www.medrxiv.org/content/medrxiv/early/2020/02/13/2020.02.12.2002... ). That model can be represented by three ordinary differential equations. The three dependent variables are the number tested and found susceptible, S; the number tested and known to be infected, I; and the number removed by death or recovery, R. Their sum, N = S+I+R, is estimated to be a few percent of the total population Ntot and is assumed to be fixed, because the population is isolated and births are not counted. If we scale the dependent variables by N, giving u = S/N, v = I/N, w = R/N, and scale the independent variable t by 1/(α.N) to obtain the scaled time τ, we obtain the following nonlinear ODEs:

     du/dτ = - u v,     dv/dτ = (u - λ) v,     dw/dτ = λ v

with initial conditions u = 1 - 1/N, v = 1/N, w = 0 at τ = 0. Please note that u + v + w = 1 at all times.

Note that after scaling the problem has only two parameters: the coefficient λ = ρ/αN in the ODE, and the initial value parameter, 1/N.

Here is Matlab code to integrate the equations and display the results.

The "function" definition file, "sir.m":

function df = sir(t,f)
global lambda
df = [-f(1)*f(2); (f(1)-lambda)*f(2); lambda*f(2)];
end
% https://www.medrxiv.org/content/medrxiv/early/2020/02/13/2020.02.12.20021931.full.pdf
% Dependent variables: fraction susceptible and not infected; fraction infected; fraction removed;

The Matlab script to solve and display, "rsir.m":

global lambda
lambda = 0.72338; %https://www.medrxiv.org/content/medrxiv/early/2020/02/13/2020.02.12.20021931.full.pdf
g0 = 0.1; y0 = [1-g0; g0; 0]; % initial conditions
[t,y] = ode45('sir',[0 10],y0);
figure(1); clf; plot(y(:,2),y(:,3)); xlabel('Infected'); ylabel('Removed'); grid
figure(2); clf; plot(t,y(:,1),t,y(:,2),t,y(:,3)); xlabel('scaled time'); ylabel('fraction of susceptible population');
legend('still susceptible','infected','removed'); grid
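For readers who would rather stay in Fortran, here is a rough sketch of the same integration. It is not a port of ode45: it swaps in a classical fixed-step fourth-order Runge-Kutta scheme, and the step count (1000) and the print interval are arbitrary choices made for illustration. The values of lambda, g0 and the end time are copied from rsir.m.

```fortran
program sir_rk4
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    real(dp), parameter :: lambda  = 0.72338_dp   ! same value as in rsir.m
    real(dp), parameter :: g0      = 0.1_dp       ! initial infected fraction
    real(dp), parameter :: tau_end = 10.0_dp      ! end of scaled time range
    integer,  parameter :: nsteps  = 1000         ! assumed step count
    real(dp) :: y(3), k1(3), k2(3), k3(3), k4(3), h, tau
    integer :: i

    y = [1.0_dp - g0, g0, 0.0_dp]    ! u, v, w at tau = 0
    h = tau_end / nsteps
    tau = 0.0_dp

    do i = 1, nsteps
        ! Classical RK4 step (the system is autonomous, so tau is not passed).
        k1 = sir(y)
        k2 = sir(y + 0.5_dp*h*k1)
        k3 = sir(y + 0.5_dp*h*k2)
        k4 = sir(y + h*k3)
        y = y + h*(k1 + 2.0_dp*k2 + 2.0_dp*k3 + k4)/6.0_dp
        tau = tau + h
        ! Print scaled time and the three fractions every 100 steps.
        if (mod(i, 100) == 0) print '(F6.2,3F10.5)', tau, y
    end do
contains
    ! Right-hand side: du/dtau = -u v, dv/dtau = (u - lambda) v, dw/dtau = lambda v
    pure function sir(f) result(df)
        real(dp), intent(in) :: f(3)
        real(dp) :: df(3)
        df = [-f(1)*f(2), (f(1) - lambda)*f(2), lambda*f(2)]
    end function sir
end program sir_rk4
```

Since the three derivatives sum to zero, u + v + w should stay at 1 to within rounding, which gives a quick sanity check on the printed columns.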

My apologies to the forum moderators and those readers who find this response blatantly off-topic, and irrelevant to Intel Fortran. The topic is definitely of current interest if not concern, and John Nichols is the one who led me astray (sorry, John!).

I was looking for some data that was in error in the European data for the US and looked at the IHME University of Washington model. Interestingly, they imply the rate goes from exponential to linear as of today. Is sir.m really going to go from exponential to linear in one day?

JohnNichols
Valued Contributor II
236 Views

Curaçao

Jim -- why does Fortran think that this word has 8 characters and not 7?

andrew_4619
Honored Contributor II
236 Views

Nichols, John wrote:

Curaçao

Jim -- why does Fortran think that this word has 8 characters and not 7?

UTF-8 is a variable-width encoding in which a character can span 1 to 4 bytes. The c with the cedilla (ç) is a two-byte character, whereas standard ASCII characters are all 1 byte.

JohnNichols
Valued Contributor II
236 Views

utf8 ==== I believe Oddball stated "I only know how to drive these tanks - not fix them". Of course he is not a good role model for a promising Fortran programmer.

andrew_4619
Honored Contributor II
238 Views

1 Byte 0xxxxxxx   (normal ascii 128 char set 0-127)

2 Byte 110xxxxx

3 Byte 1110xxxx

4 Byte 11110xxx

 

The first byte of a multi-byte UTF-8 character is as above.

To ditch non-ASCII chars you would need to loop along the string and, according to the bit pattern, either 'accept' the character or skip the next 1, 2 or 3 bytes and insert a 'default' character in your 'fixed' string.
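The loop described above might look something like the following sketch. The function name to_ascii and the '?' filler are made up for illustration, and it assumes ICHAR returns the raw byte value (0 to 255) for default-kind characters, which is processor dependent. The lead-byte bit patterns are tested as numeric ranges (192, 224, 240) rather than with bit intrinsics.

```fortran
program ascii_only_demo
    implicit none
    character(len=8) :: raw

    ! "Curacao" with a c-cedilla, encoded as UTF-8: the cedilla is bytes C3 A7,
    ! which is why the word occupies 8 bytes rather than 7.
    raw = 'Cura'//char(195)//char(167)//'ao'

    print '(A,I0)', 'bytes in: ', len(raw)
    print '(A)', to_ascii(raw, '?')
contains
    ! Copy ASCII bytes through; replace each multi-byte character by filler.
    function to_ascii(text, filler) result(clean)
        character(len=*), intent(in) :: text
        character(len=1), intent(in) :: filler
        character(len=:), allocatable :: clean
        integer :: i, b, nbytes

        clean = ''
        i = 1
        do while (i <= len(text))
            b = ichar(text(i:i))
            if (b < 128) then
                clean = clean // text(i:i)   ! 0xxxxxxx: plain ASCII
                i = i + 1
            else
                if (b >= 240) then
                    nbytes = 4               ! 11110xxx lead byte
                else if (b >= 224) then
                    nbytes = 3               ! 1110xxxx lead byte
                else if (b >= 192) then
                    nbytes = 2               ! 110xxxxx lead byte
                else
                    nbytes = 1               ! stray continuation byte
                end if
                clean = clean // filler
                i = i + nbytes               ! skip the whole character
            end if
        end do
    end function to_ascii
end program ascii_only_demo
```

Substituting one filler per character, rather than one per byte, keeps the cleaned name at the length a reader would expect.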

JohnNichols
Valued Contributor II
238 Views

[Attached image: Capture.PNG]

JohnNichols
Valued Contributor II
238 Views

Yesterday's deaths were above the line -- we are on a small upswing -- a 0.9379 for a population of 300 million is really high.

 

jimdempseyatthecove
Black Belt
238 Views

I wouldn't draw the same conclusion from the last three points. Looking backwards 12 points from the last three points, the trend line has a higher probability of a decreasing slope. What do you see with non-linear curves?

Jim Dempsey
