Corona Virus Analysis - Page 3

JohnNichols · ‎03-26-2020

DateRep,Day,Month,Year,Cases,Deaths,Countries and territories,GeoId,Pop_Data.2018

25/03/2020,25,3,2020,2,0,Afghanistan,AF,37172386
21/03/2020,21,3,2020,2,0,Cape_Verde,CV,543767
10/03/2020,10,3,2020,-9,1,Cases_on_an_international_conveyance_Japan,JPG11668,
2/03/2020,2,3,2020,0,0,Cases_on_an_international_conveyance_Japan,JPG11668,
1/03/2020,1,3,2020,0,0,Cases_on_an_international_conveyance_Japan,JPG11668,

The death data file for the Corona Virus is in the above format. I had a small play with the data in C# but ran into graphing problems, so I am translating the program into Fortran. There appear to be some interesting features in the FFT of the data, which I hope to publish to help the health stat people.

Does anyone have a good idea for reading a line and then taking it apart into its fields?

21/03/2020 -- ignore

The fields from the 21 through to the 0 (day, month, year, cases, deaths) are all integers. The name varies in character length. The GeoId (CV here) is usually only 2 characters and the population is an integer, but the population is missing from the JPG line and its id is not two characters.
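One possible approach, sketched here under assumptions rather than offered as a definitive answer, is to split each line on commas with the intrinsic `index` and then convert the numeric fields with internal reads. The routine name `split_commas`, the field widths, and the use of -1 to flag a missing population are all choices made for illustration. The sketch also assumes that no field contains an embedded comma, which holds for the sample lines above because names use underscores.

```fortran
program split_csv_line
    use iso_fortran_env, only: int64
    implicit none
    character(len=200) :: line
    character(len=64)  :: field(9)
    integer            :: nfields, day, month, year, cases, deaths, ios
    integer(int64)     :: pop

    ! Sample line from the file: the population (9th field) is empty here.
    line = '10/03/2020,10,3,2020,-9,1,Cases_on_an_international_conveyance_Japan,JPG11668,'
    call split_commas(trim(line), field, nfields)

    ! Field 1 is the date string and is ignored; fields 2-6 are integers.
    read(field(2), *, iostat=ios) day
    read(field(3), *, iostat=ios) month
    read(field(4), *, iostat=ios) year
    read(field(5), *, iostat=ios) cases
    read(field(6), *, iostat=ios) deaths

    ! The population may be missing, so test for an empty field first.
    pop = -1_int64                       ! -1 flags "no population given" (assumption)
    if (len_trim(field(9)) > 0) read(field(9), *, iostat=ios) pop

    print *, day, month, year, cases, deaths
    print *, trim(field(7)), ' ', trim(field(8)), pop

contains

    ! Split text at commas; empty fields are returned as blank strings.
    subroutine split_commas(text, field, nfields)
        character(len=*), intent(in)  :: text
        character(len=*), intent(out) :: field(:)
        integer,          intent(out) :: nfields
        integer :: start, pos

        field   = ''
        nfields = 0
        start   = 1
        do while (nfields < size(field))
            pos = index(text(start:), ',')
            nfields = nfields + 1
            if (pos == 0) then
                field(nfields) = text(start:)        ! last field, no comma after it
                exit
            end if
            field(nfields) = text(start:start+pos-2) ! text before the comma
            start = start + pos                      ! continue after the comma
        end do
    end subroutine split_commas

end program split_csv_line
```

Because the name and the GeoId are kept as character fields, their varying lengths do not matter; only the five integer columns and the optional population are converted.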

There is a new file every day

Regards

John

JohnNichols · ‎03-31-2020

4. Reporting days are not aligned to UTC, so comparing US and European data is interesting.

JohnNichols · ‎03-31-2020

if ((iline(2:2) .eq. '-') .or. (iline(2:2) .eq. '/')) then
    iline = "0"//iline    ! pad a single-digit day, e.g. 2/03/2020 becomes 02/03/2020
endif
flag = flag + 1
IF ((iline(1:2)) .EQ.'DA' .OR. (iline(1:2)) .EQ.'da' .or. (iline(4:5) .eq. 'da')) THEN    ! If line tagged as node
    write(*,100)iline
100 Format(A130)

Brute force method -- raw data from Europe for today -- US data has been corrected

jimdempseyatthecove · ‎03-31-2020

>>How do I look for these characters and ignore them in Fortran?

When you see the UTF-8 first byte value, start your input line parse at character position 4 instead of 1.

US source files could be either US-ASCII or UTF-8 (with ASCII following the header). I advise testing only the 1st byte to detect a UTF-8 formatted file. I cannot say that you will never see a UTF-8 multi-byte sequence later in the file; you may see these in the ",Countries and territories," field.

Jim Dempsey

jimdempseyatthecove · ‎03-31-2020

! first line of file
UTF8 = 0 ! US ASCII file offset = +0
read(inputFile,"(A)") iLine
if (iLine(1:1) == char(239)) UTF8 = 3 ! first header byte is hex EF ('ï' in Latin-1): UTF8 offset = +3
if (((iline(UTF8+2:UTF8+2) .eq. '-') .or. (iline(UTF8+2:UTF8+2) .eq. '/'))) then
... ! all subscripts pre-pended with UTF8+

For the remaining lines of the file, do not offset the indices by UTF8.

IOW, the UTF-8 or US-ASCII file format signature exists only in the first 3 bytes of the file. A UTF-8 file with this signature has a header; an ASCII file does not.
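As a self-contained illustration of the offset idea, the sketch below checks all three signature bytes (hex EF BB BF) rather than only the first. The function name `bom_offset` is hypothetical, and the code assumes the compiler's default character kind holds one byte per character with `char` accepting values up to 255, which is the usual case but is processor dependent.

```fortran
program bom_demo
    implicit none
    character(len=200) :: iline
    integer :: offset

    ! Build a header line that starts with the UTF-8 signature (hex EF BB BF).
    iline = char(239)//char(187)//char(191)//'DateRep,Day,Month,Year'

    offset = bom_offset(iline)
    print *, 'offset =', offset
    print *, trim(iline(offset+1:))      ! the line with the signature skipped

contains

    ! Return 3 if the line starts with the UTF-8 signature, otherwise 0.
    integer function bom_offset(line)
        character(len=*), intent(in) :: line
        bom_offset = 0
        if (len(line) >= 3) then
            if (line(1:3) == char(239)//char(187)//char(191)) bom_offset = 3
        end if
    end function bom_offset

end program bom_demo
```

Only the first line read from the file needs this check; later lines can be parsed from position 1.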

Also, do not assume UTF-8 files have US formatted dates. You may need to examine data (or file name) to figure this out.

Jim Dempsey

JohnNichols · ‎03-31-2020

"Also, do not assume UTF-8 files have US formatted dates. You may need to examine data (or file name) to figure this out."

Luckily the file is done in Europe and they are using d/m/yr -- if not then I would need to check the numbers.

Thanks, Jim

JohnNichols · ‎03-31-2020

mecej4 wrote:
John Nichols has displayed semi-log plots and has conjectured some death rates based on exponential growth models. Such models correspond to linear autonomous differential equations. Realistic models of predator-prey interaction, financial collapse or infection propagation are linear only in the early stages of a rare event. With exponential growth/decay, the only possibilities are reaching infinity or zero (plus, perhaps, a nonzero offset or bias).
Consider a more reasonable model that is simple, but nonlinear: the SIR model, please see https://www.medrxiv.org/content/medrxiv/early/2020/02/13/2020.02.12.2002... . That model can be represented in terms of three ordinary differential equations. The three dependent variables are the number tested and found susceptible, S, the number tested and known to be infected, I, and the number removed (by death or recovery), R. Their sum, N = S+I+R, is estimated to be a few percent of the total population N_tot and is assumed to be fixed, because the population is isolated and births are not counted. If we scale the dependent variables by N, so that u = S/N, v = I/N, w = R/N, and scale the independent variable t by 1/(α.N), we obtain the following nonlinear ODEs:
du/dτ = - u v, dv/dτ = (u - λ) v, dw/dτ = λ v
with initial conditions u = 1 - 1/N, v = 1/N, w = 0 at τ = 0. Please note that u + v + w = 1 at all times.
Note that after scaling the problem has only two parameters: the coefficient λ = ρ/(αN) in the ODE, and the initial value parameter, 1/N.
Here is Matlab code to integrate the equations and display the results.
The "function" definition file, "sir.m":
function df = sir(t,f)
global lambda
df = [-f(1)*f(2); (f(1)-lambda)*f(2); lambda*f(2)];
end
% https://www.medrxiv.org/content/medrxiv/early/2020/02/13/2020.02.12.20021931.full.pdf
% Dependent variables: fraction susceptible and not infected; fraction infected; fraction removed;
The Matlab script to solve and display, "rsir.m":
global lambda
lambda = 0.72338; %https://www.medrxiv.org/content/medrxiv/early/2020/02/13/2020.02.12.20021931.full.pdf
g0 = 0.1; y0 = [1-g0; g0; 0]; % initial conditions
[t,y] = ode45('sir',[0 10],y0);
figure(1); clf; plot(y(:,2),y(:,3)); xlabel('Infected'); ylabel('Removed'); grid
figure(2); clf; plot(t,y(:,1),t,y(:,2),t,y(:,3)); xlabel('scaled time'); ylabel('fraction of susceptible population');
legend('still susceptible','infected','removed'); grid
My apologies to the forum moderators and those readers who find this response blatantly off-topic, and irrelevant to Intel Fortran. The topic is definitely of current interest if not concern, and John Nichols is the one who led me astray (sorry, John!).

I was looking for some data that was in error in the European data for the US and looked at the IHME University of Washington model. Interestingly, they imply the rate goes from exponential to linear as of today -- is sir.m really going to change from exponential to linear in one day?
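One way to explore that question without Matlab is to integrate the quoted scaled equations in Fortran. The sketch below is not a translation of `ode45`: it swaps in a fixed-step classical fourth-order Runge-Kutta scheme, with a step size `h = 0.01` chosen arbitrarily for illustration. The values of `lambda` and `g0` are the ones in the quoted script; the program and function names are hypothetical.

```fortran
program sir_rk4
    implicit none
    integer,  parameter :: dp     = kind(1.0d0)
    real(dp), parameter :: lambda = 0.72338_dp   ! value used in the quoted script
    real(dp), parameter :: g0     = 0.1_dp       ! initial infected fraction
    real(dp), parameter :: h      = 0.01_dp      ! fixed step (assumption)
    integer,  parameter :: nsteps = 1000         ! scaled time runs from 0 to 10
    real(dp) :: y(3), k1(3), k2(3), k3(3), k4(3)
    integer  :: i

    y = [1.0_dp - g0, g0, 0.0_dp]                ! u, v, w at scaled time 0

    do i = 0, nsteps
        ! Print scaled time and u, v, w every 100 steps.
        if (mod(i, 100) == 0) print '(f6.2,3f10.5)', i*h, y

        ! One classical RK4 step.
        k1 = rhs(y)
        k2 = rhs(y + 0.5_dp*h*k1)
        k3 = rhs(y + 0.5_dp*h*k2)
        k4 = rhs(y + h*k3)
        y  = y + h*(k1 + 2.0_dp*k2 + 2.0_dp*k3 + k4)/6.0_dp
    end do

contains

    ! Right-hand side of the scaled SIR equations.
    function rhs(f) result(df)
        real(dp), intent(in) :: f(3)
        real(dp) :: df(3)
        df = [-f(1)*f(2), (f(1) - lambda)*f(2), lambda*f(2)]
    end function rhs

end program sir_rk4
```

Since dv/dτ = (u - λ) v, the infected fraction in this model grows while u > λ and declines once u falls below λ, so the printed second column shows how quickly the curve bends away from its early exponential growth.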

JohnNichols · ‎03-31-2020

Curaçao

Jim -- why does Fortran think that this word has 8 characters and not 7?

andrew_4619 · ‎03-31-2020

Nichols, John wrote:
Curaçao
Jim -- why does Fortran think that this word has 8 characters and not 7?

UTF-8 is a variable-length encoding where a character can span 1 to 4 bytes. The c with the cedilla (ç) is a two-byte character; standard ASCII (ANSI) characters are all 1 byte.
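The difference between bytes and characters can be checked with a short sketch. It assumes, as is usual, that `len` counts bytes for the default character kind, and it builds the ç from its two UTF-8 bytes (hex C3 A7) so that the result does not depend on the encoding of the source file. Characters are counted by skipping continuation bytes, which always have the bit pattern 10xxxxxx. The function name `utf8_length` is hypothetical.

```fortran
program utf8_count
    implicit none
    character(len=8) :: name

    ! "Curaçao" in UTF-8: 6 ASCII bytes plus the two bytes C3 A7 for the c-cedilla.
    name = 'Cura'//char(195)//char(167)//'ao'

    print *, 'bytes      =', len(name)            ! 8
    print *, 'characters =', utf8_length(name)    ! 7

contains

    ! Count UTF-8 characters by counting every byte that is not a
    ! continuation byte (bit pattern 10xxxxxx).
    integer function utf8_length(s)
        character(len=*), intent(in) :: s
        integer :: i, b
        utf8_length = 0
        do i = 1, len(s)
            b = iand(ichar(s(i:i)), 255)          ! force the value into 0..255
            if (iand(b, 192) /= 128) utf8_length = utf8_length + 1
        end do
    end function utf8_length

end program utf8_count
```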

JohnNichols · ‎03-31-2020

utf8 ==== I believe Oddball stated "I only know how to drive these tanks - not fix them". Of course he is not a good role model for a promising Fortran programmer.

andrew_4619 · ‎04-01-2020

1 Byte 0xxxxxxx (normal ascii 128 char set 0-127)

2 Byte 110xxxxx

3 Byte 1110xxxx

4 Byte 11110xxx

The first byte of a multi-byte UTF-8 character is as above. The bytes that follow it within the same character all have the form 10xxxxxx.

To ditch non ANSI chars you would need to loop along the string and according to the bit pattern you ‘accept’ the character or skip the next 1, 2 or 3 bytes and insert a ‘default’ character in your ‘fixed’ string.
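The loop described above might look like the following sketch. The routine name `to_ascii` and the choice of `?` as the default character are assumptions made for illustration, and a stray continuation byte is treated as a single unknown character.

```fortran
program utf8_to_ascii
    implicit none
    character(len=8) :: raw, fixed
    integer :: n

    ! "Curaçao" in UTF-8, with the c-cedilla built from its two bytes.
    raw = 'Cura'//char(195)//char(167)//'ao'

    call to_ascii(raw, fixed, n)
    print *, fixed(1:n)                 ! Cura?ao

contains

    ! Copy ASCII bytes unchanged; replace each multi-byte character by '?'.
    subroutine to_ascii(s, out, n)
        character(len=*), intent(in)  :: s
        character(len=*), intent(out) :: out
        integer,          intent(out) :: n
        integer :: i, b, extra

        out = ''
        n   = 0
        i   = 1
        do while (i <= len(s) .and. n < len(out))
            b = iand(ichar(s(i:i)), 255)
            if (b < 128) then
                extra = 0               ! 0xxxxxxx: plain ASCII
            else if (b >= 240) then
                extra = 3               ! 11110xxx: skip 3 more bytes
            else if (b >= 224) then
                extra = 2               ! 1110xxxx: skip 2 more bytes
            else if (b >= 192) then
                extra = 1               ! 110xxxxx: skip 1 more byte
            else
                extra = 0               ! stray 10xxxxxx continuation byte
            end if

            n = n + 1
            if (b < 128) then
                out(n:n) = s(i:i)
            else
                out(n:n) = '?'
            end if
            i = i + 1 + extra
        end do
    end subroutine to_ascii

end program utf8_to_ascii
```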

JohnNichols · ‎04-01-2020

Yesterday's deaths were above the line -- we are on a small upswing -- a 0.9379 for a population of 300 million is really high.

jimdempseyatthecove · ‎04-01-2020

I wouldn't draw the same conclusion from the last three points. Looking backwards 12 points from the last three points, the trend line has a higher probability of a reducing slope. What do you see with non-linear curves?

Jim Dempsey