Corona Virus Analysis

DateRep,Day,Month,Year,Cases,Deaths,Countries and territories,GeoId,Pop_Data.2018


25/03/2020,25,3,2020,2,0,Afghanistan,AF,37172386
21/03/2020,21,3,2020,2,0,Cape_Verde,CV,543767
10/03/2020,10,3,2020,-9,1,Cases_on_an_international_conveyance_Japan,JPG11668,
2/03/2020,2,3,2020,0,0,Cases_on_an_international_conveyance_Japan,JPG11668,
1/03/2020,1,3,2020,0,0,Cases_on_an_international_conveyance_Japan,JPG11668,

The death data file for the Corona Virus is in the above format. I played with the data briefly in C# but ran into graphing problems, so I am translating the program into Fortran. There appear to be some interesting features in the FFT of the data, which I hope to publish to help the health statistics people.

Does anyone have a good idea for reading each line and then taking it apart into its fields?

21/03/2020 -- ignore

The fields from 21 to 0 are all integers. The name varies in character length, the GeoId (CV here) is usually only 2 characters, and the population is an integer. However, the population is missing from the JPG line, and its id is not two characters.

There is a new file every day.

Regards

John

 


As a CSV file, you could read the whole line as a text string and parse it to find each field and unpack into the appropriate variables.
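A minimal sketch of that idea, assuming the fields never contain quoted or embedded commas (true of the sample lines shown in this thread). The module and routine names here (csv_split_sketch, split_csv_line) are made up for illustration and are not from the original program.

```fortran
module csv_split_sketch
    implicit none
    private
    public :: split_csv_line
contains
    ! Split LINE at each comma. On return FIELDS(1:NFIELDS) holds the pieces.
    ! An empty field (",," or a trailing ",") comes back as a blank string.
    ! Fields longer than LEN(FIELDS) are truncated by the assignment.
    subroutine split_csv_line(line, fields, nFields)
        character(len=*), intent(in)  :: line
        character(len=*), intent(out) :: fields(:)
        integer,          intent(out) :: nFields
        integer :: start, comma, lineLen

        fields  = ' '
        nFields = 0
        lineLen = len_trim(line)
        start   = 1
        do
            if (nFields == size(fields)) exit          ! no room for more fields
            comma = index(line(start:lineLen), ',')    ! 0 when no comma remains
            nFields = nFields + 1
            if (comma == 0) then
                fields(nFields) = line(start:lineLen)  ! last field, may be empty
                exit
            end if
            fields(nFields) = line(start:start+comma-2)
            start = start + comma
        end do
    end subroutine split_csv_line
end module csv_split_sketch
```

Called on the JPG11668 line above, this should report 9 fields with the ninth blank, so the caller can test len_trim of the population field before converting it to an integer.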


10/03/2020,10,3,2020,-9,1,Cases_on_an_international_conveyance_Japan,JPG11668,

You will have to deal with the GeoId field not being 2 characters, and with the missing Pop_Data.2018 value.

My guess is JPG11668 refers to a different file and/or record in file, and/or different region (island) of Japan. This data will have to be obtained elsewhere.

Edit: found this

https://kieranhealy.org/blog/archives/2020/03/21/covid-19-tracking/

Jim Dempsey


In the above mentioned link, they do not investigate (plot) infections/deaths per 100,000 per day.

The charts listed are of minimal value. You will need to get the population figures for Japan separately.

Jim Dempsey


subroutine ReadFile()


    implicit none

    Logical Done
    logical Exists
    CHARACTER *2 af
    CHARACTER *130 iline
    EQUIVALENCE (af, iline)
    integer flag


    flag = 0
    Done = .false.

    call lineblankA()

    DO WHILE (.NOT. done)              ! Loop for all the data lines
        READ (srB, '(A)', ERR=1000, END=400) iline    ! Read line
        
        write(*,*)'Here1'
        write(*,*)af                ! Write line to log file for checking

        IF (af.EQ.'da' .OR. af.EQ.'DA') THEN
            write(*,*)'Here'
            Write(*,100)iline
100         format(A130)
        ELSE
            done = .TRUE.
        endif

    end do
    return
1000 Stop ' Input Error in Country Data.'
400 continue
    return
end subroutine ReadFile

This does not work as expected -- af does weird things. Help!


Found the error, apologies -- I need to actually call the subroutine.


In the file you uploaded, the header line reads:

DateRep,Day,Month,Year,Cases,Deaths,Countries and territories,GeoId,Pop_Data.2018

So af will never match "da" or "DA" as it has "Da" at beginning of line.

Jim Dempsey
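One way to sidestep the "da" / "DA" / "Da" mismatch is to lower-case the start of the line before comparing. The following is only a sketch under the assumption that the header always begins with the text DateRep. The names header_check_sketch, to_lower and is_header_line are hypothetical helpers, not part of the posted program.

```fortran
module header_check_sketch
    implicit none
    private
    public :: to_lower, is_header_line
contains
    ! Return a copy of STR with A-Z mapped to a-z; other characters unchanged.
    pure function to_lower(str) result(lower)
        character(len=*), intent(in) :: str
        character(len=len(str))      :: lower
        integer :: i, code

        lower = str
        do i = 1, len(str)
            code = iachar(str(i:i))
            if (code >= iachar('A') .and. code <= iachar('Z')) then
                lower(i:i) = achar(code + 32)   ! ASCII offset between cases
            end if
        end do
    end function to_lower

    ! True when LINE starts with "DateRep" in any mix of upper and lower case.
    pure logical function is_header_line(line)
        character(len=*), intent(in) :: line
        character(len=*), parameter  :: tag = 'daterep'

        is_header_line = .false.
        if (len(line) >= len(tag)) then
            is_header_line = to_lower(line(1:len(tag))) == tag
        end if
    end function is_header_line
end module header_check_sketch
```

Comparing the full word rather than two characters also avoids the EQUIVALENCE, and a data line such as 25/03/2020,... cannot match by accident.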


Jim:

Thanks -- I read the line in and use the EQUIVALENCE statement to match the first two characters on the first line. These people change the file a lot and I am trying to make it as robust as possible. That is a mecej4 trick, not mine.

I found a trick of looking for the markers using a series of IF tests.

It is working so far.

 subroutine ReadFile(ModelA)


    implicit none

    TYPE (Model),       TARGET :: ModelA

    Logical Done
    logical Exists
    CHARACTER *2 af
    CHARACTER *130 iline
    EQUIVALENCE (af, iline)
    integer flag
    integer loc

    integer yyyy,mm,dd, rd,md, yd


    flag = 0
    Done = .false.

    call lineblankA()

    DO WHILE (.NOT. done)              ! Loop for all the data lines
        READ (srB, '(A)', ERR=1000, END=400) iline    ! Read line
        flag = flag + 1

        IF (af.EQ.'DA' .OR. af.EQ.'da') THEN    ! If line tagged as node
            write(*,100)iline
100         Format(A130)
        elseif ((iline(3:3) .eq. '-')) then

            read(iline(1:2),120)dd
120         format(i2)
            if ((iline(6:6) .eq. '-')) then
                read(iline(4:5),140)mm
140             format(i2)
                if((iline(11:11) .eq. ',')) then
                    read(iline(7:11),150)yyyy
150                 Format(i4)
                    if((iline(13:13) .eq. ',')) then
                        Write(*,*)'here b'
                        read(iline(12:12),160)rd
160                     format(i1)
                        if((iline(15:15) .eq. ',')) then
                            Write(*,*)'here d'
                            read(iline(14:14),160)md
                            loc = 15
                        elseif((iline(16:16) .eq. ',')) then
                            Write(*,*)'here d'
                            read(iline(14:15),120)md
                            loc = 16
                        endif
                    elseif((iline(14:14) .eq. ',')) then
                        Write(*,*)'here c'
                        read(iline(12:13),140)rd
                        if((iline(16:16) .eq. ',')) then
                            Write(*,*)'here d'
                            read(iline(15:15),160)md
                            loc = 16
                        elseif((iline(17:17) .eq. ',')) then
                            Write(*,*)'here d'
                            read(iline(15:16),120)md
                            loc = 17
                        endif
                    endif

                end if
            endif
            if((iline(loc+5:loc+5) .eq. ',')) then
                read(iline(loc+1:loc+4),150)yd
            endif

            write(*,110)flag, dd, mm, yyyy, rd, md, loc,yd
110         Format('Line number :: ', i6,' Line details - Day :: ', i2, ' Month :: ', i2, '  Year :: ',i4, '   Day :: ',i2, '   Month :: ',i2,'   Location :: ',i2,'   Year :: ',i4)
        ELSE
            done = .TRUE.               ! unrecognised line: stop reading
        endif

    end do
400 ModelA%LineCount = flag             ! reached at end of file or when the loop exits
    return
1000 Stop ' Input Error in Country Data.'


    end subroutine ReadFile

I am almost there with the read; next is setting up the structure.

I should have kept my mouth closed and not said "this looks interesting".

 

John


This program is tripping my antivirus program, Microsoft Defender.

Can someone try it and see if it is just my machine?


It appears to be the manifest file. Is it safe to turn it off?

 


John,

Add arrays for:

DateRep, Day, Month, Year, Cases, Deaths, Countries_and_territories, GeoId, Pop_Data_2018

Read the file line by line (no equivalence).
Use the comma as a separator to extract each field and insert each field into its respective array.
Note: some lines end in a comma, leaving the last field blank.

This data file is not fixed field. Don't assume the GeoId is 2 characters. I saw that the data file may have an exception with a 3 character code.

RE: Microsoft Defender

Check the documentation. See if there is an exclude-folder option; I use Avast and it has this option.

Jim Dempsey
 


Jim:

Thanks for the notes. It has been interesting: I have swapped computers, deleted Intel and VS, and done a lot of reinstalling with different versions of VS and Fortran.

The error continues to occur in all combinations.

So I have gone back through, added the code line by line, and tested at each step. There appears to be a problem with:

            if((iline(loc+5:loc+5) .eq. ',')) then
                read(iline(loc+1:loc+4),210)name2
                nameA = name2
210             format(A4)
            elseif((iline(loc+6:loc+6) .eq. ',')) then
                read(iline(loc+1:loc+5),220)name5
                nameA = name5
220             format(A5)
            elseif((iline(loc+7:loc+7) .eq. ',')) then
                read(iline(loc+1:loc+6),*)name6
                nameA = name6
230             format( A6)


            end if

 

If I change the * to 230, the statement read(iline(loc+1:loc+6),230)name6 causes the virus error both in VS, which will not run the program, and from Defender.

I am not sure if it is a bug or a code error. 

The CV.zip has the current iteration. 

I have to check the output files once I am finished to catch all the oddities in the data file. US Virgin Islands causes a strange output.  But I want to solve this problem first. 

Thanks for the help

John


This is more what I had in mind:

!  CVJD.f90 
program CVJD
    implicit none
    character(len=256) :: inputLine
    integer :: inputFileUnit, outputFileUnit
    character(len=*), parameter :: inputFileName = "e.txt"
    character(len=*), parameter :: outputFileName = "test.txt"
    integer :: comma, priorComma, nCommas
    character(len=50) :: hDateRep
    character(len=50) :: hDay
    character(len=50) :: hMonth
    character(len=50) :: hYear
    character(len=50) :: hCases
    character(len=50) :: hDeaths
    character(len=50) :: hCountries_and_territories
    character(len=50) :: hGeoId
    character(len=50) :: hPop_Data  ! .2018

    character(len=10), allocatable :: DateRep(:)  ! dd/mm/yyyy needs 10 characters
    integer, allocatable :: Day(:)
    integer, allocatable :: Month(:)
    integer, allocatable :: Year(:)
    integer, allocatable :: Cases(:)
    integer, allocatable :: Deaths(:)
    character(len=50), allocatable :: Countries_and_territories(:)
    character(len=8), allocatable :: GeoId(:)
    integer(8), allocatable :: Pop_Data(:)  ! .2018

    integer(8) :: inputFileSize
    integer :: maxRecords, nRecords
    
    open(newunit=outputFileUnit, file=outputFileName, access='sequential', action='write', err=888)
!    write(outputFileUnit) "test"
!    close(outputFileUnit)
    open(newunit=inputFileUnit, file=inputFileName, access='sequential', action='read', err=999)
    ! read the header
    read(inputFileUnit,"(A)") inputLine
    
    comma = index(inputLine,",")
    if(comma < 2) goto 777
    hDateRep = inputLine(1:comma-1)
    priorComma = comma
    comma = index(inputLine(priorComma+1:),",")
    if(comma == 0)  goto 777
    comma = priorComma + comma
    hDay = inputLine(priorComma+1:comma-1)
    priorComma = comma
    comma = index(inputLine(priorComma+1:),",")
    if(comma == 0)  goto 777
    comma = priorComma + comma
    hMonth = inputLine(priorComma+1:comma-1)
    priorComma = comma
    comma = index(inputLine(priorComma+1:),",")
    if(comma == 0)  goto 777
    comma = priorComma + comma
    hYear = inputLine(priorComma+1:comma-1)
    priorComma = comma
    comma = index(inputLine(priorComma+1:),",")
    if(comma == 0)  goto 777
    comma = priorComma + comma
    hCases = inputLine(priorComma+1:comma-1)
    priorComma = comma
    comma = index(inputLine(priorComma+1:),",")
    if(comma == 0)  goto 777
    comma = priorComma + comma
    hDeaths = inputLine(priorComma+1:comma-1)
    priorComma = comma
    comma = index(inputLine(priorComma+1:),",")
    if(comma == 0)  goto 777
    comma = priorComma + comma
    hCountries_and_territories = inputLine(priorComma+1:comma-1)
    priorComma = comma
    comma = index(inputLine(priorComma+1:),",")
    if(comma == 0)  goto 777
    comma = priorComma + comma
    hGeoId = inputLine(priorComma+1:comma-1)
    priorComma = comma
    comma = index(inputLine(priorComma+1:),",")
    if(comma > 0)  goto 777
    hPop_Data = inputLine(priorComma+1:)  ! .2018
    
    ! estimate storage space
    inquire(inputFileUnit, size=inputFileSize) ! get file size
    ! smallest record (line size) is ~40 characters
    maxRecords = inputFileSize / 40
    allocate(DateRep(maxRecords), Day(maxRecords), Month(maxRecords), Year(maxRecords), Cases(maxRecords))
    allocate(Deaths(maxRecords), Countries_and_territories(maxRecords), GeoId(maxRecords), Pop_Data(maxRecords))    

    nRecords = 0
    do
        ! read the next data record
        read(inputFileUnit,"(A)",END=111) inputLine
        nRecords = nRecords + 1
        comma = index(inputLine,",")
        if(comma < 2) goto 777
        DateRep(nRecords) = inputLine(1:comma-1)
        priorComma = comma
        comma = index(inputLine(priorComma+1:),",")
        if(comma == 0)  goto 777
        comma = priorComma + comma
        read(inputLine(priorComma+1:comma-1),*) Day(nRecords)   ! list-directed; "(I)" with no width is a compiler extension
        priorComma = comma
        comma = index(inputLine(priorComma+1:),",")
        if(comma == 0)  goto 777
        comma = priorComma + comma
        read(inputLine(priorComma+1:comma-1),*) Month(nRecords)   ! list-directed; "(I)" with no width is a compiler extension
        priorComma = comma
        comma = index(inputLine(priorComma+1:),",")
        if(comma == 0)  goto 777
        comma = priorComma + comma
        read(inputLine(priorComma+1:comma-1),*) Year(nRecords)   ! list-directed; "(I)" with no width is a compiler extension
        priorComma = comma
        comma = index(inputLine(priorComma+1:),",")
        if(comma == 0)  goto 777
        comma = priorComma + comma
        read(inputLine(priorComma+1:comma-1),*) Cases(nRecords)   ! list-directed; "(I)" with no width is a compiler extension
        priorComma = comma
        comma = index(inputLine(priorComma+1:),",")
        if(comma == 0)  goto 777
        comma = priorComma + comma
        read(inputLine(priorComma+1:comma-1),*) Deaths(nRecords)   ! list-directed; "(I)" with no width is a compiler extension
        priorComma = comma
        comma = index(inputLine(priorComma+1:),",")
        if(comma == 0)  goto 777
        comma = priorComma + comma
        Countries_and_territories(nRecords) = inputLine(priorComma+1:comma-1)
        priorComma = comma
        comma = index(inputLine(priorComma+1:),",")
        if(comma == 0)  goto 777
        comma = priorComma + comma
        GeoId(nRecords) = inputLine(priorComma+1:comma-1)
        priorComma = comma
        comma = index(inputLine(priorComma+1:),",")
        if(comma > 0)  goto 777
        if(inputLine(priorComma+1:priorComma+9) == "         ") then
            Pop_Data(nRecords) = 0
        else
            read(inputLine(priorComma+1:),*) Pop_Data(nRecords)   ! list-directed; "(I)" with no width is a compiler extension
        endif
    end do
    
111 print *,inputLine
    print *,"**************** have data, do your thing ***********"
    stop
777 print *,"Invalid header or data record"
    stop
888 print *, "Error opening output file", outputFileName
    stop
999 print *, "Error opening input file", inputFileName
    stop
end program CVJD

Jim Dempsey


You can fancify that if you want: additional error checking, pick nth argument from line as text or integer (or real or double), etc...

Jim Dempsey
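As a rough illustration of the "pick nth argument from line as text or integer" idea, here is a sketch that counts commas instead of storing every field. It assumes plain comma-separated fields with no quoting. The names nth_field_sketch, nth_field_text and nth_field_int are invented for this example.

```fortran
module nth_field_sketch
    use, intrinsic :: iso_fortran_env, only: int64
    implicit none
    private
    public :: nth_field_text, nth_field_int
contains
    ! Return the N-th comma-separated field of LINE (the first field is N = 1).
    ! FOUND is .false. when LINE has fewer than N fields; TEXT is then blank.
    subroutine nth_field_text(line, n, text, found)
        character(len=*), intent(in)  :: line
        integer,          intent(in)  :: n
        character(len=*), intent(out) :: text
        logical,          intent(out) :: found
        integer :: start, comma, k, lineLen

        text  = ' '
        found = .false.
        if (n < 1) return
        lineLen = len_trim(line)
        start   = 1
        do k = 1, n - 1                          ! skip the first n-1 fields
            comma = index(line(start:lineLen), ',')
            if (comma == 0) return               ! ran out of fields
            start = start + comma
        end do
        comma = index(line(start:lineLen), ',')
        if (comma == 0) then
            text = line(start:lineLen)           ! last field on the line
        else
            text = line(start:start+comma-2)
        end if
        found = .true.
    end subroutine nth_field_text

    ! Read the N-th field as an integer. OK is .false. when the field is
    ! missing, blank, or not a plain integer; VAL is then set to 0.
    subroutine nth_field_int(line, n, val, ok)
        character(len=*), intent(in)  :: line
        integer,          intent(in)  :: n
        integer(int64),   intent(out) :: val
        logical,          intent(out) :: ok
        character(len=32) :: text
        logical :: found
        integer :: ios

        val = 0
        ok  = .false.
        call nth_field_text(line, n, text, found)
        if (.not. found) return
        if (len_trim(text) == 0) return          ! e.g. missing population
        read(text, '(BN,I32)', iostat=ios) val   ! BN: trailing blanks ignored
        if (ios /= 0) then
            val = 0
        else
            ok = .true.
        end if
    end subroutine nth_field_int
end module nth_field_sketch
```

An explicit I32 edit descriptor is used rather than list-directed input so that a field such as 2/03/2020 is rejected instead of being read as 2. The ok flag lets the caller decide what a blank Pop_Data.2018 should mean rather than silently storing 0.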


Jim:

I knew there was a better way to do it -- thanks ---

You are great thanks

John

20  Format(i6,'    ', A2, '    ', i4, 88('    ', i5))

How would you make the 88 a variable? 
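One common approach is to build the format as a character string at run time and pass that string to WRITE in place of the statement label. The sketch below does this for the format above. The names variable_repeat_sketch and make_row_format are invented here, and nCols is assumed to be at least 1 because a repeat count of zero is not allowed.

```fortran
module variable_repeat_sketch
    implicit none
    private
    public :: make_row_format
contains
    ! Build the row format with NCOLS in place of the hard-coded 88.
    ! NCOLS must be >= 1; the result is blank-padded to 64 characters.
    function make_row_format(nCols) result(fmt)
        integer, intent(in) :: nCols
        character(len=64)   :: fmt

        ! i0 writes NCOLS with no leading blanks, giving e.g. "...,i4,88(...))"
        write(fmt, '(a,i0,a)') "(i6,'    ',a2,'    ',i4,", nCols, "('    ',i5))"
    end function make_row_format
end module variable_repeat_sketch
```

Store the result in a character variable first, for example fmt = make_row_format(n), and then use write(unit, fmt) with the output list. If the compiler supports Fortran 2008, an unlimited repeat count written as *('    ', i5) is another option, since the group then repeats for as many items as the output list supplies.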


Capture.PNG


7 days until 1,000 deaths per day, 17 days to 10,000 per day, and 27 days to 100,000.

 

I pray to all that is holy that I am wrong.


It is accelerating and it has an underlying FFT - damn


The underlying FFT causes the 0.8 -- it is interesting - the line is fully about the line for the last 10 days

 


The Europeans changed the file format again today -- got to love them.
