Intel® Fortran Compiler

Corona Virus Analysis

JohnNichols
Valued Contributor II

DateRep,Day,Month,Year,Cases,Deaths,Countries and territories,GeoId,Pop_Data.2018


25/03/2020,25,3,2020,2,0,Afghanistan,AF,37172386
21/03/2020,21,3,2020,2,0,Cape_Verde,CV,543767
10/03/2020,10,3,2020,-9,1,Cases_on_an_international_conveyance_Japan,JPG11668,
2/03/2020,2,3,2020,0,0,Cases_on_an_international_conveyance_Japan,JPG11668,
1/03/2020,1,3,2020,0,0,Cases_on_an_international_conveyance_Japan,JPG11668,

The coronavirus death data file is in the format above. I had a small play with the data in C# but ran into graphing problems, so I am translating the program into Fortran. There appear to be some interesting features in the FFT of the data, which I hope to publish to help the health statistics people.

Does anyone have a good idea for reading each line and then taking it apart into its fields?

21/03/2020 -- ignore

The fields from the day (21) through the deaths (0) are all integers, as on the first line. The country name varies in length, the GeoId (CV) is usually only 2 characters, and the population is an integer. However, the population is missing from the JPG lines, and their GeoId is not two characters.

There is a new file every day.

Regards

John

 

DavidWhite
Black Belt

Since it is a CSV file, you could read each whole line as a text string, parse it to find each field, and unpack the fields into the appropriate variables.
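A minimal sketch of that approach in Fortran, assuming no field contains a quoted comma (true of the sample lines above). The module and routine names (csv_split_mod, split_csv) are hypothetical, and any field longer than the LEN of the output array is truncated.

```fortran
! Hypothetical helper: split one CSV line on commas.
! Assumption: no quoted fields, i.e. commas only ever separate fields.
module csv_split_mod
  implicit none
  private
  public :: split_csv
contains
  ! Splits LINE into FIELDS(1:NFIELDS). Empty fields (",," or a trailing
  ! comma) come back as blank strings. Text longer than LEN(FIELDS) is
  ! truncated; fields beyond SIZE(FIELDS) are dropped.
  subroutine split_csv(line, fields, nfields)
    character(len=*), intent(in)  :: line
    character(len=*), intent(out) :: fields(:)
    integer,          intent(out) :: nfields
    integer :: start, comma, last

    fields  = ' '
    nfields = 0
    last    = len_trim(line)
    start   = 1
    do
      ! Next comma, counted from START (0 when none are left)
      comma   = index(line(start:last), ',')
      nfields = nfields + 1
      if (comma == 0) then
        ! Final field: everything after the last comma (may be empty)
        if (nfields <= size(fields)) fields(nfields) = line(start:last)
        exit
      end if
      if (nfields <= size(fields)) fields(nfields) = line(start:start+comma-2)
      start = start + comma
    end do
    nfields = min(nfields, size(fields))
  end subroutine split_csv
end module csv_split_mod
```

For the JPG11668 line this gives nine fields with the last one blank, so the missing population shows up as an empty string instead of shifting the other fields.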

jimdempseyatthecove
Black Belt

10/03/2020,10,3,2020,-9,1,Cases_on_an_international_conveyance_Japan,JPG11668,

You will have to deal with the GeoId field not being 2 characters, and with the missing Pop_Data.2018 value.

My guess is that JPG11668 refers to a different file, a record in a file, and/or a different region (island) of Japan. That data will have to be obtained elsewhere.

Edit: found this

https://kieranhealy.org/blog/archives/2020/03/21/covid-19-tracking/

Jim Dempsey
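For the missing Pop_Data.2018 value, one option is to convert each numeric field separately and report whether a value was actually there. This is a sketch with hypothetical names; it assumes a 64-bit integer for the population and uses list-directed READ so no field width has to be known in advance. The GeoId itself can simply be kept as text (for example CHARACTER(len=8)) without assuming a length of 2.

```fortran
! Hypothetical helper: convert one text field to a 64-bit integer, treating
! a blank field (e.g. the missing Pop_Data.2018 on JPG11668 lines) as missing.
module field_convert_mod
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  private
  public :: field_to_int64
contains
  subroutine field_to_int64(text, value, present_flag)
    character(len=*), intent(in)  :: text
    integer(int64),   intent(out) :: value
    logical,          intent(out) :: present_flag
    integer :: ios

    value = 0_int64
    present_flag = .false.
    if (len_trim(text) == 0) return          ! blank field: report missing
    read(text, *, iostat=ios) value          ! list-directed, so no width needed
    present_flag = (ios == 0)
    if (.not. present_flag) value = 0_int64  ! unparsable text: also missing
  end subroutine field_to_int64
end module field_convert_mod
```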

jimdempseyatthecove
Black Belt

The page linked above does not investigate (plot) infections or deaths per 100,000 population per day.

The charts shown are of minimal value. You will need to get the population figures for Japan separately.

Jim Dempsey
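The per-100,000 figure Jim mentions is the daily count scaled by the population. A sketch with a hypothetical function name, assuming the population has already been read as a 64-bit integer. Because daily counts in this file can be negative (the -9 correction above), it returns an IEEE NaN rather than a negative sentinel when the population is missing.

```fortran
! Hypothetical helper: daily count per 100,000 population.
! A missing population (<= 0) yields NaN, since counts themselves can be negative.
module rates_mod
  use, intrinsic :: iso_fortran_env, only: int64, real64
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
  implicit none
  private
  public :: per_100k
contains
  function per_100k(daily_count, population) result(rate)
    integer,        intent(in) :: daily_count
    integer(int64), intent(in) :: population
    real(real64) :: rate
    if (population <= 0_int64) then
      rate = ieee_value(rate, ieee_quiet_nan)   ! no usable population
    else
      rate = real(daily_count, real64) * 1.0e5_real64 / real(population, real64)
    end if
  end function per_100k
end module rates_mod
```

For Afghanistan's 2 cases on 25/03/2020 this gives 2 × 100000 / 37172386, roughly 0.0054 per 100,000.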

JohnNichols
Valued Contributor II
subroutine ReadFile()


    implicit none

    Logical Done
    logical Exists
    CHARACTER *2 af
    CHARACTER *130 iline
    EQUIVALENCE (af, iline)
    integer flag


    flag = 0
    Done = .false.

    call lineblankA()

    DO WHILE (.NOT. done)              ! Loop for all the data lines
        READ (srB, '(A)', ERR=1000, END=400) iline    ! Read line
        
        write(*,*)'Here1'
        write(*,*)af                ! Echo the first two characters to the screen for checking

        IF (af.EQ.'da' .OR. af.EQ.'DA') THEN
            write(*,*)'Here'
            Write(*,100)iline
100         format(A130)
        ELSE
            done = .TRUE.
        endif

    end do
    return
1000 Stop ' Input Error in Country Data.'
400 continue
    return
    end subroutine ReadFile

This does not work as expected; af does weird things. Help!

JohnNichols
Valued Contributor II

Found the error, apologies: I needed to actually call the subroutine.

jimdempseyatthecove
Black Belt

In the file you uploaded, the header line reads:

DateRep,Day,Month,Year,Cases,Deaths,Countries and territories,GeoId,Pop_Data.2018

So af will never match "da" or "DA", because the line begins with "Da".

Jim Dempsey
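If the header should be recognized whatever its capitalization, the leading characters can be compared case-insensitively through a plain substring, with no EQUIVALENCE needed. A sketch; the names are hypothetical and only ASCII letters are folded.

```fortran
! Hypothetical helper: case-insensitive "does LINE start with PREFIX?",
! so "DateRep", "DATEREP" and "dateRep" all match "da".
module prefix_mod
  implicit none
  private
  public :: starts_with_ci
contains
  ! Fold ASCII a-z to A-Z; other characters are left unchanged
  pure function upper(s) result(u)
    character(len=*), intent(in) :: s
    character(len=len(s)) :: u
    integer :: i, c
    u = s
    do i = 1, len(s)
      c = iachar(s(i:i))
      if (c >= iachar('a') .and. c <= iachar('z')) u(i:i) = achar(c - 32)
    end do
  end function upper

  pure logical function starts_with_ci(line, prefix)
    character(len=*), intent(in) :: line, prefix
    starts_with_ci = .false.
    if (len(prefix) > len(line)) return      ! line too short to match
    starts_with_ci = upper(line(1:len(prefix))) == upper(prefix)
  end function starts_with_ci
end module prefix_mod
```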

JohnNichols
Valued Contributor II

Jim:

Thanks. I read the line in and use the EQUIVALENCE statement to match the first two characters of the first line; that is a mecej4 trick, not mine. These people change the file a lot, and I am trying to make the reader as robust as possible.

I found a trick of looking for the separator markers (the - and , characters) at the expected positions using a series of IF statements.

It is working so far.

 subroutine ReadFile(ModelA)


    implicit none

    TYPE (Model),       TARGET :: ModelA

    Logical Done
    logical Exists
    CHARACTER *2 af
    CHARACTER *130 iline
    EQUIVALENCE (af, iline)
    integer flag
    integer loc

    integer yyyy,mm,dd, rd,md, yd


    flag = 0
    Done = .false.

    call lineblankA()

    DO WHILE (.NOT. done)              ! Loop for all the data lines
        READ (srB, '(A)', ERR=1000, END=400) iline    ! Read line
        flag = flag + 1

        IF (af.EQ.'DA' .OR. af.EQ.'da') THEN    ! Header line (starts with DA or da)
            write(*,100)iline
100         Format(A130)
        elseif ((iline(3:3) .eq. '-')) then

            read(iline(1:2),120)dd
120         format(i2)
            if ((iline(6:6) .eq. '-')) then
                read(iline(4:5),140)mm
140             format(i2)
                if((iline(11:11) .eq. ',')) then
                    read(iline(7:11),150)yyyy
150                 Format(i4)
                    if((iline(13:13) .eq. ',')) then
                        Write(*,*)'here b'
                        read(iline(12:12),160)rd
160                     format(i1)
                        if((iline(15:15) .eq. ',')) then
                            Write(*,*)'here d'
                            read(iline(14:14),160)md
                            loc = 15
                        elseif((iline(16:16) .eq. ',')) then
                            Write(*,*)'here d'
                            read(iline(14:15),120)md
                            loc = 16
                        endif
                    elseif((iline(14:14) .eq. ',')) then
                        Write(*,*)'here c'
                        read(iline(12:13),140)rd
                        if((iline(16:16) .eq. ',')) then
                            Write(*,*)'here d'
                            read(iline(15:15),160)md
                            loc = 16
                        elseif((iline(17:17) .eq. ',')) then
                            Write(*,*)'here d'
                            read(iline(15:16),120)md
                            loc = 17
                        endif
                    endif

                end if
            endif
            if((iline(loc+5:loc+5) .eq. ',')) then
                read(iline(loc+1:loc+4),150)yd
            endif

            write(*,110)flag, dd, mm, yyyy, rd, md, loc,yd
110         Format('Line number :: ', i6,' Line details - Day :: ', i2, ' Month :: ', i2, '  Year :: ',i4, &
                   '   Day :: ',i2, '   Month :: ',i2,'   Location :: ',i2,'   Year :: ',i4)
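        ELSE
            done = .TRUE.      ! first line that is neither header nor data ends the loop
        endif

    end do
400 ModelA%LineCount = flag    ! reached on normal loop exit and on END= from the READ
    return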
1000 Stop ' Input Error in Country Data.'
    return


    end subroutine ReadFile

I am almost there with the read; next is setting up the structure.

I should have kept my mouth shut and not said "this looks interesting".

 

John
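The nested position checks in the ReadFile code above assume each date component has a fixed width. A position-independent alternative is to locate the two separators with SCAN and read the pieces list-directed. This is a sketch with hypothetical names; it accepts either / or - as the separator (both appear in this thread) and does only a rough range check on day and month.

```fortran
! Hypothetical helper: parse "25/03/2020", "2/03/2020" or "25-03-2020"
! without assuming fixed column positions.
module date_parse_mod
  implicit none
  private
  public :: parse_date
contains
  subroutine parse_date(text, dd, mm, yyyy, ok)
    character(len=*), intent(in)  :: text
    integer,          intent(out) :: dd, mm, yyyy
    logical,          intent(out) :: ok
    integer :: s1, s2, n, ios

    dd = 0; mm = 0; yyyy = 0
    ok = .false.
    n  = len_trim(text)
    s1 = scan(text(1:n), '/-')           ! first separator
    if (s1 < 2) return                    ! none, or no day digits
    s2 = scan(text(s1+1:n), '/-')        ! second separator, relative to s1
    if (s2 < 2) return                    ! none, or no month digits
    s2 = s1 + s2                          ! make it an absolute position
    if (s2 >= n) return                   ! no year digits
    read(text(1:s1-1),    *, iostat=ios) dd;   if (ios /= 0) return
    read(text(s1+1:s2-1), *, iostat=ios) mm;   if (ios /= 0) return
    read(text(s2+1:n),    *, iostat=ios) yyyy; if (ios /= 0) return
    ok = (dd >= 1 .and. dd <= 31 .and. mm >= 1 .and. mm <= 12)
  end subroutine parse_date
end module date_parse_mod
```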

JohnNichols
Valued Contributor II

This program is tripping my antivirus program, Microsoft Defender.

Can someone try it and see if it is just my machine?

JohnNichols
Valued Contributor II

It appears to be the manifest file. Is it safe to turn it off?

 

jimdempseyatthecove
Black Belt

John,

Add arrays for:

DateRep, Day, Month, Year, Cases, Deaths, Countries_and_territories, GeoId, Pop_Data_2018

Read the file line by line (no EQUIVALENCE).
Use the comma as a separator to extract each field and insert each field into its respective array.
Note that some lines end in a comma, leaving the last field blank.

This data file is not fixed-field. Don't assume the GeoId is 2 characters; I saw that the data file may have an exception with a 3-character code.

RE: Microsoft Defender

Check the documentation to see if there is an exclude-folder option. I use Avast, and it has this option.

Jim Dempsey
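Jim's first step, reading the file line by line, can be done without knowing the record count in advance by growing the storage as it fills. A sketch under stated assumptions: the names are hypothetical, lines are capped at 256 characters, and each stored line would afterwards be split on commas into the per-field arrays.

```fortran
! Hypothetical helper: read every line of a text file into an allocatable
! array that doubles in size when full (no record-count estimate needed).
module read_lines_mod
  implicit none
  private
  public :: read_all_lines
  integer, parameter, public :: line_len = 256   ! assumed maximum line length
contains
  subroutine read_all_lines(filename, lines, nlines, ok)
    character(len=*), intent(in) :: filename
    character(len=line_len), allocatable, intent(out) :: lines(:)
    integer, intent(out) :: nlines
    logical, intent(out) :: ok
    character(len=line_len), allocatable :: tmp(:)
    character(len=line_len) :: buf
    integer :: unit, ios

    nlines = 0
    ok = .false.
    allocate(lines(64))
    open(newunit=unit, file=filename, status='old', action='read', iostat=ios)
    if (ios /= 0) return
    do
      read(unit, '(A)', iostat=ios) buf
      if (is_iostat_end(ios)) exit        ! normal end of file
      if (ios /= 0) then                  ! genuine read error
        close(unit)
        return
      end if
      if (nlines == size(lines)) then     ! full: double the capacity
        allocate(tmp(2*size(lines)))
        tmp(1:nlines) = lines(1:nlines)
        call move_alloc(tmp, lines)
      end if
      nlines = nlines + 1
      lines(nlines) = buf
    end do
    close(unit)
    ok = .true.
  end subroutine read_all_lines
end module read_lines_mod
```

The header is returned as lines(1), so the data records are lines(2:nlines).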
 

JohnNichols
Valued Contributor II

Jim:

Thanks for the notes. It has been interesting: I have swapped computers, uninstalled the Intel compiler and VS, and done a lot of reinstalling with different versions of VS and Fortran.

The error continues to occur in all combinations. 

So I went back through, adding the code line by line and testing at each step. There appears to be a problem with:

            if((iline(loc+5:loc+5) .eq. ',')) then
                read(iline(loc+1:loc+4),210)name2
                nameA = name2
210             format(A4)
            elseif((iline(loc+6:loc+6) .eq. ',')) then
                read(iline(loc+1:loc+5),220)name5
                nameA = name5
220             format(A5)
            elseif((iline(loc+7:loc+7) .eq. ',')) then
                read(iline(loc+1:loc+6),*)name6
                nameA = name6
230             format( A6)


            end if

 

The statement read(iline(loc+1:loc+6),230)name6 causes the virus error on both versions of VS: if I change the * to 230, VS will not run the program, and Defender reports it.

I am not sure whether it is a compiler bug or an error in my code.

The CV.zip has the current iteration. 

I have to check the output files once I am finished, to catch all the oddities in the data file. The US Virgin Islands entry causes strange output. But I want to solve this problem first.

Thanks for the help

John

jimdempseyatthecove
Black Belt

This is more what I had in mind:

!  CVJD.f90 
program CVJD
    implicit none
    character(len=256) :: inputLine
    integer :: inputFileUnit, outputFileUnit
    character(len=*), parameter :: inputFileName = "e.txt"
    character(len=*), parameter :: outputFileName = "test.txt"
    integer :: comma, priorComma, nCommas
    character(len=50) :: hDateRep
    character(len=50) :: hDay
    character(len=50) :: hMonth
    character(len=50) :: hYear
    character(len=50) :: hCases
    character(len=50) :: hDeaths
    character(len=50) :: hCountries_and_territories
    character(len=50) :: hGeoId
    character(len=50) :: hPop_Data  ! .2018

    character(len=10), allocatable :: DateRep(:)  ! dd/mm/yyyy needs 10 characters
    integer, allocatable :: Day(:)
    integer, allocatable :: Month(:)
    integer, allocatable :: Year(:)
    integer, allocatable :: Cases(:)
    integer, allocatable :: Deaths(:)
    character(len=50), allocatable :: Countries_and_territories(:)
    character(len=8), allocatable :: GeoId(:)
    integer(8), allocatable :: Pop_Data(:)  ! .2018

    integer(8) :: inputFileSize
    integer :: maxRecords, nRecords
    
    open(newunit=outputFileUnit, file=outputFileName, access='sequential', action='write', err=888)
!    write(outputFileUnit) "test"
!    close(outputFileUnit)
    open(newunit=inputFileUnit, file=inputFileName, access='sequential', action='read', err=999)
    ! read the header
    read(inputFileUnit,"(A)") inputLine
    
    comma = index(inputLine,",")
    if(comma < 2) goto 777
    hDateRep = inputLine(1:comma-1)
    priorComma = comma
    comma = index(inputLine(priorComma+1:),",")
    if(comma == 0)  goto 777
    comma = priorComma + comma
    hDay = inputLine(priorComma+1:comma-1)
    priorComma = comma
    comma = index(inputLine(priorComma+1:),",")
    if(comma == 0)  goto 777
    comma = priorComma + comma
    hMonth = inputLine(priorComma+1:comma-1)
    priorComma = comma
    comma = index(inputLine(priorComma+1:),",")
    if(comma == 0)  goto 777
    comma = priorComma + comma
    hYear = inputLine(priorComma+1:comma-1)
    priorComma = comma
    comma = index(inputLine(priorComma+1:),",")
    if(comma == 0)  goto 777
    comma = priorComma + comma
    hCases = inputLine(priorComma+1:comma-1)
    priorComma = comma
    comma = index(inputLine(priorComma+1:),",")
    if(comma == 0)  goto 777
    comma = priorComma + comma
    hDeaths = inputLine(priorComma+1:comma-1)
    priorComma = comma
    comma = index(inputLine(priorComma+1:),",")
    if(comma == 0)  goto 777
    comma = priorComma + comma
    hCountries_and_territories = inputLine(priorComma+1:comma-1)
    priorComma = comma
    comma = index(inputLine(priorComma+1:),",")
    if(comma == 0)  goto 777
    comma = priorComma + comma
    hGeoId = inputLine(priorComma+1:comma-1)
    priorComma = comma
    comma = index(inputLine(priorComma+1:),",")
    if(comma > 0)  goto 777
    hPop_Data = inputLine(priorComma+1:)  ! .2018
    
    ! estimate storage space
    inquire(inputFileUnit, size=inputFileSize) ! get file size
    ! smallest record (line size) is ~40 characters
    maxRecords = inputFileSize / 40
    allocate(DateRep(maxRecords), Day(maxRecords), Month(maxRecords), Year(maxRecords), Cases(maxRecords))
    allocate(Deaths(maxRecords), Countries_and_territories(maxRecords), GeoId(maxRecords), Pop_Data(maxRecords))    

    nRecords = 0
    do
        ! read the next data record
        read(inputFileUnit,"(A)",END=111) inputLine
        nRecords = nRecords + 1
        comma = index(inputLine,",")
        if(comma < 2) goto 777
        DateRep(nRecords) = inputLine(1:comma-1)
        priorComma = comma
        comma = index(inputLine(priorComma+1:),",")
        if(comma == 0)  goto 777
        comma = priorComma + comma
        read(inputLine(priorComma+1:comma-1),"(I)") Day(nRecords)
        priorComma = comma
        comma = index(inputLine(priorComma+1:),",")
        if(comma == 0)  goto 777
        comma = priorComma + comma
        read(inputLine(priorComma+1:comma-1),"(I)") Month(nRecords)
        priorComma = comma
        comma = index(inputLine(priorComma+1:),",")
        if(comma == 0)  goto 777
        comma = priorComma + comma
        read(inputLine(priorComma+1:comma-1),"(I)") Year(nRecords)
        priorComma = comma
        comma = index(inputLine(priorComma+1:),",")
        if(comma == 0)  goto 777
        comma = priorComma + comma
        read(inputLine(priorComma+1:comma-1),"(I)") Cases(nRecords)
        priorComma = comma
        comma = index(inputLine(priorComma+1:),",")
        if(comma == 0)  goto 777
        comma = priorComma + comma
        read(inputLine(priorComma+1:comma-1),"(I)") Deaths(nRecords)
        priorComma = comma
        comma = index(inputLine(priorComma+1:),",")
        if(comma == 0)  goto 777
        comma = priorComma + comma
        Countries_and_territories(nRecords) = inputLine(priorComma+1:comma-1)
        priorComma = comma
        comma = index(inputLine(priorComma+1:),",")
        if(comma == 0)  goto 777
        comma = priorComma + comma
        GeoId(nRecords) = inputLine(priorComma+1:comma-1)
        priorComma = comma
        comma = index(inputLine(priorComma+1:),",")
        if(comma > 0)  goto 777
        if(len_trim(inputLine(priorComma+1:)) == 0) then  ! blank last field: Pop_Data missing
            Pop_Data(nRecords) = 0
        else
            read(inputLine(priorComma+1:),"(I)") Pop_Data(nRecords)
        endif
    end do
    
111 print *,inputLine
    print *,"**************** have data, do your thing ***********"
    stop
777 print *,"Invalid header or data record: ", trim(inputLine)
    stop
888 print *, "Error opening output file", outputFileName
    stop
999 print *, "Error opening input file", inputFileName
    stop
end program CVJD

Jim Dempsey

jimdempseyatthecove
Black Belt

You can fancify that if you want: additional error checking, picking the nth field from a line as text or integer (or real or double), etc.

Jim Dempsey
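As an example of the "pick the nth field" idea, here is a sketch (hypothetical names) that returns field N as double precision and reports failure for a missing, blank or non-numeric field. One caution, noted in the code: list-directed input treats / as a terminator, so this should not be used on the DateRep field.

```fortran
! Hypothetical helper: return the Nth comma-separated field of LINE as real64.
! Caution: list-directed input stops at '/', so don't use this on DateRep.
module nth_field_mod
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  private
  public :: nth_real
contains
  subroutine nth_real(line, n, value, ok)
    character(len=*), intent(in)  :: line
    integer,          intent(in)  :: n
    real(real64),     intent(out) :: value
    logical,          intent(out) :: ok
    integer :: first, last, k, comma, ios

    value = 0.0_real64
    ok = .false.
    if (n < 1) return
    first = 1
    ! Skip over N-1 commas to find where field N starts
    do k = 1, n - 1
      comma = index(line(first:), ',')
      if (comma == 0) return               ! fewer than N fields
      first = first + comma
    end do
    ! Field N ends just before the next comma, or at the end of the text
    comma = index(line(first:), ',')
    if (comma == 0) then
      last = len_trim(line)
    else
      last = first + comma - 2
    end if
    if (last < first) return               ! empty field
    if (len_trim(line(first:last)) == 0) return
    read(line(first:last), *, iostat=ios) value
    ok = (ios == 0)
  end subroutine nth_real
end module nth_field_mod
```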

JohnNichols
Valued Contributor II

Jim:

I knew there was a better way to do it. Thanks!

You are great, thanks.

John

20  Format(i6,'    ', A2, '    ', i4, 88('    ', i5))

How would you make the 88 a variable? 
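A format specification can be ordinary character data, so one portable way is to build it at run time with an internal WRITE. The sketch below (hypothetical names) reproduces the format above with the repeat count taken from NCOLS. Two alternatives, if your compiler supports them: the Fortran 2008 unlimited repeat *('    ', i5), which repeats for as many items as remain, and Intel's variable format expression extension in a FORMAT statement, e.g. <n>('    ', i5), which is not standard Fortran.

```fortran
! Hypothetical helper: build the row format with a run-time repeat count.
module row_format_mod
  implicit none
  private
  public :: row_format
contains
  ! For ncols = 88 this returns "(i6,'    ',a2,'    ',i4,88('    ',i5))"
  function row_format(ncols) result(fmt)
    integer, intent(in) :: ncols
    character(len=64) :: fmt
    ! i0 writes the integer with no padding
    write(fmt, '(a,i0,a)') "(i6,'    ',a2,'    ',i4,", ncols, "('    ',i5))"
  end function row_format
end module row_format_mod
```

It is then used in place of the label, for example write(*, trim(row_format(ncols))) followed by the same output list as before.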

JohnNichols
Valued Contributor II

[Attached image: Capture.PNG (plot)]

JohnNichols
Valued Contributor II

7 days until 1,000 deaths per day, 17 days until 10,000 per day, and 27 days until 100,000 per day.

 

I pray to all that is holy that I am wrong.
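Those three projections amount to a tenfold increase every 10 days. Assuming pure exponential growth through the stated figures (a back-of-envelope check on the arithmetic, not a forecast), with t in days from the post and D(t) the deaths per day:

```latex
\begin{aligned}
D(t) &= 1000 \cdot 10^{(t-7)/10}, \qquad D(17) = 10^{4}, \quad D(27) = 10^{5} \\
\text{daily growth factor} &= 10^{1/10} \approx 1.259 \\
t_{\text{double}} &= 10 \log_{10} 2 \approx 3.01 \ \text{days} \\
D(0) &= 1000 \cdot 10^{-0.7} \approx 200 \ \text{deaths per day}
\end{aligned}
```

So the projection is equivalent to roughly 200 deaths per day at the time of posting, doubling about every three days.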

JohnNichols
Valued Contributor II

It is accelerating, and it has an underlying FFT. Damn.

JohnNichols
Valued Contributor II

The underlying FFT causes the 0.8. It is interesting: the line is fully above the line for the last 10 days.
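Assuming FFT here means a fast Fourier transform of the daily series, the spectrum is easy to reproduce for checking. For a few hundred days a direct DFT is fast enough; the sketch below (hypothetical names) computes the magnitude of every frequency bin, where bin k (k ≥ 1) corresponds to a period of N/k days. For long series a proper FFT library would be the usual choice.

```fortran
! Hypothetical helper: magnitude spectrum of a daily series by a direct
! O(N**2) DFT. Adequate for short series; not an FFT.
module dft_mod
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  private
  public :: dft_magnitude
contains
  ! MAG(k) = | sum_{t=0}^{N-1} X(t) * exp(-2*pi*i*k*t/N) |, k = 0..N-1
  ! MAG must have at least SIZE(X) elements.
  subroutine dft_magnitude(x, mag)
    real(real64), intent(in)  :: x(0:)
    real(real64), intent(out) :: mag(0:)
    real(real64), parameter :: pi = acos(-1.0_real64)
    complex(real64) :: acc
    real(real64) :: angle
    integer :: n, k, t
    n = size(x)
    do k = 0, n - 1
      acc = (0.0_real64, 0.0_real64)
      do t = 0, n - 1
        ! mod() keeps the angle in [0, 2*pi) for accuracy
        angle = -2.0_real64 * pi * real(mod(k*t, n), real64) / real(n, real64)
        acc = acc + x(t) * cmplx(cos(angle), sin(angle), kind=real64)
      end do
      mag(k) = abs(acc)
    end do
  end subroutine dft_magnitude
end module dft_mod
```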

 

JohnNichols
Valued Contributor II

The Europeans changed the file format again today. Got to love them.
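Since the column layout keeps changing, one defensive option is to look up each column by its header name once, then use that index when splitting the data lines. A sketch with hypothetical names; it assumes the header has no quoted commas and matches names exactly apart from leading and trailing blanks, so a column whose name changes still returns 0.

```fortran
! Hypothetical helper: position of a named column in the CSV header line,
! or 0 if no column has that name. Exact match apart from surrounding blanks.
module header_lookup_mod
  implicit none
  private
  public :: find_column
contains
  function find_column(header, name) result(col)
    character(len=*), intent(in) :: header, name
    integer :: col
    integer :: first, comma, k, last
    col   = 0
    first = 1
    last  = len_trim(header)
    k     = 0
    do
      k = k + 1
      comma = index(header(first:last), ',')
      if (comma == 0) then
        ! Last column: compare whatever follows the final comma
        if (trim(adjustl(header(first:last))) == trim(name)) col = k
        return
      end if
      if (trim(adjustl(header(first:first+comma-2))) == trim(name)) then
        col = k
        return
      end if
      first = first + comma
    end do
  end function find_column
end module header_lookup_mod
```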

mecej4
Black Belt

You displayed a plot in #16 and your subsequent comments refer to the same plot. Unfortunately, you did not tell us what the abscissa and ordinate variables were.

Does FFT mean 'food for thought' or 'fast Fourier transform'?
