Getting physical/logical processor count

David_DiLaura1 · ‎09-21-2016

Steve,

A number of years ago you provided Fortran code (ported over from C, as I recall) that provided a way to acquire the physical and logical processor count, as well as the physical, page, and virtual memory on a host machine. We have used that successfully (and usefully!) for several years. I have encountered a problem with it when compiled with 17.0.0.109:

The code contains an allocatable variable BUFFER that is typed as T_SYSTEM_LOGICAL_PROCESSOR_INFORMATION, which comes from the ifwinty module. When compiled with 17.0.0.109 and excuted, an attempt to allocate BUFFER generates an 'out of virtual memory' error. Code line 075. The identical code, compiled with 16.03.207 executes without error. Nothing other than the VS setting of which version of the compiler to use is changed.

I tried to find the reference to your code, to refresh my understanding of the details, but I can't find it anywhere.

Here is the routine that is based on your example code:

subroutine SystemInfo( logicalProcessorCount, NumPhysicalProcessors,                                            &
                    TotalPhysicalMemory, AvailablePhysicalMemory, TotalPageableMemory, AvailablePageableMemory, &
                    TotalVirtualMemory, AvailableVirtualMemory, Worked)

use, intrinsic :: ISO_C_BINDING
!use kernel32_additions
use kernel32
use ifwinty, only: T_MEMORYSTATUSEX, DWORDLONG, T_LARGE_INTEGERX

implicit none

! Variables
procedure(GetLogicalProcessorInformation), pointer :: glpi
type(T_SYSTEM_LOGICAL_PROCESSOR_INFORMATION), allocatable  ::  buffer(:)   !dimension(:) :: buffer
integer(DWORD) :: returnLength
integer :: logicalProcessorCount
integer :: numaNodeCount
integer :: processorCoreCount
integer :: processorCacheCount(3)
integer :: processorPackageCount
integer(DWORD) :: ret
integer :: nlpi, lpi_element_length, i
logical(4) :: Worked
integer(8) ::	TotalPhysicalMemory		,&
				AvailablePhysicalMemory	,&
				TotalPageableMemory		,&
				AvailablePageableMemory	,&
				TotalVirtualMemory		,&
				AvailableVirtualMemory

integer(4) NumPhysicalProcessors
			
type (T_MEMORYSTATUSEX) :: stat
type(T_LARGE_INTEGERX) :: val
			
! MSDN says that because GetLogicalProcessorInformation is not supported on all versions
! of Windows, it suggests getting the address dynamically.  We'll do that here, though
! in reality it should not be necessary. The following statement uses only Fortran standard
! syntax - it would be a bit simpler to use the integer pointer extension, but this makes a 
! good example.
!
! The steps here are:
! 1. Call GetModuleHandle to get a handle to the kernel32 DLL which will already be loaded in this image.
!    Note that this is not the same as LoadLibrary, which assumes that a DLL is not already loaded.
! 2. Call GetProcAddress to get the address of GetLogicalProcessorInformation
! 3. Use TRANSFER to convert that address to a C_FUNPTR
! 4. Use C_F_PROCPOINTER to convert the C_FUNPTR to a Fortran procedure pointer

numaNodeCount = 0
processorCoreCount = 0
logicalProcessorCount = 0
processorPackageCount = 0
Worked = .true.
		
call c_f_procpointer( &
    transfer( &
        GetProcAddress( &
            GetModuleHandle("kernel32"//C_NULL_CHAR), &
            "GetLogicalProcessorInformation"//C_NULL_CHAR &
            ), &
        C_NULL_FUNPTR &
        ), &
    glpi)

if (.not. associated(glpi)) then
    print *, "GetLogicalProcessorInformation not supported"
	Worked = .false.
    return
end if
 
! We don't know in advance the size of the buffer we need. We'll pick a number, allocate it,
! and see if that's sufficient.  If not, we'll use the returned size information and reallocate
! the buffer to the required size.

allocate (buffer(100))
!lpi_element_length = C_SIZEOF(buffer(1))
!returnLength = C_SIZEOF(buffer)
lpi_element_length = SIZEOF(buffer(1))
returnLength = SIZEOF(buffer)
ret = glpi(buffer, returnLength)
if (ret == FALSE) then ! Failed
    if (GetLastError() == ERROR_INSUFFICIENT_BUFFER) then
        deallocate (buffer)
        allocate (buffer(returnLength/lpi_element_length))
        ret = glpi(buffer, returnLength)
        if (ret == FALSE) then
            print *, "GetLogicalProcessorInformation call failed with error code ", GetLastError()
			Worked = .false.
            return
        end if
    else
        print *, "GetLogicalProcessorInformation call failed with error code ", GetLastError()
		Worked = .false.
        return
    end if
end if

! Now we can iterate through the elements of buffer and see what we can see

do i=1, returnLength / lpi_element_length ! Number of elements in buffer
    select case (buffer(i)%Relationship)
    case(RelationNumaNode)
        ! NUMA nodes return one record of this type
        numaNodeCount = numaNodeCount + 1
    
    case(RelationProcessorCore)
        processorCoreCount = processorCoreCount + 1
        
        ! A Hyperthreaded core supplies more than one logical processor
        logicalProcessorCount = logicalProcessorCount + popcnt(buffer(i)%processorMask)
        
    case(RelationCache)
        ! One cache descriptor for each cache
        if (buffer(i)%Cache%Level > 0 .and. buffer(i)%Cache%Level <= 3) then
            processorCacheCount(buffer(i)%Cache%Level) = processorCacheCount(buffer(i)%Cache%Level) + 1
        else
            print *, "Invalid processor cache level ", buffer(i)%Cache%Level
        end if
        
    case(RelationProcessorPackage)
        !Logical processors share a physical package (socket)
        processorPackageCount = processorPackageCount + 1
        
    case default
        print *, "Unrecognized relationship code ", buffer(i)%Relationship
        
    end select
end do

NumPhysicalProcessors = processorCoreCount

!%%%%%%%%% now get available memory

! Set the length field in structure stat
stat%dwlength = sizeof(stat)
! Get the statistics and convert as necessary
ret = GlobalMemoryStatusEx(stat)

TotalPhysicalMemory		= transfer(stat%ullTotalPhys, 0_DWORDLONG)
AvailablePhysicalMemory	= transfer(stat%ullAvailPhys, 0_DWORDLONG)
TotalPageableMemory		= transfer(stat%ullTotalPageFile, 0_DWORDLONG)
AvailablePageableMemory	= transfer(stat%ullAvailPageFile, 0_DWORDLONG)
TotalVirtualMemory			= transfer(stat%ullTotalVirtual, 0_DWORDLONG)
AvailableVirtualMemory		= transfer(stat%ullAvailVirtual, 0_DWORDLONG)

end subroutine SystemInfo

David

jimdempseyatthecove · ‎09-21-2016

The problem you have is if you code will run on a system with more than 64 logical processors. In that situation, you will need to call GetLogicalProcessorInformationEx and pay attention to the Processor Group.

I suggest you modify your code to first attempt to locate GetLogicalProcessorInformationEx, use that if found, failing that, then try to locate GetLogicalProcessorInformation, use that if found, else see if you can get the information in a different way (e.g. using CPUID and CPUIDEX) rather than bomb out.

Jim Dempsey

Steven_L_Intel1 · ‎09-21-2016

The ProcessorInfo sample is included in the samples bundle under compiler_f\Win32. It works fine for me in 17. I have attached it here.

jimdempseyatthecove · ‎09-21-2016

Have you run this on a Windows host with more than 64 logical processors?

The non Ex version (on system with more than 64 logical processors) of GetLogicalProcessorInformation would either

a) return the information for the issuing thread's processor Group
b) return invalid/conflicting 64-bit affinity bitmasks for unknown processor Group

Whereas the Ex version informs you of the group number (and the 64-bit affinity mask within that group).

Jim Dempsey

Steven_L_Intel1 · ‎09-21-2016

No, I don't have access to such a system. That function was new with Windows 7, but I suppose at this point it's reasonable to use it. I'll look into revising the sample to use the Ex version.

jimdempseyatthecove · ‎09-22-2016

Windows 10 will run on a KNL system..

Jim Dempsey

Steven_L_Intel1 · ‎10-21-2016

Ok, this was both fun, exasperating and more work than I had envisioned...

Attached is a complete rewrite of the ProcessorInfo example using GetLogicalProcessorInformationEx. I discovered that many of the required types and constants were not defined in the Intel Fortran-supplied modules, and that the declaration of GetLogicalProcessorInformationEx that was there was incorrect. I also had more fun with using the C interoperability features to do what had previously needed extensions.

So, without further ado, I attach the new example. It has three files - GLPI_mod.f90 is a module with helper routines, ProcessorInfo is the main program, and Win32_Additions is a module that adds/corrects API declarations. This latter will be fed back into the product - at the point where that happens, Win32_Additions will need to be dropped and minor edits made to GLPI_mod.f90 (see comments there.)

When I run this on my 4C8T system I get:

Requesting GLPI buffer of 1016 bytes
Logical to Physical Processor Map
  Core 0 logical processors: 0-1 (Hyperthreaded)
  Core 1 logical processors: 2-3 (Hyperthreaded)
  Core 2 logical processors: 4-5 (Hyperthreaded)
  Core 3 logical processors: 6-7 (Hyperthreaded)
Logical Processor to Socket Map
  Socket 0 logical processors: 0-7
Logical Processor to NUMA Node Map
  NUMA Node 0 logical processors: 0-7
Logical Processor Cache Map
  Processors: 0-1
    Level 1 Data, 32KB, LineSize 64, 8-way associative
  Processors: 0-1
    Level 1 Instruction, 32KB, LineSize 64, 4-way associative
  Processors: 0-1
    Level 2 Unified, 256KB, LineSize 64, 8-way associative
  Processors: 2-3
    Level 1 Data, 32KB, LineSize 64, 8-way associative
  Processors: 2-3
    Level 1 Instruction, 32KB, LineSize 64, 4-way associative
  Processors: 2-3
    Level 2 Unified, 256KB, LineSize 64, 8-way associative
  Processors: 4-5
    Level 1 Data, 32KB, LineSize 64, 8-way associative
  Processors: 4-5
    Level 1 Instruction, 32KB, LineSize 64, 4-way associative
  Processors: 4-5
    Level 2 Unified, 256KB, LineSize 64, 8-way associative
  Processors: 6-7
    Level 1 Data, 32KB, LineSize 64, 8-way associative
  Processors: 6-7
    Level 1 Instruction, 32KB, LineSize 64, 4-way associative
  Processors: 6-7
    Level 2 Unified, 256KB, LineSize 64, 8-way associative
  Processors: 0-7
    Level 3 Unified, 8192KB, LineSize 64, 16-way associative
Logical Processor to Group Map
  Group 0 logical processors: 0-7

I would be interested in seeing the results on systems with more cores or more complex configurations. Comments and questions welcomed.

(Grr - of course after posting this I find a problem - fixed.)

IanH · ‎10-22-2016

Just going from the docs, and given its use, should LTP_PC_SMT be of kind BYTE and/or should the comparison simply be `p_PR%Flags== LTP_PC_SMT`? The current IAND thing doesn't seem quite right.

Hello. My name is Inigo Montoya. You have used an extension to specify an integer literal constant. Prepare to die.

Steven_L_Intel1 · ‎10-22-2016

Yeah, I debated that one. The problem I have is that the MSDN documentation calls the member Flags, suggesting to me that there might eventually be more than just bit 0 defined, so I used IAND to test that one bit. You're right that the kind of the constant should be BYTE.

Extension? You keep using that word. I do not think it means what you think it means. Please elaborate.

That Dolores is a fine figure of a woman. The sysinternals utility I based the output on uses a row of stars, one for each processor. I figured that with MIC that this could get cluttered, so I have it print numbered ranges instead. The code is smart enough to understand discontiguous ranges.

andrew_4619 · ‎10-22-2016

I presume he meant "Win32_Additions.f90(33): warning #7023: Fortran 2008 does not allow radix-specified constants.". Is that film any good BTW?

Steven_L_Intel1 · ‎10-22-2016

Oh, IFWINTY is chock-full of extensions. You won't see that in normal use once these definitions are folded into the product.

I assume you're being rhetorical in asking about one of the greatest movies of all time... It's even better than an MLT!

Just noticed I had redundant declarations in the main program - these are leftovers from an initial implementation before I started using BLOCK. All of the local declarations except for that of ctx should be deleted.

Oh, and my IAND test for LTP_PC_SMT should be /= 0 rather than ==1 since I shouldn't assume that it's bit 0.

IanH · ‎10-22-2016

Yes - it was the # things I was referring to. !DEC$ stuff aside (and they can technically be regarded as comments) that module was conforming apart from the # constants.

If it was a set of flags, I was thinking something like `IAND(p_PR%Flags, LTP_PC_SMT) == LTP_PC_SMT` would be the go. But practically /= 0 is fair enough.

Steven_L_Intel1 · ‎10-23-2016

We tend to use # when the .h does, but -1 would probably be fine here.