A number of years ago you provided Fortran code (ported over from C, as I recall) that provided a way to acquire the physical and logical processor count, as well as the physical, page, and virtual memory on a host machine. We have used that successfully (and usefully!) for several years. I have encountered a problem with it when compiled with 188.8.131.52:
The code contains an allocatable variable BUFFER that is typed as T_SYSTEM_LOGICAL_PROCESSOR_INFORMATION, which comes from the ifwinty module. When compiled with 184.108.40.206 and excuted, an attempt to allocate BUFFER generates an 'out of virtual memory' error. Code line 075. The identical code, compiled with 16.03.207 executes without error. Nothing other than the VS setting of which version of the compiler to use is changed.
I tried to find the reference to your code, to refresh my understanding of the details, but I can't find it anywhere.
Here is the routine that is based on your example code:
subroutine SystemInfo( logicalProcessorCount, NumPhysicalProcessors, & TotalPhysicalMemory, AvailablePhysicalMemory, TotalPageableMemory, AvailablePageableMemory, & TotalVirtualMemory, AvailableVirtualMemory, Worked) use, intrinsic :: ISO_C_BINDING !use kernel32_additions use kernel32 use ifwinty, only: T_MEMORYSTATUSEX, DWORDLONG, T_LARGE_INTEGERX implicit none ! Variables procedure(GetLogicalProcessorInformation), pointer :: glpi type(T_SYSTEM_LOGICAL_PROCESSOR_INFORMATION), allocatable :: buffer(:) !dimension(:) :: buffer integer(DWORD) :: returnLength integer :: logicalProcessorCount integer :: numaNodeCount integer :: processorCoreCount integer :: processorCacheCount(3) integer :: processorPackageCount integer(DWORD) :: ret integer :: nlpi, lpi_element_length, i logical(4) :: Worked integer(8) :: TotalPhysicalMemory ,& AvailablePhysicalMemory ,& TotalPageableMemory ,& AvailablePageableMemory ,& TotalVirtualMemory ,& AvailableVirtualMemory integer(4) NumPhysicalProcessors type (T_MEMORYSTATUSEX) :: stat type(T_LARGE_INTEGERX) :: val ! MSDN says that because GetLogicalProcessorInformation is not supported on all versions ! of Windows, it suggests getting the address dynamically. We'll do that here, though ! in reality it should not be necessary. The following statement uses only Fortran standard ! syntax - it would be a bit simpler to use the integer pointer extension, but this makes a ! good example. ! ! The steps here are: ! 1. Call GetModuleHandle to get a handle to the kernel32 DLL which will already be loaded in this image. ! Note that this is not the same as LoadLibrary, which assumes that a DLL is not already loaded. ! 2. Call GetProcAddress to get the address of GetLogicalProcessorInformation ! 3. Use TRANSFER to convert that address to a C_FUNPTR ! 4. Use C_F_PROCPOINTER to convert the C_FUNPTR to a Fortran procedure pointer numaNodeCount = 0 processorCoreCount = 0 logicalProcessorCount = 0 processorPackageCount = 0 Worked = .true. call c_f_procpointer( & transfer( & GetProcAddress( & GetModuleHandle("kernel32"//C_NULL_CHAR), & "GetLogicalProcessorInformation"//C_NULL_CHAR & ), & C_NULL_FUNPTR & ), & glpi) if (.not. associated(glpi)) then print *, "GetLogicalProcessorInformation not supported" Worked = .false. return end if ! We don't know in advance the size of the buffer we need. We'll pick a number, allocate it, ! and see if that's sufficient. If not, we'll use the returned size information and reallocate ! the buffer to the required size. allocate (buffer(100)) !lpi_element_length = C_SIZEOF(buffer(1)) !returnLength = C_SIZEOF(buffer) lpi_element_length = SIZEOF(buffer(1)) returnLength = SIZEOF(buffer) ret = glpi(buffer, returnLength) if (ret == FALSE) then ! Failed if (GetLastError() == ERROR_INSUFFICIENT_BUFFER) then deallocate (buffer) allocate (buffer(returnLength/lpi_element_length)) ret = glpi(buffer, returnLength) if (ret == FALSE) then print *, "GetLogicalProcessorInformation call failed with error code ", GetLastError() Worked = .false. return end if else print *, "GetLogicalProcessorInformation call failed with error code ", GetLastError() Worked = .false. return end if end if ! Now we can iterate through the elements of buffer and see what we can see do i=1, returnLength / lpi_element_length ! Number of elements in buffer select case (buffer(i)%Relationship) case(RelationNumaNode) ! NUMA nodes return one record of this type numaNodeCount = numaNodeCount + 1 case(RelationProcessorCore) processorCoreCount = processorCoreCount + 1 ! A Hyperthreaded core supplies more than one logical processor logicalProcessorCount = logicalProcessorCount + popcnt(buffer(i)%processorMask) case(RelationCache) ! One cache descriptor for each cache if (buffer(i)%Cache%Level > 0 .and. buffer(i)%Cache%Level <= 3) then processorCacheCount(buffer(i)%Cache%Level) = processorCacheCount(buffer(i)%Cache%Level) + 1 else print *, "Invalid processor cache level ", buffer(i)%Cache%Level end if case(RelationProcessorPackage) !Logical processors share a physical package (socket) processorPackageCount = processorPackageCount + 1 case default print *, "Unrecognized relationship code ", buffer(i)%Relationship end select end do NumPhysicalProcessors = processorCoreCount !%%%%%%%%% now get available memory ! Set the length field in structure stat stat%dwlength = sizeof(stat) ! Get the statistics and convert as necessary ret = GlobalMemoryStatusEx(stat) TotalPhysicalMemory = transfer(stat%ullTotalPhys, 0_DWORDLONG) AvailablePhysicalMemory = transfer(stat%ullAvailPhys, 0_DWORDLONG) TotalPageableMemory = transfer(stat%ullTotalPageFile, 0_DWORDLONG) AvailablePageableMemory = transfer(stat%ullAvailPageFile, 0_DWORDLONG) TotalVirtualMemory = transfer(stat%ullTotalVirtual, 0_DWORDLONG) AvailableVirtualMemory = transfer(stat%ullAvailVirtual, 0_DWORDLONG) end subroutine SystemInfo
The problem you have is if you code will run on a system with more than 64 logical processors. In that situation, you will need to call GetLogicalProcessorInformationEx and pay attention to the Processor Group.
I suggest you modify your code to first attempt to locate GetLogicalProcessorInformationEx, use that if found, failing that, then try to locate GetLogicalProcessorInformation, use that if found, else see if you can get the information in a different way (e.g. using CPUID and CPUIDEX) rather than bomb out.
Have you run this on a Windows host with more than 64 logical processors?
The non Ex version (on system with more than 64 logical processors) of GetLogicalProcessorInformation would either
a) return the information for the issuing thread's processor Group
b) return invalid/conflicting 64-bit affinity bitmasks for unknown processor Group
Whereas the Ex version informs you of the group number (and the 64-bit affinity mask within that group).
No, I don't have access to such a system. That function was new with Windows 7, but I suppose at this point it's reasonable to use it. I'll look into revising the sample to use the Ex version.
Ok, this was both fun, exasperating and more work than I had envisioned...
Attached is a complete rewrite of the ProcessorInfo example using GetLogicalProcessorInformationEx. I discovered that many of the required types and constants were not defined in the Intel Fortran-supplied modules, and that the declaration of GetLogicalProcessorInformationEx that was there was incorrect. I also had more fun with using the C interoperability features to do what had previously needed extensions.
So, without further ado, I attach the new example. It has three files - GLPI_mod.f90 is a module with helper routines, ProcessorInfo is the main program, and Win32_Additions is a module that adds/corrects API declarations. This latter will be fed back into the product - at the point where that happens, Win32_Additions will need to be dropped and minor edits made to GLPI_mod.f90 (see comments there.)
When I run this on my 4C8T system I get:
Requesting GLPI buffer of 1016 bytes Logical to Physical Processor Map Core 0 logical processors: 0-1 (Hyperthreaded) Core 1 logical processors: 2-3 (Hyperthreaded) Core 2 logical processors: 4-5 (Hyperthreaded) Core 3 logical processors: 6-7 (Hyperthreaded) Logical Processor to Socket Map Socket 0 logical processors: 0-7 Logical Processor to NUMA Node Map NUMA Node 0 logical processors: 0-7 Logical Processor Cache Map Processors: 0-1 Level 1 Data, 32KB, LineSize 64, 8-way associative Processors: 0-1 Level 1 Instruction, 32KB, LineSize 64, 4-way associative Processors: 0-1 Level 2 Unified, 256KB, LineSize 64, 8-way associative Processors: 2-3 Level 1 Data, 32KB, LineSize 64, 8-way associative Processors: 2-3 Level 1 Instruction, 32KB, LineSize 64, 4-way associative Processors: 2-3 Level 2 Unified, 256KB, LineSize 64, 8-way associative Processors: 4-5 Level 1 Data, 32KB, LineSize 64, 8-way associative Processors: 4-5 Level 1 Instruction, 32KB, LineSize 64, 4-way associative Processors: 4-5 Level 2 Unified, 256KB, LineSize 64, 8-way associative Processors: 6-7 Level 1 Data, 32KB, LineSize 64, 8-way associative Processors: 6-7 Level 1 Instruction, 32KB, LineSize 64, 4-way associative Processors: 6-7 Level 2 Unified, 256KB, LineSize 64, 8-way associative Processors: 0-7 Level 3 Unified, 8192KB, LineSize 64, 16-way associative Logical Processor to Group Map Group 0 logical processors: 0-7
I would be interested in seeing the results on systems with more cores or more complex configurations. Comments and questions welcomed.
(Grr - of course after posting this I find a problem - fixed.)
Just going from the docs, and given its use, should LTP_PC_SMT be of kind BYTE and/or should the comparison simply be `p_PR%Flags== LTP_PC_SMT`? The current IAND thing doesn't seem quite right.
Hello. My name is Inigo Montoya. You have used an extension to specify an integer literal constant. Prepare to die.
Yeah, I debated that one. The problem I have is that the MSDN documentation calls the member Flags, suggesting to me that there might eventually be more than just bit 0 defined, so I used IAND to test that one bit. You're right that the kind of the constant should be BYTE.
Extension? You keep using that word. I do not think it means what you think it means. Please elaborate.
That Dolores is a fine figure of a woman. The sysinternals utility I based the output on uses a row of stars, one for each processor. I figured that with MIC that this could get cluttered, so I have it print numbered ranges instead. The code is smart enough to understand discontiguous ranges.
Oh, IFWINTY is chock-full of extensions. You won't see that in normal use once these definitions are folded into the product.
I assume you're being rhetorical in asking about one of the greatest movies of all time... It's even better than an MLT!
Just noticed I had redundant declarations in the main program - these are leftovers from an initial implementation before I started using BLOCK. All of the local declarations except for that of ctx should be deleted.
Oh, and my IAND test for LTP_PC_SMT should be /= 0 rather than ==1 since I shouldn't assume that it's bit 0.
Yes - it was the # things I was referring to. !DEC$ stuff aside (and they can technically be regarded as comments) that module was conforming apart from the # constants.
If it was a set of flags, I was thinking something like `IAND(p_PR%Flags, LTP_PC_SMT) == LTP_PC_SMT` would be the go. But practically /= 0 is fair enough.