1) Since this comment appears in the section on ILP64 (64-bit integer) support, directly after the statement that there is no f90 USE file for it, I have to assume it means there are no include files for 64-bit (long) integers.
2) The default threshold isn't documented. Presumably, by default, messages passed through shared memory (shm) within a node that exceed some size threshold are written with nontemporal stores, so that they do not evict all or most of the data already in cache. It's easy to imagine situations where you would want a message to stay in the destination cache for immediate use, or where you would want nontemporal stores applied to smaller messages than the default threshold allows. You would have to set up a baseline performance case with the defaults, then compare it against runs with the changed setting, to see whether the change is useful for your application.
3) I can't add anything to the public descriptions of this feature. I haven't seen it used.
4) As far as I can tell, "allcores" is meant to facilitate use of 1 logical processor per core (when HyperThreading is enabled), while "all" makes all the logical processors available. "allsocks" may be intended to help distribute a smaller number of MPI processes across multiple sockets/packages. I agree that the description ought to be clarified.
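
To make the reading in point 4 concrete, here is a rough sketch of what the three keywords would select on a small machine. It is only an illustration of my interpretation, not Intel MPI's actual pinning logic. The topology (2 sockets, 2 cores per socket, 2 hardware threads per core), the logical processor numbering, and the helper names `make_topology`, `first_per_group` and `select` are all assumptions made up for the example. In particular, treating "allsocks" as one logical processor per socket is a guess.

```python
from collections import namedtuple

# Hypothetical record: one entry per logical processor.
Cpu = namedtuple("Cpu", ["logical_id", "core_id", "socket_id"])


def make_topology(sockets=2, cores_per_socket=2, threads_per_core=2):
    """Build an assumed topology where HT siblings get the higher logical ids."""
    n_cores = sockets * cores_per_socket
    cpus = []
    for logical_id in range(n_cores * threads_per_core):
        core_index = logical_id % n_cores  # siblings wrap around onto the same core
        cpus.append(
            Cpu(
                logical_id=logical_id,
                core_id=core_index % cores_per_socket,
                socket_id=core_index // cores_per_socket,
            )
        )
    return cpus


def first_per_group(cpus, key):
    """Keep the first logical processor seen for each distinct key."""
    seen = set()
    picked = []
    for cpu in cpus:
        k = key(cpu)
        if k not in seen:
            seen.add(k)
            picked.append(cpu.logical_id)
    return picked


def select(cpus, mode):
    """Return logical processor ids under the interpretation described above."""
    if mode == "all":
        # Every logical processor, HT siblings included.
        return [cpu.logical_id for cpu in cpus]
    if mode == "allcores":
        # One logical processor per physical core.
        return first_per_group(cpus, lambda c: (c.socket_id, c.core_id))
    if mode == "allsocks":
        # One logical processor per socket/package (assumed meaning).
        return first_per_group(cpus, lambda c: c.socket_id)
    raise ValueError("unknown mode: " + mode)


if __name__ == "__main__":
    topo = make_topology()
    for mode in ("all", "allcores", "allsocks"):
        print(mode, select(topo, mode))
```

With this assumed numbering, "all" gives 8 candidates, "allcores" gives 4 (one per core), and "allsocks" gives 2 (one per socket). On a real node the numbering may differ, so check the actual topology before relying on any of this.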