To my knowledge, the implementation for these latency events is similar on all microarchitectures: they randomly select loads to track.
One simple way to check is to collect the MEM_INST_RETIRED.ALL_LOADS_PS and MEM_TRANS_RETIRED.LOAD_LATENCY_GT_4 events at the same time. If the latency event only tracks randomly selected loads, you should see that MEM_TRANS_RETIRED.LOAD_LATENCY_GT_4 has a much lower count than MEM_INST_RETIRED.ALL_LOADS_PS.
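To illustrate the reasoning behind that check, here is a minimal Python sketch of a toy model. It is not how the hardware works internally and it does not read any real counters. The function name `simulate_counts`, the tracking probability, and the latency mix are all hypothetical values chosen for illustration. The sketch only shows that an event which counts a randomly selected subset of loads ends up far below an event that counts every retired load.

```python
import random


def simulate_counts(num_loads, track_probability, latency_threshold, seed=0):
    """Toy model: return (all_loads_count, latency_event_count)."""
    rng = random.Random(seed)
    # Hypothetical latency mix in cycles; these are not measured values.
    latency_choices = [4, 5, 12, 40, 200]

    all_loads_count = 0
    latency_event_count = 0
    for _ in range(num_loads):
        latency = rng.choice(latency_choices)
        # An exact event counts every retired load.
        all_loads_count += 1
        # A sampled event counts a load only if it was randomly picked
        # for tracking and its latency exceeded the threshold.
        tracked = rng.random() < track_probability
        if tracked and latency > latency_threshold:
            latency_event_count += 1
    return all_loads_count, latency_event_count


if __name__ == "__main__":
    exact, sampled = simulate_counts(100_000, 0.01, 4)
    print(f"all loads: {exact}, latency > 4 (sampled): {sampled}")
```

With a tracking probability of 1.0 the second count becomes the exact number of loads above the threshold, so a large gap between the two real counters would suggest sampling rather than exact counting.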
The reason I am asking this seemingly trivial, pedantic question, which sounds like useless trivia, is that the Linux kernel assumes the answer is "Yes, the semantics are the same and the emulation as stated is correct," and it has code that relies on this assumption.
The statement "implementation for these latency events is similar on all microarchitectures" is exactly the problem, which is why I am asking an extremely pedantic question. Your suggested solution is frustrating because it tells me that you did not really look into my question; that approach cannot possibly answer it.
The "GT" counters are not available on Haswell, so any suggestion involving their use is ruled out. I do not own a Skylake-based machine. Even if I did, the results would only tell me whether, on Skylake specifically, these two counters were or were not counted with random load samples. That would be useful in telling Intel to make its descriptions more precise one way or the other: either the "GT" description would have to change from implying "exact counts" to explicitly stating "random samples", or the retired load latency event would have to change from "random loads" to "exact counts". Based on what I've been told so far, there are no other logical options left.
This is a question for your engineers. After you have found out the answer, could you please, for the sake of all that is holy, correct, exact, precise and trustworthy, update the bloody manual?