How do L2_TRANS events differ from L2_RQSTS events? For example, running STREAM with 1 thread (data in memory), I got the following values:
It looks like the "L2_RQSTS" (0x24) events only count transactions that succeed, while "L2_TRANS" (0xF0) events also count attempted transactions that are rejected (and later retried). This interpretation is based on a few words that appear in the descriptions of some of the sub-events for some of the processor families in Chapter 19 of Volume 3 of the Software Developer's Manual, and is supported (for at least a subset of the available sub-events) by my microbenchmark experiments.
Your numbers for Demand Data Read are reasonably consistent with this interpretation, suggesting that ~27% of the Demand Data Read transactions are retried. (I have found that this happens when a Demand Data Read from an L1 Data Cache miss tries to access the L2 tags in the same cycle as an L2 HW prefetch, but there are almost certainly many other possible causes of retries.)
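As a minimal sketch of the arithmetic behind that estimate, assume that for a given sub-event L2_TRANS counts every attempt (successful plus rejected) and L2_RQSTS counts only the successful ones. Under that assumption, the difference between the two counts is the number of rejected attempts. "Retried" can be expressed either as extra attempts per successful request or as the share of all attempts that were rejected, so the sketch computes both. The function names and the counts in the usage line are hypothetical, not measured values.

```python
def retry_overhead(l2_trans: int, l2_rqsts: int) -> float:
    """Extra (rejected, later retried) attempts per successful request.

    Assumes L2_TRANS counts all attempts and L2_RQSTS counts only the
    successful ones, for the same sub-event (e.g. DEMAND_DATA_RD).
    """
    if l2_rqsts <= 0:
        raise ValueError("l2_rqsts must be positive")
    if l2_trans < l2_rqsts:
        raise ValueError("under this model L2_TRANS should be >= L2_RQSTS")
    return (l2_trans - l2_rqsts) / l2_rqsts


def rejected_fraction(l2_trans: int, l2_rqsts: int) -> float:
    """Fraction of all attempts that were rejected (same assumption)."""
    if l2_trans <= 0:
        raise ValueError("l2_trans must be positive")
    if l2_trans < l2_rqsts:
        raise ValueError("under this model L2_TRANS should be >= L2_RQSTS")
    return (l2_trans - l2_rqsts) / l2_trans


# Hypothetical counts chosen so that L2_TRANS / L2_RQSTS = 1.27.
trans_dd, rqsts_dd = 1_270_000, 1_000_000
print(f"extra attempts per success: {retry_overhead(trans_dd, rqsts_dd):.2f}")
print(f"rejected share of attempts: {rejected_fraction(trans_dd, rqsts_dd):.3f}")
```

The two measures diverge as retries become more frequent, so it is worth stating which one is meant when quoting a retry percentage.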
Your numbers for "all requests" are a bit harder to understand. L2_RQSTS.REFERENCES is about 2.5 times L2_RQSTS.DEMAND_DATA_RD, which does not seem unreasonable. The expected ratio depends on how L1 HW prefetches are counted, how L2 HW prefetches are counted, how streaming stores are counted (if your STREAM binary uses them), etc. The ratio of L2_TRANS.ALL_REQUESTS to L2_RQSTS.REFERENCES is about 1.88:1, which seems high to me, but these two events may differ in which transactions they count in ways other than retries.

The documentation is insufficient to conclude much, and it is not clear to me that the counters are actually counting the same low-level transactions from one processor generation to the next. This could be due to bug fixes in the counter events on newer processors, new bugs in the counter events on newer processors, or changed behavior due to low-level implementation changes in newer processors.
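One way to see why 1.88:1 looks high is to compare it with the Demand Data Read ratio. The sketch below assumes, purely as a simplifying model, that every request type is rejected and retried at the same rate as Demand Data Reads. Under that model, L2_TRANS.ALL_REQUESTS / L2_RQSTS.REFERENCES would match L2_TRANS.DEMAND_DATA_RD / L2_RQSTS.DEMAND_DATA_RD, and any factor above 1 is left unexplained by retries alone. The `L2Counts` class, the `summarize` function, and the counts (chosen only to reproduce the approximate 1.27, 2.5, and 1.88 ratios discussed above) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class L2Counts:
    # Hypothetical container; field names mirror the event names in the text.
    rqsts_demand_data_rd: int
    trans_demand_data_rd: int
    rqsts_references: int
    trans_all_requests: int


def summarize(c: L2Counts) -> dict:
    # L2_TRANS / L2_RQSTS for demand data reads (attempts per success).
    dd_ratio = c.trans_demand_data_rd / c.rqsts_demand_data_rd
    # How many L2 references there are per demand data read.
    refs_per_dd = c.rqsts_references / c.rqsts_demand_data_rd
    # L2_TRANS.ALL_REQUESTS / L2_RQSTS.REFERENCES.
    all_ratio = c.trans_all_requests / c.rqsts_references
    # If all request types were retried at the demand-data-read rate,
    # all_ratio would equal dd_ratio; the quotient is what that simple
    # model fails to explain.
    unexplained = all_ratio / dd_ratio
    return {
        "dd_ratio": dd_ratio,
        "refs_per_dd": refs_per_dd,
        "all_ratio": all_ratio,
        "unexplained": unexplained,
    }


# Hypothetical counts chosen only to reproduce the ratios discussed above.
counts = L2Counts(
    rqsts_demand_data_rd=1_000_000,
    trans_demand_data_rd=1_270_000,
    rqsts_references=2_500_000,
    trans_all_requests=4_700_000,
)
for name, value in summarize(counts).items():
    print(f"{name:12s} {value:.3f}")
```

With these numbers the unexplained factor is about 1.48, which is consistent with the suspicion that the two "all requests" events differ by more than retries. The uniform-retry-rate assumption itself is untested, since prefetches and stores may be rejected at different rates than demand reads.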