This sounds like something that an engineer from the processor design team would know about.
From what little I know (or claim to know), once a memory request hits the bus, you would need to be able to look into the request "ticket" to determine where the results need to be sent (and which core initiated the request). Getting at this information would seem to require something built into the architecture, and I wouldn't expect the needed hardware to be available to general users (if it even exists).
If you're using VTune Performance Analyzer, you can break things down by CPU (core). Even so, that won't let you pick out individual memory accesses. It also sounds like you're looking for something more real-time. What do you need this information for? There may be another way to get what you're after.
If anyone knows better than I do (and I'm sure there are thousands out there), please feel free to refute or correct me.