I am running a custom analysis of my program with VTune. I found that there are many events related to L3 misses, including:
Sometimes MEM_LOAD_RETIRED.L3_MISS_PS is zero while L3_MISS_RETIRED.LOCAL_DRAM_PS is not. What is the difference between these events? How should I measure L3 data cache misses using them?
Judging by the event names, I think you're profiling on a processor from the Skylake microarchitecture family.
It should be clear now that these two counts are not necessarily equal, and neither is necessarily larger or smaller than the other.
In addition, VTune, by default, profiles many events using event multiplexing. If these two events happen to be counted in different event groups, then their counts cover different periods of the program's execution and may not be comparable.
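To illustrate the multiplexing point, here is a minimal sketch in Python. The time slices, the miss counts, and the group schedule are all made up for illustration, and the linear extrapolation is an assumption about how multiplexed counts are commonly scaled, not a description of VTune's internals.

```python
# Hypothetical true L3-miss counts per time slice: every miss happens in slice 2.
true_misses_per_slice = [0, 0, 400_000, 0]

def multiplexed_estimate(per_slice, active_slices):
    """Sum what was observed while the event's group was scheduled,
    then scale it linearly to the full run."""
    observed = sum(per_slice[i] for i in active_slices)
    return observed * len(per_slice) / len(active_slices)

# Assumed schedule: group A runs in slices 0 and 2, group B in slices 1 and 3.
estimate_a = multiplexed_estimate(true_misses_per_slice, [0, 2])
estimate_b = multiplexed_estimate(true_misses_per_slice, [1, 3])

print(estimate_a, estimate_b)
```

In this toy run, the same underlying activity yields a large estimate for the event in group A and zero for the event in group B, simply because the two groups observed different slices of the execution.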
In particular, a zero value in VTune means that the count stayed below the threshold for triggering a sampling interrupt, not necessarily that the event never occurred.
For the Intel SKX platform, the event file at https://download.01.org/perfmon/SKX/skylakex_core_v1.24.json lists 100007 as the default threshold for both of these events, so any count lower than that results in zero samples.
The actual threshold used for these tests may depend on the version of VTune used and on the specific collection type requested.
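As a rough sketch of why a nonzero count can show up as zero, the following Python snippet models sampling with a fixed threshold. It assumes a simplified model where one sample is taken every time the counter reaches the threshold and the reported count is samples times threshold; the function names are hypothetical and this is not VTune's actual implementation.

```python
SAMPLE_AFTER_VALUE = 100_007  # default threshold cited above for SKX

def samples_taken(event_count, sav=SAMPLE_AFTER_VALUE):
    """Number of sampling interrupts triggered by event_count occurrences."""
    return event_count // sav

def estimated_count(event_count, sav=SAMPLE_AFTER_VALUE):
    """Count reconstructed from samples alone (simplified model)."""
    return samples_taken(event_count, sav) * sav

for true_count in (0, 99_999, 100_007, 250_000):
    print(true_count, samples_taken(true_count), estimated_count(true_count))
```

Under this model, 99,999 real occurrences produce no sample at all, so the event is reported as zero even though it did occur.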