Hello
I'm having trouble monitoring QAT card PCIe traffic with PCM. Two cards together generate ~5800 MB/s of memory read traffic and ~3100 MB/s of memory write traffic over PCIe, but the numbers reported by ./pcm-pcie.x are not even close: roughly 100 MB/s for reads and for writes.
Could you clarify the possible reason?
QAT - https://01.org/packet-processing/intel%C2%AE-quickassist-technology-drivers-and-patches
Environment and tools output
Fedora release 16 (Verne)
Linux intel45 3.1.0-7.fc16.x86_64 #1 SMP Tue Nov 1 21:10:48 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
CPU Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
Two QAT cards (Intel QuickAssist Adapter 8950). Both are plugged into the socket 1 (node_id=1) root complex.
Intel® QuickAssist Technology Driver (L.2.2.0-30), QAT 1.6
OUTPUT
./pcm-pcie.x 5
Skt | PCIeRdCur | PCIeNSRd | PCIeWiLF | PCIeItoM | PCIeNSWr | PCIeNSWrF
 0  |   130 K   |    0     |    0     |    0     |    0     |     0
 1  |   465 M   |    0     |    0     |  247 M   |    0     |     0
-----------------------------------------------------------------------------------
 *  |   465 M   |    0     |    0     |  495 M   |    0     |     0
./pcm-memory.x OUTPUT
---------------------------------------||---------------------------------------
-- Socket 0 --||-- Socket 1 --
---------------------------------------||---------------------------------------
---------------------------------------||---------------------------------------
---------------------------------------||---------------------------------------
-- Memory Performance Monitoring --||-- Memory Performance Monitoring --
---------------------------------------||---------------------------------------
-- Mem Ch 0: Reads (MB/s): 736.92 --||-- Mem Ch 0: Reads (MB/s): 946.24 --
-- Writes(MB/s): 407.56 --||-- Writes(MB/s): 466.91 --
-- Mem Ch 1: Reads (MB/s): 723.93 --||-- Mem Ch 1: Reads (MB/s): 924.22 --
-- Writes(MB/s): 393.20 --||-- Writes(MB/s): 411.42 --
-- Mem Ch 2: Reads (MB/s): 722.76 --||-- Mem Ch 2: Reads (MB/s): 948.64 --
-- Writes(MB/s): 411.03 --||-- Writes(MB/s): 468.38 --
-- Mem Ch 3: Reads (MB/s): 725.39 --||-- Mem Ch 3: Reads (MB/s): 950.83 --
-- Writes(MB/s): 406.36 --||-- Writes(MB/s): 461.23 --
-- NODE0 Mem Read (MB/s): 2909.00 --||-- NODE1 Mem Read (MB/s): 3769.93 --
-- NODE0 Mem Write (MB/s): 1618.14 --||-- NODE1 Mem Write (MB/s): 1807.94 --
-- NODE0 P. Write (T/s) : 140529 --||-- NODE1 P. Write (T/s): 143146 --
-- NODE0 Memory (MB/s): 4527.14 --||-- NODE1 Memory (MB/s): 5577.87 --
---------------------------------------||---------------------------------------
-- System Read Throughput(MB/s): 6678.93 --
-- System Write Throughput(MB/s): 3426.08 --
-- System Memory Throughput(MB/s): 10105.01 --
---------------------------------------||---------------------------------------
QAT compression benchmark OUTPUT
---------------------------------------
API Data_Plane
Session State STATELESS
Algorithm DEFLATE
Huffman Type STATIC
Mode ASYNCHRONOUS
Direction COMPRESS
Packet Size 8192
Compression Level 1
Corpus CALGARY_CORPUS
Number of threads 24
Total Responses 3801600
Total Retries 122831954
Clock Cycles Start 2521268357176456
Clock Cycles End 2521279034160452
Total Cycles 10676983996
CPU Frequency(kHz) 1995869
Throughput(Mbps) 46577
Compression Ratio 45.2%
---------------------------------------
Hi Alex,
The pcm-pcie stats look fine to me. A few key points:
- You're sampling at a 5-second interval, so to get a per-second rate you need to divide the event counts by 5.
- The event counts are in cache lines (64 bytes each), so to estimate bandwidth you then multiply the count by 64, or use the -B flag.
Knowing these two rules, we can assess the I/O bandwidth.
$ = cache line
PCIeItoM (inbound allocating write): 247 M$ / 5 s * 64 bytes/$ = 3161.6 MB/s
PCIeRdCur (outbound read): 465 M$ / 5 s * 64 bytes/$ = 5952 MB/s
which correlates with your statement quite closely.
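For anyone who wants to repeat this conversion on their own samples, here is a minimal Python sketch of the same arithmetic. It assumes that "M" in the pcm-pcie output means 10^6 events (as in the calculation above) and that one event equals one 64-byte cache line; the function name is just illustrative and is not part of PCM.

```python
CACHE_LINE_BYTES = 64  # one counted event = one 64-byte cache line


def pcie_bandwidth_mb_per_s(cache_lines, interval_s, line_bytes=CACHE_LINE_BYTES):
    """Convert an event count over one sampling interval to MB/s (1 MB = 10^6 bytes)."""
    lines_per_second = cache_lines / interval_s
    return lines_per_second * line_bytes / 1e6


# Socket 1 counts from the 5-second sample in this thread.
itom_mb_s = pcie_bandwidth_mb_per_s(247e6, 5)   # inbound writes
rdcur_mb_s = pcie_bandwidth_mb_per_s(465e6, 5)  # reads

print(f"PCIeItoM : {itom_mb_s:.1f} MB/s")
print(f"PCIeRdCur: {rdcur_mb_s:.1f} MB/s")
```

Dividing by the interval first and multiplying by the line size second mirrors the two rules listed above, so a different sampling interval only changes the second argument.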
Best Regards,
Patrick
Thanks, my bad!
You're welcome. No problem :)
Hi, I tried running ./pcm-pcie.x -B and it seems the write total across sockets doesn't match, unless I'm making some silly mistake again.
Detected Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz "Intel(r) microarchitecture codename Sandy Bridge-EP/Jaketown"
Update every 5 seconds
delay_ms: 417
Skt | PCIeRdCur | PCIeNSRd | PCIeWiLF | PCIeItoM | PCIeNSWr | PCIeNSWrF | PCIe Rd (B) | PCIe Wr (B)
 0  |   130 K   |    0     |    0     |    0     |    0     |     0     |   8358 K    |      0
 1  |   465 M   |    0     |    0     |  247 M   |    0     |     0     |    29 G     |    15 G
----------------------------------------------------------------------------------------------------------------
 *  |   465 M   |    0     |    0     |  495 M   |    0     |     0     |    29 G     |    31 G
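The mismatch can be checked mechanically by summing the per-socket PCIeItoM values and comparing the result with the "*" row. The Python sketch below does that for the numbers in this output; it assumes the K/M/G suffixes are decimal (10^3, 10^6, 10^9), and the helper name is hypothetical rather than anything from PCM.

```python
UNIT = {"K": 1e3, "M": 1e6, "G": 1e9}  # assumed decimal suffixes


def parse_count(text):
    """Parse a pcm-pcie cell such as '247 M' or '0' into a plain number."""
    parts = text.split()
    value = float(parts[0])
    if len(parts) == 2:
        value *= UNIT[parts[1]]
    return value


per_socket_itom = ["0", "247 M"]  # socket 0 and socket 1 rows
reported_total = "495 M"          # the '*' row

expected_total = sum(parse_count(cell) for cell in per_socket_itom)
ratio = parse_count(reported_total) / expected_total

print(f"expected {expected_total:.0f}, reported {parse_count(reported_total):.0f}")
print(f"reported / expected = {ratio:.2f}")
```

A ratio close to 2 suggests that the total row counts the socket 1 writes twice, which would also explain why PCIe Wr (B) shows 31 G in the total against 15 G for socket 1.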
Thanks for reporting your issue. I double-checked the code, and we have a typo in one of the event aggregations that causes a double count in the total sum. We will roll out another version with the fix soon, but if you want to patch this one line manually, here is the diff:
diff --git a/pcm-pcie.cpp b/pcm-pcie.cpp
index cdd9847..d541b88 100644
--- a/pcm-pcie.cpp
+++ b/pcm-pcie.cpp
@@ -838,7 +838,7 @@ void getPCIeEvents(PCM *m, PCM::PCIeEventCode opcode, uint32 delay_ms, sample_t
             sample.total.PCIeNSWr += (sizeof(PCIeEvents_t)/sizeof(uint64)) * getNumberOfEvents(before, after);
             sample.miss.PCIeNSWr += (sizeof(PCIeEvents_t)/sizeof(uint64)) * getNumberOfEvents(before2, after2);
             sample.hit.PCIeNSWr += (sample.total.PCIeNSWr > sample.miss.PCIeNSWr) ? sample.total.PCIeNSWr - sample.miss.PCIeNSWr : 0;
-            aggregate_sample.PCIeItoM += sample.total.PCIeItoM;
+            aggregate_sample.PCIeNSWr += sample.total.PCIeNSWr;
             break;
         case PCM::PCIeNSWrF:
             sample.total.PCIeNSWrF += (sizeof(PCIeEvents_t)/sizeof(uint64)) * getNumberOfEvents(before, after);
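To see why that one line doubles the PCIeItoM total, here is a deliberately simplified Python model of the aggregation. It is not the PCM code: it assumes the PCIeItoM case adds its own count correctly and that the PCIeNSWr case runs once per interval, and the function and dictionary names are made up for the illustration.

```python
def aggregate(sample_totals, buggy):
    """Toy model of per-event aggregation; one loop pass stands in for one switch case."""
    totals = {"PCIeItoM": 0, "PCIeNSWr": 0}
    for event in ("PCIeItoM", "PCIeNSWr"):
        if event == "PCIeNSWr" and buggy:
            # The typo: the PCIeNSWr case adds the PCIeItoM count a second time.
            totals["PCIeItoM"] += sample_totals["PCIeItoM"]
        else:
            totals[event] += sample_totals[event]
    return totals


# Counts taken from socket 1 in this thread.
sample_totals = {"PCIeItoM": 247_000_000, "PCIeNSWr": 0}

print("with typo:", aggregate(sample_totals, buggy=True))
print("with fix :", aggregate(sample_totals, buggy=False))
```

Under these assumptions the buggy path yields 494 M for PCIeItoM, which is in line with the 495 M shown in the "*" row once display rounding is taken into account, while the fixed path yields 247 M.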
Thanks again for reporting the bug!
Sincerely,
Patrick