We have an Arria 10 PCIe design with an MM DMA interface, using hprxm_master (bursting BAR2 RXM) for non-DMA access. Platform Designer IP does a good job of translating between the different clock domains and interface widths of the various slave components. There seems to be a problem, however, with unaligned hprxm reads (crossing a 256-bit boundary) from slow slaves. In the attached SignalTap screenshot I am performing a 64-bit read at offset 0x1c. hprxm_master splits it into two reads (burstcount = 2). The hprxm read is requested at clock 0, the first readdatavalid arrives at clock 55 and the second at clock 100; the large latency is caused by a clock-crossing bridge and interface-width translation. Unfortunately, hprxm sends the completion message with data already at clock 63, without waiting for the second read, and consequently delivers arbitrary wrong data for the second word. The same unaligned read performed via DMA gives the correct result, as does an unaligned read from a fast slave (core clock domain, wide interface); see the second SignalTap screenshot, where both reads occur without intermediate latency.
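For illustration, here is a minimal sketch (not part of the design, just the alignment arithmetic) showing why the 64-bit read at 0x1c becomes a two-beat burst on the 256-bit interface:

```python
# Illustrative alignment arithmetic for the read described above.
INTERFACE_BYTES = 32  # 256-bit hprxm interface width

def burst_count(offset, size):
    """Number of 256-bit beats a `size`-byte read at `offset` touches."""
    first_word = offset // INTERFACE_BYTES
    last_word = (offset + size - 1) // INTERFACE_BYTES
    return last_word - first_word + 1

# 64-bit (8-byte) read at 0x1c crosses the 0x20 boundary -> two beats
print(burst_count(0x1C, 8))  # -> 2
# The same read at an offset that stays within one 256-bit word -> one beat
print(burst_count(0x18, 8))  # -> 1
```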
I consider the reported hprxm_master behaviour a bug; the DMA path proves that unaligned reads can work correctly with slow slaves. The IP version is from the QPP 22.4 design suite.
Best regards
Frank
Hi,
There is a known issue reported for the MCDMA in Quartus v22.4:
https://www.intel.com/content/www/us/en/docs/programmable/683821/22-4/known-issues.html
Does it sound related to what you are facing now?
Regards,
Wincent_Intel
Hello,
Thanks for linking the issue report. I don't see the reported problem as directly related. The basic problem here is the premature read completion of hprxm_master: the read completes as expected, but with partly wrong data content.
I'll try to record some internal hprxm_master data to understand why this happens.
Best regards,
Frank
Hi,
Thanks for trying to understand why that happens.
If you find anything out, please share it with me if you don't mind.
Regards,
Wincent_Intel
Hi,
I wish to follow up with you about this case.
Do you have any further questions on this matter?
Otherwise, I would like to have your permission to close this forum ticket.
Regards,
Wincent_Intel
Hi,
The problem is not solved!
My present understanding is that the hprxm IP has no provisions to handle unaligned read requests correctly. I'm waiting for a statement from the IP design team.
Regards
Frank
Hi,
Do you mean that you are contacting our internal IP design team?
Is there anything else you think I can help you with in the meantime?
Regards,
Wincent_Intel
My answer was phrased ambiguously. I haven't yet contacted the design team; I would need to make contact through an FAE. I was rather hoping that this discussion is monitored by support and forwarded accordingly. If there's a formal procedure to issue a support request, we might choose that option.
For now, we have designed around the issue by rearranging the memory map.
Best regards
Frank
Hi,
You can file an IPS case via the FAE, then share the .qar file there for further debugging.
If I understand correctly, this is a design you got from the IP catalog, right?
Or is it a custom design?
Regards,
Wincent_Intel
Hi,
I could reduce the test setup that reproduces the error to a 32-bit Avalon-MM slave (on-chip memory) connected to the hprxm master. An unaligned 64-bit read crossing a 256-bit address boundary gets wrong data. I put the test into the PCIe_fundamental demonstration project for the TR10a-HL development kit.
onchip_memory_2_1 is the 32-bit slave; onchip_memory_2_2 is a 256-bit slave that doesn't show the error.
Regards
Frank
Hi Frank,
Thanks for sharing your findings with me.
Did you test it on an Arria 10 device as well?
Regards,
Wincent_Intel
Hi Wincent,
Thanks for your continued interest in the matter.
I'm doing all tests on the Arria 10 development board TR10a-HL. The observed effect with my latest demo code is basically the same as documented in the SignalTap screenshots attached to the initial post. The main difference is that I stripped out the clock-crossing bridge used in the original design, as well as any application code not related to the issue.
A new SignalTap recording is contained in the .qar if you want to check.
I'm performing an unaligned 64-bit read which crosses a 256-bit word boundary of the hprxm master, e.g. reading from address 0x1c. Accordingly, a burst read with burstcount = 2 is generated, reading the first 256 bits from 0x00 and the second 256 bits from 0x20. The hprxm master logic is expected to assemble the read data into a single completion TLP with data.
I tested two cases:
- hprxm reads from a 32-bit MM slave: each 256-bit read is performed in 8 consecutive read cycles, so the unaligned read takes 16 cycles plus slave latency.
- hprxm reads from a 256-bit MM slave: the unaligned read takes only 2 cycles plus latency.
I can see that hprxm assembles the read data correctly; unfortunately, it doesn't wait for burst-read completion in case 1, sending arbitrary data from a previous read instead. Case 2 returns correct data, and the completion TLP is sent off after the burst read has finished.
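The cycle counts for the two cases follow directly from the width ratio between the 256-bit hprxm beat and the slave's data width. A short sketch of this arithmetic (illustrative only, latency excluded):

```python
# Cycle-count arithmetic for the two test cases above (slave latency excluded).
BEAT_BITS = 256  # one hprxm read beat

def data_cycles(slave_width_bits, beats):
    """readdatavalid cycles needed to transfer `beats` 256-bit words
    from a slave of the given data width."""
    return beats * (BEAT_BITS // slave_width_bits)

print(data_cycles(32, 2))   # case 1: 32-bit slave, 2-beat unaligned read -> 16
print(data_cycles(256, 2))  # case 2: 256-bit slave -> 2
```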
Looking at the internal hprxm read processing, I see the same state sequence triggered in both cases. I haven't yet recognized a provision to wait for the burst read to finish before sending the completion TLP. Either it's not there, or it's not triggered due to wrong assumptions.
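To make the failure mode concrete, here is a small behavioral model (my assumption about what the IP does, not the actual RTL): if the completion is assembled after only the first readdatavalid, the second word still contains stale buffer contents.

```python
# Behavioral model (assumption, not the actual hprxm RTL) of the observed bug:
# the completion is assembled before all burst beats have arrived, so beats
# not yet received return stale data from the assembly buffer.

def completion(beats_received, fresh, stale):
    """Assemble a 2-beat completion; beats not yet received stay stale."""
    return [fresh[i] if i < beats_received else stale[i] for i in range(2)]

fresh = [0xAAAA, 0xBBBB]   # data actually returned by the slow slave
stale = [0x1111, 0x2222]   # leftover contents of the assembly buffer

# Case 1 (slow slave): completion sent after only the first beat
print(completion(1, fresh, stale))  # -> [0xAAAA, 0x2222], second word stale
# Correct behavior: wait for all burstcount beats before sending
print(completion(2, fresh, stale))  # -> [0xAAAA, 0xBBBB]
```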
Why are we using the bursting hprxm interface instead of the default 32-bit BAR interface?
- It's necessary to support native memory access from a 64-bit processor.
- We want to implement efficient access to 256-bit slaves in both DMA and non-DMA mode.
As previously mentioned, we have adjusted the application register map to avoid unaligned reads, so it's no longer an urgent problem. Nevertheless, I'd appreciate a correction of the apparent hprxm bug.
Best regards
Frank
Hi Frank,
Thanks for sharing your workaround with us. The best help I can offer here is to submit an internal ticket for this issue, in the hope that the responsible IP design team will look at it.
Following our support policy, I have to put this case in closed status, so this thread will be transitioned to community support.
If you have a new question, feel free to open a new thread to get support from Intel experts. Otherwise, community users will continue to help you on this thread. Thank you.
If your support experience falls below a 9 out of 10, I kindly request the opportunity to rectify it before concluding our interaction. If the issue cannot be resolved, please let me know the cause so that I can learn from it and improve the quality of future service experiences.
Regards,
Wincent_Intel