Solved: Issue with Simulation Stalling in Intel P-Tile Streaming PCIe Gen4 x8 Example Design

thanavignesh · ‎09-25-2024

Hi everyone,

I generated the Intel P-Tile Streaming IP example design using several Quartus versions (23.2, 23.3, and 24.1), selecting a Stratix board that supports P-Tile. In the IP catalog, I chose the P-Tile Streaming PCIe IP. For the configuration, I set the PCIe mode to Gen4 1x8 with a 256-bit interface and left the other options at their defaults. After generating the example design, I simulated it in QuestaSim version 24.1 using these steps: do msim_setup.tcl -> ld_debug -> run -all. The simulation started, but at some point it got stuck.

After the simulation had been running for approximately 1.5 hours, I noticed the following info message: "RP USER AVMM DRIVER: begin RP Configuration."

For debugging, I added some prints and found that the simulation is stuck waiting for the wait_request signal to deassert from the Root Port BFM. I'm not sure how to proceed with debugging from here.

Any advice on how to resolve this issue?

Wincent_Altera · ‎10-02-2024

Hi,

I am Wincent, Application Engineer from Altera.

We sincerely apologize for the inconvenience caused by the delay in addressing your Forum queries.

Due to an unexpected back-end issue in our system, your Forum case did not reach us as intended.

May I know which QuestaSim version you are using?

There is a known issue with the P-Tile simulation tools that we are trying to fix.

To work around this problem, use Siemens* Questa Sim-64 2022.2. Starting in Intel® Quartus® Prime Software version 23.3, you can resolve this issue by adding the command set USER_DEFINED_ELAB_OPTIONS "-voptargs=\"-noprotectopt\"" before running the simulation in Siemens* Questa Sim.

For details, refer to https://www.intel.com/content/www/us/en/support/programmable/articles/000092901.html

I hope this helps you move forward.
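
The quoting in this command is easy to get wrong when it is copied from a forum post. As a minimal sketch in Python (the helper name tcl_set_line is hypothetical and not part of any Intel or Siemens tool), the following builds the Tcl line with the inner double quotes escaped, assuming the variable is set from a Tcl script:

```python
def tcl_set_line(var, value):
    """Return a Tcl `set` command with `value` as one double-quoted word."""
    escaped = value
    # Inside a double-quoted Tcl word these characters need a backslash.
    # The backslash itself is handled first so it is not escaped twice.
    for ch in ("\\", '"', "$", "["):
        escaped = escaped.replace(ch, "\\" + ch)
    return f'set {var} "{escaped}"'


# The option value from the workaround, written without any Tcl escaping.
line = tcl_set_line("USER_DEFINED_ELAB_OPTIONS", '-voptargs="-noprotectopt"')
print(line)  # set USER_DEFINED_ELAB_OPTIONS "-voptargs=\"-noprotectopt\""
```

The printed line has one opening and one closing quote around the whole value, with the two inner quotes escaped.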

Regards,

Wincent_Intel

thanavignesh · ‎10-06-2024

Hi Wincent,

Thank you for your prompt response and for providing the workaround. I appreciate your help.

I am currently using Questasim version 22.4. I applied the suggested solution with the command set USER_DEFINED_ELAB_OPTIONS "-voptargs=-noprotectopt" in Questasim 22.1, and the simulation completed successfully.

Thanks also for sharing the reference link.

Wincent_Altera · ‎10-06-2024

Hi Thanavignesh,

Glad that my suggestion helped resolve the problem you were facing.
I will therefore close this case; any further follow-up will be transitioned to community support.
If you have any new questions, feel free to open a new forum thread, and we will do our best to assist.

If your support experience falls below a 9 out of 10, I kindly request the opportunity to rectify it before concluding our interaction. If the issue cannot be resolved, please inform me via this forum page of the cause so that I can learn from it and strive to enhance the quality of future service experiences.

Wincent_Intel

p/s: If any answer from the community or Intel Support is helpful, please feel free to give the best answer or rate 9/10 survey.

thanavignesh · ‎12-05-2024

Hi Wincent,
I simulated the example design generated in Quartus 23.3 using Questa Intel FPGA version 23.2. It took approximately 4.5 hours to reach the first log info:
INFO: 126725 ns RP User Avmm Driver: begin RP Configuration.
The simulation completed successfully after about 9 hours, with the final log:
INFO: 242847 ns PIO ED MWr/Mrd Completed.
SUCCESS: Simulation stopped due to successful completion.

Is there any way to reduce the overall simulation time?
What processes are running in the background before the 126725 ns mark?
Is there any documentation available that explains the activities happening during this period?
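
To put the numbers above in perspective, here is a minimal Python sketch that parses the two INFO lines and estimates how many simulated nanoseconds are covered per wall-clock hour. The function names are hypothetical, and the wall-clock values are the approximate figures quoted in this post (4.5 hours and 9 hours), not measured timestamps.

```python
import re

# Matches lines such as "INFO: 126725 ns RP User Avmm Driver: begin RP Configuration."
LOG_LINE = re.compile(r"^INFO:\s+(\d+)\s+ns\s+(.*)$")


def parse_milestones(log_text):
    """Return (sim_time_ns, message) for each INFO line in the transcript."""
    milestones = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            milestones.append((int(match.group(1)), match.group(2)))
    return milestones


def sim_rate_ns_per_hour(sim_ns, wall_hours):
    """Simulated nanoseconds covered per wall-clock hour."""
    return sim_ns / wall_hours


log = """\
INFO: 126725 ns RP User Avmm Driver: begin RP Configuration.
INFO: 242847 ns PIO ED MWr/Mrd Completed.
SUCCESS: Simulation stopped due to successful completion.
"""
# Approximate wall-clock hours at which each milestone appeared (from the post).
wall_hours = [4.5, 9.0]

milestones = parse_milestones(log)
prev_ns, prev_hours = 0, 0.0
for (sim_ns, message), hours in zip(milestones, wall_hours):
    rate = sim_rate_ns_per_hour(sim_ns - prev_ns, hours - prev_hours)
    print(f"{message} -> {rate:.0f} simulated ns per wall-clock hour")
    prev_ns, prev_hours = sim_ns, hours
```

With these inputs, both phases come out at roughly 26,000 to 28,000 simulated ns per hour, which suggests the period before the first message runs at about the same speed as the rest of the simulation rather than being stalled.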

Wincent_Altera · ‎12-05-2024

Hi,

Is there any way to reduce the overall simulation time?
- For F-tile we do have a FastSIM mode, but this feature is not available for P-tile: https://www.intel.com/content/www/us/en/docs/programmable/683140/22-4-8-0-0/fastsim-mode-support.html
- Did you add all signals to the simulation?
- If you need the simulation to complete faster, you can add only the few signals that you need to monitor, for example pin_perst, ltssm, gen speed/width, and other PCIe-related signals.
What processes are running in the background before the 126725 ns mark?
- I believe the background activity is defined in the simulation scripts. Other than that, we do not have a specific indicator that shows it.
Is there any documentation available that explains the activities happening during this period?
- Is there any specific reason you need to know what happens?

Regards,
Wincent

thanavignesh · ‎12-06-2024

Hi wincent,

1. Is there any specific reason you need to know what happens?

Yes, I would like to understand link training and the process flow.
Can you explain how link training starts in the code, and how and where the subsequent tasks are called in the BFM for both the Root Port (RP) and Endpoint (EP)? I’d like to understand how the process begins and progresses.

2. When using the option set USER_DEFINED_ELAB_OPTIONS "-voptargs=\"-noprotectopt\"" in msim_setup.tcl:

If I run ld_debug and run -all, it takes 4.5 hours to reach the log message: INFO: 126725 ns RP User Avmm Driver: begin RP Configuration, and the simulation finishes after 4.5 hours.
If I use ld and run -all, it takes 40 minutes to reach the same message INFO: 126725 ns RP User Avmm Driver: begin RP Configuration, but the simulation then seems to get stuck with no further logs, waiting for the wait_request signal to deassert from the Root Port BFM.

What could be causing these differences, and how can I ensure smooth simulation without getting stuck?

3. Warning About Optimizations
When running msim_setup.tcl and ld_debug, I get this warning:

Warning: (vopt-10587) Some optimizations are turned off because the +acc switch is in effect. This will cause your simulation to run slowly. Please use -access/-debug to maintain needed visibility.

How can I avoid this warning while maintaining necessary visibility without impacting simulation performance? Could this warning be related to the delays I’m experiencing?

4. Reset_status_n Signal Delay
In the simulation waveform, one observation is that the p0_reset_status_n signal remains low for approximately 100,000 ns, which seems to delay progress. Are there any adjustments or optimizations to reduce this reset assertion time? Could this be the cause of the delay I’m experiencing?

Wincent_Altera · ‎12-06-2024

Hi

Can you explain how the process starts for link training in the code and how and where the subsequent tasks are called in the BFM for both the Root Port (RP) and Endpoint (EP)? I’d like to understand how the process begins and progresses.
>> The exact flow is not stated clearly in the user guide, but I will try to explain as much as I can based on my understanding of PCIe systems. It might not be exactly the same as P-tile PCIe, but I believe the fundamentals are the same. This is for your reference only; please correct me if you think I am wrong.
>> The link training process begins with the power-up and reset of both the Root Port and the Endpoint. This is managed by the power management and reset control logic of the system. Once powered up and out of reset, the Root Port initiates the link training process by sending a series of training sequences, known as TS1 and TS2, to the Endpoint. These sequences are used to negotiate the link parameters, such as link width and speed.
>> The Root Port then enters a polling state, waiting for the Endpoint to respond with its own training sequences. During this phase, the Root Port and Endpoint exchange information to agree on the link parameters. Once both sides have agreed on the parameters, they transition to the "Link Up" state, indicating that the link is trained and ready for data transfer.
>> On the Endpoint side, after power-up and reset, it waits to receive the training sequences from the Root Port. Upon receiving these sequences, the Endpoint responds with its own training sequences. This exchange continues until both the Root Port and Endpoint agree on the link parameters. Once the negotiation is complete, the Endpoint also transitions to the "Link Up" state.
>> In a BFM, this process is typically modeled using a series of tasks and functions that simulate the behavior of the Root Port and Endpoint. For the Root Port, an initialization task handles the power-up and reset sequence. This is followed by a link training task that sends the training sequences to the Endpoint and waits for a response. Once the link training is complete, a link up task is called to indicate that the link is ready for data transfer.
>> Similarly, for the Endpoint, an initialization task handles the power-up and reset sequence. A link training response task is then called to respond to the training sequences sent by the Root Port. Once the negotiation is complete, a link up task is called to indicate that the link is ready for data transfer. The sequence of calls in the BFM typically starts with the Root Port initialization, followed by the link training and link up tasks. The Endpoint follows a similar sequence, starting with initialization, followed by the link training response and link up tasks. This ensures that both the Root Port and Endpoint are synchronized and the link is properly trained and ready for data transfer.
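
The sequence described above can be sketched as a small state machine. This is a simplified Python illustration of the main training path of the PCIe Link Training and Status State Machine (Detect, Polling, Configuration, L0), not the P-Tile BFM code; the function names and the three boolean conditions are assumptions made for illustration. It leaves out timeouts and the Recovery state, which a real link passes through to change speed from the initial 2.5 GT/s up to Gen4.

```python
from enum import Enum


class LinkState(Enum):
    DETECT = "Detect"
    POLLING = "Polling"
    CONFIGURATION = "Configuration"
    L0 = "L0"  # link up, ready for data transfer


def step(state, receiver_present, training_sets_seen, lanes_agreed):
    """Advance one simplified transition; stay in place if its condition is not met."""
    if state is LinkState.DETECT and receiver_present:
        return LinkState.POLLING
    if state is LinkState.POLLING and training_sets_seen:
        return LinkState.CONFIGURATION
    if state is LinkState.CONFIGURATION and lanes_agreed:
        return LinkState.L0
    return state


def train(receiver_present=True, training_sets_seen=True, lanes_agreed=True, max_steps=8):
    """Run from Detect until the state stops changing and return the visited states."""
    state = LinkState.DETECT
    trace = [state]
    for _ in range(max_steps):
        nxt = step(state, receiver_present, training_sets_seen, lanes_agreed)
        if nxt is state:
            break
        state = nxt
        trace.append(state)
    return trace


print(" -> ".join(s.value for s in train()))
# Detect -> Polling -> Configuration -> L0
```

If one of the conditions never becomes true, for example train(lanes_agreed=False), the trace stops in the corresponding state. In a waveform, that shows up as the ltssm state signal staying at one value.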

What could be causing these differences, and how can I ensure smooth simulation without getting stuck?
>> Please follow our user guide: https://www.intel.com/content/www/us/en/docs/programmable/683038/24-1-9-1-1/running-simulations-using-questasim.html

>> We recommend using ld_debug; using "ld" alone could potentially cause the simulation to malfunction.

How can I avoid this warning while maintaining necessary visibility without impacting simulation performance? Could this warning be related to the delays I’m experiencing?
>> Aside from the warning, do you see any error codes raised?
>> The warning can be safely ignored. However, I suggest trying Quartus v24.3, which I believe has better optimization. Let me know if you still see the same issue.

In the simulation waveform, one observation is that the p0_reset_status_n signal remains low for approximately 100,000 ns, which seems to delay progress. Are there any adjustments or optimizations to reduce this reset assertion time? Could this be the cause of the delay I’m experiencing?
>> I have not seen this behavior before. Perhaps you can check your .v file to see whether you can control the assertion of the p0_reset_status_n signal.
>> Check the logic that controls the p0_reset_status_n signal. Ensure that the conditions for deasserting the reset are met as soon as possible. Look for any unnecessary delays or conditions that might be keeping the reset asserted longer than needed. Ensure that the reset signal is properly synchronized with the clock domain it is associated with. Improper synchronization can cause metastability issues, leading to longer reset times.

Regards,
Wincent

thanavignesh · ‎12-08-2024

Hi wincent,

1. In the provided snippet from the Endpoint, the ref_clk0/1 and coreclkout_hip signals appear to be functioning correctly, and pin_perst_n along with the other reset signals asserted as expected during initialization. However, the p0_reset_status_n signal, an output of the PCIe Hard IP core, remains deasserted, and we lack visibility into the internal logic driving this signal. Are there alternative methods to control or debug the behavior of the reset_status signal?

2. Additionally, what is the typical simulation runtime for the P-Tile example design configured for Gen4 x8 operation with default settings when using Questa 23.4 or other supported versions?

Regards,
Thanavignesh

Wincent_Altera · ‎12-08-2024

Hi,

1. In the provided snippet from the Endpoint, the ref_clk0/1 and coreclkout_hip signals appear to be functioning correctly, and pin_perst_n along with the other reset signals asserted as expected during initialization. However, the p0_reset_status_n signal, an output of the PCIe Hard IP core, remains deasserted, and we lack visibility into the internal logic driving this signal. Are there alternative methods to control or debug the behavior of the reset_status signal?
>> reset_status_n is an active-low signal that is held low until pin_perst_n has been deasserted and the PCIe* Hard IP has come out of reset. This signal is synchronous to coreclkout_hip. When port bifurcation is used, there is one such signal for each Avalon®-ST interface. The signals are differentiated by the prefix pn. This is a per-port signal.
>> May I know whether your design is timing clean? A timing failure could potentially cause the signal to fail.
>> Just to confirm, this is the example design generated from the IP catalog, right? Could you please share your .ip file so that I can try to replicate this on my side?
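
The signal description above can be read as a simple condition. The Python sketch below only restates that description (held low until pin_perst_n is deasserted and the Hard IP is out of reset); the function name and the hip_out_of_reset flag are hypothetical, and the real internal reset sequencing of the Hard IP is not visible to the user.

```python
def reset_status_n(pin_perst_n, hip_out_of_reset):
    """Level of the active-low reset status: 0 while still in reset, 1 once released."""
    # pin_perst_n is itself active-low, so 1 means PERST# has been deasserted.
    return 1 if (pin_perst_n == 1 and hip_out_of_reset) else 0


# PERST# released, but the Hard IP is still completing its internal reset.
print(reset_status_n(pin_perst_n=1, hip_out_of_reset=False))  # 0
# Both conditions met: the status goes high and user logic may start.
print(reset_status_n(pin_perst_n=1, hip_out_of_reset=True))   # 1
```

Under this reading, a signal that stays low after pin_perst_n goes high means the Hard IP has not yet reported that it is out of reset.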

2. Additionally, what is the typical simulation runtime for the P-Tile example design configured for Gen4 x8 operation with default settings when using Questa 23.4 or other supported versions?
>> If you capture all signals, the simulation will take around 3-6 hours.
>> If you capture only certain signals, it will take less than 1-2 hours.

Regards,
Wincent

thanavignesh · ‎12-09-2024

Hi Wincent,

With the default settings and only the change to Gen 4x8 configuration, it takes around 8 hours to complete without any signals logged.

To debug this further, could we arrange a call to discuss the issue in detail? I can share the .ip file and any other necessary files during the call for better clarity. Please let me know a convenient time, and I’ll make the arrangements.

Regards,

Thanavignesh

thanavignesh · ‎12-10-2024

Hi Wincent,

Are there any specific system requirements, such as having 100GB of RAM, to achieve faster simulation performance for the P-Tile example design in Questa Sim?

Wincent_Altera · ‎12-10-2024

Hi,

With the default settings and only the change to Gen 4x8 configuration, it takes around 8 hours to complete without any signals logged.
>> Did you add any signals before ld_debug?
>> Do you mean that when the simulation succeeds, no signals are being triggered?
>> Did you set any trigger conditions in your testbench?

Are there any specific system requirements, such as having 100GB of RAM, to achieve faster simulation performance for the P-Tile example design in Questa Sim?
>> The HIP design is huge, which is why it takes some time to run. A P-tile simulation can be expected to run for a couple of hours.
>> Is there any specific signal you need to monitor? If yes, I suggest trying a simple custom design specific to that signal, which could make the simulation faster.
>> May I know how much RAM you currently have? We do not have a specific requirement for faster P-tile simulation performance.
>> Could you try using VCS as the simulator? It needs less time than ModelSim/QuestaSim.

Regards,
Wincent

thanavignesh · ‎12-10-2024

Hi wincent,

->Were any signals added before ld_debug?
No, signals were added only after ld_debug and before run -all in the waveform window.

->When the simulation succeeds, are any signals triggered? Did you set any trigger conditions in the testbench?
No, there are no signals triggered. No, the testbench remains unchanged and is the default one generated with the P-Tile example design BFM model, configured only as Gen4 x8.

->Is there a specific signal you need to monitor?
No specific signals are being monitored. The goal is to run the simulation faster to integrate and verify our user logic along with the BFM.

->Could VCS be used as an alternative simulator for faster simulation?
Unfortunately, only Aldec and Questa Sim are available, and both exhibit similar simulation times.

1. Is there a way to log signals into the .wlf file without adding/logging them to the waveform window? Is there anything that needs to change in the script?

2. We suspect that the p0_reset_status_n signal is delayed in our environment, as it is asserted only after 100,000 ns. Could you please check in your environment when this signal is asserted for both the Root Port (RP) and Endpoint (EP), and when link training was initiated?

Additionally, please attach a snapshot showing the assertion timing.
Note: The P-Tile example design was generated in Quartus v23.3 with the configuration changed to Gen4 x8; all other settings were left unchanged.

Regards
Thanavignesh

Wincent_Altera · ‎12-10-2024

Hi,

No specific signals are being monitored. The goal is to run the simulation faster to integrate and verify our user logic along with the BFM.
>> For now, a simulation time of up to several hours is expected, due to the large IP configuration.
>> I apologize for any inconvenience this may cause.
>> The best I can do is raise this with our design team; I hope they can find a way to optimize this in a future software release.

2. We suspect that the p0_reset_status_n signal is delayed in our environment, as it is asserted only after 100,000 ns. Could you please check in your environment when this signal is asserted for both the Root Port (RP) and Endpoint (EP), and when link training was initiated?
>> May I know which device OPN you are using?
>> I would like to run the same setup so that we have an apples-to-apples comparison.

Regards,

Wincent

thanavignesh · ‎12-10-2024

Hi wincent,

Could you clarify the exact simulation time in hours when running in VCS?

The device we are using is a Stratix 10, specifically 1SD110PJ2F43E2VG. We have also tested on other boards supporting P-Tile, and the results show similar simulation times in Questa Sim. Could you share your experimental results in Questa Sim for comparison?

Regards,
Thana Vignesh

Wincent_Altera · ‎12-10-2024

Hi Thana,

Please allow me some time to try out the simulation; I will update you with the run time once it finishes.
Again, I would like to emphasize that a simulation time of up to several hours is expected, due to the large IP configuration.

Regards
Wincent

Wincent_Altera · ‎12-16-2024

Hi Thana,

Sorry for the late reply; I was occupied with other important matters.
I was able to run the simulation and capture the signals with Questa Sim version 2023.4, based on your device part number.
I forgot to record the time while it was left running, but it should be less than 6-8 hours.

Regards,
Wincent

thanavignesh · ‎12-23-2024

Hi Wincent,

After configuring the P-Tile as a Native Endpoint IP and providing all the necessary settings in the GUI, if we program it onto the Stratix 10 board, will the enumeration take place? How can we verify if the enumeration was successful? We are using a Stratix 10 board with Quartus version 20.3. Which signals, such as ninit_done, ref_clk[1:0], and reset, need to be connected for the P-Tile IP? Is there a user guide that explicitly outlines these requirements?

Regards

Thanavignesh

Wincent_Altera · ‎10-05-2024

Hi,

We apologize that the previous system outage delayed our response to your question.

I hope my previous response was not too late to address your queries.

I wish to follow up with you about this case. Do you have any further questions on this matter?

Otherwise, I would like to have your permission to close this forum ticket.

Regards,

Wincent_Intel
