We're doing regression testing on a build server and we need to program the FPGA using quartus_pgm.
However, we find that quartus_pgm is flaky.
Is there a chance that the Quartus developers could have a look at this to see where the problem might be?
There's plenty of information in the stack trace, so it should be possible to determine what's going on.
I'm inclined to believe that quartus_pgm is missing a check on something that is failing. An intelligible and actionable error message would be a huge improvement.
This is with Ubuntu 18.04 and Quartus 19.3.
*** Fatal Error: Segment Violation at 0x4000
Module: quartus_pgm
Stack Trace:
    0xf8769: FBGEN_FRAME::get_total_instr_size() + 0x29 (pgm_fbgen)
    0xf931c: FBGEN_FRAME::get_frame_size() + 0x2c (pgm_fbgen)
    0xf17a4: FBGEN_DBLOCK::get_block_size() + 0x24 (pgm_fbgen)
    0x42110b: PGMIO_FBGEN_PROXY::transfer_to_element() + 0x7b (pgm_pgmio)
    0x427df9: PGMIO_FBGEN_PROXY::create_bitsteam() + 0x329 (pgm_pgmio)
    0x2c2cc8: PGMIO_F2P::create_bitstream(PGM_CHAIN_ELEMENT*, std::vector<std::string, std::allocator<std::string> >*, PGMIO_CCF*) + 0x148 (pgm_pgmio)
    0x284c14: PGM_CHAIN_ELEMENT::generate_bv_list(bool) + 0x104 (pgm_pgmio)
    0x28868d: PGM_CHAIN_ELEMENT::create_chain_element(PGM_CHAIN_ELEMENT*, bool, FIO_PATH*, bool, PGMIO_CONFIG_SCHEME, bool, bool) + 0xd2d (pgm_pgmio)
    0x232d9: PGME_PROGRAMMER::lookup_device(PGM_CHAIN_ELEMENT*, PGMIO_CONFIG_SCHEME, bool, bool, bool) + 0x29 (pgm_pgme)
    0x2175d: QPGM_FRAMEWORK::create_element(std::string, std::string, unsigned int, unsigned int) + 0x601 (quartus_pgm)
    0x23b91: QPGM_FRAMEWORK::process_operation(std::string*) + 0x1e93 (quartus_pgm)
    0x24cde: QPGM_FRAMEWORK::post_check_arguments() + 0x2d6 (quartus_pgm)
    0x1c08f: qexe_standard_main(QEXE_FRAMEWORK*, QEXE_OPTION_DEFINITION const**, int, char const**) + 0x1bc (comp_qexe)
    0x1fd97: qpgm_main(int, char const**) + 0x5e (quartus_pgm)
    0x40720: msg_main_thread(void*) + 0x10 (ccl_msg)
    0x602c: thr_final_wrapper + 0xc (ccl_thr)
    0x407df: msg_thread_wrapper(void* (*)(void*), void*) + 0x62 (ccl_msg)
    0xa559: mem_thread_wrapper(void* (*)(void*), void*) + 0x99 (ccl_mem)
    0x8f92: err_thread_wrapper(void* (*)(void*), void*) + 0x27 (ccl_err)
    0x63f2: thr_thread_wrapper + 0x15 (ccl_thr)
    0x427e2: msg_exe_main(int, char const**, int (*)(int, char const**)) + 0xa3 (ccl_msg)
    0x1fe21: main + 0x26 (quartus_pgm)
    0x270b3: __libc_start_main + 0xf3 (c.so.6)
You already have the stack trace, so you should be able to inspect the source code to find the missing error check/message.
Try running nios2-configure-sof with the cable detached and see if that reproduces the problem.
If the cable is detached, then I don't expect it to work. I would like to confirm whether the setup is correct and whether the issue can be easily reproduced. Without a reproduction, we cannot be sure what is actually happening, even with the Internal Error information.
The bug here isn't that it doesn't work, but that it crashes without a helpful error message.
There should be a helpful error message without the cable attached, not a crash.
This may seem like a small thing, but when using FPGAs in automated test setups (which *should* be par for the course in 2020), the logs are all you have to determine what the problem is.
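For anyone else hitting this on a build server, here is a minimal sketch of a wrapper that at least makes the logs say whether the programmer tool itself crashed or the programming step simply failed. It is a workaround sketch, not a fix and not anything provided by Quartus. The names `run_programmer`, `classify` and `CRASH_MARKERS` are made up for illustration, the marker strings are just the ones that appear in the crash report quoted in this thread, and `cmd` is assumed to be whatever quartus_pgm command line your job already runs.

```python
import subprocess
import sys
import time

# Substrings taken from the crash report quoted in this thread (assumption:
# a crashing run prints at least one of them to stdout or stderr).
CRASH_MARKERS = ("Fatal Error", "Internal Error")


def classify(returncode, output):
    """Label one run as 'ok', 'failed', or 'tool_crash'."""
    if any(marker in output for marker in CRASH_MARKERS):
        return "tool_crash"
    return "ok" if returncode == 0 else "failed"


def run_programmer(cmd, attempts=3, timeout=300, delay=5.0):
    """Run cmd up to `attempts` times and return (final_status, history).

    cmd is the argument list for the programmer, supplied by the caller.
    Every attempt is logged to stderr so the CI log shows what happened.
    """
    history = []
    status = "failed"
    for attempt in range(1, attempts + 1):
        try:
            proc = subprocess.run(
                cmd,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,  # keep stdout and stderr together
                text=True,
                timeout=timeout,
            )
            returncode, output = proc.returncode, proc.stdout
            status = classify(returncode, output)
        except subprocess.TimeoutExpired as exc:
            # Partial output may be missing or bytes, depending on platform.
            partial = exc.output or ""
            if isinstance(partial, bytes):
                partial = partial.decode(errors="replace")
            returncode, output, status = None, partial, "timeout"

        history.append(
            {"attempt": attempt, "status": status,
             "returncode": returncode, "output": output}
        )
        print(f"[attempt {attempt}/{attempts}] status={status} "
              f"returncode={returncode}", file=sys.stderr)
        print(output, file=sys.stderr)

        if status == "ok":
            break
        if attempt < attempts:
            time.sleep(delay)  # give the cable/JTAG server time to settle
    return status, history
```

Calling `run_programmer(cmd)` with your existing command list returns the final status plus the captured output of every attempt, so a test job can fail with "tool_crash" rather than a bare non-zero exit. Retrying only hides the flakiness; the captured output is the part worth attaching to a bug report.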
Can you reproduce a non-helpful error message without a cable attached?
Usually, when the blaster is not connected, the tool shows an error message saying that no blaster is connected. It would be helpful if you could provide steps to reproduce the problem, because I am not able to observe the internal error you mention.
I know that, ideally, you'd want a reproduction procedure.
However, the stack trace should give the developers plenty of clues to hunt down the problem by code inspection, which is why they made the stack trace in the first place.
The internal error provides some guidance, but it does not fully identify the root cause. We need a way to reproduce the issue so that we can fix it correctly and verify that we have not implemented the wrong solution.
Sorry for the inconvenience.