Re: Crash

Altera_Forum · ‎11-02-2011

Hi,

Some months ago I wrote a C++ program for MicroC/OS-II running on a Nios processor in an FPGA on an Altera board.

The software consisted of three threads that did nothing useful: each thread contained only a cout. Before each cout a semaphore was taken, and after the cout the semaphore was released. The three threads were synchronized by semaphores.

If the threads are named A, B and C:

1) A ran, B and C aspected

2) A released a semaphore, B took the semaphore and ran; A and C aspected

3) B released a semaphore, C took the semaphore and ran; A and B aspected

4) C released a semaphore, A took the semaphore and ran; B and C aspected

and so on...

After two hours the program crashed. This happened every time I launched it, and I didn't understand why.

Does anyone have an idea?

Thank you very much

Altera_Forum · ‎11-02-2011

Time-limited license?

Altera_Forum · ‎11-02-2011

--- Quote Start ---

Time-limited license?

--- Quote End ---

Excuse me, I didn't understand. Could you explain?

Thanks

Altera_Forum · ‎11-02-2011

Sorry for my English: where I wrote "aspected" in the first post, I should have written "expected"... I haven't spoken English for 15 years...

I thought it could be a hardware problem; is that possible? When I added some instructions to each thread, the program crashed (it hung, it did not exit) after a different time (for example, one hour instead of two), and the time of the crash was always the same each time I launched it with the same instructions.

Thanks

Altera_Forum · ‎11-02-2011

Your English isn't that bad.

I was thinking that stopping after exactly 2 hours might be the limit of the licence for the Nios CPU (I can't remember the actual interval) when you have not bought a full licence.

If changing the code changes the time before the failure, that is unlikely to be the problem.

A hardware problem would be more likely to give a spread of 'times to failure' than a fixed time.

A fixed time might actually be a fixed count of some other activity - and be a 'simple' software bug.
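To make the "fixed count" idea concrete, here is a minimal sketch of how a fixed-width counter bumped at a steady rate wraps after a predictable time. The helper `seconds_until_wrap` and the rates are hypothetical and purely illustrative; nothing here is taken from the poster's program.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: whole seconds until an unsigned counter of `bits` width
// wraps around when it is incremented `events_per_second` times a second.
// Widths and rates are illustrative assumptions, not measured values.
std::uint64_t seconds_until_wrap(unsigned bits, std::uint64_t events_per_second) {
    assert(bits >= 1 && bits <= 32 && events_per_second > 0);
    const std::uint64_t range = std::uint64_t{1} << bits;  // number of distinct values
    return range / events_per_second;                      // integer division
}
```

For instance, `seconds_until_wrap(16, 9)` gives 7281 s, roughly two hours. Changing the event rate moves the failure time, yet the time stays repeatable from run to run.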

Altera_Forum · ‎11-02-2011

In your opinion, is it a bug in the operating system or a bug in my program? My program was very simple, so it's hard to make an error in such a program. It's also unlikely that the released version of MicroC/OS-II has a bug of this type; I think programmers would have reported it. For these reasons I thought it was a hardware problem.

Altera_Forum · ‎11-03-2011

How do you manage those semaphores? Do you continuously create and delete them?

Can you post your code?

Usually such "timed crash" behaviour means you are periodically allocating resources but never releasing them.
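A small model of that failure mode, assuming a fixed-size table of kernel objects. `FixedPool` and `first_failure` are hypothetical names that stand in for any RTOS table of semaphores, timers or memory blocks; this is a sketch, not MicroC/OS-II code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a fixed-size pool of kernel objects (hypothetical).
class FixedPool {
public:
    explicit FixedPool(std::size_t capacity) : in_use_(capacity, false) {}

    // Returns a free slot index, or -1 when the pool is exhausted.
    int acquire() {
        for (std::size_t i = 0; i < in_use_.size(); ++i) {
            if (!in_use_[i]) { in_use_[i] = true; return static_cast<int>(i); }
        }
        return -1;
    }

    void release(int slot) {
        assert(slot >= 0 && static_cast<std::size_t>(slot) < in_use_.size());
        in_use_[static_cast<std::size_t>(slot)] = false;
    }

private:
    std::vector<bool> in_use_;
};

// Runs `iterations` loop passes; each pass acquires one object and releases it
// only when `release_after_use` is true. Returns the 0-based pass at which
// acquisition first failed, or -1 if it never failed.
int first_failure(std::size_t capacity, int iterations, bool release_after_use) {
    FixedPool pool(capacity);
    for (int i = 0; i < iterations; ++i) {
        const int slot = pool.acquire();
        if (slot < 0) return i;
        if (release_after_use) pool.release(slot);
    }
    return -1;
}
```

Without the release, the failure always arrives on the same pass (the pool capacity), which at a steady loop rate looks like a crash after a fixed time.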

Altera_Forum · ‎11-03-2011

I don't have the code right now; if I find it I will post it. But I remember that I created them outside the threads, in main. In each thread I only did the pend and post operations.

Thank you

Altera_Forum · ‎11-03-2011

I found this old post of yours

http://www.alteraforum.com/forum/showthread.php?t=28275

If this actually is the code you refer to, I'm concerned about a particular situation that will eventually happen after a while:

Because of task priorities it is possible that both high priority tasks repost the semaphore between the OSSemPost and the next OSSemPend of the low priority one. This will happen whenever the scheduler is activated exactly between the two instructions.

I'm not sure, but I think such a situation could generate anomalies in the normal flow.

As someone pointed out there, I think cout is not thread safe, if you still use it. Moreover, if you use the JTAG UART as standard output, I guess the interface would get stuck after some time because of the large amount of print data under fast task switching.
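The repost scenario described in this post can be illustrated with a single-threaded model, assuming the semaphores are counting semaphores (as MicroC/OS-II semaphores are). `CountingSem` and `pends_without_blocking` are hypothetical names; the model only tracks the count and does not reproduce the scheduler.

```cpp
#include <cassert>

// Single-threaded model of a counting semaphore (hypothetical type, not the
// MicroC/OS-II implementation). It only tracks the count, which is enough to
// show what happens when posts arrive before the matching pend.
struct CountingSem {
    unsigned count = 0;

    void post() { ++count; }  // no waiter in this model: the count accumulates

    // Non-blocking pend: returns true when a token was available.
    bool try_pend() {
        if (count == 0) return false;  // a real task would block here
        --count;
        return true;
    }
};

// Posts `posts` times before the consumer reaches its pend, then counts how
// many consecutive pends return without blocking.
unsigned pends_without_blocking(unsigned posts) {
    CountingSem sem;
    for (unsigned i = 0; i < posts; ++i) sem.post();
    unsigned passed = 0;
    while (sem.try_pend()) ++passed;
    assert(sem.count == 0);  // every accumulated token was consumed
    return passed;
}
```

With two posts queued up, the waiting task passes its pend twice in a row, so it takes two turns where the design intended one and the A, B, C order is no longer guaranteed.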

Altera_Forum · ‎11-03-2011

Thank you very much.

Yes, my problem refers to that post, but the code was changed to add semaphores around the cout, which isn't thread safe.

If the problem is caused by the things you mentioned, what is the solution? It seems that there isn't one!

I used the JTAG UART as standard output.

Altera_Forum · ‎11-03-2011

Although I don't know the exact purpose of the fflush function, it seems your tasks have no idle/sleep state; I mean they continuously rush from one to the other at the maximum speed the scheduler allows.

IMHO this can cause two issues:

1. JTAG allows rather low throughput; the huge amount of output traffic generated by this relay race among tasks can easily choke the interface and affect the operation of the whole system.

2. As I said in the previous post, the absence of sleep instructions or significant execution times also makes the higher priority tasks fall into the pending state immediately, even before the lower priority one. In such a situation you could get an inversion of the expected flow and possibly overlapping messages.

If the tasks are not actually required to switch at that incredible rate, a possible solution would be to insert a TK_SLEEP(ticks) instruction between the cout print and the next OSSemPost.

A delay of a few ticks will ensure the tasks run in the correct order.

A better solution would be to implement a 3-state machine in a single task, if your complete project allows it; right now I can't see the point of running 3 tasks if only one runs at a time.
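A minimal sketch of the single-task, three-state alternative. The names `Step`, `next` and `run_cycle` are hypothetical, and the letters stand in for whatever work each of the three threads used to do.

```cpp
#include <cassert>
#include <string>

enum class Step { A, B, C };

// Advances the three-step cycle A -> B -> C -> A.
Step next(Step s) {
    switch (s) {
        case Step::A: return Step::B;
        case Step::B: return Step::C;
        case Step::C: return Step::A;
    }
    assert(false && "unreachable");
    return Step::A;
}

// Runs `steps` iterations in a single task and returns the order in which the
// steps executed.
std::string run_cycle(int steps) {
    std::string trace;
    Step s = Step::A;
    for (int i = 0; i < steps; ++i) {
        switch (s) {
            case Step::A: trace += 'A'; break;  // work formerly done by thread A
            case Step::B: trace += 'B'; break;  // work formerly done by thread B
            case Step::C: trace += 'C'; break;  // work formerly done by thread C
        }
        s = next(s);
    }
    return trace;
}
```

Because only one task exists, the order is fixed by construction and no semaphores are needed to enforce it.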

Altera_Forum · ‎11-03-2011

I suspect the code spends most of its time blocked waiting for the JTAG UART inside 'cout'. This probably isn't the intention of the test!
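A rough way to see how much time a writer can spend blocked on a slow output is a tick-based FIFO model. All sizes and rates below are made-up assumptions, and `blocked_ticks` is a hypothetical function; this is not the JTAG UART driver.

```cpp
#include <cassert>
#include <cstddef>

// Tick-based model of a writer feeding a slow output FIFO.
// Each tick the link drains up to `drain_per_tick` bytes, then the writer
// tries to enqueue one message of `msg_bytes`; if it does not fit, the writer
// is counted as blocked for that tick and retries on the next one.
std::size_t blocked_ticks(std::size_t fifo_capacity, std::size_t msg_bytes,
                          std::size_t drain_per_tick, std::size_t ticks) {
    assert(msg_bytes <= fifo_capacity);
    std::size_t level = 0;    // bytes currently queued
    std::size_t blocked = 0;  // ticks in which the writer could not enqueue
    for (std::size_t t = 0; t < ticks; ++t) {
        level -= (level < drain_per_tick) ? level : drain_per_tick;
        if (level + msg_bytes <= fifo_capacity) {
            level += msg_bytes;
        } else {
            ++blocked;
        }
    }
    return blocked;
}
```

With a 64-byte FIFO, 16-byte messages and a link that drains 4 bytes per tick, `blocked_ticks(64, 16, 4, 100)` reports the writer blocked in 72 of 100 ticks; once the link drains 16 bytes per tick the writer never blocks.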

Altera_Forum · ‎11-03-2011

--- Quote Start ---

Although I don't know the exact purpose of the fflush function, it seems your tasks have no idle/sleep state; I mean they continuously rush from one to the other at the maximum speed the scheduler allows.

IMHO this can cause two issues:

1. JTAG allows rather low throughput; the huge amount of output traffic generated by this relay race among tasks can easily choke the interface and affect the operation of the whole system.

2. As I said in the previous post, the absence of sleep instructions or significant execution times also makes the higher priority tasks fall into the pending state immediately, even before the lower priority one. In such a situation you could get an inversion of the expected flow and possibly overlapping messages.

If the tasks are not actually required to switch at that incredible rate, a possible solution would be to insert a TK_SLEEP(ticks) instruction between the cout print and the next OSSemPost.

A delay of a few ticks will ensure the tasks run in the correct order.

A better solution would be to implement a 3-state machine in a single task, if your complete project allows it; right now I can't see the point of running 3 tasks if only one runs at a time.

--- Quote End ---

Thank you for this explanation. The problem also happened with many instructions in each thread; I then tried without them to see whether I had made an error in those instructions. With those instructions the threads aren't so fast, because there was packet formatting, calculations, searching and storing of data, ... before each Post(). With the instructions the problem appears after half an hour; with the posted code, after 2-3 hours.

Altera_Forum · ‎11-03-2011

Possibly you were having problems with stack overflow, which might require an interrupt to happen at the maximum stack depth, so it wouldn't be that common.

Adding extra code might have increased the stack depth - making the overflow more likely.

I don't know how much stack things like 'cout' end up using - but it could be considerable.
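One common way to find out how much stack a task really uses is a "watermark" check: pre-fill the stack with a known pattern and later count how much of the pattern survived. The sketch below models the technique on an ordinary buffer; all names are hypothetical and the stack is assumed to grow downward from the end of the buffer. MicroC/OS-II has its own stack-checking call (OSTaskStkChk) for tasks created with the stack-check option, which this block does not use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fill pattern marking stack words that were never written.
constexpr std::uint32_t kFill = 0xDEADBEEFu;

std::vector<std::uint32_t> make_stack(std::size_t words) {
    return std::vector<std::uint32_t>(words, kFill);
}

// Simulates a task that touched `words_used` entries, starting from the end of
// the buffer (the assumed stack base) and growing toward index 0.
void simulate_use(std::vector<std::uint32_t>& stack, std::size_t words_used) {
    assert(words_used <= stack.size());
    for (std::size_t i = 0; i < words_used; ++i) {
        stack[stack.size() - 1 - i] = static_cast<std::uint32_t>(i);
    }
}

// Counts words, from the far end (index 0), that still hold the fill pattern.
std::size_t words_never_used(const std::vector<std::uint32_t>& stack) {
    std::size_t free_words = 0;
    while (free_words < stack.size() && stack[free_words] == kFill) ++free_words;
    return free_words;
}

// Convenience wrapper: headroom left after using `words_used` of `words`.
std::size_t headroom_after(std::size_t words, std::size_t words_used) {
    auto stack = make_stack(words);
    simulate_use(stack, words_used);
    return words_never_used(stack);
}
```

A headroom that shrinks toward zero after adding code to a task is a strong hint that the stack is too small for the deepest call path plus an interrupt.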

Altera_Forum · ‎11-03-2011

--- Quote Start ---

With those instructions the threads aren't so fast, because there was packet formatting, calculations, searching and storing of data, ... before each Post(). With the instructions the problem appears after half an hour; with the posted code, after 2-3 hours.

--- Quote End ---

The problem I conjectured is independent of how many instructions you have between the OSSemPend and OSSemPost; rather, it's caused by the fact that you have a higher priority task which runs without ever releasing control to the low priority ones.

Since OS-II is a preemptive OS, whenever the high priority task is scheduled just after the semaphore was signaled by the other task, it takes control completely and runs undisturbed until it reaches the next OSSemPend.

Inserting more instructions then simply delays the time at which the event occurs, as in your case (in other words, you need the same number of cycles but a longer time, because each cycle takes longer).

About cout, I agree with dsl: if a lot of data is queued for transmission out of the JTAG UART, your system may run short of stack or heap space.

Altera_Forum · ‎11-04-2011

--- Quote Start ---

The problem I conjectured is independent of how many instructions you have between the OSSemPend and OSSemPost; rather, it's caused by the fact that you have a higher priority task which runs without ever releasing control to the low priority ones.

Since OS-II is a preemptive OS, whenever the high priority task is scheduled just after the semaphore was signaled by the other task, it takes control completely and runs undisturbed until it reaches the next OSSemPend.

Inserting more instructions then simply delays the time at which the event occurs, as in your case (in other words, you need the same number of cycles but a longer time, because each cycle takes longer).

About cout, I agree with dsl: if a lot of data is queued for transmission out of the JTAG UART, your system may run short of stack or heap space.

--- Quote End ---

Thank you, I have understood the first part. However, I didn't understand the cout problem, because with fflush() I empty the buffer. Doesn't that take care of it?

Altera_Forum · ‎11-07-2011

--- Quote Start ---

Thank you, I have understood the first part. However, I didn't understand the cout problem, because with fflush() I empty the buffer. Doesn't that take care of it?

--- Quote End ---

As I said, I didn't know what the purpose of fflush() was. If it indeed waits for the cout buffer to empty, it could avoid all the above problems, since this wait would keep the tasks synchronized.

However, I don't know how cout and its send buffer are managed at the lower levels, so I'd suggest you run your code without the cout/fflush calls (add TK_SLEEP(10) instead) in order to test the task sequencing itself. This way you can determine what really causes the crash.
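One way to test the sequencing without cout is to have each task append a letter to an in-memory trace and check the order afterwards. A minimal sketch with hypothetical names (`Trace`, `first_out_of_order`); in a real multitasking build the trace would additionally need protection against concurrent access.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// In-memory replacement for the cout calls: each task records one letter per
// turn instead of printing.
struct Trace {
    std::string log;

    void record(char task) {
        assert(task == 'A' || task == 'B' || task == 'C');
        log += task;
    }
};

// Returns the index of the first entry that breaks the expected
// A, B, C, A, ... order, or -1 when the whole trace is in order.
int first_out_of_order(const std::string& trace) {
    const char expected[] = {'A', 'B', 'C'};
    for (std::size_t i = 0; i < trace.size(); ++i) {
        if (trace[i] != expected[i % 3]) return static_cast<int>(i);
    }
    return -1;
}
```

If the trace stays in order for hours without cout, the output path is the prime suspect; if it goes out of order, the semaphore handshake itself is at fault.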

Altera_Forum · ‎01-18-2013

--- Quote Start ---

Because of task priorities it is possible that both high priority tasks repost the semaphore between the OSSemPost and the next OSSemPend of the low priority one. This will happen whenever the scheduler is activated exactly between the two instructions.

I'm not sure, but I think such a situation could generate anomalies in the normal flow.

--- Quote End ---

I'm rereading this old post, and I realize that I never understood why anomalies can occur if both high priority tasks repost the semaphore between the OSSemPost and the next OSSemPend of the low priority one.

Thanks

Altera_Forum · ‎01-30-2013

--- Quote Start ---

I'm rereading this old post, and I realize that I never understood why anomalies can occur if both high priority tasks repost the semaphore between the OSSemPost and the next OSSemPend of the low priority one.

Thanks

--- Quote End ---

Could it be a bug in the scheduler?