Watchdog triggered in do_futex() on 4.4.277-cip60-rt35 #cip


Sebastian Holzgreve

Hi,

We are using the 4.4.277-cip60-rt35 kernel, and from time to time the kernel watchdog triggers and our embedded devices get reset.

The following message appears in particular when we run the CodeSYS Runtime (PLC runtime):

[  536.133081] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [BlkDrvUdp:330]
[  536.140637] Modules linked in:
[  536.143756] CPU: 0 PID: 330 Comm: BlkDrvUdp Not tainted 4.4.277-cip60-rt35-ohp+gd23c00f1c39e #1
[  536.152489] Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
[  536.158967] task: 86bd7700 ti: 841aa000 task.ti: 841aa000
[  536.164408] PC is at do_futex+0x458/0xa98
[  536.168450] LR is at do_futex+0x428/0xa98
[  536.172487] pc : [<80069a40>]    lr : [<80069a10>]    psr: a0070013
[  536.172487] sp : 841abe90  ip : 86896244  fp : 841abf4c

I contacted the vendor's support, and they told me the following about how the runtime uses mutexes:

The runtime uses PI (priority-inheritance) mutexes:
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
which are also recursive:
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE_NP);
They also said that other customers have had problems with this combination.
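
A minimal user-space test that exercises the same mutex configuration might help to check whether the lockup can be reproduced without the CodeSYS Runtime. The sketch below is only an assumption of what such a test could look like: all names in it (run_pi_recursive_stress, worker, init_pi_recursive_mutex) are hypothetical, it assumes glibc, it does not replicate the runtime's real-time thread priorities, and it is not known to trigger the lockup.

```c
#define _GNU_SOURCE /* glibc extensions; the *_NP names are non-portable */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Shared state for this hypothetical stress test. */
static pthread_mutex_t pi_rec_mutex;
static long shared_counter;
static int iterations_per_thread;

/* Initialise a mutex that is both priority-inheriting and recursive,
 * mirroring the two attribute calls quoted by the vendor. */
static int init_pi_recursive_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE_NP);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr); /* may fail with ENOTSUP */
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Each worker takes the mutex twice (recursively) per iteration,
 * so contended PI locking and recursive re-locking are both exercised. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < iterations_per_thread; i++) {
        pthread_mutex_lock(&pi_rec_mutex);
        pthread_mutex_lock(&pi_rec_mutex); /* recursive re-lock by the owner */
        shared_counter++;
        pthread_mutex_unlock(&pi_rec_mutex);
        pthread_mutex_unlock(&pi_rec_mutex);
    }
    return NULL;
}

/* Returns 0 and stores the final counter in *out_total on success,
 * or an errno value (e.g. ENOTSUP if PI mutexes are unavailable). */
static int run_pi_recursive_stress(int nthreads, int iterations, long *out_total)
{
    assert(nthreads > 0 && iterations >= 0);
    pthread_t *threads = calloc((size_t)nthreads, sizeof *threads);
    if (threads == NULL)
        return ENOMEM;
    int rc = init_pi_recursive_mutex(&pi_rec_mutex);
    if (rc != 0) {
        free(threads);
        return rc;
    }
    shared_counter = 0;
    iterations_per_thread = iterations;

    int started = 0;
    for (; started < nthreads; started++) {
        rc = pthread_create(&threads[started], NULL, worker, NULL);
        if (rc != 0)
            break; /* join whatever was already started */
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    pthread_mutex_destroy(&pi_rec_mutex);
    free(threads);
    if (rc == 0)
        *out_total = shared_counter;
    return rc;
}
```

Called from main() as run_pi_recursive_stress(4, 10000, &total), it should return 0 and leave total at 40000 (or return ENOTSUP if PI mutexes are not available). On the target, one could raise the iteration count and give the threads different SCHED_FIFO priorities to get closer to the runtime's conditions, while watching the console for soft-lockup messages.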


My current knowledge of the Linux kernel is not sufficient to solve this problem by myself, but I'm willing to learn more about the kernel and perhaps solve it on my own.


Can anyone give me a hint on how to continue investigating this problem?
Maybe someone has already solved it on different target hardware?

RTFM hints are also welcome, as long as they at least tell me which manual to dig through :)

Kind regards,
Sebastian

Join cip-dev@lists.cip-project.org to automatically receive all group messages.