BUG: using smp_processor_id() in preemptible [00000000] code: TCPTSK/1809


Rainer Kloud
 

Hi,
 
I'm useing the kernel 4.19.165-cip41-rt18 on the Altera SOCkit and I encountered an problem with high TCP traffic.
I get the following kernel message over and over: 
 
Jun  1 09:11:46 sicam kernel: [46802.944165] BUG: using smp_processor_id() in preemptible [00000000] code: TCPTSK/1809
Jun  1 09:11:46 sicam kernel: [46802.944210] caller is migrate_enable+0x40/0x488
Jun  1 09:11:46 sicam kernel: [46802.944221] CPU: 0 PID: 1809 Comm: TI4SDTSK#135 Tainted: G        W         4.19.165-cip41-rt18 #1
Jun  1 09:11:46 sicam kernel: [46802.944225] Hardware name: Altera SOCFPGA
Jun  1 09:11:46 sicam kernel: [46802.944252] [<c010e9d4>] (unwind_backtrace) from [<c010b9f4>] (show_stack+0x10/0x14)
Jun  1 09:11:46 sicam kernel: [46802.944272] [<c010b9f4>] (show_stack) from [<c062f854>] (dump_stack+0x94/0xa8)
Jun  1 09:11:46 sicam kernel: [46802.944299] [<c062f854>] (dump_stack) from [<c03a57ec>] (check_preemption_disabled+0x110/0x114)
Jun  1 09:11:46 sicam kernel: [46802.944316] [<c03a57ec>] (check_preemption_disabled) from [<c014163c>] (migrate_enable+0x40/0x488)
Jun  1 09:11:46 sicam kernel: [46802.944338] [<c014163c>] (migrate_enable) from [<c053ff0c>] (ip_finish_output2+0x21c/0x460)
Jun  1 09:11:46 sicam kernel: [46802.944353] [<c053ff0c>] (ip_finish_output2) from [<c0542854>] (ip_output+0x140/0x184)
Jun  1 09:11:46 sicam kernel: [46802.944364] [<c0542854>] (ip_output) from [<c0542128>] (__ip_queue_xmit+0x134/0x40c)
Jun  1 09:11:46 sicam kernel: [46802.944381] [<c0542128>] (__ip_queue_xmit) from [<c055cf70>] (__tcp_transmit_skb+0x53c/0xb20)
Jun  1 09:11:46 sicam kernel: [46802.944392] [<c055cf70>] (__tcp_transmit_skb) from [<c055e188>] (tcp_write_xmit+0x27c/0xfd0)
Jun  1 09:11:46 sicam kernel: [46802.944403] [<c055e188>] (tcp_write_xmit) from [<c055ef10>] (__tcp_push_pending_frames+0x34/0xa8)
Jun  1 09:11:46 sicam kernel: [46802.944413] [<c055ef10>] (__tcp_push_pending_frames) from [<c054f5cc>] (tcp_sendmsg_locked+0x66c/0xc40)
Jun  1 09:11:46 sicam kernel: [46802.944422] [<c054f5cc>] (tcp_sendmsg_locked) from [<c054fbc8>] (tcp_sendmsg+0x28/0x3c)
Jun  1 09:11:46 sicam kernel: [46802.944441] [<c054fbc8>] (tcp_sendmsg) from [<c04cb3f4>] (sock_sendmsg+0x14/0x24)
Jun  1 09:11:46 sicam kernel: [46802.944455] [<c04cb3f4>] (sock_sendmsg) from [<c04cc6d0>] (__sys_sendto+0xc4/0x104)
Jun  1 09:11:46 sicam kernel: [46802.944467] [<c04cc6d0>] (__sys_sendto) from [<c04cc72c>] (sys_send+0x18/0x20)
Jun  1 09:11:46 sicam kernel: [46802.944479] [<c04cc72c>] (sys_send) from [<c0101000>] (ret_fast_syscall+0x0/0x5c)
Jun  1 09:11:46 sicam kernel: [46802.944484] Exception stack(0xda337fa8 to 0xda337ff0)
Jun  1 09:11:46 sicam kernel: [46802.944493] 7fa0:                   001c6f2c 00000001 00000037 0020ae40 00000006 00000000
Jun  1 09:11:46 sicam kernel: [46802.944502] 7fc0: 001c6f2c 00000001 00000001 00000121 001651c8 001ea378 ffff0000 00000006
Jun  1 09:11:46 sicam kernel: [46802.944508] 7fe0: 00000000 b3efea18 00000000 b6a2ff40
 
After some time (normally 10 - 18 hours) the kernel stucks completelly. The TCPTSK is a custom task which has 20 TCP connections to different communication parners running on other hardware (PC, RaspberryPI, ...).
 
Can anybody tell me what problem this message triggers?
The check_preemption_disabled function is called when smp_processor_id() is used. This function should be called with preemtion disabled, but I think migrate_enable will allways enable preemption ...
 
Thanx for your response!
Rainer
 


Pavel Machek
 

Hi!

<html><head></head><body>
Please avoid html on mailing lists.

I&#39;m useing the kernel&nbsp;4.19.165-cip41-rt18 on the Altera
SOCkit and I encountered an problem with high TCP traffic.
I notice you are using -rt kernel. Do you actually need realtime
features?

Can you try to reproduce the problem on 4.19.193?

Can anybody tell me what problem this message triggers?

The check_preemption_disabled function is called when
smp_processor_id() is used. This function should be called with
preemtion disabled, but I think migrate_enable will allways enable
preemption ...
I believe you got your analysis mostly right, I'd have to study the
code some more to see what is going on.

Best regards,
Pavel
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany


Rainer Kloud
 

Hello Pavel,

I notice you are using -rt kernel. Do you actually need realtime
features?
Yes, I actually need the realtime feature. I have one task which
needs to run periodically in realtime (triggered every 10ms by an
external IRQ).

Can you try to reproduce the problem on 4.19.193?
This is only a problem with the realtime patch. The patch
introduces migrate_enable which is part of the callstack:

Jun 1 09:11:46 sicam kernel: [46802.944165] BUG: using smp_processor_id() in preemptible [00000000] code: TCPTSK/1809
Jun 1 09:11:46 sicam kernel: [46802.944210] caller is migrate_enable+0x40/0x488
Jun 1 09:11:46 sicam kernel: [46802.944221] CPU: 0 PID: 1809 Comm: TI4SDTSK#135 Tainted: G W 4.19.165-cip41-rt18 #1
Jun 1 09:11:46 sicam kernel: [46802.944225] Hardware name: Altera SOCFPGA
Jun 1 09:11:46 sicam kernel: [46802.944252] [<c010e9d4>] (unwind_backtrace) from [<c010b9f4>] (show_stack+0x10/0x14)
Jun 1 09:11:46 sicam kernel: [46802.944272] [<c010b9f4>] (show_stack) from [<c062f854>] (dump_stack+0x94/0xa8)
Jun 1 09:11:46 sicam kernel: [46802.944299] [<c062f854>] (dump_stack) from [<c03a57ec>] (check_preemption_disabled+0x110/0x114)
Jun 1 09:11:46 sicam kernel: [46802.944316] [<c03a57ec>] (check_preemption_disabled) from [<c014163c>] (migrate_enable+0x40/0x488)
Jun 1 09:11:46 sicam kernel: [46802.944338] [<c014163c>] (migrate_enable) from [<c053ff0c>] (ip_finish_output2+0x21c/0x460)
Jun 1 09:11:46 sicam kernel: [46802.944353] [<c053ff0c>] (ip_finish_output2) from [<c0542854>] (ip_output+0x140/0x184)
Jun 1 09:11:46 sicam kernel: [46802.944364] [<c0542854>] (ip_output) from [<c0542128>] (__ip_queue_xmit+0x134/0x40c)
Jun 1 09:11:46 sicam kernel: [46802.944381] [<c0542128>] (__ip_queue_xmit) from [<c055cf70>] (__tcp_transmit_skb+0x53c/0xb20)
Jun 1 09:11:46 sicam kernel: [46802.944392] [<c055cf70>] (__tcp_transmit_skb) from [<c055e188>] (tcp_write_xmit+0x27c/0xfd0)
Jun 1 09:11:46 sicam kernel: [46802.944403] [<c055e188>] (tcp_write_xmit) from [<c055ef10>] (__tcp_push_pending_frames+0x34/0xa8)
Jun 1 09:11:46 sicam kernel: [46802.944413] [<c055ef10>] (__tcp_push_pending_frames) from [<c054f5cc>] (tcp_sendmsg_locked+0x66c/0xc40)
Jun 1 09:11:46 sicam kernel: [46802.944422] [<c054f5cc>] (tcp_sendmsg_locked) from [<c054fbc8>] (tcp_sendmsg+0x28/0x3c)
Jun 1 09:11:46 sicam kernel: [46802.944441] [<c054fbc8>] (tcp_sendmsg) from [<c04cb3f4>] (sock_sendmsg+0x14/0x24)
Jun 1 09:11:46 sicam kernel: [46802.944455] [<c04cb3f4>] (sock_sendmsg) from [<c04cc6d0>] (__sys_sendto+0xc4/0x104)
Jun 1 09:11:46 sicam kernel: [46802.944467] [<c04cc6d0>] (__sys_sendto) from [<c04cc72c>] (sys_send+0x18/0x20)
Jun 1 09:11:46 sicam kernel: [46802.944479] [<c04cc72c>] (sys_send) from [<c0101000>] (ret_fast_syscall+0x0/0x5c)
Jun 1 09:11:46 sicam kernel: [46802.944484] Exception stack(0xda337fa8 to 0xda337ff0)
Jun 1 09:11:46 sicam kernel: [46802.944493] 7fa0: 001c6f2c 00000001 00000037 0020ae40 00000006 00000000
Jun 1 09:11:46 sicam kernel: [46802.944502] 7fc0: 001c6f2c 00000001 00000001 00000121 001651c8 001ea378 ffff0000 00000006
Jun 1 09:11:46 sicam kernel: [46802.944508] 7fe0: 00000000 b3efea18 00000000 b6a2ff40

Best Regards,
Rainer


Jan Kiszka
 

Hi Rainer,

On 07.06.21 07:23, Rainer Kloud wrote:
Hello Pavel,

I notice you are using -rt kernel. Do you actually need realtime
features?
Yes, I actually need the realtime feature. I have one task which
needs to run periodically in realtime (triggered every 10ms by an
external IRQ).
Please don't forget to share your kernel config with us so that we can
make sure your use case is covered subsystem-wise in CIP. From Siemens
side, we still have room for improvements in this regard, even more on -rt.

Jun 1 09:11:46 sicam kernel: [46802.944299] [<c062f854>] (dump_stack) from [<c03a57ec>] (check_preemption_disabled+0x110/0x114)
Jun 1 09:11:46 sicam kernel: [46802.944316] [<c03a57ec>] (check_preemption_disabled) from [<c014163c>] (migrate_enable+0x40/0x488)
Jun 1 09:11:46 sicam kernel: [46802.944338] [<c014163c>] (migrate_enable) from [<c053ff0c>] (ip_finish_output2+0x21c/0x460)
Migration should be on across migration-disabled sections, that's their
whole purpose. But maybe the check that preemption needs to be off when
using smp_processor_id needs relaxing to at least migration must be off.

Jan

--
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux


Rainer Kloud
 

Hi,
 
I notice you are using -rt kernel. Do you actually need realtime
features?
Yes, I actually need the realtime feature. I have one task which
needs to run periodically in realtime (triggered every 10ms by an
external IRQ).
Please don't forget to share your kernel config with us so that we can
make sure your use case is covered subsystem-wise in CIP. From Siemens
side, we still have room for improvements in this regard, even more on -rt.
Attached you can find my kernel config. 
 
Jun 1 09:11:46 sicam kernel: [46802.944299] [<c062f854>] (dump_stack) from [<c03a57ec>] (check_preemption_disabled+0x110/0x114)
Jun 1 09:11:46 sicam kernel: [46802.944316] [<c03a57ec>] (check_preemption_disabled) from [<c014163c>] (migrate_enable+0x40/0x488)
Jun 1 09:11:46 sicam kernel: [46802.944338] [<c014163c>] (migrate_enable) from [<c053ff0c>] (ip_finish_output2+0x21c/0x460)
Migration should be on across migration-disabled sections, that's their
whole purpose. But maybe the check that preemption needs to be off when
using smp_processor_id needs relaxing to at least migration must be off.
Sorry, but I can not follow your words. What do you mean with 'needs relaxing to
at least migration must be off'?

Thanx,
Rainer


rainer.kloud@...
 

Hello Pavel,
 
I notice you are using -rt kernel. Do you actually need realtime
features?
Yes, I actually need the realtime feature. I have one task which
needts to run periodically in realtime (every 10ms).

Can you try to reproduce the problem on 4.19.193?
This is only a problem with the realtime patch. The patch
introduces migrate_enable which is part of the callstack:

Jun 1 09:11:46 sicam kernel: [46802.944165] BUG: using smp_processor_id() in preemptible [00000000] code: TCPTSK/1809
Jun 1 09:11:46 sicam kernel: [46802.944210] caller is migrate_enable+0x40/0x488
Jun 1 09:11:46 sicam kernel: [46802.944221] CPU: 0 PID: 1809 Comm: TI4SDTSK#135 Tainted: G W 4.19.165-cip41-rt18 #1
Jun 1 09:11:46 sicam kernel: [46802.944225] Hardware name: Altera SOCFPGA
Jun 1 09:11:46 sicam kernel: [46802.944252] [<c010e9d4>] (unwind_backtrace) from [<c010b9f4>] (show_stack+0x10/0x14)
Jun 1 09:11:46 sicam kernel: [46802.944272] [<c010b9f4>] (show_stack) from [<c062f854>] (dump_stack+0x94/0xa8)
Jun 1 09:11:46 sicam kernel: [46802.944299] [<c062f854>] (dump_stack) from [<c03a57ec>] (check_preemption_disabled+0x110/0x114)
Jun 1 09:11:46 sicam kernel: [46802.944316] [<c03a57ec>] (check_preemption_disabled) from [<c014163c>] (migrate_enable+0x40/0x488)
Jun 1 09:11:46 sicam kernel: [46802.944338] [<c014163c>] (migrate_enable) from [<c053ff0c>] (ip_finish_output2+0x21c/0x460)
Jun 1 09:11:46 sicam kernel: [46802.944353] [<c053ff0c>] (ip_finish_output2) from [<c0542854>] (ip_output+0x140/0x184)
Jun 1 09:11:46 sicam kernel: [46802.944364] [<c0542854>] (ip_output) from [<c0542128>] (__ip_queue_xmit+0x134/0x40c)
Jun 1 09:11:46 sicam kernel: [46802.944381] [<c0542128>] (__ip_queue_xmit) from [<c055cf70>] (__tcp_transmit_skb+0x53c/0xb20)
Jun 1 09:11:46 sicam kernel: [46802.944392] [<c055cf70>] (__tcp_transmit_skb) from [<c055e188>] (tcp_write_xmit+0x27c/0xfd0)
Jun 1 09:11:46 sicam kernel: [46802.944403] [<c055e188>] (tcp_write_xmit) from [<c055ef10>] (__tcp_push_pending_frames+0x34/0xa8)
Jun 1 09:11:46 sicam kernel: [46802.944413] [<c055ef10>] (__tcp_push_pending_frames) from [<c054f5cc>] (tcp_sendmsg_locked+0x66c/0xc40)
Jun 1 09:11:46 sicam kernel: [46802.944422] [<c054f5cc>] (tcp_sendmsg_locked) from [<c054fbc8>] (tcp_sendmsg+0x28/0x3c)
Jun 1 09:11:46 sicam kernel: [46802.944441] [<c054fbc8>] (tcp_sendmsg) from [<c04cb3f4>] (sock_sendmsg+0x14/0x24)
Jun 1 09:11:46 sicam kernel: [46802.944455] [<c04cb3f4>] (sock_sendmsg) from [<c04cc6d0>] (__sys_sendto+0xc4/0x104)
Jun 1 09:11:46 sicam kernel: [46802.944467] [<c04cc6d0>] (__sys_sendto) from [<c04cc72c>] (sys_send+0x18/0x20)
Jun 1 09:11:46 sicam kernel: [46802.944479] [<c04cc72c>] (sys_send) from [<c0101000>] (ret_fast_syscall+0x0/0x5c)
Jun 1 09:11:46 sicam kernel: [46802.944484] Exception stack(0xda337fa8 to 0xda337ff0)
Jun 1 09:11:46 sicam kernel: [46802.944493] 7fa0: 001c6f2c 00000001 00000037 0020ae40 00000006 00000000
Jun 1 09:11:46 sicam kernel: [46802.944502] 7fc0: 001c6f2c 00000001 00000001 00000121 001651c8 001ea378 ffff0000 00000006
Jun 1 09:11:46 sicam kernel: [46802.944508] 7fe0: 00000000 b3efea18 00000000 b6a2ff40

Best Regards,
Rainer


Jan Kiszka
 

On 07.06.21 12:18, Rainer Kloud wrote:
Hi,
 
I notice you are using -rt kernel. Do you actually need realtime
features?
Yes, I actually need the realtime feature. I have one task which
needs to run periodically in realtime (triggered every 10ms by an
external IRQ).
Please don't forget to share your kernel config with us so that we can
make sure your use case is covered subsystem-wise in CIP. From Siemens
side, we still have room for improvements in this regard, even more on -rt.
Attached you can find my kernel config. 
 
Would you propose it as patch to
https://gitlab.com/cip-project/cip-kernel/cip-kernel-config?

Jun 1 09:11:46 sicam kernel: [46802.944299] [<c062f854>] (dump_stack) from [<c03a57ec>] (check_preemption_disabled+0x110/0x114)
Jun 1 09:11:46 sicam kernel: [46802.944316] [<c03a57ec>] (check_preemption_disabled) from [<c014163c>] (migrate_enable+0x40/0x488)
Jun 1 09:11:46 sicam kernel: [46802.944338] [<c014163c>] (migrate_enable) from [<c053ff0c>] (ip_finish_output2+0x21c/0x460)
Migration should be on across migration-disabled sections, that's their
whole purpose. But maybe the check that preemption needs to be off when
using smp_processor_id needs relaxing to at least migration must be off.
Sorry, but I can not follow your words. What do you mean with 'needs relaxing to
at least migration must be off'?
Strike it, that case is too generic to be a reason for something that
long in the field by now.

From looking at the code of migrate_enable(), I suspect that we cause
the call to smp_processor_id() via "struct rq *rq = this_rq();". That
comes fairly at the beginning of migrated_enable(), which matches the
small offset in your backtrace (you can confirm that better). That would
complain about "preemptible code" if the caller does not have preemption
or at least migration off. So my suspect would be a
migration_disable/enable imbalance in the code path you triggered,
likely somewhere in the TCP code.

Did you already try more recent RT kernels, both in the 4.19 series as
well as maybe 5.10 or even latest -rt? Possibly, this issue has been
fixed by now. If you can even reproduce with latest -rt, the issue is
better reported to linux-rt-users.

Jan

--
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux


Pavel Machek
 

Hi!

I notice you are using -rt kernel. Do you actually need realtime
features?
Yes, I actually need the realtime feature. I have one task which
needs to run periodically in realtime (triggered every 10ms by an
external IRQ).
How fast response do you need to the external IRQ?

Can you try to reproduce the problem on 4.19.193?
This is only a problem with the realtime patch. The patch
introduces migrate_enable which is part of the callstack:

Jun 1 09:11:46 sicam kernel: [46802.944165] BUG: using smp_processor_id() in preemptible [00000000] code: TCPTSK/1809
Jun 1 09:11:46 sicam kernel: [46802.944210] caller is
migrate_enable+0x40/0x488
Fun :-(. As Jan mentioned, testing if it can be reproduced with
4.19.193-rt81 would be useful.

Best regards,
Pavel

--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany


Rainer Kloud
 

I notice you are using -rt kernel. Do you actually need realtime
features?
Yes, I actually need the realtime feature. I have one task which
needs to run periodically in realtime (triggered every 10ms by an
external IRQ).
How fast response do you need to the external IRQ?
We do not have "Hard-Realtime" requirements, but we should always handle the IRQ inside of 1ms.

Can you try to reproduce the problem on 4.19.193?
This is only a problem with the realtime patch. The patch
introduces migrate_enable which is part of the callstack:

Jun 1 09:11:46 sicam kernel: [46802.944165] BUG: using smp_processor_id() in preemptible [00000000] code: TCPTSK/1809
Jun 1 09:11:46 sicam kernel: [46802.944210] caller is
migrate_enable+0x40/0x488
Fun :-(. As Jan mentioned, testing if it can be reproduced with
4.19.193-rt81 would be useful.
Ok, I will give 4.19.193-rt81 a try.

Before we decided to switch to the 4.19.x kernel we tried in March this year the 5.10.25 kernel with -rt35.
There we observed a very bad realtime behaviour. Our external interrupt had a latency of more then 4ms when our complete application was running. The IRQ task has realtime prio 51 and is therefore over all other IRQ tasks in the system (they have by default 50).
With kernel 4.19.x we do not see a latency over 1ms in the same configuration.

Best regards,
Rainer