I need de0-nano testing for -rt release was Re: 4.19.106-cip21-rt8 problems on de0-nano


Pavel Machek
 

Hi!

I pushed a candidate for -cip-rt, but it seems to fail on the de0-nano
board. Code under testing is at:

https://gitlab.com/cip-project/cip-kernel/linux-cip/tree/ci/pavel/linux-cip
The pipeline is:

https://gitlab.com/cip-project/cip-kernel/linux-cip/pipelines/122762401

I'll reuse the branch for more testing.
I managed to narrow the bad commit down to the -rt tree, between:

OK 122904930 pick 69aa73357e6a rcu: Don't allow to change rcu_normal_after_boot on RT
pick 849ef8789077 pci/switchtec: fix stream_open.cocci warnings
pick ad8a5e8279c4 sched/core: Drop a preempt_disable_rt() statement
pick 966f066d96cb timers: Redo the notification of canceling timers on -RT
pick 0393fd5a4f9a Revert "futex: Ensure lock/unlock symetry versus pi_lock and hash bucket lock"
pick 84eb0b64a27a Revert "futex: Fix bug on when a requeued RT task times out"
pick fcc893280f4e Revert "rtmutex: Handle the various new futex race conditions"
pick 2eac93cf9d16 Revert "futex: workaround migrate_disable/enable in different context"
pick 9b8964629f4f futex: Make the futex_hash_bucket lock raw
pick cc1812bf198b futex: Delay deallocation of pi_state
pick f5e115c43100 mm/zswap: Do not disable preemption in zswap_frontswap_store()
pick e0d0d09a08ad revert-aio
pick a0a40bfb4300 fs/aio: simple simple work
pick 0fae581d8c5e revert-thermal
pick c0d95b4a8a1b thermal: Defer thermal wakups to threads
pick 700fbb4afb6e revert-block
pick 4cda50ff12cf block: blk-mq: move blk_queue_usage_counter_release() into process context
pick 9e982f55745b workqueue: rework
pick c0db53dc3bf4 i2c: exynos5: Remove IRQF_ONESHOT
pick 1f160d170203 i2c: hix5hd2: Remove IRQF_ONESHOT
BAD 122882826 eae5a7cab722 sched/deadline: Ensure inactive_timer runs in hardirq context
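The narrowing above is a bisection over the ordered picks between the last known-good state (69aa73357e6a) and the first known-bad one (eae5a7cab722). A minimal sketch in Python, assuming a hypothetical `is_good` callback that stands in for one CI pipeline run and returns the same answer every time for a given commit. For real history, `git bisect` does the same job.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the first commit for which is_good() is False.

    Assumes commits[0] is known good, commits[-1] is known bad, and the
    test is deterministic. A flaky lab breaks that last assumption, so in
    practice each probe may need to be repeated.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid  # breakage is after mid
        else:
            hi = mid  # breakage is at or before mid
    return commits[hi]
```

Each probe halves the range, so about log2(n) pipeline runs are needed instead of one per pick. That only holds if every run gives the same answer for the same tree.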
And something went seriously wrong after these tests. I submitted the same
tree twice and got different results.

First this -- de0-nano succeeds:

https://gitlab.com/cip-project/cip-kernel/linux-cip/pipelines/122904930

Now this -- de0-nano fails (and ipc227e has been unfinished for a long time):

https://gitlab.com/cip-project/cip-kernel/linux-cip/pipelines/122959477

I'll need some help here.
The logs read like the targets are not (always) coming up, e.g.
https://gitlab.com/cip-project/cip-kernel/linux-cip/-/jobs/457824214#L377
Yes... I don't need that target, but I need de0-nano... and it did not
work last time I checked.

On a related note... it would be good to somehow show the difference
between "kernel test failure" and "target failure".

If we see the bootloader in the logs, and then the test fails/times out =>
"kernel test failure", I need to solve it.

If we don't get messages from the bootloader => "target failure",
someone needs to check the power relays or something...
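The rule proposed above can be written down as a small classifier over a job's log text. A sketch in Python, assuming the marker strings in `BOOTLOADER_MARKERS` (hypothetical examples; the real banner depends on each board's bootloader and would have to be taken from actual LAVA logs):

```python
from enum import Enum


class Failure(Enum):
    NONE = "pass"
    KERNEL = "kernel test failure"
    TARGET = "target failure"


# Hypothetical bootloader banners; replace with what each board really prints.
BOOTLOADER_MARKERS = ("U-Boot", "SeaBIOS", "GNU GRUB")


def classify(log_text: str, passed: bool) -> Failure:
    """Apply the rule: bootloader seen => kernel side, otherwise target side."""
    if passed:
        return Failure.NONE
    if any(marker in log_text for marker in BOOTLOADER_MARKERS):
        return Failure.KERNEL  # board came up; kernel or test failed/timed out
    return Failure.TARGET  # no bootloader output; check power, serial, network
```

A deploy-stage timeout (such as a slow rootfs download) never reaches the bootloader, so it falls into the "target failure" bucket, which is the lab-side bucket such failures belong in.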

Best regards,
Pavel

--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany


Jan Kiszka
 

On 09.03.20 11:21, Pavel Machek wrote:
Hi!

I pushed a candidate for -cip-rt, but it seems to fail on the de0-nano
board. Code under testing is at:

https://gitlab.com/cip-project/cip-kernel/linux-cip/tree/ci/pavel/linux-cip
The pipeline is:

https://gitlab.com/cip-project/cip-kernel/linux-cip/pipelines/122762401

I'll reuse the branch for more testing.
I managed to narrow the bad commit down to the -rt tree, between:

OK 122904930 pick 69aa73357e6a rcu: Don't allow to change rcu_normal_after_boot on RT
pick 849ef8789077 pci/switchtec: fix stream_open.cocci warnings
pick ad8a5e8279c4 sched/core: Drop a preempt_disable_rt() statement
pick 966f066d96cb timers: Redo the notification of canceling timers on -RT
pick 0393fd5a4f9a Revert "futex: Ensure lock/unlock symetry versus pi_lock and hash bucket lock"
pick 84eb0b64a27a Revert "futex: Fix bug on when a requeued RT task times out"
pick fcc893280f4e Revert "rtmutex: Handle the various new futex race conditions"
pick 2eac93cf9d16 Revert "futex: workaround migrate_disable/enable in different context"
pick 9b8964629f4f futex: Make the futex_hash_bucket lock raw
pick cc1812bf198b futex: Delay deallocation of pi_state
pick f5e115c43100 mm/zswap: Do not disable preemption in zswap_frontswap_store()
pick e0d0d09a08ad revert-aio
pick a0a40bfb4300 fs/aio: simple simple work
pick 0fae581d8c5e revert-thermal
pick c0d95b4a8a1b thermal: Defer thermal wakups to threads
pick 700fbb4afb6e revert-block
pick 4cda50ff12cf block: blk-mq: move blk_queue_usage_counter_release() into process context
pick 9e982f55745b workqueue: rework
pick c0db53dc3bf4 i2c: exynos5: Remove IRQF_ONESHOT
pick 1f160d170203 i2c: hix5hd2: Remove IRQF_ONESHOT
BAD 122882826 eae5a7cab722 sched/deadline: Ensure inactive_timer runs in hardirq context
And something went seriously wrong after these tests. I submitted the same
tree twice and got different results.

First this -- de0-nano succeeds:

https://gitlab.com/cip-project/cip-kernel/linux-cip/pipelines/122904930

Now this -- de0-nano fails (and ipc227e has been unfinished for a long time):

https://gitlab.com/cip-project/cip-kernel/linux-cip/pipelines/122959477

I'll need some help here.
The logs read like the targets are not (always) coming up, e.g.
https://gitlab.com/cip-project/cip-kernel/linux-cip/-/jobs/457824214#L377
Yes... I don't need that target, but I need de0-nano... and it did not
work last time I checked.
Bikram, could someone on your side check the board status in the Mentor lab? Thanks!

On a related note... it would be good to somehow show the difference
between "kernel test failure" and "target failure".
If we see the bootloader in the logs, and then the test fails/times out =>
"kernel test failure", I need to solve it.
If we don't get messages from the bootloader => "target failure",
someone needs to check the power relays or something...
I'm not happy about the parsability of those LAVA logs either, but I have no idea whether or how that can best be improved. Maybe Quirin has some idea based on his work with them.

Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


Bhola, Bikram <Bikram_Bhola@...>
 

Hi Jan and All,

We are working on it.

It looks like the network in our lab has been slow for the last few days, which results in rootfs download timeout failures. For the time being, we need to increase the current timeout from 15 minutes to 30 minutes to be on the safe side (downloads are failing at around 90% completion). Meanwhile, I am working with our network team to diagnose the slowness.

Thank You!!

Regards,
Bikram

-----Original Message-----
From: Jan Kiszka [mailto:jan.kiszka@siemens.com]
Sent: 10 March 2020 00:23
To: Pavel Machek <pavel@denx.de>; Bhola, Bikram <Bikram_Bhola@mentor.com>; Quirin Gylstorff <quirin.gylstorff@siemens.com>
Cc: cip-dev@lists.cip-project.org
Subject: Re: I need de0-nano testing for -rt release was Re: [cip-dev] 4.19.106-cip21-rt8 problems on de0-nano

On 09.03.20 11:21, Pavel Machek wrote:
Hi!

I pushed candidate for -cip-rt, but it seems to fail on de0-nano
board. Code under testing is at:

https://gitlab.com/cip-project/cip-kernel/linux-cip/tree/ci/pavel
/linux-cip
It is pipeline

https://gitlab.com/cip-project/cip-kernel/linux-cip/pipelines/1227
62401

I'll reuse the branch for more testing.
I managed to narrow the bad commit to the -rt tree, between:

OK 122904930 pick 69aa73357e6a rcu: Don't allow to change
rcu_normal_after_boot on RT pick 849ef8789077 pci/switchtec: fix
stream_open.cocci warnings pick ad8a5e8279c4 sched/core: Drop a
preempt_disable_rt() statement pick 966f066d96cb timers: Redo the
notification of canceling timers on -RT pick 0393fd5a4f9a Revert "futex: Ensure lock/unlock symetry versus pi_lock and hash bucket lock"
pick 84eb0b64a27a Revert "futex: Fix bug on when a requeued RT task times out"
pick fcc893280f4e Revert "rtmutex: Handle the various new futex race conditions"
pick 2eac93cf9d16 Revert "futex: workaround migrate_disable/enable in different context"
pick 9b8964629f4f futex: Make the futex_hash_bucket lock raw pick
cc1812bf198b futex: Delay deallocation of pi_state
pick f5e115c43100 mm/zswap: Do not disable preemption in
zswap_frontswap_store()
pick e0d0d09a08ad revert-aio
pick a0a40bfb4300 fs/aio: simple simple work pick 0fae581d8c5e
revert-thermal pick c0d95b4a8a1b thermal: Defer thermal wakups to
threads pick 700fbb4afb6e revert-block pick 4cda50ff12cf block:
blk-mq: move blk_queue_usage_counter_release() into process context
pick 9e982f55745b workqueue: rework pick c0db53dc3bf4 i2c: exynos5:
Remove IRQF_ONESHOT pick 1f160d170203 i2c: hix5hd2: Remove
IRQF_ONESHOT BAD 122882826 eae5a7cab722 sched/deadline: Ensure
inactive_timer runs in hardirq context
And something went seriously wrong after these tests. I submitted
same tree twice, and got different results.

First this -- de0-nano succeeds:

https://gitlab.com/cip-project/cip-kernel/linux-cip/pipelines/122904
930

Now this -- de0-nano fails (and ipc227e is unfinished for long time):

https://gitlab.com/cip-project/cip-kernel/linux-cip/pipelines/122959
477

I'll need some help here.
The logs read like the targets are not (always) coming up, e.g.
https://gitlab.com/cip-project/cip-kernel/linux-cip/-/jobs/457824214#
L377
Yes... I don't need that target, but I need de0-nano... and it did not
work last time I checked.
Bikram, could someone on your side check the board status in the Mentor lab? Thanks!


On a related note... it would be good to somehow show difference
between "kernel test failure" and "target failure".

If we see bootloader in the logs, and then test fails/timeouts =>
"kernel test failure", I need to solve it.

If we don't get messages from the bootloader => "target failure",
someone needs to check the power relays or something...
I'm not happy about the parsability of those LAVA logs either, but I have no idea if/how that can be improved best. Maybe Quirin has some idea based on his work with them.

Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux


Bhola, Bikram <Bikram_Bhola@...>
 

Hi Jan and All,

Both de0-nano and IPC227E targets are up and running. I have monitored test jobs on them, and those completed successfully.

Thank You!!

Regards,
Bikram

-----Original Message-----
From: Bhola, Bikram
Sent: 10 March 2020 22:20
To: 'Jan Kiszka' <jan.kiszka@siemens.com>; Pavel Machek <pavel@denx.de>; Quirin Gylstorff <quirin.gylstorff@siemens.com>
Cc: cip-dev@lists.cip-project.org
Subject: RE: I need de0-nano testing for -rt release was Re: [cip-dev] 4.19.106-cip21-rt8 problems on de0-nano


Pavel Machek
 

Hi!

Both de0-nano and IPC227E targets are up and running. I have monitored test jobs on them, and those completed successfully.

Thank You!!
There's still something broken with the testing. renesas_shmobile
initially failed (okay after restart); the rest failed:

https://gitlab.com/cip-project/cip-kernel/linux-cip/pipelines/126355890

Best regards,
Pavel


Pavel Machek
 

Hi!

Both de0-nano and IPC227E targets are up and running. I have monitored test jobs on them, and those completed successfully.

Thank You!!
There's still something broken with the testing. renesas_shmobile
initially failed (okay after restart); the rest failed:

https://gitlab.com/cip-project/cip-kernel/linux-cip/pipelines/126355890
https://lava.ciplatform.org/scheduler/job/12718

Going through the logs:

progress 90% (81MB)
progress 95% (86MB)
progress 100% (90MB)
90MB downloaded in 383.03s (0.24MB/s)
end: 1.3.1 http-download (duration 00:06:23) [common]
case: http-download
case_id: 403737
definition: lava
duration: 383.03
extra: ...
level: 1.3.1
namespace: common
result: pass
tftp-deploy timed out after 1283 seconds
end: 1.3 download-retry (duration 00:06:24) [common]

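To check whether a rootfs download fits inside a deploy timeout, the summary line can be pulled out of the LAVA log and turned into a throughput figure. A sketch in Python, assuming the exact line format quoted above (other LAVA versions may word it differently); the default 2x safety margin in `required_timeout_min` is an arbitrary illustrative choice, not a LAVA setting.

```python
import math
import re

# Matches the summary line quoted above, e.g.
# "90MB downloaded in 383.03s (0.24MB/s)". Format taken from this one log.
DOWNLOAD_RE = re.compile(r"(\d+(?:\.\d+)?)MB downloaded in (\d+(?:\.\d+)?)s")


def download_stats(log_text: str) -> tuple[float, float]:
    """Return (size_mb, seconds) from the first download summary line."""
    m = DOWNLOAD_RE.search(log_text)
    if m is None:
        raise ValueError("no download summary line found")
    return float(m.group(1)), float(m.group(2))


def required_timeout_min(size_mb: float, rate_mb_s: float, margin: float = 2.0) -> int:
    """Whole minutes needed to fetch size_mb at rate_mb_s, times a safety margin."""
    return math.ceil(size_mb / rate_mb_s * margin / 60)


log = "progress 100% (90MB)\n90MB downloaded in 383.03s (0.24MB/s)\n"
size, secs = download_stats(log)
rate = size / secs  # observed throughput in MB/s
print(f"{rate:.3f} MB/s, suggest >= {required_timeout_min(size, rate)} min")
```

In the log above the http-download step itself passed, yet tftp-deploy still timed out after 1283 seconds. Download throughput alone may therefore not explain every timeout.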
You are not trying to do tftp over WAN, are you?

Seeing the download speeds... would it make sense to do downloads with
rsync? Root filesystems (etc) are not changing too often, so that
should provide some speedups.
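The expected speedup from rsync comes from sending only the parts of an image that changed. The sketch below is a simplified fixed-block comparison that illustrates the idea. It is not rsync's actual algorithm, which pairs a rolling weak checksum with a strong hash so that it can also match data that moved to a different offset.

```python
import hashlib


def changed_blocks(old: bytes, new: bytes, block_size: int = 4096) -> list[int]:
    """Indices of fixed-size blocks in `new` that differ from `old`.

    Only these blocks would need to cross the network; unchanged blocks
    would be represented by their digests alone.
    """
    def digests(data: bytes) -> list[bytes]:
        return [hashlib.sha256(data[i:i + block_size]).digest()
                for i in range(0, len(data), block_size)]

    old_d, new_d = digests(old), digests(new)
    # A block is "changed" if it is new (past the end of old) or its digest differs.
    return [i for i, d in enumerate(new_d) if i >= len(old_d) or d != old_d[i]]
```

For an unchanged root filesystem the list is empty, so a repeat download costs little more than the digest exchange.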

Best regards,
Pavel


Bhola, Bikram <Bikram_Bhola@...>
 

Hi Pavel,

All these random failures are happening because of the network slowness.
Because of the COVID-19 situation, and in preparation for all employees working from home, our IT team was doing some experiments/setup that reserved some network bandwidth. That caused the network slowness and the timeouts.

The board is connected through a network cable. If root filesystems (etc.) are not changing too often, it totally makes sense to use rsync to pull only the changes, if any. We will give it a try.

I saw that all of the jobs started working fine again today for both the 127E and 227E boards. However, we may still see some random failures until the network settles, which IT expects to take another day or two.

Sorry for the trouble and thank you for being patient.

Regards,
Bikram

-----Original Message-----
From: Pavel Machek [mailto:pavel@denx.de]
Sent: 16 March 2020 17:29
To: Pavel Machek <pavel@denx.de>
Cc: Bhola, Bikram <Bikram_Bhola@mentor.com>; Jan Kiszka <jan.kiszka@siemens.com>; Quirin Gylstorff <quirin.gylstorff@siemens.com>; cip-dev@lists.cip-project.org
Subject: Re: I need de0-nano testing for -rt release was Re: [cip-dev] 4.19.106-cip21-rt8 problems on de0-nano