-stable-rc tests failing


Pavel Machek
 

Hi!

-stable-rc testing seems to be failing more than usual.

This is the "usual" failure:

https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/jobs/1493077012

WARNING: Retrying... error=invalid argument
ERROR: Downloading artifacts from coordinator... error couldn't execute GET against https://gitlab.com/api/v4/jobs/1492901612/artifacts?: Get https://gitlab.com/api/v4/jobs/1492901612/artifacts?: dial tcp: i/o timeout id=1492901612 token=9gNGsuaZ
FATAL: invalid argument
Uploading artifacts for failed job

But I don't usually see this one:

https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/jobs/1492901660

Job #371616: Submitted
Health: Unknown
Device Type: r8a7743-iwg20d-q7
Device: None
Test: boot
URL: https://lava.ciplatform.org/scheduler/job/371616

ERROR: Job failed: execution took longer than 2h0m0s seconds

Links where all the failures can be seen are:

https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/tree/linux-4.4.y
https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/tree/linux-4.19.y
https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/tree/linux-5.10.y

I'll try hitting some resubmit buttons, but help would be welcome.

Best regards,
Pavel
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany


Chris Paterson
 

Hello Pavel,

> From: Pavel Machek <pavel@...>
> Sent: 11 August 2021 07:40
>
> [...]
>
> But I don't usually see this one:
>
> https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/jobs/1492901660
>
> Job #371616: Submitted
> Health: Unknown
> Device Type: r8a7743-iwg20d-q7
> Device: None
> Test: boot
> URL: https://lava.ciplatform.org/scheduler/job/371616
>
> ERROR: Job failed: execution took longer than 2h0m0s seconds

It looks like the LAVA server has stopped processing jobs. Our queue has become very large!
I'll have to see what died and no doubt reboot it and the labs.

Sorry for the pain.

Kind regards, Chris




Pavel Machek
 

Hi!

> > [...]
> > ERROR: Job failed: execution took longer than 2h0m0s seconds
>
> It looks like the LAVA server has stopped processing jobs. Our queue has become very large!
> I'll have to see what died and no doubt reboot it and the labs.
>
> Sorry for the pain.

Thank you, it seems to be better now, and it looks like ctj_zynqmp is
available, too (I have not yet seen a successful test, but it at least
tries).

But I still see problems with x86_qemu... the "2h timeout".

Best regards,
Pavel
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany


Nobuhiro Iwamatsu
 

Hi,

-----Original Message-----
From: cip-dev@... [mailto:cip-dev@...] On Behalf Of Pavel Machek
Sent: Saturday, August 14, 2021 4:47 AM
To: Chris Paterson <Chris.Paterson2@...>
Cc: Pavel Machek <pavel@...>; cip-dev@...
Subject: Re: [cip-dev] -stable-rc tests failing

> [...]
> Thank you, it seems to be better now, and it looks like ctj_zynqmp is
> available, too (I have not yet seen a successful test, but it at least
> tries).
>
> But I still see problems with x86_qemu... the "2h timeout".

QEMU's queue is full. QEMU is not working in any of the labs, and it seems
that the queue cannot be processed correctly.

Best regards,
Nobuhiro


Chris Paterson
 

Hello,

From: cip-dev@... <cip-dev@...> On
Behalf Of Nobuhiro Iwamatsu via lists.cip-project.org
Sent: 16 August 2021 00:15

> [...]
>
> > But I still see problems with x86_qemu... the "2h timeout".
>
> QEMU's queue is full. QEMU is not working in any of the labs, and it seems
> that the queue cannot be processed correctly.

All of the QEMU machines have somehow got stuck in the "reserved" state.
I'm trying to fix it. Will update you asap.

Chris




Chris Paterson
 

Hello,

From: cip-dev@... <cip-dev@...> On
Behalf Of Chris Paterson via lists.cip-project.org
Sent: 16 August 2021 21:39

> Hello,
> [...]
>
> > > But I still see problems with x86_qemu... the "2h timeout".
> >
> > QEMU's queue is full. QEMU is not working in any of the labs, and it seems
> > that the queue cannot be processed correctly.
>
> All of the QEMU machines have somehow got stuck in the "reserved" state.
> I'm trying to fix it. Will update you asap.

I've got these devices un-stuck now by editing their state in the database directly.
However, in the process of debugging I rebooted the LAVA server instance, so we'll now need to reboot the various labs, which is dependent on the local admins.
Hopefully normal service will be resumed soon.

Kind regards, Chris




Nobuhiro Iwamatsu
 

Hi,

-----Original Message-----
From: cip-dev@... [mailto:cip-dev@...] On Behalf Of Chris Paterson
Sent: Tuesday, August 17, 2021 5:58 AM
To: cip-dev@...
Subject: Re: [cip-dev] -stable-rc tests failing

> [...]
>
> I've got these devices un-stuck now by editing their state in the database directly.
> However, in the process of debugging I rebooted the LAVA server instance, so we'll now need to reboot the various labs,
> which is dependent on the local admins.
> Hopefully normal service will be resumed soon.

Thanks for your work!


Best regards,
Nobuhiro