Failing -stable-rc tests


Pavel Machek
 

Hi!

This is one of the common failures:

https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/jobs/1364210937

The other one was not seen today.

Uploading binaries
------------------
upload: output/bzImage_siemens_ipc227e_defconfig_5.10.46-rc1_c00b84692/x86/siemens_ipc227e_defconfig/config/.config to s3://download.cip-project.org/ciptesting/ci/bzImage_siemens_ipc227e_defconfig_5.10.46-rc1_c00b84692/x86/siemens_ipc227e_defconfig/config/.config
upload: output/bzImage_siemens_ipc227e_defconfig_5.10.46-rc1_c00b84692/x86/siemens_ipc227e_defconfig/kernel/bzImage to s3://download.cip-project.org/ciptesting/ci/bzImage_siemens_ipc227e_defconfig_5.10.46-rc1_c00b84692/x86/siemens_ipc227e_defconfig/kernel/bzImage
upload: output/bzImage_siemens_ipc227e_defconfig_5.10.46-rc1_c00b84692/x86/siemens_ipc227e_defconfig/modules/modules.tar.gz to s3://download.cip-project.org/ciptesting/ci/bzImage_siemens_ipc227e_defconfig_5.10.46-rc1_c00b84692/x86/siemens_ipc227e_defconfig/modules/modules.tar.gz
fatal error: Could not connect to the endpoint URL: "https://s3.us-west-2.amazonaws.com/download.cip-project.org?list-type=2&prefix=ciptesting%2Fci%2F&continuation-token=1B7XIJq02rzWljriOBePCr8doMhusLUS2jft0k2cLqJDmAxWeWFzs3V30zhXBtgfV87AvBf1P519d7AYS%2Bp7qt3yEDm7KapIDVvcmIAG8WJCA7F3O0u488Gs0%2F7oSTol6DRSt%2Btj3%2FJ4pXB7ZEDL47m4GSSNa0YzUJuMi%2B%2F5MyuaYuejHVTPV5mLw0GBv5oSzxDtIktUYrKcshmrP2CJh5A%3D%3D&encoding-type=url"

Uploading artifacts for failed job
Uploading artifacts...
output: found 11 matching files and directories
ERROR: Uploading artifacts as "archive" to coordinator... error error=couldn't execute POST against https://gitlab.com/api/v4/jobs/1364210937/artifacts?artifact_format=zip&artifact_type=archive&expire_in=1+month: Post https://gitlab.com/api/v4/jobs/1364210937/artifacts?artifact_format=zip&artifact_type=archive&expire_in=1+month: dial tcp: i/o timeout id=1364210937 token=d4_rH_yF

Best regards,
Pavel

--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany


Chris Paterson
 

Hi Pavel,

> From: Pavel Machek <pavel@denx.de>
> Sent: 21 June 2021 20:36
>
> Hi!
>
> This is one of the common failures:
>
> https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/jobs/1364210937

Thanks for sharing the details.
We'll take a look.

Kind regards, Chris




Pavel Machek
 

Hi!

> > From: Pavel Machek <pavel@denx.de>
> > Sent: 21 June 2021 20:36
> >
> > Hi!
> >
> > This is one of the common failures:
> >
> > https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/jobs/1364210937
>
> Thanks for sharing the details.
> We'll take a look.

I re-tried, and now I got this:

https://lava.ciplatform.org/scheduler/job/302528

I'll retry once more, I believe it will go away.

Best regards,
Pavel



Chris Paterson
 

Hello Pavel,

> From: Pavel Machek <pavel@denx.de>
> Sent: 21 June 2021 22:51
>
> Hi!
>
> > > From: Pavel Machek <pavel@denx.de>
> > > Sent: 21 June 2021 20:36
> > >
> > > Hi!
> > >
> > > This is one of the common failures:
> > >
> > > https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/jobs/1364210937
> >
> > Thanks for sharing the details.
> > We'll take a look.
>
> I re-tried, and now I got this:
>
> https://lava.ciplatform.org/scheduler/job/302528

This is something we sometimes see in LAVA. It's not 100% reliable at getting the serial connection to the board up and running :(

> I'll retry once more, I believe it will go away.

Yea. It's annoying though.
Same test, same board: https://lava.ciplatform.org/scheduler/job/302551

Kind regards, Chris




Michael Adler
 

Hi Pavel,

> This is one of the common failures:
>
> https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/jobs/1364210937

I suspect this is a network issue on the runner node:

- upload to AWS S3 failed
- upload to the Gitlab storage failed too

I'm wondering if this is just a temporary issue (which goes away after waiting for a couple of minutes) or if the
network is permanently down on our (ephemeral) runner node.

I suggest adding some retry error handling logic to `/opt/submit_tests.sh`: if the upload to AWS S3 fails, perform some
network health checks, in particular ping/curl some other site like google.com, and try the upload again for a few minutes.
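
Something along these lines might work as a starting point. This is only a rough sketch, assuming POSIX sh and that curl is available on the runner; the function names `retry_upload` and `network_ok`, the health-check URL, and the attempt count and delay are all made up for illustration and are not taken from the real `/opt/submit_tests.sh`:

```shell
#!/bin/sh
# Hypothetical helpers; none of these names come from the real submit_tests.sh.

# Network health check: succeeds if an unrelated site answers within 10 s.
network_ok() {
    curl --silent --head --max-time 10 --output /dev/null https://www.google.com
}

# usage: retry_upload MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry_upload() {
    _max=$1
    _delay=$2
    shift 2
    _try=1
    while ! "$@"; do
        if [ "$_try" -ge "$_max" ]; then
            echo "giving up after $_try attempts: $*" >&2
            return 1
        fi
        # Log whether the network as a whole looks down, to help diagnosis.
        if network_ok; then
            echo "attempt $_try failed, network looks up; retrying in ${_delay}s" >&2
        else
            echo "attempt $_try failed, network looks down; retrying in ${_delay}s" >&2
        fi
        sleep "$_delay"
        _try=$((_try + 1))
    done
    return 0
}
```

A call such as `retry_upload 10 30 aws s3 sync output/ s3://download.cip-project.org/ciptesting/ci/` would then keep trying for a few minutes before giving up. The upload command the script really uses is not shown in this thread, so that invocation is a guess.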

I have also upgraded our runners to v14 (although I don't think that will help in this case).

Kind Regards,
Michael

--
Michael Adler

Siemens AG
T RDA IOT SES-DE
Otto-Hahn-Ring 6
81739 München, Deutschland

Siemens Aktiengesellschaft: Vorsitzender des Aufsichtsrats: Jim Hagemann Snabe; Vorstand: Roland Busch, Vorsitzender; Klaus Helmrich, Cedrik Neike, Matthias Rebellius, Ralf P. Thomas, Judith Wiese; Sitz der Gesellschaft: Berlin und München, Deutschland; Registergericht: Berlin-Charlottenburg, HRB 12300, München, HRB 6684; WEEE-Reg.-Nr. DE 23691322