gitlab test failures -- job failed ... pods not found


Pavel Machek
 

Hi!

I have test failures all over the place:

-----------------
Creating test job
-----------------
Version: Image_ctj_zynqmp_defconfig_5.10.145-cip17-rt7_411cd76b5
Arch: arm64
Config: ctj_zynqmp_defconfig
Device: zynqmp-zcu102
Kernel: Image
DTB: zynqmp-zcu102-rev1.0.dtb
Modules: N/A
Test: smc
------------------
Uploading binaries
------------------
Uploading artifacts for failed job
Cleaning up project directory and file based variables
ERROR: Job failed (system failure): pods "runner-bwzp7ahx-project-2678032-concurrent-0pstk2" not found

Initially everything failed, so I resubmitted everything, then three
succeeded but no more luck with resubmitting.
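
The "pods ... not found" line in the log is reported as a system failure, which points at the runner infrastructure rather than at the kernel under test. As a minimal sketch (the function names and the resubmit policy are hypothetical and not part of the CIP tooling; the pattern is copied from the wording of this one log and may differ elsewhere), such failures could be told apart from real test failures like this:

```python
import re

# Pattern for the runner-side error seen in the job log. The wording is
# taken from the log in this thread; \s* also covers the case where the
# pod name has wrapped onto the next line.
SYSTEM_FAILURE_RE = re.compile(
    r'ERROR: Job failed \(system failure\):\s*pods\s*"(?P<pod>[^"]+)"\s*not found'
)


def find_missing_pod(log_text):
    """Return the pod name if the log contains a 'pods ... not found'
    system failure, otherwise None."""
    match = SYSTEM_FAILURE_RE.search(log_text)
    return match.group("pod") if match else None


def should_resubmit(log_text):
    """Hypothetical policy: resubmit only jobs that failed because of the
    runner infrastructure, not because a test actually failed."""
    return find_missing_pod(log_text) is not None


sample_log = (
    "Cleaning up project directory and file based variables\n"
    "ERROR: Job failed (system failure): pods\n"
    '"runner-bwzp7ahx-project-2678032-concurrent-0pstk2" not found\n'
)
print(find_missing_pod(sample_log))
```

A job whose log does not match would be left alone, so that genuine test failures are not hidden by blind resubmission.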

An example is
https://gitlab.com/cip-project/cip-kernel/linux-cip/-/jobs/3109431179

I'm trying to test the 5.10-rt release candidate:
https://gitlab.com/cip-project/cip-kernel/linux-cip/-/pipelines/654967302

Any ideas?

Best regards,
Pavel
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany


Chris Paterson
 

Hello Pavel,

From: Pavel Machek <pavel@...>
Sent: 30 September 2022 11:51

Thank you for reporting.
We recently upgraded the gitlab runner versions, but that was last week so it's a bit strange we're seeing issues now.
I've resubmitted a few of the jobs and as you say, some seem to work, others don't.

@Adler, Michael, could you take a look when you get a chance? Could this be caused by the recent upgrade?

Thanks, Chris





Chris Paterson
 

Hi Pavel,

From: cip-dev@... <cip-dev@...> On
Behalf Of Chris Paterson via lists.cip-project.org
Sent: 30 September 2022 12:37

FYI if you need some test results sooner, kernelci's test results on the same code are here:
https://linux.kernelci.org/test/job/cip-gitlab/branch/ci%2Fpavel%2Flinux-test/kernel/v5.10.145-cip17-422-g411cd76b5afe8/

The history of all builds/tests for your linux-test branch can be seen here:
https://linux.kernelci.org/job/cip-gitlab/branch/ci%2Fpavel%2Flinux-test/

Or results from all of our "CI" branches here:
https://linux.kernelci.org/job/cip-gitlab/


Kind regards, Chris




Michael Adler
 

Hi all,

> @Adler, Michael, could you take a look when you get a chance? Could this be
> caused by the recent upgrade?

Not sure. For now, I have downgraded the Gitlab runner to the previous version. Please try again and let me know how
that works out.

Kind Regards,
Michael

--
Michael Adler

Siemens AG
T CED SES-DE
Otto-Hahn-Ring 6
81739 München, Deutschland

Siemens Aktiengesellschaft: Vorsitzender des Aufsichtsrats: Jim Hagemann Snabe; Vorstand: Roland Busch, Vorsitzender; Klaus Helmrich, Cedrik Neike, Matthias Rebellius, Ralf P. Thomas, Judith Wiese; Sitz der Gesellschaft: Berlin und München, Deutschland; Registergericht: Berlin-Charlottenburg, HRB 12300, München, HRB 6684; WEEE-Reg.-Nr. DE 23691322


Pavel Machek
 

Hi!

> For now, I have downgraded the Gitlab runner to the previous version. Please try again and let me know how
> that works out.

Still there, I'd say:

https://gitlab.com/cip-project/cip-kernel/linux-cip/-/jobs/3123103311


upload: output/Image_renesas_defconfig_5.10.145-cip17-rt6_0d804ef4a/arm64/renesas_defconfig/dtb/rcar_du_of_lvds_r8a7796.dtb to s3://download2.cip-project.org/cip-testing/linux-cip/Image_renesas_defconfig_5.10.145-cip17-rt6_0d804ef4a/arm64/renesas_defconfig/dtb/rcar_du_of_lvds_r8a7796.dtb
Completed 10.3 MiB/~22.2 MiB (2.0 MiB/s) with ~1 file(s) remaining (calculating...)
Completed 15.2 MiB/~22.2 MiB (2.7 MiB/s) with ~1 file(s) remaining (calculating...)
Completed 22.2 MiB/~22.2 MiB (3.4 MiB/s) with ~1 file(s) remaining (calculating...)
upload: output/Image_renesas_defconfig_5.10.145-cip17-rt6_0d804ef4a/arm64/renesas_defconfig/kernel/Image to s3://download2.cip-project.org/cip-testing/linux-cip/Image_renesas_defconfig_5.10.145-cip17-rt6_0d804ef4a/arm64/renesas_defconfig/kernel/Image
Completed 22.2 MiB/~22.2 MiB (3.4 MiB/s) with ~0 file(s) remaining (calculating...)
Uploading artifacts for failed job
Cleaning up project directory and file based variables
ERROR: Job failed (system failure): pods "runner-kmcmucmn-project-2678032-concurrent-4jcrtw" not found

https://gitlab.com/cip-project/cip-kernel/linux-cip/-/pipelines/657708852

Best regards,
Pavel


Michael Adler
 

> Still there, I'd say:
>
> https://gitlab.com/cip-project/cip-kernel/linux-cip/-/jobs/3123103311

Ok, I have restored our previous cluster (v9). Please try again and tell me if the problem still persists. If it does,
then I'd blame Gitlab I/O; otherwise it could be a regression somewhere in the K8s stack, in which case our best bet
might be to sit it out.

Kind Regards,
Michael



Chris Paterson
 

Hi Michael,

From: Michael Adler <michael.adler@...>
Sent: 07 October 2022 07:58

> > Still there, I'd say:
> >
> > https://gitlab.com/cip-project/cip-kernel/linux-cip/-/jobs/3123103311
>
> Ok, I have restored our previous cluster (v9). Please try again and tell me if
> the problem still persists. If it does,

I haven't seen the issue Pavel reported yet, but it'll take time to be sure.

The issue I was having building docker containers for linux-cip-ci _is_ still present though :(
The jobs get stuck on "Cleaning up project directory and file based variables" for some reason.
e.g. https://gitlab.com/cip-project/cip-testing/linux-cip-ci/-/jobs/3140519507

From some debugging, it looks like this only happens when we use a kaniko container.
I've tried a handful of versions in case kaniko has broken something recently.
We use kaniko for building the docker containers because dind builds using docker weren't working in our setup.

> then I'd blame Gitlab I/O otherwise it could be a regression somewhere in
> the K8s stack - in that case, our best bet
> might be to sit it out.

We need to find some sort of solution for building our CI build containers.

Kind regards, Chris




Michael Adler
 

> We need to find some sort of solution for building our CI build containers.

Why not use the shared runners provided by Gitlab? Apparently, there's even a dind-capable runner (tag gitlab-org-docker).
If the Gitlab runners work, the next step would be to make sure we are using the same version as Gitlab.
Since we're back to using the exact same versions as in the previous cluster (which did work for us!), my guess is that
the culprit is outside of our cluster, e.g. Gitlab or (less likely) kaniko.

Kind Regards,
Michael



Chris Paterson
 

Hello,

From: Michael Adler <michael.adler@...>
Sent: 07 October 2022 10:27

> > We need to find some sort of solution for building our CI build containers.
>
> Why not use the shared runners provided by Gitlab? Apparently, there's
> even a dind capable runner (tag gitlab-org-docker).

Yea I was actually just looking into that :)

> If the Gitlab runners work, the next step would be to make sure we are using
> the same version as Gitlab.
> Since we're back to using the exact same versions as in the previous cluster
> (which did work for us!), my guess is that
> the culprit is outside of our cluster, e.g. Gitlab or (less likely) kaniko.

Agreed.

Thanks, Chris




Chris Paterson
 

From: cip-dev@... <cip-dev@...> On
Behalf Of Chris Paterson via lists.cip-project.org
Sent: 07 October 2022 10:39

Hello,

From: Michael Adler <michael.adler@...>
Sent: 07 October 2022 10:27

> > > We need to find some sort of solution for building our CI build containers.
> >
> > Why not use the shared runners provided by Gitlab? Apparently, there's
> > even a dind capable runner (tag gitlab-org-docker).
>
> Yea I was actually just looking into that :)

And indeed it seems to work fine:
https://gitlab.com/cip-project/cip-testing/linux-cip-ci/-/jobs/3141334688

I haven't tested a complete pipeline yet, but alas I need to go offline for most of the day now.
I'll look again this evening.


> > If the Gitlab runners work, the next step would be to make sure we are
> > using the same version as Gitlab.

Working version from above is gitlab-runner 15.4.0~beta.5.gdefc7017 (defc7017).

Kind regards, Chris



Pavel Machek
 

Hi!

> Ok, I have restored our previous cluster (v9). Please try again and tell me if the problem still persists. If it does,
> then I'd blame Gitlab I/O otherwise it could be a regression somewhere in the K8s stack - in that case, our best bet
> might be to sit it out.

It may be a bit better.

https://gitlab.com/cip-project/cip-kernel/linux-cip/-/pipelines/660868595

I still see a bbb failure, but that's a different one:

https://lava.ciplatform.org/scheduler/job/756260

Thanks,
Pavel


Michael Adler
 

> And indeed it seems to work fine:
> https://gitlab.com/cip-project/cip-testing/linux-cip-ci/-/jobs/3141334688

Good! Using shared runners saves us money as well. Is the performance good enough? If so, I'd suggest using
them permanently.

> Working version from above is gitlab-runner 15.4.0~beta.5.gdefc7017 (defc7017).

Hmm, that's a bit older than the latest released version (tag v15.4.0). I wonder why they haven't upgraded from beta 5 to the
final version? Seems kinda odd, I would have guessed that Gitlab does CI/CD. Anyhow, I have upgraded to the officially
released version (and will purge the 15.2.0 runners once they are idle). Could you give this another try, Chris?
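
To make the version comparison concrete, here is a rough sketch of how the three runner versions mentioned in this thread could be ordered. It assumes that a "~" suffix marks a pre-release that sorts before the plain release (Debian-style ordering); the function name is hypothetical and only the "beta.N" form seen here is handled.

```python
import re


def parse_runner_version(version):
    """Turn a string such as '15.4.0~beta.5.gdefc7017' into a sortable key.

    Assumption: anything after '~' is a pre-release and therefore sorts
    before the final release with the same version number.
    """
    release, _, suffix = version.lstrip("v").partition("~")
    numbers = tuple(int(part) for part in release.split("."))
    if not suffix:
        # Final release: sorts after any pre-release of the same number.
        return numbers + (1, 0)
    match = re.search(r"beta\.(\d+)", suffix)
    beta = int(match.group(1)) if match else 0
    return numbers + (0, beta)


versions = ["v15.4.0", "15.2.0", "15.4.0~beta.5.gdefc7017"]
# Prints the versions from oldest to newest under the assumption above.
print(sorted(versions, key=parse_runner_version))
```

Under that assumption the shared runners' beta sits between the 15.2.0 runners being purged and the v15.4.0 release.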

Kind Regards,
Michael
