KernelCI, gitlab testing notes


Pavel Machek
 

Hi!

* 4.4 kernelci warnings

https://linux.kernelci.org/build/cip/branch/linux-4.4.y-cip/kernel/v4.4.302-cip70-98-g7f7838c92740f/

Thanks for the pointer. This looks good. Most of the warnings are
"net/ipv4/inet_hashtables.c:608:68: warning: suggest parentheses
around ‘+’ in operand of ‘&’ [-Wparentheses]", which is my fault and on
my TODO list.

* SMC QEMU x86-64

https://storage.kernelci.org/cip/linux-4.4.y-cip/v4.4.302-cip70-98-g7f7838c92740f/x86_64/x86_64_defconfig/gcc-10/lab-collabora/smc-qemu_x86_64.html

Ok, I'll need to know more about the config. Is it possible that qemu
runs paravirtualized -- KVM?

If yes, we are basically testing whatever hardware it happens to run
on. Not good.

If no... the "soft" cpu it runs on does not have those bugs.

* Understanding gitlab results

https://gitlab.com/cip-project/cip-kernel/linux-cip/-/jobs/3184699360

Ok, so what should I be looking at?

-----------------------------------
All submitted tests were successful
-----------------------------------
------------------------------
Job Summary
------------------------------
Job #763143 Finished. Job health: Complete. URL: https://lava.ciplatform.org/scheduler/job/763143
Job #763147 Finished. Job health: Complete. URL: https://lava.ciplatform.org/scheduler/job/763147

I have seen "Job health: incomplete", and that indicated problems. Here I
see "All submitted tests were successful". I guess that's ok.

https://gitlab.com/cip-project/cip-kernel/linux-cip/-/jobs/3141003858

Now that's pretty evil:

    * 0_spectre-meltdown-checker-test.CVE-2018-12126 [fail]
266 * 0_spectre-meltdown-checker-test.CVE-2018-3646 [pass]
...
288 -----------------------------------
289 All submitted tests were successful
290 -----------------------------------

First, make it stand out visually: [pass] => [ok] and [fail] =>
[FAILURE], or something like that.

Second, saying all tests were successful (line 289) when there's a
failure is ... confusing.
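
As a rough illustration of both points, the sketch below parses test-case lines in the format shown in the job log above and prints a summary that only claims success when no case failed. The function names, the relabelling to [ok]/[FAILURE] and the summary wording are assumptions made for illustration; this is not what linux-cip-ci actually does.

```python
import re

# Matches lines such as
#   "* 0_spectre-meltdown-checker-test.CVE-2018-12126 [fail]"
# optionally preceded by a log-viewer line number ("266* ...").
CASE_RE = re.compile(r"^\d*\s*\*\s+(?P<name>\S+)\s+\[(?P<result>\w+)\]\s*$")

# Hypothetical relabelling so that failures stand out visually.
LABELS = {"pass": "[ok]", "fail": "[FAILURE]"}


def parse_cases(log_lines):
    """Return (name, result) pairs for every test-case line found."""
    cases = []
    for line in log_lines:
        match = CASE_RE.match(line.strip())
        if match:
            cases.append((match.group("name"), match.group("result").lower()))
    return cases


def summarize(cases):
    """Build a report that never says 'passed' when a case failed."""
    lines = [
        f"* {name} {LABELS.get(result, '[' + result + ']')}"
        for name, result in cases
    ]
    failed = [name for name, result in cases if result == "fail"]
    if failed:
        lines.append(f"{len(failed)} of {len(cases)} test cases FAILED")
    else:
        lines.append(f"All {len(cases)} test cases passed")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        "* 0_spectre-meltdown-checker-test.CVE-2018-12126 [fail]",
        "266* 0_spectre-meltdown-checker-test.CVE-2018-3646 [pass]",
    ]
    print(summarize(parse_cases(sample)))
```

The summary line is derived from the parsed test cases rather than from job completion, which is what keeps it from contradicting a [fail] entry a few lines earlier.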

* LTP failures

I'm not sure where we run this or how. Anyway.

https://lava.ciplatform.org/scheduler/job/763461

I picked one failure at random, and it's a config failure, not a
kernel failure: utimensat01 1 TBROK: can't read /etc/sudoers

Not sure what is going on with quotactl01. Do we have quotas enabled?
syslog01 and friends are also failing. Is syslog configured correctly?

I guess the best way would be to run ltp on 4.4-mainline, 4.4.302 and
4.4-cip, and compare the results.
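
A minimal sketch of such a comparison, assuming the LTP results for each kernel have already been collected into a mapping from test name to result string (how they get collected from LAVA is not covered here). The function name, the result strings and the sample data are hypothetical placeholders, not real results.

```python
def compare_ltp(results_by_kernel, baseline):
    """Split each kernel's failures into shared-with-baseline and new ones.

    results_by_kernel maps a kernel label to {test_name: result}, where
    result is assumed to be a string such as "pass" or "fail".
    """
    base_failed = {
        test for test, result in results_by_kernel[baseline].items()
        if result == "fail"
    }
    report = {}
    for kernel, results in results_by_kernel.items():
        if kernel == baseline:
            continue
        failed = {test for test, result in results.items() if result == "fail"}
        report[kernel] = {
            # Fails everywhere: likely a test environment/config problem.
            "also_fails_on_baseline": sorted(failed & base_failed),
            # Fails only here: worth a closer look at this kernel.
            "new_failures": sorted(failed - base_failed),
        }
    return report


if __name__ == "__main__":
    # Invented placeholder data, purely to exercise the function.
    data = {
        "4.4-mainline": {"case_a": "fail", "case_b": "pass"},
        "4.4.302": {"case_a": "fail", "case_b": "pass"},
        "4.4-cip": {"case_a": "fail", "case_b": "fail"},
    }
    for kernel, diff in compare_ltp(data, "4.4-mainline").items():
        print(kernel, diff)
```

Tests that fail on every kernel, including the baseline, point at the test setup (like the /etc/sudoers case above) rather than at a regression in a particular tree.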

Best regards,
Pavel
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany


Florian Bezdeka
 

Hi all,

On Thu, 2022-10-20 at 18:08 +0200, Pavel Machek via lists.cip-project.org wrote:
> Hi!
>
> * 4.4 kernelci warnings
>
> https://linux.kernelci.org/build/cip/branch/linux-4.4.y-cip/kernel/v4.4.302-cip70-98-g7f7838c92740f/
>
> Thanks for the pointer. This looks good. Most of the warnings are
> "net/ipv4/inet_hashtables.c:608:68: warning: suggest parentheses
> around ‘+’ in operand of ‘&’ [-Wparentheses]", which is my fault and on
> my TODO list.

Great, so the filtering seems to work as expected.


> * SMC QEMU x86-64
>
> https://storage.kernelci.org/cip/linux-4.4.y-cip/v4.4.302-cip70-98-g7f7838c92740f/x86_64/x86_64_defconfig/gcc-10/lab-collabora/smc-qemu_x86_64.html
>
> Ok, I'll need to know more about the config. Is it possible that qemu
> runs paravirtualized -- KVM?

According to line 174 of the job log referenced above, /dev/kvm is
mounted into the container running what they seem to call the "qemu
emulator", and -enable-kvm is given.

Depending on the concrete hardware this test is scheduled on, it might
deliver different results. That seems to be a general issue. I can try
to address that.
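
One way to flag such jobs automatically would be to check the QEMU command line for KVM acceleration. The sketch below is only an illustration: the function name and the sample command lines are made up, and the actual command line from the KernelCI job is not reproduced here. It looks for the QEMU options -enable-kvm, -accel kvm and -machine ...,accel=kvm.

```python
import shlex


def uses_kvm(qemu_cmdline):
    """Return True if a QEMU command line requests KVM acceleration."""
    args = shlex.split(qemu_cmdline)
    for i, arg in enumerate(args):
        if arg == "-enable-kvm":
            return True
        has_value = i + 1 < len(args)
        # "-accel kvm" or "-accel kvm,<properties>"
        if arg == "-accel" and has_value:
            if args[i + 1].split(",")[0] == "kvm":
                return True
        # "-machine q35,accel=kvm" (also spelled "-M")
        if arg in ("-machine", "-M") and has_value:
            if "accel=kvm" in args[i + 1].split(","):
                return True
    return False


if __name__ == "__main__":
    # Made-up command lines for illustration only.
    print(uses_kvm("qemu-system-x86_64 -enable-kvm -m 512"))
    print(uses_kvm("qemu-system-x86_64 -accel tcg -m 512"))
```

A job for which this returns True is effectively testing the host CPU, so its spectre-meltdown-checker results would need to be recorded together with the host hardware to be comparable.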

Is the 4.4 series missing the backport of the mitigation for this CVE,
or is that really a "can be fixed by microcode only" thing? I don't
want to silence a real issue. My question is: if we ran a newer
kernel, would we still be affected on the same (virtual) hardware?


> If yes, we are basically testing whatever hardware it happens to run
> on. Not good.
>
> If no... the "soft" cpu it runs on does not have those bugs.
>
> * Understanding gitlab results

[snip, I don't know the gitlab infrastructure]


Chris Paterson
 

Hello Pavel,

> From: Pavel Machek <pavel@...>
> Sent: 20 October 2022 17:09
>
> Hi!

[...]

> * Understanding gitlab results
>
> https://gitlab.com/cip-project/cip-kernel/linux-cip/-/jobs/3184699360
>
> Ok, so what should I be looking at?
>
> -----------------------------------
> All submitted tests were successful
> -----------------------------------
> ------------------------------
> Job Summary
> ------------------------------
> Job #763143 Finished. Job health: Complete. URL: https://lava.ciplatform.org/scheduler/job/763143
> Job #763147 Finished. Job health: Complete. URL: https://lava.ciplatform.org/scheduler/job/763147
>
> I have seen "Job health: incomplete", and that indicated problems. Here I
> see "All submitted tests were successful". I guess that's ok.
>
> https://gitlab.com/cip-project/cip-kernel/linux-cip/-/jobs/3141003858
>
> Now that's pretty evil:
>
>     * 0_spectre-meltdown-checker-test.CVE-2018-12126 [fail]
> 266 * 0_spectre-meltdown-checker-test.CVE-2018-3646 [pass]
> ...
> 288 -----------------------------------
> 289 All submitted tests were successful
> 290 -----------------------------------

Yes, this isn't clear.
What it should say is "All submitted test jobs completed".
Test job = LAVA test job that can contain many test cases.


> First, make it stand out visually: [pass] => [ok] and [fail] =>
> [FAILURE], or something like that.
>
> Second, saying all tests were successful (line 289) when there's a
> failure is ... confusing.

Agreed. I've updated the wording in this commit [0], part of a dev branch [1] I'm working on.

[0] https://gitlab.com/cip-project/cip-testing/linux-cip-ci/-/commit/f273d55b1c78d80f2e44f10ca22334e64eeb4613
[1] https://gitlab.com/cip-project/cip-testing/linux-cip-ci/-/commits/patersonc/check-test-results

Can you look at the new output in the jobs in this pipeline [2]?
At the moment I've set the whole GitLab CI job to fail if any test case fails.
However, I really don't like this approach, as it means that merge requests in GitLab can never be merged if there are test cases that always fail, since all jobs need to pass.
E.g. SMC tests where there isn't a mitigation for a particular CVE [3].

[2] https://gitlab.com/cip-project/cip-testing/linux-cip-ci/-/pipelines/676545968
[3] https://gitlab.com/cip-project/cip-kernel/linux-cip/-/jobs/3141003858#L262

To avoid this issue, we could create a separate job that runs at the end of all testing and goes through the results to check for failures.
You'd then only have to look at one GitLab CI job to get the full report.
I'd still need this job to return a green tick if the job itself runs successfully, even if there are failed test cases.

Does this approach sound okay?
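
To make the proposal concrete, here is a minimal sketch of what such a final report job could compute. Everything in it is a hypothetical illustration (the TestJob class, the field names, the exit-code policy and the sample data), not the linux-cip-ci implementation: it lists failed test cases per LAVA test job, but only returns a non-zero exit code when a test job did not complete.

```python
from dataclasses import dataclass, field


@dataclass
class TestJob:
    """One LAVA test job, which can contain many test cases."""
    job_id: int
    health: str  # assumed values: "Complete" or "Incomplete"
    cases: dict = field(default_factory=dict)  # test case name -> "pass"/"fail"


def final_report(jobs):
    """Return (report_text, exit_code) for a list of test jobs.

    Failed test cases are reported but do not change the exit code;
    only a job whose health is not "Complete" makes the exit code 1.
    """
    lines = []
    for job in jobs:
        failed = sorted(name for name, res in job.cases.items() if res == "fail")
        lines.append(
            f"Job #{job.job_id}: health {job.health}, "
            f"{len(failed)} of {len(job.cases)} test cases failed"
        )
        lines.extend(f"  FAILED: {name}" for name in failed)
    incomplete = [job for job in jobs if job.health != "Complete"]
    return "\n".join(lines), (1 if incomplete else 0)


if __name__ == "__main__":
    # Placeholder jobs and results, invented for illustration.
    jobs = [
        TestJob(1, "Complete", {"case_a": "pass", "case_b": "fail"}),
        TestJob(2, "Complete", {"case_c": "pass"}),
    ]
    text, code = final_report(jobs)
    print(text)
    raise SystemExit(code)
```

With this split, merge requests are blocked only by infrastructure problems, while known-failing test cases stay visible in a single report.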

Of course, perhaps we should just switch over to KernelCI fully and make use of the features they already support, such as test regression detection.

Kind regards, Chris

