ldconfig segfault on RZ/Five was Re: Preparing isar-cip-core for RZ/Five
Pavel Machek
Hi!
But I do have slightly different results than you (I think; I'm far
from a RISC-V expert). I set a breakpoint:
Breakpoint 1, 0x00000000000385d4 in ?? ()
(gdb)
Dump of assembler code from 0x385d4 to 0x385f4:
=> 0x00000000000385d4: lb zero,81(t1)
0x00000000000385d8: andi a1,a1,25
0x00000000000385da: sd zero,24(sp)
0x00000000000385dc: sd zero,32(sp)
If I do the stepi, it will give the illegal instruction, because,
well, we are in the middle of the auipc instruction:
(gdb) disassemble $pc-0x10,+0x20
Dump of assembler code from 0x385c4 to 0x385e4:
0x00000000000385c4: .4byte 0x4881f753
0x00000000000385c8: li a6,0
0x00000000000385ca: li a5,0
0x00000000000385cc: addi a3,a1,920
0x00000000000385d0: mv a2,s8
0x00000000000385d2: auipc a0,0x3f
0x00000000000385d6: addi a0,a0,-1890 # 0x76e70
0x00000000000385da: sd zero,24(sp)
0x00000000000385dc: sd zero,32(sp)
0x00000000000385de: sb t3,20(sp)
0x00000000000385e2: sd s7,40(sp)
End of assembler dump.
(gdb)
Weird. But it explains sigill when executing auipc does not result in
segfault...
Best regards,
Pavel
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
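The mid-instruction decode described above can be checked by hand. The sketch below encodes the `auipc a0,0x3f` / `addi a0,a0,-1890` pair from the listing at 0x385d2, then decodes the 32-bit word that starts two bytes later, at 0x385d4. The helper names are made up for this illustration, and only the instruction fields needed here are handled.

```python
import struct

A0, T1 = 10, 6  # ABI names a0 and t1 are registers x10 and x6


def encode_auipc(rd, imm20):
    # U-type layout: imm[31:12] | rd | opcode 0x17 (AUIPC)
    return ((imm20 & 0xFFFFF) << 12) | (rd << 7) | 0x17


def encode_addi(rd, rs1, imm12):
    # I-type layout: imm[11:0] | rs1 | funct3=0 | rd | opcode 0x13 (OP-IMM)
    return ((imm12 & 0xFFF) << 20) | (rs1 << 15) | (rd << 7) | 0x13


def decode_itype(word):
    # Split a 32-bit word into I-type fields, sign-extending the immediate.
    imm = word >> 20
    if imm & 0x800:
        imm -= 0x1000
    return (word & 0x7F, (word >> 7) & 0x1F, (word >> 12) & 0x7,
            (word >> 15) & 0x1F, imm)


# Little-endian bytes as they would sit in memory starting at 0x385d2.
code = struct.pack("<II", encode_auipc(A0, 0x3F), encode_addi(A0, A0, -1890))

# Fetch a 32-bit word two bytes in, i.e. at 0x385d4.
word = struct.unpack_from("<I", code, 2)[0]
opcode, rd, funct3, rs1, imm = decode_itype(word)

# opcode 0x03 is LOAD and funct3 0 is LB, so this reads as lb x0,81(x6).
print(hex(word), hex(opcode), rd, funct3, rs1, imm)
```

The decoded fields (LOAD opcode, rd = x0, rs1 = x6, immediate 81) correspond to the `lb zero,81(t1)` that gdb prints at 0x385d4, which is consistent with the program counter pointing into the middle of the auipc.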
Yeah, ldconfig is needed for installation. But I get a segfaulting gcc
binary.
I tried, but installation fails - illegal instruction.
Hmm, seems the issue persists: :-(. Do you get gcc faulting, too?
No idea.
No idea - but why should ldconfig be self-modifying?
root@demo:~# ldconfig
...
[ 297.146728] ldconfig[497]: unhandled signal 4 code 0x1 at 0x00000000000380c8 in ldconfig[10000+83000]
(gdb) disassemble $pc,+0x10
Dump of assembler code from 0x380c8 to 0x380d8:
=> 0x00000000000380c8: auipc a2,0x66
0x00000000000380cc: addi a2,a2,2000 # 0x9e898
0x00000000000380d0: sd a0,0(a2)
auipc is something rather simple. a2 = pc + 0x66 << something. Not
sure how it could fault. Plus we get "illegal instruction", suggesting
it is not some other fault.
Could some kind of self-modifying code be involved? I guess some kind
of debugging/watchpoint is not probable.
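For reference, the "a2 = pc + 0x66 << something" arithmetic can be written out. This is a minimal sketch of the usual auipc/addi address computation (auipc adds the 20-bit immediate shifted left by 12 to the pc, addi then adds a signed 12-bit offset); the function name is hypothetical. It reproduces the target addresses that gdb shows in its `#` comments.

```python
def auipc_addi_target(pc, imm20, imm12):
    """Address formed by 'auipc rd,imm20' at pc followed by 'addi rd,rd,imm12'."""
    # auipc sign-extends its 20-bit immediate before shifting it left by 12.
    if imm20 & 0x80000:
        imm20 -= 1 << 20
    # Wrap to 64 bits, as on an RV64 machine.
    return (pc + (imm20 << 12) + imm12) & ((1 << 64) - 1)


# auipc a2,0x66 at 0x380c8, then addi a2,a2,2000
print(hex(auipc_addi_target(0x380C8, 0x66, 2000)))

# auipc a0,0x3f at 0x385d2, then addi a0,a0,-1890
print(hex(auipc_addi_target(0x385D2, 0x3F, -1890)))
```

Both results agree with the annotations in the disassembly (0x9e898 and 0x76e70), so the instruction bytes themselves look sane; nothing in the computation can raise an illegal-instruction trap on its own.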
Pavel Machek
Hi!
It crashes rather soon after startup, so I was able to trace the complete
path.
I believe it should not end at 0x00000000000385d4 at all. The
0x000000000001537e jal instruction should end up calling 0x3806a
AFAICT, but it calls 0x385d4 instead. It happens during
single-stepping, so it should not be anything subtle.
(gdb) disassemble $pc,+0x20
Dump of assembler code from 0x1537c to 0x1539c:
=> 0x000000000001537c: mv a0,a4
0x000000000001537e: jal ra,0x3806a
0x0000000000015382: auipc a5,0x8a
0x0000000000015386: addi a5,a5,1342 # 0x9f8c0
0x000000000001538a: ld a4,0(a5)
0x000000000001538c: beqz a4,0x153f0
0x000000000001538e: jal ra,0x38abe
0x0000000000015392: ld a0,0(s6)
0x0000000000015396: auipc s7,0x85
0x000000000001539a: ld s7,-406(s7) # 0x9a200
End of assembler dump.
(gdb)
(gdb) stepi
0x000000000001537e in ?? ()
(gdb)
Program received signal SIGILL, Illegal instruction.
0x00000000000385d4 in ?? ()
(gdb)
Best regards,
Pavel
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Jan Kiszka
On 07.10.22 00:32, Pavel Machek wrote:
Did you try to compare the call trace to QEMU, where we divert?
Jan
--
Siemens AG, Technology
Competence Center Embedded Linux
Pavel Machek
Hi!
Yes, that's a possible way forward, but it will require some
considerable setup on my side.
If you have QEMU ready... objdump tells you ldconfig's entrypoint,
from that point you can just stepi. In less than 200 steps, you should
have the SIGILL... and the complete steps that lead to it.
Best regards,
Pavel
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
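The stepi-and-compare idea could be scripted roughly as follows. This is only a sketch: it assumes each session's gdb output was saved to a text log, and it relies on the `0x... in ?? ()` lines that gdb prints after each stepi. The function names and the two sample logs are illustrative; the "expected" log is built from the jal target quoted in this thread, not from a real QEMU run.

```python
import re

# Matches gdb lines such as "0x000000000001537e in ?? ()".
PC_LINE = re.compile(r"^(0x[0-9a-fA-F]+) in ")


def parse_pcs(gdb_log):
    """Collect program counters from a saved gdb stepi session."""
    pcs = []
    for line in gdb_log.splitlines():
        match = PC_LINE.match(line.strip())
        if match:
            pcs.append(int(match.group(1), 16))
    return pcs


def first_divergence(trace_a, trace_b):
    """Return (step index, pc_a, pc_b) for the first differing step, or None."""
    for index, (pc_a, pc_b) in enumerate(zip(trace_a, trace_b)):
        if pc_a != pc_b:
            return index, pc_a, pc_b
    return None


# Hypothetical reference log: the jal at 0x1537e reaches 0x3806a.
expected_log = """\
0x000000000001537e in ?? ()
0x000000000003806a in ?? ()
"""

# Log shaped like the board session shown earlier in the thread.
observed_log = """\
0x000000000001537e in ?? ()
Program received signal SIGILL, Illegal instruction.
0x00000000000385d4 in ?? ()
"""

result = first_divergence(parse_pcs(expected_log), parse_pcs(observed_log))
if result is not None:
    step, want, got = result
    print(f"step {step}: expected {want:#x}, observed {got:#x}")
```

With fewer than 200 steps per run, a plain list comparison like this is enough; no streaming or alignment logic should be needed.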
Jan Kiszka
On 07.10.22 12:19, Pavel Machek wrote:
I've updated sid-ports (dropped the snapshot pinning), and now I'm
getting a page fault on the instruction before the one that was causing
SIGILL before:
[ 558.490689] CPU: 0 PID: 3212 Comm: ldconfig Not tainted 5.10.83-cip1-riscv-renesas #1
[ 558.490697] epc: 00000000000380c6 ra : 0000000000015382 sp : 0000003fff9e3c10
[ 558.490703] gp : 0000000000099da8 tp : 0000003fe9c3c800 t0 : 0000003fe9c427c0
[ 558.490710] t1 : 0000003fe9cd059c t2 : 0000002acb8f2c00 s0 : 0000002b079e9510
[ 558.490716] s1 : 0000000000000001 a0 : 0000003fff9e3d18 a1 : 0000000000000001
[ 558.490722] a2 : 0000003fff9e3c88 a3 : 0000000000000000 a4 : 0000003fff9e3d18
[ 558.490728] a5 : 000000000009736e a6 : 0000003fff9e3c80 a7 : 00000000000000dd
[ 558.490734] s2 : 0000003fff9e3c88 s3 : 0000000000000000 s4 : 0000000000000000
[ 558.490740] s5 : 00000000000105a4 s6 : 000000000009e670 s7 : 0000002b079c8ab0
[ 558.490746] s8 : 0000002b079e91c0 s9 : 0000000000000000 s10: 0000002acb8fc9b0
[ 558.490752] s11: 0000002acb8fc920 t3 : 0000002acb80f5d8 t4 : 000000000009259c
[ 558.490758] t5 : 0000000000000004 t6 : 0000002b0799c010
[ 558.490764] status: 0000000200004020 badaddr: 00000000000000e1 cause: 000000000000000f
(gdb) disassemble 0x00000000000380c6,+0x10
Dump of assembler code from 0x380c6 to 0x380d6:
0x00000000000380c6: addi sp,sp,-416
0x00000000000380c8: auipc a2,0x66
0x00000000000380cc: addi a2,a2,2000 # 0x9e898
0x00000000000380d0: sd a0,0(a2)
0x00000000000380d2: mv a5,sp
0x00000000000380d4: addi a4,sp,416
End of assembler dump.
I've stepped this through under qemu as well, and the control flow is
identical. Registers are almost the same, except for some temporaries:
--- regs-qemu
+++ regs-rzfive
@@ -2,9 +2,9 @@
ra 0x15382 0x15382
sp 0x3ffffffbe0 0x3ffffffbe0
gp 0x99da8 0x99da8
-tp 0x3ff7e77800 0x3ff7e77800
-t0 0x3ff7e7d7c0 274742106048
-t1 0x3ff7f0b59c 274742687132
+tp 0x3ff7e78800 0x3ff7e78800
+t0 0x3ff7e7e7c0 274742110144
+t1 0x3ff7f0c59c 274742691228
t2 0x2aaab92c00 183252888576
fp 0x2aaabaee00 0x2aaabaee00
s1 0x1 1
No idea if that is normal (different machines, different memory sizes
and layouts) or a symptom of the problem.
Jan
--
Siemens AG, Technology
Competence Center Embedded Linux
Jan Kiszka
On 08.10.22 10:27, Jan Kiszka wrote:
OpenEmbedded nodistro.0 smarc-rzfive ttySC0
[ 12.829622] audit: type=1006 audit(1653987107.735:2): pid=156 uid=0 old-auid=4294967295 auid=0 tty=(none) old-ses=4294967295 ses=1 res=1
root@smarc-rzfive:~# ldconfig
[ 22.278868] ldconfig[166]: unhandled signal 11 code 0x1 at 0x0000000000000088 in ldconfig[10000+68000]
[ 22.290244] CPU: 0 PID: 166 Comm: ldconfig Not tainted 5.10.83-cip1-riscv-renesas #1
[ 22.298954] epc: 0000000000030eea ra : 00000000000145a0 sp : 0000003fff9f8aa0
[ 22.306906] gp : 000000000007fe48 tp : 0000003fd958b720 t0 : 0000000000000000
[ 22.314973] t1 : 0000002adf9c3bbc t2 : 00000000000003ff s0 : 0000003fff9f8c90
[ 22.322986] s1 : 0000000000014b0e a0 : 0000003fff9f8c98 a1 : 0000000000000000
[ 22.330967] a2 : 0000003fff9f8be8 a3 : 0000000000014a86 a4 : 000000000007e576
[ 22.338936] a5 : 0000000000000000 a6 : 0000003fff9f8be0 a7 : 0000000000000000
[ 22.346897] s2 : 0000000000000000 s3 : 0000003fd96df918 s4 : ffffffffffffffff
[ 22.354905] s5 : 0000002b01953f70 s6 : 0000002b01953c60 s7 : 0000002b019539b0
[ 22.362875] s8 : 0000002b01953b50 s9 : 0000000000000000 s10: 0000002adfa74584
[ 22.370884] s11: 0000000000000000 t3 : 0000003fd960ee18 t4 : 000000000000000f
[ 22.378945] t5 : 000000000000000f t6 : 0000000000000000
[ 22.385051] status: 8000000200004020 badaddr: 0000000000000088 cause: 000000000000000d
[ 22.393860] audit: type=1701 audit(1653987117.299:3): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=166 comm="ldconfig" exe="/sbin/ldconfig" sig=11 res=1
Segmentation fault
That was the version I found on eMMC.
I think you have some real homework now...
Jan
--
Siemens AG, Technology
Competence Center Embedded Linux
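The `cause:` field in the kernel register dumps in this thread is the RISC-V scause value. A small lookup sketch follows, assuming RV64 (interrupt flag in bit 63) and using the standard exception codes from the privileged specification; the function and table names are made up for this illustration.

```python
# Standard synchronous exception codes from the RISC-V privileged spec.
SCAUSE_EXCEPTIONS = {
    0: "instruction address misaligned",
    1: "instruction access fault",
    2: "illegal instruction",
    3: "breakpoint",
    4: "load address misaligned",
    5: "load access fault",
    6: "store/AMO address misaligned",
    7: "store/AMO access fault",
    8: "environment call from U-mode",
    9: "environment call from S-mode",
    12: "instruction page fault",
    13: "load page fault",
    15: "store/AMO page fault",
}


def decode_scause(value):
    """Describe a 64-bit scause value as printed in the kernel dump."""
    is_interrupt = bool(value >> 63)
    code = value & ((1 << 63) - 1)
    if is_interrupt:
        return f"interrupt {code}"
    return SCAUSE_EXCEPTIONS.get(code, f"reserved or unknown exception {code}")


# cause values taken from the two dumps in this thread
print(decode_scause(0x0F))  # Debian sid-ports ldconfig, badaddr 0xe1
print(decode_scause(0x0D))  # OpenEmbedded ldconfig from eMMC, badaddr 0x88
```

Read this way, the sid-ports dump reports a store/AMO page fault at an epc where the disassembly shows `addi sp,sp,-416`, an instruction that does not access memory, while the eMMC binary reports a load page fault near address zero.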
Biju Das
What is your conclusion? Is it a toolchain-related issue? Or a cache-related issue?
Or something else?
Cheers,
Biju
Chris Paterson
Hi Jan,
Kind regards, Chris
From: Jan Kiszka <jan.kiszka@...>Thanks, we'll take a look.
Sent: 09 October 2022 09:29
On 08.10.22 10:27, Jan Kiszka wrote:On 07.10.22 12:19, Pavel Machek wrote:riscv-renesas #1Hi!I've updated sid-ports (dropped the snapshot pinning), and now I'mYes, that's possible way forward, but it will require someDid you try to compare the call trace to QEMU, where we divert?It crashes rather soon after startup, so I was able to trace completeI tried, but installation fails - illegal instruction.Yeah, ldconfig is needed for installation. But I get a segfaulting gcc
binary.
path.But I do have slightly different results then you (I think; I'm farI believe it should not end at 0x00000000000385d4 at all. The
from risc-v expert). I did a breakpoint:
Breakpoint 1, 0x00000000000385d4 in ?? ()
0x000000000001537e jal instruction should end up calling 0x3806a
AFAICT, but it calls 0x385d4 instead. It happens during
single-stepping, so it should not be anything subtle.
(gdb) disassemble $pc,+0x20
Dump of assembler code from 0x1537c to 0x1539c:
=> 0x000000000001537c: mv a0,a4
0x000000000001537e: jal ra,0x3806a
0x0000000000015382: auipc a5,0x8a
0x0000000000015386: addi a5,a5,1342 # 0x9f8c0
0x000000000001538a: ld a4,0(a5)
0x000000000001538c: beqz a4,0x153f0
0x000000000001538e: jal ra,0x38abe
0x0000000000015392: ld a0,0(s6)
0x0000000000015396: auipc s7,0x85
0x000000000001539a: ld s7,-406(s7) # 0x9a200
End of assembler dump.
(gdb)
(gdb) stepi
0x000000000001537e in ?? ()
(gdb)
Program received signal SIGILL, Illegal instruction.
0x00000000000385d4 in ?? ()
(gdb)
considerable setup on my side.
If you have QEMU ready... objdump tells you ldconfig's entrypoint,
from that point you can just stepi. In less than 200 steps, you should
have sigill... and complete steps that lead to it.
getting a page fault on the instruction before the one that was causing
SIGILL before:
[ 558.490689] CPU: 0 PID: 3212 Comm: ldconfig Not tainted 5.10.83-cip1-[ 558.490697] epc: 00000000000380c6 ra : 0000000000015382 sp :0000003fff9e3c10[ 558.490703] gp : 0000000000099da8 tp : 0000003fe9c3c800 t0 :0000003fe9c427c0[ 558.490710] t1 : 0000003fe9cd059c t2 : 0000002acb8f2c00 s0 :0000002b079e9510[ 558.490716] s1 : 0000000000000001 a0 : 0000003fff9e3d18 a1 :0000000000000001[ 558.490722] a2 : 0000003fff9e3c88 a3 : 0000000000000000 a4 :0000003fff9e3d18[ 558.490728] a5 : 000000000009736e a6 : 0000003fff9e3c80 a7 :00000000000000dd[ 558.490734] s2 : 0000003fff9e3c88 s3 : 0000000000000000 s4 :0000000000000000[ 558.490740] s5 : 00000000000105a4 s6 : 000000000009e670 s7 :0000002b079c8ab0[ 558.490746] s8 : 0000002b079e91c0 s9 : 0000000000000000 s10:0000002acb8fc9b0[ 558.490752] s11: 0000002acb8fc920 t3 : 0000002acb80f5d8 t4 :000000000009259c[ 558.490758] t5 : 0000000000000004 t6 : 0000002b0799c010000000000000000f
[ 558.490764] status: 0000000200004020 badaddr: 00000000000000e1 cause:...
(gdb) disassemble 0x00000000000380c6,+0x10
Dump of assembler code from 0x380c6 to 0x380d6:
0x00000000000380c6: addi sp,sp,-416
0x00000000000380c8: auipc a2,0x66
0x00000000000380cc: addi a2,a2,2000 # 0x9e898
0x00000000000380d0: sd a0,0(a2)
0x00000000000380d2: mv a5,sp
0x00000000000380d4: addi a4,sp,416
End of assembler dump.
I've stepped this through under qemu as well, and the control flow is
identical. Registers are almost the same, except for some temporaries:
--- regs-qemu
+++ regs-rzfive
@@ -2,9 +2,9 @@
ra 0x15382 0x15382
sp 0x3ffffffbe0 0x3ffffffbe0
gp 0x99da8 0x99da8
-tp 0x3ff7e77800 0x3ff7e77800
-t0 0x3ff7e7d7c0 274742106048
-t1 0x3ff7f0b59c 274742687132
+tp 0x3ff7e78800 0x3ff7e78800
+t0 0x3ff7e7e7c0 274742110144
+t1 0x3ff7f0c59c 274742691228
t2 0x2aaab92c00 183252888576
fp 0x2aaabaee00 0x2aaabaee00
s1 0x1 1
No idea if that is normal (different machines, different memory sizes
and layouts) or a symptom of the problem.
OpenEmbedded nodistro.0 smarc-rzfive ttySC0
[ 12.829622] audit: type=1006 audit(1653987107.735:2): pid=156 uid=0 old-auid=4294967295 auid=0 tty=(none) old-ses=4294967295 ses=1 res=1
root@smarc-rzfive:~# ldconfig
[ 22.278868] ldconfig[166]: unhandled signal 11 code 0x1 at 0x0000000000000088 in ldconfig[10000+68000]
[ 22.290244] CPU: 0 PID: 166 Comm: ldconfig Not tainted 5.10.83-cip1-riscv-renesas #1
[ 22.298954] epc: 0000000000030eea ra : 00000000000145a0 sp : 0000003fff9f8aa0
[ 22.306906] gp : 000000000007fe48 tp : 0000003fd958b720 t0 : 0000000000000000
[ 22.314973] t1 : 0000002adf9c3bbc t2 : 00000000000003ff s0 : 0000003fff9f8c90
[ 22.322986] s1 : 0000000000014b0e a0 : 0000003fff9f8c98 a1 : 0000000000000000
[ 22.330967] a2 : 0000003fff9f8be8 a3 : 0000000000014a86 a4 : 000000000007e576
[ 22.338936] a5 : 0000000000000000 a6 : 0000003fff9f8be0 a7 : 0000000000000000
[ 22.346897] s2 : 0000000000000000 s3 : 0000003fd96df918 s4 : ffffffffffffffff
[ 22.354905] s5 : 0000002b01953f70 s6 : 0000002b01953c60 s7 : 0000002b019539b0
[ 22.362875] s8 : 0000002b01953b50 s9 : 0000000000000000 s10: 0000002adfa74584
[ 22.370884] s11: 0000000000000000 t3 : 0000003fd960ee18 t4 : 000000000000000f
[ 22.378945] t5 : 000000000000000f t6 : 0000000000000000
[ 22.385051] status: 8000000200004020 badaddr: 0000000000000088 cause: 000000000000000d
[ 22.393860] audit: type=1701 audit(1653987117.299:3): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=166 comm="ldconfig" exe="/sbin/ldconfig" sig=11 res=1
Segmentation fault
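The "cause:" field in these dumps is the RISC-V scause value. A small lookup table makes the two values seen in this thread (0xd here, 0xf in the Debian dump) easier to read. The table follows the exception codes of the RISC-V privileged specification; the function name decode_scause is invented for this sketch.

```python
# Synchronous exception codes from the RISC-V privileged specification
SCAUSE_EXCEPTIONS = {
    0: "instruction address misaligned",
    1: "instruction access fault",
    2: "illegal instruction",
    3: "breakpoint",
    4: "load address misaligned",
    5: "load access fault",
    6: "store/AMO address misaligned",
    7: "store/AMO access fault",
    8: "environment call from U-mode",
    9: "environment call from S-mode",
    12: "instruction page fault",
    13: "load page fault",
    15: "store/AMO page fault",
}


def decode_scause(value: int, xlen: int = 64) -> str:
    """Translate a raw scause value into a readable description."""
    # The most significant bit distinguishes interrupts from exceptions.
    is_interrupt = bool(value >> (xlen - 1))
    code = value & ((1 << (xlen - 1)) - 1)
    if is_interrupt:
        return f"interrupt {code}"
    return SCAUSE_EXCEPTIONS.get(code, f"reserved/unknown ({code})")


print(decode_scause(0xD))  # load page fault
print(decode_scause(0xF))  # store/AMO page fault
```

So the OpenEmbedded crash is a load from the unmapped address 0x88, while the Debian dump reports a store page fault at an epc whose disassembly is an addi, an instruction that does not access memory at all.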
That was the version I found on eMMC.
I think you have some real homework now...
Kind regards, Chris
Jan
--
Siemens AG, Technology
Competence Center Embedded Linux
Jan Kiszka
On 09.10.22 10:42, Biju Das wrote:
> What is your conclusion? Is it a toolchain-related issue? Or a
> cache-related issue? Or something else?
I have no idea and still only limited knowledge about the arch and this
SoC. All we can rule out by now is that the issue is Debian-exclusive.
Jan
Biju Das
Thanks for your feedback.
Cheers,
Biju
Florian Bezdeka
On 11.10.22 12:34, Biju Das via lists.cip-project.org wrote:
> On 08.10.22 10:27, Jan Kiszka wrote:
>> I've updated sid-ports (dropped the snapshot pinning), and now I'm
>> getting a page fault on the instruction before the one that was
>> causing SIGILL before:
In case the requested page is a page with PROT_WRITE only (no PROT_READ),
it might be related to
https://lore.kernel.org/linux-riscv/20220915193702.2201018-1-abrestic@rivosinc.com/
AFAIR, all stable branches have that problem currently.
Jan Kiszka
On 11.10.22 20:51, Florian Bezdeka wrote:
> In case the requested page is a page with PROT_WRITE only (no PROT_READ)
> it might be related to
> https://lore.kernel.org/linux-riscv/20220915193702.2201018-1-abrestic@rivosinc.com/
> AFAIR all stable branches have that problem currently.
Nice idea. I quickly hacked that on top of the rzfive kernel, but it
didn't change the picture, unfortunately.
That said, being able to test linus/master would be very valuable here.
Jan
Lad Prabhakar
Hi Jan,
-----Original Message-----
From: Jan Kiszka <jan.kiszka@...>
Sent: 11 October 2022 21:15
Subject: Re: [cip-dev] ldconfig segfault on RZ/Five was Re: Preparing isar-cip-core for RZ/Five
> Nice idea. I quickly hacked that on top of the rzfive kernel, but it
> didn't change the picture, unfortunately.
Thanks for the quick test.
> That said, being able to test linus/master would be very valuable here.
I will test this on top of v6.0 and update the results.
Cheers,
Prabhakar
Lad Prabhakar
Hi Jan,
-----Original Message-----
From: Prabhakar Mahadev Lad
Sent: 11 October 2022 21:49
Subject: RE: [cip-dev] ldconfig segfault on RZ/Five was Re: Preparing isar-cip-core for RZ/Five
> I will test this on top of v6.0 and update the results.
I did a quick test with the patches pointed out by Florian, but unfortunately ldconfig still fails.
Cheers,
Prabhakar
Ulrich Hecht
On 10/12/2022 11:50 AM CEST Lad Prabhakar <prabhakar.mahadev-lad.rj@...> wrote:
> I did a quick test with the patches pointed out by Florian, but unfortunately ldconfig still fails.
I did some experiments on RZ/Five with this issue, and I'm almost positive that there is something wrong (or doesn't work as documented) with the icache handling on this SoC.
1. The issue only affects non-PIE executables (there are very few of those, basically just ldconfig, gcc, cpp and gcov* on the Debian system), and it occurs very early during the execution of the program. According to the datasheet, the cache on the ax45mp-1c core is virtually indexed, so it is unlikely that a PIE executable will ever hit anything in the cache when newly loaded, but it is much more likely with non-PIE executables.
2. Setting a breakpoint before the illegal/segfaulting instruction doesn't work, and what is executed is clearly not what we're seeing through the dcache (the offending instructions are neither illegal, nor are they able to cause segfaults), so instruction fetches must see something different.
3. Neither manually calling __vdso_flush_icache() from gdb (which executes a "fence.i" instruction) nor patching a "fence.i" into the ldconfig binary seem to do anything. According to the ax45mp-1c datasheet "fence.i" should flush the dcache and invalidate the icache.
My educated guess is that, in spite of the claims in the core manual, the "fence.i" instruction is not implemented, or not implemented correctly. (The datasheet does acknowledge that "fence", without the ".i", is a nop.)
The RISC-V ISA manual says that "fence.i" is part of the optional "Zifencei" extension, which I don't see mentioned in the core datasheet anywhere. (And at least at first glance, I couldn't find any other mechanism to invalidate the icache there either.)
CU
Uli
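Point 1 above separates non-PIE from PIE executables. On Linux that distinction is visible in the ELF header: a non-PIE program has e_type ET_EXEC (2), while a PIE is ET_DYN (3). The sketch below reads that field. The function names are invented for illustration, and this is only one possible way to list affected binaries, not the method Uli used.

```python
import struct

ET_EXEC = 2  # executable with fixed load addresses (non-PIE)
ET_DYN = 3   # shared object or position-independent executable


def elf_type(header: bytes) -> int:
    """Return e_type from the first 18 bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian
    endian = "<" if header[5] == 1 else ">"
    # e_type is the 16-bit field directly after the 16-byte e_ident
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return e_type


def is_non_pie(path: str) -> bool:
    """True if the file at `path` is an ET_EXEC (non-PIE) executable."""
    with open(path, "rb") as f:
        return elf_type(f.read(18)) == ET_EXEC
```

Calling is_non_pie("/sbin/ldconfig") on the target root filesystem would then tell whether that binary falls into the affected group.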
Pavel Machek
Hi!
(Can I get you to wrap emails at ~72 columns or so?)
> 1. The issue only affects non-PIE executables (there are very few
> of those, basically just ldconfig, gcc, cpp and gcov* on the Debian
> system), and it occurs very early during the execution of the
> program. According to the datasheet, the cache on the ax45mp-1c core
> is virtually indexed, so it is unlikely that a PIE executable will
> ever hit anything in the cache when newly loaded, but it is much
> more likely with non-PIE executables.
Ah, I was wondering what gcc and ldconfig have in common...
> 2. Setting a breakpoint before the illegal/segfaulting instruction
> doesn't work, and what is executed is clearly not what we're seeing
> through the dcache (the offending instructions are neither illegal,
> nor are they able to cause segfaults), so instruction fetches must
> see something different.
In my testing, I was able to stepi from the start, and then I was able
to put a breakpoint at the preceding instruction (which was a jump). It
looked like we jumped into the middle of an instruction, which would
explain the fault.
Best regards,
Pavel
Pavel Machek
Hi!

> > On 10/12/2022 11:50 AM CEST Lad Prabhakar <prabhakar.mahadev-lad.rj@...> wrote:
> > I did a quick test with the patches pointed by Florian but unfortunately ldconfig still fails.
>
> I did some experiments on RZ/Five with this issue, and I'm almost positive that there is something wrong (or doesn't work as documented) with the icache handling on this SoC.
>
> 1. The issue only affects non-PIE executables (there are very few of those, basically just ldconfig, gcc, cpp and gcov* on the Debian system), and it occurs very early during the execution of the program. According to the datasheet, the cache on the ax45mp-1c core is virtually indexed, so it is unlikely that a PIE executable will ever hit anything in the cache when newly loaded, but it is much more likely with non-PIE executables.

This is a very good observation. Thanks!

And indeed it looks like _any_ non-PIE executable fails. See:
root@smarc-rzfive:/my# cat mytest.c
#include <stdio.h>
void main(void) { printf("ahoj svete\n"); }
root@smarc-rzfive:/my# clang mytest.c -fno-pie -static
mytest.c:3:1: warning: return type of 'main' is not 'int' [-Wmain-return-type]
void main(void) { printf("ahoj svete\n"); }
^
mytest.c:3:1: note: change return type to 'int'
void main(void) { printf("ahoj svete\n"); }
^~~~
int
1 warning generated.
root@smarc-rzfive:/my# ./a.out
[ 279.010424] a.out[214]: unhandled signal 11 code 0x1 at 0xffffff8c38bd1524
(-O3 -g might be useful to add to clang command line).
Then you can
b _dl_discover_osversion
run
(gdb) disassemble /r
Dump of assembler code for function _dl_discover_osversion:
0x000000000002538a <+0>: 41 71 addi sp,sp,-496
0x000000000002538c <+2>: a8 00 addi a0,sp,72
0x000000000002538e <+4>: 86 f7 sd ra,488(sp)
0x0000000000025390 <+6>: a2 f3 sd s0,480(sp)
0x0000000000025392 <+8>: a6 ef sd s1,472(sp)
0x0000000000025394 <+10>: ca eb sd s2,464(sp)
=> 0x0000000000025396 <+12>: ef 60 a1 5c jal ra,0x3b960 <uname>
0x000000000002539a <+16>: 93 05 a1 0c addi a1,sp,202
0x000000000002539e <+20>: 49 e5 bnez a0,0x25428 <_dl_discover_osversion+158>
0x00000000000253a0 <+22>: 81 48 li a7,0
0x00000000000253a2 <+24>: 01 45 li a0,0
0x00000000000253a4 <+26>: 25 48 li a6,9
0x00000000000253a6 <+28>: 13 03 e0 02 li t1,46
It clearly tries to call uname, which.. it should, according to the
source code. But somehow it ends up in completely different function:
(gdb) stepi
Program received signal SIGILL, Illegal instruction.
0x000000000003b2fe in wcsrtombs ()
Best regards,
Pavel
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Lad Prabhakar
Hi All,

> -----Original Message-----
> From: Pavel Machek <pavel@...>
> Sent: 13 October 2022 22:48
> To: Ulrich Hecht <uli@...>
> Cc: cip-dev@...; Prabhakar Mahadev Lad <prabhakar.mahadev-lad.rj@...>;
> Jan Kiszka <jan.kiszka@...>; Florian Bezdeka <florian.bezdeka@...>; Chris Paterson
> <Chris.Paterson2@...>; Hung Tran <hung.tran.jy@...>; Pavel Machek <pavel@...>
> Subject: Re: [cip-dev] ldconfig segfault on RZ/Five was Re: Preparing isar-cip-core for RZ/Five
>
> Hi!
>
> > > On 10/12/2022 11:50 AM CEST Lad Prabhakar <prabhakar.mahadev-lad.rj@...> wrote:
> > > I did a quick test with the patches pointed by Florian but unfortunately ldconfig still fails.
> >
> > I did some experiments on RZ/Five with this issue, and I'm almost positive that there is something
> > wrong (or doesn't work as documented) with the icache handling on this SoC.
> >
> > 1. The issue only affects non-PIE executables (there are very few of those, basically just ldconfig,
> > gcc, cpp and gcov* on the Debian system), and it occurs very early during the execution of the
> > program. According to the datasheet, the cache on the ax45mp-1c core is virtually indexed, so it is
> > unlikely that a PIE executable will ever hit anything in the cache when newly loaded, but it is much
> > more likely with non-PIE executables.
>
> This is very good observation. Thanks!
>
> And indeed it looks like _any_ non-PIE executable fails. See:

Just a brief about the issue and solution:

TEXT_START_ADDR is the start of the text segment of an application. It is set to 0x10000 for RISC-V platforms.
So when an application is compiled with the static flag, it is loaded starting at 0x10000 and extends upwards from there (how far depends on the size of the application).
Entry point 0x101c0
There are 5 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000010000 0x0000000000010000
0x0000000000059b48 0x0000000000059b48 R E 0x1000
LOAD 0x0000000000059b60 0x000000000006ab60 0x000000000006ab60
0x0000000000001f68 0x0000000000003528 RW 0x1000
So for the above application, which is compiled statically, we can see that the entry point is 0x101c0 and the load address is 0x0000000000010000.
Andes cores have local memories, ILM and DLM, that are mapped in the region H'0_0003_0000 - H'0_0004_FFFF on the RZ/Five SoC. When a virtual address falls in this range, the MMU doesn't trigger a page fault and assumes the virtual address is a physical address, and hence the application fails to run (panics somewhere).
So to avoid this issue we set TEXT_START_ADDR to 0x50000, so that the virtual addresses of a statically compiled application don't fall in the range H'0_0003_0000 - H'0_0004_FFFF.
Elf file type is EXEC (Executable file)
Entry point 0x504e4
There are 5 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000050000 0x0000000000050000
0x0000000000057dc8 0x0000000000057dc8 R E 0x1000
LOAD 0x00000000000585b8 0x00000000000a95b8 0x00000000000a95b8
0x0000000000004ee0 0x00000000000064b0 RW 0x1000
NOTE 0x0000000000000158 0x0000000000050158 0x0000000000050158
0x0000000000000044 0x0000000000000044 R 0x4
So now, with the fix, we can see that the statically compiled application is offset: the entry point is 0x504e4 and the load address is 0x0000000000050000. With this we can be sure that the MMU will always trigger a page fault.
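
The address arithmetic above can be checked with a small C sketch. The window bounds and segment values are copied from this message; the function names `addr_in_lm_window` and `segment_hits_lm_window` are made up for illustration and are not part of the binutils patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* ILM/DLM window on RZ/Five as quoted in this thread:
 * H'0_0003_0000 - H'0_0004_FFFF (both bounds inclusive). */
#define LM_START UINT64_C(0x30000)
#define LM_END   UINT64_C(0x4FFFF)

/* True if a single virtual address lies inside the window. */
bool addr_in_lm_window(uint64_t vaddr)
{
    return vaddr >= LM_START && vaddr <= LM_END;
}

/* True if the segment [vaddr, vaddr + memsz) overlaps the window.
 * An empty segment (memsz == 0) maps nothing and never overlaps. */
bool segment_hits_lm_window(uint64_t vaddr, uint64_t memsz)
{
    if (memsz == 0)
        return false;
    uint64_t last = vaddr + memsz - 1;  /* last byte of the segment */
    return vaddr <= LM_END && last >= LM_START;
}
```

With the values above, the first LOAD segment of the original binary (0x10000, size 0x59b48) overlaps the window, while the relinked one (0x50000, size 0x57dc8) does not. The jump target 0x3b960 from the earlier gdb session also lies inside the window, whereas the calling instruction at 0x25396 does not.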
I have attached a patch for binutils to the email. We plan to upstream this patch to binutils soon.
Cheers,
Prabhakar
Jan Kiszka
On 29.11.22 19:57, Prabhakar Mahadev Lad wrote:
> I have attached a patch for binutils to the email. We plan to upstream this patch to binutils soon.

Good that the issue is understood and likely solved now. Make sure to
upstream this as quickly as possible. It targets a fundamental tool and
requires recompilation of many components. And Debian will freeze the
toolchain in early January - although:

"It is unlikely that the release arch of bookworm will include riscv64."
[1] :(

Jan

[1] https://lists.debian.org/debian-riscv/2022/12/msg00009.html
--
Siemens AG, Technology
Competence Center Embedded Linux
Pavel Machek
Hi!

> > Andes cores have local memories ILM and DLM that are mapped in the
> > region H'0_0003_0000 - H'0_0004_FFFF on the RZ/Five SoC. When the
> > virtual address falls in this range the MMU doesn't trigger a page
> > fault and assumes the virtual address is a physical address and hence
> > the application fails to run (panics somewhere).
>
> Good that the issue is understood and likely solved now. Make sure to
> upstream this as quickly as possible. It targets a fundamental tool and
> requires recompilation of many components. And Debian will freeze the
> toolchain in early January - although:
>
> "It is unlikely that the release arch of bookworm will include riscv64."
> [1] :(

I'm pretty sure this is not a complete fix. Yes, we should change the
toolchain, but the problem is really in the hardware: you can't just
take part of the _virtual_ address space and reserve it. Not if you want
to claim the board is riscv64 compatible. Someone else (manual mmap, some
kind of JIT, some kind of emulator) might want normal RAM there.

I believe this is quite important and should be solved in hardware (at
least in the next generation).

Can ILM/DLM be disabled?

If we cannot fix it at the hardware level, we'll really need to prevent
attempts to map anything in that virtual memory range. A clear -EPERM
from mmap is better than strange behaviour at runtime, and it is a
must-have from a security perspective.
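
A minimal sketch of the "manual mmap" case, assuming Linux and a kernel that supports `MAP_FIXED_NOREPLACE` (4.17 or later). The helper name `probe_fixed_mapping` is hypothetical. It only reports whether the kernel agrees to place an anonymous mapping at a given address; it does not touch the memory, so a return value of 0 says nothing about whether accesses there would behave correctly on the RZ/Five.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#ifndef MAP_FIXED_NOREPLACE
/* Generic Linux UAPI value, only used when libc headers lack the flag. */
#define MAP_FIXED_NOREPLACE 0x100000
#endif

/*
 * Try to create an anonymous read/write mapping at exactly `addr`.
 * Returns 0 if the kernel created it there (the mapping is removed
 * again), ENOTSUP if the kernel ignored the flag and mapped elsewhere,
 * otherwise the errno reported by mmap() (for example EPERM or EEXIST).
 */
int probe_fixed_mapping(uintptr_t addr, size_t len)
{
    void *p = mmap((void *)addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
                   -1, 0);
    if (p == MAP_FAILED)
        return errno;

    int result = ((uintptr_t)p == addr) ? 0 : ENOTSUP;
    munmap(p, len);
    return result;
}
```

Calling `probe_fixed_mapping(0x30000, 4096)` asks for one page at the start of the ILM/DLM window. If the kernel were changed as proposed above, the expectation would be EPERM here. Without such a change the result depends on the system, for instance on the `vm.mmap_min_addr` setting.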
Best regards,
Pavel
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany