Draft: Try to fix build on mips64el

Merged Simon McVittie requested to merge wip/1042980 into debian/master
5 unresolved threads
  • Add patch to avoid error-prone ELF header parsing, fixing build on mips64el

    Helps: #1042980

  • d/rules: Increase arbitrary test timeouts

    The default test timeout is 30 seconds, but the perf-* tests take more like 45 seconds on mips64el.

    Helps: #1042980
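
The timeout arithmetic in the second commit can be sketched as follows. This is only an illustration, not the actual debian/rules change: the helper name min_multiplier is hypothetical, and it assumes the tests are driven by `meson test`, whose --timeout-multiplier option scales each test's timeout.

```shell
# Hypothetical helper: smallest integer multiplier such that
# default_timeout * multiplier >= observed duration (both in seconds).
min_multiplier() {
    default_timeout=$1
    observed=$2
    echo $(( (observed + default_timeout - 1) / default_timeout ))
}

# Figures from the description above: 30 s default, about 45 s observed.
m=$(min_multiplier 30 45)

# Assumption: in debian/rules this would be passed through dh_auto_test,
# e.g. "dh_auto_test -- --timeout-multiplier N". Only printed here.
echo "meson test --timeout-multiplier $m"
```

The computed value is the bare minimum; on slow or heavily loaded buildds a larger factor is probably safer.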


/cc @syq

This isn't fully working yet. The tests are still failing for me on eller, with:

GNOME Shell-Message: 11:41:00.189: Registering session with GDM
(EE) failed to write to Xwayland fd: Broken pipe

Please could you investigate further? It would also be useful to know whether this gnome-shell version works or crashes on real mips hardware (build with DEB_BUILD_OPTIONS=nocheck if necessary).

Merged by Simon McVittie 2 years ago (Aug 6, 2023 4:36pm UTC)

Activity

  • Author Maintainer

    It would also be useful to know whether this gnome-shell version works or crashes on real mips hardware

    It would also be useful to know whether gnome-shell 43.x (unstable) works or crashes on real mips hardware. The failing tests are new in version 44, so they don't actually tell us whether there is a regression when compared with version 43.

  • Simon McVittie added 2 commits

    • fd5a465e - d/rules: Increate arbitrary test timeouts
    • a455d1f1 - Add patch to avoid error-prone ELF header parsing, fixing build on mips64el

  • Simon McVittie added 1 commit

    • 84be34c2 - Add patch to fix ELF header parsing on mips64el and riscv64

  • Thanks, your patch seems much better than mine. Let's consider upstreaming it.

    And for the test:

    1. The test can pass on a mips64 machine without MSA (MIPS SIMD), with softpipe instead of llvmpipe. So it is a bug in LLVM. I will try to fix it. For now, can we run the test with softpipe on mips?
    2. On MIPS with MSA, mesa tries to use it, which triggers some problems. This is still a bug in LLVM. So maybe we should revert the change to mesa until the LLVM MSA JIT is fixed: https://gitlab.freedesktop.org/mesa/mesa/-/commit/88b234d7a7cd71fcb4955428010f238ec9530431
    Edited by YunQiang Su
    • Author Maintainer

      Thanks, your patch seems much better than mine. Let's consider upstreaming it.

      The patch from Daniel van Vugt (which replaced the one I wrote) has in fact been applied upstream, although probably only for v45.

      The test can pass on a mips64 machine without MSA (MIPS SIMD), with softpipe instead of llvmpipe. So it is a bug in LLVM. I will try to fix it. For now, can we run the test with softpipe on mips?

      We probably could, but are the resulting gnome-shell binaries going to be broken on mips machines?

      You didn't answer my questions about whether gnome-shell works (as a Wayland/X11 user interface, not just at build time) on mips machines:

      • if you install GNOME from unstable (gnome-shell 43) on a real mips(64)el machine, does gnome-shell work, or does it crash like this?

      • if that works, and then you upgrade to gnome-shell 44 (built from this branch but with DEB_BUILD_OPTIONS=nocheck), does it still work, or does it crash like this?

      On MIPS with MSA, mesa tries to use it, which triggers some problems. This is still a bug in LLVM. So maybe we should revert the change to mesa until the LLVM MSA JIT is fixed

      If this is an LLVM bug, please could you open a bug in llvm-toolchain-15 and mark it as affecting mesa and gnome-shell? I think you understand the situation a lot better than I do!

      Edited by Simon McVittie
    • We probably could, but are the resulting gnome-shell binaries going to be broken on mips machines?

      The answer is yes and no. In most cases, the MIPS/Loongson machines running gnome-shell will have an AMD graphics card, so llvmpipe won't be used at all. In that case, gnome-shell works well.

      If this is an LLVM bug, please could you open a bug in llvm-toolchain-15 and mark it as affecting mesa and gnome-shell? I think you understand the situation a lot better than I do!

      Yes. I will submit this bug report to Debian and upstream, and I will try to fix the bugs upstream.

    • Author Maintainer

      I didn't see a bug report opened against either llvm-toolchain-15 or mesa. For now I have opened https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1049404 for the issue involving llvmpipe on mips(64)el: please follow up there with any further information.

    • Author Maintainer

      Is this actually the same as https://bugs.debian.org/993550, originally reported against gtk4?

      We work around that in gtk4 by using GALLIUM_DRIVER=softpipe and LIBGL_ALWAYS_SOFTWARE=true on mips% CPUs. I'll see whether the same thing works for gnome-shell.

    • Author Maintainer

      Or perhaps the same as https://bugs.debian.org/1010838, also originally from gtk4.

    • Author Maintainer

      perf-headlessStart succeeds with GALLIUM_DRIVER=softpipe and LIBGL_ALWAYS_SOFTWARE=true, but perf-basic and perf-closeWithActiveWindows are still failing.

    • Ohhh, you are right: perf-closeWithActiveWindows is still failing. However, when I run the tests, perf-basic passes now.

    • Author Maintainer

      perf-basic was failing for me on the mips64el porterbox eller, but passed on the buildds when I uploaded 44.3-2 to experimental. I don't know why that's different.

    • Author Maintainer

      perf-closeWithActiveWindows still failed on the buildds.

    • Yes. I am debugging it, but I am confused.

      The reason is that in gjs/gi/function.cpp (Function::invoke), the value of ffi_arg_pointers.get() has no TOPLEVEL Stage, so shell_wm_completed_map segfaults.

      I tried to grep for where it is set, but found nothing.

      Any ideas?

  • Simon McVittie added 3 commits

    • 5e996350 - Update to upstream gnome-44 branch commit 44.3-4-g1635c371
    • 18ac0f65 - Update changelog
    • 074d06ba - d/rules: Run tests with softpipe instead of llvmpipe on mips family
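
    The last commit above runs the tests with softpipe instead of llvmpipe on the mips family. A minimal sketch of that kind of selection logic follows; it is not the actual debian/rules change. The function name software_gl_env is hypothetical, and it assumes the CPU name is what `dpkg-architecture -qDEB_HOST_ARCH_CPU` would print. The two variables are the ones named in the discussion.

```shell
# Hypothetical helper: print the environment assignments to prepend to the
# test command for a given CPU name (empty output means no override).
software_gl_env() {
    case "$1" in
        # mips, mipsel, mips64el, ... : force Mesa's softpipe software rasterizer
        mips*) echo "GALLIUM_DRIVER=softpipe LIBGL_ALWAYS_SOFTWARE=true" ;;
        *)     echo "" ;;
    esac
}

# Show what would be used on a mips CPU and on a non-mips CPU.
for cpu in mips64el amd64; do
    echo "$cpu: $(software_gl_env "$cpu")"
done
```

    The output could then be placed in front of the test command with `env`, which accepts NAME=VALUE pairs before the program to run.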

    • I ran a test on my Loongson 3A2000 machine with an AMD R5 230 video card. gnome-shell from Debian unstable (version 43) works well.

    • Author Maintainer

      Please could you try building 44.3-2 from experimental, with DEB_BUILD_OPTIONS=nocheck to skip the failing tests, and see whether that still works on the same hardware? (That's basically the same as what's in this MR.)

    • 44.3-2 works with Loongson 3A2000+AMD R5 230.

    • Author Maintainer

      Great, so gnome-shell is still usable on mips(64)el with real GPUs, just not on llvmpipe or softpipe.
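
      For reference, the DEB_BUILD_OPTIONS=nocheck convention used in this thread can be sketched as below. The helper name has_nocheck is hypothetical; the sketch assumes the usual convention that DEB_BUILD_OPTIONS is a space-separated list of flags and that a package's debian/rules skips its test suite when nocheck is present.

```shell
# Hypothetical helper: succeed if "nocheck" appears as a whole word in a
# space-separated option string.
has_nocheck() {
    case " $1 " in
        *" nocheck "*) return 0 ;;
        *)             return 1 ;;
    esac
}

# A typical build invocation with the tests skipped would look like:
#   DEB_BUILD_OPTIONS=nocheck dpkg-buildpackage -us -uc -b
if has_nocheck "${DEB_BUILD_OPTIONS:-}"; then
    echo "tests would be skipped"
else
    echo "tests would run"
fi
```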

    • Ohh, I can reproduce the same perf-closeWithActiveWindows failure on ARM64 with softpipe.

      The reason is that ffi_arg_pointers[1] needs to be dereferenced one more time with softpipe than with llvmpipe, so it should be a bug in softpipe.

      Edited by YunQiang Su
    • Author Maintainer

      Great. Please could you report that bug too? And I'll disable these perf-* tests on mips(64)el until either the llvmpipe or softpipe bugs are fixed.

      Please leave a comment with the bug number so I can put a reference to it in debian/rules.

    • Author Maintainer

      If the crash seen with softpipe involves ffi_arg_pointers from gjs/gi/function.cpp, then that seems more likely to be a bug in gobject-introspection, gjs, gnome-shell or mutter than a bug in LLVM or Mesa.

      My guess would be that there's some fallback rendering path that is rarely tested and therefore contains bugs, because all real-world GNOME Shell users are using either a hardware GPU or llvmpipe, and nobody uses softpipe in practice.

    • Author Maintainer

      If you can get a backtrace for the crash with softpipe, please report it as a separate gnome-shell bug to start with.

    • Ohh, it is not about one extra dereference; I guess it is a multithreading problem.

      On my ARM64 machine, if no breakpoint is set, the segfault always happens. If two breakpoints are set, on both

          b function.cpp:1050 if function=shell_wm_completed_map
          shell_wm_completed_map

      then the test always passes.

      So I guess some other thread changes the data passed to shell_wm_completed_map.

    • Author Maintainer

      please report it as a separate gnome-shell bug to start with

      I didn't see a gnome-shell bug report, so I have opened https://bugs.debian.org/1049407 for the crash seen with softpipe. Please send any follow-ups there.

  • Sleeping for a short time with nanosleep (1<<23 ns on my arm64 server) before the ffi_call makes the test pass.

    Using taskset also increases the chance of the test passing.

    Edited by YunQiang Su
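
To put the figures from the last comment in perspective, the following sketch converts the 1<<23 ns delay to milliseconds and shows one way a command could be pinned to a single CPU. It is only an illustration: the helper name run_pinned and the choice of CPU 0 are assumptions, and the thread does not say which taskset options were actually used.

```shell
# The delay quoted above: 1<<23 nanoseconds.
delay_ns=$((1 << 23))

# Convert to milliseconds with two decimal places (truncated), integer math only.
printf '%d ns = %d.%02d ms\n' "$delay_ns" \
    "$((delay_ns / 1000000))" "$(( (delay_ns % 1000000) / 10000 ))"

# Hypothetical helper: run a command pinned to CPU 0 if taskset (util-linux)
# is available, otherwise run it unpinned. Defined but not invoked here.
run_pinned() {
    if command -v taskset >/dev/null 2>&1; then
        taskset -c 0 "$@"
    else
        "$@"
    fi
}
```

A delay of this size changing the outcome, together with CPU pinning affecting the pass rate, is consistent with the timing-dependent behaviour suspected earlier in the thread.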