Monday 17 October 2016

MuQSS - The Multiple Queue Skiplist Scheduler v0.112

Here's an updated version of MuQSS.

 For 4.8.*:
4.8-sched-MuQSS_112.patch

 For 4.7.*:
4.7-sched-MuQSS_112.patch

Git tree here as 4.7-muqss or 4.8-muqss branches:
https://github.com/ckolivas/linux

It's getting close now to the point where it can replace BFS in -ck releases. Thanks to the many people testing and reporting back, some other misbehaviours were discovered and their associated fixes have been committed.

In particular,
- Balancing across CPUs was not looking at higher and lower scheduling policies correctly (SCHED_ISO, SCHED_IDLEPRIO and realtime policies)
- A serious stall/hang could happen with tasks using sched_yield (such as f@h client and numerous GPU drivers)
- Some minor accounting issues on new tasks with affinity set were fixed
- Overhead was further decreased on task selection
- Spurious preemption on CPUs where the preempted task had already gone is now avoided
- Spurious wakeups on CPUs that were assumed to be idle but no longer are, are now avoided
- A potential race in suspending to ram was fixed
- Old unused code from BFS was removed, along with unnecessary intermediate variables.
- Clean ups
- Some work was done towards actually documenting MuQSS in Documentation/scheduler/sched-MuQSS.txt, though it is still incomplete.
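
For readers unfamiliar with the sched_yield pattern mentioned in the list above, here is a minimal userspace sketch of the kind of polling loop that relies on it. It only illustrates the usage pattern; it does not reproduce the stall or the fix. The helper name `yield_wait` and its bounded retry count are assumptions made for illustration.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper: poll *flag, giving up the CPU with sched_yield()
 * between polls. Returns true if the flag was seen set within max_yields
 * polls, false otherwise. How quickly the caller runs again after each
 * yield is entirely up to the scheduler.
 */
static bool yield_wait(atomic_bool *flag, unsigned long max_yields)
{
    for (unsigned long i = 0; i < max_yields; i++) {
        if (atomic_load(flag))
            return true;
        sched_yield();
    }
    return atomic_load(flag);
}
```

Loops like this are why a change in sched_yield behaviour can show up as a stall in one application and as smoother behaviour in another.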

Enjoy!
-ck

22 comments:

  1. I've been trying various versions of MuQSS (while staying on BFS for production/work) and muqss-112 is the first one that passes my "wiggle test": with a niced kernel build on all 8 vcores and an HD movie playing in vlc, frantically jiggling a terminal window around no longer causes stalls or jerks; it's completely smooth all the time, as if idle. Well done! \o/

    One suggestion: I've noticed that the global_rq contains various atomic_t counters. It might be a good idea to make them cacheline-aligned so that they don't incur false sharing, which can lead to pretty pathological stalls esp. with contended SMT threads. I can create a GH pull if you like.
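
A minimal userspace sketch of the cacheline-alignment idea, written in C11 rather than as kernel code. The struct, its field names and the 64-byte line size are assumptions for illustration; in-kernel code would more likely use an annotation such as `____cacheline_aligned_in_smp`.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumed cache line size; 64 bytes is common on x86-64 but not universal. */
#define CACHELINE_SIZE 64

/* Hypothetical counters, loosely modelled on a shared global runqueue. */
struct padded_counters {
    alignas(CACHELINE_SIZE) atomic_int nr_running;
    alignas(CACHELINE_SIZE) atomic_int nr_uninterruptible;
};

/* Compile-time check that the two counters start on different lines. */
static_assert(offsetof(struct padded_counters, nr_uninterruptible) -
              offsetof(struct padded_counters, nr_running) >= CACHELINE_SIZE,
              "counters must not share a cache line");

static struct padded_counters counters;

/* Increment one counter and return its new value. */
static int counters_bump_running(void)
{
    return atomic_fetch_add(&counters.nr_running, 1) + 1;
}
```

The cost is memory: each counter now occupies a full line, so this only pays off for counters that different CPUs really do update concurrently.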

    1. Great, thanks! I've been toying with the idea of just using the same runqueue variables that mainline does instead of most of those atomic counters anyway, leaving only the idle CPU map.

    2. Oops, I should have said: create a pull request and I'll see what it looks like, thanks!

  2. Running x86-64 MuQSS v112 no problems; also did a suspend/resume for good measure. Looking good!

  3. Thanks Con.
    Updated the results with MuQSS112 (interactive=1 only).
    https://docs.google.com/spreadsheets/d/1ZfXUfcP2fBpQA6LLb-DP6xyDgPdFYZMwJdE0SQ6y3Xg/edit?usp=sharing

    The performance is more or less the same as with MuQSS110, with a slight improvement on make -j2.

    Pedro

  4. Con,

    I'm not certain whether it's only my imagination, but with .112 responsiveness with compiz is very, very good:
    no delays during app switching right now


    Might run compiz with sched_yield later to see how that works out

    also portage (Gentoo Linux' package manager) seems to work really quickly with it

    Once a new Chromium version is out I'll do a compilation test (update) and see if I can run additional backup jobs to really stress the system to see if I can still do work (that would be close to the ultimate test, well - in the extreme probably adding a game to the mix - we'll see about that ;) )


    Thanks !

    1. Thanks, KoT. It's probably not your imagination, since quite significant scheduling logic fixes were missing until 112. It should now be equal to or better than BFS in every way, and as you can see from the comments section here, you're not the only one who's noticed it.

    2. Okay, great :)

      I'm however still seeing some kind of occasional stuttering


      Reproducer:

      running Konqueror (4.14.24, which seems to use QT5)
      or
      Chromium (55.0.2873.0, 64-bit)


      then browsing via Mouse through the bookmarks

      while moving the mouse pointer up or down it "hangs" (as if it is stuck, [driver?] transmission interrupted), then continues after 0.5-2 secs

      Chromium has a bookmark file 20+ MiB,

      in the case of Konqueror it was just launched and going to Settings -> scrolling down

      (the intention was to move to Settings -> Load View Profile -> Filesystem)


      compiz is running in default mode WITHOUT workarounds or export __GL_YIELD=USLEEP


      X and the whole system was NOT yet run with reniced IRQs (threadirqs appended to kernel)

      e.g.

      # pgrep may match several threaded nvidia IRQ handlers; set each to SCHED_FIFO 84
      for pid in $(pgrep "irq/.*-nvidia"); do
          chrt -f -p 84 "$pid"
      done

      was NOT used


      This has happened before on MuQSS, if I recall correctly regardless of whether compiz, kwin or xfwm4 was the window decorator/compositor


      Thanks

    3. Try the gl yield usleep workaround. It could be the very aggressive change I did to sched_yield which may not be required any more.

    4. You mean

      export __GL_YIELD=NOTHING

      ?

      it's seemingly running better with it,

      but needs more testing

      Thanks

    5. Yes, that's correct, thanks for testing it. I may be able to go back to the old way of yielding (like BFS) now that I've fixed other bugs in the code, but your testing needs to confirm that's where the problem lies.

  5. If I run this new scheduler on a 2009 Mac Mini with an Intel Core Duo processor, how much of a slowdown would I see compared to BFS?
    How big is the overhead of this "it takes a thousand CPUs" scheduler?

    1. The idea is that this scheduler is a drop-in replacement for BFS where you won't notice any difference at all; this is why it took me years to come up with a design that had the best of both worlds. It should be perfectly fine on an old Mac mini.

    2. @ulenrich; I've been running MuQSS on an old Asus EeePC 701 from 2007 without issue. Has a unicore Intel Celeron-M ULV 353 running at 630 MHz stock; luckily I can overclock it to 990 MHz.

  6. Yeah, I remember: some years ago you wrote about how to dynamically allocate more scheduler units (run queues or something like that)...

    The old talking point that BFS doesn't perform well beyond some number of processors no longer applies!
    That was also an obstacle to going mainline, wasn't it?

  7. @ck:
    The issues I've reported last time for v0.111 have completely gone away with MuQSS v0.112 (without changes to the rest of the system software).
    Thank you for your great work!
    With this test run I was also lucky to rediscover a tunable for my ancient TOI revision, named "no_flusher_thread". It defaults to 1; setting it to 0 makes the whole combination (MuQSS, BFQ, WBT, TOI) work fine without failures or performance regressions. I'm glad that I can report 10 successful hibernations, done from time to time, within 1 1/2 days of uptime so far.
    Maybe that tunable eases some race condition or timing issue that an effective MuQSS exposes in those old TOI algorithms. Painfully, I don't have enough programming knowledge to interpret it in depth.

    BR, Manuel Krause

    1. @ck:
      Also with kernel 4.7.9 and your 4.8 addons up to 7e3bed6f from github, with my full combo, everything is behaving well. Nice :-)

      BR, Manuel Krause

  8. @ck, my i686 noSMP build fails with the latest patches thru 35d6279 (Merge branch '4.8-muqss'), but it succeeded with 1b7e569 (cacheline alignment). x64 built fine with the latest though; I'm running it now.

    Here are the build log output snips: https://gist.github.com/jeremywh7/944c10e189300086f1de58b8fa7fc0b4#gistcomment-1901214

    1. Commit cc32bf3* seems to define set_nr_and_not_polling for "CONFIG_SMP && TIF_POLLING_NRFLAG" -and- its else, but set_nr_if_polling is not defined for the else declaration.

      * Implement wake lists for CPUs that don't share cache
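
The build failure described above follows a common pattern: a helper is defined in one preprocessor branch but has no counterpart in the other. Here is a minimal userspace sketch of that pattern with both branches complete. Only the two function names come from the comment; `struct task`, the `HAVE_POLLING` macro (standing in for "CONFIG_SMP && TIF_POLLING_NRFLAG") and the function bodies are assumptions for illustration, not the code from the commit or its eventual fix.

```c
#include <stdbool.h>

/* Stand-in for the kernel's task structure; both fields are hypothetical. */
struct task {
    bool need_resched;
    bool polling;
};

#ifdef HAVE_POLLING
/* Mark the task for rescheduling; report whether it was not polling. */
static bool set_nr_and_not_polling(struct task *p)
{
    p->need_resched = true;
    return !p->polling;
}

/* Mark the task for rescheduling only if it is currently polling. */
static bool set_nr_if_polling(struct task *p)
{
    if (!p->polling)
        return false;
    p->need_resched = true;
    return true;
}
#else
/* No polling support: always mark, and always ask the caller to notify. */
static bool set_nr_and_not_polling(struct task *p)
{
    p->need_resched = true;
    return true;
}

/* Stub so that callers still build when polling support is compiled out. */
static bool set_nr_if_polling(struct task *p)
{
    (void)p;
    return false;
}
#endif
```

With the stub present, code that calls either helper compiles in both configurations, and the no-polling build simply never takes the polling path.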

    2. Ok...I built and am running x86 SMP and i686 noSMP, both thru 27fe1ef (fix for UP), and also the just released BFQ v8r4, and the 4.8.3-rc test release, plus a couple upstream patches. All good... :)

  9. Running MuQSS (by means of the Liquorix kernel), also see
    https://liquorix.net/atom

    Just earlier, having PulseAudio suspended via pasuspender (to run an application using ALSA) while alt-tabbing to Google Chrome to double-check something caused an unrecoverable stall. A clean shutdown was no longer viable (it would probably have taken hours; everything was incredibly slow).

    1. Edit: It might also be worth noting that this was using the schedutil CPUFreq governor.
