Saturday 16 August 2014

BFS 450, 3.16-ck1

Announcing a resync and update of BFS for Linux kernel 3.16.x. Coding has proven a nice distraction from unpleasant life events so I've been able to bring the patch up to date with the latest kernel.

A number of minor fixes queued up since 3.15-ck1 made their way into this patchset, along with some changes inspired by the development work of Alfred Chen (thanks!).

The major feature upgrade in this one is the inclusion of SMT nice as discussed at length on this blog. This version of BFS includes an updated version of SMT nice, beyond version 6 posted here, with one change: 25% of the CPU time of any nice level of SCHED_NORMAL tasks can be shared with any other nice level, over and above the nice-based CPU distribution. This is to capitalise on the slightly increased throughput that is available by using the sibling CPU concurrently, without costing higher priority processes too much CPU. In addition, by dithering the metering out of CPU time instead of giving it all as a burst only when a task is entitled to CPU, it dramatically reduces the massive latencies that heavily niced tasks can otherwise sometimes see with SMT nice enabled.
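One way to picture the 25% sharing and the dithered metering is the toy user-space sketch below. It is not code from the BFS patch: the names `SMT_SHARED_PCT`, `effective_share_pct` and `smt_meter_tick` are hypothetical, and adding the 25% on top of a nice-based percentage is only one possible reading of the description above. The point it illustrates is the dithering: a credit accumulator spreads the allowed share evenly over ticks instead of handing it out as one burst.

```c
#include <assert.h>
#include <stdbool.h>

/* The 25% figure comes from the post; the macro name is hypothetical. */
#define SMT_SHARED_PCT 25

/*
 * Assumed model: the share a task may use on the sibling is its nice-based
 * share plus the shared 25%, capped at 100%.
 */
static int effective_share_pct(int nice_share_pct)
{
    assert(nice_share_pct >= 0 && nice_share_pct <= 100);
    int pct = nice_share_pct + SMT_SHARED_PCT;
    return pct > 100 ? 100 : pct;
}

/* Credit accumulator used to dither the share across ticks. */
struct smt_meter {
    int credit;
};

/*
 * Called once per tick. Returns true if the task may run on the sibling
 * during this tick. Over any 100 ticks roughly share_pct of them return
 * true, and they are spread out rather than bunched together.
 */
static bool smt_meter_tick(struct smt_meter *m, int share_pct)
{
    m->credit += share_pct;
    if (m->credit >= 100) {
        m->credit -= 100;
        return true;
    }
    return false;
}
```

In this model a task with no nice-based entitlement at all still runs on one tick in four, so it never waits more than three ticks in a row, which is the latency benefit the dithering is meant to deliver.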

Making SMT nice configurable means users can choose whether they still want the standard behaviour. The config option recommends that users who enable the SMT scheduler option also enable the SMT nice option. I believe this to be a good default choice for virtually all desktop users, and selectively for server users if they depend heavily on 'nice' or scheduling policies for their workloads (otherwise it should be disabled).

BFS by itself:
3.16-sched-bfs-450.patch
3.16-ck1 branded BFS patchset directory:
3.16-ck1

EDIT: A build fix for kernels without SMT enabled, which prevents SMT nice from being selectable there, is here:
bfs450-nosmt-buildfix.patch
Just disabling SMT nice will achieve the same thing for those affected.


Enjoy!
Please enjoy!

51 comments:

  1. Thanks. For now you can disable "SMT (Hyperthreading) aware nice priority and policy support" in the processor settings to fix it without any detriment. I will provide a build fix shortly.

    1. Awesome! Thanks so much for your help.

  2. @ck
    In 0450, I can see that the tsk_is_polling check has been removed entirely from resched_task(). But debug code shows that the TIF_POLLING_NRFLAG bit is set when checked in resched_task(), about 10000 times in 2 minutes.
    [ 116.728800] bfs: resched_task 9571
    Could you give a further hint as to why this check was removed in BFS? Thanks.

    1. Still trying to decide if it even means anything on BFS without wake lists.

  3. Don't know how, but it seems that BFS breaks the i8k module for my laptop. It simply doesn't work.

    Reverting BFS fixes the issue.

    The only change to the i8k module between 3.15 and 3.16 is this commit:

    http://permalink.gmane.org/gmane.linux.kernel.commits.head/465805

    And it's related to the CPU scheduler.

    Con?

  4. Well done!
    Thanks to CK
    and to Chen for looking into the code
    Greetings,
    Ralph from Hamburg

  5. Also, BFSv450 still triggers the ath9k issue, but after dancing around it I guess I've found a solution:

    https://github.com/pfactum/pf-kernel/commit/85b98dfc2279d478c6aea72d7fac082fab28f55a

    At least, it doesn't hang for now. Will test more.

    1. Well this got me thinking and I've come up with this patch instead. Can you try it instead please for your ath9k problem?
      bfs450-sched-ipi.patch

    2. I've reverted my patch and applied yours.

      With your patch ksoftirqd jumps from 0 to 100% CPU usage periodically, but now it doesn't hang the whole system, just slows it down, and Wi-Fi works.

      With my patch ksoftirqd doesn't jump to high CPU usage and everything works OK too without slowdown.

      I guess that's the right direction. I'd be happy to test more patches.

    3. Just to note, I realize my patch is a dirty hack, and the real solution should be elsewhere.

    4. @pf
      Good find. scheduler_ipi() was removed by my patch [BFS] Remove runqueue wake_list. I did miss the preempt_fold_need_resched() call in it. ck's patch fixes that.
      Do you have a working older kernel version (maybe 3.13 or 3.14) to jump back to and check the ksoftirqd behaviour?

  6. Everything just worked before 3.14.

    I guess preempt_fold_need_resched() is the right fix, but it's only part of the problem.

    BTW, Con, do you remember replacing cond_resched() with schedule() in KVM code to make it work properly with BFS? The symptom was similar to the ath9k one: 100% of CPU used by ksoftirqd.

  7. Also, Alfred, you may take a look at this thread:

    http://lists.tuxonice.net/pipermail/tuxonice-devel/2014-August/007509.html

    (many emails spanning several months).

    There's a detailed description of the ath9k issue there.

  8. Also, I'd like to share with you more traces captured with "perf top" during the ath9k+BFS hang (without any patches).

    1. all CPU-consuming kernel functions: http://habrastorage.org/files/ae0/70f/8c3/ae070f8c3cba46a6a041891de8cb10d9.JPG

    2–13. disassembled functions (those at the top of the list):

    * http://habrastorage.org/files/ab3/b7a/c6a/ab3b7ac6a81c44e88e06dfd9908ea745.JPG
    * http://habrastorage.org/files/750/f75/212/750f7521230a46e2b29ce2ace3f850ed.JPG
    * http://habrastorage.org/files/758/c07/aee/758c07aee9784cdda9c9df30d429a887.JPG
    * http://habrastorage.org/files/1b6/0b2/7b9/1b60b27b9325494597c06792cba6b6a9.JPG
    * http://habrastorage.org/files/f72/6ea/d98/f726ead980cc475e9b394fa7a6b7f40f.JPG
    * http://habrastorage.org/files/9c6/d29/1f5/9c6d291f59ef429c9e444348bb5ef32e.JPG
    * http://habrastorage.org/files/9d7/8a9/232/9d78a92326ab4df9b4661a2f1e297889.JPG
    * http://habrastorage.org/files/3e1/07b/f15/3e107bf159154122a57b6f81b8b20080.JPG
    * http://habrastorage.org/files/b23/a4f/f55/b23a4ff55126404ea1160f59ed3873c7.JPG
    * http://habrastorage.org/files/89e/337/034/89e3370342be42aab95075a8fb198062.JPG
    * http://habrastorage.org/files/56f/a44/1a9/56fa441a97644c439f127916050da99c.JPG
    * http://habrastorage.org/files/494/5f9/c7e/4945f9c7e3f348f4951f86e65551f204.JPG

    1. Thanks PF. I assume you're saying the hang still happens despite the last patch I posted?

    2. @ck do you mean bfs450-sched-ipi.patch? No. I'll describe once again.

      With your last patch ksoftirqd CPU usage jumps from 0 to 100% periodically, but now it doesn't hang the whole system, just slows it down, and Wi-Fi works.

      I posted a bunch of JPEGs for the case *without* your patch just to help you debug the issue, as it still persists *with* your patch (though to a lesser extent and without hangs).

    3. @pf
      Would you apply this patch on top of bfs450-sched-ipi.patch and see if it helps with the ksoftirqd CPU usage issue?
      It enables the mainline TIF_POLLING_NRFLAG checking routines, which should help with IPIs somehow, but I am not sure whether it helps with the ath9k module.

      https://bitbucket.org/alfredchen/linux-gc/downloads/0450-enable-polling-check.patch

    4. @Alfred, I've applied bfs450-sched-ipi.patch and 0450-enable-polling-check.patch, and that doesn't fix the ath9k issue. The system behaves the same as without your 0450-enable-polling-check.patch.

    5. @pf, thanks for testing anyway.

    6. PF thanks for your tireless testing.

      Can you give this crazy patch a try on top of sched ipi please?
      bfs450-tifcheck_in_cond_resched.patch

    7. @ck it works :).

      So, I guess, the working solution on top of bare -ck1 is:

      1. https://github.com/pfactum/pf-kernel/commit/44b3e870e656a11aa7116c236b7e00591141a68a — brings back scheduler_ipi()

      2. https://github.com/pfactum/pf-kernel/commit/6a180442f154c5a624ee377dacfcc0b8631eb1e0 — uses tif_need_resched() in cond_resched()

      Also I've reverted KVM workarounds here:

      3. https://github.com/pfactum/pf-kernel/commit/ad4d566baf9a825f41240ce1785096028fdacd45

      KVM+QEMU works OK.

      Also, we don't need special i8k workaround. I've reverted it, and i8k seems to work as well.

      I'm going to test it more, but now everything seems to be OK. Thanks!

    8. Thanks PF. I'd spent the last couple of days auditing code to see what might be responsible and that was the only solution I could come up with. The behaviour with this patch is definitely correct, but it's a bit disappointing because it means there's something fundamentally different in BFS handling the resched flag compared to mainline and I didn't intend to start diverting from mainline in this way. I'll keep auditing the code to see if there's an obvious trigger to act on this flag in a different place that I've missed but it's fair to say this is a sane solution for the time being and if I can't come up with anything, I'll just run with it.

    9. OK, if there's a need to test more patches, feel free to send them to me.

    10. @PF: Here's a crazy thing. Try with only the sched-ipi patch and disable all preempt completely in your config and see what that does please.

    11. @PF: And after that try this combination:
      bfs450-resched-scap.patch
      bfs450-sched-ipi.patch
      +
      bfs450-add-preempt-resched.patch

    12. This comment has been removed by the author.

    13. @ck: applying sched-ipi patch only and disabling preemption completely does the trick — ath9k seems to work OK, but i8k doesn't work.

    14. @ck: also the last combination of patches works OK (both ath9k and i8k) with preemption enabled.

    15. Aha! Now we're talking! This last set of patches is the correct fix (unlike the tifcheck patch). Let's try it for a day or two and then I can formalise these changes as a new BFS if nothing shows up. Thanks for testing!

    16. @ck: if that is the correct fix, we kindly ask you to explain what happened :).

    17. @ck: also, with the last set of patches we do not need the KVM workaround either. I've reverted it too.

    18. I can confirm that the last set of patches fixes the ath9k issue.
      I'll report back if I encounter any issues.

      Kudos to pf and ck for solving this issue, and big thanks!

    19. @PF: Since Linux 3.13, setting just the "tif needs resched" flag alone is not enough to trigger a descheduling from certain places in the code. The "preempt needs resched" flag also has to be set to trigger a different type of descheduling, one that hands over to another process or kicks a task off a CPU where it should no longer be.
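The two-flag behaviour described above can be pictured with a toy user-space model. It is a sketch under stated assumptions, not kernel code: `struct cpu_model` and every `model_*` function are hypothetical names, and the only thing assumed about the real kernel is what this thread says, namely that a per-task "tif needs resched" flag and a separate "preempt needs resched" flag exist, and that preempt_fold_need_resched() copies the former into the latter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the state one CPU consults before rescheduling. */
struct cpu_model {
    bool tif_need_resched;     /* per-task flag ("tif needs resched") */
    bool preempt_need_resched; /* per-CPU flag ("preempt needs resched") */
    int preempt_count;         /* > 0 means preemption is disabled */
};

/* A scheduler that only marks the task, as in the failing case. */
static void model_set_tif_need_resched(struct cpu_model *c)
{
    c->tif_need_resched = true;
}

/* The folding step: propagate the task flag into the per-CPU flag. */
static void model_fold_need_resched(struct cpu_model *c)
{
    if (c->tif_need_resched)
        c->preempt_need_resched = true;
}

/* A check that looks only at the per-CPU flag. */
static bool model_should_resched(const struct cpu_model *c)
{
    assert(c->preempt_count >= 0);
    return c->preempt_count == 0 && c->preempt_need_resched;
}

/* A check that looks at the task flag directly (the tifcheck idea). */
static bool model_should_resched_tif(const struct cpu_model *c)
{
    assert(c->preempt_count >= 0);
    return c->preempt_count == 0 && c->tif_need_resched;
}
```

In this model, a caller that sets only the task flag leaves `model_should_resched()` returning false indefinitely, which matches the symptom of a thread spinning without ever yielding. Either folding the flag or checking the task flag directly makes the reschedule happen; the thread settles on setting both flags at the source rather than changing the check.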

    20. ck > "Let's try it for a day or two and then I can formalise these changes as a new BFS if nothing shows up."

      I hope you will also release the fixes for previous kernel versions, either as new BFS releases or as a cumulative patch, for example for 3.15 and for the current LTS, 3.14.

      thanks,
      bye, NicCo

  9. The recurring theme is that _cond_resched no longer works properly in BFS. It presents as a different bug in the i8k module, which does not unconditionally reschedule when the affinity changes, but it is the same issue as the ath9k tasklet not properly rescheduling and ksoftirqd spinning without rescheduling. Now to go back to 3.13 and see what changed at that time and how it broke.

  10. Hi Con,

    I just wanted to comment that my system is running fine with 3.16.1 and BFS+SMT on an i7. Hibernate and suspend work trouble-free (now with an Intel Wireless 7260 card rather than the ath9k module). I had expected a higher load value, but that was not the case. The NFS server on my machine gives the same throughput as without BFS, and that is even enough to stream some HD videos over wireless. Make operations may take somewhat longer now, but that was the goal ;)
    In other words, no drawbacks.

    So thanks for your work.
    Regards sysitos

  11. So, if I read correctly, the patches:
    (1) http://ck.kolivas.org/patches/bfs/3.0/3.16/test/bfs450-sched-ipi.patch
    (2) http://ck.kolivas.org/patches/bfs/3.0/3.16/test/bfs450-tifcheck_in_cond_resched.patch
    are of benefit?
    And this one is not needed?:
    (3) http://ck.kolivas.org/patches/bfs/3.0/3.16/test/bfs450-resched-scap.patch

    Do these patches "only" (but thankfully!) heal the issues post-factum and others reported, or are they considered as bug-fixes for BFS?

    @ Alfred Chen: Would you recommend the patches 1-2 to be applied onto your 3.16.y-gc patched kernel, too?

    Best regards, Manuel Krause

    1. All 3 are bugfixes for everyone, but the patches have not been finalised. If you have no behavioural issues you are unlikely to see anything by applying them.

    2. Hehe, Con, you're getting funny... My "behavioural issues"...? ;-) Besides tracking and applying your patches...? ^^ Keep up this sense of humor!
      No, really funny, and self-ironical for me!

      I still want to read Alfred Chen on this, too, as I currently run his "old" 3.16.y-gc patches, so far, and he hasn't published an updated one for your, Con's, 3.16 release yet.

      Manuel Krause

    3. Can the same 3 bugfix patches be applied to kernel 3.15.x?

      thanks
      NicCo

    4. Yes, they apply equally there too, except for bfs450-resched-scap.patch

    5. The NEW patch set also works well on an unaffected system with a 3.16.y-gc patched kernel,
      applied on top of it here:
      bfs450-resched-scap.patch
      bfs450-sched-ipi.patch
      bfs450-add-preempt-resched.patch (No.1 with fuzz o.k, No.8 failed, as already removed o.k)

      I hope Alfred Chen does consider this safe... ^^

      Thank you all, and best regards,
      Manuel Krause

      BTW, I knew the "behavioural issues" are meant regarding my system, that's why I found it so funny as it can have double meaning for real life..

    6. I'm watching this thread. I will update my -gc branch by rebasing on 0450 and syncing with 3.16.2 from mainline, hopefully next week. As ck said, debugging is not finished, so I will not include these 3 patches; you can apply the updated ones yourself if you are affected by similar issues.

  12. By "behavioural issues" he means the way your system behaves.
