Previously: v4.19.
Linux kernel v4.20 has been released today! Looking through the changes, here are some security-related things I found interesting:
stackleak plugin
Alexander Popov’s work to port the grsecurity STACKLEAK plugin to the upstream kernel came to fruition. While it had received Acks from the x86 (and arm64) maintainers, it was rejected a few times by Linus. With everything now matching Linus’s expectations, the plugin and the x86 glue have landed. (The arch-specific portions for arm64 from Laura Abbott actually landed in v4.19.)
The plugin instruments functions that have a sufficiently large stack frame so that they record the maximum depth of the stack used during a syscall. With this information, the stack can be poisoned efficiently at the end of each syscall: instead of clearing the entire stack, only the portion that was actually used during the syscall needs to be written. Wiping the stack after every syscall has two main benefits. First, there are no longer “uninitialized” values left over on the stack that an attacker might be able to use in the next syscall. Second, the lifetime of any sensitive data on the stack is reduced to the syscall itself. This matters mostly because any information exposure or side-channel attack from other kernel threads now has to be timed much more carefully to catch the stack data before it gets wiped.
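To make the erase step concrete, here is a small userspace simulation in C. It is a sketch under stated assumptions, not the plugin’s code: the stack is an array of words growing toward index 0, `sim_track_stack()` stands in for the tracking call the plugin inserts into functions with large frames, and `SIM_POISON`, `SIM_SEARCH_WORDS`, and every other `sim_` name are hypothetical. Because untracked small frames can dip below the recorded watermark, the erase first walks down to a run of poison words (a simplification of how the real erase code finds the poison boundary) and then poisons only from there up to the current stack pointer.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_STACK_WORDS  256
#define SIM_POISON       0xbeefbeefUL  /* hypothetical poison pattern */
#define SIM_SEARCH_WORDS 4             /* hypothetical poison-run length */

struct sim_stack {
    unsigned long words[SIM_STACK_WORDS];
    size_t sp;      /* index of the top of stack; the stack grows toward 0 */
    size_t lowest;  /* lowest tracked sp since the last erase */
};

static void sim_stack_init(struct sim_stack *s)
{
    for (size_t i = 0; i < SIM_STACK_WORDS; i++)
        s->words[i] = SIM_POISON;
    s->sp = SIM_STACK_WORDS;
    s->lowest = SIM_STACK_WORDS;
}

/* Stand-in for the call inserted into functions with large stack frames. */
static void sim_track_stack(struct sim_stack *s)
{
    if (s->sp < s->lowest)
        s->lowest = s->sp;
}

/* A function frame: reserve n words of locals and fill them with data. */
static void sim_push_frame(struct sim_stack *s, size_t n, unsigned long fill)
{
    assert(n <= s->sp);
    s->sp -= n;
    for (size_t i = 0; i < n; i++)
        s->words[s->sp + i] = fill;
}

static void sim_pop_frame(struct sim_stack *s, size_t n)
{
    assert(s->sp + n <= SIM_STACK_WORDS);
    s->sp += n;     /* the data stays behind in memory, like a real stack */
}

/* At syscall exit: find where untouched (poisoned) stack begins below the
 * tracked watermark, poison everything from there up to sp, and reset the
 * watermark. Returns the number of words written. */
static size_t sim_stackleak_erase(struct sim_stack *s)
{
    size_t p = s->lowest;
    size_t poison_run = 0;
    size_t written = 0;

    while (p > 0 && poison_run < SIM_SEARCH_WORDS) {
        p--;
        poison_run = (s->words[p] == SIM_POISON) ? poison_run + 1 : 0;
    }
    for (size_t i = p; i < s->sp; i++, written++)
        s->words[i] = SIM_POISON;
    s->lowest = s->sp;
    return written;
}
```

With a tracked 40-word frame and a deeper, untracked 8-word frame, the erase in this model writes 52 words rather than the whole 256-word stack, and no data from either frame survives.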
Enabling CONFIG_GCC_PLUGIN_STACKLEAK=y means almost all uninitialized variable flaws go away, with only a very minor performance hit (it appears to be under 1% for most workloads). It’s still possible that, within a single syscall, a later buggy function call could use “uninitialized” bytes left on the stack by an earlier function. Fixing this will need compiler support for pre-initialization (this is already under development for Clang, for example), but that may have larger performance implications.
raise faults for kernel addresses in copy_*_user()
Jann Horn reworked x86 memory exception handling to loudly notice when copy_{to,from}_user() tries to access unmapped kernel memory. Prior to this, those accesses would result in a silent error (usually visible to callers as EFAULT), making them indistinguishable from “regular” userspace memory exceptions. The purpose of this is to catch cases where, for example, the unchecked __copy_to_user() is called against a kernel address. Fuzzers like syzkaller weren’t able to notice very nasty bugs because writes to kernel addresses would either corrupt memory (which may or may not get detected at a later time) or return an EFAULT that looked like things were operating normally. With this change, it’s now much easier to notice missing access_ok() checks. This has already caught two other corner cases, in HID and Xen, even during the v4.20 development cycle.
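The difference between the checked and unchecked helpers, and what the new fault handling changes, can be modelled in a few lines of userspace C. This is a hypothetical simulation, not kernel code: `SIM_TASK_SIZE_MAX`, the `sim_` functions, and the `mapped` flag (standing in for whether the destination page is present) are all assumptions, and the real access_ok() and exception-table logic are more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical user/kernel split, loosely modelled on x86-64. */
#define SIM_TASK_SIZE_MAX 0x00007ffffffff000ULL

enum sim_copy_result {
    SIM_COPY_OK = 0,
    SIM_COPY_EFAULT,        /* quiet failure the caller turns into -EFAULT */
    SIM_COPY_KERNEL_BUG,    /* loud report: a user copy hit a kernel address */
};

/* Simplified access_ok(): the whole range must sit below the split. */
static bool sim_access_ok(uint64_t addr, uint64_t size)
{
    return size <= SIM_TASK_SIZE_MAX && addr <= SIM_TASK_SIZE_MAX - size;
}

/* Models fault handling after the change: an unmapped user address still
 * fails quietly, but a kernel address is reported as a bug. */
static enum sim_copy_result sim_uaccess_fault(uint64_t addr)
{
    if (addr >= SIM_TASK_SIZE_MAX) {
        fprintf(stderr, "sim: user copy faulted on kernel address %#llx\n",
                (unsigned long long)addr);
        return SIM_COPY_KERNEL_BUG;
    }
    return SIM_COPY_EFAULT;
}

/* Unchecked variant: trusts that the caller already ran access_ok(). */
static enum sim_copy_result sim___copy_to_user(uint64_t dst, uint64_t len,
                                               bool mapped)
{
    assert(len > 0);    /* zero-length copies are not modelled here */
    return mapped ? SIM_COPY_OK : sim_uaccess_fault(dst);
}

/* Checked variant: rejects non-user ranges before touching memory. */
static enum sim_copy_result sim_copy_to_user(uint64_t dst, uint64_t len,
                                             bool mapped)
{
    if (!sim_access_ok(dst, len))
        return SIM_COPY_EFAULT;
    return sim___copy_to_user(dst, len, mapped);
}
```

In this model, the checked copy to a kernel address is refused up front, while the unchecked `sim___copy_to_user()` on the same (unmapped) address is flagged as a bug instead of blending in with ordinary EFAULT results.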
spectre v2 userspace mitigation
Support for Single Thread Indirect Branch Predictors (STIBP) has been merged. This lets CPUs that support STIBP effectively disable Hyper-Threading to keep indirect branch prediction side-channels from exposing information between userspace threads on the same physical CPU. Since this is a very expensive solution, the protection was made opt-in (via an explicit prctl() or implicitly under seccomp()). LWN has a nice write-up of the details.
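For the explicit opt-in, a process can use the speculation-control prctl() interface. The sketch below assumes a kernel with the v4.20 `PR_SPEC_INDIRECT_BRANCH` control; the fallback constant values are copied from the kernel’s uapi `prctl.h` in case the libc headers are older, and the `ib_` helper names are hypothetical. Whether the request succeeds depends on the CPU and on the kernel’s mitigation mode, so callers should check the return value.

```c
#include <assert.h>
#include <sys/prctl.h>

/* Fallbacks for older headers (values from include/uapi/linux/prctl.h). */
#ifndef PR_GET_SPECULATION_CTRL
#define PR_GET_SPECULATION_CTRL 52
#endif
#ifndef PR_SET_SPECULATION_CTRL
#define PR_SET_SPECULATION_CTRL 53
#endif
#ifndef PR_SPEC_INDIRECT_BRANCH
#define PR_SPEC_INDIRECT_BRANCH 1
#endif
#ifndef PR_SPEC_NOT_AFFECTED
#define PR_SPEC_NOT_AFFECTED 0
#endif
#ifndef PR_SPEC_PRCTL
#define PR_SPEC_PRCTL (1UL << 0)
#endif
#ifndef PR_SPEC_ENABLE
#define PR_SPEC_ENABLE (1UL << 1)
#endif
#ifndef PR_SPEC_DISABLE
#define PR_SPEC_DISABLE (1UL << 2)
#endif
#ifndef PR_SPEC_FORCE_DISABLE
#define PR_SPEC_FORCE_DISABLE (1UL << 3)
#endif

enum ib_state { IB_UNKNOWN, IB_NOT_AFFECTED, IB_VULNERABLE,
                IB_MITIGATED, IB_FORCE_MITIGATED };

/* Decode the value returned by PR_GET_SPECULATION_CTRL. */
static enum ib_state ib_decode(int status)
{
    if (status < 0)
        return IB_UNKNOWN;              /* prctl failed (e.g. old kernel) */
    if (status == PR_SPEC_NOT_AFFECTED)
        return IB_NOT_AFFECTED;
    if (status & PR_SPEC_FORCE_DISABLE)
        return IB_FORCE_MITIGATED;
    if (status & PR_SPEC_DISABLE)
        return IB_MITIGATED;            /* speculation off, protection on */
    if (status & PR_SPEC_ENABLE)
        return IB_VULNERABLE;
    return IB_UNKNOWN;
}

/* Opt this task in: disable indirect branch speculation. 0 on success. */
static int ib_opt_in(void)
{
    return prctl(PR_SET_SPECULATION_CTRL,
                 (unsigned long)PR_SPEC_INDIRECT_BRANCH,
                 (unsigned long)PR_SPEC_DISABLE, 0UL, 0UL);
}

static enum ib_state ib_query(void)
{
    return ib_decode(prctl(PR_GET_SPECULATION_CTRL,
                           (unsigned long)PR_SPEC_INDIRECT_BRANCH,
                           0UL, 0UL, 0UL));
}
```

A program would typically call `ib_opt_in()` early, then use `ib_query()` to confirm the resulting state rather than assuming the request took effect.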
jump labels read-only after init
Ard Biesheuvel noticed that jump labels don’t need to be writable after initialization, so their data structures were made read-only. Since they point to kernel code, they could otherwise be used by attackers to manipulate jump targets as a way to change kernel code that wasn’t meant to be changed. It’s better to just move everything into the read-only memory region and remove it from the set of possible targets for attackers.
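In the kernel, this roughly means moving the jump-table entries into the region that is made read-only once boot finishes (`__ro_after_init`). The same idea can be illustrated in userspace with mmap() and mprotect(): fill a table while it is writable, then drop write permission so any later write faults. This is only an analogy under stated assumptions; `struct sim_jump_entry` and the `sim_` helpers are hypothetical and do not mirror the kernel’s actual jump-table layout.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical table entry: a patch site and the address it may jump to. */
struct sim_jump_entry {
    unsigned long code;
    unsigned long target;
};

/* Allocate one page for the table; it starts out writable for "init". */
static struct sim_jump_entry *sim_table_alloc(size_t *bytes)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p;

    if (page <= 0)
        return NULL;
    p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *bytes = (size_t)page;
    return p;
}

/* End of init: drop write permission so later tampering faults. */
static int sim_mark_ro_after_init(void *table, size_t bytes)
{
    return mprotect(table, bytes, PROT_READ);
}

static sigjmp_buf sim_probe_env;

static void sim_probe_handler(int sig)
{
    (void)sig;
    siglongjmp(sim_probe_env, 1);
}

/* Returns true if writing to *addr raises SIGSEGV (i.e. it is read-only). */
static bool sim_write_faults(volatile unsigned long *addr)
{
    struct sigaction sa, old;
    bool faulted = false;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sim_probe_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return false;
    if (sigsetjmp(sim_probe_env, 1) == 0)
        *addr = *addr;              /* attempt a harmless write */
    else
        faulted = true;
    sigaction(SIGSEGV, &old, NULL);
    return faulted;
}
```

Once the table has been sealed, its contents remain readable, but an attempt to redirect an entry faults instead of silently succeeding.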
VLA removal finished
As detailed earlier for v4.17, v4.18, and v4.19, a whole bunch of people answered my call to remove Variable Length Arrays (VLAs) from the kernel. I count at least 153 commits having been added to the kernel since v4.16 to remove VLAs, with a big thanks to Gustavo A. R. Silva, Laura Abbott, Salvatore Mesoraca, Kyle Spiers, Tobin C. Harding, Stephen Kitt, Geert Uytterhoeven, Arnd Bergmann, Takashi Iwai, Suraj Jitindar Singh, Tycho Andersen, Thomas Gleixner, Stefan Wahren, Prashant Bhole, Nikolay Borisov, Nicolas Pitre, Martin Schwidefsky, Martin KaFai Lau, Lorenzo Bianconi, Himanshu Jha, Chris Wilson, Christian Lamparter, Boris Brezillon, Ard Biesheuvel, and Antoine Tenart. With all that done, “-Wvla” has been added to the top-level Makefile so we don’t get any more added back in the future.
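The typical shape of a VLA removal is to replace a runtime-sized stack array with a fixed worst-case buffer and an explicit bounds check. The sketch below is hypothetical (`SIM_MAX_BLOCK` and `sim_fold_block()` are invented for illustration); the commented-out line shows the VLA form that -Wvla now warns about.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical worst-case size chosen for this subsystem. */
#define SIM_MAX_BLOCK 64

/*
 * Before (flagged by -Wvla):
 *
 *     unsigned char scratch[len];   // stack size chosen at run time
 *
 * After: a fixed-size buffer plus an explicit bounds check.
 */
static int sim_fold_block(const unsigned char *data, size_t len,
                          unsigned char *out)
{
    unsigned char scratch[SIM_MAX_BLOCK];
    unsigned char acc = 0;

    if (len > sizeof(scratch))
        return -EINVAL;         /* refuse instead of growing the stack */

    /* Stands in for work that needs a temporary copy of the input. */
    memcpy(scratch, data, len);
    for (size_t i = 0; i < len; i++)
        acc ^= scratch[i];
    *out = acc;
    return 0;
}
```

The stack cost is now fixed at compile time, and an oversized length becomes an error the caller can see rather than an attacker-influenced stack allocation.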
per-task stack canaries, powerpc
For a long time, only x86 has had per-task kernel stack canaries. Other architectures would generate a single canary for the life of the boot and use it in every task. This meant that exposing a canary from one task would give an attacker everything they needed to spoof a canary in a separate attack in a different task. Christophe Leroy has solved this on powerpc now, integrating the new GCC support for the -mstack-protector-guard-reg and -mstack-protector-guard-offset options.
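The difference between one boot-wide canary and per-task canaries can be sketched as below. The model is hypothetical: `struct sim_task`, `sim_guard_reg` (standing in for the register named by -mstack-protector-guard-reg), and `sim_guard_offset` (standing in for -mstack-protector-guard-offset) are assumptions, and splitmix64 is only a deterministic placeholder for the kernel’s random canary source. The point is that the compiler loads the canary relative to a register whose target changes on context switch, so each task is checked against its own value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task structure; only the canary field matters here. */
struct sim_task {
    int pid;
    uint64_t stack_canary;
};

/* What the guard register currently points at (updated on switch). */
static struct sim_task *sim_guard_reg;

/* Offset of the canary from that register, as the compiler would use it. */
static const size_t sim_guard_offset = offsetof(struct sim_task, stack_canary);

/* The load the compiler emits in each protected function: reg + offset. */
static uint64_t sim_load_canary(void)
{
    assert(sim_guard_reg != NULL);
    const unsigned char *base = (const unsigned char *)sim_guard_reg;
    return *(const uint64_t *)(base + sim_guard_offset);
}

static void sim_switch_to(struct sim_task *next)
{
    sim_guard_reg = next;
}

/* Epilogue check: does the value saved in the frame match the live canary? */
static bool sim_canary_intact(uint64_t saved_in_frame)
{
    return saved_in_frame == sim_load_canary();
}

/* splitmix64: deterministic placeholder, NOT a secure random source. */
static uint64_t sim_next_random(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

static void sim_task_init(struct sim_task *t, int pid, uint64_t *rng)
{
    t->pid = pid;
    t->stack_canary = sim_next_random(rng);
}
```

In this model, a canary leaked while task 1 is current still passes task 1’s check, but fails once task 2 is current, which a single boot-wide canary could not provide.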
Given the holidays, Linus opened the merge window before v4.20 was released, letting everyone send in pull requests in the week leading up to the release. v4.21 is in the making. :) Happy New Year everyone!
Edit: clarified stackleak details, thanks to Alexander Popov. Added per-task canaries note too.
© 2018 – 2019, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 License.
Hello Kees, thanks for the article!
Let me add two brief remarks:
1. The STACKLEAK plugin adds stack tracking to functions with a big stack frame (>=CONFIG_STACKLEAK_TRACK_MIN_SIZE), not to leaf functions.
2. Within a single syscall, a later buggy function using uninitialized memory can utilize data left on stack by earlier functions.
Best regards!
Comment by Alexander Popov — December 24, 2018 @ 10:24 pm