Previously: v5.4.
I got a bit behind on this blog post series! Let’s get caught up. Here are a bunch of security things I found interesting in the Linux kernel v5.5 release:
restrict perf_event_open() from LSM
Given the recurring flaws in the perf subsystem, there has been a strong desire to be able to entirely disable the interface. While the kernel.perf_event_paranoid sysctl knob has existed for a while, attempts to extend its control to “block all perf_event_open() calls” have failed in the past. Distribution kernels have carried the rejected sysctl patch for many years, but now Joel Fernandes has implemented a solution that was deemed acceptable: instead of extending the sysctl, add LSM hooks so that LSMs (e.g. SELinux, AppArmor, etc) can make these choices as part of their overall system policy.
generic fast full refcount_t
Will Deacon took the recent refcount_t hardening work for both x86 and arm64 and distilled the implementations into a single architecture-agnostic C version. The result was almost as fast as the x86 assembly version, but it covered more cases (e.g. increment-from-zero), and is now available by default for all architectures. (There is no longer any Kconfig associated with refcount_t; the use of the primitive provides full coverage.)
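To make the “increment-from-zero” case concrete, here is a minimal userspace sketch of saturating reference-count semantics using C11 atomics. It is a model, not the kernel code: the `model_*` names are hypothetical, the real implementation also emits warnings, and the saturation value (`INT_MIN / 2`) is assumed here rather than taken from this post.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model; names and the saturation value are assumptions. */
#define MODEL_SATURATED (INT_MIN / 2)

struct model_refcount {
	atomic_int refs;
};

static void model_refcount_set(struct model_refcount *r, int n)
{
	atomic_store(&r->refs, n);
}

static int model_refcount_read(struct model_refcount *r)
{
	return atomic_load(&r->refs);
}

/* Increment, but pin the counter if the old value was suspicious. */
static void model_refcount_inc(struct model_refcount *r)
{
	int old = atomic_fetch_add_explicit(&r->refs, 1, memory_order_relaxed);

	/*
	 * old == 0: the object was already released (use-after-free).
	 * old < 0: the counter was already saturated.
	 * old == INT_MAX: this increment just overflowed.
	 */
	if (old == 0 || old < 0 || old == INT_MAX)
		atomic_store(&r->refs, MODEL_SATURATED);
}

/* Decrement; returns true only when the last reference was dropped. */
static bool model_refcount_dec_and_test(struct model_refcount *r)
{
	int old = atomic_fetch_sub_explicit(&r->refs, 1, memory_order_release);

	if (old == 1) {
		atomic_thread_fence(memory_order_acquire);
		return true;
	}
	/* Underflow or an already saturated counter: pin it and never free. */
	if (old <= 0)
		atomic_store(&r->refs, MODEL_SATURATED);
	return false;
}
```

Once pinned, the counter never reaches zero again in this model, so the object is leaked rather than freed twice. That trade is the point of the hardening: a leak is far less useful to an attacker than a use-after-free.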
linker script cleanup for exception tables
When Rick Edgecombe presented his work on building Execute-Only memory under a hypervisor, he noted a region of memory that the kernel was attempting to read directly (instead of execute). He rearranged things for his x86-only patch series to work around the issue. Since I’d just been working in this area, I realized the root cause of this problem was the location of the exception table (which is strictly a lookup table and is never executed). I built a fix and applied it to all architectures, since it turns out the exception tables for almost all architectures are just a data table. Hopefully this will help clear the path for more Execute-Only memory work on all architectures. In the process, I also updated the section fill bytes on x86 to be a trap (0xCC, int3) instead of a NOP instruction, so functions would need to be targeted more precisely by attacks.
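As a rough illustration of why the exception table is data rather than code, here is a simplified userspace sketch of a sorted fault-address table searched with `bsearch()`. The struct layout and the `model_*` names are hypothetical; real kernel entries are more compact and architecture-specific (for example, holding relative offsets instead of absolute addresses).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified entry: absolute addresses for readability. */
struct model_extable_entry {
	uintptr_t insn;  /* address of an instruction that may fault */
	uintptr_t fixup; /* address to resume at if it does */
};

/* bsearch() comparator: the key is the faulting instruction address. */
static int model_cmp_ex(const void *key, const void *elt)
{
	uintptr_t addr = *(const uintptr_t *)key;
	const struct model_extable_entry *e = elt;

	if (addr < e->insn)
		return -1;
	if (addr > e->insn)
		return 1;
	return 0;
}

/*
 * Look up a faulting address in a table sorted by insn.
 * The table is only ever read; nothing in it is executed.
 */
static const struct model_extable_entry *
model_search_extable(const struct model_extable_entry *table, size_t n,
		     uintptr_t fault_addr)
{
	return bsearch(&fault_addr, table, n, sizeof(*table), model_cmp_ex);
}
```

Because the lookup only loads from the table, placing it inside an execute-only text region would fault on the read. In this model, that is why it belongs with read-only data.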
KASLR for 32-bit PowerPC
Joining many other architectures, Jason Yan added kernel text base-address offset randomization (KASLR) to 32-bit PowerPC.
seccomp for RISC-V
After a bit of a long road, David Abdurachmanov has added seccomp support to the RISC-V architecture. The series uncovered some more corner cases in the seccomp selftest code, which is always nice since then we get to make it more robust for the future!
seccomp USER_NOTIF continuation
When the seccomp SECCOMP_RET_USER_NOTIF interface was added, it seemed like it would only be used in very limited conditions, so the idea of needing to handle “normal” requests didn’t seem very onerous. However, since then, it has become clear that having a monitor process perform lots of “normal” open() calls on behalf of the monitored process is slow and fragile. To deal with this, there needed to be a way for the USER_NOTIF interface to indicate that seccomp should just continue as normal and allow the syscall without any special handling. Christian Brauner implemented SECCOMP_USER_NOTIF_FLAG_CONTINUE to get this done. It comes with a bit of a disclaimer due to the chance that monitors may use it in places where ToCToU is a risk, and for possible conflicts with SECCOMP_RET_TRACE. But overall, this is a net win for container monitoring tools.
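Here is a hedged sketch of how a monitor might fill in its reply when it decides a syscall needs no special handling. To stay self-contained it uses a local mirror of the UAPI response structure, and the `model_*` names are hypothetical. Real code would include `<linux/seccomp.h>`, use `struct seccomp_notif_resp` and `SECCOMP_USER_NOTIF_FLAG_CONTINUE`, and send the reply with the `SECCOMP_IOCTL_NOTIF_SEND` ioctl on the listener fd.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the UAPI response layout (an assumption of this sketch). */
struct model_notif_resp {
	uint64_t id;    /* cookie copied from the notification being answered */
	int64_t val;    /* syscall return value, when the monitor emulates it */
	int32_t error;  /* negative errno, when the monitor denies it */
	uint32_t flags;
};

/* Mirrors SECCOMP_USER_NOTIF_FLAG_CONTINUE. */
#define MODEL_USER_NOTIF_FLAG_CONTINUE (1UL << 0)

/* Build a reply: either let the kernel run the syscall, or deny it. */
static struct model_notif_resp model_build_resp(uint64_t req_id, bool allow)
{
	struct model_notif_resp resp;

	memset(&resp, 0, sizeof(resp));
	resp.id = req_id;
	if (allow) {
		/* With CONTINUE set, error and val must both stay zero. */
		resp.flags = MODEL_USER_NOTIF_FLAG_CONTINUE;
	} else {
		resp.error = -EPERM;
	}
	return resp;
}
```

The ToCToU caveat applies to the `allow` decision itself. Any pointer arguments the monitor inspected can be rewritten by the monitored process before the kernel re-reads them, so continuing is only safe for calls that some other mechanism would have permitted anyway.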
EFI_RNG_PROTOCOL for x86
Some EFI systems provide a Random Number Generator interface, which is useful for gaining some entropy in the kernel during very early boot. The arm64 boot stub has been using this for a while now, but Dominik Brodowski has now added support for x86 to do the same. This entropy is useful for kernel subsystems performing very early initialization where random numbers are needed (like randomizing aspects of the SLUB memory allocator).
FORTIFY_SOURCE for MIPS
As has been enabled on many other architectures, Dmitry Korotin got MIPS building with CONFIG_FORTIFY_SOURCE, so compile-time (and some run-time) buffer overflows during calls to the memcpy() and strcpy() families of functions will be detected.
limit copy_{to,from}_user() size to INT_MAX
As done for VFS, vsnprintf(), and strscpy(), I went ahead and limited the size of copy_to_user() and copy_from_user() calls to INT_MAX in order to catch any weird overflows in size calculations.
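A small userspace sketch of the idea: an unsigned size calculation that underflows produces an enormous `size_t`, and a cap at INT_MAX rejects it before any copying starts. The `model_*` helpers are hypothetical stand-ins, not the kernel's functions.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Reject any copy length above INT_MAX (models the new sanity check). */
static bool model_copy_size_ok(size_t bytes)
{
	return bytes <= (size_t)INT_MAX;
}

/*
 * A deliberately buggy length calculation: if header > total, the
 * unsigned subtraction wraps around to a value near SIZE_MAX.
 */
static size_t model_payload_len(size_t total, size_t header)
{
	return total - header;
}
```

For example, `model_payload_len(8, 16)` wraps to a value just below `SIZE_MAX`, which `model_copy_size_ok()` refuses, while ordinary sizes such as 4096 pass untouched.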
Other things
Alexander Popov pointed out some more v5.5 features that I missed in this blog post. I’m repeating them here, with some minor edits/clarifications. Thank you Alexander!
- KASan support for vmap memory expands KASan’s features to include analysis of memory allocated with vmalloc(). More specifically, this means KASan can examine the stack again, since it can be in that region since CONFIG_VMAP_STACK was introduced.
- MIPS can build with GCC plugins and KCOV now, providing those systems with the associated features like stack wiping, coverage analysis, etc.
- userfaultfd requires CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK. This was done to block unprivileged users from abusing the userfaultfd feature, as it is implemented in a way that provides the ability to inject file descriptors into unsuspecting processes.
Edit: added Alexander Popov’s notes
That’s it for v5.5! Let me know if there’s anything else that I should call out here. Next up: Linux v5.6.
© 2020, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 License.
\o/ Thanks for catching up on those summaries :-D
With so many code additions per release, it’s good to see some folks focusing on security as well.
Comment by Chris — June 5, 2020 @ 2:53 am