Previously: v5.9
Linux v5.10 was released in December 2020. Here’s my summary of various security things that I found interesting:
AMD SEV-ES
While guest VM memory encryption with AMD SEV has been supported for a while, Joerg Roedel, Thomas Lendacky, and others added register state encryption (SEV-ES). This means it’s even harder for a VM host to reconstruct a guest VM’s state.
x86 static calls
Josh Poimboeuf and Peter Zijlstra implemented static calls for x86, which operate very similarly to the “static branch” infrastructure in the kernel. With static branches, an if/else choice can be hard-coded instead of being evaluated at run time every time. Such branches can be updated too (the kernel just rewrites the code to switch around the “branch”). All these principles apply to static calls as well, but they’re for replacing indirect function calls (i.e. a call through a function pointer) with a direct call (i.e. a hard-coded call address). This eliminates the need for Spectre mitigations (e.g. RETPOLINE) for these indirect calls, and avoids a memory lookup for the pointer. For hot-path code (like the scheduler), this has a measurable performance impact. It also serves as a kind of Control Flow Integrity implementation: an indirect call is removed, and the potential destinations have been explicitly identified at compile time.
network RNG improvements
In an effort to improve the pseudo-random number generator used by the network subsystem (for things like port numbers and packet sequence numbers), Linux’s home-grown pRNG has been replaced by the SipHash round function, and perturbed by (hopefully) hard-to-predict internal kernel states. This should make it very hard to brute force the internal state of the pRNG and make predictions about future random numbers just from examining network traffic. Similarly, ICMP’s global rate limiter was adjusted to avoid leaking details of network state, as a start to fixing recent DNS Cache Poisoning attacks.
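To make the idea concrete, here is a minimal userspace sketch of a generator whose state is four 64-bit words stirred by the SipHash round function, with a hook for mixing in hard-to-predict noise. The names (sip_prng_seed(), sip_prng_perturb(), sip_prng_next()), the choice of two rounds per output, and the way the output bits are taken are assumptions for illustration; the kernel’s actual generator differs in its details.

```c
#include <stdint.h>

/* Rotate a 64-bit word left by b bits (0 < b < 64). */
static inline uint64_t rol64(uint64_t v, unsigned int b)
{
	return (v << b) | (v >> (64 - b));
}

/* One SipHash round ("SipRound") over the four 64-bit state words. */
#define SIPROUND(v0, v1, v2, v3) do {                              \
	v0 += v1; v1 = rol64(v1, 13); v1 ^= v0; v0 = rol64(v0, 32); \
	v2 += v3; v3 = rol64(v3, 16); v3 ^= v2;                     \
	v0 += v3; v3 = rol64(v3, 21); v3 ^= v0;                     \
	v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32); \
} while (0)

/* Hypothetical generator state: four 64-bit words, as in SipHash. */
struct sip_prng {
	uint64_t v0, v1, v2, v3;
};

/* Seed from a 128-bit key using SipHash's initialization constants,
 * so even an all-zero key yields a non-zero state. */
static void sip_prng_seed(struct sip_prng *s, uint64_t k0, uint64_t k1)
{
	s->v0 = k0 ^ 0x736f6d6570736575ULL;
	s->v1 = k1 ^ 0x646f72616e646f6dULL;
	s->v2 = k0 ^ 0x6c7967656e657261ULL;
	s->v3 = k1 ^ 0x7465646279746573ULL;
}

/* Fold hard-to-predict "noise" (e.g. timing data) into the state. */
static void sip_prng_perturb(struct sip_prng *s, uint64_t noise)
{
	s->v3 ^= noise;
	SIPROUND(s->v0, s->v1, s->v2, s->v3);
	s->v0 ^= noise;
}

/* Produce 32 bits: two rounds, then expose only part of the state. */
static uint32_t sip_prng_next(struct sip_prng *s)
{
	SIPROUND(s->v0, s->v1, s->v2, s->v3);
	SIPROUND(s->v0, s->v1, s->v2, s->v3);
	return (uint32_t)((s->v1 + s->v3) >> 32);
}
```

Each call reveals only 32 bits of a 256-bit state that keeps getting perturbed, which is the property that makes reconstructing the state from observed outputs hard.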
SafeSetID handles GID
Thomas Cedeno improved the SafeSetID LSM to handle group IDs (which required teaching the kernel about which syscalls were actually performing setgid). Like the earlier setuid policy, this lets the system owner define an explicit list of allowed group ID transitions under CAP_SETGID (instead of to just any group), providing a way to grant this capability with much more limited power. (This isn’t complete yet, though, since handling setgroups() is still needed.)
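A simplified model of the allowlist lookup follows as a hypothetical userspace sketch. The rule table and function names are made up, the sketch only covers a group that the policy restricts, and it is neither SafeSetID’s code nor its policy file format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* One permitted transition: a task running as from_gid may switch to to_gid. */
struct gid_rule {
	gid_t from_gid;
	gid_t to_gid;
};

/* For a restricted group, only listed transitions are allowed, even though
 * the task holds CAP_SETGID; everything else is denied. */
static bool gid_transition_allowed(const struct gid_rule *rules, size_t n,
				   gid_t from, gid_t to)
{
	for (size_t i = 0; i < n; i++) {
		if (rules[i].from_gid == from && rules[i].to_gid == to)
			return true;
	}
	return false;
}
```

With rules { 1000 → 1001, 1000 → 1002 }, a task in group 1000 can move to 1001 or 1002, but not to 0.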
improve kernel’s internal checking of file contents
The kernel provides LSMs (like the Integrity subsystem) with details about files as they’re loaded. (For example, loading modules, new kernel images for kexec, and firmware.) There wasn’t very good coverage for cases where the contents were coming from things that weren’t files. To deal with this, new hooks were added that allow the LSMs to introspect the contents directly, and to do partial reads. This will give the LSMs much finer-grained visibility into these kinds of operations.
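As a rough illustration of why partial reads matter to a policy module, this hypothetical userspace sketch accumulates a digest over contents delivered in chunks and decides only once everything has been seen. FNV-1a is a non-cryptographic stand-in for whatever hash a real module (such as IMA) would use, and none of these names are kernel hook names.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, 64-bit: a simple NON-cryptographic stand-in for a real hash. */
#define FNV64_OFFSET 0xcbf29ce484222325ULL
#define FNV64_PRIME  0x100000001b3ULL

/* Hypothetical per-load measurement state kept by a policy module. */
struct measure_ctx {
	uint64_t digest;
	size_t bytes_seen;
};

static void measure_init(struct measure_ctx *ctx)
{
	ctx->digest = FNV64_OFFSET;
	ctx->bytes_seen = 0;
}

/* Called once per chunk as the contents arrive piecewise. */
static void measure_chunk(struct measure_ctx *ctx, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	for (size_t i = 0; i < len; i++) {
		ctx->digest ^= p[i];
		ctx->digest *= FNV64_PRIME;
	}
	ctx->bytes_seen += len;
}

/* Final decision after the last chunk: 0 allows, -1 denies. */
static int measure_verdict(const struct measure_ctx *ctx, uint64_t expected)
{
	return ctx->digest == expected ? 0 : -1;
}
```

Because the digest is streamed, measuring the contents in several chunks gives the same result as measuring them in one piece, so the policy sees everything regardless of how the read was split.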
set_fs removal continues
With the earlier work landed to free the core kernel code from set_fs(), Christoph Hellwig made it possible for set_fs() to be optional for an architecture. He then removed set_fs() entirely for x86, riscv, and powerpc. These architectures will now be free from the entire class of “kernel address limit” attacks that only needed to corrupt a single value in struct thread_info.
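Here is a hypothetical sketch of the check that such an address limit used to feed. Every user-copy range test compared against a per-thread limit, so a single write of an “everything allowed” value turned all those tests into no-ops. The constants and names below are illustrative assumptions, not the kernel’s definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative limits only (loosely x86-64-shaped, not exact kernel values). */
#define USER_DS_SKETCH   0x00007ffffffff000ULL /* top of user address space */
#define KERNEL_DS_SKETCH UINT64_MAX            /* "everything is allowed" */

/* Hypothetical stand-in for the per-thread state that held the limit. */
struct thread_sketch {
	uint64_t addr_limit;
};

/* access_ok()-style check: does [addr, addr + size) stay below the limit?
 * Written so that addr + size cannot overflow. */
static bool access_ok_sketch(const struct thread_sketch *t,
			     uint64_t addr, uint64_t size)
{
	return size <= t->addr_limit && addr <= t->addr_limit - size;
}
```

With the user limit in place, a kernel address is rejected; after one corrupted write sets the limit to KERNEL_DS_SKETCH, the same address passes. With set_fs() gone there is no such value left to corrupt.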
sysfs_emit() replaces sprintf() in /sys
Joe Perches tackled one of the most common bug classes with sprintf() and snprintf() in /sys handlers by creating a new helper, sysfs_emit(). This will handle the cases where kernel code was not correctly dealing with the length results from sprintf() calls, which might lead to buffer overflows in the PAGE_SIZE buffer that /sys handlers operate on. With the helper in place, it was possible to start refactoring the many sprintf() callers.
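The trap is that snprintf() returns the length it would have written, not what it actually wrote, so a pattern like len += snprintf(buf + len, PAGE_SIZE - len, ...) can push len past the end of the buffer, after which PAGE_SIZE - len wraps around to a huge size. Below is a minimal userspace sketch of the safer approach, returning only what was actually stored. vscnprintf_sketch() and emit_at_sketch() are hypothetical names; unlike them, the real sysfs_emit() and sysfs_emit_at() take no size argument because they assume the single-page /sys buffer.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Return how many characters were actually stored in buf (excluding the
 * NUL), never the would-be length vsnprintf() reports on truncation. */
static int vscnprintf_sketch(char *buf, size_t size, const char *fmt, va_list args)
{
	int ret;

	if (size == 0)
		return 0;
	ret = vsnprintf(buf, size, fmt, args);
	if (ret < 0) {
		buf[0] = '\0';
		return 0;
	}
	return (size_t)ret >= size ? (int)(size - 1) : ret;
}

/* Append at offset "at" inside a buffer of "size" bytes, never writing
 * past the end; a full buffer simply makes further calls no-ops. */
static int emit_at_sketch(char *buf, size_t size, size_t at, const char *fmt, ...)
{
	va_list args;
	int len;

	if (at >= size)
		return 0;
	va_start(args, fmt);
	len = vscnprintf_sketch(buf + at, size - at, fmt, args);
	va_end(args);
	return len;
}
```

Summing the return values of emit_at_sketch() can never move the offset past size - 1, so the “offset ran off the end” bug cannot happen.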
nosymfollow mount option
Mattias Nissler and Ross Zwisler implemented the nosymfollow mount option. This entirely disables symlink resolution for the given filesystem, similar to other mount options where noexec disallows execve(), nosuid disallows setuid/setgid bits, and nodev disallows device files. Quoting the patch, it is “useful as a defensive measure for systems that need to deal with untrusted file systems in privileged contexts.” (i.e. for when /proc/sys/fs/protected_symlinks isn’t a big enough hammer.) Chrome OS uses this option for its stateful filesystem, as symlink traversal has been a common attack-persistence vector.
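For illustration, a program that mounts an untrusted filesystem might combine the new flag with the older ones. This is a sketch: the helper names are hypothetical, MS_NOSYMFOLLOW may be missing from older C library headers (hence the fallback definition, using the value from the kernel’s UAPI header), and actually calling mount(2) requires privilege.

```c
#include <stddef.h>
#include <sys/mount.h>

#ifndef MS_NOSYMFOLLOW
#define MS_NOSYMFOLLOW 256 /* value from the kernel's UAPI <linux/mount.h> */
#endif

/* Flags for a filesystem whose contents are not trusted: no set-id bits,
 * no device files, no execution, and no symlink resolution. */
static unsigned long untrusted_mount_flags(void)
{
	return MS_NOSUID | MS_NODEV | MS_NOEXEC | MS_NOSYMFOLLOW;
}

/* Returns 0 on success, or -1 with errno set (e.g. EPERM without privilege). */
static int mount_untrusted(const char *source, const char *target,
			   const char *fstype)
{
	return mount(source, target, fstype, untrusted_mount_flags(), NULL);
}
```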
ARMv8.5 Memory Tagging Extension support
Vincenzo Frascino added support to arm64 for the upcoming Memory Tagging Extension, which will be available on ARMv8.5 and later chips. It provides 4-bit tags, each covering a 16-byte span of memory. This is enough to deterministically eliminate all linear heap buffer overflow flaws (1 tag for “free”, and then rotate even values and odd values for neighboring allocations), which is probably one of the most common bugs currently being exploited. It also makes use-after-free and over/under indexing much more difficult for attackers (but still possible if the target’s tag bits can be exposed). Maybe some day we can switch to 128-bit virtual memory addresses and have fully versioned allocations. But for now, 16 tag values is better than none, though we still need to wait for anyone to actually ship ARMv8.5 hardware.
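The tagging scheme above can be modeled in software. In this toy simulation, the logical tag sits in address bits 59:56, each 16-byte granule carries a memory tag, tag 0 marks free memory, and neighboring allocations alternate between odd and non-zero even tags. All names are hypothetical, and real MTE performs the comparison in hardware on every load and store.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE    16      /* one memory tag per 16-byte granule */
#define TAG_SHIFT  56      /* logical tag lives in address bits 59:56 */
#define TAG_MASK   0xFULL  /* 4 tag bits, so 16 possible values */
#define TAG_FREE   0u      /* assumed convention: tag 0 marks free memory */
#define NGRANULES  64      /* size of this toy "heap", in granules */

/* Simulated memory tags; static storage starts out all TAG_FREE. */
static uint8_t mem_tags[NGRANULES];

static uint64_t untag(uint64_t ptr)
{
	return ptr & ~(TAG_MASK << TAG_SHIFT);
}

static unsigned ptr_tag(uint64_t ptr)
{
	return (unsigned)((ptr >> TAG_SHIFT) & TAG_MASK);
}

/* Tag for the k-th allocation in address order: alternate odd tags (1..15)
 * and non-zero even tags (2..14), so neighbors never share a tag. */
static unsigned tag_for_slot(unsigned k)
{
	return (k % 2) ? 1 + 2 * ((k / 2) % 8) : 2 + 2 * ((k / 2) % 7);
}

/* "Allocate" n granules starting at granule g (toy: caller keeps
 * g + n <= NGRANULES) and return a pointer carrying the tag. */
static uint64_t toy_alloc(size_t g, size_t n, unsigned tag)
{
	for (size_t i = g; i < g + n; i++)
		mem_tags[i] = (uint8_t)tag;
	return ((uint64_t)g * GRANULE) | ((uint64_t)(tag & TAG_MASK) << TAG_SHIFT);
}

/* Retag freed granules so stale pointers no longer match. */
static void toy_free(uint64_t ptr, size_t n)
{
	size_t g = untag(ptr) / GRANULE;

	for (size_t i = g; i < g + n; i++)
		mem_tags[i] = TAG_FREE;
}

/* The check hardware would do: pointer tag must equal the granule's tag. */
static bool toy_access_ok(uint64_t ptr, size_t offset)
{
	size_t g = (untag(ptr) + offset) / GRANULE;

	return g < NGRANULES && mem_tags[g] == ptr_tag(ptr);
}
```

Running off the end of one allocation always lands on a granule whose tag has the other parity (or is the free tag), which is why linear overflows are caught deterministically under this scheme.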
fixes for flaws found by UBSAN
The work to make UBSAN generally usable under syzkaller continues to bear fruit, with various fixes all over the kernel for stuff like shift-out-of-bounds, divide-by-zero, and integer overflow. Seeing these kinds of patches land reinforces the rationale for shifting the burden of these kinds of checks to the toolchain: these run-time bugs continue to pop up.
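For a sense of what such fixes look like, here is a minimal sketch of explicit checks that replace the undefined operations UBSAN flags. The helper names are hypothetical; __builtin_mul_overflow() is a GCC/Clang builtin, which the kernel wraps as check_mul_overflow().

```c
#include <stdbool.h>
#include <stdint.h>

/* Shifting a 32-bit value by 32 or more is undefined behavior in C, which
 * UBSAN reports as shift-out-of-bounds. Refuse such shifts up front. */
static bool checked_shl32(uint32_t v, unsigned int shift, uint32_t *out)
{
	if (shift >= 32)
		return false;
	*out = v << shift;
	return true;
}

/* Signed overflow is also undefined behavior. __builtin_mul_overflow()
 * computes the product and reports whether it fit in *out. */
static bool checked_mul_int(int a, int b, int *out)
{
	return !__builtin_mul_overflow(a, b, out);
}

/* Division by zero is undefined behavior, so check the divisor first. */
static bool checked_div_u32(uint32_t n, uint32_t d, uint32_t *out)
{
	if (d == 0)
		return false;
	*out = n / d;
	return true;
}
```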
flexible array conversions
The work on flexible array conversions continues: Gustavo A. R. Silva and others kept grinding away, getting the kernel ever closer to being able to enable the -Warray-bounds compiler flag and clearing the path for saner bounds checking of array indexes and memcpy() usage.
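A sketch of the conversion, assuming a hypothetical struct msg: the fake one-element trailing array becomes a C99 flexible array member, and the allocation size is computed with overflow checking in the spirit of the kernel’s struct_size() helper (msg_alloc_size() is a made-up stand-in).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Old style: a fake one-element trailing array. -Warray-bounds sees a
 * 1-element array, so legitimate accesses past index 0 look like overflows. */
struct msg_old {
	size_t len;
	unsigned char data[1];
};

/* C99 flexible array member: the trailing size is honestly "unknown". */
struct msg {
	size_t len;
	unsigned char data[];
};

/* Header plus n elements, or SIZE_MAX on overflow so malloc() fails. */
static size_t msg_alloc_size(size_t n)
{
	size_t bytes;

	if (__builtin_mul_overflow(n, sizeof(((struct msg *)0)->data[0]), &bytes) ||
	    __builtin_add_overflow(bytes, sizeof(struct msg), &bytes))
		return SIZE_MAX;
	return bytes;
}

/* Allocate a message and copy n payload bytes into its trailing array. */
static struct msg *msg_new(const void *payload, size_t n)
{
	struct msg *m = malloc(msg_alloc_size(n));

	if (!m)
		return NULL;
	m->len = n;
	memcpy(m->data, payload, n);
	return m;
}
```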
That’s it for now! Please let me know if you think anything else needs some attention. Next up is Linux v5.11.
© 2022, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 License.