Previously: v5.5.
Linux v5.6 was released back in March. Here’s my quick summary of various features that caught my attention:
WireGuard
The widely used WireGuard VPN has been out-of-tree for a very long time. Three and a half years after its initial upstream RFC, Ard Biesheuvel and Jason Donenfeld finished getting all the crypto prerequisites sorted out for the v5.5 kernel. For this release, Jason has gotten WireGuard itself landed. It was a twisty road, and I’m grateful to everyone involved for sticking it out and navigating the compromises and alternative solutions.
openat2() syscall and RESOLVE_* flags
Aleksa Sarai has added a number of important path resolution “scoping” options to the kernel’s open() handling, covering things like not walking above a specific point in a path hierarchy (RESOLVE_BENEATH), disabling the resolution of various “magic links” (RESOLVE_NO_MAGICLINKS) in procfs (e.g. /proc/$pid/exe) and other pseudo-filesystems, and treating a given lookup as happening relative to a different root directory (as if it were in a chroot, RESOLVE_IN_ROOT). As part of this, it became clear that there wasn’t a way to correctly extend the existing openat() syscall, so he added openat2() (which is a good example of the efforts being made to codify “Extensible Syscall” arguments). The RESOLVE_* set of flags also covers prior behaviors like RESOLVE_NO_XDEV and RESOLVE_NO_SYMLINKS.
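To make the shape of the new syscall concrete, here is a minimal Python sketch that calls openat2() through ctypes. The wrapper name openat2 and the OpenHow class are my own illustration, not an existing library API. The flag values mirror the kernel's UAPI header, and the syscall number 437 is an assumption that holds on architectures using the unified syscall table (x86_64 and arm64 among them).

```python
import ctypes
import os

# Values mirrored from the kernel UAPI header (include/uapi/linux/openat2.h).
RESOLVE_NO_XDEV = 0x01
RESOLVE_NO_MAGICLINKS = 0x02
RESOLVE_NO_SYMLINKS = 0x04
RESOLVE_BENEATH = 0x08
RESOLVE_IN_ROOT = 0x10

# Assumption: openat2() is syscall 437 (true for the unified syscall table).
SYS_OPENAT2 = 437


class OpenHow(ctypes.Structure):
    """Mirror of the kernel's struct open_how: three 64-bit fields."""
    _fields_ = [
        ("flags", ctypes.c_uint64),
        ("mode", ctypes.c_uint64),
        ("resolve", ctypes.c_uint64),
    ]


def openat2(dirfd, path, flags, mode=0, resolve=0):
    """Hypothetical wrapper: open path relative to dirfd and return a new fd."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    how = OpenHow(flags=flags, mode=mode, resolve=resolve)
    ret = libc.syscall(
        ctypes.c_long(SYS_OPENAT2),
        ctypes.c_long(dirfd),
        ctypes.c_char_p(os.fsencode(path)),
        ctypes.byref(how),
        # The struct size is passed explicitly; this is what lets the
        # kernel extend struct open_how in later releases.
        ctypes.c_size_t(ctypes.sizeof(how)),
    )
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    return ret
```

On a v5.6 or later kernel, something like openat2(dirfd, "../secret", os.O_RDONLY, resolve=RESOLVE_BENEATH) should be refused (with EXDEV, as I understand it) rather than walking above dirfd. On older kernels the call fails with ENOSYS.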
pidfd_getfd() syscall
In the continuing growth of the much-needed pidfd APIs, Sargun Dhillon has added the pidfd_getfd() syscall, which is a way to gain access to file descriptors of a process in a race-less way (or when /proc is not mounted). Before, it wasn’t always possible to make sure that opening file descriptors via /proc/$pid/fd/$N was actually going to be associated with the correct PID. Much more detail about this has been written up at LWN.
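As a rough illustration of the flow, the following Python sketch pins a process with pidfd_open() and then duplicates one of its descriptors with pidfd_getfd(). The helper names (_syscall, copy_fd_from) are hypothetical, and the syscall numbers 434 and 438 are assumptions that hold on architectures using the unified syscall table. The caller also needs ptrace-attach-level permission over the target, so expect EPERM otherwise.

```python
import ctypes
import os

# Assumption: numbers from the unified syscall table (x86_64, arm64, ...).
SYS_PIDFD_OPEN = 434
SYS_PIDFD_GETFD = 438


def _syscall(number, *args):
    """Invoke a raw syscall with integer arguments; raise OSError on failure."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    ret = libc.syscall(ctypes.c_long(number), *(ctypes.c_long(a) for a in args))
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def pidfd_open(pid):
    """Return a pidfd referring to pid (flags must currently be 0)."""
    return _syscall(SYS_PIDFD_OPEN, pid, 0)


def pidfd_getfd(pidfd, targetfd):
    """Duplicate targetfd out of the process that pidfd refers to."""
    return _syscall(SYS_PIDFD_GETFD, pidfd, targetfd, 0)


def copy_fd_from(pid, targetfd):
    """Hypothetical helper: fetch a copy of fd number targetfd from pid.

    The pidfd keeps referring to the same process even if the numeric PID
    is recycled, which is what closes the race that the
    /proc/$pid/fd/$N approach has.
    """
    pidfd = pidfd_open(pid)
    try:
        return pidfd_getfd(pidfd, targetfd)
    finally:
        os.close(pidfd)
```

For example, copy_fd_from(os.getpid(), 0) would hand back a duplicate of the caller's own stdin on a kernel that has both syscalls.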
openat() via io_uring
With my “attack surface reduction” hat on, I remain personally suspicious of the io_uring family of APIs, but I can’t deny their utility for certain kinds of workloads. Being able to pipeline reads and writes without the overhead of actually making syscalls is pretty great for performance. Jens Axboe has added the IORING_OP_OPENAT command so that existing io_urings can open files to be added on the fly to the mapping of available read/write targets of a given io_uring. While LSMs are still happily able to intercept these actions, I remain wary of the growing “syscall multiplexer” that io_uring is becoming. I am, of course, glad to see that it has a comprehensive (if “out of tree”) test suite as part of liburing.
removal of blocking random pool
After making algorithmic changes to obviate separate entropy pools for random numbers, Andy Lutomirski removed the blocking random pool. This simplifies the kernel pRNG code significantly without compromising the userspace interfaces designed to fetch “cryptographically secure” random numbers. To quote Andy, “This series should not break any existing programs. /dev/urandom is unchanged. /dev/random will still block just after booting, but it will block less than it used to.” See LWN for more details on the history and discussion of the series.
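For reference, this is roughly what the unchanged userspace side looks like when going through getrandom() instead of the device nodes. It is a minimal Python sketch with hypothetical function names. os.getrandom() and os.GRND_NONBLOCK are the standard library's wrappers on Linux, and requests are kept small (256 bytes or fewer) so a short read is not a concern.

```python
import os


def random_bytes(count):
    """Return count random bytes, waiting for the pool to be initialized.

    With flags=0, getrandom() only blocks early in boot, before the
    kernel's pool has been initialized.
    """
    return os.getrandom(count)


def random_bytes_nonblocking(count):
    """Return count random bytes, or None if the pool is not ready yet."""
    try:
        return os.getrandom(count, os.GRND_NONBLOCK)
    except BlockingIOError:
        # The kernel reported EAGAIN: not enough entropy gathered yet.
        return None
```

Code written like this does not care whether a separate blocking pool exists behind the scenes, which is why the removal is not expected to break it.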
arm64 support for on-chip RNG
Mark Brown added support for the future ARMv8.5’s RNG (SYS_RNDR_EL0), which is, from the kernel’s perspective, similar to x86’s RDRAND instruction. This will provide a bootloader-independent way to add entropy to the kernel’s pRNG for early boot randomness (e.g. stack canary values, memory ASLR offsets, etc). Until folks are running on ARMv8.5 systems, they can continue to depend on the bootloader for randomness (via the UEFI RNG interface) on arm64.
arm64 E0PD
Mark Brown added support for the future ARMv8.5’s E0PD feature (TCR_E0PD1), which causes all memory accesses from userspace into kernel space to fault in constant time. This is an attempt to remove any possible timing side-channel signals when probing kernel memory layout from userspace, as an alternative way to protect against Meltdown-style attacks. The expectation is that E0PD would be used instead of the more expensive Kernel Page Table Isolation (KPTI) features on arm64.
powerpc32 VMAP_STACK
Christophe Leroy added VMAP_STACK support to powerpc32, joining x86, arm64, and s390. This helps protect against the various classes of attacks that depend on exhausting the kernel stack in order to collide with neighboring kernel stacks. (Another common target, the sensitive thread_info, had already been moved away from the bottom of the stack by Christophe Leroy in Linux v5.1.)
generic Page Table dumping
Related to RISC-V’s work to add page table dumping (via /sys/kernel/debug/kernel_page_tables), Steven Price extracted the existing implementations from multiple architectures and created a common page table dumping framework (and then refactored all the other architectures to use it). I’m delighted to have this because I still remember when not having a working page table dumper for ARM delayed me for a while when trying to implement upstream kernel memory protections there. Anything that makes it easier for architectures to get their kernel memory protection working correctly makes me happy.
That’s it for now; let me know if there’s anything you think I missed. Next up: Linux v5.7.
© 2020 – 2022, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 License.
With the arm64 arch extensions we might see some implementations on earlier revisions of the architecture – an implementation can provide some features from newer architecture revisions without implementing all the mandatory features for that revision. Conversely v8.5-RNG might not be present on a v8.5 system since it’s optional.
Comment by Mark Brown — September 5, 2020 @ 5:07 am