November « 2019 « codeblog

November 20, 2019

experimenting with Clang CFI on upstream Linux

Filed under: Blogging,Chrome OS,Debian,Kernel,Security,Ubuntu,Ubuntu-Server — kees @ 9:09 pm

While much of the work on kernel Control Flow Integrity (CFI) is focused on arm64 (since kernel CFI is available on Android), a significant portion is in the core kernel itself (and especially the build system). Recently I got a sane build and boot on x86 with everything enabled, and I’ve been picking through some of the remaining pieces. I figured now would be a good time to document everything I do to get a build working in case other people want to play with it and find stuff that needs fixing.

First, everything is based on Sami Tolvanen’s upstream port of Clang’s forward-edge CFI, which includes his Link Time Optimization (LTO) work, which CFI requires. This tree also includes his backward-edge CFI work on arm64 with Clang’s Shadow Call Stack (SCS).

On top of that, I’ve got a few x86-specific patches that get me far enough to boot a kernel without warnings pouring across the console. Along with that are general linker script cleanups, CFI cast fixes, and x86 crypto fixes, all in various states of getting upstreamed. The resulting tree is the kspp/cfi/x86 branch of my kernel.org repository (the checkout commands are below).

On the compiler side, you need a very recent Clang and LLD (i.e. “Clang 10”; what I do is build from the latest git). For example, here’s how to get started. First, check out, configure, and build Clang (leave out “--depth=1” if you want the full git history):

# Check out latest LLVM
mkdir -p $HOME/src
cd $HOME/src
git clone --depth=1 https://github.com/llvm/llvm-project.git
mkdir -p llvm-build
cd llvm-build
# Configure
mkdir -p $HOME/bin/clang-release
cmake -G Ninja \
      -DCMAKE_BUILD_TYPE=Release \
      -DLLVM_ENABLE_PROJECTS='clang;lld;compiler-rt' \
      -DCMAKE_INSTALL_PREFIX="$HOME/bin/clang-release" \
      ../llvm-project/llvm
# Build!
ninja install
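Before moving on, it can be worth confirming that the install step produced the three binaries the kernel build relies on (clang, ld.lld, and llvm-ar). This is only a minimal sketch, assuming the install prefix used above; the function name check_llvm_tools is made up for illustration:

```shell
# Hypothetical helper (not part of LLVM or the kernel): verify that clang,
# ld.lld, and llvm-ar exist and are executable in the given bin directory.
# Returns non-zero if any of them is missing.
check_llvm_tools() {
    bindir="$1"
    status=0
    for tool in clang ld.lld llvm-ar; do
        if [ -x "$bindir/$tool" ]; then
            echo "found: $bindir/$tool"
        else
            echo "missing: $bindir/$tool" >&2
            status=1
        fi
    done
    return "$status"
}
```

With the prefix from the cmake step, the call would be: check_llvm_tools "$HOME/bin/clang-release/bin"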

Then check out, configure, and build the CFI tree. (This assumes you’ve already got a checkout of Linus’s tree in $HOME/src/linux.)

# Check out my branch
cd ../linux
git remote add kees https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git
git fetch kees
git checkout kees/kspp/cfi/x86 -b test/cfi
# Use the above built Clang path first for the needed binaries: clang, ld.lld, and llvm-ar.
PATH="$HOME/bin/clang-release/bin:$PATH"
# Configure. This uses "defconfig", but "menuconfig" works too. Either way, you
# must include CC and LD in the make args or your .config won't know about Clang.
make defconfig CC=clang LD=ld.lld
# Enable LTO and CFI.
scripts/config \
     -e CONFIG_LTO \
     -e CONFIG_THINLTO \
     -d CONFIG_LTO_NONE \
     -e CONFIG_LTO_CLANG \
     -e CONFIG_CFI_CLANG \
     -e CONFIG_CFI_PERMISSIVE \
     -e CONFIG_CFI_CLANG_SHADOW
# Enable LKDTM if you want runtime fault testing:
scripts/config -e CONFIG_LKDTM
# Build!
make -j$(getconf _NPROCESSORS_ONLN) CC=clang LD=ld.lld

Do not be alarmed by various warnings, such as:

ld.lld: warning: cannot find entry symbol _start; defaulting to 0x1000
llvm-ar: error: unable to load 'arch/x86/kernel/head_64.o': file too small to be an archive
llvm-ar: error: unable to load 'arch/x86/kernel/head64.o': file too small to be an archive
llvm-ar: error: unable to load 'arch/x86/kernel/ebda.o': file too small to be an archive
llvm-ar: error: unable to load 'arch/x86/kernel/platform-quirks.o': file too small to be an archive
WARNING: EXPORT symbol "page_offset_base" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "vmalloc_base" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "vmemmap_base" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: "__memcat_p" [vmlinux] is a static (unknown)
no symbols
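Since the Clang-dependent options can quietly fall out of the .config when CC and LD are left off a make invocation, a quick check that they survived may save a confusing boot later. The following is a sketch under that assumption; check_cfi_config is a hypothetical helper name, and it only looks at two of the options enabled in the scripts/config step:

```shell
# Hypothetical helper: report whether the LTO and CFI options are set to "y"
# in a kernel config file (defaults to ./.config). Returns non-zero if any
# of them is not enabled.
check_cfi_config() {
    config_file="${1:-.config}"
    missing=0
    for opt in CONFIG_LTO_CLANG CONFIG_CFI_CLANG; do
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "ok: $opt"
        else
            echo "MISSING: $opt"
            missing=1
        fi
    done
    return "$missing"
}
```

Run it from the top of the kernel tree as: check_cfi_config .config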

Adjust your .config as you want (but, again, make sure the CC and LD args are pointed at Clang and LLD respectively). This should(!) result in a happy bootable x86 CFI-enabled kernel. If you want to see what a CFI failure looks like, you can poke LKDTM:

# Log into the booted system as root, then:
cat <(echo CFI_FORWARD_PROTO) >/sys/kernel/debug/provoke-crash/DIRECT
dmesg

Here’s the CFI splat I see on the console:

[   16.288372] lkdtm: Performing direct entry CFI_FORWARD_PROTO
[   16.290563] lkdtm: Calling matched prototype ...
[   16.292367] lkdtm: Calling mismatched prototype ...
[   16.293696] ------------[ cut here ]------------
[   16.294581] CFI failure (target: lkdtm_increment_int$53641d38e2dc4a151b75cbe816cbb86b.cfi_jt+0x0/0x10):
[   16.296288] WARNING: CPU: 3 PID: 2612 at kernel/cfi.c:29 __cfi_check_fail+0x38/0x40
...
[   16.346873] ---[ end trace 386b3874d294d2f7 ]---
[   16.347669] lkdtm: Fail: survived mismatched prototype function call!

The claim of “Fail: survived …” is due to CONFIG_CFI_PERMISSIVE=y. This allows the kernel to warn but continue with the bad call anyway. This is handy for debugging. In a production kernel that would be removed and the offending kernel thread would be killed. If you run this again with the config disabled, there will be no continuation from LKDTM. :)
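Because a permissive kernel keeps running after a violation, the kernel log is the only place the failures show up. One way to keep an eye on them is to count the splats; this is just a sketch that matches the "CFI failure (target:" text seen in the log above, and count_cfi_failures is a name invented here:

```shell
# Hypothetical helper: count "CFI failure" reports in kernel log text read
# from stdin. grep -c prints 0 but exits non-zero when nothing matches, so
# the exit status is normalized with "|| true".
count_cfi_failures() {
    grep -c 'CFI failure (target:' || true
}
```

On the booted system this would be used as: dmesg | count_cfi_failures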

Enjoy! And if you can figure out before me why there is still CFI instrumentation in the KPTI entry handler, please let me know and help us fix it. ;)


November 14, 2019

security things in Linux v5.3

Filed under: Blogging,Chrome OS,Debian,Kernel,Security,Ubuntu,Ubuntu-Server — kees @ 5:36 pm

Previously: v5.2.

Linux kernel v5.3 was released! I let this blog post get away from me, but it’s up now! :) Here are some security-related things I found interesting:

heap variable initialization
In the continuing work to remove “uninitialized” variables from the kernel, Alexander Potapenko added new “init_on_alloc” and “init_on_free” boot parameters (with associated Kconfig defaults) to perform zeroing of heap memory either at allocation time (i.e. all kmalloc()s effectively become kzalloc()s), at free time (i.e. all kfree()s effectively become kzfree()s), or both. The performance impact of the former under most workloads appears to be under 1%, if it’s measurable at all. The “init_on_free” option, however, is more costly but adds the benefit of reducing the lifetime of heap contents after they have been freed (which might help blunt some use-after-free attacks or side-channel attacks). Everyone should enable CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y (or boot with “init_on_alloc=1”), and the more paranoid system builders should add CONFIG_INIT_ON_FREE_DEFAULT_ON=y (or “init_on_free=1” at boot). As workloads are found that cause performance concerns, tweaks to the initialization coverage can be added.
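To see how a running system was booted, the parameters can be read back from the kernel command line. A minimal sketch follows; boot_param is a made-up helper name, and it only looks at the command line text it is given, so “unset” means the Kconfig default is what applies:

```shell
# Hypothetical helper: print the value of a boot parameter found in a kernel
# command line string, or "unset" if it does not appear. If the parameter is
# given more than once, this sketch reports the last occurrence.
boot_param() {
    printf '%s\n' "$2" | tr ' ' '\n' | awk -F= -v name="$1" '
        $1 == name { value = substr($0, length(name) + 2); found = 1 }
        END { print (found ? value : "unset") }'
}
```

For example: boot_param init_on_alloc "$(cat /proc/cmdline)"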

pidfd_open() added
Christian Brauner has continued his pidfd work by creating the next needed syscall: pidfd_open(), which takes a pid and returns a pidfd. This is useful for cases where process creation isn’t yet using CLONE_PIDFD, and where /proc may not be mounted.

-Wimplicit-fallthrough enabled globally
Gustavo A.R. Silva landed the last handful of implicit fallthrough fixes left in the kernel, which allows for -Wimplicit-fallthrough to be globally enabled for all kernel builds. This will keep any new instances of this bad code pattern from entering the kernel again. With several hundred implicit fallthroughs identified and fixed, something like 1 in 10 turned out to be missing breaks, which is way higher than I was expecting and makes this work even better justified.

x86 CR4 & CR0 pinning
In recent exploits, one of the steps for making the attacker’s life easier is to disable CPU protections like Supervisor Mode Access (and Execute) Prevention (SMAP and SMEP) by finding a way to write to CPU control registers to disable these features. For example, CR4 controls SMAP and SMEP, where disabling those would let an attacker access and execute userspace memory from kernel code again, opening up the attack to much greater flexibility. CR0 controls Write Protect (WP), which when disabled would allow an attacker to write to read-only memory like the kernel code itself. Attacks have been using the kernel’s CR4 and CR0 writing functions to make these changes (since it’s easier to gain that level of execute control), but now the kernel will attempt to “pin” sensitive bits in CR4 and CR0 to avoid them getting disabled. This forces attacks to do more work to enact such register changes going forward. (I’d like to see KVM enforce this too, which would actually protect guest kernels from all attempts to change protected register bits.)
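The idea behind pinning can be modelled with a little bit arithmetic. This is a userspace illustration only, not the kernel’s code: the function name and the message are invented here, and the only facts it leans on are the architectural bit positions (SMEP is bit 20 of CR4, SMAP is bit 21). Any write that tries to clear a pinned bit gets that bit put back, with a complaint:

```shell
# Architectural CR4 bit positions: SMEP is bit 20, SMAP is bit 21.
CR4_SMEP=$((1 << 20))
CR4_SMAP=$((1 << 21))

# Hypothetical model of pinning: print the value that would end up in the
# register, and complain on stderr if the caller tried to clear a pinned bit.
pinned_cr4_write() {
    requested=$(( $1 ))
    pinned=$((CR4_SMEP | CR4_SMAP))
    missing=$(( pinned & ~requested ))
    if [ "$missing" -ne 0 ]; then
        printf 'pinned CR4 bits went missing: 0x%x\n' "$missing" >&2
    fi
    printf '0x%x\n' $(( requested | pinned ))
}
```

Calling pinned_cr4_write 0x100000 (SMEP set, SMAP cleared) prints the value with both bits set again, which is the extra work the attacker now has to get around.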

additional kfree() sanity checking
In order to avoid corrupted pointers doing crazy things when they’re freed (as seen in recent exploits), I added additional sanity checks to verify kmem cache membership and to make sure that objects actually belong to the kernel slab heap. As a reminder, everyone should be building with CONFIG_SLAB_FREELIST_HARDENED=y.

KASLR enabled by default on arm64
Just as Kernel Address Space Layout Randomization (KASLR) was enabled by default on x86, now KASLR has been enabled by default on arm64 too. It’s worth noting, though, that in order to benefit from this setting, the bootloader used for such arm64 systems needs to either support the UEFI RNG function or provide entropy via the “/chosen/kaslr-seed” Device Tree property.
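For the Device Tree case, one rough way to check whether a bootloader handed over a seed is to look for the property in the tree the kernel exposes. This is a sketch with assumptions: has_kaslr_seed is a hypothetical helper, the default path assumes the Device Tree is visible under /proc/device-tree, and it checks only that the property exists (not its contents, and not the UEFI RNG path):

```shell
# Hypothetical helper: report whether a device tree, as exposed under a
# directory such as /proc/device-tree, carries a /chosen/kaslr-seed property.
# Returns non-zero if the property is not found.
has_kaslr_seed() {
    dt_root="${1:-/proc/device-tree}"
    if [ -e "$dt_root/chosen/kaslr-seed" ]; then
        echo "kaslr-seed property present"
    else
        echo "kaslr-seed property not found"
        return 1
    fi
}
```

On an arm64 board booted via Device Tree, running has_kaslr_seed with no arguments would be the starting point.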

hardware security embargo documentation
As there continues to be a long tail of hardware flaws that need to be reported to the Linux kernel community under embargo, a well-defined process has been documented. This will let vendors unfamiliar with how to handle things follow the established best practices for interacting with the Linux kernel community in a way that lets mitigations get developed before embargoes are lifted. The latest (and HTML rendered) version of this process should always be available here.

Those are the things I had on my radar. Please let me know if there are other things I should add! Linux v5.4 is almost here…

