October « 2016 « codeblog

October 20, 2016

CVE-2016-5195

Filed under: Chrome OS,Debian,Kernel,Security,Ubuntu,Ubuntu-Server — kees @ 4:02 pm

My prior post showed my research from earlier in the year at the 2016 Linux Security Summit on kernel security flaw lifetimes. Now that CVE-2016-5195 is public, here are updated graphs and statistics. Due to their rarity, the Critical bug average has now jumped from 3.3 years to 5.2 years. There aren’t many, but, as I mentioned, they still exist, whether you know about them or not. CVE-2016-5195 was sitting on everyone’s machine when I gave my LSS talk, and there are still other flaws on all our Linux machines right now. (And, I should note, this problem is not unique to Linux.) Dealing with knowing that there are always going to be bugs present requires proactive kernel self-protection (to minimize the effects of possible flaws) and vendors dedicated to updating their devices regularly and quickly (to keep the exposure window minimized once a flaw is widely known).

So, here are the graphs updated for the 668 CVEs known today:

Critical: 3 @ 5.2 years average
High: 44 @ 6.2 years average
Medium: 404 @ 5.3 years average
Low: 216 @ 5.5 years average

Comments (3)

October 18, 2016

Security bug lifetime

Filed under: Chrome OS,Debian,Kernel,Security,Ubuntu,Ubuntu-Server — kees @ 9:46 pm

In several of my recent presentations, I’ve discussed the lifetime of security flaws in the Linux kernel. Jon Corbet did an analysis in 2010, and found that security bugs appeared to have roughly a 5 year lifetime. As in, the flaw gets introduced in a Linux release, and then goes unnoticed by upstream developers until another release 5 years later, on average. I updated this research for 2011 through 2016, and used the Ubuntu Security Team’s CVE Tracker to assist in the process. The Ubuntu kernel team already does the hard work of trying to identify when flaws were introduced in the kernel, so I didn’t have to re-do this for the 557 kernel CVEs since 2011.

As the README details, the raw CVE data is spread across the active/, retired/, and ignored/ directories. By scanning through the CVE files to find any that contain the line “Patches_linux:”, I can extract the details on when a flaw was introduced and when it was fixed. For example CVE-2016-0728 shows:

Patches_linux:
 break-fix: 3a50597de8635cd05133bd12c95681c82fe7b878 23567fd052a9abb6d67fe8e7a9ccdd9800a540f2

This means that CVE-2016-0728 is believed to have been introduced by commit 3a50597de8635cd05133bd12c95681c82fe7b878 and fixed by commit 23567fd052a9abb6d67fe8e7a9ccdd9800a540f2. If there are multiple lines, then there may be multiple SHAs identified as contributing to the flaw or the fix. And a “-” is just short-hand for the start of Linux git history.

Then for each SHA, I queried git to find its corresponding release, and made a mapping of release version to release date, wrote out the raw data, and rendered graphs. Each vertical line shows a given CVE from when it was introduced to when it was fixed. Red is “Critical”, orange is “High”, blue is “Medium”, and black is “Low”:

CVE lifetimes 2011-2016

And here it is zoomed in to just Critical and High:

Critical and High CVE lifetimes 2011-2016

The line in the middle is the date from which I started the CVE search (2011). The vertical axis is actually linear time, but it’s labeled with kernel releases (which are pretty regular). The numerical summary is:

Critical: 2 @ 3.3 years
High: 34 @ 6.4 years
Medium: 334 @ 5.2 years
Low: 186 @ 5.0 years

This comes out to roughly 5 years lifetime again, so not much has changed from Jon’s 2010 analysis.

While we’re getting better at fixing bugs, we’re also adding more bugs. And for many devices that have been built on a given kernel version, there haven’t been frequent (or some times any) security updates, so the bug lifetime for those devices is even longer. To really create a safe kernel, we need to get proactive about self-protection technologies. The systems using a Linux kernel are right now running with security flaws. Those flaws are just not known to the developers yet, but they’re likely known to attackers, as there have been prior boasts/gray-market advertisements for at least CVE-2010-3081 and CVE-2013-2888.

(Edit: see my updated graphs that include CVE-2016-5195.)

Comments (8)

October 4, 2016

security things in Linux v4.8

Filed under: Chrome OS,Debian,Kernel,Security,Ubuntu,Ubuntu-Server — kees @ 5:26 pm

Previously: v4.7. Here are a bunch of security things I’m excited about in Linux v4.8:

SLUB freelist ASLR

Thomas Garnier continued his freelist randomization work by adding SLUB support.

x86_64 KASLR text base offset physical/virtual decoupling

On x86_64, to implement the KASLR text base offset, the physical memory location of the kernel was randomized, which resulted in the virtual address being offset as well. Due to how the kernel’s “-2GB” addressing works (gcc‘s “-mcmodel=kernel“), it wasn’t possible to randomize the physical location beyond the 2GB limit, leaving any additional physical memory unused as a randomization target. In order to decouple the physical and virtual location of the kernel (to make physical address exposures less valuable to attackers), the physical location of the kernel needed to be randomized separately from the virtual location. This required a lot of work for handling very large addresses spanning terabytes of address space. Yinghai Lu, Baoquan He, and I landed a series of patches that ultimately did this (and in the process fixed some other bugs too). This expands the physical offset entropy to roughly $physical_memory_size_of_system / 2MB bits.

x86_64 KASLR memory base offset

Thomas Garnier rolled out KASLR to the kernel’s various statically located memory ranges, randomizing their locations with CONFIG_RANDOMIZE_MEMORY. One of the more notable things randomized is the physical memory mapping, which is a known target for attacks. Also randomized is the vmalloc area, which makes attacks against targets vmalloced during boot (which tend to always end up in the same location on a given system) are now harder to locate. (The vmemmap region randomization accidentally missed the v4.8 window and will appear in v4.9.)

x86_64 KASLR with hibernation

Rafael Wysocki (with Thomas Garnier, Borislav Petkov, Yinghai Lu, Logan Gunthorpe, and myself) worked on a number of fixes to hibernation code that, even without KASLR, were coincidentally exposed by the earlier W^X fix. With that original problem fixed, then memory KASLR exposed more problems. I’m very grateful everyone was able to help out fixing these, especially Rafael and Thomas. It’s a hard place to debug. The bottom line, now, is that hibernation and KASLR are no longer mutually exclusive.

gcc plugin infrastructure

Emese Revfy ported the PaX/Grsecurity gcc plugin infrastructure to upstream. If you want to perform compiler-based magic on kernel builds, now it’s much easier with CONFIG_GCC_PLUGINS! The plugins live in scripts/gcc-plugins/. Current plugins are a short example called “Cyclic Complexity” which just emits the complexity of functions as they’re compiled, and “Sanitizer Coverage” which provides the same functionality as gcc’s recent “-fsanitize-coverage=trace-pc” but back through gcc 4.5. Another notable detail about this work is that it was the first Linux kernel security work funded by Linux Foundation’s Core Infrastructure Initiative. I’m looking forward to more plugins!

If you’re on Debian or Ubuntu, the required gcc plugin headers are available via the gcc-$N-plugin-dev package (and similarly for all cross-compiler packages).

hardened usercopy

Along with work from Rik van Riel, Laura Abbott, Casey Schaufler, and many other folks doing testing on the KSPP mailing list, I ported part of PAX_USERCOPY (the basic runtime bounds checking) to upstream as CONFIG_HARDENED_USERCOPY. One of the interface boundaries between the kernel and user-space are the copy_to_user()/copy_from_user() family of functions. Frequently, the size of a copy is known at compile-time (“built-in constant”), so there’s not much benefit in checking those sizes (hardened usercopy avoids these cases). In the case of dynamic sizes, hardened usercopy checks for 3 areas of memory: slab allocations, stack allocations, and kernel text. Direct kernel text copying is simply disallowed. Stack copying is allowed as long as it is entirely contained by the current stack memory range (and on x86, only if it does not include the saved stack frame and instruction pointers). For slab allocations (e.g. those allocated through kmem_cache_alloc() and the kmalloc()-family of functions), the copy size is compared against the size of the object being copied. For example, if copy_from_user() is writing to a structure that was allocated as size 64, but the copy gets tricked into trying to write 65 bytes, hardened usercopy will catch it and kill the process.

For testing hardened usercopy, lkdtm gained several new tests: USERCOPY_HEAP_SIZE_TO, USERCOPY_HEAP_SIZE_FROM, USERCOPY_STACK_FRAME_TO,
USERCOPY_STACK_FRAME_FROM, USERCOPY_STACK_BEYOND, and USERCOPY_KERNEL. Additionally, USERCOPY_HEAP_FLAG_TO and USERCOPY_HEAP_FLAG_FROM were added to test what will be coming next for hardened usercopy: flagging slab memory as “safe for copy to/from user-space”, effectively whitelisting certainly slab caches, as done by PAX_USERCOPY. This further reduces the scope of what’s allowed to be copied to/from, since most kernel memory is not intended to ever be exposed to user-space. Adding this logic will require some reorganization of usercopy code to add some new APIs, as PAX_USERCOPY’s approach to handling special-cases is to add bounce-copies (copy from slab to stack, then copy to userspace) as needed, which is unlikely to be acceptable upstream.

seccomp reordered after ptrace

By its original design, seccomp filtering happened before ptrace so that seccomp-based ptracers (i.e. SECCOMP_RET_TRACE) could explicitly bypass seccomp filtering and force a desired syscall. Nothing actually used this feature, and as it turns out, it’s not compatible with process launchers that install seccomp filters (e.g. systemd, lxc) since as long as the ptrace and fork syscalls are allowed (and fork is needed for any sensible container environment), a process could spawn a tracer to help bypass a filter by injecting syscalls. After Andy Lutomirski convinced me that ordering ptrace first does not change the attack surface of a running process (unless all syscalls are blacklisted, the entire ptrace attack surface will always be exposed), I rearranged things. Now there is no (expected) way to bypass seccomp filters, and containers with seccomp filters can allow ptrace again.

Edit: missed this next feature when I originally posted

NX stack and heap on MIPS

Other architectures have had a non-executable stack and heap for a while now, and now MIPS has caught up, thanks to Paul Barton. The primary reason for the delay was finding a way to cleanly deal with branch delay slot instructions which needed a place to write instructions. Traditionally this was on the stack, but now it’s handled with a per-mm page.

That’s it for v4.8! The merge window is open for v4.9…

Comments Off

October 3, 2016

security things in Linux v4.7

Filed under: Chrome OS,Debian,Kernel,Security,Ubuntu,Ubuntu-Server — kees @ 12:47 am

Previously: v4.6. Onward to security things I found interesting in Linux v4.7:

KASLR text base offset for MIPS

Matt Redfearn added text base address KASLR to MIPS, similar to what’s available on x86 and arm64. As done with x86, MIPS attempts to gather entropy from various build-time, run-time, and CPU locations in an effort to find reasonable sources during early-boot. MIPS doesn’t yet have anything as strong as x86’s RDRAND (though most have an instruction counter like x86’s RDTSC), but it does have the benefit of being able to use Device Tree (i.e. the “/chosen/kaslr-seed” property) like arm64 does. By my understanding, even without Device Tree, MIPS KASLR entropy should be as strong as pre-RDRAND x86 entropy, which is more than sufficient for what is, similar to x86, not a huge KASLR range anyway: default 8 bits (a span of 16MB with 64KB alignment), though CONFIG_RANDOMIZE_BASE_MAX_OFFSET can be tuned to the device’s memory, giving a maximum of 11 bits on 32-bit, and 15 bits on EVA or 64-bit.

SLAB freelist ASLR

Thomas Garnier added CONFIG_SLAB_FREELIST_RANDOM to make slab allocation layouts less deterministic with a per-boot randomized freelist order. This raises the bar for successful kernel slab attacks. Attackers will need to either find additional bugs to help leak slab layout information or will need to perform more complex grooming during an attack. Thomas wrote a post describing the feature in more detail here: Randomizing the Linux kernel heap freelists. (SLAB is done in v4.7, and SLUB in v4.8.)

eBPF JIT constant blinding

Daniel Borkmann implemented constant blinding in the eBPF JIT subsystem. With strong kernel memory protections (CONFIG_DEBUG_RODATA) in place, and with the segregation of user-space memory execution from kernel (i.e SMEP, PXN, CONFIG_CPU_SW_DOMAIN_PAN), having a place where user-space can inject content into an executable area of kernel memory becomes very high-value to an attacker. The eBPF JIT was exactly such a thing: the use of BPF constants could result in the JIT producing instruction flows that could include attacker-controlled instructions (e.g. by directing execution into the middle of an instruction with a constant that would be interpreted as a native instruction). The eBPF JIT already uses a number of other defensive tricks (e.g. random starting position), but this added randomized blinding to any BPF constants, which makes building a malicious execution path in the eBPF JIT memory much more difficult (and helps block attempts at JIT spraying to bypass other protections).

Elena Reshetova updated a 2012 proof-of-concept attack to succeed against modern kernels to help provide a working example of what needed fixing in the JIT. This serves as a thorough regression test for the protection.

The cBPF JITs that exist in ARM, MIPS, PowerPC, and Sparc still need to be updated to eBPF, but when they do, they’ll gain all these protections immediatley.

Bottom line is that if you enable the (disabled-by-default) bpf_jit_enable sysctl, be sure to set the bpf_jit_harden sysctl to 2 (to perform blinding even for root).

fix brk ASLR weakness on arm64 compat

There have been a few ASLR fixes recently (e.g. ET_DYN, x86 32-bit unlimited stack), and while reviewing some suggested fixes to arm64 brk ASLR code from Jon Medhurst, I noticed that arm64’s brk ASLR entropy was slightly too low (less than 1 bit) for 64-bit and noticeably lower (by 2 bits) for 32-bit compat processes when compared to native 32-bit arm. I simplified the code by using literals for the entropy. Maybe we can add a sysctl some day to control brk ASLR entropy like was done for mmap ASLR entropy.

LoadPin LSM

LSM stacking is well-defined since v4.2, so I finally upstreamed a “small” LSM that implements a protection I wrote for Chrome OS several years back. On systems with a static root of trust that extends to the filesystem level (e.g. Chrome OS’s coreboot+depthcharge boot firmware chaining to dm-verity, or a system booting from read-only media), it’s redundant to sign kernel modules (you’ve already got the modules on read-only media: they can’t change). The kernel just needs to know they’re all coming from the correct location. (And this solves loading known-good firmware too, since there is no convention for signed firmware in the kernel yet.) LoadPin requires that all modules, firmware, etc come from the same mount (and assumes that the first loaded file defines which mount is “correct”, hence load “pinning”).

That’s it for v4.7. Prepare yourself for v4.8 next!