The Linux kernel attempts to protect portions of its memory from unexpected modification (through potential future exploits) by setting areas read-only where the compiler has allowed it (CONFIG_DEBUG_RODATA). This, combined with marking function pointer tables “const”, reduces the number of easily writable kernel memory targets for attackers.
However, modules (which make up the bulk of kernel code) were not covered and remained read-write regardless of compiler markings. In 2.6.38, thanks to the efforts of many people (especially Siarhei Liakh and Matthieu Castet), CONFIG_DEBUG_SET_MODULE_RONX was created (and CONFIG_DEBUG_RODATA was expanded).
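To check whether a given kernel was built with these options, one rough approach is to grep its build configuration. The sketch below assumes the config is installed at /boot/config-$(uname -r), as Ubuntu kernel packages do. kconfig_status is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: for each option name given, report whether it is set
# to "y" in the kernel config file passed as the first argument.
# Unset options appear in the file as "# CONFIG_FOO is not set".
kconfig_status() {
    local cfg=$1 opt
    shift
    for opt in "$@"; do
        if grep -q "^${opt}=y\$" "$cfg"; then
            echo "$opt=y"
        else
            echo "$opt not set"
        fi
    done
}
```

For example: kconfig_status /boot/config-$(uname -r) CONFIG_DEBUG_RODATA CONFIG_DEBUG_SET_MODULE_RONX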
To visualize the effects, I patched Arjan van de Ven’s arch/x86/mm/dump_pagetables.c to be a loadable module so I could look at /sys/kernel/debug/kernel_page_tables without needing to rebuild my kernel with CONFIG_X86_PTDUMP.
Comparing Lucid (2.6.32), Maverick (2.6.35), and Natty (2.6.38) makes the effects of the RO/NX improvements clear, especially in the “Modules” section, which has no NX markings at all before 2.6.38:
lucid-amd64# awk '/Modules/,/End Modules/' /sys/kernel/debug/kernel_page_tables | grep NX | wc -l
0
maverick-amd64# awk '/Modules/,/End Modules/' /sys/kernel/debug/kernel_page_tables | grep NX | wc -l
0
natty-amd64# awk '/Modules/,/End Modules/' /sys/kernel/debug/kernel_page_tables | grep NX | wc -l
76
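The awk | grep | wc pipeline above can also be written as a single awk pass. This is only a sketch: count_module_nx is a hypothetical name, and like the original it finds the section by matching the strings "Modules" and "End Modules".

```shell
# Count NX-marked lines between the "Modules" and "End Modules" markers of a
# page-table dump. Reads the named file, or stdin if no file is given.
count_module_nx() {
    awk '/Modules/,/End Modules/ { if (/NX/) n++ }
         END { print n + 0 }' "$@"
}
```

For example: count_module_nx /sys/kernel/debug/kernel_page_tables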
In 2.6.38, the Modules region is much more granular, since each module has been split into separate mappings for its various segment permissions:
lucid-amd64# awk '/Modules/,/End Modules/' /sys/kernel/debug/kernel_page_tables | wc -l
53
maverick-amd64# awk '/Modules/,/End Modules/' /sys/kernel/debug/kernel_page_tables | wc -l
67
natty-amd64# awk '/Modules/,/End Modules/' /sys/kernel/debug/kernel_page_tables | wc -l
155
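To see whether the extra granularity is confined to the Modules region, the same kind of count can be taken for every section of the dump. The sketch below assumes that each section starts with a "---[ Name ]---" header line, as printed by dump_pagetables.c. section_line_counts is a hypothetical helper. Unlike the awk range above, it does not count the two header lines, so its Modules figure should be two less than the wc -l result.

```shell
# Print "<section name><TAB><number of range lines>" for each section of a
# page-table dump, in the order the sections appear. Header lines such as
# "---[ Modules ]---" start a new section and are not counted themselves.
section_line_counts() {
    awk '
        /^---\[ .* ]---$/ {
            sec = $0
            sub(/^---\[ /, "", sec)
            sub(/ ]---$/, "", sec)
            order[++n] = sec
            next
        }
        sec != "" { count[sec]++ }
        END {
            for (i = 1; i <= n; i++)
                printf "%s\t%d\n", order[i], count[order[i]] + 0
        }
    ' "$@"
}
```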
For example, here’s the large “sunrpc” module. “RW” is read-write, “ro” is read-only, “x” is executable, and “NX” is non-executable:
maverick-amd64# awk '/^'$(awk '/^sunrpc/ {print $NF}' /proc/modules)'/','!/GLB/' /sys/kernel/debug/kernel_page_tables
0xffffffffa005d000-0xffffffffa0096000 228K RW GLB x pte
0xffffffffa0096000-0xffffffffa0098000 8K pte
natty-amd64# awk '/^'$(awk '/^sunrpc/ {print $NF}' /proc/modules)'/','!/GLB/' /sys/kernel/debug/kernel_page_tables
0xffffffffa005d000-0xffffffffa007a000 116K ro GLB x pte
0xffffffffa007a000-0xffffffffa0083000 36K ro GLB NX pte
0xffffffffa0083000-0xffffffffa0097000 80K RW GLB NX pte
0xffffffffa0097000-0xffffffffa0099000 8K pte
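The nested awk command works in two steps. The inner awk looks up sunrpc in /proc/modules, whose last field is the module's load address. The outer awk then prints from the dump line that starts with that address up to the first line lacking the GLB flag, which is the unmapped gap after the module. The function below spells out the same two steps. module_ranges is a hypothetical helper. It deliberately matches the module name exactly instead of using /^sunrpc/, so a module whose name merely begins with "sunrpc" is not picked up.

```shell
# Print the page-table ranges of one module.
#   $1: module name
#   $2: page-table dump (defaults to the live debugfs file)
#   $3: module list (defaults to /proc/modules)
# The file arguments exist only so the function can be run on saved copies.
module_ranges() {
    local name=$1
    local dump=${2:-/sys/kernel/debug/kernel_page_tables}
    local mods=${3:-/proc/modules}
    local addr
    # Step 1: the load address is the last field of the module's line.
    addr=$(awk -v m="$name" '$1 == m { print $NF; exit }' "$mods")
    if [ -z "$addr" ]; then
        echo "module $name not loaded" >&2
        return 1
    fi
    # Step 2: range pattern from the line starting with the address to the
    # first line without GLB (the unmapped gap), inclusive.
    awk -v a="$addr" 'index($0, a) == 1, !/GLB/' "$dump"
}
```

For example: module_ranges sunrpc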
The latter looks a whole lot more like a proper ELF (text segment is read-only and executable, rodata segment is read-only and non-executable, and data segment is read-write and non-executable).
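The mapping from flags to ELF-style segments described above can be written down as a small filter over such range lines. This sketch assumes the flag spellings RW/ro and x/NX seen in the output above. The labels "text", "rodata", "data", and "W+X" (writable and executable, as in the Maverick output) are illustrative, not something the kernel prints. classify_ranges is a hypothetical helper.

```shell
# Label each range line of a page-table dump by its permissions:
#   ro + x  -> text     ro + NX -> rodata
#   RW + NX -> data     RW + x  -> W+X (the pre-2.6.38 module layout)
# Lines with no permission flags are unmapped gaps; header lines are skipped.
classify_ranges() {
    awk '
        !/^0x/ { next }
        {
            w = ""; x = ""
            for (i = 3; i <= NF; i++) {
                if ($i == "RW") w = "rw"
                else if ($i == "ro") w = "ro"
                else if ($i == "x") x = "x"
                else if ($i == "NX") x = "nx"
            }
            if (w == "" || x == "") kind = "unmapped"
            else if (w == "ro" && x == "x") kind = "text"
            else if (w == "ro" && x == "nx") kind = "rodata"
            else if (w == "rw" && x == "nx") kind = "data"
            else kind = "W+X"
            print $1, $2, kind
        }
    ' "$@"
}
```

Piping the output of the sunrpc command above through classify_ranges should label the Natty ranges text, rodata, data, and unmapped, and the single Maverick range W+X.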
Just another reason to make sure you’re using your CPU’s NX bit (via 64-bit or 32-bit PAE kernels)! (And no, PAE is not slower in any meaningful way.)
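The sketch below checks both halves of that advice on x86. First, it checks whether the CPU advertises the nx flag in /proc/cpuinfo. Second, it checks whether the kernel is 64-bit or 32-bit PAE, the only x86 configurations whose page-table entries have room for the NX bit. nx_check is a hypothetical helper, and the /boot/config-$(uname -r) default is an Ubuntu-style assumption.

```shell
# Rough NX readiness check for x86.
#   $1: cpuinfo file (defaults to /proc/cpuinfo)
#   $2: kernel config (assumed to be /boot/config-$(uname -r) by default)
nx_check() {
    local cpuinfo=${1:-/proc/cpuinfo}
    local kcfg=${2:-/boot/config-$(uname -r)}
    # The CPU capability shows up as the word "nx" on a "flags" line.
    if grep '^flags' "$cpuinfo" | grep -qw nx; then
        echo "cpu: nx supported"
    else
        echo "cpu: no nx flag"
    fi
    # NX is bit 63 of a page-table entry, so only 64-bit PTEs can carry it.
    if grep -q '^CONFIG_X86_64=y$' "$kcfg"; then
        echo "kernel: 64-bit (can use nx)"
    elif grep -q '^CONFIG_X86_PAE=y$' "$kcfg"; then
        echo "kernel: 32-bit PAE (can use nx)"
    else
        echo "kernel: 32-bit non-PAE (cannot use nx)"
    fi
}
```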
© 2011, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 License.