[comment]: # (This presentation was made with markdown-slides) [comment]: # (This is a CommonMark compliant comment. It will not be included in the presentation.) [comment]: # (Compile this presentation with the command below) [comment]: # (mdslides presentation.md --include media) [comment]: # (Set the theme:) [comment]: # (The list of themes is at https://revealjs.com/themes/) [comment]: # (The list of code themes is at https://highlightjs.org/) [comment]: # (Pass optional settings to reveal.js:) [comment]: # (Other settings are documented at https://revealjs.com/config/) Linux Security Summit North America | April 18, 2024 ## Mitigating Integer Overflow in C Kees ("Case") Cook [https://fosstodon.org/@kees](https://fosstodon.org/@kees) keescook@chromium.org [Kernel Self-Protection Project](https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project) https://outflux.net/slides/2024/lss-na/
### Why do we care about integer overflow flaws? - The root cause of many security vulnerabilities is arithmetic overflow. - [CWE-190: Integer Overflow or Wraparound](https://cwe.mitre.org/data/definitions/190.html) "... can introduce other weaknesses when the calculation is used for resource management or execution control." - Difficult to get rid of in C due to many surprising behaviors - For the Linux kernel, past efforts to eliminate them don't seem to be making a much of a dent...
### A decade of integer overflow flaws Linux kernel averages (at least) 6 serious integer overflow CVEs a year ... [comment]: # (chartjs config https://www.chartjs.org/docs/3.2.0/general/fonts.html)
,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023 CVEs,9,4,8,7,9,4,4,4,7,5
### What is an "overflow" anyway? - A calculation’s result exceeds the involved representation range - Can't calculate 3 * 128 (384) with 8 bit unsigned storage:
1
1000 0000
(9th bit has nowhere to go...) `== 128`
8 value bits
(representation range of 0-255) - Can't calculate 120 + 10 (130) with 8 bit signed 2s-complement storage:
1
000 0010
(ends up shoved into sign bit) `== -2`
sign
7 value bits
(representation range of -128 to 127)
### Arithmetic overflow resolution strategies - none ("just execute the machine code and see what happens") (this behavior is why we commonly think of overflow as wrap-around) - wrap-around to other end of range (default for 2s-complement storage) - saturate (do not exceed min/max storage range, e.g. `refcount_t`) - trap (stop the execution context) - "undefined behavior" (pretend we don't know what CPU we're running on, treat the result as "undefined", and let compiler optimization passes melt the world)
### Where we are today with the C Standard (unsigned) - For **unsigned** integers, the C Standard explicitly says the unsigned integer arithmetic overflow strategy must be "wrap-around" - all our memory systems already Do The Right Thing for free - if there was some weirdo machine, it would need to produce the machine code needed to get the wrap-around result - More on unsigned wrap-around later
### Where we are today with the C Standard (signed) - For **signed** integers (e.g. the very common `"int"`), the C Standard says the signed integer arithmetic overflow strategy is "undefined behavior" - In theory this was to deal with finding a way to unify C for all compute environments decades ago including weird non-2s-complement memory systems? - In practice this **used** to mean the "none" strategy, which for all sane (2s-complement) systems would result in wrap-around. - With modern optimizing compilers, though, "undefined behavior" means "let the optimizer hallucinate".
### Signed integer overflow as undefined behavior One common C code pattern used to check for a potential overflow condition is to depend on wrap-around: ```cpp[1|1-5] if (value + offset < value) { /* Overflow detected, reject bad offset. */ return -EINVAL; } do_things_knowing_offset_is_safe(value + offset); ``` But after optimization, it effectively becomes: ```cpp /* Undefined behavior! Just remove the check completely! What could go wrong? */ do_things_knowing_offset_is_safe(value + offset); ``` 2009: [https://bugzilla.kernel.org/show_bug.cgi?id=12597](https://bugzilla.kernel.org/show_bug.cgi?id=12597)
### Avoiding undefined behavior from signed integer overflow - `-fno-strict-overflow` (`-fwrapv`) changes strategy for signed integer overflow from "undefined behavior" to "wrap-around" ```cpp extern char *safe, *unsafe; char *test(int value, int offset) /* value in %rdi, offset in %rsi */ { if (value + offset < value) { return unsafe; } return safe; } ``` e.g., built with only `-O2` nothing actually checks the contents of "value": ```asm[2-5|3] mov rax, QWORD PTR unsafe[rip] test esi, esi cmovns rax, QWORD PTR safe[rip] ret ```
### Avoiding undefined behavior from signed integer overflow - `-fno-strict-overflow` (`-fwrapv`) changes strategy for signed integer overflow from "undefined behavior" to "wrap-around" ```cpp extern char *safe, *unsafe; char *test(int value, int offset) /* value in %rdi, offset in %rsi */ { if (value + offset < value) { return unsafe; } return safe; } ``` Built with `-O2 -fno-strict-overflow ` it does the right thing: ```asm[1-5|1-2] lea eax, [rdi+rsi] cmp eax, edi mov rax, QWORD PTR unsafe[rip] cmovge rax, QWORD PTR safe[rip] ret ```
### Hurray, now everything will overflow! Err, wait... - Now that arithmetic actually *works* reliably, how do we catch overflows? - Undefined Behavior Sanitizer's arithmetic overflow checkers to the rescue! - `-fsanitize=signed-integer-overflow` - `-fsanitize=unsigned-integer-overflow` (Clang only) - (also `-fsanitize=pointer-overflow ` but not discussed today) - Two possible handlers - warn, but continue with calculation anyway (default) - trap, via `-fsanitize-trap=[SANITIZER|GROUP|all]`
### UBSAN signed integer overflow ```cpp #include
#include
int main(int argc, char *argv[]) { int foo = INT_MAX; printf("result: %d\n", foo * argc); return 0; } ``` Default handler:` ` ```shell $ SAN="-fsanitize=signed-integer-overflow" $ clang -O2 -Wall $SAN foo.c -o foo $ ./foo result: 2147483647 $ ./foo bang foo.c:5:29: runtime error: signed integer overflow: 2147483647 * 2 cannot be represented in type 'int' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior foo.c:5:29 result: -2 ```
### UBSAN signed integer overflow ```cpp #include
#include
int main(int argc, char *argv[]) { int foo = INT_MAX; printf("result: %d\n", foo * argc); return 0; } ``` Trap handler:` ` ```shell $ SAN="-fsanitize=signed-integer-overflow -fsanitize-trap=all" $ clang -O2 -Wall $SAN foo.c -o foo $ ./foo result: 2147483647 $ ./foo bang Illegal instruction (core dumped) ```
### UBSAN signed integer overflow ```cpp #include
#include
int main(int argc, char *argv[]) { int foo = INT_MAX; printf("result: %d\n", foo * argc); return 0; } ``` *But Linux builds with* `-fno-strict-overflow` ... ```shell $ SAN="-fsanitize=signed-integer-overflow -fsanitize-trap=all" $ clang -O2 -Wall $SAN -fno-strict-overflow foo.c -o foo $ ./foo result: 2147483647 $ ./foo bang result: -2 ``` 😭 😭 😭
### UBSAN *unsigned* integer overflow ```cpp #include
#include
int main(int argc, char *argv[]) { unsigned int foo = UINT_MAX; printf("result: %u\n", foo * argc); return 0; } ``` But `-fno-strict-overflow` does *not* break the unsigned sanitizer ... ```shell $ SAN="-fsanitize=unsigned-integer-overflow -fsanitize-trap=all" $ clang -O2 -Wall $SANITIZER -fno-strict-overflow foo.c -o foo $ ./foo result: 4294967295 $ ./foo bang Illegal instruction (core dumped) ```
### My head is spinning - The "signed integer overflow" sanitizer only runs when it finds *undefined* signed integer overflows, but
`-fno-strict-overflow`
makes it defined, disabling the checks. 🤦 - The "undefined behavior" sanitizer named "unsigned integer overflow" is finding overflows, which are *always defined* for unsigned types (using the wrap-around strategy). Neither of these things are named correctly! 😫 Regardless, in the real world, in order to mitigate attacks **we must be able to instrument *overflows* no matter their resolution strategy**.
### Step 1: Make the integer overflow sanitizers actually work - UBSAN should be named "Unexpected" Behavior Sanitizer instead 😅 - signed integer overflow sanitizer needs to detect overflows regardless of the overflow resolution strategy (i.e. it should still work even with
`-fno-strict-overflow`
) - Justin Stitt has [fixed this in Clang 19](https://github.com/llvm/llvm-project/commit/81b4b89197a6be5f19f907b558540bb3cb70f064) and later. - With that done, we can start testing it again in [Linux v6.9](https://git.kernel.org/linus/557f8c582a9ba8abe6aa0fd734b6f342af106b26) and later. - GCC still [needs a fix](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102317). - *unsigned* integer overflow sanitizer needs to be [implemented in GCC](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96829). In both GCC bugs above, developers need to be convinced what is needed are **overflow** sanitizers, not **undefined behavior** sanitizers.
### Wait, what about "open coded" overflow detection... Remember this? ```cpp if (value + offset < value) { /* Overflow detected, reject bad offset. */ return -EINVAL; } do_things_knowing_offset_is_safe(value + offset); ``` The (now fixed) sanitizer will trigger if `value + offset` ever wraps around. But that's what is literally being tested for here, so we need to avoid instrumentation.
### Identify and eliminate overflow detection code patterns We have two potential solutions. - Refactor the code to use either the common overflow checking helpers, or perform a non-overflow calculation. - Likely best for the more rare signed type patterns. or - Teach the compiler about the need for these kinds of "idiom exclusions" where it will automatically avoid instrumentation. - Likely best for the more idiomatic unsigned type patterns.
### Step 2: Refactor "open coded" signed integer overflow tests For cases where the calculation result is used later, the `check_*_overflow()` helpers can be used: ```cpp if (check_add_overflow(value, offset, &result)) { /* Overflow detected, reject bad offset. */ return -EINVAL; } do_things_knowing_offset_is_safe(result); ``` Otherwise, switch to non-overflowing calculations: ```cpp if (offset > type_max(value) - value) { /* Overflow detected, reject bad offset. */ return -EINVAL; } ```
### [Coccinelle](https://coccinelle.gitlabpages.inria.fr/website/) to the rescue! ``` @self_wrap@ typedef bool, s8, s16, s32, s64, loff_t; {signed char, short, int, long, long long, ssize_t, bool, s8, s16, s32, s64, loff_t} VAR; expression OFFSET; @@ - ((VAR + OFFSET) < VAR) + OFFSET > type_max(VAR) - VAR @below_zero@ typedef bool, s8, s16, s32, s64, loff_t; {signed char, short, int, long, long long, ssize_t, bool, s8, s16, s32, s64, loff_t} VAR, OFFSET; @@ - ((VAR + OFFSET) < 0) + OFFSET > type_max(VAR) - VAR ```
### Step 3: Teach compiler about open-coded overflow detection This will be needed for unsigned types (and pointer arithmetic), as depending on the wrap-around behavior is much more common in the kernel since it was never under threat from "undefined behavior". ```cpp if (value + offset < value) { ... } ``` ```cpp if (value + offset < 0) { ... } ``` Regardless, developing this logic will likely wait until Linux has sufficiently adopted the signed integer overflow sanitizer and we're ready to move forward with the unsigned sanitizer.
### "Was this operation expected to wrap around?" - Since unsigned arithmetic has always been expected to wrap-around, it becomes much harder to disambiguate the developer's intent. - Sometimes it's easy: - allocation sizes don't want to wrap around - hashes, crypto, and other modulo-style ops *do* want to wrap - Sprinkling `check_add_overflow()` and `check_mul_overflow()` everywhere will become both tedious and annoying to read - However, we do already disambiguate types for various things: - `gfp_t` used to be `unsigned int` - `time_t` is making headlines with its new width...
### Step 4: Create wrapping types - We can catch wrap-around with the sanitizers, but to *allow* for wrap-around, they must be disabled, but granularity is poor: - Disable instrumentation at directory, file, or function level - Use explicit wrapping helpers, per operation - Solution: add a `typedef` attribute named “`wraps`”, which disables wrap-around checking for operations that include the given type: ```cpp typedef unsigned int __attribute__((wraps)) u32_wrap; u32_wrap foo = U32_MAX; foo = foo + 31337; /* sanitizer does not instrument this operation */ ``` Clang implementation from Justin Stitt [up for review](https://github.com/llvm/llvm-project/pull/86618).
### Step 5: More idiom exclusions: `--` and `++` With the sanitizer enabled, what happens at the end of this loop? ```cpp void process_all(u32 num) { while (num--) process(num); } ```
### Step 5: More idiom exclusions: `--` and `++` With the sanitizer enabled, what happens at the end of this loop? ```cpp void process_all(u32 num) { while (num--) process(num); } ``` Avoided by using `u32_wrap` or just refactor ("only" 924 instances): ```cpp for (; num; num--) ``` But best if the compiler can be taught to avoid instrumenting things like this (unless `num` is actually used later...)
### Step 6: More idiom exclusions: unsigned negative (?!) constants As a short-hand for "set all the bits", or "slightly less than max value", the kernel makes extensive use of negative constant values cast to unsigned types: ```c unsigned long ret; ... ret = -1UL; /* Could replace this with ULONG_MAX */ ``` ```c #define KVM_HVA_ERR_RO_BAD (-2UL) /* Hmm "ULONG_MAX - 1" is getting ugly */ ``` The sanitizer needs to just ignore this type of construct. If it's a integer constant expression, just silently accept the wrap-around.
### Now we're ready to catch flaws! Example of a 32-bit wrap-around [recently fixed](https://git.kernel.org/linus/3b0daecfeac0103aba8b293df07a0cbaf8b43f29): ```c[1-18|3|10-11|13|15] struct kfd_ioctl_get_process_apertures_new_args { ... __u32 num_of_nodes; ... }; ... struct kfd_ioctl_get_process_apertures_new_args *args = data; struct kfd_process_device_apertures *pa; ... pa = kzalloc((sizeof(struct kfd_process_device_apertures) * args->num_of_nodes), GFP_KERNEL); ... for (i = 0; i < min(p->n_pdds, args->num_of_nodes); i++) { ... pa[i].gpu_id = pdd->dev->id; ... } ```
### Now we're ready to catch flaws! ```c pa = kzalloc((sizeof(struct kfd_process_device_apertures) * args->num_of_nodes), GFP_KERNEL); ``` Sanitizer catches this! Though so does `CONFIG_UBSAN_BOUNDS` once we try to access beyond the end of `pa`'s allocation: ```c pa[i].gpu_id = pdd->dev->id; ``` But that's just good luck since the allocation and the array access happen in the same function.
### Now we're ready to catch flaws? Here's another overflow [fixed last year](https://git.kernel.org/linus/6311071a056272e1e761de8d0305e87cc566f734): ```c[1-10|1|3-4|6|8-9] u8 num_elems = 0; ... nla_for_each_nested(nl_elems, attrs, rem_elems) num_elems++; /* Whoops */ elems = kzalloc(struct_size(elems, elem, num_elems), GFP_KERNEL); ... nla_for_each_nested(nl_elems, attrs, rem_elems) { elems->elem[i].data = nla_data(nl_elems); ... ```
### Now we're ready to catch flaws? Here's another overflow [fixed last year](https://git.kernel.org/linus/6311071a056272e1e761de8d0305e87cc566f734): ```c[4] u8 num_elems = 0; ... nla_for_each_nested(nl_elems, attrs, rem_elems) num_elems++; /* Whoops */ elems = kzalloc(struct_size(elems, elem, num_elems), GFP_KERNEL); ... nla_for_each_nested(nl_elems, attrs, rem_elems) { elems->elem[i].data = nla_data(nl_elems); ... ``` Integer overflow sanitizer does *not* catch this! 😲
### Integer promotion, implicit truncation, and bears, oh my! What is actually happening during `num++` ? ```c[1|1-2|2-3|3-5|5-6|6-8|8-10|10-12|12] u8 num = 255; /* 1111 1111 */ num ++; num = num /* 1111 1111 */ + 1; /* type size smaller than int, so promoted to int */ (int)num /* 0000 0000 0000 0000 0000 0000 1111 1111 */ + (int)1; (int)255 /* 0000 0000 0000 0000 0000 0000 1111 1111 */ + (int)1; /* sanitizer checks arithmetic, concludes "this is fine" */ (int)256 /* 0000 0000 0000 0000 0000 0001 0000 0000 */; /* assignment is to a u8, so cast back to u8 which truncates */ u8 num = (u8)256 /* 0000 0000 */; /* Need to check for implicit truncation above! */ u8 num = 0; /* 0000 0000 */ ```
### Step 7: Enable even more sanitizers: implicit truncation! - `-fsanitize=implicit-signed-integer-truncation` - Has the widest coverage due to integer promotion, i.e. from signed - `-fsanitize=implicit-unsigned-integer-truncation` Similar issues to solve, just visible at assignment time now: ```c u8 counter; ... while (counter--) { ... } ``` Either change the type to signed, or use `u8_wrap`. But, as before, it'd be best if the compiler could simply avoid instrumentation here.
### Summary - Fix integer overflow sanitizers when using `-fno-strict-overflow` - Refactor "open coded" signed integer overflow tests - Teach sanitizer about open-coded overflow detection - Create wrapping types and start refactoring to use them - Teach sanitizer about post decremented loop variables - Teach sanitizer about unsigned negative constants - Enable implicit truncation sanitizers - Continue needed refactoring in Linux throughout!
### Looking into the crystal ball for the coming year . - Linux v6.9 _May 2024?_ - Introduce `CONFIG_UBSAN_SIGNED_WRAP` for testing, which works with tip-of-tree Clang, but is still rough around the edges. - Linux v6.10 _Aug 2024?_ - False positives found via syzkaller are addressed. - `CONFIG_UBSAN_SIGNED_WRAP` ready for wider testing.
### Looking into the crystal ball for the coming year .. - Clang 19 _Sep 2024?_ - integer overflow sanitizers work with `-fno-strict-overflow` - “`wraps`” attribute available - “`value + offset < value`” idiom excluded - “`while (var--)`” idiom excluded - negative unsigned integer constant expression idiom exclusion - Linux v6.11 _Nov 2024?_ - Introduce `CONFIG_UBSAN_UNSIGNED_WRAP` for initial testing. - Add “`wraps`” types for use on things like jiffies, hashes, etc.
### Looking into the crystal ball for the coming year ... - Linux v6.11 _Jan 2025?_ - `CONFIG_UBSAN_UNSIGNED_WRAP` ready for wider testing. - Introduce `CONFIG_UBSAN_SIGNED_TRUNC` for initial testing. - Continue fixing false positives. - Linux v6.12 _Apr 2025?_ - `CONFIG_UBSAN_SIGNED_TRUC` ready for wider testing. - Introduce `CONFIG_UBSAN_UNSIGNED_TRUNC` for initial testing. - Continue fixing false positives.
## Questions and Feedback? **Thank you for your attention!** Kees ("Case") Cook [https://fosstodon.org/@kees](https://fosstodon.org/@kees) keescook@chromium.org [Kernel Self-Protection Project](https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project) https://outflux.net/slides/2024/lss-na/