If you’re running a 64-bit system, and you’ve got users with access to a video device (/dev/video*), then be sure you update your kernels for CVE-2010-2963. I’ve been slowly making my way through auditing the many uses in the Linux kernel of the copy_from_user() function, and ran into this vulnerability.
Here’s the kernel code from drivers/media/video/v4l2-compat-ioctl32.c:
    static int get_microcode32(struct video_code *kp, struct video_code32 __user *up)
    {
            if (!access_ok(VERIFY_READ, up, sizeof(struct video_code32)) ||
                    copy_from_user(kp->loadwhat, up->loadwhat, sizeof(up->loadwhat)) ||
                    get_user(kp->datasize, &up->datasize) ||
                    copy_from_user(kp->data, up->data, up->datasize))
                            return -EFAULT;
            return 0;
    }
Note that kp->data is being used as the target for up->data in the final copy_from_user() without actually verifying that kp->data is pointing anywhere safe. Here’s the caller of get_microcode32:
    static long do_video_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
    {
            union {
                    struct video_tuner vt;
                    struct video_code vc;
                    ...
            } karg;
            void __user *up = compat_ptr(arg);
            ...
            switch (cmd) {
            ...
            case VIDIOCSMICROCODE:
                    err = get_microcode32(&karg.vc, up);
            ...
So, the contents of up are totally under control of the caller, and the contents of karg (in our case, the video_code structure) are not initialized at all. So, it seems like a call for VIDIOCSMICROCODE would write video_code->datasize bytes from video_code->data into some random kernel address, just causing an Oops, since we don’t control what is on the kernel’s stack.
But wait, who says we can’t control the contents of the kernel’s stack? In fact, this compat function makes it extremely easy. Let’s look back at the union. Notice the struct video_tuner? That gets populated from the caller’s up memory via this case of the switch (cmd) statement:
            ...
            case VIDIOCSTUNER:
            case VIDIOCGTUNER:
                    err = get_video_tuner32(&karg.vt, up);
            ...
So, to control the kernel stack, we just need to call this ioctl twice in a row: once to populate the stack via VIDIOCSTUNER with the contents we want (including the future address for video_code->data, which starts at the same location as video_tuner->name[20]), and then again with VIDIOCSMICROCODE.
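The overlap between the two union members can be checked from userspace. This is a minimal sketch, assuming the V4L1 structure layouts from linux/videodev.h of that era (the field lists below are reproduced as an assumption, and the toy_ names are hypothetical) and an LP64 build, where pointers and unsigned long are 8 bytes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed copies of the V4L1 structures, renamed with a toy_ prefix
 * to mark them as illustrative rather than the kernel's definitions. */
struct toy_video_tuner {
    int tuner;
    char name[32];
    unsigned long rangelow, rangehigh;
    uint32_t flags;
    uint16_t mode;
    uint16_t signal;
};

struct toy_video_code {
    char loadwhat[16];
    int datasize;
    unsigned char *data;   /* the pointer later used as a copy target */
};

/* Offset of toy_video_code.data, measured from the start of
 * toy_video_tuner.name, when both structs begin at the same address. */
long data_offset_within_name(void)
{
    return (long)offsetof(struct toy_video_code, data)
         - (long)offsetof(struct toy_video_tuner, name);
}

/* Sanity check for an LP64 build: data should overlay name[20]. */
void check_union_overlap(void)
{
    assert(data_offset_within_name() == 20);
}
```

On a 32-bit build the pointer is only 4-byte aligned, so the offset would differ; the figure quoted above applies to the 64-bit kernel's view of the union.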
Tricks involved here are: the definition of the VIDIOCSMICROCODE case in the kernel is wrong, and calling the ioctls without any preparation can trigger other kernel work (memory faults, etc) that may destroy the stack contents. First, we need the real value for the desired case statement. This turns out to be 0x4020761b. Next, we just repeatedly call the setup ioctl in an attempt to get incidental kernel work out of the way so that our last ioctl doing the stack preparation will stick, and then we call the buggy ioctl to trigger the vulnerability.
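To see what the value 0x4020761b encodes, here is a small sketch of the generic Linux ioctl number layout (8 bits of command number, 8 bits of type, 14 bits of size, 2 bits of direction, as used on x86). The helper names are hypothetical. The decoded fields are a write direction, type 'v', number 27, and a size of 32 bytes. That size would match a 64-bit struct video_code (16 + 4 + 4 bytes of padding + 8), though treating that as the reason the compat case value differs is an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Direction value for "userspace writes to the kernel";
 * matches _IOC_WRITE in asm-generic/ioctl.h. */
enum { TOY_IOC_WRITE = 1 };

/* Pack the four fields into a command number using the generic layout. */
uint32_t ioc_encode(uint32_t dir, uint32_t type, uint32_t nr, uint32_t size)
{
    assert(dir < 4 && type < 256 && nr < 256 && size < (1u << 14));
    return (dir << 30) | (size << 16) | (type << 8) | nr;
}

/* Extract individual fields from a command number. */
uint32_t ioc_dir(uint32_t cmd)  { return (cmd >> 30) & 0x3; }
uint32_t ioc_size(uint32_t cmd) { return (cmd >> 16) & 0x3fff; }
uint32_t ioc_type(uint32_t cmd) { return (cmd >> 8) & 0xff; }
uint32_t ioc_nr(uint32_t cmd)   { return cmd & 0xff; }
```

Calling ioc_encode(TOY_IOC_WRITE, 'v', 27, 32) reproduces the value used above, and the ioc_* accessors split it back into its parts.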
Since the ioctl already does a multi-byte copy, we can now copy arbitrary lengths of bytes into kernel memory. One method of turning an arbitrary kernel memory write into a privilege escalation is to overwrite a kernel function pointer, and trigger that function. Based on the exploit for CVE-2010-3081, I opted to overwrite the security_ops function pointer table. Their use of msg_queue_msgctl wasn’t very good for the general case since it’s near the end of the table and its offset would depend on kernel versions. Initially I opted for getcap, but in the end used ptrace_traceme, both of which are very near the top of the security_ops structure. (Though I need to share credit here with Dan Rosenberg as we were working together on improving the reliability of the security_ops overwrite method. He used the same approach for his excellent RDS exploit.)
Here are the steps for one way of taking an arbitrary kernel memory write and turning it into a root escalation:
- overwrite security_ops with default_security_ops, which will revert the LSM back to the capabilities-only security operations. This, however, means we can calculate where cap_ptrace_traceme is.
- overwrite default_security_ops->ptrace_traceme to point to our supplied function that will actually perform the privilege escalation (thanks to Brad Spengler for his code from Enlightenment).
- trigger the function (in this case, call ptrace(PTRACE_TRACEME, 0, NULL, NULL)).
- restore default_security_ops->ptrace_traceme to point to cap_ptrace_traceme so the next caller doesn’t Oops the system (since userspace memory will be remapped).
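The order of those four steps can be illustrated with a toy userspace model. This is only a sketch of the swap, call, and restore sequence on an ordinary function pointer table: every toy_ name is hypothetical, the struct is not the kernel's security operations structure, and no kernel memory is involved.

```c
#include <assert.h>

/* Toy stand-in for an ops table with a single hook. */
struct toy_security_ops {
    int (*ptrace_traceme)(void);
};

static int toy_cap_ptrace_traceme(void) { return 0; }  /* original handler */
static int toy_payload(void) { return 42; }            /* replacement handler */

static struct toy_security_ops toy_default_ops = { toy_cap_ptrace_traceme };
static struct toy_security_ops toy_lsm_ops = { toy_cap_ptrace_traceme };
static struct toy_security_ops *toy_active_ops = &toy_lsm_ops;

/* Dispatch through whichever table is currently active. */
int toy_call_traceme(void)
{
    assert(toy_active_ops != 0 && toy_active_ops->ptrace_traceme != 0);
    return toy_active_ops->ptrace_traceme();
}

/* Walk through the four steps and return what the hook returned. */
int toy_swap_call_restore(void)
{
    int result;

    toy_active_ops = &toy_default_ops;                        /* step 1: known table */
    toy_default_ops.ptrace_traceme = toy_payload;             /* step 2: replace hook */
    result = toy_call_traceme();                              /* step 3: trigger it */
    toy_default_ops.ptrace_traceme = toy_cap_ptrace_traceme;  /* step 4: restore */
    return result;
}
```

After toy_swap_call_restore() returns, a further toy_call_traceme() reaches the original handler again, which is the point of the final restore step.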
Here’s the source for Vyakarana as seen running in Enlightenment using cap_getcap (which is pretty unstable, so you might want to switch it to use ptrace_traceme), and as a stand-alone memory writer.
Conclusions: Keep auditing the kernel for more arbitrary writes; I think there are still many left. Reduce the exploitation surface within the kernel itself (which PaX and grsecurity have been doing for a while now), specifically:
- Block userspace memory access while in kernel mode. This would stop the ability to make the kernel start executing functions that live in userspace — a clear privilege violation. This protection would stop the current exploit above, but the exploit could be adjusted to use kernel memory instead.
- Keep function pointers read-only. There is no reason for these function pointer tables (fops, IDT, security_ops, etc) to be writable. These should all be marked correctly, with inline code exceptions being made for updating the global pointers to those tables, leaving the pointer read-only after it gets set. This would stop this particular exploit above, but there are still plenty more targets.
- Randomize the kernel stack location on a per-syscall basis. This will stop exploits that depend on a stable kernel stack location (as this exploit does).
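As a rough illustration of the read-only table idea in plain C, a function pointer table can be declared const so that typical toolchains place it in read-only data. The names below are hypothetical, and the sketch covers only the table itself; keeping the pointer to the table read-only after it is set needs page-protection support that is not shown here.

```c
#include <assert.h>

/* Toy ops table; the real kernel tables have many more members. */
struct toy_file_ops {
    int (*open)(void);
    int (*release)(void);
};

static int toy_open(void) { return 1; }
static int toy_release(void) { return 2; }

/* const lets the compiler and linker put the table in read-only data,
 * so a stray write to one of its members faults instead of succeeding. */
static const struct toy_file_ops toy_fops = {
    .open = toy_open,
    .release = toy_release,
};

/* The pointer to the table is still writable in this sketch. */
static const struct toy_file_ops *toy_active_fops = &toy_fops;

int toy_do_open(void)
{
    assert(toy_active_fops != 0 && toy_active_fops->open != 0);
    return toy_active_fops->open();
}

int toy_do_release(void)
{
    assert(toy_active_fops != 0 && toy_active_fops->release != 0);
    return toy_active_fops->release();
}
```

With the table const, an assignment such as toy_fops.open = something_else is rejected at compile time, which catches accidental writes in ordinary code as well.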
© 2010, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 License.
Hi,
I’ve seen mention of this elsewhere.
It would be nice if you could elaborate on how, exactly, to fix these vulnerabilities. I have a couple in mind which I’m responsible for, but I’m unsure exactly how to fix them — and be sure I’ve done it right.
The other mention I saw of this also failed to explain how to fix it, other than mentioning access_ok() should be used. I could take a stab at it, but I’d like to know I was doing it right.
Comment by SteveC — October 19, 2010 @ 4:28 pm
In general, it’s best to just verify every single use of copy_to_user(), copy_from_user(), get_user(), and put_user(), etc (or any memory copying for that matter). Make sure you know where you’re reading and writing from, and how large those accesses are. (For example, are you sure a length can’t be negative, or larger than you’re expecting?) Calling access_ok() is already done inside the copy_*_user() functions, so it usually isn’t needed. You’ll note that in this particular case, it was the kernel destination that was unchecked. Everything was fine about the userspace reads — nothing was out of bounds. But nothing actually defined where or how large the kernel destination buffer was.
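A minimal userspace sketch of that checklist follows, with memcpy() standing in for copy_from_user() and hypothetical toy_ names throughout. The destination buffer and its size are supplied by the caller of the function, never taken from the request, and the length is checked for being negative or too large before anything is copied:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request as supplied by an untrusted caller. */
struct toy_request {
    int datasize;
    const unsigned char *data;
};

/* Copy the request payload into dst, which the callee's owner allocated.
 * Returns 0 on success, -1 if the request is rejected. */
int toy_load_microcode(unsigned char *dst, size_t dst_len,
                       const struct toy_request *req)
{
    assert(dst != NULL && req != NULL);   /* programming errors, not input errors */

    if (req->datasize < 0)
        return -1;                        /* negative length */
    if ((size_t)req->datasize > dst_len)
        return -1;                        /* larger than the destination */
    if (req->data == NULL)
        return -1;                        /* no source to read from */

    memcpy(dst, req->data, (size_t)req->datasize);
    return 0;
}
```

For example, with an 8-byte destination a 3-byte request is accepted, while requests with datasize of -1 or 9 are rejected before any bytes move.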
Comment by kees — October 19, 2010 @ 4:38 pm
Thanks for this detailed post!
Comment by JohnTaylor — October 19, 2010 @ 11:49 pm
Actually, it turns out I was thinking of a different issue now that I’m looking at it again, CVE-2010-3081
http://sota.gen.nz/compat1/
also related to the compat stuff, but to compat_alloc_user_space().
Comment by SteveC — October 20, 2010 @ 5:14 am
That should trigger a sparse warning shouldn’t it? If someone dereferences a __user pointer?
I don’t have a 64 bit computer and so I don’t compile that file. But someone should have caught that.
Comment by Dan Carpenter — October 23, 2010 @ 11:08 am
@Dan which part should have? Everything was syntactically correct except for the part where ->data wasn’t initialized.
Comment by kees — October 23, 2010 @ 1:56 pm
up is declared with the __user attribute.
So we’re not allowed to dereference it when we do the “up->datasize”. Calling access_ok() doesn’t mean we can dereference it, it just means that we use __get_user() instead of get_user(). User memory can be in a different address space or it can be swapped out.
Btw up->loadwhat is an array so that doesn’t actually dereference “up”.
Of course, the datasize isn’t capped as well as you point out. It seems like someone could write a Smatch script to detect these places automatically. Stuff like:
get_user(size, &user_ptr);
<– no cap on size here.
copy_from_user(dest, src, size);
I'll poke at this on Monday.
Comment by Dan Carpenter — October 23, 2010 @ 8:57 pm
Actually, I just read more thoroughly and that’s not what you were pointing out at all. :P Ha ha. It’s amazing this crap works at all.
Comment by Dan Carpenter — October 23, 2010 @ 10:00 pm
“Next, we just repeatedly call the setup ioctl in an attempt to get incidental kernel work out of the way so that our last ioctl doing the stack preparation will stick, and then we call the buggy ioctl to trigger the vulnerability.”
Can you give me an example when stack preparation will not stick?
I wrote my own exploit for this vulnerability and I call only one time the setup ioctl and it always works..
Comment by madara — August 7, 2011 @ 7:03 am
@madara I didn’t try it with just 1 stack prep, but I figured a bunch wouldn’t hurt. :)
Comment by kees — August 8, 2011 @ 10:32 am