If you’re running a 64-bit system, and you’ve got users with access to a video device (/dev/video*), then be sure you update your kernels for CVE-2010-2963. I’ve been slowly making my way through auditing the many uses in the Linux kernel of the copy_from_user() function, and ran into this vulnerability.
Here’s the kernel code from drivers/media/video/v4l2-compat-ioctl32.c:
    static int get_microcode32(struct video_code *kp, struct video_code32 __user *up)
    {
        if (!access_ok(VERIFY_READ, up, sizeof(struct video_code32)) ||
            copy_from_user(kp->loadwhat, up->loadwhat, sizeof(up->loadwhat)) ||
            get_user(kp->datasize, &up->datasize) ||
            copy_from_user(kp->data, up->data, up->datasize))
            return -EFAULT;
        return 0;
    }
Note that kp->data is being used as the target for up->data in the final copy_from_user() without actually verifying that kp->data is pointing anywhere safe. Here’s the caller of get_microcode32:
    static long do_video_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
    {
        union {
            struct video_tuner vt;
            struct video_code vc;
            ...
        } karg;
        void __user *up = compat_ptr(arg);
        ...
        switch (cmd) {
        ...
        case VIDIOCSMICROCODE:
            err = get_microcode32(&karg.vc, up);
            ...
So, the contents of up are totally under control of the caller, and the contents of karg (in our case, the video_code structure) are not initialized at all. So, it seems like a call for VIDIOCSMICROCODE would write up->datasize bytes from the caller’s up->data to whatever kernel address happens to be sitting in the uninitialized kp->data, just causing an Oops, since we don’t control what is on the kernel’s stack.
But wait, who says we can’t control the contents of the kernel’s stack? In fact, this compat function makes it extremely easy. Let’s look back at the union. Notice the struct video_tuner? That gets populated from the caller’s up memory via this case of the switch (cmd) statement:
        ...
        case VIDIOCSTUNER:
        case VIDIOCGTUNER:
            err = get_video_tuner32(&karg.vt, up);
            ...
So, to control the kernel stack, we just need to call this ioctl twice in a row: once to populate the stack via VIDIOCSTUNER with the contents we want (including the future address for video_code->data, which starts at the same location as video_tuner->name[20]), and then again with VIDIOCSMICROCODE.
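The overlap between the two union members can be checked from userspace. The sketch below is only a model: the two structure layouts are written out as I understand the old V4L1 header and should be treated as assumptions, and union karg_model, tuner_name_offset(), code_data_offset() and planted_pointer() are hypothetical names made up for illustration. It computes where the data pointer sits and shows that bytes written through the tuner view come back as a pointer through the code view.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layouts as I understand the old V4L1 header; treat them as assumptions. */
struct video_tuner {
    int tuner;
    char name[32];
    unsigned long rangelow, rangehigh;
    uint32_t flags;
    uint16_t mode;
    uint16_t signal;
};

struct video_code {
    char loadwhat[16];
    int datasize;
    uint8_t *data;
};

/* Hypothetical stand-in for the union on do_video_ioctl()'s stack. */
union karg_model {
    struct video_tuner vt;
    struct video_code vc;
};

/* Byte offset of name[i] from the start of struct video_tuner. */
static size_t tuner_name_offset(size_t i)
{
    return offsetof(struct video_tuner, name) + i;
}

/* Byte offset of the data pointer from the start of struct video_code. */
static size_t code_data_offset(void)
{
    return offsetof(struct video_code, data);
}

/*
 * Model of the two calls: fill the tuner name so that the bytes overlapping
 * vc.data hold `target`, then read the pointer back through the other view.
 */
static uint8_t *planted_pointer(union karg_model *k, uint8_t *target)
{
    size_t off = code_data_offset() - offsetof(struct video_tuner, name);

    /* The planted pointer has to fit inside the name field. */
    assert(off + sizeof(target) <= sizeof(k->vt.name));

    memset(k, 0, sizeof(*k));
    memcpy(&k->vt.name[off], &target, sizeof(target));
    return k->vc.data;
}
```

With these assumed layouts on a 64-bit build, code_data_offset() and tuner_name_offset(20) should agree, which is the name[20] overlap described above. The model keeps one union alive across both steps; the real bug depends on the second ioctl reusing the same stack memory as the first.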
Tricks involved here are: the definition of the VIDIOCSMICROCODE case in the kernel is wrong, and calling the ioctls without any preparation can trigger other kernel work (memory faults, etc.) that may destroy the stack contents. First, we need the real value for the desired case statement. This turns out to be 0x4020761b. Next, we just repeatedly call the setup ioctl in an attempt to get incidental kernel work out of the way so that our last ioctl doing the stack preparation will stick, and then we call the buggy ioctl to trigger the vulnerability.
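To see what is packed into 0x4020761b, here is a small decoder and encoder for the generic Linux ioctl number layout as used on x86 (8 bits of command number, 8 bits of type, 14 bits of size, 2 bits of direction). The names struct ioctl_fields, decode_ioctl() and encode_ioctl() are hypothetical helpers, not kernel APIs, and other architectures split the upper bits differently.

```c
#include <stdint.h>

/* Hypothetical container for the four fields of an ioctl number. */
struct ioctl_fields {
    unsigned nr;    /* command number within the driver */
    unsigned type;  /* driver "magic" character */
    unsigned size;  /* size in bytes of the argument structure */
    unsigned dir;   /* 1 means userspace writes to the kernel (_IOW) */
};

/* Split an ioctl number using the generic x86 field widths (8/8/14/2). */
static struct ioctl_fields decode_ioctl(uint32_t cmd)
{
    struct ioctl_fields f;

    f.nr = cmd & 0xff;
    f.type = (cmd >> 8) & 0xff;
    f.size = (cmd >> 16) & 0x3fff;
    f.dir = cmd >> 30;
    return f;
}

/* Inverse of decode_ioctl(): pack the four fields back into one number. */
static uint32_t encode_ioctl(unsigned dir, unsigned type, unsigned nr,
                             unsigned size)
{
    return ((uint32_t)dir << 30) | ((uint32_t)(size & 0x3fff) << 16) |
           ((uint32_t)(type & 0xff) << 8) | (uint32_t)(nr & 0xff);
}
```

Decoding 0x4020761b this way gives a write-direction command, type 'v', number 27, with a 32-byte argument. My reading, which the text above does not spell out, is that 32 bytes matches the native 64-bit struct video_code, where a 32-bit caller would normally encode the smaller compat structure.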
Since the ioctl already does a multi-byte copy, we can now copy arbitrary lengths of bytes into kernel memory. One method of turning an arbitrary kernel memory write into a privilege escalation is to overwrite a kernel function pointer, and trigger that function. Based on the exploit for CVE-2010-3081, I opted to overwrite the security_ops function pointer table. Their use of msg_queue_msgctl wasn’t very good for the general case since it’s near the end of the table and its offset would depend on kernel versions. Initially I opted for getcap, but in the end used ptrace_traceme, both of which are very near the top of the security_ops structure. (Though I need to share credit here with Dan Rosenberg as we were working together on improving the reliability of the security_ops overwrite method. He used the same approach for his excellent RDS exploit.)
Here are the steps for one way of taking an arbitrary kernel memory write and turning it into a root escalation:
- overwrite security_ops with default_security_ops, which will revert the LSM back to the capabilities-only security operations. This, however, means we can calculate where cap_ptrace_traceme is.
- overwrite default_security_ops->ptrace_traceme to point to our supplied function that will actually perform the privilege escalation (thanks to Brad Spengler for his code from Enlightenment).
- trigger the function (in this case, call ptrace(PTRACE_TRACEME, 0, NULL, NULL)).
- restore default_security_ops->ptrace_traceme to point to cap_ptrace_traceme so the next caller doesn’t Oops the system (since userspace memory will be remapped).
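The order of those four steps can be modelled with an ordinary function pointer table in userspace. This is a toy, not the exploit: struct toy_ops, the two tables, and every function here are hypothetical stand-ins, the pointer swaps are plain assignments instead of the kernel write primitive, and the replacement hook only counts calls.

```c
#include <stddef.h>

/* Toy stand-in for an ops table; not the kernel's security operations. */
struct toy_ops {
    int (*getcap)(void);
    int (*traceme)(void);
};

static int hook_calls;

static int default_getcap(void)  { return 0; }
static int default_traceme(void) { return 0; }
static int other_traceme(void)   { return -1; }

/* Stand-in for the caller-supplied function; it only records the call. */
static int replacement_traceme(void)
{
    hook_calls++;
    return 42;
}

static struct toy_ops default_ops = { default_getcap, default_traceme };
static struct toy_ops other_ops   = { default_getcap, other_traceme };

/* Models the global pointer that selects the active table. */
static struct toy_ops *active_ops = &other_ops;

static int run_sequence(void)
{
    int (*saved)(void);
    int result;

    active_ops = &default_ops;                 /* step 1: known table */
    saved = default_ops.traceme;
    default_ops.traceme = replacement_traceme; /* step 2: swap one slot */
    result = active_ops->traceme();            /* step 3: trigger it */
    default_ops.traceme = saved;               /* step 4: restore slot */
    return result;
}
```

After run_sequence() returns, the table again points at default_traceme, mirroring the restore step that keeps later callers from jumping to an address that is no longer valid.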
Here’s the source for Vyakarana as seen running in Enlightenment using cap_getcap (which is pretty unstable, so you might want to switch it to use ptrace_traceme), and as a stand-alone memory writer.
Conclusions: Keep auditing the kernel for more arbitrary writes; I think there are still many left. Reduce the exploitation surface within the kernel itself (which PaX and grsecurity have been doing for a while now), specifically:
- Block userspace memory access while in kernel mode. This would stop the ability to make the kernel start executing functions that live in userspace — a clear privilege violation. This protection would stop the current exploit above, but the exploit could be adjusted to use kernel memory instead.
- Keep function pointers read-only. There is no reason for these function pointer tables (fops, IDT, security_ops, etc.) to be writable. These should all be marked correctly, with inline code exceptions being made for updating the global pointers to those tables, leaving the pointer read-only after it gets set. This would stop this particular exploit above, but there are still plenty more targets.
- Randomize the kernel stack location on a per-syscall basis. This will stop exploits that depend on a stable kernel stack location (as this exploit does).
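As a small illustration of the read-only function pointer idea, here is how a const table looks in plain C. It is a sketch with hypothetical names (struct ro_ops, ro_default_ops, ro_active, call_traceme()). Typical toolchains place a const object like this in a read-only section, though the C standard does not guarantee that, and the kernel needs its own mechanisms to enforce it.

```c
/* Hypothetical ops table used only for this illustration. */
struct ro_ops {
    int (*traceme)(void);
};

static int ro_default_traceme(void)
{
    return 0;
}

/*
 * Declared const: the compiler rejects direct assignments to its members,
 * and toolchains usually emit it into read-only data.
 */
static const struct ro_ops ro_default_ops = {
    .traceme = ro_default_traceme,
};

/*
 * The pointer to the table is still writable here. This is the remaining
 * target the bullet above refers to when it asks for the pointer to become
 * read-only once it has been set.
 */
static const struct ro_ops *ro_active = &ro_default_ops;

static int call_traceme(void)
{
    return ro_active->traceme();
}
```

An assignment such as ro_default_ops.traceme = something_else fails to compile, but ro_active can still be redirected to a different table, so const-qualifying the table only closes part of the gap.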
© 2010, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 License.