December « 2013 « codeblog

December 20, 2013

DOM scraping

Filed under: Blogging,Debian,General,Ubuntu,Ubuntu-Server,Web — kees @ 11:16 pm

For a long time now I’ve used mechanize (via either Perl or Python) for doing website interaction automation. Stuff like playing web games, checking the weather, or reviewing my balance at the bank. However, as the use of javascript continues to increase, it’s getting harder and harder to screen-scrape without actually processing DOM events. To do that, really only browsers are doing the right thing, so getting attached to an actual browser DOM is generally the only way to do any kind of web interaction automation.

It seems the thing furthest along this path is Selenium. Initially, I spent some time trying to make it work with Firefox, but gave up. Instead, this seems to work nicely with Chrome via the Chrome WebDriver. And even better, all of this works out of the box on Ubuntu 13.10 via python-selenium and chromium-chromedriver.

Running /usr/lib/chromium-browser/chromedriver2_server from chromium-chromedriver starts a network listener on port 9515. This is the WebDriver API that Selenium can talk to. When requests are made, chromedriver2_server spawns Chrome, and all the interactions happen against that browser.

Since I prefer Python, I avoided the Java interfaces and focused on the Python bindings:

#!/usr/bin/env python
import sys
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys

caps = webdriver.DesiredCapabilities.CHROME

browser = webdriver.Remote("http://localhost:9515", caps)

browser.get("https://bank.example.com/")
assert "My Bank" in browser.title

try:
    elem = browser.find_element_by_name("userid")
    elem.send_keys("username")

    elem = browser.find_element_by_name("password")
    elem.send_keys("wheee my password" + Keys.RETURN)
except NoSuchElementException:
    print "Could not find login elements"
    sys.exit(1)

assert "Account Balances" in browser.title

xpath = "//div[text()='Balance']/../../td[2]/div[contains(text(),'$')]"
balance = browser.find_element_by_xpath(xpath).text

print balance

browser.close()

This would work pretty great, but if you need to save any state between sessions, you’ll want to be able to change where Chrome stores data (since by default in this configuration, it uses an empty temporary directory via --user-data-dir=). Happily, various things about the browser environment can be controlled, including the command line arguments. This is configurable by expanding the “desired capabilities” variable:

caps = webdriver.DesiredCapabilities.CHROME
caps["chromeOptions"] = {
        "args": ["--user-data-dir=/home/user/somewhere/to/store/your/session"],
    }

A great thing about this is that you get to actually watch the browser do its work. However, in cases where this interaction is going to be fully automated, you likely won’t have a Xorg session running, so you’ll need to wrap the WebDriver in one (since it launches Chrome). I used Xvfb for this:

#!/bin/bash
# Start WebDriver under fake X and wait for it to be listening
xvfb-run /usr/lib/chromium-browser/chromedriver2_server &
pid=$!
while ! nc -q0 -w0 localhost 9515; do
    sleep 1
done

the-chrome-script
rc=$?

# Shut down WebDriver
kill $pid

exit $rc

Alternatively, all of this could be done in the python script too, but I figured it’s easier to keep the support infrastructure separate from the actual test script itself. I actually leave the xvfb-run call external too, so it’s easier to debug the browser in my own X session.

One bug I encountered was that the WebDriver’s cache of the browser’s DOM can sometimes get out of sync with the actual browser’s DOM. I didn’t find a solution to this, but managed to work around it. I’m hoping later versions fix this. :)

Comments (2)

December 10, 2013

live patching the kernel

Filed under: Blogging,Chrome OS,Debian,Security,Ubuntu,Ubuntu-Server — kees @ 3:40 pm

A nice set of recent posts have done a great job detailing the remaining ways that a root user can get at kernel memory. Part of this is driven by the ideas behind UEFI Secure Boot, but they come from the same goal: making sure that the root user cannot directly subvert the running kernel. My perspective on this is toward making sure that an attacker who has gained access and then gained root privileges can’t continue to elevate their access and install invisible kernel rootkits.

An outline for possible attack vectors is spelled out by Matthew Gerrett’s continuing “useful kernel lockdown” patch series. The set of attacks was examined by Tyler Borland in “Bypassing modules_disabled security”. His post describes each vector in detail, and he ultimately chooses MSR writing as the way to write kernel memory (and shows an example of how to re-enable module loading). One thing not mentioned is that many distros have MSR access as a module, and it’s rarely loaded. If modules_disabled is already set, an attacker won’t be able to load the MSR module to begin with. However, the other general-purpose vector, kexec, is still available. To prove out this method, Matthew wrote a proof-of-concept for changing kernel memory via kexec.

Chrome OS is several steps ahead here, since it has hibernation disabled, MSR writing disabled, kexec disabled, modules verified, root filesystem read-only and verified, kernel verified, and firmware verified. But since not all my machines are Chrome OS, I wanted to look at some additional protections against kexec on general-purpose distro kernels that have CONFIG_KEXEC enabled, especially those without UEFI Secure Boot and Matthew’s lockdown patch series.

My goal was to disable kexec without needing to rebuild my entire kernel. For future kernels, I have proposed adding /proc/sys/kernel/kexec_disabled, a partner to the existing modules_disabled, that will one-way toggle kexec off. For existing kernels, things got more ugly.

What options do I have for patching a running kernel?

First I looked back at what I’d done in the past with fixing vulnerabilities with systemtap. This ends up being a rather heavy-duty way to go about things, since you need all the distro kernel debug symbols, etc. It does work, but has a significant problem: since it uses kprobes, a root user can just turn off the probes, reverting the changes. So that’s not going to work.

Next I looked at ksplice. The original upstream has gone away, but there is still some work being done by Jiri Slaby. However, even with his updates which fixed various build problems, there were still more, even when building a 3.2 kernel (Ubuntu 12.04 LTS). So that’s out too, which is too bad, since ksplice does exactly what I want: modifies the running kernel’s functions via a module.

So, finally, I decided to just do it by hand, and wrote a friendly kernel rootkit. Instead of dealing with flipping page table permissions on the normally-unwritable kernel code memory, I borrowed from PaX’s KERNEXEC feature, and just turn off write protect checking on the CPU briefly to make the changes. The return values for functions on x86_64 are stored in RAX, so I just need to stuff the kexec_load syscall with “mov -1, %rax; ret” (-1 is EPERM):

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>

static unsigned long long_target;
static char *target;
module_param_named(syscall, long_target, ulong, 0644);
MODULE_PARM_DESC(syscall, "Address of syscall");

/* mov $-1, %rax; ret */
unsigned const char bytes[] = { 0x48, 0xc7, 0xc0, 0xff, 0xff, 0xff, 0xff,
                                0xc3 };
unsigned char *orig;

/* Borrowed from PaX KERNEXEC */
static inline void disable_wp(void)
{
        unsigned long cr0;

        preempt_disable();
        barrier();
        cr0 = read_cr0();
        cr0 &= ~X86_CR0_WP;
        write_cr0(cr0);
}

static inline void enable_wp(void)
{
        unsigned long cr0;

        cr0 = read_cr0();
        cr0 |= X86_CR0_WP;
        write_cr0(cr0);
        barrier();
        preempt_enable_no_resched();
}

static int __init syscall_eperm_init(void)
{
        int i;
        target = (char *)long_target;

        if (target == NULL)
                return -EINVAL;

        /* save original */
        orig = kmalloc(sizeof(bytes), GFP_KERNEL);
        if (!orig)
                return -ENOMEM;
        for (i = 0; i < sizeof(bytes); i++) {
                orig[i] = target[i];
        }

        pr_info("writing %lu bytes at %p\n", sizeof(bytes), target);

        disable_wp();
        for (i = 0; i < sizeof(bytes); i++) {
                target[i] = bytes[i];
        }
        enable_wp();

        return 0;
}
module_init(syscall_eperm_init);

static void __exit syscall_eperm_exit(void)
{
        int i;

        pr_info("restoring %lu bytes at %p\n", sizeof(bytes), target);

        disable_wp();
        for (i = 0; i < sizeof(bytes); i++) {
                target[i] = orig[i];
        }
        enable_wp();

        kfree(orig);
}
module_exit(syscall_eperm_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Kees Cook <kees@outflux.net>");
MODULE_DESCRIPTION("makes target syscall always return EPERM");

If I didn’t want to leave an obvious indication that the kernel had been manipulated, the module could be changed to:

not announce what it’s doing
remove the exit route to not restore the changes on module unload
error out at the end of the init function instead of staying resident

And with this in place, it’s just a matter of loading it with the address of sys_kexec_load (found via /proc/kallsyms) before I disable module loading via modprobe. Here’s my upstart script:

# modules-disable - disable modules after rc scripts are done
#
description "disable loading modules"

start on stopped module-init-tools and stopped rc

task
script
        cd /root/modules/syscall_eperm
        make clean
        make
        insmod ./syscall_eperm.ko \
                syscall=0x$(egrep ' T sys_kexec_load$' /proc/kallsyms | cut -d" " -f1)
        modprobe disable
end script

And now I’m safe from kexec before I have a kernel that contains /proc/sys/kernel/kexec_disabled.

Comments (7)

codeblog code is freedom — patching my itch

December 20, 2013

DOM scraping

December 10, 2013

live patching the kernel