For a long time now I’ve used mechanize
(via either Perl or Python) for doing website interaction automation. Stuff like playing web games, checking the weather, or reviewing my balance at the bank. However, as the use of javascript continues to increase, it’s getting harder and harder to screen-scrape without actually processing DOM events. To do that, really only browsers are doing the right thing, so getting attached to an actual browser DOM is generally the only way to do any kind of web interaction automation.
It seems the thing furthest along this path is Selenium. Initially, I spent some time trying to make it work with Firefox, but gave up. Instead, this seems to work nicely with Chrome via the Chrome WebDriver. And even better, all of this works out of the box on Ubuntu 13.10 via python-selenium
and chromium-chromedriver
.
Running /usr/lib/chromium-browser/chromedriver2_server
from chromium-chromedriver
starts a network listener on port 9515. This is the WebDriver API that Selenium can talk to. When requests are made, chromedriver2_server
spawns Chrome, and all the interactions happen against that browser.
Since I prefer Python, I avoided the Java interfaces and focused on the Python bindings:
#!/usr/bin/env python import sys from selenium import webdriver from selenium.common.exceptions import NoSuchElementException from selenium.webdriver.common.keys import Keys caps = webdriver.DesiredCapabilities.CHROME browser = webdriver.Remote("http://localhost:9515", caps) browser.get("https://bank.example.com/") assert "My Bank" in browser.title try: elem = browser.find_element_by_name("userid") elem.send_keys("username") elem = browser.find_element_by_name("password") elem.send_keys("wheee my password" + Keys.RETURN) except NoSuchElementException: print "Could not find login elements" sys.exit(1) assert "Account Balances" in browser.title xpath = "//div[text()='Balance']/../../td[2]/div[contains(text(),'$')]" balance = browser.find_element_by_xpath(xpath).text print balance browser.close()
This would work pretty great, but if you need to save any state between sessions, you’ll want to be able to change where Chrome stores data (since by default in this configuration, it uses an empty temporary directory via --user-data-dir=
). Happily, various things about the browser environment can be controlled, including the command line arguments. This is configurable by expanding the “desired capabilities” variable:
caps = webdriver.DesiredCapabilities.CHROME caps["chromeOptions"] = { "args": ["--user-data-dir=/home/user/somewhere/to/store/your/session"], }
A great thing about this is that you get to actually watch the browser do its work. However, in cases where this interaction is going to be fully automated, you likely won’t have a Xorg session running, so you’ll need to wrap the WebDriver in one (since it launches Chrome). I used Xvfb for this:
#!/bin/bash # Start WebDriver under fake X and wait for it to be listening xvfb-run /usr/lib/chromium-browser/chromedriver2_server & pid=$! while ! nc -q0 -w0 localhost 9515; do sleep 1 done the-chrome-script rc=$? # Shut down WebDriver kill $pid exit $rc
Alternatively, all of this could be done in the python script too, but I figured it’s easier to keep the support infrastructure separate from the actual test script itself. I actually leave the xvfb-run
call external too, so it’s easier to debug the browser in my own X session.
One bug I encountered was that the WebDriver’s cache of the browser’s DOM can sometimes get out of sync with the actual browser’s DOM. I didn’t find a solution to this, but managed to work around it. I’m hoping later versions fix this. :)
© 2013, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 License.