Reliable Selenium Testing in Python - a work in progress

Friday, January 8

Selenium tests are notoriously annoying and unstable. Booting up the full application and a browser adds many layers, and each of them can affect the state of the system. In general, I try to write tests at the lowest, simplest level possible. If something can be tested without loading any Django code, like a utility function, test it independently. If it can be tested on the model, test it there. Otherwise, write a view test. Only at the final step of this outward spiral do you instantiate the entire application and automate a web browser to emulate a user visiting the site, clicking links and filling out form fields. We use Selenium to automate web browsers for these tests.
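To make the bottom of that spiral concrete, here is a minimal sketch of a test that needs neither Django nor a browser. `format_duration` is a hypothetical utility function invented for illustration, not code from Mediathread or WORTH. The point is that a plain `unittest` test like this runs in milliseconds and has no browser, server, or database state to go wrong.

```python
import unittest


def format_duration(seconds):
    """Hypothetical utility (for illustration only): render seconds as M:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


class FormatDurationTest(unittest.TestCase):
    # No Django settings, database, or browser are involved,
    # so there is nothing here that can be "flaky".
    def test_pads_seconds(self):
        self.assertEqual(format_duration(65), "1:05")

    def test_zero(self):
        self.assertEqual(format_duration(0), "0:00")
```

Saved as, say, `test_utils.py`, this runs with `python -m unittest test_utils`. Any logic that can be pulled out into a function like this never has to pay the cost of a Selenium test.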

That said, I do think Selenium-style tests are useful and essential. We've made giant refactors and upgrades to Mediathread's JavaScript code that would have been tedious and intimidating without the intricate suite of Lettuce tests we have set up.

There has been some recent discussion of the problems with Selenium-style testing. "Selenium: 7 Things You Need To Know" offers tips for reducing flakiness and streamlining the process of writing Selenium tests. Last November, the Google Tech Talk "Your Tests Aren't Flaky" proposed that these "flaky" Selenium tests are really symptoms of a flaky application and should be treated as bugs.

In my experience, flaky Selenium tests usually mean the tests are flaky, not the application. Still, I appreciate the ideas in the "Your Tests Aren't Flaky" talk, and in that spirit I've made some attempts to fix a flaky test in WORTH. My first step was to look at the tools in our behave/Selenium setup for Django. I realized splinter wasn't essential: its programmatic browser commands could just as well be called directly through Selenium, so it was only adding complexity to an already complex system. After refactoring the behave steps to use Selenium's API instead of splinter's, I added a good deal of extra-cautious code to WORTH's steps/common.py. After reading through the Selenium Python documentation, along with some unofficial Selenium Python documentation that turned out to be really useful, I was able to get the test working reliably (so far) without any arbitrary calls to time.sleep().
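The usual alternative to time.sleep() is an explicit wait: poll for the condition the test actually needs, up to a timeout, instead of sleeping for a fixed period and hoping the page is ready. Selenium's Python bindings provide this as `WebDriverWait`. Below is a minimal, dependency-free sketch of that polling loop, written so it runs without a browser. It illustrates the technique and is not the code in WORTH's steps/common.py; the name `wait_until` and its parameters are hypothetical.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5, ignored_exceptions=()):
    """Call condition() until it returns a truthy value, then return that value.

    Hypothetical helper sketching the explicit-wait idea; Selenium's real
    equivalent is WebDriverWait(driver, timeout).until(...).
    """
    deadline = time.monotonic() + timeout
    last_error = None
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored_exceptions as exc:
            # Tolerate transient failures, e.g. an element not rendered yet.
            last_error = exc
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s") from last_error
        # Wait briefly before checking again, rather than one long arbitrary sleep.
        time.sleep(poll)
```

With a live driver, the same pattern might look like `wait_until(lambda: driver.find_element(By.ID, "submit"), ignored_exceptions=(NoSuchElementException,))`, where the `submit` ID is hypothetical. In practice it is simpler to use Selenium's own version directly, e.g. `WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "submit")))`, with `WebDriverWait` from `selenium.webdriver.support.ui`, `expected_conditions` imported as `EC` from `selenium.webdriver.support`, and `By` from `selenium.webdriver.common.by`.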

I've managed to make the Selenium tests more reliable, but only by writing more complicated code. That's not all that satisfying, so this is still a work in progress. At the moment, I'm still having trouble converting the UELC tests from splinter to Selenium in the same way.