Testing Web Applications with Selenium - Tips and Tricks

Alexander Todorov / Monday, March 17, 2014

When it comes to testing web apps, there are usually two main types of tests that we write – unit tests, which cover very specific parts of our JavaScript / server-side logic, and scenario UI tests, which cover everything and involve simulating real end user interactions.

The nature of web apps is such that we cannot cover everything with unit tests, so we rely heavily on our scenario UI tests to catch regressions and ensure quality. Over the past couple of years, we’ve tried many frameworks that provide an API for simulating real mouse and keyboard interactions, but Selenium has really become the de facto standard when it comes to testing UIs. It’s worth mentioning that the newer Selenium 2 (WebDriver) API is what everyone is using (or should use); I do not recommend relying on Selenium 1 (RC) for your tests. There are implementations of the API for multiple platforms and browsers, including JavaScript and PhantomJS implementations, which makes it a very flexible and powerful testing framework. We have thousands of Selenium tests that execute with every build and produce stable, reliable results.

In this blog post, I’d like to list some tips and tricks from the trenches that could help you save some time and make your UI tests a bit more reliable and robust. I’m going to use the .NET (C#) WebDriver API for my examples, but the same API calls should be easily applicable / translated to any other platform. 

1. Using waits

In a rich web app, things usually happen asynchronously. That’s how a Hi-Fi UI functions: it’s not just about animations, but also about the dynamic nature of your screens. Whenever you move your mouse, click on inputs, or perform drag and drop interactions, some JavaScript code is executed. In a lot of cases remote XHR requests are made, and the page uses setTimeout/setInterval/requestAnimationFrame to update the DOM.

The most obvious way of “waiting” until some processing is done is to call Thread.Sleep() in your Selenium test code. This has two major drawbacks. First, it always blocks the thread for the full X milliseconds, even when the page was ready much earlier. Second, there is no guarantee that the processing will actually be finished after X ms, so the test code that follows (for instance, a check that a dynamic element is present on the page) may still run too early. WebDriver has a great solution for this: the WebDriverWait class. Under the hood, WebDriverWait polls your condition (every 500 ms by default) and returns as soon as the condition is satisfied. This is called an “explicit” wait. You also specify a maximum timeout, so that the wait fails with an exception instead of waiting forever for something that will never happen. Here is an example that uses WebDriverWait to wait until no loading indicator is visible on the page:

WebDriverWait wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
wait.Until((d) =>
    {
        return (bool)((RemoteWebDriver)d).ExecuteScript("return $('#" + id + "_container_loading').is(':visible') !== true;");
    });
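
To make the polling idea concrete, here is a minimal sketch of such a loop in plain C#, without any Selenium types. It is not WebDriverWait’s actual implementation; the PollingWait class and its parameter names are made up for this illustration.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class PollingWait
{
    // Calls condition repeatedly until it returns true.
    // Throws TimeoutException once the timeout has elapsed without success.
    public static void Until(Func<bool> condition, TimeSpan timeout, TimeSpan pollInterval)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (true)
        {
            if (condition())
            {
                return; // done as soon as the condition holds, no extra sleeping
            }
            if (clock.Elapsed >= timeout)
            {
                throw new TimeoutException("Condition was not met within " + timeout);
            }
            Thread.Sleep(pollInterval); // wait a little before asking again
        }
    }
}
```

Unlike a fixed Thread.Sleep(), this returns right after the first successful check, and it fails loudly when the condition never becomes true.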

Another way of avoiding Thread.Sleep() when finding elements is to use implicit waits: whenever you invoke FindElement(..) or one of the FindElementBy* methods, the driver implicitly waits / polls until the element is found. If the maximum wait time passes and the element is still not there, the call throws a NoSuchElementException (it does not return null).

Note that implicit waits need to be explicitly configured; otherwise the default maximum wait time is 0. To configure implicit waits, call this once in your setup initialization code (newer versions of the .NET bindings replace this method with the ImplicitWait property):

driver.Manage().Timeouts().ImplicitlyWait(new TimeSpan(0, 0, 10));

Then whenever you do things like:

IWebElement element = driver.FindElementByCssSelector(cssSelector);

The element specified by “cssSelector” doesn’t have to exist immediately in the DOM.

2. Locating elements

As demonstrated in section 1, there are several ways to find elements on the page. One thing to always keep in mind is that between the time you assign your IWebElement reference and the time you use it, the element may have moved to a different location in the DOM, or may have been removed or recreated. If this happens, you will get the nasty StaleElementReferenceException from Selenium. Example:

IWebElement container = driver.FindElementByCssSelector(containerSelector);
// some animation happens which attaches container to a different parent in the DOM
container.Click(); // throws StaleElementReferenceException

The following link explains in detail the various reasons which may lead to this situation:

http://docs.seleniumhq.org/exceptions/stale_element_reference.jsp

(The code is Java specific, but the C# API uses the same or very similar naming.)

To avoid a StaleElementReferenceException, simply re-query the element and reassign the reference right before you use it, whenever a scenario may have altered the DOM for that particular element.
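
One way to package the “re-query and try again” idea is a small retry helper. The sketch below is plain C# and deliberately generic over the exception type, so it does not reference Selenium itself; the Retry class and its method name are hypothetical.

```csharp
using System;

public static class Retry
{
    // Runs action and re-runs it whenever it throws TException,
    // up to maxAttempts times. The last failure is rethrown to the caller.
    public static void On<TException>(Action action, int maxAttempts)
        where TException : Exception
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        }
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                action();
                return;
            }
            catch (TException) when (attempt < maxAttempts)
            {
                // Swallow the failure and loop, so the action runs again
                // and can look the element up from scratch.
            }
        }
    }
}
```

In a Selenium test this might be called as Retry.On<StaleElementReferenceException>(() => driver.FindElementByCssSelector(containerSelector).Click(), 3); the important part is that the lookup happens inside the action, so every attempt works with a fresh element reference.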

3. Sending keyboard input

There are two main ways to send text to input elements with Selenium. The first is to set the “value” property of editable elements by executing JavaScript using ExecuteScript() / ExecuteAsyncScript(). The second is to use the SendKeys() API and really simulate user key presses. The difference is huge. The first approach may be more suitable if you just want to fill in some fields and don’t care about DOM events being fired or real-time, character-by-character validation taking place, for instance when you only need to fill in a form and submit it. The second approach is much closer to what a real end user would do: first focus the field by pressing TAB or clicking on it, then send each letter one by one to the input field. It gives your app the chance to fire DOM events, perform JS validation by handling those events, and so on.

With SendKeys(), you can either send the complete value as one string, or send it character by character with some wait time in between, depending on what works best for your scenario. Here is a SendKeys() example:

editor.Click();
WebDriverWait wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
wait.Until((d) => { return (bool)driver.ExecuteScript("return $('#" + id + "').igCombo('hasFocus')"); });
Actions builder = new Actions(driver);
builder.SendKeys(text).Perform();

Note that here we also have some custom code that waits until our widget receives focus before we try to send any text to the input. For a simple input element you won’t need this, but when there is a delay in the way you handle focus/blur, such checks come in handy.
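
The character-by-character variant can be sketched as a tiny helper. To keep the example runnable on its own, it takes the “send keys” operation as a delegate instead of a Selenium element; the SlowTyping class is a made-up name for this illustration.

```csharp
using System;
using System.Threading;

public static class SlowTyping
{
    // Sends text one character at a time through sendKeys,
    // pausing after each character so the page can react to every key press.
    public static void Type(Action<string> sendKeys, string text, TimeSpan delayBetweenKeys)
    {
        foreach (char c in text)
        {
            sendKeys(c.ToString());
            Thread.Sleep(delayBetweenKeys);
        }
    }
}
```

With an IWebElement called editor, a call could look like SlowTyping.Type(editor.SendKeys, text, TimeSpan.FromMilliseconds(50)); the 50 ms delay is an arbitrary value that you would tune for your own widgets.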

4. Executing JavaScript code

There are two main ways to execute JavaScript code from your WebDriver test code. The first is the synchronous ExecuteScript(): the next line of test code won’t be reached until the script has run to completion. The second is ExecuteAsyncScript(), which is meant for scripts that finish asynchronously (for example after a timer or an XHR request). WebDriver passes a callback as the last argument to such a script, and the call returns to your test code only once the script invokes that callback or the script timeout expires. For plain synchronous scripts, ExecuteScript() is usually the simpler choice, especially when the test logic that follows depends on the result. The return type of ExecuteScript() depends on what your script returns. Selenium converts values where it can: if your script returns a list of DOM elements, you get a ReadOnlyCollection<IWebElement>; if it returns a boolean, you get a C# bool; and so on.

It’s important to keep in mind that if you expect your ExecuteScript() call to return a value that you want to use, the script must contain an explicit “return” statement, for example one that returns the result of a function call. A script without “return” yields null in your test code.
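
As a rough illustration of the return values, the sketch below reads a number from the page and shows what happens when “return” is missing. The class and method names are made up for this example, it assumes that your driver implements IJavaScriptExecutor (the standard browser drivers do), and it needs a live browser session to run.

```csharp
using OpenQA.Selenium;

public static class ScriptResults
{
    // A JavaScript integer comes back boxed as a long in the .NET bindings.
    public static long CountLinks(IWebDriver driver)
    {
        IJavaScriptExecutor js = (IJavaScriptExecutor)driver;
        return (long)js.ExecuteScript("return document.getElementsByTagName('a').length;");
    }

    // Without "return" the script still runs, but its value never reaches the test.
    public static bool ResultIsLostWithoutReturn(IWebDriver driver)
    {
        IJavaScriptExecutor js = (IJavaScriptExecutor)driver;
        object result = js.ExecuteScript("document.title;");
        return result == null;
    }
}
```

Casting to the wrong type (for instance int instead of long) fails at runtime with an InvalidCastException, so it pays to check what a script returns before relying on it.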

5. Doing Drag & Drop Interactions

The most convenient and robust way to perform drag and drop interactions with WebDriver is to use its Actions builder API, namely the “DragAndDrop” and “DragAndDropToOffset” methods. The first one accepts a source and a target element as parameters, while the second one moves the source element by X and Y offsets. Here are two examples:

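// Drag the header element onto the group-by area
Actions actions = new Actions(driver);
actions.DragAndDrop(header, groupByArea).Release().Perform();
// Drag the column header by a pixel offset, using a fresh Actions instance
Actions builder = new Actions(driver);
builder.DragAndDropToOffset(colHeader, offsetX, offsetY).Release().Perform();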

You can also implement drag and drop at a lower level by executing the individual mouse actions one by one, again using the Actions class. For instance, you can chain calls to ClickAndHold(), MoveByOffset(), and Release(). But in 99% of cases, DragAndDrop / DragAndDropToOffset should work perfectly fine.
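
For the rare remaining cases, the low-level chain might look like the sketch below. The ManualDrag class is a made-up name, and splitting the movement into two steps is just one possible choice, meant to give the page some intermediate mouse move events. It needs a live browser session to run.

```csharp
using OpenQA.Selenium;
using OpenQA.Selenium.Interactions;

public static class ManualDrag
{
    // Presses the mouse button on source, moves by the given offset
    // in two steps, then releases the button.
    public static void DragByOffset(IWebDriver driver, IWebElement source, int offsetX, int offsetY)
    {
        int halfX = offsetX / 2;
        int halfY = offsetY / 2;

        new Actions(driver)
            .ClickAndHold(source)
            .MoveByOffset(halfX, halfY)                     // first part of the movement
            .MoveByOffset(offsetX - halfX, offsetY - halfY) // remainder, so the total is exact
            .Release()
            .Perform();
    }
}
```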

6. Switching between windows and iframes

The easiest way to change to a particular iframe/window is to execute:

driver.SwitchTo().Frame(frameElement); //IWebElement

Or (for window): 

driver.SwitchTo().Window(handle); // where handle is a member of driver.WindowHandles

And then to switch back to the main window in the case of iframes:

driver.SwitchTo().DefaultContent();

Note that after you change the window or iframe, any method that you call on the driver after that will be executed in the new context, until you switch to a different iframe / window. 
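
A common situation is that a click opens a new window and you need its handle. One possible approach is to compare the handles before and after the click; the helper below does that comparison in plain C#. The WindowHandles class name is hypothetical, and the helper assumes that exactly one new window is expected.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WindowHandles
{
    // Returns the single handle that is present in handlesAfter
    // but was not present in handlesBefore.
    public static string FindNew(IEnumerable<string> handlesBefore, IEnumerable<string> handlesAfter)
    {
        List<string> added = handlesAfter.Except(handlesBefore).ToList();
        if (added.Count != 1)
        {
            throw new InvalidOperationException(
                "Expected exactly one new window, found " + added.Count);
        }
        return added[0];
    }
}
```

In a test you would store driver.WindowHandles before the click, perform the click, and then call driver.SwitchTo().Window(WindowHandles.FindNew(before, driver.WindowHandles)). Since a new window may take a moment to appear, it can make sense to wrap the second lookup in a wait.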

7. Using drivers for multiple browsers

When you use the driver API, you should always refer to it using the RemoteWebDriver class, regardless of which browser you use to run your tests. A good practice is to describe your environment in a text/xml file – such as the browser(s) you’ll be using for tests – then parse this and return the correct driver instance (for example ChromeDriver, or InternetExplorerDriver). This will make your tests independent of a particular browser driver class. Example:

switch (browser)
{
    case Browser.Chrome:
        return new ChromeDriver(chromeOptions);
    case Browser.InternetExplorer:
        return new InternetExplorerDriver(options);
    // and so on
    default:
        throw new NotSupportedException("Unsupported browser: " + browser);
}
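
The parsing step that comes before such a switch could be sketched as follows, assuming the browser name arrives as a plain string read from your configuration file. The Browser enum members and the BrowserConfig class are assumptions made for this example, not part of WebDriver.

```csharp
using System;

// Hypothetical list of browsers that the test environment supports.
public enum Browser
{
    Chrome,
    Firefox,
    InternetExplorer
}

public static class BrowserConfig
{
    // Converts a configured name such as "chrome" into a Browser value.
    // The comparison ignores case and surrounding whitespace.
    public static Browser Parse(string configuredName)
    {
        Browser browser;
        if (configuredName != null
            && Enum.TryParse(configuredName.Trim(), true, out browser)
            && Enum.IsDefined(typeof(Browser), browser))
        {
            return browser;
        }
        throw new ArgumentException(
            "Unsupported browser in test configuration: " + configuredName);
    }
}
```

Failing fast on an unknown name makes a typo in the configuration file show up as one clear error, instead of as a whole test run on the wrong browser.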

8. Detecting if a particular framework is loaded

In most cases your web app will have at least a couple of JavaScript frameworks loaded, and the DOM initialization will depend on them, so you will usually need to wait for those frameworks to load before you can execute any actions on the page. This is most obvious when you need to run a script that uses a framework’s API, for example to check whether certain elements exist. Here is an example that waits for jQuery to load before executing any further WebDriver calls:

WebDriverWait wait = new WebDriverWait(driver, TimeSpan.FromMilliseconds(maxTimeout));
try
{
    // Poll until the global $ (jQuery) is defined on the page
    wait.Until((d) =>
    {
        return (bool)((RemoteWebDriver)d).ExecuteScript("return typeof $ !== 'undefined';");
    });
}
catch (Exception e)
{
    Assert.Fail("Couldn't load jQuery, probably there is an error in your test page: " + e.Message);
}

9. Handling context menu interactions

There is a very easy way to perform right clicking on an element using WebDriver’s API:

Actions builder = new Actions(driver);
builder.MoveToElement(element).ContextClick(element).Perform();

You can additionally chain extra commands after the ContextClick, such as clicking on a menu item or sending KEY UP / KEY DOWN interactions. For example:

builder.MoveToElement(element).ContextClick(element).SendKeys(Keys.ArrowDown).Perform();

On a side note, I’d like to clarify that it’s not mandatory to call Build() once you chain more than one action, because calling Perform() will call Build(). 

10. Cleaning up

When your test suite finishes, it needs to close the browser window(s) and free any resources that were used by the web driver instance. There are several methods in WebDriver’s API for this: Close(), Quit(), and Dispose(). The difference may not be that obvious, but basically Close() closes the current browser window, Quit() closes all browser windows and ends the session, and Dispose() basically calls Quit() and has the same effect. When there is only a single browser window open, Close() also closes the browser, much like Quit() / Dispose(), but it is still safer to call Quit() or Dispose() at the end so that the driver’s own resources are released as well.

In a lot of cases, you need to run many tests and reuse the same browser window(s) before they are closed. Therefore, it may be more convenient to write some utility code or shell script that performs the cleanup externally (not even using WebDriver’s API).
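
Such external cleanup can also be written as a small C# utility that ends leftover processes by name. This is only a sketch under assumptions: the LeftoverProcesses class is a made-up name, and which process names apply (for example “chromedriver” or “IEDriverServer”) depends on the drivers and browsers installed on your test machine.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;

public static class LeftoverProcesses
{
    // Kills every running process with the given name (without ".exe")
    // and returns how many processes were ended.
    public static int KillByName(string processName)
    {
        int killed = 0;
        foreach (Process process in Process.GetProcessesByName(processName))
        {
            try
            {
                process.Kill();
                process.WaitForExit(5000); // give the process up to 5 seconds to exit
                killed++;
            }
            catch (InvalidOperationException)
            {
                // The process had already exited on its own.
            }
            catch (Win32Exception)
            {
                // The process could not be terminated, e.g. access was denied.
            }
            finally
            {
                process.Dispose();
            }
        }
        return killed;
    }
}
```

A build step could call LeftoverProcesses.KillByName("chromedriver") after the whole suite has finished. Be careful with browser process names on a machine that is also used interactively, because this ends every process with that name.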