2. WHAT DO I GET?
• Autoscaled Distributed Automation (Selenium Grid / AWS)
• Distributed Automation (DA) greatly shortens the UI automation run time
• Faster feedback cycle
• A few Jenkins jobs to run automation, instead of a few hundred
• Cost effective and reliable
• Enables Continuous Integration / Continuous Deployment
3. AGENDA
• Setting up
• Making the Grid stable
• Grid topologies
• Cost saving
• Reducing UI Tests
• Reporting / Dashboard
6. PROBLEM DESCRIPTION
• Hundreds of Jenkins jobs are needed to run all the tests
• There is no system to run a large volume of UI automation reliably, quickly and scalably in a cost effective way, which is a blocker for CI / CD
• No intelligent automation report to narrow down failures quickly!
7. SOLUTION
• Run all UI automation scenarios within the time taken by the slowest test case
• Cost effective, auto scalable and reliable
• Teams focus on automation
• Note: This is not about cross-browser test coverage, but about using the grid for parallel test execution
10. SETTING UP
• Cucumber can run a single scenario using the following syntax
• sample_featurefile.feature:12
• For a Scenario Outline, use the line number of the row in the Examples table
line no 12 Scenario: eat 5 out of 12
        13   Given there are 12 cucumbers
        14   When I eat 5 cucumbers
        15   Then I should have 7 cucumbers
CUCUMBER SCENARIO GENERATION
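The path:line syntax above means a list of runnable targets can be generated straight from a feature file. The following is a minimal sketch of that idea, not the generator used for this grid: the helper name scenario_targets and the sample feature text are made up for illustration, and only plain Scenario, Scenario Outline and Examples keywords are handled (tags, backgrounds and keyword synonyms are ignored).

```ruby
# Sketch: list every runnable "path:line" target in a Cucumber feature file.
# A "Scenario:" maps to its own line number; for a "Scenario Outline:",
# each data row of its Examples table becomes one target.
def scenario_targets(path, text)
  targets = []
  in_examples = false   # true while inside an Examples table
  header_seen = false   # the first table row is the header, not a test
  text.each_line.with_index(1) do |line, number|
    stripped = line.strip
    if stripped.start_with?('Scenario Outline:')
      in_examples = false
    elsif stripped.start_with?('Scenario:')
      in_examples = false
      targets << "#{path}:#{number}"
    elsif stripped.start_with?('Examples:')
      in_examples = true
      header_seen = false
    elsif in_examples && stripped.start_with?('|')
      targets << "#{path}:#{number}" if header_seen
      header_seen = true
    end
  end
  targets
end

# Hypothetical feature text, used only to exercise the helper.
FEATURE = <<~GHERKIN
  Feature: Eating cucumbers

    Scenario: eat 5 out of 12
      Given there are 12 cucumbers
      When I eat 5 cucumbers
      Then I should have 7 cucumbers

    Scenario Outline: eating
      Given there are <start> cucumbers
      When I eat <eat> cucumbers
      Then I should have <left> cucumbers

      Examples:
        | start | eat | left |
        | 12    | 5   | 7    |
        | 20    | 5   | 15   |
GHERKIN

puts scenario_targets('sample_featurefile.feature', FEATURE)
```

For the sample text this yields three targets: line 3 for the plain scenario, and lines 15 and 16 for the two Examples rows. Each target can then be handed to its own Cucumber process so the grid runs them in parallel.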
13. SETTING UP
• c3.4xlarge (16 vCPU / 30 GB RAM / High network bandwidth)
• This machine needs high network bandwidth; low CPU / memory is fine
• Runs the SeleniumGridScaler jar, which acts as a hub that can autoscale
• https://github.com/mhardin/SeleniumGridScaler
SELENIUM GRID HUB SETUP
14. SETTING UP
• Open Source
• Acts as an intelligent hub
• Auto scales grid nodes depending on the number
of tests
• Terminates nodes when not in use
• Ad hoc launch of new nodes is also possible
• Talks to AWS through the EC2 API
SELENIUMGRIDSCALER - HUB
15. SETTING UP
• c3.xlarge
• Can run a maximum of 24 Firefox instances
• The number of Chrome instances that can run is lower
• A node created from the AMI has bootstrap code that helps it attach to the hub
SELENIUM GRID NODE SETUP
16. SETTING UP
• The node has bootstrap code to start up and attach to the hub
• Either obtain the node AMI, or create an AWS instance, bootstrap it, create an AMI from it and reference that AMI in the hub config
• The hub creates nodes based on a config: AMI ID, subnet, security group, node type, etc.
SELENIUMGRIDSCALER - NODE
17. SELENIUM NODE BOOTSTRAP CODE
[root@ip-10-2-12-167 ~]# more /home/grid/grid/grid_start_node.sh
#!/bin/sh
PATH=/sbin:/usr/sbin:/bin:/usr/bin
# Print an error message and abort the bootstrap
die() { echo "$1" >&2; exit 1; }
cd /home/grid/grid
# Ask the EC2 metadata service for this instance's ID
EC2_INSTANCE_ID="$(wget -q -O - http://169.254.169.254/latest/meta-data/instance-id)" || die "wget instance-id has failed: $?"
export EC2_INSTANCE_ID
# Pull down the user data, which will be a zip file containing necessary information
export NODE_TEMPLATE="/home/grid/grid/nodeConfigTemplate.json"
curl http://169.254.169.254/latest/user-data -o /home/grid/grid/data.zip
# Now, unzip the data downloaded from the userdata
unzip -o /home/grid/grid/data.zip -d /home/grid/ubuntu/grid
# Replace the instance ID in the node config file
sed "s/<INSTANCE_ID>/$EC2_INSTANCE_ID/g" "$NODE_TEMPLATE" > /home/grid/grid/nodeConfig.json
# Finally, run the java process in a window so browsers can run
xvfb-run --auto-servernum --server-args='-screen 0, 1600x1200x24' \
  java -jar /home/grid/grid/selenium-server-node.jar -role node \
  -nodeConfig /home/grid/grid/nodeConfig.json \
  -Dwebdriver.chrome.driver="/home/grid/grid/chromedriver" \
  -log /home/grid/grid/grid.log &
18. MAKING THE GRID STABLE
• Timeouts in the JSON config
• "timeout": 240000 (ms)
• "browserTimeout": 390000 (ms)
• browserTimeout must be larger than both 'timeout' and the webDriver timeout
• On the command line, browserTimeout is specified in seconds
TIMEOUTS
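The ordering rule for the two timeouts is easy to break when configs are edited by hand, so it can be checked before a node AMI is baked. The sketch below is an illustration only: timeout_problems is a hypothetical helper, both values are assumed to be in the same unit, and the keys are read from the top level of the JSON for brevity (in a real node config they may be nested).

```ruby
require 'json'

# Returns a list of problems found in a grid JSON config.
# Assumes "timeout" and "browserTimeout" use the same unit and sit at the
# top level of the document (an assumption made for this sketch).
def timeout_problems(config_json)
  config = JSON.parse(config_json)
  timeout = config['timeout']
  browser_timeout = config['browserTimeout']
  problems = []
  problems << 'timeout is missing' if timeout.nil?
  problems << 'browserTimeout is missing' if browser_timeout.nil?
  if timeout && browser_timeout && browser_timeout <= timeout
    problems << "browserTimeout (#{browser_timeout}) must be larger than timeout (#{timeout})"
  end
  problems
end

# Sample configs: the first uses the values from the slide.
GOOD_CONFIG = '{"timeout": 240000, "browserTimeout": 390000}'
BAD_CONFIG  = '{"timeout": 240000, "browserTimeout": 120000}'

puts timeout_problems(GOOD_CONFIG).inspect
puts timeout_problems(BAD_CONFIG).inspect
```

An empty list means the config respects the rule; anything else can fail the AMI build early instead of surfacing as hung slots later.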
19. MAKING THE GRID STABLE
• If a browser instance hangs (for any reason), it takes 3 hours (the HTTP client socket timeout) for that slot to become free
• This causes the Jenkins job to time out
• Solution:
• Fix the particular test scenario causing this issue
• Add a cron job to kill any browser instance that has been running for more than 10 mins
• Make this part of your Chef knife plugin
• Ref: selenium repo, PR: 227 / 285
TIMEOUTS
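The cron job mentioned above boils down to finding browser processes older than 10 minutes. The sketch below shows only the selection step, under stated assumptions: it expects input in the shape produced by ps -eo pid=,etimes=,comm= (the etimes column needs procps-ng), the process names firefox and chrome are guesses about the node image, and the helper name stale_browser_pids is made up for illustration.

```ruby
# Sketch: pick browser processes older than a limit from `ps` output.
# Each input line is "PID ELAPSED_SECONDS COMMAND".
BROWSER_NAMES = %w[firefox chrome].freeze   # assumed process names
MAX_AGE_SECONDS = 10 * 60

def stale_browser_pids(ps_output, max_age = MAX_AGE_SECONDS)
  pids = ps_output.each_line.map do |line|
    pid, elapsed, command = line.strip.split(/\s+/, 3)
    next if command.nil?                                   # skip malformed lines
    next unless BROWSER_NAMES.any? { |name| command.start_with?(name) }
    pid.to_i if elapsed.to_i > max_age
  end
  pids.compact
end

# Made-up sample standing in for real `ps` output.
SAMPLE_PS = <<~PS
  1201   45 firefox
  1202  731 firefox
  1340 9000 java
  1377  602 chrome
PS

stale_browser_pids(SAMPLE_PS).each do |pid|
  # On a real node this would be Process.kill('KILL', pid).
  puts "would kill #{pid}"
end
```

Keeping the selection separate from the kill makes it possible to dry-run the job on a node before letting cron act on it.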
20. MAKING THE GRID STABLE
• The whole grid setup should be in the same AWS subnet
• Using multiple subnets results in lots of FORWARDING_TO_NODE_FAILED errors
AWS - SUBNET
21. MAKING THE GRID STABLE
• The subnet you are using should have enough free IP addresses
• Otherwise it will be a blocker for autoscaling the grid nodes
AWS - IP ADDRESS
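Whether a subnet has room for the nodes the hub may launch is simple arithmetic. The sketch below assumes an IPv4 subnet and uses the fact that AWS reserves five addresses in every subnet (the first four and the last one); the helper names and the sample numbers are illustrative only.

```ruby
# AWS reserves 5 addresses in every subnet (the first four and the last one).
AWS_RESERVED_PER_SUBNET = 5

# Usable IPv4 addresses in a subnet with the given CIDR prefix length.
def usable_addresses(prefix_length)
  (2**(32 - prefix_length)) - AWS_RESERVED_PER_SUBNET
end

# True when the subnet can still hold `nodes_wanted` more instances.
def room_for_nodes?(prefix_length, addresses_in_use, nodes_wanted)
  usable_addresses(prefix_length) - addresses_in_use >= nodes_wanted
end

puts usable_addresses(24)           # usable addresses in a /24
puts room_for_nodes?(27, 15, 16)    # /27 with 15 addresses already taken
puts room_for_nodes?(24, 15, 16)    # /24 with 15 addresses already taken
```

In this example a /27 that already hosts 15 instances cannot take 16 more nodes, while a /24 can.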
22. MAKING THE GRID STABLE
• Creating webDriver objects for 250+ tests in parallel consumes bandwidth in the range of 6 Gbit per 5 mins on the hub
• c3.4xlarge bandwidth is "High"
AWS - HUB BANDWIDTH
23. MAKING THE GRID STABLE
• Fine tune your
• -Xms
• -Xmx
• -DPOOL_MAX
AWS - HUB / NODE MEMORY
24. MAKING THE GRID STABLE
• The hub becomes unstable after running thousands of tests
• Automate restarting the hub after every 2000+ tests
AWS - RESTARTING HUB
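One way to automate the restart is to keep a running count of finished tests and restart the hub between jobs once the count passes a threshold. The class below is a hypothetical sketch of that bookkeeping only; the name HubRestartPolicy, the default of 2000 and the sample batch sizes are assumptions, and the actual restart command is left out.

```ruby
# Sketch: count finished tests and signal when the hub is due a restart.
class HubRestartPolicy
  attr_reader :tests_since_restart

  def initialize(threshold = 2000)
    @threshold = threshold
    @tests_since_restart = 0
  end

  # Record a finished batch; returns true when a restart is due.
  def record(tests_finished)
    @tests_since_restart += tests_finished
    restart_due?
  end

  def restart_due?
    @tests_since_restart >= @threshold
  end

  # Call after the hub has actually been restarted.
  def reset
    @tests_since_restart = 0
  end
end

policy = HubRestartPolicy.new
[900, 800, 400].each do |batch|
  next unless policy.record(batch)
  puts "restart hub after #{policy.tests_since_restart} tests"
  policy.reset
end
```

Checking the policy only between jobs avoids restarting the hub while tests are still holding sessions.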
25. MAKING THE GRID STABLE
• The Jenkins executor, which runs hundreds of tests in parallel, needs enough CPU power
• c3.8xlarge when running 250+ tests in parallel
AWS - JENKINS EXECUTOR CPU
26. MAKING THE GRID STABLE
• Don't rely too much on Selenium Grid's queuing policy
• If your average test execution time is greater than the webDriver timeout, tests will time out during webDriver creation itself
HUB QUEUING POLICY
27. MAKING THE GRID STABLE
• Update the browsers on the node and create a new node AMI
• Necessary browser settings:
require 'selenium-webdriver'
profile = Selenium::WebDriver::Firefox::Profile.new
# Stop Firefox from updating itself on the node
profile['app.update.auto'] = false
profile['app.update.enabled'] = false
profile['app.update.service.enabled'] = false
# Script run time limits
profile['dom.max_script_run_time'] = 60
profile['dom.max_chrome_script_run_time'] = 60
# Focus and certificate handling
profile['focusmanager.testmode'] = true
profile['accept_untrusted_certs'] = true
profile['assume_untrusted_certificate_issuer'] = false
UPDATE BROWSERS
29. GRID TOPOLOGIES
• To be cost efficient, decide what you want before selecting the topology!
• I want to release code to production ..
1. Every CL (change list)
2. Once a day
3. Once a week
4. Whenever I want (on demand!)
• Based on the answer above, do I want to run all UI automation
1. Every CL?
2. Every 2 hours
3. Four times a day
4. Once a week
30. GRID TOPOLOGY - 1
• Parallel execution for small projects
• 1 executor - 1 hub - 14 nodes
• e.g. a c3.8xlarge can execute 250+ tests in parallel
• Test run would finish in ~5 mins
[Diagram: executor (c3.8xlarge) - HUB (c3.4xlarge) - nodes (c3.xlarge) ….]
31. GRID TOPOLOGY - 2
• Suitable for medium size projects (500+ tests)
• Adding one more executor (2 executors - 1 hub - 28 nodes) could double the number of tests executed in parallel, still taking only ~5 mins
[Diagram: 2 executors (c3.8xlarge each) - HUB - nodes (c3.xlarge) ….]
32. GRID TOPOLOGY - 3
• Takes twice as long as the previous topology, but at half the cost! (1 executor - 1 hub - 14 nodes)
• Suitable for medium size projects
• Test run would finish in ~10 mins
[Diagram: executor (c3.8xlarge), job runs sequentially - HUB (c3.4xlarge) - nodes (c3.xlarge) ….]
33. GRID TOPOLOGY
• One more job? Probably NOT, as the hub network traffic would make it unstable, especially during webDriver creation
• c3.8xlarge network bandwidth limit is 10 Gbit
[Diagram: executors (c3.8xlarge) - HUB - nodes (c3.xlarge) ….]
34. GRID TOPOLOGY - 4
• Use two hubs to double the tests (1000+)
• But the speed is the same as topology 2 (~5 mins)
• Double the cost
[Diagram: 2 HUBs with c3.8xlarge executors and c3.xlarge nodes]
36. OPTIMAL USE OF GRID NODES
• Running 250+ tests on a grid setup with 250 slots takes around 5 mins
• Nodes sit idle for the remaining 55 mins of the hour, which AWS has already billed
• Even during the 5 mins of the run, only a small minority of the tests take around 4 mins; the majority of the tests complete in less than 1 min
COST SAVING
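The idle time above can be put into numbers. This is a small arithmetic sketch assuming a 60-minute billed period, as the slide implies; the helper names are made up for illustration.

```ruby
# Fraction of a billed period during which the nodes do useful work.
def utilisation(busy_minutes, billed_minutes = 60)
  busy_minutes.to_f / billed_minutes
end

# How many back-to-back runs fit into one billed period.
def runs_per_billed_period(run_minutes, billed_minutes = 60)
  billed_minutes / run_minutes
end

puts format('%.1f%% utilised', utilisation(5) * 100)   # one 5-minute run per billed hour
puts runs_per_billed_period(5)                         # 5-minute runs that fit in the hour
```

A single 5-minute run uses roughly 8% of the hour that was paid for, and twelve such runs would fit into the same bill, which is the motivation for packing more tests onto the same nodes.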
38. BATCH PROCESSING
• On a c3.8xlarge, 250 tests can be started in one go before all 32 CPUs reach 100%
• Start 250 cases
• Then, every ~50 seconds, start a further batch of 100 tests; repeat until all tests are executed
• Fine tune the delay according to your observations
COST SAVING
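The batching rule above can be written down as a schedule of start offsets. The sketch below only computes the schedule and does not launch anything; batch_schedule is a hypothetical helper, its defaults are the slide's figures (250 first, then 100 every ~50 seconds), and 520 is an arbitrary sample size.

```ruby
# Sketch: start offsets (in seconds) and sizes for batched test launches.
def batch_schedule(total_tests, first_batch: 250, batch_size: 100, delay_seconds: 50)
  schedule = []
  remaining = total_tests
  offset = 0
  size = first_batch
  while remaining > 0
    count = [size, remaining].min        # the last batch may be smaller
    schedule << { start_at: offset, tests: count }
    remaining -= count
    offset += delay_seconds
    size = batch_size                    # every batch after the first
  end
  schedule
end

batch_schedule(520).each do |batch|
  puts "t+#{batch[:start_at]}s: start #{batch[:tests]} tests"
end
```

For 520 tests this gives batches of 250, 100, 100 and 70 at 0, 50, 100 and 150 seconds. The delay and batch size are the knobs to adjust while watching executor CPU.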
39. GRID TOPOLOGY - BATCH PROCESSING
• Cost saving topology: 1 executor - 1 hub - 16 nodes
• Can run any number of tests
• Can run 5500 UI tests within ~1 hr 40 min
[Diagram: executor (c3.8xlarge), job runs sequentially - HUB (c3.4xlarge) - nodes (c3.xlarge)]
COST SAVING
41. COMPARING AWS COST VS DATA CENTRE
• 1 Medium box (~$8000 per month)
• 1 Large box (~$10000 per month)
• 1 VM (~$2000 per month)
• Total AWS cost for 2 Batch Processing Topologies
• ~$1300 per month (fully autoscaled and runs 9000+ UI tests)
• Frequency: 9-11 times a day
COST SAVING
42. AUTOSCALING OF GRID NODES
• SeleniumGridScaler autoscales the grid nodes
• Nodes are created on demand (autoscaled)
• It creates only the number of nodes required to run the available tests
• Nodes are terminated in an optimal way
• Autoscaling is cheaper than stopping/starting the nodes (outside of the SeleniumGridScaler plugin)
COST SAVING
46. REDUCING UI TESTS
• Create more unit / integration tests
• Categorize test cases appropriately
• Each test should focus only on one use case
47. REPORTING / DASHBOARD
• All automation results are stored in MongoDB
• HTML / JSON report, failure screenshots, Splunk query, failure status, etc.
• Node.js / Express / Highcharts based dashboard for viewing
• RSS feed for every project so teams can subscribe to them. Feed has HTML report / screenshot / war_file version / Splunk query
• HipChat notification
53. FEW WORDS
• A few differences in the Expedia-specific SeleniumGridScaler
• https://github.com/ambirag/SeleniumGridScaler, branch: SeleniumGridScalerExp
• Dockerised!