Distributed automation selcamp2016
- 2. WHAT DO I GET?
• Autoscaled Distributed Automation (DA) on Selenium Grid / AWS
• DA dramatically shortens the UI automation run time
• Faster feedback cycle
• A handful of Jenkins jobs run the automation, instead of a few hundred
• Cost effective and reliable
• Enables Continuous Integration / Continuous Deployment
- 3. AGENDA
• Setting up
• Making the Grid stable
• Grid topologies
• Cost saving
• Reducing UI Tests
• Reporting / Dashboard
- 6. PROBLEM DESCRIPTION
• Hundreds of Jenkins jobs to run all the tests
• No system to run a large volume of UI automation reliably, quickly and scalably at a reasonable cost, which is a blocker for CI / CD
• No intelligent automation report to narrow down
failures quickly!
- 7. SOLUTION
• Run all UI automation scenarios in parallel, so that the whole run takes about as long as the longest test case
• Cost effective, scalable and reliable
• Teams focussing on automation
• Note: this is not about cross-browser test coverage, but about using the grid for parallel test execution
- 10. SETTING UP: CUCUMBER SCENARIO GENERATION
• Cucumber can run a single scenario, addressed as file:line
• sample_featurefile.feature:12
• For a Scenario Outline, the line number is that of the row in the Examples table
line 12 Scenario: eat 5 out of 12
line 13 Given there are 12 cucumbers
line 14 When I eat 5 cucumbers
line 15 Then I should have 7 cucumbers
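The file:line addressing above is what makes it possible to hand every scenario to the grid as its own unit of work. A minimal sketch of such a generator follows, assuming plain English Gherkin keywords; the helper name `scenario_targets` is hypothetical and not taken from the talk.

```ruby
# Hypothetical helper: list every runnable scenario of a feature file as
# "path:line". Plain scenarios use the line of the "Scenario:" keyword;
# Scenario Outlines use the line of each data row in the Examples table.
def scenario_targets(path, lines)
  targets = []
  in_outline = false    # currently inside a Scenario Outline?
  in_examples = false   # currently inside that outline's Examples table?
  header_seen = false   # the first table row is the header, not a test
  lines.each_with_index do |raw, idx|
    line = raw.strip
    lineno = idx + 1
    case line
    when /\AScenario Outline:/
      in_outline = true
      in_examples = false
    when /\AScenario:/
      in_outline = false
      in_examples = false
      targets << "#{path}:#{lineno}"
    when /\AExamples:/
      in_examples = in_outline
      header_seen = false
    when /\A\|/
      next unless in_examples      # ignore step data tables
      if header_seen
        targets << "#{path}:#{lineno}"
      else
        header_seen = true
      end
    end
  end
  targets
end

# Usage (the path is an assumption):
#   path = 'features/sample_featurefile.feature'
#   puts scenario_targets(path, File.readlines(path))
```

Each emitted target can then be passed to its own `cucumber` invocation.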
- 12. SETTING UP: SELENIUM GRID HUB SETUP
• c3.4xlarge (16 CPU / 30 GB RAM / "High" network bandwidth)
• The hub instance needs high network bandwidth; low CPU / memory is fine
• Runs the SeleniumGridScaler jar, which acts as a hub that can autoscale
• https://github.com/mhardin/SeleniumGridScaler
- 13. SETTING UP: SELENIUM GRID NODE SETUP
• c3.xlarge
• Can run a maximum of 24 Firefox instances
• The number of Chrome instances it can run is lower
• A node created from the AMI has bootstrap code that helps it attach to the hub
- 14. MAKING THE GRID STABLE: TIMEOUTS
• Timeouts in the JSON config
• "timeout": 240000 (ms)
• "browserTimeout": 390000 (ms)
• browserTimeout has to be larger than both 'timeout' and the webDriver timeout
• On the command line, browserTimeout is specified in seconds
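The ordering rule and the ms / seconds difference are easy to get wrong, so a small sanity check can help. The sketch below uses the two values from the slide; the helper name `check_timeouts`, the flat JSON layout and the 300000 ms webDriver timeout in the usage line are assumptions for illustration only.

```ruby
require 'json'

# Values taken from the slide; a real grid config has more keys around them.
config = JSON.parse('{"timeout": 240000, "browserTimeout": 390000}')

# Hypothetical check: browserTimeout must exceed both 'timeout' and the
# client-side webDriver timeout. Returns the values converted to seconds,
# the unit the slide says the command line expects.
def check_timeouts(config, webdriver_timeout_ms)
  timeout = config.fetch('timeout')
  browser = config.fetch('browserTimeout')
  limit = [timeout, webdriver_timeout_ms].max
  unless browser > limit
    raise ArgumentError, "browserTimeout (#{browser} ms) must be > #{limit} ms"
  end
  { timeout_s: timeout / 1000, browser_timeout_s: browser / 1000 }
end

# Assumed webDriver timeout of 300000 ms, for illustration only.
p check_timeouts(config, 300_000)
```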
- 15. MAKING THE GRID STABLE: TIMEOUTS
• If a browser instance hangs (for whatever reason), it takes 3 hrs (the HTTP client socket timeout) for that slot to become free
• This makes the Jenkins job time out
• Solution:
• Fix the test scenario causing the hang
• Add a cron job to kill any browser instance that has been running for more than 10 mins
• Make this part of your Chef knife plugin
• Ref: selenium repo, PR: 227 / 285
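The cron job mentioned above could look roughly like the following sketch. It assumes a Linux node whose `ps` supports the `etimes` (elapsed seconds) column and browser processes named `firefox` or `chrome`; the method names and constants are hypothetical.

```ruby
BROWSER_NAMES = %w[firefox chrome].freeze   # assumed process names
MAX_AGE_S = 10 * 60                         # the 10-minute limit from the slide

# Pure parsing step: pick the PIDs of browser processes older than the
# limit from "pid elapsed_seconds command" rows.
def stale_pids(ps_output, max_age_s = MAX_AGE_S, names = BROWSER_NAMES)
  ps_output.each_line.filter_map do |row|
    pid, age, comm = row.strip.split(/\s+/, 3)
    next unless pid =~ /\A\d+\z/                       # skip headers / blanks
    next unless names.any? { |n| comm.to_s.start_with?(n) }
    pid.to_i if age.to_i > max_age_s
  end
end

# Side-effecting step, meant to be called from the cron script.
def kill_stale_browsers
  stale_pids(`ps -eo pid=,etimes=,comm=`).each do |pid|
    begin
      Process.kill('KILL', pid)
    rescue Errno::ESRCH
      # process already exited; nothing to do
    end
  end
end
```

A crontab entry running a script that calls `kill_stale_browsers` every minute or so would then free hung slots long before the 3 hr socket timeout.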
- 16. MAKING THE GRID STABLE: AWS - SUBNET
• The whole grid setup should be in the same AWS subnet
• Using multiple subnets results in lots of FORWARDING_TO_NODE_FAILED errors
- 17. MAKING THE GRID STABLE: AWS - IP ADDRESS
• The subnet you are using should have enough free IP addresses
• Otherwise it will be a blocker for autoscaling the grid nodes
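A quick way to see whether a subnet is large enough: AWS reserves 5 addresses in every subnet, so the usable count is the block size minus 5. The sketch below is illustrative; the helper names and the example prefix lengths are assumptions, while the 1 hub + 14 nodes figure comes from the topologies in this deck.

```ruby
AWS_RESERVED_PER_SUBNET = 5   # AWS reserves the first four and the last address

# Usable IPv4 addresses in a subnet with the given prefix length.
def usable_ips(prefix_length)
  (2**(32 - prefix_length)) - AWS_RESERVED_PER_SUBNET
end

# Can the subnet still host this many new instances?
def enough_ips?(prefix_length, hosts_needed, already_used = 0)
  usable_ips(prefix_length) - already_used >= hosts_needed
end

grid_hosts = 1 + 14                 # 1 hub + 14 nodes
puts enough_ips?(28, grid_hosts)    # a /28 has 11 usable addresses
puts enough_ips?(27, grid_hosts)    # a /27 has 27 usable addresses
```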
- 18. MAKING THE GRID STABLE: AWS - HUB BANDWIDTH
• With 250+ tests in parallel, webDriver object creation consumes bandwidth in the range of 6 Gbit per 5 min on the hub
• c3.4xlarge network bandwidth is "High"
- 19. MAKING THE GRID STABLE: AWS - HUB / NODE MEMORY
• Fine tune these JVM options for the hub and node processes:
• -Xms
• -Xmx
• -DPOOL_MAX
- 20. MAKING THE GRID STABLE: AWS - RESTARTING HUB
• The hub becomes unstable after running thousands of tests
• Automate restarting the hub after every 2000+ tests
- 21. MAKING THE GRID STABLE: AWS - JENKINS EXECUTOR CPU
• The Jenkins executor that runs hundreds of tests in parallel needs enough CPU power
• Use a c3.8xlarge when running 250+ tests in parallel
- 22. MAKING THE GRID STABLE: HUB QUEUING POLICY
• Don't rely too much on Selenium Grid's queuing policy
• If your average test execution time is greater than the webDriver timeout, queued tests will time out at webDriver creation itself
- 23. MAKING THE GRID STABLE: UPDATE BROWSERS
• Update browsers to the latest version at least every 3 months
• Necessary browser settings:
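require 'selenium-webdriver'

profile = Selenium::WebDriver::Firefox::Profile.new
# Stop Firefox from updating itself underneath the tests
profile['app.update.auto'] = false
profile['app.update.enabled'] = false
profile['app.update.service.enabled'] = false
# Let scripts run for up to 60 s before the "unresponsive script" prompt
profile['dom.max_script_run_time'] = 60
profile['dom.max_chrome_script_run_time'] = 60
# Let focus events fire even when the browser window is not in the foreground
profile['focusmanager.testmode'] = true
# Certificate handling, as given on the slide
profile['accept_untrusted_certs'] = true
profile['assume_untrusted_certificate_issuer'] = false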
- 25. GRID TOPOLOGIES
• To be cost efficient, decide what you want before selecting a topology!
• I want to release code to production ...
1. Every CL (change list)
2. Once a day
3. Once a week
4. Whenever I want (on demand!)
• Based on the answer above, do I want to run all UI automation ...
1. Every CL?
2. Every 2 hours
3. Four times a day
4. Once a week
- 26. GRID TOPOLOGY - 1
• Parallel execution for small projects
• 1 executor - 1 hub - 14 nodes
• e.g. a c3.8xlarge executor can run 250+ tests in parallel
• Test run finishes in ~5 mins
[Diagram: executor (c3.8xlarge), hub (c3.4xlarge), nodes (c3.xlarge)]
- 27. GRID TOPOLOGY - 2
• Suitable for medium-size projects (500+ tests)
• Adding one more executor (2 executors - 1 hub - 28 nodes) doubles the number of tests run in parallel, still taking only ~5 mins
[Diagram: 2 executors (c3.8xlarge each), hub, nodes (c3.xlarge)]
- 28. GRID TOPOLOGY - 3
• Takes twice as long as the previous topology, at half the cost! (1 executor - 1 hub - 14 nodes)
• Suitable for medium-size projects
• Test run finishes in ~10 mins
[Diagram: executor (c3.8xlarge) whose job runs sequentially, hub (c3.4xlarge), nodes (c3.xlarge)]
- 29. GRID TOPOLOGY
• One more job? Probably NOT, as the hub's network traffic would make it unstable, especially during webDriver creation
• c3.8xlarge network bandwidth limit is 10 Gbit
[Diagram: 2 executors (c3.8xlarge each), hub, nodes (c3.xlarge)]
- 30. GRID TOPOLOGY - 4
• Use two hubs to double the number of tests (1000+)
• But the speed is the same as topology 2 (~5 mins)
• Double the cost
[Diagram: two hubs, with c3.8xlarge and c3.xlarge instances]
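The run times quoted for the four topologies follow a simple pattern: the tests run in "waves" of as many tests as there are slots, and each wave takes about 5 mins. The sketch below is only a rough model under that assumption; the function name is hypothetical and the slot counts (about 250 per 14 nodes) are taken from the figures in this deck.

```ruby
WAVE_MINUTES = 5   # approximate duration of one fully parallel wave (from the slides)

# Rough estimate: number of waves needed, times the duration of one wave.
def estimated_minutes(tests, slots, wave_minutes = WAVE_MINUTES)
  waves = (tests.to_f / slots).ceil
  waves * wave_minutes
end

puts estimated_minutes(250, 250)     # topology 1: one wave
puts estimated_minutes(500, 500)     # topology 2: twice the slots, still one wave
puts estimated_minutes(500, 250)     # topology 3: two waves on half the hardware
puts estimated_minutes(1000, 1000)   # topology 4: two hubs, one wave
```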
- 32. COST SAVING: OPTIMAL USE OF GRID NODES
• Running 250+ tests on a grid setup with 250 slots takes around 5 mins
• The nodes then idle for the remaining 55 mins of the hour, which AWS has already billed
• Even during the 5-min run, only a small minority of the tests take around 4 mins; the majority complete in less than 1 min
- 34. COST SAVING: BATCH PROCESSING
• On a c3.8xlarge, 250 tests can be started in one go before all 32 CPUs reach 100%
• Start 250 tests
• Then, every 50 seconds, start a batch of 100 more tests; repeat until all tests have been started
• Fine tune the delay according to your own observations
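The batching steps above can be sketched as a schedule of (start offset, batch size) pairs. The numbers are the ones from the slide; the function name and keyword arguments are hypothetical, and a real runner would launch the tests instead of only computing the plan.

```ruby
# Build the launch plan: a first batch of `first` tests at t = 0, then
# `batch` more tests every `delay_s` seconds until all tests are started.
def batch_schedule(total, first: 250, batch: 100, delay_s: 50)
  schedule = []
  started = 0
  offset = 0
  size = first
  while started < total
    count = [size, total - started].min   # the last batch may be smaller
    schedule << { at_s: offset, count: count }
    started += count
    offset += delay_s
    size = batch
  end
  schedule
end

batch_schedule(600).each do |step|
  puts "t+#{step[:at_s]}s: start #{step[:count]} tests"
end
```

Keeping the delay as a parameter matches the advice to tune it from observation.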
- 35. COST SAVING: GRID TOPOLOGY - BATCH PROCESSING
• Cost-saving topology: 1 executor - 1 hub - 14 nodes
• Can run any number of tests
• Can run 5500 UI tests within ~1 hr 40 min
[Diagram: executor (c3.8xlarge) whose job runs sequentially, hub (c3.4xlarge), nodes (c3.xlarge)]
- 37. COST SAVING: COMPARING AWS COST VS DATA CENTRE
• 1 medium box (~$8000 per month)
• 1 large box (~$10000 per month)
• 1 VM (~$2000 per month)
• Total AWS cost for 2 batch processing topologies
• ~$1300 per month (fully autoscaled, runs 9000+ UI tests)
• Frequency: 9-11 times a day
- 38. COST SAVING: AUTOSCALING OF GRID NODES
• SeleniumGridScaler autoscales the grid nodes
• It creates AWS nodes on demand, based on a configuration file and the number of tests to run
• It also acts as the hub
• Nodes are created from a preconfigured AMI
- 42. REDUCING UI TESTS
• Create more unit / integration tests
• Categorise test cases appropriately
• Each test should focus only on one use case
- 43. REPORTING / DASHBOARD
• All automation results are stored in MongoDB
• Cucumber HTML / JSON reports, failure screenshots, Splunk query, failure status, etc.
• Node.js / Express / Highcharts based dashboard for viewing
• RSS feed for every project, so teams can subscribe to them. Each feed has the HTML report / screenshot / war_file version / Splunk query