Scalability and Performance for eCommerce
Overview and Discussion
Jerry Lewis
VP IBM Practice, SysIQ
Overview
• My background
• Horror Show (not Harasho)
• About Scalability
• What is website performance?
• Why does performance matter?
• What causes bad performance?
• How do we test website performance?
• How do we achieve good performance?
• Questions and Answers
A little about me
Large Client Experience
(revenue > $1 Billion USD)
Horror Stories
Reviews Tag Won’t Fire
• Cause: SSL certificate changed at 3rd party
• Impact
• JS Null Pointer Exception
• Add to cart button disappears
• $1,250,000 USD lost sales
Horror Stories
SinoMaximiser.js won’t load on e-com site
• Cause: SinoMaximizer went out of business
• Impact
• 3 minute delay in loading page
• Massive customer complaints
• Brand erosion
Horror Stories
OOM Crashes site randomly for months
• Cause: Gomez login script
• Impact
• 10,000+ addresses in one user account
• Loading address book causes OOM
• Site crash
• $15,000 in lost sales per occurrence
• Customer complaints when site crashes
Horror Stories
EAV Data Model grows wildly, crippling site
• Cause: Millions of records loaded every week for 1 year without cleanup
• Impact
• Slow site performance
• DB corruption when trying to clean up while the site was live
• Loss of customer data, order data
• $100,000 USD lost revenue
• Massive customer complaints
Horror Stories
Search for blank term causes OOM during holiday rush
• Cause: DB query without a limit on the number of results
• Impact:
• OOM, site outage during holiday
• $10,000 USD lost revenue per occurrence
• Massive customer complaints
• Shoppers going to other sites
Horror Stories
Credit card processing bottleneck causes 48-hour delay in fulfilling orders
• Cause: 3-second, single-threaded detokenization process with order volume > 4,000 orders per hour
• Impact:
• $2,000,000 in lost revenue due to unfulfilled orders
• Inventory synchronization issues cause invalid orders to be placed
• Massive customer complaints
• Negative social media
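The bottleneck on this slide can be checked with simple arithmetic. The sketch below is only an illustration: it takes the 3-second processing time and the 4,000 orders per hour from the slide (treating 4,000 as a lower bound on the peak), and the function names are made up for this example.

```python
import math

def max_orders_per_hour(seconds_per_order: float, workers: int = 1) -> float:
    """Upper bound on orders processed per hour by `workers` parallel workers."""
    return workers * 3600 / seconds_per_order

def backlog_after(hours: float, arrival_per_hour: float, capacity_per_hour: float) -> float:
    """Orders still waiting after `hours` of sustained arrivals."""
    return max(0.0, (arrival_per_hour - capacity_per_hour) * hours)

def workers_needed(arrival_per_hour: float, seconds_per_order: float) -> int:
    """Smallest worker count whose combined capacity meets the arrival rate."""
    return math.ceil(arrival_per_hour * seconds_per_order / 3600)

capacity = max_orders_per_hour(3.0)         # one thread, 3 s per order
backlog = backlog_after(1, 4000, capacity)  # queue growth per hour of peak
needed = workers_needed(4000, 3.0)          # parallel workers to keep up
print(capacity, backlog, needed)
```

A single thread tops out at 1,200 orders per hour, so each peak hour adds roughly 2,800 orders to the queue; under these assumptions at least four parallel workers would be needed just to keep pace.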
Scalability
Varies by platform
Key Questions
• Can you add clusters of servers (Web / App)?
• Are they centrally manageable?
• Monitoring
• Deployment
• Operation
• Can you vertically scale?
• Multiple JVMs per node
• What additional costs accrue to horizontal or vertical scaling?
Scaling is EXPENSIVE
Scalability
• Get the most from your hardware
• Get the most from your team
• Little additional money required
Definition of PERFORMANCE
per·for·mance, noun, pə(r)-ˈfȯr-mən(t)s
4 a : the ability to perform : efficiency
b : the manner in which a mechanism performs <engine performance>
Ecommerce Website Performance:
The ability to effectively process transactions.
We measure performance a few ways:
• Response time - The home page loaded in 2 seconds
• Availability - The site was available 99.999% of the time
• Throughput - page hits per second or orders per hour
• Error rate - the number of errors relative to total requests
• Capacity - maximum throughput that can be achieved with acceptable response time and availability
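To make these measures concrete, here is a minimal sketch of how three of them could be computed. The function names are hypothetical and the inputs are illustrative; only the 99.999% figure comes from the slide.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float = 365) -> float:
    """Minutes of downtime allowed in the period for a given availability target."""
    return (1 - availability_pct / 100) * period_days * 24 * 60

def error_rate(errors: int, total_requests: int) -> float:
    """Errors relative to total requests (0.0 when there was no traffic)."""
    return errors / total_requests if total_requests else 0.0

def throughput_per_second(requests: int, window_seconds: float) -> float:
    """Average requests served per second over a measurement window."""
    return requests / window_seconds

# 99.999% availability leaves only a few minutes of downtime per year.
print(round(downtime_budget_minutes(99.999), 3))
print(error_rate(5, 1000))
print(throughput_per_second(18000, 60))
```

Capacity is then the highest throughput at which response time and availability stay within their goals, which is what the load tests later in the deck try to find.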
By maximizing your performance:
• Get the most out of your infrastructure
• Minimize software license costs (WebSphere Commerce (WC) = $180,000 USD per processor)
Why Does it Matter?
BECAUSE SLOW SITES HURT
What Causes Bad Performance?
How do we test?
PLAN
• Define Performance Metrics (GOALS)
• Define User Scenarios
• Write Test Scripts
• Mimic Prod Infrastructure
EXECUTE
• Execute Tests
• Monitor Systems
• Analyze Results
• Code / Config Changes
• Modify Scripts
PLAN
Image Courtesy of IBM Corp
PLAN
• DEFINE METRICS AND GOALS
– Define acceptable response time goals for
• Home Page Load
• Search Results
• Product Display
• Add To Cart
• Checkout Pages
– Get projected metrics for
• Peak orders per hour
• Peak page views per second
• Conversion rate at peak
• Concurrent Visitors / Sessions
• Acceptable error rates
– How much headroom is required?
– Resellers care more than Big Brands (why?)
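The projected metrics in this list are related, so one can be estimated from another. The sketch below shows one possible derivation; the 2% conversion rate, 9 pages per session, and 30% headroom are hypothetical planning inputs, not figures from the deck.

```python
def peak_sessions_per_hour(peak_orders_per_hour: float, conversion_rate: float) -> float:
    """Sessions needed to produce the projected orders at the given conversion rate."""
    return peak_orders_per_hour / conversion_rate

def peak_page_views_per_second(sessions_per_hour: float, pages_per_session: float) -> float:
    """Average page views per second implied by the session volume."""
    return sessions_per_hour * pages_per_session / 3600

def target_with_headroom(peak_load: float, headroom_fraction: float) -> float:
    """Load a test should drive so the projected peak still leaves the requested headroom."""
    return peak_load * (1 + headroom_fraction)

sessions = peak_sessions_per_hour(4000, 0.02)          # assumed 2% conversion at peak
page_views = peak_page_views_per_second(sessions, 9)   # assumed 9 pages per session
test_target = target_with_headroom(page_views, 0.30)   # assumed 30% headroom
print(sessions, page_views, test_target)
```

Real projections should come from analytics and the business forecast; the point is only that goals for orders, page views, and sessions should be consistent with one another.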
PLAN
• CREATE USER SCENARIOS
– Understand characteristics of users
• How do they browse the site?
• How often do they search?
• How many sign in or create an account?
• % that add to cart
– Understand characteristics of orders
• # Items per order
• % of orders using promotions / coupons
• Registered vs guest checkout
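Once those characteristics are known, they can drive the mix of scripted behavior in a load test. A minimal sketch, assuming a made-up scenario mix (the weights below are placeholders, not measured data):

```python
import random
from collections import Counter

# Hypothetical share of virtual users per scenario, in percent.
SCENARIO_WEIGHTS = {"browse": 60, "search": 25, "add_to_cart": 10, "checkout": 5}

def pick_scenarios(n: int, weights: dict, seed=None) -> list:
    """Assign one scenario to each of n virtual users according to the weights."""
    rng = random.Random(seed)  # seeded so a test run can be reproduced
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=n)

assignments = pick_scenarios(10000, SCENARIO_WEIGHTS, seed=1)
print(Counter(assignments))
```

Most load-testing tools have their own way to express such a mix; the sketch only shows the idea of weighting scripts by observed shopper behavior.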
PLAN
• BUILD TEST PLAN
– Use the goals and metrics to design a performance test plan
– Use the characteristics of shoppers to design test scripts
– Use infrastructure that mirrors production
• Same # of tiers
• DB / App / Web Hardware, Network and Configuration
• 3rd Party and Back End Integrations
• Scheduled job / data loads
• “Tags” must fire
– Drive load with a system that provides reports
EXECUTE
Image Courtesy of IBM Corp
EXECUTE
• EXECUTE LOAD TESTS
– Drive peak load against the test infrastructure
– Use various methods
• Gradually increase to peak
• Burst
• Saturation / Soak
• Be sure you can capture results
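The three load methods differ mainly in the shape of the load over time. The sketch below expresses each as a list of target virtual-user counts per time step; the function names and numbers are illustrative and not tied to any particular load-testing tool.

```python
def ramp_profile(peak_users: int, steps: int) -> list:
    """Gradually increase to peak: evenly spaced user counts ending at the peak."""
    return [round(peak_users * (i + 1) / steps) for i in range(steps)]

def burst_profile(baseline_users: int, peak_users: int, steps: int,
                  burst_at: int, burst_len: int) -> list:
    """Hold a baseline, jump to peak for burst_len steps, then drop back."""
    return [
        peak_users if burst_at <= i < burst_at + burst_len else baseline_users
        for i in range(steps)
    ]

def soak_profile(users: int, steps: int) -> list:
    """Hold a constant load for a long period to expose leaks and slow degradation."""
    return [users] * steps

print(ramp_profile(100, 4))
print(burst_profile(10, 100, 5, burst_at=2, burst_len=2))
print(soak_profile(50, 3))
```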
EXECUTE: MONITOR
• CPU and RAM Utilization (DB/APP/WEB)
– Conform to headroom goals (typically 60-70% max on app/db tiers)
– Garbage collection a major contributor
• Threading funnels and points of failure
– Web → App → DB
– Messaging Servers
– 3rd party API calls
• JVM / HEAP Performance
– Take Java cores during load tests
– Bottlenecks found by analyzing threads
• What are threads waiting on?
• Long Running SQL?
• Open DB Connections?
– Fragmentation and large object requests: key for garbage collection
• DB Performance
– Buffer Pool Hit Ratio
– Open Connections
– Long Running Queries
• Disk IO / Network Latency
– Are you waiting on file system access?
– Are network response times good?
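Two of the checks above reduce to small calculations. The sketch below assumes utilization samples are already collected as percentages and uses the common definition of buffer pool hit ratio (reads served from memory as a share of all logical reads); the exact counters differ by database, so treat the parameter names as placeholders.

```python
def exceeds_headroom(samples_pct: list, max_pct: float = 70.0) -> list:
    """Return the utilization samples that are above the headroom ceiling."""
    return [sample for sample in samples_pct if sample > max_pct]

def buffer_pool_hit_ratio(logical_reads: int, physical_reads: int):
    """Percentage of page requests served from the buffer pool rather than disk."""
    if logical_reads == 0:
        return None  # undefined without any reads
    return (1 - physical_reads / logical_reads) * 100

app_cpu = [55.0, 68.0, 72.5, 90.0]  # illustrative samples from a test run
print(exceeds_headroom(app_cpu))
print(buffer_pool_hit_ratio(1000, 50))
```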
EXECUTE
• Analyze
– Test reports
• Log files (Web, App, DB)
• Java Heap
• DB logs
• CPU, Disk IO, Network Latency
• Recommend changes to code or configurations
• Modify test scripts if necessary
• Repeat
How do we achieve good performance?
Everyone has a part to play
• Project managers
• Developers
• Admins
• DBAs
• Testers
• Designers
• HTML/JS/CSS Team
• Clients
• Expectations
• Requirements
• Budget
We must design for performance!
Design for Performance
• Front end
• Caching strategy
• Avoid needless execution of logic
• Database interactions
• Registry / Circuit Breakers
• Web/App/DB Tuning
• Ensure code is traceable
• Database design
• Coding best practices
Design for performance
• Front End
– Limit size of page
– Limit number of requests
– Componentize pages to maximize use of platform specific server side caching for dynamic content (e.g. Dynacache for WC)
– Use Tag Containers for 3rd party Tags
– Avoid DB interactions if possible
– Cookie free domains where possible for static contents
Design for Performance
• Caching and Caching Strategy
– Get to know your platform’s caching framework
– If designing a new page or function, identify the caching impact (what are the new cache keys?)
– Code review should check both what is added to the cache and how it is invalidated
– Build page fragments that can be cached on dynamic pages
• USE a CDN
– WEB
– IMAGES
– MEDIA
• Push clients to leverage YouTube
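The cache-key and invalidation questions above can be illustrated with a tiny fragment cache. This is a generic sketch, not how Dynacache or any other platform cache works; the class, the tuple-shaped keys, and the time-to-live are assumptions made for the example.

```python
import time

class FragmentCache:
    """In-memory cache for rendered page fragments with TTL and prefix invalidation."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key tuple -> (expiry time, rendered fragment)

    def get_or_render(self, key: tuple, render):
        """Return the cached fragment for key, rendering and storing it on a miss."""
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        fragment = render()
        self._store[key] = (now + self._ttl, fragment)
        return fragment

    def invalidate(self, prefix: tuple) -> int:
        """Drop every entry whose key starts with prefix; return how many were dropped."""
        stale = [key for key in self._store if key[:len(prefix)] == prefix]
        for key in stale:
            del self._store[key]
        return len(stale)

cache = FragmentCache(ttl_seconds=60)
html = cache.get_or_render(("product", "42", "en"), lambda: "<div>Product 42</div>")
```

Here the key names what the fragment depends on (product and locale), so a price update for product 42 can call `cache.invalidate(("product", "42"))` without flushing unrelated pages.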
Design for Performance
Build efficient pages and logic
• Avoid needless execution of logic
• Order calculation
• Address lookups
• Looking up product details when unnecessary (esp. search results)
• Limit result set sizes (e.g. address book)
• No Crappy Code
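Limiting result set sizes is what would have prevented the blank-search and address-book outages described earlier. A minimal sketch using SQLite for illustration; the `product` table, the cap of 50, and the function name are hypothetical.

```python
import sqlite3

MAX_RESULTS = 50  # hypothetical hard cap on rows returned per query

def search_products(conn, term: str, limit: int = MAX_RESULTS) -> list:
    """Search by name with a parameterized query and an enforced row limit."""
    term = term.strip()
    if not term:
        return []  # a blank term would otherwise match every row
    limit = max(1, min(limit, MAX_RESULTS))  # clamp caller-supplied limits
    cursor = conn.execute(
        "SELECT id, name FROM product WHERE name LIKE ? ORDER BY name LIMIT ?",
        (f"%{term}%", limit),
    )
    return cursor.fetchall()

# Small in-memory catalog to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO product (name) VALUES (?)",
    [(f"widget {i}",) for i in range(200)],
)
print(len(search_products(conn, "widget")))
```

The clamp matters as much as the default: a caller asking for 500 rows, or passing a negative limit, still gets at most `MAX_RESULTS`.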
Design for Performance
• Database interactions
– Use JDBC / Session Beans
– Query Performance in Code Review
– Views / Flat tables can help
– Parameterized queries
• Database Maintenance
– Remove obsolete data
– Reorg / Reindex
Design for Performance
DB Cleaning / Optimization
• E-Commerce Databases grow very fast
• Clean obsolete data regularly
– Need Customer Data Retention Policy
• Don’t get behind: it can be very hard to recover if you don’t have robust hardware
• Review table sizes every week; look for growth and for changes in the relative size of tables
• Look at the longest-running queries and try to optimize them
• Profile queries executed in code during code review
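The weekly table-size review can be partly automated by comparing two snapshots of row counts. The sketch below assumes the counts have already been collected into dictionaries; the table names and the 20% threshold are made up for illustration.

```python
def growth_report(previous: dict, current: dict, threshold: float = 0.20) -> list:
    """Return (table, growth fraction) pairs above the threshold, largest first."""
    flagged = []
    for table, rows_now in current.items():
        rows_before = previous.get(table, 0)
        if rows_before == 0:
            if rows_now > 0:
                flagged.append((table, float("inf")))  # new or previously empty table
            continue
        growth = (rows_now - rows_before) / rows_before
        if growth > threshold:
            flagged.append((table, growth))
    return sorted(flagged, key=lambda item: item[1], reverse=True)

last_week = {"orders": 1000, "attr_values": 1_000_000}
this_week = {"orders": 1100, "attr_values": 2_000_000}
print(growth_report(last_week, this_week))
```

In this example `orders` grew 10% and is ignored, while `attr_values` doubled and is flagged, the kind of runaway growth described in the EAV horror story.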
Design for Performance
• Circuit Breakers
– DB Driven Registry
– Properties File Registry
– Allows toggling on/off key functionality
– Good for Business Users and for performance
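A properties-file registry of this kind can be sketched in a few lines. The key format `feature.<name>.enabled` and the page-rendering function are hypothetical; note that this is a manually operated switch, not the automatic circuit-breaker pattern that trips on repeated failures.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

class Registry:
    """Looks up on/off switches for features by name."""

    def __init__(self, props: dict):
        self._props = props

    def is_enabled(self, feature: str, default: bool = False) -> bool:
        value = self._props.get(f"feature.{feature}.enabled")
        if value is None:
            return default
        return value.lower() == "true"

def render_product_page(registry: Registry, fetch_reviews) -> list:
    """Build the page, calling the third-party reviews service only when enabled."""
    page = ["product details"]
    if registry.is_enabled("reviews"):
        page.append(fetch_reviews())
    return page

registry = Registry(parse_properties("# third-party tags\nfeature.reviews.enabled=false\n"))
print(render_product_page(registry, lambda: "reviews widget"))
```

With the switch off, the reviews call is never made, so an outage at the third party cannot take the add-to-cart button down with it.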
Design for Performance
• Ensure traceability of code
• Avoid System.out.println
• Pollutes logs
• Impacts performance
• Use separate logs for 3rd party API calls if possible
– Key for troubleshooting
Some Tools of the Trade
Commercial
• Load Testing: Silk Performer
• APM: Dynatrace
• Heap Analysis: IBM Heap Analyzer
• Tag Container: TagMan
• External Tests: Gomez
• CDN: Akamai
Open Source / Free
• Load Testing: JMeter
• APM: Zabbix
• Heap Analysis: Eclipse Memory Analyzer
• Tag Container: Google Tag Manager
• External Tests: Pingdom, YSlow
• CDN: CloudFlare
WebSphere Commerce High Availability and Performance Solutions:
http://www.redbooks.ibm.com/abstracts/sg247512.html
Q&A
Jerry Lewis
VP IBM Practice, SysIQ
+1 415 806 0755 (Mobile)
j.lewis@sysiq.com
http://www.linkedin.com/pub/jerry-lewis/4/76a/686/
