Aviran Mordo
Head of Back-end Engineering
@aviranm
linkedin.com/in/aviran
aviransplace.com
Scaling Wix with
Microservices Architecture
 Scaling Wix with microservices architecture and multi-cloud platforms - Reversim Summit 2015
Wix in Numbers
Over 61M users
1.5M new users/month
Static storage is >2PB of data
1.5TB new files/day
3 data centers + 3 clouds (Google, Amazon, Azure)
1.5B HTTP requests/day
900 people work at Wix, of whom ~300 are in R&D
Initial Architecture
Built for fast development
Stateful login (Tomcat session), Ehcache, file uploads
No consideration for performance, scalability and testing
Intended for short-term use
Tomcat, Hibernate, custom web framework
[Diagram: Lighttpd (file serving), Wix (Tomcat), MySQL DB]
The Monolithic Giant
One monolithic server that handled everything
Dependency between features
Changes in unrelated areas of the system caused deployment of the whole system
Failures in unrelated areas caused system-wide downtime
Breaking the System Apart
Concerns and SLA
Data validation
Security / Authentication
Data consistency
Lots of data
Edit websites
High availability
High performance
Lots of static files
Very high traffic volume
Viewport optimization
Long tail (immutable)
Serving Media
High availability
High performance
High traffic volume
Long tail (mutable)
View sites, created by
Wix editor
Wix Segmentation
1. Editor Segment  2. Media Segment  3. Public Segment
Networking
[Diagram: Wix services, grouped under Editor Segment and Public Segment. Labels: HTML Editor; Flash Editor; MSM; Private Media; Public Media; Premium Services; eCommerce; List DB; App Builder; App Store; App Market; Dashboard; Statics/media; Mailer; TimeZone; Public HTML API; Public API (Flash); MSP; Public Server; HTML Renderer; HTML SEO Renderer; Flash Renderer; Flash SEO Renderer; Sitemap Renderer; Robots.txt Renderer; User Server; Template Viewer; ContactsHUB; Activity; Site Members; Provided Mailing Service; Comments; Snapshoter; User Pref; Feed Me; Shout-out; Hotels; PETRI; Site Pref; Dist Logger; Slicer; eCom Renderer; eCom Cart; eCom Checkout; eCom Catalog; eCom Orders; Payment Facade; Account Info; HTML API; HTML Embeder; Blog; Mobile]
Microservices Guidelines
Each service has its own database (if one is needed)
Only one service can write to a specific DB table(s)
There may be additional read-only services that directly access the DB (for performance reasons)
Services are stateless
No DB transactions
Cache is not a building block, but an optimization
Microservices Tradeoffs
Each service has its own database (if one is needed)
Easy to scale microservices based on SLA concerns
Tradeoff – system complexity, performance
Only one service can write to a specific DB table(s)
De-coupling architecture – faster development
Tradeoff – system complexity / performance
There may be additional read-only services that access the DB
Performance gain
Tradeoff – coupling
Services are stateless
Easy to scale out (just add more servers)
Tradeoff – performance / consistency
No DB transactions
Better DB performance, easier to scale
Tradeoff – system complexity
1. Editor Segment
Editor Server
Immutable JSON pages (~2.5M / day)
Site revisions
Active – standby MySQL across data centers
[Diagram: Editor Server, MySQL Active Sites, MySQL Archive]
Protect The Data
DB outage with fast recovery = replication
Data poisoning/corruption = revisions / backup
Make the data available at all times = data distribution to multiple locations / providers
Saving Editor Data
[Diagram. Components: Browser; Editor Server; Static Grid; Google Cloud Storage; MySQL Active Sites; MySQL Archive; Archive (Amazon); Archive (Google). Arrow labels: Save Page(s); 200 OK; Save Page; Upload; Notify; Download Page; DC replication]
Self Healing Process
[Diagram. Components: Browser; Editor Server; Static Grid; Google Cloud Storage; MySQL Active Sites; MySQL Archive; Archive (Amazon); Archive (Google). Arrow labels: Save Page(s); Save Page; Upload; Notify; Download Page; DC replication; 200 OK]
No DB Transactions
Save each page (JSON) as an atomic operation
Page ID is a content based hash (immutable/idempotent)
Finalize transaction by sending site header (list of pages)
Can generate orphaned pages, not a problem in practice
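The save flow on this slide can be sketched as follows. This is only an illustration under stated assumptions: the in-memory dictionaries stand in for the real stores, SHA-256 is an assumed choice of hash (the slide only says "content based hash"), and all function names are hypothetical.

```python
import hashlib
import json

# Hypothetical in-memory stand-ins for the page store and the site-header store.
pages = {}         # page_id -> serialized page JSON
site_headers = {}  # site_id -> list of page IDs (the "site header")


def serialize(page: dict) -> str:
    """Canonical JSON form, so equal content always serializes identically."""
    return json.dumps(page, sort_keys=True, separators=(",", ":"))


def page_id(page: dict) -> str:
    """Derive the page ID from the content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(serialize(page).encode("utf-8")).hexdigest()


def save_page(page: dict) -> str:
    """Store one page. Saving identical content again changes nothing."""
    pid = page_id(page)
    pages.setdefault(pid, serialize(page))
    return pid


def save_site(site_id: str, site_pages: list) -> list:
    """Save every page first, then finalize by writing the site header last."""
    ids = [save_page(p) for p in site_pages]
    site_headers[site_id] = ids  # the single write that makes the save visible
    return ids
```

If a save stops before the site header is written, the pages already stored are simply not referenced by any header. These are the orphaned pages the slide mentions.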
2. Media Segment
Prospero – Wix Media Storage
2PB user media files
3M files uploaded daily
800M metadata records
Dynamic media processing
• Picture resize, crop and sharpen “on the fly”
• Watermark
• Audio format conversion
Prospero – Wix Media Manager
[Diagram. Labels: get image.jpg; CDN; If not in CDN; First fallback; Second fallback; Austin; Google Cloud; Amazon; x36, x36, x32]
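The diagram shows a request for image.jpg that, when the file is not in the CDN, falls through to a first and then a second fallback. A minimal sketch of such a fallback chain, assuming each storage location can be modeled as a function that returns the file bytes or None. The names `fetch_with_fallback` and `dict_source` are hypothetical and not part of Prospero.

```python
from typing import Callable, Dict, Optional, Sequence

# A source takes a file name and returns its bytes, or None on a miss.
Fetcher = Callable[[str], Optional[bytes]]


def dict_source(files: Dict[str, bytes]) -> Fetcher:
    """Hypothetical storage location backed by an in-memory dict."""
    return lambda name: files.get(name)


def fetch_with_fallback(name: str, sources: Sequence[Fetcher]) -> Optional[bytes]:
    """Try each source in order and return the first hit."""
    for source in sources:
        try:
            data = source(name)
        except Exception:
            continue  # treat a failing source like a miss and try the next one
        if data is not None:
            return data
    return None  # no source had the file
```

The order of `sources` encodes which location is primary and which are fallbacks, so changing the order is a configuration change rather than a code change.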
3. Public Segment
Public Segment Roles
Routing (resolve URLs)
Dispatching (to a renderer)
Rendering (HTML, XML, TXT)
[Diagram: www.example.com → Public Server → HTML Renderer, HTML SEO Renderer, Flash Renderer, Flash SEO Renderer, Sitemap Renderer, Robots.txt Renderer]
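The three roles (routing, dispatching, rendering) can be sketched in a few lines. Everything here is assumed for illustration: the routing table contents, the site ID, the renderer functions, and the path-based dispatch rule. The Flash and SEO renderers are left out.

```python
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[str, str]  # (content type, body)

# Hypothetical routing table: host name -> site ID.
ROUTES: Dict[str, str] = {"www.example.com": "site-123"}


def render_html(site_id: str) -> Response:
    return "text/html", "<!-- bootstrap page for %s -->" % site_id


def render_sitemap(site_id: str) -> Response:
    return "application/xml", "<urlset><!-- %s --></urlset>" % site_id


def render_robots(site_id: str) -> Response:
    return "text/plain", "User-agent: *\nAllow: /\n"


# Hypothetical dispatch table: request path -> renderer.
RENDERERS: Dict[str, Callable[[str], Response]] = {
    "/sitemap.xml": render_sitemap,
    "/robots.txt": render_robots,
}


def handle(host: str, path: str) -> Optional[Response]:
    site_id = ROUTES.get(host)                  # routing: resolve the URL
    if site_id is None:
        return None                             # unknown host
    renderer = RENDERERS.get(path, render_html)  # dispatching: pick a renderer
    return renderer(site_id)                    # rendering: HTML, XML or TXT
```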
Public SLA
Our goal: 99% of responses in under 100 ms at peak traffic
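One way to check a goal like this against measured latencies, reading it as "the 99th percentile of response times is below 100 ms". That reading, the nearest-rank percentile method, and the function names are assumptions for illustration.

```python
def percentile(samples, p: int):
    """Nearest-rank percentile: the smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100  # ceil(p * n / 100) in integer arithmetic
    return ordered[max(rank, 1) - 1]


def meets_sla(latencies_ms, p: int = 99, limit_ms: float = 100.0) -> bool:
    """True when the p-th percentile latency is under the limit."""
    return percentile(latencies_ms, p) < limit_ms
```

With 100 samples, a single slow response out of 100 still meets the goal, while two slow responses do not.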
Publish Site
Publish site header (a map of pages for a site)
Publish routing table
Publish site header / routes (CQRS)
[Diagram: Editor Segment → Public Segment]
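A minimal sketch of the publish step as a CQRS-style copy from the write side (editor) to the read side (public). The dictionaries, field names, and the `publish` function are hypothetical stand-ins, not Wix's actual schema.

```python
# Write side (editor segment): hypothetical working copy of each site.
editor_sites = {
    "site-123": {
        "domain": "www.example.com",
        "pages": {"home": "a1", "about": "b2"},  # page name -> page ID
    }
}

# Read side (public segment): denormalized tables, each read by primary key.
public_headers = {}  # site_id -> map of pages (the site header)
public_routes = {}   # domain  -> site_id (the routing table)


def publish(site_id: str) -> None:
    """Copy a snapshot of the site header and its route to the read side."""
    site = editor_sites[site_id]
    public_headers[site_id] = dict(site["pages"])  # snapshot, not a reference
    public_routes[site["domain"]] = site_id
```

Because the public side holds its own snapshot, later edits in the editor stay invisible to visitors until the site is published again.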
Built For Speed
Minimize out-of-process hops (2 DB, 1 RPC)
Lookup tables are cached in memory, updated every few minutes
Denormalized data – optimize for read by primary key (MySQL)
Minimize business logic
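The "lookup tables cached in memory, updated every few minutes" point can be sketched as below. This is an assumed design: the class name, the five-minute default, and the refresh-on-read strategy are illustrative. A background refresh would keep the reload off the request path.

```python
import time


class CachedLookup:
    """In-memory lookup table that is reloaded once it is older than ttl_seconds."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader      # callable returning a fresh dict, e.g. from the DB
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the refresh logic can be tested
        self._table = loader()
        self._loaded_at = clock()

    def get(self, key, default=None):
        # Reload lazily on the first read after the table has expired.
        if self._clock() - self._loaded_at >= self._ttl:
            self._table = self._loader()
            self._loaded_at = self._clock()
        return self._table.get(key, default)
```

Reads between refreshes never leave the process, which is what keeps the out-of-process hop count low.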
How a Page Gets Rendered
Bootstrap HTML template that contains only data
Only JavaScript imports
JSON data (site-header + dynamic data)
No “real” HTML view
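A sketch of what such a bootstrap page might look like when generated on the server: JSON data plus JavaScript imports, and an empty body for the browser to fill in. The element ID, the script URL, and the function name are assumptions for illustration.

```python
import json
from html import escape


def bootstrap_html(site_data: dict, script_urls: list) -> str:
    """Build a page that carries only JSON data and JavaScript imports."""
    # Escape "<" so a "</script>" inside the data cannot close the tag early.
    data = json.dumps(site_data).replace("<", "\\u003c")
    scripts = "\n".join(
        '<script src="%s"></script>' % escape(url, quote=True)
        for url in script_urls
    )
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        '<script id="site-data" type="application/json">%s</script>\n'
        "%s\n</head>\n<body></body>\n</html>" % (data, scripts)
    )
```

The server does no layout work here. The imported scripts read the embedded JSON and build the view in the browser.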
Offload rendering work to the browser
The average Intel Core i750 can push up to 7 GFLOPS without overclocking
Why JSON?
Easy to parse in JavaScript and Java/Scala
Fairly compact text format
Highly compressible (5:1 even for small payloads)
Easy to fix rendering bugs (just deploy new code)
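The compressibility claim is easy to measure on your own payloads. The sketch below uses zlib (DEFLATE, the algorithm behind gzip). The helper names are hypothetical, and the ratio you get depends entirely on the payload, so the 5:1 figure from the slide is not guaranteed here.

```python
import json
import zlib


def pack(obj):
    """Serialize to compact JSON and compress it. Returns (raw, packed) bytes."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    packed = zlib.compress(raw, 6)
    return raw, packed


def compression_ratio(obj) -> float:
    """Uncompressed size divided by compressed size for one payload."""
    raw, packed = pack(obj)
    return len(raw) / len(packed)
```

Site JSON tends to repeat the same keys and style values many times, which is the kind of redundancy DEFLATE removes well.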
Minimum Number of Public Servers Needed to Serve 60M Sites
4
Public SLA
Be Available 99.999%
Serving a Site – Sunny Day
[Diagram. Components: Browser (http://example.wix.com); LB; Public Renderer; Archive; CDN Statics. Arrow labels: HTTP Request; HTML; Store HTML to cache; Notify site view; Resources / Media; HTTP Request]
Serving a Site – DC Lost
[Diagram. Components: Browser (http://example.wix.com); two LB + Public Renderer pairs; Archive; CDN Statics. Arrow labels: Change DNS; HTTP Request]
Serving a Site – Public Lost
[Diagram. Components: Browser (http://example.wix.com); two LB + Public Renderer pairs; Archive. Arrow labels: HTTP Request; Get Cached HTML Version; HTML; Fallback to 2nd DC]
Living in the Browser
[Diagram. Components: Browser (http://example.wix.com); LB; Public Renderer; Archive; CDN Statics; Editor Pages. Arrow labels: HTTP Request; HTML; JSON / Media; Fallback; Fallback]
Summary
Identify your critical path and concerns
Build redundancy in critical path (for availability)
De-normalize data (for performance)
Minimize out-of-process hops (for performance)
Take advantage of client’s CPU power
Aviran Mordo
Head of Back-end Engineering
Q&A
@aviranm
linkedin.com/in/aviran
aviransplace.com
http://engineering.wix.com
http://goo.gl/wlq9Ih
@WixEng


Editor's Notes

  1. How many of you have built a website?
  2. Editor: read immediately after write, small working set. Viewer: optimized for reads. We fight for every ms. A page view means many resource downloads.
  3. Read-only services only if it is part of the same business functionality
  4. Read-only services only if it is part of the same business functionality
  5. Immutable data helps handle eventual consistency. MySQL is a great key-value store. Not all data is equal (only 6% of websites are edited 3 months after creation).
  6. Revisions keep data safe from poisoning. The price is paid in storage and management.
  7. Highly optimized code: every ms counts.
  8. We can change the arrows as we want. Tech vendor lock-in is a myth: it is easy to change the API (small dev effort). Invest in data distribution. Evaluation of a new platform starts by putting the data there.
  9. Save pages as JSON. Upload to static storage.
  10. Explain what is JSON and what is HTML
  11. UPS dies, secondary power source connected to the same UPS
  12. Due to error or bad configuration