Over the first 8 years of Wix, the Wix infrastructure has gone through a number of transformations, starting as a monolithic application server with MySQL and evolving into a service-based architecture with diverse infrastructure.
Over this 8-year journey, we have learned a thing or two: some DOs and some DON'Ts.
This presentation goes over the evolution of the Wix architecture and the transformations we made to support Wix at scale. We will share some of our insights about building a web infrastructure for over 50M users.
5. Timeline, 2006 to 2014
Milestones: Wix is founded, First funding, Open beta, 1 M users, eCommerce, 10 M users, IPO, 50 M users, Mobile, App Market, Hive, Wix Worldwide, HTML 5
20. CDN
• Use a CDN!
• Cache killer pattern
– http://static.wix.com/client/1.3.2/css/viewer.css
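The cache killer pattern above can be sketched as a small URL builder. This is a minimal illustration, not Wix's actual code: the function name `versioned_asset_url` and the header values in `CACHE_HEADERS` are assumptions. Only the URL shape comes from the slide.

```python
def versioned_asset_url(base: str, app: str, version: str, path: str) -> str:
    """Build an asset URL with the release version embedded in the path.

    Hypothetical helper: a new release changes the version segment, so the
    URL itself changes and stale CDN or browser copies are never requested.
    """
    parts = [base.rstrip("/"), app.strip("/"), version.strip("/"), path.lstrip("/")]
    return "/".join(parts)


# Assumed header values: because the URL changes on every release, the
# response can be cached for a very long time.
CACHE_HEADERS = {"Cache-Control": "public, max-age=31536000, immutable"}

url = versioned_asset_url("http://static.wix.com", "client", "1.3.2", "css/viewer.css")
print(url)
```

Deploying version 1.3.3 yields a different URL, so no explicit CDN purge is needed for the old file.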
21. Development Velocity - 2010
• Large and entangled codebase
• Hard feature rollout
• While at the same time, the iPad was released
• We needed to enable Wix to move fast
22. Timeline, 2010 to 2014
CI / CD / TDD, DevOps, ScalaWix Framework, Micro-Services, TDD Redux, Companies & Guilds
25. Why CI / CD / TDD / DevOps?
• Fear of change
• Low quality
• Slow product development
• 3 months from dev to GA
“I want change” vs. “I want stability”
26. CI / CD / TDD / DevOps
• Small and fast changes
• Empower the developer
• Automate!!!
• Measure!!!
• Risk = Number of changes x Size of change x $$$ impact of change
30. Micro-Services
• Over 100 micro-services at Wix
• Each service is a process
• Independent development & deployment
• Risk mitigation
• Increases % of network failures
• Back / Forward compatibility
• Size of a service – as large as the team
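One common way to get the back / forward compatibility mentioned above is a tolerant reader: ignore fields you do not know, and default fields that older senders omit. The sketch below is only an illustration of that idea. The `SiteInfo` message and its fields are hypothetical and do not come from the Wix codebase.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class SiteInfo:
    # Hypothetical message exchanged between two services.
    site_id: str
    title: str
    locale: str = "en"  # added in a later version; the default keeps old senders working


def parse_site_info(payload: str) -> SiteInfo:
    """Parse a JSON payload, tolerating unknown and missing optional fields."""
    data = json.loads(payload)
    known = {f.name for f in fields(SiteInfo)}
    # Forward compatibility: drop fields a newer sender added.
    return SiteInfo(**{k: v for k, v in data.items() if k in known})


old_sender = parse_site_info('{"site_id": "s1", "title": "Home"}')
new_sender = parse_site_info('{"site_id": "s1", "title": "Home", "locale": "fr", "theme": "dark"}')
```

With this style of parsing, a service and its callers can be deployed independently, in either order.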
31. Companies & Guilds
• Companies focus on products
• Guilds focus on technology
What: Company leader
How: Guild master
Built for fast development
Did not know what our business is
We know we will need to replace it
Did not know how hard that would be
Cache management – how do you know what is in the cache? How do you find invalid data in the cache?
Invalidation – who invalidates the cache? When? Why?
Cache Reset – can your architecture stand a cache restart?
Sites should never ever have a downtime!
Sites should work as fast as possible, always!
However, an editing system does not require this level of SLA
Releases of Editing feature should have no impact on existing site operations!
Solution - The two concerns evolve independently
The Public segment targets serving websites
Has mostly read-only usage pattern
Simple publishing system
Simple + readonly -> simpler to have higher SLA and DRP
MySQL used as NoSQL – single large table with XML text fields
The Editor segment
Exposes the Editing APIs, user account and galleries management.
Has different release schedule compared to the Public segment
Use one non-normalized table, primary key access, json fields
Immutable blobs, blob table with pointers
No transactions
No MySQL auto generated keys
GUID for keys – no locks, enables master-master replication
Amsterdam for 3-way active-active -> failed
Doing 2-way active-active + service disruption on third
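The storage style in the notes above (one non-normalized table, primary key access, JSON fields, client-generated GUID keys instead of auto-generated ones) can be sketched as follows. This is an assumption-laden illustration: it uses an in-memory SQLite database as a stand-in for MySQL so that it runs anywhere, and the table and function names are hypothetical.

```python
import json
import sqlite3
import uuid


def new_key() -> str:
    # The key is generated by the application, not by the database, so two
    # masters can accept inserts without coordinating on a counter.
    return str(uuid.uuid4())


# SQLite stands in for MySQL here. isolation_level=None puts the connection in
# autocommit mode, so each statement stands alone (no multi-statement transactions).
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, body TEXT NOT NULL)")


def save(conn: sqlite3.Connection, doc: dict) -> str:
    """Store a document as a JSON text field under a fresh GUID key."""
    key = new_key()
    conn.execute("INSERT INTO documents (id, body) VALUES (?, ?)", (key, json.dumps(doc)))
    return key


def load(conn: sqlite3.Connection, key: str):
    """Fetch a document by primary key only; returns None if it is absent."""
    row = conn.execute("SELECT body FROM documents WHERE id = ?", (key,)).fetchone()
    return json.loads(row[0]) if row else None


site_key = save(conn, {"title": "My site", "pages": ["home", "about"]})
```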
The “upload to app server, post process files, copy to lighttpd server, serve by lighttpd” pattern proved inefficient, slow and error-prone
ls does not scale
Needed control over http headers for caching
CDN acts as a great connection manager - We have CDN hit ratios of over 99.9%
There are many vendors
We started with 1 CDN vendor
We are now working with two CDN vendors
Different CDN vendors have advantages in different geographies
Tune HTTP Headers per CDN Vendor
CDN Vendors interpret HTTP headers differently
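Tuning headers per vendor can be kept manageable with a small override table. The sketch below is hypothetical throughout: `vendor_a` and `vendor_b` are placeholder names, and the header values are illustrative, not the settings Wix uses. `s-maxage` is a standard `Cache-Control` directive for shared caches, and `Surrogate-Control` is a header that some CDNs honor. Which one a given vendor respects has to be checked against that vendor's documentation.

```python
# Headers sent to every client (assumed values).
BASE_HEADERS = {"Cache-Control": "public, max-age=300"}

# Per-vendor overrides (placeholder vendor names, illustrative values).
VENDOR_OVERRIDES = {
    "vendor_a": {"Cache-Control": "public, max-age=300, s-maxage=86400"},
    "vendor_b": {"Surrogate-Control": "max-age=86400"},
}


def headers_for(vendor: str) -> dict:
    """Return the caching headers to emit for requests arriving via a vendor."""
    headers = dict(BASE_HEADERS)  # copy, so the base table is never mutated
    headers.update(VENDOR_OVERRIDES.get(vendor, {}))
    return headers
```

Unknown vendors fall back to the base headers, so adding a second or third CDN does not require touching the application code paths that serve the files.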
Train the people you already have
Hiring the right people is key to success
Hire only the best developers (only seniors)
Don’t count only on the interview, you need to test actual coding
Hire people who will challenge you (no “yes men”)
Get people you can trust with “root” access to production
Never stop hiring