Scaling Django with gevent
- 2. @mahendra
● Python developer for 6 years
● FOSS enthusiast/volunteer for 14 years
● Bangalore LUG and Infosys LUG
● FOSS.in and LinuxBangalore/200x
● Gevent user for 1 year
● Twisted user for 5 years (before migrating)
● Added twisted support to libraries like mustaine
- 5. Process/Thread
● There are blocking sections in the code
● The Python GIL is an issue in thread-based
concurrency
- 6. Event driven
[Diagram: events (event_1, event_2 … event_n) are posted; block_on_events() hands each one to its handler (hdler_1(), hdler_2() … hdler_n()); ev() posts new events]
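The dispatch pattern in the diagram can be sketched in a few lines of plain Python. This is a minimal illustration, not how any particular library implements it; the names `post`, `run_event_loop` and the two handlers are hypothetical.

```python
from collections import deque

# Hypothetical in-memory event queue and handler table (illustration only).
events = deque()
handlers = {}
log = []

def post(name, payload=None):
    """Post an event for the loop to dispatch later."""
    events.append((name, payload))

def on_request(payload):
    log.append(("request", payload))
    post("response", payload.upper())  # a handler may post further events

def on_response(payload):
    log.append(("response", payload))

handlers["request"] = on_request
handlers["response"] = on_response

def run_event_loop():
    """Dispatch events one at a time until none are left."""
    while events:
        name, payload = events.popleft()
        handlers[name](payload)

post("request", "hello")
run_event_loop()
print(log)  # [('request', 'hello'), ('response', 'HELLO')]
```

Only one handler runs at a time, so a handler that blocks stalls every other event behind it.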
- 7. Event driven web server
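[Diagram: event_loop() dispatches each event to a handler, which registers (reg()) for the next event]
request → open(fp), reg()
opened → parse(), read_sql(), reg()
sql_read → wri_sql(), reg()
sql_writ → sock_wr(), reg()
responded → close()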
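The register-then-dispatch cycle can be tried with the standard library `selectors` module. This is a sketch under assumptions: a connected socket pair stands in for a real client connection, and the handler name `on_readable` is made up for the example.

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # uses epoll, kqueue etc. where available
received = []

def on_readable(conn):
    """Handler: called only after the loop reports the socket as readable."""
    data = conn.recv(1024)
    received.append(data)
    sel.unregister(conn)
    conn.close()

# A connected socket pair stands in for a client connection (demo assumption).
client, server_side = socket.socketpair()
server_side.setblocking(False)

# reg(): associate the handler with the event we are waiting for.
sel.register(server_side, selectors.EVENT_READ, on_readable)

client.sendall(b"GET / HTTP/1.0\r\n\r\n")

# event_loop(): wait until something is ready, then call its handler.
while sel.get_map():
    for key, _mask in sel.select(timeout=1):
        key.data(key.fileobj)

client.close()
sel.close()
print(received)
```

A real server would chain further registrations from inside each handler, which is the callback style the later slides try to avoid.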
- 8. Two years back
● Using python twisted for half of our products
● Using django for the other half
● Quite a nightmare
- 9. Python twisted
● An event driven library (very scalable)
● Using epoll or kqueue
[Diagram: Client → Nginx (SSL & LB) → Server 1 … Server N, each running Proc 1 … Proc N (:8080)]
- 12. Coroutines
● Python coroutines are very similar to
generators.
def abc( seq ):
    lst = list( seq )
    for i in lst:
        value = yield i
        if value is not None:
            lst.append( value )

r = abc( [1,2,3] )
next( r )      # advance to the first yield before sending
r.send( 4 )
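To see what `send()` does here, the generator can be stepped by hand. The sketch below restates the function with the same logic and records the values it yields; the variable names `first`, `second` and `rest` are only for illustration.

```python
def abc(seq):
    lst = list(seq)
    for i in lst:
        value = yield i          # pauses here; send() resumes with a value
        if value is not None:
            lst.append(value)    # sent values are queued for later iteration

r = abc([1, 2, 3])
first = next(r)        # runs to the first yield
second = r.send(4)     # appends 4, then runs to the next yield
rest = list(r)         # drains the remaining items
print(first, second, rest)  # 1 2 [3, 4]
```

The value passed to `send()` becomes the result of the paused `yield` expression, which is how the caller feeds data back into the coroutine.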
- 13. Gevent features
● Fast event-loop based on libevent (epoll,
kqueue etc.)
● Lightweight execution units based on greenlets
(coroutines)
● Monkey patching support
● Simple API
● Fast WSGI server
- 14. Greenlets
● Primitive notion of micro-threads with no implicit
scheduling
● Just co-routines or independent pseudo-
threads
● Other systems like gevent build micro-threads
on top of greenlets.
● Execution happens by switching execution
among greenlet stacks
● Greenlet switching is not implicit (switch())
- 16. Greenlet code
from greenlet import greenlet

def test1():
    gr2.switch()    # hand control to test2

def test2():
    gr1.switch()    # hand control back to test1

gr1 = greenlet(test1)
gr2 = greenlet(test2)
gr1.switch()        # start test1
- 17. How does gevent work
● Creates an implicit event loop inside a
dedicated greenlet
● When a function in gevent wants to block, it
switches to the greenlet of the event loop. This
will schedule another child greenlet to run
● The event loop automatically picks the
fastest polling mechanism available on the
system
● One event loop runs inside a single OS thread
(process)
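The switch-to-the-loop idea can be imitated with plain generators. The following is only a toy model and not gevent's implementation: gevent switches greenlet stacks and waits on real I/O events, whereas here each task hands control back with a bare `yield` and a hypothetical `hub` function resumes the tasks round-robin.

```python
from collections import deque

trace = []

def task(name, steps):
    """A cooperative task: each bare yield hands control back to the hub."""
    for step in range(steps):
        trace.append((name, step))
        yield  # stands in for "I would block here"

def hub(tasks):
    """Toy event loop: resume each task in turn until all have finished."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)
        except StopIteration:
            continue           # task finished; drop it
        ready.append(current)  # task yielded; schedule it again

hub([task("a", 2), task("b", 3)])
print(trace)
# [('a', 0), ('b', 0), ('a', 1), ('b', 1), ('b', 2)]
```

Everything runs in one OS thread, and a task gives up control only at a `yield`.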
- 18. Gevent code
import gevent
from gevent import socket

urls = ['www.google.com', 'www.example.com', 'www.python.org']
jobs = [gevent.spawn(socket.gethostbyname, url) for url in urls]
gevent.joinall(jobs, timeout=2)
print([job.value for job in jobs])
# e.g. ['74.125.79.106', '208.77.188.166', '82.94.164.162']
- 19. Gevent apis
● Greenlet management (spawn, timeout, schedule)
● Greenlet local data
● Networking (socket, ssl, dns, select)
● Synchronization
● Event – notify multiple listeners
● Queue – synchronized producer/consumer queues
● Locking – Semaphores
● Greenlet pools
● TCP/IP and WSGI servers
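A few of these APIs in use, as a minimal sketch. The helper functions (`square`, `producer`, `consumer`) and the `None` sentinel are assumptions made for the example; `gevent.sleep()` stands in for real network I/O.

```python
import gevent
from gevent.pool import Pool
from gevent.queue import Queue

def square(n):
    gevent.sleep(0.01)   # cooperative pause standing in for network I/O
    return n * n

# Greenlet pool: at most two greenlets run square() at any one time.
pool = Pool(2)
squares = pool.map(square, [1, 2, 3, 4])

# Queue: a producer greenlet hands items to a consumer greenlet.
queue = Queue()
consumed = []

def producer():
    for item in ("a", "b", "c"):
        queue.put(item)
    queue.put(None)      # sentinel (a convention chosen for this sketch)

def consumer():
    while True:
        item = queue.get()   # blocks only this greenlet
        if item is None:
            break
        consumed.append(item)

gevent.joinall([gevent.spawn(producer), gevent.spawn(consumer)])

# Timeout: with_timeout gives up on a call that takes too long.
slow = gevent.with_timeout(0.05, gevent.sleep, 1, timeout_value="timed out")

print(squares, consumed, slow)
```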
- 20. Gevent advantages
● Almost synchronous code. No callbacks or
deferreds
● Lightweight greenlets
● Good concurrency
● No GIL contention, since all greenlets share one OS thread
● No need for in-process locking, since a greenlet
cannot be pre-empted
- 21. Gevent issues
● A greenlet will run till it blocks or switches
● Be wary of large/infinite loops
● Monkey patching is required for unsupported
blocking libraries. Might not work well with
some libraries
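One common way to keep a long loop from starving other greenlets is to yield explicitly with `gevent.sleep(0)`. The sketch below assumes a made-up CPU-bound function `busy` and an arbitrary yield interval.

```python
import gevent

order = []

def busy(name, iterations, yield_every):
    """CPU-bound loop that yields to the event loop every few iterations."""
    for i in range(iterations):
        if i % yield_every == 0:
            order.append(name)
            gevent.sleep(0)   # explicit yield so other greenlets can run
    return name

a = gevent.spawn(busy, "a", 6, 2)
b = gevent.spawn(busy, "b", 6, 2)
gevent.joinall([a, b])
print(order)
```

Without the `gevent.sleep(0)` call, the first greenlet would run its whole loop before the second one started.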
- 22. Our django dream
● We love django
● I like twisted, but love django more
● Coding complexity
● Lack of developers for hire
● Deployment complexity
● Gevent saved the day
- 23. The Django Problem
● In an HTTP request cycle, we wanted the
following operations
● Fetch some metadata for an item being sold
● Purchase the item for the user in the billing system
● Fetch ads to be shown along with the item
● Fetch recommendations based on this item
● In parallel … !!
● Twisted was the only option
- 24. Twisted code
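from twisted.internet.defer import DeferredList
from twisted.web.server import NOT_DONE_YET

def handle_purchase( rqst ):
    defs = []
    defs.append( biller() )
    defs.append( ads() )
    defs.append( recos() )
    defs.append( meta() )
    # "def" is a reserved word, so the list needs another name
    dl = DeferredList( defs, … )
    # pass the callback itself; do not call it here
    dl.addCallback( send_response )
    return NOT_DONE_YET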
- 25. Twisted issues
● The issues were with everything else
● Header management
● Templates for response
● ORM support
● SOAP, REST, Hessian/Burlap support
– We liked to use suds, requests, mustaine etc.
● Session management and auth
● Caching support
● The above are django's strengths
● Django's vibrant eco-system (celery, south,
tastypie)
- 26. gunicorn
● A python WSGI HTTP server
● Supports several worker types (sync, eventlet,
gevent etc.)
● Uses monkey patching
● Excellent django support
● gunicorn_django app.settings
● Enabled gevent support for our app by default
without any code changes
● Spawns and manages worker processes and
distributes load amongst them
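The worker type can also be chosen in a Gunicorn configuration file, which is ordinary Python. The values below are placeholders, not the settings used in this deployment.

```python
# gunicorn.conf.py (illustrative values only)
bind = "127.0.0.1:8080"      # address Nginx would proxy to
workers = 4                  # number of worker processes
worker_class = "gevent"      # run each worker on gevent instead of the sync worker
worker_connections = 1000    # max simultaneous clients per gevent worker
```

Such a file is passed to Gunicorn with the `-c` option.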
- 27. Migrating our products
import gevent

def handle_purchase( request ):
    jobs = []
    jobs.append( gevent.spawn( biller, … ) )
    jobs.append( gevent.spawn( ads, … ) )
    jobs.append( gevent.spawn( meta, … ) )
    jobs.append( gevent.spawn( reco, … ) )
    # joinall() needs the list of greenlets to wait for
    gevent.joinall( jobs )
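A self-contained version of this fan-out might look like the sketch below. The four backend functions are hypothetical stand-ins that sleep briefly in place of real network calls, and the one-second timeout is an arbitrary choice.

```python
import gevent

# Hypothetical stand-ins for the four backend calls; gevent.sleep() plays
# the role of blocking network I/O.
def biller(item):
    gevent.sleep(0.02)
    return "purchased %s" % item

def ads(item):
    gevent.sleep(0.02)
    return ["ad for %s" % item]

def meta(item):
    gevent.sleep(0.02)
    return {"item": item}

def reco(item):
    gevent.sleep(0.02)
    return ["also like %s" % item]

def handle_purchase(item):
    names = ("biller", "ads", "meta", "reco")
    jobs = [gevent.spawn(func, item) for func in (biller, ads, meta, reco)]
    gevent.joinall(jobs, timeout=1)   # wait for all four, at most one second
    # job.value is None if a job failed or did not finish in time
    return dict(zip(names, (job.value for job in jobs)))

result = handle_purchase("book")
print(result)
```

The four calls overlap, so the request takes about as long as the slowest call instead of the sum of all four.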
- 28. Migrating our products
● Migrating our entire code base (2 products)
took around 1 week to finish
● Was easier because we were already using
inlineCallbacks() decorator of twisted
● Only small parts of our code had to be migrated
- 29. Deployment
[Diagram: Client → Nginx (SSL & LB) → Gunicorn 1 … Gunicorn N, each running Proc 1 … Proc N]
- 30. Life today
● Single framework for all 4 products
● Use django's awesome features and
ecosystem
● Increased scalability. More so with celery.
● Use blocking python libraries without worrying
too much
● No more usage of python-twisted
● Coding, testing and maintenance are much
easier
● We are hiring!!
- 31. Links
● http://greenlet.readthedocs.org/en/latest/index.html
● http://www.gevent.org/
● http://in.pycon.org/2010/talks/48-twisted-programming