Scaling Django with Gevent

          Mahendra M
          @mahendra
   https://github.com/mahendra
@mahendra
●   Python developer for 6 years
●   FOSS enthusiast/volunteer for 14 years
    ●   Bangalore LUG and Infosys LUG
    ●   FOSS.in and LinuxBangalore/200x
●   Gevent user for 1 year
●   Twisted user for 5 years (before migrating)
    ●   Added twisted support libraries like mustaine
Concurrency models
●   Multi-Process
●   Threads
●   Event driven
●   Coroutines
Process/Thread

[Diagram: request -> dispatch() -> worker_1() ... worker_n(); each worker runs read(fp), db_rd(), db_wr() and sock_wr() in sequence]
Process/Thread
●   There are blocking sections in the code
●   The Python GIL is an issue in thread-based
    concurrency
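A minimal sketch of the process/thread model described above, using only the standard library. The worker body and the sleep durations are assumptions standing in for the blocking read(fp), db_rd(), db_wr() and sock_wr() steps; they are not taken from the talk.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def worker(request_id):
    """Handle one request; every step blocks the calling thread."""
    time.sleep(0.01)  # stand-in for read(fp)
    time.sleep(0.01)  # stand-in for db_rd()
    time.sleep(0.01)  # stand-in for db_wr()
    time.sleep(0.01)  # stand-in for sock_wr()
    return "response %d" % request_id


def dispatch(request_ids, n_workers=4):
    """Hand each request to one worker thread from a fixed-size pool."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() returns results in the order the requests were submitted
        return list(pool.map(worker, request_ids))


responses = dispatch(range(3))
```

Each in-flight request occupies a whole thread while it waits, which is the cost the event-driven and coroutine models try to avoid.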
Event driven

[Diagram: events event_1 ... event_n are posted to block_on_events(), which calls the handlers hdler_1() ... hdler_n(); hdler_1() is followed by ev()]
Event driven web server

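[Diagram: event_loop() receives the events request, opened, sql_read, sql_writ and responded, and runs open(fp), parse() and read_sql(), wri_sql(), sock_wr() and close() for them; open(fp), read_sql(), wri_sql() and sock_wr() are each followed by reg()]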
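The event-driven flow on this slide can be sketched roughly as follows. This is a toy loop under stated assumptions: the EventLoop class, the event names and the handlers are hypothetical, and events come from an in-memory queue instead of epoll or kqueue.

```python
from collections import deque


class EventLoop:
    """Toy event loop: handlers are registered per event name."""

    def __init__(self):
        self.handlers = {}
        self.pending = deque()

    def reg(self, event, handler):
        # Register interest in an event (the reg() step on the slide)
        self.handlers[event] = handler

    def post(self, event, payload=None):
        # Queue an event; a real loop would receive it from the OS
        self.pending.append((event, payload))

    def run(self):
        # Dispatch events one at a time until none are left
        while self.pending:
            event, payload = self.pending.popleft()
            handler = self.handlers.get(event)
            if handler is not None:
                handler(self, payload)


log = []


def on_request(loop, path):
    log.append("open " + path)
    loop.post("opened", path)


def on_opened(loop, path):
    log.append("parse " + path)
    loop.post("responded", path)


def on_responded(loop, path):
    log.append("close " + path)


loop = EventLoop()
loop.reg("request", on_request)
loop.reg("opened", on_opened)
loop.reg("responded", on_responded)
loop.post("request", "/item/1")
loop.run()
```

No handler ever blocks; each one does a small piece of work and posts the next event, so the logic of one request ends up split across several callbacks.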
Two years back
●   Using python twisted for half of our products
●   Using django for the other half
●   Quite a nightmare
Python twisted
●   An event driven library (very scalable)
●   Using epoll or kqueue

[Diagram: Client -> Nginx (SSL & LB) -> Server 1 ... Server N; each server runs Proc 1 ... Proc N, all on :8080]
Gevent
A coroutine-based Python networking library that
uses greenlet to provide a high-level synchronous
API on top of the libevent event loop.
Coroutines
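●   Python coroutines are similar to generators: a
    generator that receives values through send()
    acts as a coroutine.

def abc( seq ):
     lst = list( seq )
     for i in lst:
         value = yield i          # value is whatever send() passes in
         if value is not None:
              lst.append( value )
r = abc( [1,2,3] )
next( r )                         # advance to the first yield
r.send( 4 )                       # appends 4, then yields the next item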
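As a sketch of how such a generator-based coroutine behaves when driven step by step (the variable names first, second and rest are only for illustration):

```python
def abc(seq):
    lst = list(seq)
    for i in lst:
        value = yield i          # receives the argument of send(), or None
        if value is not None:
            lst.append(value)    # the loop will also visit appended items


r = abc([1, 2, 3])
first = next(r)      # run to the first yield
second = r.send(4)   # resume with value=4, run to the next yield
rest = list(r)       # drain the remaining items, including the appended one
```

A generator must be advanced to its first yield with next() before send() can pass it a non-None value.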
Gevent features
●   Fast event-loop based on libevent (epoll,
    kqueue etc.)
●   Lightweight execution units based on greenlets
    (coroutines)
●   Monkey patching support
●   Simple API
●   Fast WSGI server
Greenlets
●   Primitive notion of micro-threads with no implicit
    scheduling
●   Just co-routines or independent pseudo-
    threads
●   Other systems like gevent build micro-threads
    on top of greenlets.
●   Execution happens by switching execution
    among greenlet stacks
●   Greenlet switching is explicit, not implicit (switch())
Greenlet execution

[Diagram: the main greenlet runs pause() and abc(); the child greenlet runs func_1(), pause(), some() with reg(), and func_2()]
Greenlet code
from greenlet import greenlet


def test1():
   gr2.switch()


def test2():
   gr1.switch()


gr1 = greenlet(test1)
gr2 = greenlet(test2)
gr1.switch()
How does gevent work
●   Creates an implicit event loop inside a
    dedicated greenlet
●   When a function in gevent wants to block, it
    switches to the greenlet of the event loop. This
    will schedule another child greenlet to run
●   The event loop automatically picks up the
    fastest polling mechanism available in the
    system
●   One event loop runs inside a single OS thread
    (process)
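The scheduling idea above can be illustrated with plain generators. This is not how gevent is implemented (gevent uses greenlets and a native event loop); it is a toy model in which a bare yield stands in for "this call would block, switch to the event loop", and run_hub and worker are hypothetical names.

```python
from collections import deque


def run_hub(tasks):
    """Toy 'event loop greenlet': resume each task until it yields."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task until it gives up control
        except StopIteration:
            continue            # task finished, do not reschedule it
        ready.append(task)      # still alive, schedule it again


trace = []


def worker(name, steps):
    for step in range(steps):
        trace.append((name, step))
        yield                   # stand-in for a blocking call


run_hub([worker("a", 2), worker("b", 3)])
```

Everything runs in one OS thread; the tasks interleave only at the points where they hand control back to the loop.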
Gevent code
import gevent
from gevent import socket

urls = ['www.google.com', 'www.example.com', 'www.python.org']
jobs = [gevent.spawn(socket.gethostbyname, url) for url in urls]
gevent.joinall(jobs, timeout=2)
[job.value for job in jobs]

# Output shown on the slide:
# ['74.125.79.106', '208.77.188.166', '82.94.164.162']
Gevent apis
●   Greenlet management (spawn, timeout, schedule)
●   Greenlet local data
●   Networking (socket, ssl, dns, select)
●   Synchronization
    ●   Event – notify multiple listeners
    ●   Queue – synchronized producer/consumer queues
    ● Locking – Semaphores
●   Greenlet pools
●   TCP/IP and WSGI servers
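A short sketch touching two of the APIs listed above, greenlet pools and synchronized queues. It assumes gevent is installed; the square, producer and consumer functions are made up for illustration.

```python
import gevent
from gevent.pool import Pool
from gevent.queue import Queue


def square(n):
    gevent.sleep(0)              # cooperative yield, standing in for I/O
    return n * n


pool = Pool(2)                   # at most two greenlets run at a time
squares = pool.map(square, [1, 2, 3, 4])   # results keep the input order

queue = Queue()
received = []


def producer():
    for item in ("a", "b", "c"):
        queue.put(item)
    queue.put(None)              # sentinel: nothing more to consume


def consumer():
    while True:
        item = queue.get()       # switches to other greenlets while empty
        if item is None:
            break
        received.append(item)


gevent.joinall([gevent.spawn(producer), gevent.spawn(consumer)])
```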
Gevent advantages
●   Almost synchronous code. No callbacks or
    deferreds
●   Lightweight greenlets
●   Good concurrency
●   No GIL contention, since all greenlets run in
    one OS thread
●   No need for in-process locking, since a greenlet
    cannot be pre-empted
Gevent issues
●   A greenlet will run till it blocks or switches
    ●   Be wary of large/infinite loops
●   Monkey patching is required for unsupported
    blocking libraries. It might not work well with
    some libraries
Our django dream
●   We love django
●   I like twisted, but love django more
    ●   Coding complexity
    ●   Lack of developers for hire
    ●   Deployment complexity
●   Gevent saved the day
The Django Problem
●   In an HTTP request cycle, we wanted the
    following operations
    ●   Fetch some metadata for an item being sold
    ●   Purchase the item for the user in the billing system
    ●   Fetch ads to be shown along with the item
    ●   Fetch recommendations based on this item
●   In parallel … !!
    ●   Twisted was the only option
Twisted code
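from twisted.internet.defer import DeferredList
from twisted.web.server import NOT_DONE_YET

def handle_purchase( rqst ):
   defs = []
   defs.append( biller() )
   defs.append( ads() )
   defs.append( recos() )
   defs.append( meta() )
   dl = DeferredList( defs, … )        # "def" is a keyword, so use another name
   dl.addCallback( send_response )     # pass the callback, do not call it
   return NOT_DONE_YET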
Twisted issues
●   The issues were with everything else
    ●   Header management
    ●   Templates for response
    ●   ORM support
    ●   SOAP, REST, Hessian/Burlap support
        –   We liked to use suds, requests, mustaine etc.
    ●   Session management and auth
    ●   Caching support
●   The above are django's strengths
    ●   Django's vibrant eco-system (celery, south,
        tastypie)
gunicorn
●   A python WSGI HTTP server
●   Supports running code under sync, eventlet,
    gevent and other worker types
    ●   Uses monkey patching
●   Excellent django support
    ●   gunicorn_django app.settings
●   Enabled gevent support for our app by default
    without any code changes
●   Spawns and manages worker processes and
    distributes load amongst them
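As an illustration of the points above, a gunicorn configuration file selecting gevent workers might look like the sketch below. The file name and all values are assumptions, not the configuration used in the talk; gunicorn config files are plain Python and can be loaded with the -c option.

```python
# gunicorn.conf.py (hypothetical file name and values)

bind = "127.0.0.1:8080"      # address Nginx would proxy to
workers = 4                  # number of worker processes to spawn
worker_class = "gevent"      # run each worker on gevent instead of sync
worker_connections = 1000    # max simultaneous clients per gevent worker
```

With a gevent worker class, gunicorn applies the monkey patching itself, which is why the application code does not need to change.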
Migrating our products
import gevent

def handle_purchase( request ):
    jobs = []
    jobs.append( gevent.spawn( biller, … ) )
    jobs.append( gevent.spawn( ads, … ) )
    jobs.append( gevent.spawn( meta, … ) )
    jobs.append( gevent.spawn( reco, … ) )
    gevent.joinall( jobs )    # wait for all four greenlets to finish
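A runnable sketch of the same pattern, assuming gevent is installed. The four backend functions are hypothetical stubs (the slide elides their arguments), gevent.sleep stands in for the network calls, and the handler takes an item id in place of a Django request.

```python
import gevent


def biller(item_id):
    gevent.sleep(0.01)           # stand-in for the billing system call
    return "billed " + item_id


def ads(item_id):
    gevent.sleep(0.01)           # stand-in for the ad server call
    return "ads for " + item_id


def meta(item_id):
    gevent.sleep(0.01)           # stand-in for the metadata lookup
    return "meta for " + item_id


def reco(item_id):
    gevent.sleep(0.01)           # stand-in for the recommendation call
    return "recos for " + item_id


def handle_purchase(item_id):
    calls = {"bill": biller, "ads": ads, "meta": meta, "reco": reco}
    # Start all four calls; they overlap while each one waits
    jobs = {name: gevent.spawn(fn, item_id) for name, fn in calls.items()}
    gevent.joinall(list(jobs.values()), timeout=2)
    # job.value is None if a call failed or did not finish in time
    return {name: job.value for name, job in jobs.items()}


result = handle_purchase("item-1")
```

The handler reads like sequential code and returns its result directly, with no deferreds or callbacks to chain.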
Migrating our products
●   Migrating our entire code base (2 products)
    took around 1 week to finish
●   It was easier because we were already using
    twisted's inlineCallbacks() decorator
●   Only small parts of our code had to be migrated
Deployment

[Diagram: Client -> Nginx (SSL & LB) -> Gunicorn 1 ... Gunicorn N; each Gunicorn runs Proc 1 ... Proc N]
Life today
●   Single framework for all 4 products
●   Use django's awesome features and
    ecosystem
●   Increased scalability. More so with celery.
●   Use blocking python libraries without worrying
    too much
●   No more usage of python-twisted
●   Coding, testing and maintenance are much
    easier
●   We are hiring!!
Links
●   http://greenlet.readthedocs.org/en/latest/index.html
●   http://www.gevent.org/
●   http://in.pycon.org/2010/talks/48-twisted-programming
