Scaling the World’s Largest Django App

Jason Yan                 David Cramer
@jasonyan                    @zeeg
What is DISQUS?


            dis·cuss • dĭ-skŭs'

We are a comment system with an emphasis on
           connecting communities




              http://disqus.com/about/
What is Scale?

[Chart: Number of Visitors, axis ticks from 50M to 300M]



Our traffic at a glance
17,000 requests/second peak
450,000 websites
15 million profiles
75 million comments
250 million visitors (August 2010)

Our Challenges


• We can’t predict when things will happen
  • Random celebrity gossip
  • Natural disasters
• Discussions never expire
  • We can’t keep those millions of articles from
    2008 in the cache
  • You don’t know in advance (generally) where the
    traffic will be
  • Especially with dynamic paging, realtime, sorting,
    personal prefs, etc.
Our Challenges (cont’d)


• High availability
  • Not a destination site
  • Difficult to schedule maintenance
Server Architecture
Server Architecture - Load Balancing
• Load Balancing
  • Software, HAProxy
     • High performance, intelligent
       server availability checking
     • Bonus: Nice statistics reporting
• High Availability
  • heartbeat




                                                     Image Source: http://haproxy.1wt.eu/

Server Architecture



• ~100 Servers
 • 30% Web Servers (Apache + mod_wsgi)
 • 10% Databases (PostgreSQL)
 • 25% Cache Servers (memcached)
 • 20% Load Balancing / High Availability
   (HAProxy + heartbeat)
 • 15% Utility Servers (Python scripts)
Server Architecture - Web Servers


• Apache 2.2
• mod_wsgi
  • Using `maximum-requests` to
    plug memory leaks.

• Performance Monitoring
  • Custom middleware
    (PerformanceLogMiddleware)
  • Ships performance statistics
    (DB queries, external calls,
    template rendering, etc) through
    syslog
  • Collected and graphed through
    Ganglia
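The deck does not show PerformanceLogMiddleware itself. As a rough illustration of the idea only, here is a minimal WSGI timing wrapper that logs one line per request. The class name, logger name, and log format are assumptions, and it records only total time, not the per-category breakdown (DB queries, external calls, template rendering) described above.

```python
import logging
import time

log = logging.getLogger("perf")  # hypothetical logger name


class TimingMiddleware(object):
    """WSGI wrapper that logs how long the wrapped app took to return."""

    def __init__(self, app, clock=time.time):
        self.app = app
        self.clock = clock  # injectable so the timing can be tested

    def __call__(self, environ, start_response):
        started = self.clock()
        try:
            return self.app(environ, start_response)
        finally:
            # Measures time until the app returns its response iterable,
            # not the time spent streaming the body.
            elapsed_ms = (self.clock() - started) * 1000.0
            log.info("path=%s total_ms=%.1f",
                     environ.get("PATH_INFO", ""), elapsed_ms)
```

To ship these lines through syslog, one option is to attach the standard library's `logging.handlers.SysLogHandler` to the `perf` logger and set its level to `INFO`.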
Server Architecture - Database




• PostgreSQL
• Slony-I for Replication
  • Trigger-based
  • Read slaves for extra read capacity
  • Failover master database for high
    availability
Server Architecture - Database

• Make sure indexes fit in memory and
  measure I/O
 • High I/O generally means slow queries
   due to missing indexes or indexes not in
   buffer cache
• Log Slow Queries
 • syslog-ng + pgFouine + cron to automate
   slow query logging

Server Architecture - Database



• Use connection pooling
 • Django doesn’t do this for you
 • We use pgbouncer
 • Limits the maximum number of
   connections your database needs to
   handle
 • Save on costly opening and tearing down
   of new database connections
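Because pgbouncer speaks the PostgreSQL protocol, Django can usually be pointed at it by changing only the host and port. The fragment below is a sketch with assumed values: the host, the port (6432 is a commonly used pgbouncer listen port), and the database name are illustrative, not taken from the talk's configuration.

```python
# settings.py fragment (all values are assumptions for illustration)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "disqus",
        "HOST": "127.0.0.1",  # where pgbouncer listens, not PostgreSQL itself
        "PORT": "6432",       # assumed pgbouncer listen_port
    }
}
```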
Our Data Model
Partitioning




• Fairly easy to implement, quick wins
• Done at the application level
  • Data is replayed by Slony
• Two methods of data separation
Vertical Partitioning
Vertical partitioning involves creating tables with fewer columns
  and using additional tables to store the remaining columns.

[Diagram labels: Forums, Posts, Users, Sentry]




          http://en.wikipedia.org/wiki/Partition_(database)

Pythonic Joins


            Allows us to separate datasets

posts = Post.objects.all()[0:25]

# store users in a dictionary based on primary key
users = dict(
    (u.pk, u) for u in 
    User.objects.filter(pk__in=set(p.user_id for p in posts))
)

# map users to their posts
for p in posts:
  p._user_cache = users.get(p.user_id)
Pythonic Joins (cont’d)



• Slower than at database level
    • But not enough that you should care
    • Trading performance for scale
• Allows us to separate data
    • Easy vertical partitioning
• More efficient caching
    • get_many, object-per-row cache
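One way to read the "get_many, object-per-row cache" point: once users are fetched by primary key rather than joined in SQL, each row can be cached under its own key and fetched in bulk. The sketch below assumes a cache client that exposes `get_many` and `set_many` (as Django's cache API does). `fetch_many`, `DictCache`, and the key format are hypothetical names for illustration.

```python
def fetch_many(cache, load_from_db, ids, key_fmt="user:%s"):
    """Return {id: obj}, reading the cache first and the database for misses."""
    keys = dict((key_fmt % i, i) for i in ids)
    cached = cache.get_many(list(keys))
    found = dict((keys[k], v) for k, v in cached.items())
    missing = [i for i in ids if i not in found]
    if missing:
        loaded = load_from_db(missing)  # one query covers every miss
        cache.set_many(dict((key_fmt % i, obj) for i, obj in loaded.items()))
        found.update(loaded)
    return found


class DictCache(object):
    """In-memory stand-in for a memcached client (illustration only)."""

    def __init__(self):
        self.data = {}

    def get_many(self, keys):
        return dict((k, self.data[k]) for k in keys if k in self.data)

    def set_many(self, mapping):
        self.data.update(mapping)
```

In the Pythonic join shown earlier, `load_from_db` would wrap the `User.objects.filter(pk__in=...)` query.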
Designating Masters




• Alleviates some of the write load on your
  primary application master
• Masters exist under specific conditions:
  • application use case
  • partitioned data
• Database routers make this (fairly) easy
Routing by Application




class ApplicationRouter(object):
    def db_for_read(self, model, **hints):
        instance = hints.get('instance')
        if not instance:
            return None

        app_label = instance._meta.app_label

        return get_application_alias(app_label)
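The slide does not define `get_application_alias`. A minimal guess at its shape is a lookup from app label to database alias that returns `None` for unlisted apps, which tells Django to move on to the next router or the default database. The mapping below is hypothetical.

```python
# Hypothetical mapping from Django app label to database alias.
# Each alias would also need an entry in settings.DATABASES.
APPLICATION_ALIASES = {
    "sentry": "sentry",
    "users": "users",
}


def get_application_alias(app_label, default=None):
    """Return the alias for an app, or None to express no preference."""
    return APPLICATION_ALIASES.get(app_label, default)
```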

Horizontal Partitioning
Horizontal partitioning (also known as sharding) involves splitting
               one set of data into different tables.

[Diagram labels: Disqus, Your Blog, CNN, Telegraph]




           http://en.wikipedia.org/wiki/Partition_(database)
Horizontal Partitions




• Some forums have very large datasets
• Partners need high availability
• Helps scale the write load on the master
• We rely more on vertical partitions
Routing by Partition

class ForumPartitionRouter(object):
    def db_for_read(self, model, **hints):
        instance = hints.get('instance')
        if not instance:
            return None

        forum_id = getattr(instance, 'forum_id', None)
        if not forum_id:
            return None

        return get_forum_alias(forum_id)


# What we used to do
Post.objects.filter(forum=forum)


# Now, making sure hints are available
forum.post_set.all()
Optimizing QuerySets




• We really dislike raw SQL
  • It creates more work when dealing with
    partitions
• Built-in cache allows sub-slicing
  • But isn’t always needed
  • We removed this cache

Removing the Cache


• Django internally caches the results of your QuerySet
  • This adds additional memory overhead

     # 1 query
     qs = Model.objects.all()[0:100]

     # 0 queries (we don’t need this behavior)
     qs = qs[0:10]

     # 1 query
     qs = qs.filter(foo=bar)


• Many times you only need to view a result set once
• So we built SkinnyQuerySet
Removing the Cache (cont’d)

Optimizing memory usage by removing the cache
from django.db.models.query import QuerySet

class QuerySetDoubleIteration(Exception):
    "Raised when a SkinnyQuerySet is iterated a second time."

class SkinnyQuerySet(QuerySet):
    def __iter__(self):
        if self._result_cache is not None:
            # __len__ must have been run
            return iter(self._result_cache)

        has_run = getattr(self, 'has_run', False)
        if has_run:
            raise QuerySetDoubleIteration("...")
        self.has_run = True
        # We wanted .iterator() as the default
        return self.iterator()



                http://gist.github.com/550438
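The same guard can be shown without Django. The following is a sketch of the pattern only, not the code from the gist: an iterable that streams its rows once and raises on a second pass instead of silently keeping a copy. `OneShot` and `DoubleIteration` are hypothetical names.

```python
class DoubleIteration(Exception):
    """Raised when a one-shot result set is iterated a second time."""


class OneShot(object):
    """Wraps a row source and streams it once, keeping no copy of the rows."""

    def __init__(self, fetch_rows):
        self.fetch_rows = fetch_rows  # callable returning an iterator of rows
        self.has_run = False

    def __iter__(self):
        if self.has_run:
            raise DoubleIteration("wrap the result in list() to reuse it")
        self.has_run = True
        return iter(self.fetch_rows())
```

Raising loudly is the design choice here: a caller that really needs the rows twice has to ask for a list explicitly, so the memory cost is visible at the call site.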
Atomic Updates




• Keeps your data consistent
• save() isn't thread-safe
  • use update() instead
• Great for things like counters
  • But should be considered for all write
    operations
Atomic Updates (cont’d)


  Thread safety is impossible with .save()
Request 1

post = Post.objects.get(pk=1)
# a moderator approves
post.approved = True
post.save()

Request 2

post = Post.objects.get(pk=1)
# the author adjusts their message
post.message = 'Hello!'
# save() writes every field, so this can undo the approval above
post.save()

Atomic Updates (cont’d)


            So we need atomic updates
Request 1

post = Post(pk=1)
# a moderator approves
Post.objects.filter(pk=post.pk).update(approved=True)

Request 2

post = Post(pk=1)
# the author adjusts their message
Post.objects.filter(pk=post.pk).update(message='Hello!')
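The difference between the two approaches can be reproduced without a database. In this sketch a dict stands in for the posts table. `save` writes back every field of an in-memory copy, as a full-row save does, while `update` writes only the named fields. All names here are illustrative.

```python
def save(table, pk, instance):
    """Full-row write: every field of the in-memory copy goes back."""
    table[pk] = dict(instance)


def update(table, pk, **fields):
    """Field-level write: only the named fields change."""
    table[pk].update(fields)


def run(write_with_save):
    table = {1: {"approved": False, "message": "Hi"}}
    # Both requests load the row before either one writes.
    moderator_copy = dict(table[1])
    author_copy = dict(table[1])
    if write_with_save:
        moderator_copy["approved"] = True
        save(table, 1, moderator_copy)
        author_copy["message"] = "Hello!"
        save(table, 1, author_copy)  # stale approved=False is written back
    else:
        update(table, 1, approved=True)
        update(table, 1, message="Hello!")
    return table[1]
```

With `run(True)` the moderator's approval is lost. With `run(False)` both changes survive, because neither write touches the other's column.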
Atomic Updates (cont’d)


           A better way to approach updates
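from django.db.models.expressions import ExpressionNode  # Django < 1.8

def update(obj, using=None, **kwargs):
    """
    Updates specified attributes on the current instance.
    """
    assert obj.pk, "Instance has not yet been created."
    # Write only the given fields, in a single UPDATE statement
    queryset = obj.__class__._base_manager.using(using)
    queryset.filter(pk=obj.pk).update(**kwargs)
    for k, v in kwargs.items():
        if isinstance(v, ExpressionNode):
            # NotImplemented: the database computed this value,
            # so the in-memory attribute is left as it was
            continue
        setattr(obj, k, v)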



http://github.com/andymccurdy/django-tips-and-tricks/blob/master/model_update.py
Delayed Signals




• Queueing low priority tasks
 • even if they’re fast
• Asynchronous (Delayed) signals
 • very friendly to the developer
 • ..but not as friendly as real signals
Delayed Signals (cont’d)



  We send a specific serialized version
   of the model for delayed signals

from disqus.common.signals import delayed_save

def my_func(data, sender, created, **kwargs):
    print(data['id'])

delayed_save.connect(my_func, sender=Post)




 This is all handled through our Queue
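The talk does not show how `delayed_save` is implemented. As a rough sketch of the flow only: at save time a serialized payload is put on a queue, and a worker later pops it and calls the connected receivers with plain data rather than a model instance. The in-process `deque`, the JSON payload shape, and every function name below are assumptions.

```python
import json
from collections import deque

queue = deque()  # stand-in for the real job queue
receivers = []   # (callback, sender_name) pairs


def connect(callback, sender):
    receivers.append((callback, sender))


def send_delayed(sender, instance_data, created):
    """Save-time side: enqueue a serialized payload and return at once."""
    queue.append(json.dumps(
        {"sender": sender, "data": instance_data, "created": created}))


def work_once():
    """Worker side: pop one payload and fire the matching receivers."""
    job = json.loads(queue.popleft())
    for callback, sender in receivers:
        if sender == job["sender"]:
            callback(data=job["data"], sender=job["sender"],
                     created=job["created"])
```

Passing a serialized dict instead of the instance is what makes receivers slightly less friendly than real signals: they see a snapshot of the fields, not a live model object.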

Caching




• Memcached
• Use pylibmc (newer libMemcached-based)
 • Ticket #11675 (add pylibmc support)
 • Third party applications:
   • django-newcache, django-pylibmc
Caching (cont’d)



• libMemcached / pylibmc is configurable with
  “behaviors”.
• Memcached “single point of failure”
  • Distributed system, but we must take
    precautions.
  • Connection timeout to memcached can stall
    requests.
    • Use `_auto_eject_hosts` and
      `_retry_timeout` behaviors to prevent
      reconnecting to dead caches.
Caching (cont’d)



   • Default (naive) hashing behavior
     • The hashed cache key, modulo the number of servers,
       gives the index into the server list.
     • Removal of a server causes majority of
       cache keys to be remapped to new
       servers.

CACHE_SERVERS = ['10.0.0.1', '10.0.0.2']
key = 'my_cache_key'
cache_server = CACHE_SERVERS[hash(key) % len(CACHE_SERVERS)]
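To see how much of the cache moves under modulo hashing, the sketch below counts the keys whose server changes when one of four servers is removed. It uses an MD5-based hash so the result is repeatable, since Python's built-in `hash()` for strings can differ between processes. The server addresses and key names are made up for illustration.

```python
import hashlib


def stable_hash(key):
    """Deterministic integer hash of a string key."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def pick_server(key, servers):
    return servers[stable_hash(key) % len(servers)]


def remapped_fraction(keys, before, after):
    moved = sum(1 for k in keys
                if pick_server(k, before) != pick_server(k, after))
    return moved / float(len(keys))


SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
KEYS = ["key:%d" % i for i in range(10000)]
fraction_moved = remapped_fraction(KEYS, SERVERS, SERVERS[:3])
```

Going from four servers to three, a key stays put only when its hash gives the same remainder modulo 4 and modulo 3, which holds for about a quarter of all hashes, so roughly three quarters of the keys are remapped.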
Caching (cont’d)

• Better approach: consistent hashing
  • libMemcached (pylibmc) uses libketama
    (http://tinyurl.com/lastfm-libketama)


  • Addition / removal of a cache server
    remaps (K/n) cache keys
    (where K=number of keys and n=number of servers)




                 Image Source: http://sourceforge.net/apps/mediawiki/kai/index.php?title=Introduction
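A small hash ring makes the K/n behaviour concrete. This is a generic consistent-hashing sketch with virtual nodes, not libketama's actual algorithm. The replica count, the MD5-based point function, and the server addresses are assumptions.

```python
import bisect
import hashlib


def _point(label):
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class HashRing(object):
    """Minimal consistent-hash ring with virtual nodes (not libketama)."""

    def __init__(self, servers, replicas=100):
        self.ring = sorted(
            (_point("%s#%d" % (server, i)), server)
            for server in servers for i in range(replicas))
        self.points = [point for point, _ in self.ring]

    def pick_server(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        index = bisect.bisect(self.points, _point(key)) % len(self.ring)
        return self.ring[index][1]


SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
KEYS = ["key:%d" % i for i in range(10000)]
before, after = HashRing(SERVERS), HashRing(SERVERS[:3])
moved = [k for k in KEYS if before.pick_server(k) != after.pick_server(k)]
```

Only the keys that lived on the removed server move. Every other key still finds the same ring point, so about a quarter of the keys are remapped here.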

Caching (cont’d)


• Thundering herd (stampede) problem
  • Invalidating a heavily accessed cache key causes many
    clients to refill cache.
  • But everyone refetching to fill the cache from the data
    store or reprocessing data can cause things to get even
    slower.
  • Most times, it’s ideal to return the previously invalidated
    cache value and let a single client refill the cache.
  • django-newcache or MintCache
    (http://djangosnippets.org/snippets/793/) will do this for you.
  • Filling the cache on invalidation, instead of deleting
    from it, also helps prevent the thundering herd
    problem.
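The "return the stale value and let one client refill" idea can be sketched as a thin wrapper that stores a soft expiry next to each value. This illustrates the approach rather than reproducing the MintCache snippet or django-newcache. The class name, the `grace` parameter, and the dict-backed store are assumptions, and the check-then-write in `get` is not atomic the way a real memcached operation would need to be.

```python
import time


class MintCache(object):
    """Stale-while-refilling wrapper around a dict-like store (sketch)."""

    def __init__(self, store, grace=30, clock=time.time):
        self.store, self.grace, self.clock = store, grace, clock

    def set(self, key, value, timeout):
        # Keep the soft expiry with the value. A real backend would also
        # need a hard timeout longer than this so the stale copy survives.
        self.store[key] = (value, self.clock() + timeout)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, soft_expiry = entry
        if self.clock() > soft_expiry:
            # First caller past the soft expiry: extend it so other callers
            # keep receiving the stale value, then report a miss so this
            # caller alone recomputes and calls set().
            self.store[key] = (value, self.clock() + self.grace)
            return None
        return value
```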
Transactions


• TransactionMiddleware got us started, but
  down the road became a burden
• For postgresql_psycopg2, there's a database
  option, OPTIONS['autocommit']
  • Each query is in its own transaction. This
    means each request won’t start in a
    transaction.
    • But sometimes we want transactions
      (e.g., saving multiple objects and rolling
      back on error)
Transactions (cont’d)


• Tips:
  • Use autocommit for read slave databases.
  • Isolate slow functions (e.g., external calls,
    template rendering) from transactions.
  • Selective autocommit
    • Most read-only views don’t need to be
      in transactions.
    • Start in autocommit and switch to a
      transaction on write.
Scaling the Team




• Small team of engineers
• Monthly users / developers = 40m
• Which means writing tests..
• ..and having a dead simple workflow

Keeping it Simple




• A developer can be up and running in a few
  minutes
 • assuming postgres and other server
   applications are already installed
 • pip, virtualenv
 • settings.py
Setting Up Local




1. createdb -E UTF-8 disqus
2. git clone git://repo
3. mkvirtualenv disqus
4. pip install -U -r requirements.txt
5. ./manage.py syncdb && ./manage.py migrate
Sane Defaults


settings.py
from disqus.conf.settings.default import *

try:
    from local_settings import *
except ImportError:
    import sys, traceback
    sys.stderr.write("Can't find 'local_settings.py'\n")
    sys.stderr.write("\nThe exception was:\n\n")
    traceback.print_exc()



local_settings.py
from disqus.conf.settings.dev import *
Continuous Integration



• Daily deploys with Fabric
  • several times an hour on some days
• Hudson keeps our builds going
  • combined with Selenium
• Post-commit hooks for quick testing
  • like Pyflakes
• Reverting to a previous version is a matter of
  seconds

Continuous Integration (cont’d)

 Hudson makes integration easy
Testing



• It’s not fun breaking things when you’re the new
  guy
• Our testing process is fairly heavy
• 70k (Python) LOC, 73% coverage, 20 min suite
• Custom Test Runner (unittest)
  • We needed XML, Selenium, Query Counts
  • Database proxies (for read-slave testing)
  • Integration with our Queue
Testing (cont’d)


Query Counts
# failures yield a dump of queries
def test_read_slave(self):
    Model.objects.using('read_slave').count()
    self.assertQueryCount(1, 'read_slave')


Selenium
def test_button(self):
    self.selenium.click('//a[@class="dsq-button"]')



Queue Integration
class WorkerTest(DisqusTest):
    workers = ['fire_signal']

    def test_delayed_signal(self):
        ...
Bug Tracking



• Switched from Trac to Redmine
  • We wanted Subtasks
• Emailing exceptions is a bad idea
  • Even if it's localhost
• Previously using django-db-log to aggregate
  errors to a single point
• We’ve overhauled db log and are releasing
  Sentry

django-sentry

Groups messages intelligently




   http://github.com/dcramer/django-sentry
django-sentry (cont’d)

Similar feel to Django’s debugger




    http://github.com/dcramer/django-sentry
Feature Switches



• We needed a safety in case a feature wasn’t
  performing well at peak
  • it had to respond without delay, globally,
    and without writing to disk
• Allows us to work out of trunk (mostly)
• Easy to release new features to a portion of
  your audience
• Also nice for “Labs” type projects
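
A minimal sketch of how a switch with percentage rollout could look, assuming an in-memory store. This is not the DISQUS implementation (see the References slide for that); `FeatureSwitches`, `set`, and `is_active` are hypothetical names, and a real deployment would need a shared store so a change takes effect globally.

```python
import zlib


class FeatureSwitches:
    """Hypothetical in-memory switches; nothing is written to disk."""

    def __init__(self):
        # switch name -> percentage of users (0-100) who see the feature
        self._rollout = {}

    def set(self, name, percent):
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[name] = percent

    def is_active(self, name, user_id=None):
        percent = self._rollout.get(name, 0)  # unknown switches are off
        if percent >= 100:
            return True
        if percent <= 0 or user_id is None:
            return False
        # Stable bucket in [0, 100): a given user always gets the same answer.
        bucket = zlib.crc32(f"{name}:{user_id}".encode("utf-8")) % 100
        return bucket < percent
```

Setting a switch to 0 acts as the kill switch at peak, while a value such as 10 releases the feature to roughly a tenth of the audience.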
Feature Switches (cont’d)

Final Thoughts


• The language (usually) isn’t your problem
• We like Django
  • But we maintain local patches
• Some tickets don’t have enough of a following
  • Patches, like #17, completely change
    Django..
  • ..arguably in a good way
• Others don’t have champions
      Ticket #17 describes making the ORM an identity mapper
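
For readers unfamiliar with the term, an identity mapper hands back the same in-memory object every time the same row is requested. The sketch below shows the general pattern only; it is not the patch attached to ticket #17, and `IdentityMap` and its `loader` callback are hypothetical.

```python
class IdentityMap:
    """Hypothetical identity map: one object per (model, primary key)."""

    def __init__(self):
        self._instances = {}

    def get(self, model, pk, loader):
        """Return the cached instance, calling loader(pk) only on a miss."""
        key = (model, pk)
        if key not in self._instances:
            self._instances[key] = loader(pk)
        return self._instances[key]
```

Two lookups of the same comment then yield one object and one load, which is why such a patch changes ORM behaviour so broadly.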
Housekeeping




       Birds of a Feather
   Want to learn from others about
  performance and scaling problems?
           Or play some StarCraft 2?


          We’re Hiring!

DISQUS is looking for amazing engineers
Questions
References


django-sentry
http://github.com/dcramer/django-sentry

Our Feature Switches
http://cl.ly/2FYt

Andy McCurdy’s update()
http://github.com/andymccurdy/django-tips-and-tricks

Our PyFlakes Fork
http://github.com/dcramer/pyflakes

SkinnyQuerySet
http://gist.github.com/550438

django-newcache
http://github.com/ericflo/django-newcache

attach_foreignkey (Pythonic Joins)
http://gist.github.com/567356

Implementations of Fused Deposition Modeling  in real worldImplementations of Fused Deposition Modeling  in real world
Implementations of Fused Deposition Modeling in real world
 
Best Programming Language for Civil Engineers
Best Programming Language for Civil EngineersBest Programming Language for Civil Engineers
Best Programming Language for Civil Engineers
 

DjangoCon 2010 Scaling Disqus

  • 1. Scaling the World’s Largest Django App Jason Yan David Cramer @jasonyan @zeeg
  • 3. What is DISQUS? dis·cuss • dĭ-skŭs' We are a comment system with an emphasis on connecting communities http://disqus.com/about/
  • 4. What is Scale? Number of Visitors 300M 250M 200M 150M 100M 50M Our traffic at a glance 17,000 requests/second peak 450,000 websites 15 million profiles 75 million comments 250 million visitors (August 2010)
  • 5. Our Challenges • We can’t predict when things will happen • Random celebrity gossip • Natural disasters • Discussions never expire • We can’t keep those millions of articles from 2008 in the cache • You don’t know in advance (generally) where the traffic will be • Especially with dynamic paging, realtime, sorting, personal prefs, etc.
  • 6. Our Challenges (cont’d) • High availability • Not a destination site • Difficult to schedule maintenance
  • 8. Server Architecture - Load Balancing • Load Balancing • High Availability • Software, HAProxy • heartbeat • High performance, intelligent server availability checking • Bonus: Nice statistics reporting Image Source: http://haproxy.1wt.eu/
  • 9. Server Architecture • ~100 Servers • 30% Web Servers (Apache + mod_wsgi) • 10% Databases (PostgreSQL) • 25% Cache Servers (memcached) • 20% Load Balancing / High Availability (HAProxy + heartbeat) • 15% Utility Servers (Python scripts)
  • 10. Server Architecture - Web Servers • Apache 2.2 • mod_wsgi • Using `maximum-requests` to plug memory leaks. • Performance Monitoring • Custom middleware (PerformanceLogMiddleware) • Ships performance statistics (DB queries, external calls, template rendering, etc) through syslog • Collected and graphed through Ganglia
  • 11. Server Architecture - Database • PostgreSQL • Slony-I for Replication • Trigger-based • Read slaves for extra read capacity • Failover master database for high availability
  • 12. Server Architecture - Database • Make sure indexes fit in memory and measure I/O • High I/O generally means slow queries due to missing indexes or indexes not in buffer cache • Log Slow Queries • syslog-ng + pgFouine + cron to automate slow query logging
  • 13. Server Architecture - Database • Use connection pooling • Django doesn’t do this for you • We use pgbouncer • Limits the maximum number of connections your database needs to handle • Save on costly opening and tearing down of new database connections
  • 15. Partitioning • Fairly easy to implement, quick wins • Done at the application level • Data is replayed by Slony • Two methods of data separation
  • 16. Vertical Partitioning Vertical partitioning involves creating tables with fewer columns and using additional tables to store the remaining columns. Forums Posts Users Sentry http://en.wikipedia.org/wiki/Partition_(database)
  • 17. Pythonic Joins Allows us to separate datasets posts = Post.objects.all()[0:25] # store users in a dictionary based on primary key users = dict( (u.pk, u) for u in User.objects.filter(pk__in=set(p.user_id for p in posts)) ) # map users to their posts for p in posts: p._user_cache = users.get(p.user_id)
  • 18. Pythonic Joins (cont’d) • Slower than at database level • But not enough that you should care • Trading performance for scale • Allows us to separate data • Easy vertical partitioning • More efficient caching • get_many, object-per-row cache
  • 19. Designating Masters • Alleviates some of the write load on your primary application master • Masters exist under specific conditions: • application use case • partitioned data • Database routers make this (fairly) easy
  • 20. Routing by Application class ApplicationRouter(object): def db_for_read(self, model, **hints): instance = hints.get('instance') if not instance: return None app_label = instance._meta.app_label return get_application_alias(app_label)
  • 21. Horizontal Partitioning Horizontal partitioning (also known as sharding) involves splitting one set of data into different tables. Disqus Your Blog CNN Telegraph http://en.wikipedia.org/wiki/Partition_(database)
  • 22. Horizontal Partitions • Some forums have very large datasets • Partners need high availability • Helps scale the write load on the master • We rely more on vertical partitions
  • 23. Routing by Partition class ForumPartitionRouter(object): def db_for_read(self, model, **hints): instance = hints.get('instance') if not instance: return None forum_id = getattr(instance, 'forum_id', None) if not forum_id: return None return get_forum_alias(forum_id) # What we used to do Post.objects.filter(forum=forum) # Now, making sure hints are available forum.post_set.all()
  • 24. Optimizing QuerySets • We really dislike raw SQL • It creates more work when dealing with partitions • Built-in cache allows sub-slicing • But isn’t always needed • We removed this cache
  • 25. Removing the Cache • Django internally caches the results of your QuerySet • This adds additional memory overhead # 1 query qs = Model.objects.all()[0:100] # 0 queries (we don’t need this behavior) qs = qs[0:10] # 1 query qs = qs.filter(foo=bar) • Many times you only need to view a result set once • So we built SkinnyQuerySet
  • 26. Removing the Cache (cont’d) Optimizing memory usage by removing the cache class SkinnyQuerySet(QuerySet): def __iter__(self): if self._result_cache is not None: # __len__ must have been run return iter(self._result_cache) has_run = getattr(self, 'has_run', False) if has_run: raise QuerySetDoubleIteration("...") self.has_run = True # We wanted .iterator() as the default return self.iterator() http://gist.github.com/550438
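The single-pass idea behind SkinnyQuerySet can be sketched without Django. This is only an illustration under assumptions: `SinglePassResults` and `DoubleIterationError` are hypothetical names, and the wrapped callable stands in for whatever actually streams rows from the database.

```python
class DoubleIterationError(Exception):
    """Raised when a single-pass result set is iterated a second time."""


class SinglePassResults:
    """Wraps a row-producing callable and allows exactly one iteration."""

    def __init__(self, fetch_rows):
        self._fetch_rows = fetch_rows  # callable returning an iterable of rows
        self._has_run = False

    def __iter__(self):
        if self._has_run:
            raise DoubleIterationError("results were already consumed")
        self._has_run = True
        # Stream rows straight through; no copy of the results is kept here.
        return iter(self._fetch_rows())


# Stand-in for a query: a generator that yields five "rows".
results = SinglePassResults(lambda: (n * n for n in range(5)))
first_pass = list(results)
```

Failing loudly on a second pass makes an accidental re-query visible during development rather than silently costing memory or an extra round trip.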
  • 27. Atomic Updates • Keeps your data consistent • save() isn’t thread-safe • use update() instead • Great for things like counters • But should be considered for all write operations
  • 28. Atomic Updates (cont’d) Thread safety is impossible with .save() Request 1 post = Post(pk=1) # a moderator approves post.approved = True post.save() Request 2 post = Post(pk=1) # the author adjusts their message post.message = 'Hello!' post.save()
  • 29. Atomic Updates (cont’d) So we need atomic updates Request 1 post = Post(pk=1) # a moderator approves Post.objects.filter(pk=post.pk).update(approved=True) Request 2 post = Post(pk=1) # the author adjusts their message Post.objects.filter(pk=post.pk).update(message='Hello!')
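The lost update described in these two slides can be reproduced with the standard library's `sqlite3` module. This is a sketch, not the Django ORM: it assumes a save() that writes every column from the in-memory copy, and the `post` table and helper names are made up for the example.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE post (id INTEGER PRIMARY KEY, approved INTEGER, message TEXT)"
    )
    conn.execute("INSERT INTO post VALUES (1, 0, 'Hi')")
    return conn


def load(conn, pk):
    row = conn.execute(
        "SELECT id, approved, message FROM post WHERE id = ?", (pk,)
    ).fetchone()
    return {"id": row[0], "approved": row[1], "message": row[2]}


def save_all_columns(conn, post):
    # Mimics a save() that writes every column from the in-memory copy.
    conn.execute(
        "UPDATE post SET approved = ?, message = ? WHERE id = ?",
        (post["approved"], post["message"], post["id"]),
    )


def update_columns(conn, pk, **changes):
    # Column names come from trusted code here, never from user input.
    assignments = ", ".join(f"{name} = ?" for name in changes)
    conn.execute(
        f"UPDATE post SET {assignments} WHERE id = ?", (*changes.values(), pk)
    )


# Whole-row saves: both requests load the row before either one writes.
conn = make_db()
moderator_copy, author_copy = load(conn, 1), load(conn, 1)
moderator_copy["approved"] = 1
save_all_columns(conn, moderator_copy)
author_copy["message"] = "Hello!"
save_all_columns(conn, author_copy)  # writes back the stale approved = 0
lost_update = load(conn, 1)

# Targeted updates: each request touches only the column it changed.
conn = make_db()
update_columns(conn, 1, approved=1)
update_columns(conn, 1, message="Hello!")
both_kept = load(conn, 1)
```

In the first run the moderator's approval is silently overwritten; in the second both changes survive because neither write carries stale values for columns it did not intend to change.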
  • 30. Atomic Updates (cont’d) A better way to approach updates def update(obj, using=None, **kwargs): """ Updates specified attributes on the current instance. """ assert obj, "Instance has not yet been created." obj.__class__._base_manager.using(using) .filter(pk=obj) .update(**kwargs) for k, v in kwargs.iteritems(): if isinstance(v, ExpressionNode): # NotImplemented continue setattr(obj, k, v) http://github.com/andymccurdy/django-tips-and-tricks/blob/master/model_update.py
  • 31. Delayed Signals • Queueing low priority tasks • even if they’re fast • Asynchronous (Delayed) signals • very friendly to the developer • ..but not as friendly as real signals
  • 32. Delayed Signals (cont’d) We send a specific serialized version of the model for delayed signals from disqus.common.signals import delayed_save def my_func(data, sender, created, **kwargs): print data['id'] delayed_save.connect(my_func, sender=Post) This is all handled through our Queue
  • 33. Caching • Memcached • Use pylibmc (newer libMemcached-based) • Ticket #11675 (add pylibmc support) • Third party applications: • django-newcache, django-pylibmc
  • 34. Caching (cont’d) • libMemcached / pylibmc is configurable with “behaviors”. • Memcached “single point of failure” • Distributed system, but we must take precautions. • Connection timeout to memcached can stall requests. • Use `_auto_eject_hosts` and `_retry_timeout` behaviors to prevent reconnecting to dead caches.
  • 35. Caching (cont’d) • Default (naive) hashing behavior • The hashed cache key, modulo the number of servers, is the index into the server list. • Removal of a server causes the majority of cache keys to be remapped to new servers. CACHE_SERVERS = ['10.0.0.1', '10.0.0.2'] key = 'my_cache_key' cache_server = CACHE_SERVERS[hash(key) % len(CACHE_SERVERS)]
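To see how disruptive modulo hashing is, the sketch below counts how many keys change servers when one of four servers is removed. The server addresses and key names are placeholders, and `stable_hash` is a helper introduced here so that the result does not depend on Python's per-process string hashing.

```python
import hashlib


def stable_hash(key):
    # md5 gives the same integer on every run, unlike the built-in hash() for str.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def pick_server(key, servers):
    return servers[stable_hash(key) % len(servers)]


servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
keys = [f"cache_key_{i}" for i in range(10000)]

before = {key: pick_server(key, servers) for key in keys}
after = {key: pick_server(key, servers[:-1]) for key in keys}  # one server removed

moved_fraction = sum(before[k] != after[k] for k in keys) / len(keys)
```

Going from four servers to three, a key keeps its server only when its hash gives the same remainder modulo 4 and modulo 3, which happens for about a quarter of keys, so roughly three quarters of the cache is effectively invalidated.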
  • 36. Caching (cont’d) • Better approach: consistent hashing • libMemcached (pylibmc) uses libketama (http://tinyurl.com/lastfm-libketama) • Addition / removal of a cache server remaps (K/n) cache keys (where K=number of keys and n=number of servers) Image Source: http://sourceforge.net/apps/mediawiki/kai/index.php?title=Introduction
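A minimal hash ring shows why consistent hashing remaps so few keys. This is a generic sketch, not libketama's exact algorithm: the `HashRing` class, the 100 points per server, and the md5-based `stable_hash` helper are all assumptions made for illustration.

```python
import bisect
import hashlib


def stable_hash(value):
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers, points_per_server=100):
        # Each server owns many points on the ring so load spreads evenly.
        self._ring = sorted(
            (stable_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._hashes = [point for point, _ in self._ring]

    def pick_server(self, key):
        # The first point at or after the key's hash owns the key; wrap at the end.
        index = bisect.bisect_left(self._hashes, stable_hash(key)) % len(self._ring)
        return self._ring[index][1]


servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
keys = [f"cache_key_{i}" for i in range(10000)]

before_ring = HashRing(servers)
after_ring = HashRing(servers[:-1])  # one server removed

before = {key: before_ring.pick_server(key) for key in keys}
after = {key: after_ring.pick_server(key) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
moved_fraction = len(moved) / len(keys)
```

Only keys that lived on the removed server change owner; every other key still finds the same point on the ring, so the moved fraction stays near one in four rather than three in four.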
  • 37. Caching (cont’d) • Thundering herd (stampede) problem • Invalidating a heavily accessed cache key causes many clients to refill cache. • But everyone refetching to fill the cache from the data store or reprocessing data can cause things to get even slower. • Most times, it’s ideal to return the previously invalidated cache value and let a single client refill the cache. • django-newcache or MintCache (http://djangosnippets.org/snippets/793/) will do this for you. • Filling the cache on invalidation, instead of deleting from it, also helps to prevent the thundering herd problem.
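The "return the stale value and let one client refill" approach can be sketched with a dictionary standing in for memcached. This is an assumption-laden illustration of the general pattern, not the code of django-newcache or MintCache; `StaleWhileRefillCache`, `refill_grace`, and the injectable `clock` are hypothetical names.

```python
import time


class StaleWhileRefillCache:
    """Dict-backed cache where one caller refills while others keep the stale value."""

    def __init__(self, clock=time.time, refill_grace=30):
        self._store = {}  # key -> (value, soft_expiry)
        self._clock = clock
        self._refill_grace = refill_grace

    def set(self, key, value, timeout):
        self._store[key] = (value, self._clock() + timeout)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, soft_expiry = entry
        if self._clock() >= soft_expiry:
            # Push the soft expiry forward so later callers still see the stale
            # value, and report a miss to this caller so that it alone refills.
            self._store[key] = (value, self._clock() + self._refill_grace)
            return None
        return value


# A fake clock makes the expiry behaviour easy to follow.
now = [1000.0]
cache = StaleWhileRefillCache(clock=lambda: now[0])
cache.set("thread:1", "v1", timeout=60)
fresh = cache.get("thread:1")

now[0] += 61  # the soft expiry has passed
first_after_expiry = cache.get("thread:1")   # this caller is told to refill
second_after_expiry = cache.get("thread:1")  # everyone else keeps the stale value
```

Against a real memcached the read and the re-store are two separate operations, so a few clients may still refill at once; the point of the sketch is that the herd shrinks from every client to a handful.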
  • 38. Transactions • TransactionMiddleware got us started, but down the road became a burden • For postgresql_psycopg2, there’s a database option, OPTIONS[‘autocommit’] • Each query is in its own transaction. This means each request won’t start in a transaction. • But sometimes we want transactions (e.g., saving multiple objects and rolling back on error)
  • 39. Transactions (cont’d) • Tips: • Use autocommit for read slave databases. • Isolate slow functions (e.g., external calls, template rendering) from transactions. • Selective autocommit • Most read-only views don’t need to be in transactions. • Start in autocommit and switch to a transaction on write.
  • 40. Scaling the Team • Small team of engineers • Monthly users / developers = 40m • Which means writing tests.. • ..and having a dead simple workflow
  • 41. Keeping it Simple • A developer can be up and running in a few minutes • assuming postgres and other server applications are already installed • pip, virtualenv • settings.py
  • 42. Setting Up Local 1. createdb -E UTF-8 disqus 2. git clone git://repo 3. mkvirtualenv disqus 4. pip install -U -r requirements.txt 5. ./manage.py syncdb && ./manage.py migrate
  • 43. Sane Defaults settings.py from disqus.conf.settings.default import * try: from local_settings import * except ImportError: import sys, traceback sys.stderr.write("Can't find 'local_settings.py'\n") sys.stderr.write("\nThe exception was:\n\n") traceback.print_exc() local_settings.py from disqus.conf.settings.dev import *
  • 44. Continuous Integration • Daily deploys with Fabric • several times an hour on some days • Hudson keeps our builds going • combined with Selenium • Post-commit hooks for quick testing • like Pyflakes • Reverting to a previous version is a matter of seconds
  • 45. Continuous Integration (cont’d) Hudson makes integration easy
  • 46. Testing • It’s not fun breaking things when you’re the new guy • Our testing process is fairly heavy • 70k (Python) LOC, 73% coverage, 20 min suite • Custom Test Runner (unittest) • We needed XML, Selenium, Query Counts • Database proxies (for read-slave testing) • Integration with our Queue
  • 47. Testing (cont’d) Query Counts # failures yield a dump of queries def test_read_slave(self): Model.objects.using('read_slave').count() self.assertQueryCount(1, 'read_slave') Selenium def test_button(self): self.selenium.click('//a[@class="dsq-button"]') Queue Integration class WorkerTest(DisqusTest): workers = ['fire_signal'] def test_delayed_signal(self): ...
  • 48. Bug Tracking • Switched from Trac to Redmine • We wanted Subtasks • Emailing exceptions is a bad idea • Even if it’s localhost • Previously using django-db-log to aggregate errors to a single point • We’ve overhauled db log and are releasing Sentry
  • 49. django-sentry Groups messages intelligently http://github.com/dcramer/django-sentry
  • 50. django-sentry (cont’d) Similar feel to Django’s debugger http://github.com/dcramer/django-sentry
  • 51. Feature Switches • We needed a safety in case a feature wasn’t performing well at peak • it had to respond without delay, globally, and without writing to disk • Allows us to work out of trunk (mostly) • Easy to release new features to a portion of your audience • Also nice for “Labs” type projects
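One way to release a feature to a portion of the audience is to hash each user into a stable bucket. The sketch below is a guess at the general shape only; the slides do not show how Disqus implements its switches, and `FeatureSwitches`, `set_rollout`, and the feature name are hypothetical.

```python
import hashlib


class FeatureSwitches:
    """In-memory switches: no disk writes, and a change takes effect immediately."""

    def __init__(self):
        self._rollout = {}  # feature name -> percentage of users (0-100)

    def set_rollout(self, feature, percent):
        self._rollout[feature] = max(0, min(100, percent))

    def is_active(self, feature, user_id):
        percent = self._rollout.get(feature, 0)  # unknown features are off
        # Hashing feature and user together gives each user a stable bucket
        # per feature, so the same users stay enabled between requests.
        digest = hashlib.md5(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < percent


switches = FeatureSwitches()
switches.set_rollout("realtime", 25)
enabled_users = [uid for uid in range(10000) if switches.is_active("realtime", uid)]
```

Setting the percentage to 0 acts as the safety switch at peak, while a shared store in place of the in-process dictionary would be needed to make a change apply across all web servers.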
  • 53. Final Thoughts • The language (usually) isn’t your problem • We like Django • But we maintain local patches • Some tickets don’t have enough of a following • Patches, like #17, completely change Django.. • ..arguably in a good way • Others don’t have champions Ticket #17 describes making the ORM an identity mapper
  • 54. Housekeeping Birds of a Feather Want to learn from others about performance and scaling problems? Or play some StarCraft 2? We’re Hiring! DISQUS is looking for amazing engineers
  • 56. References django-sentry http://github.com/dcramer/django-sentry Our Feature Switches http://cl.ly/2FYt Andy McCurdy’s update() http://github.com/andymccurdy/django-tips-and-tricks Our PyFlakes Fork http://github.com/dcramer/pyflakes SkinnyQuerySet http://gist.github.com/550438 django-newcache http://github.com/ericflo/django-newcache attach_foreignkey (Pythonic Joins) http://gist.github.com/567356

Editor's Notes

  1. Hi. I'm Jason (and I'm David), and we're from Disqus.
  2. Show of hands, How many of you know what DISQUS is?
  3. For those of you who are not familiar with us, DISQUS is a comment system that focuses on connecting communities. We power discussions on such sites as CNN, IGN, and more recently Engadget and TechCrunch. Our company was founded back in 2007 by my co-founder, Daniel Ha, and me, back when we were working out of our dorm room. Our decision to use Django came down primarily to our dislike for PHP, which we were previously using. Since then, we've grown Disqus to over 250 million visitors a month.
  4. We've peaked at over 17,000 requests per second, to Django, and we currently power comments on nearly half a million websites which accounts for more than 15 million profiles who have left over 75 million comments.
  5. As you can imagine we have some big challenges when it comes to scaling a large Django application. For one, it’s hard to predict when events happen like last year with Michael Jackson’s death, and more recently, the Gulf Oil Spill. Another challenge we have is the fact that discussions never expire. When you visit that blog post from 2008 we have to be ready to serve those comments immediately. Not only does THAT make caching difficult, but we also have to deal with things such as dynamic paging, realtime commenting, and other personal preferences. This makes it even more important to be able to serve those quickly without relying on the cache.
  6. So we also have some interesting infrastructure problems when it comes to scaling Disqus. We're not a destination website, so if we go down, it affects other sites as well as ours. Because of this, it's difficult for us to schedule maintenance, so we face some interesting scaling and availability challenges.
  7. As you can see, we have tried to keep the stack pretty thin. This is because, as we've learned, the more services we try to add, the more difficult it is to support. And especially because we have a small team, this becomes difficult to manage. So we use DNS load balancing to spread the requests to multiple HAProxy servers which are our software load balancers. These proxy requests to our backend app servers which run mod_wsgi. We use memcache for caching, and we have a custom wrapper using syslog for our queue. For our data store, we use PostgreSQL, and for replication, we use Slony for failover and read slaves.
  8. As I said, we use HAProxy for HTTP load balancing. It's a high performance software load balancer with intelligent failure detection. It also provides you with nice statistics of your requests. We use heartbeat for high availability and we have it take over the IP address of the down machine.
  9. We have about 100GB of cache. Because of our high availability requirements, 20% of our servers are allocated to high availability and load balancing.
  10. Our web servers are pretty standard. We use mod_wsgi mostly because it just works. Performance-wise, you're really going to be bottlenecked on the application. The cool thing we do is that we actually have a custom middleware that does performance monitoring. What this does is ship data from our application about external calls like database and cache calls, and we collect it and graph it with Ganglia.
  11. The more interesting aspect of our server architecture is how we have our database set up. As I mentioned, we use Postgres as our database. Honestly, we used it because Django recommended it, and my recommendation is that if you’re not already an expert in a database, you're better off going with Postgres. We use Slony for replication. Slony is trigger-based, which means that every write is captured and stored in a log table and those events are replayed to slave databases. This is nicer than other methods such as log shipping because it allows us to have flexible schemas across read slaves. For example, some of our read slaves have different indexes. We also use Slony for failover for high availability.
  12. There are a few things we do to keep our database healthy. We keep our indexes in memory, and when we can't, we partition our data. We also have application-specific indexes on our read slaves. Another important thing we've done is measure I/O. Any time we've seen high I/O, it's usually because we're missing indexes or indexes aren't fitting in memory. Lastly, we monitor slow queries. We send logs to pgFouine via syslog, which generates a nice report showing you which queries are the slowest.
  13. The last thing we've found to be really helpful is switching to a database connection pool. Remember, Django doesn't do this for you. We use pgbouncer for this, and there are a few easy wins from using it. One is that it limits the maximum connections to the database so it doesn't have to handle as many concurrent connections. Secondly, you save the cost of opening and tearing down new connections per request.
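The two wins mentioned here, a cap on concurrent connections and reuse instead of reconnecting, can be shown with a small in-process pool. This is only an analogy: pgbouncer is a separate proxy process, and `BoundedPool` and `fake_connect` are hypothetical names used for the sketch.

```python
import queue
import threading
from contextlib import contextmanager


class BoundedPool:
    """Hands out at most max_size connections and reuses idle ones."""

    def __init__(self, connect, max_size):
        self._connect = connect
        self._idle = queue.LifoQueue()
        self._slots = threading.BoundedSemaphore(max_size)

    @contextmanager
    def connection(self, timeout=None):
        # Wait for a free slot; this is what caps load on the database.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("no pooled connection became free in time")
        try:
            try:
                conn = self._idle.get_nowait()  # reuse an open connection
            except queue.Empty:
                conn = self._connect()  # open one only when none is idle
            try:
                yield conn
            finally:
                self._idle.put(conn)
        finally:
            self._slots.release()


opened = []


def fake_connect():
    # Stand-in for an expensive database connect call.
    opened.append(object())
    return opened[-1]


pool = BoundedPool(fake_connect, max_size=2)
for _ in range(10):  # ten sequential "requests"
    with pool.connection() as conn:
        pass
```

Ten requests in a row open a single connection, and a third simultaneous borrower has to wait because only two slots exist.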
  14. Moving on to our application, we’ve found that most of the struggle is with the database layer. We’ve got a pretty standard layout if you’re familiar with forums. Forum has many threads, which has many posts. Posts use an adjacency list model, and also reference Users. With this kind of data model, one of our quickest wins has been the ability to partition data.
  15. It’s almost entirely done at the application level, which makes it fairly easy to implement. The only thing not handled by the app is replication, and Slony does that for us. We handle partitioning in a couple of ways.
  16. The first of which are vertical partitions. This is probably the simplest thing you can implement in your application. Kill off your joins and spread out your applications on multiple databases. Some database engines might make this easier than others, but Slony allows us to easily replicate very specific data.
  17. Using this method you’ll need to handle joins in your Python application. We do this by performing two separate queries and mapping the foreign keys to the parent objects. For us the easiest way has been to throw them into a dictionary, iterate through the other queryset, and set the foreignkey cache’s value to the instance.
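The two-query join described here can be tried without a database. In this sketch the in-memory `USERS_DB` and `POSTS_DB` collections and the `fetch_*` helpers are stand-ins for two separate queries, possibly against two different databases; none of these names come from the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    pk: int
    name: str


@dataclass
class Post:
    pk: int
    user_id: int
    user: Optional[User] = None


USERS_DB = {1: User(1, "jason"), 2: User(2, "david")}
POSTS_DB = [Post(10, 1), Post(11, 2), Post(12, 1)]


def fetch_posts(limit):
    return POSTS_DB[:limit]  # first "query"


def fetch_users(pks):
    return [USERS_DB[pk] for pk in pks if pk in USERS_DB]  # second "query"


def attach_users(posts):
    # Collect the distinct foreign keys, fetch them once, then map them back.
    user_ids = {post.user_id for post in posts}
    users = {user.pk: user for user in fetch_users(user_ids)}
    for post in posts:
        post.user = users.get(post.user_id)
    return posts


posts = attach_users(fetch_posts(25))
author_names = [post.user.name for post in posts]
```

However many posts are on the page, the cost stays at two lookups, and posts by the same author share one user object.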
  18. A few things to keep in mind when doing pythonic joins. They’re not going to be as fast as in the database. You can’t avoid this, but it’s not something you should worry about. With this, however, you get plain and simple vertical partitions. You also can cache things a lot more easily, and more efficiently fetch them using things like get_many and a singular object cache. Overall you’re trading performance for scale.
  19. Another benefit that comes from vertical partitioning is the ability to designate masters. We do this to alleviate some of the load on our primary application master. So for example, server FOO might be the source for writes on the Users table, while server BAR handles all of our other forum data. Since we’re using Django 1.2 we also get routing for free through the new routers.
  20. Here’s an example of a simple application router. It lets us specify a read slave based on our app label. So if it’s users, we go to FOO; if it’s forums, we go to BAR. You can handle this logic any way you want; it’s pretty simple and powerful.
  21. While we use vertical partitioning for most cases, eventually you hit an issue where your data just doesn’t scale on a single database. You’re probably familiar with the word sharding; that’s what we do with our forum data. We’ve set it up so that we can send certain large sites to dedicated machines. This also uses designated masters, as we mentioned with the other partitions.
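A router is an ordinary Python class, so routing by app label can be sketched and exercised without a Django project. The alias names below ("foo_slave", "bar_master", and so on) are invented for the example, and the `SimpleNamespace` objects only imitate the `_meta.app_label` attribute that Django models expose.

```python
from types import SimpleNamespace


class ApplicationRouter:
    # Hypothetical app-label-to-alias maps; real aliases come from DATABASES.
    READ_ALIASES = {"users": "foo_slave", "forums": "bar_slave"}
    WRITE_ALIASES = {"users": "foo_master", "forums": "bar_master"}

    def db_for_read(self, model, **hints):
        # Returning None tells Django this router has no opinion.
        return self.READ_ALIASES.get(model._meta.app_label)

    def db_for_write(self, model, **hints):
        return self.WRITE_ALIASES.get(model._meta.app_label)


# Stand-ins for model classes, carrying only what the router inspects.
UserModel = SimpleNamespace(_meta=SimpleNamespace(app_label="users"))
PostModel = SimpleNamespace(_meta=SimpleNamespace(app_label="forums"))
OtherModel = SimpleNamespace(_meta=SimpleNamespace(app_label="sessions"))

router = ApplicationRouter()
user_read = router.db_for_read(UserModel)
post_write = router.db_for_write(PostModel)
other_read = router.db_for_read(OtherModel)
```

Reading the label from the model rather than from `hints` means the router still works for queries that carry no instance hint.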
  22. We needed this when write and read load combined became so big that it was just hard to keep up on a single set of machines. It also gives the nice added benefit of high availability in many situations. Mostly though, it all goes back to scaling our master databases.
  23. So again we’re using the router here to handle partitioning of the forums. We can specify that CNN goes to this database alias, which could be any number of machines, and everything else goes to our default cluster. The one caveat we found is that sometimes hints aren’t present in the router. I believe within the current version of Django they are only available when using a relational lookup, such as a foreign key. All in all it’s pretty powerful, and you just need to be aware of it while writing your queries.