DISQUS is a comment system that handles high volumes of traffic: up to 17,000 requests per second and 250 million monthly visitors. Its main challenges are unpredictable traffic spikes and maintaining high availability. The architecture comprises over 100 servers split among web servers, databases, caching, and load balancing. To scale its large Django application, the team uses techniques such as vertical and horizontal data partitioning, atomic updates, delayed signals, consistent caching, and feature flags.
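Let's build a container-ready Rails app with Web App for Containers + MySQL!
Web App for Containers uses Docker containers to host the app stack, so the OSS-based apps you currently run on Linux can be used as-is on Web App for Containers once they are containerized together with their app stack. Using a simple MySQL + Ruby on Rails app as the example, this webinar walks through the whole flow from containerizing the app to deploying it to Web App for Containers, and introduces continuous deployment with CI tools. It uses Azure DB for MySQL, Azure's fully managed MySQL service, to run the app in a fully managed environment.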
Getting started with influx Db and Grafana Installation Guide
This document discusses InfluxDB, an open source time series database, and Grafana, an open source analytics and visualization suite commonly used with InfluxDB. It provides instructions for installing InfluxDB and Grafana on macOS using Homebrew, and for installing the Python plugin for InfluxDB.
Slide deck presented at http://devternity.com/ on MongoDB internals. We review the usage patterns of MongoDB, the different storage engines and persistence models, as well as the definition of documents and general data structures.
Cassandra is a structured storage system designed for large amounts of data across commodity servers. It provides high availability with eventual consistency and scales incrementally without centralized administration. Data is partitioned across nodes and replicated for fault tolerance. Writes are applied locally and propagated asynchronously, prioritizing availability over consistency. It uses a gossip protocol for membership and failure detection.
Parquet is a columnar storage format for Hadoop data. It was developed by Twitter and Cloudera to optimize storage and querying of large datasets. Parquet provides more efficient compression and I/O compared to traditional row-based formats by storing data by column. Early results show a 28% reduction in storage size and up to a 114% improvement in query performance versus the original Thrift format. Parquet supports complex nested schemas and can be used with Hadoop tools like Hive, Pig, and Impala.
This document provides an overview of patterns for scalability, availability, and stability in distributed systems. It discusses general recommendations like immutability and referential transparency. It covers scalability trade-offs around performance vs scalability, latency vs throughput, and availability vs consistency. It then describes various patterns for scalability including managing state through partitioning, caching, sharding databases, and using distributed caching. It also covers patterns for managing behavior through event-driven architecture, compute grids, load balancing, and parallel computing. Availability patterns like fail-over, replication, and fault tolerance are discussed. The document provides examples of popular technologies that implement many of these patterns.
Introduction to Amazon EMR design patterns such as using Amazon S3 instead of HDFS, taking advantage of Spot EC2 instances to reduce costs, and other Amazon EMR architectural best practices.
This talk explores PostgreSQL 15 enhancements (along with some history) and looks at how they improve the developer experience (MERGE and SQL/JSON), optimize support for backups and compression, improve logical replication, enhance security and performance, and more.
Introduction to memcached, a caching service designed for optimizing performance and scaling in the web stack, seen from the perspective of MySQL/PHP users. Given to 2nd year students of the professional bachelor in ICT at Kaho St. Lieven, Gent.
This presentation briefly describes key features of Apache Cassandra. It was held at the Apache Cassandra Meetup in Vienna in January 2014. You can access the meetup here: http://www.meetup.com/Vienna-Cassandra-Users/
MongoDB is an open-source, document-oriented database that provides high performance and horizontal scalability. It uses a document model in which data is organized in flexible, JSON-like documents rather than rigidly defined rows and tables. Documents can contain multiple types of nested objects and arrays. MongoDB is best suited for applications that need to store large amounts of unstructured or semi-structured data and benefit from horizontal scalability and high performance.
This document discusses physical security for protecting enterprise resources including people, data, and facilities. It covers assessing threats and vulnerabilities, choosing a secure site location, designing security for the building structure and environment, implementing physical and administrative controls, and ensuring life safety measures like fire detection and suppression. Key considerations include perimeter security, access control, environmental factors, emergency procedures, and compliance with standards to help ensure security.
Anatomy of Brain by MRI
In this presentation we will discuss the cross-sectional anatomy of the brain. Then we will discuss the most common diseases evaluated by brain imaging.
In my opinion this presentation is a road map for beginners.
The document is a report by Techsauce, Thailand's leading tech publication, summarizing Thailand's startup ecosystem and investment trends from 2012 to 2017. Some key findings include:
- Total funding raised by Thai startups grew from $3.1 million in 2011-2012 to over $86 million in 2016.
- Popular categories of startups receiving investment included e-commerce/marketplaces, fintech, logistics, and payments.
- Major acquisitions of Thai startups have totaled over $108 million, with companies being acquired by firms from Southeast Asia, China, and other regions.
This document discusses various geological processes and landforms resulting from physical geology. It covers the geological work of rivers including erosion, transportation, deposition and various fluvial landforms. It also discusses the geological work of other agents like wind, groundwater and oceans. Rivers can erode, transport and deposit sediment, forming features like drainage patterns, valleys, waterfalls and terraces over long periods of time. Wind erosion can form dunes and loess deposits, while groundwater can dissolve rock to form sinkholes, caves and valleys. Oceans also erode, transport and deposit material along coastlines.
The document provides an overview of the process sequence for weaving. It begins with yarn from the spinning department which then undergoes processes like cone winding, warping, sizing, tying-in, drafting, and denting to prepare the warp threads. The warp is then mounted on the loom and undergoes weaving to produce grey fabric. Key steps in weaving include shedding, picking, and beating-up. The woven fabric then undergoes inspection, folding, and baling before delivery. The document outlines the various motions and essential parts of a loom needed to carry out this weaving process.
The cardiac cycle consists of systole and diastole. During systole, the heart contracts and pumps blood out of the ventricles. During diastole, the heart relaxes and fills with blood. The cycle involves coordinated events in the atria and ventricles. It can be analyzed using a Wiggers diagram which plots various cardiac parameters over time, revealing phases like isovolumic contraction, ejection, isovolumic relaxation, and filling. Precisely measuring time intervals within the cycle using Doppler echocardiography provides clinical insights into cardiac function and timing.
This document provides a summary of a marketing analysis project presented by four students at Superior University Lahore on Engro Foods. It includes an introduction, table of contents, acknowledgements, history and background of Engro Foods, their vision, mission and core values. It also summarizes Engro's diversified business portfolio, their brands, business segments targeted, sales setup, departments, production process, and concludes with interviews conducted and references. The document analyzes Engro Foods' market performance and strategies.
Here is an analysis of variations in a red beetle population across three situations:
Situation 1 (Original population): The population consists of mostly red beetles, with a small percentage of black beetles. The red coloration provides better camouflage in their current environment.
Situation 2 (Environment change): The environment darkens due to increased vegetation/debris. Now black beetles have better camouflage than red beetles. Over time, the percentage of black beetles in the population will increase relative to red beetles, as black beetles survive and reproduce at a higher rate.
Situation 3 (New environment): The environment changes again, this time becoming lighter in color (e.g
10+ Getting to Know You Activities for Teens & Adults
My books- Hacking Digital Learning Strategies http://hackingdls.com & Learning to Go https://gum.co/learn2go
Resources- http://shellyterrell.com/icebreakers
This document provides an overview of the temporomandibular joint (TMJ). It begins by defining the TMJ as the joint connecting the mandible to the skull and regulating mandibular movement. It then describes the different types of joints in the body before focusing on the specifics of the TMJ. Key points include that the TMJ is a complex synovial joint that allows for both hinging and gliding movements. An articular disc separates the condyle of the mandible and fossa of the temporal bone. The document outlines the development, structures, innervation, vascularization and biomechanics of the TMJ.
The aim of this list of programming languages is to include all notable programming languages in existence, both those in current use and ... Note: This page does not list esoteric programming languages. .... Computer programming portal ...
The document discusses 12 business lessons learned from the Obama presidential campaign's effective use of digital and social media. It summarizes key tactics the campaign used, such as maintaining a centralized customer database, using social networks to leverage large audiences, engaging supporters through YouTube videos, targeting small online donations, self-managed social networks, mobile applications, Twitter, blogging, and capturing consumer information. The outcomes included hundreds of thousands of organized events and donors, millions of calls and donations made, and over $500 million raised online and $639 million total.
Micro expressions are brief, involuntary facial expressions that reflect the emotions a person is experiencing.
They occur when a person is consciously trying to conceal all signs of how he or she is feeling, or when a person does not consciously know how he or she is feeling.
In this deck, a brief history of micro expressions is introduced, along with a detailed analysis of the 7 universal facial expressions that could be found in almost anyone walking on this Earth.
The document provides a business quiz with 16 multiple choice questions covering topics such as companies that coined economic terms, automobile companies, airlines, technology companies, banks, and consumer brands. It tests knowledge of companies like Goldman Sachs, Tata, Bombay Stock Exchange, HP, Rolls Royce, KFC, and banks like SBI and HDFC. The questions cover industries, products, founding details and other notable business facts.
The document outlines plans and strategies for sales management and general trade. It includes:
1. Developing more distribution areas and converting them to business units.
2. Providing assistance programs to support business unit development, build competitive marketing and sales edges, and enhance sales management systems and skills.
3. Setting targets to develop 35 business units by year's end and strengthen market expansion through key account management and multi-line product approaches.
The Clean 9 Program can help you to jumpstart your journey to a slimmer, healthier you in 9 days. This effective, easy-to-follow cleansing program will give you the tools you need to start transforming your body today! http://www.aloe4us.com/forever-clean-9.html
The document discusses how shopper behavior is changing due to time constraints and lifestyle changes. It analyzes different shopper need states and how retailers can adapt merchandising and store layout to satisfy these evolving needs. Key insights include that cleanliness, selection, and convenience are very important to shoppers. The beverage category can be organized into different consumer need states that retailers should clearly message to drive shopper conversion.
This paragraph describes the events of the Great Chicago Fire in chronological order, beginning with Daniel Sullivan noticing the flames and ending with the total number of buildings burned after the fire was out. Time clue words like "at around 8:30 pm", "By 9:30 pm", "In another 3 hours", and "It would be another day" indicate a chronological structure.
The document provides an introduction to Typesafe Activator and the Play Framework. It discusses how Activator is a tool that helps developers get started with the Typesafe Reactive Platform and Play applications. It also covers some core features of Play like routing, templates, assets, data access with Slick and JSON, and concurrency with Futures, Actors, and WebSockets.
This document provides an introduction to Node.js, a framework for building scalable server-side applications with asynchronous JavaScript. It discusses what Node.js is, how it uses non-blocking I/O and events to avoid wasting CPU cycles, and how external Node modules help create a full JavaScript stack. Examples are given of using Node modules like Express for building RESTful APIs and Socket.IO for implementing real-time features like chat. Best practices, limitations, debugging techniques and references are also covered.
The potential problem with caching in update_homepage is that deleting the cache key after updating the page could lead to a race condition or stampede.
Since the homepage is being hit 1000/sec, between the time the cache key is deleted and a new value is set, many requests could hit the database simultaneously to refetch the page, overwhelming it.
It would be better to set a new value for the cache key instead of deleting it, to avoid this potential issue.
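The difference between the two update strategies can be sketched in a few lines of Python. This is a minimal illustration, not the code from the original deck: `FakeCache`, `FakeDatabase`, `HOMEPAGE_KEY`, and the two `update_homepage_*` functions are hypothetical stand-ins for a memcached-style cache and the real page store.

```python
HOMEPAGE_KEY = "homepage"  # hypothetical cache key


class FakeCache:
    """Tiny in-memory stand-in for a memcached-style cache (assumption)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


class FakeDatabase:
    """Stand-in for the page store; counts how often it is read."""

    def __init__(self):
        self.homepage = "v1"
        self.reads = 0

    def load_homepage(self):
        self.reads += 1
        return self.homepage

    def save_homepage(self, content):
        self.homepage = content


def get_homepage(cache, db):
    """Read path: serve from cache, fall back to the database on a miss."""
    page = cache.get(HOMEPAGE_KEY)
    if page is None:
        page = db.load_homepage()
        cache.set(HOMEPAGE_KEY, page)
    return page


def update_homepage_with_delete(cache, db, content):
    """Risky variant: every request arriving before the refill misses."""
    db.save_homepage(content)
    cache.delete(HOMEPAGE_KEY)


def update_homepage_with_set(cache, db, content):
    """Safer variant: the key is never absent, so readers keep hitting cache."""
    db.save_homepage(content)
    cache.set(HOMEPAGE_KEY, content)
```

In this single-threaded sketch the delete variant costs only one extra database read. At 1000 requests per second, every request that arrives between the delete and the refill would take the miss branch in `get_homepage`, which is the stampede described above.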
This document discusses replacing the use of $GLOBALS['TYPO3_DB'] with Doctrine DBAL for database queries in TYPO3 extensions. Doctrine DBAL provides a database abstraction layer that supports multiple database vendors, whereas $GLOBALS['TYPO3_DB'] only supports MySQL. Migrating to Doctrine DBAL offers benefits like a more reliable industry standard and easier API. The document provides examples of common queries like select, insert, update using the Doctrine query builder and highlights best practices for security and restrictions. $GLOBALS['TYPO3_DB'] will be removed in TYPO3 8 LTS, so extensions need to migrate to Doctrine DB
Node.js is a platform for building scalable network applications. It uses Google's V8 JavaScript engine and a non-blocking I/O model. Some key points:
- Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, especially for real-time applications.
- It has a large ecosystem of open source modules. Popular frameworks include Express and Fab.
- While Node.js is very fast for I/O operations, memory usage can grow quickly and scaling to multiple cores requires multiple processes.
- The author argues Node.js is suitable for single-page apps, real-time applications, and crawlers, but
This document summarizes Grand Unified Configuration (GUC) parameters in PostgreSQL. It describes how GUC parameters can be modified, the contexts in which modifications can be reverted, and how to view current parameter settings and sources using pg_settings. It provides examples of modifying parameters at different scopes like system-wide, database-level, and for individual users.
Cassandra concepts, patterns and anti-patterns (Dave Gardner)
The document discusses Cassandra concepts, patterns, and anti-patterns. It begins with an agenda that covers choosing NoSQL, Cassandra concepts based on Dynamo and Bigtable, and patterns and anti-patterns of use. It then delves into Cassandra concepts such as consistent hashing, vector clocks, gossip protocol, hinted handoff, read repair, and consistency levels. It also discusses Bigtable concepts like sparse column-based data model, SSTables, commit log, and memtables. Finally, it outlines several patterns and anti-patterns of Cassandra use.
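Of the concepts listed, consistent hashing is the easiest to illustrate in isolation. The following is a minimal sketch of a hash ring with virtual nodes, not Cassandra's implementation: the `HashRing` class, the node names, and the use of SHA-256 as the token function are all assumptions made for illustration.

```python
import bisect
import hashlib


def token_for(value: str) -> int:
    """Map a string to a 64-bit position on the ring (SHA-256 is an assumption)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=64):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (token, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node owns several tokens to spread load more evenly.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (token_for(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(t, n) for t, n in self._ring if n != node]

    def get_node(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First token at or after the key's position, wrapping around the ring.
        idx = bisect.bisect_left(self._ring, (token_for(key),))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Adding `node-d` only inserts new tokens into the ring, so every key in `moved` now belongs to `node-d` and all other keys stay where they were. That limited data movement is what lets a cluster grow incrementally.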
Let's build a container-ready Rails app with Web App for Containers + MySQL! (Yoichi Kawasaki)
Getting started with influx Db and Grafana Installation Guide (Soumil Shah)
How Obama Won Using Digital and Social Media (James Burnes)
Most mid-sized Django websites thrive by relying on memcached. Though what happens when basic memcached is not enough? And how can one identify when the caching architecture is becoming a bottleneck? We'll cover the problems we've encountered and solutions we've put in place.
This document summarizes an ORM architecture talk given by Alex Gaynor. It discusses the major components of Django's ORM, including managers, querysets, queries, and SQL compilers. It provides examples of custom aggregates and automatic single object caching using the ORM. The talk focused on 50% ORM architecture details and 50% practical applications of the ORM.
The options for hosting ruby web application are plentiful, all with different advantages and disadvantages, options, limitations. How to start, how to grow, what are the pitfalls?
With this talk I’d first like to give a short overview of several cloud hosting alternatives such as plain VPS, AWS, EngineYard, Heroku, and provide some insights based on my experience with them – beyond just somehow getting it to run, but also how to handle continuous deployment, how to maintain and scale them.
While Rails already comes with many best practices build in, there are still plenty enough traps for you. We definitely had our fair share, and I’d like to share some of them for your entertainment and learning.
Migration to ClickHouse. Practical guide, by Alexander ZaitsevAltinity Ltd
This document provides a summary of migrating to ClickHouse for analytics use cases. It discusses the author's background and company's requirements, including ingesting 10 billion events per day and retaining data for 3 months. It evaluates ClickHouse limitations and provides recommendations on schema design, data ingestion, sharding, and SQL. Example queries demonstrate ClickHouse performance on large datasets. The document outlines the company's migration timeline and challenges addressed. It concludes with potential future integrations between ClickHouse and MySQL.
This document summarizes a presentation about serverless and mobile applications. It includes:
- An introduction focusing on startups, mobile apps, serverless computing, blockchain, and fintech.
- Details about AWS AppSync and its Todo data model.
- Information on request and response mapping templates in AWS Lambda.
- An overview of delta sync using AppSync, base queries, subscriptions, and delta queries.
- Notes on serverless technologies like AWS Lambda and its supported runtimes.
- Resources on serverless best practices and architectures using Lambda, API Gateway, and other AWS services.
CouchDB for Web Applications - Erlang Factory London 2009Jason Davies
This document summarizes CouchApps, which are pure CouchDB applications that are standalone and hosted entirely on CouchDB. CouchApps have single step deployment via replication and enforce scalable thinking. The document discusses the couchapp tool for developing CouchApps and the resulting directory structure and design documents. It also covers JavaScript templating, URL routing, sending emails, form validation, and several example CouchApps including a blog.
Roundup of what is on the web at regarding Rails 3 as of Easter 2010.
Includes outline of significant changes to Rais in Rails 3 plus how you might set about upgrading an existing app.
Acknowledges and links to to some amazing resources already elsewhere on the web.
This document discusses using Puppet and related tools to automate the configuration and provisioning of development environments and servers. It covers using Vagrant and Puppet to set up local virtual machine environments, managing configurations with Puppet and Hiera, structuring code according to roles and profiles, integrating with version control and the Puppet Forge, and monitoring changes with tools like the Puppet Dashboard and MCollective. The document provides an overview of best practices and strategies for implementing infrastructure as code with Puppet.
You can find the first part of this presentation here: https://www.slideshare.net/secret/pAvK8Qd9f07oa
This presentation takes a deep dive into how the Million Song Library, a microservices-based application, was built using the Netflix Stack, Cassandra and Datastax.
To learn more about Million Song Library and its components visit the project on GitHub: https://github.com/kenzanlabs/million-song-library
Lea
Rails is a great Ruby-based framework for producing web sites quickly and effectively. Here are a bunch of tips and best practices aimed at the Ruby newbie.
The document summarizes new features in JBoss Operations Network (JBoss ON), including:
1) New chart types have been added to visualize metrics data. Storage nodes using Cassandra have also been added to improve scalability of storing large volumes of metrics data in a distributed manner.
2) Finer-grained bundle permissions allow restricting bundle creation, deployment and management based on resource groups and roles.
3) The REST API is now fully supported for both retrieving and inputting configuration data to enable out-of-band processing.
4) Upcoming versions of JBoss ON aim to reduce the agent footprint, improve support for EAP 6, and integrate with the Red Hat Access portal.
Whether you are building a mobile app or a web app, Apache Usergrid (incubating) can provide you with a complete backend that supports authentication, persistence and social features like activities and followers all via a comprehensive REST API — and backed by Cassandra, giving you linear scalability. This session will tell you what you need to know to be a Usergrid contributor, starting with the basics of building and running Usergrid from source code. You’ll learn how to find your way around the Usergrid code base, how the code for the Stack, Portal and SDKs and how to use the test infrastructure to test your changes to Usergrid. You’ll learn the Usergrid contributor workflow, how the project uses JIRA and Github to manage change and how to contribute your changes to the project. The session will also cover the Usergrid roadmap and what the community is currently working on.
This document introduces Node.js and provides an overview of its key features and use cases. Some main points:
- Node.js is a JavaScript runtime built on Chrome's V8 engine that allows building scalable network applications easily. It is not a web framework but you can build web frameworks with Node.js modules.
- Node.js is well-suited for building web servers, TCP servers, command line tools, and anything involving high I/O due to its non-blocking I/O model. It has over 15,000 modules and an active community for support.
- Common use cases include building JSON APIs, single page apps, leveraging existing Unix tools via child processes, streaming
PostgreSQL Performance Problems: Monitoring and AlertingGrant Fritchey
PostgreSQL can be difficult to troubleshoot when the pressure is on without the right knowledge and tools. Knowing where to find the information you need to improve performance is central to your ability to act quickly and solve problems. In this training, we'll discuss the various query statistic views and log information that's available in PostgreSQL so that you can solve problems quickly. Along the way, we'll highlight a handful of open-source and paid tools that can help you track data over time and provide better alerting capabilities so that you know about problems before they become critical.
The document discusses continuous deployment and practices at Disqus for releasing code frequently. It emphasizes shipping code as soon as it is ready after it has been reviewed, passes automated tests, and some level of QA. It also discusses keeping development simple, integrating code changes through automated testing, using metrics for reporting, and doing progressive rollouts of new features to subsets of users.
This document discusses several Python debugging and monitoring tools for Django projects: django-debug-toolbar for debugging, django-devserver for profiling, Gargoyle for feature flags, Sentry for error reporting, and references for further information. It provides installation instructions and examples of configuring and extending the tools.
This document discusses continuous deployment and how Disqus implements it. It involves committing code changes to a master branch which then triggers an automated integration and deployment process. Failed builds are reported, while successful builds automatically deploy the changes. Rollbacks can also be performed if needed. The document outlines Disqus' development, testing, deployment, and reporting workflows and tools like Gargoyle, Sentry, Jenkins, Fabric and Graphite. It discusses challenges around stability, testing coverage, scaling and database changes.
This document discusses building scalable web applications. It covers topics like common database bottlenecks, using asynchronous tasks like Celery to improve performance, and building an API to optimize access to data stored across multiple databases and caches. The document provides examples of using Django, Redis, and other tools to architect a Twitter-like application called Tweeter to be scalable from the start.
Continuous Deployment at Disqus (Pylons Minicon)zeeg
The document discusses Disqus' approach to continuous deployment. It describes how code is automatically deployed as soon as tests pass, with the goal of releasing features incrementally. It outlines the workflow, pros and cons, and techniques used to simplify local development and ensure stability through testing. Pain points like test speed and database migrations are addressed along with tools developed in-house like Mule for distributed testing.
Disqus talks about how they scale their Python web application to over 500 million visitors a month.
Video is available here: http://pycon.blip.tv/file/4880330/
Sentry is a self-hosted, open source log storage solution powered by Python. It provides a usable interface for filtering, sorting, and searching logs. Key features include extensibility through plugins, integration with Python frameworks like Django, and flexibility through its event storage and dashboard APIs. Sentry 2.0 will be platform independent and remove dependencies like Django.
The document provides tips for optimizing Django applications. It recommends using update() instead of save() for thread safety, partitioning queries to avoid expensive joins that don't scale, caching querysets judiciously, and delaying creation of model instances until needed to reduce memory usage and improve performance. The author concludes by advertising open engineering positions at their company.
3. What is DISQUS?
dis·cuss • dĭ-skŭs'
We are a comment system with an emphasis on
connecting communities
http://disqus.com/about/
4. What is Scale?
[Chart: Number of Visitors, vertical axis from 50M to 300M]
Our traffic at a glance
17,000 requests/second peak
450,000 websites
15 million profiles
75 million comments
250 million visitors (August 2010)
5. Our Challenges
• We can’t predict when things will happen
• Random celebrity gossip
• Natural disasters
• Discussions never expire
• We can’t keep those millions of articles from
2008 in the cache
• You don’t know in advance (generally) where the
traffic will be
• Especially with dynamic paging, realtime, sorting,
personal prefs, etc.
6. Our Challenges (cont’d)
• High availability
• Not a destination site
• Difficult to schedule maintenance
10. Server Architecture - Web Servers
• Apache 2.2
• mod_wsgi
• Using `maximum-requests` to
plug memory leaks.
• Performance Monitoring
• Custom middleware
(PerformanceLogMiddleware)
• Ships performance statistics
(DB queries, external calls,
template rendering, etc) through
syslog
• Collected and graphed through
Ganglia
11. Server Architecture - Database
• PostgreSQL
• Slony-I for Replication
• Trigger-based
• Read slaves for extra read capacity
• Failover master database for high
availability
12. Server Architecture - Database
• Make sure indexes fit in memory and
measure I/O
• High I/O generally means slow queries
due to missing indexes or indexes not in
buffer cache
• Log Slow Queries
• syslog-ng + pgFouine + cron to automate
slow query logging
13. Server Architecture - Database
• Use connection pooling
• Django doesn’t do this for you
• We use pgbouncer
• Limits the maximum number of
connections your database needs to
handle
• Save on costly opening and tearing down
of new database connections
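As a rough illustration of the pooling setup above, the settings fragment below points Django at a pgbouncer instance instead of at PostgreSQL directly. The database name, role, and host are hypothetical; the only non-arbitrary value is 6432, pgbouncer's default listen port.

```python
# settings.py fragment (sketch): Django talks to pgbouncer,
# and pgbouncer keeps a small pool of real PostgreSQL connections.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "disqus",      # hypothetical database name
        "USER": "disqus",      # hypothetical role
        "HOST": "127.0.0.1",   # assumed: pgbouncer runs next to the web server
        "PORT": "6432",        # pgbouncer's default port (PostgreSQL's is 5432)
    }
}
```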
15. Partitioning
• Fairly easy to implement, quick wins
• Done at the application level
• Data is replayed by Slony
• Two methods of data separation
16. Vertical Partitioning
Vertical partitioning involves creating tables with fewer columns
and using additional tables to store the remaining columns.
Forums Posts Users Sentry
http://en.wikipedia.org/wiki/Partition_(database)
17. Pythonic Joins
Allows us to separate datasets
posts = Post.objects.all()[0:25]
# store users in a dictionary based on primary key
users = dict(
    (u.pk, u) for u in
    User.objects.filter(pk__in=set(p.user_id for p in posts))
)
# map users to their posts
for p in posts:
    p._user_cache = users.get(p.user_id)
18. Pythonic Joins (cont’d)
• Slower than at database level
• But not enough that you should care
• Trading performance for scale
• Allows us to separate data
• Easy vertical partitioning
• More efficient caching
• get_many, object-per-row cache
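The "get_many, object-per-row cache" point can be sketched as follows. This is not Disqus's code: DictCache is a stand-in for a memcached client, the "user:<id>" key format is an assumption, and load_from_db is a hypothetical callable that would run a single pk__in query for the ids it is given.

```python
class DictCache(object):
    """In-memory stand-in for a memcached client with get_many / set_many."""

    def __init__(self):
        self._data = {}

    def get_many(self, keys):
        return dict((k, self._data[k]) for k in keys if k in self._data)

    def set_many(self, mapping):
        self._data.update(mapping)


def fetch_users(user_ids, cache, load_from_db):
    """Return {user_id: user}, hitting the database only for cache misses."""
    # One cache key per row, so each user is cached and invalidated on its own.
    keys = dict(("user:%d" % uid, uid) for uid in set(user_ids))
    cached = cache.get_many(list(keys))
    users = dict((keys[k], v) for k, v in cached.items())

    missing = [uid for uid in keys.values() if uid not in users]
    if missing:
        loaded = load_from_db(missing)  # one query for all misses
        cache.set_many(dict(("user:%d" % uid, u) for uid, u in loaded.items()))
        users.update(loaded)
    return users
```

Because users live in their own cache entries rather than inside a cached joined result, a page of posts only costs a database query for the users that are not already cached.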
19. Designating Masters
• Alleviates some of the write load on your
primary application master
• Masters exist under specific conditions:
• application use case
• partitioned data
• Database routers make this (fairly) easy
20. Routing by Application
class ApplicationRouter(object):
    def db_for_read(self, model, **hints):
        instance = hints.get('instance')
        if not instance:
            return None
        app_label = instance._meta.app_label
        return get_application_alias(app_label)
21. Horizontal Partitioning
Horizontal partitioning (also known as sharding) involves splitting
one set of data into different tables.
Disqus Your Blog CNN Telegraph
http://en.wikipedia.org/wiki/Partition_(database)
22. Horizontal Partitions
• Some forums have very large datasets
• Partners need high availability
• Helps scale the write load on the master
• We rely more on vertical partitions
23. Routing by Partition
class ForumPartitionRouter(object):
    def db_for_read(self, model, **hints):
        instance = hints.get('instance')
        if not instance:
            return None
        forum_id = getattr(instance, 'forum_id', None)
        if not forum_id:
            return None
        return get_forum_alias(forum_id)

# What we used to do
Post.objects.filter(forum=forum)
# Now, making sure hints are available
forum.post_set.all()
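The slide does not show get_forum_alias, so the sketch below is only one plausible shape for it: a lookup table from forum id to database alias with a fallback. The forum ids and alias names are invented, and the router is repeated here (with db_for_write added) only so the example runs on its own without Django.

```python
# Hypothetical partition map: forum id -> Django database alias.
FORUM_PARTITIONS = {
    1001: "forum_cnn",
    1002: "forum_telegraph",
}
DEFAULT_ALIAS = "default"


def get_forum_alias(forum_id):
    """Forums without a dedicated partition stay on the default database."""
    return FORUM_PARTITIONS.get(forum_id, DEFAULT_ALIAS)


class ForumPartitionRouter(object):
    def _alias_for(self, hints):
        instance = hints.get("instance")
        forum_id = getattr(instance, "forum_id", None)
        if not forum_id:
            return None  # no hint: let the next router (or the default) decide
        return get_forum_alias(forum_id)

    def db_for_read(self, model, **hints):
        return self._alias_for(hints)

    def db_for_write(self, model, **hints):
        return self._alias_for(hints)
```

This also shows why the related-manager form matters: the router can only pick a partition when an instance hint carrying a forum_id reaches it.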
24. Optimizing QuerySets
• We really dislike raw SQL
• It creates more work when dealing with
partitions
• Built-in cache allows sub-slicing
• But isn’t always needed
• We removed this cache
25. Removing the Cache
• Django internally caches the results of your QuerySet
• This adds additional memory overhead
# 1 query
qs = Model.objects.all()[0:100]
# 0 queries (we don’t need this behavior)
qs = qs[0:10]
# 1 query
qs = qs.filter(foo=bar)
• Many times you only need to view a result set once
• So we built SkinnyQuerySet
26. Removing the Cache (cont’d)
Optimizing memory usage by removing the cache
class SkinnyQuerySet(QuerySet):
    def __iter__(self):
        if self._result_cache is not None:
            # __len__ must have been run
            return iter(self._result_cache)
        has_run = getattr(self, 'has_run', False)
        if has_run:
            raise QuerySetDoubleIteration("...")
        self.has_run = True
        # We wanted .iterator() as the default
        return self.iterator()
http://gist.github.com/550438
27. Atomic Updates
• Keeps your data consistent
• save() isn't thread-safe
• use update() instead
• Great for things like counters
• But should be considered for all write
operations
28. Atomic Updates (cont’d)
Thread safety is impossible with .save()
Request 1
post = Post(pk=1)
# a moderator approves
post.approved = True
post.save()
Request 2
post = Post(pk=1)
# the author adjusts their message
post.message = 'Hello!'
post.save()
29. Atomic Updates (cont’d)
So we need atomic updates
Request 1
post = Post(pk=1)
# a moderator approves
Post.objects.filter(pk=post.pk).update(approved=True)
Request 2
post = Post(pk=1)
# the author adjusts their message
Post.objects.filter(pk=post.pk).update(message='Hello!')
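The two slides above can be replayed without a database. In this sketch a dict stands in for the posts table; save() writes back every column from a possibly stale in-memory copy, while update() touches only the named columns. The function names and the dict layout are illustrative assumptions, not Django internals.

```python
# Stand-in for the posts table: primary key -> row of columns.
DB = {1: {"approved": False, "message": "Hi"}}


def load(pk):
    """Read a full copy of the row, as loading a model instance would."""
    return dict(DB[pk])


def save(pk, row):
    """Like save(): write back every column from the in-memory copy."""
    DB[pk] = dict(row)


def update(pk, **fields):
    """Like QuerySet.update(): write only the named columns."""
    DB[pk].update(fields)


def interleaved_saves():
    DB[1] = {"approved": False, "message": "Hi"}
    moderator, author = load(1), load(1)  # both requests read the same state
    moderator["approved"] = True
    save(1, moderator)
    author["message"] = "Hello!"
    save(1, author)  # also writes the author's stale approved=False
    return DB[1]


def interleaved_updates():
    DB[1] = {"approved": False, "message": "Hi"}
    update(1, approved=True)
    update(1, message="Hello!")
    return DB[1]
```

With interleaved saves the moderator's approval is silently lost; with column-level updates both changes survive.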
30. Atomic Updates (cont’d)
A better way to approach updates
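from django.db.models.expressions import ExpressionNode

def update(obj, using=None, **kwargs):
    """
    Updates specified attributes on the current instance.
    """
    assert obj, "Instance has not yet been created."
    # Issue a single UPDATE that touches only the given columns
    obj.__class__._base_manager.using(using) \
        .filter(pk=obj.pk) \
        .update(**kwargs)
    # Mirror plain values onto the in-memory instance
    for k, v in kwargs.items():
        if isinstance(v, ExpressionNode):
            # Expressions are evaluated in the database; not mirrored here
            continue
        setattr(obj, k, v)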
http://github.com/andymccurdy/django-tips-and-tricks/blob/master/model_update.py
31. Delayed Signals
• Queueing low priority tasks
• even if they’re fast
• Asynchronous (Delayed) signals
• very friendly to the developer
• ..but not as friendly as real signals
32. Delayed Signals (cont’d)
We send a specific serialized version
of the model for delayed signals
from disqus.common.signals import delayed_save
def my_func(data, sender, created, **kwargs):
    print(data['id'])
delayed_save.connect(my_func, sender=Post)
This is all handled through our Queue
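The deck does not show how delayed_save is implemented, so the following is only a sketch of the idea: the sender enqueues a serialized snapshot of the model instead of calling receivers inline, and a worker delivers it later. DelayedSignal, process_one, and the use of a string sender name are all assumptions made for this example; an in-process queue.Queue stands in for the real queue.

```python
import json
import queue


class DelayedSignal(object):
    def __init__(self, task_queue):
        self._receivers = []
        self._queue = task_queue

    def connect(self, receiver, sender):
        self._receivers.append((receiver, sender))

    def send(self, sender, instance_data, created):
        # Enqueue a JSON snapshot rather than the live model instance,
        # so the worker never depends on the request's in-memory state.
        message = {"sender": sender, "data": instance_data, "created": created}
        self._queue.put(json.dumps(message))

    def process_one(self):
        """Run by a worker: deliver one queued message to its receivers."""
        message = json.loads(self._queue.get_nowait())
        for receiver, sender in self._receivers:
            if sender == message["sender"]:
                receiver(data=message["data"], sender=message["sender"],
                         created=message["created"])
```

Receivers get plain data (hence data['id'] in the slide) instead of a model instance, which is the sense in which delayed signals are slightly less friendly than real ones.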
33. Caching
• Memcached
• Use pylibmc (newer libMemcached-based)
• Ticket #11675 (add pylibmc support)
• Third party applications:
• django-newcache, django-pylibmc
34. Caching (cont’d)
• libMemcached / pylibmc is configurable with
“behaviors”.
• Memcached “single point of failure”
• Distributed system, but we must take
precautions.
• Connection timeout to memcached can stall
requests.
• Use `_auto_eject_hosts` and
`_retry_timeout` behaviors to prevent
reconnecting to dead caches.
35. Caching (cont’d)
• Default (naive) hashing behavior
• The hashed cache key, modulo the number of servers,
gives the index into the server list.
• Removal of a server causes majority of
cache keys to be remapped to new
servers.
CACHE_SERVERS = ['10.0.0.1', '10.0.0.2']
key = 'my_cache_key'
cache_server = CACHE_SERVERS[hash(key) % len(CACHE_SERVERS)]
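To see how much gets remapped, the sketch below measures the share of keys that change servers when one server is dropped from a modulo-hashed list. It uses MD5 rather than the built-in hash() so results are stable between runs; the function names and the four-server list are assumptions for the example.

```python
import hashlib


def stable_hash(key):
    """Deterministic integer hash of a string key."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def pick_server(key, servers):
    """Naive placement: hashed key modulo the number of servers."""
    return servers[stable_hash(key) % len(servers)]


def remapped_fraction(keys, before, after):
    """Share of keys whose server differs between two server lists."""
    moved = sum(1 for k in keys
                if pick_server(k, before) != pick_server(k, after))
    return moved / float(len(keys))
```

Going from four servers to three, a key keeps its server only when its hash gives the same index modulo 4 and modulo 3, which holds for 3 of every 12 hash values. So about three quarters of the keys are expected to move, even though only a quarter of them lived on the removed server.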
36. Caching (cont’d)
• Better approach: consistent hashing
• libMemcached (pylibmc) uses libketama
(http://tinyurl.com/lastfm-libketama)
• Addition / removal of a cache server
remaps (K/n) cache keys
(where K=number of keys and n=number of servers)
Image Source: http://sourceforge.net/apps/mediawiki/kai/index.php?title=Introduction
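A minimal hash ring illustrates the K/n property. This is a simplified sketch in the spirit of ketama, not libketama's actual point-placement algorithm; the class name, the MD5-based hash, and the 100 points per server are assumptions.

```python
import bisect
import hashlib


def stable_hash(key):
    """Deterministic integer hash of a string key."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing(object):
    def __init__(self, servers, points_per_server=100):
        # Each server owns many points on the ring to even out its share.
        self._ring = sorted(
            (stable_hash("%s#%d" % (server, i)), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._hashes = [h for h, _ in self._ring]

    def get_server(self, key):
        # First point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._hashes, stable_hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

When a server is removed, only its points disappear from the ring, so the only keys that move are the ones it used to own; every other key still finds the same next point.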
37. Caching (cont’d)
• Thundering herd (stampede) problem
• Invalidating a heavily accessed cache key causes many
clients to refill cache.
• But everyone refetching to fill the cache from the data
store or reprocessing data can cause things to get even
slower.
• Most times, it’s ideal to return the previously invalidated
cache value and let a single client refill the cache.
• django-newcache or MintCache
(http://djangosnippets.org/snippets/793/) will do this for you.
• Filling the cache on invalidation, instead of deleting
from it, also helps prevent the thundering herd
problem.
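The "return the stale value and let a single client refill" idea can be sketched like this. It is written in the spirit of MintCache but is not that snippet's code: the class below is an in-memory, single-process stand-in, the method signatures are invented for the example, and it is not thread-safe. With real memcached the entry's hard timeout would be set longer than the soft expiry stored alongside the value.

```python
import time


class MintCache(object):
    def __init__(self, refresh_grace=30, clock=time.time):
        self._store = {}          # key -> (value, soft_expiry)
        self._grace = refresh_grace
        self._clock = clock

    def set(self, key, value, timeout):
        self._store[key] = (value, self._clock() + timeout)

    def get(self, key):
        """Return (value, should_refresh)."""
        entry = self._store.get(key)
        if entry is None:
            return None, True
        value, soft_expiry = entry
        if self._clock() >= soft_expiry:
            # Push the soft expiry forward so that only this caller
            # refreshes; everyone else keeps getting the stale value.
            self._store[key] = (value, self._clock() + self._grace)
            return value, True
        return value, False
```

A caller that receives should_refresh=True recomputes the value and calls set(); callers arriving during the grace window are served the old value instead of piling onto the database.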
38. Transactions
• TransactionMiddleware got us started, but
down the road became a burden
• For postgresql_psycopg2, there’s a database
option, OPTIONS['autocommit']
• Each query is in its own transaction. This
means each request won’t start in a
transaction.
• But sometimes we want transactions
(e.g., saving multiple objects and rolling
back on error)
39. Transactions (cont’d)
• Tips:
• Use autocommit for read slave databases.
• Isolate slow functions (e.g., external calls,
template rendering) from transactions.
• Selective autocommit
• Most read-only views don’t need to be
in transactions.
• Start in autocommit and switch to a
transaction on write.
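"Start in autocommit and switch to a transaction on write" can be demonstrated with the standard library. The sketch below swaps in sqlite3 for PostgreSQL purely so it runs anywhere; the transaction() helper and the posts table are assumptions, and the deck does not show how Disqus implemented the switch inside Django.

```python
import sqlite3
from contextlib import contextmanager

# isolation_level=None puts the sqlite3 module in autocommit mode:
# each statement commits on its own unless we issue BEGIN ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, message TEXT)")


@contextmanager
def transaction(connection):
    """Group several writes; roll all of them back if any step fails."""
    connection.execute("BEGIN")
    try:
        yield connection
    except Exception:
        connection.execute("ROLLBACK")
        raise
    else:
        connection.execute("COMMIT")


def count_posts(connection):
    return connection.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
```

Reads and single-statement writes run outside any transaction, while multi-object writes are wrapped in `with transaction(conn):` so that an error leaves nothing half-saved.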
40. Scaling the Team
• Small team of engineers
• Monthly users / developers = 40m
• Which means writing tests..
• ..and having a dead simple workflow
41. Keeping it Simple
• A developer can be up and running in a few
minutes
• assuming postgres and other server
applications are already installed
• pip, virtualenv
• settings.py
43. Sane Defaults
settings.py
from disqus.conf.settings.default import *
try:
    from local_settings import *
except ImportError:
    import sys, traceback
    sys.stderr.write("Can't find 'local_settings.py'\n")
    sys.stderr.write("\nThe exception was:\n\n")
    traceback.print_exc()
local_settings.py
from disqus.conf.settings.dev import *
44. Continuous Integration
• Daily deploys with Fabric
• several times an hour on some days
• Hudson keeps our builds going
• combined with Selenium
• Post-commit hooks for quick testing
• like Pyflakes
• Reverting to a previous version is a matter of
seconds
46. Testing
• It’s not fun breaking things when you’re the new
guy
• Our testing process is fairly heavy
• 70k (Python) LOC, 73% coverage, 20 min suite
• Custom Test Runner (unittest)
• We needed XML, Selenium, Query Counts
• Database proxies (for read-slave testing)
• Integration with our Queue
48. Bug Tracking
• Switched from Trac to Redmine
• We wanted Subtasks
• Emailing exceptions is a bad idea
• Even if it’s localhost
• Previously using django-db-log to aggregate
errors to a single point
• We’ve overhauled django-db-log and are releasing
Sentry
51. Feature Switches
• We needed a safety in case a feature wasn’t
performing well at peak
• it had to respond without delay, globally,
and without writing to disk
• Allows us to work out of trunk (mostly)
• Easy to release new features to a portion of
your audience
• Also nice for “Labs” type projects
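A feature switch with partial rollout might look like the sketch below. The slides do not show the actual implementation, so everything here is an assumption: the `FeatureSwitches` class, its method names, and the choice of hashing the user id into 100 buckets. State lives in memory, in line with the "without writing to disk" requirement; making a change visible globally would need a shared store, which is left out.

```python
import zlib


class FeatureSwitches:
    """In-memory feature switches with percentage rollout (hypothetical API)."""

    def __init__(self):
        self._rollout = {}  # feature name -> percent of users (0..100)

    def set_rollout(self, feature, percent):
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[feature] = percent

    def is_active(self, feature, user_id):
        percent = self._rollout.get(feature, 0)  # unknown features are off
        # Hash feature + user into a stable bucket in 0..99, so the same user
        # always gets the same answer for a given feature.
        key = "{}:{}".format(feature, user_id).encode("utf-8")
        bucket = zlib.crc32(key) % 100
        return bucket < percent
```

Setting a feature to 0 acts as the kill switch at peak load, and raising the percentage gradually releases it to a larger share of the audience.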
53. Final Thoughts
• The language (usually) isn’t your problem
• We like Django
• But we maintain local patches
• Some tickets don’t have enough of a following
• Patches, like #17, completely change
Django..
• ..arguably in a good way
• Others don’t have champions
Ticket #17 describes making the ORM an identity mapper
54. Housekeeping
Birds of a Feather
Want to learn from others about
performance and scaling problems?
Or play some StarCraft 2?
We’re Hiring!
DISQUS is looking for amazing engineers
Hi. I'm Jason (and I'm David), and we're from Disqus.
Show of hands: how many of you know what DISQUS is?
For those of you who are not familiar with us, DISQUS is a comment system
that focuses on connecting communities. We power discussions on such sites as CNN, IGN, and
more recently Engadget and TechCrunch.
Our company was founded back in 2007 by my co-founder,
Daniel Ha, and me, when we were working out of our dorm room.
Our decision to use Django came down primarily to our dislike for PHP, which
we were previously using. Since then, we've grown Disqus to over 250
million visitors a month.
We've peaked at over 17,000 requests per second to Django, and we currently
power comments on nearly half a million websites, which accounts for more than
15 million profiles that have left over 75 million comments.
As you can imagine we have some big challenges when it comes to scaling a large Django application.
For one, it’s hard to predict when big events will happen, like Michael Jackson’s death last year and, more recently, the Gulf oil spill.
Another challenge we have is the fact that discussions never expire. When you visit that blog post from 2008 we have to be ready to serve those comments immediately. Not only does THAT make caching difficult, but we also have to deal with things such as dynamic paging, realtime commenting, and other personal preferences. This makes it even more important to be able to serve those quickly without relying on the cache.
So we also have some interesting infrastructure problems when it comes to scaling Disqus.
We're not a destination website, so if we go down, it affects other sites as well as ours.
Because of this, it's difficult for us to schedule maintenance, so we face some interesting
scaling and availability challenges.
As you can see, we have tried to keep the stack pretty thin. This is because, as we've learned,
the more services we try to add, the more difficult it is to support. And especially because we have
a small team, this becomes difficult to manage.
So we use DNS load balancing to spread the requests to multiple HAProxy servers which are our software
load balancers. These proxy requests to our backend app servers which run mod_wsgi. We use memcache
for caching, and we have a custom wrapper using syslog for our queue. For our data store, we use PostgreSQL,
and for replication, we use Slony for failover and read slaves.
As I said, we use HAProxy for HTTP load balancing. It's a high performance
software load balancer with intelligent failure detection. It also
provides you with nice statistics of your requests. We use heartbeat for
high availability and we have it take over the IP address of the down machine.
We have about 100GB of cache. Because of our high availability requirements, 20%
are allocated to high availability and load balancing.
Our web servers are pretty standard. We use mod_wsgi mostly because it just
works. Performance wise, you're really going to be bottlenecked on the application.
The cool thing we do is that we actually have a custom middleware that does
performance monitoring. What this does is ship data from our application about
external calls like database, cache calls, and we collect it and graph
it with Ganglia.
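The middleware itself is not shown in the talk. As a rough sketch of the bookkeeping it describes, the helper below counts and times external calls by kind; `CallStats`, `track`, and `report` are hypothetical names, and shipping the numbers to Ganglia is left out.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class CallStats:
    """Collect per-request counts and timings of external calls (sketch)."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.seconds = defaultdict(float)

    @contextmanager
    def track(self, kind):
        """Time one external call of the given kind, e.g. 'db' or 'cache'."""
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the call even if it raised.
            self.counts[kind] += 1
            self.seconds[kind] += time.perf_counter() - start

    def report(self):
        """Return {kind: (call_count, total_seconds)} for graphing."""
        return {kind: (self.counts[kind], self.seconds[kind]) for kind in self.counts}
```

A middleware could create one `CallStats` per request, wrap database and cache calls in `stats.track(...)`, and emit `stats.report()` when the response is returned.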
The more interesting aspect of our server architecture is how we have our database
setup. As I mentioned, we use Postgres as our database. Honestly, we used it because
Django recommended it, and my recommendation is that if you’re not already an expert in a
database, you're better off going with Postgres.
We use Slony for replication. Slony is trigger-based, which means that every write
is captured and stored in a log table, and those events are replayed to slave databases.
This is nicer than other methods such as log shipping because it allows us to have
flexible schemas across read slaves. For example, some of our read slaves have different
indexes. We also use Slony for failover for high availability.
There are a few things we do to keep our database healthy. We keep our indexes in memory,
and when we can't, we partition our data. We also have application-specific indexes on
our readslaves. Another important thing we've done is measure I/O. Any time we've seen
high I/O is usually because we're missing indexes or indexes aren't fitting in memory.
Lastly, we monitor slow queries. We send logs to pgfouine via syslog, which generates
a nice report showing you which queries are the slowest.
The last thing we've found to be really helpful is switching to a database connection pool.
Remember, Django doesn't do this for you. We use pgbouncer for this, and there are a few
easy wins for using it. One is that it limits the maximum connections to the database so
it doesn't have to handle as many concurrent connections. Secondly, you save the cost
of opening and tearing down new connections per request.
Moving on to our application, we’ve found that most of the struggle is with the database layer.
We’ve got a pretty standard layout if you’re familiar with forums. A forum has many threads, each of which has many posts. Posts use an adjacency list model, and also reference Users. With this kind of data model, one of our quickest wins has been the ability to partition data.
It’s almost entirely done at the application level, which makes it fairly easy to implement. The only thing not handled by the app is replication, and Slony does that for us. We handle partitioning in a couple of ways.
The first is vertical partitioning. This is probably the simplest thing you can implement in your application. Kill off your joins and spread out your applications on multiple databases. Some database engines might make this easier than others, but Slony allows us to easily replicate very specific data.
Using this method you’ll need to handle joins in your Python application. We do this by performing two separate queries and mapping the foreign keys to the parent objects. For us the easiest way has been to throw them into a dictionary, iterate through the other queryset, and set the foreignkey cache’s value to the instance.
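The two-query join described above can be sketched with plain objects. `User`, `Post`, `attach_users`, and `fetch_users` are hypothetical stand-ins: in a Django application the second query would be a queryset filtered by id, and assigning `post.user` plays the role of filling the foreign key cache.

```python
class User:
    def __init__(self, id, name):
        self.id = id
        self.name = name


class Post:
    def __init__(self, id, user_id, body):
        self.id = id
        self.user_id = user_id
        self.body = body
        self.user = None  # filled in by attach_users


def attach_users(posts, fetch_users):
    """Join posts to users in Python using one extra query.

    fetch_users(ids) stands in for a single query against the users database.
    """
    user_ids = {post.user_id for post in posts}
    # Throw the users into a dictionary keyed by primary key...
    users_by_id = {user.id: user for user in fetch_users(user_ids)}
    # ...then iterate through the posts and set the related instance.
    for post in posts:
        post.user = users_by_id.get(post.user_id)
    return posts
```

The cost is one query per table rather than one joined query, regardless of how many posts are on the page.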
A few things to keep in mind when doing pythonic joins. They’re not going to be as fast as joins in the database. You can’t avoid this, but it’s not something you should worry about. With this, however, you get plain and simple vertical partitions. You can also cache things a lot more easily, and fetch them more efficiently using things like get_many and a singular object cache. Overall you’re trading performance for scale.
Another benefit that comes from vertical partitioning is the ability to designate masters. We do this to alleviate some of the load on our primary application master. So for example, server FOO might be the source for writes on the Users table, while server BAR handles all of our other forum data. Since we’re using Django 1.2 we also get routing for free through the new routers.
Here’s an example of a simple application router. It lets us specify a read-slave based on our app label. So if it’s users, we go to FOO; if it’s forums, we go to BAR. You can handle this logic any way you want. It’s pretty simple and powerful.
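The router from the slide is not reproduced in these notes, so the following is a reconstruction under assumptions. It follows the Django 1.2 router interface (`db_for_read` and `db_for_write`, where returning `None` means "no opinion"); the class name, the alias mapping, and the `fake_model` helper used to exercise it without Django are hypothetical.

```python
from types import SimpleNamespace


class AppLabelRouter:
    """Pick a read-slave alias from the model's app label (sketch)."""

    # Hypothetical mapping of app label -> database alias from settings.DATABASES
    READ_SLAVES = {"users": "FOO", "forums": "BAR"}

    def db_for_read(self, model, **hints):
        # None tells Django to fall through to the next router or the default.
        return self.READ_SLAVES.get(model._meta.app_label)

    def db_for_write(self, model, **hints):
        # This sketch leaves write routing to the default database.
        return None


def fake_model(app_label):
    """Stand-in for a Django model class; only _meta.app_label is used."""
    return SimpleNamespace(_meta=SimpleNamespace(app_label=app_label))
```

In a project the router would be listed in the `DATABASE_ROUTERS` setting by dotted path.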
While we use vertical partitioning for most cases, eventually you hit an issue where your data just doesn’t scale on a single database. You’re probably familiar with the term sharding; that’s what we do with our forum data. We’ve set it up so that we can send certain large sites to dedicated machines. This also uses designated masters, as we mentioned with the other partitions.
We needed this when write and read load combined became so big that it was just hard to keep up on a single set of machines. It also gives the nice added benefit of high availability in many situations. Mostly though, it all goes back to scaling our master databases.
So again we’re using the router here to handle partitioning of the forums. We can specify that CNN goes to this database alias, which could be any number of machines, and everything else goes to our default cluster. The one caveat we found is that sometimes hints aren’t present in the router. I believe within the current version of Django they are only available when using a relational lookup, such as a foreign key. All in all it’s pretty powerful, and you just need to be aware of it while writing your queries.
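A hint-based shard router could be sketched like this. It is not the router from the talk: the alias name, the `forum_shortname` attribute on the hinted instance, and the `SHARDS` mapping are all assumptions. It does show the caveat mentioned above, since a query that arrives without an `instance` hint cannot be sharded and falls back to the default.

```python
from types import SimpleNamespace


class ForumShardRouter:
    """Send selected large forums to a dedicated database alias (sketch)."""

    # Hypothetical mapping of forum shortname -> database alias
    SHARDS = {"cnn": "cnn_cluster"}

    def _alias_for(self, hints):
        instance = hints.get("instance")  # absent for many plain queries
        if instance is None:
            return None  # no hint: let Django use the default cluster
        shortname = getattr(instance, "forum_shortname", None)
        return self.SHARDS.get(shortname)

    def db_for_read(self, model, **hints):
        return self._alias_for(hints)

    def db_for_write(self, model, **hints):
        return self._alias_for(hints)
```

Because an un-hinted query silently goes to the default cluster, code that must hit a shard would need to select the alias explicitly, for example with `using()` on the queryset.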