This document discusses how to maintain large web applications over time. It describes how the author's team managed a web application with over 65,000 lines of code and 6,000 automated tests over 2.5 years of development. Key aspects included packaging full releases, automating dependency installation, specifying supported environments, and automating data migrations during upgrades. The goal was to have a sustainable process that allowed for continuous development without slowing down due to maintenance issues.
Care and feeding notes
Care and Feeding of Large Web Applications
by Perrin Harkins
So, you launched your website. Congratulations!
And then there were a bunch of quick fixes. And you started getting traffic so you had to add more
machines. And some more developers. And more features to keep your new users happy. And suddenly
you find yourself spending all your time doing damage control on a site that seems to have taken on a life of
its own and you can't make a new release because the regression testing alone would take three years.
Usually, this is the part where everyone starts clamoring for a rewrite, and the CEO contemplates firing
your ass and bringing in an army of consultants to rewrite it all in the flavor of the month.
How can we avoid this mess? How can we create a web development process that is sustainable for years
and doesn't hold back development?
Backstory
There's more than one way to do it, but I'll tell you how my team did it, at a small startup company called Plus
Three. Let me give you a few stats about our project:
About 2.5 years of continuous development
2 - 5 developers on the team during that time
65,000+ lines of Perl code
1600+ lines of SQL
(Computed with David Wheeler's SLOCCount program)
Plenty of HTML, CSS, and JavaScript too
6000+ automated tests in 78 files
169 CPAN modules
It's a big system, built to support running websites for political campaigns and non-profit membership
organizations. Some of the major components are a content management system, an e-commerce system
with comprehensive reporting, a data warehouse with an AJAX query builder GUI, a large-scale e-mail
campaign system, a variety of user-facing web apps, and an asynchronous job queue.
This talk isn't meant to be about coding style, which I've discussed in some previous talks, but I'll give you
the 10,000 foot overview:
Object-oriented
MVC-ish structure with the typical breakdown into controller classes, database classes, and templates.
(Not very pure MVC, but that's a whole separate topic.)
Our basic building blocks were CGI::Application, Class::DBI, and HTML::Template.
Ok, that's the software. How did we keep it under control?
Deployment
Let's dive right in by talking about the hardest thing first: deployment. So hard to get right, but so rarely
discussed and so hard to generalize. Everyone ends up with solutions that are tied very closely to their own
organization's quirks.
The first issue here is how to package a release. We used plain old .tar.gz files, built by a simple script after
pulling a tagged release from our source control system. We tried to always release complete builds, not
individual files. This is important in order to be sure you have a consistent production system that you can
rebuild from scratch if necessary. It's also important for setting up QA testing. If you just upload a file here
and there (or worse, vi a file in production!), you get yourself in a bad state where your source control no
longer reflects what's really live and your testing misses things because of it. We managed to stick to the
"full build release" rule, outside of dire emergencies.
Like most big Perl projects we used a ton of CPAN modules. The first advice you'll get about how to install
them is "just use the CPAN shell," possibly with a bundle or Task file. This is terrible advice.
The most obvious problem with it is that as the number of CPAN modules increases, the probability of one
of them failing to install via the CPAN shell for some obscure and irrelevant reason approaches 1.
The second most obvious problem is that you don't want to install whatever the latest version of some
module happens to be -- you want to install the specific version that you've been developing with and that
you tested in QA. There might be something subtly different about the new version that will break your site.
Test it first.
Let me lay out the requirements we had for a CPAN installer:
Install specific versions.
Install from local media. Sometimes a huge CPAN download is not convenient.
Handle versions with local patches. We always submitted our patches, but sometimes we couldn't
afford to wait for a release that included them.
Fully automated. That means that modules which ask pesky questions during install must be handled
in some way. I'm looking at you, WWW::Mechanize.
Install into a local directory. We don't want to put anything in the system directories because we want
to be able to run multiple versions of our application on one machine, even if they require different
versions of the same module.
Skip the tests. I know this sounds like blasphemy, but bear with me. If you have a cluster of identical
machines, running all the module tests on all of them is a waste of time. And the larger issue is that
CPAN authors still don't all agree on what the purpose of tests is. Some modules come with tests that
are effectively useless or simply fail unless you set up test databases or jump through similar hoops.
Our solution to the installation problem was to write an automated build system that builds all the modules it
finds in the src/ directory of our release package. (Note that this means we can doctor one of those modules
if we have to.) We used the Expect module (which is included and bootstrapped at the beginning of the
build) and gave it canned answers for the modules with chatty install scripts. We also made it build some
non-CPAN things we needed: Apache and mod_perl, and the SWISH-E search engine. If we could have
bundled Perl and MySQL too, that would have been ideal.
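As a rough sketch of the Expect approach (the paths, install options, and prompt pattern below are illustrative, not our real build code), the idea was simply to drive each bundled module's build and feed canned answers to anything that asks questions:

use strict;
use warnings;
use Cwd qw(getcwd);
use Expect;

# Sketch of the automated module build: install everything under src/
# into a private directory, answering chatty installers automatically.
my $prefix = "$ENV{HOME}/arcos/perl-lib";
my $top    = getcwd();

for my $dir (glob 'src/*') {
    chdir $dir or die "chdir $dir: $!\n";

    my $exp = Expect->spawn(
        "perl Makefile.PL INSTALL_BASE=$prefix && make && make install"
    ) or die "Cannot spawn build in $dir\n";

    $exp->expect(
        600,
        # Answer any question prompt by accepting the default.
        [ qr/\?\s*$/ => sub { my $self = shift; $self->send("\n"); exp_continue; } ],
        [ 'eof'      => sub { } ],
    );
    $exp->soft_close;

    chdir $top or die "chdir $top: $!\n";
}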
Why bundle the dependencies? Why not just use whatever apache binary we find lying around? In short,
we didn't want to spend all of our time troubleshooting insane local configurations and builds where
someone missed a step. A predictable runtime environment is important.
To stress that point a little more, if your software is an internal application that's going to be run on
dedicated hardware, you can save yourself a lot of trouble by only supporting very specific configurations.
Just as an example, only supporting one version of one operating system cuts down the time and resources
you need for QA testing. To this end, we specified exact versions of Perl, MySQL, Red Hat Linux, and a
set of required packages and install options in addition to the things we bundled in our releases.
That was the theory anyway. Reality intruded a bit here in the form of cheap legacy hardware that would
work with some versions of Red Hat and not others. If we had a uniform cluster of hardware, we could
have gone as far as creating automated installs, maybe even network booting, but the best we were able to
do was keep our list of supported OS versions down to a handful. This is also a place where human nature
can become a problem. If you have a separate sysadmin group, they can get territorial when developers try
to dictate details of the OS to install. But that's another separate topic.
The automated build worked out very well. Eventually though, as we added more modules, the builds
started taking longer than we would have liked. Remember, we built them on every machine. Not the most
efficient thing to do.
The obvious next step would be binary distributions, possibly using RPMs, or just tar balls. Not trivial, but
not too bad if you can insist on one version of Perl and one hardware architecture. If we were only
concerned about distributing the CPAN modules, it might be possible to use something existing like PAR.
If you're interested in seeing this build system, the Krang CMS (which we used) comes with a version of it,
along with a pretty nice automated installer that checks dependencies and can be customized for different
OSes. (http://krangcms.com/) You could probably make your own for the CPAN stuff using CPANPLUS,
but you'd still need to do the Expect part and the non-CPAN builds.
QA
Upgrades
We didn't automate upgrades enough. Changes on a production system are tense for everyone, and it's much
better to have them automated so that you can fully test them ahead of time and make the actual work to be
done in the upgrade process as dumb as possible. We didn't fully automate this, but we did fully automate
one of the crucial parts of it: data and database schema upgrades.
Our procedure was pretty simple, and coincidentally similar to the Ruby on Rails schema upgrade
approach. We kept the current schema version number in the database and the code version number in the
release package, and when we ran our upgrade utility it would look for any upgrade scripts with versions
between the one we were on and the one we wanted to go to. For example, when going from version 2.0 to
3.0, it would look in the upgrade/ directory (also in our install bundle), find scripts named V2.1 and V3.0,
and run them in order. Usually they just ran SQL scripts, but sometimes we needed to do some things in
perl as well.
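A bare-bones sketch of that upgrade runner might look something like the following; the table name, file naming, and connection details are made up for the example, and our real utility did more (error handling, logging, and so on).

use strict;
use warnings;
use DBI;

# Hypothetical sketch of the upgrade utility: compare the schema version
# stored in the database with the version shipped in the release, then
# run every upgrade script in between, in order.
my $dbh = DBI->connect('dbi:mysql:arcos', 'arcos', 'secret',
                       { RaiseError => 1 });

my ($db_version) = $dbh->selectrow_array('SELECT version FROM schema_version');
my $code_version = '3.0';    # normally read from the release package

# Upgrade scripts are named for the version they bring you up to,
# e.g. upgrade/V2.1.pl and upgrade/V3.0.pl (names invented for the example).
my @pending =
    sort { $a->[0] <=> $b->[0] }
    grep { $_->[0] > $db_version && $_->[0] <= $code_version }
    map  { m{/V([\d.]+)\.pl$} ? [ $1, $_ ] : () }
    glob 'upgrade/V*.pl';

for my $step (@pending) {
    my ($version, $file) = @$step;
    print "Running upgrade to $version: $file\n";
    system($^X, $file) == 0 or die "Upgrade $file failed: $?\n";
    $dbh->do('UPDATE schema_version SET version = ?', undef, $version);
}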
Our SQL upgrade scripts were written by hand. I tried a couple of schema diffing utilities but they were
pretty weak. They didn't pick up things like changes in default value for a column, or know what to do with
changes in foreign keys. Maybe someday someone will make a good one. Even then, it will still require
some manual intervention when columns and tables get renamed, or a table gets split into multiple tables.
One cool thing we discovered recently is a decent way to test these upgrades on real data. We always set up
a QA server with a copy of the current version of the system, and then try our upgrade procedure and
continue with testing. This works fine except that when you fix a bug and need to do it again, it takes
forever to set it up again. We tried VMWare snapshots, but the disk performance for Linux on VMWare
was so poor that we had to abandon it. Backups over the network seemed like they would take a long time
to restore. Then we tried LVM, the Linux Logical Volume Manager. It let us take a snapshot just before the upgrade
test, and then roll back to it almost instantly.
Time-travel bug
Plugin System
Harder than it sounds
Simple factory works for most things
Configuration
The trouble with highly configurable software is that someone has to configure it. Our configuration options
expanded greatly as time went on, and we had to devise ways to make configuring it easier.
We started with a simple config file containing defaults and comments, like the one that comes with
Apache. In fact it was very much like that one because we used Config::ApacheFormat.
In the beginning, this worked fine. Config::ApacheFormat supplied a concept of blocks that inherit from
surrounding blocks, so that if you have a block for each server and a parameter that applies to all of them,
you can put it outside of those blocks and avoid repeating it. You can even override that parameter in the
one server that needs something different.
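To make the inheritance idea concrete, here is a toy example; the directive and block names are invented for illustration, not taken from our real config.

use strict;
use warnings;
use Config::ApacheFormat;

# A toy config (arcos.conf) with a shared setting and one override:
#
#   FromDomain example.org
#   <Server www.example.org>
#   </Server>
#   <Server shop.example.org>
#       FromDomain shop.example.org
#   </Server>

my $config = Config::ApacheFormat->new();
$config->read('arcos.conf');

# A block inherits anything it doesn't set itself...
my $www = $config->block(Server => 'www.example.org');
print $www->get('FromDomain'), "\n";    # example.org (inherited)

# ...and can override just the one thing it needs to change.
my $shop = $config->block(Server => 'shop.example.org');
print $shop->get('FromDomain'), "\n";   # shop.example.org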
As the number of parameters grew, we realized a few things:
People will ignore configuration options they don't understand. The expectation is that if the server
starts, it must be okay.
A few lines of comments in a config file are pretty weak documentation.
Long config files full of things that you hardly ever need to change are pointless and look daunting.
To deal with these problems, we started making extensive use of default values, so that things that didn't
usually get changed could be left out of the file. We ended up creating a fairly complex config system in
order to keep the file short. It does things like default several values based on one setting, e.g. setting the
domain name for a server allows it to default the cookie domain, the e-mail account to use as the From
address on site-related mail, etc.
Of course this created the necessity to see what all of the values were defaulting to, so a config dumper
utility was created.
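The defaulting logic itself was nothing exotic. As a hypothetical sketch of the idea (the setting names and derivation rules are invented here), one value fans out into several derived defaults unless the config file overrides them, and a small dumper shows what everything resolved to:

use strict;
use warnings;

# Hypothetical: one setting (the domain) fans out into several derived
# defaults unless the config file supplies them explicitly.
sub apply_defaults {
    my %conf = @_;
    my $domain = $conf{domain} or die "domain is required\n";

    $conf{cookie_domain} ||= ".$domain";
    $conf{from_address}  ||= "webmaster\@$domain";
    $conf{base_url}      ||= "http://$domain/";

    return %conf;
}

# A trivial "config dumper" so people can see what everything defaulted to.
sub dump_config {
    my %conf = @_;
    printf "%-15s %s\n", $_, $conf{$_} for sort keys %conf;
}

dump_config(apply_defaults(domain => 'example.org'));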
By the time we were done, we had moved to a level where using one of the complex config modules like
Config::Scoped probably would have been a better choice than maintaining our own. Well, Config::Scoped
still scares me, but something along those lines.
Testing
You all know the deal with testing. You have to have it. It's your only hope of being able to change the
code later without breaking everything. This point became very clear to me when I did a couple of big
refactorings and the test suite found all kinds of problems I missed on my own.
For any large application, you'll probably end up needing some local test libraries that save setup work in
your test scripts. Ours had functions for doing common things like getting a WWW::Mechanize object all
logged in and ready to go.
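For example, a shared test library along these lines keeps every test script from repeating the login dance; the module name, URL, and form fields are invented for this sketch.

package Arcos::Test;    # hypothetical shared test library
use strict;
use warnings;
use base 'Exporter';
use Test::WWW::Mechanize;

our @EXPORT_OK = qw(logged_in_mech);

# Return a Test::WWW::Mechanize object that is already logged in, so test
# scripts can get straight to the interesting part.
sub logged_in_mech {
    my %args = (
        base     => 'http://localhost:8080',
        username => 'test_admin',
        password => 'secret',
        @_,
    );

    my $mech = Test::WWW::Mechanize->new;
    $mech->get_ok("$args{base}/login", 'fetched login page');
    $mech->submit_form_ok(
        { with_fields => { username => $args{username},
                           password => $args{password} } },
        'logged in',
    );
    return $mech;
}

1;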
When you're testing a large database-driven application, you need some strategies for generating and
cleaning up test data. We created a module for this called Arcos::TestData. (Arcos is the name of the
project.) The usage is like this:
use Arcos::TestData;

my $creator = Arcos::TestData->new();
END { $creator->cleanup() }

# create an Arcos::DB::ContactInfo
my $contact_info = $creator->create_contact_info();

# create one with some values specified
my $aubrey = $creator->create_contact_info(
    first_name => 'George',
    occupation => 'housecat',
);
This one is simple, but some of them will create a whole tree of dependent objects with default values to
avoid needing to code all that in your test. When the END block runs, it deletes all the registered objects in
reverse order, to avoid referential integrity problems.
This seemed very clever at the time. However, after a while there were many situations that required special
handling, like web-based tests that cause objects to be created by another process. We had solutions for
each one, but they took programmer time, and at this point I think it might have been smarter to simply wipe
the whole schema at the end of a test script. We could have just truncated all the non-lookup tables pretty
quickly.
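Something like this would have been enough for the wipe-everything approach; the table names are hypothetical, and MySQL is assumed, as in our stack.

use strict;
use warnings;
use DBI;

# Hypothetical end-of-test cleanup: truncate every table except the lookup
# tables, instead of tracking and deleting each created object individually.
my %lookup = map { $_ => 1 } qw(countries states occupation_types);

sub wipe_test_data {
    my ($dbh) = @_;
    my $tables = $dbh->selectcol_arrayref('SHOW TABLES');
    $dbh->do('SET FOREIGN_KEY_CHECKS = 0');
    $dbh->do("TRUNCATE TABLE $_") for grep { !$lookup{$_} } @$tables;
    $dbh->do('SET FOREIGN_KEY_CHECKS = 1');
}

my $dbh = DBI->connect('dbi:mysql:arcos_test', 'arcos', 'secret',
                       { RaiseError => 1 });
END { wipe_test_data($dbh) if $dbh }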
We got a lot of mileage out of Test::WWW::Mechanize.
Test::Class helps similar classes
Testing web interfaces - Mech tricks - Selenium
Smolder
Testing difficult things
Code Formatting
This was the first project I worked on where we had an official Perl::Tidy spec and we all used it. Can I just
say it was awesome? That's all I wanted to say about it. Developers who worked on Perl::Tidy, you have
my thanks.
Version Control
A couple of years ago, only crackpots had opinions about version control. CVS was the only game in town.
These days, there are several good open source choices and everyone wants to tell you about their favorite
and why yours is crap.
I'm not going to go into the choice of tools too much here. You can fight that out amongst yourselves. We
used Subversion, but I'll try to talk about the theory without getting bogged down in the mechanics.
Most projects need at least two branches: one for maintenance of the release currently in production, and
one for new development. Most of you are familiar with this from open source projects.
Here are the main ideas we used for source control:
The main branch is for new development, but must be stable. Code should not be checked in until
all tests pass. (But more about that later.)
When you make a release of the main branch, tag it. That means tagging the whole branch at that
point. Example: tag release 2.0. The main branch is now for development of 3.0.
For each main branch release, make a maintenance branch from the point where you tagged it.
Example: make a "2.x" branch for fixing bugs that show up in production.
When you make a bug fix release from a maintenance branch, tag the branch and then merge all
changes since the last release on that branch to the main branch. This is the only merging ever done
and it's always a merge of changes from one sequentially numbered tag to the next and into the main
branch. Example: tag the 2.x branch bug fix release as 2.1. Merge all changes from 2.0 to 2.1 to the
main development branch.
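In Subversion terms, the release steps looked roughly like the following sketch; the repository URL and version numbers are illustrative, and this shows the shape of the commands rather than our exact release script.

use strict;
use warnings;

# Illustrative only: the svn operations behind "tag the release, make a
# maintenance branch, and later merge its fixes back to the main branch".
my $repo = 'https://svn.example.com/arcos';

sub svn { system('svn', @_) == 0 or die "svn @_ failed: $?\n" }

# Release 2.0 from the main development branch.
svn('copy', "$repo/trunk", "$repo/tags/release-2.0",
    '-m', 'Tag release 2.0');
svn('copy', "$repo/tags/release-2.0", "$repo/branches/2.x",
    '-m', 'Maintenance branch for the 2.x series');

# Later: a bug-fix release from the maintenance branch...
svn('copy', "$repo/branches/2.x", "$repo/tags/release-2.1",
    '-m', 'Tag bug-fix release 2.1');

# ...and merge everything from 2.0 to 2.1 back into the main branch
# (run inside a trunk working copy).
svn('merge', "$repo/tags/release-2.0", "$repo/tags/release-2.1", '.');
svn('commit', '-m', 'Merge 2.0 -> 2.1 bug fixes into trunk');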
This is about as simple as you can make it, and it worked very well for us for a long time. Eventually
though, we discovered situations that didn't fit nicely. One of these was that sometimes there was a period
of a few days during QA where part of the team would still be working on bug fixes on the development
branch while others were ready to move on to working on features for the next major release. You can't do
both in the same place. One solution is to create the maintenance branch at that point, for doing the final
pre-release bug fixes, and let the main branch open up for major new development. It's a bad sign if you
need to do this often. Usually the team should be sharing things evenly enough to make it unnecessary.
Another problem, although less frequent than you might expect, is keeping the development branch stable at
all times. Some changes are too big to be done safely as a single commit. At that point it becomes necessary
to make a feature branch, working on it until the new feature is stable and all tests are passing again, and
then merging it back to the main development branch.
Beware of complicated merging, whether your tools support it well or not. A web app is not the Linux
kernel. If you find yourself needing to do bidirectional merges or frequent repeated merges to the point
where you have trouble keeping track of what's been merged, you may need to take a look at your process
and see if there's some underlying reason. Maybe the source control system is being used as a substitute for
basic personal communication on your team, or has become a battleground for warring factions. Some
problems are easier to solve by talking to your co-workers than by devising a complex branching scheme.