Let's say you're a data scientist, and you've been asked to build infrastructure. Here I've distilled some best practices as an introduction for people who are new to DevOps.
JustEnoughDevOpsForDataScientists
2. Just Enough DevOps for Data Scientists
abida@salesforce.com
@anyabida1
Anya Bida, SRE at Salesforce
3. About Anya
Sr. Member of Technical Staff (SRE)
Salesforce Production Engineering
Salesforce Einstein Platform
Co-organizer SF Big Analytics
Spark Tuning
• Cheat-sheet
• Talks
Previously at Alpine Data, SRI
PhD Mayo Clinic, BS Johns Hopkins
@anyabida1
4. What I am going to talk about
What is DevOps
Salesforce Einstein Scales
Our goal
Top 10 tips
What’s next?
11. Tip 1: Plan for Failure
Take off that Data Scientist hat now.
12. Simple Dashboard with KPIs
13. Tip 1: Plan for Failure
Simple Dashboard with KPIs
• Request & error rates
• Longest response times - upper 95th & 99th percentile
• Capacity
• Events
Jos Boumans, Salesforce DMP slides: https://www.slideshare.net/jiboumans/how-to-measure-everything-a-million-metrics-per-second-with-minimal-developer-overhead
14. Tip 1: Plan for Failure
Collect metrics from every machine.
Troubleshoot with all the metrics at your disposal.
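The dashboard KPIs named on this slide can be derived from raw request records. The following is a minimal sketch, assuming each request is logged as a (response time in ms, error flag) pair; the function names and the sample data are hypothetical, and the nearest-rank percentile is just one common definition.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def dashboard_kpis(requests, window_seconds):
    """requests: list of (response_time_ms, is_error) tuples seen in the window."""
    times = [t for t, _ in requests]
    errors = sum(1 for _, is_error in requests if is_error)
    return {
        "request_rate_per_s": len(requests) / window_seconds,
        "error_rate": errors / len(requests),
        "p95_ms": percentile(times, 95),
        "p99_ms": percentile(times, 99),
    }

# Hypothetical sample: 100 requests in a 10 second window, the two slowest failed.
sample = [(float(ms), ms > 98) for ms in range(1, 101)]
kpis = dashboard_kpis(sample, window_seconds=10)
print(kpis)
```

Tracking the upper percentiles rather than the mean keeps the slowest requests visible, which is where failures tend to show up first.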
15. Tip 2: Blue Green Deployments
https://docs.mobingi.com/official/guide/bg-deploy
Blue Machine (old)
Green Machine (new)
Users
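The blue/green idea can be sketched in a few lines: users always talk to one "live" pool, and traffic is switched to the other pool only after it passes a health check. This is a toy model under stated assumptions, not a real load balancer; the class, the handler functions, and the health check are all hypothetical.

```python
class BlueGreenRouter:
    """Routes every request to the live pool; the other pool sits idle until cut-over."""

    def __init__(self, blue, green):
        self.pools = {"blue": blue, "green": green}
        self.live = "blue"  # users start on the old (blue) machines

    def handle(self, request):
        return self.pools[self.live](request)

    def cut_over(self, health_check):
        """Switch traffic to the idle pool only if it passes the health check."""
        idle = "green" if self.live == "blue" else "blue"
        if not health_check(self.pools[idle]):
            return False  # keep serving from the current pool
        self.live = idle
        return True

# Hypothetical handlers standing in for the old and new deployments.
def blue_app(request):
    return f"v1:{request}"

def green_app(request):
    return f"v2:{request}"

router = BlueGreenRouter(blue_app, green_app)
before = router.handle("predict")
switched = router.cut_over(lambda app: app("ping").endswith("ping"))
after = router.handle("predict")
```

Because the old pool is left untouched, rolling back is just another call to `cut_over`.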
16. Tip 3: Assume people make mistakes
Technical debt
• Every manual change
• Duplicate metrics
Scale down resources
• Terminate unused machines
• Janitor Monkey
• Understand the cost per job
• Jobs should not accumulate files on disk
17. Tip 4: Changes should be auditable
Schaper - the tool to compare schemas
https://www.linkedin.com/in/huqixiu/
Qixiu “Q” Hu
18. Tip 4: Changes should be auditable
CREATE TABLE myConferences (
    name text,
    city text,
    early_bird timeuuid,
    late_bird timeuuid,
    PRIMARY KEY ((name, city), early_bird)
) WITH CLUSTERING ORDER BY (early_bird DESC);
CREATE TABLE myConferences (
    name text,
    city text,
    early_bird timeuuid,
    late_bird timeuuid,
    PRIMARY KEY ((name, city), early_bird)
) WITH CLUSTERING ORDER BY (early_bird DESC);
19. Tip 4: Changes should be auditable
CREATE TABLE myConferences (
    name text,
    city text,
    early_bird timeuuid,
    late_bird timeuuid,
    PRIMARY KEY ((name, city), early_bird)
) WITH CLUSTERING ORDER BY (early_bird DESC);
CREATE TABLE myConferences (
    name text,
    city text,
    early_bird timeuuid,
    late_bird timeuuid,
    discount_code text,
    PRIMARY KEY ((name, city), early_bird)
) WITH CLUSTERING ORDER BY (early_bird DESC);
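The comparison between the two table definitions can be illustrated with a small diff over column maps. This is only a sketch of the idea, not how Schaper itself works (the tool's internals are not shown in the talk); the function name, the report keys, and the dictionary representation of a schema are assumptions.

```python
def diff_schemas(left, right):
    """Compare two {column: type} mappings and report what differs."""
    shared = set(left) & set(right)
    return {
        "only_in_left": sorted(set(left) - set(right)),
        "only_in_right": sorted(set(right) - set(left)),
        "type_changed": sorted(c for c in shared if left[c] != right[c]),
    }

def has_drift(report):
    """True if any category of the report is non-empty."""
    return any(report.values())

# Columns of the myConferences table, as they might be read from each replica.
replica_a = {
    "name": "text",
    "city": "text",
    "early_bird": "timeuuid",
    "late_bird": "timeuuid",
}
# The second replica has picked up an extra discount_code column.
replica_b = dict(replica_a, discount_code="text")

report = diff_schemas(replica_a, replica_b)
print(report)
```

Running a check like this on a schedule, and alerting when `has_drift` is true, is one way to make a schema change visible as soon as it happens.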
20. Tip 5: Configuration management
Network Connectivity
• 20 parameters
User Access
• 50 parameters
Deploy cluster (e.g. Mesos)
• 20 non-default parameters
Deploy a microservice
• 50 parameters
Schedule a job
• 3 parameters
SUM x 3 regions x 20 metrics = approx. 6,000
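To see how the count reaches the thousands, the slide's figures can be multiplied out. This is a rough sketch that uses the parameter counts exactly as listed on the slide; the dictionary keys are just labels chosen here. The literal product lands somewhat above the slide's rounded figure, but in the same range.

```python
# Parameter counts per task, copied from the slide.
parameters_per_region = {
    "network_connectivity": 20,
    "user_access": 50,
    "deploy_cluster": 20,       # non-default parameters only
    "deploy_microservice": 50,
    "schedule_job": 3,
}
REGIONS = 3
METRICS = 20

per_region = sum(parameters_per_region.values())
total_settings = per_region * REGIONS * METRICS

print(f"{per_region} parameters per region")
print(f"{total_settings} settings across {REGIONS} regions and {METRICS} metrics")
```

Either way, the point stands: nobody can keep several thousand hand-entered settings consistent, which is the case for configuration management.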
21. Templates for Automation
Service discovery
Creating dashboards
• Prod, non-prod, …
Log queries
Cost analysis
Tip 6: Pick a naming convention
<service>.<environment>.<region>.<hostname>.<metric>
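A naming convention pays off when names are built and parsed by code rather than typed by hand. Below is a minimal sketch for the five-field pattern above; the helper names are hypothetical, and replacing dots inside a field with underscores is an assumed policy (dots are the field separator, so they cannot appear inside a field).

```python
from typing import NamedTuple

class MetricName(NamedTuple):
    service: str
    environment: str
    region: str
    hostname: str
    metric: str

def _clean(part):
    # Dots separate fields, so dots inside a field are replaced (assumed policy).
    return part.strip().lower().replace(".", "_")

def build_metric_name(service, environment, region, hostname, metric):
    """Join the five fields into <service>.<environment>.<region>.<hostname>.<metric>."""
    fields = (service, environment, region, hostname, metric)
    return ".".join(_clean(f) for f in fields)

def parse_metric_name(name):
    """Split a metric name back into its five fields."""
    parts = name.split(".")
    if len(parts) != len(MetricName._fields):
        raise ValueError(f"expected 5 dot-separated fields, got {len(parts)}: {name!r}")
    return MetricName(*parts)

# Hypothetical service and host names.
name = build_metric_name("scoring", "Prod", "us-west-2", "ip-10-0-0-12.internal", "error_rate")
parsed = parse_metric_name(name)
```

With every name following the same shape, the templates listed on this slide (dashboards, log queries, cost analysis) can select by field, for example everything where `parsed.environment == "prod"`.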
22. Tip 7: Permissions
Every user, service, & job should have specific, auditable permissions.
Cluster Manager
Scheduler
IAM
IAM Roles
• User has an IAM Role
• Job has an IAM Role
• IAM Roles determine read / write access to data
IAM
Out
Logs
IAM
In
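The rule that every user, service, and job gets specific, auditable permissions can be modeled in a few lines. This is a toy sketch of role-based access with an audit trail, not the API of any real IAM service; the class names, role names, and dataset names are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Role:
    """Hypothetical role: the datasets it may read and the datasets it may write."""
    name: str
    readable: frozenset = frozenset()
    writable: frozenset = frozenset()

@dataclass
class AccessChecker:
    roles_by_principal: dict                       # user, service, or job name -> Role
    audit_log: list = field(default_factory=list)  # every decision is recorded

    def is_allowed(self, principal, action, dataset):
        role = self.roles_by_principal.get(principal)
        if role is None:
            allowed = False                        # unknown principals are denied
        elif action == "read":
            allowed = dataset in role.readable
        elif action == "write":
            allowed = dataset in role.writable
        else:
            allowed = False                        # unknown actions are denied
        self.audit_log.append((principal, action, dataset, allowed))
        return allowed

analyst = Role("analyst", readable=frozenset({"events"}))
training_job = Role("training-job", readable=frozenset({"events"}),
                    writable=frozenset({"models"}))
checker = AccessChecker({"alice": analyst, "nightly-train": training_job})

user_can_read = checker.is_allowed("alice", "read", "events")
user_can_write = checker.is_allowed("alice", "write", "models")
job_can_write = checker.is_allowed("nightly-train", "write", "models")
```

Denying by default and logging both grants and denials is what makes the permissions auditable, not just enforced.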
23. Tip 8: Understand resource allocation
Understanding Memory Management in Spark For Fun And Profit, Shivnath Babu (Duke University, Unravel Data Systems) and Mayuresh Kunjir (Duke University)
Node Memory
Container Memory
8 GB
Node Memory
Container Memory
8 GB
29. Getting started tips:
1. Plan for failure
2. Blue / Green Deployments
3. Assume people make mistakes
4. Changes should be auditable
5. Configuration management
6. Pick a naming convention
7. Permissions
• user, service, job
8. Understand resource allocation
9. Monitor multiple viewpoints
10. Infrastructure as Code
31. Did we just automate ourselves out of our jobs?
Nope. Now we have time to take on new projects and grow…
32. More info:
Jos Boumans, Salesforce DMP: slides
SRE: How Google Runs Production Systems (book)
James Ward, Engineering & Open Source Ambassador at Salesforce
High Performance Spark (book)
33. More info:
Real Time ML Pipelines in Multi-Tenant Environments
Director of Engineering Karl Skucha & Lead Engineer Yan Yang
Introduction to Machine Learning
Engineering & Open Source Ambassador James Ward
Fantastic ML apps and how to build them
Principal Engineer, Matthew Tovbin
Fireworks - lighting up the sky with millions of Sparks
Director of Engineering Thomas Gerber
Functional Linear Algebra in Scala
Engineer & Professor Vlad Patryshev
Panel: Functional Programming for Machine Learning
Saturday @ 2:10pm: Complex Machine Learning Pipelines Made Easy
Machine Learning Engineers Till Bergmann & Chris Rupley
What DevOps actually is:
-- a cross section of infrastructure
-- all the things data scientists need to support themselves at scale
We need to build infrastructure that scales at the pace of Salesforce.
Salesforce Einstein is serving 475 million predictions per day, and growing. So how do we do this from an infra perspective?
Even if you do everything right, machines WILL fail.
Collect metrics by installing statsd on every machine.
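Once a StatsD daemon is listening on a machine, emitting a metric is a single UDP datagram in the plain-text form `name:value|type`. Below is a minimal sketch using only the standard library; it assumes a daemon on localhost at UDP port 8125 (the usual StatsD default), and the helper names and the metric name are hypothetical.

```python
import socket

# StatsD metric types used here: "c" = counter, "ms" = timer, "g" = gauge.
def format_statsd(name, value, metric_type):
    """Build one StatsD line, e.g. 'api.requests:1|c'."""
    return f"{name}:{value}|{metric_type}"

def send_metric(name, value, metric_type="c", host="127.0.0.1", port=8125):
    """Fire-and-forget: UDP means the application never blocks on the metrics daemon."""
    payload = format_statsd(name, value, metric_type).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload

counter_line = format_statsd("scoring.prod.us-west-2.host1.requests", 1, "c")
timer_line = format_statsd("scoring.prod.us-west-2.host1.response_time", 320, "ms")
```

A call such as `send_metric("scoring.prod.us-west-2.host1.requests", 1)` then counts one request; in practice a maintained StatsD client library is the more robust choice.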
Should I automate the file removal?
Better: keep your files in a distributed, versioned storage system.
The infra team will monitor disk usage.
Let's say I have a database with one replica on the east coast, and one replica on the west coast.
My database schema, here represented as a table, is as follows.
Right now my schemas are identical across data centers.
But if someone changes the schema for one of my replicas, I want to know immediately.
So my schemas should be auditable.
Q on our SRE team built the tool Schaper to compare schemas. Schaper is generic - it supports Elasticsearch, Cassandra, MongoDB, etc., and provides a report when there is a schema change. I NEED TO KNOW when my schema changes. Obviously this could be very important information. Wink, wink.
Schaper is also modular - it’s plug-n-play. So this is an example of how we ensure changes are auditable. Cassandra: Keyspaces
Database replication
Schaper is one example of the type of tools that could be built to audit changes. From the audit, we can automate some action, depending on the particular change or …
We haven’t open sourced this tool yet; it's just an example.
When to automate? Any task that’s done 10x per year should be automated.
IaC should be correct, comprehensible, and composable.
How the number of clicks can get so big: 20 clicks per cluster x 3 regions x 20 metrics
IaC
-- networking layer
-- provisioning
-- build and deploy
-- monitoring
-- manage
IAM definition: Identity and access management
Authorization & Authentication
OK, so I’ve got my container, which uses maybe 8 GB of RAM. Now I want to know if my container can launch on my cluster.
So my cluster has 3 nodes, let’s say, and 8 GB total RAM on each node. CAN MY 8 GB CONTAINER LAUNCH ON THIS CLUSTER?
Since 4 GB of RAM is already in use on each node, each node has only 4 GB free. The cluster-level available memory is 4 x 3 = 12 GB, which looks like plenty, but no single node can fit an 8 GB container. So if I only monitor cluster-level metrics, my container will fail to launch and the dashboard won't tell me why.
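The gap between the cluster-level view and the node-level view can be checked directly. This is a minimal sketch with hypothetical node records and function names; it uses the numbers from the example above: three nodes with 8 GB each, 4 GB already in use on each.

```python
def free_memory_gb(nodes):
    """Free memory on each node, in GB."""
    return [node["total_gb"] - node["used_gb"] for node in nodes]

def fits_by_cluster_total(nodes, container_gb):
    """The misleading check: compares the container to the summed free memory."""
    return sum(free_memory_gb(nodes)) >= container_gb

def fits_on_some_node(nodes, container_gb):
    """The real constraint: a container must fit entirely on one node."""
    return any(free >= container_gb for free in free_memory_gb(nodes))

nodes = [{"name": f"node-{i}", "total_gb": 8, "used_gb": 4} for i in range(3)]

cluster_view = fits_by_cluster_total(nodes, container_gb=8)
node_view = fits_on_some_node(nodes, container_gb=8)
print(f"cluster-level check says it fits: {cluster_view}")
print(f"node-level check says it fits: {node_view}")
```

The two checks disagree for this cluster, which is why monitoring needs the per-node numbers alongside the cluster totals.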
The image above shows sample connectivity for development, staging and production environments. It helps us verify there are no unintended rules, etc.
Mention the three lone servers - should we review these? Are these supposed to be there?
This tool is not open sourced, but just an example of the internal tools we build - and you can too!
Double clicking a node shows its connectivity. This is useful for debugging issues.
We can filter by resource type, names, tags etc.
Taken together, hopefully I’ve convinced you that each piece of your infra should be deployed and managed as code.
This has been “Just enough devops for data scientists”