The document describes using the Lithops framework to simplify serverless data pre-processing of images by extracting faces and aligning them. Lithops allows processing millions of images located in different storage locations in a serverless manner without having to write boilerplate code to access storage or partition data. It handles parallel execution, data access, and coordination to run a user-defined function that pre-processes each image on remote servers near the data. This avoids having to move large amounts of data and allows leveraging serverless cloud compute resources to speed up processing times significantly compared to running everything locally.
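To make the pattern concrete, here is a minimal sketch of the approach described above, assuming the open-source `lithops` Python package with a configured storage and compute backend; the bucket name and the face-processing body are placeholders, not taken from the original document.

```python
# Minimal Lithops sketch: fan a per-image function out over object storage.
# Assumes `pip install lithops` and a configured backend; the bucket name
# and the face-detection logic are placeholders.
import lithops

def preprocess_image(obj):
    # Lithops partitions the bucket and invokes one function per object;
    # obj.data_stream streams the image bytes close to where they live.
    image_bytes = obj.data_stream.read()
    # ... detect and align faces here ...
    return len(image_bytes)

fexec = lithops.FunctionExecutor()
fexec.map(preprocess_image, 'cos://my-images-bucket/')  # hypothetical bucket
results = fexec.get_result()
```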
Netflix uses Conductor, an open source microservices orchestrator, to manage complex content processing workflows involving ingestion, encoding, localization, and delivery. Conductor provides visibility, control, and reuse of tasks through a task queuing system and workflow definitions. It has scaled to process millions of workflow executions across Netflix's content platform using a stateless architecture with Dynomite for storage and Dyno-Queues for task distribution.
The document compares the API gateways Kong and Traefik. Kong is easy to install and maintain, offers great performance, and integrates flexibly with Kubernetes ingress. However, it lacks an official dashboard, and plugins must be written in Lua. Traefik is very simple to configure and use and integrates well with many cloud systems, but it has less documentation and lacks some of Kong's advanced features. Overall, both tools are suitable API gateways; Kong may be better for more complex needs, while Traefik is simpler to use and get started with.
SoftwareCircus 2020 "The Past, Present, and Future of Cloud Native API Gateways"
An API gateway is at the core of how APIs are managed, secured, and presented within any web-based system. Although the technology has been in use for many years, it has not always kept pace with recent developments within the cloud native space, and many engineers are confused about how a cloud native API gateway relates to Kubernetes Ingress or a Service load balancer.
Join this session to learn about:
– The evolution of API gateways over the past ten years, and how the original problems they were solving have shifted in relation to cloud native technologies and workflows
– Current challenges of using an API gateway within Kubernetes: scaling the developer workflow; and supporting multiple architecture styles and protocols
– Strategies for exposing Kubernetes services and APIs at the edge of your system
– A brief guide to the (potential) future of cloud native API gateways
Cloud-native data infrastructure such as Confluent and Kubernetes combines well to enable teams to use declarative, spec-based automation (GitOps) for deployment and management. In this talk, we'll showcase how to use GitOps to manage data in motion with Confluent for Kubernetes.
Why you should have a Schema Registry | David Hettler, Celonis SE
Kafka moves blobs of data from one place to another. That's its job. Kafka doesn't care what the blob is or what it looks like. This can be a boon because it's simple and it allows for a multitude of use cases. It can also be a curse in those cases when you DO want to have control over what that blob may look like.
Especially when you want to share a topic with another team, it is important to have clear-cut rules for what you do and do not allow on that topic. In other words, you need a clearly defined interface contract.
In the RESTful world the case is clear: You would define an OpenAPI spec and give it to the other team. Done. What about the event streaming case though? Would you treat your topic like an API? If you're not sure about the answer then this talk is for you. You'll learn about the schema registry, a centralized data governance tool which allows you to define, and more importantly, enforce interface contracts among Kafka clients.
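As a hedged illustration of treating a topic like an API, here is a Python sketch using the `confluent-kafka` client with its Schema Registry Avro serializer; the endpoints, topic, and schema below are invented for the example.

```python
# Sketch: enforce an interface contract by serializing through Schema Registry.
# Assumes `pip install "confluent-kafka[avro]"`; endpoints and names are made up.
from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer

schema_str = """
{"type": "record", "name": "Order", "fields": [
  {"name": "id", "type": "string"},
  {"name": "amount", "type": "double"}
]}
"""

registry = SchemaRegistryClient({"url": "http://localhost:8081"})
producer = SerializingProducer({
    "bootstrap.servers": "localhost:9092",
    # A record that violates the schema fails at serialization time,
    # so the contract is enforced before anything reaches the topic.
    "value.serializer": AvroSerializer(registry, schema_str),
})
producer.produce(topic="orders", value={"id": "o-1", "amount": 9.99})
producer.flush()
```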
apidays LIVE India - REST the Events - REST APIs for Event-Driven Architectur...
apidays LIVE India 2021 - Connecting 1.3 billion digital innovators
May 20, 2021
REST the Events - REST APIs for Event-Driven Architecture
Mark Teehan, Principal Solution Engineer at Confluent APAC
Helm Summit 2019 - Handling a Large Number of Charts - Sept 10
Now that you have an application running in Kubernetes, what will your next steps be? Can you deploy this application to any cloud? If someone else wishes to install your helm chart, would they have all the resources needed to deploy it successfully? Do you have a certification process to ensure your helm chart is enterprise ready? Creating a helm chart to deploy your application is just the first step; you now need a process to ensure that the helm chart follows guidelines established by your enterprise and that future versions of the chart are created efficiently as part of your CI/CD pipeline. In this presentation, you will learn about effective ways to create, organize, and maintain enterprise-grade helm charts. We will also discuss how our CI/CD pipeline is implemented using a custom linter and verification test cases to make sure only certified charts are promoted to production.
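The presenters' pipeline is not shown here, but a minimal chart-certification gate of the kind described could look like this Python wrapper around `helm lint` (the `charts/` layout is an assumption):

```python
# Sketch of a CI gate: lint every chart under charts/ and fail the build
# if any chart does not pass. Assumes the `helm` CLI is on PATH.
import pathlib
import subprocess
import sys

failures = []
for chart in sorted(pathlib.Path("charts").iterdir()):
    if not (chart / "Chart.yaml").exists():
        continue  # skip non-chart directories
    result = subprocess.run(["helm", "lint", str(chart)],
                            capture_output=True, text=True)
    if result.returncode != 0:
        failures.append((chart.name, result.stdout + result.stderr))

for name, output in failures:
    print(f"Chart {name} failed lint:\n{output}")
sys.exit(1 if failures else 0)
```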
Kubernetes-Native DevOps: For Apache Kafka® with Confluent
This document discusses Kubernetes-native DevOps for Apache Kafka using Confluent. It notes that Kubernetes is becoming the standard API for infrastructure and that Confluent's operator supports Kubernetes offerings. It also mentions that Confluent Platform can be run on Kubernetes using the operator across public cloud, private datacenter, edge locations, and local workstations.
Make Java Microservices Resilient with Istio - Mangesh - IBM - CC18
This presentation was made by Mangesh Patankar (Developer Advocate - IBM Cloud) as part of Container Conference 2018: www.containerconf.in.
"How do we make microservices resilient and fault-tolerant? How do we enforce policy decisions, such as fine-grained access control and rate limits? How do we enable timeouts/retries, health checks, etc.?
A service-mesh architecture attempts to resolve these issues by extracting the common resiliency features needed by a microservices framework away from the applications and frameworks and into the platform itself. Istio provides an easy way to create this service mesh."
GDG Taipei 2020 - Cloud and On-premises Applications Integration Using Event-...
This document provides an overview and demonstration of integrating cloud and on-premises applications using event-driven architecture. It discusses Function-as-a-Service (FaaS) platforms like Google Cloud Functions. It also describes Kafka Connect for scalably streaming data between Apache Kafka and other systems like Google Cloud Pub/Sub using source and sink connectors. The document demonstrates configuring Pub/Sub connectors to integrate Kafka topics with Cloud Pub/Sub topics.
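The connector configuration step mentioned above is typically a REST call to the Kafka Connect worker; a hedged Python sketch follows. The connector class and property names are recalled from the Google Cloud Pub/Sub Kafka connector's documentation, and all endpoints and names are assumptions to verify against your versions.

```python
# Sketch: register a Pub/Sub sink connector via Kafka Connect's REST API.
# Assumes `pip install requests` and a Connect worker on localhost:8083;
# the connector class and property names should be checked for your version.
import requests

connector = {
    "name": "pubsub-sink-demo",                       # hypothetical name
    "config": {
        "connector.class": "com.google.pubsub.kafka.sink.CloudPubSubSinkConnector",
        "topics": "demo-kafka-topic",                 # Kafka topic to drain
        "cps.project": "my-gcp-project",              # hypothetical GCP project
        "cps.topic": "demo-pubsub-topic",             # target Pub/Sub topic
    },
}
resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()
print(resp.json())
```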
Istio: Using nginMesh as the service proxy, Lee Calcote
With microservices and containers becoming mainstream, container orchestrators provide much of what the cluster (nodes and containers) needs. With container orchestrators' core focus on scheduling, discovery, and health at an infrastructure level, microservices are left with unmet, service-level needs, such as:
- Traffic management, routing, and resilient and secure communication between services
- Policy enforcement, rate-limiting, circuit breaking
- Visibility and monitoring with metrics, logs, and traces
- Load balancing and rollout/canary deployment support
Service meshes provide for these needs. In this session, we will dive into Istio - its components, capabilities, and extensibility. Istio envelops and integrates with other open source projects to deliver a full-service mesh. We'll explore these integrations and Istio's extensibility in terms of choice of proxies and adapters, such as nginMesh.
How Kubernetes Operators Can Rescue DevSecOps in the Midst of a Pandemic (updated)
This document discusses how Kubernetes operators can help automate DevSecOps processes. It begins by explaining why organizations adopt containers and Kubernetes. It then discusses the challenges of managing containerized workloads at scale and how Kubernetes operators can provide orchestration and management. It provides an overview of what operators are, how the Operator Framework works, and the phases of building an operator. It demonstrates building a sample memcached operator in Golang using the Operator SDK tools. Finally, it discusses different options for installing operators like Helm, Ansible, and custom operators and provides some useful links for learning more.
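The talk's sample operator is written in Go with the Operator SDK; to keep a single language for the examples in these notes, the same reconcile idea is sketched below in Python with the `kopf` framework. The CRD group, version, and plural are hypothetical.

```python
# Sketch of the operator pattern in Python with kopf (the talk itself builds
# a memcached operator in Go with the Operator SDK). Run with
# `kopf run file.py` against a cluster where the (hypothetical) CRD exists.
import kopf

@kopf.on.create('example.com', 'v1', 'memcacheds')
def create_fn(spec, name, namespace, logger, **kwargs):
    # Reconcile step: make the cluster match the desired spec, e.g. create
    # a Deployment sized to spec['size'] (resource creation omitted here).
    size = spec.get('size', 1)
    logger.info(f"memcached {namespace}/{name}: would ensure {size} replicas")
```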
Good observability is essential for modern software. It gives us confidence that our systems are working properly. And it also allows us to debug issues efficiently. In this talk, we’ll explore everything you need to know to start applying good observability to your projects. And we’ll see the most common pitfalls you need to be aware of. We will start with the tools and basic concepts in monitoring. And we’ll go over the 3 most common mistakes people make with it. Then we’ll see how to have automatic alerts to detect issues. And, we’ll touch on the principles for setting up good alerts. As a final step, we’ll see how to build our logging system and how to apply it in the most efficient way to debug issues easily.
This document discusses building cloud-native applications using microservices architecture on the .NET platform. It covers topics like leveraging containers and container orchestrators like Kubernetes, implementing common microservices patterns for communication, resiliency, and health checks. It also promotes the 12 factor app methodology and references resources for getting started with microservices on .NET including sample code. Case studies are provided of companies successfully using microservices at scale like Netflix, Uber, and WeChat.
In the era of the cloud generation, the constant activity around workloads and containers creates more vulnerabilities than an organization can keep up with. Using legacy security vendors doesn't set you up for success in the cloud. You're likely spending undue hours chasing, triaging, and patching a countless stream of cloud vulnerabilities with little prioritization.
Join us for this live webinar as we detail how to streamline host and container vulnerability workflows for your software teams wanting to build fast in the cloud. We'll be covering how to:
Get visibility into active packages and associated vulnerabilities
Reduce false positives by 98%
Reduce investigation time by 30%
Spot a legacy vendor looking to do some cloud washing
The Enterprise Service Bus is Dead! Long live the Enterprise Service Bus, Rim...
The document discusses how Heroku leveraged Apache Kafka to realize the vision of an enterprise service bus (ESB). It defines what an ESB is according to analysts and vendors. Heroku defined the API as its ESB but faced bottlenecks and reliability issues. It transitioned to using Kafka with a pull-based architecture for independent development, scalability, and avoiding single points of failure. Heroku now uses Kafka for operational data pipelines and metrics aggregation. It provides examples of using Kafka topics and discusses next steps of implementing a schema registry and security.
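The pull-based consumption model credited above looks roughly like this in Python; the broker address, group id, and topic are placeholders, not Heroku's actual names.

```python
# Sketch of pull-based consumption: each service polls at its own pace, so a
# slow consumer never becomes a bottleneck or single point of failure in the
# request path. Assumes `pip install confluent-kafka`; all names are made up.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "metrics-aggregator",      # hypothetical consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["operational-events"])  # hypothetical topic

try:
    while True:
        msg = consumer.poll(timeout=1.0)    # pull: the consumer sets the pace
        if msg is None or msg.error():
            continue
        print(msg.value())                  # stand-in for real processing
finally:
    consumer.close()
```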
This document discusses ideas and technologies for building scalable software systems and processing big data. It covers:
1. Bi-modal distribution of developers shapes architecture/design and the need for loosely/tightly coupled code.
2. Internet companies like Google and Facebook innovate at large scale using open source tools and REST architectures.
3. A REST architecture allows scalability, extensible development, and integration of tools/ideas from the internet for non-internet applications.
Your easy move to serverless computing and radically simplified data processing
- PyWren-IBM is a Python framework that allows users to easily scale Python code across serverless platforms like IBM Cloud Functions without having to learn the underlying storage or function as a service APIs.
- It addresses challenges like how to integrate existing applications and workflows with serverless computing, how to process large datasets without becoming a storage expert, and how to scale code without major disruptions.
- The document discusses use cases for PyWren-IBM like Monte Carlo simulations, protein folding, and stock price prediction that demonstrate how it can be used for high-performance computing workloads; a minimal Monte Carlo sketch follows below.
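A Monte Carlo estimate of pi is the canonical small version of such workloads; here is a sketch using the `lithops` package (the successor of PyWren-IBM), with all sizes chosen arbitrarily.

```python
# Monte Carlo pi in the PyWren-IBM/Lithops style: many small serverless tasks,
# one reduce on the client. Assumes `pip install lithops` and a configured
# backend; task and sample counts are arbitrary.
import random
import lithops

SAMPLES_PER_TASK = 1_000_000
TASKS = 100

def sample(task_id):
    hits = 0
    for _ in range(SAMPLES_PER_TASK):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

fexec = lithops.FunctionExecutor()
fexec.map(sample, range(TASKS))
total_hits = sum(fexec.get_result())
print(4 * total_hits / (TASKS * SAMPLES_PER_TASK))   # approximates pi
```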
End-to-End Deep Learning Deployment with ONNX, Nick Pentreath
The Open Neural Network Exchange (ONNX) standard has emerged for representing deep learning models in a standardized format. In this talk, I will discuss:
1. ONNX for exporting deep learning computation graphs, and the ONNX-ML component of the specification for exporting traditional ML models along with common feature extraction, data transformation, and post-processing steps.
2. How to use ONNX and the growing ecosystem of exporter libraries for common frameworks (including TensorFlow, PyTorch, Keras, scikit-learn and Apache SparkML) to deploy complete deep learning pipelines.
3. Best practices for working with and combining these disparate exporter toolkits, highlighting the gaps, issues, and missing pieces that must be taken into account and are still to be addressed.
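As a small concrete instance of point 2, here is a sketch that exports a scikit-learn model with the `skl2onnx` exporter and scores it with `onnxruntime`; the model and shapes are toy choices for illustration.

```python
# Sketch: export a scikit-learn model to ONNX and run it with onnxruntime.
# Assumes `pip install skl2onnx onnxruntime scikit-learn numpy`; toy data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
import onnxruntime as ort

X = np.random.rand(100, 4).astype(np.float32)
y = (X[:, 0] > 0.5).astype(int)
model = LogisticRegression().fit(X, y)

# Declare the input signature; ONNX graphs are statically typed.
onnx_model = convert_sklearn(
    model, initial_types=[("input", FloatTensorType([None, 4]))])

session = ort.InferenceSession(onnx_model.SerializeToString(),
                               providers=["CPUExecutionProvider"])
preds = session.run(None, {"input": X[:5]})
print(preds[0])
```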
The Download: Tech Talks by the HPCC Systems Community, Episode 11
Join us as we continue this series of webinars specifically designed for the community by the community with the goal to share knowledge, spark innovation and further build and link the relationships within our HPCC Systems community.
Episode 11 includes Tech Talks featuring speakers from our community on topics covering Big Data solutions, Spark Integration and other ECL Tips leveraging the HPCC Systems platform.
1) Raj Chandrasekaran, CTO & Co-Founder, ClearFunnel - Scaling Data Science capabilities: Leveraging a homogeneous Big Data ecosystem
2) James McMullan, Software Engineer III, LexisNexis Risk Solutions - HDFS Connector Preview
3) Bob Foreman, Senior Software Engineer, LexisNexis Risk Solutions - Building a RELATIONal Dataset - A Valentine’s Day Special!
This document provides an introduction to C# programming language. It outlines the goals of the introduction, provides background on .NET and C#, compares C# to Java, discusses networking namespaces in C#, and references additional resources. The key points covered include an overview of .NET, how to get started with a simple C# application, differences between C# and Java, and links for further reading.
When HPC Meets ML/DL: Managing an HPC Data Center with Kubernetes
Machine learning and deep learning (ML/DL) are becoming important workloads for high performance computing (HPC) as new algorithms are developed to solve business problems across many domains. Container technologies like Docker can help with the portability and scalability needs of ML/DL workloads on HPC systems. Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications that can help run MPI jobs and ML/DL pipelines on HPC systems, though it currently lacks some features important for HPC like advanced job scheduling capabilities. Running an HPC-specific job scheduler like IBM Spectrum LSF on top of Kubernetes is one approach to address current gaps in
Big Data Developers Moscow Meetup 1 - SQL on Hadoop
This document summarizes a meetup about Big Data and SQL on Hadoop. The meetup included discussions on what Hadoop is, why SQL on Hadoop is useful, what Hive is, and introduced IBM's BigInsights software for running SQL on Hadoop with improved performance over other solutions. Key topics included HDFS file storage, MapReduce processing, Hive tables and metadata storage, and how BigInsights provides a massively parallel SQL engine instead of relying on MapReduce.
This chapter discusses software development security. It covers topics like programming concepts, compilers and interpreters, procedural vs object-oriented languages, application development methods like waterfall vs agile models, databases, object-oriented design, assessing software vulnerabilities, and artificial intelligence techniques. The key aspects are securing the entire software development lifecycle from initial planning through operation and disposal, using secure coding practices, testing for vulnerabilities, and continually improving processes.
The document discusses distributed deep learning using Hopsworks. It describes how Hopsworks can be used for distributed training, hyperparameter optimization, and model serving. Hopsworks provides a feature store, distributed file system, and workflows for building scalable machine learning pipelines. It supports frameworks like TensorFlow, PyTorch, and Spark for distributed deep learning tasks like data parallel training using collective all-reduce strategies.
This document summarizes a presentation on using SQL Server Integration Services (SSIS) with HDInsight. It introduces Tillmann Eitelberg and Oliver Engels, who are experts on SSIS and HDInsight. The agenda covers traditional ETL processes, challenges of big data, useful Apache Hadoop components for ETL, clarifying statements about Hadoop and ETL, using Hadoop in the ETL process, how SSIS is more than just an ETL tool, tools for working with HDInsight, getting started with Azure HDInsight, and using SSIS to load and transform data on HDInsight clusters.
The document summarizes lessons learned from building a real-time network traffic analyzer in C/C++. Key points include:
- Libpcap was used for traffic capturing as it is cross-platform, supports PF_RING, and has a relatively easy API.
- SQLite was used for data storage due to its small footprint, fast performance, embeddability, SQL support, and B-tree indexing.
- A producer-consumer model with a blocking queue was implemented to handle packet processing in multiple threads (see the Python sketch after this list).
- Memory pooling helped address performance issues caused by excessive malloc calls during packet aggregation.
- Custom spin locks based on atomic operations improved performance over mutexes on FreeBSD/
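The analyzer itself is C/C++; to keep one language across these notes, the producer-consumer shape referenced in the list above is transposed to Python below, as an illustrative stand-in rather than the talk's actual code.

```python
# The producer-consumer shape from the talk, in Python: a bounded blocking
# queue decouples the capture thread from the worker threads.
import queue
import threading

packet_queue = queue.Queue(maxsize=10_000)   # bounded: put() blocks when full
SENTINEL = object()
NUM_WORKERS = 4

def producer():
    for i in range(100):                 # stand-in for the pcap capture loop
        packet_queue.put(f"packet-{i}")  # blocks if consumers fall behind
    for _ in range(NUM_WORKERS):
        packet_queue.put(SENTINEL)       # one stop marker per worker

def consumer():
    while True:
        pkt = packet_queue.get()         # blocks until a packet is available
        if pkt is SENTINEL:
            break
        # ... aggregate the packet / write a row to SQLite here ...

workers = [threading.Thread(target=consumer) for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()
producer()
for w in workers:
    w.join()
```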
How can .NET contribute to data science? What is .NET Interactive? Where do notebooks fit in? And Apache Spark? And the Python ecosystem? And Azure? In this session we'll put these ideas in order.
10 Big Data Technologies You Didn't Know About, Jesus Rodriguez
This document introduces several big data technologies that are less well known than traditional solutions like Hadoop and Spark. It discusses Apache Flink for stream processing, Apache Samza for processing real-time data from Kafka, Google Cloud Dataflow which provides a managed service for batch and stream data processing, and StreamSets Data Collector for collecting and processing data in real-time. It also covers machine learning technologies like TensorFlow for building dataflow graphs, and cognitive computing services from Microsoft. The document aims to think beyond traditional stacks and learn from companies building pipelines at scale.
Open Source Software, Distributed Systems, Database as a Cloud Service
- Treasure Data is a database-as-a-cloud-service company that collects and stores customer data beyond the cloud.
- It uses open source software like Fluentd and MessagePack to easily integrate and collect data from customers. It also uses open source distributed systems software like Hadoop and Presto to store, process, and query large amounts of customer data.
- As a database service, it needs to share computer resources securely among many customers. It contributes to open source to build and maintain the distributed systems software that powers its cloud database service.
How bol.com makes sense of its logs, using the Elastic technology stack.
Bol.com uses the Elastic (ELK) stack to make sense of logs from over 1,600 servers and 500-600 million events per day. Key aspects of their system include:
1. Shipping JSON-formatted log events from sources like Apache, databases, and applications to Redis queues to allow multiple Logstash instances to process events in real-time without data loss (sketched in Python below, after this list).
2. Enriching log events with information like request IDs to correlate requests across services, and IP-to-role mappings to identify client roles.
3. Using Elasticsearch aggregations and transformations to generate a directed graph of service dependencies based on logs, to help understand their distributed architecture.
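Point 1's shipping step, sketched with `redis-py`; the queue key and event fields are assumptions, not bol.com's actual names.

```python
# Sketch of point 1: push a JSON log event onto a Redis list from which
# Logstash instances consume. Assumes `pip install redis`; names are made up.
import json
import time
import redis

r = redis.Redis(host="localhost", port=6379)

event = {
    "@timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    "request_id": "req-12345",          # hypothetical correlation id
    "source": "apache-access",
    "message": "GET /product/42 200",
}
# A Logstash redis input (data_type => "list") pops events off this key,
# so events queue up in Redis instead of being dropped under load.
r.rpush("logstash-events", json.dumps(event))
```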
Mihai Nuta has over 14 years of experience developing computer systems and applications. He has extensive experience with technologies like Visual Basic, SQL, Oracle, and .NET. Currently he works as a senior programmer analyst at Xerox Corporation developing applications for General Motors, including a legal document application and tools for processing images and documents. He has strong skills in databases, web and client/server development, and software like Microsoft Office, SQL Server, and Visual Studio.
Programming 8051 with C and Using Keil uVision5
This document provides an introduction to embedded C programming using the Keil development environment. It defines embedded systems and describes how C became the dominant programming language for embedded applications. The document outlines the basics of embedded C, including common data types, compilers versus cross compilers, and how to set up a basic project in Keil uVision. It also includes examples of simple embedded C programs to blink an LED and output the maximum value from an array to a port.
Hybrid Cloud, Kubeflow and Tensorflow Extended [TFX]
Kubeflow Pipelines and TensorFlow Extended (TFX) together form an end-to-end platform for deploying production ML pipelines. It provides a configuration framework and shared libraries to integrate the common components needed to define, launch, and monitor your machine learning system. In this talk we describe how to run TFX in hybrid cloud environments.
Pivotal Container Service: the easiest way to manage Kubernetes in the enterprise (Pivotal Cloud-Native Workshop: Milan), VMware Tanzu
Fabio Marinelli & Mattia Gandolfi
7 February 2018
Read to learn what Mule Runtime Fabric (RTF) and Anypoint RTF are, how you can leverage these integration engines, the best adoption strategies, and the right way to conduct the risk-cost-benefit analysis for your business.
Ultralight data movement for IoT with SDC Edge. Guglielmo Iozzia - Optum (Data Driven Innovation)
This document provides an overview and demonstration of Streamsets Data Collector (SDC) and SDC Edge for ingesting data from IoT devices and the edge. It discusses the challenges of ingesting data from distributed edge locations. It then describes the key features of SDC for designing flexible data flows with minimal coding. It also introduces SDC Edge, a lightweight agent for running SDC pipelines on edge devices. The presentation includes demonstrations of using SDC with Kafka and using SDC Edge to ingest and analyze data from Android devices and send it to Elasticsearch. It concludes with discussing additional topics and providing useful links.
Similar to Toward Hybrid Cloud Serverless Transparency with Lithops Framework
k6 is an open source load testing tool that was acquired by Grafana in 2021. It allows teams to test reliability before problems impact users by simulating user traffic to applications and services. The k6-operator allows running distributed k6 tests on Kubernetes and integrates k6 into developer workflows. It provides many options for configuring and scaling tests through JavaScript scripts.
This document discusses extending kubectl functionality through plugins. It introduces kubectl plugins and Krew, a plugin manager for kubectl. It covers developing and publishing plugins, including writing plugins in any language, creating a Krew manifest, and automating plugin updates through GitHub Actions.
Enhancing Data Protection Workflows with Kanister and Argo Workflows, LibbySchulze
This document discusses enhancing data protection workflows with Kanister and Argo Workflows. It begins with discussing the need for data protection of stateful workloads on Kubernetes and challenges with current approaches. It then provides an overview of Kanister, an open source tool for application-level data protection on Kubernetes. Kanister uses custom resources and functions to abstract away complex data protection workflows. It also works with Argo Workflows to scale parallel data operations. The document concludes with a demo of using Kanister's CSI functions to create and restore snapshots and scaling snapshots with Argo Workflows.
This document discusses 10 common fallacies in platform engineering. It begins by introducing the speaker and topic, which are 10 fallacies seen in platform engineering and how to mitigate them. Some of the fallacies discussed include prioritizing the wrong procedures, relying only on visualizations, trying to replace all tools at once, providing too much freedom without constraints, and trying to compete directly with large cloud providers. The goal of platform engineering is to standardize processes and reduce cognitive load on developers and operations teams.
This document introduces Fluvio, an open-source data streaming platform founded by the creators of Nginx's open-source service mesh. It provides a programmable platform for data in motion that can be used to build analytics pipelines, track user behavior and sensor data, and enable fraud detection. Fluvio offers better performance and lower costs compared to Kafka. The roadmap details ongoing development of Fluvio and its cloud offering from InfinyOn, including adding smart modules, connectors, and pipelines.
The document summarizes a CNCF webinar about Project Updates with LitmusChaos. The webinar agenda covers what's new in LitmusChaos 2.0, use cases from iFood and HaloDoc, and a demo of making an e-commerce application resilient. For iFood, the challenges of a growing online food delivery platform moving to microservices are described. For HaloDoc, the service reliability challenges of a hybrid cloud-native healthcare application are covered. LitmusChaos helps both companies by providing experiments, observability, and automation to test reliability.
This document discusses Sigstore, a new standard for signing, verifying, and protecting software. It provides three key pieces - Cosign for signing things, Fulcio for signing with short-lived certificates, and Rekor for verification and monitoring. Sigstore allows signing of software artifacts, documents like SBOMs and attestations, and git commits. Attestations provide signed statements about software, and Sigstore ensures their integrity. Sigstore supports achieving different levels in the SLSA framework for supply chain security. It also aligns with frameworks from NIST and CIS. Tools like Gitsign allow "keyless" signing of git commits to meet requirements for verified history and two-person review.
This document summarizes a presentation on avoiding configuration drift with Argo CD. It introduces configuration drift as differences between environments that are supposed to be similar, such as undocumented changes or "cowboy deployments". It then discusses how configuration drift can occur in Kubernetes and strategies like GitOps and Argo CD that use bidirectional synchronization between code repositories and clusters. This helps guarantee clusters always deploy the desired configuration from Git and can self-heal if manual changes are made. The presentation includes a live demo of these concepts using Rancher and Argo CD.
This document summarizes a virtual meetup on app modernization. It discusses that 79% of app modernization efforts fail, with the average cost being $1.5 million and time being 16 months. App modernization aims to improve scalability, engineering velocity, and remove technical debt. Common obstacles include complexity, technical debt, and lack of resources. Modernizing just the UI without the business logic is ineffective. The document recommends prioritizing modernizing the business logic first to achieve the most benefits, and provides guidance for successful modernization projects such as defining requirements, securing resources, training teams, and providing the right tools.
CNCF Live Webinar: Low Footprint Java Containers with GraalVM, LibbySchulze
GraalVM Native Image can compile Java applications into native executables for improved performance and lower resource usage compared to the traditional Java Runtime. It works by ahead-of-time compiling Java applications into native images that have a smaller footprint when deployed in containers and start faster than traditionally interpreted Java applications. Native images generated by GraalVM Native Image were shown to use half the memory and achieve better throughput than the same application running on the Java Runtime when deployed to Oracle Kubernetes Engine.
This document summarizes a workshop about using EnRoute and Open Policy Agent (OPA) to enforce policies at the ingress level. It includes an overview of EnRoute and OPA, a system diagram, differences between EnRoute and other ingress controllers, how OPA can be used for attribute-based access control (ABAC). It then demonstrates configuring EnRoute with OPA integration, installing an example workload secured with JWT, enforcing JWT claims using an OPA policy, and verifying the policy is applied.
1. An air-gapped Kubernetes environment restricts internet access to increase security by preventing downloads of malicious data and attacks from outside entities.
2. Implementing an air-gapped Kubernetes cluster is more difficult than a standard one and requires additional effort for maintenance, but provides protections such as preventing data exfiltration by third parties.
3. Deploying components like the ELK stack in an air-gapped environment requires manually downloading, transferring, and installing charts and images due to the lack of access to external registries and repositories. Processes and permissions must be tightly controlled to maintain security.
CNCF: A step-by-step guide to platforming your delivery setup, LibbySchulze
1. This document provides a step-by-step guide to establishing an internal developer platform to help teams build applications more efficiently.
2. It recommends treating the platform as a product with a product owner, roadmap, and user interviews. Prioritize components based on how much developer and operations time they save.
3. Agree on core technologies like containers and Kubernetes as the minimum standard. Identify evangelistic teams to pilot the initial platform offerings.
CNCF Online - Data Protection Guardrails using Open Policy Agent (OPA), LibbySchulze
The document discusses a presentation by Joey Lei and Anders Eknert on data protection guardrails using Open Policy Agent (OPA). It provides background on the speakers and an overview of OPA, including how it works, the Rego policy language, and OPA's open source community. It then discusses how data protection policies can be enforced as code using OPA to provide guardrails for infrastructure-as-code deployments and prevent misconfigurations that could compromise availability, integrity or confidentiality of data. Examples of policy checks for recovery objectives, retention, backup strategies and exfiltration protection are provided.
This document summarizes a presentation about securing Windows workloads in a hybrid Kubernetes cluster. It begins with an overview of Calico and describes what a hybrid cluster is. It then discusses running Windows containers and the need to choose container base images wisely. The presentation covers how to secure Windows workloads using Calico for networking and policy enforcement. It concludes with information about demo resources and links for further reading.
Advancements in Kubernetes Workload Identity for Azure, LibbySchulze
This document summarizes Azure Workload Identity, a new solution for providing managed identities to Kubernetes workloads. It discusses the limitations of the existing AAD Pod Identity solution and introduces the motivations and architecture of Azure Workload Identity. Key points include that it eliminates identity assignment wait times, dependencies on Kubernetes custom resource definitions and the IMDS, and supports non-Azure Kubernetes clusters and non-Linux nodes. Integrations, the roadmap, and resources are also outlined.
Have you ever built a sandcastle at the beach, only to see it crumble when the tide comes in? In the digital world, our information is like that sandcastle, constantly under threat from waves of cyberattacks. A cybersecurity course is like learning to build a fortress for your information!
This course will teach you how to protect yourself from sneaky online characters who might try to steal your passwords, photos, or even mess with your computer. You'll learn about things like:
* **Spotting online traps:** Phishing emails that look real but could steal your info, and websites that might be hiding malware (like tiny digital monsters).
* **Building strong defenses:** Creating powerful passwords and keeping your software up-to-date, like putting a big, strong lock on your digital door.
* **Fighting back (safely):** Learning how to identify and avoid threats, and what to do if something does go wrong.
By the end of this course, you'll be a cybersecurity champion, ready to defend your digital world and keep your information safe and sound!
Toward Hybrid Cloud Serverless Transparency with Lithops Framework
1. Toward Hybrid Cloud Serverless Transparency with the Lithops Framework
Gil Vernik, IBM Research
gilv@il.ibm.com
2. About myself
• Gil Vernik
• At IBM Research since 2010
• Architect, 25+ years of development experience
• Active in open source
• Hybrid cloud, Big Data engines, serverless
• Twitter: @vernikgil
• https://www.linkedin.com/in/gil-vernik-1a50a316/
3. All material and code presented in this talk are open source. Comments, suggestions, and code contributions are welcome.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825184.
Photos used in this presentation by Unknown Author are licensed under CC BY-SA-NC.
8. Serverless paradigm
Serverless doesn't mean that there are no servers; it means that we don't need to worry about servers. In the serverless world, we simply "deliver" our code, software stack, or workload, and the serverless backend engine takes care of provisioning the server(s) and executing the code.
9. This way, we don't have to think about servers at all, hence the name "serverless".
11. Serverless user experience
User code, together with its software stack and dependencies, is delivered to a backend such as IBM Cloud Functions. The more we focus on the business logic, and the less on how to deploy and execute it, the better our "serverless" experience.
13. Code, dependencies, and containers
The dependencies (packages, software, etc.) go into a Docker image. Should the user's code also be baked into the Docker image?
14. The gap between the business logic and the boilerplate code
Example: a user wants to run ML algorithms on the colors extracted from images. He wrote a function that extracts the colors from a single image and tested that it works. He now wants to run this function on millions of images located in different storage places (cloud object storage, local Ceph, etc.), extract all the colors, and inject them into an ML framework for further processing.
15. Bring the code to the data, or move the data to the code?
• How do we run the code as close as possible to the data? Local? Cloud? Hybrid? How do we collect the results?
• Move as little data as possible
• Who writes the boilerplate code to access storage?
(Same example as slide 14.)
16. The gap between the business logic and the boilerplate code
• How do we partition the input data?
• How do we list millions of images?
• How much memory is needed to process a single image, assuming images of different sizes?
• How do we deploy the code with its dependencies?
• And so on…
(Same example as slide 14.)
17. Know the APIs and their semantics
Developers need to know vendor documentation and APIs, use CLI tools, and learn how to deploy code and dependencies, how to retrieve results, etc. Each cloud vendor has its own API and semantics.
18. The containerized model is not only about running and deploying code
• How do we "containerize" code, or scale software components from an existing application, without major disruption and without rewriting everything from scratch?
• How do we scale the code and decide on the right parallelism over terabytes of data, without becoming a systems expert in scaling code and in storage semantics?
• How do we partition input data, generate output, and leverage a cache if needed?
(Diagram: multiple copies of the user software stack working against COS, Ceph, databases, in-memory caches, etc.)
20. Push to serverless with the Lithops framework
• Lithops is a novel Python framework designed to scale code or applications at massive scale, exploiting almost any execution backend platform: hybrid clouds, public clouds, etc.
• A single Lithops API against any backend engine
• Open source, http://lithops.cloud , Apache License 2.0
• Led by IBM Research Haifa and URV (Universitat Rovira i Virgili)
• Benefits a variety of use cases: serverless for more use cases, and an easy move to serverless
21. Lithops to scale Python code and applications

import lithops

def my_func(x):
    # business logic
    ...

lt = lithops.FunctionExecutor()
lt.map(my_func, input_data)   # input_data = array, COS, etc.
print(lt.get_result())

Lithops ships the function to a serverless backend such as IBM Cloud Functions.
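The snippet above is schematic; what follows is a minimal, self-contained sketch of the same pattern that can be run as-is. It assumes only `pip install lithops` and uses the LocalhostExecutor (not shown on the slide), so no cloud credentials are needed; swapping in FunctionExecutor() with a configured backend leaves the rest of the code unchanged.

import lithops

def my_func(x):
    # business logic: here, just square the input
    return x * x

input_data = [1, 2, 3, 4, 5]

lt = lithops.LocalhostExecutor()  # same API as FunctionExecutor, runs locally
lt.map(my_func, input_data)
print(lt.get_result())  # -> [1, 4, 9, 16, 25]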
26. More on Lithops
• Truly serverless, lightweight framework, with no need to deploy an additional cluster on top of the serverless engine
• Scales from 0 to many
• Can deploy code to any compute backend and simplifies hybrid use cases with a single Lithops API
• Data driven, with an advanced data partitioner to support processing of large input datasets
• Lithops can execute any native code; it is not limited to Python code
• Hides the complexity of sharing data between compute stages and supports shuffle
• Fits well into workflow orchestrator frameworks
27. Data-driven flows with Lithops
Lithops leverages the user's business logic to simplify the containerization process:
• It decides the right scale, coordinates parallel invocations, etc.
• The Lithops runtime handles all access to the datasets and data partitioning, uses a cache if needed, and monitors progress
• The Lithops runtime allows serverless invocations to exchange data with each other, and implements shuffle
The Lithops client automatically deploys the user's software stack as serverless actions, which run against COS, Ceph, databases, in-memory caches, etc. A sketch of the uniform storage API follows.
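As a small illustration of the storage abstraction mentioned above, Lithops exposes a uniform Storage client whose get/put calls look the same whichever backend the configuration selects. A minimal sketch, assuming a configured Lithops installation and an existing bucket; the bucket name 'my-bucket' is illustrative, not from the talk:

from lithops import Storage

storage = Storage()  # backend (COS, S3, Ceph, ...) chosen by the Lithops config
storage.put_object(bucket='my-bucket', key='hello.txt', body=b'hello')
data = storage.get_object(bucket='my-bucket', key='hello.txt')
print(data)  # -> b'hello'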
28. What Lithops is good for
• Data pre-processing for ML / DL / AI frameworks
• Batch processing, UDF, ETL, HPC, and Monte Carlo simulations
• Embarrassingly parallel workloads or problems, often cases where there is little or no dependency or need for communication between parallel tasks
• A subset of map-reduce flows (see the sketch after this list)
(Diagram: input data fanned out to tasks 1…n, whose outputs are combined into the results.)
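To make the map-reduce point concrete, here is a minimal sketch of the Lithops map_reduce API, again using the local executor; the mapper and reducer are toy stand-ins, not code from the talk:

import lithops

def mapper(x):
    return x * 2

def reducer(results):
    # the reducer receives the list of all map results
    return sum(results)

lt = lithops.LocalhostExecutor()
lt.map_reduce(mapper, [1, 2, 3, 4], reducer)
print(lt.get_result())  # -> 20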
31. Serverless data pre-processing
• The majority of ML / DL / AI flows require raw data to be pre-processed before being consumed by the execution frameworks
• Examples:
• Images persisted in object storage: the user wants to extract colors and run DL algorithms on the extracted colors
• Face alignment in images is usually required as a first step before further analysis
32. Face alignment in images without Lithops

import logging
import os
import sys
import time
import shutil
import cv2
from openface.align_dlib import AlignDlib

logger = logging.getLogger(__name__)
temp_dir = '/tmp'

def preprocess_image(bucket, key, data_stream, storage_handler):
    """
    Detect the face in one image, align and crop it, and write the result
    back under the 'output' prefix.
    :param bucket: COS bucket
    :param key: COS key (object name) - may contain delimiters
    :param storage_handler: can be used to read / write data from / into COS
    """
    crop_dim = 180
    sys.stdout.write(".")
    # key of the form /subdir1/../subdirN/file_name
    key_components = key.split('/')
    file_name = key_components[len(key_components) - 1]
    input_path = temp_dir + '/' + file_name
    if not os.path.exists(temp_dir + '/' + 'output'):
        os.makedirs(temp_dir + '/' + 'output')
    output_path = temp_dir + '/' + 'output/' + file_name
    with open(input_path, 'wb') as localfile:
        shutil.copyfileobj(data_stream, localfile)
    # download the face-landmarks model once per worker
    if not os.path.isfile(temp_dir + '/' + 'shape_predictor_68_face_landmarks'):
        res = storage_handler.get_object(bucket, 'lfw/model/shape_predictor_68_face_landmarks.dat', stream=True)
        with open(temp_dir + '/' + 'shape_predictor_68_face_landmarks', 'wb') as localfile:
            shutil.copyfileobj(res, localfile)
    align_dlib = AlignDlib(temp_dir + '/' + 'shape_predictor_68_face_landmarks')
    image = _process_image(input_path, crop_dim, align_dlib)
    if image is not None:
        cv2.imwrite(output_path, image)
        with open(output_path, "rb") as f:
            processed_image_path = os.path.join('output', key)
            storage_handler.put_object(bucket, processed_image_path, f)
        os.remove(output_path)
    os.remove(input_path)

def _process_image(filename, crop_dim, align_dlib):
    image = _buffer_image(filename)
    if image is None:
        raise IOError('Error buffering image: {}'.format(filename))
    return _align_image(image, crop_dim, align_dlib)

def _buffer_image(filename):
    logger.debug('Reading image: {}'.format(filename))
    image = cv2.imread(filename)
    return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

def _align_image(image, crop_dim, align_dlib):
    bb = align_dlib.getLargestFaceBoundingBox(image)
    aligned = align_dlib.align(crop_dim, image, bb, landmarkIndices=AlignDlib.INNER_EYES_AND_BOTTOM_LIP)
    if aligned is not None:
        aligned = cv2.cvtColor(aligned, cv2.COLOR_BGR2RGB)
    return aligned
import ibm_boto3
import ibm_botocore
from ibm_botocore.client import Config
from ibm_botocore.credentials import DefaultTokenManager

# `config` holds the COS credentials and is defined earlier in the notebook
t0 = time.time()
client_config = ibm_botocore.client.Config(signature_version='oauth',
                                           max_pool_connections=200)
api_key = config['ibm_cos']['api_key']
token_manager = DefaultTokenManager(api_key_id=api_key)
cos_client = ibm_boto3.client('s3', token_manager=token_manager,
                              config=client_config,
                              endpoint_url=config['ibm_cos']['endpoint'])
try:
    paginator = cos_client.get_paginator('list_objects_v2')
    page_iterator = paginator.paginate(Bucket="gilvdata", Prefix='lfw/test/images')
    print(page_iterator)
except ibm_botocore.exceptions.ClientError as e:
    print(e)

class StorageHandler:
    # StorageNoSuchKeyError is a custom exception defined elsewhere
    def __init__(self, cos_client):
        self.cos_client = cos_client

    def get_object(self, bucket_name, key, stream=False, extra_get_args={}):
        """
        Get object from COS with a key. Throws StorageNoSuchKeyError if the given key does not exist.
        :param key: key of the object
        :return: Data of the object
        :rtype: str/bytes
        """
        try:
            r = self.cos_client.get_object(Bucket=bucket_name, Key=key, **extra_get_args)
            if stream:
                data = r['Body']
            else:
                data = r['Body'].read()
            return data
        except ibm_botocore.exceptions.ClientError as e:
            if e.response['Error']['Code'] == "NoSuchKey":
                raise StorageNoSuchKeyError(key)
            else:
                raise e

    def put_object(self, bucket_name, key, data):
        """
        Put an object in COS. Override the object if the key already exists.
        :param key: key of the object.
        :param data: data of the object
        :type data: str/bytes
        :return: None
        """
        try:
            res = self.cos_client.put_object(Bucket=bucket_name, Key=key, Body=data)
            status = 'OK' if res['ResponseMetadata']['HTTPStatusCode'] == 200 else 'Error'
            try:
                log_msg = 'PUT Object {} size {} {}'.format(key, len(data), status)
                logger.debug(log_msg)
            except:
                log_msg = 'PUT Object {} {}'.format(key, status)
                logger.debug(log_msg)
        except ibm_botocore.exceptions.ClientError as e:
            if e.response['Error']['Code'] == "NoSuchKey":
                raise StorageNoSuchKeyError(key)
            else:
                raise e

temp_dir = '/home/dsxuser/.tmp'
storage_client = StorageHandler(cos_client)
for page in page_iterator:
    if 'Contents' in page:
        for item in page['Contents']:
            key = item['Key']
            r = cos_client.get_object(Bucket='gilvdata', Key=key)
            data = r['Body']
            preprocess_image('gilvdata', key, data, storage_client)
t1 = time.time()
print("Execution completed in {} seconds".format(t1 - t0))
Business logic vs. boilerplate:
• The driver loops over all the images
• Close to 100 lines of "boilerplate" code to find the images and read and write the objects, etc.
• The data scientist needs to be familiar with the S3 API
• Execution time: approximately 36 minutes for 1000 images!
33. Face alignment in images with Lithops
(The business logic, i.e. the preprocess_image, _process_image, _buffer_image, and _align_image functions, is exactly the same as on slide 32; only the boilerplate below changes.)
import lithops

lt = lithops.FunctionExecutor()
bucket_name = 'gilvdata/lfw/test/images'
lt.map_reduce(preprocess_image, bucket_name, None, None)
results = lt.get_result()

• Under 3 lines of "boilerplate"!
• The data scientist does not need to use the S3 API!
• Execution time is 35 seconds, as compared to 36 minutes!
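For context on how a bucket path can serve as the map input: in current Lithops releases the data partitioner passes each discovered object to the map function as a single `obj` argument, rather than the older PyWren-style (bucket, key, data_stream, storage_handler) signature shown above; this is an assumption based on the upstream docs, and the bucket/prefix below is illustrative.

import lithops

def process_object(obj):
    # obj carries the object's metadata and a readable stream of its bytes
    data = obj.data_stream.read()
    return (obj.key, len(data))

lt = lithops.FunctionExecutor()  # requires a configured storage backend
lt.map(process_object, 'cos://gilvdata/lfw/test/images/')
print(lt.get_result())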
34. Demo: color identification of images
• Our demo is based on the blog "Color Identification in Images" by Karan Bhanot
• We show how existing code from the blog can be executed at massive scale by Lithops against any compute backend, without modifications to the original code
• Images of flowers are stored in object storage
• The user wants to retrieve all images that contain a specific color
• We demonstrate Lithops with a backend based on the K8s API
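A hedged sketch of the demo's shape: each map invocation loads one image from storage and returns a simple color summary. Here it is just the mean RGB, whereas the actual demo clusters pixel colors; the bucket path and function name are illustrative, not from the talk.

import numpy as np
import cv2
import lithops

def average_color(obj):
    # decode the image bytes and compute the mean color per channel
    buf = np.frombuffer(obj.data_stream.read(), dtype=np.uint8)
    img = cv2.imdecode(buf, cv2.IMREAD_COLOR)   # OpenCV decodes as BGR
    b, g, r = img.reshape(-1, 3).mean(axis=0)
    return (obj.key, (r, g, b))

lt = lithops.FunctionExecutor()
lt.map(average_color, 'cos://my-bucket/flowers/')
for key, rgb in lt.get_result():
    print(key, rgb)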
36. Behind the scenes
• Lithops inspects the input dataset in the object storage
• It generates a DAG of the execution, mapping a single task to process a single image
• Lithops serializes the user-provided code, the execution DAG, and other internal metadata, and uploads all of it to object storage
• Lithops generates a ConfigMap and a Job Definition in Code Engine, based on the provided Docker image (or uses a default if none is provided)
• Lithops submits an array job that is mapped to the generated execution DAG
• Each task contains the Lithops runtime and pulls its entry from the execution DAG
• Once a task completes, its status and results are persisted in object storage
• When the array job completes, Lithops reads the results from object storage
37. The user experience
Today's state: the user is responsible for deployment, configuration, and management: Kubernetes deployment definitions, job descriptions, coordination, and the user application itself.
With the Lithops framework, the user focuses only on the business/science logic, and deployment is completely abstracted:

import lithops as lt

def my_func(params):
    # user software stack
    ...

f = lt.FunctionExecutor(backend="code_engine")
f.map(my_func, input_data)   # input_data = array, COS, storage, DBs, etc.
results = f.get_result()
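A hedged illustration of the point above: switching compute backends is a one-argument change, with the rest of the code untouched. The backend names are examples from the Lithops docs, and each cloud backend needs its own configuration before this will run.

import lithops

def my_func(x):
    return x + 1

# the same map runs locally, on Code Engine, or on IBM Cloud Functions,
# depending only on the chosen backend
for backend in ('localhost', 'code_engine', 'ibm_cf'):
    f = lithops.FunctionExecutor(backend=backend)
    f.map(my_func, [1, 2, 3])
    print(backend, f.get_result())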
39. Spatial metabolomics and the Big Data challenge
• Spatial metabolomics, the detection of metabolites in cells, tissues, and organs, is the current frontier in understanding health and disease, in particular in cancer and immunity.
• The process generates a lot of data, since every pixel in a medical image can be considered a sample containing thousands of molecules, and the number of pixels can reach as high as a million, placing as-high-as-ever requirements on the data-analysis algorithms.
• EMBL develops novel computational biology tools to reveal the spatial organization of metabolic processes.
40. METASPACE workflow with Lithops
1. The scientist uploads a dataset (1 GB-1 TB)
2. Provides metadata, selects parameters, chooses a molecular DB
3. Data preparation for parallel analytics
4. Screening for molecules
5. Molecular visualization, analysis, sharing
• Completely serverless, de-centralized architecture
• Optimal scale automatically determined at run time
• Backend selection optimized for cost/performance
https://github.com/metaspace2020/Lithops-METASPACE
In the demo, Lithops uses the Apache OpenWhisk API.
42. Monte Carlo and Lithops
Monte Carlo methods are very popular in:
• The financial sector
• Risk and uncertainty analysis
• Molecular biology
• Sport, gaming, gambling
• Weather models, and many more…
Lithops is a natural fit for scaling Monte Carlo computations across FaaS platforms: the user writes the business logic and Lithops does the rest (see the sketch below).
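To illustrate, a minimal sketch of a Monte Carlo computation fanned out with Lithops: estimating pi by sampling random points, with each invocation sampling independently. This toy example is not from the talk, and the task counts are arbitrary.

import random
import lithops

def sample(n):
    # count random points falling inside the unit quarter-circle
    inside = 0
    for _ in range(n):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside

samples_per_task, tasks = 1_000_000, 20
lt = lithops.LocalhostExecutor()
lt.map(sample, [samples_per_task] * tasks)
hits = sum(lt.get_result())
print(4.0 * hits / (samples_per_task * tasks))  # approximately 3.1415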
45. Protein folding
• Proteins are biological polymers that carry out most of the cell's day-to-day functions.
• Protein structure leads to protein function.
• Proteins are made from a linear chain of amino acids and folded into a variety of 3-D shapes.
• Protein folding is a complex process that is not yet completely understood.
46. Replica exchange
• Monte Carlo simulations are popular methods for predicting protein folding
• ProtoMol is a framework specially designed for molecular dynamics: http://protomol.sourceforge.net
• Replica exchange molecular dynamics (REMD) is a highly parallel method wrapped around the Monte Carlo process for efficient sampling
• A series of tasks (replicas) are run in parallel at various temperatures
• From time to time, the configurations of neighboring tasks are exchanged
• Various HPC frameworks allow running protein folding, but they depend on MPI and on VMs or dedicated HPC machines
47. Protein folding with Lithops
• Lithops submits a job of X invocations, each running ProtoMol
• Each invocation runs the ProtoMol (or GROMACS) library to run Monte Carlo simulations
• Lithops collects the results of all invocations
• The REMD algorithm uses the output of the invocations as the input to the next job
Our experiment: 99 jobs
• Each job executes many invocations
• Each invocation runs 100 Monte Carlo steps
• Each step runs 10,000 Molecular Dynamics steps
• REMD exchanges the results of the completed job, which are used as the input to the following job (see the sketch below)
• Our approach doesn't use MPI
"Bringing scaling transparency to Proteomics applications with serverless computing"
WoSC'20: Proceedings of the 2020 Sixth International Workshop on Serverless Computing, December 2020, Pages 55-60, https://doi.org/10.1145/3429880.3430101
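A hedged sketch of this iterative pattern, with run_replica as a hypothetical stand-in for an invocation that wraps ProtoMol/GROMACS; the real step logic and the exchange rule are omitted.

import lithops

def run_replica(state):
    # stand-in for 100 Monte Carlo steps of one replica
    return state + 1

states = list(range(8))        # initial replica configurations
lt = lithops.LocalhostExecutor()
for job in range(99):          # 99 jobs, as in the experiment above
    futures = lt.map(run_replica, states)
    # each job's outputs become the next job's inputs (the REMD exchange
    # between neighboring replicas would happen here)
    states = lt.get_result(fs=futures)
print(states)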
48. Summary
• Serverless computing provides virtually unlimited resources and is a very attractive compute platform
• The move to serverless may be challenging for certain scenarios
• Lithops is an open source framework designed for a "push to the cloud" experience
• We saw demos and use cases
• All material and code presented in this talk are open source
• For more details, visit http://lithops.cloud
Thank you
Gil Vernik
gilv@il.ibm.com