You have built a successful proof of concept using Pandas for data manipulation and analysis. So how are you going to productionize it? Come learn how to transform your POC into a scalable product with MongoDB, and learn about the pitfalls and drawbacks of Pandas and the benefits of using MongoDB in the early stages.
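One commonly cited pitfall of Pandas in production is that it loads the whole dataset into the memory of a single process. A common remedy is to push aggregations into MongoDB itself. The sketch below illustrates this under some assumptions. It uses hypothetical `orders` documents with `region`, `amount`, and `status` fields. `groupby_sum_pipeline` is an illustrative helper, not a library function. The code only builds the pipeline, so it runs without a database.

```python
"""Sketch: express a Pandas-style group-by sum as a MongoDB aggregation pipeline.

Assumptions: hypothetical fields "region", "amount", "status";
groupby_sum_pipeline is an illustrative helper, not part of any library.
"""


def groupby_sum_pipeline(key, value, match=None):
    """Pipeline equivalent of df[mask].groupby(key)[value].sum().reset_index()."""
    pipeline = []
    if match:
        # Filter first so the database only groups the relevant documents.
        pipeline.append({"$match": match})
    pipeline.append({"$group": {"_id": f"${key}", value: {"$sum": f"${value}"}}})
    # Rename _id back to the grouping key, like reset_index() in Pandas.
    pipeline.append({"$project": {"_id": 0, key: "$_id", value: 1}})
    # Sort for a stable output order.
    pipeline.append({"$sort": {key: 1}})
    return pipeline


if __name__ == "__main__":
    # Pandas POC: df[df.status == "paid"].groupby("region")["amount"].sum()
    print(groupby_sum_pipeline("region", "amount", match={"status": "paid"}))
```

With PyMongo, the resulting list would be passed to `db.orders.aggregate(pipeline)`. The grouping then happens on the server instead of in a DataFrame.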
To successfully implement our clients' unique use cases and data patterns, we must unlearn many relational concepts while designing and rapidly developing efficient NoSQL applications. In this session, we will talk about some of our client use cases and the strategies we adopted using MongoDB features.
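A typical relational habit to unlearn is normalizing one-to-many data into separate tables that are joined at read time. In MongoDB, data that is read together is often embedded in a single document. The sketch below assumes hypothetical `orders` and `order_items` rows, for example from a SQL export. It is a generic illustration, not one of the session's client strategies, which the abstract does not detail.

```python
"""Sketch: turn normalized, relational-style rows into embedded documents.

Assumption: hypothetical `orders` and `order_items` tables; field names
are illustrative only.
"""
from collections import defaultdict


def embed_items(orders, order_items):
    """Return one document per order with its line items embedded,
    replacing the JOIN a relational schema would need at read time."""
    items_by_order = defaultdict(list)
    for item in order_items:
        # Drop the foreign key: the nesting itself now expresses the relation.
        line = {k: v for k, v in item.items() if k != "order_id"}
        items_by_order[item["order_id"]].append(line)

    documents = []
    for order in orders:
        doc = {"_id": order["id"], "customer": order["customer"]}
        doc["items"] = items_by_order.get(order["id"], [])
        # Store a value that would otherwise require an aggregate query.
        doc["total"] = sum(i["qty"] * i["price"] for i in doc["items"])
        documents.append(doc)
    return documents
```

Reading an order is now a single document lookup. The trade-off is that the embedded items must be updated through the parent document.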
MongoDB Atlas Autoscaling automatically adjusts both the storage and compute capacity of your MongoDB Atlas cluster in response to changing traffic patterns. This lets MongoDB Atlas continuously maximize performance while minimizing cost, at the press of a button. Plan to attend this session to learn how autoscaling works behind the scenes and the best ways to use it.
MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long-term, archival data in cost-effective storage such as Amazon S3, Google Cloud Storage, and Azure Blob Storage. However, many of them lack robust systems or tools to make effective use of large amounts of data for decision making. MongoDB Atlas Data Lake lets organizations analyze their long-term data to discover a wealth of information about their business. This session will take a deep dive into the features currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
The MongoDB Kubernetes operator and the MongoDB Open Service Broker are ready for production operations. Learn how MongoDB can be used with the most popular container orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications. A demo will show you how easy it is to enable MongoDB clusters as an External Service using the Open Service Broker API for MongoDB.
How do you determine whether your MongoDB Atlas cluster is overprovisioned, whether the new feature in your next application release will crush your cluster, or when to increase cluster size based on planned usage growth? MongoDB Atlas provides over a hundred metrics that give visibility into the inner workings of MongoDB performance, but how do you apply all this information to make capacity planning decisions? This presentation will enable you to effectively analyze your MongoDB performance, optimize your MongoDB Atlas spend, and ensure smooth application operation into the future.
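Planned usage growth can be turned into a rough scaling date with compound-growth arithmetic. Suppose a metric is at utilization u today and grows by a fraction g per month. It then crosses a threshold T after the smallest whole n with u(1+g)^n ≥ T. The sketch below makes two assumptions that do not come from Atlas guidance: growth is steady, and the threshold is an illustrative 80%.

```python
"""Sketch: months until a utilization metric crosses a scaling threshold,
assuming steady compound monthly growth.

Assumptions: utilization is a fraction (0-1) taken from a monitoring metric
such as CPU or disk usage; the 0.8 default threshold is illustrative.
"""
import math


def months_until_threshold(current, monthly_growth, threshold=0.8):
    if current >= threshold:
        return 0  # Already at or above the threshold: scale now.
    if monthly_growth <= 0:
        return None  # Flat or shrinking usage never reaches the threshold.
    # Solve current * (1 + g) ** n >= threshold for the smallest whole n.
    n = math.log(threshold / current) / math.log(1 + monthly_growth)
    return math.ceil(n)
```

For example, 40% utilization growing 10% per month reaches 80% in month 8. That is because 0.4 × 1.1^7 ≈ 0.78 and 0.4 × 1.1^8 ≈ 0.86.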
Come and hear more about our new full-text search operator for MongoDB Atlas. It is a significant enhancement to MongoDB's search features and the easiest, most powerful full-text search solution for databases on MongoDB Atlas. This talk is important for anyone who has implemented search or is considering adding a search feature to their MongoDB application. You will see a demo of $searchBeta, learn how it works, discover specific features that help you deliver relevant search results, and learn how you can start using full-text search in your application today.
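Full-text search in Atlas runs as a stage at the start of an aggregation pipeline. The session demoed the beta operator, `$searchBeta`. The sketch below uses the later generally available `$search` stage with its `text` operator, and the beta's options may have differed. It assumes a search index named `default` on a hypothetical `movies` collection with a `title` field.

```python
"""Sketch: a full-text search pipeline for Atlas Search.

Assumptions: a search index named "default" on a hypothetical `movies`
collection with a `title` field; uses the GA `$search` stage rather than
the beta `$searchBeta` shown in the session.
"""


def title_search_pipeline(query, limit=10):
    return [
        # Full-text match on the title field; results come back by relevance.
        {"$search": {"index": "default", "text": {"query": query, "path": "title"}}},
        {"$limit": limit},
        # Surface the relevance score Atlas Search computed for each match.
        {"$project": {"_id": 0, "title": 1, "score": {"$meta": "searchScore"}}},
    ]


# With PyMongo: list(db.movies.aggregate(title_search_pipeline("space odyssey")))
```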
Join this talk and test session with MongoDB Support, where you'll go over the configuration and deployment of an Atlas environment. Set up a service that you can take back in a production-ready state, and prepare to unleash your inner genius.
Move Fast with MongoDB Cloud Database - Atlas. The workshop covered:
- Deploying a MongoDB cluster in minutes
- Querying and managing data in MongoDB
- Running continuous backups and point-in-time restores, so that you can meet any recovery point objective
- Viewing historical metrics in optimized dashboards, seeing what's happening in your database live, configuring alerts, and receiving automated index suggestions to improve your cluster's performance
- Using MongoDB Charts to create visual representations of your data
This document discusses running MongoDB and Kubernetes together to enable lean and agile development. It proposes using Docker containers to package applications and leveraging tools like Kubernetes for deployment, management, and scaling. Specifically, it recommends: 1) using Docker to containerize applications and define deployment configurations; 2) deploying to Kubernetes, where services and replication controllers ensure high availability and scalability; and 3) treating databases specially by running them as "naked pods" assigned to labeled nodes with appropriate resources. Finally, it demonstrates deploying a sample MEAN stack application on Kubernetes with MongoDB and discusses future work around experimentation and blue/green deployments.
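The recommendation to run databases as naked pods on labeled nodes can be expressed as a plain Pod manifest. A naked pod is a Pod with no controller managing it, and a `nodeSelector` pins it to suitable nodes. The sketch below builds the manifest as a Python dict and prints JSON, which `kubectl apply -f -` accepts as well as YAML. The node label `role=database`, the image tag, and the resource sizes are hypothetical.

```python
"""Sketch: a "naked" MongoDB Pod (no controller) pinned to labeled nodes.

Assumptions: database nodes carry a hypothetical label role=database
(e.g. `kubectl label nodes <node> role=database`); image tag, names, and
resource sizes are illustrative.
"""
import json


def mongo_pod_manifest(name="mongo-0", node_role="database", cpu="2", memory="4Gi"):
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": "mongodb"}},
        "spec": {
            # Schedule only onto nodes labeled for database workloads.
            "nodeSelector": {"role": node_role},
            "containers": [{
                "name": "mongod",
                "image": "mongo:4.0",
                "ports": [{"containerPort": 27017}],
                # Requests equal to limits give the pod the Guaranteed QoS class.
                "resources": {
                    "requests": {"cpu": cpu, "memory": memory},
                    "limits": {"cpu": cpu, "memory": memory},
                },
            }],
            # NOTE: no volume is mounted, so data lives only as long as the
            # pod; a real deployment would attach persistent storage.
        },
    }


if __name__ == "__main__":
    # Pipe the output to: kubectl apply -f -
    print(json.dumps(mongo_pod_manifest(), indent=2))
```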
Every company is becoming a software company, providing customer-facing solutions for accessing a variety of services and information. Companies are now starting to put their data to work and derive better insights for the business. A crucial challenge is ensuring that this data is always available and secure, in line with the company's business objectives and each country's regulatory constraints. MongoDB provides the security layer you need: come and find out how to secure your data with MongoDB.
The document provides an agenda for a MongoDB presentation, including an introduction to MongoDB's document model and how it differs from relational databases, and how MongoDB brings value to clients through flexibility, performance, versatility, and ease of use. It then demonstrates these qualities through MongoDB features such as rich queries, data models, and the ability to deploy anywhere. The presentation promotes MongoDB's database-as-a-service, Atlas, and tools like Compass. It outlines MongoDB's evolution and roadmap, and concludes with contact details for the presenter.
Data administrators face the challenge of integrating disparate data technologies into a cohesive and performant data platform. This is especially true when using diverse query languages and protocols. This session will focus on how to integrate SQL-aware applications into a MongoDB data platform.
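At its simplest, integrating a SQL-aware application means mapping SQL clauses onto MongoDB query documents. `WHERE` becomes a filter, and the `SELECT` list becomes a projection. The toy translator below shows that mapping for a deliberately tiny SQL subset. It is a hypothetical sketch, not how any MongoDB product parses SQL.

```python
"""Sketch: translate a tiny SQL subset into a MongoDB find() filter/projection.

Assumption: handles only
  SELECT col[, col...] FROM coll [WHERE field op value [AND ...]]
with numeric or single-quoted string values. Illustrative only.
"""
import re

OPS = {"=": "$eq", "!=": "$ne", ">": "$gt", ">=": "$gte", "<": "$lt", "<=": "$lte"}

SELECT_RE = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<coll>\w+)"
    r"(?:\s+WHERE\s+(?P<where>.+?))?\s*;?\s*$",
    re.IGNORECASE,
)
COND_RE = re.compile(r"^\s*(\w+)\s*(>=|<=|!=|=|>|<)\s*('([^']*)'|-?\d+(?:\.\d+)?)\s*$")


def parse_value(token, quoted):
    if quoted is not None:
        return quoted  # inner text of a single-quoted string
    return float(token) if "." in token else int(token)


def sql_to_find(sql):
    m = SELECT_RE.match(sql)
    if not m:
        raise ValueError("unsupported SQL: " + sql)
    cols = [c.strip() for c in m.group("cols").split(",")]
    # SELECT * means no projection; otherwise include only the listed fields.
    projection = None if cols == ["*"] else {"_id": 0, **{c: 1 for c in cols}}
    filt = {}
    if m.group("where"):
        for cond in re.split(r"\s+AND\s+", m.group("where"), flags=re.IGNORECASE):
            c = COND_RE.match(cond)
            if not c:
                raise ValueError("unsupported condition: " + cond)
            field, op, token, quoted = c.groups()
            # Several conditions on one field merge, e.g. a range query.
            filt.setdefault(field, {})[OPS[op]] = parse_value(token, quoted)
    return m.group("coll"), filt, projection
```

The returned triple maps directly onto PyMongo's `db[coll].find(filt, projection)`.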
MongoDB Ops Manager is an enterprise-grade, end-to-end database management, monitoring, and backup solution. Kubernetes has clearly won the orchestration-platform "wars". In this session, we'll take a deep dive into how you can leverage both of these technologies to host your MongoDB deployments within your Kubernetes infrastructure, whether that's OpenShift, PKS, Azure AKS, or upstream Kubernetes. This talk will review the core technologies, such as containers, Kubernetes, and MongoDB Ops Manager. You'll also see live demos of MongoDB running on Kubernetes and managed by MongoDB Ops Manager through the MongoDB Enterprise Kubernetes Operator.
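With the MongoDB Enterprise Kubernetes Operator, a deployment is declared as a custom resource, and the operator then creates the pods and registers them with Ops Manager. The sketch below builds such a resource as a Python dict and prints JSON for `kubectl apply -f -`. It assumes the operator is already installed. It also assumes a ConfigMap named `my-project` that points at an Ops Manager project and a Secret named `my-credentials` that holds the API key. These names and the version string are hypothetical. CRD field names can differ between operator releases, so check them against the documentation for your version.

```python
"""Sketch: a MongoDB replica set custom resource for the MongoDB Enterprise
Kubernetes Operator, emitted as JSON.

Assumptions: operator installed; ConfigMap "my-project" and Secret
"my-credentials" exist; names and version are hypothetical; field names
may vary by operator release.
"""
import json


def replica_set_resource(name, members=3, version="4.2.2-ent",
                         project_configmap="my-project",
                         credentials_secret="my-credentials"):
    return {
        "apiVersion": "mongodb.com/v1",
        "kind": "MongoDB",
        "metadata": {"name": name},
        "spec": {
            "type": "ReplicaSet",
            "members": members,  # number of mongod pods the operator manages
            "version": version,
            # Links the deployment to an Ops Manager project (monitoring, backup).
            "opsManager": {"configMapRef": {"name": project_configmap}},
            "credentials": credentials_secret,
            "persistent": True,  # request persistent volumes for data
        },
    }


if __name__ == "__main__":
    print(json.dumps(replica_set_resource("my-replica-set"), indent=2))
```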
The MongoDB Kubernetes operator is ready for prime time. Learn how MongoDB can be used with the most popular orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications.
A Free New World: Atlas Free Tier and How It Was Born
Speakers: Louisa Berger, Senior Software Engineer; Vincent Do, Fullstack Engineer, MongoDB
Level: 200 (Intermediate)
Track: How We Build MongoDB
Last year, MongoDB released Atlas, a new Database-as-a-Service product that handles running, monitoring, and maintaining your MongoDB deployment in the cloud. This winter, we added a new Free Tier option to the product, which allows users to try out Atlas with their own real data for free. Lead Automation engineer Louisa Berger and Atlas engineer Vincent Do will talk about how it works behind the scenes, and why you might want to try out Atlas. This talk is intended for developers and will take you through the technical details of the architecture, showing you the techniques and challenges involved in building a multi-tenant MongoDB service.
What You Will Learn:
- Insights on how and why you should use the Atlas free tier
- How the Atlas free tier was designed and implemented
- Best practices for building a multi-tenant MongoDB application
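A common way to share a MongoDB deployment among tenants is to prefix every database name with a tenant ID and reject any request that names another tenant's database. The sketch below shows that technique with a hypothetical `<tenant>_<db>` naming scheme. It is a generic illustration and not a description of how Atlas implements its free tier.

```python
"""Sketch: tenant isolation by database-name prefixing.

Assumption: hypothetical "<tenant>_<db>" naming; not Atlas's implementation.
"""
import re

# Tenant IDs may not contain "_", so one tenant's prefix can never match
# the beginning of another tenant's prefix.
TENANT_RE = re.compile(r"^[a-z0-9]{1,24}$")


def physical_db(tenant_id, db_name):
    """Map a tenant's logical database name to its physical name."""
    if not TENANT_RE.match(tenant_id):
        raise ValueError("invalid tenant id")
    # Partial check only; MongoDB restricts more characters in database names.
    if not db_name or "." in db_name or "/" in db_name:
        raise ValueError("invalid database name")
    return f"{tenant_id}_{db_name}"


def logical_db(tenant_id, physical):
    """Reverse mapping; refuses names belonging to another tenant."""
    prefix = f"{tenant_id}_"
    if not physical.startswith(prefix):
        raise PermissionError("database belongs to another tenant")
    return physical[len(prefix):]


def visible_databases(tenant_id, all_physical_dbs):
    """What a 'list databases' request should return for one tenant."""
    prefix = f"{tenant_id}_"
    return sorted(logical_db(tenant_id, d) for d in all_physical_dbs if d.startswith(prefix))
```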
Building intelligent apps involves combining real-time analytics, machine learning, and artificial intelligence to provide personalized recommendations and automate tasks for customers. Developers can use MongoDB and Google Cloud to build intelligent apps in three steps: 1) create a base e-commerce app, 2) add a recommendation engine using machine learning, and 3) enable shopping via chat with artificial intelligence. This brings data scientists and developers together to create applications that understand and assist customers.
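Step 2, the recommendation engine, is often prototyped with a simple baseline before a trained model replaces it. The sketch below is an item-to-item recommender based on co-purchase counts. It assumes orders are lists of hypothetical product IDs. This counting approach stands in for the machine learning model the talk describes and is not the talk's method.

```python
"""Sketch: item-to-item recommendations from co-purchase counts.

Assumptions: orders are lists of hypothetical product IDs; a counting
baseline, not the ML model described in the talk.
"""
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(orders):
    co = defaultdict(Counter)
    for order in orders:
        # Count each unordered pair once per order, in both directions.
        for a, b in combinations(sorted(set(order)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co, cart, k=3):
    scores = Counter()
    for item in cart:
        scores.update(co.get(item, {}))
    for item in cart:
        scores.pop(item, None)  # do not recommend what is already in the cart
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]
```

For example, given the orders `[["tent", "stove", "lamp"], ["tent", "stove"], ["tent", "lamp"]]`, a cart containing `"tent"` yields `["lamp", "stove"]`.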
Building and deploying a machine learning model can be difficult to do once. Enabling other data scientists (or yourself, one month later) to reproduce your pipeline, to compare the results of different versions, to track what’s running where, and to redeploy and rollback updated models is much harder. In this talk, I’ll introduce MLflow, a new open source project from Databricks that simplifies the machine learning lifecycle. MLflow provides APIs for tracking experiment runs between multiple users within a reproducible environment, and for managing the deployment of models to production. MLflow is designed to be an open, modular platform, in the sense that you can use it with any existing ML library and development process. MLflow was launched in June 2018 and has already seen significant community contributions, with over 50 contributors and new features including language APIs, integrations with popular ML libraries, and storage backends. I’ll show how MLflow works and explain how to get started with MLflow.
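MLflow's tracking API records parameters and metrics for each run through calls such as `mlflow.start_run`, `mlflow.log_param`, and `mlflow.log_metric`. To show the idea without extra dependencies, the sketch below is a toy, in-memory stand-in written for illustration. It is not MLflow's implementation, and unlike MLflow it keeps only the last value of each metric.

```python
"""Sketch: the core idea of experiment tracking, in plain Python.

NOT MLflow's implementation; a toy stand-in. Real MLflow equivalents are
noted in comments.
"""
import json
import time
import uuid
from contextlib import contextmanager


class RunTracker:
    def __init__(self):
        self.runs = []       # finished runs, in completion order
        self._active = None  # the run currently being logged to

    @contextmanager
    def start_run(self, name=None):  # cf. mlflow.start_run()
        run = {"id": uuid.uuid4().hex, "name": name, "start": time.time(),
               "params": {}, "metrics": {}}
        self._active = run
        try:
            yield run
        finally:
            # Record the run even if training raised, so failures stay visible.
            self.runs.append(run)
            self._active = None

    def _current(self):
        if self._active is None:
            raise RuntimeError("no active run; use 'with tracker.start_run():'")
        return self._active

    def log_param(self, key, value):  # cf. mlflow.log_param(key, value)
        self._current()["params"][key] = value

    def log_metric(self, key, value):  # cf. mlflow.log_metric(key, value)
        # Simplification: keep only the latest value (MLflow keeps a history).
        self._current()["metrics"][key] = value

    def best_run(self, metric, lower_is_better=True):
        """Pick the best finished run on one metric, to compare runs."""
        scored = [r for r in self.runs if metric in r["metrics"]]
        pick = min if lower_is_better else max
        return pick(scored, key=lambda r: r["metrics"][metric])

    def export(self):
        """Serialize all runs to JSON so others can inspect and reproduce them."""
        return json.dumps(self.runs, sort_keys=True)


if __name__ == "__main__":
    tracker = RunTracker()
    for alpha in (0.1, 0.5, 1.0):
        with tracker.start_run(name=f"alpha={alpha}"):
            tracker.log_param("alpha", alpha)
            # Stand-in for a real evaluation score on held-out data.
            tracker.log_metric("rmse", abs(alpha - 0.5) + 0.2)
    print(tracker.best_run("rmse")["name"])
```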
Monitoring AI applications with AI
The best-performing offline algorithm can lose in production. The most accurate model does not always improve business metrics. Environment misconfiguration or upstream data pipeline inconsistency can silently kill model performance. None of the prodops, data science, or engineering teams are equipped to detect, monitor, and debug these kinds of incidents. Could Microsoft have tested the Tay chatbot in advance, and then monitored and adjusted it continuously in production, to prevent its unexpected behaviour? Real mission-critical AI systems require an advanced monitoring and testing ecosystem that enables continuous and reliable delivery of machine learning models and data pipelines into production.
Common production incidents include:
- Data drift, new data, wrong features
- Vulnerability issues, malicious users
- Concept drift
- Model degradation
- Biased training set / training issues
- Performance issues
In this demo-based talk, we discuss a solution, tooling, and architecture that allow machine learning engineers to be involved in the delivery phase and take ownership of the deployment and monitoring of machine learning pipelines. It allows data scientists to safely deploy early results as end-to-end AI applications in a self-serve mode, without assistance from engineering and operations teams. It shifts experimentation, and even training, from offline datasets to live production, and closes the feedback loop between research and production.
The technical part of the talk will cover the following topics:
- Automatic data profiling
- Anomaly detection
- Clustering of model inputs and outputs
- A/B testing
- Service mesh, Envoy Proxy, traffic shadowing
- Stateless and stateful models
- Monitoring of regression, classification, and prediction models
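Data drift, the first incident type listed, can be flagged by comparing a feature's production distribution with its training distribution. The sketch below uses the Population Stability Index (PSI), one common drift statistic. The talk does not say which method its tooling uses. The 10 equal-width bins and the 0.2 alert threshold are conventional rules of thumb, not universal constants.

```python
"""Sketch: input data drift detection with the Population Stability Index.

Assumptions: PSI chosen for illustration; 10 bins and a 0.2 threshold are
conventional rules of thumb.
"""
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """PSI = sum over bins of (a - e) * ln(a / e), bins set on `expected`."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp values outside the training range into the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drifted(expected, actual, threshold=0.2):
    return psi(expected, actual) > threshold
```

In a serving setup, `expected` would be a sample of training inputs, and `actual` a recent window of live requests for the same feature.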
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenzhong XU | Current 2022
If you are a data scientist or a platform engineer, you can probably relate to the pains of working amid the current explosive growth of data/ML technologies and tooling. With many overlapping options, each with a steep learning curve, it is increasingly challenging for data science teams. Many platform teams have started thinking about building an abstracted ML platform layer to support generalized ML use cases, but there are many complexities involved, especially as real-time data shifts into the mainstream. In this talk, we'll discuss why ML platforms can benefit from a simple and "invisible" abstraction. We'll offer some evidence on why you should consider leveraging streaming technologies even if your use cases are not real-time yet. We'll share learnings (combining both ML and infrastructure perspectives) about some of the hard complexities involved in building such simple abstractions, the design principles behind them, and some counterintuitive decisions you may come across along the way. By the end of the talk, I hope data scientists will walk away with some tips on how to evaluate ML platforms, and platform engineers will have learned a few architectural and design tricks.
This document summarizes the key steps and outcomes of a project to build an end-to-end recommendation system for a power utility company. The system was designed to integrate machine learning models with mobile and call center systems to recommend ancillary products to customers. The project involved exploring customer data, developing machine learning models through an iterative process, and operationalizing the models by building APIs and automated workflows. The new system provided recommendations via microservices and represented an improvement over the utility's previous manual, less rigorous approach to data science and modeling.
This document discusses best practices for developing data science products at Philip Morris International (PMI). It covers:
- PMI's data science team of over 40 people across four hubs, working on fraud prevention and other problems.
- Key principles for PMI's data science work, including being business-driven, investing in people, self-organizing, iterating to improve, and co-creating solutions.
- Challenges in data product development involving integrating work between data scientists and other teams, and practices like continuous integration/delivery to overcome these challenges.
- The role of data scientists in contributing code that is readable, testable, reusable, reproducible, and usable by other teams to integrate into
A talk for the SF Big Analytics meetup on building, testing, deploying, monitoring, and maintaining big data analytics services. http://hydrosphere.io/
This Data Science with Python presentation will help you understand what data science is, the basics of Python for data analysis, why to learn Python, how to install Python, Python libraries for data analysis, exploratory analysis using Pandas, Series and DataFrames, the loan prediction problem, data wrangling using Pandas, building a predictive model using Scikit-learn, and implementing a logistic regression model in Python. The aim of this video is to give beginners who are new to Python a comprehensive grounding in the basic concepts they need to use it for data analysis. This Data Science with Python presentation will cover the following topics:
1. What is Data Science?
2. Basics of Python for data analysis
   - Why learn Python?
   - How to install Python?
3. Python libraries for data analysis
4. Exploratory analysis using Pandas
   - Introduction to Series and DataFrame
   - Loan prediction problem
5. Data wrangling using Pandas
6. Building a predictive model using Scikit-learn
   - Logistic regression
This Data Science with Python course will establish your mastery of data science and analytics techniques using Python. With this Python for Data Science course, you'll learn the essential concepts of Python programming and become an expert in data analytics, machine learning, data visualization, web scraping, and natural language processing. Python is a required skill for many data science positions, so jumpstart your career with this interactive, hands-on course. Why learn Data Science? Data scientists are being deployed in all kinds of industries, creating a huge demand for skilled professionals. Data scientist is the pinnacle rank in an analytics organization. Glassdoor ranked data scientist first in its 25 Best Jobs for 2016, and good data scientists are scarce and in great demand.
As a data scientist, you will be required to understand the business problem, design the analysis, collect and format the required data, apply algorithms or techniques using the correct tools, and finally make recommendations backed by data. You can gain in-depth knowledge of data science by taking our Data Science with Python certification training course. With the Simplilearn Data Science certification training course, you will prepare for a career as a data scientist as you master all the concepts and techniques. Learn more at: https://www.simplilearn.com
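The course's final step fits a logistic regression with Scikit-learn (`LogisticRegression().fit(X, y)`). To show what that fit does, here is a dependency-free sketch that uses batch gradient descent on the mean log-loss. The two features and the eight rows are hypothetical toy data, not the course's loan dataset.

```python
"""Sketch: logistic regression via batch gradient descent (pure Python).

Assumptions: hypothetical features [scaled income, credit history flag]
and toy rows; Scikit-learn would normally be used instead.
"""
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=3000):
    """Fit weights w and bias b by batch gradient descent on mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # For log-loss, d(loss)/dz = p - y, where p = sigmoid(w.x + b).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (approve) if the predicted probability reaches the threshold."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


# Hypothetical toy rows: [scaled income, credit history (1 = good)].
X = [[0.9, 1], [0.8, 1], [0.7, 1], [0.6, 1], [0.9, 0], [0.2, 0], [0.3, 0], [0.1, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = train_logistic(X, y)
```

On this separable toy set, the fitted model reproduces every training label. Real loan data would also need a held-out test set.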
In this talk, Gerbert will give an overview of artificial intelligence, outline the current state of the art in research, and explain what it takes to actually carry out an AI project. Using practical cases and tools, he will give you insight into the phases of an AI project and explain some of the problems you might encounter along the way and how you might solve them.
Note: The content was modified from material by the Microsoft Content team.
Deck Owner: Nitah Onsongo
Tech/Msg Review: Cesar De La Torre, Simon Tao, Clarke Rahrig
---
Event: Insider Dev Tour Berlin
Event Description: Microsoft is going on a world tour with the announcements of Build 2019. The Insider Dev Tour focuses on innovations related to Microsoft 365 from a developer's perspective.
Date: June 7th, 2019
Event link: https://www.microsoft.com/de-de/techwiese/news/best-of-build-insider-dev-tour-am-7-juni-in-berlin.aspx
Linkedin: http://linkedin.com/in/mia-chang/
Intelligent apps are emerging as the next frontier in analytics and application development. Learn how to build intelligent apps on MongoDB powered by Google Cloud, with TensorFlow for machine learning and Dialogflow for artificial intelligence. Get your developers and data scientists to finally work together to build applications that understand your customers, automate their tasks, and provide knowledge and decision support.
The document discusses the Lambda Architecture, which is an approach for building data systems to handle large volumes of real-time streaming data. It proposes using three main design principles: handling human errors by making the system fault-tolerant, storing raw immutable data, and enabling recomputation of results from the raw data. The document then provides two case studies of applying Lambda Architecture principles to analyze mobile app usage data and process high-volume web logs in real-time. It concludes with lessons learned, such as studying Lambda concepts, collecting any available data, and turning data into useful insights.
The document discusses real-time machine learning using the Lambda architecture. It describes the need for models that can learn incrementally from streaming data and remain accurate over time. The Lambda architecture is introduced as having a speed layer for real-time processing, a serving layer to query current and batch views, and a batch layer for immutable datasets. Mahout is described as an Apache library for scalable machine learning tasks such as recommendation, clustering, and classification on Hadoop. Basic recommendation algorithms are covered, along with use cases like e-commerce personalization, fraud detection, and media metadata generation.
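The three layers can be seen in miniature on a page-view count. Raw events are appended to an immutable master dataset. The batch layer recomputes its view from all of that data, the speed layer covers events since the last batch run, and the serving layer merges the two at query time. In this simplified sketch, in-memory lists and counters stand in for real storage and processing systems such as Hadoop and a stream processor, and the event fields are hypothetical.

```python
"""Sketch: Lambda Architecture layers on a toy page-view count.

Assumptions: in-memory structures replace real batch/stream systems;
single-threaded, so no coordination between layers is modeled.
"""
from collections import Counter


class LambdaCounter:
    def __init__(self):
        self.master = []            # immutable, append-only raw events
        self.batch_view = Counter()
        self.realtime_view = Counter()

    def ingest(self, event):
        # Every event goes to the master dataset and to the speed layer.
        self.master.append(event)
        self.realtime_view[event["page"]] += 1

    def run_batch(self):
        # Batch layer: recompute from scratch over all raw data, so fixing
        # a bug here repairs every historical result.
        self.batch_view = Counter(e["page"] for e in self.master)
        # The speed layer's state is now covered by the batch view.
        self.realtime_view = Counter()

    def query(self, page):
        # Serving layer: merge the batch view with the real-time view.
        return self.batch_view[page] + self.realtime_view[page]
```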
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom "ML platforms" that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company's internal infrastructure. In this session, we introduce MLflow, a new open source project from Databricks that aims to provide an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size. In this deep-dive session, through a complete ML model lifecycle example, you will walk away with:
- MLflow concepts and abstractions for models, experiments, and projects
- How to get started with MLflow
- An understanding of the MLflow APIs
- Using the tracking APIs during model training
- Using the MLflow UI to visually compare and contrast experimental runs with different tuning parameters and evaluate metrics
- How to package, save, and deploy an MLflow model
- Serving it using the MLflow REST API
- What's next and how to contribute
Bighead: Airbnb's end-to-end machine learning platform
Airbnb has a wide variety of ML problems, ranging from models on traditional structured data to models built on unstructured data such as user reviews, messages, and listing images. The ability to build, iterate on, and maintain healthy machine learning models is critical to Airbnb's success. Bighead aims to tie together various open source and in-house projects to remove incidental complexity from ML workflows. Bighead is built on Python, Spark, and Kubernetes. The components include a lifecycle management service, an offline training and inference engine, an online inference service, a prototyping environment, and a Docker image customization tool. Each component can be used individually. In addition, Bighead includes a unified model-building API that smoothly integrates popular libraries, including TensorFlow, XGBoost, and PyTorch. Each model is reproducible and iterable through standardization of data collection and transformation, model training environments, and production deployment. This talk covers the architecture, the problems that each individual component and the overall system aim to solve, and a vision for the future of machine learning infrastructure. Bighead is widely adopted at Airbnb, and we have a variety of models running in production. We plan to open source Bighead so the wider community can benefit from our work.
Speaker: Andrew Hoh
Andrew Hoh is the Product Manager for the ML Infrastructure and Applied ML teams at Airbnb. Previously, he spent time building and growing Microsoft Azure's NoSQL distributed database. He holds a degree in computer science from Dartmouth College.