This session was recorded in San Francisco on February 5th, 2019 and can be viewed here: https://youtu.be/CgoxjmdyMiU This session will discuss how to get up and running quickly with containerized H2O environments (H2O Flow, Sparkling Water, and Driverless AI) at scale, in a multi-tenant architecture with a shared pool of resources using CPUs and/or GPUs. See how you can spin up (and tear down) your H2O environments on demand, with just a few mouse clicks. Find out how to enable quota management of GPU resources for greater efficiency, and easily connect your compute to your datasets for large-scale distributed machine learning. Learn how to operationalize your machine learning pipelines and deliver faster time-to-value for your AI initiative, while ensuring enterprise-grade security and high performance. Bio: Nanda Vijaydev is senior director of solutions at BlueData (now HPE), where she leverages technologies like Hadoop, Spark, and TensorFlow to build solutions for enterprise analytics and machine learning use cases. Nanda has 10 years of experience in data management and data science. Previously, she worked on data science and big data projects in multiple industries, including healthcare and media; was a principal solutions architect at Silicon Valley Data Science; and served as director of solutions engineering at Karmasphere. Nanda has an in-depth understanding of the data analytics and data management space, particularly in the areas of data integration, ETL, warehousing, reporting, and machine learning.
This document discusses MLOps and Kubeflow. It begins with an introduction to the speaker and defines MLOps as addressing the challenges of independently autoscaling machine learning pipeline stages, choosing different tools for each stage, and seamlessly deploying models across environments. It then introduces Kubeflow as an open source project that uses Kubernetes to minimize MLOps efforts by enabling composability, scalability, and portability of machine learning workloads. The document outlines key MLOps capabilities in Kubeflow like Jupyter notebooks, hyperparameter tuning with Katib, and model serving with KFServing and Seldon Core. It describes the typical machine learning process and how Kubeflow supports experimental and production phases.
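Model serving with KFServing, mentioned above, is typically declared as a Kubernetes custom resource rather than deployed by hand. The following is a minimal sketch of such a manifest; the service name and storage URI are illustrative placeholders, and field names can vary between KFServing versions:

```yaml
apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris            # illustrative name
spec:
  predictor:
    sklearn:
      # hypothetical bucket path; point this at your own trained model
      storageUri: gs://my-models/sklearn/iris
```

Applying a manifest like this with `kubectl apply -f` asks Kubeflow to stand up an autoscaled HTTP scoring endpoint for the model, which is the "seamless deployment across environments" capability the summary refers to.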
This meetup was recorded in Mountain View, CA on January 10th, 2019. Video recording from the meetup can be viewed here: https://youtu.be/yN26i7e_BtM Spark pipelines are a powerful concept for productionizing machine learning workflows. Their API makes it possible to combine data processing with machine learning algorithms and opens opportunities for integration with various machine learning libraries. However, to benefit from the power of pipelines, users need the freedom to choose and experiment with any machine learning algorithm or library. Therefore, we developed Sparkling Water, which embeds the H2O machine learning library of advanced algorithms into the Spark ecosystem and exposes them via the pipeline API. Furthermore, the algorithms benefit from H2O MOJOs (Model Object, Optimized), a powerful concept shared across the entire H2O platform for storing and exchanging models. MOJOs are designed for effective model deployment, with a focus on scoring speed, traceability, exchangeability, and backward compatibility. In this talk we will explain the architecture of Sparkling Water, focusing on its integration with Spark pipelines and MOJOs. We’ll demonstrate the creation of pipelines that integrate H2O machine learning models and their deployment using Scala or Python. Furthermore, we will show how to use pre-trained MOJO models with Spark pipelines. Speaker's Bio: Michal is the VP of Engineering at H2O.ai! Michal is a geek and developer, and a Java, Linux, and programming-languages enthusiast who has been developing software for over 15 years. He obtained his PhD from Charles University in Prague in 2012 and completed a post-doc at Purdue University. During his studies he was interested in the construction of distributed, embedded, and real-time component-based systems using model-driven methods and domain-specific languages. He participated in the design and development of various systems, including the SOFA and Fractal component systems and the jPapabench control system.
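As a rough illustration of the pipeline integration described above, here is a minimal Python sketch using Sparkling Water's `pysparkling` package. The class and parameter names follow recent Sparkling Water releases and may differ by version, and the file paths and column names are placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pysparkling import H2OContext
from pysparkling.ml import H2OGBM

spark = SparkSession.builder.appName("sparkling-water-pipeline").getOrCreate()
hc = H2OContext.getOrCreate()  # starts H2O on top of the Spark cluster

train = spark.read.csv("train.csv", header=True, inferSchema=True)

# An H2O algorithm exposed as a standard Spark ML estimator,
# so it composes with any other Spark pipeline stages.
gbm = H2OGBM(labelCol="label")
model = Pipeline(stages=[gbm]).fit(train)

# The fitted stage is backed by a MOJO, so the saved pipeline model
# can later be reloaded for scoring without training infrastructure.
model.write().overwrite().save("gbm_pipeline_model")
```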
This talk was given at H2O World 2018 NYC and can be viewed here: https://youtu.be/rKoBJcnsFpM Speaker's Bio: Mateusz is a software developer who loves all things distributed and machine learning, and hates buzzwords. His favourite hobby is data juggling. He obtained his M.Sc. in Computer Science from AGH UST in Krakow, Poland, during which he did an exchange at L’ECE Paris in France and worked on distributed flight booking systems. After graduation he moved to Tokyo to work as a researcher at Fujitsu Laboratories on machine learning and NLP projects, where he is still currently based. In his spare time he tries to be part of the IT community by organizing, attending, and speaking at conferences and meetups.
Presented at #H2OWorld 2017 in Mountain View, CA. Learn more about H2O.ai: https://www.h2o.ai/. Follow @h2oai: https://twitter.com/h2oai. - - - Effective volume anomaly detection presents unique challenges when monitoring customer transaction volumes across thousands of platforms and systems. We overcome this by using H2O, building on open source tools, and delivering machine learning anomaly detection at enterprise scale. Hear how we model, visualize, and then automatically alert on anomalous mobile app volumes in real time. Donald Gennetten has over 15 years of experience supporting digital channels in the Financial Services industry. In his current role as a Data Engineer for Capital One’s Monitoring Intelligence team, he leads a cross-functional group of Data, Business, and Engineering subject matter experts to deliver Advanced Analytics solutions for real-time customer transaction monitoring and issue detection. Rahul Gupta is a Data Engineer in Capital One's Center for Machine Learning, focusing heavily on back-end development and model creation. His primary efforts include building an Algorithmic IT Operations (AIOps) platform that utilizes a combination of batch and streaming data with Machine Learning capabilities to improve the stability of Capital One services and the overall customer experience.
Presented at #H2OWorld 2017 in Mountain View, CA. Enjoy the video: https://youtu.be/r9S3xchrzlY. Learn more about H2O.ai: https://www.h2o.ai/. Follow @h2oai: https://twitter.com/h2oai. - - - Abstract: Venkatesh will explore how Driverless AI is helping to keep fraudsters at bay, and will share results from experiments conducted on large-scale payment transaction data. Venkatesh's Bio: Venkatesh is a senior data scientist at PayPal, where he is working on building state-of-the-art tools for payment fraud detection. He has over 20 years of experience in designing, developing, and leading teams to build scalable server-side software. In addition to being an expert in big-data technologies, Venkatesh holds a Ph.D. in Computer Science with a specialization in Machine Learning and Natural Language Processing (NLP) and has worked on various problems in the areas of Anti-Spam, Phishing Detection, and Face Recognition.
The document discusses using Microsoft Azure cloud services for game development and operations. It provides examples of how games use Azure for scalable storage, global load balancing for multiplayer games, predictive analytics using big data, and DevOps approaches for deployment, monitoring, and development. Key Azure services highlighted include Storage, SQL Database, Virtual Machines, Mobile Services, HDInsight, and Application Insights.
Enjoy the webinar recording here: https://youtu.be/Lll1qwQJKVw. Driverless AI speeds up data science workflows by automating feature engineering, model tuning, ensembling, and model deployment. In this presentation, Arno Candel (CTO, H2O.ai) gives a quick overview and guides attendees through an interactive hands-on lab using Qwiklabs. Driverless AI turns Kaggle-winning recipes into production-ready code and is specifically designed to avoid common mistakes such as underfitting, overfitting, data leakage, or improper model validation. Avoiding these pitfalls alone can save weeks or more for each model, and is necessary to achieve high modeling accuracy. With Driverless AI, everyone can now train and deploy modeling pipelines with just a few clicks from the GUI. Advanced users can use the client/server API through a variety of languages such as Python, Java, C++, Go, C#, and many more. To speed up training, Driverless AI uses highly optimized C++/CUDA algorithms to take full advantage of the latest compute hardware. For example, Driverless AI runs orders of magnitude faster on the latest NVIDIA GPU supercomputers on Intel and IBM platforms, both in the cloud and on-premises. There are two more product innovations in Driverless AI: statistically rigorous automatic data visualization and interactive model interpretation with reason codes and explanations in plain English. Both help data scientists and analysts quickly validate their data and models.
The presentation topic for this meetup was covered in two sections without any breaks in between. Section 1: Business Aspects (20 mins). Speaker: Rasmi Mohapatra, Product Owner, Experian https://www.linkedin.com/in/rasmi-m-428b3a46/ Once your data science application is in production, there are many typical data science operational challenges experienced today across business domains; we will cover a few of these challenges with example scenarios. Section 2: Tech Aspects (40 mins, slides & demo, Q&A). Speaker: Santanu Dey, Solution Architect, Iguazio https://www.linkedin.com/in/santanu/ In this part of the talk, we will cover how these operational challenges can be overcome, e.g. automating data collection and preparation, making ML models portable and deploying them in production, monitoring and scaling, etc., with relevant demos.
Data Con LA 2020 Description Machine learning is an essential skill in today's job market. But when it comes to learning machine learning, beginners get a lot of conflicting advice. I have been teaching ML to software engineers for years. In this talk I will dispel some of the myths surrounding machine learning, give you a solid, tangible plan on how to go about learning ML, give you good pointers to start from, and steer you away from common mistakes. Speaker: Sujee Maniyam, Elephant Scale, Founder, Principal Instructor
This in-depth training on H2O Driverless AI was given by Wen Phan on June 28th, 2018. He elaborated on the automatic feature engineering, machine learning interpretability, and automatic visualization components of this groundbreaking product.
A talk for SF big analytics meetup. Building, testing, deploying, monitoring and maintaining big data analytics services. http://hydrosphere.io/
The document discusses building real-time targeting capabilities at Capital One. It introduces two speakers, Ryan Zotti and Subbu Thiruppathy, and describes challenges around striving for speed in everything. It then covers how to achieve fast model data, training, deployment, and scoring through techniques like using the most up-to-date data, distributed computing in the cloud, automatic model refitting, and response times under 100 milliseconds.
The document discusses designing scalable platforms for artificial intelligence (AI) and machine learning (ML). It outlines several challenges in developing AI applications, including technical debt, unpredictability, and different data and compute needs compared to traditional software. It then reviews existing commercial AI platforms and common components of AI platforms, including data access, ML workflows, computing infrastructure, model management, and APIs. The rest of the document focuses on eBay's Krylov project as an example AI platform, outlining its architecture, the challenges of deploying platforms at scale, and the skill sets needed on the platform team.
In this presentation, Parul Pandey will provide a history and overview of the field of “Automatic Machine Learning” (AutoML), followed by a detailed look inside H2O’s open source AutoML algorithm. H2O AutoML provides an easy-to-use interface which automates data pre-processing, training, and tuning a large selection of candidate models (including multiple stacked ensemble models for superior model performance). The result of the AutoML run is a “leaderboard” of H2O models which can be easily exported for use in production. AutoML is available in all H2O interfaces (R, Python, Scala, web GUI) and, due to the distributed nature of the H2O platform, can scale to very large datasets. The presentation will end with a demo of H2O AutoML in R and Python, including a handful of code examples to get you started using automatic machine learning on your own projects. Parul's Bio: Parul is a Data Science Evangelist here at H2O.ai. She combines data science, evangelism, and community in her work. Her emphasis is on spreading information about H2O and Driverless AI to as many people as possible. She is also an active writer and has contributed to various national and international publications.
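The Python side of the workflow described above generally follows the pattern below. This is a minimal sketch, assuming a local H2O cluster can be started and a CSV file with a categorical response column is available; the file path and column name are placeholders:

```python
import h2o
from h2o.automl import H2OAutoML

h2o.init()  # start (or connect to) a local H2O cluster

# Placeholder dataset with a binary target column named "response".
train = h2o.import_file("train.csv")
train["response"] = train["response"].asfactor()  # mark as classification target

# Train up to 10 base models (plus stacked ensembles) within the time budget.
aml = H2OAutoML(max_models=10, max_runtime_secs=600, seed=1)
aml.train(y="response", training_frame=train)

# The leaderboard ranks all candidate models by cross-validated metric.
print(aml.leaderboard)

# The leading model can be exported as a MOJO for production scoring.
aml.leader.download_mojo(path="./")
```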
This session took place in New York City on November 4th, 2019. Speaker Bio: Chemere is a Senior Data Science Training Specialist for H2O.ai. Chemere has a Master's in Business Administration with a focus in Marketing Analytics from the University of North Carolina at Charlotte. She is an experienced data scientist with a diverse background in transformational decision-making in various industries, including Banking, Manufacturing, Logistics, and Medical Devices. Chemere joins us from Venus Concept/2two5, where she was the Lead Data Scientist focused on building predictive models with Internet of Things (IoT) data and on a subscription-based marketing product for B2B customers. Prior to that, Chemere worked as a Senior Data Scientist at Wells Fargo Bank focused on various applied predictive analytics solutions. More details about the event can be found here: https://www.eventbrite.com/e/dive-into-h2o-new-york-tickets-76351721053