Ray Peck from H2O.ai talks about the roadmap for the upcoming AutoML product in H2O. - Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai - To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Presented at #H2OWorld 2017 in Mountain View, CA. Enjoy the video: https://youtu.be/axIqeaUhow0. Learn more about H2O.ai: https://www.h2o.ai/. Follow @h2oai: https://twitter.com/h2oai. - - - Abstract: Usage of AI and machine learning models is likely to become more commonplace as larger swaths of the economy embrace automation and data-driven decision-making. While these predictive systems can be quite accurate, they have in the past been treated as inscrutable black boxes that produce only numeric predictions with no accompanying explanations. Unfortunately, recent studies and recent events have drawn attention to mathematical and sociological flaws in prominent weak AI and ML systems, yet practitioners usually don’t have the right tools to pry open machine learning black boxes and debug them. This presentation introduces several new approaches that increase transparency, accountability, and trustworthiness in machine learning models. If you are a data scientist or analyst and you want to explain a machine learning model to your customers or managers (or if you have concerns about documentation, validation, or regulatory requirements), then this presentation is for you!
Enjoy the webinar recording here: https://youtu.be/Lll1qwQJKVw. Driverless AI speeds up data science workflows by automating feature engineering, model tuning, ensembling, and model deployment. In this presentation, Arno Candel (CTO, H2O.ai) gives a quick overview and guides attendees through an interactive hands-on lab using Qwiklabs. Driverless AI turns Kaggle-winning recipes into production-ready code and is specifically designed to avoid common mistakes such as under- or overfitting, data leakage, or improper model validation. Avoiding these pitfalls alone can save weeks or more for each model, and is necessary to achieve high modeling accuracy. With Driverless AI, everyone can now train and deploy modeling pipelines with just a few clicks from the GUI. Advanced users can use the client/server API through a variety of languages such as Python, Java, C++, Go, C#, and many more. To speed up training, Driverless AI uses highly optimized C++/CUDA algorithms to take full advantage of the latest compute hardware. For example, Driverless AI runs orders of magnitude faster on the latest Nvidia GPU supercomputers on Intel and IBM platforms, both in the cloud and on-premise. There are two more product innovations in Driverless AI: statistically rigorous automatic data visualization and interactive model interpretation with reason codes and explanations in plain English. Both help data scientists and analysts to quickly validate the data and models.
The document discusses the Lambda Architecture, which is an approach for building data systems to handle large volumes of real-time streaming data. It proposes using three main design principles: handling human errors by making the system fault-tolerant, storing raw immutable data, and enabling recomputation of results from the raw data. The document then provides two case studies of applying Lambda Architecture principles to analyze mobile app usage data and process high-volume web logs in real-time. It concludes with lessons learned, such as studying Lambda concepts, collecting any available data, and turning data into useful insights.
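The three Lambda principles above (fault-tolerance to human error, an immutable raw dataset, and recomputation) can be illustrated with a minimal sketch. This is not code from the talk; the event names and view logic are invented for illustration:

```python
from collections import Counter

# Immutable master dataset: raw events are only ever appended, never mutated.
raw_events = [
    {"user": "alice", "action": "open"},
    {"user": "bob", "action": "open"},
    {"user": "alice", "action": "purchase"},
]

def recompute_view(events):
    """Batch layer: rebuild the derived view from scratch from raw data.

    Because the view is a pure function of the immutable event log, a
    human error in the view logic is fixed by correcting the code and
    recomputing -- the raw data is never lost or overwritten.
    """
    return Counter(e["action"] for e in events)

view = recompute_view(raw_events)
print(view["open"])      # 2

# New data arrives? Append to the log and recompute the view.
raw_events.append({"user": "bob", "action": "purchase"})
view = recompute_view(raw_events)
print(view["purchase"])  # 2
```

The key design choice is that the derived view carries no authoritative state of its own, so any bug in it is recoverable by recomputation.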
- The document discusses the Lambda Architecture, a system designed by Nathan Marz for building real-time big data applications. It is based on three principles: human fault-tolerance, data immutability, and recomputation.
- The document provides two case studies of applying the Lambda Architecture: at Greengar Studios for API monitoring and statistics, and at eClick for real-time data analytics on streaming user event data.
- Key lessons discussed are keeping solutions simple, asking the right questions to enable deep analytics and profit, using reactive and functional approaches, and turning data into useful insights.
Data Con LA 2020
When users complain about slowness in their virtual application or desktop, User Experience becomes a subjective measurement, a feeling of how well the infrastructure is performing. This talk will focus on the objective measurement and what that looks like for your business.
Takeaways:
* Attendees will learn the method for monitoring User Experience for virtual apps and desktops.
* Attendees will learn the do's and don'ts of monitoring for User Experience in the virtual world.
* Attendees will gain a sense of the importance of monitoring UX for their business cases when purchasing a monitoring solution like eG Enterprise.
Typical Audience: Architects, engineers, managers, and end-user solutions experts who work in the virtual desktop space, such as Citrix, Horizon, DaaS, and more.
Speaker: Wendy Howard, eG Innovations, Technical Consultant
Data Con LA 2020
Description
Machine learning is an essential skill in today's job market. But when it comes to learning machine learning, beginners get a lot of conflicting advice. I have been teaching ML for software engineers for years. In this talk I will:
* dispel some of the myths surrounding machine learning
* give you a solid, tangible plan on how to go about learning ML
* give you good pointers to start from
* and steer you away from common mistakes
Speaker: Sujee Maniyam, Elephant Scale, Founder, Principal Instructor
This talk was recorded in London on Oct 30, 2018 and can be viewed here: https://youtu.be/p4iAnxwC_Eg The good news is building fair, accountable, and transparent machine learning systems is possible. The bad news is it’s harder than many blogs and software package docs would have you believe. The truth is nearly all interpretable machine learning techniques generate approximate explanations, that the fields of eXplainable AI (XAI) and Fairness, Accountability, and Transparency in Machine Learning (FAT/ML) are very new, and that few best practices have been widely agreed upon. This combination can lead to some ugly outcomes! This talk aims to make your interpretable machine learning project a success by describing fundamental technical challenges you will face in building an interpretable machine learning system, defining the real-world value proposition of approximate explanations for exact models, and then outlining viable techniques for debugging, explaining, and testing machine learning models. Mateusz is a software developer who loves all things distributed and machine learning, and hates buzzwords. His favourite hobby is data juggling. He obtained his M.Sc. in Computer Science from AGH UST in Krakow, Poland, during which he did an exchange at L’ECE Paris in France and worked on distributed flight booking systems. After graduation he moved to Tokyo to work as a researcher at Fujitsu Laboratories on machine learning and NLP projects, where he is still currently based.
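The talk names technique classes rather than specific algorithms; as one concrete illustration of an "approximate explanation for an exact model", here is a hand-rolled permutation-importance sketch. The toy model, data, and function names are all invented, and this is not code from the talk:

```python
import random

# A stand-in "black box": any function from feature vector to prediction.
# In this toy model, feature 0 dominates and feature 2 is ignored entirely.
def black_box(x):
    return 3.0 * x[0] + 0.5 * x[1]

random.seed(0)
X = [[random.random() for _ in range(3)] for _ in range(200)]
y = [black_box(x) for x in X]

def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

def permutation_importance(model, X, y, feature):
    """Shuffle one feature column and measure how much the error grows.

    The resulting explanation is approximate: it summarises the model's
    global behaviour, not the exact reasoning behind any one prediction.
    """
    baseline = mse([model(x) for x in X], y)
    col = [x[feature] for x in X]
    random.shuffle(col)
    X_perm = [x[:feature] + [v] + x[feature + 1:] for x, v in zip(X, col)]
    return mse([model(x) for x in X_perm], y) - baseline

scores = [permutation_importance(black_box, X, y, f) for f in range(3)]
# Feature 0 should matter most; feature 2, being unused, not at all.
print(scores)
```

Even this tiny example shows the caveat the talk stresses: the importance scores describe the model's behaviour in aggregate, so they can mislead when features are correlated.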
This session was recorded in San Francisco on February 5th, 2019 and can be viewed here: https://youtu.be/CgoxjmdyMiU This session will discuss how to get up and running quickly with containerized H2O environments (H2O Flow, Sparkling Water, and Driverless AI) at scale, in a multi-tenant architecture with a shared pool of resources using CPUs and/or GPUs. See how you can spin up (and tear down) your H2O environments on-demand, with just a few mouse clicks. Find out how to enable quota management of GPU resources for greater efficiency, and easily connect your compute to your datasets for large-scale distributed machine learning. Learn how to operationalize your machine learning pipelines and deliver faster time-to-value for your AI initiative — while ensuring enterprise-grade security and high performance. Bio: Nanda Vijaydev is senior director of solutions at BlueData (now HPE) - where she leverages technologies like Hadoop, Spark, and TensorFlow to build solutions for enterprise analytics and machine learning use cases. Nanda has 10 years of experience in data management and data science. Previously, she worked on data science and big data projects in multiple industries, including healthcare and media; was a principal solutions architect at Silicon Valley Data Science; and served as director of solutions engineering at Karmasphere. Nanda has an in-depth understanding of the data analytics and data management space, particularly in the areas of data integration, ETL, warehousing, reporting, and machine learning.
ING is a large financial institution operating since 1881 with over 33 million customers. It aims to become more data-driven through its Think Forward strategy. It is building a streaming analytics platform using Apache Flink for real-time processing to enable use cases like fraud detection and personalized insights. The platform uses a probabilistic approach combining event pattern matching, machine learning models in PMML format, and a post-processing stage to produce notifications. It is developed according to ING's agile way of working and provides both functional and modular flexibility.
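The three-stage flow described above (pattern matching, then a PMML-exported model, then post-processing into notifications) can be sketched in plain Python. This is not ING's actual Flink/PMML code; the event shapes, thresholds, and function names are invented to show how the stages compose:

```python
# Sketch of the three pipeline stages, in plain Python rather than
# Flink/PMML -- all names and thresholds are illustrative assumptions.

def matches_pattern(event):
    """Stage 1: event pattern matching -- keep only candidate events."""
    return event["type"] == "card_payment" and event["amount"] > 1000

def score(event):
    """Stage 2: stand-in for the PMML-exported model -- returns a fraud
    probability. A real deployment would evaluate the PMML document."""
    return 0.9 if event["country"] != event["home_country"] else 0.1

def post_process(event, probability, threshold=0.8):
    """Stage 3: turn high-probability scores into customer notifications."""
    if probability >= threshold:
        return {"user": event["user"], "message": "Please confirm this payment."}
    return None

events = [
    {"type": "card_payment", "amount": 2500, "user": "u1",
     "country": "BR", "home_country": "NL"},
    {"type": "login", "amount": 0, "user": "u2",
     "country": "NL", "home_country": "NL"},
]

notifications = [
    n for e in events if matches_pattern(e)
    if (n := post_process(e, score(e))) is not None
]
print(notifications)  # one notification, for user u1
```

In the streaming setting each stage would be an operator over an unbounded stream rather than a list comprehension, but the separation of concerns is the same.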
Do you know how to use StreamSets Data Collector with Google Cloud Platform (GCP)? In this session we'll explain how YaloChat designed and implemented a streaming architecture that is sustainable, operable and scalable. Discover how we deployed Data Collector to integrate GCP components such as Pub/Sub and BigQuery to achieve DataOps in the cloud.
This document discusses an approach to enterprise metadata integration using a multilayer metadata model. Key points include:
- Status dashboards provide facts from technical, operational, application, and quality metadata layers
- A graph database allows for context exploration across the entire cluster
- The integration of metadata from multiple sources provides a more holistic view of business knowledge
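The "context exploration" point above amounts to graph traversal over linked metadata entities. As a minimal sketch (not the document's actual model; the node names and edge semantics are invented), a lineage question like "what is affected downstream of this dataset?" is a reachability query:

```python
from collections import deque

# Toy metadata graph: edges link datasets, jobs, and reports drawn from
# the technical / operational / application layers (names invented).
edges = {
    "raw.sales":         ["job.clean_sales"],
    "job.clean_sales":   ["table.sales_clean"],
    "table.sales_clean": ["report.revenue", "job.train_model"],
    "job.train_model":   ["model.churn"],
}

def downstream(node):
    """Breadth-first exploration of everything reachable from a node --
    e.g. to answer 'what breaks if raw.sales is late?' cluster-wide."""
    seen, queue = set(), deque([node])
    while queue:
        current = queue.popleft()
        for nxt in edges.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(downstream("raw.sales")))
```

A graph database would answer the same question with a path query instead of hand-written BFS, which is why the document favours one for cross-cluster context exploration.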
The document discusses real-time machine learning using the Lambda architecture. It describes the need for models that can learn incrementally from streaming data and remain accurate over time. The Lambda architecture is introduced as having a speed layer for real-time processing, a serving layer to query current and batch views, and a batch layer for immutable datasets. Mahout is described as an Apache library for scalable machine learning like recommendation, clustering, and classification using Hadoop. Basic recommendation algorithms are covered along with use cases like e-commerce personalization, fraud detection, and media metadata generation.
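As a concrete taste of the "basic recommendation algorithms" the summary mentions, here is a minimal user-based collaborative-filtering sketch. It uses plain Python rather than Mahout's Java APIs, and the ratings data and function names are invented for illustration:

```python
from math import sqrt

# Tiny user -> {item: rating} matrix (made-up data for illustration).
ratings = {
    "ann":  {"a": 5, "b": 3, "c": 4},
    "bob":  {"a": 4, "b": 3, "c": 5, "d": 4},
    "carl": {"b": 2, "d": 5},
}

def cosine(u, v):
    """Cosine similarity over the items two users rated in common."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    return dot / (sqrt(sum(u[i] ** 2 for i in common)) *
                  sqrt(sum(v[i] ** 2 for i in common)))

def recommend(user, k=1):
    """Score unseen items by similarity-weighted ratings of other users."""
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for item, r in their.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("ann"))  # item "d" is the top suggestion
```

Mahout's contribution is running exactly this kind of similarity computation as distributed Hadoop jobs, so the same idea scales to the e-commerce and media catalogs the use cases describe.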