Deep Water is H2O's integration with multiple open source deep learning libraries such as TensorFlow, MXNet and Caffe. On top of the performance gains from GPU backends, Deep Water naturally inherits all H2O properties in scalability, ease of use and deployment. In this talk, I will go through the motivation and benefits of Deep Water. After that, I will demonstrate how to build and deploy deep learning models, with or without programming experience, using H2O's R/Python/Flow (Web) interfaces. Jo-fai (or Joe) is a data scientist at H2O.ai. Before joining H2O, he was in the business intelligence team at Virgin Media in the UK, where he developed data products to enable quick and smart business decisions. He also worked remotely for Domino Data Lab in the US as a data science evangelist, promoting products via blogging and giving talks at meetups. Joe has a background in water engineering. Before his data science journey, he was an EngD research engineer at the STREAM Industrial Doctorate Centre, working on machine learning techniques for drainage design optimization. Prior to that, he was an asset management consultant specializing in data mining and constrained optimization for the utilities sector in the UK and abroad. He also holds an MSc in Environmental Management and a BEng in Civil Engineering.
- H2O.ai is a company that provides an open-source machine learning platform called H2O.
- Their new project "Deep Water" integrates popular deep learning frameworks like TensorFlow, MXNet, and Caffe into H2O to enable distributed deep learning on GPUs for improved performance.
- This provides a unified interface for deep learning within H2O and allows users to easily build, stack, and deploy deep learning models from different frameworks.
This document provides an introduction and overview of machine learning with H2O and Python. It begins with background information about the presenter, Joe Chow, including his work experience and side projects. The agenda then outlines topics to be covered, including an introduction to H2O.ai the company and machine learning platform, followed by a Python tutorial and examples. The tutorial will cover importing and manipulating data, basic and advanced regression and classification models, and using H2O in the cloud.
These slides show how to approach a multi-class classification problem using H2O. The data used is an aggregated log of multiple systems that constantly provide information about their status, connections and traffic. In large organizations, these log datasets can be enormous and hard to attribute because of the number of sources, legacy systems, etc. In our example, we use a constructed response label for each source, then use H2O to classify the source of the data. Author Bio: Ashrith Barthur is a Security Scientist at H2O, currently working on algorithms that detect anomalous behaviour in user activities, network traffic, attacks, financial fraud and global money movement. He has a PhD from Purdue University in the field of information security, specializing in anomalous behaviour in the DNS protocol. Don’t forget to download H2O! http://www.h2o.ai/download/
Michal Malohlava talks about the PySparkling Water package for Spark and Python users. - Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai - To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
How Deep Learning Will Make Us More Human Again
While deep learning is taking over the AI space, most of us are struggling to keep up with the pace of innovation. Arno Candel shares success stories and challenges in training and deploying state-of-the-art machine learning models on real-world datasets. He will also share his insights into what the future of machine learning and deep learning might look like, and how to best prepare for it.
ISAX is a time series data compression algorithm that can group similar patterns in billions of time series datasets. It is implemented on H2O's distributed architecture and can be used for clustering, classification, anomaly detection, and predictive analytics on compressed time series data from fields like IoT, finance, bioinformatics, and image/sound processing. Examples of ISAX code in H2O are provided.
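The pattern-grouping idea behind ISAX can be illustrated with a tiny pure-Python SAX sketch (an illustration of the underlying symbolic compression only, not H2O's distributed implementation; the 4-symbol alphabet and normal-distribution breakpoints below are illustrative assumptions):

```python
import statistics

# Breakpoints that split a standard normal distribution into 4
# equi-probable regions (quartile values of N(0, 1)).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
SYMBOLS = "abcd"

def sax_word(series, n_segments):
    """Compress a numeric series into a short symbolic word."""
    # 1. z-normalize so the N(0, 1) breakpoints apply
    mu = statistics.fmean(series)
    sd = statistics.pstdev(series) or 1.0
    z = [(x - mu) / sd for x in series]
    # 2. Piecewise Aggregate Approximation: mean of each segment
    seg_len = len(z) // n_segments
    means = [statistics.fmean(z[i * seg_len:(i + 1) * seg_len])
             for i in range(n_segments)]
    # 3. Discretize each segment mean into a symbol
    def symbol(v):
        for i, bp in enumerate(BREAKPOINTS):
            if v < bp:
                return SYMBOLS[i]
        return SYMBOLS[-1]
    return "".join(symbol(m) for m in means)

# Series with the same shape compress to the same short word, which
# is what makes grouping billions of series tractable.
print(sax_word([1, 2, 3, 4, 5, 6, 7, 8], 4))          # prints "abcd"
print(sax_word([10, 20, 30, 40, 50, 60, 70, 80], 4))  # prints "abcd"
```

Because both series z-normalize to the same shape, they map to the same word, so clustering or anomaly detection can operate on the compact words instead of the raw series.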
This document provides an introduction to H2O, an open source machine learning platform, and discusses potential Internet of Things (IoT) use cases for predictive maintenance and outlier detection. The document outlines Joe Chow's background and experience, provides an overview of H2O's capabilities including algorithms, interfaces, and exporting models for production. It then demonstrates how to use H2O for predictive maintenance on a dataset of sensor readings to predict equipment failures, and for outlier detection on the MNIST handwritten digits dataset to identify anomalous images.
Dmitry will show the audience how to get started with MXNet and build deep learning models to classify images, sound and text.
Erin LeDell's presentation on scalable machine learning in R with H2O from the Portland R User Group Meetup in Portland, 08.17.15
Note: Make sure to download the slides to get the high-resolution version! Also, you can find the webinar recording here (please also download for better quality): https://www.dropbox.com/s/72qi6wjzi61gs3q/H2ODeepLearningArnoCandel052114.mov Come hear how Deep Learning in H2O is unlocking never-before-seen performance for prediction! H2O is a google-scale open source machine learning engine for R and Big Data. Enterprises can now use all of their data without sampling and build intelligent applications. This live webinar introduces Distributed Deep Learning concepts, implementation and results from recent developments. Real-world classification and regression use cases from an eBay text dataset, the MNIST handwritten digits and cancer datasets will demonstrate the power of this game-changing technology.
Skutil brings the best of both worlds to H2O and sklearn, delivering an easy transition into the world of distributed computing that H2O offers, while providing the same, familiar interface that sklearn users have come to know and love.
The document summarizes a presentation given by Joe Chow on H2O at the BelgradeR Meetup. The agenda includes an introduction to H2O.ai the company, why H2O is useful, the H2O machine learning platform, Deep Water for deep learning, the latest H2O developments, and demos. Joe will introduce H2O's approach to machine learning, its distributed algorithms, its interfaces for R, Python and Flow, and Deep Water for distributed deep learning on GPUs with TensorFlow, MXNet or Caffe.
This document summarizes a presentation by Erin LeDell from H2O.ai about machine learning using the H2O software. H2O is an open-source machine learning platform that provides APIs for R, Python, Scala and other languages. It allows distributed machine learning on large datasets across clusters. The presentation covers H2O's architecture, algorithms like random forests and deep learning, and how to use H2O within R including loading data, training models, and running grid searches. It also discusses H2O on Spark via Sparkling Water and real-world use cases with customers.
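The grid-search step in that workflow can be sketched in plain Python (a single-machine illustration only; H2O's grid search distributes model training over a cluster, and the toy `score_fn` below is a hypothetical stand-in for real model validation):

```python
from itertools import product

def grid_search(score_fn, grid):
    """Try every parameter combination in `grid`; return the best."""
    keys = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        s = score_fn(params)          # in H2O this would train a model
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

# Toy "validation score" that peaks at ntrees=100, max_depth=5
# (hypothetical; a real score would come from cross-validation).
def score_fn(p):
    return -abs(p["ntrees"] - 100) - 10 * abs(p["max_depth"] - 5)

grid = {"ntrees": [50, 100, 200], "max_depth": [3, 5, 10]}
best, score = grid_search(score_fn, grid)
print(best)  # {'ntrees': 100, 'max_depth': 5}
```

The exhaustive loop over `itertools.product` is exactly what makes grid search expensive, and why distributing the per-combination training, as H2O does, pays off.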
This document summarizes a presentation given by Joe Chow on machine learning using H2O.ai's platform. The presentation covered: 1) An introduction to Joe and H2O.ai, including the company's mission to operationalize data science. 2) An overview of the H2O platform for machine learning, including its distributed algorithms, interfaces for R and Python, and model export capabilities. 3) A demonstration of deep learning using H2O's Deep Water integration with TensorFlow, MXNet, and Caffe, allowing users to build and deploy models across different frameworks.
1. The document summarizes steps towards integrating the H2O and Spark frameworks, including allowing data sharing between Spark and H2O. 2. A demonstration is shown of loading airline data from a CSV into a Spark SQL table, querying the table, and transferring the results to an H2O frame to run a GBM algorithm. 3. Next steps discussed include optimizing data transfers between Spark and H2O, developing an H2O backend for MLlib, and addressing open challenges in areas like transferring results and supporting Parquet.
H2O is widely used for machine learning projects. A TechCrunch article, published in January 2017 by John Mannes, reported that around 20% of Fortune 500 companies use H2O. Talk 1: Introduction to Scalable & Automatic Machine Learning with H2O In recent years, the demand for machine learning experts has outpaced the supply, despite the surge of people entering the field. To address this gap, there have been big strides in the development of user-friendly machine learning software that can be used by non-experts. Although H2O and other tools have made it easier for practitioners to train and deploy machine learning models at scale, there is still a fair bit of knowledge and background in data science that is required to produce high-performing machine learning models. In this presentation, Joe will introduce the AutoML functionality in H2O. H2O's AutoML provides an easy-to-use interface which automates the process of training a large, comprehensive selection of candidate models and a stacked ensemble model which, in most cases, will be the top performing model in the AutoML Leaderboard. Talk 2: Making Multimillion-dollar Baseball Decisions with H2O AutoML and Shiny Joe recently teamed up with IBM and Aginity to create a proof-of-concept "Moneyball" app for the IBM Think conference in Vegas. The original goal was to prove that different tools (e.g. H2O, Aginity AMP, IBM Data Science Experience, R and Shiny) could work together seamlessly for common business use cases. Little did Joe know, the app would be used by Ari Kaplan (the real "Moneyball" guy) to validate the future performance of some baseball players. Ari recommended one player to a Major League Baseball team. The player was signed the next day with a multimillion-dollar contract. This talk is about Joe's journey to a real "Moneyball" application. Bio: Jo-fai (or Joe) Chow is a data scientist at H2O.ai.
This document summarizes a presentation about H2O's machine learning platform and Deep Water distributed deep learning capabilities. The presentation introduces H2O, its open source in-memory machine learning platform, performance advantages, and interfaces for R, Python and Flow. Deep Water is introduced as H2O's integration with TensorFlow, MXNet and Caffe that provides a unified interface for distributed deep learning on GPUs. Examples are shown training convolutional neural networks on image datasets using Deep Water with different backends.
This is my Deep Water talk for the TensorFlow Paris meetup. Deep Water is H2O's integration with multiple open source deep learning libraries such as TensorFlow, MXNet and Caffe. On top of the performance gains from GPU backends, Deep Water naturally inherits all H2O properties in scalability, ease of use and deployment.
Machine Learning for Smarter Apps with Tom Kraljevic
The document provides an agenda and summary of a presentation on H2O.ai's machine learning platform and recent developments. The presentation includes an introduction to H2O.ai the company and why its platform H2O is useful. It demonstrates H2O's machine learning capabilities including deep learning, and discusses the latest features like XGBoost integration and automatic machine learning. Real-world examples and demos are also provided to illustrate how to use H2O with R, Python and via its web interface.
Erin LeDell presents Intro to H2O Machine Learning in Python at Galvanize Seattle, 02.02.16
H2O presentation at Trevor Hastie and Rob Tibshirani's Short Course on Statistical Learning & Data Mining IV: http://web.stanford.edu/~hastie/sldm.html PDF and Keynote version of the presentation available here: https://github.com/h2oai/h2o-meetups/tree/master/2017_04_06_SLDM4_H2O_New_Developments
Navdeep Gill @ Galvanize Seattle - May 2016
This document provides an introduction to big data analytics and Hadoop. It discusses: 1) The characteristics of big data, including scale, complexity, and speed of data generation. Big data requires new techniques and architectures to manage and extract value from large, diverse datasets. 2) An overview of Hadoop, an open-source framework for distributed storage and processing of large datasets across clusters of computers. Hadoop includes the Hadoop Distributed File System (HDFS) and the MapReduce programming model. 3) The course will teach students how to manage large datasets with Hadoop, write jobs in languages like Java and Python, and use tools like Pig, Hive, RHadoop and Mahout to perform advanced analytics on large datasets.
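The MapReduce programming model mentioned above can be sketched as a single-process word count (an illustration of the map/shuffle/reduce phases only; a real Hadoop job distributes each phase across a cluster and reads its input from HDFS):

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in one input line.
    return [(w.lower(), 1) for w in line.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts collected for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data needs new tools", "big clusters process big data"]
pairs = chain.from_iterable(map_phase(l) for l in lines)
counts = reduce_phase(shuffle(pairs))
print(counts["big"])  # 3
```

The value of the model is that the mapper and reducer are stateless per record and per key, so the framework can run them in parallel on thousands of machines without the programmer writing any distribution logic.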
This document introduces several big data technologies that are less well known than traditional solutions like Hadoop and Spark. It discusses Apache Flink for stream processing, Apache Samza for processing real-time data from Kafka, Google Cloud Dataflow which provides a managed service for batch and stream data processing, and StreamSets Data Collector for collecting and processing data in real-time. It also covers machine learning technologies like TensorFlow for building dataflow graphs, and cognitive computing services from Microsoft. The document aims to think beyond traditional stacks and learn from companies building pipelines at scale.
H2O.ai is a machine learning company founded in 2012 with 35 employees based in Mountain View, CA. It was started by Stanford engineers and is an open source leader in machine and deep learning. H2O's software provides interfaces for R, Python, Spark and Hadoop and expands predictive analytics capabilities to large datasets across many industries. The executive team is led by CEO Sri Satish Ambati and CTO Cliff Click, and the scientific advisory council includes experts from Stanford like Trevor Hastie and Stephen Boyd.
This session was recorded in San Francisco on February 5th, 2019 and can be viewed here: https://youtu.be/CgoxjmdyMiU This session will discuss how to get up and running quickly with containerized H2O environments (H2O Flow, Sparkling Water, and Driverless AI) at scale, in a multi-tenant architecture with a shared pool of resources using CPUs and/or GPUs. See how you can spin up (and tear down) your H2O environments on demand, with just a few mouse clicks. Find out how to enable quota management of GPU resources for greater efficiency, and easily connect your compute to your datasets for large-scale distributed machine learning. Learn how to operationalize your machine learning pipelines and deliver faster time-to-value for your AI initiative, while ensuring enterprise-grade security and high performance. Bio: Nanda Vijaydev is senior director of solutions at BlueData (now HPE), where she leverages technologies like Hadoop, Spark, and TensorFlow to build solutions for enterprise analytics and machine learning use cases. Nanda has 10 years of experience in data management and data science. Previously, she worked on data science and big data projects in multiple industries, including healthcare and media; was a principal solutions architect at Silicon Valley Data Science; and served as director of solutions engineering at Karmasphere. Nanda has an in-depth understanding of the data analytics and data management space, particularly in the areas of data integration, ETL, warehousing, reporting, and machine learning.
This document provides an overview of architecting a first big data implementation. It defines key concepts like Hadoop, NoSQL databases, and real-time processing. It recommends asking questions about data, technology stack, and skills before starting a project. Distributed file systems, batch tools, and streaming systems like Kafka are important technologies for big data architectures. The document emphasizes moving from batch to real-time processing as a major opportunity.
This document summarizes a presentation given by Joe Chow on machine learning using H2O.ai's platform. The presentation introduced H2O, its products like Deep Water for deep learning, and demonstrated examples of building models with R and Python. It showed how H2O provides a unified interface for TensorFlow, MXNet and Caffe, allowing users to easily build and deploy deep learning models with different frameworks. The document provided an overview of the company and platform capabilities like scalable algorithms, model export and multiple language interfaces like R and Python.
This document provides an agenda and overview for a conference session on Big Data and NoSQL for database and BI professionals held from April 10-12 in Chicago, IL. The session will include an overview of big data and NoSQL technologies, then deeper dives into Hadoop, NoSQL databases like HBase, and tools like Hive, Pig, and Sqoop. There will also be demos of technologies like HDInsight, Elastic MapReduce, Impala, and running MapReduce jobs.
This presentation provides an overview of big data open source technologies. It defines big data as large amounts of data from various sources in different formats that traditional databases cannot handle. It discusses that big data technologies are needed to analyze and extract information from extremely large and complex data sets. The top technologies are divided into data storage, analytics, mining and visualization. Several prominent open source technologies are described for each category, including Apache Hadoop, Cassandra, MongoDB, Apache Spark, Presto and ElasticSearch. The presentation provides details on what each technology is used for and its history.
This document discusses Apache Dremio, an open source data virtualization platform that provides self-service SQL access to data sources like Elasticsearch, MongoDB, HDFS, and relational databases. It aims to make data analytics faster by avoiding the need for data staging, warehouses, cubes, and extracts. Dremio uses techniques like reflections, pushdowns, and a universal relational algebra to optimize queries and leverage caches. It is based on projects like Apache Drill, Calcite, Arrow, and Parquet and can be deployed on Hadoop or the cloud. The presentation includes a demo of using Dremio to create datasets, curate/prepare data, accelerate queries with reflections, and manage resources.
This document summarizes a presentation on using SQL Server Integration Services (SSIS) with HDInsight. It introduces Tillmann Eitelberg and Oliver Engels, who are experts on SSIS and HDInsight. The agenda covers traditional ETL processes, challenges of big data, useful Apache Hadoop components for ETL, clarifying statements about Hadoop and ETL, using Hadoop in the ETL process, how SSIS is more than just an ETL tool, tools for working with HDInsight, getting started with Azure HDInsight, and using SSIS to load and transform data on HDInsight clusters.
Erin LeDell's presentation on Intro to H2O Machine Learning in R at SCU
This document provides an overview of H2O.ai, a leading AI platform company. It discusses that H2O.ai was founded in 2012, is funded with $75 million, and has products including its open source H2O machine learning platform and its Driverless AI automated machine learning product. It also describes H2O.ai's leadership in the machine learning platform market according to Gartner, its team of 90 AI experts, and its global presence across several offices. Finally, it outlines H2O.ai's machine learning capabilities and how customers can use its platform and products.
This document provides an agenda and overview of a talk on big data and data science given by Peter Wang. The key points covered include:
- An honest perspective on big data trends and challenges over time.
- Architecting systems for data exploration and analysis using tools like Continuum Analytics' Blaze and Numba libraries.
- Python's role in data science for its ecosystem of libraries and accessibility to domain experts.
“AGI should be open source and in the public domain at the service of humanity and the planet.”
This document provides an overview of H2O.ai, an AI company that offers products and services to democratize AI. It mentions that H2O products are backed by 10% of the world's top data scientists from Kaggle and that H2O has customers in 7 of the top 10 banks, 4 of the top 10 insurance companies, and top manufacturing companies. It also provides details on H2O's founders, funding, customers, products, and vision to make AI accessible to more organizations.
Here are some key points about benchmarking and evaluating generative AI models like large language models:
- Foundation models require large, diverse datasets to be trained on in order to learn broad language skills and knowledge. Fine-tuning can then improve performance on specific tasks.
- Popular benchmarks evaluate models on tasks involving things like commonsense reasoning, mathematics, science questions, generating truthful vs false responses, and more. This helps identify model capabilities and limitations.
- Custom benchmarks can also be designed using tools like Eval Studio to systematically test models on specific applications or scenarios. Both automated and human evaluations are important.
- Leaderboards like HELM aggregate benchmark results to compare how different models perform across a wide range of tests and metrics.
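A custom benchmark of the kind described above boils down to a loop over (prompt, expected) pairs with a scoring rule. A minimal sketch, assuming an exact-match metric and a hypothetical model function (this is not Eval Studio's actual API):

```python
def evaluate(model_fn, benchmark):
    """Score a model on (prompt, expected) pairs; exact-match metric."""
    hits = 0
    for prompt, expected in benchmark:
        answer = model_fn(prompt).strip().lower()
        hits += answer == expected.lower()
    return hits / len(benchmark)

# Tiny hand-written benchmark (illustrative only).
benchmark = [
    ("What is 2 + 2?", "4"),
    ("Capital of France?", "Paris"),
]

def toy_model(prompt):
    # Placeholder "model" so the harness is runnable end to end;
    # a real run would call an LLM here.
    return {"What is 2 + 2?": "4"}.get(prompt, "unknown")

score = evaluate(toy_model, benchmark)
print(score)  # 0.5
```

Real harnesses add more metrics (e.g. partial credit, LLM-as-judge, human review) on top of this same loop, which is why both automated and human evaluations matter.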
Pritika Mehta, Co-Founder, Butternut.ai H2O Open Source GenAI World SF 2023
The document discusses LLMOps (Large Language Model Operations) compared to traditional MLOps. Some key points:
- LLMOps and MLOps face similar challenges across the development lifecycle, but LLMOps requires more GPU resources and integration is faster due to more models in each application. Evaluation is also less clear.
- The LLMOps field is around the 5th generation of models, with debates around proprietary vs open source models, and balancing privacy, cost and control.
- LLMOps platforms are emerging to provide solutions for tasks like prompting, embedding databases, evaluation, and governance, similar to how MLOps platforms have evolved.
The document discusses optimizing question answering systems called RAG (Retrieve-and-Generate) stacks. It outlines challenges with naive RAG approaches and proposes solutions like improved data representations, advanced retrieval techniques, and fine-tuning large language models. Table stakes optimizations include tuning chunk sizes, prompt engineering, and customizing LLMs. More advanced techniques involve small-to-big retrieval, multi-document agents, embedding fine-tuning, and LLM fine-tuning.