Using H2O for Mobile Transaction Forecasting & Anomaly Detection - Capital One

Presented at #H2OWorld 2017 in Mountain View, CA. Learn more about H2O.ai: https://www.h2o.ai/. Follow @h2oai: https://twitter.com/h2oai.

Effective volume anomaly detection presents unique challenges when monitoring customer transaction volumes across thousands of platforms and systems. We overcome these challenges by using H2O, building on open source tools, and delivering machine learning anomaly detection at enterprise scale. Hear how we model, visualize, and then automatically alert on anomalous mobile app volumes in real time.

Donald Gennetten has over 15 years of experience supporting digital channels in the financial services industry. In his current role as a Data Engineer for Capital One's Monitoring Intelligence team, he leads a cross-functional group of data, business, and engineering subject matter experts to deliver advanced analytics solutions for real-time customer transaction monitoring and issue detection.

Rahul Gupta is a Data Engineer in Capital One's Center for Machine Learning, focusing heavily on back-end development and model creation. His primary efforts include building an Algorithmic IT Operations (AIOps) platform that combines batch and streaming data with machine learning capabilities to improve the stability of Capital One services and the overall customer experience.

Why not set volume alerts?
Unlike failure alerts, volume-based thresholds vary by event type, hour, minute, day of week, week of the year, holiday, and much more.
100+ customer event types x 24 hours/day x 7 days/week x 52 weeks/year
Over 873k distinct thresholds to calculate, set, and maintain.
Machine Learning should be used when:
• You cannot effectively code the solution
• You cannot scale
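
The threshold count on this slide follows from simple multiplication. The sketch below reproduces it using the slide's own figures; the per-minute extension is an assumption added only to show how quickly the count grows if thresholds were also set by minute, which the slide lists as a factor but does not quantify.

```python
# Back-of-the-envelope count of static volume thresholds.
# Figures come from the slide; "100+" is treated as a lower bound of 100.
EVENT_TYPES = 100
HOURS_PER_DAY = 24
DAYS_PER_WEEK = 7
WEEKS_PER_YEAR = 52


def threshold_count(event_types, hours=HOURS_PER_DAY,
                    days=DAYS_PER_WEEK, weeks=WEEKS_PER_YEAR):
    """Number of distinct hourly thresholds to calculate, set, and maintain."""
    return event_types * hours * days * weeks


hourly = threshold_count(EVENT_TYPES)
print(hourly)  # 873600, i.e. "over 873k"

# Assumption (not on the slide): one threshold per minute instead of per hour.
per_minute = hourly * 60
print(per_minute)  # 52416000
```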

Solving the problem required going beyond modeling
Identify Business Case → Define Data → Modeling → Develop Platform → Visualize/Alert → Pilot
Our goal was to deliver Machine Learning for Production Monitoring that:
• Followed Governance Requirements
• Used Available Data Science and Machine Learning Resources
• Leveraged Platform Engineering and Open Source Technology
• Ensured Usability and Scalability

Sparkling Water allowed us to rapidly test and deploy machine learning
• Sparkling Water combines the fast, scalable ML algorithms of H2O, the H2O Flow UI, Scala, and Python with the capabilities of Apache Spark
• In-memory processing supports big data environment needs
• Spark + Python + Scala enables a unified coding pipeline
• Grid search options allow for greater efficiency
  • Test models
  • Optimize hyperparameters
• H2O Flow facilitates ad-hoc experimentation
• REST API is easily integrated into production software
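
To make the grid search bullet concrete, here is a library-agnostic sketch of what a grid search does: score every combination of candidate hyperparameters and keep the best one. It does not use H2O's grid search API. The grid values and the `toy_score` function are hypothetical stand-ins; in practice the scoring step would train a model and return its validation error.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid (lower score is better).

    Returns (best_params, best_score, all_results).
    """
    names = sorted(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((score_fn(params), params))
    best_score, best_params = min(results, key=lambda item: item[0])
    return best_params, best_score, results


# Hypothetical grid; parameter names are illustrative only.
GRID = {
    "ntrees": [50, 100, 200],
    "max_depth": [3, 5, 7],
    "learn_rate": [0.05, 0.1],
}


def toy_score(params):
    # Stand-in for a validation error. A real run would train and
    # validate a model for each parameter combination instead.
    return (abs(params["ntrees"] - 100) / 100
            + abs(params["max_depth"] - 5) / 5
            + abs(params["learn_rate"] - 0.1))


best_params, best_score, results = grid_search(GRID, toy_score)
print(len(results), best_params)
```

The exhaustive loop is the simplest form of the idea; the number of candidate models grows multiplicatively with each added parameter, which is why running the search on a scalable platform matters.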

GBM provided greater flexibility and benefits over traditional methods
• Traditional time series techniques assume stationary data (no trends/seasonality) and constant variance over time
• A univariate time series consists of single, sequential observations over equal time increments
• A GBM model accepts external explanatory variables
  • Number of accounts with a payment due
  • Incidents
  • Change orders
  • Payment due dates
• GBM also enables data filtering/exclusion (e.g., excluding incident data from the training set)
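
One way to picture the explanatory variables is as a feature row built for each timestamp. The sketch below is an assumption about how such rows could look, not the presenters' actual schema: the field names, the `HOLIDAYS` set, and the `training_rows` helper are all hypothetical. It combines calendar features with external variables and shows incident periods being excluded from training data.

```python
from datetime import date, datetime

# Hypothetical holiday calendar; a real one would come from static reference data.
HOLIDAYS = {date(2017, 7, 4), date(2017, 12, 25)}


def build_features(ts, accounts_due=0, incident=False, change_order=False):
    """Build one feature row: calendar features plus external variables."""
    _, iso_week, iso_weekday = ts.isocalendar()
    return {
        "hour": ts.hour,
        "minute": ts.minute,
        "day_of_week": iso_weekday,   # 1 = Monday ... 7 = Sunday
        "week_of_year": iso_week,     # ISO week number
        "is_holiday": int(ts.date() in HOLIDAYS),
        "accounts_due": accounts_due,
        "incident": int(incident),
        "change_order": int(change_order),
    }


def training_rows(rows):
    """Exclude rows recorded during incidents from the training set."""
    return [row for row in rows if not row["incident"]]


rows = [
    build_features(datetime(2017, 7, 4, 9, 30), accounts_due=1200),
    build_features(datetime(2017, 7, 5, 9, 30), accounts_due=900, incident=True),
]
print(training_rows(rows))
```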

We developed an open source, cloud-based platform for rapid delivery
1 Retrieve volumes for training
2 Provide holiday and other static data
3 Store forecast with actual volumes
4 Detect & flag anomalies
5 Display volumes, forecast & anomalies
Components: Amazon S3, Sparkling Water, InfluxDB, Amazon EC2, Grafana
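
Step 4 can be sketched as a comparison of actual volume against the stored forecast. The deck does not state the rule the team used, so the fixed 10% relative tolerance below is an assumption, and the sample volumes are made up for illustration.

```python
def flag_anomalies(points, tolerance=0.10):
    """Flag points whose actual volume deviates from forecast by more than tolerance.

    points: iterable of (time_label, forecast, actual).
    tolerance: relative deviation allowed before flagging (assumed value).
    """
    flagged = []
    for label, forecast, actual in points:
        if forecast <= 0:
            continue  # relative deviation is undefined without a positive forecast
        deviation = (actual - forecast) / forecast
        if abs(deviation) > tolerance:
            flagged.append({
                "time": label,
                "deviation": round(deviation, 4),
                "direction": "low" if deviation < 0 else "high",
            })
    return flagged


# Made-up sample: one normal point, one low-volume point, one spike.
sample = [
    ("23:13", 1000, 1010),
    ("23:14", 1000, 880),
    ("23:15", 1000, 1200),
]
for anomaly in flag_anomalies(sample):
    print(anomaly)
```

A fixed percentage band is the simplest choice; a band derived from forecast error by event type and time of day would be a natural refinement.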

What does it look like?
Monitoring teams are easily able to visually inspect forecasted and actual volumes in real-time
Chart callout: Forecasts are available for future dates to aid in capacity planning
Chart marker: Now

What does anomalous volume look like?
Small changes in expected volume are easy to detect, measure, and alert on
Chart callout: ~12% of expected events were missing after a planned change to the streaming data platform
Alerts triggered due to lower-than-expected volume; root cause analysis determined that a platform release was causing dropped data, and a code rollback was required to resolve the issue

Does it improve incident detection times?
Anomaly detection alerts are sent ahead of escalation and detection times, including when other alarms aren't triggered
Chart callout: Anomaly detected at 11:15 p.m. when Login volumes spiked ~20k higher than expected
Incident response teams were alerted at 11:17 p.m., more than 4 minutes before other incident alarms

Solar events as a predictor?
Variation from predicted login volume was easily quantified during the August 21st solar eclipse; interest appears to have been lost within 15 minutes of totality
A. 12:06 p.m. EDT (9:06 a.m. PDT): the solar eclipse starts in Salem, Oregon
B. 2:41 p.m. EDT (11:41 a.m. PDT): totality begins in Columbia, South Carolina
C. 4:06 p.m. EDT (1:06 p.m. PDT): eclipse ends
Chart: variation from forecast, with markers A, B, and C
