SlideShare a Scribd company logo
H2O.ai Confidential
Generative AI Masterclass - Model Risk
Management
H2O.ai Confidential
ATLANTA
WELCOME
v
H2O.ai Confidential
Introduction
- Today’s training will look into responsible, explainable and interpretable AI when applied in the context of
Generative AI and specifically Large Language Models (LLMS).
- This will include both several sections on theoretical concepts as well as hands-on labs using Enterprise
h2oGPT and H2O GenAI Applications.
- These hands-on labs focus on applying Gen AI in the context of a Model Risk Manager’s role at a bank or
financial institution.
- NOTE: A separate end-to-end masterclass on Generative AI is also available within the training environment,
as well as on github: https://github.com/h2oai/h2o_genai_training.
Including:
- Data Preparation for LLMs
- Fine-Tuning custom models
- Model Evaluation
- Retrieval-Augmented Generation (RAG)
- Guardrails
- AI Applications
v
H2O.ai Confidential
Section Session Duration Speaker
Welcome Session Kick-off 5m Jon Farland
Interpretability for Generative AI Large Language Model
Interpretability
25m Kim Montgomery
Workshop: Explainable and
Interpretable AI for LLMs
20m Navdeep Gill
Benchmarking and Evaluations Frameworks for Evaluating
Generative AI
20m Srinivas Neppalli
Workshop: Experimental
Design of Gen AI Applications
20m Jon Farland
Security, Guardrails and Hacking Workshop: Guardrails and
Hacking
20m Ashrith Barthur
Applied Generative AI for Banking
- Complaint Summarizer
Workshop: Complaint
Summarizer AI Application
20m Jon Farland
Agenda

Recommended for you

AI 2023.pdf
AI 2023.pdfAI 2023.pdf
AI 2023.pdf

This document discusses AI and ChatGPT. It begins with an introduction to David Cieslak and his company RKL eSolutions, which provides ERP sales and consulting. It then provides definitions for key AI concepts like artificial intelligence, generative AI, large language models, and ChatGPT. The document discusses OpenAI's ChatGPT tool and how it works. It covers prompts, commands, and potential uses and impacts of generative AI technologies. Finally, it discusses concerns regarding generative AI and the future of life institute's call for more oversight of advanced AI.

[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation

Retrieval Augmented Generation (RAG) combines the concepts of semantic search and LLM-based text generation. When a person makes a query in natural language, the query is compared to the entries in the knowledge base and most relevant results are returned to the LLM, which uses this extra information to generate more accurate and reliable response. RAG can therefore limit hallucination and provide accurate responses from reliable source. In this talk, we will present the concept of RAG and underlying concept of semantic search, and present available libraries and vector databases.

Neural Language Generation Head to Toe
Neural Language Generation Head to Toe Neural Language Generation Head to Toe
Neural Language Generation Head to Toe

This is a gentle introduction to Natural language Generation (NLG) using deep learning. If you are a computer science practitioner with basic knowledge about Machine learning. This is a gentle intuitive introduction to Language Generation using Neural Networks. It takes you in a journey from the basic intuitions behind modeling language and how to model probabilities of sequences to recurrent neural networks to large Transformers models that you have seen in the news like GPT2/GPT3. The tutorial wraps up with a summary on the ethical implications of training such large language models on uncurated text from the internet.

nlpgpt2gpt3
v
H2O.ai Confidential
Housekeeping
- The training environment for today is a dedicated instance of the H2O AI Managed
Cloud, a GPU-powered environment capable of training and deploying LLMs, as well
designing and hosting entire AI Applications.
- It an be accessed at https://genai-training.h2o.ai.
- Login credentials should have been provided to the email address you were registered
with.
- If you don’t yet have credentials, or you are otherwise unable to access the
environment, please speak with any member of the H2O.ai team member.
- The training environment will be available to attendees for 3 days after the conference,
but dedicated proof-of-concept environments can be provided (including on-
premise) at request. Please speak to any H2O.ai team member or email
jon.farland@h2o.ai
H2O.ai Confidential
Interpretability for
Generative AI
What is Generative AI?
GenAI enables the creation of novel content
Input
GenAI Model
Learns patterns in
unstructured data
Unstructured data
Output Novel Content
Data
Traditional AI Model
Learns relationship
between data and label
Output Label
Labels
VS
H2O.ai Confidential
More complicated input:
● Prompt phrasing
● Instructions
● Examples
More relevant dimensions to output:
● Truthfulness/Accuracy
● Safety
● Fairness
● Robustness
● Privacy
● Machine Ethics
[TrustLLM: Trustworthiness in Large Language Models, Sun, et al]
GenAI Complications

Recommended for you

Microsoft + OpenAI: Recent Updates (Machine Learning 15minutes! Broadcast #74)
Microsoft + OpenAI: Recent Updates (Machine Learning 15minutes! Broadcast #74)Microsoft + OpenAI: Recent Updates (Machine Learning 15minutes! Broadcast #74)
Microsoft + OpenAI: Recent Updates (Machine Learning 15minutes! Broadcast #74)

第74回 Machine Learning 15minutes! Broadcast https://machine-learning15minutes.connpass.com/event/272281/

aibingopenai
Unlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdfUnlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdf

Generative AI is here, and it can revolutionize your business. With its powerful capabilities, this technology can help companies create more efficient processes, unlock new insights from data, and drive innovation. But how do you make the most of these opportunities? This guide will provide you with the information and resources needed to understand the ins and outs of Generative AI, so you can make informed decisions and capitalize on the potential. It covers important topics such as strategies for leveraging large language models, optimizing MLOps processes, and best practices for building with Generative AI.

generativeaimlopsgenai
Exploring Opportunities in the Generative AI Value Chain.pdf
Exploring Opportunities in the Generative AI Value Chain.pdfExploring Opportunities in the Generative AI Value Chain.pdf
Exploring Opportunities in the Generative AI Value Chain.pdf

The article "Exploring Opportunities in the Generative AI Value Chain" by McKinsey & Company's QuantumBlack provides insights into the value created by generative artificial intelligence (AI) and its potential applications.

aiartificial intelligencevalue chain
H2O.ai Confidential
● Can the model recognize problematic responses?
○ Inaccurate responses
○ Unethical responses
○ Responses conveying stereotypes
● Can an inappropriate response be provoked?
○ Jailbreaking
○ Provoking toxicity
○ Leading questions / false context
Common tests
H2O.ai Confidential
TrustLLM Result Summary Matrix
[TrustLLM: Trustworthiness in Large Language Models, Sun, et al]
H2O.ai Confidential
TrustLLM Main Conclusions
TrustLLM Main Findings:
● Trustworthiness and utility were positively correlated.
● Generally closed-sourced models outperformed open source.
● Over alignment for trustworthiness can compromise utility.
[TrustLLM: Trustworthiness in Large Language Models, Sun, et
al]
v
H2O.ai Confidential
Accuracy: Traditional ML
Traditional machine
learning
● Comparing a prediction
to an outcome
● Generally the correct
labels are in a simple
format

Recommended for you

Use Case Patterns for LLM Applications (1).pdf
Use Case Patterns for LLM Applications (1).pdfUse Case Patterns for LLM Applications (1).pdf
Use Case Patterns for LLM Applications (1).pdf

What are the "use case patterns" for deploying LLMs into production? Understanding these will allow you to spot "LLM-shaped" problems in your own industry.

generative aillms
Build an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdfBuild an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdf

LangChain is an advanced framework that allows developers to create language model-powered applications. It provides a set of tools, components, and interfaces that make building LLM-based applications easier. With LangChain, managing interactions with language models, chaining together various components, and integrating resources like APIs and databases is a breeze. The platform includes a set of APIs that can be integrated into applications, allowing developers to add language processing capabilities without having to start from scratch.

llm powered applicationbuild llm powered applicationllm
The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021

The deep learning AI revolution has been sweeping the world for a decade now. Deep neural nets are routinely used for tasks like translation, fraud detection, and image classification. PwC estimates that they will create $15.7 trillion/year of value by 2030. But most current networks are "discriminative" in that they directly map inputs to predictions. This type of model requires lots of training examples, doesn't generalize well outside of its training set, creates inscrutable representations, is subject to adversarial examples, and makes knowledge transfer difficult. People, in contrast, can learn from just a few examples, generalize far beyond their experience, and can easily transfer and reuse knowledge. In recent years, new kinds of "generative" AI models have begun to exhibit these desirable human characteristics. They represent the causal generative processes by which the data is created and can be compositional, compact, and directly interpretable. Generative AI systems that assist people can model their needs and desires and interact with empathy. Their adaptability to changing circumstances will likely be required by rapidly changing AI-driven business and social systems. Generative AI will be the engine of future AI innovation.

artificial intelligencemachine learningfuture
v
H2O.ai Confidential
Accuracy: Example LLMs
The simplest way to measure accuracy is to compare the result
against another source of information.
Example sources:
● Checking results against a given source (RAG)
● Checking results against the tuning data
● Checking results against an external source (eg wikipedia)
● Checking results against the training data (cumbersome).
● Checking for self-consistency (Self-check GPT)
● Checking results against a larger LLM
Scoring methods:
● Natural language inference
● Comparing embeddings
● Influence functions
v
H2O.ai Confidential
RAG (Retrieval Augmented Generation)
01
Chunk and
Embed
Documents 02
Submit a
Query
Retrieve
Relevant
Information via
Similarity
Search
03
04 05
Combine relevant
information to
ground the query to
the model
Generate
Embedding
for Query
v
H2O.ai Confidential
Accuracy: Retrieval Augmented Generation
(RAG) Provides a Simple Solution
v
H2O.ai Confidential
Accuracy: Retrieval Augmented
Generation (RAG) Provides a Simple
Solution

Recommended for you

Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...

Mihai is the Principal Architect for Platform Engineering and Technology Solutions at IBM, responsible for Cloud Native and AI Solutions. He is a Red Hat Certified Architect, CKA/CKS, a leader in the IBM Open Innovation community, and advocate for open source development. Mihai is driving the development of Retrieval Augmentation Generation platforms, and solutions for Generative AI at IBM that leverage WatsonX, Vector databases, LangChain, HuggingFace and open source AI models. Mihai will share lessons learned building Retrieval Augmented Generation, or “Chat with Documents” platforms and APIs that scale, and deploy on Kubernetes. His talk will cover use cases for Generative AI, limitations of Large Language Models, use of RAG, Vector Databases and Fine Tuning to overcome model limitations and build solutions that connect to your data and provide content grounding, limit hallucinations and form the basis of explainable AI. In terms of technology, he will cover LLAMA2, HuggingFace TGIS, SentenceTransformers embedding models using Python, LangChain, and Weaviate and ChromaDB vector databases. He’ll also share tips on writing code using LLM, including building an agent for Ansible and containers. Scaling factors for Large Language Model Architectures: • Vector Database: consider sharding and High Availability • Fine Tuning: collecting data to be used for fine tuning • Governance and Model Benchmarking: how are you testing your model performance over time, with different prompts, one-shot, and various parameters • Chain of Reasoning and Agents • Caching embeddings and responses • Personalization and Conversational Memory Database • Streaming Responses and optimizing performance. A fine tuned 13B model may perform better than a poor 70B one! • Calling 3rd party functions or APIs for reasoning or other type of data (ex: LLMs are terrible at reasoning and prediction, consider calling other models) • Fallback techniques: fallback to a different model, or default answers • API scaling techniques, rate limiting, etc. • Async, streaming and parallelization, multiprocessing, GPU acceleration (including embeddings), generating your API using OpenAPI, etc.

langchainpythongenai
Leveraging Generative AI & Best practices
Leveraging Generative AI & Best practicesLeveraging Generative AI & Best practices
Leveraging Generative AI & Best practices

In this event we will cover: - What is Generative AI and how it is being for future of work. - Best practices for developing and deploying generative AI based models in productions. - Future of Generative AI, how generative AI is expected to evolve in the coming years.

#uipathcommunity#rpa#ai
Prompting is an art / Sztuka promptowania
Prompting is an art / Sztuka promptowaniaPrompting is an art / Sztuka promptowania
Prompting is an art / Sztuka promptowania

The document discusses advances in large language models from GPT-1 to the potential capabilities of GPT-4, including its ability to simulate human behavior, demonstrate sparks of artificial general intelligence, and generate virtual identities. It also provides tips on how to effectively prompt ChatGPT through techniques like prompt engineering, giving context and examples, and different response formats.

gptgpt4gpt3
H2O.ai Confidential
Influence functions
● Seeks to measure the influence of including a data point in
the training set on model response.
● Datamodels/TRAK
○ Learn model based on binary indicator functions.
○ Directly measure how much a training instance influences
the outcome.
● DataInf
○ Measures the influence of a document during fine tuning.
[DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and
Diffusion Models, Kwon et. al]
[TRAK: Attributing Model Behavior at Scale. Park et. al]
H2O.ai Confidential
Influence functions / computer vision
[TRAK: Attributing Model Behavior at Scale. Park et. al]
H2O.ai Confidential
Influence functions / NLP
Influence functions / nlp
[Studying Large Language Model Generalization with Influence Functions, Grosse, et. al]
H2O.ai Confidential
Self consistency comparison
Self-Check GPT
● Sampling different responses from an LLM.
● Checking for consistency between responses.
● Assuming that hallucinations will occur less consistently.
[SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for
Generative Large Language Models, Potsawee Manakul, Adrian Liusie, Mark
JF Gales]

Recommended for you

AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1

Session 1 👉This first session will cover an introduction to Generative AI & harnessing the power of large language models. The following topics will be discussed: Introduction to Generative AI & harnessing the power of large language models. What’s generative AI & what’s LLM. How are we using it in our document understanding & communication mining models? How to develop a trustworthy and unbiased AI model using LLM & GenAI. Personal Intelligent Assistant Speakers: 📌George Roth - AI Evangelist at UiPath 📌Sharon Palawandram - Senior Machine Learning Consultant @ Ashling Partners & UiPath MVP 📌Russel Alfeche - Technology Leader RPA @qBotica & UiPath MVP

#uipathcommunity#rpadeveloper#uipath
Generative AI - The New Reality: How Key Players Are Progressing
Generative AI - The New Reality: How Key Players Are Progressing Generative AI - The New Reality: How Key Players Are Progressing
Generative AI - The New Reality: How Key Players Are Progressing

The document discusses key players in generative AI and their progress. It provides an overview of generative AI including its evolution since 1950, where the spending is focused, how the technology works, and deployment models. It then profiles several major companies leading advancements in generative AI, including their strategies, growth areas, and risks. These companies are TSMC, Nvidia, Microsoft, Google, Amazon, Tesla, Oracle, Salesforce, SAP, and Palo Alto Networks.

aidigital transformationgenai
LLM App Hacking (AVTOKYO2023)
LLM App Hacking (AVTOKYO2023)LLM App Hacking (AVTOKYO2023)
LLM App Hacking (AVTOKYO2023)

How to hack a LLM-integrated system using prompt injection. This is the slide-deck that I used for AVTOKYO 2023.

chatgptllmhacking
v
H2O.ai Confidential
Counterfactual analysis: Traditional ML
● How does changing a feature change the model outcome?
● What is the smallest change that can change the outcome?
v
H2O.ai Confidential
Counterfactual analysis: Model Analyzer
v
H2O.ai Confidential
Counterfactual analysis: LLM
How consistent are results under different:
● Prompts / instructions.
○ Changes in prompt design
○ Changes in prompt instructions
○ Multi-shot examples
○ Word replacement with synonyms
○ Proper names or pronouns (fairness)
○ Chain of thought / other guided reasoning related methods
● Different context / RAG retrieval
v
H2O.ai Confidential
Intervention in the case of problems
If problematic behavior is found in a model there are several options.
● Prompt/ instruction modifications.
● Choosing a different base model.
● Fine-tuning to modify LLM model behavior
● Altering the document retrieval process (RAG)
● Monitoring model output for problematic responses.

Recommended for you

Customizing LLMs
Customizing LLMsCustomizing LLMs
Customizing LLMs

The document discusses different methods for customizing large language models (LLMs) with proprietary or private data, including training a custom model, fine-tuning a general model, and prompting with expanded inputs. Fine-tuning techniques like low-rank adaptation and supervised fine-tuning allow emphasizing custom knowledge without full retraining. Prompt expansion using techniques like retrieval augmented generation can provide additional context beyond the character limit.

Using the power of Generative AI at scale
Using the power of Generative AI at scaleUsing the power of Generative AI at scale
Using the power of Generative AI at scale

In this session, you'll get all the answers about how ChatGPT and other GPT-X models can be applied to your current or future project. First, we'll put in order all the terms – OpenAI, GPT-3, ChatGPT, Codex, Dall-E, etc., and explain why Microsoft and Azure are often mentioned in this context. Then, we'll go through the main capabilities of the Azure OpenAI and respective usecases that might inspire you to either optimize your product or build a completely new one.

openaichatgptazure openai
Using the power of OpenAI with your own data: what's possible and how to start?
Using the power of OpenAI with your own data: what's possible and how to start?Using the power of OpenAI with your own data: what's possible and how to start?
Using the power of OpenAI with your own data: what's possible and how to start?

This document provides an overview of a talk by Maxim Salnikov and Jon Jahren at Oslo Spektrum from November 7-9. It discusses using OpenAI with your own data and how to get started. Examples of enterprise use cases for generative AI are presented, such as chatbots, document indexing, and financial analysis. Tools for prompt engineering like LangChain and Semantic Kernel are introduced. Best practices for fine-tuning models on proprietary data are covered, including data formatting, training data size, and an iterative tuning process. Responsible AI techniques like grounding responses and maintaining a positive tone are also discussed.

openaiazure openaichatgpt
H2O.ai Confidential
Conclusions
● Many of the basic problems of understanding LLMs are
similar to that of other large models.
● Through careful testing we can hope to understand and
correct some of the safety issues involved in using LLMs.
H2O.ai Confidential
Kim Montgomery
LLM interpretation
kim.montgomery@h2o.ai
Contact
Thank you!
H2O.ai Confidential
Lab 1 - Using Chain-of-
Verification for Explainable AI
v
H2O.ai Confidential
Chain of Verification (CoVe)
CoVe enhances the reliability of answers provided by Large Language Models, particularly in
factual question and answering scenarios, by systematically verifying and refining responses
to minimize inaccuracies.
The CoVe method consists of the following four sequential steps:
1. Initial Baseline Response Creation: In this step, an initial response to the original question is generated as a starting
point.
1. Verification Question Generation: Verification questions are created to fact-check the baseline response. These
questions are designed to scrutinize the accuracy of the initial response.
1. Execute Verification: The verification questions are independently answered to minimize any potential bias. This step
ensures that the verification process is objective and thorough.
1. Final Refined Answer Generation: Based on the results of the verification process, a final refined answer is generated.
This answer is expected to be more accurate and reliable, reducing the likelihood of hallucinations in the response.

Recommended for you

LLM-Datacraft.pdf
LLM-Datacraft.pdfLLM-Datacraft.pdf
LLM-Datacraft.pdf

Automated generation of high quality dataset based on your documents (pdf, csv, json, text files etc) to evaluate any LLM app endpoints

large language modelnlpbig data
2024-02-24_Session 1 - PMLE_UPDATED.pptx
2024-02-24_Session 1 - PMLE_UPDATED.pptx2024-02-24_Session 1 - PMLE_UPDATED.pptx
2024-02-24_Session 1 - PMLE_UPDATED.pptx

Reviewing progress in the machine learning certification journey 𝗦𝗽𝗲𝗰𝗶𝗮𝗹 𝗔𝗱𝗱𝗶𝘁𝗶𝗼𝗻 - Short tech talk on How to Network by Qingyue(Annie) Wang C𝗼𝗻𝘁𝗲𝗻𝘁 𝗿𝗲𝘃𝗶𝗲𝘄 𝗼𝗻 AI and ML on Google Cloud by Margaret Maynard-Reid 𝗔 𝗳𝗼𝗰𝘂𝘀𝗲𝗱 𝗰𝗼𝗻𝘁𝗲𝗻𝘁 𝗿𝗲𝘃𝗶𝗲𝘄 𝗼𝗻 𝗠𝗟 𝗽𝗿𝗼𝗯𝗹𝗲𝗺 𝗳𝗿𝗮𝗺𝗶𝗻𝗴, 𝗺𝗼𝗱𝗲𝗹 𝗲𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗼𝗻, 𝗮𝗻𝗱 𝗳𝗮𝗶𝗿𝗻𝗲𝘀𝘀 by Sowndarya Venkateswaran. A discussion on sample questions to aid certification exam preparation. An interactive Q&A session to clarify doubts and questions. Previewing next steps and topics, including course completions and material reviews.

google cloud
Building Generative AI-infused apps: what's possible and how to start
Building Generative AI-infused apps: what's possible and how to startBuilding Generative AI-infused apps: what's possible and how to start
Building Generative AI-infused apps: what's possible and how to start

In this session, we'll explore different scenarios where the features of Generative AI can provide added value to an IT solution. We'll also learn how to begin developing your own application powered by AI. Using Azure OpenAI service as an illustration, we'll examine the various APIs it offers, review the best practices of Prompt Engineering, explore different ways to incorporate your own data into the process, and take a glance at several tools and resources that make the developer experience more seamless.

openaichatgptprompt engineering
v
H2O.ai Confidential
Verification Questions
Questions are categorized into three main groups:
1. Wiki Data & Wiki Category List: This category involves questions that expect answers in
the form of a list of entities. For instance, questions like “Who are some politicians born in
Boston?”
2. Multi-Span QA: Questions in this category seek multiple independent answers. An
example would be: “Who invented the first mechanized printing press and in what year?”
The answer is “Johannes Gutenberg, 1450”.
3. Long-form Generation: Any question that requires a detailed or lengthy response falls
under this group.
v
H2O.ai Confidential
Chain of Verification (CoVe)
Dhuliawala, Shehzaad, et al. "Chain-of-Verification Reduces Hallucination in Large Language Models." arXiv
preprint arXiv:2309.11495 (2023)
v
H2O.ai Confidential
CoVe and Explainable AI (XAI)
● Interpretability and Transparency:
○ Verification process generates questions to fact-check baseline
responses, improving transparency in decision-making.
● Reliability and Trust:
○ Refined answers enhance accuracy, building trust and reliability in
model outputs.
● Bias and Fairness:
○ Verification questions in CoVe identify and mitigate potential biases in
model output.
● User Interaction:
○ Verification process involves user interaction through verification
questions.
v
H2O.ai Confidential
https://chain-of-verifcation.genai-training.h2o.ai/
Who are some
politicians born in
Boston?

Recommended for you

Formal Versus Agile: Survival of the Fittest? (Paul Boca)
Formal Versus Agile: Survival of the Fittest? (Paul Boca)Formal Versus Agile: Survival of the Fittest? (Paul Boca)
Formal Versus Agile: Survival of the Fittest? (Paul Boca)

The potential for combining agile and formal methods holds promise. Although it might not always be an easy partnership, it will succeed if it can foster a fruitful interchange of expertise between the two communities. In this talk I explain how formal methods can complement agile practices and vice versa. There are no pre-requisites for this talk, except an open mind and a desire to make software development more reliable. Leave any pre-conceptions at home, and be prepared for myths to be dispelled.

formal methods agile programming do-178b open-do s
ANIn Chennai March 2024 |Conversational AI for Agility in Healthcare by Shine...
ANIn Chennai March 2024 |Conversational AI for Agility in Healthcare by Shine...ANIn Chennai March 2024 |Conversational AI for Agility in Healthcare by Shine...
ANIn Chennai March 2024 |Conversational AI for Agility in Healthcare by Shine...

Agile Network India - Chennai Title: Conversational AI for Agility in Healthcare by Shiney Jeyaraj Date: March 2024 Hosted by: CGI

agile network indiaagilityai in healthcare
2017 10-10 (netflix ml platform meetup) learning item and user representation...
2017 10-10 (netflix ml platform meetup) learning item and user representation...2017 10-10 (netflix ml platform meetup) learning item and user representation...
2017 10-10 (netflix ml platform meetup) learning item and user representation...

1) Learning user and item representations is challenging due to sparse data and shifting preferences in recommender systems. 2) The presentation outlines research at Google to address sparsity through two approaches: focused learning, which develops specialized models for subsets of data like genres or cold-start items, and factorized deep retrieval, which jointly embeds items and their features to predict preferences for fresh items. 3) The techniques have improved overall viewership and nomination of candidates, demonstrating their effectiveness in production recommender systems.

ml platformmachine learningdeep retrieval
v
H2O.ai Confidential
Who are some
CEOs of banks
in the US?
v
H2O.ai Confidential
What are some
credit scoring
bureaus in the US?
v
H2O.ai Confidential
What are some
agencies assigned to
regulate and oversee
financial institutions
in the US?
v
H2O.ai Confidential
Provide a list of major
investment firms and
financial institutions
headquartered in the
United States?

Recommended for you

Problem prediction model
Problem prediction modelProblem prediction model
Problem prediction model

The document describes a problem prediction model that uses artificial intelligence algorithms to evaluate changes made by an IT company and anticipate potential problems. The model analyzed 194 known problems, 2,400 past changes, and 201 predicted future changes. As a result, the model identified one change from October 29, 2019 that was likely to cause a problem. A team is investigating this potential issue. The document concludes that the naive Bayes classifier model is an important tool for change analysis and problem prediction.

machine learningartificial inteligencenatural science
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...

Sandeep Singh, Head of Applied AI Computer Vision, Beans.ai H2O Open Source GenAI World SF 2023 In the modern era of machine learning, leveraging both open-source and closed-source solutions has become paramount for achieving cutting-edge results. This talk delves into the intricacies of seamlessly integrating open-source Large Language Model (LLM) solutions like Vicuna, Falcon, and Llama with industry giants such as ChatGPT and Google's Palm. As the demand for fine-tuned and specialized datasets grows, it is imperative to understand the synergy between these tools. Attendees will gain insights into best practices for building and enriching datasets tailored for fine-tuning tasks, ensuring that their LLM projects are both robust and efficient. Through real-world examples and hands-on demonstrations, this talk will equip attendees with the knowledge to harness the power of both open and closed-source tools in a coherent and effective manner.

Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018

This talk was recorded in London on Oct 30, 2018 and can be viewed here: https://youtu.be/p4iAnxwC_Eg The good news is building fair, accountable, and transparent machine learning systems is possible. The bad news is it’s harder than many blogs and software package docs would have you believe. The truth is nearly all interpretable machine learning techniques generate approximate explanations, that the fields of eXplainable AI (XAI) and Fairness, Accountability, and Transparency in Machine Learning (FAT/ML) are very new, and that few best practices have been widely agreed upon. This combination can lead to some ugly outcomes! This talk aims to make your interpretable machine learning project a success by describing fundamental technical challenges you will face in building an interpretable machine learning system, defining the real-world value proposition of approximate explanations for exact models, and then outlining the following viable techniques for debugging, explaining, and testing machine learning models Mateusz is a software developer who loves all things distributed, machine learning and hates buzzwords. His favourite hobby data juggling. He obtained his M.Sc. in Computer Science from AGH UST in Krakow, Poland, during which he did an exchange at L’ECE Paris in France and worked on distributed flight booking systems. After graduation he move to Tokyo to work as a researcher at Fujitsu Laboratories on machine learning and NLP projects, where he is still currently based.

v
H2O.ai Confidential
Benefits and Limitations of CoVe
● Benefits:
○ Enhanced Reliability: By incorporating verification steps, users can trust the accuracy of
information obtained from LLMs.
○ Depth of Understanding: The refinement of answers allows users to gain a deeper
understanding of the topic beyond the initial response.
○ Educational Value: Promotes responsible and informed use of LLMs, encouraging users to go
beyond surface-level information.
● Limitations
○ Incomplete Removal of Hallucinations: CoVe does not completely eliminate hallucinations in
generated content, which means it can still produce incorrect or misleading information.
○ Limited Scope of Hallucination Mitigation: CoVe primarily addresses hallucinations in the
form of directly stated factual inaccuracies but may not effectively handle other forms of
hallucinations, such as errors in reasoning or opinions.
○ Increased Computational Expense: Generating and executing verification alongside
responses in CoVe adds to the computational cost, similar to other reasoning methods like
Chain-of-Thought.
○ Upper Bound on Improvement: The effectiveness of CoVe is limited by the overall capabilities
of the underlying language model, particularly in its ability to identify and rectify its own
mistakes.
v
H2O.ai Confidential
How to improve the CoVe pipeline
● Prompt engineering
● External tools
○ Final output highly depends on the answers of the verification questions.
○ For factual questions & answering you can use advanced search tools like google search or
serp API etc.
○ For custom use cases you can always use RAG methods or other retrieval techniques for
answering the verification questions.
● More chains
● Human in the loop
H2O.ai Confidential
Conclusions
● CoVe aim to improves model transparency, reliability, and trust.
● CoVe is not a silver bullet, but it can improve a LLM testing
arsenal.
H2O.ai Confidential
Navdeep Gill
Engineering Manager, AI Governance | Responsible AI
navdeep.gill@h2o.ai
Contact
Thank you

Recommended for you

Webcast Presentation: Accelerate Continuous Delivery with Development Testing...
Webcast Presentation: Accelerate Continuous Delivery with Development Testing...Webcast Presentation: Accelerate Continuous Delivery with Development Testing...
Webcast Presentation: Accelerate Continuous Delivery with Development Testing...

With organizations under intense pressure to get products out to market quickly, they can’t afford to operate within operational silos. Yet communicating and collaborating across the organizational boundaries of QA and development can be difficult. Development is typically a black box to QA teams. QA has no visibility into the quality and security of the code until late in the lifecycle. Watch this recorded webcast to learn how to break down the barriers and improve visibility and transparency by integrating development testing results into the IBM Rational Team Concert and providing QA and development with a unified workflow for ensuring code quality. Explore different development testing techniques and the types of defects and security vulnerabilities they can find. About the Presenter: James Croall, Director of Product Management, Coverity Over the last 8 years, James Croall has helped a wide range of customers incorporate static analysis into their software development lifecycle. Prior to Coverity, Mr. Croall spent 10 years in the computer and network security industry as a C/C++ and Java software engineer.

 
by GRUC
rtcgruc webcastqa
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability

1) Generative AI (GenAI) enables the creation of novel content by learning patterns in unstructured data rather than labeling outputs like traditional AI. 2) Both traditional and generative AI models lack transparency and may contain biases, but generative models can additionally hallucinate or leak private information. 3) To interpret generative models, researchers evaluate accuracy globally by checking for hallucinations or undesirable content, and locally by confirming the quality of individual responses.

Agile Methods: Fact or Fiction
Agile Methods: Fact or FictionAgile Methods: Fact or Fiction
Agile Methods: Fact or Fiction

The document discusses Agile software development methods and provides evidence that Agile approaches are effective. It defines Agile development as iterative and incremental with close collaboration. Case studies show organizations achieving better results with Agile, including increased productivity, quality, and customer satisfaction. Adopting Agile practices like Scrum and test-driven development enables organizations to adapt to changing priorities and deliver working software more frequently.

factsfictionscrum
H2O.ai Confidential
Benchmarking and
Evaluation
v
H2O.ai Confidential
Write a 1000 word essay in 1 minute
LLMs are good at generating large amount of text that is consistent and
logical.
Are LLMs smarter than humans?
Introduction
Have LLMs manage your investment portfolio
A model can give a generic advice on safe money management. But we don’t
trust our life savings with a chat bot.
Let a bot reply to your email
It depends on how important the email is. May be we are more comfortable
with the model automatically creating a draft.
v
H2O.ai Confidential
Summarization
Summarizing large documents without losing essential information. Extracting
key-value pairs.
How can we use LLMS while minimizing risk?
Introduction
Customer Service
Answer FAQs from customers. May require retrieving from a knowledge base
and summarizing.
Report Generation - AutoDoc
Create ML interpretation documents. Reports required for regulatory
compliance.
v
H2O.ai Confidential
Risk
How risky are LLMs?
A lawyer used ChatGPT to prepare
a court filing. It went horribly awry.
“While ChatGPT can be useful to
professionals in numerous industries,
including the legal profession, it has
proved itself to be both limited and
unreliable. In this case, the AI invented
court cases that didn't exist, and
asserted that they were real.”
CBS News
Chevy dealership’s AI chatbot
suggests Ford F-150 when asked
for best truck
“As an AI, I don't have personal
preferences but I can provide insights
based on popular opinions and
reviews. Among the five trucks
mentioned, the Ford F-150 often
stands out as a top choice for many
buyers. It's known for its impressive
towing …”
Detroit Free Press

Recommended for you

All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...

Serverless Toronto's 6th-anniversary event helps IT pros understand and prepare for the #GenAI tsunami ahead. You'll gain situational awareness of the LLM Landscape, receive condensed insights, and actionable advice about RAG in 2024 from Google AI Lead Mark Ryan and LlamaIndex creator Jerry Liu. We chose #RAG (Retrieval-Augmented Generation) because it is the predominant paradigm for building #LLM (Large Language Model) applications in enterprises today - and that's where the jobs will be shifting. Here is the recording: https://youtu.be/P5xd1ZjD-Os?si=iq8xibj5pJsJ62oW

generative aigenaiai
Human-Centered Artificial Intelligence: Reliable, Safe & Trustworthy
Human-Centered Artificial Intelligence: Reliable, Safe & TrustworthyHuman-Centered Artificial Intelligence: Reliable, Safe & Trustworthy
Human-Centered Artificial Intelligence: Reliable, Safe & Trustworthy

Reliable, Safe & Trustworthy are some key factors to be considered for Human-Centered AI. There are certain Guidelines for Human-AI Interaction to be taken into evaluation to ensure RST systems overcome autonomy problems.

artificial intelligencehuman resourceshuman
Scaling & Managing Production Deployments with H2O ModelOps
Scaling & Managing Production Deployments with H2O ModelOpsScaling & Managing Production Deployments with H2O ModelOps
Scaling & Managing Production Deployments with H2O ModelOps

This presentation was made on June 30th, 2020. Recording of the presentation is available here: https://youtu.be/9LajqAL_CU8 As enterprises “make their own AI”, a new set of challenges emerge. Maintaining reproducibility, traceability, and verifiability of machine learning models, as well as recording experiments, tracking insights, and reproducing results, are key. Collaboration between teams is also necessary as “model factories” are created for enterprise-wide model data science efforts. Additionally, monitoring of models ensures that drift or performance degradation is addressed with either retraining or model updates. Finally, data and model lineage in case of rollbacks or addressing regulatory compliance is necessary. H2O ModelOps delivers centralized catalog and management, deployment, monitoring, collaboration, and administration of machine learning models. In this webinar, we learn how H2O can assist with operationalizing, scaling and managing production deployments. Speaker's Bio: Felix is a part of the Customer Success team in Asia Pacific at H2O.ai. An engineer and an IIM alumni, Felix has held prominent positions in the data science industry.

v
H2O.ai Confidential
Data Fine Tuning RAG
Foundation
Model
Leaderboard Risk
Management
Large & Diverse
To train a foundation model, you
need a large, diverse dataset that
covers the tasks the model should
be able to perform.
LLM Lifecycle
Supervised Fine
Tuning
Fine-tuning can improve a model's
performance on a task while
preserving its general language
knowledge.
h2oGPTe
A powerful search assistant to
answer questions from large
volumes of documents, websites,
and workplace content.
Generative AI
They are designed to produce a
wide and general variety of
outputs, such as text, image or
audio generation. They can be
standalone systems or can be used
as a "base" for many other
applications.
HELM
HELM is a framework for evaluating
foundation models. Leaderboard
shows how the various models
perform across different groups of
scenarios and different metrics.
Eval Studio
Design and execute task-specific
benchmarks. Perform both manual
and LLM based evaluations.
Systematically collect and store
results along with metadata.
v
H2O.ai Confidential
MMLU (Massive Multitask Language Understanding)
A test to measure a text model's multitask accuracy. The
test covers 57 tasks including elementary mathematics, US
history, computer science, law, and more.
Evaluation for LLMs
Popular benchmarks on open source leaderboards
HellaSwag
A test of common-sense inference, which is easy for
humans (~95%) but challenging for SOTA models.
A12 Reasoning Challenge (ARC)
A set of grade-school science questions.
Truthful QA
A test to measure a model’s propensity to reproduce
falsehoods commonly found online.
When you drop a ball from rest it accelerates downward at 9.8 m/s². If
you instead throw it downward assuming no air resistance its
acceleration immediately after leaving your hand is
(A) 9.8 m/s²
(B) more than 9.8 m/s²
(C) less than 9.8 m/s²
(D) Cannot say unless the speed of throw is given.
MMLU Example
A woman is outside with a bucket and a dog. The dog is running
around trying to avoid a bath. She…
(A) Rinses the bucket off with soap and blow dry the dog’s head.
(B) Uses a hose to keep it from getting soapy.
(C) Gets the dog wet, then it runs away again.
(D) Cannot say unless the speed of throw is given.
HellaSwag Example
v
H2O.ai Confidential
Hugging Face Open LLM
Leaderboard
It is a popular location to track
various models evaluated using
different metrics.
These metrics include human
baselines that provide us some
idea of how these models have
been drastically improved over
the last two years.
Approaching human baseline
Popular benchmarks on open source leaderboards
H2O.ai Confidential
Benchmarks are not task specific
Benchmarks on open-source
leaderboards are well-rounded and
diverse. They are not sufficient to
reflect the performance of the
model in a domain specific scenario.
The Need for Evaluation
Popular leaderboards are not enough
Some Model Entries may cheat!
There can be models on the
leaderboard that are trained on the
benchmark data itself. We do not
have robust enough tests to detect
this.
Non-verifiable Results
The procedure followed in
conducting the tests and the results
are not completely transparent and
can also vary among different
leaderboards.

Recommended for you

Glasswall Wardley Maps & Services
Glasswall Wardley Maps & ServicesGlasswall Wardley Maps & Services
Glasswall Wardley Maps & Services

A whirlwind tour of Glasswall Solution’s use of Wardley Maps and experiments with a Service-based operating model. Delivered at Open Security Summit Dec 7th, 2020 as context for a panel discussion, which you can watch here: https://www.youtube.com/watch?v=GS8Vndr-B4A The original 100-slide deck is available here: https://open-security-summit.org/tracks/2020/mini-summits/dec/wardley-maps/wardley-maps-and-services-model-at-glasswall/

wardley mapsservicescell based organisations
Strategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering TeamsStrategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering Teams

This presentation dives into the practical applications of machine learning within Google's operations, providing a comprehensive overview of how to leverage AI technologies to solve real-world business challenges. Key Points Covered: - Introduction to Machine Learning at Google: Discussion on the role of ML and its evolution in enhancing Google's operational efficiency. - Experience Sharing: Insights into the team's long-term engagement with machine learning projects and the impacts on Google’s operational strategies. - Practical Applications: Real-world examples of ML applications within Google’s daily operations, providing a blueprint to adapt similar strategies. - Challenges and Solutions: Discussion on the challenges faced during the implementation of ML projects and the strategic solutions employed to overcome them. - Future of ML at Google: Insights into future trends in machine learning at Google and how they plan to continue integrating AI into their ecosystem.

Automated Testing DITA Content and Customizations
Automated Testing DITA Content and CustomizationsAutomated Testing DITA Content and Customizations
Automated Testing DITA Content and Customizations

The document discusses various methods for automated testing of DITA content and output, including using Schematron for validating content structure, the QA plugin for identifying tagging errors, XMLUnit for comparing XML, and the DITA OT regression test for validating the output of the open-source DITA Open Toolkit. It also covers automating browser tests using Selenium and comparing HTML output using Needle and Nose. Demo examples are provided for several of these automated testing tools and techniques.

testingdita
v
H2O.ai Confidential
Create task specific QA
pairs along with the
Reference documents.
- Bank Teller
- Loan officer
- Program Manager
- Data Analyst
Custom Test Sets
Create custom benchmarks for domain specific scenarios
Task Specific Evals
Create the QA pairs that
test for agreement with
your values, intentions,
and preferences.
- Correctness
- Relevance
- Similarity
- Hallucination
- Precision
- Recall
- Faithfulness
Test for Alignment
Test that all outputs meet
your safety levels.
- Toxicity
- Bias
- Offensive
- PII of customers
- Company Secrets
Test for Safety
Tests to confirm or show
proof of meeting
compliance standards.
- Government
- Company
Test for Compliance
v
H2O.ai Confidential
H2O Eval Studio
Design and Execute task specific benchmarks
All the Evaluators are included
Eval studio contains evaluators to check for
Alignment, Safety, and Compliance as
discussed before.
Create custom benchmarks
Users can upload Documents and create
custom Tests (Question-Answer pairs) based on
the document collection.
Run Evals and visualize results
Once a benchmark has been designed, users
can then run the evaluation against the
benchmark and visualize the results. A detailed
report can also be downloaded.
H2O.ai Confidential
Srinivas Neppalli
Senior AI Engineer
srinivas.neppalli@h2o.ai
Contact
Thank you!
H2O.ai Confidential
Lab 2 - Experimental Design of
Gen AI Evaluations

Recommended for you

Generative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of contentGenerative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of content

What is Generative AI?

generative aipwckiit
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...

“AGI should be open source and in the public domain at the service of humanity and the planet.”

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day

This document provides an overview of H2O.ai, an AI company that offers products and services to democratize AI. It mentions that H2O products are backed by 10% of the world's top data scientists from Kaggle and that H2O has customers in 7 of the top 10 banks, 4 of the top 10 insurance companies, and top manufacturing companies. It also provides details on H2O's founders, funding, customers, products, and vision to make AI accessible to more organizations.

h2o.aiwells fargogenai
v
H2O.ai Confidential
Through the Lens of Model Risk Management
One possible definition of “Conceptual Soundness”
for LLMs by themselves might be considered as a
combination of the following choices:
(1)Training Data
(1)Model Architecture
(1)An explanation of why (1) and (2) were made
(1)An explanation of why (1) and (2) are reasonable
for the use case that the LLM will be applied to.
v
H2O.ai Confidential
Through the Lens of Model Risk Management
What about a RAG system?
How does the concept of “Conceptual Soundness” get
applied when not only choices surrounding training
data and model architecture involved, but also choices
around:
- Embeddings
- System Prompts (e.g. Personalities)
- Chunk Sizes
- Chunking Strategies
- OCR Techniques
- RAG-type (e.g. Hypothetical Document Embeddings)
- Mixture-of-Experts or Ensembling
H2O.ai Confidential
Models / Systems / Agents are the fundamental AI
systems under scrutiny. As opposed to traditional
machine learning models, Generative AI include many
choices beyond the models themselves
H2O.ai Confidential
Benchmarks / Tests are the sets of prompts and
response that are used gauge how well an AI system can
perform a certain task or use case.

Recommended for you

AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek

Pritika Mehta, Co-Founder, Butternut.ai H2O Open Source GenAI World SF 2023

LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th

The document discusses LLMOps (Large Language Model Operations) compared to traditional MLOps. Some key points: - LLMOps and MLOps face similar challenges across the development lifecycle, but LLMOps requires more GPU resources and integration is faster due to more models in each application. Evaluation is also less clear. - The LLMOps field is around the 5th generation of models, with debates around proprietary vs open source models, and balancing privacy, cost and control. - LLMOps platforms are emerging to provide solutions for tasks like prompting, embedding databases, evaluation, and governance, similar to how MLOps platforms have evolved.

Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs

Patrick Hall, Professor, AI Risk Management, The George Washington University H2O Open Source GenAI World SF 2023 Language models are incredible engineering breakthroughs but require auditing and risk management before productization. These systems raise concerns about toxicity, transparency and reproducibility, intellectual property licensing and ownership, disinformation and misinformation, supply chains, and more. How can your organization leverage these new tools without taking on undue or unknown risks? While language models and associated risk management are in their infancy, a small number of best practices in governance and risk are starting to emerge. If you have a language model use case in mind, want to understand your risks, and do something about them, this presentation is for you!

H2O.ai Confidential
Evaluators are the mathematical functions used to
evaluate various dimensions of performance.
H2O.ai Confidential
Documents are the data sets used for evaluation in the
case of RAG systems, combining models, parsing, OCR,
chunking, embeddings and other components of an
evaluation.
v
H2O.ai Confidential
What is the primary unit of analysis when evaluating an AI system or model?
An eval can be defined as a series of tuples each of size 3.
Each tuple consists of:
(1)Context / Prompt / Question
(1)Output / Response / Ground Truth Answer
(1)Document (in the case of RAG)
Source: https://www.jobtestprep.com/bank-teller-sample-questions
Designing Your Own Eval
v
H2O.ai Confidential
Problem statement: How well does my Bank Teller AI Application correctly answer
questions related to being a Bank Teller?
Create an eval test case that can be used to evaluate how well BankTellerGPT can
answer questions related to being a Bank Teller.
LLM-only Example Test Case
{
Prompt: Respond to the following questions with single letter answer. Question: A specific bank branch serves
256 clients on average every day. The ratio between tellers and clients is 1:32, so that every teller serves 32
people on average every day. The management wishes to change this ratio to 1:20. How many new tellers should
be hired? A. 4 B. 5 C. 9 D. 12,
Response: B. 5,
Document: None
}
Designing Your Own Eval - BankTellerGPT
Source: https://www.jobtestprep.com/bank-teller-
sample-questions

Recommended for you

Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way

Dr. Alexy Khrabrov, Open Source Science Community Director, IBM H2O Open Source GenAI World SF 2023 In this talk, Dr. Alexy Khrabrov, recently elected Chair of the new Generative AI Commons at Linux Foundation for AI & Data, outlines the OSS AI landscape, challenges, and opportunities. With new models and frameworks being unveiled weekly, one thing remains constant: community building and validation of all aspects of AI is key to reliable and responsible AI we can use for business and society needs. Industrial AI is one key area where such community validation can prove invaluable.

Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O

The document announces the launch of the H2O GenAI App Store, which provides a collection of applications that make it easier for average users to leverage large language models through custom interfaces for specific tasks like getting gardening advice or feedback on code. The app store is designed to accelerate the development of these GenAI apps using the H2O Wave platform and provides access to H2OGPTE for retrieval augmented generation and language model calls. Developers can also contribute their own apps through the GitHub repository listed.

Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical

Megan Kurka, Vice President, Customer Data Scientist, H2O.ai H2O Open Source GenAI World SF 2023 Discover the transformative power of Applied Gen AI. Learn how the H2O team builds customized applications and workflows that integrate capabilities of Gen AI and AutoML specifically designed to address and enhance financial use cases. Explore real world examples, learn best practices, and witness firsthand how our innovative solutions are reshaping the landscape of finance technology.

v
H2O.ai Confidential
Designing Your Own Eval - BankTellerGPT
Problem statement: How well does my Bank Teller AI Application actually answer
questions related to being a Bank Teller?
Create an eval test case that can be used to evaluate how well BankTellerGPT can
answer questions related to being a Bank Teller.
RAG Example Test Case
{
Prompt: Respond to the following questions with single letter answer. Question: A specific bank branch serves
256 clients on average every day. The ratio between tellers and clients is 1:32, so that every teller serves 32
people on average every day. The management wishes to change this ratio to 1:20. How many new tellers should
be hired? A. 4 B. 5 C. 9 D. 12,
Response: B. 5,
Document: “Internal Bank Teller Knowledge Base”
}
Source: https://www.jobtestprep.com/bank-teller-
sample-questions
v
H2O.ai Confidential
Designing Your Own Eval
Task # 1: Create your own GenAI Test Benchmark for the SR 11-7 document
Some possible test cases
Prompt: How should banks approach model development?
Response: Banks should approach model development with a focus on sound risk management practices. They
should ensure that models are developed and used in a controlled environment, with proper documentation,
testing, and validation.
Prompt: How can model risk be reduced?
Response: Model risk can be reduced by establishing limits on model use, monitoring model performance,
adjusting or revising models over time, and supplementing model results with other analysis and information.
Prompt: How often should a bank update its model inventory?
Response: A bank should update its model inventory regularly to ensure that it remains current and accurate.
v
H2O.ai Confidential
Designing Your Own Eval - BankTellerGPT
Task # 2: Create and launch LLM-only eval
leaderboard
To complete this, you will need to
1. Pick an evaluator (e.g. Token presence)
1. Pick a connection (e.g. Enterprise h2oGPT - LLM Only)
1. Pick a set of eval tests (e.g. Bank Teller Benchmark)
v
H2O.ai Confidential
Designing Your Own Eval - SR 11-7
Task # 3: Create a new evaluator based on RAG
and launch leaderboard
To complete this, you will need to
1. Pick an evaluator (e.g. Answer correctness)
1. Pick a connection (e.g. Enterprise h2oGPT-RAG)
1. Pick your test created in step 1

Recommended for you

Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers

This document discusses techniques for improving language models (LLMs) discussed in recent papers. It describes building blocks of LLMs like fine-tuning, foundation training, memory, and databases. Specific techniques covered include LIMA which uses 1,000 carefully curated examples, instruction backtranslation to generate question-answer pairs, fine-tuning models on API examples like Gorilla, and reducing false answers through techniques like not agreeing with incorrect user opinions. The goal is to discuss cutting edge tricks to build better LLMs.

Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...

Pascal Pfeiffer, Principal Data Scientist, H2O.ai H2O Open Source GenAI World SF 2023 This talk dives into the expansive ecosystem of Large Language Models (LLMs), offering practitioners an insightful guide to various relevant applications, from natural language understanding to creative content generation. While exploring use cases across different industries, it also honestly addresses the current limitations of LLMs and anticipates future advancements.

KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...

This document discusses using large language models (LLMs) for text classification tasks. It begins by describing how LLMs are commonly used for text generation and question answering. For classification, models are usually trained supervised on labeled data. The document then explores using LLMs for zero-shot classification without training, and techniques like fine-tuning LLMs on tasks to improve performance. It provides an example of fine-tuning an LLM on a financial sentiment dataset. The document concludes by describing H2O.ai's LLM Studio tool for fine-tuning and a few Kaggle competitions where LLMs achieved success in text classification.

v
H2O.ai Confidential
Evaluators
Evaluator RAG LLM Purpose Method
PII
(privacy)
Yes Yes Assess whether the answer contains personally identifiable
information (PII) like credit card numbers, phone numbers, social
security numbers, street addresses, email addresses and employee
names.
Regex suite which quickly and reliably detects formatted PII - credit card numbers, social
security numbers (SSN) and emails.
Sensitive data
(security)
Yes Yes Assess whether the answer contains security-related information
like activation keys, passwords, API keys, tokens or certificates.
Regex suite which quickly and reliably detects formatted sensitive data - certificates
(SSL/TLS certs in PEM format), API keys (H2O.ai and OpenAI), activation keys
(Windows).
Answer Correctness Yes Yes Assess whether the answer is correct given the expected answer
(ground truth).
A score based on combined and weighted semantic and factual similarity between the
answer and ground truth (see Answer Semantic Similarity and Faithfulness below).
Answer Relevance Yes Yes Assess whether the answer is (in)complete and does not contain
redundant information which was not asked - noise.
A score based on the cosine similarity of the question and generated questions, where
generated questions are created by prompting an LLM to generate questions from the
actual answer.
Answer Similarity Yes Yes Assess semantic similarity of the answer and expected answer. A score based on similarity metric value of the actual and expected answer calculated by
a cross-encoder model (NLP).
Context Precision Yes No Assess the quality of the retrieved context considering order and
relevance of the text chunks on the context stack.
A score based on the presence of the expected answer - ground truth - in the text chunks
at the top of the retrieved context chunk stack - relevant chunks deep in the stack,
irrelevant chunks and unnecessarily big context make the score lower.
Context Recall Yes No Assess how much of the ground truth is represented in the
retrieved context.
A score based on the ratio of the number of sentences in the ground truth that can be
attributed to the context to the total number of sentences in the ground truth.
Context Relevance Yes No Assess whether the context is (in)complete and does not contain
redundant information which is not needed - noise.
A score based on the ratio of context sentences which are needed to generate the answer
to the total number of sentences in the retrieved context.
H2O EvalStudio evaluators overview
TERMINOLOGY: answer ~ actual RAG/LLM answer / expected answer ~ expected RAG/LLM answer i.e. ground truth | retrieved context ~ text chunks retrieved from the vector DB prior LLM
answer generation in RAG.
v
H2O.ai Confidential
Evaluators (continued)
Evaluator RAG LLM Purpose Method
Faithfulness Yes No Assess whether answer claims can be inferred from the context i.e.
factual consistency of the answer given the context. (hallucinations)
A score which is based on the ratio of the answer’s claims which present in the context
to the total number of answer’s claims.
Hallucination Metric Yes No Asses the RAG’s base LLM model hallucination. A score based on the Vectara hallucination evaluation cross-encoder model which
assesses RAG’s base LLM hallucination when it generates the actual answer from the
retrieved context.
RAGAs Yes No Assess overall answer quality considering both context and answer. Composite metrics score which is harmonic mean of Faithfulness, Answer Relevancy,
Context Precision and Context Recall metrics.
Tokens Presence Yes Yes Assesses whether both retrieved context and answer contain
required string tokens.
Scored based on the substring and/or regular expression based search of the required
set of strings in the retrieved context and answer.
Faithfulness Yes No Assess whether answer claims can be inferred from the context i.e.
factual consistency of the answer given the context. (hallucinations)
A score which is based on the ratio of the answer’s claims which present in the context
to the total number of answer’s claims.
H2O EvalStudio evaluators overview
TERMINOLOGY: answer ~ actual RAG/LLM answer / expected answer ~ expected RAG/LLM answer i.e. ground truth | retrieved context ~ text chunks retrieved from the vector DB prior LLM
answer generation in RAG.
H2O.ai Confidential
Security, Guardrails and
Hacking
v
H2O.ai Confidential
● LLM Guardrails are a set of predefined constraints and guidelines
that are applied to LLMs to manage their behavior.
● Guardrails serve to ensure responsible, ethical, and safe usage of
LLMs, mitigate potential risks, and promote transparency and
accountability.
● Guardrails are a form of proactive control and oversight over the
output and behavior of language models, which are otherwise
capable of generating diverse content, including text that may be
biased, inappropriate, or harmful.
Understanding the distinct functions of each type of guardrail is pivotal
in creating a comprehensive and effective strategy for governing AI
systems.
Guardrails

Recommended for you

Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again

Luiz Pizzato, Executive Manager Artificial Intelligence, Commonwealth Bank H2O Open Source GenAI World SF 2023

Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)

En esta reunión virtual, damos una introducción a la plataforma de aprendizaje automático de código abierto número 1, H2O-3 y te mostramos cómo puedes usarla para desarrollar modelos para resolver diferentes casos de uso.

aimlopen source
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...

Numerai is an open, crowd-sourced hedge fund powered by predictions from data scientists around the world. In return, participants are rewarded with weekly payouts in crypto. In this talk, Joe will give an overview of the Numerai tournament based on his own experience. He will then explain how he automates the time-consuming tasks such as testing different modelling strategies, scoring new datasets, submitting predictions to Numerai as well as monitoring model performance with H2O Driverless AI and R.

h2o.aih2o driverless aidriverless ai
v
H2O.ai Confidential
● Content Filter Guardrails: Content filtering is crucial to prevent
harmful, offensive, or inappropriate content from being generated by
LLMs. These guardrails help ensure that the outputs conform to
community guidelines, curbing hate speech, explicit content, and
misinformation.
● Bias Mitigation Guardrails: Bias is an ongoing concern in AI, and
mitigating bias is critical. These guardrails aim to reduce the model's
inclination to produce content that perpetuates stereotypes or
discriminates against particular groups. They work to promote fairness
and inclusivity in the model's responses.
● Safety and Privacy Guardrails: Protecting user privacy is paramount.
Safety and privacy guardrails are designed to prevent the generation of
content that may infringe on user privacy or include sensitive, personal
information. These measures safeguard users against unintended data
exposure.
Types of Guardrails
v
H2O.ai Confidential
Types of Guardrails
● Fact-Checking & Hallucination Guardrails: To combat misinformation,
fact-checking guardrails are used to verify the accuracy of the
information generated by LLMs. They help ensure that the model's
responses align with factual accuracy, especially in contexts like news
reporting or educational content.
● Context/Topic and User Intent Guardrails: For LLMs to be effective,
they must produce responses that are contextually relevant and aligned
with user intent. These guardrails aim to prevent instances where the
model generates content that is unrelated or fails to address the user's
queries effectively.
● Explainability and Transparency Guardrails: In the pursuit of making
LLMs more interpretable, these guardrails require the model to provide
explanations for its responses. This promotes transparency by helping
users understand why a particular output was generated, fostering
trust and accountability.
● Jailbreak Guardrails: Ensure robustness to malicious user attacks such
as prompt injection.
H2O.ai Confidential
Lab 3 - Hacking and Security
Posture
v
H2O.ai Confidential
https://jailbreaking.genai-training.h2o.ai/

Recommended for you

AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...

In this session, you will learn about what you should do after you’ve taken an AI transformation baseline. Over the span of this session, we will discuss the next steps in moving toward AI readiness through alignment of talent and tools to drive successful adoption and continuous use within an organization. To find additional videos on AI courses, earn badges, join the courses at H2O.ai Learning Center: https://training.h2o.ai/products/ai-foundations-course To find the Youtube video about this presentation: https://youtu.be/K1Cl3x3rd8g Speaker: Chemere Davis (H2O.ai - Senior Data Scientist Training Specialist)

aiai foundationsai journey
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey

The chances of successfully implementing AI strategies within an organization significantly improve when you can recognize where your organization is on the maturity scale. Over this course, you will learn the keys to unlocking value with AI which include asking the right questions about the problems you are solving and ensuring you have the right cross-section of talent, tools, and resources. By the end of this module, you should be able to recognize where your organization is on the AI transformation spectrum and identify some strategies that can get you to the next stage in your journey. To find additional videos on AI courses, earn badges, join the courses at H2O.ai Learning Center: https://training.h2o.ai/products/ai-foundations-course To find the Youtube video about this presentation: https://youtu.be/PJgr2epM6qs Speakers: Chemere Davis (H2O.ai - Senior Data Scientist Training Specialist) Ingrid Burton (H2O.ai - CMO)

aiai foundationsai journey
ML Model Deployment and Scoring on the Edge with Automatic ML & DF
ML Model Deployment and Scoring on the Edge with Automatic ML & DFML Model Deployment and Scoring on the Edge with Automatic ML & DF
ML Model Deployment and Scoring on the Edge with Automatic ML & DF

Machine Learning Model Deployment and Scoring on the Edge with Automatic Machine Learning and Data Flow YouTube Video URL: https://youtu.be/gB0bTH-L6DE Deploying Machine Learning models to the edge can present significant ML/IoT challenges centered around the need for low latency and accurate scoring on minimal resource environments. H2O.ai's Driverless AI AutoML and Cloudera Data Flow work nicely together to solve this challenge. Driverless AI automates the building of accurate Machine Learning models, which are deployed as light footprint and low latency Java or C++ artifacts, also known as a MOJO (Model Optimized). And Cloudera Data Flow leverage Apache NiFi that offers an innovative data flow framework to host MOJOs to make predictions on data moving on the edge.

apache nificloudera flow managementcfm
H2O.ai Confidential
Ashrith Barthur
Principal Data Scientist
ashrith.barthur@h2o.ai
Contact
Thank you!
H2O.ai Confidential
Applied Generative AI for
Banking
v
H2O.ai Confidential
Dataset: Consumer Complaint Database
H2O.ai Confidential
Lab 4 - Complaint Summarizer

Recommended for you

Automatic Model Documentation with H2O
Automatic Model Documentation with H2OAutomatic Model Documentation with H2O
Automatic Model Documentation with H2O

This presentation was made on June 18, 2020. Video recording of the session can be viewed here: https://youtu.be/YEtDwYSXXJo For many companies, model documentation is a requirement for any model to be used in the business. For other companies, model documentation is part of a data science team’s best practices. Model documentation includes how a model was created, training and test data characteristics, what alternatives were considered, how the model was evaluated, and information on model performance. Collecting and documenting this information can take a data scientist days to complete for each model. The model document needs to be comprehensive and consistent across various projects. The process of creating this documentation is tedious for the data scientist and wasteful for the business because the data scientist could be using that time to build additional models and create more value. Inconsistent or inaccurate model documentation can be an issue for model validation, governance, and regulatory compliance. In this virtual meetup, we will learn how to create comprehensive, high-quality model documentation in minutes that saves time, increases productivity, and improves model governance. Speaker's Bio: Nikhil Shekhar: Nikhil is a Machine Learning Engineer at H2O.ai. He is currently working on our automatic machine learning platform, Driverless AI. He graduated from the University of Buffalo majoring in Artificial Intelligence and is interested in developing scalable machine learning algorithms.

Your AI Transformation
Your AI Transformation Your AI Transformation
Your AI Transformation

This presentation was made on June 16, 2020. A recording of the presentation can be viewed here: https://youtu.be/khjW1t0gtSA AI is unlocking new potential for every enterprise. Organizations are using AI and machine learning technology to inform business decisions, predict potential issues, and provide more efficient, customized customer experiences. The results can enable a competitive edge for the business. H2O.ai is a visionary leader in AI and machine learning and is on a mission to democratize AI for everyone. We believe that every company can become an AI company, not just the AI Superpowers. We are empowering companies with our leading AI and Machine Learning platforms, our expertise, experience and training to embark on their own AI journey to become AI companies themselves. All companies in all industries can participate in this AI Transformation. Tune into this virtual meetup to learn how companies are transforming their business with the power of AI and where to start. About Parul Pandey: Parul is a Data Science Evangelist here at H2O.ai. She combines Data Science , evangelism and community in her work. Her emphasis is to spread the information about H2O and Driverless AI to as many people as possible, She is also an active writer and has contributed towards various national and international publications.

AI Solutions in Manufacturing
AI Solutions in ManufacturingAI Solutions in Manufacturing
AI Solutions in Manufacturing

H2O.ai provides open source machine learning platforms and enterprise AI solutions that help companies implement artificial intelligence. It offers tools for data scientists to build models using Python and R and also provides support services to help customers successfully deploy models in production. H2O.ai aims to democratize AI and help companies become AI-driven by leveraging its experts, community knowledge, and world-class technology.

v
H2O.ai Confidential
https://complaint-analysis.genai-training.h2o.ai/
H2O.ai Confidential
Determine what is the one
credit product with the
highest number of complaints.
Task 1
Complaint Summarizer
Applied Generative AI for Banking
H2O.ai Confidential
Determine what is the one
credit product with the
highest number of complaints.
Answer: Credit Reporting
Task 1
Complaint Summarizer
Applied Generative AI for Banking
H2O.ai Confidential
Determine what is the top
complaint for TransUnion?
Task 2
Complaint Summarizer
Applied Generative AI for Banking

Recommended for you

Best Programming Language for Civil Engineers
Best Programming Language for Civil EngineersBest Programming Language for Civil Engineers
Best Programming Language for Civil Engineers

The integration of programming into civil engineering is transforming the industry. We can design complex infrastructure projects and analyse large datasets. Imagine revolutionizing the way we build our cities and infrastructure, all by the power of coding. Programming skills are no longer just a bonus—they’re a game changer in this era. Technology is revolutionizing civil engineering by integrating advanced tools and techniques. Programming allows for the automation of repetitive tasks, enhancing the accuracy of designs, simulations, and analyses. With the advent of artificial intelligence and machine learning, engineers can now predict structural behaviors under various conditions, optimize material usage, and improve project planning.

programmingcodingcivil engineering
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...

Today’s digitally connected world presents a wide range of security challenges for enterprises. Insider security threats are particularly noteworthy because they have the potential to cause significant harm. Unlike external threats, insider risks originate from within the company, making them more subtle and challenging to identify. This blog aims to provide a comprehensive understanding of insider security threats, including their types, examples, effects, and mitigation techniques.

insider securitycybersecurity threatsenterprise security
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...

Slide of the tutorial entitled "Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Emerging Trends" held at UMAP'24: 32nd ACM Conference on User Modeling, Adaptation and Personalization (July 1, 2024 | Cagliari, Italy)

user modelinguser profilinguser model
H2O.ai Confidential
Determine what is the top
complaint for TransUnion?
Answer: Violation of
Consumers Rights to Privacy
and Confidentiality Under the
Fair Credit Reporting Act.
Task 2
Complaint Summarizer
Applied Generative AI for Banking
H2O.ai Confidential
Use H2OGPT to summarize a
complaint from the database
and provided immediate next
steps
Task 3
Complaint Summarizer
Applied Generative AI for Banking
H2O.ai Confidential
Use H2OGPT to summarize a
complaint from the database
and provided immediate next
steps
Answer: [See screenshot]
Task 3
Complaint Summarizer
Applied Generative AI for Banking
H2O.ai Confidential
CERTIFICATION

Recommended for you

Calgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptxCalgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptx

MuleSoft Meetup on APM and IDP

mulesoftai
The Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU CampusesThe Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU Campuses

CIO Council Cal Poly Humboldt September 22, 2023

national research platformdistributed supercomputerdistributed systems
DealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 editionDealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 edition

The DealBook is our annual overview of the Ukrainian tech investment industry. This edition comprehensively covers the full year 2023 and the first deals of 2024.

H2O.ai Confidential
Certification Exam
Link to Exam!
H2O.ai Confidential
Contact
Thank you!
MAKERS
Jonathan Farland
Director of Solution Engineering
jon.farland@h2o.ai
www.linkedin.com/in/jonfarland/
H2O.ai Confidential
TECHNICAL APPENDIX
H2O.ai Confidential
Retrieval-Augmented Generation (RAG)
RAG as a system is a
particularly good use of
vector databases.
RAG systems take
advantage the context
window for LLMs, filling it
with only the most relevant
examples from real data.
This “grounds” the LLM to
relevant context and
greatly minimizes any
hallucination.

Recommended for you

RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx

Your comprehensive guide to RPA in healthcare for 2024. Explore the benefits, use cases, and emerging trends of robotic process automation. Understand the challenges and prepare for the future of healthcare automation

rpa in healthcarerpa in healthcare usarpa in healthcare industry
Pigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdfPigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdf

Sustainability requires ingenuity and stewardship. Did you know Pigging Solutions pigging systems help you achieve your sustainable manufacturing goals AND provide rapid return on investment. How? Our systems recover over 99% of product in transfer piping. Recovering trapped product from transfer lines that would otherwise become flush-waste, means you can increase batch yields and eliminate flush waste. From raw materials to finished product, if you can pump it, we can pig it.

pigging solutionsprocess piggingproduct transfers
20240702 Présentation Plateforme GenAI.pdf
20240702 Présentation Plateforme GenAI.pdf20240702 Présentation Plateforme GenAI.pdf
20240702 Présentation Plateforme GenAI.pdf

Support en anglais diffusé lors de l'événement 100% IA organisé dans les locaux parisiens d'Iguane Solutions, le mardi 2 juillet 2024 : - Présentation de notre plateforme IA plug and play : ses fonctionnalités avancées, telles que son interface utilisateur intuitive, son copilot puissant et des outils de monitoring performants. - REX client : Cyril Janssens, CTO d’ easybourse, partage son expérience d’utilisation de notre plateforme IA plug & play.

genaicloudrgpd
Embedding Models Source: https://huggingface.co/blog/1b-sentence-embeddings
Embedding Models - INSTRUCTOR Source: https://arxiv.org/pdf/2212.09741.pdf
Instruction-based Omnifarious
Representations
Model is trained to generate embeddings
using both the instruction as well as the
textual input.
Applicable to virtually every use case, due
to its ability to create latent vector
representations that include instruction.
Embedding Models - BGE Source: https://arxiv.org/pdf/2310.07554.pdf
LLM-Embedder
This embedding model is trained
specifically for use with RAG systems.
Reward model introduced that provides
higher rewards to a retrieval candidate
if it results in a higher generation
likelihood for the expected output
Uses contrastive learning to directly
get at optimizing for RAG
applications
H2O.ai Confidential
AI Engines Deployment Consumption
LLM AppStudio
LLM DataStudio
LLM EvalStudio
H2O LLMs Ecosystem
AppStore
End Users
Generative AI with H2O.ai
MLOps
AI Engine Manager
Doc-QA
Enterprise
h2oGPT

Recommended for you

Research Directions for Cross Reality Interfaces
Research Directions for Cross Reality InterfacesResearch Directions for Cross Reality Interfaces
Research Directions for Cross Reality Interfaces

An invited talk given by Mark Billinghurst on Research Directions for Cross Reality Interfaces. This was given on July 2nd 2024 as part of the 2024 Summer School on Cross Reality in Hagenberg, Austria (July 1st - 7th)

augmented realitycross realityvirtual reality
Advanced Techniques for Cyber Security Analysis and Anomaly Detection
Advanced Techniques for Cyber Security Analysis and Anomaly DetectionAdvanced Techniques for Cyber Security Analysis and Anomaly Detection
Advanced Techniques for Cyber Security Analysis and Anomaly Detection

Cybersecurity is a major concern in today's connected digital world. Threats to organizations are constantly evolving and have the potential to compromise sensitive information, disrupt operations, and lead to significant financial losses. Traditional cybersecurity techniques often fall short against modern attackers. Therefore, advanced techniques for cyber security analysis and anomaly detection are essential for protecting digital assets. This blog explores these cutting-edge methods, providing a comprehensive overview of their application and importance.

cybersecurityanomaly detectionadvanced techniques
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdfWhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf

Profile portofolio

H2O.ai Confidential
Explanations in
Highlights the most important regions of an image
Highlights the most important words
H2O.ai Confidential
Code Translation
H2O.ai Confidential
LLM Context Length Testing Source:
https://github.com/gkamradt/LLMTest_NeedleInAHaystack?tab=read
me-ov-file
H2O.ai Confidential
LLM Context Length Testing Source:
https://github.com/gkamradt/LLMTest_NeedleInAHaystack?tab=read
me-ov-file

Recommended for you

TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-InTrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In

Six months into 2024, and it is clear the privacy ecosystem takes no days off!! Regulators continue to implement and enforce new regulations, businesses strive to meet requirements, and technology advances like AI have privacy professionals scratching their heads about managing risk. What can we learn about the first six months of data privacy trends and events in 2024? How should this inform your privacy program management for the rest of the year? Join TrustArc, Goodwin, and Snyk privacy experts as they discuss the changes we’ve seen in the first half of 2024 and gain insight into the concrete, actionable steps you can take to up-level your privacy program in the second half of the year. This webinar will review: - Key changes to privacy regulations in 2024 - Key themes in privacy and data governance in 2024 - How to maximize your privacy program in the second half of 2024

data privacyprivacy complianceai
Best Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdfBest Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdf

As a popular open-source library for analytics engineering, dbt is often used in combination with Airflow. Orchestrating and executing dbt models as DAGs ensures an additional layer of control over tasks, observability, and provides a reliable, scalable environment to run dbt models. This webinar will cover a step-by-step guide to Cosmos, an open source package from Astronomer that helps you easily run your dbt Core projects as Airflow DAGs and Task Groups, all with just a few lines of code. We’ll walk through: - Standard ways of running dbt (and when to utilize other methods) - How Cosmos can be used to run and visualize your dbt projects in Airflow - Common challenges and how to address them, including performance, dependency conflicts, and more - How running dbt projects in Airflow helps with cost optimization Webinar given on 9 July 2024

apache airflowdbtdbt-core
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx

How do we build an IoT product, and make it profitable? Talk from the IoT meetup in March 2024. https://www.meetup.com/iot-sweden/events/299487375/

iot
H2O.ai Confidential
Ethical Considerations, Data Privacy, and User Consent
Assess the potential impact of generative AI on individuals and society. Give users control over
how their data is used by generative AI. Consent mechanisms should be transparent and user-
friendly.
Monitoring, Regulation, and Security
Detect misuse or anomalies in generative AI behavior. Regulatory
compliance ensures adherence to ethical and legal guidelines. Security
measures are crucial to protect AI models from adversarial attacks or
unauthorized access.
Accountability and Oversight
Define roles and responsibilities for AI development and
deployment. Oversight mechanisms ensure that responsible
practices are followed.
Education and Awareness
Users and developers should be informed about
generative AI capabilities, limitations, and ethical
considerations.
Stakeholder Involvement
Involving various stakeholders in AI discussions promotes diverse
perspectives and responsible decision-making.
Continuous Evaluation and Improvement
Continually assess models to ensure fairness, accuracy, and
alignment with ethical standards.
Transparency, Explainability, Bias Mitigation,Debugging,
and Guardrails
Recognize and mitigate both subtle and glaring biases that may emerge from training
data. Ensures that users can understand and trust the decisions made by generative
AI models. Debug models with techniques such as adversarial prompt engineering.
Proactively manage risks and maintain control over the model's behavior with
guardrails.
Responsible Generative AI
Responsible
Generative
AI
Audit Input Data, Benchmarks, and Test the Unknown
Assess quality of data used as input to train Generative AI models. Utilize benchmarks
and random attacks for testing.
v
H2O.ai Confidential
CoVe method reduces factual errors in large language models by drafting, fact-checking, and
verifying responses - it deliberates on its own responses and self-correcting them
Steps:
1. Given a user query, a LLM generates a baseline response that may contain inaccuracies, e.g.
factual hallucinations
2. To improve this, CoVe first generates a plan of a set of verification questions to ask, and then
executes that plan by answering them and hence checking for agreement
3. Individual verification questions are typically answered with higher accuracy than the
original accuracy of the facts in the original longform generation
4. Finally, the revised response takes into account the verifications
5. The factored version of CoVe answers verification questions such that they cannot condition
on the original response, avoiding repetition and improving performance
Minimizing Model Hallucinations
Using Chain-of-Verification (CoVe) Method
v
H2O.ai Confidential
Minimizing Model Hallucinations
Using Chain-of-Verification (CoVe) Method

More Related Content

What's hot

Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Sri Ambati
 
Generative-AI-in-enterprise-20230615.pdf
Generative-AI-in-enterprise-20230615.pdfGenerative-AI-in-enterprise-20230615.pdf
Generative-AI-in-enterprise-20230615.pdf
Liming Zhu
 
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer RoadmapEpisode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Anant Corporation
 
AI 2023.pdf
AI 2023.pdfAI 2023.pdf
AI 2023.pdf
DavidCieslak4
 
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
DataScienceConferenc1
 
Neural Language Generation Head to Toe
Neural Language Generation Head to Toe Neural Language Generation Head to Toe
Neural Language Generation Head to Toe
Hady Elsahar
 
Microsoft + OpenAI: Recent Updates (Machine Learning 15minutes! Broadcast #74)
Microsoft + OpenAI: Recent Updates (Machine Learning 15minutes! Broadcast #74)Microsoft + OpenAI: Recent Updates (Machine Learning 15minutes! Broadcast #74)
Microsoft + OpenAI: Recent Updates (Machine Learning 15minutes! Broadcast #74)
Naoki (Neo) SATO
 
Unlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdfUnlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdf
PremNaraindas1
 
Exploring Opportunities in the Generative AI Value Chain.pdf
Exploring Opportunities in the Generative AI Value Chain.pdfExploring Opportunities in the Generative AI Value Chain.pdf
Exploring Opportunities in the Generative AI Value Chain.pdf
Dung Hoang
 
Use Case Patterns for LLM Applications (1).pdf
Use Case Patterns for LLM Applications (1).pdfUse Case Patterns for LLM Applications (1).pdf
Use Case Patterns for LLM Applications (1).pdf
M Waleed Kadous
 
Build an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdfBuild an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdf
AnastasiaSteele10
 
The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021
Steve Omohundro
 
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Mihai Criveti
 
Leveraging Generative AI & Best practices
Leveraging Generative AI & Best practicesLeveraging Generative AI & Best practices
Leveraging Generative AI & Best practices
DianaGray10
 
Prompting is an art / Sztuka promptowania
Prompting is an art / Sztuka promptowaniaPrompting is an art / Sztuka promptowania
Prompting is an art / Sztuka promptowania
Michal Jaskolski
 
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
DianaGray10
 
Generative AI - The New Reality: How Key Players Are Progressing
Generative AI - The New Reality: How Key Players Are Progressing Generative AI - The New Reality: How Key Players Are Progressing
Generative AI - The New Reality: How Key Players Are Progressing
Vishal Sharma
 
LLM App Hacking (AVTOKYO2023)
LLM App Hacking (AVTOKYO2023)LLM App Hacking (AVTOKYO2023)
LLM App Hacking (AVTOKYO2023)
Shota Shinogi
 
Customizing LLMs
Customizing LLMsCustomizing LLMs
Customizing LLMs
Jim Steele
 
Using the power of Generative AI at scale
Using the power of Generative AI at scaleUsing the power of Generative AI at scale
Using the power of Generative AI at scale
Maxim Salnikov
 

What's hot (20)

Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
Generative-AI-in-enterprise-20230615.pdf
Generative-AI-in-enterprise-20230615.pdfGenerative-AI-in-enterprise-20230615.pdf
Generative-AI-in-enterprise-20230615.pdf
 
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer RoadmapEpisode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
 
AI 2023.pdf
AI 2023.pdfAI 2023.pdf
AI 2023.pdf
 
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
 
Neural Language Generation Head to Toe
Neural Language Generation Head to Toe Neural Language Generation Head to Toe
Neural Language Generation Head to Toe
 
Microsoft + OpenAI: Recent Updates (Machine Learning 15minutes! Broadcast #74)
Microsoft + OpenAI: Recent Updates (Machine Learning 15minutes! Broadcast #74)Microsoft + OpenAI: Recent Updates (Machine Learning 15minutes! Broadcast #74)
Microsoft + OpenAI: Recent Updates (Machine Learning 15minutes! Broadcast #74)
 
Unlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdfUnlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdf
 
Exploring Opportunities in the Generative AI Value Chain.pdf
Exploring Opportunities in the Generative AI Value Chain.pdfExploring Opportunities in the Generative AI Value Chain.pdf
Exploring Opportunities in the Generative AI Value Chain.pdf
 
Use Case Patterns for LLM Applications (1).pdf
Use Case Patterns for LLM Applications (1).pdfUse Case Patterns for LLM Applications (1).pdf
Use Case Patterns for LLM Applications (1).pdf
 
Build an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdfBuild an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdf
 
The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021
 
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
 
Leveraging Generative AI & Best practices
Leveraging Generative AI & Best practicesLeveraging Generative AI & Best practices
Leveraging Generative AI & Best practices
 
Prompting is an art / Sztuka promptowania
Prompting is an art / Sztuka promptowaniaPrompting is an art / Sztuka promptowania
Prompting is an art / Sztuka promptowania
 
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
 
Generative AI - The New Reality: How Key Players Are Progressing
Generative AI - The New Reality: How Key Players Are Progressing Generative AI - The New Reality: How Key Players Are Progressing
Generative AI - The New Reality: How Key Players Are Progressing
 
LLM App Hacking (AVTOKYO2023)
LLM App Hacking (AVTOKYO2023)LLM App Hacking (AVTOKYO2023)
LLM App Hacking (AVTOKYO2023)
 
Customizing LLMs
Customizing LLMsCustomizing LLMs
Customizing LLMs
 
Using the power of Generative AI at scale
Using the power of Generative AI at scaleUsing the power of Generative AI at scale
Using the power of Generative AI at scale
 

Similar to Generative AI Masterclass - Model Risk Management.pptx

Using the power of OpenAI with your own data: what's possible and how to start?
Using the power of OpenAI with your own data: what's possible and how to start?Using the power of OpenAI with your own data: what's possible and how to start?
Using the power of OpenAI with your own data: what's possible and how to start?
Maxim Salnikov
 
LLM-Datacraft.pdf
LLM-Datacraft.pdfLLM-Datacraft.pdf
LLM-Datacraft.pdf
Jyotirmoy Sundi
 
2024-02-24_Session 1 - PMLE_UPDATED.pptx
2024-02-24_Session 1 - PMLE_UPDATED.pptx2024-02-24_Session 1 - PMLE_UPDATED.pptx
2024-02-24_Session 1 - PMLE_UPDATED.pptx
gdgsurrey
 
Building Generative AI-infused apps: what's possible and how to start
Building Generative AI-infused apps: what's possible and how to startBuilding Generative AI-infused apps: what's possible and how to start
Building Generative AI-infused apps: what's possible and how to start
Maxim Salnikov
 
Formal Versus Agile: Survival of the Fittest? (Paul Boca)
Formal Versus Agile: Survival of the Fittest? (Paul Boca)Formal Versus Agile: Survival of the Fittest? (Paul Boca)
Formal Versus Agile: Survival of the Fittest? (Paul Boca)
AdaCore
 
ANIn Chennai March 2024 |Conversational AI for Agility in Healthcare by Shine...
ANIn Chennai March 2024 |Conversational AI for Agility in Healthcare by Shine...ANIn Chennai March 2024 |Conversational AI for Agility in Healthcare by Shine...
ANIn Chennai March 2024 |Conversational AI for Agility in Healthcare by Shine...
AgileNetwork
 
2017 10-10 (netflix ml platform meetup) learning item and user representation...
2017 10-10 (netflix ml platform meetup) learning item and user representation...2017 10-10 (netflix ml platform meetup) learning item and user representation...
2017 10-10 (netflix ml platform meetup) learning item and user representation...
Ed Chi
 
Problem prediction model
Problem prediction modelProblem prediction model
Problem prediction model
Guttenberg Ferreira Passos
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Sri Ambati
 
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Sri Ambati
 
Webcast Presentation: Accelerate Continuous Delivery with Development Testing...
Webcast Presentation: Accelerate Continuous Delivery with Development Testing...Webcast Presentation: Accelerate Continuous Delivery with Development Testing...
Webcast Presentation: Accelerate Continuous Delivery with Development Testing...
GRUC
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
Sri Ambati
 
Agile Methods: Fact or Fiction
Agile Methods: Fact or FictionAgile Methods: Fact or Fiction
Agile Methods: Fact or Fiction
Matt Ganis
 
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
Daniel Zivkovic
 
Human-Centered Artificial Intelligence: Reliable, Safe & Trustworthy
Human-Centered Artificial Intelligence: Reliable, Safe & TrustworthyHuman-Centered Artificial Intelligence: Reliable, Safe & Trustworthy
Human-Centered Artificial Intelligence: Reliable, Safe & Trustworthy
JalnaAfridi
 
Scaling & Managing Production Deployments with H2O ModelOps
Scaling & Managing Production Deployments with H2O ModelOpsScaling & Managing Production Deployments with H2O ModelOps
Scaling & Managing Production Deployments with H2O ModelOps
Sri Ambati
 
Glasswall Wardley Maps & Services
Glasswall Wardley Maps & ServicesGlasswall Wardley Maps & Services
Glasswall Wardley Maps & Services
Steve Purkis
 
Strategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering TeamsStrategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering Teams
UXDXConf
 
Automated Testing DITA Content and Customizations
Automated Testing DITA Content and CustomizationsAutomated Testing DITA Content and Customizations
Automated Testing DITA Content and Customizations
Steve Anderson
 
Generative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of contentGenerative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of content
Hitesh Mohapatra
 

Similar to Generative AI Masterclass - Model Risk Management.pptx (20)

Using the power of OpenAI with your own data: what's possible and how to start?
Using the power of OpenAI with your own data: what's possible and how to start?Using the power of OpenAI with your own data: what's possible and how to start?
Using the power of OpenAI with your own data: what's possible and how to start?
 
LLM-Datacraft.pdf
LLM-Datacraft.pdfLLM-Datacraft.pdf
LLM-Datacraft.pdf
 
2024-02-24_Session 1 - PMLE_UPDATED.pptx
2024-02-24_Session 1 - PMLE_UPDATED.pptx2024-02-24_Session 1 - PMLE_UPDATED.pptx
2024-02-24_Session 1 - PMLE_UPDATED.pptx
 
Building Generative AI-infused apps: what's possible and how to start
Building Generative AI-infused apps: what's possible and how to startBuilding Generative AI-infused apps: what's possible and how to start
Building Generative AI-infused apps: what's possible and how to start
 
Formal Versus Agile: Survival of the Fittest? (Paul Boca)
Formal Versus Agile: Survival of the Fittest? (Paul Boca)Formal Versus Agile: Survival of the Fittest? (Paul Boca)
Formal Versus Agile: Survival of the Fittest? (Paul Boca)
 
ANIn Chennai March 2024 |Conversational AI for Agility in Healthcare by Shine...
ANIn Chennai March 2024 |Conversational AI for Agility in Healthcare by Shine...ANIn Chennai March 2024 |Conversational AI for Agility in Healthcare by Shine...
ANIn Chennai March 2024 |Conversational AI for Agility in Healthcare by Shine...
 
2017 10-10 (netflix ml platform meetup) learning item and user representation...
2017 10-10 (netflix ml platform meetup) learning item and user representation...2017 10-10 (netflix ml platform meetup) learning item and user representation...
2017 10-10 (netflix ml platform meetup) learning item and user representation...
 
Problem prediction model
Problem prediction modelProblem prediction model
Problem prediction model
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
 
Webcast Presentation: Accelerate Continuous Delivery with Development Testing...
Webcast Presentation: Accelerate Continuous Delivery with Development Testing...Webcast Presentation: Accelerate Continuous Delivery with Development Testing...
Webcast Presentation: Accelerate Continuous Delivery with Development Testing...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Agile Methods: Fact or Fiction
Agile Methods: Fact or FictionAgile Methods: Fact or Fiction
Agile Methods: Fact or Fiction
 
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
 
Human-Centered Artificial Intelligence: Reliable, Safe & Trustworthy
Human-Centered Artificial Intelligence: Reliable, Safe & TrustworthyHuman-Centered Artificial Intelligence: Reliable, Safe & Trustworthy
Human-Centered Artificial Intelligence: Reliable, Safe & Trustworthy
 
Scaling & Managing Production Deployments with H2O ModelOps
Scaling & Managing Production Deployments with H2O ModelOpsScaling & Managing Production Deployments with H2O ModelOps
Scaling & Managing Production Deployments with H2O ModelOps
 
Glasswall Wardley Maps & Services
Glasswall Wardley Maps & ServicesGlasswall Wardley Maps & Services
Glasswall Wardley Maps & Services
 
Strategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering TeamsStrategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering Teams
 
Automated Testing DITA Content and Customizations
Automated Testing DITA Content and CustomizationsAutomated Testing DITA Content and Customizations
Automated Testing DITA Content and Customizations
 
Generative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of contentGenerative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of content
 

More from Sri Ambati

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
Sri Ambati
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
Sri Ambati
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
Sri Ambati
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
Sri Ambati
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
Sri Ambati
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
Sri Ambati
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
Sri Ambati
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
Sri Ambati
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Sri Ambati
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
Sri Ambati
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
Sri Ambati
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
Sri Ambati
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
Sri Ambati
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
Sri Ambati
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
Sri Ambati
 
ML Model Deployment and Scoring on the Edge with Automatic ML & DF
ML Model Deployment and Scoring on the Edge with Automatic ML & DFML Model Deployment and Scoring on the Edge with Automatic ML & DF
ML Model Deployment and Scoring on the Edge with Automatic ML & DF
Sri Ambati
 
Automatic Model Documentation with H2O
Automatic Model Documentation with H2OAutomatic Model Documentation with H2O
Automatic Model Documentation with H2O
Sri Ambati
 
Your AI Transformation
Your AI Transformation Your AI Transformation
Your AI Transformation
Sri Ambati
 
AI Solutions in Manufacturing
AI Solutions in ManufacturingAI Solutions in Manufacturing
AI Solutions in Manufacturing
Sri Ambati
 

More from Sri Ambati (20)

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
 
ML Model Deployment and Scoring on the Edge with Automatic ML & DF
ML Model Deployment and Scoring on the Edge with Automatic ML & DFML Model Deployment and Scoring on the Edge with Automatic ML & DF
ML Model Deployment and Scoring on the Edge with Automatic ML & DF
 
Automatic Model Documentation with H2O
Automatic Model Documentation with H2OAutomatic Model Documentation with H2O
Automatic Model Documentation with H2O
 
Your AI Transformation
Your AI Transformation Your AI Transformation
Your AI Transformation
 
AI Solutions in Manufacturing
AI Solutions in ManufacturingAI Solutions in Manufacturing
AI Solutions in Manufacturing
 

Recently uploaded

Best Programming Language for Civil Engineers
Best Programming Language for Civil EngineersBest Programming Language for Civil Engineers
Best Programming Language for Civil Engineers
Awais Yaseen
 
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Bert Blevins
 
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Erasmo Purificato
 
Calgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptxCalgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptx
ishalveerrandhawa1
 
The Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU CampusesThe Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU Campuses
Larry Smarr
 
DealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 editionDealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 edition
Yevgen Sysoyev
 
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
SynapseIndia
 
Pigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdfPigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdf
Pigging Solutions
 
20240702 Présentation Plateforme GenAI.pdf
20240702 Présentation Plateforme GenAI.pdf20240702 Présentation Plateforme GenAI.pdf
20240702 Présentation Plateforme GenAI.pdf
Sally Laouacheria
 
Research Directions for Cross Reality Interfaces
Research Directions for Cross Reality InterfacesResearch Directions for Cross Reality Interfaces
Research Directions for Cross Reality Interfaces
Mark Billinghurst
 
Advanced Techniques for Cyber Security Analysis and Anomaly Detection
Advanced Techniques for Cyber Security Analysis and Anomaly DetectionAdvanced Techniques for Cyber Security Analysis and Anomaly Detection
Advanced Techniques for Cyber Security Analysis and Anomaly Detection
Bert Blevins
 
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdfWhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
ArgaBisma
 
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-InTrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc
 
Best Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdfBest Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdf
Tatiana Al-Chueyr
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
Adam Dunkels
 
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
huseindihon
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Mydbops
 
How RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptxHow RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptx
SynapseIndia
 
Observability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetryObservability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetry
Eric D. Schabell
 
What's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptxWhat's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptx
Stephanie Beckett
 

Recently uploaded (20)

Best Programming Language for Civil Engineers
Best Programming Language for Civil EngineersBest Programming Language for Civil Engineers
Best Programming Language for Civil Engineers
 
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
 
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
 
Calgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptxCalgary MuleSoft Meetup APM and IDP .pptx
Calgary MuleSoft Meetup APM and IDP .pptx
 
The Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU CampusesThe Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU Campuses
 
DealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 editionDealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 edition
 
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
 
Pigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdfPigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdf
 
20240702 Présentation Plateforme GenAI.pdf
20240702 Présentation Plateforme GenAI.pdf20240702 Présentation Plateforme GenAI.pdf
20240702 Présentation Plateforme GenAI.pdf
 
Research Directions for Cross Reality Interfaces
Research Directions for Cross Reality InterfacesResearch Directions for Cross Reality Interfaces
Research Directions for Cross Reality Interfaces
 
Advanced Techniques for Cyber Security Analysis and Anomaly Detection
Advanced Techniques for Cyber Security Analysis and Anomaly DetectionAdvanced Techniques for Cyber Security Analysis and Anomaly Detection
Advanced Techniques for Cyber Security Analysis and Anomaly Detection
 
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdfWhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
 
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-InTrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
 
Best Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdfBest Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdf
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
 
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
 
How RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptxHow RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptx
 
Observability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetryObservability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetry
 
What's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptxWhat's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptx
 

Generative AI Masterclass - Model Risk Management.pptx

  • 1. H2O.ai Confidential Generative AI Masterclass - Model Risk Management
  • 3. v H2O.ai Confidential Introduction - Today’s training will look into responsible, explainable and interpretable AI when applied in the context of Generative AI and specifically Large Language Models (LLMS). - This will include both several sections on theoretical concepts as well as hands-on labs using Enterprise h2oGPT and H2O GenAI Applications. - These hands-on labs focus on applying Gen AI in the context of a Model Risk Manager’s role at a bank or financial institution. - NOTE: A separate end-to-end masterclass on Generative AI is also available within the training environment, as well as on github: https://github.com/h2oai/h2o_genai_training. Including: - Data Preparation for LLMs - Fine-Tuning custom models - Model Evaluation - Retrieval-Augmented Generation (RAG) - Guardrails - AI Applications
  • 4. v H2O.ai Confidential Section Session Duration Speaker Welcome Session Kick-off 5m Jon Farland Interpretability for Generative AI Large Language Model Interpretability 25m Kim Montgomery Workshop: Explainable and Interpretable AI for LLMs 20m Navdeep Gill Benchmarking and Evaluations Frameworks for Evaluating Generative AI 20m Srinivas Neppalli Workshop: Experimental Design of Gen AI Applications 20m Jon Farland Security, Guardrails and Hacking Workshop: Guardrails and Hacking 20m Ashrith Barthur Applied Generative AI for Banking - Complaint Summarizer Workshop: Complaint Summarizer AI Application 20m Jon Farland Agenda
  • 5. v H2O.ai Confidential Housekeeping - The training environment for today is a dedicated instance of the H2O AI Managed Cloud, a GPU-powered environment capable of training and deploying LLMs, as well designing and hosting entire AI Applications. - It an be accessed at https://genai-training.h2o.ai. - Login credentials should have been provided to the email address you were registered with. - If you don’t yet have credentials, or you are otherwise unable to access the environment, please speak with any member of the H2O.ai team member. - The training environment will be available to attendees for 3 days after the conference, but dedicated proof-of-concept environments can be provided (including on- premise) at request. Please speak to any H2O.ai team member or email jon.farland@h2o.ai
  • 7. What is Generative AI? GenAI enables the creation of novel content Input GenAI Model Learns patterns in unstructured data Unstructured data Output Novel Content Data Traditional AI Model Learns relationship between data and label Output Label Labels VS
  • 8. H2O.ai Confidential More complicated input: ● Prompt phrasing ● Instructions ● Examples More relevant dimensions to output: ● Truthfulness/Accuracy ● Safety ● Fairness ● Robustness ● Privacy ● Machine Ethics [TrustLLM: Trustworthiness in Large Language Models, Sun, et al] GenAI Complications
  • 9. H2O.ai Confidential ● Can the model recognize problematic responses? ○ Inaccurate responses ○ Unethical responses ○ Responses conveying stereotypes ● Can an inappropriate response be provoked? ○ Jailbreaking ○ Provoking toxicity ○ Leading questions / false context Common tests
  • 10. H2O.ai Confidential TrustLLM Result Summary Matrix [TrustLLM: Trustworthiness in Large Language Models, Sun, et al]
  • 11. H2O.ai Confidential TrustLLM Main Conclusions TrustLLM Main Findings: ● Trustworthiness and utility were positively correlated. ● Generally closed-sourced models outperformed open source. ● Over alignment for trustworthiness can compromise utility. [TrustLLM: Trustworthiness in Large Language Models, Sun, et al]
  • 12. v H2O.ai Confidential Accuracy: Traditional ML Traditional machine learning ● Comparing a prediction to an outcome ● Generally the correct labels are in a simple format
  • 13. v H2O.ai Confidential Accuracy: Example LLMs The simplest way to measure accuracy is to compare the result against another source of information. Example sources: ● Checking results against a given source (RAG) ● Checking results against the tuning data ● Checking results against an external source (eg wikipedia) ● Checking results against the training data (cumbersome). ● Checking for self-consistency (Self-check GPT) ● Checking results against a larger LLM Scoring methods: ● Natural language inference ● Comparing embeddings ● Influence functions
  • 14. v H2O.ai Confidential RAG (Retrieval Augmented Generation) 01 Chunk and Embed Documents 02 Submit a Query Retrieve Relevant Information via Similarity Search 03 04 05 Combine relevant information to ground the query to the model Generate Embedding for Query
  • 15. v H2O.ai Confidential Accuracy: Retrieval Augmented Generation (RAG) Provides a Simple Solution
  • 16. v H2O.ai Confidential Accuracy: Retrieval Augmented Generation (RAG) Provides a Simple Solution
  • 17. H2O.ai Confidential Influence functions ● Seeks to measure the influence of including a data point in the training set on model response. ● Datamodels/TRAK ○ Learn model based on binary indicator functions. ○ Directly measure how much a training instance influences the outcome. ● DataInf ○ Measures the influence of a document during fine tuning. [DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models, Kwon et. al] [TRAK: Attributing Model Behavior at Scale. Park et. al]
  • 18. H2O.ai Confidential Influence functions / computer vision [TRAK: Attributing Model Behavior at Scale. Park et. al]
  • 19. H2O.ai Confidential Influence functions / NLP Influence functions / nlp [Studying Large Language Model Generalization with Influence Functions, Grosse, et. al]
  • 20. H2O.ai Confidential Self consistency comparison Self-Check GPT ● Sampling different responses from an LLM. ● Checking for consistency between responses. ● Assuming that hallucinations will occur less consistently. [SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models, Potsawee Manakul, Adrian Liusie, Mark JF Gales]
  • 21. v H2O.ai Confidential Counterfactual analysis: Traditional ML ● How does changing a feature change the model outcome? ● What is the smallest change that can change the outcome?
  • 23. v H2O.ai Confidential Counterfactual analysis: LLM How consistent are results under different: ● Prompts / instructions. ○ Changes in prompt design ○ Changes in prompt instructions ○ Multi-shot examples ○ Word replacement with synonyms ○ Proper names or pronouns (fairness) ○ Chain of thought / other guided reasoning related methods ● Different context / RAG retrieval
  • 24. v H2O.ai Confidential Intervention in the case of problems If problematic behavior is found in a model there are several options. ● Prompt/ instruction modifications. ● Choosing a different base model. ● Fine-tuning to modify LLM model behavior ● Altering the document retrieval process (RAG) ● Monitoring model output for problematic responses.
  • 25. H2O.ai Confidential Conclusions ● Many of the basic problems of understanding LLMs are similar to that of other large models. ● Through careful testing we can hope to understand and correct some of the safety issues involved in using LLMs.
  • 26. H2O.ai Confidential Kim Montgomery LLM interpretation kim.montgomery@h2o.ai Contact Thank you!
  • 27. H2O.ai Confidential Lab 1 - Using Chain-of- Verification for Explainable AI
  • 28. v H2O.ai Confidential Chain of Verification (CoVe) CoVe enhances the reliability of answers provided by Large Language Models, particularly in factual question and answering scenarios, by systematically verifying and refining responses to minimize inaccuracies. The CoVe method consists of the following four sequential steps: 1. Initial Baseline Response Creation: In this step, an initial response to the original question is generated as a starting point. 1. Verification Question Generation: Verification questions are created to fact-check the baseline response. These questions are designed to scrutinize the accuracy of the initial response. 1. Execute Verification: The verification questions are independently answered to minimize any potential bias. This step ensures that the verification process is objective and thorough. 1. Final Refined Answer Generation: Based on the results of the verification process, a final refined answer is generated. This answer is expected to be more accurate and reliable, reducing the likelihood of hallucinations in the response.
  • 29. v H2O.ai Confidential Verification Questions Questions are categorized into three main groups: 1. Wiki Data & Wiki Category List: This category involves questions that expect answers in the form of a list of entities. For instance, questions like “Who are some politicians born in Boston?” 2. Multi-Span QA: Questions in this category seek multiple independent answers. An example would be: “Who invented the first mechanized printing press and in what year?” The answer is “Johannes Gutenberg, 1450”. 3. Long-form Generation: Any question that requires a detailed or lengthy response falls under this group.
  • 30. v H2O.ai Confidential Chain of Verification (CoVe) Dhuliawala, Shehzaad, et al. "Chain-of-Verification Reduces Hallucination in Large Language Models." arXiv preprint arXiv:2309.11495 (2023)
  • 31. v H2O.ai Confidential CoVe and Explainable AI (XAI) ● Interpretability and Transparency: ○ Verification process generates questions to fact-check baseline responses, improving transparency in decision-making. ● Reliability and Trust: ○ Refined answers enhance accuracy, building trust and reliability in model outputs. ● Bias and Fairness: ○ Verification questions in CoVe identify and mitigate potential biases in model output. ● User Interaction: ○ Verification process involves user interaction through verification questions.
  • 33. v H2O.ai Confidential Who are some CEOs of banks in the US?
  • 34. v H2O.ai Confidential What are some credit scoring bureaus in the US?
  • 35. v H2O.ai Confidential What are some agencies assigned to regulate and oversee financial institutions in the US?
  • 36. v H2O.ai Confidential Provide a list of major investment firms and financial institutions headquartered in the United States?
  • 37. v H2O.ai Confidential Benefits and Limitations of CoVe ● Benefits: ○ Enhanced Reliability: By incorporating verification steps, users can trust the accuracy of information obtained from LLMs. ○ Depth of Understanding: The refinement of answers allows users to gain a deeper understanding of the topic beyond the initial response. ○ Educational Value: Promotes responsible and informed use of LLMs, encouraging users to go beyond surface-level information. ● Limitations ○ Incomplete Removal of Hallucinations: CoVe does not completely eliminate hallucinations in generated content, which means it can still produce incorrect or misleading information. ○ Limited Scope of Hallucination Mitigation: CoVe primarily addresses hallucinations in the form of directly stated factual inaccuracies but may not effectively handle other forms of hallucinations, such as errors in reasoning or opinions. ○ Increased Computational Expense: Generating and executing verification alongside responses in CoVe adds to the computational cost, similar to other reasoning methods like Chain-of-Thought. ○ Upper Bound on Improvement: The effectiveness of CoVe is limited by the overall capabilities of the underlying language model, particularly in its ability to identify and rectify its own mistakes.
  • 38. v H2O.ai Confidential How to improve the CoVe pipeline ● Prompt engineering ● External tools ○ Final output highly depends on the answers of the verification questions. ○ For factual questions & answering you can use advanced search tools like google search or serp API etc. ○ For custom use cases you can always use RAG methods or other retrieval techniques for answering the verification questions. ● More chains ● Human in the loop
  • 39. H2O.ai Confidential Conclusions ● CoVe aim to improves model transparency, reliability, and trust. ● CoVe is not a silver bullet, but it can improve a LLM testing arsenal.
  • 40. H2O.ai Confidential Navdeep Gill Engineering Manager, AI Governance | Responsible AI navdeep.gill@h2o.ai Contact Thank you
  • 42. v H2O.ai Confidential Write a 1000 word essay in 1 minute LLMs are good at generating large amount of text that is consistent and logical. Are LLMs smarter than humans? Introduction Have LLMs manage your investment portfolio A model can give a generic advice on safe money management. But we don’t trust our life savings with a chat bot. Let a bot reply to your email It depends on how important the email is. May be we are more comfortable with the model automatically creating a draft.
  • 43. v H2O.ai Confidential Summarization Summarizing large documents without losing essential information. Extracting key-value pairs. How can we use LLMS while minimizing risk? Introduction Customer Service Answer FAQs from customers. May require retrieving from a knowledge base and summarizing. Report Generation - AutoDoc Create ML interpretation documents. Reports required for regulatory compliance.
  • 44. v H2O.ai Confidential Risk How risky are LLMs? A lawyer used ChatGPT to prepare a court filing. It went horribly awry. “While ChatGPT can be useful to professionals in numerous industries, including the legal profession, it has proved itself to be both limited and unreliable. In this case, the AI invented court cases that didn't exist, and asserted that they were real.” CBS News Chevy dealership’s AI chatbot suggests Ford F-150 when asked for best truck “As an AI, I don't have personal preferences but I can provide insights based on popular opinions and reviews. Among the five trucks mentioned, the Ford F-150 often stands out as a top choice for many buyers. It's known for its impressive towing …” Detroit Free Press
  • 45. v H2O.ai Confidential Data Fine Tuning RAG Foundation Model Leaderboard Risk Management Large & Diverse To train a foundation model, you need a large, diverse dataset that covers the tasks the model should be able to perform. LLM Lifecycle Supervised Fine Tuning Fine-tuning can improve a model's performance on a task while preserving its general language knowledge. h2oGPTe A powerful search assistant to answer questions from large volumes of documents, websites, and workplace content. Generative AI They are designed to produce a wide and general variety of outputs, such as text, image or audio generation. They can be standalone systems or can be used as a "base" for many other applications. HELM HELM is a framework for evaluating foundation models. Leaderboard shows how the various models perform across different groups of scenarios and different metrics. Eval Studio Design and execute task-specific benchmarks. Perform both manual and LLM based evaluations. Systematically collect and store results along with metadata.
  • 46. v H2O.ai Confidential MMLU (Massive Multitask Language Understanding) A test to measure a text model's multitask accuracy. The test covers 57 tasks including elementary mathematics, US history, computer science, law, and more. Evaluation for LLMs Popular benchmarks on open source leaderboards HellaSwag A test of common-sense inference, which is easy for humans (~95%) but challenging for SOTA models. A12 Reasoning Challenge (ARC) A set of grade-school science questions. Truthful QA A test to measure a model’s propensity to reproduce falsehoods commonly found online. When you drop a ball from rest it accelerates downward at 9.8 m/s². If you instead throw it downward assuming no air resistance its acceleration immediately after leaving your hand is (A) 9.8 m/s² (B) more than 9.8 m/s² (C) less than 9.8 m/s² (D) Cannot say unless the speed of throw is given. MMLU Example A woman is outside with a bucket and a dog. The dog is running around trying to avoid a bath. She… (A) Rinses the bucket off with soap and blow dry the dog’s head. (B) Uses a hose to keep it from getting soapy. (C) Gets the dog wet, then it runs away again. (D) Cannot say unless the speed of throw is given. HellaSwag Example
  • 47. v H2O.ai Confidential Hugging Face Open LLM Leaderboard It is a popular location to track various models evaluated using different metrics. These metrics include human baselines that provide us some idea of how these models have been drastically improved over the last two years. Approaching human baseline Popular benchmarks on open source leaderboards
  • 48. H2O.ai Confidential Benchmarks are not task specific Benchmarks on open-source leaderboards are well-rounded and diverse. They are not sufficient to reflect the performance of the model in a domain specific scenario. The Need for Evaluation Popular leaderboards are not enough Some Model Entries may cheat! There can be models on the leaderboard that are trained on the benchmark data itself. We do not have robust enough tests to detect this. Non-verifiable Results The procedure followed in conducting the tests and the results are not completely transparent and can also vary among different leaderboards.
  • 49. v H2O.ai Confidential Create task specific QA pairs along with the Reference documents. - Bank Teller - Loan officer - Program Manager - Data Analyst Custom Test Sets Create custom benchmarks for domain specific scenarios Task Specific Evals Create the QA pairs that test for agreement with your values, intentions, and preferences. - Correctness - Relevance - Similarity - Hallucination - Precision - Recall - Faithfulness Test for Alignment Test that all outputs meet your safety levels. - Toxicity - Bias - Offensive - PII of customers - Company Secrets Test for Safety Tests to confirm or show proof of meeting compliance standards. - Government - Company Test for Compliance
  • 50. v H2O.ai Confidential H2O Eval Studio Design and Execute task specific benchmarks All the Evaluators are included Eval studio contains evaluators to check for Alignment, Safety, and Compliance as discussed before. Create custom benchmarks Users can upload Documents and create custom Tests (Question-Answer pairs) based on the document collection. Run Evals and visualize results Once a benchmark has been designed, users can then run the evaluation against the benchmark and visualize the results. A detailed report can also be downloaded.
  • 51. H2O.ai Confidential Srinivas Neppalli Senior AI Engineer srinivas.neppalli@h2o.ai Contact Thank you!
  • 52. H2O.ai Confidential Lab 2 - Experimental Design of Gen AI Evaluations
  • 53. v H2O.ai Confidential Through the Lens of Model Risk Management One possible definition of “Conceptual Soundness” for LLMs by themselves might be considered as a combination of the following choices: (1)Training Data (1)Model Architecture (1)An explanation of why (1) and (2) were made (1)An explanation of why (1) and (2) are reasonable for the use case that the LLM will be applied to.
  • 54. v H2O.ai Confidential Through the Lens of Model Risk Management What about a RAG system? How does the concept of “Conceptual Soundness” get applied when not only choices surrounding training data and model architecture involved, but also choices around: - Embeddings - System Prompts (e.g. Personalities) - Chunk Sizes - Chunking Strategies - OCR Techniques - RAG-type (e.g. Hypothetical Document Embeddings) - Mixture-of-Experts or Ensembling
  • 55. H2O.ai Confidential Models / Systems / Agents are the fundamental AI systems under scrutiny. As opposed to traditional machine learning models, Generative AI include many choices beyond the models themselves
  • 56. H2O.ai Confidential Benchmarks / Tests are the sets of prompts and response that are used gauge how well an AI system can perform a certain task or use case.
  • 57. H2O.ai Confidential Evaluators are the mathematical functions used to evaluate various dimensions of performance.
  • 58. H2O.ai Confidential Documents are the data sets used for evaluation in the case of RAG systems, combining models, parsing, OCR, chunking, embeddings and other components of an evaluation.
  • 59. v H2O.ai Confidential What is the primary unit of analysis when evaluating an AI system or model? An eval can be defined as a series of tuples each of size 3. Each tuple consists of: (1)Context / Prompt / Question (1)Output / Response / Ground Truth Answer (1)Document (in the case of RAG) Source: https://www.jobtestprep.com/bank-teller-sample-questions Designing Your Own Eval
  • 60. v H2O.ai Confidential Problem statement: How well does my Bank Teller AI Application correctly answer questions related to being a Bank Teller? Create an eval test case that can be used to evaluate how well BankTellerGPT can answer questions related to being a Bank Teller. LLM-only Example Test Case { Prompt: Respond to the following questions with single letter answer. Question: A specific bank branch serves 256 clients on average every day. The ratio between tellers and clients is 1:32, so that every teller serves 32 people on average every day. The management wishes to change this ratio to 1:20. How many new tellers should be hired? A. 4 B. 5 C. 9 D. 12, Response: B. 5, Document: None } Designing Your Own Eval - BankTellerGPT Source: https://www.jobtestprep.com/bank-teller- sample-questions
  • 61. v H2O.ai Confidential Designing Your Own Eval - BankTellerGPT Problem statement: How well does my Bank Teller AI Application actually answer questions related to being a Bank Teller? Create an eval test case that can be used to evaluate how well BankTellerGPT can answer questions related to being a Bank Teller. RAG Example Test Case { Prompt: Respond to the following questions with single letter answer. Question: A specific bank branch serves 256 clients on average every day. The ratio between tellers and clients is 1:32, so that every teller serves 32 people on average every day. The management wishes to change this ratio to 1:20. How many new tellers should be hired? A. 4 B. 5 C. 9 D. 12, Response: B. 5, Document: “Internal Bank Teller Knowledge Base” } Source: https://www.jobtestprep.com/bank-teller- sample-questions
  • 62. v H2O.ai Confidential Designing Your Own Eval Task # 1: Create your own GenAI Test Benchmark for the SR 11-7 document Some possible test cases Prompt: How should banks approach model development? Response: Banks should approach model development with a focus on sound risk management practices. They should ensure that models are developed and used in a controlled environment, with proper documentation, testing, and validation. Prompt: How can model risk be reduced? Response: Model risk can be reduced by establishing limits on model use, monitoring model performance, adjusting or revising models over time, and supplementing model results with other analysis and information. Prompt: How often should a bank update its model inventory? Response: A bank should update its model inventory regularly to ensure that it remains current and accurate.
  • 63. v H2O.ai Confidential Designing Your Own Eval - BankTellerGPT Task # 2: Create and launch LLM-only eval leaderboard To complete this, you will need to 1. Pick an evaluator (e.g. Token presence) 1. Pick a connection (e.g. Enterprise h2oGPT - LLM Only) 1. Pick a set of eval tests (e.g. Bank Teller Benchmark)
  • 64. v H2O.ai Confidential Designing Your Own Eval - SR 11-7 Task # 3: Create a new evaluator based on RAG and launch leaderboard To complete this, you will need to 1. Pick an evaluator (e.g. Answer correctness) 1. Pick a connection (e.g. Enterprise h2oGPT-RAG) 1. Pick your test created in step 1
  • 65. v H2O.ai Confidential Evaluators Evaluator RAG LLM Purpose Method PII (privacy) Yes Yes Assess whether the answer contains personally identifiable information (PII) like credit card numbers, phone numbers, social security numbers, street addresses, email addresses and employee names. Regex suite which quickly and reliably detects formatted PII - credit card numbers, social security numbers (SSN) and emails. Sensitive data (security) Yes Yes Assess whether the answer contains security-related information like activation keys, passwords, API keys, tokens or certificates. Regex suite which quickly and reliably detects formatted sensitive data - certificates (SSL/TLS certs in PEM format), API keys (H2O.ai and OpenAI), activation keys (Windows). Answer Correctness Yes Yes Assess whether the answer is correct given the expected answer (ground truth). A score based on combined and weighted semantic and factual similarity between the answer and ground truth (see Answer Semantic Similarity and Faithfulness below). Answer Relevance Yes Yes Assess whether the answer is (in)complete and does not contain redundant information which was not asked - noise. A score based on the cosine similarity of the question and generated questions, where generated questions are created by prompting an LLM to generate questions from the actual answer. Answer Similarity Yes Yes Assess semantic similarity of the answer and expected answer. A score based on similarity metric value of the actual and expected answer calculated by a cross-encoder model (NLP). Context Precision Yes No Assess the quality of the retrieved context considering order and relevance of the text chunks on the context stack. A score based on the presence of the expected answer - ground truth - in the text chunks at the top of the retrieved context chunk stack - relevant chunks deep in the stack, irrelevant chunks and unnecessarily big context make the score lower. Context Recall Yes No Assess how much of the ground truth is represented in the retrieved context. A score based on the ratio of the number of sentences in the ground truth that can be attributed to the context to the total number of sentences in the ground truth. Context Relevance Yes No Assess whether the context is (in)complete and does not contain redundant information which is not needed - noise. A score based on the ratio of context sentences which are needed to generate the answer to the total number of sentences in the retrieved context. H2O EvalStudio evaluators overview TERMINOLOGY: answer ~ actual RAG/LLM answer / expected answer ~ expected RAG/LLM answer i.e. ground truth | retrieved context ~ text chunks retrieved from the vector DB prior LLM answer generation in RAG.
  • 66. v H2O.ai Confidential Evaluators (continued) Evaluator RAG LLM Purpose Method Faithfulness Yes No Assess whether answer claims can be inferred from the context i.e. factual consistency of the answer given the context. (hallucinations) A score which is based on the ratio of the answer’s claims which present in the context to the total number of answer’s claims. Hallucination Metric Yes No Asses the RAG’s base LLM model hallucination. A score based on the Vectara hallucination evaluation cross-encoder model which assesses RAG’s base LLM hallucination when it generates the actual answer from the retrieved context. RAGAs Yes No Assess overall answer quality considering both context and answer. Composite metrics score which is harmonic mean of Faithfulness, Answer Relevancy, Context Precision and Context Recall metrics. Tokens Presence Yes Yes Assesses whether both retrieved context and answer contain required string tokens. Scored based on the substring and/or regular expression based search of the required set of strings in the retrieved context and answer. Faithfulness Yes No Assess whether answer claims can be inferred from the context i.e. factual consistency of the answer given the context. (hallucinations) A score which is based on the ratio of the answer’s claims which present in the context to the total number of answer’s claims. H2O EvalStudio evaluators overview TERMINOLOGY: answer ~ actual RAG/LLM answer / expected answer ~ expected RAG/LLM answer i.e. ground truth | retrieved context ~ text chunks retrieved from the vector DB prior LLM answer generation in RAG.
  • 68. v H2O.ai Confidential ● LLM Guardrails are a set of predefined constraints and guidelines that are applied to LLMs to manage their behavior. ● Guardrails serve to ensure responsible, ethical, and safe usage of LLMs, mitigate potential risks, and promote transparency and accountability. ● Guardrails are a form of proactive control and oversight over the output and behavior of language models, which are otherwise capable of generating diverse content, including text that may be biased, inappropriate, or harmful. Understanding the distinct functions of each type of guardrail is pivotal in creating a comprehensive and effective strategy for governing AI systems. Guardrails
  • 69. v H2O.ai Confidential ● Content Filter Guardrails: Content filtering is crucial to prevent harmful, offensive, or inappropriate content from being generated by LLMs. These guardrails help ensure that the outputs conform to community guidelines, curbing hate speech, explicit content, and misinformation. ● Bias Mitigation Guardrails: Bias is an ongoing concern in AI, and mitigating bias is critical. These guardrails aim to reduce the model's inclination to produce content that perpetuates stereotypes or discriminates against particular groups. They work to promote fairness and inclusivity in the model's responses. ● Safety and Privacy Guardrails: Protecting user privacy is paramount. Safety and privacy guardrails are designed to prevent the generation of content that may infringe on user privacy or include sensitive, personal information. These measures safeguard users against unintended data exposure. Types of Guardrails
  • 70. v H2O.ai Confidential Types of Guardrails ● Fact-Checking & Hallucination Guardrails: To combat misinformation, fact-checking guardrails are used to verify the accuracy of the information generated by LLMs. They help ensure that the model's responses align with factual accuracy, especially in contexts like news reporting or educational content. ● Context/Topic and User Intent Guardrails: For LLMs to be effective, they must produce responses that are contextually relevant and aligned with user intent. These guardrails aim to prevent instances where the model generates content that is unrelated or fails to address the user's queries effectively. ● Explainability and Transparency Guardrails: In the pursuit of making LLMs more interpretable, these guardrails require the model to provide explanations for its responses. This promotes transparency by helping users understand why a particular output was generated, fostering trust and accountability. ● Jailbreak Guardrails: Ensure robustness to malicious user attacks such as prompt injection.
  • 71. H2O.ai Confidential Lab 3 - Hacking and Security Posture
  • 73. H2O.ai Confidential Ashrith Barthur Principal Data Scientist ashrith.barthur@h2o.ai Contact Thank you!
  • 76. H2O.ai Confidential Lab 4 - Complaint Summarizer
  • 78. H2O.ai Confidential Determine what is the one credit product with the highest number of complaints. Task 1 Complaint Summarizer Applied Generative AI for Banking
  • 79. H2O.ai Confidential Determine what is the one credit product with the highest number of complaints. Answer: Credit Reporting Task 1 Complaint Summarizer Applied Generative AI for Banking
  • 80. H2O.ai Confidential Determine what is the top complaint for TransUnion? Task 2 Complaint Summarizer Applied Generative AI for Banking
  • 81. H2O.ai Confidential Determine what is the top complaint for TransUnion? Answer: Violation of Consumers Rights to Privacy and Confidentiality Under the Fair Credit Reporting Act. Task 2 Complaint Summarizer Applied Generative AI for Banking
  • 82. H2O.ai Confidential Use H2OGPT to summarize a complaint from the database and provided immediate next steps Task 3 Complaint Summarizer Applied Generative AI for Banking
  • 83. H2O.ai Confidential Use H2OGPT to summarize a complaint from the database and provided immediate next steps Answer: [See screenshot] Task 3 Complaint Summarizer Applied Generative AI for Banking
  • 86. H2O.ai Confidential Contact Thank you! MAKERS Jonathan Farland Director of Solution Engineering jon.farland@h2o.ai www.linkedin.com/in/jonfarland/
  • 88. H2O.ai Confidential Retrieval-Augmented Generation (RAG) RAG as a system is a particularly good use of vector databases. RAG systems take advantage the context window for LLMs, filling it with only the most relevant examples from real data. This “grounds” the LLM to relevant context and greatly minimizes any hallucination.
  • 89. Embedding Models Source: https://huggingface.co/blog/1b-sentence-embeddings
  • 90. Embedding Models - INSTRUCTOR Source: https://arxiv.org/pdf/2212.09741.pdf Instruction-based Omnifarious Representations Model is trained to generate embeddings using both the instruction as well as the textual input. Applicable to virtually every use case, due to its ability to create latent vector representations that include instruction.
  • 91. Embedding Models - BGE Source: https://arxiv.org/pdf/2310.07554.pdf LLM-Embedder This embedding model is trained specifically for use with RAG systems. Reward model introduced that provides higher rewards to a retrieval candidate if it results in a higher generation likelihood for the expected output Uses contrastive learning to directly get at optimizing for RAG applications
  • 92. H2O.ai Confidential AI Engines Deployment Consumption LLM AppStudio LLM DataStudio LLM EvalStudio H2O LLMs Ecosystem AppStore End Users Generative AI with H2O.ai MLOps AI Engine Manager Doc-QA Enterprise h2oGPT
  • 93. H2O.ai Confidential Explanations in Highlights the most important regions of an image Highlights the most important words
  • 95. H2O.ai Confidential LLM Context Length Testing Source: https://github.com/gkamradt/LLMTest_NeedleInAHaystack?tab=read me-ov-file
  • 96. H2O.ai Confidential LLM Context Length Testing Source: https://github.com/gkamradt/LLMTest_NeedleInAHaystack?tab=read me-ov-file
  • 97. H2O.ai Confidential Ethical Considerations, Data Privacy, and User Consent Assess the potential impact of generative AI on individuals and society. Give users control over how their data is used by generative AI. Consent mechanisms should be transparent and user- friendly. Monitoring, Regulation, and Security Detect misuse or anomalies in generative AI behavior. Regulatory compliance ensures adherence to ethical and legal guidelines. Security measures are crucial to protect AI models from adversarial attacks or unauthorized access. Accountability and Oversight Define roles and responsibilities for AI development and deployment. Oversight mechanisms ensure that responsible practices are followed. Education and Awareness Users and developers should be informed about generative AI capabilities, limitations, and ethical considerations. Stakeholder Involvement Involving various stakeholders in AI discussions promotes diverse perspectives and responsible decision-making. Continuous Evaluation and Improvement Continually assess models to ensure fairness, accuracy, and alignment with ethical standards. Transparency, Explainability, Bias Mitigation,Debugging, and Guardrails Recognize and mitigate both subtle and glaring biases that may emerge from training data. Ensures that users can understand and trust the decisions made by generative AI models. Debug models with techniques such as adversarial prompt engineering. Proactively manage risks and maintain control over the model's behavior with guardrails. Responsible Generative AI Responsible Generative AI Audit Input Data, Benchmarks, and Test the Unknown Assess quality of data used as input to train Generative AI models. Utilize benchmarks and random attacks for testing.
  • 98. v H2O.ai Confidential CoVe method reduces factual errors in large language models by drafting, fact-checking, and verifying responses - it deliberates on its own responses and self-correcting them Steps: 1. Given a user query, a LLM generates a baseline response that may contain inaccuracies, e.g. factual hallucinations 2. To improve this, CoVe first generates a plan of a set of verification questions to ask, and then executes that plan by answering them and hence checking for agreement 3. Individual verification questions are typically answered with higher accuracy than the original accuracy of the facts in the original longform generation 4. Finally, the revised response takes into account the verifications 5. The factored version of CoVe answers verification questions such that they cannot condition on the original response, avoiding repetition and improving performance Minimizing Model Hallucinations Using Chain-of-Verification (CoVe) Method
  • 99. v H2O.ai Confidential Minimizing Model Hallucinations Using Chain-of-Verification (CoVe) Method