Here are some key points about benchmarking and evaluating generative AI models like large language models:
- Foundation models require large, diverse datasets to be trained on in order to learn broad language skills and knowledge. Fine-tuning can then improve performance on specific tasks.
- Popular benchmarks evaluate models on tasks involving things like commonsense reasoning, mathematics, science questions, generating truthful vs false responses, and more. This helps identify model capabilities and limitations.
- Custom benchmarks can also be designed using tools like Eval Studio to systematically test models on specific applications or scenarios. Both automated and human evaluations are important.
- Leaderboards like HELM aggregate benchmark results to compare how different models perform across a wide range of tests and metrics.
This document discusses AI and ChatGPT. It begins with an introduction to David Cieslak and his company RKL eSolutions, which provides ERP sales and consulting. It then defines key AI concepts such as artificial intelligence, generative AI, large language models, and ChatGPT. The document describes OpenAI's ChatGPT tool and how it works, covering prompts, commands, and the potential uses and impacts of generative AI technologies. Finally, it discusses concerns regarding generative AI and the Future of Life Institute's call for more oversight of advanced AI.
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) combines semantic search with LLM-based text generation. When a person makes a query in natural language, the query is compared to the entries in the knowledge base, and the most relevant results are returned to the LLM, which uses this extra information to generate a more accurate and reliable response. RAG can therefore limit hallucination and ground answers in reliable sources. In this talk, we will present the concept of RAG and the underlying concept of semantic search, and survey the available libraries and vector databases.
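The retrieval step described above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: a word-overlap score stands in for a real embedding model, and a tiny in-memory list stands in for a vector database.

```python
# Minimal sketch of RAG retrieval: embed the query, rank the
# knowledge base by similarity, and build a grounded prompt.
from collections import Counter
import math

knowledge_base = [
    "RAG combines semantic search with LLM text generation.",
    "Vector databases store document embeddings for retrieval.",
    "Fine-tuning adapts a model to a specific task.",
]

def embed(text):
    # Stand-in for a sentence-embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(knowledge_base, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query):
    # The retrieved context is prepended so the LLM answers from it.
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("How does RAG use semantic search?")
```

In a real system, `embed` would call a sentence-transformer and `retrieve` would query a vector database such as Weaviate or ChromaDB; the overall flow is the same.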
This is a gentle introduction to Natural Language Generation (NLG) using deep learning, aimed at computer science practitioners with basic knowledge of machine learning. It takes you on a journey from the basic intuitions behind modeling language and the probabilities of sequences, through recurrent neural networks, to the large Transformer models you have seen in the news, like GPT-2 and GPT-3. The tutorial wraps up with a summary of the ethical implications of training such large language models on uncurated text from the internet.
Unlocking the Power of Generative AI An Executive's Guide.pdf
Generative AI is here, and it can revolutionize your business. With its powerful capabilities, this technology can help companies create more efficient processes, unlock new insights from data, and drive innovation. But how do you make the most of these opportunities?
This guide will provide you with the information and resources needed to understand the ins and outs of Generative AI, so you can make informed decisions and capitalize on the potential. It covers important topics such as strategies for leveraging large language models, optimizing MLOps processes, and best practices for building with Generative AI.
Exploring Opportunities in the Generative AI Value Chain.pdf
The article "Exploring Opportunities in the Generative AI Value Chain" by McKinsey & Company's QuantumBlack provides insights into the value created by generative artificial intelligence (AI) and its potential applications.
What are the "use case patterns" for deploying LLMs into production? Understanding these will allow you to spot "LLM-shaped" problems in your own industry.
Build an LLM-powered application using LangChain.pdf
LangChain is an advanced framework that allows developers to create language model-powered applications. It provides a set of tools, components, and interfaces that make building LLM-based applications easier. With LangChain, managing interactions with language models, chaining together various components, and integrating resources like APIs and databases is a breeze. The platform includes a set of APIs that can be integrated into applications, allowing developers to add language processing capabilities without having to start from scratch.
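The chaining pattern described above can be illustrated without installing anything. The classes below are simplified stand-ins, not LangChain's actual API: a real application would use LangChain's own prompt-template and chat-model classes in place of these.

```python
# Sketch of the "chain" pattern: a prompt template is filled in,
# then piped to a language model. The LLM here is a stub so the
# example runs without an API key.
class PromptTemplate:
    def __init__(self, template):
        self.template = template

    def format(self, **kwargs):
        return self.template.format(**kwargs)

def stub_llm(prompt):
    # Placeholder for a call to a hosted language model.
    return f"[model answer to: {prompt}]"

class Chain:
    def __init__(self, template, llm):
        self.template, self.llm = template, llm

    def run(self, **kwargs):
        # Chaining: template output becomes the model input.
        return self.llm(self.template.format(**kwargs))

chain = Chain(PromptTemplate("Summarize {topic} in one sentence."), stub_llm)
result = chain.run(topic="vector databases")
```

The value of the framework is that these components compose: the output of one chain can feed the template of the next, and resources like APIs or databases can be slotted in as additional steps.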
The Future of AI is Generative not Discriminative 5/26/2021
The deep learning AI revolution has been sweeping the world for a decade now. Deep neural nets are routinely used for tasks like translation, fraud detection, and image classification. PwC estimates that they will create $15.7 trillion/year of value by 2030. But most current networks are "discriminative" in that they directly map inputs to predictions. This type of model requires lots of training examples, doesn't generalize well outside of its training set, creates inscrutable representations, is subject to adversarial examples, and makes knowledge transfer difficult. People, in contrast, can learn from just a few examples, generalize far beyond their experience, and can easily transfer and reuse knowledge. In recent years, new kinds of "generative" AI models have begun to exhibit these desirable human characteristics. They represent the causal generative processes by which the data is created and can be compositional, compact, and directly interpretable. Generative AI systems that assist people can model their needs and desires and interact with empathy. Their adaptability to changing circumstances will likely be required by rapidly changing AI-driven business and social systems. Generative AI will be the engine of future AI innovation.
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Mihai is the Principal Architect for Platform Engineering and Technology Solutions at IBM, responsible for Cloud Native and AI Solutions. He is a Red Hat Certified Architect, CKA/CKS, a leader in the IBM Open Innovation community, and advocate for open source development. Mihai is driving the development of Retrieval Augmentation Generation platforms, and solutions for Generative AI at IBM that leverage WatsonX, Vector databases, LangChain, HuggingFace and open source AI models.
Mihai will share lessons learned building Retrieval Augmented Generation, or “Chat with Documents” platforms and APIs that scale, and deploy on Kubernetes. His talk will cover use cases for Generative AI, limitations of Large Language Models, use of RAG, Vector Databases and Fine Tuning to overcome model limitations and build solutions that connect to your data and provide content grounding, limit hallucinations and form the basis of explainable AI. In terms of technology, he will cover LLAMA2, HuggingFace TGIS, SentenceTransformers embedding models using Python, LangChain, and Weaviate and ChromaDB vector databases. He’ll also share tips on writing code using LLM, including building an agent for Ansible and containers.
Scaling factors for Large Language Model Architectures:
• Vector Database: consider sharding and High Availability
• Fine Tuning: collecting data to be used for fine-tuning
• Governance and Model Benchmarking: how are you testing your model performance over time, with different prompts, one-shot, and various parameters
• Chain of Reasoning and Agents
• Caching embeddings and responses
• Personalization and Conversational Memory Database
• Streaming Responses and optimizing performance. A fine-tuned 13B model may perform better than a poor 70B one!
• Calling 3rd-party functions or APIs for reasoning or other types of data (e.g., LLMs are terrible at reasoning and prediction; consider calling other models)
• Fallback techniques: fall back to a different model, or default answers
• API scaling techniques, rate limiting, etc.
• Async, streaming and parallelization, multiprocessing, GPU acceleration (including embeddings), generating your API using OpenAPI, etc.
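As one concrete example, the fallback technique in the list above can be sketched like this; the model functions are hypothetical stand-ins for real API calls.

```python
# Fallback sketch: try each model in order, and return a default
# answer if every model fails.
def primary_model(prompt):
    # Stand-in for a large hosted model that happens to be down.
    raise TimeoutError("primary model unavailable")

def secondary_model(prompt):
    # Stand-in for a smaller, self-hosted backup model.
    return f"secondary: {prompt}"

def answer(prompt, models, default="Sorry, please try again later."):
    for model in models:
        try:
            return model(prompt)
        except Exception:
            continue  # fall through to the next model
    return default

reply = answer("What is RAG?", [primary_model, secondary_model])
```

The same pattern extends naturally to rate limiting and caching: a cache lookup is just the first "model" in the list.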
In this event we will cover:
- What is Generative AI and how it is being used for the future of work.
- Best practices for developing and deploying generative AI-based models in production.
- The future of Generative AI: how generative AI is expected to evolve in the coming years.
The document discusses advances in large language models from GPT-1 to the potential capabilities of GPT-4, including its ability to simulate human behavior, demonstrate sparks of artificial general intelligence, and generate virtual identities. It also provides tips on how to effectively prompt ChatGPT through techniques like prompt engineering, giving context and examples, and different response formats.
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
👉This first session will cover an introduction to Generative AI & harnessing the power of large language models. The following topics will be discussed:
Introduction to Generative AI & harnessing the power of large language models.
What’s generative AI & what’s LLM.
How are we using it in our document understanding & communication mining models?
How to develop a trustworthy and unbiased AI model using LLM & GenAI.
Personal Intelligent Assistant
Speakers:
📌George Roth - AI Evangelist at UiPath
📌Sharon Palawandram - Senior Machine Learning Consultant @ Ashling Partners & UiPath MVP
📌Russel Alfeche - Technology Leader RPA @qBotica & UiPath MVP
Generative AI - The New Reality: How Key Players Are Progressing
The document discusses key players in generative AI and their progress. It provides an overview of generative AI including its evolution since 1950, where the spending is focused, how the technology works, and deployment models. It then profiles several major companies leading advancements in generative AI, including their strategies, growth areas, and risks. These companies are TSMC, Nvidia, Microsoft, Google, Amazon, Tesla, Oracle, Salesforce, SAP, and Palo Alto Networks.
The document discusses different methods for customizing large language models (LLMs) with proprietary or private data, including training a custom model, fine-tuning a general model, and prompting with expanded inputs. Fine-tuning techniques like low-rank adaptation and supervised fine-tuning allow emphasizing custom knowledge without full retraining. Prompt expansion using techniques like retrieval augmented generation can provide additional context beyond the character limit.
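The parameter savings behind low-rank adaptation are easy to show with arithmetic. Instead of updating a full d x d weight matrix W, LoRA trains two small factors B (d x r) and A (r x d) and uses W + BA as the effective weight; the dimensions below are illustrative.

```python
# Toy illustration of why low-rank adaptation (LoRA) is cheap to
# train: only two small factors are updated, not the full matrix.
d, r = 512, 8                        # hidden size, adapter rank

full_update_params = d * d           # parameters in a full update
lora_params = d * r + r * d          # parameters in B (d x r) + A (r x d)

savings = full_update_params / lora_params
```

At rank 8 and hidden size 512 this is a 32x reduction per weight matrix, which is why fine-tuning techniques like LoRA can emphasize custom knowledge without full retraining.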
In this session, you'll get all the answers about how ChatGPT and other GPT-X models can be applied to your current or future project. First, we'll put all the terms in order – OpenAI, GPT-3, ChatGPT, Codex, Dall-E, etc. – and explain why Microsoft and Azure are often mentioned in this context. Then, we'll go through the main capabilities of Azure OpenAI and the respective use cases that might inspire you to either optimize your product or build a completely new one.
Using the power of OpenAI with your own data: what's possible and how to start?
This document provides an overview of a talk by Maxim Salnikov and Jon Jahren at Oslo Spektrum from November 7-9. It discusses using OpenAI with your own data and how to get started. Examples of enterprise use cases for generative AI are presented, such as chatbots, document indexing, and financial analysis. Tools for prompt engineering like LangChain and Semantic Kernel are introduced. Best practices for fine-tuning models on proprietary data are covered, including data formatting, training data size, and an iterative tuning process. Responsible AI techniques like grounding responses and maintaining a positive tone are also discussed.
Reviewing progress in the machine learning certification journey
Special Addition - Short tech talk on How to Network by Qingyue (Annie) Wang
Content review on AI and ML on Google Cloud by Margaret Maynard-Reid
A focused content review on ML problem framing, model evaluation, and fairness by Sowndarya Venkateswaran.
A discussion on sample questions to aid certification exam preparation.
An interactive Q&A session to clarify doubts and questions.
Previewing next steps and topics, including course completions and material reviews.
Building Generative AI-infused apps: what's possible and how to start
In this session, we'll explore different scenarios where the features of Generative AI can provide added value to an IT solution. We'll also learn how to begin developing your own application powered by AI. Using Azure OpenAI service as an illustration, we'll examine the various APIs it offers, review the best practices of Prompt Engineering, explore different ways to incorporate your own data into the process, and take a glance at several tools and resources that make the developer experience more seamless.
Formal Versus Agile: Survival of the Fittest? (Paul Boca)
The potential for combining agile and formal methods holds promise. Although it might not always be an easy partnership, it will succeed if it can foster a fruitful interchange of expertise between the two communities. In this talk I explain how formal methods can complement agile practices and vice versa. There are no pre-requisites for this talk, except an open mind and a desire to make software development more reliable. Leave any pre-conceptions at home, and be prepared for myths to be dispelled.
2017 10-10 (netflix ml platform meetup) learning item and user representation...
1) Learning user and item representations is challenging due to sparse data and shifting preferences in recommender systems.
2) The presentation outlines research at Google to address sparsity through two approaches: focused learning, which develops specialized models for subsets of data like genres or cold-start items, and factorized deep retrieval, which jointly embeds items and their features to predict preferences for fresh items.
3) The techniques have improved overall viewership and nomination of candidates, demonstrating their effectiveness in production recommender systems.
The document describes a problem prediction model that uses artificial intelligence algorithms to evaluate changes made by an IT company and anticipate potential problems. The model analyzed 194 known problems, 2,400 past changes, and 201 predicted future changes. As a result, the model identified one change from October 29, 2019 that was likely to cause a problem. A team is investigating this potential issue. The document concludes that the naive Bayes classifier model is an important tool for change analysis and problem prediction.
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Sandeep Singh, Head of Applied AI Computer Vision, Beans.ai
H2O Open Source GenAI World SF 2023
In the modern era of machine learning, leveraging both open-source and closed-source solutions has become paramount for achieving cutting-edge results. This talk delves into the intricacies of seamlessly integrating open-source Large Language Model (LLM) solutions like Vicuna, Falcon, and Llama with industry giants such as ChatGPT and Google's Palm. As the demand for fine-tuned and specialized datasets grows, it is imperative to understand the synergy between these tools. Attendees will gain insights into best practices for building and enriching datasets tailored for fine-tuning tasks, ensuring that their LLM projects are both robust and efficient. Through real-world examples and hands-on demonstrations, this talk will equip attendees with the knowledge to harness the power of both open and closed-source tools in a coherent and effective manner.
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
This talk was recorded in London on Oct 30, 2018 and can be viewed here: https://youtu.be/p4iAnxwC_Eg
The good news is building fair, accountable, and transparent machine learning systems is possible. The bad news is it’s harder than many blogs and software package docs would have you believe. The truth is nearly all interpretable machine learning techniques generate approximate explanations, that the fields of eXplainable AI (XAI) and Fairness, Accountability, and Transparency in Machine Learning (FAT/ML) are very new, and that few best practices have been widely agreed upon. This combination can lead to some ugly outcomes!
This talk aims to make your interpretable machine learning project a success by describing fundamental technical challenges you will face in building an interpretable machine learning system, defining the real-world value proposition of approximate explanations for exact models, and then outlining viable techniques for debugging, explaining, and testing machine learning models.
Mateusz is a software developer who loves all things distributed and machine learning, and hates buzzwords. His favourite hobby is data juggling.
He obtained his M.Sc. in Computer Science from AGH UST in Krakow, Poland, during which he did an exchange at ECE Paris in France and worked on distributed flight booking systems. After graduation he moved to Tokyo to work as a researcher at Fujitsu Laboratories on machine learning and NLP projects, and he is still based there today.
Webcast Presentation: Accelerate Continuous Delivery with Development Testing...
With organizations under intense pressure to get products out to market quickly, they can’t afford to operate within operational silos. Yet communicating and collaborating across the organizational boundaries of QA and development can be difficult. Development is typically a black box to QA teams. QA has no visibility into the quality and security of the code until late in the lifecycle.
Watch this recorded webcast to learn how to break down the barriers and improve visibility and transparency by integrating development testing results into the IBM Rational Team Concert and providing QA and development with a unified workflow for ensuring code quality. Explore different development testing techniques and the types of defects and security vulnerabilities they can find.
About the Presenter:
James Croall, Director of Product Management, Coverity
Over the last 8 years, James Croall has helped a wide range of customers incorporate static analysis into their software development lifecycle. Prior to Coverity, Mr. Croall spent 10 years in the computer and network security industry as a C/C++ and Java software engineer.
1) Generative AI (GenAI) enables the creation of novel content by learning patterns in unstructured data, rather than mapping inputs to labeled outputs as traditional AI does.
2) Both traditional and generative AI models lack transparency and may contain biases, but generative models can additionally hallucinate or leak private information.
3) To interpret generative models, researchers evaluate accuracy globally by checking for hallucinations or undesirable content, and locally by confirming the quality of individual responses.
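A toy version of such a local check might flag a response whose content words barely overlap the source document. This is a crude stand-in for real hallucination detectors, which typically use entailment models or LLM judges, but it shows the shape of the check.

```python
# Toy grounding check: a response is "grounded" if enough of its
# content words (length > 3) appear in the source text.
def grounded(response, source, min_overlap=0.5):
    resp_words = {w for w in response.lower().split() if len(w) > 3}
    src_words = set(source.lower().split())
    if not resp_words:
        return True  # nothing substantive to verify
    overlap = len(resp_words & src_words) / len(resp_words)
    return overlap >= min_overlap
```

A global evaluation would run a check like this over a whole benchmark set and report the fraction of responses flagged.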
The document discusses Agile software development methods and provides evidence that Agile approaches are effective. It defines Agile development as iterative and incremental with close collaboration. Case studies show organizations achieving better results with Agile, including increased productivity, quality, and customer satisfaction. Adopting Agile practices like Scrum and test-driven development enables organizations to adapt to changing priorities and deliver working software more frequently.
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
Serverless Toronto's 6th-anniversary event helps IT pros understand and prepare for the #GenAI tsunami ahead. You'll gain situational awareness of the LLM Landscape, receive condensed insights, and actionable advice about RAG in 2024 from Google AI Lead Mark Ryan and LlamaIndex creator Jerry Liu. We chose #RAG (Retrieval-Augmented Generation) because it is the predominant paradigm for building #LLM (Large Language Model) applications in enterprises today - and that's where the jobs will be shifting. Here is the recording: https://youtu.be/P5xd1ZjD-Os?si=iq8xibj5pJsJ62oW
Reliability, safety, and trustworthiness are key factors to consider for Human-Centered AI. Established Guidelines for Human-AI Interaction should be taken into account during evaluation to ensure that such RST (reliable, safe, trustworthy) systems overcome autonomy problems.
Scaling & Managing Production Deployments with H2O ModelOps
This presentation was made on June 30th, 2020.
Recording of the presentation is available here: https://youtu.be/9LajqAL_CU8
As enterprises “make their own AI”, a new set of challenges emerges. Maintaining reproducibility, traceability, and verifiability of machine learning models, as well as recording experiments, tracking insights, and reproducing results, is key. Collaboration between teams is also necessary as “model factories” are created for enterprise-wide data science efforts. Additionally, monitoring of models ensures that drift or performance degradation is addressed with either retraining or model updates. Finally, data and model lineage is necessary in case of rollbacks or to address regulatory compliance.
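A minimal sketch of the drift monitoring mentioned above, assuming a single numeric feature and a simple mean-shift test; production systems typically use statistics such as PSI or Kolmogorov-Smirnov tests instead.

```python
# Flag a feature for retraining when its recent mean drifts too far
# from the training-time baseline (threshold is a relative shift).
def drift_flag(baseline, recent, threshold=0.2):
    base_mean = sum(baseline) / len(baseline)
    recent_mean = sum(recent) / len(recent)
    # Guard against a zero baseline when normalizing the shift.
    shift = abs(recent_mean - base_mean) / (abs(base_mean) or 1.0)
    return shift > threshold

needs_retraining = drift_flag([1.0] * 10, [1.5] * 10)
```

In a ModelOps platform this check would run on a schedule per feature, with flagged models routed to retraining or rollback workflows.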
H2O ModelOps delivers centralized catalog and management, deployment, monitoring, collaboration, and administration of machine learning models. In this webinar, we learn how H2O can assist with operationalizing, scaling and managing production deployments.
Speaker's Bio:
Felix is part of the Customer Success team in Asia Pacific at H2O.ai. An engineer and an IIM alumnus, Felix has held prominent positions in the data science industry.
A whirlwind tour of Glasswall Solution’s use of Wardley Maps and experiments with a Service-based operating model. Delivered at Open Security Summit Dec 7th, 2020 as context for a panel discussion, which you can watch here:
https://www.youtube.com/watch?v=GS8Vndr-B4A
The original 100-slide deck is available here:
https://open-security-summit.org/tracks/2020/mini-summits/dec/wardley-maps/wardley-maps-and-services-model-at-glasswall/
This presentation dives into the practical applications of machine learning within Google's operations, providing a comprehensive overview of how to leverage AI technologies to solve real-world business challenges.
Key Points Covered:
- Introduction to Machine Learning at Google: Discussion on the role of ML and its evolution in enhancing Google's operational efficiency.
- Experience Sharing: Insights into the team's long-term engagement with machine learning projects and the impacts on Google’s operational strategies.
- Practical Applications: Real-world examples of ML applications within Google’s daily operations, providing a blueprint to adapt similar strategies.
- Challenges and Solutions: Discussion on the challenges faced during the implementation of ML projects and the strategic solutions employed to overcome them.
- Future of ML at Google: Insights into future trends in machine learning at Google and how they plan to continue integrating AI into their ecosystem.
The document discusses various methods for automated testing of DITA content and output, including using Schematron for validating content structure, the QA plugin for identifying tagging errors, XMLUnit for comparing XML, and the DITA OT regression test for validating the output of the open-source DITA Open Toolkit. It also covers automating browser tests using Selenium and comparing HTML output using Needle and Nose. Demo examples are provided for several of these automated testing tools and techniques.
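The XML-comparison idea (XMLUnit in the document) can be approximated with Python's standard library; this sketch normalizes tags, attributes, and text before comparing, so attribute order and quoting differences are ignored.

```python
# Compare two XML documents structurally, ignoring attribute order.
import xml.etree.ElementTree as ET

def xml_equal(a, b):
    ea, eb = ET.fromstring(a), ET.fromstring(b)

    def norm(e):
        # Normalize an element into a comparable tuple, recursively.
        return (e.tag, sorted(e.attrib.items()),
                (e.text or "").strip(), [norm(c) for c in e])

    return norm(ea) == norm(eb)
```

A regression test for transformed output would then assert `xml_equal(expected, actual)` for each sample document, in the spirit of the DITA OT regression tests described above.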
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
This document provides an overview of H2O.ai, an AI company that offers products and services to democratize AI. It mentions that H2O products are backed by 10% of the world's top data scientists from Kaggle and that H2O has customers in 7 of the top 10 banks, 4 of the top 10 insurance companies, and top manufacturing companies. It also provides details on H2O's founders, funding, customers, products, and vision to make AI accessible to more organizations.
The document discusses LLMOps (Large Language Model Operations) compared to traditional MLOps. Some key points:
- LLMOps and MLOps face similar challenges across the development lifecycle, but LLMOps requires more GPU resources and integration is faster due to more models in each application. Evaluation is also less clear.
- The LLMOps field is around the 5th generation of models, with debates around proprietary vs open source models, and balancing privacy, cost and control.
- LLMOps platforms are emerging to provide solutions for tasks like prompting, embedding databases, evaluation, and governance, similar to how MLOps platforms have evolved.
Patrick Hall, Professor, AI Risk Management, The George Washington University
H2O Open Source GenAI World SF 2023
Language models are incredible engineering breakthroughs but require auditing and risk management before productization. These systems raise concerns about toxicity, transparency and reproducibility, intellectual property licensing and ownership, disinformation and misinformation, supply chains, and more. How can your organization leverage these new tools without taking on undue or unknown risks? While language models and associated risk management are in their infancy, a small number of best practices in governance and risk are starting to emerge. If you have a language model use case in mind, want to understand your risks, and do something about them, this presentation is for you!
Dr. Alexy Khrabrov, Open Source Science Community Director, IBM
H2O Open Source GenAI World SF 2023
In this talk, Dr. Alexy Khrabrov, recently elected Chair of the new Generative AI Commons at Linux Foundation for AI & Data, outlines the OSS AI landscape, challenges, and opportunities. With new models and frameworks being unveiled weekly, one thing remains constant: community building and validation of all aspects of AI is key to reliable and responsible AI we can use for business and society needs. Industrial AI is one key area where such community validation can prove invaluable.
The document announces the launch of the H2O GenAI App Store, which provides a collection of applications that make it easier for average users to leverage large language models through custom interfaces for specific tasks like getting gardening advice or feedback on code. The app store is designed to accelerate the development of these GenAI apps using the H2O Wave platform and provides access to H2OGPTE for retrieval augmented generation and language model calls. Developers can also contribute their own apps through the GitHub repository listed.
Megan Kurka, Vice President, Customer Data Scientist, H2O.ai
H2O Open Source GenAI World SF 2023
Discover the transformative power of Applied Gen AI. Learn how the H2O team builds customized applications and workflows that integrate capabilities of Gen AI and AutoML specifically designed to address and enhance financial use cases. Explore real world examples, learn best practices, and witness firsthand how our innovative solutions are reshaping the landscape of finance technology.
This document discusses techniques from recent papers for improving large language models (LLMs). It describes building blocks of LLMs like fine-tuning, foundation training, memory, and databases. Specific techniques covered include LIMA, which uses 1,000 carefully curated examples; instruction backtranslation to generate question-answer pairs; fine-tuning models on API examples, as in Gorilla; and reducing false answers through techniques like not agreeing with incorrect user opinions. The goal is to discuss cutting-edge tricks for building better LLMs.
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Pascal Pfeiffer, Principal Data Scientist, H2O.ai
H2O Open Source GenAI World SF 2023
This talk dives into the expansive ecosystem of Large Language Models (LLMs), offering practitioners an insightful guide to various relevant applications, from natural language understanding to creative content generation. While exploring use cases across different industries, it also honestly addresses the current limitations of LLMs and anticipates future advancements.
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
This document discusses using large language models (LLMs) for text classification tasks. It begins by describing how LLMs are commonly used for text generation and question answering. For classification, models are usually trained supervised on labeled data. The document then explores using LLMs for zero-shot classification without training, and techniques like fine-tuning LLMs on tasks to improve performance. It provides an example of fine-tuning an LLM on a financial sentiment dataset. The document concludes by describing H2O.ai's LLM Studio tool for fine-tuning and a few Kaggle competitions where LLMs achieved success in text classification.
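The zero-shot setup described above can be sketched as follows. The prompt lists the candidate labels and the model is asked to pick one; the stub below is a keyword heuristic standing in for a real LLM call, so the example runs offline.

```python
# Zero-shot classification via prompting: no training data, just a
# prompt that constrains the model's answer to a label set.
LABELS = ["positive", "negative", "neutral"]

def build_prompt(text):
    return (f"Classify the sentiment of the following text as one of "
            f"{', '.join(LABELS)}.\nText: {text}\nLabel:")

def stub_llm(prompt):
    # Stand-in for a real LLM call; a trivial keyword heuristic.
    text = prompt.lower()
    if "beat" in text or "record" in text or "profit" in text:
        return "positive"
    if "loss" in text or "miss" in text:
        return "negative"
    return "neutral"

label = stub_llm(build_prompt("The company reported record profit this quarter."))
```

Fine-tuning, by contrast, would adjust the model's weights on labeled examples like a financial sentiment dataset; the prompt-only route trades some accuracy for needing no training at all.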
Introduction to Machine Learning with H2O-3 (1)
In this virtual meetup, we give an introduction to H2O-3, the #1 open-source machine learning platform, and show you how you can use it to develop models to solve different use cases.
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
Numerai is an open, crowd-sourced hedge fund powered by predictions from data scientists around the world. In return, participants are rewarded with weekly payouts in crypto.
In this talk, Joe will give an overview of the Numerai tournament based on his own experience. He will then explain how he automates the time-consuming tasks such as testing different modelling strategies, scoring new datasets, submitting predictions to Numerai as well as monitoring model performance with H2O Driverless AI and R.
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
In this session, you will learn about what you should do after you’ve taken an AI transformation baseline. Over the span of this session, we will discuss the next steps in moving toward AI readiness through alignment of talent and tools to drive successful adoption and continuous use within an organization.
To find additional videos on AI courses, earn badges, join the courses at H2O.ai Learning Center: https://training.h2o.ai/products/ai-foundations-course
To find the Youtube video about this presentation: https://youtu.be/K1Cl3x3rd8g
Speaker:
Chemere Davis (H2O.ai - Senior Data Scientist Training Specialist)
AI Foundations Course Module 1 - An AI Transformation Journey
The chances of successfully implementing AI strategies within an organization significantly improve when you can recognize where your organization is on the maturity scale. Over this course, you will learn the keys to unlocking value with AI which include asking the right questions about the problems you are solving and ensuring you have the right cross-section of talent, tools, and resources. By the end of this module, you should be able to recognize where your organization is on the AI transformation spectrum and identify some strategies that can get you to the next stage in your journey.
To find additional videos on AI courses, earn badges, join the courses at H2O.ai Learning Center: https://training.h2o.ai/products/ai-foundations-course
To find the Youtube video about this presentation: https://youtu.be/PJgr2epM6qs
Speakers:
Chemere Davis (H2O.ai - Senior Data Scientist Training Specialist)
Ingrid Burton (H2O.ai - CMO)
ML Model Deployment and Scoring on the Edge with Automatic ML & DF
Machine Learning Model Deployment and Scoring on the Edge with Automatic Machine Learning and Data Flow
YouTube Video URL: https://youtu.be/gB0bTH-L6DE
Deploying machine learning models to the edge can present significant ML/IoT challenges centered around the need for low-latency, accurate scoring in minimal-resource environments. H2O.ai's Driverless AI AutoML and Cloudera Data Flow work nicely together to solve this challenge. Driverless AI automates the building of accurate machine learning models, which are deployed as light-footprint, low-latency Java or C++ artifacts, also known as MOJOs (Model Object, Optimized). Cloudera Data Flow leverages Apache NiFi, which offers an innovative data flow framework to host MOJOs and make predictions on data moving at the edge.
This presentation was made on June 18, 2020.
Video recording of the session can be viewed here: https://youtu.be/YEtDwYSXXJo
For many companies, model documentation is a requirement for any model to be used in the business. For other companies, model documentation is part of a data science team’s best practices. Model documentation includes how a model was created, training and test data characteristics, what alternatives were considered, how the model was evaluated, and information on model performance.
Collecting and documenting this information can take a data scientist days to complete for each model. The model document needs to be comprehensive and consistent across various projects. The process of creating this documentation is tedious for the data scientist and wasteful for the business because the data scientist could be using that time to build additional models and create more value. Inconsistent or inaccurate model documentation can be an issue for model validation, governance, and regulatory compliance.
In this virtual meetup, we will learn how to create comprehensive, high-quality model documentation in minutes that saves time, increases productivity, and improves model governance.
Speaker's Bio:
Nikhil Shekhar: Nikhil is a Machine Learning Engineer at H2O.ai. He is currently working on our automatic machine learning platform, Driverless AI. He graduated from the University of Buffalo majoring in Artificial Intelligence and is interested in developing scalable machine learning algorithms.
This presentation was made on June 16, 2020.
A recording of the presentation can be viewed here: https://youtu.be/khjW1t0gtSA
AI is unlocking new potential for every enterprise. Organizations are using AI and machine learning technology to inform business decisions, predict potential issues, and provide more efficient, customized customer experiences. The results can enable a competitive edge for the business.
H2O.ai is a visionary leader in AI and machine learning and is on a mission to democratize AI for everyone. We believe that every company can become an AI company, not just the AI Superpowers. We are empowering companies with our leading AI and Machine Learning platforms, our expertise, experience and training to embark on their own AI journey to become AI companies themselves. All companies in all industries can participate in this AI Transformation.
Tune into this virtual meetup to learn how companies are transforming their business with the power of AI and where to start.
About Parul Pandey:
Parul is a Data Science Evangelist at H2O.ai, where she combines data science, evangelism, and community in her work. Her emphasis is on spreading information about H2O and Driverless AI to as many people as possible. She is also an active writer and has contributed to various national and international publications.
H2O.ai provides open source machine learning platforms and enterprise AI solutions that help companies implement artificial intelligence. It offers tools for data scientists to build models using Python and R and also provides support services to help customers successfully deploy models in production. H2O.ai aims to democratize AI and help companies become AI-driven by leveraging its experts, community knowledge, and world-class technology.
The integration of programming into civil engineering is transforming the industry. We can design complex infrastructure projects and analyse large datasets. Imagine revolutionizing the way we build our cities and infrastructure, all by the power of coding. Programming skills are no longer just a bonus—they’re a game changer in this era.
Technology is revolutionizing civil engineering by integrating advanced tools and techniques. Programming allows for the automation of repetitive tasks, enhancing the accuracy of designs, simulations, and analyses. With the advent of artificial intelligence and machine learning, engineers can now predict structural behaviors under various conditions, optimize material usage, and improve project planning.
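As a toy illustration of the kind of repetitive engineering check that can be automated with a few lines of code, the sketch below screens candidate beam sections against a deflection limit using the standard midspan-deflection formula for a simply supported beam under uniform load, delta = 5wL^4 / (384EI). The section names, second moments of area, and the span/250 serviceability limit are illustrative assumptions, not values from the text.

```python
def max_deflection(w, L, E, I):
    """Midspan deflection of a simply supported beam under uniform load.

    delta = 5 * w * L**4 / (384 * E * I)
    w in N/m, L in m, E in Pa, I in m^4; result in metres.
    """
    return 5 * w * L**4 / (384 * E * I)

# Check a batch of candidate sections against a deflection limit of span/250.
span = 6.0            # m
load = 10_000.0       # N/m
E_steel = 210e9       # Pa
for name, I in [("Section A", 8.356e-5), ("Section B", 1.943e-5)]:
    delta = max_deflection(load, span, E_steel, I)
    ok = delta <= span / 250
    print(f"{name}: {delta * 1000:.1f} mm {'OK' if ok else 'FAIL'}")
```

The same loop pattern scales to hundreds of load cases or sections, which is exactly the sort of tedium automation removes.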
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Today’s digitally connected world presents a wide range of security challenges for enterprises. Insider security threats are particularly noteworthy because they have the potential to cause significant harm. Unlike external threats, insider risks originate from within the company, making them more subtle and challenging to identify. This blog aims to provide a comprehensive understanding of insider security threats, including their types, examples, effects, and mitigation techniques.
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Slide of the tutorial entitled "Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Emerging Trends" held at UMAP'24: 32nd ACM Conference on User Modeling, Adaptation and Personalization (July 1, 2024 | Cagliari, Italy)
The DealBook is our annual overview of the Ukrainian tech investment industry. This edition comprehensively covers the full year 2023 and the first deals of 2024.
RPA in Healthcare: Benefits, Use Cases, Trends and Challenges 2024
Your comprehensive guide to RPA in healthcare for 2024. Explore the benefits, use cases, and emerging trends of robotic process automation, understand the challenges, and prepare for the future of healthcare automation.
Sustainability requires ingenuity and stewardship. Did you know Pigging Solutions pigging systems help you achieve your sustainable manufacturing goals AND provide rapid return on investment.
How? Our systems recover over 99% of product in transfer piping. Recovering trapped product from transfer lines that would otherwise become flush-waste, means you can increase batch yields and eliminate flush waste. From raw materials to finished product, if you can pump it, we can pig it.
English-language slides presented at the 100% AI event held at Iguane Solutions' Paris offices on Tuesday, July 2, 2024:
- A presentation of our plug-and-play AI platform: its advanced features, such as its intuitive user interface, powerful copilot, and high-performance monitoring tools.
- Customer case study: Cyril Janssens, CTO of easybourse, shares his experience using our plug & play AI platform.
An invited talk given by Mark Billinghurst on Research Directions for Cross Reality Interfaces. This was given on July 2nd 2024 as part of the 2024 Summer School on Cross Reality in Hagenberg, Austria (July 1st - 7th)
Advanced Techniques for Cyber Security Analysis and Anomaly Detection
Cybersecurity is a major concern in today's connected digital world. Threats to organizations are constantly evolving and have the potential to compromise sensitive information, disrupt operations, and lead to significant financial losses. Traditional cybersecurity techniques often fall short against modern attackers. Therefore, advanced techniques for cyber security analysis and anomaly detection are essential for protecting digital assets. This blog explores these cutting-edge methods, providing a comprehensive overview of their application and importance.
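One of the simplest anomaly-detection techniques the passage alludes to is statistical outlier flagging: mark any point whose z-score (distance from the mean in standard deviations) exceeds a threshold. A minimal sketch, with invented traffic numbers for illustration:

```python
from statistics import mean, stdev

def zscore_anomalies(values, threshold=3.0):
    """Flag (index, value) pairs whose z-score exceeds the threshold."""
    mu, sigma = mean(values), stdev(values)
    return [(i, v) for i, v in enumerate(values)
            if sigma > 0 and abs(v - mu) / sigma > threshold]

# Example: login counts per hour with one suspicious spike.
traffic = [12, 15, 11, 14, 13, 12, 16, 140, 13, 12]
print(zscore_anomalies(traffic, threshold=2.0))  # flags the spike at index 7
```

Production systems typically replace this with seasonal baselines or learned models, but the flag-what-deviates principle is the same.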
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
Six months into 2024, and it is clear the privacy ecosystem takes no days off! Regulators continue to implement and enforce new regulations, businesses strive to meet requirements, and technology advances like AI have privacy professionals scratching their heads about managing risk.
What can we learn about the first six months of data privacy trends and events in 2024? How should this inform your privacy program management for the rest of the year?
Join TrustArc, Goodwin, and Snyk privacy experts as they discuss the changes we’ve seen in the first half of 2024 and gain insight into the concrete, actionable steps you can take to up-level your privacy program in the second half of the year.
This webinar will review:
- Key changes to privacy regulations in 2024
- Key themes in privacy and data governance in 2024
- How to maximize your privacy program in the second half of 2024
Best Practices for Effectively Running dbt in Airflow
As a popular open-source library for analytics engineering, dbt is often used in combination with Airflow. Orchestrating and executing dbt models as DAGs ensures an additional layer of control over tasks, observability, and provides a reliable, scalable environment to run dbt models.
This webinar will cover a step-by-step guide to Cosmos, an open source package from Astronomer that helps you easily run your dbt Core projects as Airflow DAGs and Task Groups, all with just a few lines of code. We’ll walk through:
- Standard ways of running dbt (and when to utilize other methods)
- How Cosmos can be used to run and visualize your dbt projects in Airflow
- Common challenges and how to address them, including performance, dependency conflicts, and more
- How running dbt projects in Airflow helps with cost optimization
Webinar given on 9 July 2024
This document discusses generative AI and its potential transformations and use cases. It outlines how generative AI could enable more low-cost experimentation, blur division boundaries, and allow "talking to data" for innovation and operational excellence. The document also references responsible AI frameworks and a pattern catalogue for developing foundation model-based systems. Potential use cases discussed include automated reporting, digital twins, data integration, operation planning, communication, and innovation applications like surrogate models and cross-discipline synthesis.
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap - Anant Corporation
In this episode we'll discuss the different flavors of prompt engineering in the LLM/GPT space. Depending on your skill level, you can pick up at any of the following levels:
Leveling up with GPT
1: Use ChatGPT / GPT Powered Apps
2: Become a Prompt Engineer on ChatGPT/GPT
3: Use GPT API with NoCode Automation, App Builders
4: Create Workflows to Automate Tasks with NoCode
5: Use GPT API with Code, make your own APIs
6: Create Workflows to Automate Tasks with Code
7: Use GPT API with your Data / a Framework
8: Use GPT API with your Data / a Framework to Make your own APIs
9: Create Workflows to Automate Tasks with your Data /a Framework
10: Use Another LLM API other than GPT (Cohere, HuggingFace)
11: Use open source LLM models on your computer
12: Finetune / Build your own models
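Level 2 above, prompt engineering, can start as simply as templating your prompts. A minimal sketch; the role/examples/context/task structure is one common convention, not a fixed API, and all names here are illustrative:

```python
def build_prompt(role, task, context, examples):
    """Assemble a structured prompt: role, few-shot examples, context, then the task."""
    parts = [f"You are {role}."]
    for user, assistant in examples:
        parts.append(f"Example input: {user}\nExample output: {assistant}")
    parts.append(f"Context:\n{context}")
    parts.append(f"Task: {task}")
    return "\n\n".join(parts)

prompt = build_prompt(
    role="a support agent for a small web shop",
    task="Draft a polite reply to the customer message above.",
    context="Customer: my order #123 arrived damaged.",
    examples=[("Where is my parcel?", "Let me check the tracking for you.")],
)
print(prompt)
```

The resulting string is what you would send to ChatGPT or the GPT API; later levels in the list swap the manual call for no-code automations, code, or your own data.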
Series: Using AI / ChatGPT at Work - GPT Automation
Are you a small business owner or web developer interested in leveraging the power of GPT (Generative Pretrained Transformer) technology to enhance your business processes?
If so, Join us for a series of events focused on using GPT in business. Whether you're a small business owner or a web developer, you'll learn how to leverage GPT to improve your workflow and provide better services to your customers.
Neural Language Generation Head to Toe - Hady Elsahar
This is a gentle, intuitive introduction to Natural Language Generation (NLG) using deep learning, aimed at computer science practitioners with basic knowledge of machine learning. It takes you on a journey from the basic intuitions behind modeling language and the probabilities of sequences, through recurrent neural networks, to the large Transformer models you have seen in the news, like GPT-2/GPT-3. The tutorial wraps up with a summary of the ethical implications of training such large language models on uncurated text from the internet.
Unlocking the Power of Generative AI: An Executive's Guide - PremNaraindas1
Generative AI is here, and it can revolutionize your business. With its powerful capabilities, this technology can help companies create more efficient processes, unlock new insights from data, and drive innovation. But how do you make the most of these opportunities?
This guide will provide you with the information and resources needed to understand the ins and outs of Generative AI, so you can make informed decisions and capitalize on the potential. It covers important topics such as strategies for leveraging large language models, optimizing MLOps processes, and best practices for building with Generative AI.
Exploring Opportunities in the Generative AI Value Chain - Dung Hoang
The article "Exploring Opportunities in the Generative AI Value Chain" by McKinsey & Company's QuantumBlack provides insights into the value created by generative artificial intelligence (AI) and its potential applications.
Use Case Patterns for LLM Applications - M Waleed Kadous
What are the "use case patterns" for deploying LLMs into production? Understanding these will allow you to spot "LLM-shaped" problems in your own industry.
Build an LLM-powered application using LangChain - AnastasiaSteele10
LangChain is an advanced framework that allows developers to create language model-powered applications. It provides a set of tools, components, and interfaces that make building LLM-based applications easier. With LangChain, managing interactions with language models, chaining together various components, and integrating resources like APIs and databases is a breeze. The platform includes a set of APIs that can be integrated into applications, allowing developers to add language processing capabilities without having to start from scratch.
The Future of AI is Generative, not Discriminative (5/26/2021) - Steve Omohundro
The deep learning AI revolution has been sweeping the world for a decade now. Deep neural nets are routinely used for tasks like translation, fraud detection, and image classification. PwC estimates that they will create $15.7 trillion/year of value by 2030. But most current networks are "discriminative" in that they directly map inputs to predictions. This type of model requires lots of training examples, doesn't generalize well outside of its training set, creates inscrutable representations, is subject to adversarial examples, and makes knowledge transfer difficult. People, in contrast, can learn from just a few examples, generalize far beyond their experience, and can easily transfer and reuse knowledge. In recent years, new kinds of "generative" AI models have begun to exhibit these desirable human characteristics. They represent the causal generative processes by which the data is created and can be compositional, compact, and directly interpretable. Generative AI systems that assist people can model their needs and desires and interact with empathy. Their adaptability to changing circumstances will likely be required by rapidly changing AI-driven business and social systems. Generative AI will be the engine of future AI innovation.
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s... - Mihai Criveti
Mihai is the Principal Architect for Platform Engineering and Technology Solutions at IBM, responsible for Cloud Native and AI Solutions. He is a Red Hat Certified Architect, CKA/CKS, a leader in the IBM Open Innovation community, and advocate for open source development. Mihai is driving the development of Retrieval Augmentation Generation platforms, and solutions for Generative AI at IBM that leverage WatsonX, Vector databases, LangChain, HuggingFace and open source AI models.
Mihai will share lessons learned building Retrieval Augmented Generation, or “Chat with Documents” platforms and APIs that scale, and deploy on Kubernetes. His talk will cover use cases for Generative AI, limitations of Large Language Models, use of RAG, Vector Databases and Fine Tuning to overcome model limitations and build solutions that connect to your data and provide content grounding, limit hallucinations and form the basis of explainable AI. In terms of technology, he will cover LLAMA2, HuggingFace TGIS, SentenceTransformers embedding models using Python, LangChain, and Weaviate and ChromaDB vector databases. He’ll also share tips on writing code using LLM, including building an agent for Ansible and containers.
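The retrieval step at the heart of RAG can be sketched with plain cosine similarity. The three-dimensional vectors below are toy stand-ins for real sentence embeddings, and the knowledge-base entries are invented for illustration:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def retrieve(query_vec, kb, k=2):
    """Return the k knowledge-base texts most similar to the query vector."""
    ranked = sorted(kb, key=lambda doc: cosine(query_vec, doc["vec"]), reverse=True)
    return [doc["text"] for doc in ranked[:k]]

kb = [
    {"text": "Refund policy: 30 days", "vec": [0.9, 0.1, 0.0]},
    {"text": "Shipping times: 3-5 days", "vec": [0.1, 0.9, 0.1]},
    {"text": "Office hours: 9-17 CET", "vec": [0.0, 0.2, 0.9]},
]
context = retrieve([0.8, 0.2, 0.1], kb, k=1)
print(context)
```

In a real system a sentence-embedding model produces the vectors, a vector database such as Weaviate or ChromaDB replaces the in-memory list, and the retrieved texts are appended to the LLM prompt as grounding context.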
Scaling factors for Large Language Model architectures:
• Vector database: consider sharding and high availability
• Fine-tuning: collecting data to be used for fine-tuning
• Governance and model benchmarking: how are you testing your model performance over time, with different prompts, one-shot, and various parameters
• Chain of reasoning and agents
• Caching embeddings and responses
• Personalization and conversational memory database
• Streaming responses and optimizing performance. A fine-tuned 13B model may perform better than a poor 70B one!
• Calling 3rd-party functions or APIs for reasoning or other types of data (e.g. LLMs are terrible at reasoning and prediction; consider calling other models)
• Fallback techniques: fall back to a different model, or default answers
• API scaling techniques, rate limiting, etc.
• Async, streaming and parallelization, multiprocessing, GPU acceleration (including embeddings), generating your API using OpenAPI, etc.
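Two of the points above, caching embeddings and falling back to another model, fit in a few lines of Python. The stand-in embedding and the model stubs are invented for illustration; a real system would call an embedding model and actual LLM endpoints:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def embed(text):
    """Cache embeddings so repeated inputs skip the (expensive) model call.

    Stand-in embedding: per-character hash buckets. A real system calls a model.
    """
    vec = [0.0] * 8
    for ch in text:
        vec[ord(ch) % 8] += 1.0
    return tuple(vec)

def generate_with_fallback(prompt, models, default="Sorry, please try again later."):
    """Try each model in order; fall back to a canned answer if all fail."""
    for model in models:
        try:
            return model(prompt)
        except Exception:
            continue
    return default

def flaky_70b(prompt):          # stub for an overloaded large model
    raise TimeoutError("model overloaded")

def tuned_13b(prompt):          # stub for a cheaper fine-tuned model
    return f"[13B] answer to: {prompt}"

print(generate_with_fallback("What is RAG?", [flaky_70b, tuned_13b]))
```

The `lru_cache` decorator gives response caching for free on repeated inputs, and the ordered model list makes the fallback chain explicit and testable.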
Leveraging Generative AI & Best Practices - DianaGray10
In this event we will cover:
- What generative AI is and how it is being used for the future of work.
- Best practices for developing and deploying generative AI-based models in production.
- The future of generative AI: how it is expected to evolve in the coming years.
The document discusses advances in large language models from GPT-1 to the potential capabilities of GPT-4, including its ability to simulate human behavior, demonstrate sparks of artificial general intelligence, and generate virtual identities. It also provides tips on how to effectively prompt ChatGPT through techniques like prompt engineering, giving context and examples, and different response formats.
AI and ML Series - Introduction to Generative AI and LLMs - Session 1 - DianaGray10
Session 1
👉This first session will cover an introduction to Generative AI & harnessing the power of large language models. The following topics will be discussed:
Introduction to Generative AI & harnessing the power of large language models.
What’s generative AI & what’s an LLM.
How are we using it in our document understanding & communication mining models?
How to develop a trustworthy and unbiased AI model using LLM & GenAI.
Personal Intelligent Assistant
Speakers:
📌George Roth - AI Evangelist at UiPath
📌Sharon Palawandram - Senior Machine Learning Consultant @ Ashling Partners & UiPath MVP
📌Russel Alfeche - Technology Leader RPA @qBotica & UiPath MVP
Generative AI - The New Reality: How Key Players Are Progressing - Vishal Sharma
The document discusses key players in generative AI and their progress. It provides an overview of generative AI including its evolution since 1950, where the spending is focused, how the technology works, and deployment models. It then profiles several major companies leading advancements in generative AI, including their strategies, growth areas, and risks. These companies are TSMC, Nvidia, Microsoft, Google, Amazon, Tesla, Oracle, Salesforce, SAP, and Palo Alto Networks.
The document discusses different methods for customizing large language models (LLMs) with proprietary or private data, including training a custom model, fine-tuning a general model, and prompting with expanded inputs. Fine-tuning techniques like low-rank adaptation and supervised fine-tuning allow emphasizing custom knowledge without full retraining. Prompt expansion using techniques like retrieval augmented generation can provide additional context beyond the character limit.
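The prompt-expansion approach described above can be sketched roughly as follows: pack retrieved snippets into the prompt until the character limit is reached. The header and footer wording and the default limit are illustrative assumptions, not an API from the document:

```python
def expand_prompt(question, snippets, limit=500):
    """Pack retrieved snippets into the prompt without exceeding the character limit."""
    header = "Answer using only the context below.\n\nContext:\n"
    footer = f"\n\nQuestion: {question}\nAnswer:"
    budget = limit - len(header) - len(footer)
    chosen = []
    for s in snippets:
        if budget - (len(s) + 1) < 0:   # +1 for the joining newline
            break
        chosen.append(s)
        budget -= len(s) + 1
    return header + "\n".join(chosen) + footer

p = expand_prompt(
    "What is the refund window?",
    ["Refunds are accepted within 30 days.",
     "Items must be unused and in original packaging."],
    limit=200,
)
print(p)
```

In practice the budget is counted in model tokens rather than characters, and the snippets come from a retrieval step rather than a hard-coded list.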
In this session, you'll get all the answers about how ChatGPT and other GPT-X models can be applied to your current or future project. First, we'll put in order all the terms – OpenAI, GPT-3, ChatGPT, Codex, Dall-E, etc. – and explain why Microsoft and Azure are often mentioned in this context. Then, we'll go through the main capabilities of the Azure OpenAI service and respective use cases that might inspire you to either optimize your product or build a completely new one.
Using the power of OpenAI with your own data: what's possible and how to start? - Maxim Salnikov
This document provides an overview of a talk by Maxim Salnikov and Jon Jahren at Oslo Spektrum from November 7-9. It discusses using OpenAI with your own data and how to get started. Examples of enterprise use cases for generative AI are presented, such as chatbots, document indexing, and financial analysis. Tools for prompt engineering like LangChain and Semantic Kernel are introduced. Best practices for fine-tuning models on proprietary data are covered, including data formatting, training data size, and an iterative tuning process. Responsible AI techniques like grounding responses and maintaining a positive tone are also discussed.
Reviewing progress in the machine learning certification journey
Special addition - Short tech talk on How to Network by Qingyue (Annie) Wang
Content review on AI and ML on Google Cloud by Margaret Maynard-Reid
A focused content review on ML problem framing, model evaluation, and fairness by Sowndarya Venkateswaran.
A discussion on sample questions to aid certification exam preparation.
An interactive Q&A session to clarify doubts and questions.
Previewing next steps and topics, including course completions and material reviews.
Building Generative AI-infused apps: what's possible and how to start - Maxim Salnikov
In this session, we'll explore different scenarios where the features of Generative AI can provide added value to an IT solution. We'll also learn how to begin developing your own application powered by AI. Using Azure OpenAI service as an illustration, we'll examine the various APIs it offers, review the best practices of Prompt Engineering, explore different ways to incorporate your own data into the process, and take a glance at several tools and resources that make the developer experience more seamless.
Formal Versus Agile: Survival of the Fittest? (Paul Boca) - AdaCore
The potential for combining agile and formal methods holds promise. Although it might not always be an easy partnership, it will succeed if it can foster a fruitful interchange of expertise between the two communities. In this talk I explain how formal methods can complement agile practices and vice versa. There are no pre-requisites for this talk, except an open mind and a desire to make software development more reliable. Leave any pre-conceptions at home, and be prepared for myths to be dispelled.
2017-10-10 (Netflix ML Platform Meetup) Learning item and user representation... - Ed Chi
1) Learning user and item representations is challenging due to sparse data and shifting preferences in recommender systems.
2) The presentation outlines research at Google to address sparsity through two approaches: focused learning, which develops specialized models for subsets of data like genres or cold-start items, and factorized deep retrieval, which jointly embeds items and their features to predict preferences for fresh items.
3) The techniques have improved overall viewership and nomination of candidates, demonstrating their effectiveness in production recommender systems.
The document describes a problem prediction model that uses artificial intelligence algorithms to evaluate changes made by an IT company and anticipate potential problems. The model analyzed 194 known problems, 2,400 past changes, and 201 predicted future changes. As a result, the model identified one change from October 29, 2019 that was likely to cause a problem. A team is investigating this potential issue. The document concludes that the naive Bayes classifier model is an important tool for change analysis and problem prediction.
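A naive Bayes classifier of the kind the model uses can be sketched in a few lines of pure Python. The change features, labels, and counts below are made up for illustration; they are not the company's actual data:

```python
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (feature_tuple, label). Returns the count tables."""
    labels = Counter(lbl for _, lbl in examples)
    feats = defaultdict(Counter)          # feats[(position, value)][label] -> count
    for x, lbl in examples:
        for i, v in enumerate(x):
            feats[(i, v)][lbl] += 1
    return labels, feats, len(examples)

def predict(x, labels, feats, n):
    """Pick the label maximizing prior * product of smoothed likelihoods."""
    best, best_p = None, -1.0
    for lbl, c in labels.items():
        p = c / n
        for i, v in enumerate(x):
            p *= (feats[(i, v)][lbl] + 1) / (c + 2)   # Laplace smoothing
        if p > best_p:
            best, best_p = lbl, p
    return best

# Toy change log: (system touched, timing) -> outcome.
changes = [(("db", "weekend"), "problem"), (("db", "weekday"), "ok"),
           (("web", "weekday"), "ok"), (("web", "weekend"), "ok"),
           (("db", "weekend"), "problem")]
model = train(changes)
print(predict(("db", "weekend"), *model))
```

Despite its simplicity, this counting-based classifier is easy to audit, which is part of why naive Bayes remains popular for change-risk screening.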
Building LLM Solutions using Open Source and Closed Source Solutions in Coher... - Sri Ambati
Sandeep Singh, Head of Applied AI Computer Vision, Beans.ai
H2O Open Source GenAI World SF 2023
In the modern era of machine learning, leveraging both open-source and closed-source solutions has become paramount for achieving cutting-edge results. This talk delves into the intricacies of seamlessly integrating open-source Large Language Model (LLM) solutions like Vicuna, Falcon, and Llama with industry giants such as ChatGPT and Google's PaLM. As the demand for fine-tuned and specialized datasets grows, it is imperative to understand the synergy between these tools. Attendees will gain insights into best practices for building and enriching datasets tailored for fine-tuning tasks, ensuring that their LLM projects are both robust and efficient. Through real-world examples and hands-on demonstrations, this talk will equip attendees with the knowledge to harness the power of both open and closed-source tools in a coherent and effective manner.
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018 - Sri Ambati
This talk was recorded in London on Oct 30, 2018 and can be viewed here: https://youtu.be/p4iAnxwC_Eg
The good news is building fair, accountable, and transparent machine learning systems is possible. The bad news is it’s harder than many blogs and software package docs would have you believe. The truth is nearly all interpretable machine learning techniques generate approximate explanations, that the fields of eXplainable AI (XAI) and Fairness, Accountability, and Transparency in Machine Learning (FAT/ML) are very new, and that few best practices have been widely agreed upon. This combination can lead to some ugly outcomes!
This talk aims to make your interpretable machine learning project a success by describing fundamental technical challenges you will face in building an interpretable machine learning system, defining the real-world value proposition of approximate explanations for exact models, and then outlining the following viable techniques for debugging, explaining, and testing machine learning models
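One widely used approximate-explanation technique is permutation importance: shuffle one feature at a time and measure how much the model's accuracy drops. A minimal sketch on a toy model; the data and the model (which only looks at feature 0) are invented for illustration:

```python
import random

def permutation_importance(model, X, y, n_features, seed=0):
    """Accuracy drop when each feature column is shuffled; bigger drop = more important."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(model(r) == t for r, t in zip(rows, y)) / len(y)

    base = accuracy(X)
    importances = []
    for j in range(n_features):
        col = [r[j] for r in X]
        rng.shuffle(col)
        shuffled = [r[:j] + (col[i],) + r[j + 1:] for i, r in enumerate(X)]
        importances.append(base - accuracy(shuffled))
    return importances

def model(row):
    return row[0] > 0.5        # ignores feature 1 entirely

X = [(0.9, 0.1), (0.2, 0.8), (0.7, 0.3), (0.1, 0.9)]
y = [True, False, True, False]
imps = permutation_importance(model, X, y, n_features=2)
print(imps)
```

Because the model ignores feature 1, shuffling it never changes the accuracy, so its importance is exactly zero; this matches the intuition behind the approximate explanation.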
Mateusz is a software developer who loves all things distributed and machine learning, and hates buzzwords. His favourite hobby is data juggling.
He obtained his M.Sc. in Computer Science from AGH UST in Krakow, Poland, during which he did an exchange at ECE Paris in France and worked on distributed flight-booking systems. After graduation he moved to Tokyo to work as a researcher at Fujitsu Laboratories on machine learning and NLP projects, where he is still currently based.
Webcast Presentation: Accelerate Continuous Delivery with Development Testing... - GRUC
With organizations under intense pressure to get products out to market quickly, they can’t afford to operate within operational silos. Yet communicating and collaborating across the organizational boundaries of QA and development can be difficult. Development is typically a black box to QA teams. QA has no visibility into the quality and security of the code until late in the lifecycle.
Watch this recorded webcast to learn how to break down the barriers and improve visibility and transparency by integrating development testing results into the IBM Rational Team Concert and providing QA and development with a unified workflow for ensuring code quality. Explore different development testing techniques and the types of defects and security vulnerabilities they can find.
About the Presenter:
James Croall, Director of Product Management, Coverity
Over the last 8 years, James Croall has helped a wide range of customers incorporate static analysis into their software development lifecycle. Prior to Coverity, Mr. Croall spent 10 years in the computer and network security industry as a C/C++ and Java software engineer.
1) Generative AI (GenAI) enables the creation of novel content by learning patterns in unstructured data rather than labeling outputs like traditional AI.
2) Both traditional and generative AI models lack transparency and may contain biases, but generative models can additionally hallucinate or leak private information.
3) To interpret generative models, researchers evaluate accuracy globally by checking for hallucinations or undesirable content, and locally by confirming the quality of individual responses.
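A crude local check in the spirit of point 3 is to flag response sentences that share too few content words with the source context, a cheap proxy for hallucination. The 0.5 threshold and the word-length filter are arbitrary illustrative choices, and the texts are invented:

```python
def ungrounded_sentences(response, context, threshold=0.5):
    """Flag response sentences with low word overlap against the source context."""
    ctx_words = set(context.lower().split())
    flagged = []
    for sent in response.split("."):
        words = [w for w in sent.lower().split() if len(w) > 3]
        if not words:
            continue
        overlap = sum(w in ctx_words for w in words) / len(words)
        if overlap < threshold:
            flagged.append(sent.strip())
    return flagged

context = "The refund window is 30 days from delivery for unused items"
response = "The refund window is 30 days. Purple unicorns approve refunds instantly."
print(ungrounded_sentences(response, context))
```

Real evaluation pipelines use entailment models or LLM judges instead of word overlap, but the pattern of checking each response against its grounding source is the same.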
The document discusses Agile software development methods and provides evidence that Agile approaches are effective. It defines Agile development as iterative and incremental with close collaboration. Case studies show organizations achieving better results with Agile, including increased productivity, quality, and customer satisfaction. Adopting Agile practices like Scrum and test-driven development enables organizations to adapt to changing priorities and deliver working software more frequently.
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L... - Daniel Zivkovic
Serverless Toronto's 6th-anniversary event helps IT pros understand and prepare for the #GenAI tsunami ahead. You'll gain situational awareness of the LLM Landscape, receive condensed insights, and actionable advice about RAG in 2024 from Google AI Lead Mark Ryan and LlamaIndex creator Jerry Liu. We chose #RAG (Retrieval-Augmented Generation) because it is the predominant paradigm for building #LLM (Large Language Model) applications in enterprises today - and that's where the jobs will be shifting. Here is the recording: https://youtu.be/P5xd1ZjD-Os?si=iq8xibj5pJsJ62oW
Reliability, safety, and trustworthiness (RST) are key factors to consider for human-centered AI. Established guidelines for human-AI interaction should be taken into account during evaluation to ensure RST systems overcome autonomy problems.
Scaling & Managing Production Deployments with H2O ModelOps - Sri Ambati
This presentation was made on June 30th, 2020.
Recording of the presentation is available here: https://youtu.be/9LajqAL_CU8
As enterprises “make their own AI”, a new set of challenges emerges. Maintaining reproducibility, traceability, and verifiability of machine learning models, as well as recording experiments, tracking insights, and reproducing results, are key. Collaboration between teams is also necessary as “model factories” are created for enterprise-wide data science efforts. Additionally, monitoring of models ensures that drift or performance degradation is addressed with either retraining or model updates. Finally, data and model lineage is necessary in case of rollbacks or for addressing regulatory compliance.
H2O ModelOps delivers centralized catalog and management, deployment, monitoring, collaboration, and administration of machine learning models. In this webinar, we learn how H2O can assist with operationalizing, scaling and managing production deployments.
Speaker's Bio:
Felix is part of the Customer Success team in Asia Pacific at H2O.ai. An engineer and an IIM alumnus, Felix has held prominent positions in the data science industry.
A whirlwind tour of Glasswall Solution’s use of Wardley Maps and experiments with a Service-based operating model. Delivered at Open Security Summit Dec 7th, 2020 as context for a panel discussion, which you can watch here:
https://www.youtube.com/watch?v=GS8Vndr-B4A
The original 100-slide deck is available here:
https://open-security-summit.org/tracks/2020/mini-summits/dec/wardley-maps/wardley-maps-and-services-model-at-glasswall/
Strategic AI Integration in Engineering Teams - UXDXConf
This presentation dives into the practical applications of machine learning within Google's operations, providing a comprehensive overview of how to leverage AI technologies to solve real-world business challenges.
Key Points Covered:
- Introduction to Machine Learning at Google: Discussion on the role of ML and its evolution in enhancing Google's operational efficiency.
- Experience Sharing: Insights into the team's long-term engagement with machine learning projects and the impacts on Google’s operational strategies.
- Practical Applications: Real-world examples of ML applications within Google’s daily operations, providing a blueprint to adapt similar strategies.
- Challenges and Solutions: Discussion on the challenges faced during the implementation of ML projects and the strategic solutions employed to overcome them.
- Future of ML at Google: Insights into future trends in machine learning at Google and how they plan to continue integrating AI into their ecosystem.
Automated Testing DITA Content and Customizations - Steve Anderson
The document discusses various methods for automated testing of DITA content and output, including using Schematron for validating content structure, the QA plugin for identifying tagging errors, XMLUnit for comparing XML, and the DITA OT regression test for validating the output of the open-source DITA Open Toolkit. It also covers automating browser tests using Selenium and comparing HTML output using Needle and Nose. Demo examples are provided for several of these automated testing tools and techniques.
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day - Sri Ambati
This document provides an overview of H2O.ai, an AI company that offers products and services to democratize AI. It mentions that H2O products are backed by 10% of the world's top data scientists from Kaggle and that H2O has customers in 7 of the top 10 banks, 4 of the top 10 insurance companies, and top manufacturing companies. It also provides details on H2O's founders, funding, customers, products, and vision to make AI accessible to more organizations.
LLMOps: Match report from the top of the 5th - Sri Ambati
The document discusses LLMOps (Large Language Model Operations) compared to traditional MLOps. Some key points:
- LLMOps and MLOps face similar challenges across the development lifecycle, but LLMOps requires more GPU resources and integration is faster due to more models in each application. Evaluation is also less clear.
- The LLMOps field is around the 5th generation of models, with debates around proprietary vs open source models, and balancing privacy, cost and control.
- LLMOps platforms are emerging to provide solutions for tasks like prompting, embedding databases, evaluation, and governance, similar to how MLOps platforms have evolved.
Patrick Hall, Professor, AI Risk Management, The George Washington University
H2O Open Source GenAI World SF 2023
Language models are incredible engineering breakthroughs but require auditing and risk management before productization. These systems raise concerns about toxicity, transparency and reproducibility, intellectual property licensing and ownership, disinformation and misinformation, supply chains, and more. How can your organization leverage these new tools without taking on undue or unknown risks? While language models and associated risk management are in their infancy, a small number of best practices in governance and risk are starting to emerge. If you have a language model use case in mind, want to understand your risks, and do something about them, this presentation is for you!
Dr. Alexy Khrabrov, Open Source Science Community Director, IBM
H2O Open Source GenAI World SF 2023
In this talk, Dr. Alexy Khrabrov, recently elected Chair of the new Generative AI Commons at Linux Foundation for AI & Data, outlines the OSS AI landscape, challenges, and opportunities. With new models and frameworks being unveiled weekly, one thing remains constant: community building and validation of all aspects of AI is key to reliable and responsible AI we can use for business and society needs. Industrial AI is one key area where such community validation can prove invaluable.
The document announces the launch of the H2O GenAI App Store, which provides a collection of applications that make it easier for average users to leverage large language models through custom interfaces for specific tasks like getting gardening advice or feedback on code. The app store is designed to accelerate the development of these GenAI apps using the H2O Wave platform and provides access to H2OGPTE for retrieval augmented generation and language model calls. Developers can also contribute their own apps through the GitHub repository listed.
Applied Gen AI for the Finance Vertical - Sri Ambati
Megan Kurka, Vice President, Customer Data Scientist, H2O.ai
H2O Open Source GenAI World SF 2023
Discover the transformative power of Applied Gen AI. Learn how the H2O team builds customized applications and workflows that integrate capabilities of Gen AI and AutoML specifically designed to address and enhance financial use cases. Explore real world examples, learn best practices, and witness firsthand how our innovative solutions are reshaping the landscape of finance technology.
This document discusses techniques for improving language models (LLMs) discussed in recent papers. It describes building blocks of LLMs like fine-tuning, foundation training, memory, and databases. Specific techniques covered include LIMA which uses 1,000 carefully curated examples, instruction backtranslation to generate question-answer pairs, fine-tuning models on API examples like Gorilla, and reducing false answers through techniques like not agreeing with incorrect user opinions. The goal is to discuss cutting edge tricks to build better LLMs.
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren... - Sri Ambati
Pascal Pfeiffer, Principal Data Scientist, H2O.ai
H2O Open Source GenAI World SF 2023
This talk dives into the expansive ecosystem of Large Language Models (LLMs), offering practitioners an insightful guide to various relevant applications, from natural language understanding to creative content generation. While exploring use cases across different industries, it also honestly addresses the current limitations of LLMs and anticipates future advancements.
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C... - Sri Ambati
This document discusses using large language models (LLMs) for text classification tasks. It begins by describing how LLMs are commonly used for text generation and question answering. For classification, models are usually trained supervised on labeled data. The document then explores using LLMs for zero-shot classification without training, and techniques like fine-tuning LLMs on tasks to improve performance. It provides an example of fine-tuning an LLM on a financial sentiment dataset. The document concludes by describing H2O.ai's LLM Studio tool for fine-tuning and a few Kaggle competitions where LLMs achieved success in text classification.
Introducción al Aprendizaje Automatico con H2O-3 (1) - Sri Ambati
In this virtual meetup, we give an introduction to the #1 open-source machine learning platform, H2O-3, and show you how you can use it to develop models for different use cases.
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use... - Sri Ambati
Numerai is an open, crowd-sourced hedge fund powered by predictions from data scientists around the world. In return, participants are rewarded with weekly payouts in crypto.
In this talk, Joe will give an overview of the Numerai tournament based on his own experience. He will then explain how he automates the time-consuming tasks such as testing different modelling strategies, scoring new datasets, submitting predictions to Numerai as well as monitoring model performance with H2O Driverless AI and R.
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo... - Sri Ambati
In this session, you will learn about what you should do after you’ve taken an AI transformation baseline. Over the span of this session, we will discuss the next steps in moving toward AI readiness through alignment of talent and tools to drive successful adoption and continuous use within an organization.
To find additional videos on AI courses, earn badges, join the courses at H2O.ai Learning Center: https://training.h2o.ai/products/ai-foundations-course
To find the Youtube video about this presentation: https://youtu.be/K1Cl3x3rd8g
Speaker:
Chemere Davis (H2O.ai - Senior Data Scientist Training Specialist)
AI Foundations Course Module 1 - An AI Transformation Journey - Sri Ambati
The chances of successfully implementing AI strategies within an organization significantly improve when you can recognize where your organization is on the maturity scale. Over this course, you will learn the keys to unlocking value with AI which include asking the right questions about the problems you are solving and ensuring you have the right cross-section of talent, tools, and resources. By the end of this module, you should be able to recognize where your organization is on the AI transformation spectrum and identify some strategies that can get you to the next stage in your journey.
To find additional videos on AI courses, earn badges, join the courses at H2O.ai Learning Center: https://training.h2o.ai/products/ai-foundations-course
To find the Youtube video about this presentation: https://youtu.be/PJgr2epM6qs
Speakers:
Chemere Davis (H2O.ai - Senior Data Scientist Training Specialist)
Ingrid Burton (H2O.ai - CMO)
ML Model Deployment and Scoring on the Edge with Automatic ML & DF - Sri Ambati
Machine Learning Model Deployment and Scoring on the Edge with Automatic Machine Learning and Data Flow
YouTube Video URL: https://youtu.be/gB0bTH-L6DE
Deploying Machine Learning models to the edge can present significant ML/IoT challenges centered around the need for low-latency, accurate scoring in minimal-resource environments. H2O.ai's Driverless AI AutoML and Cloudera Data Flow work nicely together to solve this challenge. Driverless AI automates the building of accurate Machine Learning models, which are deployed as light-footprint, low-latency Java or C++ artifacts, also known as MOJOs (Model Object, Optimized). Cloudera Data Flow leverages Apache NiFi, which offers an innovative data flow framework to host MOJOs and make predictions on data moving at the edge.
This presentation was made on June 18, 2020.
Video recording of the session can be viewed here: https://youtu.be/YEtDwYSXXJo
For many companies, model documentation is a requirement for any model to be used in the business. For other companies, model documentation is part of a data science team’s best practices. Model documentation includes how a model was created, training and test data characteristics, what alternatives were considered, how the model was evaluated, and information on model performance.
Collecting and documenting this information can take a data scientist days to complete for each model. The model document needs to be comprehensive and consistent across various projects. The process of creating this documentation is tedious for the data scientist and wasteful for the business because the data scientist could be using that time to build additional models and create more value. Inconsistent or inaccurate model documentation can be an issue for model validation, governance, and regulatory compliance.
In this virtual meetup, we will learn how to create comprehensive, high-quality model documentation in minutes that saves time, increases productivity, and improves model governance.
Speaker's Bio:
Nikhil Shekhar: Nikhil is a Machine Learning Engineer at H2O.ai. He is currently working on our automatic machine learning platform, Driverless AI. He graduated from the University of Buffalo majoring in Artificial Intelligence and is interested in developing scalable machine learning algorithms.
This presentation was made on June 16, 2020.
A recording of the presentation can be viewed here: https://youtu.be/khjW1t0gtSA
AI is unlocking new potential for every enterprise. Organizations are using AI and machine learning technology to inform business decisions, predict potential issues, and provide more efficient, customized customer experiences. The results can enable a competitive edge for the business.
H2O.ai is a visionary leader in AI and machine learning and is on a mission to democratize AI for everyone. We believe that every company can become an AI company, not just the AI Superpowers. We are empowering companies with our leading AI and Machine Learning platforms, our expertise, experience and training to embark on their own AI journey to become AI companies themselves. All companies in all industries can participate in this AI Transformation.
Tune into this virtual meetup to learn how companies are transforming their business with the power of AI and where to start.
About Parul Pandey:
Parul is a Data Science Evangelist at H2O.ai. She combines data science, evangelism, and community in her work. Her emphasis is on spreading information about H2O and Driverless AI to as many people as possible. She is also an active writer and has contributed to various national and international publications.
H2O.ai provides open source machine learning platforms and enterprise AI solutions that help companies implement artificial intelligence. It offers tools for data scientists to build models using Python and R and also provides support services to help customers successfully deploy models in production. H2O.ai aims to democratize AI and help companies become AI-driven by leveraging its experts, community knowledge, and world-class technology.
Best Programming Language for Civil Engineers - Awais Yaseen
The integration of programming into civil engineering is transforming the industry. We can design complex infrastructure projects and analyse large datasets. Imagine revolutionizing the way we build our cities and infrastructure, all by the power of coding. Programming skills are no longer just a bonus—they’re a game changer in this era.
Technology is revolutionizing civil engineering by integrating advanced tools and techniques. Programming allows for the automation of repetitive tasks, enhancing the accuracy of designs, simulations, and analyses. With the advent of artificial intelligence and machine learning, engineers can now predict structural behaviors under various conditions, optimize material usage, and improve project planning.
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat... - Bert Blevins
Today’s digitally connected world presents a wide range of security challenges for enterprises. Insider security threats are particularly noteworthy because they have the potential to cause significant harm. Unlike external threats, insider risks originate from within the company, making them more subtle and challenging to identify. This blog aims to provide a comprehensive understanding of insider security threats, including their types, examples, effects, and mitigation techniques.
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em... - Erasmo Purificato
Slide of the tutorial entitled "Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Emerging Trends" held at UMAP'24: 32nd ACM Conference on User Modeling, Adaptation and Personalization (July 1, 2024 | Cagliari, Italy)
The DealBook is our annual overview of the Ukrainian tech investment industry. This edition comprehensively covers the full year 2023 and the first deals of 2024.
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx - SynapseIndia
Your comprehensive guide to RPA in healthcare for 2024. Explore the benefits, use cases, and emerging trends of robotic process automation. Understand the challenges and prepare for the future of healthcare automation
Sustainability requires ingenuity and stewardship. Did you know Pigging Solutions pigging systems help you achieve your sustainable manufacturing goals AND provide rapid return on investment?
How? Our systems recover over 99% of product in transfer piping. Recovering trapped product from transfer lines that would otherwise become flush-waste, means you can increase batch yields and eliminate flush waste. From raw materials to finished product, if you can pump it, we can pig it.
Slides in English presented at the 100% IA event held at Iguane Solutions' Paris offices on Tuesday, July 2, 2024:
- Presentation of our plug-and-play AI platform: its advanced features, such as its intuitive user interface, its powerful copilot, and its high-performance monitoring tools.
- Customer testimonial: Cyril Janssens, CTO of easybourse, shares his experience using our plug & play AI platform.
An invited talk given by Mark Billinghurst on Research Directions for Cross Reality Interfaces. This was given on July 2nd 2024 as part of the 2024 Summer School on Cross Reality in Hagenberg, Austria (July 1st - 7th)
Advanced Techniques for Cyber Security Analysis and Anomaly Detection - Bert Blevins
Cybersecurity is a major concern in today's connected digital world. Threats to organizations are constantly evolving and have the potential to compromise sensitive information, disrupt operations, and lead to significant financial losses. Traditional cybersecurity techniques often fall short against modern attackers. Therefore, advanced techniques for cyber security analysis and anomaly detection are essential for protecting digital assets. This blog explores these cutting-edge methods, providing a comprehensive overview of their application and importance.
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In - TrustArc
Six months into 2024, and it is clear the privacy ecosystem takes no days off!! Regulators continue to implement and enforce new regulations, businesses strive to meet requirements, and technology advances like AI have privacy professionals scratching their heads about managing risk.
What can we learn about the first six months of data privacy trends and events in 2024? How should this inform your privacy program management for the rest of the year?
Join TrustArc, Goodwin, and Snyk privacy experts as they discuss the changes we’ve seen in the first half of 2024 and gain insight into the concrete, actionable steps you can take to up-level your privacy program in the second half of the year.
This webinar will review:
- Key changes to privacy regulations in 2024
- Key themes in privacy and data governance in 2024
- How to maximize your privacy program in the second half of 2024
Best Practices for Effectively Running dbt in Airflow.pdf - Tatiana Al-Chueyr
As a popular open-source library for analytics engineering, dbt is often used in combination with Airflow. Orchestrating and executing dbt models as DAGs ensures an additional layer of control over tasks, observability, and provides a reliable, scalable environment to run dbt models.
This webinar will cover a step-by-step guide to Cosmos, an open source package from Astronomer that helps you easily run your dbt Core projects as Airflow DAGs and Task Groups, all with just a few lines of code. We’ll walk through:
- Standard ways of running dbt (and when to utilize other methods)
- How Cosmos can be used to run and visualize your dbt projects in Airflow
- Common challenges and how to address them, including performance, dependency conflicts, and more
- How running dbt projects in Airflow helps with cost optimization
Webinar given on 9 July 2024
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
This presentation, delivered at the Postgres Bangalore (PGBLR) Meetup-2 on June 29th, 2024, dives deep into connection pooling for PostgreSQL databases. Aakash M, a PostgreSQL Tech Lead at Mydbops, explores the challenges of managing numerous connections and explains how connection pooling optimizes performance and resource utilization.
Key Takeaways:
* Understand why connection pooling is essential for high-traffic applications
* Explore various connection poolers available for PostgreSQL, including pgbouncer
* Learn the configuration options and functionalities of pgbouncer
* Discover best practices for monitoring and troubleshooting connection pooling setups
* Gain insights into real-world use cases and considerations for production environments
This presentation is ideal for:
* Database administrators (DBAs)
* Developers working with PostgreSQL
* DevOps engineers
* Anyone interested in optimizing PostgreSQL performance
Contact info@mydbops.com for PostgreSQL Managed, Consulting and Remote DBA Services
How RPA Helps in the Transportation and Logistics Industry.pptx - SynapseIndia
Revolutionize your transportation processes with our cutting-edge RPA software. Automate repetitive tasks, reduce costs, and enhance efficiency in the logistics sector with our advanced solutions.
Are you interested in dipping your toes in the cloud native observability waters, but as an engineer you are not sure where to get started with tracing problems through your microservices and application landscapes on Kubernetes? Then this is the session for you, where we take you on your first steps in an active open-source project that offers a buffet of languages, challenges, and opportunities for getting started with telemetry data.
The project is called OpenTelemetry, but before diving into the specifics, we'll start with demystifying key concepts and terms such as observability, telemetry, instrumentation, cardinality, and percentile to lay a foundation. After understanding the nuts and bolts of observability and distributed traces, we'll explore the OpenTelemetry community: its Special Interest Groups (SIGs), repositories, and how to become not only an end-user, but possibly a contributor. We will wrap up with an overview of the components in this project, such as the Collector, the OpenTelemetry protocol (OTLP), its APIs, and its SDKs.
Attendees will leave with an understanding of key observability concepts, become grounded in distributed tracing terminology, be aware of the components of OpenTelemetry, and know how to take their first steps toward an open-source contribution!
Key Takeaways: Open source, vendor-neutral instrumentation is an exciting new reality as the industry standardizes on OpenTelemetry for observability. OpenTelemetry is on a mission to enable effective observability by making high-quality, portable telemetry ubiquitous. The world of observability and monitoring today has a steep learning curve, and in order to achieve ubiquity, the project would benefit from growing its contributor community.
H2O.ai Confidential
Introduction
- Today's training will look into responsible, explainable, and interpretable AI when applied in the context of Generative AI and specifically Large Language Models (LLMs).
- This will include several sections on theoretical concepts as well as hands-on labs using Enterprise h2oGPT and H2O GenAI Applications.
- These hands-on labs focus on applying Gen AI in the context of a Model Risk Manager's role at a bank or financial institution.
- NOTE: A separate end-to-end masterclass on Generative AI is also available within the training environment, as well as on GitHub: https://github.com/h2oai/h2o_genai_training. It includes:
  - Data Preparation for LLMs
  - Fine-Tuning custom models
  - Model Evaluation
  - Retrieval-Augmented Generation (RAG)
  - Guardrails
  - AI Applications
Agenda

Section | Session | Duration | Speaker
Welcome | Session Kick-off | 5m | Jon Farland
Interpretability for Generative AI | Large Language Model Interpretability | 25m | Kim Montgomery
Interpretability for Generative AI | Workshop: Explainable and Interpretable AI for LLMs | 20m | Navdeep Gill
Benchmarking and Evaluations | Frameworks for Evaluating Generative AI | 20m | Srinivas Neppalli
Benchmarking and Evaluations | Workshop: Experimental Design of Gen AI Applications | 20m | Jon Farland
Security, Guardrails and Hacking | Workshop: Guardrails and Hacking | 20m | Ashrith Barthur
Applied Generative AI for Banking - Complaint Summarizer | Workshop: Complaint Summarizer AI Application | 20m | Jon Farland
Housekeeping
- The training environment for today is a dedicated instance of the H2O AI Managed Cloud, a GPU-powered environment capable of training and deploying LLMs, as well as designing and hosting entire AI Applications.
- It can be accessed at https://genai-training.h2o.ai.
- Login credentials should have been provided to the email address you registered with.
- If you don't yet have credentials, or you are otherwise unable to access the environment, please speak with any member of the H2O.ai team.
- The training environment will be available to attendees for 3 days after the conference, but dedicated proof-of-concept environments can be provided (including on-premise) on request. Please speak to any H2O.ai team member or email jon.farland@h2o.ai.
What is Generative AI?
GenAI enables the creation of novel content.

GenAI Model: learns patterns in unstructured data. Input: unstructured data. Output: novel content.
vs.
Traditional AI Model: learns the relationship between data and labels. Input: data and labels. Output: a label.
GenAI Complications
More complicated input:
● Prompt phrasing
● Instructions
● Examples
More relevant dimensions to output:
● Truthfulness/Accuracy
● Safety
● Fairness
● Robustness
● Privacy
● Machine Ethics
[TrustLLM: Trustworthiness in Large Language Models, Sun, et al.]
Common tests
● Can the model recognize problematic responses?
○ Inaccurate responses
○ Unethical responses
○ Responses conveying stereotypes
● Can an inappropriate response be provoked?
○ Jailbreaking
○ Provoking toxicity
○ Leading questions / false context
TrustLLM Main Findings
● Trustworthiness and utility were positively correlated.
● Generally, closed-source models outperformed open-source models.
● Over-alignment for trustworthiness can compromise utility.
[TrustLLM: Trustworthiness in Large Language Models, Sun, et al.]
Accuracy: Example LLMs
The simplest way to measure accuracy is to compare the result against another source of information.
Example sources:
● Checking results against a given source (RAG)
● Checking results against the tuning data
● Checking results against an external source (e.g., Wikipedia)
● Checking results against the training data (cumbersome)
● Checking for self-consistency (SelfCheckGPT)
● Checking results against a larger LLM
Scoring methods:
● Natural language inference
● Comparing embeddings
● Influence functions
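As a toy illustration of the "comparing embeddings" scoring method, the sketch below scores a response against a reference source with cosine similarity. The bag-of-words embedding is only a stand-in for a real sentence-embedding model, and the example texts are invented.

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding": token counts. A real system would
    # use a sentence-embedding model instead.
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def accuracy_score(response, reference):
    # Score an LLM response against a trusted source by embedding both
    # and comparing; higher similarity suggests better grounding.
    return cosine_similarity(embed(response), embed(reference))

reference = "the eiffel tower is located in paris france"
good = "the eiffel tower is in paris france"
bad = "the eiffel tower is located in berlin germany"
assert accuracy_score(good, reference) > accuracy_score(bad, reference)
```

With a real embedding model the same comparison works unchanged; only `embed` needs to be swapped out.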
RAG (Retrieval-Augmented Generation)
1. Chunk and embed documents
2. Submit a query
3. Generate an embedding for the query
4. Retrieve relevant information via similarity search
5. Combine the relevant information to ground the query to the model
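The retrieval flow above can be sketched in miniature. The toy embedding, similarity function, and two example chunks below are all stand-ins for a real embedding model, vector database, and document corpus.

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding as token counts (stand-in for a real embedding model).
    return Counter(text.lower().split())

def similarity(a, b):
    # Cosine similarity between two count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Step 1: chunk and embed documents (stand-in corpus).
chunks = [
    "H2O ModelOps manages deployment and monitoring of models.",
    "Driverless AI automates building machine learning models.",
]
index = [(embed(c), c) for c in chunks]

# Steps 2-3: submit a query and generate its embedding.
query = "What automates building machine learning models?"
q_vec = embed(query)

# Step 4: retrieve the most relevant chunk via similarity search.
best = max(index, key=lambda item: similarity(q_vec, item[0]))[1]

# Step 5: combine the retrieved context with the query to ground the model.
prompt = f"Context: {best}\n\nQuestion: {query}\nAnswer:"
assert "Driverless AI" in prompt
```

The grounded `prompt` is what would be sent to the LLM in a real pipeline.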
Influence functions
● Seek to measure the influence of including a data point in the training set on the model response.
● Datamodels/TRAK
○ Learn a model based on binary indicator functions.
○ Directly measure how much a training instance influences the outcome.
● DataInf
○ Measures the influence of a document during fine-tuning.
[DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models, Kwon et al.]
[TRAK: Attributing Model Behavior at Scale, Park et al.]
Influence functions / NLP
[Studying Large Language Model Generalization with Influence Functions, Grosse, et al.]
Self-consistency comparison: SelfCheckGPT
● Sampling different responses from an LLM.
● Checking for consistency between responses.
● Assuming that hallucinations will occur less consistently.
[SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models, Manakul, Liusie, and Gales]
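A minimal sketch of the SelfCheckGPT idea: the `sample_llm` stub below is hypothetical (a real system would sample an LLM at temperature > 0), and the crude word-overlap measure stands in for the paper's actual scoring variants.

```python
def sample_llm(prompt, n):
    # Hypothetical sampler: returns n canned "stochastic" responses;
    # a real system would call an LLM several times.
    return [
        "Paris is the capital of France.",
        "Paris is the capital of France.",
        "Lyon is the capital of France.",
    ][:n]

def overlap(a, b):
    # Fraction of words in sentence a that also appear in sentence b.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa) if wa else 0.0

def consistency(sentence, samples):
    # Average overlap with the sampled responses; hallucinations are
    # assumed to recur less consistently across samples.
    return sum(overlap(sentence, s) for s in samples) / len(samples)

samples = sample_llm("What is the capital of France?", 3)
# A sentence supported by most samples scores higher than one that is not.
assert consistency("Paris is the capital of France.", samples) > \
       consistency("Marseille is the capital of France.", samples)
```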
Counterfactual analysis: LLM
How consistent are results under different:
● Prompts / instructions
○ Changes in prompt design
○ Changes in prompt instructions
○ Multi-shot examples
○ Word replacement with synonyms
○ Proper names or pronouns (fairness)
○ Chain of thought / other guided reasoning methods
● Different context / RAG retrieval
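One way to run such a counterfactual check is to rephrase the same question several ways and measure answer agreement. The `query_llm` stub below is hypothetical (a lookup table so the sketch is runnable); a real test would call an actual LLM for each variant.

```python
def query_llm(prompt):
    # Hypothetical model: answers deterministically from a lookup so the
    # sketch is runnable; a real test would call an actual LLM.
    answers = {
        "What is the boiling point of water in Celsius?": "100",
        "At what temperature in Celsius does water boil?": "100",
        "Water boils at how many degrees Celsius?": "100",
    }
    return answers.get(prompt, "unknown")

# Counterfactual prompt variants: same question, different phrasing.
variants = [
    "What is the boiling point of water in Celsius?",
    "At what temperature in Celsius does water boil?",
    "Water boils at how many degrees Celsius?",
]
responses = [query_llm(v) for v in variants]

# Consistency rate: fraction of variants agreeing with the modal answer.
modal = max(set(responses), key=responses.count)
consistency = responses.count(modal) / len(responses)
assert consistency == 1.0  # fully consistent under rephrasing
```

The same loop extends to synonym swaps or proper-name substitutions for fairness checks: generate the perturbed prompts, collect answers, and compare agreement rates.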
Intervention in the case of problems
If problematic behavior is found in a model, there are several options:
● Prompt / instruction modifications
● Choosing a different base model
● Fine-tuning to modify LLM behavior
● Altering the document retrieval process (RAG)
● Monitoring model output for problematic responses
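The last option, output monitoring, can be illustrated with a trivial blocklist filter. This is only a sketch: a production guardrail would use trained classifiers rather than keyword matching, and the terms below are invented.

```python
# Invented blocklist of terms to flag in model output (illustrative only).
BLOCKLIST = {"ssn", "password", "credit card"}

def flag_response(text):
    # Return the sorted list of problematic terms found in a model
    # response, if any; an empty list means the response passes.
    lowered = text.lower()
    return sorted(term for term in BLOCKLIST if term in lowered)

assert flag_response("Your password is hunter2") == ["password"]
assert flag_response("The weather is sunny today") == []
```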
Conclusions
● Many of the basic problems of understanding LLMs are similar to those of other large models.
● Through careful testing we can hope to understand and correct some of the safety issues involved in using LLMs.
Chain of Verification (CoVe)
CoVe enhances the reliability of answers provided by Large Language Models, particularly in factual question-and-answering scenarios, by systematically verifying and refining responses to minimize inaccuracies.
The CoVe method consists of the following four sequential steps:
1. Initial Baseline Response Creation: An initial response to the original question is generated as a starting point.
2. Verification Question Generation: Verification questions are created to fact-check the baseline response. These questions are designed to scrutinize the accuracy of the initial response.
3. Execute Verification: The verification questions are independently answered to minimize any potential bias. This step ensures that the verification process is objective and thorough.
4. Final Refined Answer Generation: Based on the results of the verification process, a final refined answer is generated. This answer is expected to be more accurate and reliable, reducing the likelihood of hallucinations in the response.
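The four steps can be sketched end to end with a hypothetical `llm` stub standing in for real model calls; the question, entities, and canned answers are invented for illustration (the baseline deliberately contains one factual error).

```python
def llm(prompt):
    # Hypothetical model with one factual error in its baseline answer;
    # a real pipeline would issue each step as a separate LLM call.
    canned = {
        "Q: Name two US state capitals.": "Sacramento and Los Angeles.",
        "Is Sacramento a US state capital?": "Yes, capital of California.",
        "Is Los Angeles a US state capital?": "No, Sacramento is.",
    }
    return canned.get(prompt, "")

# Step 1: initial baseline response.
question = "Q: Name two US state capitals."
baseline = llm(question)

# Step 2: generate verification questions for each claimed entity.
entities = ["Sacramento", "Los Angeles"]
verifications = [f"Is {e} a US state capital?" for e in entities]

# Step 3: answer each verification question independently.
checks = {q: llm(q) for q in verifications}

# Step 4: refine the answer, keeping only verified entities.
verified = [e for e in entities
            if checks[f"Is {e} a US state capital?"].startswith("Yes")]
refined = " and ".join(verified)
assert refined == "Sacramento"
```

In the real method, step 2 is itself an LLM call that proposes the verification questions, rather than the fixed template used here.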
Verification Questions
Questions are categorized into three main groups:
1. Wiki Data & Wiki Category List: Questions that expect answers in the form of a list of entities, for instance, "Who are some politicians born in Boston?"
2. Multi-Span QA: Questions that seek multiple independent answers, for example, "Who invented the first mechanized printing press and in what year?" (Answer: "Johannes Gutenberg, 1450".)
3. Long-form Generation: Any question that requires a detailed or lengthy response.
Chain of Verification (CoVe)
Dhuliawala, Shehzaad, et al. "Chain-of-Verification Reduces Hallucination in Large Language Models." arXiv preprint arXiv:2309.11495 (2023).
CoVe and Explainable AI (XAI)
● Interpretability and Transparency: the verification process generates questions to fact-check baseline responses, improving transparency in decision-making.
● Reliability and Trust: refined answers enhance accuracy, building trust and reliability in model outputs.
● Bias and Fairness: verification questions in CoVe identify and mitigate potential biases in model output.
● User Interaction: the verification process involves user interaction through verification questions.
Provide a list of major
investment firms and
financial institutions
headquartered in the
United States?
Benefits and Limitations of CoVe
● Benefits:
○ Enhanced Reliability: By incorporating verification steps, users can trust the accuracy of
information obtained from LLMs.
○ Depth of Understanding: The refinement of answers allows users to gain a deeper
understanding of the topic beyond the initial response.
○ Educational Value: Promotes responsible and informed use of LLMs, encouraging users to go
beyond surface-level information.
● Limitations:
○ Incomplete Removal of Hallucinations: CoVe does not completely eliminate hallucinations in
generated content, which means it can still produce incorrect or misleading information.
○ Limited Scope of Hallucination Mitigation: CoVe primarily addresses hallucinations in the
form of directly stated factual inaccuracies but may not effectively handle other forms of
hallucinations, such as errors in reasoning or opinions.
○ Increased Computational Expense: Generating and executing verification alongside
responses in CoVe adds to the computational cost, similar to other reasoning methods like
Chain-of-Thought.
○ Upper Bound on Improvement: The effectiveness of CoVe is limited by the overall capabilities
of the underlying language model, particularly in its ability to identify and rectify its own
mistakes.
How to improve the CoVe pipeline
● Prompt engineering
● External tools
○ Final output highly depends on the answers of the verification questions.
○ For factual question answering, you can use advanced search tools like Google Search or SERP APIs.
○ For custom use cases you can always use RAG methods or other retrieval techniques for
answering the verification questions.
● More chains
● Human in the loop
Conclusions
● CoVe aims to improve model transparency, reliability, and trust.
● CoVe is not a silver bullet, but it can strengthen an LLM testing arsenal.
Write a 1000 word essay in 1 minute
LLMs are good at generating large amounts of text that is consistent and logical.
Are LLMs smarter than humans?
Introduction
Have LLMs manage your investment portfolio
A model can give generic advice on safe money management, but we wouldn't trust our life savings to a chatbot.
Let a bot reply to your email
It depends on how important the email is. Maybe we are more comfortable with the model automatically creating a draft.
Summarization
Summarizing large documents without losing essential information. Extracting
key-value pairs.
How can we use LLMs while minimizing risk?
Introduction
Customer Service
Answer FAQs from customers. May require retrieving from a knowledge base
and summarizing.
Report Generation - AutoDoc
Create ML interpretation documents. Reports required for regulatory
compliance.
Risk
How risky are LLMs?
A lawyer used ChatGPT to prepare
a court filing. It went horribly awry.
“While ChatGPT can be useful to
professionals in numerous industries,
including the legal profession, it has
proved itself to be both limited and
unreliable. In this case, the AI invented
court cases that didn't exist, and
asserted that they were real.”
CBS News
Chevy dealership’s AI chatbot
suggests Ford F-150 when asked
for best truck
“As an AI, I don't have personal
preferences but I can provide insights
based on popular opinions and
reviews. Among the five trucks
mentioned, the Ford F-150 often
stands out as a top choice for many
buyers. It's known for its impressive
towing …”
Detroit Free Press
LLM Lifecycle
● Data - Large & Diverse: To train a foundation model, you need a large, diverse dataset that covers the tasks the model should be able to perform.
● Foundation Model - Generative AI: Foundation models are designed to produce a wide and general variety of outputs, such as text, image or audio generation. They can be standalone systems or can be used as a "base" for many other applications.
● Fine Tuning - Supervised Fine Tuning: Fine-tuning can improve a model's performance on a task while preserving its general language knowledge.
● RAG - h2oGPTe: A powerful search assistant to answer questions from large volumes of documents, websites, and workplace content.
● Leaderboard - HELM: HELM is a framework for evaluating foundation models. The leaderboard shows how the various models perform across different groups of scenarios and different metrics.
● Risk Management - Eval Studio: Design and execute task-specific benchmarks. Perform both manual and LLM based evaluations. Systematically collect and store results along with metadata.
Evaluation for LLMs
Popular benchmarks on open source leaderboards
MMLU (Massive Multitask Language Understanding): A test to measure a text model's multitask accuracy. The test covers 57 tasks including elementary mathematics, US history, computer science, law, and more.
HellaSwag: A test of common-sense inference, which is easy for humans (~95%) but challenging for SOTA models.
AI2 Reasoning Challenge (ARC): A set of grade-school science questions.
TruthfulQA: A test to measure a model's propensity to reproduce falsehoods commonly found online.
When you drop a ball from rest it accelerates downward at 9.8 m/s². If
you instead throw it downward assuming no air resistance its
acceleration immediately after leaving your hand is
(A) 9.8 m/s²
(B) more than 9.8 m/s²
(C) less than 9.8 m/s²
(D) Cannot say unless the speed of throw is given.
MMLU Example
A woman is outside with a bucket and a dog. The dog is running
around trying to avoid a bath. She…
(A) Rinses the bucket off with soap and blow dry the dog’s head.
(B) Uses a hose to keep it from getting soapy.
(C) Gets the dog wet, then it runs away again.
(D) Gets into a bath tub with the dog.
HellaSwag Example
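Benchmarks like these are typically scored as plain multiple-choice accuracy: compare the letter the model picks against the gold label for each item. A minimal sketch (the answer lists are made up for illustration):

```python
def multiple_choice_accuracy(model_answers, gold_answers):
    """Fraction of items where the model's chosen letter matches the gold label."""
    if len(model_answers) != len(gold_answers):
        raise ValueError("answer lists must be the same length")
    correct = sum(
        m.strip().upper() == g.strip().upper()
        for m, g in zip(model_answers, gold_answers)
    )
    return correct / len(gold_answers)

# Hypothetical model picks vs. gold labels for four items
print(multiple_choice_accuracy(["A", "c", "B", "D"], ["A", "C", "D", "D"]))  # 0.75
```

Leaderboards differ mainly in how they extract that letter from free-form model output (log-likelihood of each option vs. parsing the generated text), not in the accuracy formula itself.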
Hugging Face Open LLM
Leaderboard
It is a popular location to track
various models evaluated using
different metrics.
These metrics include human baselines that give us some idea of how drastically these models have improved over the last two years.
Approaching human baseline
Popular benchmarks on open source leaderboards
The Need for Evaluation
Popular leaderboards are not enough
● Benchmarks are not task specific: Benchmarks on open-source leaderboards are well-rounded and diverse. They are not sufficient to reflect the performance of the model in a domain-specific scenario.
● Some model entries may cheat: There can be models on the leaderboard that are trained on the benchmark data itself. We do not have robust enough tests to detect this.
● Non-verifiable results: The procedure followed in conducting the tests and the results are not completely transparent and can also vary among different leaderboards.
Custom Test Sets
Create custom benchmarks for domain-specific scenarios
● Task Specific Evals: Create task-specific QA pairs along with the reference documents.
  - Bank Teller
  - Loan Officer
  - Program Manager
  - Data Analyst
● Test for Alignment: Create the QA pairs that test for agreement with your values, intentions, and preferences.
  - Correctness
  - Relevance
  - Similarity
  - Hallucination
  - Precision
  - Recall
  - Faithfulness
● Test for Safety: Test that all outputs meet your safety levels.
  - Toxicity
  - Bias
  - Offensive content
  - PII of customers
  - Company secrets
● Test for Compliance: Tests to confirm or show proof of meeting compliance standards.
  - Government
  - Company
H2O Eval Studio
Design and Execute task specific benchmarks
All the Evaluators are included
Eval Studio contains evaluators to check for
Alignment, Safety, and Compliance as
discussed before.
Create custom benchmarks
Users can upload Documents and create
custom Tests (Question-Answer pairs) based on
the document collection.
Run Evals and visualize results
Once a benchmark has been designed, users
can then run the evaluation against the
benchmark and visualize the results. A detailed
report can also be downloaded.
Through the Lens of Model Risk Management
One possible definition of “Conceptual Soundness”
for LLMs by themselves might be considered as a
combination of the following choices:
(1) Training Data
(2) Model Architecture
(3) An explanation of why the choices in (1) and (2) were made
(4) An explanation of why (1) and (2) are reasonable for the use case that the LLM will be applied to.
Through the Lens of Model Risk Management
What about a RAG system?
How does the concept of "Conceptual Soundness" apply when not only choices surrounding training data and model architecture are involved, but also choices around:
- Embeddings
- System Prompts (e.g. Personalities)
- Chunk Sizes
- Chunking Strategies
- OCR Techniques
- RAG-type (e.g. Hypothetical Document Embeddings)
- Mixture-of-Experts or Ensembling
Models / Systems / Agents are the fundamental AI systems under scrutiny. As opposed to traditional machine learning models, Generative AI systems include many choices beyond the models themselves.
Benchmarks / Tests are the sets of prompts and responses that are used to gauge how well an AI system can perform a certain task or use case.
Documents are the data sets used for evaluation in the
case of RAG systems, combining models, parsing, OCR,
chunking, embeddings and other components of an
evaluation.
What is the primary unit of analysis when evaluating an AI system or model?
An eval can be defined as a series of tuples each of size 3.
Each tuple consists of:
(1) Context / Prompt / Question
(2) Output / Response / Ground Truth Answer
(3) Document (in the case of RAG)
Source: https://www.jobtestprep.com/bank-teller-sample-questions
Designing Your Own Eval
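The tuple above can be captured in a small data structure. The field names follow the slide; the class itself is only an illustrative sketch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EvalCase:
    """One eval tuple of size 3."""
    prompt: str                      # (1) Context / Prompt / Question
    expected_response: str           # (2) Output / Response / Ground Truth Answer
    document: Optional[str] = None   # (3) Document (only used for RAG evals)

# An LLM-only case simply leaves the document empty
case = EvalCase(
    prompt="How many new tellers should be hired? A. 4 B. 5 C. 9 D. 12",
    expected_response="B. 5",
)
```

A full benchmark is then just a list of such cases, which an evaluator iterates over.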
Problem statement: How well does my Bank Teller AI Application correctly answer
questions related to being a Bank Teller?
Create an eval test case that can be used to evaluate how well BankTellerGPT can
answer questions related to being a Bank Teller.
LLM-only Example Test Case
{
Prompt: Respond to the following questions with single letter answer. Question: A specific bank branch serves
256 clients on average every day. The ratio between tellers and clients is 1:32, so that every teller serves 32
people on average every day. The management wishes to change this ratio to 1:20. How many new tellers should
be hired? A. 4 B. 5 C. 9 D. 12,
Response: B. 5,
Document: None
}
Designing Your Own Eval - BankTellerGPT
Source: https://www.jobtestprep.com/bank-teller-sample-questions
Designing Your Own Eval - BankTellerGPT
Problem statement: How well does my Bank Teller AI Application actually answer
questions related to being a Bank Teller?
Create an eval test case that can be used to evaluate how well BankTellerGPT can
answer questions related to being a Bank Teller.
RAG Example Test Case
{
Prompt: Respond to the following questions with single letter answer. Question: A specific bank branch serves
256 clients on average every day. The ratio between tellers and clients is 1:32, so that every teller serves 32
people on average every day. The management wishes to change this ratio to 1:20. How many new tellers should
be hired? A. 4 B. 5 C. 9 D. 12,
Response: B. 5,
Document: “Internal Bank Teller Knowledge Base”
}
Source: https://www.jobtestprep.com/bank-teller-sample-questions
Designing Your Own Eval
Task # 1: Create your own GenAI Test Benchmark for the SR 11-7 document
Some possible test cases
Prompt: How should banks approach model development?
Response: Banks should approach model development with a focus on sound risk management practices. They
should ensure that models are developed and used in a controlled environment, with proper documentation,
testing, and validation.
Prompt: How can model risk be reduced?
Response: Model risk can be reduced by establishing limits on model use, monitoring model performance,
adjusting or revising models over time, and supplementing model results with other analysis and information.
Prompt: How often should a bank update its model inventory?
Response: A bank should update its model inventory regularly to ensure that it remains current and accurate.
Designing Your Own Eval - BankTellerGPT
Task # 2: Create and launch LLM-only eval
leaderboard
To complete this, you will need to
1. Pick an evaluator (e.g. Token presence)
2. Pick a connection (e.g. Enterprise h2oGPT - LLM Only)
3. Pick a set of eval tests (e.g. Bank Teller Benchmark)
Designing Your Own Eval - SR 11-7
Task # 3: Create a new evaluator based on RAG
and launch leaderboard
To complete this, you will need to
1. Pick an evaluator (e.g. Answer correctness)
2. Pick a connection (e.g. Enterprise h2oGPT-RAG)
3. Pick your test created in Task #1
Evaluators
H2O EvalStudio evaluators overview

● PII (privacy) - RAG: Yes, LLM: Yes
Purpose: Assess whether the answer contains personally identifiable information (PII) like credit card numbers, phone numbers, social security numbers, street addresses, email addresses and employee names.
Method: Regex suite which quickly and reliably detects formatted PII - credit card numbers, social security numbers (SSN) and emails.

● Sensitive data (security) - RAG: Yes, LLM: Yes
Purpose: Assess whether the answer contains security-related information like activation keys, passwords, API keys, tokens or certificates.
Method: Regex suite which quickly and reliably detects formatted sensitive data - certificates (SSL/TLS certs in PEM format), API keys (H2O.ai and OpenAI), activation keys (Windows).

● Answer Correctness - RAG: Yes, LLM: Yes
Purpose: Assess whether the answer is correct given the expected answer (ground truth).
Method: A score based on combined and weighted semantic and factual similarity between the answer and ground truth (see Answer Similarity and Faithfulness below).

● Answer Relevance - RAG: Yes, LLM: Yes
Purpose: Assess whether the answer is (in)complete and does not contain redundant information which was not asked - noise.
Method: A score based on the cosine similarity of the question and generated questions, where generated questions are created by prompting an LLM to generate questions from the actual answer.

● Answer Similarity - RAG: Yes, LLM: Yes
Purpose: Assess semantic similarity of the answer and expected answer.
Method: A score based on the similarity metric value of the actual and expected answer calculated by a cross-encoder model (NLP).

● Context Precision - RAG: Yes, LLM: No
Purpose: Assess the quality of the retrieved context considering order and relevance of the text chunks on the context stack.
Method: A score based on the presence of the expected answer - ground truth - in the text chunks at the top of the retrieved context chunk stack; relevant chunks deep in the stack, irrelevant chunks and an unnecessarily big context make the score lower.

● Context Recall - RAG: Yes, LLM: No
Purpose: Assess how much of the ground truth is represented in the retrieved context.
Method: A score based on the ratio of the number of sentences in the ground truth that can be attributed to the context to the total number of sentences in the ground truth.

● Context Relevance - RAG: Yes, LLM: No
Purpose: Assess whether the context is (in)complete and does not contain redundant information which is not needed - noise.
Method: A score based on the ratio of context sentences which are needed to generate the answer to the total number of sentences in the retrieved context.

TERMINOLOGY: answer ~ actual RAG/LLM answer | expected answer ~ expected RAG/LLM answer, i.e. ground truth | retrieved context ~ text chunks retrieved from the vector DB prior to LLM answer generation in RAG.
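Several of these metrics are simple ratios. Context Recall, for instance, is the share of ground-truth sentences attributable to the retrieved context; the sketch below uses naive verbatim matching for "attribution", where a real evaluator would use an LLM or semantic comparison.

```python
def context_recall(ground_truth: str, retrieved_context: str) -> float:
    """Ratio of ground-truth sentences attributable to the retrieved context.

    Naive sketch: a sentence counts as 'attributed' if it occurs verbatim
    in the context. Production evaluators use an LLM or semantic matching.
    """
    sentences = [s.strip() for s in ground_truth.split(".") if s.strip()]
    if not sentences:
        return 0.0
    found = sum(s in retrieved_context for s in sentences)
    return found / len(sentences)

context = "Gutenberg invented the printing press. He was born in Mainz."
truth = "Gutenberg invented the printing press. He died in 1468."
print(context_recall(truth, context))  # 0.5: one of two ground-truth sentences found
```

Context Relevance is the mirror image of this ratio, computed over the sentences of the retrieved context rather than the ground truth.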
Evaluators (continued)
H2O EvalStudio evaluators overview

● Faithfulness - RAG: Yes, LLM: No
Purpose: Assess whether the answer's claims can be inferred from the context, i.e. factual consistency of the answer given the context (hallucinations).
Method: A score based on the ratio of the answer's claims which are present in the context to the total number of the answer's claims.

● Hallucination Metric - RAG: Yes, LLM: No
Purpose: Assess the RAG's base LLM model hallucination.
Method: A score based on the Vectara hallucination evaluation cross-encoder model which assesses the RAG's base LLM hallucination when it generates the actual answer from the retrieved context.

● RAGAs - RAG: Yes, LLM: No
Purpose: Assess overall answer quality considering both context and answer.
Method: Composite metric score which is the harmonic mean of the Faithfulness, Answer Relevancy, Context Precision and Context Recall metrics.

● Tokens Presence - RAG: Yes, LLM: Yes
Purpose: Assess whether both the retrieved context and the answer contain required string tokens.
Method: A score based on the substring and/or regular expression based search of the required set of strings in the retrieved context and answer.
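Tokens Presence is the most mechanical of these evaluators, so it is easy to picture: a regex/substring search applied to both the retrieved context and the answer. A minimal sketch:

```python
import re
from typing import Iterable

def tokens_present(answer: str, context: str, required: Iterable[str]) -> bool:
    """True if every required token (treated as a regex) occurs in both
    the answer and the retrieved context."""
    return all(
        re.search(tok, answer) and re.search(tok, context)
        for tok in required
    )

print(tokens_present(
    answer="The fee is $25 per wire transfer.",
    context="Per policy, wire transfers incur a $25 fee.",
    required=[r"\$25", r"wire transfer"],
))  # True
```

Because it is deterministic and cheap, this kind of check is a good first gate before running the more expensive LLM-based evaluators.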
● LLM Guardrails are a set of predefined constraints and guidelines
that are applied to LLMs to manage their behavior.
● Guardrails serve to ensure responsible, ethical, and safe usage of
LLMs, mitigate potential risks, and promote transparency and
accountability.
● Guardrails are a form of proactive control and oversight over the
output and behavior of language models, which are otherwise
capable of generating diverse content, including text that may be
biased, inappropriate, or harmful.
Understanding the distinct functions of each type of guardrail is pivotal
in creating a comprehensive and effective strategy for governing AI
systems.
Guardrails
● Content Filter Guardrails: Content filtering is crucial to prevent
harmful, offensive, or inappropriate content from being generated by
LLMs. These guardrails help ensure that the outputs conform to
community guidelines, curbing hate speech, explicit content, and
misinformation.
● Bias Mitigation Guardrails: Bias is an ongoing concern in AI, and
mitigating bias is critical. These guardrails aim to reduce the model's
inclination to produce content that perpetuates stereotypes or
discriminates against particular groups. They work to promote fairness
and inclusivity in the model's responses.
● Safety and Privacy Guardrails: Protecting user privacy is paramount.
Safety and privacy guardrails are designed to prevent the generation of
content that may infringe on user privacy or include sensitive, personal
information. These measures safeguard users against unintended data
exposure.
Types of Guardrails
Types of Guardrails
● Fact-Checking & Hallucination Guardrails: To combat misinformation,
fact-checking guardrails are used to verify the accuracy of the
information generated by LLMs. They help ensure that the model's
responses align with factual accuracy, especially in contexts like news
reporting or educational content.
● Context/Topic and User Intent Guardrails: For LLMs to be effective,
they must produce responses that are contextually relevant and aligned
with user intent. These guardrails aim to prevent instances where the
model generates content that is unrelated or fails to address the user's
queries effectively.
● Explainability and Transparency Guardrails: In the pursuit of making
LLMs more interpretable, these guardrails require the model to provide
explanations for its responses. This promotes transparency by helping
users understand why a particular output was generated, fostering
trust and accountability.
● Jailbreak Guardrails: Ensure robustness to malicious user attacks such
as prompt injection.
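As a concrete illustration, a safety-and-privacy guardrail can be as simple as a regex screen run over model output before it reaches the user. The patterns below are a tiny, assumed subset of what a production filter would use.

```python
import re

# Illustrative patterns for a privacy guardrail (far from exhaustive)
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def apply_privacy_guardrail(text: str) -> str:
    """Redact PII matches before the output is shown to the user."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return text

print(apply_privacy_guardrail("Contact me at jane@example.com, SSN 123-45-6789."))
```

The other guardrail types (bias mitigation, fact-checking, jailbreak detection) usually need a classifier or a second LLM rather than regexes, but the post-processing hook they plug into looks the same.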
Determine what is the one
credit product with the
highest number of complaints.
Answer: Credit Reporting
Task 1
Complaint Summarizer
Applied Generative AI for Banking
Determine the top complaint for TransUnion.
Answer: Violation of
Consumers Rights to Privacy
and Confidentiality Under the
Fair Credit Reporting Act.
Task 2
Complaint Summarizer
Applied Generative AI for Banking
Use H2OGPT to summarize a complaint from the database and provide immediate next steps.
Answer: [See screenshot]
Task 3
Complaint Summarizer
Applied Generative AI for Banking
Retrieval-Augmented Generation (RAG)
RAG as a system is a
particularly good use of
vector databases.
RAG systems take advantage of the context window for LLMs, filling it with only the most relevant examples from real data.
This "grounds" the LLM in relevant context and greatly reduces hallucination.
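The retrieval step can be pictured with a toy example: embed the query and every chunk, rank chunks by cosine similarity, and hand only the top hits to the LLM as context. Real systems use a trained embedding model and a vector database; the keyword-count `toy_embed` below is just a stand-in.

```python
import math
from typing import Callable, List

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k_chunks(query: str, chunks: List[str],
                 embed: Callable[[str], List[float]], k: int = 2) -> List[str]:
    """Rank stored chunks against the query; the top k become the LLM's context."""
    qv = embed(query)
    return sorted(chunks, key=lambda c: cosine(embed(c), qv), reverse=True)[:k]

def toy_embed(text: str) -> List[float]:
    # Crude stand-in for a real embedding model: keyword counts
    keywords = ["fee", "wire", "hours"]
    return [text.lower().count(w) + 0.01 for w in keywords]

chunks = [
    "Wire transfer fees are $25 per transaction.",
    "Branch hours are 9am to 5pm on weekdays.",
    "The daily cutoff for outgoing wires is 4pm.",
]
print(top_k_chunks("What is the fee for a wire transfer?", chunks, toy_embed, k=1))
```

Swapping `toy_embed` for a learned model and the list scan for a vector-database query gives the production shape of the same idea.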
Embedding Models - INSTRUCTOR
Source: https://arxiv.org/pdf/2212.09741.pdf
Instruction-based Omnifarious
Representations
Model is trained to generate embeddings
using both the instruction as well as the
textual input.
Applicable to virtually every use case, due
to its ability to create latent vector
representations that include instruction.
Embedding Models - BGE
Source: https://arxiv.org/pdf/2310.07554.pdf
LLM-Embedder
This embedding model is trained
specifically for use with RAG systems.
A reward model is introduced that provides higher rewards to a retrieval candidate if it results in a higher generation likelihood for the expected output.
Uses contrastive learning to directly optimize for RAG applications.
Generative AI with H2O.ai - H2O LLMs Ecosystem
● AI Engines: Enterprise h2oGPT, Doc-QA, LLM AppStudio, LLM DataStudio, LLM EvalStudio
● Deployment: MLOps, AI Engine Manager
● Consumption: AppStore, End Users
Responsible Generative AI
● Ethical Considerations, Data Privacy, and User Consent: Assess the potential impact of generative AI on individuals and society. Give users control over how their data is used by generative AI. Consent mechanisms should be transparent and user-friendly.
● Monitoring, Regulation, and Security: Detect misuse or anomalies in generative AI behavior. Regulatory compliance ensures adherence to ethical and legal guidelines. Security measures are crucial to protect AI models from adversarial attacks or unauthorized access.
● Accountability and Oversight: Define roles and responsibilities for AI development and deployment. Oversight mechanisms ensure that responsible practices are followed.
● Education and Awareness: Users and developers should be informed about generative AI capabilities, limitations, and ethical considerations.
● Stakeholder Involvement: Involving various stakeholders in AI discussions promotes diverse perspectives and responsible decision-making.
● Continuous Evaluation and Improvement: Continually assess models to ensure fairness, accuracy, and alignment with ethical standards.
● Transparency, Explainability, Bias Mitigation, Debugging, and Guardrails: Recognize and mitigate both subtle and glaring biases that may emerge from training data. Ensure that users can understand and trust the decisions made by generative AI models. Debug models with techniques such as adversarial prompt engineering. Proactively manage risks and maintain control over the model's behavior with guardrails.
● Audit Input Data, Benchmarks, and Test the Unknown: Assess quality of data used as input to train Generative AI models. Utilize benchmarks and random attacks for testing.
Minimizing Model Hallucinations
Using Chain-of-Verification (CoVe) Method
The CoVe method reduces factual errors in large language models by drafting, fact-checking, and verifying responses - it deliberates on its own responses and self-corrects them.
Steps:
1. Given a user query, an LLM generates a baseline response that may contain inaccuracies, e.g. factual hallucinations.
2. To improve this, CoVe first generates a plan of a set of verification questions to ask, and then executes that plan by answering them and hence checking for agreement.
3. Individual verification questions are typically answered with higher accuracy than the original accuracy of the facts in the original longform generation.
4. Finally, the revised response takes the verifications into account.
5. The factored version of CoVe answers verification questions such that they cannot condition on the original response, avoiding repetition and improving performance.