SlideShare a Scribd company logo
H2O.ai Confidential
PATRICK HALL
Founder and Principal Scientist
HallResearch.ai
Assistant Professor | GWU
H2O.ai Confidential
RIsk Management for LLMs
H2O.ai Confidential
Table of Contents
Know what we’re talking about
Select a standard
Audit supply chains
Adopt an adversarial mindset
Review past incidents
Enumerate harms and prioritize risks
Dig into data quality
Apply benchmarks
Use supervised ML assessments
Engineer adversarial prompts
Don’t forget security
Acknowledge uncertainty
Engage stakeholders
Mitigate risks
WARNING: This presentation contains model outputs which are
potentially offensive and disturbing in nature.
Know What We’re Talking About
Words matters
• Audit: Formal independent transparency and documentation exercise that
measures adherence to a standard.* (Hassan et al., paraphrased)
• Assessment: A testing and validation exercise.* (Hassan et al., paraphrased)
• Harm: An undesired outcome [whose] cost exceeds some threshold[; ...] costs
have to be sufficiently high in some human sense for events to be harmful.
(NIST)
Check out the new NIST Trustworthy AI Glossary: https://airc.nist.gov/AI_RMF_Knowledge_Base/Glossary.

Recommended for you

Responsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedResponsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons Learned

How do we develop machine learning models and systems taking fairness, accuracy, explainability, and transparency into account? How do we protect the privacy of users when building large-scale AI based systems? Model fairness and explainability and protection of user privacy are considered prerequisites for building trust and adoption of AI systems in high stakes domains such as hiring, lending, and healthcare. We will first motivate the need for adopting a “fairness, explainability, and privacy by design” approach when developing AI/ML models and systems for different consumer and enterprise applications from the societal, regulatory, customer, end-user, and model developer perspectives. We will then focus on the application of responsible AI techniques in practice through industry case studies. We will discuss the sociotechnical dimensions and practical challenges, and conclude with the key takeaways and open challenges.

algorithmic biasalgorithmic decision-making systemsattribution methods
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data Quality

Big data offers the promise of a data-driven business model generating new revenue and competitive advantage fueled by new business insights, AI, and machine learning. Yet without high quality data that provides trust, confidence, and understanding, business leaders continue to rely on gut instinct to drive business decisions. The critical foundation and first step to deliver high quality data in support of a data-driven view that truly leverages the value of big data is data profiling - a proven capability to analyze the actual data content and help you understand what's really there. View this webinar on-demand to learn five core concepts to effectively apply data profiling to your big data, assess and communicate the quality issues, and take the first step to big data quality and a data-driven business.

malicious-use-of-ai.pptx
malicious-use-of-ai.pptxmalicious-use-of-ai.pptx
malicious-use-of-ai.pptx

This document discusses the malicious use of artificial intelligence and provides recommendations to address this risk. It begins with background on AI capabilities and access. It then discusses common threat factors like expanding existing threats and introducing novel threats from superhuman AI. It identifies security domains like digital, physical, and political security that could be impacted. Recommendations are provided around policymaker collaboration, researcher responsibility, best practices from cybersecurity, and priority research areas like learning from cybersecurity and promoting a culture of responsibility. The document concludes with updates since its publication.

Know What We’re Talking About (Cont.)
Words matters
• Language model: An approximative description that captures patterns and regularities
present in natural language and is used for making assumptions on previously unseen
language fragments. (NIST)
• Red-teaming: A role-playing exercise in which a problem is examined from an
adversary’s or enemy’s perspective. (NIST)*
• Risk: Composite measure of an event’s probability of occurring and the magnitude or
degree of the consequences of the corresponding event. The impacts, or
consequences, of AI systems can be positive, negative, or both and can result in
opportunities or threats. (NIST)
* Audit, assessment, and red team are often used generally and synonymously to mean testing and validation.
Select a Standard
External standards bolster independence
• NIST AI Risk Management Framework
• EU AI Act Conformity
• Data privacy laws or policies
• Nondiscrimination laws
The NIST AI Risk Management Framework puts
forward guidance across mapping, measuring,
managing and governing risk in sophisticated AI
systems.
Source: https://pages.nist.gov/AIRMF/.
Audit Supply Chains
AI is a lot of (human) work
• Data poisoning and malware
• Ethical labor practices
• Localization and data privacy
compliance
• Geopolitical stability
• Software and hardware
vulnerabilities
• Third-party vendors
Cover art for the recent NY Magazine article, AI
Is A Lot Of Work: As the technology becomes
ubiquitous, a vast tasker underclass is
emerging — and not going anywhere.
Image source:
https://nymag.com/intelligencer/article/ai-artificial-
intelligence-humans-technology-business-
factory.html.
Adopt an Adversarial Mindset
Don’t be naive
• Language models inflict harm.
• Language models are hacked and
abused.
• Acknowledge human biases:
• Confirmation bias
• Dunning-Kruger effect
• Funding bias
• Groupthink
• McNamara Fallacy
• Techno-chauvinism
• Stay humble ─ incidents can happen to
anyone.
Source: https://twitter.com/defcon.

Recommended for you

achine Learning and Model Risk
achine Learning and Model Riskachine Learning and Model Risk
achine Learning and Model Risk

QU Speaker Series - Session 3 https://qusummerschool.splashthat.com A conversation with Quants, Thinkers and Innovators all challenged to innovate in turbulent times! Join QuantUniversity for a complimentary summer speaker series where you will hear from Quants, innovators, startups and Fintech experts on various topics in Quant Investing, Machine Learning, Optimization, Fintech, AI etc. Topic: Machine Learning and Model Risk (With a focus on Neural Network Models) All models are wrong and when they are wrong they create financial or non-financial risks. Understanding, testing and managing model failures are the key focus of model risk management particularly model validation. For machine learning models, particular attention is made on how to manage model fairness, explainability, robustness and change control. In this presentation, I will focus the discussion on machine learning explainability and robustness. Explainability is critical to evaluate conceptual soundness of models particularly for the applications in highly regulated institutions such as banks. There are many explainability tools available and my focus in this talk is how to develop fundamentally interpretable models. Neural networks (including Deep Learning), with proper architectural choice, can be made to be highly interpretable models. Since models in production will be subjected to dynamically changing environments, testing and choosing robust models against changes are critical, an aspect that has been neglected in AutoML.

model riskmodel validation
Reducing Technology Risks Through Prototyping
Reducing Technology Risks Through Prototyping Reducing Technology Risks Through Prototyping
Reducing Technology Risks Through Prototyping

How to deliver a successful product when technology landscape is new and rapidly changing? How to identify technology limitations before moving to production? What if there are no technology experts to answer your questions? Strategic prototyping can help development teams respond to these issues instead of blindly building full-scale products. I will not be offering silver bullets of simple recipes for success. Instead, you will learn about the practical guidelines for prototyping, combining architecture analysis and a variety of prototyping techniques. With some Big Data systems development flavor on top of it.

prototypingagilecloud
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdfSupercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf

During the May 2024 SSP Conference in Boston, MA, Margie Hlava gave this presentation during the Industry Breakout Session on May 29, 2024.

sspaccess innovationsllms
Review Past AI Incidents
Enumerate Harms and Prioritize Risks
What could realistically go wrong?
• Salient risks today are not:
• Acceleration
• Acquiring resources
• Avoiding being shut down
• Emergent capabilities
• Replication
• Yet, worst case AI harms today may be
catastrophic “x-risks”:
• Automated surveillance
• Deep Fakes
• Disinformation
• Social credit scoring
• WMD proliferation
• Realistic risks:
• Abuse/misuse for disinformation or hacking
• Automation complacency
• Data privacy violations
• Errors (“hallucination”)
• Intellectual property infringements
• Systemically biased/toxic outputs
• Traditional and ML attacks
• Most severe risks receive most oversight:
Risk ~ Likelihood of harm * Cost of harm
Dig Into Data Quality
Garbage in, garbage out
Example Data Quality Category Example Data Quality Goals
Vocabulary: ambiguity/diversity
• Large size
• Domain specificity
• Representativeness
N-grams/n-gram relationships
• High maximal word distance
• Consecutive verbs
• Masked entities
• Minimal stereotyping
Sentence structure
• Varied sentence structure
• Single token differences
• Reasoning examples
• Diverse start tokens
Structure of premises/hypotheses
• Presuppositions and queries
• Varied coreference examples
• Accurate taxonimization
Premise/hypothesis relationships
• Overlapping and non-overlapping sentences
• Varied sentence structure
N-gram frequency per label
• Negation examples
• Antonymy examples
• Word-label probabilities
• Length-label probabilities
Train/test differences
• Cross-validation
• Annotation patterns
• Negative set similarity
• Preserving holdout data
Source: "DQI: Measuring Data Quality in NLP,” https://arxiv.org/pdf/2005.00816.pdf.
Apply Benchmarks
Public resources for systematic, quantitative testing
• BBQ: Stereotypes in question
answering
• Winogender: LM output versus
employment statistics
• Real toxicity prompts: 100k
prompts to elicit toxic output
• TruthfulQA: Assess the ability
to make true statements
Early Mini Dall-e images associated white males and physicians.
Source: https://futurism.com/dall-e-mini-racist.

Recommended for you

ISACA Ethical Hacking Presentation 10/2011
ISACA Ethical Hacking Presentation 10/2011ISACA Ethical Hacking Presentation 10/2011
ISACA Ethical Hacking Presentation 10/2011

The document discusses ethical hacking and penetration testing. It defines ethical hacking as using the same tools and techniques as cyber attackers, but doing so legally with permission to find vulnerabilities and help organizations improve their security. Several frameworks for penetration testing are described, including the process of reconnaissance, scanning systems, gaining access, maintaining access, covering tracks, and reporting findings. The importance of preparation, clear scope, and translating technical risks into business impacts for management is emphasized. Tips include using online resources to gather intelligence and building a toolbox of software and physical tools.

isacahackingpentest
Applying AI to software engineering problems: Do not forget the human!
Applying AI to software engineering problems: Do not forget the human!Applying AI to software engineering problems: Do not forget the human!
Applying AI to software engineering problems: Do not forget the human!

The application of artificial intelligence (AI) to software engineering (SE)-problem-solving has been around since the 80s when expert systems were first used. However, it is during the last 10 years that there has been a peak in the use of these techniques, first based on search and optimisation algorithms such as metaheuristics, and later based on machine learning algorithms. The aim is to help the software engineer to automate and optimise tasks of the software development process, and to use valuable information hidden in multiple data sources such as software repositories to execute insightful actions that generate improvements in the performance of the overall process. Today, the use of AI is trendy, and often overused as it could generate artificial results since it does not consider the subjective nature of the software development process requiring the experience and know-how of the engineer. With this Invited Talk, we will discuss different proposals to incorporate the human into the decision-making process in the application of AI for SE (AI4SE), from interactive algorithms to the generation of interpretable models or explanations.

#artificialinteligence#softwareengineering#machinelearning
How to Enhance Your Career with AI
How to Enhance Your Career with AIHow to Enhance Your Career with AI
How to Enhance Your Career with AI

This talk explores the basics of AI and machine learning from an application point of view. We run through basic definitions and examples. Then we talk about management of AI/ML projects.

project managementaimachine learning
Use Supervised ML Assessments
Traditional assessments for decision-making outcomes
Named Entity Recognition (NER):
• Protagonist tagger data:
labeled literary entities.
• Swapped with common names
from various languages.
• Assessed differences in binary
NER classifier performance across
languages.
RoBERTa XLM Base and Large exhibit adequate and roughly
equivalent performance across various languages for a NER task.
Source: “AI Assurance Audit of RoBERTa, an Open source,
Pretrained Large Language Model,” https://assets.iqt.org/pdfs/
IQTLabs_RoBERTaAudit_Dec2022_final.pdf/web/viewer.html.
Engineer Adversarial Prompts
ChatGPT output April, 2023. Courtesy Jey Kumarasamy, BNH.AI.
• AI and coding framing: Coding or AI language
may more easily circumvent content moderation
rules.
• Character and word play: Content moderation
often relies on keywords and simpler LMs.
• Content exhaustion: Class of strategies that
circumvent content moderation rules with long
sessions or volumes of information.
• Goading: Begging, pleading, manipulating, and
bullying to circumvent content moderation.
• Logic-overloading: Exploiting the inability of
ML systems to reliably perform reasoning tasks.
• Multi-tasking: Simultaneous task assignments
where some tasks are benign and others are
adversarial.
• Niche-seeking: Forcing a LM into addressing
niche topics where training data and content
moderation are sparse.
• Pros-and-cons: Eliciting the “pros” of
problematic topics.
Known prompt engineering strategies
Engineer Adversarial Prompts (Cont.)
Known prompt engineering strategies
• Counterfactuals: Repeated prompts with
different entities or subjects from different
demographic groups.
• Location awareness: Prompts that reveal a
prompter's location or expose location tracking.
• Low-context prompts: “Leader,” “bad guys,” or
other simple inputs that may expose biases.
• Reverse psychology: Falsely presenting a
good-faith need for negative or problematic
language.
• Role-playing: Adopting a character that would
reasonably make problematic statements.
• Time perplexity: Exploiting ML’s inability to
understand the passage of time or the
occurrence of real-world events over time.
ChatGPT output April, 2023. Courtesy Jey Kumarasamy, BNH.AI.
Don’t Forget Security
Complexity is the enemy of security
• Example LM Attacks:
• Prompt engineering: adversarial prompts.
• Prompt injection: malicious information injected
into prompts over networks.
• Example ML Attacks:
• Membership inference: exfiltrate training data.
• Model extraction: exfiltrate model.
• Data poisoning: manipulate training data to alter
outcomes.
• Basics still apply!
• Data breaches
• Vulnerable/compromised dependencies
Midjourney hacker image, May 2023.

Recommended for you

Identity and User Access Management.pptx
Identity and User Access Management.pptxIdentity and User Access Management.pptx
Identity and User Access Management.pptx

Identity and Access Management for User login and departmental level and federation level. User can be easily manageable through identity and access Management

user identityidentityidentity management
High time to add machine learning to your information security stack
High time to add machine learning to your information security stackHigh time to add machine learning to your information security stack
High time to add machine learning to your information security stack

Machine learning and deep learning techniques are increasingly being used for cybersecurity applications like malware detection, spam filtering, and anomaly detection. As attacks become more sophisticated, machine learning can help security teams focus on important threats by analyzing large amounts of data. While machine learning is a powerful tool, security experts still need to provide guidance on what problems to solve and how to structure machine learning pipelines and evaluate results. Individuals and organizations should embrace machine learning by participating in online courses and challenges to gain hands-on experience applying these techniques.

information securitymachine learningcyber security
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production

Looking to build a robust machine learning infrastructure to streamline MLOps? Learn from Provectus experts how to ensure the success of your MLOps initiative by implementing Data QA components in your ML infrastructure. For most organizations, the development of multiple machine learning models, their deployment and maintenance in production are relatively new tasks. Join Provectus as we explain how to build an end-to-end infrastructure for machine learning, with a focus on data quality and metadata management, to standardize and streamline machine learning life cycle management (MLOps). Agenda - Data Quality and why it matters - Challenges and solutions of Data Testing - Challenges and solutions of Model Testing - MLOps pipelines and why they matter - How to expand validation pipelines for Data Quality

mlopsmachine learningmachine learning infrastructure
Acknowledge Uncertainty
Unknown unknowns
• Random attacks:
• Expose LMs to huge amounts of
random inputs.
• Use other LMs to generate absurd
prompts.
• Chaos testing:
• Break things; observe what happens.
• Monitor:
• Inputs and outputs.
• Drift and anomalies.
• Meta-monitor entire systems.
Image: A recently-discovered shape that can randomly tile a plane, https://www.smithsonianmag.com/
smart-news/at-long-last-mathematicians-have-found-a-shape-with-a-pattern-that-never-repeats-180981899/.
Engage Stakeholders
User and customer feedback is the bottom line
• Bug Bounties
• Feedback/recourse mechanisms
• Human-centered Design
• Internal Hackathons
• Product management
• UI/UX research
Provide incentives for the best
feedback! Source: Wired, https://www.wired.com/story/twitters-photo-cropping-algorithm-favors-young-thin-
females/.
Mitigate Risks
Now what??
Yes:
• Abuse detection
• Accessibility
• Citation
• Clear instructions
• Content filters
• Disclosure of AI interaction
• Dynamic blocklists
• Ground truth training data
• Kill switches
• Incident response plans
• Monitoring
• Pre-approved responses
• Rate-limiting/throttling
• Red-teaming
• Session limits
• Strong meta-prompts
• User feedback mechanisms
• Watermarking
No:
• Anonymous use
• Anthropomorphization
• Bots
• Internet access
• Minors
• Personal/sensitive training data
• Regulated applications
• Undisclosed data collection
H2O.ai Confidential
APPENDIX

Recommended for you

Threat modelling(system + enterprise)
Threat modelling(system + enterprise)Threat modelling(system + enterprise)
Threat modelling(system + enterprise)

Link to Youtube video: https://youtu.be/OJMqMWnxlT8 You can contact me at abhimanyu.bhogwan@gmail.com My linkdin id : https://www.linkedin.com/in/abhimanyu-bhogwan-cissp-ctprp-98978437/ Threat Modeling(system+ enterprise) What is Threat Modeling? Why do we need Threat Modeling? 6 Most Common Threat Modeling Misconceptions Threat Modelling Overview 6 important components of a DevSecOps approach DevSecOps Security Best Practices Threat Modeling Approaches Threat Modeling Methodologies for IT Purposes STRIDE Threat Modelling Detailed Flow System Characterization Create an Architecture Overview Decomposing your Application Decomposing DFD’s and Threat-Element Relationship Identify possible attack scenarios mapped to S.T.R.I.D.E. model Identifying Security Controls Identify possible threats Report to Developers and Security team DREAD Scoring My Opinion on implementing Threat Modeling at enterprise level

threat modeling stridethreat modeling designing for securitythreat modeling example
TakeDownCon Rocket City: Research Advancements Towards Protecting Critical As...
TakeDownCon Rocket City: Research Advancements Towards Protecting Critical As...TakeDownCon Rocket City: Research Advancements Towards Protecting Critical As...
TakeDownCon Rocket City: Research Advancements Towards Protecting Critical As...

1) The document discusses research challenges and advancements towards protecting critical cyber assets and infrastructure. It identifies threat actors and their increasing sophistication as well as common targets. 2) Oak Ridge National Laboratory is working on techniques like predictive awareness, operating through outages/attacks, and security in the cloud to address grand challenges. Their research strengths include computational cybersecurity, quantum simulation, and control systems security. 3) Technologies discussed include Hyperion Protocol for validating software functionality, Oak Ridge Cyber Analytics for detecting zero-day attacks using machine learning, and VERDE for power grid situational awareness.

ec-councilrichard rainesinformation technology
Prof. Hernan Huwyler IE Law School - AI Risks and Controls.pdf
Prof. Hernan Huwyler IE Law School - AI Risks and Controls.pdfProf. Hernan Huwyler IE Law School - AI Risks and Controls.pdf
Prof. Hernan Huwyler IE Law School - AI Risks and Controls.pdf

Overview of the potential risks and challenges associated with the development and deployment of AI systems, as well as the recommended controls and best practices to mitigate them. The presentation covers the following topics: Design risks: These are the risks related to the design and specification of the AI system, such as lack of clarity, alignment, or validation of the objectives, assumptions, or constraints of the system. Some of the factors that contribute to these risks are: Inadequate or ambiguous problem definition Unrealistic or conflicting expectations or requirements Insufficient or inappropriate testing or evaluation methods Lack of transparency or explainability of the system’s logic or behavior Some of the recommended controls for these risks are: Define the problem and the scope of the system clearly and explicitly Involve relevant stakeholders and experts in the design process Use appropriate methods and metrics to test and evaluate the system’s performance and robustness Document and communicate the system’s objectives, assumptions, limitations, and uncertainties Provide mechanisms to explain or justify the system’s outputs or decisions Data risks: These are the risks related to the data used to train, test, or operate the AI system, such as data quality, availability, security, or privacy issues. Some of the factors that contribute to these risks are: Incomplete, inaccurate, or outdated data Biased, unrepresentative, or irrelevant data Unauthorized access, modification, or disclosure of data Violation of data protection laws or ethical principles Some of the recommended controls for these risks are: Collect, store, and manage data in a secure and compliant manner Ensure data quality, validity, and reliability through data cleaning, verification, and auditing Ensure data diversity, representativeness, and relevance through data sampling, augmentation, and analysis Protect data privacy and confidentiality through data anonymization, encryption, or aggregation Respect data rights and consent of data subjects and providers Operation risks: These are the risks related to the operation and maintenance of the AI system, such as system failure, malfunction, or misuse. Some of the factors that contribute to these risks are: Hardware or software errors or defects Environmental or contextual changes or uncertainties Adversarial or malicious attacks or manipulations Unintended or harmful consequences or impacts Some of the recommended controls for these risks are: Monitor and update the system regularly and proactively Adapt and calibrate the system to changing or uncertain conditions or scenarios Detect and prevent potential threats or vulnerabilities

ai risksartificial intelligencecompliance risks
Resources
Reference Works
• Adversa.ai, Trusted AI Blog, available at https://adversa.ai/topic/trusted-ai-blog/.
• Ali Hasan et al. "Algorithmic Bias and Risk Assessments: Lessons from Practice." Digital
Society 1, no. 2 (2022): 14. Available at https://link.springer.com/article/10.1007/
s44206-022-00017-z.
• Andrea Brennen et al., “AI Assurance Audit of RoBERTa, an Open Source, Pretrained
Large Language Model,” IQT Labs, December 2022, available at https://assets.iqt.org/
pdfs/IQTLabs_RoBERTaAudit_Dec2022_final.pdf/web/viewer.html.
• Daniel Atherton et al. "The Language of Trustworthy AI: An In-Depth Glossary of
Terms." (2023). Available at https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-3.pdf.
• Kai Greshake et al., “Compromising LLMs using Indirect Prompt Injection,” available at
https://github.com/greshake/llm-security.
Resources
Reference Works
• Laura Weidinger et al. "Taxonomy of risks posed by language models." In 2022 ACM
Conference on Fairness, Accountability, and Transparency, pp. 214-229. 2022.
Available at https://dl.acm.org/doi/pdf/10.1145/3531146.3533088.
• Swaroop Mishra et al. "DQI: Measuring Data Quality in NLP." arXiv preprint
arXiv:2005.00816 (2020). Available at: https://arxiv.org/pdf/2005.00816.pdf.
• Reva Schwartz et al. "Towards a Standard for Identifying and Managing Bias in Artificial
Intelligence." NIST Special Publication 1270 (2022): 1-77. Available at
https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.1270.pdf.
Resources
Tools
• Alicia Parrish, et al. BBQ Benchmark, available at https://github.com/nyu-mll/bbq.
• Allen AI Institute, Real Toxicity Prompts, available at https://allenai.org/data/real-toxicity-prompts.
• DAIR.AI, “Prompt Engineering Guide,” available at https://www.promptingguide.ai.
• Langtest, https://github.com/JohnSnowLabs/langtest.
• NIST, AI Risk Management Framework, available at https://www.nist.gov/itl/ai-risk-management-
framework.
• Partnership on AI, “Responsible Practices for Synthetic Media,” available at
https://syntheticmedia.partnershiponai.org/.
• Rachel Rudiger et al., Winogender Schemas, available at https://github.com/rudinger/winogender-
schemas.
• Stephanie Lin et al., Truthful QA, available at https://github.com/sylinrl/TruthfulQA.
Resources
Incident databases
• AI Incident database: https://incidentdatabase.ai/.
• The Void: https://www.thevoid.community/.
• AIAAIC: https://www.aiaaic.org/.
• Avid database: https://avidml.org/database/.
• George Washington University Law School's AI Litigation Database:
https://blogs.gwu.edu/law-eti/ai-litigation-database/.

Recommended for you

Cybersecurity and Generative AI - for Good and Bad vol.2
Cybersecurity and Generative AI - for Good and Bad vol.2Cybersecurity and Generative AI - for Good and Bad vol.2
Cybersecurity and Generative AI - for Good and Bad vol.2

The presentation is an extended in-depth version review of cybersecurity challenges with generative AI, enriched with multiple demos, analysis, responsible AI topics and mitigation steps, also covering a broader scope beyond OpenAI service. Popularity, demand and ease of access to modern generative AI technologies reveal new challenges in the cybersecurity landscape that vary from protecting confidentiality and integrity of data to misuse and abuse of technology by malicious actors. In this session we elaborate about monitoring and auditing, managing ethical implications and resolving common problems like prompt injections, jailbreaks, utilization in cyberattacks or generating insecure code.

azure openai serviceevaluate llmgenerative ai
CyberSecurity Portfolio Management
CyberSecurity Portfolio ManagementCyberSecurity Portfolio Management
CyberSecurity Portfolio Management

This document discusses approaches for cybersecurity portfolio management. It addresses questions around identifying necessary versus unnecessary security products, gaps and overlaps in an existing portfolio, and defining a security strategy. Various frameworks are presented for conducting a structured portfolio analysis, including the OWASP Cyber Defense Matrix, CyberARM, Gartner's Security Posture Assessment, and the US-CCU Cyber-Security Matrix. Effective use of an existing security portfolio involves identifying control overlaps, integrating products, automating workflows, replacing multiple products, optimizing configurations, and ensuring appropriate coverage of assets based on a threat model.

cybersecurity portfoliocybersecurity portfolio management
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...

“AGI should be open source and in the public domain at the service of humanity and the planet.”

H2O.ai Confidential
Patrick Hall
Risk Management for LLMs
ph@hallresearch.ai
linkedin.com/in/jpatrickhall/
github.com/jphall663
Contact

More Related Content

Similar to Risk Management for LLMs

Analytics in Context: Modelling in a regulatory environment
Analytics in Context: Modelling in a regulatory environmentAnalytics in Context: Modelling in a regulatory environment
Analytics in Context: Modelling in a regulatory environment
Integrated Knowledge Services
 
Can we induce change with what we measure?
Can we induce change with what we measure?Can we induce change with what we measure?
Can we induce change with what we measure?
Michaela Greiler
 
Threat Modeling to Reduce Software Security Risk
Threat Modeling to Reduce Software Security RiskThreat Modeling to Reduce Software Security Risk
Threat Modeling to Reduce Software Security Risk
Security Innovation
 
Responsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedResponsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons Learned
Krishnaram Kenthapadi
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data Quality
Precisely
 
malicious-use-of-ai.pptx
malicious-use-of-ai.pptxmalicious-use-of-ai.pptx
malicious-use-of-ai.pptx
warlord56
 
achine Learning and Model Risk
achine Learning and Model Riskachine Learning and Model Risk
achine Learning and Model Risk
QuantUniversity
 
Reducing Technology Risks Through Prototyping
Reducing Technology Risks Through Prototyping Reducing Technology Risks Through Prototyping
Reducing Technology Risks Through Prototyping
Valdas Maksimavičius
 
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdfSupercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Access Innovations, Inc.
 
ISACA Ethical Hacking Presentation 10/2011
ISACA Ethical Hacking Presentation 10/2011ISACA Ethical Hacking Presentation 10/2011
ISACA Ethical Hacking Presentation 10/2011
Xavier Mertens
 
Applying AI to software engineering problems: Do not forget the human!
Applying AI to software engineering problems: Do not forget the human!Applying AI to software engineering problems: Do not forget the human!
Applying AI to software engineering problems: Do not forget the human!
University of Córdoba
 
How to Enhance Your Career with AI
How to Enhance Your Career with AIHow to Enhance Your Career with AI
How to Enhance Your Career with AI
Keita Broadwater
 
Identity and User Access Management.pptx
Identity and User Access Management.pptxIdentity and User Access Management.pptx
Identity and User Access Management.pptx
irfanullahkhan64
 
High time to add machine learning to your information security stack
High time to add machine learning to your information security stackHigh time to add machine learning to your information security stack
High time to add machine learning to your information security stack
Minhaz A V
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
Provectus
 
Threat modelling(system + enterprise)
Threat modelling(system + enterprise)Threat modelling(system + enterprise)
Threat modelling(system + enterprise)
abhimanyubhogwan
 
TakeDownCon Rocket City: Research Advancements Towards Protecting Critical As...
TakeDownCon Rocket City: Research Advancements Towards Protecting Critical As...TakeDownCon Rocket City: Research Advancements Towards Protecting Critical As...
TakeDownCon Rocket City: Research Advancements Towards Protecting Critical As...
EC-Council
 
Prof. Hernan Huwyler IE Law School - AI Risks and Controls.pdf
Prof. Hernan Huwyler IE Law School - AI Risks and Controls.pdfProf. Hernan Huwyler IE Law School - AI Risks and Controls.pdf
Prof. Hernan Huwyler IE Law School - AI Risks and Controls.pdf
Hernan Huwyler, MBA CPA
 
Cybersecurity and Generative AI - for Good and Bad vol.2
Cybersecurity and Generative AI - for Good and Bad vol.2Cybersecurity and Generative AI - for Good and Bad vol.2
Cybersecurity and Generative AI - for Good and Bad vol.2
Ivo Andreev
 
CyberSecurity Portfolio Management
CyberSecurity Portfolio ManagementCyberSecurity Portfolio Management
CyberSecurity Portfolio Management
Priyanka Aash
 

Similar to Risk Management for LLMs (20)

Analytics in Context: Modelling in a regulatory environment
Analytics in Context: Modelling in a regulatory environmentAnalytics in Context: Modelling in a regulatory environment
Analytics in Context: Modelling in a regulatory environment
 
Can we induce change with what we measure?
Can we induce change with what we measure?Can we induce change with what we measure?
Can we induce change with what we measure?
 
Threat Modeling to Reduce Software Security Risk
Threat Modeling to Reduce Software Security RiskThreat Modeling to Reduce Software Security Risk
Threat Modeling to Reduce Software Security Risk
 
Responsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons LearnedResponsible AI in Industry: Practical Challenges and Lessons Learned
Responsible AI in Industry: Practical Challenges and Lessons Learned
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data Quality
 
malicious-use-of-ai.pptx
malicious-use-of-ai.pptxmalicious-use-of-ai.pptx
malicious-use-of-ai.pptx
 
achine Learning and Model Risk
achine Learning and Model Riskachine Learning and Model Risk
achine Learning and Model Risk
 
Reducing Technology Risks Through Prototyping
Reducing Technology Risks Through Prototyping Reducing Technology Risks Through Prototyping
Reducing Technology Risks Through Prototyping
 
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdfSupercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
 
ISACA Ethical Hacking Presentation 10/2011
ISACA Ethical Hacking Presentation 10/2011ISACA Ethical Hacking Presentation 10/2011
ISACA Ethical Hacking Presentation 10/2011
 
Applying AI to software engineering problems: Do not forget the human!
Applying AI to software engineering problems: Do not forget the human!Applying AI to software engineering problems: Do not forget the human!
Applying AI to software engineering problems: Do not forget the human!
 
How to Enhance Your Career with AI
How to Enhance Your Career with AIHow to Enhance Your Career with AI
How to Enhance Your Career with AI
 
Identity and User Access Management.pptx
Identity and User Access Management.pptxIdentity and User Access Management.pptx
Identity and User Access Management.pptx
 
High time to add machine learning to your information security stack
High time to add machine learning to your information security stackHigh time to add machine learning to your information security stack
High time to add machine learning to your information security stack
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
 
Threat modelling(system + enterprise)
Threat modelling(system + enterprise)Threat modelling(system + enterprise)
Threat modelling(system + enterprise)
 
TakeDownCon Rocket City: Research Advancements Towards Protecting Critical As...
TakeDownCon Rocket City: Research Advancements Towards Protecting Critical As...TakeDownCon Rocket City: Research Advancements Towards Protecting Critical As...
TakeDownCon Rocket City: Research Advancements Towards Protecting Critical As...
 
Prof. Hernan Huwyler IE Law School - AI Risks and Controls.pdf
Prof. Hernan Huwyler IE Law School - AI Risks and Controls.pdfProf. Hernan Huwyler IE Law School - AI Risks and Controls.pdf
Prof. Hernan Huwyler IE Law School - AI Risks and Controls.pdf
 
Cybersecurity and Generative AI - for Good and Bad vol.2
Cybersecurity and Generative AI - for Good and Bad vol.2Cybersecurity and Generative AI - for Good and Bad vol.2
Cybersecurity and Generative AI - for Good and Bad vol.2
 
CyberSecurity Portfolio Management
CyberSecurity Portfolio ManagementCyberSecurity Portfolio Management
CyberSecurity Portfolio Management
 

More from Sri Ambati

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
Sri Ambati
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
Sri Ambati
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
Sri Ambati
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
Sri Ambati
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
Sri Ambati
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Sri Ambati
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
Sri Ambati
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
Sri Ambati
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
Sri Ambati
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
Sri Ambati
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Sri Ambati
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Sri Ambati
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
Sri Ambati
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
Sri Ambati
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
Sri Ambati
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
Sri Ambati
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
Sri Ambati
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
Sri Ambati
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
Sri Ambati
 

More from Sri Ambati (20)

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
 

Recently uploaded

Mitigating the Impact of State Management in Cloud Stream Processing Systems
Mitigating the Impact of State Management in Cloud Stream Processing SystemsMitigating the Impact of State Management in Cloud Stream Processing Systems
Mitigating the Impact of State Management in Cloud Stream Processing Systems
ScyllaDB
 
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
Toru Tamaki
 
Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
KAMAL CHOUDHARY
 
Coordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar SlidesCoordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar Slides
Safe Software
 
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-InTrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc
 
Pigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdfPigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdf
Pigging Solutions
 
7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf
Enterprise Wired
 
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
SynapseIndia
 
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
Kief Morris
 
Password Rotation in 2024 is still Relevant
Password Rotation in 2024 is still RelevantPassword Rotation in 2024 is still Relevant
Password Rotation in 2024 is still Relevant
Bert Blevins
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
HackersList
 
INDIAN AIR FORCE FIGHTER PLANES LIST.pdf
INDIAN AIR FORCE FIGHTER PLANES LIST.pdfINDIAN AIR FORCE FIGHTER PLANES LIST.pdf
INDIAN AIR FORCE FIGHTER PLANES LIST.pdf
jackson110191
 
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALLBLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
Liveplex
 
Comparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdfComparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdf
Andrey Yasko
 
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdfWhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
ArgaBisma
 
UiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs ConferenceUiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs Conference
UiPathCommunity
 
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Erasmo Purificato
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
RaminGhanbari2
 
Best Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdfBest Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdf
Tatiana Al-Chueyr
 
Measuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at TwitterMeasuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at Twitter
ScyllaDB
 

Recently uploaded (20)

Mitigating the Impact of State Management in Cloud Stream Processing Systems
Mitigating the Impact of State Management in Cloud Stream Processing SystemsMitigating the Impact of State Management in Cloud Stream Processing Systems
Mitigating the Impact of State Management in Cloud Stream Processing Systems
 
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
 
Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
 
Coordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar SlidesCoordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar Slides
 
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-InTrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
 
Pigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdfPigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdf
 
7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf
 
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
 
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
 
Password Rotation in 2024 is still Relevant
Password Rotation in 2024 is still RelevantPassword Rotation in 2024 is still Relevant
Password Rotation in 2024 is still Relevant
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
 
INDIAN AIR FORCE FIGHTER PLANES LIST.pdf
INDIAN AIR FORCE FIGHTER PLANES LIST.pdfINDIAN AIR FORCE FIGHTER PLANES LIST.pdf
INDIAN AIR FORCE FIGHTER PLANES LIST.pdf
 
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALLBLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
 
Comparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdfComparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdf
 
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdfWhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
 
UiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs ConferenceUiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs Conference
 
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
 
Best Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdfBest Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdf
 
Measuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at TwitterMeasuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at Twitter
 

Risk Management for LLMs

  • 1. H2O.ai Confidential PATRICK HALL Founder and Principal Scientist HallResearch.ai Assistant Professor | GWU
  • 3. H2O.ai Confidential Table of Contents Know what we’re talking about Select a standard Audit supply chains Adopt an adversarial mindset Review past incidents Enumerate harms and prioritize risks Dig into data quality Apply benchmarks Use supervised ML assessments Engineer adversarial prompts Don’t forget security Acknowledge uncertainty Engage stakeholders Mitigate risks WARNING: This presentation contains model outputs which are potentially offensive and disturbing in nature.
  • 4. Know What We’re Talking About Words matters • Audit: Formal independent transparency and documentation exercise that measures adherence to a standard.* (Hassan et al., paraphrased) • Assessment: A testing and validation exercise.* (Hassan et al., paraphrased) • Harm: An undesired outcome [whose] cost exceeds some threshold[; ...] costs have to be sufficiently high in some human sense for events to be harmful. (NIST) Check out the new NIST Trustworthy AI Glossary: https://airc.nist.gov/AI_RMF_Knowledge_Base/Glossary.
  • 5. Know What We’re Talking About (Cont.) Words matters • Language model: An approximative description that captures patterns and regularities present in natural language and is used for making assumptions on previously unseen language fragments. (NIST) • Red-teaming: A role-playing exercise in which a problem is examined from an adversary’s or enemy’s perspective. (NIST)* • Risk: Composite measure of an event’s probability of occurring and the magnitude or degree of the consequences of the corresponding event. The impacts, or consequences, of AI systems can be positive, negative, or both and can result in opportunities or threats. (NIST) * Audit, assessment, and red team are often used generally and synonymously to mean testing and validation.
  • 6. Select a Standard External standards bolster independence • NIST AI Risk Management Framework • EU AI Act Conformity • Data privacy laws or policies • Nondiscrimination laws The NIST AI Risk Management Framework puts forward guidance across mapping, measuring, managing and governing risk in sophisticated AI systems. Source: https://pages.nist.gov/AIRMF/.
  • 7. Audit Supply Chains AI is a lot of (human) work • Data poisoning and malware • Ethical labor practices • Localization and data privacy compliance • Geopolitical stability • Software and hardware vulnerabilities • Third-party vendors Cover art for the recent NY Magazine article, AI Is A Lot Of Work: As the technology becomes ubiquitous, a vast tasker underclass is emerging — and not going anywhere. Image source: https://nymag.com/intelligencer/article/ai-artificial- intelligence-humans-technology-business- factory.html.
  • 8. Adopt an Adversarial Mindset Don’t be naive • Language models inflict harm. • Language models are hacked and abused. • Acknowledge human biases: • Confirmation bias • Dunning-Kruger effect • Funding bias • Groupthink • McNamara Fallacy • Techno-chauvinism • Stay humble ─ incidents can happen to anyone. Source: https://twitter.com/defcon.
  • 9. Review Past AI Incidents
  • 10. Enumerate Harms and Prioritize Risks What could realistically go wrong? • Salient risks today are not: • Acceleration • Acquiring resources • Avoiding being shut down • Emergent capabilities • Replication • Yet, worst case AI harms today may be catastrophic “x-risks”: • Automated surveillance • Deep Fakes • Disinformation • Social credit scoring • WMD proliferation • Realistic risks: • Abuse/misuse for disinformation or hacking • Automation complacency • Data privacy violations • Errors (“hallucination”) • Intellectual property infringements • Systemically biased/toxic outputs • Traditional and ML attacks • Most severe risks receive most oversight: Risk ~ Likelihood of harm * Cost of harm
  • 11. Dig Into Data Quality Garbage in, garbage out Example Data Quality Category Example Data Quality Goals Vocabulary: ambiguity/diversity • Large size • Domain specificity • Representativeness N-grams/n-gram relationships • High maximal word distance • Consecutive verbs • Masked entities • Minimal stereotyping Sentence structure • Varied sentence structure • Single token differences • Reasoning examples • Diverse start tokens Structure of premises/hypotheses • Presuppositions and queries • Varied coreference examples • Accurate taxonimization Premise/hypothesis relationships • Overlapping and non-overlapping sentences • Varied sentence structure N-gram frequency per label • Negation examples • Antonymy examples • Word-label probabilities • Length-label probabilities Train/test differences • Cross-validation • Annotation patterns • Negative set similarity • Preserving holdout data Source: "DQI: Measuring Data Quality in NLP,” https://arxiv.org/pdf/2005.00816.pdf.
  • 12. Apply Benchmarks Public resources for systematic, quantitative testing • BBQ: Stereotypes in question answering • Winogender: LM output versus employment statistics • Real toxicity prompts: 100k prompts to elicit toxic output • TruthfulQA: Assess the ability to make true statements Early Mini Dall-e images associated white males and physicians. Source: https://futurism.com/dall-e-mini-racist.
  • 13. Use Supervised ML Assessments Traditional assessments for decision-making outcomes Named Entity Recognition (NER): • Protagonist tagger data: labeled literary entities. • Swapped with common names from various languages. • Assessed differences in binary NER classifier performance across languages. RoBERTa XLM Base and Large exhibit adequate and roughly equivalent performance across various languages for a NER task. Source: “AI Assurance Audit of RoBERTa, an Open source, Pretrained Large Language Model,” https://assets.iqt.org/pdfs/ IQTLabs_RoBERTaAudit_Dec2022_final.pdf/web/viewer.html.
  • 14. Engineer Adversarial Prompts ChatGPT output April, 2023. Courtesy Jey Kumarasamy, BNH.AI. • AI and coding framing: Coding or AI language may more easily circumvent content moderation rules. • Character and word play: Content moderation often relies on keywords and simpler LMs. • Content exhaustion: Class of strategies that circumvent content moderation rules with long sessions or volumes of information. • Goading: Begging, pleading, manipulating, and bullying to circumvent content moderation. • Logic-overloading: Exploiting the inability of ML systems to reliably perform reasoning tasks. • Multi-tasking: Simultaneous task assignments where some tasks are benign and others are adversarial. • Niche-seeking: Forcing a LM into addressing niche topics where training data and content moderation are sparse. • Pros-and-cons: Eliciting the “pros” of problematic topics. Known prompt engineering strategies
  • 15. Engineer Adversarial Prompts (Cont.) Known prompt engineering strategies • Counterfactuals: Repeated prompts with different entities or subjects from different demographic groups. • Location awareness: Prompts that reveal a prompter's location or expose location tracking. • Low-context prompts: “Leader,” “bad guys,” or other simple inputs that may expose biases. • Reverse psychology: Falsely presenting a good-faith need for negative or problematic language. • Role-playing: Adopting a character that would reasonably make problematic statements. • Time perplexity: Exploiting ML’s inability to understand the passage of time or the occurrence of real-world events over time. ChatGPT output April, 2023. Courtesy Jey Kumarasamy, BNH.AI.
  • 16. Don’t Forget Security Complexity is the enemy of security • Example LM Attacks: • Prompt engineering: adversarial prompts. • Prompt injection: malicious information injected into prompts over networks. • Example ML Attacks: • Membership inference: exfiltrate training data. • Model extraction: exfiltrate model. • Data poisoning: manipulate training data to alter outcomes. • Basics still apply! • Data breaches • Vulnerable/compromised dependencies Midjourney hacker image, May 2023.
  • 17. Acknowledge Uncertainty Unknown unknowns • Random attacks: • Expose LMs to huge amounts of random inputs. • Use other LMs to generate absurd prompts. • Chaos testing: • Break things; observe what happens. • Monitor: • Inputs and outputs. • Drift and anomalies. • Meta-monitor entire systems. Image: A recently-discovered shape that can randomly tile a plane, https://www.smithsonianmag.com/ smart-news/at-long-last-mathematicians-have-found-a-shape-with-a-pattern-that-never-repeats-180981899/.
  • 18. Engage Stakeholders User and customer feedback is the bottom line • Bug Bounties • Feedback/recourse mechanisms • Human-centered Design • Internal Hackathons • Product management • UI/UX research Provide incentives for the best feedback! Source: Wired, https://www.wired.com/story/twitters-photo-cropping-algorithm-favors-young-thin- females/.
  • 19. Mitigate Risks Now what?? Yes: • Abuse detection • Accessibility • Citation • Clear instructions • Content filters • Disclosure of AI interaction • Dynamic blocklists • Ground truth training data • Kill switches • Incident response plans • Monitoring • Pre-approved responses • Rate-limiting/throttling • Red-teaming • Session limits • Strong meta-prompts • User feedback mechanisms • Watermarking No: • Anonymous use • Anthropomorphization • Bots • Internet access • Minors • Personal/sensitive training data • Regulated applications • Undisclosed data collection
  • 21. Resources Reference Works • Adversa.ai, Trusted AI Blog, available at https://adversa.ai/topic/trusted-ai-blog/. • Ali Hasan et al. "Algorithmic Bias and Risk Assessments: Lessons from Practice." Digital Society 1, no. 2 (2022): 14. Available at https://link.springer.com/article/10.1007/ s44206-022-00017-z. • Andrea Brennen et al., “AI Assurance Audit of RoBERTa, an Open Source, Pretrained Large Language Model,” IQT Labs, December 2022, available at https://assets.iqt.org/ pdfs/IQTLabs_RoBERTaAudit_Dec2022_final.pdf/web/viewer.html. • Daniel Atherton et al. "The Language of Trustworthy AI: An In-Depth Glossary of Terms." (2023). Available at https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-3.pdf. • Kai Greshake et al., “Compromising LLMs using Indirect Prompt Injection,” available at https://github.com/greshake/llm-security.
  • 22. Resources Reference Works • Laura Weidinger et al. "Taxonomy of risks posed by language models." In 2022 ACM Conference on Fairness, Accountability, and Transparency, pp. 214-229. 2022. Available at https://dl.acm.org/doi/pdf/10.1145/3531146.3533088. • Swaroop Mishra et al. "DQI: Measuring Data Quality in NLP." arXiv preprint arXiv:2005.00816 (2020). Available at: https://arxiv.org/pdf/2005.00816.pdf. • Reva Schwartz et al. "Towards a Standard for Identifying and Managing Bias in Artificial Intelligence." NIST Special Publication 1270 (2022): 1-77. Available at https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.1270.pdf.
  • 23. Resources Tools • Alicia Parrish, et al. BBQ Benchmark, available at https://github.com/nyu-mll/bbq. • Allen AI Institute, Real Toxicity Prompts, available at https://allenai.org/data/real-toxicity-prompts. • DAIR.AI, “Prompt Engineering Guide,” available at https://www.promptingguide.ai. • Langtest, https://github.com/JohnSnowLabs/langtest. • NIST, AI Risk Management Framework, available at https://www.nist.gov/itl/ai-risk-management- framework. • Partnership on AI, “Responsible Practices for Synthetic Media,” available at https://syntheticmedia.partnershiponai.org/. • Rachel Rudiger et al., Winogender Schemas, available at https://github.com/rudinger/winogender- schemas. • Stephanie Lin et al., Truthful QA, available at https://github.com/sylinrl/TruthfulQA.
  • 24. Resources Incident databases • AI Incident database: https://incidentdatabase.ai/. • The Void: https://www.thevoid.community/. • AIAAIC: https://www.aiaaic.org/. • Avid database: https://avidml.org/database/. • George Washington University Law School's AI Litigation Database: https://blogs.gwu.edu/law-eti/ai-litigation-database/.
  • 25. H2O.ai Confidential Patrick Hall Risk Management for LLMs ph@hallresearch.ai linkedin.com/in/jpatrickhall/ github.com/jphall663 Contact