Here are some key points about benchmarking and evaluating generative AI models like large language models:
- Foundation models require large, diverse datasets to be trained on in order to learn broad language skills and knowledge. Fine-tuning can then improve performance on specific tasks.
- Popular benchmarks evaluate models on tasks involving things like commonsense reasoning, mathematics, science questions, generating truthful vs false responses, and more. This helps identify model capabilities and limitations.
- Custom benchmarks can also be designed using tools like Eval Studio to systematically test models on specific applications or scenarios. Both automated and human evaluations are important.
- Leaderboards like HELM aggregate benchmark results to compare how different models perform across a wide range of tests and metrics.
This document discusses AI and ChatGPT. It begins with an introduction to David Cieslak and his company RKL eSolutions, which provides ERP sales and consulting. It then defines key AI concepts such as artificial intelligence, generative AI, large language models, and ChatGPT. The document describes OpenAI's ChatGPT tool and how it works, covering prompts, commands, and the potential uses and impacts of generative AI technologies. Finally, it discusses concerns regarding generative AI and the Future of Life Institute's call for more oversight of advanced AI.
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) combines semantic search with LLM-based text generation. When a person makes a query in natural language, the query is compared to the entries in the knowledge base, and the most relevant results are returned to the LLM, which uses this extra information to generate a more accurate and reliable response. RAG can therefore limit hallucination and ground answers in reliable sources. In this talk, we will present the concept of RAG and the underlying concept of semantic search, and survey the available libraries and vector databases.
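The retrieval step described above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: a word-overlap score stands in for a real embedding model, and a tiny in-memory list stands in for a vector database.

```python
# Minimal sketch of RAG retrieval: embed the query, rank the
# knowledge base by similarity, and build a grounded prompt.
from collections import Counter
import math

knowledge_base = [
    "RAG combines semantic search with LLM text generation.",
    "Vector databases store document embeddings for retrieval.",
    "Fine-tuning adapts a model to a specific task.",
]

def embed(text):
    # Stand-in for a sentence-embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(knowledge_base, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query):
    # The retrieved context is prepended so the LLM answers from it.
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("How does RAG use semantic search?")
```

In a real system, `embed` would call a sentence-transformer and `retrieve` would query a vector database such as Weaviate or ChromaDB; the overall flow is the same.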
This is a gentle introduction to Natural Language Generation (NLG) using deep learning, aimed at computer science practitioners with basic knowledge of machine learning. It takes you on a journey from the basic intuitions behind modeling language and the probabilities of sequences, through recurrent neural networks, to the large Transformer models you have seen in the news, like GPT-2 and GPT-3. The tutorial wraps up with a summary of the ethical implications of training such large language models on uncurated text from the internet.
Unlocking the Power of Generative AI An Executive's Guide.pdf
Generative AI is here, and it can revolutionize your business. With its powerful capabilities, this technology can help companies create more efficient processes, unlock new insights from data, and drive innovation. But how do you make the most of these opportunities?
This guide will provide you with the information and resources needed to understand the ins and outs of Generative AI, so you can make informed decisions and capitalize on the potential. It covers important topics such as strategies for leveraging large language models, optimizing MLOps processes, and best practices for building with Generative AI.
Exploring Opportunities in the Generative AI Value Chain.pdf
The article "Exploring Opportunities in the Generative AI Value Chain" by McKinsey & Company's QuantumBlack provides insights into the value created by generative artificial intelligence (AI) and its potential applications.
What are the "use case patterns" for deploying LLMs into production? Understanding these will allow you to spot "LLM-shaped" problems in your own industry.
Build an LLM-powered application using LangChain.pdf
LangChain is an advanced framework that allows developers to create language model-powered applications. It provides a set of tools, components, and interfaces that make building LLM-based applications easier. With LangChain, managing interactions with language models, chaining together various components, and integrating resources like APIs and databases is a breeze. The platform includes a set of APIs that can be integrated into applications, allowing developers to add language processing capabilities without having to start from scratch.
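The chaining pattern described above can be illustrated without installing anything. The classes below are simplified stand-ins, not LangChain's actual API: a real application would use LangChain's own prompt-template and chat-model classes in place of these.

```python
# Sketch of the "chain" pattern: a prompt template is filled in,
# then piped to a language model. The LLM here is a stub so the
# example runs without an API key.
class PromptTemplate:
    def __init__(self, template):
        self.template = template

    def format(self, **kwargs):
        return self.template.format(**kwargs)

def stub_llm(prompt):
    # Placeholder for a call to a hosted language model.
    return f"[model answer to: {prompt}]"

class Chain:
    def __init__(self, template, llm):
        self.template, self.llm = template, llm

    def run(self, **kwargs):
        # Chaining: template output becomes the model input.
        return self.llm(self.template.format(**kwargs))

chain = Chain(PromptTemplate("Summarize {topic} in one sentence."), stub_llm)
result = chain.run(topic="vector databases")
```

The value of the framework is that these components compose: the output of one chain can feed the template of the next, and resources like APIs or databases can be slotted in as additional steps.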
The Future of AI is Generative not Discriminative 5/26/2021
The deep learning AI revolution has been sweeping the world for a decade now. Deep neural nets are routinely used for tasks like translation, fraud detection, and image classification. PwC estimates that they will create $15.7 trillion/year of value by 2030. But most current networks are "discriminative" in that they directly map inputs to predictions. This type of model requires lots of training examples, doesn't generalize well outside of its training set, creates inscrutable representations, is subject to adversarial examples, and makes knowledge transfer difficult. People, in contrast, can learn from just a few examples, generalize far beyond their experience, and can easily transfer and reuse knowledge. In recent years, new kinds of "generative" AI models have begun to exhibit these desirable human characteristics. They represent the causal generative processes by which the data is created and can be compositional, compact, and directly interpretable. Generative AI systems that assist people can model their needs and desires and interact with empathy. Their adaptability to changing circumstances will likely be required by rapidly changing AI-driven business and social systems. Generative AI will be the engine of future AI innovation.
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Mihai is the Principal Architect for Platform Engineering and Technology Solutions at IBM, responsible for Cloud Native and AI Solutions. He is a Red Hat Certified Architect, CKA/CKS, a leader in the IBM Open Innovation community, and advocate for open source development. Mihai is driving the development of Retrieval Augmentation Generation platforms, and solutions for Generative AI at IBM that leverage WatsonX, Vector databases, LangChain, HuggingFace and open source AI models.
Mihai will share lessons learned building Retrieval Augmented Generation, or “Chat with Documents” platforms and APIs that scale, and deploy on Kubernetes. His talk will cover use cases for Generative AI, limitations of Large Language Models, use of RAG, Vector Databases and Fine Tuning to overcome model limitations and build solutions that connect to your data and provide content grounding, limit hallucinations and form the basis of explainable AI. In terms of technology, he will cover LLAMA2, HuggingFace TGIS, SentenceTransformers embedding models using Python, LangChain, and Weaviate and ChromaDB vector databases. He’ll also share tips on writing code using LLM, including building an agent for Ansible and containers.
Scaling factors for Large Language Model Architectures:
• Vector Database: consider sharding and High Availability
• Fine Tuning: collecting data to be used for fine-tuning
• Governance and Model Benchmarking: how are you testing your model performance over time, with different prompts, one-shot, and various parameters
• Chain of Reasoning and Agents
• Caching embeddings and responses
• Personalization and Conversational Memory Database
• Streaming Responses and optimizing performance. A fine-tuned 13B model may perform better than a poor 70B one!
• Calling 3rd-party functions or APIs for reasoning or other types of data (e.g., LLMs are terrible at reasoning and prediction; consider calling other models)
• Fallback techniques: fall back to a different model, or default answers
• API scaling techniques, rate limiting, etc.
• Async, streaming and parallelization, multiprocessing, GPU acceleration (including embeddings), generating your API using OpenAPI, etc.
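As one concrete example, the fallback technique in the list above can be sketched like this; the model functions are hypothetical stand-ins for real API calls.

```python
# Fallback sketch: try each model in order, and return a default
# answer if every model fails.
def primary_model(prompt):
    # Stand-in for a large hosted model that happens to be down.
    raise TimeoutError("primary model unavailable")

def secondary_model(prompt):
    # Stand-in for a smaller, self-hosted backup model.
    return f"secondary: {prompt}"

def answer(prompt, models, default="Sorry, please try again later."):
    for model in models:
        try:
            return model(prompt)
        except Exception:
            continue  # fall through to the next model
    return default

reply = answer("What is RAG?", [primary_model, secondary_model])
```

The same pattern extends naturally to rate limiting and caching: a cache lookup is just the first "model" in the list.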
In this event we will cover:
- What is Generative AI and how it is being used for the future of work.
- Best practices for developing and deploying generative AI-based models in production.
- The future of Generative AI: how generative AI is expected to evolve in the coming years.
The document discusses advances in large language models from GPT-1 to the potential capabilities of GPT-4, including its ability to simulate human behavior, demonstrate sparks of artificial general intelligence, and generate virtual identities. It also provides tips on how to effectively prompt ChatGPT through techniques like prompt engineering, giving context and examples, and different response formats.
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
👉This first session will cover an introduction to Generative AI & harnessing the power of large language models. The following topics will be discussed:
Introduction to Generative AI & harnessing the power of large language models.
What’s generative AI & what’s LLM.
How are we using it in our document understanding & communication mining models?
How to develop a trustworthy and unbiased AI model using LLM & GenAI.
Personal Intelligent Assistant
Speakers:
📌George Roth - AI Evangelist at UiPath
📌Sharon Palawandram - Senior Machine Learning Consultant @ Ashling Partners & UiPath MVP
📌Russel Alfeche - Technology Leader RPA @qBotica & UiPath MVP
Generative AI - The New Reality: How Key Players Are Progressing
The document discusses key players in generative AI and their progress. It provides an overview of generative AI including its evolution since 1950, where the spending is focused, how the technology works, and deployment models. It then profiles several major companies leading advancements in generative AI, including their strategies, growth areas, and risks. These companies are TSMC, Nvidia, Microsoft, Google, Amazon, Tesla, Oracle, Salesforce, SAP, and Palo Alto Networks.
The document discusses different methods for customizing large language models (LLMs) with proprietary or private data, including training a custom model, fine-tuning a general model, and prompting with expanded inputs. Fine-tuning techniques like low-rank adaptation and supervised fine-tuning allow emphasizing custom knowledge without full retraining. Prompt expansion using techniques like retrieval augmented generation can provide additional context beyond the character limit.
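The parameter savings behind low-rank adaptation are easy to show with arithmetic. Instead of updating a full d x d weight matrix W, LoRA trains two small factors B (d x r) and A (r x d) and uses W + BA as the effective weight; the dimensions below are illustrative.

```python
# Toy illustration of why low-rank adaptation (LoRA) is cheap to
# train: only two small factors are updated, not the full matrix.
d, r = 512, 8                        # hidden size, adapter rank

full_update_params = d * d           # parameters in a full update
lora_params = d * r + r * d          # parameters in B (d x r) + A (r x d)

savings = full_update_params / lora_params
```

At rank 8 and hidden size 512 this is a 32x reduction per weight matrix, which is why fine-tuning techniques like LoRA can emphasize custom knowledge without full retraining.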
In this session, you'll get all the answers about how ChatGPT and other GPT-X models can be applied to your current or future project. First, we'll put all the terms in order – OpenAI, GPT-3, ChatGPT, Codex, Dall-E, etc. – and explain why Microsoft and Azure are often mentioned in this context. Then, we'll go through the main capabilities of Azure OpenAI and the respective use cases that might inspire you to either optimize your product or build a completely new one.
Using the power of OpenAI with your own data: what's possible and how to start?
This document provides an overview of a talk by Maxim Salnikov and Jon Jahren at Oslo Spektrum from November 7-9. It discusses using OpenAI with your own data and how to get started. Examples of enterprise use cases for generative AI are presented, such as chatbots, document indexing, and financial analysis. Tools for prompt engineering like LangChain and Semantic Kernel are introduced. Best practices for fine-tuning models on proprietary data are covered, including data formatting, training data size, and an iterative tuning process. Responsible AI techniques like grounding responses and maintaining a positive tone are also discussed.
Reviewing progress in the machine learning certification journey
Special Addition - Short tech talk on How to Network by Qingyue (Annie) Wang
Content review on AI and ML on Google Cloud by Margaret Maynard-Reid
A focused content review on ML problem framing, model evaluation, and fairness by Sowndarya Venkateswaran.
A discussion on sample questions to aid certification exam preparation.
An interactive Q&A session to clarify doubts and questions.
Previewing next steps and topics, including course completions and material reviews.
Building Generative AI-infused apps: what's possible and how to start
In this session, we'll explore different scenarios where the features of Generative AI can provide added value to an IT solution. We'll also learn how to begin developing your own application powered by AI. Using Azure OpenAI service as an illustration, we'll examine the various APIs it offers, review the best practices of Prompt Engineering, explore different ways to incorporate your own data into the process, and take a glance at several tools and resources that make the developer experience more seamless.
Formal Versus Agile: Survival of the Fittest? (Paul Boca)
The potential for combining agile and formal methods holds promise. Although it might not always be an easy partnership, it will succeed if it can foster a fruitful interchange of expertise between the two communities. In this talk I explain how formal methods can complement agile practices and vice versa. There are no pre-requisites for this talk, except an open mind and a desire to make software development more reliable. Leave any pre-conceptions at home, and be prepared for myths to be dispelled.
2017 10-10 (netflix ml platform meetup) learning item and user representation...
1) Learning user and item representations is challenging due to sparse data and shifting preferences in recommender systems.
2) The presentation outlines research at Google to address sparsity through two approaches: focused learning, which develops specialized models for subsets of data like genres or cold-start items, and factorized deep retrieval, which jointly embeds items and their features to predict preferences for fresh items.
3) The techniques have improved overall viewership and nomination of candidates, demonstrating their effectiveness in production recommender systems.
The document describes a problem prediction model that uses artificial intelligence algorithms to evaluate changes made by an IT company and anticipate potential problems. The model analyzed 194 known problems, 2,400 past changes, and 201 predicted future changes. As a result, the model identified one change from October 29, 2019 that was likely to cause a problem. A team is investigating this potential issue. The document concludes that the naive Bayes classifier model is an important tool for change analysis and problem prediction.
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Sandeep Singh, Head of Applied AI Computer Vision, Beans.ai
H2O Open Source GenAI World SF 2023
In the modern era of machine learning, leveraging both open-source and closed-source solutions has become paramount for achieving cutting-edge results. This talk delves into the intricacies of seamlessly integrating open-source Large Language Model (LLM) solutions like Vicuna, Falcon, and Llama with industry giants such as ChatGPT and Google's Palm. As the demand for fine-tuned and specialized datasets grows, it is imperative to understand the synergy between these tools. Attendees will gain insights into best practices for building and enriching datasets tailored for fine-tuning tasks, ensuring that their LLM projects are both robust and efficient. Through real-world examples and hands-on demonstrations, this talk will equip attendees with the knowledge to harness the power of both open and closed-source tools in a coherent and effective manner.
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
This talk was recorded in London on Oct 30, 2018 and can be viewed here: https://youtu.be/p4iAnxwC_Eg
The good news is building fair, accountable, and transparent machine learning systems is possible. The bad news is it’s harder than many blogs and software package docs would have you believe. The truth is nearly all interpretable machine learning techniques generate approximate explanations, that the fields of eXplainable AI (XAI) and Fairness, Accountability, and Transparency in Machine Learning (FAT/ML) are very new, and that few best practices have been widely agreed upon. This combination can lead to some ugly outcomes!
This talk aims to make your interpretable machine learning project a success by describing fundamental technical challenges you will face in building an interpretable machine learning system, defining the real-world value proposition of approximate explanations for exact models, and then outlining viable techniques for debugging, explaining, and testing machine learning models.
Mateusz is a software developer who loves all things distributed and machine learning, and hates buzzwords. His favourite hobby is data juggling.
He obtained his M.Sc. in Computer Science from AGH UST in Krakow, Poland, during which he did an exchange at ECE Paris in France and worked on distributed flight booking systems. After graduation he moved to Tokyo to work as a researcher at Fujitsu Laboratories on machine learning and NLP projects, and he is still based there today.
Webcast Presentation: Accelerate Continuous Delivery with Development Testing...
With organizations under intense pressure to get products out to market quickly, they can’t afford to operate within operational silos. Yet communicating and collaborating across the organizational boundaries of QA and development can be difficult. Development is typically a black box to QA teams. QA has no visibility into the quality and security of the code until late in the lifecycle.
Watch this recorded webcast to learn how to break down the barriers and improve visibility and transparency by integrating development testing results into the IBM Rational Team Concert and providing QA and development with a unified workflow for ensuring code quality. Explore different development testing techniques and the types of defects and security vulnerabilities they can find.
About the Presenter:
James Croall, Director of Product Management, Coverity
Over the last 8 years, James Croall has helped a wide range of customers incorporate static analysis into their software development lifecycle. Prior to Coverity, Mr. Croall spent 10 years in the computer and network security industry as a C/C++ and Java software engineer.
1) Generative AI (GenAI) enables the creation of novel content by learning patterns in unstructured data, rather than mapping inputs to labeled outputs as traditional AI does.
2) Both traditional and generative AI models lack transparency and may contain biases, but generative models can additionally hallucinate or leak private information.
3) To interpret generative models, researchers evaluate accuracy globally by checking for hallucinations or undesirable content, and locally by confirming the quality of individual responses.
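A toy version of such a local check might flag a response whose content words barely overlap the source document. This is a crude stand-in for real hallucination detectors, which typically use entailment models or LLM judges, but it shows the shape of the check.

```python
# Toy grounding check: a response is "grounded" if enough of its
# content words (length > 3) appear in the source text.
def grounded(response, source, min_overlap=0.5):
    resp_words = {w for w in response.lower().split() if len(w) > 3}
    src_words = set(source.lower().split())
    if not resp_words:
        return True  # nothing substantive to verify
    overlap = len(resp_words & src_words) / len(resp_words)
    return overlap >= min_overlap
```

A global evaluation would run a check like this over a whole benchmark set and report the fraction of responses flagged.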
The document discusses Agile software development methods and provides evidence that Agile approaches are effective. It defines Agile development as iterative and incremental with close collaboration. Case studies show organizations achieving better results with Agile, including increased productivity, quality, and customer satisfaction. Adopting Agile practices like Scrum and test-driven development enables organizations to adapt to changing priorities and deliver working software more frequently.
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
Serverless Toronto's 6th-anniversary event helps IT pros understand and prepare for the #GenAI tsunami ahead. You'll gain situational awareness of the LLM Landscape, receive condensed insights, and actionable advice about RAG in 2024 from Google AI Lead Mark Ryan and LlamaIndex creator Jerry Liu. We chose #RAG (Retrieval-Augmented Generation) because it is the predominant paradigm for building #LLM (Large Language Model) applications in enterprises today - and that's where the jobs will be shifting. Here is the recording: https://youtu.be/P5xd1ZjD-Os?si=iq8xibj5pJsJ62oW
Reliability, safety, and trustworthiness are key factors to consider for Human-Centered AI. Established Guidelines for Human-AI Interaction should be taken into account during evaluation to ensure that such RST (reliable, safe, trustworthy) systems overcome autonomy problems.
Scaling & Managing Production Deployments with H2O ModelOps
This presentation was made on June 30th, 2020.
Recording of the presentation is available here: https://youtu.be/9LajqAL_CU8
As enterprises “make their own AI”, a new set of challenges emerges. Maintaining reproducibility, traceability, and verifiability of machine learning models, as well as recording experiments, tracking insights, and reproducing results, is key. Collaboration between teams is also necessary as “model factories” are created for enterprise-wide data science efforts. Additionally, monitoring of models ensures that drift or performance degradation is addressed with either retraining or model updates. Finally, data and model lineage is necessary in case of rollbacks or to address regulatory compliance.
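A minimal sketch of the drift monitoring mentioned above, assuming a single numeric feature and a simple mean-shift test; production systems typically use statistics such as PSI or Kolmogorov-Smirnov tests instead.

```python
# Flag a feature for retraining when its recent mean drifts too far
# from the training-time baseline (threshold is a relative shift).
def drift_flag(baseline, recent, threshold=0.2):
    base_mean = sum(baseline) / len(baseline)
    recent_mean = sum(recent) / len(recent)
    # Guard against a zero baseline when normalizing the shift.
    shift = abs(recent_mean - base_mean) / (abs(base_mean) or 1.0)
    return shift > threshold

needs_retraining = drift_flag([1.0] * 10, [1.5] * 10)
```

In a ModelOps platform this check would run on a schedule per feature, with flagged models routed to retraining or rollback workflows.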
H2O ModelOps delivers centralized catalog and management, deployment, monitoring, collaboration, and administration of machine learning models. In this webinar, we learn how H2O can assist with operationalizing, scaling and managing production deployments.
Speaker's Bio:
Felix is part of the Customer Success team in Asia Pacific at H2O.ai. An engineer and an IIM alumnus, Felix has held prominent positions in the data science industry.
A whirlwind tour of Glasswall Solution’s use of Wardley Maps and experiments with a Service-based operating model. Delivered at Open Security Summit Dec 7th, 2020 as context for a panel discussion, which you can watch here:
https://www.youtube.com/watch?v=GS8Vndr-B4A
The original 100-slide deck is available here:
https://open-security-summit.org/tracks/2020/mini-summits/dec/wardley-maps/wardley-maps-and-services-model-at-glasswall/
This presentation dives into the practical applications of machine learning within Google's operations, providing a comprehensive overview of how to leverage AI technologies to solve real-world business challenges.
Key Points Covered:
- Introduction to Machine Learning at Google: Discussion on the role of ML and its evolution in enhancing Google's operational efficiency.
- Experience Sharing: Insights into the team's long-term engagement with machine learning projects and the impacts on Google’s operational strategies.
- Practical Applications: Real-world examples of ML applications within Google’s daily operations, providing a blueprint to adapt similar strategies.
- Challenges and Solutions: Discussion on the challenges faced during the implementation of ML projects and the strategic solutions employed to overcome them.
- Future of ML at Google: Insights into future trends in machine learning at Google and how they plan to continue integrating AI into their ecosystem.
The document discusses various methods for automated testing of DITA content and output, including using Schematron for validating content structure, the QA plugin for identifying tagging errors, XMLUnit for comparing XML, and the DITA OT regression test for validating the output of the open-source DITA Open Toolkit. It also covers automating browser tests using Selenium and comparing HTML output using Needle and Nose. Demo examples are provided for several of these automated testing tools and techniques.
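The XML-comparison idea (XMLUnit in the document) can be approximated with Python's standard library; this sketch normalizes tags, attributes, and text before comparing, so attribute order and quoting differences are ignored.

```python
# Compare two XML documents structurally, ignoring attribute order.
import xml.etree.ElementTree as ET

def xml_equal(a, b):
    ea, eb = ET.fromstring(a), ET.fromstring(b)

    def norm(e):
        # Normalize an element into a comparable tuple, recursively.
        return (e.tag, sorted(e.attrib.items()),
                (e.text or "").strip(), [norm(c) for c in e])

    return norm(ea) == norm(eb)
```

A regression test for transformed output would then assert `xml_equal(expected, actual)` for each sample document, in the spirit of the DITA OT regression tests described above.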
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
This document provides an overview of H2O.ai, an AI company that offers products and services to democratize AI. It mentions that H2O products are backed by 10% of the world's top data scientists from Kaggle and that H2O has customers in 7 of the top 10 banks, 4 of the top 10 insurance companies, and top manufacturing companies. It also provides details on H2O's founders, funding, customers, products, and vision to make AI accessible to more organizations.
The document discusses LLMOps (Large Language Model Operations) compared to traditional MLOps. Some key points:
- LLMOps and MLOps face similar challenges across the development lifecycle, but LLMOps requires more GPU resources and integration is faster due to more models in each application. Evaluation is also less clear.
- The LLMOps field is around the 5th generation of models, with debates around proprietary vs open source models, and balancing privacy, cost and control.
- LLMOps platforms are emerging to provide solutions for tasks like prompting, embedding databases, evaluation, and governance, similar to how MLOps platforms have evolved.
Patrick Hall, Professor, AI Risk Management, The George Washington University
H2O Open Source GenAI World SF 2023
Language models are incredible engineering breakthroughs but require auditing and risk management before productization. These systems raise concerns about toxicity, transparency and reproducibility, intellectual property licensing and ownership, disinformation and misinformation, supply chains, and more. How can your organization leverage these new tools without taking on undue or unknown risks? While language models and associated risk management are in their infancy, a small number of best practices in governance and risk are starting to emerge. If you have a language model use case in mind, want to understand your risks, and do something about them, this presentation is for you!
Dr. Alexy Khrabrov, Open Source Science Community Director, IBM
H2O Open Source GenAI World SF 2023
In this talk, Dr. Alexy Khrabrov, recently elected Chair of the new Generative AI Commons at Linux Foundation for AI & Data, outlines the OSS AI landscape, challenges, and opportunities. With new models and frameworks being unveiled weekly, one thing remains constant: community building and validation of all aspects of AI is key to reliable and responsible AI we can use for business and society needs. Industrial AI is one key area where such community validation can prove invaluable.
The document announces the launch of the H2O GenAI App Store, which provides a collection of applications that make it easier for average users to leverage large language models through custom interfaces for specific tasks like getting gardening advice or feedback on code. The app store is designed to accelerate the development of these GenAI apps using the H2O Wave platform and provides access to H2OGPTE for retrieval augmented generation and language model calls. Developers can also contribute their own apps through the GitHub repository listed.
Megan Kurka, Vice President, Customer Data Scientist, H2O.ai
H2O Open Source GenAI World SF 2023
Discover the transformative power of Applied Gen AI. Learn how the H2O team builds customized applications and workflows that integrate capabilities of Gen AI and AutoML specifically designed to address and enhance financial use cases. Explore real world examples, learn best practices, and witness firsthand how our innovative solutions are reshaping the landscape of finance technology.
This document discusses techniques from recent papers for improving large language models (LLMs). It describes building blocks of LLMs like fine-tuning, foundation training, memory, and databases. Specific techniques covered include LIMA, which uses 1,000 carefully curated examples; instruction backtranslation to generate question-answer pairs; fine-tuning models on API examples, as in Gorilla; and reducing false answers through techniques like not agreeing with incorrect user opinions. The goal is to discuss cutting-edge tricks for building better LLMs.
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Pascal Pfeiffer, Principal Data Scientist, H2O.ai
H2O Open Source GenAI World SF 2023
This talk dives into the expansive ecosystem of Large Language Models (LLMs), offering practitioners an insightful guide to various relevant applications, from natural language understanding to creative content generation. While exploring use cases across different industries, it also honestly addresses the current limitations of LLMs and anticipates future advancements.
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
This document discusses using large language models (LLMs) for text classification tasks. It begins by describing how LLMs are commonly used for text generation and question answering. For classification, models are usually trained supervised on labeled data. The document then explores using LLMs for zero-shot classification without training, and techniques like fine-tuning LLMs on tasks to improve performance. It provides an example of fine-tuning an LLM on a financial sentiment dataset. The document concludes by describing H2O.ai's LLM Studio tool for fine-tuning and a few Kaggle competitions where LLMs achieved success in text classification.
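The zero-shot setup described above can be sketched as follows. The prompt lists the candidate labels and the model is asked to pick one; the stub below is a keyword heuristic standing in for a real LLM call, so the example runs offline.

```python
# Zero-shot classification via prompting: no training data, just a
# prompt that constrains the model's answer to a label set.
LABELS = ["positive", "negative", "neutral"]

def build_prompt(text):
    return (f"Classify the sentiment of the following text as one of "
            f"{', '.join(LABELS)}.\nText: {text}\nLabel:")

def stub_llm(prompt):
    # Stand-in for a real LLM call; a trivial keyword heuristic.
    text = prompt.lower()
    if "beat" in text or "record" in text or "profit" in text:
        return "positive"
    if "loss" in text or "miss" in text:
        return "negative"
    return "neutral"

label = stub_llm(build_prompt("The company reported record profit this quarter."))
```

Fine-tuning, by contrast, would adjust the model's weights on labeled examples like a financial sentiment dataset; the prompt-only route trades some accuracy for needing no training at all.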
Introduction to Machine Learning with H2O-3 (1)
In this virtual meetup, we give an introduction to H2O-3, the #1 open-source machine learning platform, and show you how you can use it to develop models to solve different use cases.
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
Numerai is an open, crowd-sourced hedge fund powered by predictions from data scientists around the world. In return, participants are rewarded with weekly payouts in crypto.
In this talk, Joe will give an overview of the Numerai tournament based on his own experience. He will then explain how he automates the time-consuming tasks such as testing different modelling strategies, scoring new datasets, submitting predictions to Numerai as well as monitoring model performance with H2O Driverless AI and R.
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
In this session, you will learn about what you should do after you’ve taken an AI transformation baseline. Over the span of this session, we will discuss the next steps in moving toward AI readiness through alignment of talent and tools to drive successful adoption and continuous use within an organization.
To find additional videos on AI courses, earn badges, join the courses at H2O.ai Learning Center: https://training.h2o.ai/products/ai-foundations-course
To find the Youtube video about this presentation: https://youtu.be/K1Cl3x3rd8g
Speaker:
Chemere Davis (H2O.ai - Senior Data Scientist Training Specialist)
AI Foundations Course Module 1 - An AI Transformation Journey
The chances of successfully implementing AI strategies within an organization significantly improve when you can recognize where your organization is on the maturity scale. Over this course, you will learn the keys to unlocking value with AI which include asking the right questions about the problems you are solving and ensuring you have the right cross-section of talent, tools, and resources. By the end of this module, you should be able to recognize where your organization is on the AI transformation spectrum and identify some strategies that can get you to the next stage in your journey.
To find additional videos on AI courses, earn badges, join the courses at H2O.ai Learning Center: https://training.h2o.ai/products/ai-foundations-course
To find the Youtube video about this presentation: https://youtu.be/PJgr2epM6qs
Speakers:
Chemere Davis (H2O.ai - Senior Data Scientist Training Specialist)
Ingrid Burton (H2O.ai - CMO)
ML Model Deployment and Scoring on the Edge with Automatic ML & DF
Machine Learning Model Deployment and Scoring on the Edge with Automatic Machine Learning and Data Flow
YouTube Video URL: https://youtu.be/gB0bTH-L6DE
Deploying machine learning models to the edge can present significant ML/IoT challenges centered around the need for low-latency, accurate scoring in minimal-resource environments. H2O.ai's Driverless AI AutoML and Cloudera Data Flow work nicely together to solve this challenge. Driverless AI automates the building of accurate machine learning models, which are deployed as light-footprint, low-latency Java or C++ artifacts, also known as MOJOs (Model Object, Optimized). Cloudera Data Flow leverages Apache NiFi, which offers an innovative data flow framework to host MOJOs and make predictions on data moving at the edge.
This presentation was made on June 18, 2020.
Video recording of the session can be viewed here: https://youtu.be/YEtDwYSXXJo
For many companies, model documentation is a requirement for any model to be used in the business. For other companies, model documentation is part of a data science team’s best practices. Model documentation includes how a model was created, training and test data characteristics, what alternatives were considered, how the model was evaluated, and information on model performance.
Collecting and documenting this information can take a data scientist days to complete for each model. The model document needs to be comprehensive and consistent across various projects. The process of creating this documentation is tedious for the data scientist and wasteful for the business because the data scientist could be using that time to build additional models and create more value. Inconsistent or inaccurate model documentation can be an issue for model validation, governance, and regulatory compliance.
In this virtual meetup, we will learn how to create comprehensive, high-quality model documentation in minutes that saves time, increases productivity, and improves model governance.
Speaker's Bio:
Nikhil Shekhar: Nikhil is a Machine Learning Engineer at H2O.ai. He is currently working on our automatic machine learning platform, Driverless AI. He graduated from the University of Buffalo majoring in Artificial Intelligence and is interested in developing scalable machine learning algorithms.
This presentation was made on June 16, 2020.
A recording of the presentation can be viewed here: https://youtu.be/khjW1t0gtSA
AI is unlocking new potential for every enterprise. Organizations are using AI and machine learning technology to inform business decisions, predict potential issues, and provide more efficient, customized customer experiences. The results can enable a competitive edge for the business.
H2O.ai is a visionary leader in AI and machine learning and is on a mission to democratize AI for everyone. We believe that every company can become an AI company, not just the AI Superpowers. We are empowering companies with our leading AI and Machine Learning platforms, our expertise, experience and training to embark on their own AI journey to become AI companies themselves. All companies in all industries can participate in this AI Transformation.
Tune into this virtual meetup to learn how companies are transforming their business with the power of AI and where to start.
About Parul Pandey:
Parul is a Data Science Evangelist at H2O.ai, where she combines data science, evangelism, and community in her work. Her emphasis is on spreading information about H2O and Driverless AI to as many people as possible. She is also an active writer and has contributed to various national and international publications.
H2O.ai provides open source machine learning platforms and enterprise AI solutions that help companies implement artificial intelligence. It offers tools for data scientists to build models using Python and R and also provides support services to help customers successfully deploy models in production. H2O.ai aims to democratize AI and help companies become AI-driven by leveraging its experts, community knowledge, and world-class technology.
The integration of programming into civil engineering is transforming the industry. We can design complex infrastructure projects and analyse large datasets. Imagine revolutionizing the way we build our cities and infrastructure, all by the power of coding. Programming skills are no longer just a bonus—they’re a game changer in this era.
Technology is revolutionizing civil engineering by integrating advanced tools and techniques. Programming allows for the automation of repetitive tasks, enhancing the accuracy of designs, simulations, and analyses. With the advent of artificial intelligence and machine learning, engineers can now predict structural behaviors under various conditions, optimize material usage, and improve project planning.
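As a toy illustration of the kind of repetitive engineering check that can be automated with a few lines of code, the sketch below screens candidate beam sections against a deflection limit using the standard midspan-deflection formula for a simply supported beam under uniform load, delta = 5wL^4 / (384EI). The section names, second moments of area, and the span/250 serviceability limit are illustrative assumptions, not values from the text.

```python
def max_deflection(w, L, E, I):
    """Midspan deflection of a simply supported beam under uniform load.

    delta = 5 * w * L**4 / (384 * E * I)
    w in N/m, L in m, E in Pa, I in m^4; result in metres.
    """
    return 5 * w * L**4 / (384 * E * I)

# Check a batch of candidate sections against a deflection limit of span/250.
span = 6.0            # m
load = 10_000.0       # N/m
E_steel = 210e9       # Pa
for name, I in [("Section A", 8.356e-5), ("Section B", 1.943e-5)]:
    delta = max_deflection(load, span, E_steel, I)
    ok = delta <= span / 250
    print(f"{name}: {delta * 1000:.1f} mm {'OK' if ok else 'FAIL'}")
```

The same loop pattern scales to hundreds of load cases or sections, which is exactly the sort of tedium automation removes.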
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat...
Today’s digitally connected world presents a wide range of security challenges for enterprises. Insider security threats are particularly noteworthy because they have the potential to cause significant harm. Unlike external threats, insider risks originate from within the company, making them more subtle and challenging to identify. This blog aims to provide a comprehensive understanding of insider security threats, including their types, examples, effects, and mitigation techniques.
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Slide of the tutorial entitled "Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Emerging Trends" held at UMAP'24: 32nd ACM Conference on User Modeling, Adaptation and Personalization (July 1, 2024 | Cagliari, Italy)
The DealBook is our annual overview of the Ukrainian tech investment industry. This edition comprehensively covers the full year 2023 and the first deals of 2024.
RPA in Healthcare: Benefits, Use Cases, Trends and Challenges 2024
Your comprehensive guide to RPA in healthcare for 2024. Explore the benefits, use cases, and emerging trends of robotic process automation, understand the challenges, and prepare for the future of healthcare automation.
Sustainability requires ingenuity and stewardship. Did you know Pigging Solutions pigging systems help you achieve your sustainable manufacturing goals AND provide rapid return on investment.
How? Our systems recover over 99% of product in transfer piping. Recovering trapped product from transfer lines that would otherwise become flush-waste, means you can increase batch yields and eliminate flush waste. From raw materials to finished product, if you can pump it, we can pig it.
English-language slides presented at the 100% AI event held at Iguane Solutions' Paris offices on Tuesday, July 2, 2024:
- A presentation of our plug-and-play AI platform: its advanced features, such as its intuitive user interface, powerful copilot, and high-performance monitoring tools.
- Customer case study: Cyril Janssens, CTO of easybourse, shares his experience using our plug & play AI platform.
An invited talk given by Mark Billinghurst on Research Directions for Cross Reality Interfaces. This was given on July 2nd 2024 as part of the 2024 Summer School on Cross Reality in Hagenberg, Austria (July 1st - 7th)
Advanced Techniques for Cyber Security Analysis and Anomaly Detection
Cybersecurity is a major concern in today's connected digital world. Threats to organizations are constantly evolving and have the potential to compromise sensitive information, disrupt operations, and lead to significant financial losses. Traditional cybersecurity techniques often fall short against modern attackers. Therefore, advanced techniques for cyber security analysis and anomaly detection are essential for protecting digital assets. This blog explores these cutting-edge methods, providing a comprehensive overview of their application and importance.
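One of the simplest anomaly-detection techniques the passage alludes to is statistical outlier flagging: mark any point whose z-score (distance from the mean in standard deviations) exceeds a threshold. A minimal sketch, with invented traffic numbers for illustration:

```python
from statistics import mean, stdev

def zscore_anomalies(values, threshold=3.0):
    """Flag (index, value) pairs whose z-score exceeds the threshold."""
    mu, sigma = mean(values), stdev(values)
    return [(i, v) for i, v in enumerate(values)
            if sigma > 0 and abs(v - mu) / sigma > threshold]

# Example: login counts per hour with one suspicious spike.
traffic = [12, 15, 11, 14, 13, 12, 16, 140, 13, 12]
print(zscore_anomalies(traffic, threshold=2.0))  # flags the spike at index 7
```

Production systems typically replace this with seasonal baselines or learned models, but the flag-what-deviates principle is the same.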
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
Six months into 2024, and it is clear the privacy ecosystem takes no days off! Regulators continue to implement and enforce new regulations, businesses strive to meet requirements, and technology advances like AI have privacy professionals scratching their heads about managing risk.
What can we learn about the first six months of data privacy trends and events in 2024? How should this inform your privacy program management for the rest of the year?
Join TrustArc, Goodwin, and Snyk privacy experts as they discuss the changes we’ve seen in the first half of 2024 and gain insight into the concrete, actionable steps you can take to up-level your privacy program in the second half of the year.
This webinar will review:
- Key changes to privacy regulations in 2024
- Key themes in privacy and data governance in 2024
- How to maximize your privacy program in the second half of 2024
Best Practices for Effectively Running dbt in Airflow
As a popular open-source library for analytics engineering, dbt is often used in combination with Airflow. Orchestrating and executing dbt models as DAGs ensures an additional layer of control over tasks, observability, and provides a reliable, scalable environment to run dbt models.
This webinar will cover a step-by-step guide to Cosmos, an open source package from Astronomer that helps you easily run your dbt Core projects as Airflow DAGs and Task Groups, all with just a few lines of code. We’ll walk through:
- Standard ways of running dbt (and when to utilize other methods)
- How Cosmos can be used to run and visualize your dbt projects in Airflow
- Common challenges and how to address them, including performance, dependency conflicts, and more
- How running dbt projects in Airflow helps with cost optimization
Webinar given on 9 July 2024
This document discusses generative AI and its potential transformations and use cases. It outlines how generative AI could enable more low-cost experimentation, blur division boundaries, and allow "talking to data" for innovation and operational excellence. The document also references responsible AI frameworks and a pattern catalogue for developing foundation model-based systems. Potential use cases discussed include automated reporting, digital twins, data integration, operation planning, communication, and innovation applications like surrogate models and cross-discipline synthesis.
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap - Anant Corporation
In this episode we'll discuss the different flavors of prompt engineering in the LLM/GPT space. Depending on your skill level, you can pick up at any of the following levels:
Leveling up with GPT
1: Use ChatGPT / GPT Powered Apps
2: Become a Prompt Engineer on ChatGPT/GPT
3: Use GPT API with NoCode Automation, App Builders
4: Create Workflows to Automate Tasks with NoCode
5: Use GPT API with Code, make your own APIs
6: Create Workflows to Automate Tasks with Code
7: Use GPT API with your Data / a Framework
8: Use GPT API with your Data / a Framework to Make your own APIs
9: Create Workflows to Automate Tasks with your Data /a Framework
10: Use Another LLM API other than GPT (Cohere, HuggingFace)
11: Use open source LLM models on your computer
12: Finetune / Build your own models
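Level 2 above, prompt engineering, can start as simply as templating your prompts. A minimal sketch; the role/examples/context/task structure is one common convention, not a fixed API, and all names here are illustrative:

```python
def build_prompt(role, task, context, examples):
    """Assemble a structured prompt: role, few-shot examples, context, then the task."""
    parts = [f"You are {role}."]
    for user, assistant in examples:
        parts.append(f"Example input: {user}\nExample output: {assistant}")
    parts.append(f"Context:\n{context}")
    parts.append(f"Task: {task}")
    return "\n\n".join(parts)

prompt = build_prompt(
    role="a support agent for a small web shop",
    task="Draft a polite reply to the customer message above.",
    context="Customer: my order #123 arrived damaged.",
    examples=[("Where is my parcel?", "Let me check the tracking for you.")],
)
print(prompt)
```

The resulting string is what you would send to ChatGPT or the GPT API; later levels in the list swap the manual call for no-code automations, code, or your own data.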
Series: Using AI / ChatGPT at Work - GPT Automation
Are you a small business owner or web developer interested in leveraging the power of GPT (Generative Pretrained Transformer) technology to enhance your business processes?
If so, Join us for a series of events focused on using GPT in business. Whether you're a small business owner or a web developer, you'll learn how to leverage GPT to improve your workflow and provide better services to your customers.
Neural Language Generation Head to Toe - Hady Elsahar
This is a gentle, intuitive introduction to Natural Language Generation (NLG) using deep learning, aimed at computer science practitioners with basic knowledge of machine learning. It takes you on a journey from the basic intuitions behind modeling language and the probabilities of sequences, through recurrent neural networks, to the large Transformer models you have seen in the news, like GPT-2/GPT-3. The tutorial wraps up with a summary of the ethical implications of training such large language models on uncurated text from the internet.
Unlocking the Power of Generative AI: An Executive's Guide - PremNaraindas1
Generative AI is here, and it can revolutionize your business. With its powerful capabilities, this technology can help companies create more efficient processes, unlock new insights from data, and drive innovation. But how do you make the most of these opportunities?
This guide will provide you with the information and resources needed to understand the ins and outs of Generative AI, so you can make informed decisions and capitalize on the potential. It covers important topics such as strategies for leveraging large language models, optimizing MLOps processes, and best practices for building with Generative AI.
Exploring Opportunities in the Generative AI Value Chain - Dung Hoang
The article "Exploring Opportunities in the Generative AI Value Chain" by McKinsey & Company's QuantumBlack provides insights into the value created by generative artificial intelligence (AI) and its potential applications.
Use Case Patterns for LLM Applications - M Waleed Kadous
What are the "use case patterns" for deploying LLMs into production? Understanding these will allow you to spot "LLM-shaped" problems in your own industry.
Build an LLM-powered application using LangChain - AnastasiaSteele10
LangChain is an advanced framework that allows developers to create language model-powered applications. It provides a set of tools, components, and interfaces that make building LLM-based applications easier. With LangChain, managing interactions with language models, chaining together various components, and integrating resources like APIs and databases is a breeze. The platform includes a set of APIs that can be integrated into applications, allowing developers to add language processing capabilities without having to start from scratch.
The Future of AI is Generative, not Discriminative (5/26/2021) - Steve Omohundro
The deep learning AI revolution has been sweeping the world for a decade now. Deep neural nets are routinely used for tasks like translation, fraud detection, and image classification. PwC estimates that they will create $15.7 trillion/year of value by 2030. But most current networks are "discriminative" in that they directly map inputs to predictions. This type of model requires lots of training examples, doesn't generalize well outside of its training set, creates inscrutable representations, is subject to adversarial examples, and makes knowledge transfer difficult. People, in contrast, can learn from just a few examples, generalize far beyond their experience, and can easily transfer and reuse knowledge. In recent years, new kinds of "generative" AI models have begun to exhibit these desirable human characteristics. They represent the causal generative processes by which the data is created and can be compositional, compact, and directly interpretable. Generative AI systems that assist people can model their needs and desires and interact with empathy. Their adaptability to changing circumstances will likely be required by rapidly changing AI-driven business and social systems. Generative AI will be the engine of future AI innovation.
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s... - Mihai Criveti
Mihai is the Principal Architect for Platform Engineering and Technology Solutions at IBM, responsible for Cloud Native and AI Solutions. He is a Red Hat Certified Architect, CKA/CKS, a leader in the IBM Open Innovation community, and advocate for open source development. Mihai is driving the development of Retrieval Augmentation Generation platforms, and solutions for Generative AI at IBM that leverage WatsonX, Vector databases, LangChain, HuggingFace and open source AI models.
Mihai will share lessons learned building Retrieval Augmented Generation, or “Chat with Documents” platforms and APIs that scale, and deploy on Kubernetes. His talk will cover use cases for Generative AI, limitations of Large Language Models, use of RAG, Vector Databases and Fine Tuning to overcome model limitations and build solutions that connect to your data and provide content grounding, limit hallucinations and form the basis of explainable AI. In terms of technology, he will cover LLAMA2, HuggingFace TGIS, SentenceTransformers embedding models using Python, LangChain, and Weaviate and ChromaDB vector databases. He’ll also share tips on writing code using LLM, including building an agent for Ansible and containers.
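The retrieval step at the heart of RAG can be sketched with plain cosine similarity. The three-dimensional vectors below are toy stand-ins for real sentence embeddings, and the knowledge-base entries are invented for illustration:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def retrieve(query_vec, kb, k=2):
    """Return the k knowledge-base texts most similar to the query vector."""
    ranked = sorted(kb, key=lambda doc: cosine(query_vec, doc["vec"]), reverse=True)
    return [doc["text"] for doc in ranked[:k]]

kb = [
    {"text": "Refund policy: 30 days", "vec": [0.9, 0.1, 0.0]},
    {"text": "Shipping times: 3-5 days", "vec": [0.1, 0.9, 0.1]},
    {"text": "Office hours: 9-17 CET", "vec": [0.0, 0.2, 0.9]},
]
context = retrieve([0.8, 0.2, 0.1], kb, k=1)
print(context)
```

In a real system a sentence-embedding model produces the vectors, a vector database such as Weaviate or ChromaDB replaces the in-memory list, and the retrieved texts are appended to the LLM prompt as grounding context.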
Scaling factors for Large Language Model architectures:
• Vector database: consider sharding and high availability
• Fine-tuning: collecting data to be used for fine-tuning
• Governance and model benchmarking: how are you testing your model performance over time, with different prompts, one-shot, and various parameters
• Chain of reasoning and agents
• Caching embeddings and responses
• Personalization and conversational memory database
• Streaming responses and optimizing performance. A fine-tuned 13B model may perform better than a poor 70B one!
• Calling 3rd-party functions or APIs for reasoning or other types of data (e.g. LLMs are terrible at reasoning and prediction; consider calling other models)
• Fallback techniques: fall back to a different model, or default answers
• API scaling techniques, rate limiting, etc.
• Async, streaming and parallelization, multiprocessing, GPU acceleration (including embeddings), generating your API using OpenAPI, etc.
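Two of the points above, caching embeddings and falling back to another model, fit in a few lines of Python. The stand-in embedding and the model stubs are invented for illustration; a real system would call an embedding model and actual LLM endpoints:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def embed(text):
    """Cache embeddings so repeated inputs skip the (expensive) model call.

    Stand-in embedding: per-character hash buckets. A real system calls a model.
    """
    vec = [0.0] * 8
    for ch in text:
        vec[ord(ch) % 8] += 1.0
    return tuple(vec)

def generate_with_fallback(prompt, models, default="Sorry, please try again later."):
    """Try each model in order; fall back to a canned answer if all fail."""
    for model in models:
        try:
            return model(prompt)
        except Exception:
            continue
    return default

def flaky_70b(prompt):          # stub for an overloaded large model
    raise TimeoutError("model overloaded")

def tuned_13b(prompt):          # stub for a cheaper fine-tuned model
    return f"[13B] answer to: {prompt}"

print(generate_with_fallback("What is RAG?", [flaky_70b, tuned_13b]))
```

The `lru_cache` decorator gives response caching for free on repeated inputs, and the ordered model list makes the fallback chain explicit and testable.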
Leveraging Generative AI & Best Practices - DianaGray10
In this event we will cover:
- What generative AI is and how it is being used for the future of work.
- Best practices for developing and deploying generative AI-based models in production.
- The future of generative AI: how it is expected to evolve in the coming years.
The document discusses advances in large language models from GPT-1 to the potential capabilities of GPT-4, including its ability to simulate human behavior, demonstrate sparks of artificial general intelligence, and generate virtual identities. It also provides tips on how to effectively prompt ChatGPT through techniques like prompt engineering, giving context and examples, and different response formats.
AI and ML Series - Introduction to Generative AI and LLMs - Session 1 - DianaGray10
Session 1
👉This first session will cover an introduction to Generative AI & harnessing the power of large language models. The following topics will be discussed:
Introduction to Generative AI & harnessing the power of large language models.
What’s generative AI & what’s an LLM.
How are we using it in our document understanding & communication mining models?
How to develop a trustworthy and unbiased AI model using LLM & GenAI.
Personal Intelligent Assistant
Speakers:
📌George Roth - AI Evangelist at UiPath
📌Sharon Palawandram - Senior Machine Learning Consultant @ Ashling Partners & UiPath MVP
📌Russel Alfeche - Technology Leader RPA @qBotica & UiPath MVP
Generative AI - The New Reality: How Key Players Are Progressing - Vishal Sharma
The document discusses key players in generative AI and their progress. It provides an overview of generative AI including its evolution since 1950, where the spending is focused, how the technology works, and deployment models. It then profiles several major companies leading advancements in generative AI, including their strategies, growth areas, and risks. These companies are TSMC, Nvidia, Microsoft, Google, Amazon, Tesla, Oracle, Salesforce, SAP, and Palo Alto Networks.
The document discusses different methods for customizing large language models (LLMs) with proprietary or private data, including training a custom model, fine-tuning a general model, and prompting with expanded inputs. Fine-tuning techniques like low-rank adaptation and supervised fine-tuning allow emphasizing custom knowledge without full retraining. Prompt expansion using techniques like retrieval augmented generation can provide additional context beyond the character limit.
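The prompt-expansion approach described above can be sketched roughly as follows: pack retrieved snippets into the prompt until the character limit is reached. The header and footer wording and the default limit are illustrative assumptions, not an API from the document:

```python
def expand_prompt(question, snippets, limit=500):
    """Pack retrieved snippets into the prompt without exceeding the character limit."""
    header = "Answer using only the context below.\n\nContext:\n"
    footer = f"\n\nQuestion: {question}\nAnswer:"
    budget = limit - len(header) - len(footer)
    chosen = []
    for s in snippets:
        if budget - (len(s) + 1) < 0:   # +1 for the joining newline
            break
        chosen.append(s)
        budget -= len(s) + 1
    return header + "\n".join(chosen) + footer

p = expand_prompt(
    "What is the refund window?",
    ["Refunds are accepted within 30 days.",
     "Items must be unused and in original packaging."],
    limit=200,
)
print(p)
```

In practice the budget is counted in model tokens rather than characters, and the snippets come from a retrieval step rather than a hard-coded list.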
In this session, you'll get all the answers about how ChatGPT and other GPT-X models can be applied to your current or future project. First, we'll put in order all the terms – OpenAI, GPT-3, ChatGPT, Codex, Dall-E, etc. – and explain why Microsoft and Azure are often mentioned in this context. Then, we'll go through the main capabilities of the Azure OpenAI service and respective use cases that might inspire you to either optimize your product or build a completely new one.
Using the power of OpenAI with your own data: what's possible and how to start? - Maxim Salnikov
This document provides an overview of a talk by Maxim Salnikov and Jon Jahren at Oslo Spektrum from November 7-9. It discusses using OpenAI with your own data and how to get started. Examples of enterprise use cases for generative AI are presented, such as chatbots, document indexing, and financial analysis. Tools for prompt engineering like LangChain and Semantic Kernel are introduced. Best practices for fine-tuning models on proprietary data are covered, including data formatting, training data size, and an iterative tuning process. Responsible AI techniques like grounding responses and maintaining a positive tone are also discussed.
Reviewing progress in the machine learning certification journey
Special addition - Short tech talk on How to Network by Qingyue (Annie) Wang
Content review on AI and ML on Google Cloud by Margaret Maynard-Reid
A focused content review on ML problem framing, model evaluation, and fairness by Sowndarya Venkateswaran.
A discussion on sample questions to aid certification exam preparation.
An interactive Q&A session to clarify doubts and questions.
Previewing next steps and topics, including course completions and material reviews.
Building Generative AI-infused apps: what's possible and how to start - Maxim Salnikov
In this session, we'll explore different scenarios where the features of Generative AI can provide added value to an IT solution. We'll also learn how to begin developing your own application powered by AI. Using Azure OpenAI service as an illustration, we'll examine the various APIs it offers, review the best practices of Prompt Engineering, explore different ways to incorporate your own data into the process, and take a glance at several tools and resources that make the developer experience more seamless.
Formal Versus Agile: Survival of the Fittest? (Paul Boca) - AdaCore
The potential for combining agile and formal methods holds promise. Although it might not always be an easy partnership, it will succeed if it can foster a fruitful interchange of expertise between the two communities. In this talk I explain how formal methods can complement agile practices and vice versa. There are no pre-requisites for this talk, except an open mind and a desire to make software development more reliable. Leave any pre-conceptions at home, and be prepared for myths to be dispelled.
2017-10-10 (Netflix ML Platform Meetup) Learning item and user representation... - Ed Chi
1) Learning user and item representations is challenging due to sparse data and shifting preferences in recommender systems.
2) The presentation outlines research at Google to address sparsity through two approaches: focused learning, which develops specialized models for subsets of data like genres or cold-start items, and factorized deep retrieval, which jointly embeds items and their features to predict preferences for fresh items.
3) The techniques have improved overall viewership and nomination of candidates, demonstrating their effectiveness in production recommender systems.
The document describes a problem prediction model that uses artificial intelligence algorithms to evaluate changes made by an IT company and anticipate potential problems. The model analyzed 194 known problems, 2,400 past changes, and 201 predicted future changes. As a result, the model identified one change from October 29, 2019 that was likely to cause a problem. A team is investigating this potential issue. The document concludes that the naive Bayes classifier model is an important tool for change analysis and problem prediction.
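A naive Bayes classifier of the kind the model uses can be sketched in a few lines of pure Python. The change features, labels, and counts below are made up for illustration; they are not the company's actual data:

```python
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (feature_tuple, label). Returns the count tables."""
    labels = Counter(lbl for _, lbl in examples)
    feats = defaultdict(Counter)          # feats[(position, value)][label] -> count
    for x, lbl in examples:
        for i, v in enumerate(x):
            feats[(i, v)][lbl] += 1
    return labels, feats, len(examples)

def predict(x, labels, feats, n):
    """Pick the label maximizing prior * product of smoothed likelihoods."""
    best, best_p = None, -1.0
    for lbl, c in labels.items():
        p = c / n
        for i, v in enumerate(x):
            p *= (feats[(i, v)][lbl] + 1) / (c + 2)   # Laplace smoothing
        if p > best_p:
            best, best_p = lbl, p
    return best

# Toy change log: (system touched, timing) -> outcome.
changes = [(("db", "weekend"), "problem"), (("db", "weekday"), "ok"),
           (("web", "weekday"), "ok"), (("web", "weekend"), "ok"),
           (("db", "weekend"), "problem")]
model = train(changes)
print(predict(("db", "weekend"), *model))
```

Despite its simplicity, this counting-based classifier is easy to audit, which is part of why naive Bayes remains popular for change-risk screening.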
Building LLM Solutions using Open Source and Closed Source Solutions in Coher... - Sri Ambati
Sandeep Singh, Head of Applied AI Computer Vision, Beans.ai
H2O Open Source GenAI World SF 2023
In the modern era of machine learning, leveraging both open-source and closed-source solutions has become paramount for achieving cutting-edge results. This talk delves into the intricacies of seamlessly integrating open-source Large Language Model (LLM) solutions like Vicuna, Falcon, and Llama with industry giants such as ChatGPT and Google's PaLM. As the demand for fine-tuned and specialized datasets grows, it is imperative to understand the synergy between these tools. Attendees will gain insights into best practices for building and enriching datasets tailored for fine-tuning tasks, ensuring that their LLM projects are both robust and efficient. Through real-world examples and hands-on demonstrations, this talk will equip attendees with the knowledge to harness the power of both open and closed-source tools in a coherent and effective manner.
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018 - Sri Ambati
This talk was recorded in London on Oct 30, 2018 and can be viewed here: https://youtu.be/p4iAnxwC_Eg
The good news is building fair, accountable, and transparent machine learning systems is possible. The bad news is it’s harder than many blogs and software package docs would have you believe. The truth is nearly all interpretable machine learning techniques generate approximate explanations, that the fields of eXplainable AI (XAI) and Fairness, Accountability, and Transparency in Machine Learning (FAT/ML) are very new, and that few best practices have been widely agreed upon. This combination can lead to some ugly outcomes!
This talk aims to make your interpretable machine learning project a success by describing fundamental technical challenges you will face in building an interpretable machine learning system, defining the real-world value proposition of approximate explanations for exact models, and then outlining the following viable techniques for debugging, explaining, and testing machine learning models
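One widely used approximate-explanation technique is permutation importance: shuffle one feature at a time and measure how much the model's accuracy drops. A minimal sketch on a toy model; the data and the model (which only looks at feature 0) are invented for illustration:

```python
import random

def permutation_importance(model, X, y, n_features, seed=0):
    """Accuracy drop when each feature column is shuffled; bigger drop = more important."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(model(r) == t for r, t in zip(rows, y)) / len(y)

    base = accuracy(X)
    importances = []
    for j in range(n_features):
        col = [r[j] for r in X]
        rng.shuffle(col)
        shuffled = [r[:j] + (col[i],) + r[j + 1:] for i, r in enumerate(X)]
        importances.append(base - accuracy(shuffled))
    return importances

def model(row):
    return row[0] > 0.5        # ignores feature 1 entirely

X = [(0.9, 0.1), (0.2, 0.8), (0.7, 0.3), (0.1, 0.9)]
y = [True, False, True, False]
imps = permutation_importance(model, X, y, n_features=2)
print(imps)
```

Because the model ignores feature 1, shuffling it never changes the accuracy, so its importance is exactly zero; this matches the intuition behind the approximate explanation.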
Mateusz is a software developer who loves all things distributed and machine learning, and hates buzzwords. His favourite hobby is data juggling.
He obtained his M.Sc. in Computer Science from AGH UST in Krakow, Poland, during which he did an exchange at ECE Paris in France and worked on distributed flight-booking systems. After graduation he moved to Tokyo to work as a researcher at Fujitsu Laboratories on machine learning and NLP projects, where he is still currently based.
Webcast Presentation: Accelerate Continuous Delivery with Development Testing... - GRUC
With organizations under intense pressure to get products out to market quickly, they can’t afford to operate within operational silos. Yet communicating and collaborating across the organizational boundaries of QA and development can be difficult. Development is typically a black box to QA teams. QA has no visibility into the quality and security of the code until late in the lifecycle.
Watch this recorded webcast to learn how to break down the barriers and improve visibility and transparency by integrating development testing results into the IBM Rational Team Concert and providing QA and development with a unified workflow for ensuring code quality. Explore different development testing techniques and the types of defects and security vulnerabilities they can find.
About the Presenter:
James Croall, Director of Product Management, Coverity
Over the last 8 years, James Croall has helped a wide range of customers incorporate static analysis into their software development lifecycle. Prior to Coverity, Mr. Croall spent 10 years in the computer and network security industry as a C/C++ and Java software engineer.
1) Generative AI (GenAI) enables the creation of novel content by learning patterns in unstructured data rather than labeling outputs like traditional AI.
2) Both traditional and generative AI models lack transparency and may contain biases, but generative models can additionally hallucinate or leak private information.
3) To interpret generative models, researchers evaluate accuracy globally by checking for hallucinations or undesirable content, and locally by confirming the quality of individual responses.
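A crude local check in the spirit of point 3 is to flag response sentences that share too few content words with the source context, a cheap proxy for hallucination. The 0.5 threshold and the word-length filter are arbitrary illustrative choices, and the texts are invented:

```python
def ungrounded_sentences(response, context, threshold=0.5):
    """Flag response sentences with low word overlap against the source context."""
    ctx_words = set(context.lower().split())
    flagged = []
    for sent in response.split("."):
        words = [w for w in sent.lower().split() if len(w) > 3]
        if not words:
            continue
        overlap = sum(w in ctx_words for w in words) / len(words)
        if overlap < threshold:
            flagged.append(sent.strip())
    return flagged

context = "The refund window is 30 days from delivery for unused items"
response = "The refund window is 30 days. Purple unicorns approve refunds instantly."
print(ungrounded_sentences(response, context))
```

Real evaluation pipelines use entailment models or LLM judges instead of word overlap, but the pattern of checking each response against its grounding source is the same.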
The document discusses Agile software development methods and provides evidence that Agile approaches are effective. It defines Agile development as iterative and incremental with close collaboration. Case studies show organizations achieving better results with Agile, including increased productivity, quality, and customer satisfaction. Adopting Agile practices like Scrum and test-driven development enables organizations to adapt to changing priorities and deliver working software more frequently.
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L... - Daniel Zivkovic
Serverless Toronto's 6th-anniversary event helps IT pros understand and prepare for the #GenAI tsunami ahead. You'll gain situational awareness of the LLM Landscape, receive condensed insights, and actionable advice about RAG in 2024 from Google AI Lead Mark Ryan and LlamaIndex creator Jerry Liu. We chose #RAG (Retrieval-Augmented Generation) because it is the predominant paradigm for building #LLM (Large Language Model) applications in enterprises today - and that's where the jobs will be shifting. Here is the recording: https://youtu.be/P5xd1ZjD-Os?si=iq8xibj5pJsJ62oW
Reliability, safety, and trustworthiness (RST) are key factors to consider for human-centered AI. Established guidelines for human-AI interaction should be taken into account during evaluation to ensure RST systems overcome autonomy problems.
Scaling & Managing Production Deployments with H2O ModelOps - Sri Ambati
This presentation was made on June 30th, 2020.
Recording of the presentation is available here: https://youtu.be/9LajqAL_CU8
As enterprises “make their own AI”, a new set of challenges emerges. Maintaining reproducibility, traceability, and verifiability of machine learning models, as well as recording experiments, tracking insights, and reproducing results, are key. Collaboration between teams is also necessary as “model factories” are created for enterprise-wide data science efforts. Additionally, monitoring of models ensures that drift or performance degradation is addressed with either retraining or model updates. Finally, data and model lineage is necessary in case of rollbacks or for addressing regulatory compliance.
H2O ModelOps delivers centralized catalog and management, deployment, monitoring, collaboration, and administration of machine learning models. In this webinar, we learn how H2O can assist with operationalizing, scaling and managing production deployments.
Speaker's Bio:
Felix is part of the Customer Success team in Asia Pacific at H2O.ai. An engineer and an IIM alumnus, Felix has held prominent positions in the data science industry.
A whirlwind tour of Glasswall Solution’s use of Wardley Maps and experiments with a Service-based operating model. Delivered at Open Security Summit Dec 7th, 2020 as context for a panel discussion, which you can watch here:
https://www.youtube.com/watch?v=GS8Vndr-B4A
The original 100-slide deck is available here:
https://open-security-summit.org/tracks/2020/mini-summits/dec/wardley-maps/wardley-maps-and-services-model-at-glasswall/
Strategic AI Integration in Engineering Teams - UXDXConf
This presentation dives into the practical applications of machine learning within Google's operations, providing a comprehensive overview of how to leverage AI technologies to solve real-world business challenges.
Key Points Covered:
- Introduction to Machine Learning at Google: Discussion on the role of ML and its evolution in enhancing Google's operational efficiency.
- Experience Sharing: Insights into the team's long-term engagement with machine learning projects and the impacts on Google’s operational strategies.
- Practical Applications: Real-world examples of ML applications within Google’s daily operations, providing a blueprint to adapt similar strategies.
- Challenges and Solutions: Discussion on the challenges faced during the implementation of ML projects and the strategic solutions employed to overcome them.
- Future of ML at Google: Insights into future trends in machine learning at Google and how they plan to continue integrating AI into their ecosystem.
Automated Testing DITA Content and Customizations - Steve Anderson
The document discusses various methods for automated testing of DITA content and output, including using Schematron for validating content structure, the QA plugin for identifying tagging errors, XMLUnit for comparing XML, and the DITA OT regression test for validating the output of the open-source DITA Open Toolkit. It also covers automating browser tests using Selenium and comparing HTML output using Needle and Nose. Demo examples are provided for several of these automated testing tools and techniques.
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day - Sri Ambati
This document provides an overview of H2O.ai, an AI company that offers products and services to democratize AI. It mentions that H2O products are backed by 10% of the world's top data scientists from Kaggle and that H2O has customers in 7 of the top 10 banks, 4 of the top 10 insurance companies, and top manufacturing companies. It also provides details on H2O's founders, funding, customers, products, and vision to make AI accessible to more organizations.
LLMOps: Match report from the top of the 5th - Sri Ambati
The document discusses LLMOps (Large Language Model Operations) compared to traditional MLOps. Some key points:
- LLMOps and MLOps face similar challenges across the development lifecycle, but LLMOps requires more GPU resources and integration is faster due to more models in each application. Evaluation is also less clear.
- The LLMOps field is around the 5th generation of models, with debates around proprietary vs open source models, and balancing privacy, cost and control.
- LLMOps platforms are emerging to provide solutions for tasks like prompting, embedding databases, evaluation, and governance, similar to how MLOps platforms have evolved.
Patrick Hall, Professor, AI Risk Management, The George Washington University
H2O Open Source GenAI World SF 2023
Language models are incredible engineering breakthroughs but require auditing and risk management before productization. These systems raise concerns about toxicity, transparency and reproducibility, intellectual property licensing and ownership, disinformation and misinformation, supply chains, and more. How can your organization leverage these new tools without taking on undue or unknown risks? While language models and associated risk management are in their infancy, a small number of best practices in governance and risk are starting to emerge. If you have a language model use case in mind, want to understand your risks, and do something about them, this presentation is for you!
Dr. Alexy Khrabrov, Open Source Science Community Director, IBM
H2O Open Source GenAI World SF 2023
In this talk, Dr. Alexy Khrabrov, recently elected Chair of the new Generative AI Commons at Linux Foundation for AI & Data, outlines the OSS AI landscape, challenges, and opportunities. With new models and frameworks being unveiled weekly, one thing remains constant: community building and validation of all aspects of AI is key to reliable and responsible AI we can use for business and society needs. Industrial AI is one key area where such community validation can prove invaluable.
The document announces the launch of the H2O GenAI App Store, which provides a collection of applications that make it easier for average users to leverage large language models through custom interfaces for specific tasks like getting gardening advice or feedback on code. The app store is designed to accelerate the development of these GenAI apps using the H2O Wave platform and provides access to H2OGPTE for retrieval augmented generation and language model calls. Developers can also contribute their own apps through the GitHub repository listed.
Applied Gen AI for the Finance Vertical - Sri Ambati
Megan Kurka, Vice President, Customer Data Scientist, H2O.ai
H2O Open Source GenAI World SF 2023
Discover the transformative power of Applied Gen AI. Learn how the H2O team builds customized applications and workflows that integrate capabilities of Gen AI and AutoML specifically designed to address and enhance financial use cases. Explore real world examples, learn best practices, and witness firsthand how our innovative solutions are reshaping the landscape of finance technology.
This document discusses techniques for improving language models (LLMs) discussed in recent papers. It describes building blocks of LLMs like fine-tuning, foundation training, memory, and databases. Specific techniques covered include LIMA which uses 1,000 carefully curated examples, instruction backtranslation to generate question-answer pairs, fine-tuning models on API examples like Gorilla, and reducing false answers through techniques like not agreeing with incorrect user opinions. The goal is to discuss cutting edge tricks to build better LLMs.
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren... - Sri Ambati
Pascal Pfeiffer, Principal Data Scientist, H2O.ai
H2O Open Source GenAI World SF 2023
This talk dives into the expansive ecosystem of Large Language Models (LLMs), offering practitioners an insightful guide to various relevant applications, from natural language understanding to creative content generation. While exploring use cases across different industries, it also honestly addresses the current limitations of LLMs and anticipates future advancements.
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C... - Sri Ambati
This document discusses using large language models (LLMs) for text classification tasks. It begins by describing how LLMs are commonly used for text generation and question answering. For classification, models are usually trained supervised on labeled data. The document then explores using LLMs for zero-shot classification without training, and techniques like fine-tuning LLMs on tasks to improve performance. It provides an example of fine-tuning an LLM on a financial sentiment dataset. The document concludes by describing H2O.ai's LLM Studio tool for fine-tuning and a few Kaggle competitions where LLMs achieved success in text classification.
Introducción al Aprendizaje Automatico con H2O-3 (1) - Sri Ambati
In this virtual meetup, we give an introduction to the #1 open-source machine learning platform, H2O-3, and show you how you can use it to develop models for different use cases.
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use... - Sri Ambati
Numerai is an open, crowd-sourced hedge fund powered by predictions from data scientists around the world. In return, participants are rewarded with weekly payouts in crypto.
In this talk, Joe will give an overview of the Numerai tournament based on his own experience. He will then explain how he automates the time-consuming tasks such as testing different modelling strategies, scoring new datasets, submitting predictions to Numerai as well as monitoring model performance with H2O Driverless AI and R.
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo... - Sri Ambati
In this session, you will learn about what you should do after you’ve taken an AI transformation baseline. Over the span of this session, we will discuss the next steps in moving toward AI readiness through alignment of talent and tools to drive successful adoption and continuous use within an organization.
To find additional videos on AI courses, earn badges, join the courses at H2O.ai Learning Center: https://training.h2o.ai/products/ai-foundations-course
To find the Youtube video about this presentation: https://youtu.be/K1Cl3x3rd8g
Speaker:
Chemere Davis (H2O.ai - Senior Data Scientist Training Specialist)
AI Foundations Course Module 1 - An AI Transformation Journey - Sri Ambati
The chances of successfully implementing AI strategies within an organization significantly improve when you can recognize where your organization is on the maturity scale. Over this course, you will learn the keys to unlocking value with AI which include asking the right questions about the problems you are solving and ensuring you have the right cross-section of talent, tools, and resources. By the end of this module, you should be able to recognize where your organization is on the AI transformation spectrum and identify some strategies that can get you to the next stage in your journey.
To find additional videos on AI courses, earn badges, join the courses at H2O.ai Learning Center: https://training.h2o.ai/products/ai-foundations-course
To find the Youtube video about this presentation: https://youtu.be/PJgr2epM6qs
Speakers:
Chemere Davis (H2O.ai - Senior Data Scientist Training Specialist)
Ingrid Burton (H2O.ai - CMO)
ML Model Deployment and Scoring on the Edge with Automatic ML & DF - Sri Ambati
Machine Learning Model Deployment and Scoring on the Edge with Automatic Machine Learning and Data Flow
YouTube Video URL: https://youtu.be/gB0bTH-L6DE
Deploying Machine Learning models to the edge can present significant ML/IoT challenges centered around the need for low-latency, accurate scoring in minimal-resource environments. H2O.ai's Driverless AI AutoML and Cloudera Data Flow work nicely together to solve this challenge. Driverless AI automates the building of accurate Machine Learning models, which are deployed as light-footprint, low-latency Java or C++ artifacts, also known as MOJOs (Model Object, Optimized). Cloudera Data Flow leverages Apache NiFi, which offers an innovative data flow framework to host MOJOs and make predictions on data moving at the edge.
This presentation was made on June 18, 2020.
Video recording of the session can be viewed here: https://youtu.be/YEtDwYSXXJo
For many companies, model documentation is a requirement for any model to be used in the business. For other companies, model documentation is part of a data science team’s best practices. Model documentation includes how a model was created, training and test data characteristics, what alternatives were considered, how the model was evaluated, and information on model performance.
Collecting and documenting this information can take a data scientist days to complete for each model. The model document needs to be comprehensive and consistent across various projects. The process of creating this documentation is tedious for the data scientist and wasteful for the business because the data scientist could be using that time to build additional models and create more value. Inconsistent or inaccurate model documentation can be an issue for model validation, governance, and regulatory compliance.
In this virtual meetup, we will learn how to create comprehensive, high-quality model documentation in minutes that saves time, increases productivity, and improves model governance.
Speaker's Bio:
Nikhil Shekhar: Nikhil is a Machine Learning Engineer at H2O.ai. He is currently working on our automatic machine learning platform, Driverless AI. He graduated from the University of Buffalo majoring in Artificial Intelligence and is interested in developing scalable machine learning algorithms.
This presentation was made on June 16, 2020.
A recording of the presentation can be viewed here: https://youtu.be/khjW1t0gtSA
AI is unlocking new potential for every enterprise. Organizations are using AI and machine learning technology to inform business decisions, predict potential issues, and provide more efficient, customized customer experiences. The results can enable a competitive edge for the business.
H2O.ai is a visionary leader in AI and machine learning and is on a mission to democratize AI for everyone. We believe that every company can become an AI company, not just the AI Superpowers. We are empowering companies with our leading AI and Machine Learning platforms, our expertise, experience and training to embark on their own AI journey to become AI companies themselves. All companies in all industries can participate in this AI Transformation.
Tune into this virtual meetup to learn how companies are transforming their business with the power of AI and where to start.
About Parul Pandey:
Parul is a Data Science Evangelist at H2O.ai. She combines data science, evangelism, and community in her work. Her emphasis is on spreading information about H2O and Driverless AI to as many people as possible. She is also an active writer and has contributed to various national and international publications.
H2O.ai provides open source machine learning platforms and enterprise AI solutions that help companies implement artificial intelligence. It offers tools for data scientists to build models using Python and R and also provides support services to help customers successfully deploy models in production. H2O.ai aims to democratize AI and help companies become AI-driven by leveraging its experts, community knowledge, and world-class technology.
Best Programming Language for Civil Engineers - Awais Yaseen
The integration of programming into civil engineering is transforming the industry. We can design complex infrastructure projects and analyse large datasets. Imagine revolutionizing the way we build our cities and infrastructure, all by the power of coding. Programming skills are no longer just a bonus—they’re a game changer in this era.
Technology is revolutionizing civil engineering by integrating advanced tools and techniques. Programming allows for the automation of repetitive tasks, enhancing the accuracy of designs, simulations, and analyses. With the advent of artificial intelligence and machine learning, engineers can now predict structural behaviors under various conditions, optimize material usage, and improve project planning.
Understanding Insider Security Threats: Types, Examples, Effects, and Mitigat... - Bert Blevins
Today’s digitally connected world presents a wide range of security challenges for enterprises. Insider security threats are particularly noteworthy because they have the potential to cause significant harm. Unlike external threats, insider risks originate from within the company, making them more subtle and challenging to identify. This blog aims to provide a comprehensive understanding of insider security threats, including their types, examples, effects, and mitigation techniques.
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em... - Erasmo Purificato
Slide of the tutorial entitled "Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Emerging Trends" held at UMAP'24: 32nd ACM Conference on User Modeling, Adaptation and Personalization (July 1, 2024 | Cagliari, Italy)
The DealBook is our annual overview of the Ukrainian tech investment industry. This edition comprehensively covers the full year 2023 and the first deals of 2024.
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx - SynapseIndia
Your comprehensive guide to RPA in healthcare for 2024. Explore the benefits, use cases, and emerging trends of robotic process automation. Understand the challenges and prepare for the future of healthcare automation
Sustainability requires ingenuity and stewardship. Did you know Pigging Solutions pigging systems help you achieve your sustainable manufacturing goals AND provide rapid return on investment?
How? Our systems recover over 99% of product in transfer piping. Recovering trapped product from transfer lines that would otherwise become flush-waste, means you can increase batch yields and eliminate flush waste. From raw materials to finished product, if you can pump it, we can pig it.
Slides in English presented at the 100% IA event held at Iguane Solutions' Paris offices on Tuesday, July 2, 2024:
- Presentation of our plug-and-play AI platform: its advanced features, such as its intuitive user interface, its powerful copilot, and its high-performance monitoring tools.
- Customer testimonial: Cyril Janssens, CTO of easybourse, shares his experience using our plug & play AI platform.
An invited talk given by Mark Billinghurst on Research Directions for Cross Reality Interfaces. This was given on July 2nd 2024 as part of the 2024 Summer School on Cross Reality in Hagenberg, Austria (July 1st - 7th)
Advanced Techniques for Cyber Security Analysis and Anomaly Detection - Bert Blevins
Cybersecurity is a major concern in today's connected digital world. Threats to organizations are constantly evolving and have the potential to compromise sensitive information, disrupt operations, and lead to significant financial losses. Traditional cybersecurity techniques often fall short against modern attackers. Therefore, advanced techniques for cyber security analysis and anomaly detection are essential for protecting digital assets. This blog explores these cutting-edge methods, providing a comprehensive overview of their application and importance.
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In - TrustArc
Six months into 2024, and it is clear the privacy ecosystem takes no days off!! Regulators continue to implement and enforce new regulations, businesses strive to meet requirements, and technology advances like AI have privacy professionals scratching their heads about managing risk.
What can we learn about the first six months of data privacy trends and events in 2024? How should this inform your privacy program management for the rest of the year?
Join TrustArc, Goodwin, and Snyk privacy experts as they discuss the changes we’ve seen in the first half of 2024 and gain insight into the concrete, actionable steps you can take to up-level your privacy program in the second half of the year.
This webinar will review:
- Key changes to privacy regulations in 2024
- Key themes in privacy and data governance in 2024
- How to maximize your privacy program in the second half of 2024
Best Practices for Effectively Running dbt in Airflow.pdf - Tatiana Al-Chueyr
As a popular open-source library for analytics engineering, dbt is often used in combination with Airflow. Orchestrating and executing dbt models as DAGs ensures an additional layer of control over tasks, observability, and provides a reliable, scalable environment to run dbt models.
This webinar will cover a step-by-step guide to Cosmos, an open source package from Astronomer that helps you easily run your dbt Core projects as Airflow DAGs and Task Groups, all with just a few lines of code. We’ll walk through:
- Standard ways of running dbt (and when to utilize other methods)
- How Cosmos can be used to run and visualize your dbt projects in Airflow
- Common challenges and how to address them, including performance, dependency conflicts, and more
- How running dbt projects in Airflow helps with cost optimization
Webinar given on 9 July 2024
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
This presentation, delivered at the Postgres Bangalore (PGBLR) Meetup-2 on June 29th, 2024, dives deep into connection pooling for PostgreSQL databases. Aakash M, a PostgreSQL Tech Lead at Mydbops, explores the challenges of managing numerous connections and explains how connection pooling optimizes performance and resource utilization.
Key Takeaways:
* Understand why connection pooling is essential for high-traffic applications
* Explore various connection poolers available for PostgreSQL, including pgbouncer
* Learn the configuration options and functionalities of pgbouncer
* Discover best practices for monitoring and troubleshooting connection pooling setups
* Gain insights into real-world use cases and considerations for production environments
This presentation is ideal for:
* Database administrators (DBAs)
* Developers working with PostgreSQL
* DevOps engineers
* Anyone interested in optimizing PostgreSQL performance
Contact info@mydbops.com for PostgreSQL Managed, Consulting and Remote DBA Services
How RPA Helps in the Transportation and Logistics Industry.pptx - SynapseIndia
Revolutionize your transportation processes with our cutting-edge RPA software. Automate repetitive tasks, reduce costs, and enhance efficiency in the logistics sector with our advanced solutions.
Are you interested in dipping your toes in the cloud native observability waters, but as an engineer you are not sure where to get started with tracing problems through your microservices and application landscapes on Kubernetes? Then this is the session for you, where we take you on your first steps in an active open-source project that offers a buffet of languages, challenges, and opportunities for getting started with telemetry data.
The project is called OpenTelemetry, but before diving into the specifics, we'll start with demystifying key concepts and terms such as observability, telemetry, instrumentation, cardinality, and percentile to lay a foundation. After understanding the nuts and bolts of observability and distributed traces, we'll explore the OpenTelemetry community: its Special Interest Groups (SIGs), repositories, and how to become not only an end-user, but possibly a contributor. We will wrap up with an overview of the components in this project, such as the Collector, the OpenTelemetry protocol (OTLP), its APIs, and its SDKs.
Attendees will leave with an understanding of key observability concepts, become grounded in distributed tracing terminology, be aware of the components of OpenTelemetry, and know how to take their first steps toward an open-source contribution!
Key Takeaways: Open source, vendor-neutral instrumentation is an exciting new reality as the industry standardizes on OpenTelemetry for observability. OpenTelemetry is on a mission to enable effective observability by making high-quality, portable telemetry ubiquitous. The world of observability and monitoring today has a steep learning curve, and in order to achieve ubiquity, the project would benefit from growing its contributor community.
H2O.ai Confidential
Introduction
- Today's training will look into responsible, explainable, and interpretable AI when applied in the context of Generative AI and specifically Large Language Models (LLMs).
- This will include several sections on theoretical concepts as well as hands-on labs using Enterprise h2oGPT and H2O GenAI Applications.
- These hands-on labs focus on applying Gen AI in the context of a Model Risk Manager's role at a bank or financial institution.
- NOTE: A separate end-to-end masterclass on Generative AI is also available within the training environment, as well as on GitHub: https://github.com/h2oai/h2o_genai_training. It includes:
  - Data Preparation for LLMs
  - Fine-Tuning custom models
  - Model Evaluation
  - Retrieval-Augmented Generation (RAG)
  - Guardrails
  - AI Applications
Agenda

Section | Session | Duration | Speaker
Welcome | Session Kick-off | 5m | Jon Farland
Interpretability for Generative AI | Large Language Model Interpretability | 25m | Kim Montgomery
Interpretability for Generative AI | Workshop: Explainable and Interpretable AI for LLMs | 20m | Navdeep Gill
Benchmarking and Evaluations | Frameworks for Evaluating Generative AI | 20m | Srinivas Neppalli
Benchmarking and Evaluations | Workshop: Experimental Design of Gen AI Applications | 20m | Jon Farland
Security, Guardrails and Hacking | Workshop: Guardrails and Hacking | 20m | Ashrith Barthur
Applied Generative AI for Banking - Complaint Summarizer | Workshop: Complaint Summarizer AI Application | 20m | Jon Farland
Housekeeping
- The training environment for today is a dedicated instance of the H2O AI Managed Cloud, a GPU-powered environment capable of training and deploying LLMs, as well as designing and hosting entire AI Applications.
- It can be accessed at https://genai-training.h2o.ai.
- Login credentials should have been provided to the email address you registered with.
- If you don't yet have credentials, or you are otherwise unable to access the environment, please speak with any member of the H2O.ai team.
- The training environment will be available to attendees for 3 days after the conference, but dedicated proof-of-concept environments can be provided (including on-premise) on request. Please speak to any H2O.ai team member or email jon.farland@h2o.ai.
What is Generative AI?
GenAI enables the creation of novel content.

GenAI Model: learns patterns in unstructured data. Input: unstructured data. Output: novel content.
vs.
Traditional AI Model: learns the relationship between data and labels. Input: data and labels. Output: a label.
GenAI Complications
More complicated input:
● Prompt phrasing
● Instructions
● Examples
More relevant dimensions to output:
● Truthfulness/Accuracy
● Safety
● Fairness
● Robustness
● Privacy
● Machine Ethics
[TrustLLM: Trustworthiness in Large Language Models, Sun, et al.]
Common tests
● Can the model recognize problematic responses?
○ Inaccurate responses
○ Unethical responses
○ Responses conveying stereotypes
● Can an inappropriate response be provoked?
○ Jailbreaking
○ Provoking toxicity
○ Leading questions / false context
TrustLLM Main Findings
● Trustworthiness and utility were positively correlated.
● Generally, closed-source models outperformed open-source models.
● Over-alignment for trustworthiness can compromise utility.
[TrustLLM: Trustworthiness in Large Language Models, Sun, et al.]
Accuracy: Example LLMs
The simplest way to measure accuracy is to compare the result against another source of information.
Example sources:
● Checking results against a given source (RAG)
● Checking results against the tuning data
● Checking results against an external source (e.g., Wikipedia)
● Checking results against the training data (cumbersome)
● Checking for self-consistency (SelfCheckGPT)
● Checking results against a larger LLM
Scoring methods:
● Natural language inference
● Comparing embeddings
● Influence functions
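As a toy illustration of the "comparing embeddings" scoring method, the sketch below scores a response against a reference source with cosine similarity. The bag-of-words embedding is only a stand-in for a real sentence-embedding model, and the example texts are invented.

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding": token counts. A real system would
    # use a sentence-embedding model instead.
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def accuracy_score(response, reference):
    # Score an LLM response against a trusted source by embedding both
    # and comparing; higher similarity suggests better grounding.
    return cosine_similarity(embed(response), embed(reference))

reference = "the eiffel tower is located in paris france"
good = "the eiffel tower is in paris france"
bad = "the eiffel tower is located in berlin germany"
assert accuracy_score(good, reference) > accuracy_score(bad, reference)
```

With a real embedding model the same comparison works unchanged; only `embed` needs to be swapped out.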
RAG (Retrieval-Augmented Generation)
1. Chunk and embed documents
2. Submit a query
3. Generate an embedding for the query
4. Retrieve relevant information via similarity search
5. Combine the relevant information to ground the query to the model
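The retrieval flow above can be sketched in miniature. The toy embedding, similarity function, and two example chunks below are all stand-ins for a real embedding model, vector database, and document corpus.

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding as token counts (stand-in for a real embedding model).
    return Counter(text.lower().split())

def similarity(a, b):
    # Cosine similarity between two count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Step 1: chunk and embed documents (stand-in corpus).
chunks = [
    "H2O ModelOps manages deployment and monitoring of models.",
    "Driverless AI automates building machine learning models.",
]
index = [(embed(c), c) for c in chunks]

# Steps 2-3: submit a query and generate its embedding.
query = "What automates building machine learning models?"
q_vec = embed(query)

# Step 4: retrieve the most relevant chunk via similarity search.
best = max(index, key=lambda item: similarity(q_vec, item[0]))[1]

# Step 5: combine the retrieved context with the query to ground the model.
prompt = f"Context: {best}\n\nQuestion: {query}\nAnswer:"
assert "Driverless AI" in prompt
```

The grounded `prompt` is what would be sent to the LLM in a real pipeline.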
Influence functions
● Seek to measure the influence of including a data point in the training set on the model response.
● Datamodels/TRAK
○ Learn a model based on binary indicator functions.
○ Directly measure how much a training instance influences the outcome.
● DataInf
○ Measures the influence of a document during fine-tuning.
[DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models, Kwon et al.]
[TRAK: Attributing Model Behavior at Scale, Park et al.]
Influence functions / NLP
[Studying Large Language Model Generalization with Influence Functions, Grosse, et al.]
Self-consistency comparison: SelfCheckGPT
● Sampling different responses from an LLM.
● Checking for consistency between responses.
● Assuming that hallucinations will occur less consistently.
[SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models, Manakul, Liusie, and Gales]
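A minimal sketch of the SelfCheckGPT idea: the `sample_llm` stub below is hypothetical (a real system would sample an LLM at temperature > 0), and the crude word-overlap measure stands in for the paper's actual scoring variants.

```python
def sample_llm(prompt, n):
    # Hypothetical sampler: returns n canned "stochastic" responses;
    # a real system would call an LLM several times.
    return [
        "Paris is the capital of France.",
        "Paris is the capital of France.",
        "Lyon is the capital of France.",
    ][:n]

def overlap(a, b):
    # Fraction of words in sentence a that also appear in sentence b.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa) if wa else 0.0

def consistency(sentence, samples):
    # Average overlap with the sampled responses; hallucinations are
    # assumed to recur less consistently across samples.
    return sum(overlap(sentence, s) for s in samples) / len(samples)

samples = sample_llm("What is the capital of France?", 3)
# A sentence supported by most samples scores higher than one that is not.
assert consistency("Paris is the capital of France.", samples) > \
       consistency("Marseille is the capital of France.", samples)
```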
Counterfactual analysis: LLM
How consistent are results under different:
● Prompts / instructions
○ Changes in prompt design
○ Changes in prompt instructions
○ Multi-shot examples
○ Word replacement with synonyms
○ Proper names or pronouns (fairness)
○ Chain of thought / other guided reasoning methods
● Different context / RAG retrieval
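One way to run such a counterfactual check is to rephrase the same question several ways and measure answer agreement. The `query_llm` stub below is hypothetical (a lookup table so the sketch is runnable); a real test would call an actual LLM for each variant.

```python
def query_llm(prompt):
    # Hypothetical model: answers deterministically from a lookup so the
    # sketch is runnable; a real test would call an actual LLM.
    answers = {
        "What is the boiling point of water in Celsius?": "100",
        "At what temperature in Celsius does water boil?": "100",
        "Water boils at how many degrees Celsius?": "100",
    }
    return answers.get(prompt, "unknown")

# Counterfactual prompt variants: same question, different phrasing.
variants = [
    "What is the boiling point of water in Celsius?",
    "At what temperature in Celsius does water boil?",
    "Water boils at how many degrees Celsius?",
]
responses = [query_llm(v) for v in variants]

# Consistency rate: fraction of variants agreeing with the modal answer.
modal = max(set(responses), key=responses.count)
consistency = responses.count(modal) / len(responses)
assert consistency == 1.0  # fully consistent under rephrasing
```

The same loop extends to synonym swaps or proper-name substitutions for fairness checks: generate the perturbed prompts, collect answers, and compare agreement rates.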
Intervention in the case of problems
If problematic behavior is found in a model, there are several options:
● Prompt / instruction modifications
● Choosing a different base model
● Fine-tuning to modify LLM behavior
● Altering the document retrieval process (RAG)
● Monitoring model output for problematic responses
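The last option, output monitoring, can be illustrated with a trivial blocklist filter. This is only a sketch: a production guardrail would use trained classifiers rather than keyword matching, and the terms below are invented.

```python
# Invented blocklist of terms to flag in model output (illustrative only).
BLOCKLIST = {"ssn", "password", "credit card"}

def flag_response(text):
    # Return the sorted list of problematic terms found in a model
    # response, if any; an empty list means the response passes.
    lowered = text.lower()
    return sorted(term for term in BLOCKLIST if term in lowered)

assert flag_response("Your password is hunter2") == ["password"]
assert flag_response("The weather is sunny today") == []
```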
Conclusions
● Many of the basic problems of understanding LLMs are similar to those of other large models.
● Through careful testing we can hope to understand and correct some of the safety issues involved in using LLMs.
Chain of Verification (CoVe)
CoVe enhances the reliability of answers provided by Large Language Models, particularly in factual question-and-answering scenarios, by systematically verifying and refining responses to minimize inaccuracies.
The CoVe method consists of the following four sequential steps:
1. Initial Baseline Response Creation: An initial response to the original question is generated as a starting point.
2. Verification Question Generation: Verification questions are created to fact-check the baseline response. These questions are designed to scrutinize the accuracy of the initial response.
3. Execute Verification: The verification questions are independently answered to minimize any potential bias. This step ensures that the verification process is objective and thorough.
4. Final Refined Answer Generation: Based on the results of the verification process, a final refined answer is generated. This answer is expected to be more accurate and reliable, reducing the likelihood of hallucinations in the response.
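The four steps can be sketched end to end with a hypothetical `llm` stub standing in for real model calls; the question, entities, and canned answers are invented for illustration (the baseline deliberately contains one factual error).

```python
def llm(prompt):
    # Hypothetical model with one factual error in its baseline answer;
    # a real pipeline would issue each step as a separate LLM call.
    canned = {
        "Q: Name two US state capitals.": "Sacramento and Los Angeles.",
        "Is Sacramento a US state capital?": "Yes, capital of California.",
        "Is Los Angeles a US state capital?": "No, Sacramento is.",
    }
    return canned.get(prompt, "")

# Step 1: initial baseline response.
question = "Q: Name two US state capitals."
baseline = llm(question)

# Step 2: generate verification questions for each claimed entity.
entities = ["Sacramento", "Los Angeles"]
verifications = [f"Is {e} a US state capital?" for e in entities]

# Step 3: answer each verification question independently.
checks = {q: llm(q) for q in verifications}

# Step 4: refine the answer, keeping only verified entities.
verified = [e for e in entities
            if checks[f"Is {e} a US state capital?"].startswith("Yes")]
refined = " and ".join(verified)
assert refined == "Sacramento"
```

In the real method, step 2 is itself an LLM call that proposes the verification questions, rather than the fixed template used here.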
Verification Questions
Questions are categorized into three main groups:
1. Wiki Data & Wiki Category List: Questions that expect answers in the form of a list of entities, for instance, "Who are some politicians born in Boston?"
2. Multi-Span QA: Questions that seek multiple independent answers, for example, "Who invented the first mechanized printing press and in what year?" (Answer: "Johannes Gutenberg, 1450".)
3. Long-form Generation: Any question that requires a detailed or lengthy response.
Chain of Verification (CoVe)
Dhuliawala, Shehzaad, et al. "Chain-of-Verification Reduces Hallucination in Large Language Models." arXiv preprint arXiv:2309.11495 (2023).
CoVe and Explainable AI (XAI)
● Interpretability and Transparency: the verification process generates questions to fact-check baseline responses, improving transparency in decision-making.
● Reliability and Trust: refined answers enhance accuracy, building trust and reliability in model outputs.
● Bias and Fairness: verification questions in CoVe identify and mitigate potential biases in model output.
● User Interaction: the verification process involves user interaction through verification questions.
Provide a list of major
investment firms and
financial institutions
headquartered in the
United States?
Benefits and Limitations of CoVe
● Benefits:
○ Enhanced Reliability: By incorporating verification steps, users can trust the accuracy of
information obtained from LLMs.
○ Depth of Understanding: The refinement of answers allows users to gain a deeper
understanding of the topic beyond the initial response.
○ Educational Value: Promotes responsible and informed use of LLMs, encouraging users to go
beyond surface-level information.
● Limitations:
○ Incomplete Removal of Hallucinations: CoVe does not completely eliminate hallucinations in
generated content, which means it can still produce incorrect or misleading information.
○ Limited Scope of Hallucination Mitigation: CoVe primarily addresses hallucinations in the
form of directly stated factual inaccuracies but may not effectively handle other forms of
hallucinations, such as errors in reasoning or opinions.
○ Increased Computational Expense: Generating and executing verification alongside
responses in CoVe adds to the computational cost, similar to other reasoning methods like
Chain-of-Thought.
○ Upper Bound on Improvement: The effectiveness of CoVe is limited by the overall capabilities
of the underlying language model, particularly in its ability to identify and rectify its own
mistakes.
How to improve the CoVe pipeline
● Prompt engineering
● External tools
○ Final output highly depends on the answers of the verification questions.
○ For factual question answering, you can use advanced search tools like Google Search or SERP APIs.
○ For custom use cases you can always use RAG methods or other retrieval techniques for
answering the verification questions.
● More chains
● Human in the loop
Conclusions
● CoVe aims to improve model transparency, reliability, and trust.
● CoVe is not a silver bullet, but it can strengthen an LLM testing arsenal.
Write a 1000 word essay in 1 minute
LLMs are good at generating large amounts of text that is consistent and logical.
Are LLMs smarter than humans?
Introduction
Have LLMs manage your investment portfolio
A model can give generic advice on safe money management, but we wouldn't trust our life savings to a chatbot.
Let a bot reply to your email
It depends on how important the email is. Maybe we are more comfortable with the model automatically creating a draft.
Summarization
Summarizing large documents without losing essential information. Extracting
key-value pairs.
How can we use LLMs while minimizing risk?
Introduction
Customer Service
Answer FAQs from customers. May require retrieving from a knowledge base
and summarizing.
Report Generation - AutoDoc
Create ML interpretation documents. Reports required for regulatory
compliance.
Risk
How risky are LLMs?
A lawyer used ChatGPT to prepare
a court filing. It went horribly awry.
“While ChatGPT can be useful to
professionals in numerous industries,
including the legal profession, it has
proved itself to be both limited and
unreliable. In this case, the AI invented
court cases that didn't exist, and
asserted that they were real.”
CBS News
Chevy dealership’s AI chatbot
suggests Ford F-150 when asked
for best truck
“As an AI, I don't have personal
preferences but I can provide insights
based on popular opinions and
reviews. Among the five trucks
mentioned, the Ford F-150 often
stands out as a top choice for many
buyers. It's known for its impressive
towing …”
Detroit Free Press
LLM Lifecycle
● Data - Large & Diverse: To train a foundation model, you need a large, diverse dataset that covers the tasks the model should be able to perform.
● Foundation Model - Generative AI: Foundation models are designed to produce a wide and general variety of outputs, such as text, image or audio generation. They can be standalone systems or can be used as a "base" for many other applications.
● Fine Tuning - Supervised Fine Tuning: Fine-tuning can improve a model's performance on a task while preserving its general language knowledge.
● RAG - h2oGPTe: A powerful search assistant to answer questions from large volumes of documents, websites, and workplace content.
● Leaderboard - HELM: HELM is a framework for evaluating foundation models. The leaderboard shows how the various models perform across different groups of scenarios and different metrics.
● Risk Management - Eval Studio: Design and execute task-specific benchmarks. Perform both manual and LLM based evaluations. Systematically collect and store results along with metadata.
Evaluation for LLMs
Popular benchmarks on open source leaderboards
MMLU (Massive Multitask Language Understanding): A test to measure a text model's multitask accuracy. The test covers 57 tasks including elementary mathematics, US history, computer science, law, and more.
HellaSwag: A test of common-sense inference, which is easy for humans (~95%) but challenging for SOTA models.
AI2 Reasoning Challenge (ARC): A set of grade-school science questions.
TruthfulQA: A test to measure a model's propensity to reproduce falsehoods commonly found online.
When you drop a ball from rest it accelerates downward at 9.8 m/s². If
you instead throw it downward assuming no air resistance its
acceleration immediately after leaving your hand is
(A) 9.8 m/s²
(B) more than 9.8 m/s²
(C) less than 9.8 m/s²
(D) Cannot say unless the speed of throw is given.
MMLU Example
A woman is outside with a bucket and a dog. The dog is running
around trying to avoid a bath. She…
(A) Rinses the bucket off with soap and blow dry the dog’s head.
(B) Uses a hose to keep it from getting soapy.
(C) Gets the dog wet, then it runs away again.
(D) Gets into a bath tub with the dog.
HellaSwag Example
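Benchmarks like these are typically scored as plain multiple-choice accuracy: compare the letter the model picks against the gold label for each item. A minimal sketch (the answer lists are made up for illustration):

```python
def multiple_choice_accuracy(model_answers, gold_answers):
    """Fraction of items where the model's chosen letter matches the gold label."""
    if len(model_answers) != len(gold_answers):
        raise ValueError("answer lists must be the same length")
    correct = sum(
        m.strip().upper() == g.strip().upper()
        for m, g in zip(model_answers, gold_answers)
    )
    return correct / len(gold_answers)

# Hypothetical model picks vs. gold labels for four items
print(multiple_choice_accuracy(["A", "c", "B", "D"], ["A", "C", "D", "D"]))  # 0.75
```

Leaderboards differ mainly in how they extract that letter from free-form model output (log-likelihood of each option vs. parsing the generated text), not in the accuracy formula itself.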
Hugging Face Open LLM
Leaderboard
It is a popular location to track
various models evaluated using
different metrics.
These metrics include human baselines that give us some idea of how drastically these models have improved over the last two years.
Approaching human baseline
Popular benchmarks on open source leaderboards
The Need for Evaluation
Popular leaderboards are not enough
● Benchmarks are not task specific: Benchmarks on open-source leaderboards are well-rounded and diverse. They are not sufficient to reflect the performance of the model in a domain-specific scenario.
● Some model entries may cheat: There can be models on the leaderboard that are trained on the benchmark data itself. We do not have robust enough tests to detect this.
● Non-verifiable results: The procedure followed in conducting the tests and the results are not completely transparent and can also vary among different leaderboards.
Custom Test Sets
Create custom benchmarks for domain-specific scenarios
● Task Specific Evals: Create task-specific QA pairs along with the reference documents.
  - Bank Teller
  - Loan Officer
  - Program Manager
  - Data Analyst
● Test for Alignment: Create the QA pairs that test for agreement with your values, intentions, and preferences.
  - Correctness
  - Relevance
  - Similarity
  - Hallucination
  - Precision
  - Recall
  - Faithfulness
● Test for Safety: Test that all outputs meet your safety levels.
  - Toxicity
  - Bias
  - Offensive content
  - PII of customers
  - Company secrets
● Test for Compliance: Tests to confirm or show proof of meeting compliance standards.
  - Government
  - Company
H2O Eval Studio
Design and Execute task specific benchmarks
All the Evaluators are included
Eval Studio contains evaluators to check for
Alignment, Safety, and Compliance as
discussed before.
Create custom benchmarks
Users can upload Documents and create
custom Tests (Question-Answer pairs) based on
the document collection.
Run Evals and visualize results
Once a benchmark has been designed, users
can then run the evaluation against the
benchmark and visualize the results. A detailed
report can also be downloaded.
Through the Lens of Model Risk Management
One possible definition of “Conceptual Soundness”
for LLMs by themselves might be considered as a
combination of the following choices:
(1) Training Data
(2) Model Architecture
(3) An explanation of why the choices in (1) and (2) were made
(4) An explanation of why (1) and (2) are reasonable for the use case that the LLM will be applied to.
Through the Lens of Model Risk Management
What about a RAG system?
How does the concept of "Conceptual Soundness" apply when not only choices surrounding training data and model architecture are involved, but also choices around:
- Embeddings
- System Prompts (e.g. Personalities)
- Chunk Sizes
- Chunking Strategies
- OCR Techniques
- RAG-type (e.g. Hypothetical Document Embeddings)
- Mixture-of-Experts or Ensembling
Models / Systems / Agents are the fundamental AI systems under scrutiny. As opposed to traditional machine learning models, Generative AI systems include many choices beyond the models themselves.
Benchmarks / Tests are the sets of prompts and responses that are used to gauge how well an AI system can perform a certain task or use case.
Documents are the data sets used for evaluation in the
case of RAG systems, combining models, parsing, OCR,
chunking, embeddings and other components of an
evaluation.
What is the primary unit of analysis when evaluating an AI system or model?
An eval can be defined as a series of tuples each of size 3.
Each tuple consists of:
(1) Context / Prompt / Question
(2) Output / Response / Ground Truth Answer
(3) Document (in the case of RAG)
Source: https://www.jobtestprep.com/bank-teller-sample-questions
Designing Your Own Eval
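The tuple above can be captured in a small data structure. The field names follow the slide; the class itself is only an illustrative sketch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EvalCase:
    """One eval tuple of size 3."""
    prompt: str                      # (1) Context / Prompt / Question
    expected_response: str           # (2) Output / Response / Ground Truth Answer
    document: Optional[str] = None   # (3) Document (only used for RAG evals)

# An LLM-only case simply leaves the document empty
case = EvalCase(
    prompt="How many new tellers should be hired? A. 4 B. 5 C. 9 D. 12",
    expected_response="B. 5",
)
```

A full benchmark is then just a list of such cases, which an evaluator iterates over.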
Problem statement: How well does my Bank Teller AI Application correctly answer
questions related to being a Bank Teller?
Create an eval test case that can be used to evaluate how well BankTellerGPT can
answer questions related to being a Bank Teller.
LLM-only Example Test Case
{
Prompt: Respond to the following questions with single letter answer. Question: A specific bank branch serves
256 clients on average every day. The ratio between tellers and clients is 1:32, so that every teller serves 32
people on average every day. The management wishes to change this ratio to 1:20. How many new tellers should
be hired? A. 4 B. 5 C. 9 D. 12,
Response: B. 5,
Document: None
}
Designing Your Own Eval - BankTellerGPT
Source: https://www.jobtestprep.com/bank-teller-sample-questions
Designing Your Own Eval - BankTellerGPT
Problem statement: How well does my Bank Teller AI Application actually answer
questions related to being a Bank Teller?
Create an eval test case that can be used to evaluate how well BankTellerGPT can
answer questions related to being a Bank Teller.
RAG Example Test Case
{
Prompt: Respond to the following questions with single letter answer. Question: A specific bank branch serves
256 clients on average every day. The ratio between tellers and clients is 1:32, so that every teller serves 32
people on average every day. The management wishes to change this ratio to 1:20. How many new tellers should
be hired? A. 4 B. 5 C. 9 D. 12,
Response: B. 5,
Document: “Internal Bank Teller Knowledge Base”
}
Source: https://www.jobtestprep.com/bank-teller-sample-questions
Designing Your Own Eval
Task # 1: Create your own GenAI Test Benchmark for the SR 11-7 document
Some possible test cases
Prompt: How should banks approach model development?
Response: Banks should approach model development with a focus on sound risk management practices. They
should ensure that models are developed and used in a controlled environment, with proper documentation,
testing, and validation.
Prompt: How can model risk be reduced?
Response: Model risk can be reduced by establishing limits on model use, monitoring model performance,
adjusting or revising models over time, and supplementing model results with other analysis and information.
Prompt: How often should a bank update its model inventory?
Response: A bank should update its model inventory regularly to ensure that it remains current and accurate.
Designing Your Own Eval - BankTellerGPT
Task # 2: Create and launch LLM-only eval
leaderboard
To complete this, you will need to
1. Pick an evaluator (e.g. Token presence)
2. Pick a connection (e.g. Enterprise h2oGPT - LLM Only)
3. Pick a set of eval tests (e.g. Bank Teller Benchmark)
Designing Your Own Eval - SR 11-7
Task # 3: Create a new evaluator based on RAG
and launch leaderboard
To complete this, you will need to
1. Pick an evaluator (e.g. Answer correctness)
2. Pick a connection (e.g. Enterprise h2oGPT-RAG)
3. Pick your test created in Task #1
Evaluators
H2O EvalStudio evaluators overview

● PII (privacy) - RAG: Yes, LLM: Yes
Purpose: Assess whether the answer contains personally identifiable information (PII) like credit card numbers, phone numbers, social security numbers, street addresses, email addresses and employee names.
Method: Regex suite which quickly and reliably detects formatted PII - credit card numbers, social security numbers (SSN) and emails.

● Sensitive data (security) - RAG: Yes, LLM: Yes
Purpose: Assess whether the answer contains security-related information like activation keys, passwords, API keys, tokens or certificates.
Method: Regex suite which quickly and reliably detects formatted sensitive data - certificates (SSL/TLS certs in PEM format), API keys (H2O.ai and OpenAI), activation keys (Windows).

● Answer Correctness - RAG: Yes, LLM: Yes
Purpose: Assess whether the answer is correct given the expected answer (ground truth).
Method: A score based on combined and weighted semantic and factual similarity between the answer and ground truth (see Answer Similarity and Faithfulness below).

● Answer Relevance - RAG: Yes, LLM: Yes
Purpose: Assess whether the answer is (in)complete and does not contain redundant information which was not asked - noise.
Method: A score based on the cosine similarity of the question and generated questions, where generated questions are created by prompting an LLM to generate questions from the actual answer.

● Answer Similarity - RAG: Yes, LLM: Yes
Purpose: Assess semantic similarity of the answer and expected answer.
Method: A score based on the similarity metric value of the actual and expected answer calculated by a cross-encoder model (NLP).

● Context Precision - RAG: Yes, LLM: No
Purpose: Assess the quality of the retrieved context considering order and relevance of the text chunks on the context stack.
Method: A score based on the presence of the expected answer - ground truth - in the text chunks at the top of the retrieved context chunk stack; relevant chunks deep in the stack, irrelevant chunks and an unnecessarily big context make the score lower.

● Context Recall - RAG: Yes, LLM: No
Purpose: Assess how much of the ground truth is represented in the retrieved context.
Method: A score based on the ratio of the number of sentences in the ground truth that can be attributed to the context to the total number of sentences in the ground truth.

● Context Relevance - RAG: Yes, LLM: No
Purpose: Assess whether the context is (in)complete and does not contain redundant information which is not needed - noise.
Method: A score based on the ratio of context sentences which are needed to generate the answer to the total number of sentences in the retrieved context.

TERMINOLOGY: answer ~ actual RAG/LLM answer | expected answer ~ expected RAG/LLM answer, i.e. ground truth | retrieved context ~ text chunks retrieved from the vector DB prior to LLM answer generation in RAG.
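Several of these metrics are simple ratios. Context Recall, for instance, is the share of ground-truth sentences attributable to the retrieved context; the sketch below uses naive verbatim matching for "attribution", where a real evaluator would use an LLM or semantic comparison.

```python
def context_recall(ground_truth: str, retrieved_context: str) -> float:
    """Ratio of ground-truth sentences attributable to the retrieved context.

    Naive sketch: a sentence counts as 'attributed' if it occurs verbatim
    in the context. Production evaluators use an LLM or semantic matching.
    """
    sentences = [s.strip() for s in ground_truth.split(".") if s.strip()]
    if not sentences:
        return 0.0
    found = sum(s in retrieved_context for s in sentences)
    return found / len(sentences)

context = "Gutenberg invented the printing press. He was born in Mainz."
truth = "Gutenberg invented the printing press. He died in 1468."
print(context_recall(truth, context))  # 0.5: one of two ground-truth sentences found
```

Context Relevance is the mirror image of this ratio, computed over the sentences of the retrieved context rather than the ground truth.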
Evaluators (continued)
H2O EvalStudio evaluators overview

● Faithfulness - RAG: Yes, LLM: No
Purpose: Assess whether the answer's claims can be inferred from the context, i.e. factual consistency of the answer given the context (hallucinations).
Method: A score based on the ratio of the answer's claims which are present in the context to the total number of the answer's claims.

● Hallucination Metric - RAG: Yes, LLM: No
Purpose: Assess the RAG's base LLM model hallucination.
Method: A score based on the Vectara hallucination evaluation cross-encoder model which assesses the RAG's base LLM hallucination when it generates the actual answer from the retrieved context.

● RAGAs - RAG: Yes, LLM: No
Purpose: Assess overall answer quality considering both context and answer.
Method: Composite metric score which is the harmonic mean of the Faithfulness, Answer Relevancy, Context Precision and Context Recall metrics.

● Tokens Presence - RAG: Yes, LLM: Yes
Purpose: Assess whether both the retrieved context and the answer contain required string tokens.
Method: A score based on the substring and/or regular expression based search of the required set of strings in the retrieved context and answer.
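Tokens Presence is the most mechanical of these evaluators, so it is easy to picture: a regex/substring search applied to both the retrieved context and the answer. A minimal sketch:

```python
import re
from typing import Iterable

def tokens_present(answer: str, context: str, required: Iterable[str]) -> bool:
    """True if every required token (treated as a regex) occurs in both
    the answer and the retrieved context."""
    return all(
        re.search(tok, answer) and re.search(tok, context)
        for tok in required
    )

print(tokens_present(
    answer="The fee is $25 per wire transfer.",
    context="Per policy, wire transfers incur a $25 fee.",
    required=[r"\$25", r"wire transfer"],
))  # True
```

Because it is deterministic and cheap, this kind of check is a good first gate before running the more expensive LLM-based evaluators.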
● LLM Guardrails are a set of predefined constraints and guidelines
that are applied to LLMs to manage their behavior.
● Guardrails serve to ensure responsible, ethical, and safe usage of
LLMs, mitigate potential risks, and promote transparency and
accountability.
● Guardrails are a form of proactive control and oversight over the
output and behavior of language models, which are otherwise
capable of generating diverse content, including text that may be
biased, inappropriate, or harmful.
Understanding the distinct functions of each type of guardrail is pivotal
in creating a comprehensive and effective strategy for governing AI
systems.
Guardrails
● Content Filter Guardrails: Content filtering is crucial to prevent
harmful, offensive, or inappropriate content from being generated by
LLMs. These guardrails help ensure that the outputs conform to
community guidelines, curbing hate speech, explicit content, and
misinformation.
● Bias Mitigation Guardrails: Bias is an ongoing concern in AI, and
mitigating bias is critical. These guardrails aim to reduce the model's
inclination to produce content that perpetuates stereotypes or
discriminates against particular groups. They work to promote fairness
and inclusivity in the model's responses.
● Safety and Privacy Guardrails: Protecting user privacy is paramount.
Safety and privacy guardrails are designed to prevent the generation of
content that may infringe on user privacy or include sensitive, personal
information. These measures safeguard users against unintended data
exposure.
Types of Guardrails
Types of Guardrails
● Fact-Checking & Hallucination Guardrails: To combat misinformation,
fact-checking guardrails are used to verify the accuracy of the
information generated by LLMs. They help ensure that the model's
responses align with factual accuracy, especially in contexts like news
reporting or educational content.
● Context/Topic and User Intent Guardrails: For LLMs to be effective,
they must produce responses that are contextually relevant and aligned
with user intent. These guardrails aim to prevent instances where the
model generates content that is unrelated or fails to address the user's
queries effectively.
● Explainability and Transparency Guardrails: In the pursuit of making
LLMs more interpretable, these guardrails require the model to provide
explanations for its responses. This promotes transparency by helping
users understand why a particular output was generated, fostering
trust and accountability.
● Jailbreak Guardrails: Ensure robustness to malicious user attacks such
as prompt injection.
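As a concrete illustration, a safety-and-privacy guardrail can be as simple as a regex screen run over model output before it reaches the user. The patterns below are a tiny, assumed subset of what a production filter would use.

```python
import re

# Illustrative patterns for a privacy guardrail (far from exhaustive)
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def apply_privacy_guardrail(text: str) -> str:
    """Redact PII matches before the output is shown to the user."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return text

print(apply_privacy_guardrail("Contact me at jane@example.com, SSN 123-45-6789."))
```

The other guardrail types (bias mitigation, fact-checking, jailbreak detection) usually need a classifier or a second LLM rather than regexes, but the post-processing hook they plug into looks the same.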
Determine what is the one
credit product with the
highest number of complaints.
Answer: Credit Reporting
Task 1
Complaint Summarizer
Applied Generative AI for Banking
Determine the top complaint for TransUnion.
Answer: Violation of
Consumers Rights to Privacy
and Confidentiality Under the
Fair Credit Reporting Act.
Task 2
Complaint Summarizer
Applied Generative AI for Banking
Use H2OGPT to summarize a complaint from the database and provide immediate next steps.
Answer: [See screenshot]
Task 3
Complaint Summarizer
Applied Generative AI for Banking
Retrieval-Augmented Generation (RAG)
RAG as a system is a
particularly good use of
vector databases.
RAG systems take advantage of the context window for LLMs, filling it with only the most relevant examples from real data.
This "grounds" the LLM in relevant context and greatly reduces hallucination.
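The retrieval step can be pictured with a toy example: embed the query and every chunk, rank chunks by cosine similarity, and hand only the top hits to the LLM as context. Real systems use a trained embedding model and a vector database; the keyword-count `toy_embed` below is just a stand-in.

```python
import math
from typing import Callable, List

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k_chunks(query: str, chunks: List[str],
                 embed: Callable[[str], List[float]], k: int = 2) -> List[str]:
    """Rank stored chunks against the query; the top k become the LLM's context."""
    qv = embed(query)
    return sorted(chunks, key=lambda c: cosine(embed(c), qv), reverse=True)[:k]

def toy_embed(text: str) -> List[float]:
    # Crude stand-in for a real embedding model: keyword counts
    keywords = ["fee", "wire", "hours"]
    return [text.lower().count(w) + 0.01 for w in keywords]

chunks = [
    "Wire transfer fees are $25 per transaction.",
    "Branch hours are 9am to 5pm on weekdays.",
    "The daily cutoff for outgoing wires is 4pm.",
]
print(top_k_chunks("What is the fee for a wire transfer?", chunks, toy_embed, k=1))
```

Swapping `toy_embed` for a learned model and the list scan for a vector-database query gives the production shape of the same idea.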
Embedding Models - INSTRUCTOR
Source: https://arxiv.org/pdf/2212.09741.pdf
Instruction-based Omnifarious
Representations
Model is trained to generate embeddings
using both the instruction as well as the
textual input.
Applicable to virtually every use case, due
to its ability to create latent vector
representations that include instruction.
Embedding Models - BGE
Source: https://arxiv.org/pdf/2310.07554.pdf
LLM-Embedder
This embedding model is trained
specifically for use with RAG systems.
A reward model is introduced that provides higher rewards to a retrieval candidate if it results in a higher generation likelihood for the expected output.
Uses contrastive learning to directly optimize for RAG applications.
Generative AI with H2O.ai - H2O LLMs Ecosystem
● AI Engines: Enterprise h2oGPT, Doc-QA, LLM AppStudio, LLM DataStudio, LLM EvalStudio
● Deployment: MLOps, AI Engine Manager
● Consumption: AppStore, End Users
Responsible Generative AI
● Ethical Considerations, Data Privacy, and User Consent: Assess the potential impact of generative AI on individuals and society. Give users control over how their data is used by generative AI. Consent mechanisms should be transparent and user-friendly.
● Monitoring, Regulation, and Security: Detect misuse or anomalies in generative AI behavior. Regulatory compliance ensures adherence to ethical and legal guidelines. Security measures are crucial to protect AI models from adversarial attacks or unauthorized access.
● Accountability and Oversight: Define roles and responsibilities for AI development and deployment. Oversight mechanisms ensure that responsible practices are followed.
● Education and Awareness: Users and developers should be informed about generative AI capabilities, limitations, and ethical considerations.
● Stakeholder Involvement: Involving various stakeholders in AI discussions promotes diverse perspectives and responsible decision-making.
● Continuous Evaluation and Improvement: Continually assess models to ensure fairness, accuracy, and alignment with ethical standards.
● Transparency, Explainability, Bias Mitigation, Debugging, and Guardrails: Recognize and mitigate both subtle and glaring biases that may emerge from training data. Ensure that users can understand and trust the decisions made by generative AI models. Debug models with techniques such as adversarial prompt engineering. Proactively manage risks and maintain control over the model's behavior with guardrails.
● Audit Input Data, Benchmarks, and Test the Unknown: Assess quality of data used as input to train Generative AI models. Utilize benchmarks and random attacks for testing.
Minimizing Model Hallucinations
Using Chain-of-Verification (CoVe) Method
The CoVe method reduces factual errors in large language models by drafting, fact-checking, and verifying responses - it deliberates on its own responses and self-corrects them.
Steps:
1. Given a user query, an LLM generates a baseline response that may contain inaccuracies, e.g. factual hallucinations.
2. To improve this, CoVe first generates a plan of a set of verification questions to ask, and then executes that plan by answering them and hence checking for agreement.
3. Individual verification questions are typically answered with higher accuracy than the original accuracy of the facts in the original longform generation.
4. Finally, the revised response takes the verifications into account.
5. The factored version of CoVe answers verification questions such that they cannot condition on the original response, avoiding repetition and improving performance.